ECS 154B -- Spring 2009 -- Lab #4 Due by midnight June 4th Objectives * Learn more about caches by writing a simulator and studying real traces * Do some research by using the cache simulator to try out some new and different ideas Description In this lab you will write a cache simulator, and then use it to calculate the hit rates for various program traces. Once you have convinced yourself that the simulator works correctly, you will try out some creative new ideas to see how well they work. Details You are to write (in C, or some other ordinary computer language) a simulator that will take as input a series of addresses, and calculate the overall hit rate of the reference stream. The size, associativity, and line size of the cache should be modifiable, so that you can try out different configurations. Once you have got this basic cache simulator working, you are going to modify your program to simulate something new and interesting. You can simulate one of the advanced cache configurations we talked about in class (pseudo-associative cache, way prediction, prefetch buffers, etc.) or you can try something completely new. For example, papers have been published recently that indicate that values that enter the cache are often not read again - streamming data would exhibit this kind of behavior, for instance. Some researchers have looked at putting an incoming piece of data into the cache and marking it as LRU, so that it is the first one evicted on a subsequent miss. The idea here is that if the data is actually read, it will be marked MRU and won't be evicted, and if it is not read right away, it will be kicked out leaving more important data in the cache. However, there are cases when this doesn't work so well - can you think of a way to modify this approach? Another possible idea is to modify the replacement strategy so that writes stay in the cache a little bit longer. Why do this? Well, we know the stack grows and shrinks, and when it shrinks the lines in the cache that represented the top of the stack are going to be marked dirty. They don't really need to be written out to memory, though, since they have been "poppped" off the stack already. Can you think of a way to deal with this problem? One way I have thought about is to have "sticky" bits associated with each cache line - if a cache line is dirty and enters the LRU state, then the sticky bits are checked and if they are set, something else will be evicted instead of that dirty line. So the sticky bits should probably really be a counter, that you decrement whenever the line is in an LRU state and doesn't get evicted ... but how big should the counter be? When should it be changed? Will this approach do any good at all? There are a whole bunch of interesting ideas you can experiment with. I would encourage you to think creatively, and if you want to run your idea by me I would be happy to talk to you about it. Program Traces There are many address traces available, in many different formats. These traces contain the (virtual) address reference stream generated by a processor (usually a MIPS processor). The long traces are stored in a special compressed gzipped format, which requires a tool to bring back to life. These traces capture the entire reference stream of the run of the program, and are in many cases billions of references long. These traces are on the instructional machines in /home/farrens/academic/154B/traces. The long trace runs of the entire SPEC92 integer suite of programs are in /home/farrens/academic/154B/traces/spec92.int, and (most of) the floating point progs are in /home/farrens/academic/154B/traces/spec92.fp. To process any of the traces, you have to know the format of the output. There are 3 main types of output produced and consumed by our various tools. Each of the types use a [ref-type address] format, where the ref_type indicates the type of memory reference and the address field contains the address of the reference. In the "aggie" format, the ref_type is an ascii '1' for an instruction fetch, a '2' for a load, and a '3' for a store. This format is human-readable, but can be substantially slower to process since there are so many more bytes involved. In the "binary" 5-byte [ref_type address] format, the ref_type is a single ascii character and indicates the type of memory reference (0 = load, 1 = store, 2 = instruction fetch), while the address field contains the 4-byte integer address of the fetch. This format is not directly readable by humans. The "dinero" format is essentially identical to the "aggie" format, except the meanings of the ref_types are different. In this format, a '0' indicates a data read, a '1' a data write, and a '2' an instruction fetch. There is a 4th format, called the "pixie" format, which is faster to process, but is probably beyond the scope of this project at this point. (I will be glad to tell you about it in person, if your are having problems with the other formats.) The long spec traces require special handling, because they are stored in a special way. We wrote a compression program which dramatically reduces the size of the trace, and then gzip the output of that program as well. So, to use the long traces, you need to do the following: gunzip < {tracename} | inflate \\[-a|-b|-d|-p\] | {whatever} where {tracename} is the name of the trace you are working with, {whatever} is "more", or your cache analysis program, or whatever, and the 4 flags for the "inflate" program specify the format of the output ("aggie", "binary", "dinero", and "pixie", respectively). The "inflate" program is in ~farrens/academic/154B/traces/compress.tools. Hints Obviously it's better to start with a subset of the traces to debug your program ...