ECS201A Homework #2
Due Thursday October 23rd (10/23/08) in class
We talked in class about the importance of using the correct benchmark when
evaluating performance. We discussed the problems with synthetic benchmarks,
and also the importance of using optimized vs non-optimized code. In this
assignment you are going to learn a bunch of stuff, whether you want to or not ... :-)
1a.) Compile the Whetstone benchmark with optimization on, and
figure out the instruction mix for the top 20 most frequently executed
instructions. Repeat this experiment with the optimization off, and then
compare and contrast the two results. You might find this to be a good place
to start:
http://en.wikipedia.org/wiki/Dhrystone
1b.) Repeat problem 1a on one of the SPEC2000 floating point benchmarks.
I would recommend choosing one of the following: 177.mesa, 179.art,
183.equake, or 188.ammp (these are all written in C, unlike the other FP
benchmarks which are written in Fortran.) You will probably find this web page
helpful (although at first it will seem exceedingly incomprehensible):
http://www.spec.org/cpu/docs/execution_without_SPEC_tools.txt
1c.) Now compare and contrast the optimized and unoptimized instruction mixes
for the two programs (Whetstone and your SPECFP benchmark).
2a,b,c) Repeat problem 1 using Dhrystone, and comparing it with one of the SPEC2000
integer benchmarks.
You can do problems 1 and 2 on any load/store machine, and you can either use a
cycle-level simulator or a real machine. The SPEC2000 benchmarks can be found
on the instructional machines in this directory:
~farrens/academic/201A/benchmarks/SPEC2K/cdrom/benchspec/CFP2000
Integer benchmarks are here:
~farrens/academic/201A/benchmarks/SPEC2K/cdrom/benchspec/CINT2000
3) Some of the instruction set architectures in figure B.2 destroy operands in
the course of computation, which can impact performance. For each architecture
in figure B.2, write the code sequence to compute C=A+B followed by D=A-E. In
the code you write, mark each operand that is destroyed during execution and
mark each "overhead" instruction that is included just to overcome this loss of
data. Assume that A, B, C, D, and E reside in memory. Also assume that
instruction opcodes are represented in 8 bits, memory addresses are 64 bits,
and register addresses are 6 bits. What is the total code size, the number of
bytes of instructions and data moved to or from memory, the number of overhead
instructions, and the number of overhead data bytes for each of your code
sequences?
4) Find an instruction set manual for some older machine (many are on the web).
Summarize the instruction set with the discriminating characteristics used in
Figures B.3 and B.4. Write the code sequence from problem 3, using this
instruction set. The size of the data need not be the same as in problem 3 if
the word size is smaller in the older machine.
5) It has been suggested that adding a register-memory addressing mode to a
load-store machine might be useful. In other words, add an instruction so that
sequences that look like this:
LOAD Rx, n(Ry)
ADD Rz,Rz,Rx
would instead look like this:
ADD Rz, n(Ry)
Adding this instruction will cause the clock cycle to increase by 7% (but will
not affect the CPI). Using
the instruction frequency distribution for gcc from figure B.27,
a) What percentage of the loads must be eliminated for the machine with the new
instruction to have at least the same performance as before?
b) Give an example of a multiple instruction sequence where a load of Rw
followed immediately by the use of Rw could not be replaced by a single
instruction of the type proposed.
6) Let's look at adding this new instruction to the classic MIPS 5-stage
pipeline. We do not wish to pay the 7% cycle time penalty, so we are going to
change the instruction set such that all memory addressing will be
restricted to being register indirect (in other words, you will no longer be
able to add a constant to the contents of the register). This would mean that
our previous instruction LOAD Rx, n(Ry) becomes LOAD Rx,Ry. But it does give
us the ability to do ADD Rw,Rz,(Rx) (Rw=Rz+Mem(Rx)).
a) List the (new) order of the five stages of the modified RISC pipeline that
will support register-memory operations implemented exclusively by register
indirect addressing. (In other words, FDEMW becomes ...)
b) Describe what new forwarding paths are needed for the rearranged pipeline by
stating the source, destination, and information transferred on each needed new
path.
c) For the reordered RISC pipeline, what new data
hazards are created by this addressing mode? Give an instruction sequence
illustrating each new hazard.
d)List all ways that the new RISC pipeline can
have a different instruction count for a given program than the original RISC
pipeline. Give a pair of specific instruction sequences, one for the original
pipeline and one for the rearranged pipeline, to illustrate each way.
e) Assume all instructions take 1 clock cycle per
stage. List all ways that
the register-memory RISC can have a different CPI for a given program as
compared to the original RISC pipeline.