Improving CISC Instruction Decoding Performance Using a Fill Unit

Mark Smotherman, Manoj Franklin


Current superscalar processors require substantial instruction fetch and decode bandwidth to keep multiple functional units utilized. A hardware assist called a fill unit can collect microoperations into a decoded instruction cache, so that later fetches of the same code can bypass the decoding logic. This approach is investigated using the x86 architecture, and a speedup of approximately a factor of two over a P6-like decoding structure is observed for the three SPEC benchmarks investigated. The design is accompanied by a microengine-register allocation scheme that prevents the increased supply of microoperations from placing excessive demands on the register renaming hardware.


CISC, instruction decoding, fill unit, dynamic execution, register renaming
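
The fill-unit idea summarized in the abstract can be illustrated with a toy software model. This is only a sketch under stated assumptions, not the design evaluated in the paper: the decode table, the `FrontEnd` class, the per-address cache organization, and the unbounded cache size are all hypothetical simplifications chosen for illustration. It shows only the basic control flow: the first fetch of an instruction goes through the decoder and the resulting microoperations are saved, while later fetches of the same address are served from the decoded cache without decoding.

```python
from dataclasses import dataclass, field

# Hypothetical decode table (illustrative only): instruction text -> micro-ops.
DECODE_TABLE = {
    "add eax, [mem]": ["load t0, [mem]", "add eax, eax, t0"],
    "inc ecx": ["add ecx, ecx, 1"],
    "push ebx": ["sub esp, esp, 4", "store [esp], ebx"],
}


@dataclass
class FrontEnd:
    """Toy front end with a decoder and a decoded instruction cache."""
    memory: dict                                        # address -> instruction text
    decoded_cache: dict = field(default_factory=dict)   # address -> micro-ops
    decodes: int = 0                                    # trips through the decoder
    cache_hits: int = 0                                 # fetches that bypassed it

    def fetch(self, address):
        cached = self.decoded_cache.get(address)
        if cached is not None:
            # Hit: micro-ops come straight from the decoded cache.
            self.cache_hits += 1
            return list(cached)
        # Miss: take the slow path through the decoder.
        self.decodes += 1
        uops = DECODE_TABLE[self.memory[address]]
        # Assumed fill-unit behavior: store the micro-ops for future fetches.
        self.decoded_cache[address] = tuple(uops)
        return list(uops)


def run_loop(front_end, addresses, iterations):
    """Fetch the same instruction sequence repeatedly, as a loop body would."""
    issued = []
    for _ in range(iterations):
        for address in addresses:
            issued.extend(front_end.fetch(address))
    return issued


memory = {0: "add eax, [mem]", 4: "inc ecx", 8: "push ebx"}
front_end = FrontEnd(memory)
issued = run_loop(front_end, [0, 4, 8], iterations=3)
print(front_end.decodes, front_end.cache_hits, len(issued))
```

In this model only the first loop iteration uses the decoder; the remaining iterations are served entirely from the decoded cache. The model says nothing about timing, cache capacity, or register renaming, which are the aspects the paper actually measures.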
