The Performance Impact of Incomplete Bypassing in Processor Pipelines

Pritpal S. Ahuja, Douglas W. Clark, Anne Rogers


Pipelined processors employ hardware bypassing to eliminate certain pipeline hazards. Bypassing is logically simple but can be costly, especially in wide-issue and deeply pipelined machines. This paper studies bypassing in detail, with an emphasis on designs in which the bypassing network is not complete. Cycle-level simulations of a model of integer and floating-point pipelines running some of the SPEC92 benchmarks show that at least half of the instructions executed use a bypassed register result from a previous instruction. Missing bypasses induce interlock stalls. The paper reports measurements of the performance impact of a number of pipeline configurations with incomplete bypassing networks. This impact ranges from a slowdown of just a few percent for a configuration with one late bypass missing to a slowdown of almost a factor of two for the integer pipe with no bypassing at all. Two types of code alterations reduce these added interlock stalls. The first, a simple code transformation, interchanges the operands of instructions that perform commutative operations; in certain configurations it cuts the performance loss from interlock stalls by about 20 to 50 percent. The second reschedules code within basic blocks to avoid any missing bypasses. In five individual experiments with a small number of configurations and two benchmarks, this rescheduling saved 25 to 50 percent of the interlock stalls. In certain configurations both transformations can be applied.
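To make the interaction between missing bypasses, interlock stalls, and operand interchange concrete, here is a minimal toy sketch. It is not the authors' simulator. It assumes a single-issue, in-order pipeline where every result is usable one issue slot after its producer. It also assumes two operand input ports (1 and 2), with a bypass path named by a hypothetical (distance, port) pair, and a value that can be read from the register file once the producer is at least `rf_distance` slots ahead. The names `Instr`, `issue_cycle`, `count_stalls`, and the bypass sets are illustrative only. Operand interchange is modeled as a greedy per-instruction choice of whichever operand order issues earlier. Basic-block rescheduling is not modeled.

```python
from dataclasses import dataclass

# Hypothetical instruction format: dest <- op(src1, src2)
@dataclass
class Instr:
    op: str
    dest: str
    src1: str
    src2: str

# Assumed set of commutative opcodes eligible for operand interchange.
COMMUTATIVE = {"add", "mul", "and", "or", "xor"}

def issue_cycle(instr, earliest, ready_at, bypasses, rf_distance):
    """First cycle >= earliest at which both source operands can be
    delivered, either over an existing bypass path or from the register file."""
    cycle = earliest
    while True:
        ok = True
        for port, src in ((1, instr.src1), (2, instr.src2)):
            if src not in ready_at:
                continue  # no in-flight producer: value is already in the register file
            d = cycle - ready_at[src]  # distance from producer, in issue slots
            if d >= rf_distance:
                continue  # producer has written back: read the register file
            if d >= 1 and (d, port) in bypasses:
                continue  # forwarded over a bypass path that exists
            ok = False  # missing bypass: interlock and try the next cycle
        if ok:
            return cycle
        cycle += 1

def count_stalls(program, bypasses, rf_distance, swap_commutative=False):
    """Total interlock stall cycles for a straight-line program."""
    ready_at = {}  # register -> issue cycle of its most recent producer
    cycle = -1
    stalls = 0
    for instr in program:
        earliest = cycle + 1
        best = issue_cycle(instr, earliest, ready_at, bypasses, rf_distance)
        if swap_commutative and instr.op in COMMUTATIVE:
            swapped = Instr(instr.op, instr.dest, instr.src2, instr.src1)
            best = min(best, issue_cycle(swapped, earliest, ready_at,
                                         bypasses, rf_distance))
        stalls += best - earliest
        cycle = best
        ready_at[instr.dest] = cycle
    return stalls

# Complete network for distances 1 and 2; register file from distance 3 on.
FULL = {(d, port) for d in (1, 2) for port in (1, 2)}
# Incomplete network: the distance-1 bypass into port 2 is absent.
MISSING_D1_PORT2 = FULL - {(1, 2)}

prog = [
    Instr("add", "r1", "r2", "r3"),
    Instr("add", "r4", "r5", "r1"),  # needs r1 on port 2 at distance 1
    Instr("sub", "r6", "r7", "r4"),  # not commutative: r4 must use port 2
]

print(count_stalls(prog, FULL, 3))                                     # 0
print(count_stalls(prog, MISSING_D1_PORT2, 3))                         # 2
print(count_stalls(prog, MISSING_D1_PORT2, 3, swap_commutative=True))  # 1
print(count_stalls(prog, set(), 3))                                    # 4
```

In this toy case, swapping the operands of the second `add` routes `r1` through the port whose distance-1 bypass still exists, which removes one of the two stalls. The non-commutative `sub` cannot be helped this way. That is one reason a reordering pass within the basic block would be a complementary tool.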


pipeline, forwarding, bypassing, interlocks, code-scheduling
