The Performance Impact of Incomplete Bypassing in Processor Pipelines
Pritpal S. Ahuja, Douglas W. Clark, Anne Rogers
Abstract
Pipelined processors employ hardware bypassing to eliminate certain
pipeline hazards. Bypassing is logically simple but can be costly,
especially in wide-issue and deeply pipelined machines. In this paper
bypassing is studied in detail, with an emphasis on designs in which
the bypassing network is not complete. Cycle-level simulations of a
model of integer and floating-point pipelines running some of the
SPEC92 benchmarks show that at least half of the instructions executed
used a bypassed register result from a previous instruction. Missing
bypasses induce interlock stalls. The paper reports measurements of
the performance impact of a number of pipeline configurations with
incomplete bypassing networks. This impact ranges from a slowdown of
just a few percent for a configuration with one late bypass missing to
a slowdown of almost a factor of two for the integer pipe with no
bypassing at all. Two types of code alterations reduce the new
interlock stalls. A simple code transformation, the interchange of
operands in instructions that perform commutative operations, cuts the
performance loss from interlock stalls in certain configurations
by about 20 to 50 percent. The second transformation is to
re-schedule code within basic blocks to avoid any missing bypasses. In
five individual experiments with a small number of configurations and
two benchmarks, this rescheduling saved 25 to 50 percent of the
interlock stalls. In certain configurations both transformations can be
applied.
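
As a rough illustration of the operand-interchange idea, the following is a minimal sketch under a toy model that is an assumption here, not the paper's simulator: a result produced by the immediately preceding instruction can be bypassed to the first source operand but not to the second, and each such missing bypass costs a fixed number of stall cycles. The names `stalls`, `interchange`, `COMMUTATIVE`, and the `penalty` parameter are hypothetical.

```python
# Toy model (assumed for illustration): instructions are tuples
# (op, dest, src1, src2). A value written by the immediately preceding
# instruction can be bypassed to src1 but NOT to src2.

COMMUTATIVE = {"add", "mul", "and", "or", "xor"}


def stalls(program, penalty=1):
    """Count stall cycles caused by the missing src2 bypass."""
    total = 0
    for prev, cur in zip(program, program[1:]):
        prev_dest = prev[1]
        src2 = cur[3]
        if src2 == prev_dest:
            total += penalty
    return total


def interchange(program):
    """Swap operands of commutative ops so that a value produced by the
    preceding instruction arrives on src1, where the bypass exists."""
    out = []
    prev_dest = None
    for op, dest, src1, src2 in program:
        if op in COMMUTATIVE and src2 == prev_dest and src1 != prev_dest:
            src1, src2 = src2, src1
        out.append((op, dest, src1, src2))
        prev_dest = dest
    return out


program = [
    ("add", "r1", "r2", "r3"),
    ("add", "r4", "r5", "r1"),  # r1 needed on src2; add is commutative
    ("sub", "r6", "r7", "r4"),  # r4 needed on src2; sub is not commutative
]

before = stalls(program)
after = stalls(interchange(program))
```

In this sketch the second `add` is rewritten to take `r1` as its first operand, which removes its stall, while the `sub` cannot be rewritten and still stalls. Rescheduling within the basic block, the second transformation described above, would be the way to attack that remaining stall.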
Keywords
pipeline, forwarding, bypassing, interlocks, code-scheduling