Using Branch Handling Hardware to Support Profile-Driven Optimization
Thomas M. Conte, Burzin A. Patel, J. Stan Cox
conte@ece.scarolina.edu
Abstract
Profile-based optimizations can be used for instruction scheduling, loop
scheduling, data preloading, function in-lining, and instruction cache
performance enhancement. However, these techniques have not been embraced by
software vendors because programs instrumented for profiling run 2-30 times
slower, an awkward compile-run-recompile sequence is required, and a test input
suite must be collected and validated for each program. This paper proposes
using existing branch handling hardware to generate profile information in real
time. Techniques are presented for both one-level and two-level branch
hardware organizations. The approach achieves high accuracy with only a small
execution slowdown (0.4%-4.6%). This allows a program to be profiled while
it is used, eliminating the need for a test input suite. This practically
removes the inconvenience of profiling. With contemporary processors driven
increasingly by compiler support, hardware-based profiling is important for
high-performance systems.