Two-Level Adaptive Training Branch Prediction
Abstract. High-performance microarchitectures use, among other structures, deep pipelines to help speed up execution. The importance of a good branch predictor to the effectiveness of a deep pipeline in the presence of conditional branches is well-known. In fact, the literature contains proposals for a number of branch prediction schemes. Some are static in that they use opcode information and profiling statistics to make predictions. Others are dynamic in that they use run-time execution history to make predictions.
This paper proposes a new dynamic branch predictor, the Two-Level Adaptive Training scheme, which alters the branch prediction algorithm on the basis of information collected at run-time.
Several configurations of the Two-Level Adaptive Training Branch Predictor are introduced, simulated, and compared to simulations of other known static and dynamic branch prediction schemes. Two-Level Adaptive Training Branch Prediction achieves 97 percent accuracy on nine of the ten SPEC benchmarks, compared to less than 93 percent for other schemes. Since a prediction miss requires flushing of the speculative execution already in progress, the relevant metric is the miss rate. The miss rate is 3 percent for the Two-Level Adaptive Training scheme vs. 7 percent (best case) for the other schemes. This represents more than a 100 percent improvement in reducing the number of pipeline flushes required.
The state transition diagrams of the finite-state machines used in this study for updating the pattern history in the pattern table entry are shown in Figure 2. The automaton Last-Time stores in the pattern history bit only the outcome of the last execution of the branch when the history pattern appeared. The next time the same history pattern appears, the prediction will be what happened last time. Only one bit is needed to store the pattern history information. The automaton A1 records the results of the last two times the same history pattern appeared. Only when there is no taken branch recorded, the next execution of the branch when the history register has the same history pattern will be predicted as not taken; otherwise, the branch will be predicted as taken. The automaton A2 is a saturating up-down counter, which is also used, but differently, in Lee and Smith’s Branch Target Buffer design [13]. The counter is incremented when the branch is taken and is decremented when the branch is not taken. The next execution of the branch will be predicted as taken when the counter value is greater than or equal to two; otherwise, the branch will be predicted as not taken. Automata A3 and A4 are both similar to A2.
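As a concrete illustration, the following is a minimal C sketch of three of these automata. Figure 2 itself is not reproduced here, so the state representations below are assumptions chosen to match the behavior described in the text, not the paper’s exact encodings:

```c
#include <stdbool.h>

/* Last-Time: one pattern history bit holding the outcome of the
 * last execution of the branch under this history pattern. */
typedef struct { unsigned char bit; } last_time_t;
static bool lt_predict(const last_time_t *s)       { return s->bit; }
static void lt_update (last_time_t *s, bool taken) { s->bit = taken; }

/* A1: remembers the outcomes of the last two occurrences of the
 * pattern; predicts not-taken only if neither was taken. */
typedef struct { unsigned char last2; /* 2-bit shift register */ } a1_t;
static bool a1_predict(const a1_t *s)       { return s->last2 != 0; }
static void a1_update (a1_t *s, bool taken) { s->last2 = ((s->last2 << 1) | taken) & 3; }

/* A2: a saturating up-down counter over states 0..3; incremented
 * on taken, decremented on not-taken; predict taken when >= 2. */
typedef struct { unsigned char ctr; } a2_t;
static bool a2_predict(const a2_t *s) { return s->ctr >= 2; }
static void a2_update(a2_t *s, bool taken)
{
    if (taken) { if (s->ctr < 3) s->ctr++; }
    else       { if (s->ctr > 0) s->ctr--; }
}
```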
Both Static Training and Two-Level Adaptive Training are dynamic branch predictors, because their predictions are based on run-time information, i.e. the dynamic branch history. The major difference between these two schemes is that the pattern history information in the pattern table changes dynamically in Two-Level Adaptive Training but is preset in Static Training from profiling. In Static Training, the input to the prediction decision function, λ, for a given branch history pattern is determined before execution. Therefore, the output of λ is determined before execution for a given branch history pattern. That is, the same branch predictions are made if the same history pattern appears at different times during execution. Two-Level Adaptive Training, on the other hand, updates the appropriate pattern history information with the actual result of each branch. As a result, given the same branch history pattern, different pattern history information can be found in the pattern table; therefore, there can be different inputs to the prediction decision function for Two-Level Adaptive Training. Predictions of Two-Level Adaptive Training change adaptively in accordance with the program execution behavior.
Since the pattern history bits change in Two-Level Adaptive Training, the predictor can adjust to the current branch execution behavior of the program to make proper predictions. With the updates, Two-Level Adaptive Training can still be highly accurate over many different programs and data sets. Static Training, on the contrary, may not predict well if changing data sets results in different execution behavior.

3. Implementation Methods

3.1 Implementations of the Per-address History Register Table

It is not feasible to have a big enough history register table for each static branch to have its own history register in real implementations. Therefore, two approaches are proposed for implementing the Per-address History Register Table.
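The two approaches are named in Section 4.2 below: the associative HRT (AHRT) and the hash HRT (HHRT). As a rough sketch of how the first level (the history register) feeds the second level (the pattern table), the following shows the hash-indexed variant with A2 pattern history; the table sizes and the hash function are illustrative assumptions, not the paper’s parameters:

```c
#include <stdbool.h>
#include <stdint.h>

#define HRT_ENTRIES  512                 /* per-address history register table: assumed size */
#define HISTORY_BITS 12                  /* history register length: assumed */
#define PATTERNS     (1 << HISTORY_BITS) /* one pattern table entry per history pattern */

static uint16_t hrt[HRT_ENTRIES];        /* history registers */
static uint8_t  pt[PATTERNS];            /* pattern history automata (A2 counters here) */

/* Hash HRT (HHRT) variant: index the table with branch-address bits,
 * so different static branches may collide on one entry. An associative
 * (AHRT) variant would instead tag-match the branch address. */
static unsigned hrt_index(uint32_t branch_addr)
{
    return (branch_addr >> 2) % HRT_ENTRIES;
}

/* First level: the branch's own history register selects a pattern
 * table entry. Second level: that entry's automaton predicts. */
static bool predict(uint32_t branch_addr)
{
    uint16_t pattern = hrt[hrt_index(branch_addr)];
    return pt[pattern] >= 2;             /* A2: predict taken when counter >= 2 */
}

/* After the branch resolves, update the pattern history with the actual
 * result, then shift the result into the history register. This run-time
 * update is what makes the scheme adaptive rather than preset. */
static void update(uint32_t branch_addr, bool taken)
{
    unsigned i       = hrt_index(branch_addr);
    uint16_t pattern = hrt[i];

    if (taken) { if (pt[pattern] < 3) pt[pattern]++; }
    else       { if (pt[pattern] > 0) pt[pattern]--; }

    hrt[i] = ((pattern << 1) | taken) & (PATTERNS - 1);
}
```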
4.2 Simulation Model

Several configurations were simulated for the Two-Level Adaptive Training scheme. For the per-address history register table (PHRT), two practical implementations, the associative HRT (AHRT) and the hash HRT (HHRT), along with the ideal HRT (IHRT), were simulated. In order to distinguish the different schemes, the naming convention for the branch prediction schemes is Scheme(History(Size, Entry_Content), Pattern(Size, Entry_Content), Data). Scheme specifies the scheme, for example, Two-Level Adaptive Training (AT), Static Training (ST), or Lee and Smith’s Branch Target Buffer design (LS). In History(Size, Entry_Content), History is the implementation for keeping history information of branches, for example, IHRT, AHRT, or HHRT. Size specifies the number of entries in the implementation, and Entry_Content specifies the content in each entry. The content of an entry in the history register table can be any automaton shown in Figure 2 or a history register. In Pattern(Size, Entry_Content), Pattern is the implementation for keeping history information for history patterns, Size specifies the number of entries in the implementation, and Entry_Content specifies the content in each entry. The content of an entry in the pattern history table can be any automaton shown in Figure 2. For Lee and Smith’s Branch Target Buffer designs, the Pattern part is not included, because there is no pattern history information kept in their designs. Data specifies how the data sets are used. When Data is specified as Same, the same data set is used for both training and testing. When Data is specified as Diff, different data sets are used for training and testing. If Data is not specified, no training data set is needed for the scheme, as in the Two-Level Adaptive Training schemes or Lee and Smith’s Branch Target Buffer designs. The configuration and scheme of each simulation model in this study are listed in Table 2.

Since about 60 percent of branches are taken according to our simulation results, the contents of the history register usually should contain more 1’s than 0’s. Accordingly, all the bits in the history register of each entry in the HRT are initialized to 1’s at the beginning of program execution. During execution, when an entry is re-allocated to a different static branch, the history register is not re-initialized.

The pattern history bits in the pattern table entries are also initialized at the beginning of execution. Since taken branches are more likely, for those pattern tables using automata A1, A2, A3, and A4, all entries are initialized to state 3. For Last-Time, all entries are initialized to state 1 such that the branches at the beginning of execution will be more likely to be predicted taken.

In addition to the Two-Level Adaptive Training schemes, Lee and Smith’s Static Training schemes and Branch Target Buffer designs, and some dynamic and static branch prediction schemes were simulated for comparison purposes. Lee and Smith’s Static Training scheme is similar to the Two-Level Adaptive Training scheme with an IHRT but with the important difference that the prediction for a given pattern is pre-determined by profiling. The two practical approaches for the HRT were also simulated for Static Training with the same accessing method introduced above.
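In code, the initialization just described might look as follows; the table sizes are the same illustrative assumptions used in the earlier sketch:

```c
#include <stdint.h>

#define HRT_ENTRIES  512                  /* illustrative sizes, as before */
#define HISTORY_BITS 12
#define PATTERNS     (1 << HISTORY_BITS)

static uint16_t hrt[HRT_ENTRIES];         /* history registers */
static uint8_t  pt[PATTERNS];             /* pattern history automata */

/* Bias everything toward taken at startup, since about 60 percent of
 * branches are taken. Note that when an HRT entry is later re-allocated
 * to a different static branch, its history register is NOT re-initialized. */
static void init_predictor(int use_last_time)
{
    for (int i = 0; i < HRT_ENTRIES; i++)
        hrt[i] = PATTERNS - 1;            /* all history bits set to 1 (taken) */

    for (int p = 0; p < PATTERNS; p++)
        pt[p] = use_last_time ? 1 : 3;    /* Last-Time: state 1; A1-A4: state 3 */
}
```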
Lee and Smith’s Branch Target Buffer designs were simulated with automata A2, A3, A4, and Last-Time. The static branch prediction schemes simulated include Always Taken, Backward Taken and Forward Not Taken, and a simple profiling scheme. The profiling scheme is done by counting the frequency of taken and not-taken for each static branch in the profiling execution. The predicted direction of a branch is the one the branch takes most frequently. Since the same data set was used for profiling and execution in this study, the prediction accuracy was calculated by taking the ratio of the sum of the larger number in the two numbers for the two possible directions of every static branch over the total number of dynamic conditional branch instructions.
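Written as a formula (with $T_b$ and $N_b$ as hypothetical labels for the number of taken and not-taken executions of static branch $b$), this accuracy is:

$$\text{accuracy} = \frac{\sum_{b} \max(T_b,\, N_b)}{\sum_{b} (T_b + N_b)}$$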
5. Simulation Results

The simulation results presented in this section were run with the Two-Level Adaptive Training schemes, the Static Training schemes, the Branch Target Buffer designs, and some static branch prediction schemes. Figures 5 through 10 show the prediction accuracy across the nine benchmarks. On the horizontal axis, the category labeled “Tot G Mean” shows the geometric mean across all the benchmarks, “Int G Mean” shows the geometric mean across all integer benchmarks, and “FP G Mean” shows the geometric mean across all floating point benchmarks. The vertical axis shows the prediction accuracy scaled from 76 percent to 100 percent. This section concludes with a comparison between different branch prediction schemes.

5.1 Two-Level Adaptive Training

The Two-Level Adaptive Training schemes were simulated with different state transition automata, different HRT implementations, and different history register lengths to show their effects on prediction accuracy. The simulations of the Two-Level Adaptive Training scheme using an IHRT demonstrate the accuracy the scheme can achieve without the history table miss effect and are used as a comparison to Lee and Smith’s Static Training scheme, which also uses the ideal history register table.

5.1.1 Effect of State Transition Automata

Figure 5 shows the efficiency of different state transition automata. Four state transition automata, A2, A3, A4, and Last-Time, were simulated. A1 is not included, because early experiments indicated it was inferior to the other four-state automata, A2, A3, and A4. The scheme using Last-Time performs about 1 percent worse than the ones using the other automata, which achieve similar accuracy around 97 percent. The four-state finite-state machines maintain more history information than Last-Time, which only records what happened last time; A2, A3, and A4 are therefore more tolerant to noise in the execution history.

In order to show the curves clearly in the following figures, each scheme is shown with the state transition automaton A2, which usually performs the best among the state transition automata used in this study.
[Figure 6: Two-Level Adaptive Training schemes using different history register table implementations.]
In order to show the effects of the training data sets, the simulation results for the schemes (with Same in their names) which were trained and tested on the same data set and those for the schemes (with Diff in their names) which were trained and tested on different data sets are both presented. All the testing data sets are the same as those used by the other schemes in order to allow a fair comparison. In the schemes which were trained and executed on the same data set, the results are the best the Static Training schemes can achieve with that data set, because the best predictions for branches are known beforehand.

Five of the nine benchmarks were trained with other applicable data sets. The other four benchmarks, eqntott, matrix300, fpppp, and tomcatv, are excluded because there are no other applicable data sets or the applicable data sets are too similar to each other. The data sets used in training and testing are shown in Table 3.