Tools Tutorial
This tutorial describes the Crystal Ball tools:
Batch Fit
Correlation Matrix
Tornado Chart
Bootstrap
Decision Table
Scenario Analysis
Two-dimensional Simulation
For each tool, there is a general description, an introductory tutorial, and a description of all dialogs, fields, and options.
Overview
Crystal Ball tools are Visual Basic programs that extend the functionality of Crystal Ball. They cover two aspects of Crystal Ball modeling: setup and analysis.
Setup tools
Batch Fit: Automatically fits selected continuous probability distributions to multiple data series.
Correlation Matrix: Rapidly defines and automates correlations of assumptions.
Tornado Chart: Individually analyzes the impact of each model variable on a target outcome.
Analysis tools
Bootstrap: Addresses the reliability and accuracy of forecast statistics.
Decision Table: Evaluates the effects of alternate decisions in a simulation model.
Scenario Analysis: Displays what inputs created particular outputs.
Two-dimensional Simulation: Independently addresses uncertainty and variability using two-dimensional simulation.
This manual describes each tool, provides a step-by-step example for using each tool, and describes the windows, dialogs, and options for each tool.
Tutorial
Batch Fit is intended to help you create assumptions when you have historical data for several variables. It selects which distribution best fits each series of historical data, and gives you the distribution and its associated parameters to use in your model. The tool can also give you a table of goodness-of-fit statistics for each distribution so you can compare the fit of the best distribution to the fits of the other distributions. It can also give you a matrix of correlations calculated between multiple data series so you can easily see which series are related and to what degree.
Excel Note: Due to limitations with older versions of Excel, the calculation of correlations between data series has been disabled in Excel 95.
To use the Batch Fit tool, your data series must be contiguous (in adjacent rows or columns). You can select any combination of the continuous probability distributions to fit to all your data series.
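The idea of fitting several candidate distributions to each data series and ranking them by a goodness-of-fit statistic can be sketched outside Crystal Ball. The following Python sketch is illustrative only and is not the tool's implementation: the function names, the three candidate distributions, the moment-matching fits, the equal-width binning, and the sample data are all assumptions made for the example.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def chi_square(data, cdf, bins=10):
    """Chi-square statistic of `data` against a fitted CDF, using equal-width bins."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins
    stat = 0.0
    for i in range(bins):
        left, right = lo + i * width, lo + (i + 1) * width
        last = i == bins - 1
        # The outer bins absorb the tails so the expected counts sum to len(data).
        p = (1.0 if last else cdf(right)) - (0.0 if i == 0 else cdf(left))
        expected = len(data) * p
        observed = sum(1 for x in data if x >= left and (last or x < right))
        if expected <= 0.0:
            if observed:
                return math.inf  # data falls where the distribution has no mass
            continue
        stat += (observed - expected) ** 2 / expected
    return stat


def candidate_fits(data):
    """Fit each candidate by simple moment or range matching (an assumption)."""
    m, s = mean(data), stdev(data)
    a, b = min(data), max(data)
    fits = {
        "normal": NormalDist(m, s).cdf,
        "uniform": lambda x: min(1.0, max(0.0, (x - a) / (b - a))),
    }
    if a >= 0 and m > 0:  # exponential only makes sense for non-negative data
        fits["exponential"] = lambda x: 1.0 - math.exp(-x / m) if x > 0 else 0.0
    return fits


def batch_fit(series):
    """For each named series, rank the candidates by chi-square (lower is better)."""
    report = {}
    for name, data in series.items():
        scores = {dist: chi_square(data, cdf) for dist, cdf in candidate_fits(data).items()}
        report[name] = sorted(scores.items(), key=lambda kv: kv[1])
    return report


# Hypothetical data series standing in for columns of historical data.
rng = random.Random(999)
series = {
    "Magazine A": [rng.gauss(100, 15) for _ in range(360)],
    "Magazine B": [rng.expovariate(1 / 10) for _ in range(360)],
}
report = batch_fit(series)
for name, ranking in report.items():
    print(name, "best fit:", ranking[0][0])
```

Each entry of `report` is the full ranking for one series, which plays the role of the table of goodness-of-fit statistics described above.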
To run Batch Fit:
1. In Excel with Crystal Ball loaded, open the workbook Magazine Sales.xls.
2. Select CBTools > Batch Fit. The Select Distributions (Step 1 Of 3) dialog appears.
3. Make sure all the distributions are in the Selected Distributions list.
4. Click on Next. The Select Input Options (Step 2 Of 3) dialog appears.
5. Click on the Select Cells icon to the right of the Location Of Data Series field.
6. Select the Sales Data worksheet.
7. Select cells A1 through D361.
8. Click on the return icon to return to the tool dialog.
9. Select Data In Columns. The default is Data In Columns.
10. Select the Fitness Criteria: Chi-Square. The default is Chi-Square.
11. Check the First Row Contains Headers option.
12. Click on Next. The Select Output Options (3 Of 3) dialog appears.
13. Select the Create Output On The Active Worksheet option.
14. Click on the Select Cells icon to the right of the Specify Upper Left Corner Of The Output field.
15. Select the Sales Model tab.
16. Select cell B13.
17. Click on the return icon to return to the tool dialog.
18. Make sure the Format Output option is checked.
19. Check the Show Table Of Goodness-of-fit Statistics option.
20. To include a correlation analysis of the data:
a.
    b. Enter a threshold value in the Define Correlations Above field. The threshold must be between 0 and 1 (inclusive). The tool displays the correlation matrix at the end of the results section.
21. Set the Output Orientation to Data In Columns.
22. Click on Start. The tool fits each selected distribution to each data series. The results appear on the Sales Model worksheet to the right of the existing table.
23. Copy assumption data into the worksheet.
    a. Select cells C15 through F15.
    b. Select Cell > Copy Data.
    Crystal Ball Note: This function copies Crystal Ball data only, not the cell value.
    c. Select cells C5 through F5.
    The assumptions copy to the first row of the table.
    e. Select cells C15 through F15.
    f. Select Cell > Clear Data. This deletes the original Crystal Ball assumptions.
24. In the Run > Run Preferences > Sampling dialog, set:
    Random Number Generation to use the Same Sequence Of Random Numbers and a seed value of 999
    Monte Carlo simulation
When using this tool, use these options to make the resulting simulations comparable.
25. In the Run > Run Preferences > Trials dialog, set the Maximum Number Of Trials to 500.
26. Click on OK.
27. Select Run > Run.
Correlation matrix
In Crystal Ball, you enter correlations one at a time using the Correlation dialog. Instead of manually entering the correlations this way, you can use the Correlation Matrix tool to define a matrix of correlations between assumptions in one simple step. This saves time and effort when building your spreadsheet model, especially for models with many correlated assumptions. The correlation matrix is either an upper or lower triangular matrix with ones along the diagonal. When entering coefficients, think of the matrix as a multiplication table. If you follow one assumption along its horizontal row and the second along its vertical column, the value in the cell where they meet is their correlation coefficient. The matrix contains only the correlation coefficients you enter.
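To make the triangular layout concrete, the sketch below mirrors a set of upper-triangle coefficients into a full symmetric matrix and checks whether the coefficients are mutually consistent. It is a minimal illustration, not the Correlation Matrix tool's logic: the function names and assumption names are hypothetical, and the sketch assumes every pair has been specified and treats "consistent" as "positive definite", tested with a Cholesky factorization.

```python
import math


def full_matrix(names, upper):
    """Mirror upper-triangle entries {(row, col): r} into a full symmetric matrix."""
    index = {name: i for i, name in enumerate(names)}
    n = len(names)
    # Ones on the diagonal; None marks a pair that has not been specified.
    matrix = [[1.0 if i == j else None for j in range(n)] for i in range(n)]
    for (a, b), r in upper.items():
        if not -1.0 <= r <= 1.0:
            raise ValueError(f"coefficient for {a}/{b} is outside [-1, 1]")
        i, j = index[a], index[b]
        matrix[i][j] = matrix[j][i] = r
    return matrix


def is_consistent(matrix):
    """True if the fully specified matrix is positive definite (Cholesky succeeds)."""
    n = len(matrix)
    if any(value is None for row in matrix for value in row):
        raise ValueError("every pair must be specified for this check")
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - partial
                if pivot <= 0.0:
                    return False
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return True


names = ["A", "B", "C"]  # hypothetical assumption names
consistent = full_matrix(names, {("A", "B"): 0.8, ("A", "C"): 0.8, ("B", "C"): 0.8})
conflicting = full_matrix(names, {("A", "B"): 0.9, ("A", "C"): 0.9, ("B", "C"): -0.9})
print(is_consistent(consistent), is_consistent(conflicting))
```

The second matrix conflicts because B and C cannot both be strongly positively correlated with A while being strongly negatively correlated with each other.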
Figure 3 Correlation matrix
If you enter inconsistent correlations, Crystal Ball tries to adjust the correlations so they don't conflict.
Excel Note: Due to limitations with older versions of Excel, the calculation of correlations between data series has been disabled in Excel 95.
3. Run a simulation by selecting Run > Start. The forecast statistics for the simulation are shown below.
Figure 4 Uncorrelated simulation statistics
4. Select CBTools > Correlation Matrix. The Select Assumptions dialog appears.
5. Include all the assumptions in the correlation matrix by moving all the assumptions from the Available Assumptions field to the Selected Assumptions field by either:
    Double-clicking on each assumption to move.
    Selecting each assumption to move and clicking on >> to move it.
    Making an extended selection using the <Shift> or <Ctrl> keys.
6. Click on Next. The Specify Options dialog appears.
7. Set the following options:
    Location Of Matrix set to Create A Temporary Correlation Matrix On A New Worksheet
    Orientation set to Upper Triangular Matrix
9. Enter the following correlation coefficients into the matrix.
Crystal Ball Tool Note: Leaving a cell blank is not the same as entering a zero. Values that are not specified in the matrix will be filled in with estimates of appropriate values when the simulation runs.
10. Click on Load The Matrix. The tool loads the correlation coefficients from the matrix into your Crystal Ball model.
Crystal Ball Tool Note: If a Matrix Successfully Loaded message doesn't appear, press <Tab> or <Return> to exit the current cell and then click on Load The Matrix again.
Crystal Ball Note: Loading a full correlation matrix for all assumptions can take several minutes or even longer for large matrices.
11. Reset the simulation.
12. Rerun the simulation. The forecast statistics for the correlated simulation are shown below.
Figure 5 Correlated simulation statistics
The standard deviation is now much higher than in the original simulation because of the correlations. The original model without the correlations ignored this risk factor and its effects.
Quickly pre-screening the variables in your model to determine which ones are good candidates to define as assumptions or decision variables. You can do this by testing the precedent variables of any formula cell.
Tornado chart
The tool tests the range of each variable at percentiles you specify and then calculates the value of the forecast at each point. The tornado chart illustrates the swing between the maximum and minimum forecast values for each variable, placing the variable that causes the largest swing at the top and the variable that causes the smallest swing at the bottom. The top variables have the most effect on the forecast, and the bottom variables have the least effect on the forecast.
Figure 6 Tornado chart (callouts: forecast target; forecast axis values; assumptions and other variables; variable values at forecast minimums and maximums)
The bars next to each variable represent the forecast value range across the variable tested, as discussed above. Next to the bars are the values of the variables that produced the greatest swing in the forecast values. The bar colors indicate the direction of the relationship between the variables and the forecast.
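The calculation behind a tornado chart can be sketched as a one-at-a-time perturbation: hold every variable at its base case, move one variable across its testing percentiles, record the forecast, and sort the variables by swing. The Python below is a rough illustration under stated assumptions. The `tornado` function, the `profit` model, the variable names, and the normal distributions are all hypothetical and are not taken from Crystal Ball or its example workbooks.

```python
from statistics import NormalDist


def tornado(model, base_case, inverse_cdfs, low=0.10, high=0.90, points=5):
    """Vary each variable over its percentiles while the others stay at base case."""
    rows = []
    for name, inverse_cdf in inverse_cdfs.items():
        forecasts = []
        for k in range(points):
            percentile = low + (high - low) * k / (points - 1)
            inputs = dict(base_case)
            inputs[name] = inverse_cdf(percentile)
            forecasts.append(model(inputs))
        # Monotonic if both forecast extremes occur at the ends of the testing range.
        ends = {forecasts[0], forecasts[-1]}
        monotonic = min(forecasts) in ends and max(forecasts) in ends
        rows.append({
            "variable": name,
            "min": min(forecasts),
            "max": max(forecasts),
            "swing": max(forecasts) - min(forecasts),
            "monotonic": monotonic,
        })
    # Largest swing first, as on the chart.
    return sorted(rows, key=lambda row: row["swing"], reverse=True)


def profit(inputs):
    """Hypothetical target forecast."""
    return inputs["units"] * (inputs["price"] - inputs["cost"])


base_case = {"units": 1000.0, "price": 10.0, "cost": 6.0}
inverse_cdfs = {
    "units": NormalDist(1000.0, 100.0).inv_cdf,
    "price": NormalDist(10.0, 1.0).inv_cdf,
    "cost": NormalDist(6.0, 0.2).inv_cdf,
}
chart = tornado(profit, base_case, inverse_cdfs)
for row in chart:
    print(f'{row["variable"]:6s} swing={row["swing"]:.1f} monotonic={row["monotonic"]}')
```

The list of forecasts collected for each variable is also the data a spider chart would plot, one curve per variable.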
For variables that have a positive effect on the forecast, the upside of the variable (shown in blue) is to the right of the base case and the downside of the variable (shown in red) is to the left of the base case. For variables that have a reverse relationship with the forecast, the bars are reversed.
When a variable's relationship with the forecast is not strictly increasing or decreasing, it is called non-monotonic. In other words, if the minimum or maximum values of the forecast range do not occur at the extreme endpoints of the testing range for the variable, the variable has a non-monotonic relationship with the forecast.
Figure 7 A non-monotonic variable (callouts: maximum; minimum)
If one or more variables are non-monotonic, all the variable bars are the same color all the way across.
Spider chart
The spider chart illustrates the differences between the minimum and maximum forecast values by graphing a curve through all the variable values tested. Curves with steep slopes, positive or negative, indicate that those variables have a large effect on the forecast, while curves that are almost horizontal have little or no effect on the forecast. The slopes of the lines also indicate whether a positive change in the variable has a positive or negative effect on the forecast.
Crystal Ball Tool Note: There is a maximum of 250 variables for these charts.
4. Click on Next. The Specify Input Variables (Step 2 Of 3) dialog appears.
5. Click on Add Assumptions.
6. Remove Material 2 Strength and Material 3 Strength.
    a. Select an assumption to remove.
    b. Click on Remove.
    c. Repeat steps 6a and 6b for the second assumption to remove.
The last two assumptions have no impact on the target forecast. If you leave them in the list, they will appear in the charts even though they are unrelated to the target forecast.
7. Click on Next. The Specify Options (Step 3 Of 3) dialog appears.
8. Set the following options:
    Testing range is set to 10% to 90%
    Testing points is set to 5
    For Base Case is set to Use Existing Cell Values
    Tornado Method is set to Percentiles Of The Variables
    Both Tornado Chart and Spider Chart are selected
    Show 20 Top Variables
9. Click on Start. The tool creates the tornado and spider charts on their own workbooks.
The last two assumptions, Wire Diameter and Spring Deflection, are the least influential assumptions. Since their effects on the Material 1 Reliability are very small, you might ignore their uncertainty or eliminate them from the spreadsheet.
Caveats
While tornado and spider charts are very useful, there are some caveats:
Since the tool tests each variable independently of the others, the tool doesn't consider correlations defined between the variables.
The results shown in the tornado and spider charts depend significantly on the particular base case used for the variables. To confirm the accuracy of the results, run the tool multiple times with different base cases.
This characteristic makes the one-at-a-time perturbation method less robust than the correlation-based method built into Crystal Ball's sensitivity chart. Hence, the sensitivity chart is preferable, since it computes sensitivity by sampling the variables all together while a simulation is running.
Bootstrap tool
Bootstrap is a simple technique that estimates the reliability or accuracy of forecast statistics or other sample data. Classical methods used in the past relied on mathematical formulas to describe the accuracy of sample statistics. These methods assume that the distribution of a sample statistic approaches a normal distribution, making the calculation of the statistic's standard error or confidence interval relatively easy. However, when a statistic's sampling distribution is not normally distributed or easily found, these classical methods are difficult to use or are invalid.
Figure 9 Sampling distribution of a mean statistic
In contrast, bootstrapping analyzes sample statistics empirically by repeatedly sampling the data and creating distributions of the different statistics from each sampling. The term bootstrap comes from the saying, "to pull oneself up by one's own bootstraps," since this method uses the distribution of statistics themselves to analyze the statistics' accuracy.
There are two bootstrap methods available with this tool:
One-simulation method: Simulates the model data once (creating the original sample), and then repeatedly resamples those simulation trials (the original sample values). Resampling creates a new sample from the original sample with replacement. It then creates a distribution of the statistics calculated from each resample. This method assumes only that the original simulation data accurately portrays the true forecast distribution, which is likely if the sample is large enough. This method isn't as accurate as the multiple-simulation method, but it takes significantly less time to run.
Glossary Term: with replacement
Returns the selected value to the sample before selecting another value, letting the selector possibly reselect the same value.
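The one-simulation method can be sketched in a few lines of Python. This is an illustrative sketch, not the Bootstrap tool's code: the function name is hypothetical, and the normally distributed "original sample" is only a stand-in for the trials of a real simulation.

```python
import random
from statistics import mean, stdev


def one_simulation_bootstrap(original_sample, statistic, bootstrap_samples=200, rng=None):
    """Resample the original trials with replacement and collect the statistic."""
    rng = rng or random.Random()
    size = len(original_sample)
    results = []
    for _ in range(bootstrap_samples):
        # choices() draws with replacement, so the same value can be reselected.
        resample = rng.choices(original_sample, k=size)
        results.append(statistic(resample))
    return results


rng = random.Random(999)
# Stand-in for 500 simulation trials of a forecast (hypothetical values).
original = [rng.gauss(100.0, 20.0) for _ in range(500)]

mean_distribution = one_simulation_bootstrap(original, mean, 200, rng)
standard_error = stdev(mean_distribution)  # spread of the statistic's sampling distribution
print(f"forecast mean = {mean(original):.2f}, standard error = {standard_error:.2f}")
```

Any function of a list of values can be passed as `statistic`, which is why the approach also works for statistics that have no convenient standard-error formula.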
Multiple-simulation method: Repeatedly simulates the model, and then creates a distribution of the statistics from each simulation. This method is more accurate than the one-simulation method, but it might take a prohibitive amount of time.
Crystal Ball Tool Note: When you use the multiple-simulation method, the tool temporarily turns off the Use Same Sequence Of Random Numbers option.
Statistical Note: In statistics literature, the one-simulation method is also called the non-parametric bootstrap, and the multiple-simulation method is also called the parametric bootstrap.
Since the bootstrap technique doesn't assume that the sampling distribution is normally distributed, you can use it to estimate the sampling distribution of any statistic, even an unconventional one such as the minimum or maximum endpoint of a forecast. You can also easily estimate complex statistics, such as the correlation coefficient of two data sets, or combinations of statistics, such as the ratio of a mean to a variance.
One-simulation method
Simulate forecast (create original sample)
Create resample from original sample (with replacement)
Compute statistics for resample
Have you reached the number of bootstrap samples?
    no: create another resample
    yes: Form distribution for all sample statistics
Multiple-simulation method
Have you reached the number of bootstrap samples?
    no: repeat
    yes: Form distribution for all resample statistics
Statistical Note: To estimate the accuracy of Latin hypercube statistics, you must use the multiple-simulation method.
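For comparison with the one-simulation method, the multiple-simulation method reruns the whole model for every bootstrap sample instead of resampling one set of trials. The sketch below is a hedged illustration only: `multiple_simulation_bootstrap` and `profit_or_loss` are hypothetical names, and the toy model and its numbers are invented for the example rather than taken from any Crystal Ball workbook.

```python
import random
from statistics import mean, stdev


def multiple_simulation_bootstrap(run_trial, statistic, bootstrap_samples=200,
                                  trials_per_sample=500, rng=None):
    """Run a fresh simulation for each bootstrap sample and collect the statistic."""
    rng = rng or random.Random()
    results = []
    for _ in range(bootstrap_samples):
        simulation = [run_trial(rng) for _ in range(trials_per_sample)]
        results.append(statistic(simulation))
    return results


def profit_or_loss(rng):
    """Hypothetical one-trial model: rented units times rent, minus uncertain expenses."""
    units_rented = rng.randint(30, 40)
    expenses = rng.gauss(15000.0, 1000.0)
    return units_rented * 500.0 - expenses


mean_distribution = multiple_simulation_bootstrap(
    profit_or_loss, mean, bootstrap_samples=50, trials_per_sample=200,
    rng=random.Random(7))
print(f"mean of means = {mean(mean_distribution):.0f}, "
      f"standard error = {stdev(mean_distribution):.0f}")
```

Because each sample is an independent simulation, the run time grows with the product of the number of bootstrap samples and the trials per sample, which is the cost the text above warns about.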
Bootstrap example
In the Crystal Ball Examples folder there is a Futura Apartments.xls workbook you can use to experiment with the Bootstrap tool. This spreadsheet model forecasts the profit and loss for an apartment complex.
To run Bootstrap:
1. In Excel with Crystal Ball loaded, open the spreadsheet Futura Apartments.xls.
2. Select CBTools > Bootstrap. The Specify Target dialog appears.
3. Set the target by selecting Profit Or Loss from the forecast list.
4. Click on Next. The Specify Options (Step 2 of 3) dialog appears.
5. Make sure the one-simulation method and the statistics options are selected.
6. Click on Next. The Specify Options (Step 3 of 3) dialog appears.
7. Set the following options:
    Bootstrap samples is set to 200
    Trials per sample is set to 500
    Show only target forecast
8. Click on Start. The bootstrap tool displays a forecast chart of the distributions for each statistic and creates a spreadsheet summarizing the data.
For percentiles, the Bootstrap tool displays the percentile sampling distributions on the overlay and trend charts. To display the individual percentile forecast charts, select Run > Forecast Windows.
Crystal Ball Note: If you have the Probability Above A Value option selected in the Run Preferences > Options dialog, the percentiles will be reversed in meaning, so that the 1st percentile represents the uppermost 1% and the 99th percentile represents the lowest 1%.
The forecast charts visually indicate the accuracy of each statistic. A narrow and symmetrical distribution is better than a wide and skewed distribution.
Figure 12 Bootstrap forecast chart of mean
The statistic view further lets you analyze the statistic's sampling distribution. If the standard deviation (standard error of the statistic) or coefficient of variability is very large, the statistic might not be reliable and might require more trials. This example has a relatively low standard error and coefficient of variability, so the forecast mean is an accurate estimate of the actual mean.
Figure 13 Bootstrap forecast statistics of mean
The results workbook has a correlation matrix showing the correlations between the various statistics. High correlation between certain statistics, such as between the mean and the standard deviation, usually indicates a highly skewed distribution.
You can also use the Bootstrap tool to analyze the distribution of percentiles, but you should run at least 1,000 bootstrap samples and 1,000 trials per sample to obtain good sampling distributions for these statistics (see reference 1).
1. Efron, Bradley, and Robert J. Tibshirani. An Introduction to the Bootstrap. Monographs on Statistics and Applied Probability, vol. 57. New York: Chapman & Hall, 1993.
The Decision Table tool runs multiple simulations to test different values for one or two decision variables. The tool tests values across the range of the decision variables and puts the results in a table that you can analyze using Crystal Ball forecast, trend, or overlay charts.
The Decision Table tool is useful for investigating how changes in the values of a few decision variables affect the forecast results. For models that contain more than a handful of decision variables, or where you are trying to optimize the forecast results, use OptQuest for Crystal Ball. OptQuest is a wizard-based program that enhances Crystal Ball by automatically finding optimal solutions to simulation models. This program is available with Crystal Ball 2000.2, Professional Edition.
Table 1: Comparison and contrast between the Decision Table tool and OptQuest.
Decision Table tool                    OptQuest
Runs multiple Crystal Ball simulations for different values of decision variables (both tools)
All displayed in a table               Only displays the best solutions
No                                     Yes
One or two                             Unlimited
Small                                  Small to large
1. In Excel with Crystal Ball loaded, open the spreadsheet Oil Field Development.xls.
2. In the Run > Run Preferences > Sampling dialog, set:
    Random Number Generation to use the Same Sequence Of Random Numbers and An Initial Seed Value of 999
    Monte Carlo simulation
When using this tool, use these options to make the resulting simulations comparable.
3. In the Run Preferences dialog, click on OK.
4. Select CBTools > Decision Table. The Specify Target dialog appears.
5. Select the NPV forecast.
6. Click on Next. The Select One Or Two Decisions dialog appears.
7. Move Wells To Drill and Facility Size to the Chosen Decision Variables list.
    a. Select Wells To Drill in the Available Decision Variables field.
    b. Click on >>.
    c. Repeat steps 7a and 7b for the Facility Size.
8. Click on Next. The Specify Options dialog appears.
9. Set the following options:
    Number of values to test for Wells To Drill is 6
    Number of values to test for Facility Size is 7
    Number of trials per simulation is 500
    Show only target forecast
10. Click on Start. The tool runs a simulation for each combination of decision variable values. It compiles the results in a table of forecast cells indexed by the decision variables.
Interpreting the results
For this example, the Decision Table tool ran 42 simulations, one for each combination of wells to drill and facility sizes. The simulation that resulted in the best mean NPV was the combination of 12 wells and a facility size of 150 mbd.
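The 42 simulations come from crossing 6 values of one decision variable with 7 values of the other. A minimal Python sketch of that grid is shown below. It is only an illustration: the function names, the decision variable ranges, and the `npv_trial` model are hypothetical and do not reproduce the Oil Field Development workbook, so the best combination it reports will not match the one above.

```python
import random
from statistics import mean


def evenly_spaced(low, high, count):
    """Return `count` test values spread evenly across the decision variable's range."""
    step = (high - low) / (count - 1)
    return [low + step * i for i in range(count)]


def npv_trial(wells, facility_size, rng):
    """Hypothetical NPV model; not the workbook's formulas."""
    rate_per_well = rng.gauss(10.0, 2.0)           # uncertain production per well (mbd)
    production = min(wells * rate_per_well, facility_size)
    margin = rng.gauss(8.0, 1.5)                   # uncertain value per mbd produced
    return production * margin - 20.0 * wells - 1.5 * facility_size


def decision_table(first_values, second_values, trials=500, seed=999):
    """Run one simulation per combination and record the mean forecast."""
    table = {}
    for wells in first_values:
        for size in second_values:
            rng = random.Random(seed)              # same random sequence for every simulation
            forecast = [npv_trial(wells, size, rng) for _ in range(trials)]
            table[(wells, size)] = mean(forecast)
    return table


wells_to_test = evenly_spaced(10, 20, 6)    # assumed range for Wells To Drill
sizes_to_test = evenly_spaced(50, 350, 7)   # assumed range for Facility Size
table = decision_table(wells_to_test, sizes_to_test)
best = max(table, key=table.get)
print(len(table), "simulations; best mean NPV at", best)
```

Reseeding the generator for each cell mirrors the Same Sequence Of Random Numbers setting, so differences between cells reflect the decisions rather than sampling noise.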
Figure 14 Decision table for Oil Field Development results
To view one or more of the forecasts in the decision table, select the cells and select Run > Forecast Charts. To compare one or more forecasts on the same chart, select the cells and click on Trend Chart or Overlay Chart.
Figure 15 Trend chart of 150 mbd forecasts
You can create the above trend chart by selecting all the forecast cells in the Facility Size (150.00) row of the results table and clicking on Trend Chart. This chart shows that the forecast with the highest mean NPV also has the largest uncertainty compared to other forecasts with smaller NPVs of the same facility size. This indicates a higher risk that you could avoid with a different number of wells (although the lower risk is accompanied by a lower NPV).
Crystal Ball Note: If you have the Probability Above A Value option selected in the Run Preferences > Options dialog, the percentiles will be reversed in meaning, so that the 1st percentile represents the uppermost 1% and the 99th percentile represents the lowest 1%.
Figure 16 Toxic Waste Site spreadsheet
To run Scenario Analysis:
1. In Excel with Crystal Ball loaded, open the workbook Toxic Waste Site.xls.
2. Select CBTools > Scenario Analysis. The Specify Target dialog appears.
3. Select the Risk Assessment forecast.
4. Click on Next. The Specify Options dialog appears.
5. In the Range Of Forecast Results section, specify a percentile range between 0 and 100 percent.
6. In the While Running section, select Show Only Target Forecast.
7. In the Simulation Control section, enter 1000 as the maximum number of trials to run.
8. Click on Start. The tool creates a table of all the forecast values within the range specified in step 5, along with the corresponding value of each assumption for each trial.
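The table produced in step 8 can be pictured as the simulation trials sorted by forecast value, filtered to a percentile range, with each row keeping the assumption values that produced it. The Python sketch below illustrates that idea only. The function names, the risk formula, and every distribution parameter are assumptions for the example and are not the Toxic Waste Site model.

```python
import random


def scenario_analysis(run_trial, trials=1000, low_pct=0.0, high_pct=100.0, seed=999):
    """Sort trials by forecast and keep the rows inside the percentile range."""
    rng = random.Random(seed)
    rows = [run_trial(rng) for _ in range(trials)]
    rows.sort(key=lambda row: row["forecast"])
    selected = []
    for rank, row in enumerate(rows, start=1):
        percentile = 100.0 * rank / trials
        if low_pct <= percentile <= high_pct:
            selected.append(dict(row, percentile=percentile))
    return selected


def risk_trial(rng):
    """Hypothetical risk model with illustrative distributions."""
    inputs = {
        "Volume Of Water Per Day": rng.uniform(1.0, 3.0),
        "CPF": rng.lognormvariate(-3.0, 0.5),
        "Concentration Of Contaminant In Water": rng.gauss(100.0, 10.0),
        "Body Weight": rng.gauss(70.0, 10.0),
    }
    risk = (inputs["CPF"] * inputs["Concentration Of Contaminant In Water"]
            * inputs["Volume Of Water Per Day"]) / (inputs["Body Weight"] * 1000.0)
    return dict(inputs, forecast=risk)


all_rows = scenario_analysis(risk_trial)                  # 0 to 100 percent
top_rows = scenario_analysis(risk_trial, low_pct=90.0)    # only the highest-risk trials
print(len(all_rows), "rows in full table;", len(top_rows), "rows at or above the 90th percentile")
```

Narrowing the range, as in `top_rows`, shows which combinations of inputs produced the extreme outcomes.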
Another way to analyze the Scenario Analysis results to see an overall trend from the 0th percentile to the 100th percentile is to create an Excel chart of all the values in the generated table. For example, you might create a chart to analyze the Toxic Waste Site results table:
Excel Note: This procedure was written for Excel 97 or Excel 2000. Steps for other versions of Excel might differ.
1. Select the range B1 - F1003.
2. Select Insert > Chart.
3. In the first dialog of the Chart wizard, select a Line chart.
4. Click on Next.
5. In the second dialog of the Chart wizard, select the Series tab.
6. In the Series tab, watch the sample chart as you remove the following data series, in the following order:
    d. Concentration Of Contaminant In Water
    e. Body Weight
Notice the series of Volume Of Water Per Day. Unlike the other series you removed, it increased steadily as the risk increased (since the forecast results were sorted).
7. Remove the Volume Of Water Per Day series from the chart.
Notice the CPF value also increases as the risk increases. These correlated trends indicate that the Volume Of Water Per Day and CPF are directly proportional to the risk. For other models, these correlations might not be so obvious. For further analysis, use the sensitivity analysis feature in Crystal Ball.
field and the prime interest rate in 12 months. You can describe an uncertainty assumption with a probability distribution. Theoretically, you can eliminate uncertainty by gathering more information. Practically, information can be missing because you haven't gathered it or because it is too costly to gather.
variability: Assumptions that change because they describe a population with different values. Examples of variability include the individual body weights in a population or the daily number of products sold over a year. You can describe a variability assumption with a discrete distribution (or approximate one with a continuous distribution). Variability is inherent in the system, and you cannot eliminate it by gathering more information.
For many types of risk assessments, it is important to clearly distinguish between uncertainty and variability (see reference 2). Separating these concepts in a simulation lets you more accurately detect the variation in a forecast due to lack of knowledge and the variation caused by natural variability in a measurement or population.
In the same way that a one-dimensional simulation is generally better than single-point estimates for showing the true probability of risk, a two-dimensional simulation is generally better than a one-dimensional simulation for characterizing risk.
The Two-dimensional Simulation tool runs an outer loop to simulate the uncertainty values, and then freezes the uncertainty values while it runs an inner loop (of the whole model) to simulate the variability. This process repeats for some small number of outer simulations, providing a portrait of how the forecast distribution varies due to the uncertainty.
2. Hoffman, F. O., and J. S. Hammonds. "Propagation of uncertainty in risk assessments: The need to distinguish between uncertainty due to lack of knowledge and uncertainty due to variability." Risk Analysis, vol. 14, no. 5, pp. 707-712, 1994.
The primary output of this process is a chart depicting a series of cumulative frequency distributions. You can interpret this chart as the range of possible risk curves associated with a population.
Crystal Ball Tool Note: When using this tool, set the Seed Value option in the Crystal Ball Run Preferences dialog so that the resulting simulations are more comparable.
When using this tool, use these options to make the resulting simulations comparable.
3. Select CBTools > 2D Simulation. The Specify Target dialog appears.
4. Select the Risk Assessment forecast.
5. Click on Next. The Specify Assumptions dialog appears.
6. Move Body Weight and Volume Of Water Per Day to the Variability list.
    a. Select Body Weight.
    b. Click on >>.
    c. Repeat steps 6a and 6b for Volume Of Water Per Day.
This separates the assumptions into the two types: uncertainty and variability.
7. Click on Next. The Specify Options dialog appears.
8. Set the following options:
    Outer simulation runs set to 100
    Inner simulation runs set to 1,000
    Show only target forecast
9. Click on Start. The simulations start. The tool first single-steps one trial to generate a new set of values for the uncertainty assumptions. Then it freezes these assumptions and runs a simulation for the variability assumptions in the inner loop. The tool retrieves the Crystal Ball forecast information after each inner loop runs. The tool then resets the simulation and repeats the process until the outer loop has run for the specified number of simulations.
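The outer and inner loops described in step 9 can be sketched as two nested loops: sample the uncertainty assumptions once, freeze them, simulate the variability assumptions many times, record the forecast statistics, and repeat. The Python below is a minimal illustration under stated assumptions. The function names, the split of assumptions, the risk formula, and all distribution parameters are hypothetical and are not taken from the Toxic Waste Site workbook.

```python
import random
from statistics import mean, quantiles


def two_dimensional_simulation(sample_uncertainty, variability_trial,
                               outer_runs=100, inner_trials=1000, seed=999):
    """Outer loop samples uncertainty; inner loop simulates variability with it frozen."""
    rng = random.Random(seed)
    results = []
    for _ in range(outer_runs):
        frozen = sample_uncertainty(rng)           # one trial of the uncertainty assumptions
        forecast = [variability_trial(frozen, rng) for _ in range(inner_trials)]
        cuts = quantiles(forecast, n=100)          # cuts[94] is the 95th percentile
        results.append({"uncertainty": frozen, "mean": mean(forecast), "p95": cuts[94]})
    return results


def sample_uncertainty(rng):
    """Hypothetical uncertainty assumptions (lack of knowledge)."""
    return {"CPF": rng.lognormvariate(-3.0, 0.5),
            "Concentration": rng.gauss(100.0, 10.0)}


def variability_trial(frozen, rng):
    """Hypothetical variability assumptions (differences within the population)."""
    body_weight = rng.gauss(70.0, 10.0)
    volume_of_water = rng.uniform(1.0, 3.0)
    return frozen["CPF"] * frozen["Concentration"] * volume_of_water / (body_weight * 1000.0)


results = two_dimensional_simulation(sample_uncertainty, variability_trial)
mean_of_p95 = mean(row["p95"] for row in results)
print(f"{len(results)} risk curves; mean of the 95th percentiles = {mean_of_p95:.3e}")
```

Each row of `results` corresponds to one risk curve, so the spread of a statistic such as `p95` across rows shows how much of the variation in the forecast comes from uncertainty rather than variability.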
Interpreting the results
The results of the simulations appear in a table containing the forecast means, the uncertainty assumption values, and the statistics (including percentiles) of the forecast distribution for each simulation.
Figure 18 Two-dimensional Simulation results table
The tool also graphs the results of the two-dimensional simulations on an overlay chart and a trend chart. The overlay chart shows the risk curves for the simulations for different sets of uncertainty assumption values. In the chart below, most of the risk curves are clustered densely toward the center while a few outlier curves are scattered to the right, showing the small probability of having a much greater risk.
Statistical Note: In risk analysis literature, the curves are often called the alternate realizations of the population risk assessment.
Figure 19 Overlay chart of risk curves
Another helpful output is a trend chart depicting certainty bands for the percentiles of the risk curves. The band width shows the amount of uncertainty at each percentile level for all the distributions.
You can focus in on a particular percentile level, such as the 95th percentile, by viewing the statistics of the 95th percentile forecast.
Crystal Ball Note: If you have the Probability Above A Value option selected in the Run Preferences > Options dialog, the percentiles will be reversed in meaning, so that the 1st percentile represents the uppermost 1% and the 99th percentile represents the lowest 1%.
Compare the results of the two-dimensional simulation to a one-dimensional simulation (with uncertainty and variability commingled) of the same risk model. The mean of the 95th percentiles, 1.79E-4, is lower than the 95th percentile risk of the one-dimensional simulation shown below at 2.03E-4. This indicates the tendency of the one-dimensional simulation results to overestimate the population risk, especially for highly skewed distributions.