Data Science Analytics Finals Reviewer

Uploaded by

Crystal Valero
Copyright
© All Rights Reserved

Regression analysis in Excel

In statistical modeling, regression analysis is used to estimate the relationships between two
or more variables:

Dependent variable (aka criterion variable) is the main factor you are trying to understand and
predict.

Independent variables (aka explanatory variables, or predictors) are the factors that might
influence the dependent variable.

Regression analysis helps you understand how the dependent variable changes when one of
the independent variables varies and allows you to determine mathematically which of those
variables really has an impact.

Technically, a regression analysis model is based on the sum of squares, which is a
mathematical way to measure the dispersion of data points. The goal of a model is to get the
smallest possible sum of squares and draw a line that comes closest to the data.

Statisticians distinguish between simple and multiple linear regression. Simple linear
regression models the relationship between a dependent variable and one independent
variable using a linear function. If you use two or more explanatory variables to predict the
dependent variable, you deal with multiple linear regression. If the dependent variable is
modeled as a non-linear function because the data relationships do not follow a straight line,
use nonlinear regression instead.

As an example, let's take sales numbers for umbrellas for the last 24 months and find out the
average monthly rainfall for the same period. Plot this information on a chart, and the regression
line will demonstrate the relationship between the independent variable (rainfall) and dependent
variable (umbrella sales):

Linear regression equation

Mathematically, a linear regression is defined by this equation:

y = bx + a + ε

Where:

● x is an independent variable.
● y is a dependent variable.
● a is the Y-intercept, which is the expected mean value of y when all x variables are equal
to 0. On a regression graph, it's the point where the line crosses the Y axis.
● b is the slope of a regression line, which is the rate of change for y as x changes.
● ε is the random error term, which is the difference between the actual value of a
dependent variable and its predicted value.

The linear regression equation always has an error term because, in real life, predictors are
never perfectly precise. However, some programs, including Excel, do the error term calculation
behind the scenes. So, in Excel, you do linear regression using the least squares method and
seek coefficients a and b such that:

y = bx + a

For our example, the linear regression equation takes the following shape:
Umbrellas sold = b * rainfall + a
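
To make the least squares idea concrete, here is a minimal Python sketch of the closed-form solution for a and b. The function name `fit_line` and the sample numbers are made up for illustration; they are not the article's worksheet data.

```python
def fit_line(xs, ys):
    """Return (b, a) for the least squares line y = b*x + a."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return b, a


# Hypothetical data: monthly rainfall in mm and umbrellas sold.
rainfall = [50, 60, 70, 80, 90]
umbrellas = [5, 9, 14, 18, 24]
b, a = fit_line(rainfall, umbrellas)
print(f"Umbrellas sold = {b:.2f} * rainfall + ({a:.2f})")
```

These are the same two numbers that Excel's SLOPE and INTERCEPT functions return for the same ranges.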

There are several ways to find a and b. The three main methods to perform linear
regression analysis in Excel are:

● Regression tool included with the Analysis ToolPak
● Scatter chart with a trendline
● Linear regression formula

Below you will find detailed instructions on using each method.

How to do linear regression in Excel with Analysis ToolPak

This example shows how to run regression in Excel by using a special tool included with the
Analysis ToolPak add-in.

Enable the Analysis ToolPak add-in

Analysis ToolPak is available in all versions of Excel from 2003 to 365 but is not enabled by
default, so you need to turn it on manually. Here's how:

1. In Excel, click File > Options.
2. In the Excel Options dialog box, select Add-ins on the left sidebar, make sure Excel
Add-ins is selected in the Manage box, and click Go.
3. In the Add-ins dialog box, check Analysis ToolPak, and click OK:

This will add the Data Analysis tools to the Data tab of your Excel ribbon.

Run regression analysis

In this example, we are going to do a simple linear regression in Excel. We have a list of
average monthly rainfall for the last 24 months in column B, which is our independent variable
(predictor), and the number of umbrellas sold in column C, which is the dependent variable. Of
course, many other factors can affect sales, but for now we focus only on these two variables:

With the Analysis ToolPak enabled, carry out these steps to perform regression analysis in
Excel:

1. On the Data tab, in the Analysis group, click the Data Analysis button.
2. Select Regression and click OK.
3. In the Regression dialog box, configure the following settings:
● Select the Input Y Range, which is your dependent variable. In our case, it's
umbrella sales (C1:C25).
● Select the Input X Range, i.e. your independent variable. In this example, it's
the average monthly rainfall (B1:B25).
● If you are building a multiple regression model, select two or more adjacent columns
with different independent variables.
● Check the Labels box if there are headers at the top of your X and Y ranges.
● Choose your preferred Output option, a new worksheet in our case.
● Optionally, select the Residuals checkbox to get the difference between the
predicted and actual values.
4. Click OK and observe the regression analysis output created by Excel.

Interpret regression analysis output

As you have just seen, running regression in Excel is easy because all calculations are
performed automatically. The interpretation of the results is a bit trickier because you need to
know what is behind each number. Below you will find a breakdown of the 4 major parts of the
regression analysis output.

Regression analysis output: Summary Output


This part tells you how well the calculated linear regression equation fits your source data.

Here's what each piece of information means:

Multiple R. It is the Correlation Coefficient that measures the strength of a linear relationship
between two variables. The correlation coefficient can be any value between -1 and 1, and its
absolute value indicates the relationship strength. The larger the absolute value, the stronger
the relationship:

● 1 means a perfect positive relationship
● -1 means a perfect negative relationship
● 0 means no linear relationship at all

Note that Excel reports Multiple R as the square root of R Square, so it is never negative. Check
the sign of the slope coefficient to see the direction of the relationship.

R Square. It is the Coefficient of Determination, which is used as an indicator of the goodness of
fit. It shows what share of the variation in the dependent variable the regression line accounts
for. R² is calculated from two sums of squares: it equals 1 minus the residual sum of squares
divided by the total sum of squares, where the total sum of squares is the sum of the squared
deviations of the original data from the mean.

In our example, R² is 0.91 (rounded to 2 digits), which is fairly good. It means that 91% of the
variation in the dependent variable (y-values) is explained by the independent variable
(x-values). What counts as a good fit depends on the field; an R² of 95% or more is often
treated as a very good fit.

Adjusted R Square. It is the R square adjusted for the number of independent variables in the
model. You will want to use this value instead of R square for multiple regression analysis.

Standard Error. It is another goodness-of-fit measure that shows the precision of your
regression analysis - the smaller the number, the more certain you can be about your regression
equation. While R² represents the percentage of the dependent variable's variance that is
explained by the model, Standard Error is an absolute measure that shows the average
distance that the data points fall from the regression line.

Observations. It is simply the number of observations in your model.
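
The Summary Output figures can be reproduced by hand. The sketch below does so in Python for a simple (one-predictor) regression, using textbook formulas; the function name `regression_summary` and the five data points are hypothetical, chosen only so the arithmetic is easy to follow.

```python
import math


def regression_summary(xs, ys):
    """Summary statistics for a simple linear regression of ys on xs."""
    n = len(xs)
    k = 1  # number of independent variables
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    predicted = [b * x + a for x in xs]
    ss_residual = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    r_square = 1 - ss_residual / ss_total
    return {
        "multiple_r": math.sqrt(r_square),
        "r_square": r_square,
        # Penalizes R Square for the number of predictors.
        "adjusted_r_square": 1 - (1 - r_square) * (n - 1) / (n - k - 1),
        # Typical distance of the data points from the regression line.
        "standard_error": math.sqrt(ss_residual / (n - k - 1)),
        "observations": n,
    }


# Hypothetical sample, not the article's rainfall data.
summary = regression_summary([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
for name, value in summary.items():
    print(name, round(value, 4))
```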


Regression analysis output: ANOVA

The second part of the output is Analysis of Variance (ANOVA):

Basically, it splits the sum of squares into individual components that give information about the
levels of variability within your regression model:

● df is the number of the degrees of freedom associated with the sources of variance.
● SS is the sum of squares. The smaller the Residual SS compared with the Total SS, the
better your model fits the data.
● MS is the mean square, i.e. SS divided by df.
● F is the F statistic for the null hypothesis that the model explains none of the variation. It
is used to test the overall significance of the model.
● Significance F is the P-value of F.

The ANOVA part is rarely used for a simple linear regression analysis in Excel, but you should
definitely have a close look at the last component. The Significance F value gives an idea of
how reliable (statistically significant) your results are. If Significance F is less than 0.05 (5%),
your model is OK. If it is greater than 0.05, you'd probably better choose another independent
variable.
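
As an illustration of how the ANOVA columns relate to each other, the following Python sketch builds the table from actual and predicted values. The helper name `anova_table` and the sample data are assumptions for this example. Significance F is left out because it needs the F distribution, which in Excel is available through the F.DIST.RT function.

```python
def anova_table(actual, predicted, k=1):
    """Split the total sum of squares into regression and residual parts.

    k is the number of independent variables in the model.
    """
    n = len(actual)
    mean_y = sum(actual) / n
    ss_total = sum((y - mean_y) ** 2 for y in actual)
    ss_residual = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_regression = ss_total - ss_residual
    df_regression = k
    df_residual = n - k - 1
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    return {
        "df": (df_regression, df_residual, n - 1),
        "SS": (ss_regression, ss_residual, ss_total),
        "MS": (ms_regression, ms_residual),
        "F": ms_regression / ms_residual,
    }


# Hypothetical data; the predictions come from the fitted line y = 0.6x + 2.2.
actual = [2, 4, 5, 4, 5]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]
table = anova_table(actual, predicted)
print(table)
```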

Regression analysis output: coefficients

This section provides specific information about the components of your analysis:

The most useful component in this section is Coefficients. It enables you to build a linear
regression equation in Excel:

y = bx + a

For our data set, where y is the number of umbrellas sold and x is the average monthly rainfall,
our linear regression formula goes as follows:

Y = Rainfall Coefficient * x + Intercept

Equipped with a and b values rounded to three decimal places, it turns into:

Y=0.45*x-19.074

For example, with the average monthly rainfall equal to 82 mm, the umbrella sales would be
approximately 17.8:

0.45*82-19.074=17.8

In a similar manner, you can find out how many umbrellas are going to be sold with any other
monthly rainfall (x variable) you specify.

Regression analysis output: residuals

If you compare the estimated and actual number of umbrellas sold for the monthly
rainfall of 82 mm, you will see that these numbers are slightly different:

● Estimated: 17.8 (calculated above)
● Actual: 15 (row 2 of the source data)

Why the difference? Because independent variables are never perfect predictors of the
dependent variable. The residuals help you understand how far the actual values
are from the predicted values:

For the first data point (rainfall of 82 mm), the residual is approximately -2.8. So, we add this
number to the predicted value, and get the actual value: 17.8 - 2.8 = 15.
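
The prediction and residual arithmetic above can be checked with a few lines of Python. The coefficients are the rounded values quoted in the article; the function names are made up for this sketch.

```python
SLOPE = 0.45         # rainfall coefficient (b), as quoted above
INTERCEPT = -19.074  # intercept (a), as quoted above


def predict_umbrellas(rainfall_mm):
    """Estimated umbrella sales for a given monthly rainfall."""
    return SLOPE * rainfall_mm + INTERCEPT


def residual(actual_sales, rainfall_mm):
    """Actual value minus predicted value."""
    return actual_sales - predict_umbrellas(rainfall_mm)


estimated = predict_umbrellas(82)
first_residual = residual(15, 82)
print(round(estimated, 1), round(first_residual, 1))
```

Adding the residual back to the estimate recovers the actual value of 15.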

How to make a linear regression graph in Excel

If you need to quickly visualize the relationship between the two variables, draw a linear
regression chart. That's very easy! Here's how:

1. Select the two columns with your data, including headers.
2. On the Insert tab, in the Charts group, click the Scatter chart icon, and select the
Scatter thumbnail (the first one):

This will insert a scatter plot in your worksheet, which will resemble this one:

3. Now, we need to draw the least squares regression line. To do so, right-click
any point and choose Add Trendline… from the context menu.
4. On the right pane, select the Linear trendline shape and, optionally, check Display
Equation on Chart to get your regression formula:

As you may notice, the regression equation Excel has created for us is the same as
the linear regression formula we built based on the Coefficients output.

5. Switch to the Fill & Line tab and customize the line to your liking. For example, you
can choose a different line color and use a solid line instead of a dashed line (select
Solid line in the Dash type box):

At this point, your chart already looks like a decent regression graph:

Still, you may want to make a few more improvements:

● Drag the equation wherever you see fit.
● Add axes titles (Chart Elements button > Axis Titles).
● If your data points start in the middle of the horizontal and/or vertical axis like in this
example, you may want to get rid of the excessive white space. The following tip
explains how to do this: Scale the chart axes to reduce white space.

And this is what our improved regression graph looks like:

Important note! In the regression graph, the independent variable should always be on
the X axis and the dependent variable on the Y axis. If your graph is plotted in the
reverse order, swap the columns in your worksheet, and then draw the chart anew. If you
are not allowed to rearrange the source data, then you can switch the X and Y axes
directly in a chart.

How to do regression in Excel using formulas

Microsoft Excel has a few statistical functions that can help you to do linear regression analysis
such as LINEST, SLOPE, INTERCEPT, and CORREL.

The LINEST function uses the least squares regression method to calculate a straight line that
best explains the relationship between your variables and returns an array describing that line.
For now, let's just make a formula for our sample dataset:

=LINEST(C2:C25, B2:B25)

Because the LINEST function returns an array of values, you must enter it as an array formula.
Select two adjacent cells in the same row, E2:F2 in our case, type the formula, and press Ctrl +
Shift + Enter to complete it. In Excel 365, which supports dynamic arrays, pressing Enter is
enough.

The formula returns the b coefficient (E2) and the a constant (F2) for the already familiar linear
regression equation:

y = bx + a

If you avoid using array formulas in your worksheets, you can calculate a and b individually with
regular formulas:

Get the Y-intercept (a):

=INTERCEPT(C2:C25, B2:B25)

Get the slope (b):

=SLOPE(C2:C25, B2:B25)

Additionally, you can find the correlation coefficient (Multiple R in the regression analysis
summary output) that indicates how strongly the two variables are related to each other:

=CORREL(B2:B25,C2:C25)

The following screenshot shows all these Excel regression formulas in action:

Analyze Time Series Data in Excel

Understanding Time Series Data

Time series data is a set of observations or measurements taken at successive points in time.
This time ordering distinguishes it from cross-sectional data, which captures everything at a
single moment, and provides a dynamic view of how a phenomenon evolves. Common examples
include stock prices, temperature readings, monthly sales figures, and daily website traffic
statistics.

Characteristics of Time Series Data

● Temporal Order: Time series data follows a clear sequence, with each data point
corresponding to a specific point in time.
● Seasonality: Certain patterns or trends may repeat at regular intervals, reflecting
seasonal variations or recurring cycles.
● Irregularity: Unpredictable and random fluctuations, known as irregular components, may
be present in time series data.
● Trends and Patterns: Time series data frequently exhibits trends, cycles, or other
patterns that reflect underlying dynamics or recurring phenomena.

The Significance of Time Series Analysis


Understanding time series data is not merely an academic exercise; it is a powerful tool for
making sense of the past and predicting the future. Here’s why time series analysis is
indispensable:

● Forecasting Future Trends – By analyzing historical patterns, businesses can make
informed predictions about future trends, aiding in strategic planning and resource
allocation.
● Resource Optimization – Knowing when demand is likely to surge or decline helps in
optimizing resource allocation, preventing underutilization or overstocking.
● Risk Management – Time series analysis allows for the identification of potential risks
and uncertainties, enabling organizations to implement effective risk management
strategies.
● Economic Planning – Governments and policymakers leverage time series data to
evaluate economic trends, plan for future developments, and implement policies aligned
with expected trajectories.

Simply put, time series data tells a story about changes, trends, and unusual events over time.
Analyzing this data helps decision-makers make informed choices for the future, using lessons
from the past.

Data Preparation for Time Series Analysis

Time series analysis involves examining and modelling data points collected over time to
identify patterns and trends and to make predictions. However, before delving into the analysis
itself, it is crucial to ensure that the time series data is clean, well-organized, and free from
anomalies. This section will discuss the essential steps required to prepare time series data,
addressing issues such as missing values, outliers, and irregularities.

Cleaning Time Series Data

● Exploratory Data Analysis – Identify the time variable, assess data distributions, and gain
insights into the overall data patterns.
● Duplicate Record Removal – Check for and eliminate duplicate records. Duplicate
entries can distort analyses, and their presence may be indicative of data entry errors or
system malfunctions.

Handling Missing Values

● Detection of Missing Values – Use statistical measures and visualization to identify
missing values within the time series. Understand the extent and patterns of missingness
to inform the imputation strategy.
● Imputation Strategies – Select appropriate imputation methods based on the nature of
the missing data. Common techniques include mean or median imputation, forward or
backward filling, or more sophisticated methods such as time-series-specific imputation
algorithms.
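
Two of the imputation techniques named above, forward filling and linear interpolation, are simple enough to sketch in a few lines of Python. Missing values are represented here as `None`; the function names and the sample series are illustrative assumptions, not part of any Excel feature.

```python
def forward_fill(values):
    """Replace each missing value with the last known value before it."""
    filled = []
    last_known = None
    for value in values:
        if value is not None:
            last_known = value
        filled.append(last_known)  # stays None until a first value is seen
    return filled


def interpolate_linear(values):
    """Fill gaps between two known values along a straight line.

    Gaps at the very start or end have no neighbor on one side and stay None.
    """
    result = list(values)
    known = [i for i, value in enumerate(values) if value is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            result[i] = values[left] + step * (i - left)
    return result


# Hypothetical monthly readings with gaps.
readings = [10, None, 14, None, 18, None]
print(forward_fill(readings))
print(interpolate_linear(readings))
```

Forward filling suits values that tend to persist until the next reading, while interpolation suits quantities that change smoothly between readings.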

Managing Outliers

● Outlier Identification – Employ statistical techniques, visualization tools, or domain
knowledge to identify outliers.
● Outlier Handling – Choose an appropriate strategy to handle outliers, whether through
transformation, removal, or capping extreme values. The decision should align with the
specific goals of the analysis and the nature of the outliers.

Addressing Irregularities

● Time Irregularities – Inspect the time sequence for irregularities such as gaps or
overlaps. Ensure a consistent time frequency and address any irregularities by adjusting
timestamps or interpolating missing time points.
● Decomposition of Components – Decompose the time series into its underlying
components, including seasonal and trend elements.

Documentation and Logging

● Record-Keeping – Document all steps taken during data preparation. This
documentation serves as a reference for reproducibility and assists in communicating
the data processing steps to others.
● Logging Anomalies – Maintain a log of any anomalies, outliers, or unique observations
encountered during data preparation. This log can guide subsequent analyses and
contribute valuable insights into the dataset’s characteristics.

In summary, thorough data preparation is fundamental for accurate and meaningful time series
analysis. Addressing missing values, outliers, and irregularities ensures that the data accurately
represents the underlying patterns, allowing for more reliable insights and predictions.

Time Series Data Visualization in Excel


Excel provides a user-friendly interface and a variety of chart types that can effectively convey
the temporal patterns present in time series datasets. In this section, we will explore the robust
charting and graphing capabilities of Microsoft Excel for visualizing time series data.

● Line Chart – Excel’s Line Chart is a fundamental tool for visualizing time series data. It
connects data points with a line, making it easy to observe trends, fluctuations, and
patterns over time.
● Scatter Plot – Scatter plots in Excel allow the display of individual data points, offering a
clear representation of how each observation contributes to the overall time series. This
is particularly useful for identifying outliers or anomalies.
● Area Chart – Area charts can be employed to illustrate cumulative changes over time.
They are effective in showcasing trends and variations while providing a sense of the
overall magnitude of the time series.

Descriptive Analysis of Time Series Data


In this section, we will delve into the essential aspects of descriptive analysis for time series
data. Understanding the statistical characteristics of time series data is fundamental for gaining
insights into its behavior and variability. We will explore the common measures used in
descriptive analysis and show how to calculate these statistics using Excel.

● Mean (Average): The arithmetic mean represents the central tendency of the data. It is
calculated by summing all values and dividing by the number of observations.

Excel Function: =AVERAGE(data_range)

● Median: The median is the middle value in a dataset when it is ordered. It is less
sensitive to extreme values than the mean and provides a robust measure of central
tendency.

Excel Function: =MEDIAN(data_range)

● Standard Deviation: The standard deviation measures the dispersion or variability of
data points around the mean. A higher standard deviation indicates greater variability.
STDEV returns the sample standard deviation; Excel 2010 and later also offer the equivalent
STDEV.S.

Excel Function: =STDEV(data_range)

● Skewness: Describes the asymmetry of the distribution. Positive skewness indicates a
longer right tail, while negative skewness implies a longer left tail.

Excel Function: =SKEW(data_range)

● Kurtosis: Measures the “tailedness” of the distribution. A higher kurtosis suggests
heavier tails, potentially indicating more extreme values.

Excel Function: =KURT(data_range)
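
To show what these functions compute, here is a Python sketch of the same measures. The mean, median, and sample standard deviation come from the standard `statistics` module. The skewness and kurtosis helpers use the sample formulas commonly documented for Excel's SKEW and KURT (KURT reports excess kurtosis, so a normal distribution scores about 0). The function names are made up for this example.

```python
import statistics


def excel_skew(data):
    """Sample skewness, following the formula documented for SKEW."""
    n = len(data)
    if n < 3:
        raise ValueError("skewness needs at least 3 values")
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation, like STDEV
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in data)


def excel_kurt(data):
    """Sample excess kurtosis, following the formula documented for KURT."""
    n = len(data)
    if n < 4:
        raise ValueError("kurtosis needs at least 4 values")
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    fourth = sum(((x - mean) / s) ** 4 for x in data)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth - correction


# Hypothetical sample.
data = [1, 2, 3, 4, 5]
print(statistics.mean(data), statistics.median(data), statistics.stdev(data))
print(excel_skew(data), excel_kurt(data))
```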

Visualization of Descriptive Statistics

● Box Plots – Create box plots in Excel to visualize the distribution, central tendency, and
variability of the time series data. Box plots display the median, quartiles, and potential
outliers.
● Histograms – Excel’s histogram tool allows for the visualization of the frequency
distribution of time series data. This provides insights into the shape of the distribution.

Interpreting Descriptive Statistics

● Trends and Seasonality – Analyze the mean and standard deviation over time to identify
trends and seasonality patterns within the time series.
● Outliers – Examine skewness and kurtosis, along with visualizations, to detect outliers or
extreme values that may impact the analysis.

Time Series Decomposition

Time series decomposition is a powerful technique used to break down a time series into its
constituent components. This process helps analysts understand and separate the underlying
patterns, trends, seasonality, and random noise present in the data. In this section, we will
introduce the concept of time series decomposition and show how to perform it using Excel.

Components of Time Series

● Trend – The long-term movement or direction in the time series. It represents the
underlying growth or decline in the data.
● Seasonality – The repetitive and predictable patterns that occur at fixed intervals within
the time series. Seasonality often corresponds to regular, recurring events such as daily,
weekly, or yearly cycles.
● Noise (Residual) – The irregular and unpredictable fluctuations in the time series that
cannot be attributed to the trend or seasonality. It represents random variation or
measurement errors.

Moving Averages
Moving averages are valuable tools for revealing underlying trends and patterns within data by
reducing noise and short-term fluctuations. This section guides you through the practical
application of moving averages and smoothing techniques using Excel.

● Simple Moving Average (SMA) – A basic moving average calculated by averaging a set
of values over a specified time window. It is useful for smoothing out short-term
fluctuations and highlighting long-term trends.

To calculate the simple moving average in Excel, use the AVERAGE function over a window that
moves down the column. For example, assuming the data sits in B2:B25, a 3-period average
entered in C4 and copied down would be:

=AVERAGE(B2:B4)

● Exponential Moving Average (EMA) – A weighted moving average that gives more
importance to recent observations. It responds more quickly to changes, making it
suitable for capturing trends in rapidly changing data.

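Excel has no built-in EMA worksheet function. Instead, the average is built recursively with a
smoothing factor, commonly represented by a constant (α) between 0 and 1: seed the first EMA
with the first observation, then fill the following formula down the column:

=smoothing_factor * current_value + (1 - smoothing_factor) * previous_EMA

The Exponential Smoothing tool in the Analysis ToolPak produces a similar series; it asks for a
damping factor, which equals 1 - α.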
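
Both averages are easy to express outside a worksheet as well. The following Python sketch is one straightforward way to compute them; the function names are illustrative, and seeding the EMA with the first observation is a common convention assumed here, not the only option.

```python
def simple_moving_average(values, window):
    """Average of each run of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[end - window:end]) / window
        for end in range(window, len(values) + 1)
    ]


def exponential_moving_average(values, alpha):
    """EMA seeded with the first observation; alpha is the smoothing factor."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    ema = [values[0]]
    for value in values[1:]:
        # Recent observations get weight alpha, the running average 1 - alpha.
        ema.append(alpha * value + (1 - alpha) * ema[-1])
    return ema


print(simple_moving_average([1, 2, 3, 4, 5], window=3))
print(exponential_moving_average([10, 20, 30], alpha=0.5))
```

A larger window or a smaller alpha gives a smoother line that reacts more slowly to change.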

Smoothing Techniques

● Double Exponential Smoothing (Holt’s Method) – Holt’s method extends exponential
smoothing with a second equation so that it captures trend as well as level. The Analysis
ToolPak offers only single exponential smoothing, so Holt’s method has to be built with
worksheet formulas.
● Triple Exponential Smoothing (Holt-Winters Method) – The Holt-Winters method includes an
additional component to account for seasonality. Excel 2016 and later provide the
FORECAST.ETS function, which is based on triple exponential smoothing.

Visualization and Interpretation

● Plotting Original vs. Smoothed Data – Create a time series plot that overlays the original
data with the smoothed data. This visual representation helps in comparing the
effectiveness of smoothing techniques.
● Assessing Trend and Seasonality – Analyze the smoothed data to identify underlying
trends and seasonality. Smoothing techniques reveal patterns that may be obscured by
noise in raw time series data.

Excel Tips for Moving Averages and Smoothing

● Dynamic Windows – Use dynamic window sizes for moving averages by incorporating
Excel functions like OFFSET or INDEX. This allows for flexibility in adapting to different
time series characteristics.
● Visualization Tools – Leverage Excel’s charting capabilities to visualize the impact of
moving averages and smoothing techniques on the time series data. Consider creating
side-by-side plots for easy comparison.

Time Series Decomposition in Excel


● Data Preparation – Ensure your time series data is organized with a clear time variable.
If necessary, create a time series plot to visually inspect the overall pattern.
● Trend Component – Use Excel to calculate moving averages for different window sizes to
identify the trend. This involves creating a new column that calculates the average of a
specified number of neighboring observations with AVERAGE. If the trend is roughly a
straight line, the TREND function fits one by least squares instead:

=TREND(data_range, timeline_range)

● Seasonal Component – Subtract the trend component from the original time series data;
what remains contains both seasonality and noise:

=original_data - trend_component

Then average the detrended values that belong to the same season (for example, all first
quarters) to obtain the seasonal component.

● Noise (Residual) – The residual component represents the noise or random fluctuations.
It can be obtained by subtracting the sum of the trend and seasonal components from
the original data.

=original_data - (trend_component + seasonal_component)

● Visualization – Create charts or plots in Excel to visualize the trend, seasonality, and
residual components separately. This allows for a clear understanding of each
component’s contribution to the overall time series.
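
The decomposition steps can be sketched end to end in Python. This is a minimal additive decomposition under stated assumptions: the trend is a centered moving average (with half weights on the two end points when the period is even), the seasonal component is the average detrended value per season, and the series is assumed to cover at least two full periods. The function name and the quarterly sample are hypothetical.

```python
def decompose_additive(series, period):
    """Return (trend, seasonal, residual) lists for an additive model."""
    n = len(series)
    if n < 2 * period:
        raise ValueError("need at least two full periods of data")
    half = period // 2
    trend = [None] * n  # undefined near the edges of the series
    for t in range(half, n - half):
        window = series[t - half:t + half + 1]
        if period % 2:
            trend[t] = sum(window) / period
        else:
            # Even period: the two end points count half each.
            inner = sum(window[1:-1])
            trend[t] = (0.5 * window[0] + inner + 0.5 * window[-1]) / period
    # Average the detrended values season by season.
    seasonal_means = []
    for pos in range(period):
        detrended = [
            series[t] - trend[t]
            for t in range(pos, n, period)
            if trend[t] is not None
        ]
        seasonal_means.append(sum(detrended) / len(detrended))
    # Center the seasonal effects so they sum to zero over one period.
    offset = sum(seasonal_means) / period
    seasonal = [seasonal_means[t % period] - offset for t in range(n)]
    residual = [
        None if trend[t] is None else series[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, residual


# Hypothetical quarterly data: rising level plus a repeating quarterly pattern.
quarterly = [13, 10, 10, 13, 17, 14, 14, 17]
trend, seasonal, residual = decompose_additive(quarterly, period=4)
print(trend)
print(seasonal)
print(residual)
```

On this artificial series the residuals come out as zero because the data was built from a straight line plus a fixed pattern; real data will leave some noise behind.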

Interpretation and Analysis


● Analysing Trend – Examine the trend component to understand the long-term movement
in the data. Trends can provide insights into overall growth or decline.
● Understanding Seasonality – Explore the seasonality component to identify recurring
patterns. Seasonal analysis is crucial for understanding the impact of periodic events on
the time series.
● Examining Residuals – Analyze the residuals to identify any remaining patterns or
anomalies in the data. Residuals should ideally exhibit random behavior if the
decomposition is effective.

Time Series Forecasting Methods


● Single Exponential Smoothing (SES) – Applies different weights to historical
observations, with more recent data receiving higher weights. It is suitable for datasets
with no clear trend or seasonality.
● Double Exponential Smoothing (Holt’s Method) – Extends SES to account for trends in
the time series data.
● Triple Exponential Smoothing (Holt-Winters Method) – Takes seasonality in addition to
trend into consideration, making it suitable for data with both trends and recurring
patterns.
● ARIMA – ARIMA combines Autoregressive (AR), Integrated (I, that is, differencing), and
Moving Average (MA) components to model time series data. Differencing lets it handle
trends; seasonal patterns call for the seasonal extension, SARIMA.

Time Series Forecasting with Excel
Time series forecasting is a critical aspect of data analysis, enabling the prediction of future
values based on historical patterns. In this section, we will explore forecasting methods,
focusing on exponential smoothing and Autoregressive Integrated Moving Average (ARIMA),
and provide a step-by-step guide on conducting time series forecasts in Excel.

Single Exponential Smoothing (SES)

1. Data Preparation: Organize your time series data in Excel.


2. Calculate Initial Forecast: Use the first observation as the initial forecast.
3. Smoothed Forecast: Apply the SES formula to update the forecast for subsequent
periods.

=α * Actual + (1 – α) * Previous Forecast
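
The three SES steps translate directly into a short loop. In this Python sketch, "Actual" in the formula above is read as the most recent actual value, so each new forecast blends the latest observation with the forecast that was made for it. The function name and sample values are assumptions for illustration.

```python
def ses_forecasts(actuals, alpha):
    """Single exponential smoothing.

    Returns one forecast per observed period plus one for the next,
    unobserved period. The first forecast equals the first observation.
    """
    if not actuals:
        raise ValueError("need at least one observation")
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must be between 0 and 1")
    forecasts = [actuals[0]]  # step 2: initial forecast
    for actual in actuals:
        # Step 3: new forecast = alpha * actual + (1 - alpha) * previous forecast.
        forecasts.append(alpha * actual + (1 - alpha) * forecasts[-1])
    return forecasts


print(ses_forecasts([10, 20, 30], alpha=0.5))
```

The last element is the forecast for the period after the data ends.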

Double Exponential Smoothing (Holt’s Method)

1. Data Preparation: Organize your time series data in Excel.


2. Calculate Initial Trend and Forecast: Use the first two observations to calculate the initial
trend and forecast.
3. Smoothed Forecast and Trend: Update the forecast and trend using Holt’s method
formulas.
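
The text does not spell out Holt's formulas, so the Python sketch below uses the standard textbook form: a level equation and a trend equation, each with its own smoothing constant (called `alpha` and `beta` here). Initializing the level with the first observation and the trend with the first difference is one common choice, assumed for this example.

```python
def holt_forecast(series, alpha, beta, horizon):
    """Double exponential smoothing (Holt's linear trend method)."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level = series[0]
    trend = series[1] - series[0]
    for value in series[1:]:
        previous_level = level
        # Level: blend the new observation with the previous one-step projection.
        level = alpha * value + (1 - alpha) * (previous_level + trend)
        # Trend: blend the latest change in level with the previous trend.
        trend = beta * (level - previous_level) + (1 - beta) * trend
    # Project the last level forward along the last trend.
    return [level + h * trend for h in range(1, horizon + 1)]


print(holt_forecast([2, 4, 6, 8, 10], alpha=0.5, beta=0.5, horizon=3))
```

On a perfectly linear series the method simply continues the line, which makes it a handy sanity check for a worksheet implementation.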

Triple Exponential Smoothing (Holt-Winters Method)

1. Data Preparation: Organize your time series data in Excel.


2. Calculate Initial Level, Trend, and Seasonal Components: Use the first few observations
to initialize these components.
3. Update Components: Apply Holt-Winters formulas to update the level, trend, and
seasonality.
4. Calculate Forecast: Combine the updated components to calculate the forecast.
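
As with Holt's method, the update formulas are not given in the text, so this Python sketch uses the common additive Holt-Winters equations. The initialization (level from the first season's mean, trend from the difference between the first two season means, seasonal effects from the first season's deviations) is a simple assumption; other schemes exist. All names are illustrative.

```python
def holt_winters_additive(series, period, alpha, beta, gamma, horizon):
    """Triple exponential smoothing with additive seasonality."""
    n = len(series)
    if n < 2 * period:
        raise ValueError("need at least two full seasons of data")
    first = series[:period]
    second = series[period:2 * period]
    # Step 2: initialize level, trend, and one seasonal effect per position.
    level = sum(first) / period
    trend = (sum(second) / period - level) / period
    seasonal = [value - level for value in first]
    # Step 3: update the components for every later observation.
    for t in range(period, n):
        pos = t % period
        previous_level = level
        level = alpha * (series[t] - seasonal[pos]) + (1 - alpha) * (previous_level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
        seasonal[pos] = gamma * (series[t] - level) + (1 - gamma) * seasonal[pos]
    # Step 4: forecast = level + h * trend + seasonal effect of the target period.
    return [
        level + h * trend + seasonal[(n + h - 1) % period]
        for h in range(1, horizon + 1)
    ]


# Hypothetical quarterly series with a fixed pattern and no trend.
history = [13, 9, 8, 10] * 3
print(holt_winters_additive(history, 4, alpha=0.5, beta=0.5, gamma=0.5, horizon=4))
```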

ARIMA Forecasting

1. Identify Parameters: Use autocorrelation and partial autocorrelation plots to identify
appropriate ARIMA parameters (p, d, q).
2. Differencing: If needed, difference the time series data (subtract each value from the
next) to achieve stationarity.
3. Fit ARIMA Model: Excel has no built-in ARIMA tool, and the Analysis ToolPak does not
include one. Fit the model to the differenced data with the Solver add-in, a third-party
add-in, or a statistics package.
4. Forecast: Utilize the fitted ARIMA model to make future predictions.
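
A full ARIMA fit is beyond a short example, but the differencing and forecasting steps can be illustrated with the simplest non-trivial case. The Python sketch below handles only ARIMA(1,1,0) without a constant: it differences the series once, estimates the single autoregressive coefficient by least squares, forecasts the differences, and adds them back up. It is a teaching sketch under those assumptions, not a substitute for a statistics package.

```python
def difference(series):
    """First-order differencing: each value minus the one before it."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(values):
    """Least squares estimate of phi in values[t] = phi * values[t-1] + error."""
    numerator = sum(cur * prev for prev, cur in zip(values, values[1:]))
    denominator = sum(prev * prev for prev in values[:-1])
    if denominator == 0:
        raise ValueError("cannot estimate phi from all-zero values")
    return numerator / denominator


def forecast_arima_110(series, horizon):
    """Forecast with ARIMA(1,1,0), no constant term."""
    if len(series) < 3:
        raise ValueError("need at least three observations")
    diffs = difference(series)
    phi = fit_ar1(diffs)
    last_value, last_diff = series[-1], diffs[-1]
    forecasts = []
    for _ in range(horizon):
        last_diff = phi * last_diff         # forecast the next difference
        last_value = last_value + last_diff  # undo the differencing
        forecasts.append(last_value)
    return forecasts


# Hypothetical series whose increases halve each period (8, 4, 2, 1).
print(forecast_arima_110([0, 8, 12, 14, 15], horizon=2))
```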

Visualization and Validation

● Plotting Actual vs. Forecasted Values – Create visualizations in Excel that overlay the
actual time series data with the forecasted values. This helps assess the accuracy of the
forecast.
● Model Evaluation – Utilize measures like Mean Absolute Error (MAE), Mean Squared
Error (MSE), or Root Mean Squared Error (RMSE) to evaluate the accuracy of the
forecasting models.

Evaluating Forecasting Accuracy


Ensuring the accuracy of time series forecasts is crucial for making informed decisions based on
predicted values. In this section, we will explore various metrics used to evaluate the accuracy
of time series forecasts, with a focus on commonly employed measures such as Mean Absolute
Error (MAE) and Root Mean Square Error (RMSE).

Importance of Forecast Accuracy Evaluation


Accurate evaluation of forecasting models is essential for several reasons:

● Decision-Making Confidence: Accurate forecasts instill confidence in decision-makers
who rely on predictions to plan and allocate resources.
● Model Comparison: Different forecasting models can be compared to identify the most
effective one for a particular dataset.
● Improvement Feedback: Evaluation metrics help analysts understand the shortcomings
of a model, allowing for iterative improvement.

Forecast Accuracy Metrics

● Mean Absolute Error (MAE) – MAE represents the average absolute difference between
actual and forecasted values. It is expressed in the same units as the data, making it
easy to interpret.
● Root Mean Square Error (RMSE) – RMSE penalizes larger errors more significantly than
MAE. It provides a measure of the typical size of the forecast errors.
● Mean Absolute Percentage Error (MAPE) – MAPE expresses the average percentage
difference between actual and forecasted values. It is particularly useful when comparing
datasets with different scales, but it is undefined when an actual value is 0.

Implementing Accuracy Metrics in Excel

● Calculating MAE in Excel – Use the ABS function to calculate absolute differences and
AVERAGE to find the mean. In versions of Excel without dynamic arrays, confirm this and
the following formulas as array formulas with Ctrl+Shift+Enter.

=AVERAGE(ABS(Actual_range - Forecast_range))

● Calculating RMSE in Excel – Excel does not have a built-in RMSE function, but it can be
computed using the following formula:

=SQRT(AVERAGE((Actual_range - Forecast_range)^2))

● Calculating MAPE in Excel – Similarly, calculate MAPE using the following formula:

=AVERAGE(ABS((Actual_range - Forecast_range) / Actual_range) * 100)
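To cross-check the three Excel formulas above outside a spreadsheet, the following is a minimal Python sketch of the same calculations. The function names and the sample numbers are hypothetical and chosen only for illustration.

```python
import math

def mae(actual, forecast):
    """Mean Absolute Error: average of |actual - forecast|."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    """Root Mean Square Error: square root of the average squared error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent.
    Raises ZeroDivisionError if any actual value is zero."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

# Hypothetical actual and forecasted values (not from the reviewer).
actual = [100, 200, 300, 400]
forecast = [110, 190, 310, 380]

print(mae(actual, forecast), rmse(actual, forecast), mape(actual, forecast))
```

Because RMSE squares each error before averaging, the single error of 20 in the sample pulls RMSE above MAE, which matches the description of the two metrics given earlier.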

Considerations in Evaluating Predictions

● Residual Analysis – Examine the distribution of residuals (actual minus forecasted) to
check that they are centered on zero (unbiased) and approximately normally distributed.
● Benchmarking – Compare the performance of the forecasting model against a simple
benchmark, such as a naïve forecast, to provide context to the evaluation.
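The two checks above can be sketched in Python as follows. This is an illustrative example only: the series and the model forecasts are hypothetical, and the naïve forecast is assumed to be "next period equals the previous period's actual value".

```python
def mean_abs_error(actual, forecast):
    """Average absolute difference between paired values."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

# Hypothetical series and hypothetical model forecasts for periods 2..6.
actual = [120, 130, 125, 140, 150, 145]
model_forecast = [125, 128, 138, 148, 147]

targets = actual[1:]   # periods that have both a model and a naive forecast
naive = actual[:-1]    # naive forecast: repeat the previous actual value

# Residual analysis: a mean residual far from zero suggests a biased model.
residuals = [a - f for a, f in zip(targets, model_forecast)]
bias = sum(residuals) / len(residuals)

# Benchmarking: the model should beat the naive forecast to be worth using.
model_mae = mean_abs_error(targets, model_forecast)
naive_mae = mean_abs_error(targets, naive)

print(f"bias={bias:.2f}, model MAE={model_mae:.2f}, naive MAE={naive_mae:.2f}")
```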

Case Studies and Examples


In this section, we will explore real-world case studies and examples to demonstrate the
application of time series analysis across various domains. Additionally, we will showcase how
Excel can be utilized as a practical tool for solving specific time series problems in these
scenarios.

● Financial Time Series Analysis – Analyzing daily stock prices to predict short-term trends
and volatility.
● Sales Forecasting for Retail – Predicting future sales for a retail business based on
historical sales data.
● Energy Consumption Prediction – Forecasting future energy consumption to optimize
resource allocation.
● Website Traffic Analysis – Analyzing daily website traffic to identify patterns and plan
server capacity.
● Temperature Forecasting for Agriculture – Predicting future temperatures to assist
farmers in planning crop cycles.
● Inventory Management – Forecasting inventory demand to optimize stock levels and
reduce holding costs.

Step 1 – Input Time Series Data

We are going to use a company's quarterly revenue over two years.

● Put the years in column B. In our case, the data covers only two years.
● Input the quarter of each year in column C. You can type a repeating sequence or use
AutoFill.
● Insert the total revenue for every quarter in column D.

Step 2 – Enable the Data Analysis Feature


● Go to the File tab from the ribbon.
● Go to the Options menu.
● The Excel Options dialog box will appear.
● Go to Add-ins and select Analysis ToolPak in the add-ins list.
● Choose Excel Add-ins from the Manage drop-down menu.
● Click on the Go button.
● The Add-ins window will come up.
● Check Analysis ToolPak.
● Click on the OK button.
● We will get the Data Analysis button under the Data tab.

Step 3 – Execute the Statistical Analysis

● Go to the Data tab from the ribbon.
● Click on the Data Analysis tool under the Analysis group.
● The Data Analysis dialog box will pop up.
● Scroll down and select Exponential Smoothing.
● Click on the OK button.
● This will display the Exponential Smoothing dialog box.
● Select the cell range in the Input Range field. In this case, we selected the range
$D$5:$D$12, which is the Revenue column.
● Specify the Damping factor. The output formula shown in the final output corresponds
to a damping factor of 0.3.
● Select the cell $E$5 in the Output Range field.
● Check the Chart Output and Standard Errors boxes.
● Click OK.

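Final Output to Analyze Time Series Data in Excel

● The Smoothed Level and Standard Error columns represent the outcomes of the
statistical analysis.
● Each cell in the Smoothed Level column contains a formula of the following form, where
0.3 is the damping factor and 0.7 is one minus the damping factor:

=0.7*D6+0.3*E6

● For the standard errors, the formula is as follows. It takes the square root of the
average squared difference between actual and smoothed values over three periods.

=SQRT(SUMXMY2(D6:D8,E6:E8)/3)

● We will also get a graphical representation of the Revenue and a forecast.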
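The recurrence behind the two formulas above can be sketched in Python. This is a minimal sketch under stated assumptions: the revenue figures are hypothetical, the first smoothed value is assumed to be seeded with the first actual value, and the alignment of the three-period standard error window is an assumption modeled on the SUMXMY2 formula rather than a guaranteed match for Excel's output cells.

```python
import math

def exponential_smoothing(values, damping=0.3):
    """Smoothed value for the next period:
    (1 - damping) * current actual + damping * current smoothed value.
    The first period has no smoothed value (None); the second is seeded
    with the first actual value (an assumption). Expects a non-empty list."""
    alpha = 1 - damping
    forecasts = [None]
    if len(values) > 1:
        forecasts.append(values[0])
    for t in range(1, len(values) - 1):
        forecasts.append(alpha * values[t] + damping * forecasts[t])
    return forecasts

def standard_errors(values, forecasts, window=3):
    """Square root of the mean squared (actual - smoothed) difference
    over the `window` periods before each period."""
    errors = [None] * len(values)
    for t in range(window + 1, len(values)):
        squared = [(values[i] - forecasts[i]) ** 2 for i in range(t - window, t)]
        errors[t] = math.sqrt(sum(squared) / window)
    return errors

# Hypothetical quarterly revenue for two years (eight quarters).
revenue = [100, 120, 110, 130, 125, 140, 135, 150]
smoothed = exponential_smoothing(revenue, damping=0.3)
errors = standard_errors(revenue, smoothed)

for actual, level, err in zip(revenue, smoothed, errors):
    print(actual, level, err)
```

A smaller damping factor makes the smoothed series follow recent actual values more closely; a larger one makes it smoother and slower to react.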


Time Series Forecasting in Excel

Steps:

● Select the Actual revenue line on the chart.
● Right-click and select Add Trendline.
● The Format Trendline pane will show up on the right side of the spreadsheet.
● Select Polynomial from the Trendline Options. In this example, the polynomial trend
line has a lower error rate than the other trend line types.
● Check the Display Equation on chart and Display R-squared value on chart
boxes.
● The polynomial trend line will be drawn on the chart.
● Choose Linear instead if you would like a linear trend line.
● To extend the trend line into the future, enter the number of periods in the Forward
box under the Forecast option.
● This will display a linear trend line alongside the actual data on the chart, extended by
the forecast periods.
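What the linear trend line, its R-squared value, and the Forward forecast compute can be sketched in Python with ordinary least squares. This sketch covers only the linear case, not the polynomial one, and the revenue values are hypothetical.

```python
def fit_linear_trend(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical quarterly revenue, with periods numbered 1..8.
periods = list(range(1, 9))
revenue = [100, 120, 110, 130, 125, 140, 135, 150]

slope, intercept = fit_linear_trend(periods, revenue)
r2 = r_squared(periods, revenue, slope, intercept)

# Equivalent of entering 2 in the Forward box: extend the line two periods.
forward = [slope * x + intercept for x in (9, 10)]
print(f"y = {slope:.3f}x + {intercept:.3f}, R^2 = {r2:.3f}, forecast = {forward}")
```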

Suppose we want to forecast an exponential relationship. The GROWTH function returns the
y-values for a set of new x-values. We can also use this function to fit an exponential curve
to already-existing x- and y-values.

● Insert a new column named Forecast.
● Select the cell where you want the forecast value returned by the GROWTH
function.
● Put this formula into the selected cell.

=GROWTH($D$5:$D$12,$C$5:$C$12,C5,TRUE)

● Hit the Enter key.
● Drag the Fill Handle down to copy the formula over the range, or double-click
on the plus (+) symbol.
● You can see the predicted revenue in the Forecast column.
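GROWTH fits an exponential curve of the form y = b * m^x. A minimal Python sketch of that fit, done as a least-squares line through ln(y), is shown below. The function name `growth` and the sample data are hypothetical, and the sketch corresponds to the case where the last argument is TRUE (b is estimated rather than forced to 1).

```python
import math

def growth(known_y, known_x, new_x):
    """Fit y = b * m**x by least squares on ln(y) and predict for new_x.
    All known_y values must be positive."""
    logs = [math.log(y) for y in known_y]
    n = len(known_x)
    mean_x = sum(known_x) / n
    mean_log = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in known_x)
    sxy = sum((x - mean_x) * (l - mean_log) for x, l in zip(known_x, logs))
    ln_m = sxy / sxx                   # log of the growth factor m
    ln_b = mean_log - ln_m * mean_x    # log of the base level b
    return [math.exp(ln_b + ln_m * x) for x in new_x]

# Hypothetical series that grows by exactly 50% per period from a base of 2.
xs = [1, 2, 3, 4, 5]
ys = [2 * 1.5 ** x for x in xs]

print(growth(ys, xs, [6]))
```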

Overview of Prescriptive Analytics

Introduction to Prescriptive Analytics

Prescriptive Analytics is the third and most advanced phase in the data analytics process,
following Descriptive and Predictive Analytics. While Descriptive Analytics answers the question
"What happened?" and Predictive Analytics answers "What is likely to happen?", Prescriptive
Analytics goes a step further to answer "What should we do?"

It involves the use of data, algorithms, and advanced computational techniques to recommend
actions and predict their outcomes. By integrating insights from past and future trends,
Prescriptive Analytics provides actionable strategies to optimize decision-making.

Key Characteristics

1. Action-Oriented: Provides recommendations for specific actions.

2. Scenario Testing: Allows exploration of "what-if" scenarios to evaluate the outcomes of
various decisions.

3. Optimization: Uses algorithms to find the best course of action among multiple options.

4. Integration with Systems: Can work alongside operational systems like supply chain
management or customer relationship management (CRM) software.

Applications in Business

Prescriptive Analytics is widely applied across industries:


● Healthcare: Optimizing treatment plans for patients or allocating hospital resources
effectively.

● Retail: Managing inventory and pricing strategies to maximize revenue.

● Transportation: Scheduling delivery routes to minimize fuel costs and meet delivery
deadlines.

● Finance: Portfolio optimization, fraud detection, and risk management.

Chapter 2: Components of Prescriptive Analytics

1. Data Collection and Preparation

To provide meaningful recommendations, Prescriptive Analytics relies on data from various
sources:

● Historical Data: Past performance metrics.

● Predictive Data: Forecasted outcomes from Predictive Analytics models.

● External Data: Market trends, weather forecasts, or social sentiment.

2. Analytical Models

Prescriptive Analytics employs a range of models to analyze and recommend actions:

● Optimization Models: Use mathematical equations to maximize or minimize an
objective (e.g., profit, cost).

● Simulation Models: Test the effects of decisions in a controlled virtual environment.

● Machine Learning Models: Adapt and improve recommendations over time by learning
from new data.


3. Decision-Making Framework

Prescriptive Analytics frameworks often provide decision-makers with:

● Actionable Recommendations: Clear steps to follow.

● Outcome Projections: Estimated results of implementing a recommendation.

● Scenario Comparisons: Analysis of different strategies side-by-side.

Chapter 3: Techniques in Prescriptive Analytics

1. What-If Analysis

A technique that evaluates how different decisions impact outcomes by changing input
variables. It is commonly used in Excel through tools like Data Tables and Scenario Manager.

Example: A retail store wants to know how changes in the price of a product affect total
revenue. Using What-If Analysis, they can simulate various price points and observe the
corresponding revenue projections.

2. Goal Seek

Goal Seek in Excel determines the input value required to achieve a specific outcome.

Example: A company wants to find the sales volume needed to achieve a profit of $50,000. By
using Goal Seek, they can calculate the exact number of units required.
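
The idea behind the Goal Seek example can be sketched in Python as a search for the input that makes a formula hit a target. The sketch below uses bisection, which is a stand-in technique and not necessarily the method Excel uses internally. The price, variable cost, and fixed cost are hypothetical figures; only the $50,000 target comes from the example.

```python
def goal_seek(func, target, low, high, tol=1e-6, max_iter=200):
    """Find x in [low, high] with func(x) close to target by bisection.
    Assumes func is continuous and the target is bracketed by the bounds."""
    f_low = func(low) - target
    f_high = func(high) - target
    if f_low * f_high > 0:
        raise ValueError("target is not bracketed by low and high")
    for _ in range(max_iter):
        mid = (low + high) / 2
        f_mid = func(mid) - target
        if abs(f_mid) < tol or (high - low) / 2 < tol:
            return mid
        if f_low * f_mid <= 0:
            high = mid                 # the solution lies in the lower half
        else:
            low, f_low = mid, f_mid    # the solution lies in the upper half
    return (low + high) / 2

# Hypothetical cost structure (assumptions, not from the reviewer).
PRICE, VARIABLE_COST, FIXED_COST = 25.0, 15.0, 30_000.0

def profit(units):
    """Profit = units * contribution margin - fixed cost."""
    return units * (PRICE - VARIABLE_COST) - FIXED_COST

units_needed = goal_seek(profit, target=50_000, low=0, high=100_000)
print(round(units_needed))
```

Here `profit` plays the role of the Set cell, 50,000 is the To value, and `units` is the By changing cell.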

3. Solver

Solver is an Excel add-in for solving optimization problems. It identifies the best decision by
adjusting multiple input variables within constraints.

Example: A logistics company needs to minimize transportation costs while ensuring timely
delivery. Using Solver, they can optimize routes and schedules.
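
To make the structure of a Solver problem concrete (decision variables, an objective, and constraints), here is a small Python sketch of a simplified logistics problem. It uses exhaustive search over integer choices, which is a swapped-in technique for illustration and not one of Solver's own solving methods. All costs, capacities, and limits are hypothetical.

```python
from itertools import product

# Hypothetical problem data (assumptions for illustration only).
LARGE_COST, LARGE_CAPACITY, MAX_LARGE = 500, 10, 4
SMALL_COST, SMALL_CAPACITY, MAX_SMALL = 300, 5, 6
DEMAND = 42  # units that must be delivered

def total_cost(large, small):
    """Objective: total transportation cost to minimize."""
    return large * LARGE_COST + small * SMALL_COST

def is_feasible(large, small):
    """Constraint: the combined truck capacity must cover the demand."""
    return large * LARGE_CAPACITY + small * SMALL_CAPACITY >= DEMAND

# Decision variables: how many trucks of each size to dispatch.
candidates = [
    (large, small)
    for large, small in product(range(MAX_LARGE + 1), range(MAX_SMALL + 1))
    if is_feasible(large, small)
]
best = min(candidates, key=lambda plan: total_cost(*plan))
best_cost = total_cost(*best)

print(best, best_cost)
```

Exhaustive search only works for a handful of small integer variables; Solver exists precisely because realistic problems have far too many combinations to enumerate.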

Chapter 4: Prescriptive Analytics in Action

Case Study: Restaurant Optimization

A restaurant manager wants to:

1. Maximize profit.

2. Minimize food waste.

3. Ensure customer satisfaction.

Using Prescriptive Analytics:

1. Data Collection: Gather historical sales data, food preparation costs, and customer
preferences.

2. Model Development: Use Solver to determine optimal menu pricing and inventory
management strategies.

3. Implementation: Adjust menu offerings and monitor real-time performance.

4. Evaluation: Measure improvements in profit and waste reduction.

Chapter 5: Benefits and Challenges


Benefits

1. Informed Decision-Making: Provides actionable insights based on data.

2. Resource Optimization: Allocates resources effectively to achieve objectives.

3. Competitive Advantage: Enhances responsiveness to market trends.

Challenges

1. Complexity: Advanced algorithms require specialized expertise.

2. Data Dependency: Relies on accurate and comprehensive data.

3. Implementation Costs: Can be resource-intensive to deploy and maintain.

Chapter 6: The Future of Prescriptive Analytics

Emerging Trends

1. AI-Driven Prescriptions: Increasing use of AI to automate decision-making.

2. Real-Time Analytics: Integration with IoT devices for instant recommendations.

3. Ethical Considerations: Balancing optimization with fairness and transparency.

Conclusion

Prescriptive Analytics represents the pinnacle of data analytics, transforming raw data into
actionable strategies. With its power to optimize outcomes and guide decision-making, it is an
indispensable tool for businesses navigating complex and competitive environments.


What-If Analysis with Data Tables in Excel

What-If Analysis is a set of tools available on the Data tab in Excel. By changing the input
values in some cells, you can see the effect on the output. It shows the relationship between
input values and output values. In this article, we will learn how to use what-if analysis with
data tables effectively.

What is What-if Analysis?

What-if analysis is a procedure in Excel that works on data in tabular form. A variety of values
are tried in the input cells of a sheet so that the different results can be compared without
creating separate sheets.

Tools of what-if analysis

There are three tools in what-if analysis:

1. Goal seek
2. Scenario manager
3. Data Table

Goal seek

In Goal Seek, we already know the output value we want and need to find the input value that
produces it. For example, a student knows his marks in all other subjects and the total he
wants across all subjects, and wants to find the English marks needed to reach that total.

Step 1: Write all subjects and their marks in an Excel sheet and total them with the SUM
function.

Step 2: Go to the Data tab on the ribbon.

Step 3: In the Forecast group (Data Tools in older versions of Excel), select What-If Analysis.

Step 4: A drop-down appears. Select Goal Seek.

Step 5: The Goal Seek dialog box appears. In the first box, Set cell, enter the cell that
contains the SUM formula. Type D10 in Set cell.

Step 6: In the second box, To value, enter the target. The target value for this example is
440.

Step 7: In the third box, By changing cell, enter the cell that holds the English marks.
Provide an absolute cell reference, i.e. $D$5.

Step 8: Click OK and see the result. The estimated marks for English are 71.

Scenario Manager

In Scenario Manager, we create different scenarios by providing different input values for the
same variables, then compare the scenarios to choose the best result. For example, we can
compare the costs for three different months.

Step 1: Start with a data set for Revenue Cost of Jan, with Expenses and Cost as its columns.

Step 2: Select the cells with the numerical values and go to the Data tab.

Step 3: In the Forecast group, click on What-If Analysis.

Step 4: A drop-down appears. Select Scenario Manager.

Step 5: The Scenario Manager dialog box appears. Click Add.

Step 6: The Add Scenario dialog box appears. Under Scenario name, write "Revenue of Feb".

Step 7: In the second box, select the changing cells. The changing cells for this example
are $E$5:$E$9.

Step 8: A new dialog box named Scenario Values appears. Enter the changed value for each
changing cell, then click OK.

Step 9: Repeat steps 5, 6, and 8 for each additional scenario.

Step 10: Click OK, then select Summary.

Step 11: A new dialog box named Scenario Summary appears. Select Result cells: $E$10.

Step 12: See the result.
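
The logic of the scenario summary can be sketched in Python: each scenario is a named set of values for the changing cells, and the summary evaluates the result cell once per scenario. The expense names and amounts are hypothetical, and the result cell is assumed to be the total of the five changing cells.

```python
def scenario_summary(scenarios, result_func):
    """Evaluate the result formula once for every named scenario."""
    return {name: result_func(inputs) for name, inputs in scenarios.items()}

def total_cost(inputs):
    """Stand-in for the result cell: assumed to be the sum of the inputs."""
    return sum(inputs.values())

# Hypothetical expenses; five entries mirror the five changing cells.
scenarios = {
    "Revenue of Jan": {"Rent": 1000, "Salaries": 5000, "Utilities": 300,
                       "Marketing": 700, "Supplies": 200},
    "Revenue of Feb": {"Rent": 1000, "Salaries": 5200, "Utilities": 350,
                       "Marketing": 500, "Supplies": 250},
    "Revenue of Mar": {"Rent": 1000, "Salaries": 5200, "Utilities": 280,
                       "Marketing": 900, "Supplies": 220},
}

summary = scenario_summary(scenarios, total_cost)
for name, result in summary.items():
    print(f"{name}: {result}")
```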

Data Table

In a data table, we list different input values for the same variable and let Excel calculate the
output for each of them. It is one of the most helpful features in what-if analysis. One can try
different input values and obtain the corresponding outputs for research as well as
business-driven purposes.

There are two types of data table:

Data table in one Variable

In a one-variable data table, we can change only one input value, with the trial values laid out
either in a row or in a column. It includes only one input cell. For example, a company wants to
know how its revenue changes when the cost of raw materials changes. The data set lists the
materials and their costs.

Step 1: Create a table of revenue cost.

Step 2: Reference the output cell in another cell, which will head the data table. The output
cell is D7 in this example.

Step 3: Write the trial values for the input you want to change in a column or in a row.

Step 4: Select the table range, then go to the Data tab on the ribbon.

Step 5: In the Forecast group (Data Tools in older versions of Excel), select What-If Analysis.

Step 6: A drop-down appears. Select Data Table.

Step 7: A dialog box named Data Table appears. Select the input cell whose value the trial
values should replace. Because the trial values are in a column here, set the Column input cell
to $D$3. Click OK. Your data table is ready.
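
A one-variable data table amounts to re-evaluating one formula for a list of trial inputs. The Python sketch below shows that idea. The revenue model and every figure in it are hypothetical, because the reviewer does not give the worksheet's actual formula.

```python
# Hypothetical fixed inputs of the model (assumptions for illustration).
SALES = 10_000
LABOR_COST = 2_000
OVERHEAD = 1_000

def net_revenue(raw_material_cost):
    """Output cell: sales minus all costs, for a given raw material cost."""
    return SALES - (raw_material_cost + LABOR_COST + OVERHEAD)

def one_variable_table(model, trial_values):
    """Evaluate the model once per trial value, like a column data table."""
    return [(value, model(value)) for value in trial_values]

# Trial values for the single input cell (the raw material cost).
table = one_variable_table(net_revenue, [2_000, 2_500, 3_000])
for raw_cost, result in table:
    print(raw_cost, result)
```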

Data table in two Variable

In a two-variable data table, we can change two input values, one along a row and one down
a column. It includes two input cells. For example, a person wants to know the monthly
installment of a loan at different rates of interest and for different time periods, for the
same principal amount.

Step 1: Create a table to find the payment with the PMT function.

Step 2: Reference the output cell (the one containing the PMT formula) in another cell. This
cell becomes the top-left corner of the data table.

Step 3: Write the trial values for one input across the row and for the other input down the
column.

Step 4: Select the table range, then go to the Data tab on the ribbon.

Step 5: Select What-If Analysis.

Step 6: Select Data Table.

Step 7: A dialog box appears in which you select the input cells that the row and column
values should replace. The Row input cell is $D$5 and the Column input cell is $D$6.

Step 8: Click OK and see the result.
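
The two-variable loan table can be sketched in Python with the standard annuity payment formula. The principal, rates, and terms below are hypothetical, and the payment is returned as a positive number, whereas Excel's PMT returns a negative value for a payment made by the borrower.

```python
def monthly_payment(principal, annual_rate, years):
    """Fixed monthly installment for a fully amortizing loan."""
    months = years * 12
    rate = annual_rate / 12
    if rate == 0:
        return principal / months
    return principal * rate / (1 - (1 + rate) ** -months)

def two_variable_table(principal, rates, terms):
    """Payment for every (rate, term) pair; rows are rates, columns are terms."""
    return {
        rate: {years: monthly_payment(principal, rate, years) for years in terms}
        for rate in rates
    }

# Hypothetical loan and trial values for the two input cells.
PRINCIPAL = 100_000
rates = [0.05, 0.06, 0.07]   # annual interest rates
terms = [5, 10, 15]          # loan terms in years

table = two_variable_table(PRINCIPAL, rates, terms)
for rate, row in table.items():
    print(f"{rate:.0%}", [round(payment, 2) for payment in row.values()])
```

Reading across a row shows how a longer term lowers the installment; reading down a column shows how a higher rate raises it.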
