DATA ANALYTICS BCS SY

Unit III
Working with time series data and regression analysis.

1) Introduction to time series data - Time series data is data that is
collected or recorded over time at regular intervals. In time series analysis, the
order of observations is crucial, as they are taken at successive points in time. This
type of data is commonly used in many fields, including finance, economics,
signal processing, and environmental science.

2) Understanding time series data and its importance - Understanding time
series data is crucial in many fields because it reveals patterns, trends,
and dependencies that can inform decision-making, forecasting, and analysis.
a) Pattern identification –
Time series data allows for the identification of patterns and trends over time.
Recognizing these patterns is essential for understanding the behavior of a
system or process.
b) Forecasting - Time series analysis enables the prediction of future values
based on historical data patterns. Forecasting is valuable in areas such as
finance, where it is used to predict stock prices, or in weather forecasting to
predict future weather conditions.
c) Resource planning - In business and manufacturing, understanding time
series data is crucial for resource planning. It helps in predicting demand,
managing inventory, and optimizing production schedules.
d) Healthcare Research - In healthcare, time series data is used for patient
monitoring, disease progression analysis, and predicting outbreaks. It plays a
vital role in understanding and managing various health-related phenomena.

3) Working with time series data in Excel : date and time –

A) Data import - If your time series data is large or if you have it in a different
format, you might want to import it into Excel. You can use the "Data" tab and
choose various options like "From Text" or "From Workbook" to import external
data.
B) Date Formatting - Excel may not always recognize date formats
automatically. Make sure your date or time column is formatted correctly. You
can use the "Format Cells" option under the "Home" tab to format the date as
required.
C) Trend analysis - You can use Excel's built-in functions for trend analysis. For
example, the "LINEST" function can be used for linear regression, and the
"GROWTH" function can be used for exponential growth.

Bhise N K

Trend Analysis and Forecasting :-

1) Identifying trends and patterns in time series data - Trend analysis in data
analytics involves examining data over time to identify patterns, tendencies, or trends. This
process helps analysts and decision-makers understand how a particular variable or set of
variables changes and evolves over a specified period. Trend analysis is widely used in
various fields, including finance, marketing, economics, and healthcare, among others.
Key aspects of trend analysis:
❖ Time series data.
❖ Pattern identification.
❖ Visualization.
❖ Forecasting.
❖ Business intelligence.
2) Introduction to time series forecasting - Time series forecasting is a specialized area
of predictive analytics that involves making predictions about future values based on
historical data points ordered chronologically. In a time series, each data point is associated
with a specific timestamp, and the goal is to use the patterns and trends within the
historical data to make accurate predictions for future time points.
Key aspects of time series forecasting:
❖ Trend.
❖ Seasonality.
❖ Time series forecasting methods, such as machine learning and statistical
methods.
❖ Data preprocessing.
❖ Cross-validation, e.g., splitting into train and test data.


Forecasting techniques in Excel – linear and polynomial trendlines –

A) Linear regression - Linear regression in Excel is a statistical technique that is
used to find the relationship between two variables by fitting a linear equation to
observed data. In the context of forecasting or predicting, linear regression can be
used to estimate the values of one variable (dependent variable) based on the
values of another variable (independent variable).
Formula – Y = b0 + b1X, where b0 is the intercept and b1 is the slope.

B) Polynomial trendline - A polynomial trendline is a curved line that is used
when data fluctuates. It is useful, for example, for analyzing gains and losses
over a large data set. The order of the polynomial can be determined by the
number of fluctuations in the data or by how many bends (hills and valleys)
appear in the curve.

4. Smoothing Technique: Moving Average

A) Introduction to moving average as a smoothing technique - In Excel, a
moving average is a statistical calculation that is used to analyze data over a
certain period of time by creating a series of averages of different subsets of the
full data set. It is often used to smooth out fluctuations in data and highlight
trends or patterns.

There are different types of moving averages, but the most common one is the
Simple Moving Average (SMA). The Simple Moving Average is calculated by
taking the average of a set of data points over a specified period and then
moving the average to the next set of data points. The formula for calculating the
Simple Moving Average for a given data set is :

SMA = Sum of data points in the specified period / Number of data points in the
specified period.

Steps : 1) Prepare your data - Arrange your data in a column in excel.

2) Choose a period - Decide on the period for your moving average. For example, if
you want a 3-period moving average, you would use the average of the first 3
data points, then the next 3, and so on.


3) Calculate the moving average - Place the formula in the cell where you want
the moving average to start.

If your data is in column A and you are calculating a 3-period moving average,
and your first data point is in cell A2, the formula in cell B4 would be:
=AVERAGE(A2:A4)
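The same 3-period moving average can be sketched outside Excel as well; this plain-Python illustration mirrors dragging =AVERAGE(A2:A4) down a column (the data values are invented for the example):

```python
def simple_moving_average(values, period):
    """Return the simple moving average: one value per full window."""
    if period <= 0 or period > len(values):
        raise ValueError("period must be between 1 and len(values)")
    return [sum(values[i:i + period]) / period
            for i in range(len(values) - period + 1)]

# Example: 3-period SMA over six observations.
data = [10, 12, 11, 15, 14, 16]
print(simple_moving_average(data, 3))  # first value is (10+12+11)/3 = 11.0
```

Note that an n-period SMA produces fewer values than the original series, just as the Excel formula leaves the first n-1 rows of the moving-average column empty.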

B) Calculating simple, weighted, and exponential moving averages -
● Weighted moving avg - The Weighted Moving Average assigns different
weights to different data points. Assume you have weights in column B
corresponding to your data in column A; for a 3-period WMA, the weighted
sum of each window is divided by the sum of the weights.

● Exponential moving avg - An Exponential Moving Average (EMA) is a type
of moving average that places more emphasis on recent data points,
giving them higher weightage in the calculation compared to older data
points. This makes the EMA more responsive to changes in the data
series, especially when compared to the Simple Moving Average (SMA)
where all data points are equally weighted.
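A minimal plain-Python sketch of the weighted and exponential moving averages described above; the weights and the smoothing factor alpha are illustrative choices, not prescribed values:

```python
def weighted_moving_average(values, weights):
    """WMA over a sliding window: weighted sum divided by the sum of weights."""
    n = len(weights)
    return [sum(v * w for v, w in zip(values[i:i + n], weights)) / sum(weights)
            for i in range(len(values) - n + 1)]

def exponential_moving_average(values, alpha):
    """EMA: each new value pulls the average toward it by a factor alpha."""
    ema = [values[0]]  # seed with the first observation
    for x in values[1:]:
        ema.append(alpha * x + (1 - alpha) * ema[-1])
    return ema

data = [10, 12, 11, 15, 14]
print(weighted_moving_average(data, [1, 2, 3]))  # recent points weigh more
print(exponential_moving_average(data, 0.5))     # [10, 11.0, 11.0, 13.0, 13.5]
```

A larger alpha makes the EMA react faster to recent changes; a smaller alpha smooths more aggressively.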

Applications of Exponential Moving Average -

1) Financial Forecasting - Exponential smoothing is applied in finance
to predict future trends in stock prices, currency exchange rates, or
other financial indicators. Investors and financial analysts use these
forecasts to make informed decisions about investment strategies.
2) Sales Forecasting - Retailers and manufacturers utilize exponential
smoothing to predict sales volumes for products. This helps in
planning production schedules, managing supply chains, and
optimizing marketing strategies based on anticipated demand.
3) Call Center Volume Prediction - Call centers use exponential
smoothing to forecast the volume of incoming calls. This
information is valuable for efficiently scheduling staff, ensuring
optimal customer service levels, and managing resources
effectively.
4) Weather forecasting - Meteorologists use exponential smoothing
to predict weather patterns based on historical climate data. This


aids in providing short-term forecasts for temperature, precipitation,


and other meteorological variables.

5. Simple Linear Regression.

Simple linear regression is a statistical method used to model the relationship
between two continuous variables. In simple linear regression, one variable
(often denoted as X) is considered the independent variable or predictor variable,
while the other variable (often denoted as Y) is considered the dependent
variable or response variable.

The relationship between the two variables is assumed to be approximately
linear, meaning that changes in the independent variable are associated with changes in
the dependent variable in a straight-line fashion. The simple linear regression model is
represented by the equation:

Y = B0 + B1X + E

★ Y is the dependent variable.
★ X is the independent variable.
★ B0 is the intercept, which represents the value of Y when X is zero.
★ B1 is the slope, which represents the change in Y for a one-unit change in X.
★ E is the error term, representing the difference between the observed and predicted
values of Y. It captures the variability in Y that is not explained by the linear
relationship with X. (E stands for epsilon, ε.)

The goal of simple linear regression is to estimate the values of the coefficients B0 and
B1 that minimize the sum of the squared differences between the observed values of Y
and the values predicted by the regression model. This is often done using the method
of least squares.
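The least-squares estimates of B0 and B1 have a closed form: B1 is the covariance of X and Y divided by the variance of X, and B0 follows from the means. A plain-Python sketch (the sample data are invented and perfectly linear, so the true coefficients are known):

```python
def fit_simple_linear(xs, ys):
    """Least-squares estimates: B1 = cov(x, y) / var(x), B0 = mean(y) - B1*mean(x)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    b0 = my - b1 * mx
    return b0, b1

# Data generated from Y = 2 + 3X, so the fit should recover B0 = 2, B1 = 3.
xs = [1, 2, 3, 4, 5]
ys = [5, 8, 11, 14, 17]
b0, b1 = fit_simple_linear(xs, ys)
print(b0, b1)  # → 2.0 3.0
```

This is the same calculation Excel's SLOPE and INTERCEPT functions (and LINEST) perform.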


Once the coefficients are estimated, the regression model can be used to make
predictions about the dependent variable based on values of the independent variable.
Additionally, the fit of the model can be assessed using various metrics such as the
coefficient of determination (R2) and hypothesis tests for the significance of the
coefficients.

6. Multiple Linear Regression.

Performing multiple linear regression in Excel involves using the built-in
functions to analyze a dataset with multiple independent variables (predictors)
and one dependent variable (outcome). Here's a step-by-step guide on how to do
it:

1) Prepare Your Data:


● Organize your data so that each row represents an observation, and each
column represents a variable.
● Ensure that you have a column for the dependent variable (outcome) and
separate columns for each independent variable (predictor).
2) Open Excel and Load Your Data:
● Open Excel and load your dataset into a new or existing spreadsheet.
● Make sure your dataset is arranged in a tabular format with variable
names in the header row and data in subsequent rows.
3) Activate the Data Analysis Toolpak:
● If you haven't already done so, activate the Data Analysis Toolpak add-in in
Excel. You can do this by clicking on "File" > "Options" > "Add-Ins" > "Excel
Add-Ins" > "Analysis Toolpak" > "Go..." and then checking the "Analysis
Toolpak" option.
4) Access the Data Analysis Toolpak:
● Once the Data Analysis Toolpak is activated, you can find it in the "Data"
tab on the Excel ribbon.
● Click on "Data Analysis" in the "Analysis" group to open the Data Analysis
dialog box.
5) Select Regression Analysis:
● In the Data Analysis dialog box, scroll down and select "Regression" from
the list of analysis tools.
● Click "OK" to proceed.


6) Enter Input Range and Options:
● In the Regression dialog box, specify the input range for your independent
variables (predictors) and dependent variable (outcome).
● Check the box for "Labels" if your data includes variable names in the first
row.
● Choose where you want the output to be displayed (e.g., a new worksheet
or a specific range in the current worksheet).
● Optionally, you can specify additional options such as confidence level and
residuals.
7) Run the Regression Analysis:
● Click "OK" to run the regression analysis.
● Excel will calculate the coefficients for each independent variable, as well
as other statistics such as R-squared, adjusted R-squared, standard error,
F-statistic, and p-values.
8) Interpret the Results:
● Review the output to interpret the results of the regression analysis.
● Pay attention to the coefficients for each independent variable, as they
represent the magnitude and direction of the relationship with the
dependent variable.
● Evaluate the significance of each coefficient using the p-values. Lower
p-values indicate greater significance.
● Consider other statistics such as R-squared to assess the overall fit of the
model.
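As a cross-check on what the Toolpak reports, the coefficient estimates come from solving the normal equations (X'X)b = X'y. A plain-Python sketch with a synthetic two-predictor dataset chosen so the true coefficients are known:

```python
def fit_multiple_linear(rows, ys):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 + ... by solving the
    normal equations (X'X) b = X'y with Gauss-Jordan elimination."""
    X = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    k = len(X[0])
    XtX = [[sum(X[i][a] * X[i][b] for i in range(len(X))) for b in range(k)]
           for a in range(k)]
    Xty = [sum(X[i][a] * ys[i] for i in range(len(X))) for a in range(k)]
    # Augmented matrix [X'X | X'y], reduced with partial pivoting.
    M = [row[:] + [v] for row, v in zip(XtX, Xty)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(k):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]

# Data generated from y = 1 + 2*x1 + 3*x2, so the fit recovers [1, 2, 3].
rows = [(1, 1), (2, 1), (3, 2), (4, 3), (5, 5), (6, 8)]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
print([round(b, 6) for b in fit_multiple_linear(rows, ys)])
```

In practice the Toolpak (or LINEST) is the right tool in Excel; the point of the sketch is only to show what those black boxes compute.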

7. Model Diagnostics And Validation

Model diagnostics and validation in Excel typically involve assessing the performance
and accuracy of a model built within Excel, such as a financial model, forecasting
model, or regression analysis. Here's a general overview of steps you can take for
diagnostics and validation:

1) Data Preparation:
● Ensure your data is clean, organized, and appropriately formatted.
● Split your data into training and testing sets if applicable.


2) Model Building:
● Construct your model using Excel functions, formulas, or add-ins.
● Document your model's assumptions, methodology, and limitations.
3) Diagnostic Checks:
● Perform basic checks to ensure your model is functioning correctly, such
as:
● Verifying formulas and references.
● Checking for errors or inconsistencies.
● Assessing outliers or anomalies in the data.
4) Model Evaluation:
● Evaluate the performance of your model using appropriate metrics.
● For forecasting or regression models, consider metrics like Mean Absolute
Error (MAE), Root Mean Squared Error (RMSE), or R-squared (R²).
● For financial models, assess metrics such as Net Present Value (NPV),
Internal Rate of Return (IRR), or Payback Period.
5) Validation:
● Validate your model against real-world data or known outcomes.
● Compare model predictions or outputs with observed results.
● Use techniques like cross-validation if applicable.
6) Sensitivity Analysis:
● Conduct sensitivity analysis to understand how changes in input variables
affect model outputs.
● Use Excel's built-in tools like Data Tables or Scenario Manager for
sensitivity analysis.
7) Visualizations:
● Create visualizations to present your model's outputs and insights
effectively.
● Excel offers various chart types and formatting options for visual
representation.
8) Documentation:
● Document your findings, assumptions, methodologies, and validation
results thoroughly.
● Include notes within your Excel file or create a separate documentation
file.
9) Peer Review:
● Have your model reviewed by colleagues or subject matter experts to
identify potential errors or areas for improvement.
10) Revision and Iteration:


● Based on feedback and validation results, revise and refine your model as
needed.
● Iteratively improve your model to enhance its accuracy and reliability.
11) Version Control:
● Maintain version control to track changes and ensure traceability of model
revisions.
12) Final Review and Approval:
● Conduct a final review of your model before deployment or presentation.
● Obtain necessary approvals or sign-offs from stakeholders.
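The MAE and RMSE metrics named in the evaluation step can be computed directly from actual and predicted values; a plain-Python sketch with invented numbers:

```python
import math

def mae(actual, predicted):
    """Mean Absolute Error: the average size of the prediction errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Squared Error: penalizes large errors more heavily than MAE."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted))
                     / len(actual))

actual = [100, 110, 120, 130]
predicted = [98, 112, 119, 135]
print(mae(actual, predicted))   # → 2.5
print(rmse(actual, predicted))
```

In Excel the same quantities can be built from a column of errors with ABS, SUMSQ, SQRT, and AVERAGE.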

A. Assessing the quality of a model: R-squared, adjusted
R-squared, and standard error

Assessing the quality of a model based on R-squared, adjusted R-squared, and standard
error involves assessing how well the model fits the data and whether it provides
meaningful insights. Here's how you can interpret these metrics:

1) R-squared (R²):
● R-squared is a statistical measure that represents the proportion of the
variance in the dependent variable that is explained by the independent
variables in the model.
● It ranges from 0 to 1, where 1 indicates that the model explains all the
variability of the response data around its mean.
● Higher R-squared values generally indicate a better fit of the model to the
data.
● However, R-squared alone does not determine whether a model is good or
bad; it should be interpreted in conjunction with other metrics.
Interpretation:
● R-squared values closer to 1 imply that the model explains a large portion
of the variability in the data and is considered desirable.
● R-squared values closer to 0 suggest that the model does not explain
much of the variability in the data and may not be useful for prediction.
2) Adjusted R-squared:
● Adjusted R-squared is similar to R-squared but adjusts for the number of
predictors in the model.
● It penalizes excessive use of predictors and provides a more accurate
measure of model fit, especially when comparing models with different
numbers of predictors.


● Adjusted R-squared tends to be slightly lower than R-squared, especially
when additional predictors do not significantly improve the model.
Interpretation:
● Similar to R-squared, higher adjusted R-squared values indicate better
model fit.
● Comparing adjusted R-squared values across different models helps
determine which model provides the best balance between explanatory
power and complexity.
3) Standard Error:
● The standard error of the regression (also known as the standard error of
the estimate) measures the average deviation of the observed values from
the predicted values by the model.
● It provides an indication of the accuracy of the predictions made by the
model.
● A lower standard error indicates that the model's predictions are closer to
the actual observed values.
Interpretation:
● Lower standard error values suggest that the model's predictions are more
precise and accurate.
● Higher standard error values indicate greater variability in the predictions
and less precision.

In summary, when assessing the quality of a model based on R-squared, adjusted
R-squared, and standard error:

● Look for higher R-squared and adjusted R-squared values, indicating better model
fit.
● Compare adjusted R-squared values across models to assess the trade-off
between model complexity and explanatory power.
● Aim for lower standard error values, indicating more accurate predictions.
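All three quality measures follow from the residual and total sums of squares. A plain-Python sketch (the observed/predicted values are invented; n_predictors is the number of independent variables in the model):

```python
import math

def regression_quality(actual, predicted, n_predictors):
    """R-squared, adjusted R-squared, and the standard error of the estimate."""
    n = len(actual)
    mean_y = sum(actual) / n
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_y) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot                                # share of variance explained
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)  # penalizes extra predictors
    se = math.sqrt(ss_res / (n - n_predictors - 1))           # typical prediction error
    return r2, adj_r2, se

actual = [2, 4, 6, 8, 10]
predicted = [2.2, 3.8, 6.1, 7.9, 10.0]
r2, adj_r2, se = regression_quality(actual, predicted, n_predictors=1)
print(r2, adj_r2, se)
```

These are the same three numbers the Toolpak's regression output reports in its summary block.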

B. Testing the assumptions: normality, linearity, multicollinearity,
and homoscedasticity

Testing the assumptions of normality, linearity, multicollinearity, and homoscedasticity
is crucial when building regression models. Here's how you can test each assumption in
Excel:


1) Normality Assumption:
● Normality of residuals is essential for regression analysis. You can assess
this assumption by examining the distribution of residuals.
● After running your regression model, calculate the residuals (the
differences between the observed and predicted values).
● Use Excel to create a histogram or a Q-Q plot of the residuals to visually
inspect their distribution.
● Additionally, you can perform a formal test for normality, such as the
Shapiro-Wilk test, using Excel's statistical functions or add-ins like Real
Statistics Resource Pack.
2) Linearity Assumption:
● The relationship between the independent and dependent variables should
be linear. You can check this assumption by plotting the observed values
of the dependent variable against the predicted values from your
regression model.
● After running the regression, create a scatter plot in Excel with the
observed values on the y-axis and the predicted values on the x-axis.
● Ensure that the points on the scatter plot are randomly distributed around
a diagonal line, indicating linearity.
● You can also check for linearity by examining residual plots, where
residuals should be randomly distributed around zero for different values
of the independent variables.
3) Multicollinearity Assumption:
● Multicollinearity occurs when independent variables in a regression model
are highly correlated with each other.
● Calculate correlation coefficients between independent variables using
Excel's CORREL function.
● Alternatively, you can use Excel's Data Analysis Toolpak to perform a
correlation analysis.
● Look for high correlation coefficients (close to +1 or -1) between pairs of
independent variables, indicating potential multicollinearity issues.
● Consider using variance inflation factor (VIF) calculations to quantitatively
assess multicollinearity, which can be computed using Excel formulas
after estimating your regression model.
4) Homoscedasticity Assumption:
● Homoscedasticity means that the variance of the residuals is constant
across all levels of the independent variables.
● After running the regression, plot the residuals against the predicted
values or against each independent variable.


● Ensure that there are no discernible patterns or trends in the residual plot,
indicating constant variance.
● You can also perform formal tests for homoscedasticity, such as the
Breusch-Pagan test or White's test, using Excel's statistical functions or
add-ins.
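The VIF mentioned above is 1/(1 - R²_j), where R²_j comes from regressing predictor j on the other predictors. In the special case of only two predictors, R²_j is just the squared correlation between them, which makes a compact sketch possible (data invented; a VIF well above 10 is a common rule-of-thumb warning sign):

```python
def correlation(xs, ys):
    """Pearson correlation coefficient (same as Excel's CORREL)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def vif_two_predictors(x1, x2):
    """VIF_j = 1 / (1 - R²_j); with two predictors, R²_j is corr(x1, x2)²."""
    r_squared = correlation(x1, x2) ** 2
    return 1 / (1 - r_squared)

x1 = [1, 2, 3, 4, 5]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1]   # nearly 2*x1, so strongly collinear
print(vif_two_predictors(x1, x2))  # large value: multicollinearity warning
```

With three or more predictors, R²_j must come from a full auxiliary regression rather than a single correlation.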

C. Cross-validation and model selection techniques.

Cross-validation and model selection testing are both important techniques used in
machine learning to evaluate and select the best-performing model for a given dataset.
Here's a brief overview of each:

1) Cross-validation:
● Cross-validation is a resampling technique used to assess how well a
model generalizes to an independent dataset.
● The basic idea is to partition the dataset into multiple subsets or folds.
The model is trained on a portion of the data and validated on the
remaining portion.
● Common types of cross-validation include k-fold cross-validation,
stratified k-fold cross-validation, leave-one-out cross-validation (LOOCV),
etc.
● By repeating this process with different partitions of the data, we can
obtain multiple estimates of model performance. The final performance
metric is often computed as the average across all folds.
2) Model selection testing:
● Model selection refers to the process of choosing the best model or
algorithm from a set of candidate models.
● Model selection testing involves evaluating different models using a
performance metric and selecting the one that performs best on unseen
data.
● This process typically involves comparing the performance of models
using techniques such as cross-validation, holdout validation, or other
validation strategies.
● Performance metrics used for model selection testing depend on the
problem at hand but often include accuracy, precision, recall, F1 score,
ROC AUC, etc.


● Hyperparameter tuning is often a part of model selection testing, where
different combinations of hyperparameters are tested to find the optimal
configuration for a given model.

In practice, cross-validation is commonly used during model selection testing. Each
model candidate is trained and evaluated using cross-validation, and the model with the
best average performance across the folds is selected as the final model. Additionally,
cross-validation helps in estimating the generalization performance of the selected
model on unseen data.
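The folding procedure can be sketched in plain Python. The "model" here is deliberately trivial (predict the mean of the training fold) so the example stays self-contained; the data and fold count are invented:

```python
def k_fold_splits(n, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    fold = n // k
    for i in range(k):
        start = i * fold
        stop = (i + 1) * fold if i < k - 1 else n  # last fold takes the remainder
        test = list(range(start, stop))
        train = [j for j in range(n) if j < start or j >= stop]
        yield train, test

# 5-fold cross-validation of a mean-of-training-fold "model";
# the final score is the average MAE across folds.
data = [3, 5, 4, 6, 8, 7, 9, 11, 10, 12]
scores = []
for train, test in k_fold_splits(len(data), 5):
    prediction = sum(data[i] for i in train) / len(train)
    scores.append(sum(abs(data[i] - prediction) for i in test) / len(test))
print(round(sum(scores) / len(scores), 3))
```

For time series, note that random or contiguous folds leak future information into training; forecasting work normally uses forward-chaining splits (train on the past, test on the next block) instead.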

8. Non-Linear Regression Model

Nonlinear regression models are used when the relationship between the independent
variables and the dependent variable is not linear. In these cases, the relationship may
be better described by a curve or some other nonlinear function. Nonlinear regression
models can capture more complex patterns and relationships in the data compared to
linear regression models.

Here's a general overview of nonlinear regression models:

1) Model Representation:
● Nonlinear regression models can take various forms, depending on the
specific problem and the nature of the data.
● A common form of nonlinear regression model is:

y = f(x, β) + ε

● Where:
● y is the dependent variable.
● x is the independent variable(s).
● β represents the parameters of the model.
● f() is a nonlinear function that relates the independent variable(s)
to the dependent variable.
● ε is the error term, representing the difference between the
observed and predicted values.

2) Model Fitting:
● Fitting a nonlinear regression model involves estimating the parameters β
that best fit the observed data.
● This is typically done using optimization techniques such as least squares
estimation, maximum likelihood estimation, or other numerical
optimization methods.
● The objective is to minimize the sum of squared differences between the
observed and predicted values (the residual sum of squares).

3) Types of Nonlinear Models:
● There are many types of nonlinear regression models, including
polynomial regression, exponential regression, logarithmic regression,
power regression, sigmoidal regression, etc.
● The choice of the specific nonlinear function depends on the underlying
relationship between the variables and the characteristics of the data.

4) Model Evaluation:
● Evaluation of nonlinear regression models involves assessing how well the
model fits the data and how well it generalizes to unseen data.
● Common evaluation metrics include R-squared (coefficient of
determination), root mean squared error (RMSE), mean absolute error
(MAE), etc.
● Cross-validation techniques can also be applied to assess the model's
performance and guard against overfitting.

5) Applications:
● Nonlinear regression models are widely used in various fields, including
economics, biology, physics, engineering, finance, etc., wherever the
relationships between variables are nonlinear.

In summary, nonlinear regression models provide a flexible framework for capturing
complex relationships in the data and are valuable tools for modeling real-world
phenomena. However, fitting and interpreting these models require careful
consideration of the underlying relationships and appropriate model selection and
evaluation techniques.

8.1 Implementing a non-linear regression model in Excel using the Solver
add-in.

Implementing a non-linear regression model in Excel using the Solver Add-In involves
fitting a curve to data points by minimizing the sum of squared differences between the
observed and predicted values. Here's a step-by-step guide on how to do this:

1) Organize your data: Have your independent variable (X) in one column and your
dependent variable (Y) in another column.


2) Choose a Model: Decide on the type of non-linear model you want to fit to your
data. Common models include exponential, logarithmic, polynomial, etc. For
example, let's consider fitting an exponential model: Y = A * exp(B * X), where A
and B are parameters to be determined.

3) Initial Guess: Provide initial guesses for the parameters A and B. You can either
estimate them from the data or start with reasonable values.

4) Set up the Model in Excel: In another column, calculate the predicted Y values
based on the current parameter guesses and the chosen model.

5) Calculate Residuals: In another column, calculate the differences between the
observed Y values and the predicted Y values.

6) Sum of Squared Residuals (SSR): Square each residual and sum them up. This is
the objective function you want to minimize.

7) Use Solver Add-In: Go to the "Data" tab, click on "Solver" (if you haven't installed it
yet, you may need to add it from Excel Add-Ins), and set up Solver to minimize the
SSR by changing the cell values of the parameters (A and B).

8) Run Solver: Click Solve, and Solver will try different values of A and B to minimize
the SSR.

9) Analyze Results: Once Solver converges, you'll get the optimal values of
parameters A and B.


Here's a simplified example:

Let's say your data is in columns A and B, with X values in column A and Y values in
column B.

● Initial guess for A = 1
● Initial guess for B = 0.1

In another column, calculate predicted Y values using the formula Y = A * EXP(B * X).
Then, calculate residuals (the difference between observed Y and predicted Y).
Calculate SSR as the sum of squared residuals.
Set up Solver to minimize SSR by changing the cells containing the A and B values.
Run Solver, and it will find the optimal values for A and B.

This approach can be generalized to any non-linear model. Just replace the model
formula with the one you want to fit.
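The same exponential fit can be sketched outside Excel. A common shortcut for Y = A * exp(B * X) is to take logarithms, which turns it into a straight-line fit; Solver minimizes the SSR directly, but on noise-free data the log-linear fit alone recovers the parameters (data invented for the example):

```python
import math

def fit_exponential(xs, ys):
    """Fit Y = A * exp(B * X) by ordinary least squares on ln(Y).
    (Solver minimizes SSR in the original scale; this log-linear fit is a
    common way to get good starting guesses for A and B.)"""
    lny = [math.log(y) for y in ys]                # ln Y = ln A + B * X
    n = len(xs)
    mx, ml = sum(xs) / n, sum(lny) / n
    B = (sum((x - mx) * (l - ml) for x, l in zip(xs, lny))
         / sum((x - mx) ** 2 for x in xs))
    A = math.exp(ml - B * mx)
    return A, B

# Exact exponential data: Y = 2 * exp(0.5 * X), so the fit recovers A=2, B=0.5.
xs = [0, 1, 2, 3, 4]
ys = [2 * math.exp(0.5 * x) for x in xs]
A, B = fit_exponential(xs, ys)
print(round(A, 6), round(B, 6))
```

On noisy data the log-transform changes how errors are weighted, so the log-linear estimates and Solver's direct SSR minimum will differ slightly; using the former as Solver's starting guesses is a standard workflow.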

9. Time Series Decomposition


Time series decomposition in Excel involves separating a time series into its constituent
components, typically trend, seasonal, and irregular components. While Excel doesn't
have a built-in function specifically for time series decomposition, you can use some of
its functionalities to achieve this.

One common method for time series decomposition is the classical decomposition
method, which involves:

● Trend Estimation: Estimating the trend component of the time series.
● Seasonal Adjustment: Adjusting for seasonal effects.
● Residual (Irregular) Calculation: Calculating the irregular component as the
remainder after removing trend and seasonal effects.

Here's a general guide on how to perform time series decomposition in Excel using
these steps:

1) Import Your Time Series Data: Input your time series data into Excel. Typically,
you'll have two columns: one for the dates (time) and another for the
corresponding values.

2) Estimate the Trend: You can use various methods to estimate the trend, such as
moving averages or linear regression. For instance, you could calculate a moving
average over a certain window of time to smooth out fluctuations and estimate
the trend.

3) Seasonal Adjustment: To adjust for seasonal effects, you'll need to calculate
seasonal indices. One simple method is to calculate the average value of the
time series for each season (e.g., each month or each quarter) and then calculate
seasonal indices by dividing each observed value by the corresponding seasonal
average. Subtracting these seasonal indices from the original values gives you
the seasonally adjusted series.

4) Residual Calculation: Once you have estimated the trend and adjusted for
seasonal effects, you can calculate the residuals (irregular component) as the
difference between the original values and the sum of the trend and seasonal
components.


5) Visualization and Analysis: Plot the original time series data along with the
estimated trend, seasonal, and irregular components to visualize the
decomposition. You can use Excel charting features for this.

While performing time series decomposition manually in Excel can be somewhat
laborious, it is possible with careful use of formulas and data manipulation techniques.
Alternatively, you may consider using more specialized software or programming
languages like Python or R that have built-in functions and libraries for time series
analysis and decomposition.
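A minimal plain-Python sketch of the classical additive decomposition described above. The moving average here is simplified rather than fully centered for even periods, and the data are synthetic (a rising trend plus a repeating period-4 pattern):

```python
def decompose_additive(values, period):
    """Classical additive decomposition: trend (moving average), seasonal
    indices (per-season average of the detrended series), and residual."""
    n = len(values)
    half = period // 2
    # Trend: a period-length moving average (undefined at the edges).
    trend = [None] * n
    for i in range(half, n - half):
        window = values[i - half:i - half + period]
        trend[i] = sum(window) / period
    detrended = [v - t if t is not None else None for v, t in zip(values, trend)]
    # Seasonal index for each position in the cycle: average detrended value.
    seasonal_idx = []
    for s in range(period):
        vals = [d for i, d in enumerate(detrended)
                if d is not None and i % period == s]
        seasonal_idx.append(sum(vals) / len(vals))
    seasonal = [seasonal_idx[i % period] for i in range(n)]
    # Residual: what's left after removing trend and seasonality.
    residual = [v - t - s if t is not None else None
                for v, t, s in zip(values, trend, seasonal)]
    return trend, seasonal, residual

# Synthetic series: linear trend plus the repeating pattern [2, -1, -2, 1].
values = [i * 1.0 + [2, -1, -2, 1][i % 4] for i in range(12)]
trend, seasonal, residual = decompose_additive(values, 4)
print([round(s, 2) for s in seasonal[:4]])  # → [2.5, -0.5, -1.5, 1.5]
```

Because the synthetic data contain no noise, the residuals are (numerically) zero wherever the trend is defined; real data would leave a nonzero irregular component.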

9.1 Understanding the components of time series data:
trend, seasonality, and noise.
Understanding the components of time series data—trend, seasonality, and
noise—is crucial for analyzing and modeling time-dependent phenomena
accurately. Here's an overview of each component:

1) Trend:
● Definition: The long-term movement or direction of the data over
time. It represents the underlying pattern in the data that persists
over a long period.
● Characteristics:
● Trends can be increasing, decreasing, or stable over time.
● They reflect changes due to underlying factors such as
population growth, economic cycles, technological
advancements, etc.
● Identification:
● Visual inspection of the time series plot.


● Statistical techniques like moving averages, exponential
smoothing, or linear regression to estimate and extract the
trend component.
● Example: A steady increase in sales over several years due to
population growth and increasing demand for the product.

2) Seasonality:
● Definition: Regular, periodic fluctuations in the data occurring at
fixed intervals of time (e.g., daily, weekly, monthly, quarterly, or
yearly).
● Characteristics:
● Seasonality represents patterns that repeat over a specific
period, often driven by calendar-related or natural factors.
● It can be additive (consistent amplitude throughout) or
multiplicative (amplitude scales with the level of the series).
● Identification:
● Visual inspection of the time series plot, looking for recurring
patterns at fixed intervals.
● Seasonal decomposition techniques to isolate and estimate
seasonal effects.
● Example: Higher sales of ice cream during summer months and
lower sales during winter, driven by weather conditions.

​ Noise (Irregular or Residual):
● Definition: Random fluctuations or irregular variations in the data
that cannot be attributed to the trend or seasonality. It represents the
randomness or unpredictability in the data.
● Characteristics:

● Noise is typically short-term and unpredictable.
● It may arise from various sources such as measurement
errors, sampling variability, or unpredictable external factors.
● Identification:
● Examining the residuals after removing the trend and seasonal
components.
● Statistical techniques like autocorrelation analysis or residual
diagnostics to assess randomness.
● Example: Random fluctuations in daily stock prices due to market
uncertainty and investor behavior.

Understanding and accurately modeling these components are essential for forecasting, anomaly detection, and decision-making based on time series data. Effective decomposition allows analysts to isolate each component's effects, making it easier to interpret patterns and make informed predictions.
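The additive structure described above can be made concrete with a short Python sketch. The numbers below (base level 50, slope 2, sine-shaped seasonality with period 12) are purely illustrative, not taken from the text:

```python
# Illustrative additive model: y_t = trend_t + seasonal_t + noise_t,
# which is exactly what a decomposition later tries to undo.
import math
import random

random.seed(42)  # fixed seed so the sketch is reproducible

n, period = 24, 12
trend = [50 + 2 * t for t in range(n)]                       # steady upward trend
seasonal = [10 * math.sin(2 * math.pi * t / period) for t in range(n)]  # repeats every 12 steps
noise = [random.gauss(0, 1) for _ in range(n)]               # short-term random variation

series = [trend[t] + seasonal[t] + noise[t] for t in range(n)]
```

Plotting `series` against the three component lists makes each component's contribution visible, the same way the Excel charts described below do.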

Decomposing time series data in Excel using moving averages and seasonal indices:

Decomposing time series data in Excel using moving averages and seasonal indices
involves estimating the trend and seasonal components separately. Here's how you can
do it step by step:

​ Import Your Time Series Data: Input your time series data into Excel. You should
have two columns: one for the dates (time) and another for the corresponding
values.
​ Calculate Moving Averages for Trend Estimation:


● Choose a window size for your moving average. The window size
determines how many consecutive data points are averaged.
● In a new column, calculate the moving average for each data point using Excel's AVERAGE function with relative cell references. For example, if your time series values are in column B starting from B2 and you've chosen a window size of 5, enter =AVERAGE(B2:B6) in cell C4 (the row of the window's midpoint, so the average is centered on the point it estimates) and drag this formula down to calculate moving averages for the remaining data points.
​ Calculate Seasonal Indices:
● Determine the periodicity of your seasonal component (e.g., monthly,
quarterly).
● Calculate the average value for each season. For instance, if you have
monthly data, calculate the average value for each month across all years.
● Divide each observed value by the corresponding seasonal average to
obtain seasonal indices.
● In Excel, you can calculate these seasonal averages manually or use
functions like AVERAGEIFS or PivotTables.
● Once you have the seasonal indices, expand them to match the length of
your time series data.
​ Calculate Seasonally Adjusted Values:
● Divide the original time series values by the seasonal indices to obtain
seasonally adjusted values. You can do this in a new column.
● This step removes the seasonal component from the original data, leaving
the trend and irregular components.
​ Calculate Residuals (Irregular Component):
● Subtract the trend (moving averages) from the seasonally adjusted values
to obtain residuals.
● Residuals represent the irregular component of the time series data.
​ Visualize the Components:


● Plot the original time series data, moving averages (trend), seasonal
indices, and residuals to visualize how each component contributes to the
overall series.
● Excel's charting features can be used for this purpose.

By following these steps, you can decompose your time series data into its trend, seasonal, and irregular components using moving averages and seasonal indices in Excel. This decomposition facilitates a better understanding of the underlying patterns in the data, aiding in forecasting and analysis.
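The six Excel steps above can also be sketched in Python. This is a minimal illustration of the additive variant (seasonal indices are subtracted rather than divided out, as in the subtraction approach mentioned earlier), on hypothetical noise-free data with period 5, so the recovered indices match the true ones exactly:

```python
# Additive decomposition sketch: centered moving average for the trend,
# per-position averages of the detrended series for the seasonal indices.
# The series is synthetic (made-up numbers) with seasonal period 5.

PERIOD = 5
seasonal_true = [3, -1, -2, -1, 1]                 # sums to zero (additive seasonality)
data = [10 + 0.5 * t + seasonal_true[t % PERIOD] for t in range(30)]

# Step 2: centered moving average (window = PERIOD; odd, so no extra centering pass)
half = PERIOD // 2
trend = [None] * len(data)
for t in range(half, len(data) - half):
    trend[t] = sum(data[t - half : t + half + 1]) / PERIOD

# Step 3: seasonal index = average detrended value at each position in the cycle
detrended = [data[t] - trend[t] for t in range(len(data)) if trend[t] is not None]
positions = [t % PERIOD for t in range(len(data)) if trend[t] is not None]
indices = []
for k in range(PERIOD):
    vals = [d for d, p in zip(detrended, positions) if p == k]
    indices.append(sum(vals) / len(vals))

# Steps 4-5: seasonally adjusted series and residuals (irregular component)
adjusted = [data[t] - indices[t % PERIOD] for t in range(len(data))]
residuals = [data[t] - trend[t] - indices[t % PERIOD]
             for t in range(len(data)) if trend[t] is not None]
```

Because the toy data contain no noise, the residuals come out as (numerically) zero; on real data they carry the irregular component you would then plot and examine.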

Fig: Seasonal indices.


10. Advanced time series forecasting techniques

Advanced time series forecasting techniques go beyond simple methods like moving averages or exponential smoothing and often involve more sophisticated algorithms and models. Here are some advanced techniques commonly used for time series forecasting:

​ ARIMA (AutoRegressive Integrated Moving Average):


● ARIMA is a widely used model for time series forecasting that combines
autoregressive (AR) and moving average (MA) components.
● It can handle both trend and seasonality in the data and is suitable for
stationary or non-stationary time series.
● ARIMA models require tuning parameters such as the order of differencing
(d), the number of autoregressive terms (p), and the number of moving
average terms (q).
​ Seasonal ARIMA (SARIMA):
● SARIMA extends the ARIMA model to incorporate seasonal components
in addition to trend and irregular components.
● It includes seasonal parameters (P, D, Q) in addition to the non-seasonal
parameters of ARIMA.
● SARIMA models are effective for time series data with clear seasonal
patterns.
​ Exponential Smoothing State Space Models (ETS):
● ETS models are a class of state space models that include several
exponential smoothing methods such as simple exponential smoothing,
Holt's method, and Holt-Winters' method.
● These models provide a flexible framework for capturing trend,
seasonality, and irregular components in the data.
● ETS models are particularly useful when the data exhibit changing
patterns over time.


​ Seasonal-Trend decomposition using LOESS (STL):
● STL decomposes a time series into trend, seasonal, and irregular
components using a robust iterative algorithm.
● It allows for more flexible handling of non-linear trends and irregular
patterns in the data compared to traditional decomposition methods.
● STL is useful for forecasting time series with complex seasonal patterns.
​ Machine Learning Models:
● Various machine learning algorithms such as neural networks, random
forests, support vector machines, and gradient boosting machines can be
applied to time series forecasting.
● These models can capture complex relationships and patterns in the data
and often outperform traditional statistical methods.
● Feature engineering, model selection, and hyperparameter tuning are
important considerations when using machine learning for time series
forecasting.
​ Deep Learning Models:
● Deep learning models, particularly recurrent neural networks (RNNs) and
variants like long short-term memory (LSTM) networks and gated
recurrent units (GRUs), have shown promise for time series forecasting.
● These models can capture long-term dependencies and non-linear
patterns in the data, making them suitable for a wide range of forecasting
tasks.
● Deep learning models may require large amounts of data and
computational resources for training.
​ Ensemble Methods:
● Ensemble methods combine forecasts from multiple models to improve
prediction accuracy and robustness.
● Techniques such as model averaging, weighted averaging, and stacking
can be used to combine forecasts from different models or model
configurations.


● Ensemble methods can mitigate the weaknesses of individual models and


provide more reliable forecasts.

When applying advanced time series forecasting techniques, it's essential to evaluate model performance using appropriate metrics and consider factors such as data quality, seasonality, trend patterns, and the forecasting horizon. Additionally, model interpretation and the computational complexity of the chosen approach should be taken into account when selecting the most suitable technique for a particular forecasting task.

10.1 Autoregressive (AR) Model:

The autoregressive (AR) model is a time series forecasting technique that predicts future values based on a linear combination of past observations. In an AR model of order p (denoted AR(p)), the current value Yt is modeled as a function of the p most recent observations:

Yt = c + ϕ1·Yt−1 + ϕ2·Yt−2 + ⋯ + ϕp·Yt−p + εt

Where:

● c is the constant term.
● ϕ1, ϕ2, …, ϕp are the autoregressive coefficients.
● Yt−1, Yt−2, …, Yt−p are the lagged values of the time series.
● εt is the error term at time t.

The AR model captures the linear relationship between the current value of the
time series and its past values.
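As an illustration of the AR equation, the sketch below generates a noise-free AR(1) series (εt = 0 throughout, so the coefficients can be recovered exactly) and estimates c and ϕ1 by ordinary least squares; the starting value and coefficients are made up for the sketch:

```python
# AR(1) illustration: Y_t = c + phi * Y_{t-1} (with epsilon_t = 0),
# then recover c and phi by simple linear regression of Y_t on Y_{t-1}.

c_true, phi_true = 2.0, 0.6
y = [10.0]
for _ in range(14):
    y.append(c_true + phi_true * y[-1])   # epsilon_t = 0 for this noise-free sketch

x_lag, y_cur = y[:-1], y[1:]              # (Y_{t-1}, Y_t) pairs
n = len(x_lag)
sx, sy = sum(x_lag), sum(y_cur)
sxx = sum(v * v for v in x_lag)
sxy = sum(a * b for a, b in zip(x_lag, y_cur))

# Closed-form OLS estimates from the normal equations
phi_hat = (n * sxy - sx * sy) / (n * sxx - sx * sx)
c_hat = (sy - phi_hat * sx) / n
```

With real (noisy) data the estimates would only approximate the true coefficients, and a dedicated routine (for example an AR fitting function in a statistics library) would also report standard errors.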


10.2 Moving Average (MA) Model:

The moving average (MA) model is another time series forecasting technique that predicts future values based on a weighted sum of past prediction errors. In an MA model of order q (denoted MA(q)), the current value Yt is modeled as a function of the q most recent prediction errors:

Yt = μ + θ1·εt−1 + θ2·εt−2 + ⋯ + θq·εt−q + εt

Where:

● μ (mu) is the mean of the time series.
● θ1, θ2, …, θq are the moving average coefficients.
● εt−1, εt−2, …, εt−q are the lagged prediction errors.
● εt is the error term at time t.

The MA model captures the dependence between the current value of the time series
and the residual errors from previous predictions.
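A small arithmetic check of the MA(1) equation, using made-up values for μ, θ1, and the shock sequence εt:

```python
# MA(1) arithmetic check: each observation is mu + theta1 * eps[t-1] + eps[t].
# mu, theta1, and the shocks eps are hypothetical numbers for illustration.
mu, theta1 = 5.0, 0.4
eps = [0.0, 1.0, -0.5, 2.0, 0.0]

y = [mu + theta1 * (eps[t - 1] if t > 0 else 0.0) + eps[t]
     for t in range(len(eps))]
# e.g. y[2] = 5 + 0.4 * 1.0 + (-0.5) = 4.9
```

Note that in practice the shocks εt are not observed; fitting an MA model means inferring them from the data, which is why MA estimation is usually left to statistical software rather than done by hand.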

10.3 Implementing advanced forecasting techniques in Excel using custom formulas and add-ins:
Implementing advanced time series forecasting techniques such as ARIMA in Excel using custom formulas alone can be quite challenging due to the complexity of these models. However, you can use Excel in conjunction with add-ins or external tools to perform advanced forecasting. One such popular add-in for Excel is the Solver add-in, which can be used to optimize parameters for simpler models like exponential smoothing or simple linear regression.

Here's a general approach using Solver and an external tool like R or Python for ARIMA forecasting:


​ Data Preparation in Excel:


● Organize your time series data in Excel, typically in two columns: one for
the time index (dates) and another for the corresponding values.
​ Export Data:
● Export the data from Excel into a format that can be read by R or Python.
Common formats include CSV (Comma-Separated Values) or Excel files.
​ Model Estimation in R or Python:
● Use R packages like "forecast" or Python libraries like "statsmodels" or
"pmdarima" to fit ARIMA models to your time series data.
● Write scripts or functions in R or Python to read the data, fit the ARIMA
model, and generate forecasts.
​ Optimization with Solver (Optional):
● If you want to optimize model parameters (e.g., for exponential smoothing
models), you can use Excel's Solver add-in.
● Define an objective function in Excel that measures the error between
actual and predicted values (e.g., Mean Squared Error).
● Use Solver to minimize this objective function by adjusting the model
parameters.
​ Import Forecasts into Excel:
● Once you have generated forecasts using R or Python, import the
forecasted values back into Excel.
● You can create a new sheet or column to display the forecasted values
alongside the original data.
​ Visualization and Analysis in Excel:
● Use Excel's charting features to visualize the original time series data and
the forecasted values.
● Analyze the accuracy of the forecasts and make adjustments as needed.
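Steps 2, 3, and 5 of this round trip can be sketched in pure Python. The model-fitting step below is a deliberately simple drift forecast standing in for a real ARIMA call (e.g. statsmodels' ARIMA class), so the sketch runs without external libraries; the dates and values are made up:

```python
# Round-trip sketch: read CSV data exported from Excel, "fit" a model,
# and write forecasts back out as CSV for re-import into Excel.
import csv
import io

# Stand-in for the file exported from Excel (step 2); values are hypothetical.
exported = "date,value\n2024-01,100\n2024-02,104\n2024-03,110\n"
rows = list(csv.DictReader(io.StringIO(exported)))
values = [float(r["value"]) for r in rows]

# Stand-in for the model-fitting step (step 3): a real workflow would call
# an ARIMA routine here; this drift forecast extrapolates the average change.
drift = (values[-1] - values[0]) / (len(values) - 1)
forecasts = [values[-1] + drift * h for h in (1, 2, 3)]

# Step 5: write forecasts to CSV so they can be imported back into Excel.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["horizon", "forecast"])
for h, f in zip((1, 2, 3), forecasts):
    writer.writerow([h, f])
```

In a real workflow the `io.StringIO` buffers would be actual files on disk, and the drift line would be replaced by the library's fit-and-forecast calls.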

This approach leverages the strengths of both Excel and external tools like R or Python to perform advanced time series forecasting. While Excel may not be suitable for directly implementing complex forecasting models, it can still be a valuable tool for data preparation, visualization, and analysis in conjunction with more powerful statistical software.

