Statistical Methods For Computer Science II
Using MS Excel
Objective
To fit a second-degree (quadratic) curve $Y = a + bX + cX^2$ to data using MS Excel.
Nonlinear relationships are common in real-life data, such as population growth, sales
trends, or temperature variations.
A scatter plot is a graphical tool for observing and understanding such relationships.
1. Create a dataset with two variables $X$ (independent) and $Y$ (dependent).
2. Add a new column $X^2$, calculated as $X$ squared.
X Y X²
1 5.2 1
2 9.1 4
3 15.4 9
4 25.8 16
5 40.7 25
1. Go to the Data tab → select Data Analysis (ensure the Analysis ToolPak add-in is
enabled).
2. Choose Regression:
o Input Y Range: the $Y$ column (dependent variable).
o Input X Range: the $X$ and $X^2$ columns together (independent variables); the two
columns must be adjacent.
3. Run the regression with $X$ and $X^2$ as independent variables to obtain the
coefficients $a$, $b$, and $c$.
4. Record the results:
o Intercept ($a$): constant term.
o Coefficient of $X$ ($b$).
o Coefficient of $X^2$ ($c$).
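Excel's Regression tool estimates $a$, $b$, and $c$ by ordinary least squares. As a cross-check of the Excel output, here is a minimal pure-Python sketch that solves the same normal equations for the small dataset above. The helper names `fit_quadratic` and `solve3` are our own and are not part of Excel.

```python
# Least-squares fit of Y = a + bX + cX^2 with X and X^2 as the two regressors,
# the same model Excel's Regression tool fits when given both columns.

def solve3(m, v):
    """Solve the 3x3 linear system m * s = v using Cramer's rule."""
    def det(a):
        return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
                - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
                + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))
    d = det(m)
    solution = []
    for col in range(3):
        # Replace column `col` of m with v; the determinant ratio is that unknown.
        mc = [[v[r] if c == col else m[r][c] for c in range(3)] for r in range(3)]
        solution.append(det(mc) / d)
    return solution

def fit_quadratic(xs, ys):
    """Return (a, b, c) minimising sum((y - a - b*x - c*x^2)^2)."""
    n = len(xs)
    sx = sum(xs)
    sx2 = sum(x ** 2 for x in xs)
    sx3 = sum(x ** 3 for x in xs)
    sx4 = sum(x ** 4 for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2y = sum(x ** 2 * y for x, y in zip(xs, ys))
    # Normal equations for the intercept a and the coefficients b and c.
    normal = [[n, sx, sx2],
              [sx, sx2, sx3],
              [sx2, sx3, sx4]]
    return tuple(solve3(normal, [sy, sxy, sx2y]))

X = [1, 2, 3, 4, 5]
Y = [5.2, 9.1, 15.4, 25.8, 40.7]
a, b, c = fit_quadratic(X, Y)
print(f"a = {a:.3f}, b = {b:.3f}, c = {c:.3f}")
```

The printed values should agree with the Intercept, X, and X² coefficients in the Regression output, up to rounding.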
4. Interpretation of Results
Analyze the coefficients ($a$, $b$, and $c$) to understand the relationship:
o If $c > 0$, the curve opens upward.
o If $c < 0$, the curve opens downward.
Discuss the goodness of fit using the $R^2$ value, reported as "R Square" in the Regression
output or displayed on the chart through the trendline options.
Dataset
X Y
2 4.5
4 18.2
6 38.6
8 64.9
10 102.5
12 151.2
Step-by-Step Solution
1. Data Preparation
2. Scatter Plot
3. Adding a Trendline
Example results:
Final equation:
$Y = -3.1 + 1.5X + 0.89X^2$.
1. Add a Polynomial trendline of Order 2 to the scatter plot and select Display Equation on
chart and Display R-squared value on chart to visualize the equation.
2. Verify that the curve aligns with the data points.
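The verification can also be done numerically by computing the fitted values, the residuals, and $R^2 = 1 - SS_{res}/SS_{tot}$. The sketch below plugs in the illustrative coefficients from the example results; these need not be the exact least-squares values for this dataset, so substitute the coefficients from your own Regression output. For a least-squares fit with an intercept, this $R^2$ should agree with Excel's R Square.

```python
# Goodness of fit for a given quadratic Y = a + bX + cX^2.
# Illustrative coefficients from the example results; replace with your own.
a, b, c = -3.1, 1.5, 0.89

X = [2, 4, 6, 8, 10, 12]
Y = [4.5, 18.2, 38.6, 64.9, 102.5, 151.2]

fitted = [a + b * x + c * x ** 2 for x in X]
residuals = [y - f for y, f in zip(Y, fitted)]

y_mean = sum(Y) / len(Y)
ss_res = sum(r ** 2 for r in residuals)       # variation left unexplained
ss_tot = sum((y - y_mean) ** 2 for y in Y)    # total variation around the mean
r_squared = 1 - ss_res / ss_tot

for x, y, f, r in zip(X, Y, fitted, residuals):
    print(f"X={x:>2}  Y={y:>6.1f}  fitted={f:>7.2f}  residual={r:>6.2f}")
print(f"R^2 = {r_squared:.4f}")
```

A residual that is much larger than the others points to where the curve and the data points disagree.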
Interpretation of Results
1. Coefficients:
o $c = 0.89$: positive, so the curve opens upward.
o $b = 1.5$: the linear term; since it is positive, the curve is rising at $X = 0$.
o $a = -3.1$: the intercept, i.e. the value of $Y$ at $X = 0$; it shifts the curve up or
down.
2. R-squared Value:
o $R^2 = 0.998$: the curve fits the data well; about 99.8% of the variation in $Y$ is
explained by the quadratic model.
3. Conclusion:
o The relationship between $X$ and $Y$ is nonlinear and is well described by a quadratic
equation.
Practice Dataset
X Y
1 5.3
3 19.7
5 40.2
7 68.5
9 105.8
11 153.4
Repeat the steps above to fit the second-degree curve and interpret the results.
X Y
1 2.4
2 3.8
3 7.2
4 13.4
5 22.6
To fit an exponential curve of the form $Y = a \cdot b^X$ and a power curve of the form
$Y = a \cdot X^b$ using MS Excel. This involves determining the parameters $a$ and $b$ and
using scatter plots to explain the nonlinear relationship between two variables.
Key Concepts
Steps in MS-Excel
1. Input Data
Create a table with two columns: the independent variable $x$ (or $X$) and the dependent
variable $Y$.
Examples
Dataset:
x Y
1 3.2
2 5.1
3 8.0
4 12.5
5 19.6
Steps in Excel:
Verification:
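As far as we are aware, Excel's Exponential trendline (and the LOGEST worksheet function) fits $Y = a \cdot b^x$ by linear least squares on $\ln Y$, because $\ln Y = \ln a + x \ln b$. Note that the trendline label shows the equivalent form $y = a e^{kx}$, so $b = e^k$. A minimal Python sketch of the log-transform fit for the dataset above, for checking the trendline equation (`fit_exponential` is our own helper name):

```python
import math

def fit_exponential(xs, ys):
    """Fit Y = a * b**x by least squares on ln(Y) = ln(a) + x * ln(b)."""
    logs = [math.log(y) for y in ys]          # requires every Y > 0
    n = len(xs)
    x_mean = sum(xs) / n
    l_mean = sum(logs) / n
    slope = (sum((x - x_mean) * (l - l_mean) for x, l in zip(xs, logs))
             / sum((x - x_mean) ** 2 for x in xs))
    intercept = l_mean - slope * x_mean
    return math.exp(intercept), math.exp(slope)   # (a, b)

x = [1, 2, 3, 4, 5]
Y = [3.2, 5.1, 8.0, 12.5, 19.6]
a, b = fit_exponential(x, Y)
print(f"Y = {a:.4f} * {b:.4f}^x")
# Same curve in the a*e^(kx) form used on Excel's trendline label.
print(f"Y = {a:.4f} * e^({math.log(b):.4f} x)")
```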
Dataset:
X Y
2 1.8
4 3.6
6 6.3
8 10.2
10 14.5
Steps in Excel:
1. Create a scatter plot to observe the trend.
2. Add a Power Trendline.
3. Display the equation $Y = 0.5 \cdot X^{1.8}$ (example).
Verification:
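To our understanding, the Power trendline likewise corresponds to a linear least-squares fit of $\ln Y$ on $\ln X$, since $\ln Y = \ln a + b \ln X$. A sketch for the dataset above (`fit_power` is a hypothetical helper name); the coefficients it prints are computed from the data and need not match the example equation in step 3.

```python
import math

def fit_power(xs, ys):
    """Fit Y = a * X**b by least squares on ln(Y) = ln(a) + b * ln(X)."""
    lx = [math.log(x) for x in xs]    # requires every X > 0
    ly = [math.log(y) for y in ys]    # requires every Y > 0
    n = len(xs)
    mx, my = sum(lx) / n, sum(ly) / n
    b = (sum((u - mx) * (v - my) for u, v in zip(lx, ly))
         / sum((u - mx) ** 2 for u in lx))
    a = math.exp(my - b * mx)
    return a, b

X = [2, 4, 6, 8, 10]
Y = [1.8, 3.6, 6.3, 10.2, 14.5]
a, b = fit_power(X, Y)
print(f"Y = {a:.3f} * X^{b:.3f}")
```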
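Scatter plots provide visual insight into the relationship between $x$ and $Y$.
If the points follow a curve, an exponential or power trendline may be appropriate: an
exponential curve becomes a straight line when $\ln Y$ is plotted against $x$, and a power
curve becomes a straight line when $\ln Y$ is plotted against $\ln x$.
The fitted equations can then be used to model the data and predict new values.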
Practice Problem
Fit the exponential curve $Y = a \cdot b^x$ using the following data:
x Y
1 4.5
2 7.8
3 13.2
4 22.3
5 37.5
Objective
To estimate the trend component in a time series dataset by using the method of moving
averages. This is useful for identifying long-term patterns or trends over a series of time
periods.
Formula:
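For reference, a standard way to write the $n$-period simple moving average, where $M_k$ (our notation) averages the $n$ values starting at period $k$, together with the 4-period centered moving average $C_t$ used in the example below:

```latex
M_k = \frac{1}{n} \sum_{i=k}^{k+n-1} X_i

C_t = \frac{M_{t-2} + M_{t-1}}{2}
    = \frac{X_{t-2} + 2X_{t-1} + 2X_t + 2X_{t+1} + X_{t+2}}{8} \qquad (n = 4)
```

For an odd period such as $n = 5$, the average already falls on the middle period, so no centering step is needed.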
2. Using MS-Excel
Examples
Interpretation: The moving average smooths the short-term fluctuations, highlighting the
overall trend.
Steps:
Quarter   Revenue ($)   4-Period Moving Average        Centered Moving Average
Q1        100
Q2        120
                        (100+120+140+160)/4 = 130
Q3        140                                          (130+150)/2 = 140
                        (120+140+160+180)/4 = 150
Q4        160                                          (150+170)/2 = 160
                        (140+160+180+200)/4 = 170
Q5        180
Q6        200
Steps in Excel:
=AVERAGE(C3:C4)
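The same calculation can be sketched in Python to check the spreadsheet. `moving_average` and `centered` are illustrative helper names, and the revenue values are those in the table above. Each centered value lies midway between two consecutive 4-period windows, so the first one belongs to Q3.

```python
def moving_average(values, n):
    """n-period simple moving averages: one value per full window of n points."""
    return [sum(values[i:i + n]) / n for i in range(len(values) - n + 1)]

def centered(ma):
    """Average consecutive pairs of an even-period moving average."""
    return [(ma[i] + ma[i + 1]) / 2 for i in range(len(ma) - 1)]

revenue = [100, 120, 140, 160, 180, 200]   # Q1..Q6
ma4 = moving_average(revenue, 4)
cma = centered(ma4)
print("4-period MA:", ma4)    # prints 4-period MA: [130.0, 150.0, 170.0]
print("Centered MA:", cma)    # prints Centered MA: [140.0, 160.0]  (Q3, Q4)
```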
Discussion
Moving averages help smooth fluctuations in the data to reveal the underlying trend.
Choosing the period $n$ is critical:
o Smaller $n$: more sensitive to short-term changes.
o Larger $n$: smoother trend but less responsive.
In Excel, dynamic charts and formulas make this method efficient for large datasets.
Practice Problem
Use the following data to calculate and plot a 5-period moving average:
Week Demand
1 120
2 130
3 150
4 170
5 190
6 210
7 230
Objective
To estimate the trend in a time series dataset using the exponential smoothing method. This
technique is used to smooth fluctuations and provide a weighted average where more recent
observations carry higher weight.
1. Time Series: Sequential data recorded over time (e.g., daily sales, monthly
temperature).
2. Exponential Smoothing: A forecasting method where weights decrease
exponentially for older observations.
o More recent data points have a higher influence on the smoothed value.
o Formula: $S_t = \alpha X_t + (1 - \alpha) S_{t-1}$, where:
$S_t$: smoothed value at time $t$
$\alpha$: smoothing constant ($0 < \alpha < 1$)
$X_t$: actual value at time $t$
$S_{t-1}$: smoothed value at time $t-1$
3. Choice of $\alpha$:
o Small $\alpha$: smoother trend, less sensitive to recent changes.
o Large $\alpha$: more sensitive to recent changes.
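A minimal Python sketch of the recursion above. It assumes the common convention of starting with $S_1 = X_1$ (other initialisations, such as the mean of the first few observations, are also used). It applies $\alpha = 0.2$ to the monthly demand figures from the practice problem below, so spreadsheet results can be checked against it. `exponential_smoothing` is our own helper name.

```python
def exponential_smoothing(values, alpha):
    """Return [S_1, ..., S_n] with S_t = alpha*X_t + (1 - alpha)*S_{t-1}."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    smoothed = [values[0]]    # assumption: S_1 is set to the first observation
    for x in values[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

demand = [200, 220, 250, 280, 300]   # Jan..May
print([round(s, 2) for s in exponential_smoothing(demand, 0.2)])
```

In a spreadsheet the same recursion is a single formula filled down the column, each row referring to the smoothed value in the row above.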
2. Using MS-Excel
Dataset
Steps in Excel
Discussion
Practice Problem
Use the following dataset to calculate the trend using exponential smoothing
($\alpha = 0.2$):
Month Demand
Jan 200
Feb 220
Mar 250
Apr 280
May 300