Hypothesis Testing and Regression Modelling
Overview:
Rosie Reeves is an entrepreneurial middle-school student who sells homemade cold drinks from a stand
during the summer months. This summer, Rosie sold two flavors of drink (lemon and orange) at two
different locations (the beach and the park); and to promote her drinks stall, she distributed leaflets in
the local area. Rosie recorded details of her sales and leaflet distribution in a text file, along with a note
of the temperature each day.
Analysis:
Removed any duplicate rows in the data.
Resolved any missing values in the data, either by interpolating values that fall in an obvious
sequence or by entering the average value for the column (rounded to the nearest whole number) in
any cell where a value was missing.
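The two cleaning steps above were done in Excel; a rough Python equivalent is sketched below. It assumes each row is a dict of column name to value and that a missing cell is represented as None. The function names are illustrative, not part of the original workbook.

```python
from statistics import mean


def drop_duplicate_rows(rows):
    """Keep the first occurrence of each row; rows are dicts of column -> value."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the row
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def fill_sequence_gaps(values):
    """Fill a single missing value (None) that sits between two known
    neighbours with their midpoint, e.g. 1, None, 3 -> 1, 2, 3."""
    filled = list(values)
    for i in range(1, len(filled) - 1):
        if filled[i] is None and filled[i - 1] is not None and filled[i + 1] is not None:
            filled[i] = (filled[i - 1] + filled[i + 1]) / 2
    return filled


def fill_with_rounded_mean(values):
    """Replace every None with the mean of the known values, rounded to a whole number."""
    known = [v for v in values if v is not None]
    fallback = round(mean(known))
    return [fallback if v is None else v for v in values]
```

Note that Python's round() rounds exact halves to the nearest even number, whereas Excel's ROUND rounds halves away from zero, so the two can differ when the column average ends in .5.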
Added Derived Columns
1. Added a column named Sales to the table, in which the total sales of lemon and orange is calculated.
2. Added a column named Revenue to the table, in which the sales revenue is calculated by multiplying
the total sales and price.
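As a sketch of the two derived columns, the function below computes Sales and Revenue for one row. It assumes the per-flavor counts are stored in columns named Lemon and Orange and the unit price in Price; the flavor column names are a guess at the layout of the source file.

```python
def add_derived_columns(row):
    """Return a copy of the row with Sales and Revenue added."""
    out = dict(row)
    out["Sales"] = row["Lemon"] + row["Orange"]    # total drinks sold that day
    out["Revenue"] = out["Sales"] * row["Price"]   # total sales multiplied by price
    return out
```

For example, a row with Lemon = 100, Orange = 65, and Price = 0.25 gets Sales = 165 and Revenue = 41.25.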
Used conditional formatting to identify the top 10% and bottom 10% temperatures.
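Outside Excel, the same highlighting can be approximated by ranking the temperatures and taking the lowest and highest tenth. The sketch below assumes the number of highlighted cells is the row count times 10%, rounded down, with a minimum of one; Excel's exact handling of rounding and ties may differ.

```python
def bottom_and_top_fraction(values, fraction=0.10):
    """Return (lowest values, highest values) covering the given fraction of the data."""
    k = max(1, int(len(values) * fraction))  # how many cells to highlight at each end
    ordered = sorted(values)
    return ordered[:k], ordered[-k:]
```

With 31 daily temperatures this selects the 3 coldest and the 3 hottest days.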
Charts
1. A line chart that shows Date and Revenue, with a linear trendline added to determine whether
the revenue trend is rising, falling, or staying level.
2. A scatter-plot chart that shows Leaflets on the X axis and Sales on the Y axis, used to determine
whether any signs of linear correlation can be detected.
3. A histogram that shows Revenue distributed into 10 bins, used to note whether the distribution of
this data is normal, left skewed, or right skewed.
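The trendline and histogram can also be summarized numerically. The sketch below fits a least-squares line to the values in date order (a positive slope means a rising trend) and counts values in equal-width bins. It assumes the days are evenly spaced and that the values are not all identical; both helper names are illustrative.

```python
def linear_trend(ys):
    """Least-squares slope and intercept of ys against day index 0, 1, 2, ..."""
    n = len(ys)
    xs = range(n)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def bin_counts(values, bins=10):
    """Count values in equal-width bins between the minimum and the maximum."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # the maximum goes in the last bin
        counts[idx] += 1
    return counts
```

A long tail of sparsely filled bins on the right of the counts suggests right skew, and a long tail on the left suggests left skew.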
Descriptive Statistics:
Correlation:
             Temperature  Leaflets   Price      Sales
Temperature  1
Leaflets     0.287209     1
Price        -0.03357     0.03204    1
Sales        0.466616     0.843905   -0.29257   1
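Each entry in the correlation matrix is a Pearson correlation coefficient between two columns. A minimal sketch of that calculation, written here as a standalone helper rather than the Excel tool that produced the table:

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)
```

Values near 1 or -1 indicate a strong linear relationship, so the 0.843905 reported for Leaflets and Sales is the strongest pairing in the table.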
Compare Sales of Lemon and Orange Flavors
                              Lemon         Orange
Mean                          116.5806452   80.35483871
Variance                      683.1182796   489.7698925
Observations                  31            31
Pooled Variance               586.444086
Hypothesized Mean Difference  0
df                            60
t Stat                        5.889393952
P(T<=t) one-tail              9.39311E-08
t Critical one-tail           1.670648865
P(T<=t) two-tail              1.87862E-07
t Critical two-tail           2.000297822
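The pooled variance, degrees of freedom, and t statistic can be re-derived from the reported means, variances, and observation counts. The sketch below assumes the equal-variance (pooled) two-sample t-test, which is what the "Pooled Variance" row implies; the variable names are illustrative.

```python
from math import sqrt

# Summary statistics as reported in the t-test output.
mean_lemon, mean_orange = 116.5806452, 80.35483871
var_lemon, var_orange = 683.1182796, 489.7698925
n_lemon, n_orange = 31, 31

df = n_lemon + n_orange - 2

# Pooled variance: variances weighted by their degrees of freedom.
pooled_var = ((n_lemon - 1) * var_lemon + (n_orange - 1) * var_orange) / df

# t statistic for a hypothesized mean difference of 0.
std_err = sqrt(pooled_var * (1 / n_lemon + 1 / n_orange))
t_stat = (mean_lemon - mean_orange) / std_err

t_critical_two_tail = 2.000297822  # taken from the output, not recomputed here
reject_null = abs(t_stat) > t_critical_two_tail
```

Because the t statistic far exceeds the two-tail critical value (and the two-tail p-value is far below 0.05), the null hypothesis of equal mean sales for the two flavors is rejected.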
Prediction in Excel:
Regression
1. Performed regression using Sales as the Y input and Temperature, Leaflets, and Price as the X
inputs.
Regression Statistics
Multiple R         0.928895498
R Square           0.862846847
Adjusted R Square  0.847607608
Standard Error     18.82694583
Observations       31
ANOVA
            df  SS           MS           F            Significance F
Regression  3   60207.61595  20069.20532  56.62007362  8.95371E-12
Residual    27  9570.255016  354.4538895
Total       30  69777.87097
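The regression statistics follow directly from the ANOVA sums of squares, which gives a quick consistency check on the output. The sketch below uses the standard definitions of R Square, Adjusted R Square, Standard Error, and F for a model with 3 predictors and 31 observations; variable names are illustrative.

```python
from math import sqrt

# Values as reported in the ANOVA table.
ss_regression, df_regression = 60207.61595, 3
ss_residual, df_residual = 9570.255016, 27
ss_total, df_total = 69777.87097, 30

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

r_square = ss_regression / ss_total
multiple_r = sqrt(r_square)
adjusted_r_square = 1 - (1 - r_square) * df_total / df_residual
standard_error = sqrt(ms_residual)
f_stat = ms_regression / ms_residual
```

An R Square of about 0.86 means the three features together account for roughly 86% of the variation in Sales.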
Observation  Predicted Sales  Residuals     Standard Residuals
1            150.4498328      13.55016724   0.758653608
2            155.8637872      9.136212807   0.511522898
3            179.9926666      7.007333376   0.392330121
4            182.0264713      50.97352872   2.853931674
5            258.3637611      18.63623886   1.043415155
6            182.9335594      -10.93355935  -0.612153644
7            266.4846928      -22.48469279  -1.258884338
8            227.0210381      -18.02103813  -1.008970986
9            246.5260934      -17.52609345  -0.981259773
10           261.5242824      -23.52428239  -1.317089405
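A residual is the actual value minus the predicted value, so the actual Sales figure for a row can be recovered by adding the two columns. The sketch below does this for observation 1. It also reproduces the standard residual on the assumption that each residual is divided by the sample standard deviation of all residuals, sqrt(SS residual / (n - 1)); that definition is inferred from the numbers, not stated in the output.

```python
from math import sqrt

# Observation 1 from the residual output.
predicted_sales = 150.4498328
residual = 13.55016724

actual_sales = predicted_sales + residual  # residual = actual - predicted

# Residual sum of squares and observation count from the regression output.
ss_residual = 9570.255016
n_observations = 31

residual_std_dev = sqrt(ss_residual / (n_observations - 1))
standard_residual = residual / residual_std_dev
```

Observation 4 has a standard residual of about 2.85, meaning the model under-predicted sales on that day by nearly three standard deviations.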
Prediction Plot:
The close overlap of predicted and actual values in the plot suggests that the model fits the training data well.
Used the intercept and weights to calculate a predicted Y value for the following
features:
Temperature: 80
Leaflets: 110
Price: 0.35
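The prediction is the intercept plus each weight multiplied by its feature value. The sketch below shows the calculation only: the fitted intercept and weights are not listed in this write-up, so the coefficients in the code are placeholders and must be replaced with the values from the regression output before the result means anything.

```python
def predict_sales(intercept, weights, features):
    """Linear prediction: intercept + sum of (weight * feature value)."""
    return intercept + sum(weights[name] * value for name, value in features.items())


# Feature values given above.
features = {"Temperature": 80, "Leaflets": 110, "Price": 0.35}

# PLACEHOLDER coefficients for illustration only; they are NOT the fitted values.
# Substitute the Intercept and coefficient cells from the regression output.
placeholder_intercept = 10.0
placeholder_weights = {"Temperature": 1.0, "Leaflets": 1.0, "Price": -100.0}

predicted = predict_sales(placeholder_intercept, placeholder_weights, features)
```

Keeping the weights in a dict keyed by feature name avoids pairing a coefficient with the wrong column when the values are copied over from the spreadsheet.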