Hypothesis Testing and Regression Modelling

Rosie Reeves is a middle school student who sells lemonade and orangeade from stands at the beach and park during the summer. She recorded sales data over multiple days, including flavors sold, location, temperature, number of leaflets distributed, and price. The document describes cleaning and analyzing the data using Excel, including adding derived columns, descriptive statistics, correlations, comparisons of lemon and orange sales, and linear regression to predict total sales based on temperature, leaflets distributed, and price.

Uploaded by

api-355102227
Copyright
© All Rights Reserved

Hypothesis Testing, Statistical Analysis and Prediction using Regression (EXCEL)

Overview:

Rosie Reeves is an entrepreneurial middle-school student who sells homemade cold drinks from a stand
during the summer months. This summer, Rosie sold two flavors of drink (lemon and orange) at two
different locations (the beach and the park); and to promote her drinks stall, she distributed leaflets in
the local area. Rosie recorded details of her sales and leaflet distribution in a text file, along with a note
of the temperature each day.

Analysis:
Removed any duplicate rows in the data.

Resolved any missing values in the data, either by interpolating missing values that fall in an obvious
sequence, or by entering the average value for the column (rounded to the nearest whole number) into
any cells where a value was missing.
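
The two cleaning steps can be sketched in Python as follows. This is an illustration only, not the Excel procedure used here: the sample rows, the use of None for an empty cell, and the helper names drop_duplicates and fill_with_rounded_mean are all assumptions, and only the column-average option is shown (not interpolation).

```python
from statistics import mean

def drop_duplicates(rows):
    """Return rows with exact duplicates removed, keeping the first occurrence."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique

def fill_with_rounded_mean(values):
    """Replace None entries with the column mean rounded to a whole number."""
    known = [v for v in values if v is not None]
    fill = round(mean(known))  # note: Python rounds exact halves to the even integer
    return [fill if v is None else v for v in values]

# Hypothetical rows (location, lemon, orange); the third repeats the first.
rows = [("Park", 97, 67), ("Park", 98, 67), ("Park", 97, 67)]
cleaned_rows = drop_duplicates(rows)

# Hypothetical Leaflets column with one missing cell.
leaflets = [90, 90, None, 98]
filled_leaflets = fill_with_rounded_mean(leaflets)

print(cleaned_rows)
print(filled_leaflets)
```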
Added Derived Columns
1. Added a column named Sales to the table, calculated as the sum of the Lemon and Orange sales.
2. Added a column named Revenue to the table, calculated by multiplying Sales by Price.
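
As a small worked example of the two derived columns, the sketch below recomputes Sales and Revenue for the first three rows of the table. The tuple layout and variable names are assumptions made for illustration; the analysis itself used Excel formulas.

```python
# First three rows of the cleaned table as (lemon, orange, price).
rows = [(97, 67, 0.25), (98, 67, 0.25), (110, 77, 0.25)]

derived = []
for lemon, orange, price in rows:
    sales = lemon + orange              # Sales = Lemon + Orange
    revenue = round(sales * price, 2)   # Revenue = Sales * Price
    derived.append((sales, revenue))

print(derived)
```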

Location  Lemon  Orange  Temperature  Leaflets  Price  Sales  Revenue
Park         97      67           70        90   0.25    164       41
Park         98      67           72        90   0.25    165    41.25
Park        110      77           71       104   0.25    187    46.75
Beach       134      99           76        98   0.25    233    58.25
Beach       159     118           78       135   0.25    277    69.25
Beach       103      69           82        90   0.25    172       43
Beach       143     101           81       135   0.25    244       61
Beach       123      86           82       113   0.25    209    52.25
Beach       134      95           80       126   0.25    229    57.25
Beach       140      98           82       131   0.25    238     59.5
Beach       162     120           83       135   0.25    282     70.5
Beach       130      95           84        99   0.25    225    56.25
Beach       109      75           77        99   0.25    184       46
Beach       122      85           78       113   0.25    207    51.75
Beach        98      62           75       108    0.5    160       80
Beach        81      50           74        90    0.5    131     65.5
Beach       115      76           77       126    0.5    191     95.5
Park        131      92           81       122    0.5    223    111.5
Park        122      85           78       113    0.5    207    103.5
Park         71      42           70       109    0.5    113     56.5
Park         83      50           77        90    0.5    133     66.5
Park        112      75           80       108    0.5    187     93.5
Park        120      82           81       117    0.5    202      101
Park        121      82           82       117    0.5    203    101.5
Park        156     113           84       135    0.5    269    134.5
Park        176     129           83       158   0.35    305   106.75
Park        104      68           80        99   0.35    172     60.2
Park         96      63           82        90   0.35    159    55.65
Park        100      66           81        95   0.35    166     58.1
Beach        88      57           82        81   0.35    145    50.75
Beach        76      47           82        68   0.35    123    43.05

Used conditional formatting to identify the top 10% and bottom 10% temperatures.
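
A rough Python equivalent of the top and bottom 10% selection is sketched below, using the Temperature column from the table. How the 10% count is rounded and how ties at the cutoff are treated are assumptions here and may differ from Excel's conditional-formatting rule.

```python
# Temperature column, in table order.
temperatures = [70, 72, 71, 76, 78, 82, 81, 82, 80, 82, 83, 84, 77, 78, 75, 74,
                77, 81, 78, 70, 77, 80, 81, 82, 84, 83, 80, 82, 81, 82, 82]

# Assumed rule: 10% of the row count, rounded down, with a minimum of one row.
k = max(1, len(temperatures) * 10 // 100)

ordered = sorted(temperatures)
bottom_cutoff = ordered[k - 1]   # k-th smallest value
top_cutoff = ordered[-k]         # k-th largest value

# Values tied with a cutoff are included, so a group can exceed k rows.
bottom = [t for t in temperatures if t <= bottom_cutoff]
top = [t for t in temperatures if t >= top_cutoff]

print(k, bottom, top)
```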

Charts
1. A line chart that shows Date and Revenue, with a linear trendline added to determine whether
the revenue trend is rising, falling, or staying level.
2. A scatter-plot chart that shows Leaflets on the X axis and Sales on the Y axis, used to determine
whether any signs of linear correlation can be detected.
3. A histogram that shows Revenue distributed into 10 bins, used to note whether the distribution of
this data is normal, left skewed, or right skewed.
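
The binning behind the histogram can be sketched as follows, using the Revenue column from the table. Equal-width bins between the minimum and maximum are an assumption; Excel's histogram tool may place its bin edges differently.

```python
# Revenue column, in table order.
revenue = [41, 41.25, 46.75, 58.25, 69.25, 43, 61, 52.25, 57.25, 59.5, 70.5,
           56.25, 46, 51.75, 80, 65.5, 95.5, 111.5, 103.5, 56.5, 66.5, 93.5,
           101, 101.5, 134.5, 106.75, 60.2, 55.65, 58.1, 50.75, 43.05]

bins = 10
low, high = min(revenue), max(revenue)
width = (high - low) / bins

counts = [0] * bins
for value in revenue:
    # Clamp the index so the maximum value lands in the last bin.
    index = min(int((value - low) / width), bins - 1)
    counts[index] += 1

for i, count in enumerate(counts):
    print(f"{low + i * width:7.2f} to {low + (i + 1) * width:7.2f}: {'#' * count}")
```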
Descriptive Statistics:

Statistic            Leaflets       Price          Sales
Mean                 109.1612903    0.358064516    196.9354839
Standard Error       3.559961572    0.020359243    8.661984113
Median               108            0.35           191
Mode                 90             0.25           187
Standard Deviation   19.82102718    0.113355469    48.22788646
Sample Variance      392.8731183    0.012849462    2325.929032
Kurtosis             -0.081939332   -1.753135316   -0.323901019
Skewness             0.306154886    0.33716585     0.362277441
Range                90             0.25           192
Minimum              68             0.25           113
Maximum              158            0.5            305
Sum                  3384           11.1           6105
Count                31             31             31
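
Several of the Sales figures can be cross-checked with Python's standard statistics module, as in the sketch below. It is a check under the assumption that the Sales column in the table is the exact input to the Excel output; kurtosis and skewness are not covered.

```python
from math import sqrt
from statistics import mean, median, mode, stdev, variance

# Sales column, in table order.
sales = [164, 165, 187, 233, 277, 172, 244, 209, 229, 238, 282, 225, 184, 207,
         160, 131, 191, 223, 207, 113, 133, 187, 202, 203, 269, 305, 172, 159,
         166, 145, 123]

summary = {
    "Mean": mean(sales),
    "Standard Error": stdev(sales) / sqrt(len(sales)),
    "Median": median(sales),
    "Mode": mode(sales),                 # first most-common value (Python 3.8+)
    "Standard Deviation": stdev(sales),  # sample standard deviation (n - 1)
    "Sample Variance": variance(sales),
    "Range": max(sales) - min(sales),
    "Sum": sum(sales),
    "Count": len(sales),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```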

Correlation:
              Temperature   Leaflets   Price      Sales
Temperature   1
Leaflets      0.287209      1
Price         -0.03357      0.03204    1
Sales         0.466616      0.843905   -0.29257   1
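
Each entry in the matrix is a Pearson correlation coefficient. The sketch below recomputes the Leaflets and Sales entry from the table columns; the function name pearson is an assumption, and the result should land close to the 0.843905 reported by Excel if the table is the exact input.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Leaflets and Sales columns, in table order.
leaflets = [90, 90, 104, 98, 135, 90, 135, 113, 126, 131, 135, 99, 99, 113, 108,
            90, 126, 122, 113, 109, 90, 108, 117, 117, 135, 158, 99, 90, 95, 81, 68]
sales = [164, 165, 187, 233, 277, 172, 244, 209, 229, 238, 282, 225, 184, 207,
         160, 131, 191, 223, 207, 113, 133, 187, 202, 203, 269, 305, 172, 159,
         166, 145, 123]

r = pearson(leaflets, sales)
print(round(r, 6))
```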
Compare Sales of Lemon and Orange Flavors

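Performed a two-sample t-test (assuming equal variances) to determine whether there is a
statistically significant difference between sales of Lemon and Orange drinks. The two-tail
p-value (1.87862E-07) is far below 0.05 and the t statistic (5.889) exceeds the two-tail critical
value (2.000), so the difference in mean sales is statistically significant.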

t-Test: Two-Sample Assuming Equal Variances

Lemon Orange
Mean 116.5806452 80.35483871
Variance 683.1182796 489.7698925
Observations 31 31
Pooled Variance 586.444086
Hypothesized Mean Difference 0
df 60
t Stat 5.889393952
P(T<=t) one-tail 9.39311E-08
t Critical one-tail 1.670648865
P(T<=t) two-tail 1.87862E-07
t Critical two-tail 2.000297822
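
The pooled variance and t statistic in this output follow directly from the two means, variances, and sample sizes. The sketch below redoes that arithmetic from the summary figures; the variable names are assumptions, and the p-values and critical values are not recomputed.

```python
from math import sqrt

# Summary statistics copied from the t-test output.
mean_lemon, var_lemon, n_lemon = 116.5806452, 683.1182796, 31
mean_orange, var_orange, n_orange = 80.35483871, 489.7698925, 31

# Degrees of freedom and pooled variance for the equal-variance test.
df = n_lemon + n_orange - 2
pooled_var = ((n_lemon - 1) * var_lemon + (n_orange - 1) * var_orange) / df

# Standard error of the difference in means, then the t statistic.
std_err = sqrt(pooled_var * (1 / n_lemon + 1 / n_orange))
t_stat = (mean_lemon - mean_orange) / std_err

print(df, round(pooled_var, 4), round(t_stat, 4))
```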

Prediction in Excel:

Regression

1. Performed regression using Sales as the Y input and Temperature, Leaflets, and Price as the X
input.

Regression Statistics
Multiple R          0.928895498
R Square            0.862846847
Adjusted R Square   0.847607608
Standard Error      18.82694583
Observations        31

ANOVA
             df   SS            MS            F             Significance F
Regression   3    60207.61595   20069.20532   56.62007362   8.95371E-12
Residual     27   9570.255016   354.4538895
Total        30   69777.87097
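
The regression statistics are tied to the ANOVA sums of squares by standard formulas. The sketch below recomputes R Square, Adjusted R Square, the F statistic, and the Standard Error from the two sums of squares; the variable names are assumptions, and Significance F is not recomputed.

```python
from math import sqrt

# Sums of squares from the ANOVA table; n observations, k predictors.
ss_regression = 60207.61595
ss_residual = 9570.255016
n, k = 31, 3

ss_total = ss_regression + ss_residual
r_squared = ss_regression / ss_total
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

ms_regression = ss_regression / k
ms_residual = ss_residual / (n - k - 1)
f_stat = ms_regression / ms_residual
std_error = sqrt(ms_residual)   # standard error of the regression

print(round(r_squared, 6), round(adj_r_squared, 6), round(f_stat, 4), round(std_error, 4))
```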

              Coefficients    Standard Error   t Stat         P-value
Intercept     -178.5719123    67.54600401      -2.643708016   0.013489598
Temperature   2.706977216     0.876830052      3.08723134     0.004634491
Leaflets      1.916846903     0.181217253      10.57761817    4.19907E-11
Price         -131.9315256    30.36922822      -4.344250193   0.000177077

              Lower 95%       Upper 95%        Lower 95.0%    Upper 95.0%
Intercept     -317.1648646    -39.97895999     -317.1648646   -39.97895999
Temperature   0.907870558     4.506083875      0.907870558    4.506083875
Leaflets      1.545019815     2.288673992      1.545019815    2.288673992
Price         -194.2440348    -69.61901634     -194.2440348   -69.61901634
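
The t statistics and 95% bounds follow from each coefficient and its standard error. The sketch below redoes that arithmetic; the constant T_CRIT is a rounded table value for Student's t with 27 degrees of freedom (an assumption supplied here, not a figure from the Excel output), so the bounds agree only to a few decimal places.

```python
# Two-tailed 95% critical value of Student's t with 27 degrees of freedom (rounded).
T_CRIT = 2.05183

# (coefficient, standard error) pairs copied from the regression output.
coefficients = {
    "Intercept": (-178.5719123, 67.54600401),
    "Temperature": (2.706977216, 0.876830052),
    "Leaflets": (1.916846903, 0.181217253),
    "Price": (-131.9315256, 30.36922822),
}

results = {}
for name, (coef, std_err) in coefficients.items():
    t_stat = coef / std_err
    lower = coef - T_CRIT * std_err
    upper = coef + T_CRIT * std_err
    results[name] = (t_stat, lower, upper)
    print(f"{name}: t = {t_stat:.4f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```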

Std regression coefficient
Temp        0.229942117
Leaflets    0.787798872
Price       -0.310093621
Intercept   -178.5719123
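
A standardized coefficient is the raw coefficient multiplied by the predictor's standard deviation and divided by the standard deviation of Sales. The sketch below checks the Leaflets and Price values this way, using the descriptive statistics reported earlier; Temperature is left out because its standard deviation is not listed in the document.

```python
# Sample standard deviation of Sales from the descriptive statistics.
sd_sales = 48.22788646

# (raw coefficient, sample standard deviation of the predictor).
predictors = {
    "Leaflets": (1.916846903, 19.82102718),
    "Price": (-131.9315256, 0.113355469),
}

std_coefs = {
    name: coef * sd_x / sd_sales
    for name, (coef, sd_x) in predictors.items()
}

for name, value in std_coefs.items():
    print(f"{name}: {value:.6f}")
```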

Observation Predicted Sales Residuals Standard Residuals
1 150.4498328 13.55016724 0.758653608
2 155.8637872 9.136212807 0.511522898
3 179.9926666 7.007333376 0.392330121
4 182.0264713 50.97352872 2.853931674
5 258.3637611 18.63623886 1.043415155
6 182.9335594 -10.93355935 -0.612153644
7 266.4846928 -22.48469279 -1.258884338
8 227.0210381 -18.02103813 -1.008970986
9 246.5260934 -17.52609345 -0.981259773
10 261.5242824 -23.52428239 -1.317089405
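
Each residual is the actual Sales value minus the predicted value. The standard residuals listed here are consistent with dividing each residual by the square root of the residual sum of squares over n - 1; that scaling is inferred from the numbers rather than stated in the document. The sketch below checks the first five observations.

```python
from math import sqrt

# Residual sum of squares and observation count from the regression output.
ss_residual, n = 9570.255016, 31

# Actual Sales (from the data table) and predicted Sales for observations 1 to 5.
actual = [164, 165, 187, 233, 277]
predicted = [150.4498328, 155.8637872, 179.9926666, 182.0264713, 258.3637611]

# Assumed scaling: standard deviation of the residuals with n - 1 in the denominator.
residual_sd = sqrt(ss_residual / (n - 1))

residuals = [a - p for a, p in zip(actual, predicted)]
standard_residuals = [r / residual_sd for r in residuals]

for i, (r, s) in enumerate(zip(residuals, standard_residuals), start=1):
    print(i, round(r, 4), round(s, 4))
```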
Prediction Plot:

The predicted values closely overlap the actual values, which indicates that the model fits the training data well.
Used the intercept and weights to calculate a predicted Y value for the following
features:

Temperature: 80
Leaflets: 110
Price: 0.35
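
The calculation for these features can be sketched as below, using the intercept and coefficients from the regression output. The dictionary names are assumptions made for illustration.

```python
# Intercept and weights from the regression output.
intercept = -178.5719123
weights = {
    "Temperature": 2.706977216,
    "Leaflets": 1.916846903,
    "Price": -131.9315256,
}

# Feature values to predict for.
features = {"Temperature": 80, "Leaflets": 110, "Price": 0.35}

# Predicted Y = intercept + sum of (weight * feature value).
predicted_sales = intercept + sum(weights[name] * features[name] for name in weights)

print(round(predicted_sales, 2))
```

With these inputs the model predicts roughly 203 drinks sold.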
