Econometrics Report, Salvatore Ingenito
Salvatore Ingenito
Economics with Data Science BSc
Executive Summary
The purpose of this analysis is to find a model that represents the demand for fish at the fishery
Fresh-Fish in Whitstable, Kent, UK. The dataset is a time series of ninety-seven daily
observations.
With this report, the fishery aims to obtain enough information to judge whether expanding its
business would be profitable.
Demand is found to be inelastic, meaning that the quantity purchased falls less than proportionally
when the price rises. The model is specified in logarithmic form to reduce the noise in the data.
The results show that an increase in either the average price or the wind speed may reduce demand.
Consumers also prefer to purchase fish at the weekend rather than on weekdays.
Section 1, Introduction
The report is based on data collected by the fishery Fresh-Fish in Whitstable, UK over 97 days. The data
includes information on total fish sold, prices for different customer groups, days of the week, minimum and
maximum wind speed, and average maximum wave height. The purpose of the report is to determine
whether expanding the business to other locations would be beneficial by understanding the factors that
affect fish demand.
[Figure: quantity sold (QTYW) and price (PRCW) by day, days 1 to 97]
[Figure: quantity sold (QTYA) and price (PRCA) by day, days 1 to 97]
Comparison between average price and total quantity sold
[Figure: total quantity sold (TOTQTY, left axis) and average price (AVRPRC, right axis) by day, days 1 to 97]
The logarithmic relationship appears to be less noisy than the linear one.
The correct specification of the final model, model 6, is confirmed by a RESET test.
$$F_{RESET} = \frac{(R_2^2 - R_1^2)/\text{number of new regressors}}{(1 - R_2^2)/(n - \text{parameters in equation 2})} = \frac{(0.227 - 0.225)/2}{(1 - 0.227)/91} = 0.227$$
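As a check on the arithmetic, the RESET statistic can be recomputed from the quantities that appear in the formula. The Python sketch below uses only the rounded R-squared values printed in the report; the function name `reset_f_stat` and its argument names are illustrative and not part of the original analysis. Because the inputs are rounded to three decimals, the recomputed figure need not match the reported statistic exactly.

```python
def reset_f_stat(r2_restricted, r2_unrestricted, new_regressors, resid_df):
    """F statistic for a RESET test, computed from two R-squared values.

    r2_restricted   -- R^2 of the original model (equation 1)
    r2_unrestricted -- R^2 of the augmented model (equation 2)
    new_regressors  -- number of regressors added in equation 2
    resid_df        -- n minus the number of parameters in equation 2
    """
    numerator = (r2_unrestricted - r2_restricted) / new_regressors
    denominator = (1.0 - r2_unrestricted) / resid_df
    return numerator / denominator


# Rounded values as printed in the report (assumption: no extra decimals available).
f_reset = reset_f_stat(0.225, 0.227, new_regressors=2, resid_df=91)
print(round(f_reset, 3))
```

With these rounded inputs the statistic comes out at roughly 0.12. Both this figure and the reported 0.227 lie far below the 5% critical value of F(2,91), which is roughly 3.1, so the specification is not rejected in either case.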
In this case, since the double-logarithmic form has been adopted, the coefficients correspond to the elasticity.
This is also given by $\varepsilon = \frac{\partial Y}{\partial X} \cdot \frac{X}{Y}$.
The models indicate that a 1% increase in price would result in a 0.241% decrease in demand by Caucasian
customers, a 0.527% decrease in demand by Asian customers, and a 0.493% decrease in demand for all of the
fishery's customers.
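To make the elasticity reading concrete, the sketch below converts each estimated coefficient into the percentage change in quantity implied by a given percentage change in price. It assumes the log-log form, under which quantity scales as price raised to the power of the elasticity; the function name `pct_change_in_demand` and the dictionary labels are illustrative only.

```python
def pct_change_in_demand(elasticity, pct_price_change):
    """Exact % change in quantity implied by a log-log model for a given % change in price."""
    price_ratio = 1.0 + pct_price_change / 100.0
    return (price_ratio ** elasticity - 1.0) * 100.0


# Price elasticities reported for the three demand models.
elasticities = {
    "Caucasian customers": -0.241,
    "Asian customers": -0.527,
    "All customers": -0.493,
}

for group, elasticity in elasticities.items():
    change = pct_change_in_demand(elasticity, 1.0)
    # |elasticity| < 1 means quantity responds less than proportionally to price.
    label = "inelastic" if abs(elasticity) < 1 else "elastic"
    print(f"{group}: {change:.3f}% for a 1% price rise ({label})")
```

For a 1% price change the exact figure is almost identical to the coefficient itself, which is why the coefficients can be read directly as elasticities; for larger price changes the two diverge slightly.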
It is now possible to carry out a t-test for each model. The null hypothesis states that the price coefficient
is zero, so the variable is not statistically significant ($\beta = 0$), whereas the alternative hypothesis states
that the coefficient differs from zero, so the variable is statistically significant ($\beta \neq 0$).
The formula used is $TS_t = \frac{\hat{\beta} - \beta}{s.e.(\hat{\beta})}$
The TS for the first model is -1.039, for the second -2.522, and for the third -2.636. The critical value
is given by $t_{n-k}^{\alpha = 0.05}$. In this case $t_{96}$, using the two-tail probability, gives a critical value
of 1.984. This means that only models two and three are statistically significant, since the absolute value of
TS is higher than the CV and the null hypothesis is rejected. In the first model, instead, the t-statistic is
unable to reject the null hypothesis, meaning that the variable is not statistically significant. On this
evidence it is of little use in explaining or predicting the outcome of interest and could be dropped from the
model. However, a lack of statistical significance does not necessarily imply that the variable has no effect on
the outcome: the sample may not be large enough to detect it, or the variable may be correlated with other
variables in the model, making its effect hard to isolate.
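The decision rule compares the absolute value of each test statistic with the critical value, which matters here because all three statistics are negative. A minimal sketch of that rule, using the figures quoted in the text (the helper name `is_significant` is illustrative):

```python
def is_significant(t_stat, critical_value):
    """Two-tailed t-test decision: reject H0 when |t| exceeds the critical value."""
    return abs(t_stat) > critical_value


CRITICAL_VALUE = 1.984  # two-tailed 5% critical value quoted in the text

# Test statistics reported for the price coefficient in each model.
t_stats = {"Model 1": -1.039, "Model 2": -2.522, "Model 3": -2.636}

decisions = {model: is_significant(t, CRITICAL_VALUE) for model, t in t_stats.items()}
for model, rejected in decisions.items():
    outcome = "H0 rejected" if rejected else "H0 not rejected"
    print(f"{model}: {outcome}")
```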
Finally, the R-squared is relatively low in all the models. In the first model, only 10.6% of the total variation
in fish demand is explained by the price (the remaining variation, roughly 90%, is attributed to unobserved
factors in the error term); in the second, 6.3%; and in the third, 6.8%.
We now move to the model that tries to explain any relationship between fish demand and the days of the
week. The fitted model is the following:
$ltotqty_t = 8.389 - 0.317\,mon_t - 0.679\,tues_t - 0.558\,wed_t + 0.018\,thurs_t + \varepsilon_t$ (Model 4)
In this case, each coefficient can be interpreted as the difference in the mean of the response variable (here,
log quantity) between the day represented by the dummy variable and the reference level. The reference level
is the category left out of the model as a baseline, for which every dummy equals 0.
Dummy variables are often employed in regression analysis to account for categorical variables that may
affect the outcome of interest. Including them allows the effect of a categorical variable to be estimated while
controlling for the other variables in the model. The economic significance of a dummy can be evaluated from
the magnitude of its coefficient estimate and its effect on the outcome of interest.
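Because the dependent variable in Model 4 is in logarithms, the magnitude of a dummy coefficient is easier to judge once it is converted into a percentage difference relative to the reference level. The sketch below applies the standard conversion 100 · (exp(β) − 1) to the Model 4 estimates; the function name `dummy_pct_effect` is illustrative, and the conversion is an added interpretation aid rather than a step taken in the original report.

```python
import math


def dummy_pct_effect(coef):
    """% difference in the level of the outcome implied by a dummy coefficient in a log model."""
    return (math.exp(coef) - 1.0) * 100.0


# Day-of-week dummy coefficients from Model 4 (dependent variable: log total quantity).
model4 = {"mon": -0.317, "tues": -0.679, "wed": -0.558, "thurs": 0.018}

effects = {day: dummy_pct_effect(beta) for day, beta in model4.items()}
for day, effect in effects.items():
    print(f"{day}: {effect:+.1f}% relative to the reference level")
```

Read this way, the Tuesday coefficient of -0.679 corresponds to sales roughly half those of the reference level, which gives a clearer sense of economic significance than the raw coefficient.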
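The statistical significance of each coefficient is assessed with a t-test:
Hypothesis: $H_0: \beta = 0$; $H_1: \beta \neq 0$
Probability used: two-tail
F Test:
Hypothesis: $H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$; $H_1$: at least one $\beta_j \neq 0$
$$F_{Test} = \frac{R^2/(k-1)}{(1-R^2)/(n-k)} = \frac{0.126/4}{(1-0.126)/92} = 3.326$$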
The critical value is 2.46. Since FS > CV, the null hypothesis is rejected: at least one variable is statistically significant.
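The overall F-test can be reproduced directly from the R-squared, the sample size, and the number of parameters. The sketch below uses the values quoted for Model 4 (R-squared of 0.126, n = 97, k = 5 including the intercept) and the critical value of 2.46 stated above; the function name `overall_f_stat` is illustrative. Small differences from the reported 3.326 can arise because the R-squared is rounded.

```python
def overall_f_stat(r_squared, n_obs, n_params):
    """F statistic for joint significance of all slope coefficients, from R-squared."""
    numerator = r_squared / (n_params - 1)
    denominator = (1.0 - r_squared) / (n_obs - n_params)
    return numerator / denominator


CRITICAL_VALUE = 2.46  # 5% critical value quoted in the text

f_stat = overall_f_stat(0.126, n_obs=97, n_params=5)
reject_h0 = f_stat > CRITICAL_VALUE
print(round(f_stat, 3), "reject H0" if reject_h0 else "do not reject H0")
```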
Finally, the R square is 0.14, meaning that 14% of the total variation in fish demand is explained by specific
days of the week.
An additional model is formulated:
$ltotqty_t = 11.044 - 0.756\,lspeed3_t - 0.120\,lspeed2_t - 0.288\,lwave2_t + 0.033\,lwave3_t + \varepsilon_t$ (Model 5)
This model examines whether weather conditions are related to fish demand.
The statistical significance of the variables is shown in the following table:
Variable   Coefficient   |T statistic|   Critical value   Conclusion
Lspeed3    -0.756        2.407           1.984            |TS| > CV, H0 is rejected. The variable is statistically significant.
Lspeed2    -0.120        0.462           1.984            |TS| < CV, H0 is not rejected. The variable is not statistically significant.
Lwave2     -0.288        1.048           1.984            |TS| < CV, H0 is not rejected. The variable is not statistically significant.
Lwave3      0.033        0.120           1.984            |TS| < CV, H0 is not rejected. The variable is not statistically significant.
For F(40,39) at the 5% significance level, the CV is 1.69. FS < CV, meaning that the null hypothesis of
homoskedasticity is not rejected; there is thus no evidence of heteroskedasticity in the model.
Finally, I want to check whether there are any structural breaks in the data, using a Chow test.
Hypothesis: $H_0: a_1 = b_1,\ a_2 = b_2,\ a_3 = b_3,\ a_4 = b_4$; $H_1$: $a_i \neq b_i$ for at least one $i$
$$F_{Chow} = \frac{(RSS_R - RSS_1 - RSS_2)/k}{(RSS_1 + RSS_2)/(n - 2k)} = 0.921$$
The CV is given by F(5,87), which is 2.31. When the data are divided in half, there is no evidence of a
structural break since FS < CV, meaning that the null hypothesis is not rejected.
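The Chow statistic follows mechanically from three residual sums of squares: one from the pooled regression and one from each sub-sample. The report does not list these values, so the sketch below uses hypothetical RSS figures purely to illustrate the calculation; only n = 97, k = 5 and the critical value of 2.31 are taken from the text, and the function name `chow_f_stat` is illustrative.

```python
def chow_f_stat(rss_pooled, rss_1, rss_2, n_obs, k):
    """Chow test F statistic for a single structural break.

    rss_pooled   -- RSS of the regression on the full sample (restricted model)
    rss_1, rss_2 -- RSS of the regressions on the two sub-samples
    n_obs        -- total number of observations
    k            -- number of parameters in each regression, including the intercept
    """
    numerator = (rss_pooled - rss_1 - rss_2) / k
    denominator = (rss_1 + rss_2) / (n_obs - 2 * k)
    return numerator / denominator


CRITICAL_VALUE = 2.31  # F(5,87) at the 5% level, as quoted in the text

# HYPOTHETICAL residual sums of squares, for illustration only (not from the report).
f_chow = chow_f_stat(rss_pooled=50.0, rss_1=22.0, rss_2=25.0, n_obs=97, k=5)
structural_break = f_chow > CRITICAL_VALUE
print(round(f_chow, 3), "break detected" if structural_break else "no break detected")
```

The denominator degrees of freedom, n − 2k = 87, match the F(5,87) distribution used for the critical value.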
Section 5, Conclusions
The demand for fish in Whitstable, UK was found to be inelastic regardless of the ethnicity of customers.
Also, the double-logarithmic functional form has been used to reduce the noise in the data. The model was
determined to be homoscedastic and without structural breaks in the time-series. However, for a more
reliable analysis, the fishery should collect observations over a longer period of time and consider other
factors such as household income and customer socio-demographic characteristics.
Appendix
Model 1
Model 2
Model 3
Model 4
Model 5
Model 6
The Dataset