Statistics - Assignment
1. (a)
Reasons for disagreeing with Jack's comments:
Although the R-squared value is low, the predictor variables still explain part of the
variation in returns.
The coefficients of individual predictor variables can still be statistically significant.
The predictor variables can still indicate the trend and support useful predictions.
A low R-squared mainly affects the precision of the predictions.
Putney 1355 32 2 1 1
(b) Putney is expected to earn higher returns than Bob, as the predicted Return values for
Putney are higher than those for Bob.
3. (a) As seen above, at the 95% confidence level (5% significance level), SAT and GRI have a
significant effect on RET.
If Bob had attended Princeton, his SAT score would probably have been 1355.
His predicted RET would then be higher: 3.0189 compared with his current RET of 1.2287.
(b) At the 10% significance level, GRI and SAT have a significant effect on RET.
If Bob managed a Growth fund instead of a Growth and Income fund, his return would be:
4. (a)
The coefficient of MBA is 18.1% and its effect on returns is negative.
This is large relative to the other coefficients, but not very large in absolute terms.
It indicates that managers with an MBA degree would perform worse than those without one,
holding other factors such as Age, Tenure, and SAT score constant.
(b) No, non-MBA managers are not taking comparatively more risk to earn higher returns. If
they were, the coefficient in the table would not be negative.
5. (a)
As per the regression, the p-value of the Age coefficient is 10.005%.
(b)
As per the regression, Age has a negative effect on returns. Hence younger managers are more
likely to deliver better returns.
Survivor bias would certainly influence, and probably dampen, the effect seen in 5 (a).
6. (a) As MBA and Tenure are not significant at the 15% significance level, they are
eliminated from the model.
Linearity: In the residuals vs. fitted plot below, the red line is almost parallel to the
horizontal axis, indicating linearity.
Homoskedasticity: The spread of the residuals changes very little across the fitted values.
Hence the assumption of constant variance in the residuals is reasonable.
i. Age has a negative effect on Returns. Its p-value is now lower: 0.01 compared with the
original 0.106.
ii. Age has more impact in the model with the 5 variables.
7. (a)
As per the R code output, Growth funds have Returns that are 2.312% higher than those of
Growth and Income funds.
The variation in the residuals is also constant about the fitted line, with no trend.
This supports the assumption of homoskedasticity.
(b) t-test using Excel
                               GFund       GIFund
Mean                           0.395924    -1.91593
Variance                       79.86433    57.7226
Observations                   327         213
Pooled Variance                71.13933
Hypothesized Mean Difference   0
df                             538
t Stat                         3.112946
P(T<=t) one-tail               0.000975
t Critical one-tail            1.647691
P(T<=t) two-tail               0.001951
t Critical two-tail            1.964383
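The pooled variance and t statistic in the table can be re-derived from the reported means, variances, and sample sizes. The sketch below is a minimal check in Python using only the standard library. It assumes the equal-variance (pooled) two-sample t-test that the "Pooled Variance" row suggests, and the function name pooled_t is introduced here only for illustration.

```python
import math

def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Two-sample t statistic assuming equal variances (pooled)."""
    df = n1 + n2 - 2
    # Weighted average of the two sample variances
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    # Standard error of the difference in means
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return pooled_var, df, (mean1 - mean2) / se

# Summary statistics taken from the table: GFund first, GIFund second
pooled_var, df, t_stat = pooled_t(0.395924, 79.86433, 327,
                                  -1.91593, 57.7226, 213)
print(round(pooled_var, 3), df, round(t_stat, 3))
```

The computed values agree with the Excel output to about three decimal places (pooled variance near 71.139, df = 538, t near 3.113). Since 3.113 exceeds both critical values in the table, the difference in mean returns is significant at the 5% level.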
Null hypothesis:
(c)
There is a probability of 0.5944 (59.44%) that the mean return of the funds is greater than
1.5% of the benchmark.
9. A larger sample provides more degrees of freedom. The standard errors of the estimates
shrink, so the precision of prediction improves. Overall this makes the model more efficient
at capturing the variance in the response variable and at prediction.
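The effect of sample size on precision can be illustrated with the standard error of a sample mean, which is the standard deviation divided by the square root of n. This is a simplified sketch rather than the regression itself: the standard deviation of 8.0 and the sample sizes are hypothetical values chosen only for illustration.

```python
import math

def standard_error(sample_sd, n):
    """Standard error of a sample mean: sd / sqrt(n)."""
    return sample_sd / math.sqrt(n)

sd = 8.0  # hypothetical standard deviation of returns
for n in (25, 100, 400):
    # Quadrupling the sample size halves the standard error
    print(n, standard_error(sd, n))
```

Standard errors of regression coefficients follow the same 1/sqrt(n) pattern when the spread of the predictors stays comparable.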
10.
Regressing Age on GRI as the only predictor gives a p-value of 0.05 for GRI.
Age = 43.2446 + 1.732*GRI
With GRI coded as 0 for Growth funds and 1 for Growth and Income funds, this implies that
managers of Growth and Income funds are on average 1.732 years older than managers of
Growth funds.
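As a quick check of this interpretation, the fitted equation can be evaluated for both fund types. The sketch below is in Python; the function name predicted_age is hypothetical, and the coding GRI = 0 for Growth and GRI = 1 for Growth and Income is the one stated in the text.

```python
def predicted_age(gri):
    """Fitted mean Age from the simple regression Age = 43.2446 + 1.732*GRI."""
    return 43.2446 + 1.732 * gri

growth = predicted_age(0)             # Growth funds
growth_income = predicted_age(1)      # Growth and Income funds
print(growth, growth_income, growth_income - growth)
```

The intercept is the mean Age for Growth funds, and the slope is the difference in mean Age between the two fund types.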
(b)
Holding GRI, TENURE, and SAT constant, with GRI=0, TENURE=0, and SAT=0:
Age = 32.186 + (1.424*GRI) + (-1.879*MBA) + (0.9424*TENURE) + (0.0077*SAT)
Average Age of a fund manager with an MBA (MBA=1) = 32.186 - 1.879 = 30.307
Observation: managers with an MBA are on average 1.879 years younger than those without.
Held constant: GRI, SAT, TENURE
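The arithmetic behind the 30.307 figure, and the fact that the MBA gap does not depend on where the other predictors are held, can be checked by evaluating the fitted equation directly. This is a minimal Python sketch; the function name predict_age and the comparison values TENURE = 5 and SAT = 1200 are hypothetical and used only for illustration.

```python
def predict_age(gri, mba, tenure, sat):
    """Fitted Age from the multiple regression reported above."""
    return (32.186 + 1.424 * gri - 1.879 * mba
            + 0.9424 * tenure + 0.0077 * sat)

# Baseline used in the text: all other predictors set to 0
with_mba = predict_age(gri=0, mba=1, tenure=0, sat=0)
without_mba = predict_age(gri=0, mba=0, tenure=0, sat=0)
print(with_mba, without_mba)

# The MBA gap is the same at any fixed values of the other predictors
gap = predict_age(1, 1, 5, 1200) - predict_age(1, 0, 5, 1200)
print(gap)
```

Setting TENURE and SAT to 0 is only a convenient baseline; the 1.879-year difference is the quantity that carries over to realistic values.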
11.
As seen in the regression above, MBA is excluded because it is not significant at the 80%
confidence level.
Tenure and Age have a negative impact on Return.
SAT has a positive impact on Return.
CONCLUSION:
Ms. Putney is the right choice and is expected to deliver a higher rate of Return.
Q2. Nano Project
Identify a small problem related to your day-to-day work, in which you want either to understand
the relationship between two variables or to predict one of the variables. In either case,
formulate the problem you want to attack. Collect the dataset needed to answer the question.
Apply the tools and techniques discussed in class (Regression Analysis). You have to discuss the
results in both a statistical and a business framework.
Please submit the problem description, data description, and R file used to analyse the data,
along with results and discussion. You may write the problem description, results, and discussion
on paper and submit a scanned copy. But you have to submit the data description and data file (in
Excel, csv, or txt format) along with running R code.
Solution:
Driving from Hyderabad to Pune and from Pune to Hyderabad.
Road conditions
Traffic conditions
Driver gender
Stops taken on the route
While driving to and fro, I collected data on the metrics below:
The regression is then rerun after dropping the coefficients that are not significant.
Conclusion:
The plots for distance travelled are linear.
Average speed and stops are significant.