BSRM Final Assignment
Given Data:
o Sample size (n): 100
o Number of successes (x): 62
o Hypothesized population proportion (p0): 0.50
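Calculate the Sample Proportion (p^):
p^ = x / n = 62 / 100 = 0.62
Calculate the Test Statistic (Z):
Z = (p^ − p0) / sqrt(p0*(1 − p0)/n), where p0 = 0.50
Z = (0.62 − 0.50) / sqrt(0.50*0.50/100) = 0.12 / 0.05 = 2.4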
This test statistic quantifies how far the observed sample proportion (0.62)
is from the hypothesized population proportion (0.50). In this case, a Z-
value of 2.4 indicates that the sample proportion is 2.4 standard errors
away from the hypothesized proportion.
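The test statistic can be checked with a few lines of R. This is a minimal sketch that uses only the values given in the problem; the variable names (p_hat, se0, z) are illustrative and not taken from the assignment.

```r
# Given data from the problem statement
n  <- 100    # sample size
x  <- 62     # number of successes
p0 <- 0.50   # hypothesized population proportion

p_hat <- x / n                     # sample proportion
se0   <- sqrt(p0 * (1 - p0) / n)   # standard error under the null hypothesis
z     <- (p_hat - p0) / se0        # one-sample z statistic for a proportion

print(round(c(p_hat = p_hat, se0 = se0, z = z), 4))
```

Note that the standard error here uses p0, not p^, because the test statistic is computed assuming the null hypothesis is true.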
P-Value = 1 − pnorm(Z)
Logic - The p-value is the probability of obtaining a test statistic at
least as extreme as the one observed if the null hypothesis were true. For
a one-tailed test of whether the population proportion is greater than the
hypothesized proportion, the p-value is the area to the right of Z under
the standard normal curve, which is found from the cumulative distribution
function (CDF) of the standard normal distribution.
Mathematically, the p-value for a one-tailed test is:
P-Value = P(Z ≥ 2.4) = 1 − Φ(2.4) ≈ 0.0082
where Φ is the standard normal CDF (pnorm in R).
In simpler terms, it quantifies how likely sample results at least this
extreme are under the null hypothesis. When the p-value is very small (less
than 0.05), such extreme results are highly unlikely under the null
hypothesis. The small area beyond the test statistic indicates that the
observed sample proportion deviates significantly from what we would
expect if the null hypothesis were true.
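The upper-tail p-value can be computed directly in R. This is a minimal sketch that assumes the Z-value of 2.4 reported above and the 0.05 significance level mentioned in the text; the variable names are illustrative.

```r
z     <- 2.4    # test statistic reported above
alpha <- 0.05   # significance level mentioned in the text

# Upper-tail area under the standard normal curve
p_value <- 1 - pnorm(z)

# Equivalent form that avoids subtracting from 1
p_value_alt <- pnorm(z, lower.tail = FALSE)

reject_h0 <- p_value < alpha   # TRUE means the null hypothesis is rejected

print(round(p_value, 4))
print(reject_h0)
```

The lower.tail = FALSE form is numerically safer when Z is very large, since 1 − pnorm(Z) can round to zero.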
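Confidence Interval = p^ ± z*SE, where
SE = sqrt(p^*(1 − p^)/n) = sqrt(0.62*0.38/100) ≈ 0.0485
90% Confidence Interval (z = 1.645):
= 0.62 ± 1.645*0.0485
= 0.62 ± 0.0798
= (0.5402, 0.6998)
95% Confidence Interval (z = 1.96):
= 0.62 ± 1.96*0.0485
= 0.62 ± 0.0951
= (0.5249, 0.7151)
99% Confidence Interval (z = 2.576):
= 0.62 ± 2.576*0.0485
= 0.62 ± 0.1249
= (0.4951, 0.7449)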
These intervals provide ranges within which the true population proportion
is likely to fall, with the specified level of confidence.
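The three intervals can be reproduced in R as follows. This is a sketch that assumes the critical values 1.645, 1.96 and 2.576 correspond to the 90%, 95% and 99% confidence levels, and that the standard error is based on the sample proportion 0.62. Because R carries the unrounded standard error, the limits may differ from the hand calculation in the fourth decimal place.

```r
n     <- 100    # sample size
p_hat <- 0.62   # sample proportion

# Standard error based on the sample proportion
se <- sqrt(p_hat * (1 - p_hat) / n)

levels <- c(0.90, 0.95, 0.99)            # confidence levels
z_crit <- qnorm(1 - (1 - levels) / 2)    # two-sided critical values
margin <- z_crit * se                    # margins of error

ci <- data.frame(
  level = levels,
  z     = round(z_crit, 3),
  lower = round(p_hat - margin, 4),
  upper = round(p_hat + margin, 4)
)
print(ci)
```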
Comparison:
Answer 3.
(a) Let, MonthlyExpenditure = Y
Age = X1
Income = X2
HoursOfExcercise = X3
ScreenTime = X4
(b) Using RStudio on the given data, the fitted linear regression model of
monthly expenditure on fitness (Y) against age, income, hours of exercise and
screen time is as follows:
Y = -0.93891 + 1.82695*X1 + 1.70428*X2 - 0.06169*X3 + 0.12477*X4
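To show how the fitted equation is used, the sketch below evaluates it in R for one set of predictor values. The coefficients are the ones reported above. The function name predict_expenditure and the input values are hypothetical, chosen only for illustration, and the units are assumed to match those of the original dataset, which is not shown here.

```r
# Coefficients as reported in the fitted equation
coefs <- c(intercept = -0.93891,
           X1 = 1.82695,    # Age
           X2 = 1.70428,    # Income
           X3 = -0.06169,   # HoursOfExcercise
           X4 = 0.12477)    # ScreenTime

# Hypothetical helper: evaluates the fitted equation for given predictor values
predict_expenditure <- function(age, income, hours_exercise, screen_time) {
  unname(coefs["intercept"] +
         coefs["X1"] * age +
         coefs["X2"] * income +
         coefs["X3"] * hours_exercise +
         coefs["X4"] * screen_time)
}

# Illustrative inputs only, not taken from the assignment data
y_hat <- predict_expenditure(age = 30, income = 50,
                             hours_exercise = 5, screen_time = 4)
print(y_hat)
```

Each coefficient is the estimated change in Y for a one-unit increase in that predictor, holding the other predictors fixed.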
(c) The p-values obtained are as follows:
For X1, p-value = 2.02 × 10^-8
For X2, p-value < 2 × 10^-16
For X3, p-value = 0.378
For X4, p-value = 0.039
(d) The R screenshot for reference is as follows:
(e) From the class material, we know that a p-value measures the probability of
obtaining results at least as extreme as those observed, assuming that the null
hypothesis is true. For a regression coefficient, the null hypothesis is that the
coefficient is zero. The lower the p-value, the stronger the evidence against the
null hypothesis. A p-value of 0.05 or lower is generally considered statistically
significant.
For X1, p-value = 2.02 × 10^-8, which is < 0.05
For X2, p-value < 2 × 10^-16, which is < 0.05
For X3, p-value = 0.378, which is > 0.05
For X4, p-value = 0.039, which is < 0.05
Thus, from the p-values obtained above, we can draw the following conclusions
regarding the significance of the predictors:
(i) The p-values of age and monthly income are both far below 0.05, so both
are highly significant predictors of monthly expenditure on fitness. There is
strong evidence of a relationship between the dependent variable, monthly
expenditure, and the independent variables age and monthly income.
(ii) The p-value of hours of exercise is greater than 0.05, so it is not
statistically significant at the 5% level and is not a reliable predictor of
the individuals' monthly expenditure.
(iii) The p-value of screen time is less than 0.05, so it has a statistically
significant relation with monthly expenditure at the 5% level, although the
evidence is weaker than for age and income.
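The comparison of each p-value against the 0.05 threshold can be sketched in R as follows. The p-values are the ones reported above. For income, R reports only "< 2 × 10^-16", so the upper bound 2e-16 is used as a stand-in, which is an assumption made for illustration.

```r
# p-values as reported in the regression output
p_values <- c(Age              = 2.02e-08,
              Income           = 2e-16,   # reported as "< 2e-16"; upper bound used
              HoursOfExcercise = 0.378,
              ScreenTime       = 0.039)

alpha <- 0.05   # significance level used in the text

# TRUE where the predictor is significant at the 5% level
significant <- p_values < alpha

result <- data.frame(predictor   = names(p_values),
                     p_value     = unname(p_values),
                     significant = unname(significant))
print(result)
```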