Assignment # 4
PROBLEM # 1
1.
plot(Steepness,GranuleDiameter)
abline(lm(GranuleDiameter~Steepness))
2. Generating the model to predict the granule diameter from the steepness of
the beach:
model <- lm(GranuleDiameter ~ Steepness)
summary(model)
Call:
lm(formula = GranuleDiameter ~ Steepness)
Residuals:
Min 1Q Median 3Q Max
-0.12826 -0.02434 0.01307 0.02739 0.08950
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.160913 0.030102 5.346 0.00107 **
Steepness 0.053061 0.006288 8.438 6.48e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
sqrt(0.9105)
[1] 0.9542012
plot(model)
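The value passed to sqrt() above is presumably the Multiple R-squared from the part of the summary that is not shown. A minimal sketch of how the same correlation could be obtained directly with cor(); the data values and variable names below are made up for illustration and are not the assignment's beach measurements:

```r
# Hypothetical data, NOT the assignment's measurements
steepness <- c(0.6, 0.7, 0.8, 1.2, 3.0, 4.2, 5.7, 8.0, 9.9)
diameter  <- c(0.17, 0.19, 0.22, 0.24, 0.30, 0.35, 0.42, 0.59, 0.62)

fit <- lm(diameter ~ steepness)
r_squared <- summary(fit)$r.squared

# In simple linear regression, r has the sign of the slope and |r| = sqrt(R^2)
r_from_r2 <- sign(coef(fit)[["steepness"]]) * sqrt(r_squared)
r_direct  <- cor(steepness, diameter)

# Both routes should agree up to floating-point rounding
all.equal(r_from_r2, r_direct)
```

Taking the square root alone gives only the magnitude of r, so the sign of the slope has to be attached separately when the relationship is negative.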
PROBLEM # 2
> BirthRate <- c(54.5, 39.5, 61.2, 59.9, 41.1, 47, 25.8, 46.3, 69.1,
44.5, 55.7, 38.2, 39.1, 42.2, 44.6, 32.5, 43, 51, 58.1, 25.4, 35.4,
23.3, 34.8, 27.5, 64.7, 44.1, 36.4, 37, 53.9, 20, 26.8, 62.4, 29.5,
52.2, 27.2, 39.5, 58, 36.8, 31.6, 35.6, 53, 38, 54.3, 64.4, 36.8,
24.2, 37.6, 33, 45.5, 32.3, 39.9)
> plot(povertyPercentage,BirthRate)
> abline(lm(BirthRate~povertyPercentage))
2. Generating the model:
> model <- lm(BirthRate ~ povertyPercentage)
> summary(model)
Call:
lm(formula = BirthRate ~ povertyPercentage)
Residuals:
Min 1Q Median 3Q Max
-19.0644 -7.4246 -0.4238 7.8092 15.5325
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 19.4172 3.7970 5.114 5.23e-06 ***
povertyPercentage 1.7665 0.2765 6.390 5.85e-08 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> plot(model)
By examining the dataset, the scatterplot, and the residual plot, I identified an
outlier that is a data error: the percentage of people living in poverty in a
state can never be negative. The smallest possible value is 0 percent.
So I am removing the negative poverty percentage (the outlier) and the
corresponding teen birth rate observation from the dataset and generating the
regression model again.
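A minimal sketch of how such a removal could be done without retyping the vectors, by filtering both variables with the same logical index so the pairs stay aligned. The values and the names poverty and births are hypothetical and are not the assignment's data:

```r
# Hypothetical values, NOT the assignment's data; -3.2 stands in for the erroneous entry
poverty <- c(20.1, 7.1, 16.1, -3.2, 14.9)
births  <- c(54.5, 39.5, 61.2, 32.3, 59.9)

# Keep only observations whose poverty percentage is valid (non-negative)
valid <- poverty >= 0
poverty_clean <- poverty[valid]
births_clean  <- births[valid]

# One pair was dropped from each vector
length(poverty_clean)  # 4
length(births_clean)   # 4
```

Using one index for both vectors avoids the risk of deleting the wrong birth rate when the observation is removed by hand.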
> BirthRate <- c(54.5, 39.5, 61.2, 59.9, 41.1, 47, 25.8, 46.3, 69.1,
44.5, 55.7, 38.2, 39.1, 42.2, 44.6, 32.5, 43, 51, 58.1, 25.4, 35.4,
23.3, 34.8, 27.5, 64.7, 44.1, 36.4, 37, 53.9, 20, 26.8, 62.4, 29.5,
52.2, 27.2, 39.5, 58, 36.8, 31.6, 35.6, 53, 38, 54.3, 64.4, 36.8,
24.2, 37.6, 33, 45.5, 39.9)
> plot(PovertyPercentage,BirthRate)
> abline(lm(BirthRate~PovertyPercentage))
Generating the model:
> model <- lm(BirthRate ~ PovertyPercentage)
> summary(model)
Call:
lm(formula = BirthRate ~ PovertyPercentage)
Residuals:
Min 1Q Median 3Q Max
-19.5956 -6.7355 -0.6259 7.5750 15.7252
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 15.7266 4.1482 3.791 0.000419 ***
PovertyPercentage 2.0224 0.2991 6.762 1.71e-08 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> sqrt(0.4879)
[1] 0.6984984
> plot(model)
After regenerating the regression model without the outlier, I observed the
following:
1. The correlation coefficient r is about 0.70, which is moderately strong:
positive, but not very close to 1.
2. The relationship is linear. This is confirmed by the residual plot, where the
residuals are scattered randomly around the 0 line.
3. In my opinion, the relationship is not causal, as there is no evident mechanism
by which the percentage of people living in poverty causes the teen birth rate.
For the purposes of the course, however, I am treating the relationship as causal
and carrying out the linear regression analysis.
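The residual check mentioned in point 2 can be sketched as follows. The data are hypothetical and serve only to show the mechanics; a patternless band of points around the dashed zero line is what would support a linear fit:

```r
# Hypothetical data, NOT the assignment's values
x <- c(5, 8, 10, 12, 15, 18, 20, 25)
y <- c(27, 30, 38, 37, 48, 50, 58, 66)

fit <- lm(y ~ x)
res <- resid(fit)

# Residuals vs fitted values: look for curvature or a funnel shape
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# With an intercept in the model, least-squares residuals sum to zero
# (up to rounding), so the sum alone says nothing about linearity
sum(res)
```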
Our formula is:
Predicted teen birth rate = 15.73 + 2.02 * (percentage of people living in poverty)
Since there is no causal factor, we cannot predict the birth rate from the
percentage of people living in poverty. However, I have created the model for
the purpose of practice.
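Purely as practice, in the same spirit, the fitted line could be evaluated at a chosen poverty percentage. The sketch below copies the coefficients from the summary output of the model fitted without the outlier; the helper name predict_birth_rate and the input value 10 are arbitrary choices made for illustration:

```r
# Coefficients copied from the summary output of the second model
intercept <- 15.7266
slope     <- 2.0224

# Hypothetical helper, introduced only for illustration
predict_birth_rate <- function(poverty_pct) {
  intercept + slope * poverty_pct
}

# 15.7266 + 2.0224 * 10 = 35.9506
predict_birth_rate(10)
```

With the fitted model object available, predict(model, newdata = data.frame(PovertyPercentage = 10)) would give the same kind of result without copying the coefficients by hand.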