0% found this document useful (0 votes)
3 views

Assignment # 4

Uploaded by

7013222908s
Copyright
© © All Rights Reserved
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
3 views

Assignment # 4

Uploaded by

7013222908s
Copyright
© © All Rights Reserved
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 7

ASSIGNMENT # 4

PROBLEM # 1

1.

1.Generating the graph using R

> Steepness <- c(0.629999995, 0.699999988, 0.819999993, 0.879999995,


1.149999976, 1.5, 4.400000095, 7.300000191, 11.30000019+ )

> GranuleDiameter <- c(0.170000002, 0.189999998, 0.219999999,


0.234999999, 0.234999999, 0.300000012, 0.349999994, 0.419999987,
0.850000024+ )

plot(Steepness,GranuleDiameter)

> plot(Steepness,GranuleDiameter, main="Steepness vs Granule


Diameter")

>plot(Steepness,GranuleDiameter, main="Steepness vs Granule


Diameter" , xlab="steepness of the beach" , ylab= "Diameter of
Granule")

abline(lm(GranuleDiameter~Steepness))

2. Genearting the model to predict the Diameter of the Granule from steepness of
the beach:

model <-lm(GranuleDiameter~Steepness)
summary(model)

Call:
lm(formula = GranuleDiameter ~ Steepness)

Residuals:
Min 1Q Median 3Q Max
-0.12826 -0.02434 0.01307 0.02739 0.08950

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.160913 0.030102 5.346 0.00107 **
Steepness 0.053061 0.006288 8.438 6.48e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.06739 on 7 degrees of freedom


Multiple R-squared: 0.9105, Adjusted R-squared: 0.8977
F-statistic: 71.2 on 1 and 7 DF, p-value: 6.475e-05

sqrt(0.9105)
[1] 0.9542012

Inorder to support my belief that the relationship is linear, I


am creating a residual plot where I can confirm the relationship is
linear.

plot(model)

By seeing the summary of the model, we have met certain assumptions


which are necessary to do linear regression analysis.They are
1. Correlation coefficient ‘r’ is strong because ‘r’ value is
nearest to 1.
2. The relationship is linear. This is confirmed by generating
a residual plot where we can see that residuals are normally
scattered around the 0 line.
3. The relaionship is causal.

As we met certain assumptions, we can now generate a formula where we can


predict the granule size from the steepness of the beach.
The Formula is :
Granule Diameter ^ = 0.161 + 0.054 * steepness of the beach

PROBLEM # 2

1. Generating the graph


povertyPercentage <- c(20.1, 7.1, 16.1, 14.9, 16.7, 8.8, 9.7,
10.3, 22, 16.2, 12.1, 10.3, 14.5, 12.4, 9.6, 12.2, 10.8, 14.7, 19.7,
11.2, 10.1, 11, 12.2, 9.2, 23.5, 9.4, 15.3, 9.6, 11.1, 5.3, 7.8,
25.3, 16.5, 12.6, 12, 11.5, 17.1, 11.2, 12.2, 10.6, 19.9, 14.5,
15.5, 17.4, 8.4, 10.3, 10.2, 12.5, 16.7, -1.5, 12.2 )

> BirthRate <- c(54.5, 39.5, 61.2, 59.9, 41.1, 47, 25.8, 46.3, 69.1,
44.5, 55.7, 38.2, 39.1, 42.2, 44.6, 32.5, 43, 51, 58.1, 25.4, 35.4,
23.3, 34.8, 27.5, 64.7, 44.1, 36.4, 37, 53.9, 20, 26.8, 62.4, 29.5,
52.2, 27.2, 39.5, 58, 36.8, 31.6, 35.6, 53, 38, 54.3, 64.4, 36.8,
24.2, 37.6, 33, 45.5, 32.3, 39.9+ )

> plot(povertyPercentage,BirthRate)

> plot(povertyPercentage,BirthRate, main="POVERTY VS BIRTH RATE ")

> plot(povertyPercentage,BirthRate, main="POVERTY VS BIRTH RATE ",


xlab= "percentage of people living in poverty",ylab="Teen Birth
Rate")

> abline(lm(BirthRate~povertyPercentage))
2. Generating the model:

model <- lm(BirthRate~povertyPercentage)

summary(model)

Call:
lm(formula = BirthRate ~ povertyPercentage)

Residuals:
Min 1Q Median 3Q Max
-19.0644 -7.4246 -0.4238 7.8092 15.5325

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 19.4172 3.7970 5.114 5.23e-06 ***
povertyPercentage 1.7665 0.2765 6.390 5.85e-08 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 9.19 on 49 degrees of freedom


Multiple R-squared: 0.4545, Adjusted R-squared: 0.4434
F-statistic: 40.83 on 1 and 49 DF, p-value: 5.851e-08

> plot(model)
By examining the given dataset,graph and residual plot , I identified an outlier
which is an error because the percentage of people living in poverty in that
particular state will never be a negative value. The least value would be 0
percentage.
So, I am removing the negative percentage of people living in poverty in a
particular state(outlier) and the corresponding Teen Birth rate observation from the
Dataset and again generating a regression model.

Generating the graph after removal of the outlier:


PovertyPercentage <- c(20.1, 7.1, 16.1, 14.9, 16.7, 8.8, 9.7, 10.3,
22, 16.2, 12.1, 10.3, 14.5, 12.4, 9.6, 12.2, 10.8, 14.7, 19.7, 11.2,
10.1, 11, 12.2, 9.2, 23.5, 9.4, 15.3, 9.6, 11.1, 5.3, 7.8, 25.3,
16.5, 12.6, 12, 11.5, 17.1, 11.2, 12.2, 10.6, 19.9, 14.5, 15.5,
17.4, 8.4, 10.3, 10.2, 12.5, 16.7, 12.2 )

> BirthRate <- c(54.5, 39.5, 61.2, 59.9, 41.1, 47, 25.8, 46.3, 69.1,
44.5, 55.7, 38.2, 39.1, 42.2, 44.6, 32.5, 43, 51, 58.1, 25.4, 35.4,
23.3, 34.8, 27.5, 64.7, 44.1, 36.4, 37, 53.9, 20, 26.8, 62.4, 29.5,
52.2, 27.2, 39.5, 58, 36.8, 31.6, 35.6, 53, 38, 54.3, 64.4, 36.8,
24.2, 37.6, 33, 45.5, 39.9)

> plot(PovertyPercentage,BirthRate)

> plot(PovertyPercentage,BirthRate , main="POVERTY VS BIRTH RATE")

> plot(PovertyPercentage,BirthRate , main="POVERTY VS BIRTH RATE",


xlab="Percentage of people living in poverty", ylab="Teen Birth
Rate")

> abline(lm(BirthRate~PovertyPercentage))
Generating the model:

> model <- lm(BirthRate~PovertyPercentage)

> summary(model)

Call:
lm(formula = BirthRate ~ PovertyPercentage)

Residuals:
Min 1Q Median 3Q Max
-19.5956 -6.7355 -0.6259 7.5750 15.7252

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 15.7266 4.1482 3.791 0.000419 ***
PovertyPercentage 2.0224 0.2991 6.762 1.71e-08 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 8.937 on 48 degrees of freedom


Multiple R-squared: 0.4879, Adjusted R-squared: 0.4772
G-statistic: 45.72 on 1 and 48 DF, p-value: 1.705e-08

> sqrt(0.4879)
[1] 0.6984984

> plot(model)
As a result of generating a regression model after removing the outlier from the
Dataset , I have seen that
1. Correlation coefficient ‘r’ is moderately strong, I.e., ‘r’ is somewhat nearest to
1.
2. The relationship is linear. This is confirmed by generating a
residual plot where we can see that residuals are normally
scattered around the 0 line.

3. In my opinion, the relationship is not causal as there is no causing factor for the
teen birth rate from the people living in poverty . For purposes of the course,
however I am assuming this relaion as causal and generating the linear regression
analysis.

Our Formula is :
Teen Birth rate ^ = 15.72 + 2.02 * percentage of people living in poverty.

Since there is no causing factor, we cannot predict the birth rate from the
percentage of people living in poverty. However, I have created the model for the
purpose of practice.

You might also like