Baylas - Linear Regression Analysis

The dataset contains two columns: number of hours students studied and the marks they got. There is a strong positive correlation of 0.976 between hours studied and scores. A linear regression model found that hours studied significantly predicts scores, with an R-squared value of 0.95. Plotting hours against scores and adding a linear regression line shows their strong linear relationship.

Uploaded by

Fatima Baylas

Student Study Hours Dataset Insights
Baylas, Fatima Joan B.
BSCS-2
CSPE 3100 : Data Science
Dataset
The dataset contains two columns: the number of hours each
student studied and the marks they got.

himanshunakrani

kaggle datasets download -d himanshunakrani/student-study-hours
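# install.packages("ggplot2")  # run once if ggplot2 is not installed
library(ggplot2)

# load the dataset (columns: Hours, Scores)
studstudyhour = read.csv("score.csv", sep = ",")

summary(studstudyhour)
head(studstudyhour)

hours = studstudyhour[, "Hours"]
scores = studstudyhour[, "Scores"]

# scatter plot of scores against hours
plot(hours, scores, pch = 16, col = "blue")

# correlation between hours and scores
cor(hours, scores)

# linear regression model
# (hours and scores are the vectors defined above, not column names)
model = lm(scores ~ hours, data = studstudyhour)
summary(model)
abline(model)  # add the fitted line to the base scatter plot

# same plot using ggplot2
# (linewidth needs ggplot2 >= 3.4.0; older versions use size for lines)
ggplot(data = studstudyhour, aes(x = Hours, y = Scores)) +
  geom_point(colour = "black", size = 1.5) +
  geom_smooth(method = "lm", se = FALSE, colour = "red", linewidth = 0.8)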
Insights
> cor(hours, scores)
[1] 0.9761907
#plot(x,y)
plot(hours, scores, pch = 16, col = "blue")
> model = lm(scores~hours, data=studstudyhour)
> summary(model)

Call:
lm(formula = scores ~ hours, data = studstudyhour)

Residuals:
    Min      1Q  Median      3Q     Max
-10.578  -5.340   1.839   4.593   7.265

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   2.4837     2.5317   0.981    0.337
hours         9.7758     0.4529  21.583   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 5.603 on 23 degrees of freedom
Multiple R-squared: 0.9529, Adjusted R-squared: 0.9509
F-statistic: 465.8 on 1 and 23 DF, p-value: < 2.2e-16

R-squared value: 0.95
P-value: < 2.2e-16
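
In a simple linear regression with a single predictor, the multiple R-squared equals the square of the Pearson correlation, and the F-statistic equals the square of the slope's t value. The sketch below checks this against the figures reported above. The numbers are typed in from the printed output rather than recomputed from score.csv, so they agree only up to rounding.

```r
# Values copied from the printed output (rounded as shown there)
r       <- 0.9761907   # cor(hours, scores)
t_slope <- 21.583      # t value of the hours coefficient

r_squared <- r^2        # compare with Multiple R-squared (0.9529)
f_stat    <- t_slope^2  # compare with the F-statistic (465.8)

round(r_squared, 4)
round(f_stat, 1)
```

Because both identities hold here, the correlation, the R-squared and the F-test all describe the same strength of linear relationship in this one-predictor model.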
abline(model)
ggplot(data = studstudyhour, aes(x = Hours, y = Scores)) +
  geom_point(colour = "black", size = 1.5) +
  geom_smooth(method = "lm", se = FALSE, colour = "red", linewidth = 0.8)
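
The coefficients in the summary define the fitted line: predicted score = 2.4837 + 9.7758 × hours, so each extra hour of study is associated with roughly 9.8 more marks. A minimal sketch of using that line for prediction follows. The helper name predict_score and the input values are illustrative assumptions, not part of the original analysis, and the coefficients are the rounded values from the output.

```r
# Coefficients copied from summary(model), rounded as printed
intercept <- 2.4837
slope     <- 9.7758

# Hypothetical helper: predicted score for a given number of study hours
predict_score <- function(h) intercept + slope * h

predict_score(5)          # 5 hours is an arbitrary example value
predict_score(c(2.5, 8))  # also works on a vector of hours
```

With the fitted model object still in the session, predict(model, newdata = data.frame(hours = 5)) should give the same value up to rounding. Predictions are only as reliable as the range of hours covered by the data.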