Assignment (1)
Assignment On:
Data Analysis: Summary Statistics, Visualization, Correlation,
Regression, and Hypothesis Testing
WELCOME TO OUR PRESENTATION
Name Roll
Zihadul Islam Api 96
Mahir Abrar Hossain 98
Protiva Paul Diba 122
Md. Alif Hossain 94
Md. Arif Faisal Anik 92
Md. Saif Masnun 109
Jubaer Al Hasan Tanvir 85
Sadique Ahmed Shovon 2227
Assignment Tasks:
Categorical Variables:
The categorical variables in this dataset are
Gender, Race/Ethnicity, Parental Level of Education, and Test Preparation Course.
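One way to confirm which columns are categorical is to check how each column is stored. The sketch below is only an illustration: `toy` is a hypothetical data frame with made-up rows, and only its column names mirror the StudentPerformance data.

```r
# Hypothetical toy data; the rows are made up for illustration only.
toy <- data.frame(
  gender = c("female", "male", "female"),
  test.preparation.course = c("none", "completed", "none"),
  math.score = c(72, 69, 90)
)

# A column counts as categorical here if it is stored as character or factor.
is_categorical <- sapply(toy, function(col) is.character(col) || is.factor(col))
categorical_columns <- names(toy)[is_categorical]
print(categorical_columns)
```

The same check could be run on StudentPerformance itself, assuming the score columns were read in as numeric.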
Summary Statistics and Key Insights
Math Score:
R Programming:
summary(StudentPerformance$math.score)
varianceMathScore=var(StudentPerformance$math.score)
varianceMathScore
StandardDeviationMathScore=sd(StudentPerformance$math.score)
StandardDeviationMathScore
library(moments)
SkewnessMathScore=moments::skewness(StudentPerformance$math.score)
SkewnessMathScore
KurtosisMathScore=moments::kurtosis(StudentPerformance$math.score)
KurtosisMathScore
Summary Statistics    Value
Minimum               0.00
1st Quartile          57.00
Median                66
Mean                  66.09
3rd Quartile          77.00
Maximum               100.00
Variance              229.919
Standard Deviation    15.16308
Skewness              -0.2785166
Kurtosis              3.267597
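To make the skewness and kurtosis values easier to read, the sketch below computes both by hand from central moments. The `scores` vector is hypothetical toy data, not the math scores. As far as we know, moments::kurtosis() reports Pearson's kurtosis, for which a normal distribution gives 3. Under that reading, the 3.27 above would suggest slightly heavier tails than normal, and the skewness of -0.28 a mild left skew.

```r
# Hypothetical toy scores; not taken from the StudentPerformance data.
scores <- c(40, 55, 60, 66, 70, 77, 100)

n  <- length(scores)
m  <- mean(scores)
m2 <- sum((scores - m)^2) / n   # second central moment (divides by n, unlike var())
m3 <- sum((scores - m)^3) / n   # third central moment
m4 <- sum((scores - m)^4) / n   # fourth central moment

skewness_by_hand <- m3 / m2^1.5
kurtosis_by_hand <- m4 / m2^2            # Pearson kurtosis; normal distribution gives 3
excess_kurtosis  <- kurtosis_by_hand - 3 # positive means heavier tails than normal

print(c(skewness = skewness_by_hand, kurtosis = kurtosis_by_hand))
```

Note that var() and sd() divide by n - 1, while these moments divide by n, so m2 is slightly smaller than the sample variance.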
Reading Score:
R Programming:
varianceReadingScore=var(StudentPerformance$reading.score)
varianceReadingScore
Summary Statistics    Value
Median                70
Mean                  69.17
Writing Score:
R Programming:
summary(StudentPerformance$writing.score)
varianceWritingScore=var(StudentPerformance$writing.score)
varianceWritingScore
StandardDeviationWritingScore=sd(StudentPerformance$writing.score)
StandardDeviationWritingScore
SkewnessWritingScore=moments::skewness(StudentPerformance$writing.score)
SkewnessWritingScore
KurtosisWritingScore=moments::kurtosis(StudentPerformance$writing.score)
KurtosisWritingScore
Summary Statistics    Value
Minimum               10.00
1st Quartile          57.75
Median                69.00
Mean                  68.05
3rd Quartile          79.00
Maximum               100.00
Variance              230.908
Standard Deviation    15.19566
Skewness              -0.2890096
Kurtosis              2.960808
Frequency Table for Categorical Variables
R Programming:
table(StudentPerformance$gender)
table(StudentPerformance$test.preparation.course)
table(StudentPerformance$race.ethnicity)
table(StudentPerformance$parental.level.of.education)
Gender:
Male    Female
482     518
Test Preparation Course:
Completed    None
358          642
Race/Ethnicity:
Group      Frequency
Group A    89
Group B    190
Group C    319
Group D    262
Group E    140
Parental Level of Education:
Degree                Frequency
Associate’s Degree    222
Bachelor’s Degree     118
Master’s Degree       59
High School           196
Some College          226
Some High School      179
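Counts are easier to compare as percentages. As a small sketch, the Race/Ethnicity counts reported above can be typed into a named vector (`race_counts` is just an illustrative name) and converted with prop.table():

```r
# Counts copied from the Race/Ethnicity frequency table in this section.
race_counts <- c("Group A" = 89, "Group B" = 190, "Group C" = 319,
                 "Group D" = 262, "Group E" = 140)

# prop.table() divides each count by the total; multiply by 100 for percent.
race_percent <- round(100 * prop.table(race_counts), 1)
print(race_percent)
```

On the real data the same result should come from prop.table(table(StudentPerformance$race.ethnicity)), assuming the column has no missing values.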
Data Visualization and Observation
R Programming:
#Histogram for Math Scores:
hist(StudentPerformance$math.score,
col="blue",
main="Distribution of Math Score",
xlab="Math Scores",ylab="Count")
R Programming:
#Histogram for Reading Score
hist(StudentPerformance$reading.score,
col="gray",
main="Distribution of Reading Score",
xlab="Reading Scores",ylab="Count")
R Programming:
#Histogram for Writing Score
hist(StudentPerformance$writing.score,
col="pink",
main="Distribution of Writing Score",
xlab="Writing Scores",ylab="Count")
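When a histogram needs to be described in words, it can help to look at the bin counts behind it. The sketch below uses hypothetical toy scores (not the dataset) and fixed 10-point bins, which is an assumption; the hist() calls above let R pick the breaks.

```r
# Hypothetical toy scores; not taken from the StudentPerformance data.
scores <- c(35, 48, 52, 57, 61, 66, 66, 70, 74, 77, 83, 95)

# plot = FALSE returns the bin information without drawing anything.
h <- hist(scores, breaks = seq(0, 100, by = 10), plot = FALSE)

# One row per bin: lower edge, upper edge, and number of scores in the bin.
bin_table <- data.frame(
  lower = head(h$breaks, -1),
  upper = tail(h$breaks, -1),
  count = h$counts
)
print(bin_table)
```

By default the bins are closed on the right, so a score of 70 is counted in the 60 to 70 bin.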
R Programming:
#Bar chart for Gender
barplot(table(StudentPerformance$gender),
col=c("skyblue","lightgreen"),
main="Distribution of Gender",
xlab="Gender",ylab="Count")
R Programming:
#Bar chart for Test Preparation Course
barplot(table(StudentPerformance$test.preparation.course),
col=c("orange","cyan"),
main="Test Preparation Course Completion",
xlab="Course Status",
ylab="Count")
R Programming:
#Pie chart for Ethnicity
EthnicityFreq=table(StudentPerformance$race.ethnicity)
EthnicityLabels=paste0(names(EthnicityFreq),
"(",round(100*EthnicityFreq/sum(EthnicityFreq),1),"%)")
pie(EthnicityFreq,
labels=EthnicityLabels,
col=rainbow(length(EthnicityFreq)),
main="Distribution of Ethnicity")
R Programming:
#Pie chart for Parental Level of Education
EduFreq=table(StudentPerformance$parental.level.of.education)
EduLabels=paste0(names(EduFreq),
"(",round(100*EduFreq/sum(EduFreq),1),"%)")
pie(EduFreq,
labels=EduLabels,
col=rainbow(length(EduFreq)),
main="Distribution of Parental Level of Education")
Correlation Analysis
R Programming:
library(dplyr)
NumericalVariables=StudentPerformance %>%
select(math.score,reading.score,writing.score)
CorrelationMatrix=cor(NumericalVariables,use="complete.obs")
print(CorrelationMatrix)
math.score reading.score writing.score
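A correlation matrix shows the strength of each relationship but not whether it is statistically significant. As a hedged sketch, cor.test() from base R can test a single pair of variables. The `toy` data frame below is hypothetical and its values are invented; only the column names follow the dataset.

```r
# Hypothetical toy scores; not taken from the StudentPerformance data.
toy <- data.frame(
  reading.score = c(55, 62, 70, 74, 81, 90),
  writing.score = c(52, 60, 73, 70, 84, 88)
)

correlation_matrix <- cor(toy, use = "complete.obs")

# cor.test() tests the null hypothesis that the true Pearson correlation is zero.
test_result <- cor.test(toy$reading.score, toy$writing.score)

print(correlation_matrix)
print(test_result$p.value)
```

A small p-value would suggest the correlation is unlikely to be zero in the population, subject to the usual assumptions of the Pearson test.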
R Programming:
StudentPerformance$gender=as.factor(StudentPerformance$gender)
StudentPerformance$race.ethnicity=as.factor(StudentPerformance$race.ethnicity)
StudentPerformance$parental.level.of.education=as.factor(StudentPerformance$parental.level.of.education)
StudentPerformance$test.preparation.course=as.factor(StudentPerformance$test.preparation.course)
RegressionModel=lm(math.score~reading.score+writing.score+gender+test.preparation.course,data=StudentPerformance)
summary(RegressionModel)
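The coefficient table printed by summary() can also be extracted as a matrix, which is convenient when copying estimates into a slide. The sketch below is a minimal illustration on hypothetical data: the `toy` data frame, its values, and the shortened column names are invented, and it is not the model fitted above.

```r
# Hypothetical toy data; values are made up for illustration only.
toy <- data.frame(
  math    = c(52, 61, 70, 48, 85, 66, 74, 59, 90, 63),
  reading = c(58, 65, 72, 55, 80, 70, 71, 64, 88, 60),
  prep    = factor(c("none", "completed", "none", "none", "completed",
                     "none", "completed", "none", "completed", "none"))
)

fit <- lm(math ~ reading + prep, data = toy)

# coef(summary(fit)) is a matrix with one row per term and four columns:
# Estimate, Std. Error, t value, Pr(>|t|).
coef_table <- coef(summary(fit))
print(round(coef_table, 4))
```

With R's default treatment contrasts, the first factor level ("completed" here) is the reference, so the row named prepnone is the estimated difference for students without the course, holding reading score fixed.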
Regression Model Estimation and Findings
Regression Coefficients:
Variable Estimate Std. Error t-value P-value