Assignment (1)

The assignment focuses on data analysis using a dataset from Kaggle related to student performance, covering summary statistics, visualizations, correlations, regression modeling, and hypothesis testing. Key findings include strong correlations between scores in different subjects and significant regression coefficients indicating the impact of reading and writing scores on math scores. The analysis concludes that all tested relationships are statistically significant.

Uploaded by

abrarmahir818


Jahangirnagar University

Department of Statistics and Data Science

Course Title: Econometrics
Course Code: STAT-305
Submitted to: Md Moyazzem Hossain,PhD
Professor
Department of Statistics and Data Science
Jahangirnagar University

Submitted by: Group-10 (Roll:85,92,94,96,98,109,122)

Assignment On:
Data Analysis: Summary Statistics, Visualization, Correlation,
Regression, and Hypothesis Testing
WELCOME TO OUR PRESENTATION

Name Roll
Zihadul Islam Api 96
Mahir Abrar Hossain 98
Protiva Paul Diba 122
Md. Alif Hossain 94
Md. Arif Faisal Anik 92
Md. Saif Masnun 109
Jubaer Al Hasan Tanvir 85
Sadique Ahmed Shovon 2227
Assignment Tasks:

1. Download a dataset from Kaggle.
2. Compute the summary statistics of the available variables. Make comments.
3. Visualize the variables using the appropriate charts/graphs. Make comments.
4. Compute the correlations of the variables. Comment on your results.
5. Estimate a regression model and interpret your findings.
6. Test the hypothesis about your calculated correlation coefficient and regression coefficients.
Dataset Acquisition from Kaggle
The dataset we used for this analysis is the “Student Performance Dataset”.

The dataset can be accessed and downloaded from Kaggle at the following link:
https://www.kaggle.com/datasets/spscientist/students-performance-in-exams

R Programming:
StudentPerformance=read.csv("D:\\Stat 3rd year\\Stat 305 Assignment\\StudentsPerformance.csv")
head(StudentPerformance)
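A note on column names: by default, read.csv() passes the headers through make.names(), so a header such as "math score" becomes math.score. This is why the later slides refer to StudentPerformance$math.score. The sketch below illustrates the behaviour on a two-row inline sample. It assumes the CSV headers contain spaces and a slash; the two rows are hypothetical values made up for illustration, not records from the Kaggle file.

```r
# Hypothetical two-row sample; headers are assumed to contain spaces and a slash.
csv_text <- paste(
  "gender,race/ethnicity,test preparation course,math score,reading score,writing score",
  "female,group A,none,50,60,55",
  "male,group B,completed,65,58,57",
  sep = "\n"
)

# Default behaviour: headers are converted to syntactic R names.
demo <- read.csv(text = csv_text)
print(names(demo))

# Keeping the original headers requires check.names = FALSE and backticks.
raw <- read.csv(text = csv_text, check.names = FALSE)
print(raw$`math score`)
```

A relative path or file.path() could replace the absolute D:\\ path if the script has to run on another machine.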
Variables In The Dataset
Numerical Variables:
The numerical variables in this dataset are Math Score, Reading Score and Writing Score.

Categorical Variables:
The categorical variables we have in this dataset are Gender, Race/Ethnicity, Parental Level of Education and Test Preparation Course.
Summary Statistics and Key Insights
Math Score:
R Programming:
summary(StudentPerformance$math.score)
varianceMathScore=var(StudentPerformance$math.score)
varianceMathScore
StandardDeviationMathScore=sd(StudentPerformance$math.score)
StandardDeviationMathScore
library(moments)
SkewnessMathScore=moments::skewness(StudentPerformance$math.score)
SkewnessMathScore
KurtosisMathScore=moments::kurtosis(StudentPerformance$math.score)
KurtosisMathScore

Summary Statistics    Value
Minimum               0.00
1st Quartile          57.00
Median                66
Mean                  66.09
3rd Quartile          77.00
Maximum               100.00
Variance              229.919
Standard Deviation    15.16308
Skewness              -0.2785166
Kurtosis              3.267597
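For readers without the moments package, the skewness and kurtosis figures can be sketched with base R. moments::skewness() and moments::kurtosis() are ratios of central moments, and the kurtosis is Pearson's measure rather than excess kurtosis, so a normal distribution gives a value near 3. The helper name moment_shape and the five-point vector are illustrative assumptions, not part of the original analysis.

```r
# Moment-based skewness and kurtosis using base R only.
# moment_shape is a hypothetical helper name used for illustration.
moment_shape <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  m2 <- mean(d^2)  # second central moment (divide by n, not n - 1)
  m3 <- mean(d^3)  # third central moment
  m4 <- mean(d^4)  # fourth central moment
  c(skewness = m3 / m2^1.5, kurtosis = m4 / m2^2)
}

x <- c(1, 2, 3, 4, 5)  # illustrative symmetric data, not from the dataset
print(moment_shape(x))
```

Read this way, the reported math-score kurtosis of 3.267597 is only slightly above the normal benchmark of 3, and the negative skewness indicates a somewhat longer left tail.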
Summary Statistics and Key Insights
Reading Score:
R Programming:
summary(StudentPerformance$reading.score)
varianceReadingScore=var(StudentPerformance$reading.score)
varianceReadingScore
StandardDeviationReadingScore=sd(StudentPerformance$reading.score)
StandardDeviationReadingScore
SkewnessReadingScore=moments::skewness(StudentPerformance$reading.score)
SkewnessReadingScore
KurtosisReadingScore=moments::kurtosis(StudentPerformance$reading.score)
KurtosisReadingScore

Summary Statistics    Value
Minimum               17.00
1st Quartile          59.00
Median                70
Mean                  69.17
3rd Quartile          79.00
Maximum               100.00
Variance              213.1656
Standard Deviation    14.60019
Skewness              -0.2587157
Kurtosis              2.926081
Summary Statistics and Key Insights
Writing Score:
R Programming:
summary(StudentPerformance$writing.score)
varianceWritingScore=var(StudentPerformance$writing.score)
varianceWritingScore
StandardDeviationWritingScore=sd(StudentPerformance$writing.score)
StandardDeviationWritingScore
SkewnessWritingScore=moments::skewness(StudentPerformance$writing.score)
SkewnessWritingScore
KurtosisWritingScore=moments::kurtosis(StudentPerformance$writing.score)
KurtosisWritingScore

Summary Statistics    Value
Minimum               10.00
1st Quartile          57.75
Median                69.00
Mean                  68.05
3rd Quartile          79.00
Maximum               100.00
Variance              230.908
Standard Deviation    15.19566
Skewness              -0.2890096
Kurtosis              2.960808
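The same block of statistics is repeated for each of the three scores, so one possible shortcut is to apply a single summary function to every numeric column. The sketch below assumes a small stand-in data frame with the same column names; its values are made up for illustration, and the helper name describe is hypothetical.

```r
# Hypothetical stand-in for StudentPerformance with the same column names.
scores <- data.frame(
  math.score    = c(40, 55, 66, 77, 90),
  reading.score = c(45, 59, 70, 79, 95),
  writing.score = c(42, 58, 69, 79, 93)
)

# One function that returns the statistics tabulated on the slides.
describe <- function(x) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  c(Min = min(x), Q1 = q[1], Median = median(x), Mean = mean(x),
    Q3 = q[2], Max = max(x), Variance = var(x), SD = sd(x))
}

# sapply() returns a matrix: one row per statistic, one column per score.
stats_table <- sapply(scores, describe)
print(round(stats_table, 3))
```

With the real data, replacing scores by the three score columns of StudentPerformance should give the values shown in the tables, up to rounding.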
Frequency Table for Categorical Variables
R Programming:
table(StudentPerformance$gender)
table(StudentPerformance$test.preparation.course)
table(StudentPerformance$race.ethnicity)
table(StudentPerformance$parental.level.of.education)

Gender:
Male    Female
482     518

Test Preparation Course:
Completed    None
358          642
Frequency Table for Categorical Variables
Race/Ethnicity:
Group      Frequency
Group A    89
Group B    190
Group C    319
Group D    262
Group E    140

Parental Level of Education:
Degree                Frequency
Associate’s Degree    222
Bachelor’s Degree     118
Master’s Degree       59
High School           196
Some College          226
Some High School      179
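The counts are easier to compare as percentages. The sketch below re-enters the gender and race/ethnicity counts from the frequency tables and converts them to shares; the helper name to_percent is a hypothetical choice for illustration.

```r
# Counts copied from the frequency tables in this section.
gender <- c(Male = 482, Female = 518)
ethnicity <- c("Group A" = 89, "Group B" = 190, "Group C" = 319,
               "Group D" = 262, "Group E" = 140)

# Convert counts to percentages rounded to one decimal place.
to_percent <- function(counts) round(100 * counts / sum(counts), 1)

# Both variables should account for the same number of students.
print(sum(gender))
print(sum(ethnicity))

print(to_percent(gender))
print(to_percent(ethnicity))
```

On the data frame itself, prop.table(table(StudentPerformance$gender)) gives the same proportions without retyping the counts.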
Data Visualization and Observation
R Programming:
#Histogram for Math Scores:
hist(StudentPerformance$math.score,
col="blue",
main="Distribution of Math Score",
xlab="Math Scores",ylab="Count")
Data Visualization and Observation
R Programming:
#Histogram for Reading Score
hist(StudentPerformance$reading.score,
col="gray",
main="Distribution of Reading Score",
xlab="Reading Scores",ylab="Count")
Data Visualization and Observation
R programming:
#Histogram for Writing Score
hist(StudentPerformance$writing.score,
col="pink",
main="Distribution of Writing Score",
xlab="Writing Scores",ylab="Count")
Data Visualization and Observation
R programming:
#Bar chart for Gender
barplot(table(StudentPerformance$gender),
col=c("skyblue","lightgreen"),
main="Distribution of Gender",
xlab="Gender",ylab="Count")
Data Visualization and Observation
R Programming:
#Bar chart for Test Preparation Course
barplot(table(StudentPerformance$test.preparation.course),
col=c("orange","cyan"),
main="Test Preparation Course Completion",
xlab="Course Status",
ylab="Count")
Data Visualization and Observation
R Programming:
#Pie chart for Ethnicity
EthnicityFreq=table(StudentPerformance$race.ethnicity)
EthnicityLabels=paste0(names(EthnicityFreq),
"(",round(100*EthnicityFreq/sum(EthnicityFreq),1),"%)")
pie(EthnicityFreq,
labels=EthnicityLabels,
col=rainbow(length(EthnicityFreq)),
main="Distribution of Ethnicity")
Data Visualization and Observation
R programming:
#Pie chart for Parental Level of Education
EduFreq=table(StudentPerformance$parental.level.of.education)
EduLabels=paste0(names(EduFreq),
"(",round(100*EduFreq/sum(EduFreq),1),"%)")
pie(EduFreq,
labels=EduLabels,
col=rainbow(length(EduFreq)),
main="Distribution of Parental Level of Education")
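The three histograms are drawn on separate slides with default bins, which makes them hard to compare. One possible arrangement, sketched below, puts them side by side with shared bins. The scores here are simulated with rnorm() purely so the example runs on its own; they are not the Kaggle data, and the means and spread passed to the hypothetical helper sim are rough placeholders.

```r
# Simulated scores (NOT the Kaggle data), clipped to the 0-100 range.
set.seed(305)
sim <- function(mu, s) pmin(pmax(round(rnorm(1000, mu, s)), 0), 100)
scores <- data.frame(
  math.score    = sim(66, 15),
  reading.score = sim(69, 15),
  writing.score = sim(68, 15)
)

bins <- seq(0, 100, by = 10)      # shared bins make the panels comparable
old_par <- par(mfrow = c(1, 3))   # one row, three panels
for (v in names(scores)) {
  hist(scores[[v]], breaks = bins, col = "gray",
       main = v, xlab = "Score", ylab = "Count")
}
par(old_par)                      # restore the previous layout

# The bin counts can also be inspected without drawing anything.
math_hist <- hist(scores$math.score, breaks = bins, plot = FALSE)
print(math_hist$counts)
```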
Correlation Analysis and Interpretation
R Programming:
library(dplyr)
NumericalVariables=StudentPerformance %>%
select(math.score,reading.score,writing.score)
CorrelationMatrix=cor(NumericalVariables,use="complete.obs")
print(CorrelationMatrix)

                 math.score    reading.score    writing.score
math.score       1.0000000     0.8175797        0.8026420
reading.score    0.8175797     1.0000000        0.9545981
writing.score    0.8026420     0.9545981        1.0000000

Here we see that the variables are strongly correlated with each other. The strongest correlation is between Reading Score and Writing Score.
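cor() computes Pearson's correlation by default, which is the covariance of two variables divided by the product of their standard deviations. The sketch below checks this identity on two short vectors that are made up for illustration and are not taken from the dataset.

```r
# Illustrative vectors, not taken from the dataset.
x <- c(52, 61, 70, 74, 88)
y <- c(55, 58, 72, 71, 90)

# Pearson's r as covariance scaled by the two standard deviations.
r_manual <- cov(x, y) / (sd(x) * sd(y))
r_builtin <- cor(x, y)

print(c(manual = r_manual, builtin = r_builtin))
```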
Regression Model Estimation and Findings

R Programming:
StudentPerformance$gender=as.factor(StudentPerformance$gender)
StudentPerformance$race.ethnicity=as.factor(StudentPerformance$race.ethnicity)
StudentPerformance$parental.level.of.education=as.factor(StudentPerformance$parental.level.of.education)
StudentPerformance$test.preparation.course=as.factor(StudentPerformance$test.preparation.course)
RegressionModel=lm(math.score~reading.score+writing.score+gender+test.preparation.course,data=StudentPerformance)
summary(RegressionModel)
Regression Model Estimation and Findings
Regression Coefficients:
Variable                          Estimate    Std. Error    t-value    P-value
(Intercept)                       -10.904     1.131         -9.642     <2e-16***
Reading Score                     0.298       0.044         6.750      2.51e-11***
Writing Score                     0.698       0.044         15.756     <2e-16***
Gender (Male)                     13.633      0.397         34.303     <2e-16***
Test Preparation Course (None)    3.582       0.420         8.535      <2e-16***
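The rows labelled Gender (Male) and Test Preparation Course (None) come from the way lm() codes factors: the first level in alphabetical order is the reference category, and each remaining level gets an indicator column. The sketch below shows this with model.matrix() on four made-up observations. It assumes the factor labels are lower-case "female"/"male" and "completed"/"none".

```r
# Made-up observations; the labels are assumed to match the dataset's.
gender <- factor(c("female", "male", "male", "female"))
prep <- factor(c("none", "completed", "none", "completed"))

# Levels are sorted alphabetically; the first one is the reference.
print(levels(gender))
print(levels(prep))

# The design matrix has indicator columns for the non-reference levels.
X <- model.matrix(~ gender + prep)
print(colnames(X))

# relevel() changes the reference category if another baseline is preferred.
prep2 <- relevel(prep, ref = "none")
print(colnames(model.matrix(~ prep2)))
```

Each indicator coefficient is therefore a difference from the reference group (female, course completed), with the other predictors held fixed.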
Regression Model Estimation and Interpretation

Thus, the model is given below:

Math Score = -10.904 + 0.298*Reading Score + 0.698*Writing Score + 13.633*Gender (Male) + 3.582*Test Preparation Course (None)

Interpretation:
Here, all the variables are statistically significant since p<0.005.
If reading score increases by one unit, math score increases by 0.298 units on average, holding all other variables constant.
If writing score increases by one unit, math score increases by 0.698 units on average, holding all other variables constant.
When reading and writing scores are both zero and the categorical variables are at their reference levels (female, test preparation course completed), the predicted math score is -10.904.
Male students score, on average, 13.633 units higher in math than female students, holding all other variables constant.
Students who did not complete the test preparation course score, on average, 3.582 units higher in math than those who completed the course, holding all other variables constant, because the estimated coefficient is positive.
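The fitted equation can be turned into a small prediction function. The sketch below hard-codes the rounded coefficients from the model equation; the function name predict_math and the example scores of 70 are illustrative assumptions.

```r
# Prediction from the rounded coefficients of the fitted equation.
# predict_math is a hypothetical helper, not part of the original analysis.
predict_math <- function(reading, writing, male = FALSE, no_prep = FALSE) {
  -10.904 + 0.298 * reading + 0.698 * writing +
    13.633 * as.numeric(male) +    # 1 for male, 0 for female (reference)
    3.582 * as.numeric(no_prep)    # 1 if the course was not completed
}

# Female student who completed the course, reading 70 and writing 70.
print(predict_math(70, 70))

# Same scores for a male student who completed the course.
print(predict_math(70, 70, male = TRUE))
```

With the fitted model object available, predict(RegressionModel, newdata = ...) would do the same using the unrounded coefficients.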
Hypothesis Testing for Correlations
R Programming:
CorrelationTestReadingVsMath=cor.test(StudentPerformance$reading.score,StudentPerformance$math.score)
print(CorrelationTestReadingVsMath)
CorrelationTestReadingVsWriting=cor.test(StudentPerformance$reading.score,StudentPerformance$writing.score)
print(CorrelationTestReadingVsWriting)
CorrelationTestWritingVsMath=cor.test(StudentPerformance$writing.score,StudentPerformance$math.score)
print(CorrelationTestWritingVsMath)
library(corrplot)
corrplot(CorrelationMatrix,method="color")
Hypothesis Testing for Correlations
H0: True correlation is equal to 0.
Ha: True correlation is not equal to 0.
Test results under the null hypothesis:
Variables                         Correlation Coefficient (r)    95% Confidence Interval    P-value
Reading Score Vs Math Score       0.818                          (0.796,0.837)              <2.2e-16***
Reading Score Vs Writing Score    0.955                          (0.949,0.960)              <2.2e-16***
Writing Score Vs Math Score       0.803                          (0.779,0.824)              <2.2e-16***
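The figures reported by cor.test() can be reproduced by hand for the Reading Score vs Math Score pair. The test statistic is t = r*sqrt(n-2)/sqrt(1-r^2) on n-2 degrees of freedom, and the confidence interval comes from the Fisher z-transformation. The sketch below assumes n = 1000 complete observations, which is the total implied by the gender counts (482 + 518).

```r
r <- 0.8175797   # reading vs math, from the correlation matrix
n <- 1000        # assumption: 482 + 518 students with no missing scores

# t statistic and two-sided p-value for H0: true correlation = 0.
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

# 95% confidence interval via the Fisher z-transformation.
z <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

print(round(t_stat, 2))
print(p_value < 2.2e-16)
print(round(ci, 3))
```

The interval agrees with the (0.796,0.837) reported for this pair, which supports the assumed sample size.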
Hypothesis Testing for Correlations

Since the p-value is less than 0.005 for all the correlation coefficients, the null hypothesis is rejected; that is, for all pairs of variables, the correlation coefficient is statistically significant.
Hypothesis Testing for Regression Coefficients
H0: The coefficient is equal to 0.
Ha: The coefficient is not equal to 0.

R Programming:
CoefficientsSummary=summary(RegressionModel)$coefficients
print(CoefficientsSummary)

Test results under the null hypothesis:
Variable                          Estimate    Std. Error    t-value    P-value
(Intercept)                       -10.904     1.131         -9.642     <2e-16***
Reading Score                     0.298       0.044         6.750      2.51e-11***
Writing Score                     0.698       0.044         15.756     <2e-16***
Gender (Male)                     13.633      0.397         34.303     <2e-16***
Test Preparation Course (None)    3.582       0.420         8.535      <2e-16***
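Each p-value in the coefficient table is the two-sided tail probability of the t-value under a t distribution with the residual degrees of freedom. The sketch below recomputes the p-values from the reported t-values. It assumes 995 residual degrees of freedom, that is, 1000 observations minus the 5 estimated coefficients; the sample size is inferred from the frequency tables and is not stated with the model output.

```r
# t-values copied from the coefficient table.
t_values <- c("(Intercept)" = -9.642,
              "Reading Score" = 6.750,
              "Writing Score" = 15.756,
              "Gender (Male)" = 34.303,
              "Test Preparation Course (None)" = 8.535)

# Assumption: n = 1000 observations and 5 estimated coefficients.
df_resid <- 1000 - 5

# Two-sided p-value for H0: coefficient = 0.
p_values <- 2 * pt(-abs(t_values), df = df_resid)

print(signif(p_values, 3))
print(p_values < 0.005)   # compare with the significance level used here
```

Because the t-values are rounded to three decimals, the recomputed p-values can differ slightly from those printed by summary().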
Hypothesis Testing for Regression Coefficients

Since the p-value is less than the significance level (0.005) for all the regression coefficients, the null hypothesis is rejected; that is, the coefficients are statistically significant.
THANK YOU
