COM2007 CaseStudy Sample

This case study analyzes a dataset of 395 secondary education students from two Portuguese schools, focusing on their demographic, social, family, and academic performance attributes. Key insights reveal performance differences between schools, the impact of absences and extra educational support on grades, and how family features and student behavior influence academic outcomes. The study concludes that careful analysis of such datasets can yield valuable insights for future educational research.

Uploaded by

ruohanli240

CASE STUDY ON STUDENT DATA

TABLE OF CONTENTS

01 DATASET INTRO
02 METHODOLOGY
03 ANALYSIS RESULTS
04 CONCLUSIONS
THE DATASET DERIVED FROM…
The dataset is published on the website of the Center for Machine Learning and Intelligent Systems at the University of California, Irvine.
THE DATASET IS ABOUT…
The data was collected from school reports and questionnaires. The subjects are 395 secondary-education students from two Portuguese schools.

Most of the attributes describe the students’ demographic, social, family, and grade information, and most are encoded as binary or five-level categories.
THE DATASET INCLUDES…
1 school - student's school (binary: 'GP' - Gabriel Pereira or 'MS' - Mousinho da Silveira)
2 sex - student's sex (binary: 'F' - female or 'M' - male)
3 age - student's age (numeric: from 15 to 22)
4 address - student's home address type (binary: 'U' - urban or 'R' - rural)
5 famsize - family size (binary: 'LE3' - less or equal to 3 or 'GT3' - greater than 3)
6 Pstatus - parent's cohabitation status (binary: 'T' - living together or 'A' - apart)
7 Medu - mother's education (numeric: 0 - none, 1 - primary education (4th grade), 2 - 5th to 9th grade, 3 - secondary education or 4 - higher education)
THE DATASET INCLUDES…
8 Fedu - father's education (numeric: 0 - none, 1 - primary education (4th grade), 2 - 5th to 9th grade, 3 - secondary education or 4 - higher education)
9 Mjob - mother's job (nominal: 'teacher', 'health' care related, civil 'services' (e.g. administrative or police), 'at_home' or 'other')
10 Fjob - father's job (nominal: 'teacher', 'health' care related, civil 'services' (e.g. administrative or police), 'at_home' or 'other')
11 reason - reason to choose this school (nominal: close to 'home', school 'reputation', 'course' preference or 'other')
12 guardian - student's guardian (nominal: 'mother', 'father' or 'other')
THE DATASET INCLUDES…
13 traveltime - home to school travel time (numeric: 1 - <15 min., 2 - 15 to 30 min., 3 - 30 min. to 1 hour, or 4 - >1 hour)
14 studytime - weekly study time (numeric: 1 - <2 hours, 2 - 2 to 5 hours, 3 - 5 to 10 hours, or 4 - >10 hours)
15 failures - number of past class failures (numeric: n if 1<=n<3, else 4)
16 schoolsup - extra educational support (binary: yes or no)
17 famsup - family educational support (binary: yes or no)
18 paid - extra paid classes within the course subject (Math or Portuguese) (binary: yes or no)
19 activities - extra-curricular activities (binary: yes or no)
THE DATASET INCLUDES…
20 nursery - attended nursery school (binary: yes or no)
21 higher - wants to take higher education (binary: yes or no)
22 internet - Internet access at home (binary: yes or no)
23 romantic - with a romantic relationship (binary: yes or no)
24 famrel - quality of family relationships (numeric: from 1 - very bad to 5 - excellent)
25 freetime - free time after school (numeric: from 1 - very low to 5 - very high)
26 goout - going out with friends (numeric: from 1 - very low to 5 - very high)
THE DATASET INCLUDES…
27 Dalc - workday alcohol consumption (numeric: from 1 - very low to 5 - very high)
28 Walc - weekend alcohol consumption (numeric: from 1 - very low to 5 - very high)
29 health - current health status (numeric: from 1 - very bad to 5 - very good)
30 absences - number of school absences (numeric: from 0 to 93)

# these grades are related to the course subject Math:

31 G1 - first period grade (numeric: from 0 to 20)
32 G2 - second period grade (numeric: from 0 to 20)
33 G3 - final grade (numeric: from 0 to 20, output target)
OUR METHODOLOGY

Access: to create data tables that could highlight the key insights.
Power BI: to visualize the key insights that we found with Access SQL.
WHAT WE HAVE FOUND
INSIGHT #1
The performance difference between the two schools, and whether extra educational support and absences are attributes that affect students’ grades.
Student Grades in the Different Schools
SELECT StudentInfo.School, COUNT(School) AS NoOfStudents,
  Avg(StudentInfo.G1) AS AvgGradeIn1stPeriod, Max(G1) AS HighestGradeIn1stPeriod, Min(G1) AS LowestGradeIn1stPeriod,
  Avg(StudentInfo.G2) AS AvgGradeIn2ndPeriod, Max(G2) AS HighestGradeIn2ndPeriod, Min(G2) AS LowestGradeIn2ndPeriod,
  Avg(StudentInfo.G3) AS AvgFinalGrade, Max(G3) AS HighestFinalGrade, Min(G3) AS LowestFinalGrade
FROM StudentInfo
GROUP BY StudentInfo.School;
Student Grades at “GP” Are Better
In every column, the grades of GP’s students are better than those of MS’s students.
The academic results of GP are better!
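The per-school comparison above can be reproduced outside Access. The following is only a minimal sketch: it uses Python's built-in sqlite3 module instead of Access, restricts the summary to the final grade G3, and runs on a handful of hypothetical rows rather than the real 395 records. The function name `grade_summary_by_school` is an illustrative assumption.

```python
import sqlite3

# Hypothetical toy rows (School, G1, G2, G3); NOT the real 395-student data.
ROWS = [
    ("GP", 12, 13, 14),
    ("GP", 8, 9, 10),
    ("MS", 10, 10, 11),
    ("MS", 6, 7, 6),
]

def grade_summary_by_school(rows):
    """Return {school: (count, avg_g3, max_g3, min_g3)} using a GROUP BY query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE StudentInfo (School TEXT, G1 INTEGER, G2 INTEGER, G3 INTEGER)"
    )
    conn.executemany("INSERT INTO StudentInfo VALUES (?, ?, ?, ?)", rows)
    cursor = conn.execute(
        "SELECT School, COUNT(*), AVG(G3), MAX(G3), MIN(G3) "
        "FROM StudentInfo GROUP BY School ORDER BY School"
    )
    result = {school: (n, avg, hi, lo) for school, n, avg, hi, lo in cursor}
    conn.close()
    return result

summary = grade_summary_by_school(ROWS)
for school, stats in summary.items():
    print(school, stats)
```

Because the two schools can differ a lot in size, it helps to read the count next to each average before comparing them.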
Do absences affect the grade?
SELECT Absences, COUNT(Absences) AS NoOfStudent, AVG(G1) AS AvgOf1st, AVG(G2) AS AvgOf2nd, AVG(G3) AS AvgOfFinalGrade
FROM StudentInfo
GROUP BY Absences;
Do absences affect the grade?
Maybe yes.
We plotted the final grade against absences in a scatter plot.

Most students with a final grade above 15 have fewer than 10 absences.

Most students with a final grade below 10 also have few absences.

Students with more than 20 absences have a grade of around 10 or lower.
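One way to check the scatter-plot reading numerically is to group students into absence ranges and average the final grade in each range. This is a rough sketch on hypothetical (absences, G3) pairs; the bucket edges at 10 and 20 are assumptions taken from the thresholds mentioned on the slide, and the helper names are illustrative.

```python
from collections import defaultdict

# Hypothetical (absences, G3) pairs for illustration only.
SAMPLE = [(0, 16), (2, 15), (4, 9), (8, 17), (12, 11), (22, 10), (30, 8)]

def bucket(absences):
    """Map an absence count to a coarse label (bucket edges are an assumption)."""
    if absences < 10:
        return "0-9"
    if absences <= 20:
        return "10-20"
    return ">20"

def average_grade_by_bucket(pairs):
    """Average final grade per absence bucket."""
    grades = defaultdict(list)
    for absences, g3 in pairs:
        grades[bucket(absences)].append(g3)
    return {label: sum(values) / len(values) for label, values in grades.items()}

averages = average_grade_by_bucket(SAMPLE)
print(averages)
```

Bucketing avoids the problem with GROUP BY Absences, where many absence values are shared by only one or two students and the averages become noisy.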
Does extra support benefit grades?
SELECT School, COUNT(School) AS NoOfStudent, SchoolSup, AVG(G1) AS AvgGradeIn1stPeriod, AVG(G2) AS AvgGradeIn2ndPeriod, AVG(G3) AS AvgFinalGrade
FROM StudentInfo
GROUP BY SchoolSup, School
ORDER BY School;
Does extra support benefit grades?
Not clearly.
We found that students who have school support do not have better grades than students who don’t have school support.

Avg Grade
SchSup: 9.43 NoSchSup: 10.67

We suspect that “school support” is aimed at students whose results are weak, to help them improve.

Because MS did not provide support to its students, we cannot analyze the result for that school.
INSIGHT #2
How does the family decide “who” becomes the guardian of the student?
Is it because one parent has a higher level of education, or has a teaching-related profession?
Does the parents’ cohabitation status have a say in this attribute?
What influences the attribute “Guardian”?
“Guardian” correlates with…

THEORY 1: Mother’s/Father’s Level of Education
THEORY 2: Mother’s/Father’s Job
THEORY 3: Parents’ Living Status
SELECT COUNT(T.ID) AS Edu_Guardian,
  COUNT(StudentInfo.ID) AS All_Guardian,
  All_Guardian - Edu_Guardian AS NonEdu_Guardian
FROM
  (SELECT ID, Medu, Fedu, guardian
   FROM StudentInfo
   WHERE (Medu > Fedu AND guardian = "mother") OR (Fedu > Medu AND guardian = "father")) AS T
RIGHT JOIN StudentInfo ON T.ID = StudentInfo.ID;

Edu_Guardian
133

NonEdu_Guardian
262

It is not always the parent with the higher education level who takes the role of the guardian.
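The RIGHT JOIN pattern above counts matching rows against all rows. The same two counts can be obtained in one pass with conditional aggregation, sketched here with Python's sqlite3 module on hypothetical rows (this swaps in a different technique, not the authors' Access query; `count_edu_guardians` is an illustrative name). The toy rows also show that, under this condition, families where both parents have the same education level, or where the guardian is "other", fall into the non-matching group.

```python
import sqlite3

# Hypothetical toy rows (ID, Medu, Fedu, guardian); not the real data.
ROWS = [
    (1, 4, 2, "mother"),  # more-educated parent is the guardian
    (2, 1, 3, "father"),  # more-educated parent is the guardian
    (3, 2, 4, "mother"),  # less-educated parent is the guardian
    (4, 3, 3, "mother"),  # equal education: counted as a non-match
    (5, 4, 1, "other"),   # guardian is neither parent: non-match
]

QUERY = """
SELECT
    SUM(CASE WHEN (Medu > Fedu AND guardian = 'mother')
               OR (Fedu > Medu AND guardian = 'father')
             THEN 1 ELSE 0 END) AS Edu_Guardian,
    COUNT(*) AS All_Guardian
FROM StudentInfo
"""

def count_edu_guardians(rows):
    """Return (Edu_Guardian, All_Guardian, NonEdu_Guardian) for the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE StudentInfo (ID INTEGER, Medu INTEGER, Fedu INTEGER, guardian TEXT)"
    )
    conn.executemany("INSERT INTO StudentInfo VALUES (?, ?, ?, ?)", rows)
    edu, total = conn.execute(QUERY).fetchone()
    conn.close()
    return edu, total, total - edu

counts = count_edu_guardians(ROWS)
print(counts)
```

Since ties and "other" guardians land in the non-matching count, that count is not the same as "the less-educated parent is the guardian"; a stricter comparison would exclude those rows first.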
SELECT COUNT(S.ID) AS Job_Guardian, COUNT(A.ID) AS All_Guardian,
  All_Guardian - Job_Guardian AS NonJob_Guardian
FROM
  (SELECT ID, Mjob, Fjob, guardian
   FROM StudentInfo
   WHERE (Mjob = "at_home" AND guardian = "mother") OR (Fjob = "at_home" AND guardian = "father")) AS S
RIGHT JOIN
  (SELECT *
   FROM StudentInfo
   WHERE Mjob = "at_home" OR Fjob = "at_home") AS A
ON S.ID = A.ID;

In a family, if one parent is “at home”, then this parent has a higher chance of being the guardian of the student.
SELECT COUNT(T.ID) AS Teacher_Guardian, COUNT(A.ID) AS All_Guardian,
  All_Guardian - Teacher_Guardian AS NonTeacher_Guardian
FROM
  (SELECT *
   FROM (SELECT ID, Mjob, Fjob, guardian
         FROM StudentInfo
         WHERE Mjob = "teacher" OR Fjob = "teacher") AS TeacherFamilies
   WHERE (Mjob = "teacher" AND guardian = "mother") OR (Fjob = "teacher" AND guardian = "father")) AS T
RIGHT JOIN
  (SELECT *
   FROM StudentInfo
   WHERE Mjob = "teacher" OR Fjob = "teacher") AS A
ON T.ID = A.ID;

In a family, if one parent is a “teacher”, then this parent has a higher chance of being the guardian of the student.
SELECT COUNT(A.ID) AS Apart_other, COUNT(B.ID) AS All_other,
  All_other - Apart_other AS NonApart_other
FROM
  (SELECT ID, Pstatus, guardian
   FROM StudentInfo
   WHERE Pstatus = "A" AND guardian = "other") AS A
RIGHT JOIN
  (SELECT ID, Medu, Fedu, Mjob, Fjob, Pstatus, guardian
   FROM StudentInfo
   WHERE guardian = "other") AS B
ON A.ID = B.ID;

The “other” value of the attribute “Guardian” does not seem to be impacted by the cohabitation status of the parents.
INSIGHT #3
The impact of family features on students’ performance, such as:
free time / study time / travel time / internet usage / activities / Mjob and Fjob
The overall distribution based on Mjob/Fjob
Before we look at the insight, we need to know: is the data reliable?
M/Fjob v freetime
We can see that the average mark is higher for the “teacher” group and the “health” group.
M/Fjob(teacher) v freetime
M/Fjob(at_home) v freetime
We can see that neither of those groups has the least free time after school.
M/Fjob(teacher) v Traveltime & Studytime
After that, we want to look more closely at their education model in terms of traveltime & studytime.
M/Fjob(at_home) v Traveltime & Studytime
They try to avoid spending too much time on travel and study.
Highest traveltime
Highest studytime
NO Internet & Activities
Internet & Activities also reflect how parents teach their child.

Students with no internet and no extra-curricular activities can perform better by putting more time into studying.
Internet & Activities

Students with both internet and extra-curricular activities can perform better than average.
Internet & Activities (Mjob=teacher)
By:
SELECT COUNT(*)
FROM StudentInfo
WHERE Fjob = 'teacher' AND internet = 'yes' AND activities = 'yes';

There are 12 records, which gives a sense of how reliable the result for this group is.
INSIGHT #4
How student behavior affects students’ performance.
Attributes included:
studytime, goout, Dalc, romantic.
How studytime affects the grade
SELECT studytime, COUNT(studytime) AS numberofstudent, AVG(G1)+AVG(G2)+AVG(G3) AS TotalAvgGrade
FROM StudentInfo
GROUP BY studytime;

studytime - weekly study time (numeric: 1 - <2 hours, 2 - 2 to 5 hours, 3 - 5 to 10 hours, or 4 - >10 hours)
Students who study more than 10 hours per week achieve a total average grade (G1+G2+G3) of over 35, about 5 points more than students who study less than 2 hours per week.
We can observe a clear trend: more study time goes with a higher grade.
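Note that TotalAvgGrade adds three averages, so it sits on a 0 to 60 scale; a total of 35 corresponds to roughly 11.7 points per period. The sketch below recomputes this metric per studytime level in plain Python on hypothetical rows (the function name and the data are illustrative assumptions, not the authors' results).

```python
from collections import defaultdict

# Hypothetical toy rows (studytime, G1, G2, G3); not the real dataset.
SAMPLE = [
    (1, 10, 10, 11),
    (1, 8, 9, 9),
    (4, 13, 12, 14),
    (4, 11, 12, 12),
]

def total_avg_grade_by_studytime(rows):
    """Return {studytime: AVG(G1) + AVG(G2) + AVG(G3)} for each group."""
    groups = defaultdict(list)
    for studytime, g1, g2, g3 in rows:
        groups[studytime].append((g1, g2, g3))
    totals = {}
    for studytime, grades in groups.items():
        n = len(grades)
        # Average each period separately, then add the three averages.
        totals[studytime] = sum(sum(g[i] for g in grades) / n for i in range(3))
    return totals

totals = total_avg_grade_by_studytime(SAMPLE)
# Dividing by 3 puts the metric back on the familiar 0-20 grade scale.
per_period = {level: total / 3 for level, total in totals.items()}
print(totals)
print(per_period)
```

Reporting the per-period value alongside the total makes the group differences easier to compare with the 0 to 20 grades listed in the dataset description.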
How a romantic relationship affects the grade
SELECT romantic, COUNT(romantic) AS numberofstudent, AVG(G1)+AVG(G2)+AVG(G3) AS TotalAvgGrade
FROM StudentInfo
GROUP BY romantic;

We found that the number of students with a romantic relationship is about half the number of students without one, and their total average grade (G1+G2+G3) is about 2 points lower.
How goout affects the grade
SELECT goout, COUNT(goout) AS NoOfStudent, AVG(G1)+AVG(G2)+AVG(G3) AS TotalAvgG
FROM StudentInfo
GROUP BY goout;

goout - going out with friends (numeric: from 1 - very low to 5 - very high)
From the observation, the goout attribute has a negative correlation with grades, except for the lowest group.
This suggests that a very low frequency of going out with friends might also have a negative impact on students’ study performance.
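The "negative correlation except for the lowest group" pattern can be made concrete by computing a Pearson correlation with and without level 1. The sketch below does this on hypothetical group averages shaped like the pattern described; the numbers are invented for illustration, and correlating five group averages is only a rough check, not a test on the individual students.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# goout levels and HYPOTHETICAL total average grades per level.
LEVELS = [1, 2, 3, 4, 5]
TOTAL_AVG = [30.0, 34.0, 33.0, 31.0, 29.0]

r_all = pearson(LEVELS, TOTAL_AVG)
r_without_lowest = pearson(LEVELS[1:], TOTAL_AVG[1:])
print(round(r_all, 3), round(r_without_lowest, 3))
```

With data of this shape, the correlation is weakly negative over all five levels and strongly negative once level 1 is left out, which matches the reading that the lowest group breaks the trend.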
How Dalc affects the grade
SELECT Dalc, COUNT(Dalc) AS numberofstudent, AVG(G1)+AVG(G2)+AVG(G3) AS TotalAvgGrade
FROM StudentInfo
GROUP BY Dalc;

Dalc - workday alcohol consumption (numeric: from 1 - very low to 5 - very high)

From the result, about 70% of the students are in the lowest group of workday alcohol consumption (1), and they have the highest total average grade. Keeping workday alcohol consumption low appears to benefit the grades of most students.
Overall
In general, we can observe that most good/active personal behaviors (longer study time, lower workday alcohol consumption) can have a positive impact on a student’s grade.
Besides, students who focus more on socializing or building relationships with friends might see a slight drop in their grades.
Conclusions
We have seen that many insights can be drawn from a dataset made of students’ basic information, and we believe that, with more care given to this kind of dataset, many other studies could be developed in the future.