0% found this document useful (0 votes)
50 views9 pages

Student Grade Prediction

The document describes student performance data with 33 columns. It analyzes the distribution of final grades, finding most students scored between 8-14. It checks for null values and finds none. Additional analysis includes comparing grades by sex and address, finding the number of male and female students, and correlating features to final grade. A new column is created for average grade across years and non-essential features are dropped.

Uploaded by

Tuba Saleha
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as TXT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
50 views9 pages

Student Grade Prediction

The document describes student performance data with 33 columns. It analyzes the distribution of final grades, finding most students scored between 8-14. It checks for null values and finds none. Additional analysis includes comparing grades by sex and address, finding the number of male and female students, and correlating features to final grade. A new column is created for average grade across years and non-essential features are dropped.

Uploaded by

Tuba Saleha
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as TXT, PDF, TXT or read online on Scribd
You are on page 1/ 9

IMPLEMENTATION CODE:

# %% [code]
import pandas as pd
import numpy as np

Text cell <C8VtdiHQZff0>


# %% [markdown]
There are 33 columns:

school: student's school (binary: 'GP' - Gabriel Pereira or 'MS' - Mousinho da


Silveira)

sex: student's sex (binary: 'F' - female or 'M' - male)

age: student's age (numeric: from 15 to 22)

address: student's home address type (binary: 'U' - urban or 'R' - rural)

famsize: family size (binary: 'LE3' - less or equal to 3 or 'GT3' - greater than 3)

Pstatus: parent's cohabitation status (binary: 'T' - living together or 'A' -


apart)

Medu: mother's education (numeric: 0 - none, 1 - primary education (4th grade), 2


– 5th to 9th grade, 3 – secondary education or 4 – higher education)

Fedu: father's education (numeric: 0 - none, 1 - primary education (4th grade), 2


– 5th to 9th grade, 3 – secondary education or 4 – higher education)

Mjob: mother's job (nominal: 'teacher', 'health' care related, civil 'services'
(e.g. administrative or police), 'at_home' or 'other')

Fjob: father's job (nominal: 'teacher', 'health' care related, civil 'services'
(e.g. administrative or police), 'at_home' or 'other')

reason: reason to choose this school (nominal: close to 'home', school


'reputation', 'course' preference or 'other')

guardian: student's guardian (nominal: 'mother', 'father' or 'other')

traveltime: home to school travel time (numeric: 1 - <15 min., 2 - 15 to 30 min., 3


- 30 min. to 1 hour, or 4 - >1 hour)

studytime: weekly study time (numeric: 1 - <2 hours, 2 - 2 to 5 hours, 3 - 5 to 10


hours, or 4 - >10 hours)

failures: number of past class failures (numeric: n if 1<=n<3, else 4)

schoolsup: extra educational support (binary: yes or no)

famsup: family educational support (binary: yes or no)

paid: extra paid classes within the course subject (Math or Portuguese) (binary:
yes or no)

activities: extra-curricular activities (binary: yes or no)

nursery: attended nursery school (binary: yes or no)


higher: wants to take higher education (binary: yes or no)

internet: Internet access at home (binary: yes or no)

romantic: with a romantic relationship (binary: yes or no)

famrel: quality of family relationships (numeric: from 1 - very bad to 5 -


excellent)

freetime: free time after school (numeric: from 1 - very low to 5 - very high)

goout: going out with friends (numeric: from 1 - very low to 5 - very high)

Dalc: workday alcohol consumption (numeric: from 1 - very low to 5 - very high)

Walc: weekend alcohol consumption (numeric: from 1 - very low to 5 - very high)

health: current health status (numeric: from 1 - very bad to 5 - very good)

absences: number of school absences (numeric: from 0 to 93)

G1: first period grade (numeric: from 0 to 20)

G2: second period grade (numeric: from 0 to 20)

G3: final grade (numeric: from 0 to 20)

Code cell <6kTGMtp3RftE>


# %% [code]
data=pd.read_csv("/content/student-mat.csv")

Code cell <0mWovy-hRjBC>


# %% [code]
data.head()
Execution output
6KB
text/plain
school sex age address famsize Pstatus ... Walc health absences G1 G2 G3
0 GP F 18 U GT3 A ... 1 3 6 5 6
6
1 GP F 17 U GT3 T ... 1 3 4 5 5
6
2 GP F 15 U LE3 T ... 3 3 10 7 8
10
3 GP F 15 U GT3 T ... 1 5 2 15 14
15
4 GP F 16 U GT3 T ... 2 5 4 6 10
10

[5 rows x 33 columns]

Text cell <30FGjJoaR7XC>


# %% [markdown]
## Final grade

Code cell <WDKi9jUsRkcD>


# %% [code]
data['G3'].describe()
Execution output
0KB
text/plain
count 395.000000
mean 10.415190
std 4.581443
min 0.000000
25% 8.000000
50% 11.000000
75% 14.000000
max 20.000000
Name: G3, dtype: float64

Code cell <f6AB7NgmR-k7>


# %% [code]
import seaborn as sns
import matplotlib.pyplot as plt
demo= sns.countplot(data['G3'])
demo.axes.set_title('Distribution of Final grade of students', fontsize = 35)
demo.set_xlabel('Final Grade', fontsize = 20)
demo.set_ylabel('Count', fontsize = 20)
plt.show()

Execution output
20KB
Stream
/usr/local/lib/python3.6/dist-packages/seaborn/_decorators.py:43:
FutureWarning: Pass the following variable as a keyword arg: x. From version 0.12,
the only valid positional argument will be `data`, and passing other arguments
without an explicit keyword will result in an error or misinterpretation.
FutureWarning
text/plain
<Figure size 432x288 with 1 Axes>

Text cell <fbdewJ73Sqx3>


# %% [markdown]
Something seems off here. Apart from the high number of students scoring 0, the
distribution is normal as expected. Maybe the value 0 is used in place of null. Or
maybe the students who did not appear for the exam, or were not allowed to sit for
the exam due to some reason are marked as 0. We cannot be sure. Let us check the
table for null values

Code cell <oc1hFnHRSbuo>


# %% [code]
data.isnull().any()
Execution output
1KB
text/plain
school False
sex False
age False
address False
famsize False
Pstatus False
Medu False
Fedu False
Mjob False
Fjob False
reason False
guardian False
traveltime False
studytime False
failures False
schoolsup False
famsup False
paid False
activities False
nursery False
higher False
internet False
romantic False
famrel False
freetime False
goout False
Dalc False
Walc False
health False
absences False
G1 False
G2 False
G3 False
dtype: bool

Code cell <7uWC74qRSvzk>


# %% [code]
data.isnull().sum()
Execution output
1KB
text/plain
school 0
sex 0
age 0
address 0
famsize 0
Pstatus 0
Medu 0
Fedu 0
Mjob 0
Fjob 0
reason 0
guardian 0
traveltime 0
studytime 0
failures 0
schoolsup 0
famsup 0
paid 0
activities 0
nursery 0
higher 0
internet 0
romantic 0
famrel 0
freetime 0
goout 0
Dalc 0
Walc 0
health 0
absences 0
G1 0
G2 0
G3 0
dtype: int64

Code cell <hX2bZN2WS0C5>


# %% [code]
male_student = len(data[data['sex'] == 'M'])
female_student= len(data[data['sex'] == 'F'])
print('Number of male students:',male_student)
print('Number of female students:',female_student)
Execution output
0KB
Stream
Number of male students: 187
Number of female students: 208

Code cell <rq-2VhqKS3tH>


# %% [code]
demo= sns.countplot('age',hue='sex', data=data)
demo.axes.set_title('age groups',fontsize=30)
demo.set_xlabel("Age",fontsize=30)
demo.set_ylabel("Count",fontsize=20)
plt.show()
Execution output
13KB
Stream
/usr/local/lib/python3.6/dist-packages/seaborn/_decorators.py:43:
FutureWarning: Pass the following variable as a keyword arg: x. From version 0.12,
the only valid positional argument will be `data`, and passing other arguments
without an explicit keyword will result in an error or misinterpretation.
FutureWarning
text/plain
<Figure size 432x288 with 1 Axes>

Code cell <UjwF_i9ATI8G>


# %% [code]
demo = sns.countplot(data['address'])
demo.axes.set_title('Urban Vs rural students', fontsize = 30)
demo.set_xlabel('Address', fontsize = 20)
demo.set_ylabel('Count', fontsize = 20)
plt.show()
Execution output
14KB
Stream
/usr/local/lib/python3.6/dist-packages/seaborn/_decorators.py:43:
FutureWarning: Pass the following variable as a keyword arg: x. From version 0.12,
the only valid positional argument will be `data`, and passing other arguments
without an explicit keyword will result in an error or misinterpretation.
FutureWarning
text/plain
<Figure size 432x288 with 1 Axes>

Code cell <8y1hBr0eTR3y>


# %% [code]
data.corr()['G3'].sort_values()

Execution output
1KB
text/plain
failures -0.360415
age -0.161579
goout -0.132791
traveltime -0.117142
health -0.061335
Dalc -0.054660
Walc -0.051939
freetime 0.011307
absences 0.034247
famrel 0.051363
studytime 0.097820
Fedu 0.152457
Medu 0.217147
G1 0.801468
G2 0.904868
G3 1.000000
Name: G3, dtype: float64

Text cell <aIIjgBT0ZpEa>


# %% [markdown]
From the data above, let's create one more column to get the average grade from G1
to G3 (3 years average):

Code cell <t8t_3olyZr_3>


# %% [code]
data['GradeAvg'] = (data['G1'] + data['G2'] + data['G3']) / 3

Code cell <rUurlCOuaIsX>


# %% [code]

Text cell <AOW12U4IahVF>


# %% [markdown]
Next, we can drop school name and age feature because it is not a computational
value

Code cell <Fz5GzMEbafv9>


# %% [code]
data.drop(["school","age"], axis=1, inplace=True)

Code cell <Gp5OXt8FaYsS>


# %% [code]
data.head()
Execution output
6KB
text/plain
sex address famsize Pstatus Medu ... absences G1 G2 G3 GradeAvg
0 F U GT3 A 4 ... 6 5 6 6 5.666667
1 F U GT3 T 1 ... 4 5 5 6 5.333333
2 F U LE3 T 1 ... 10 7 8 10 8.333333
3 F U GT3 T 4 ... 2 15 14 15 14.666667
4 F U GT3 T 3 ... 4 6 10 10 8.666667

[5 rows x 32 columns]

Code cell <9EyvxJmBU9Ht>


# %% [code]
data_dum=data

Code cell <BhICZVx9bFks>


# %% [code]
#Converting to categorical value
categorical_d = {'yes': 1, 'no': 0}
data_dum['schoolsup'] = data_dum['schoolsup'].map(categorical_d)
data_dum['famsup'] = data_dum['famsup'].map(categorical_d)
data_dum['paid'] = data_dum['paid'].map(categorical_d)
data_dum['activities'] = data_dum['activities'].map(categorical_d)
data_dum['nursery'] = data_dum['nursery'].map(categorical_d)
data_dum['higher'] = data_dum['higher'].map(categorical_d)
data_dum['internet'] = data_dum['internet'].map(categorical_d)
data_dum['romantic'] = data_dum['romantic'].map(categorical_d)

Code cell <EwbqUkybU8lh>


# %% [code]

Code cell <uTPRN1pUbh8g>


# %% [code]
categorical_d = {'F': 1, 'M': 0}
data_dum['sex'] = data_dum['sex'].map(categorical_d)

# map the address data


categorical_d = {'U': 1, 'R': 0}
data_dum['address'] = data_dum['address'].map(categorical_d)

# map the famili size data


categorical_d = {'LE3': 1, 'GT3': 0}
data_dum['famsize'] = data_dum['famsize'].map(categorical_d)

# map the parent's status


categorical_d= {'T': 1, 'A': 0}
data_dum['Pstatus'] = data_dum['Pstatus'].map(categorical_d)

# map the parent's job


categorical_d = {'teacher': 0, 'health': 1, 'services': 2,'at_home': 3,'other': 4}
data_dum['Mjob'] = data_dum['Mjob'].map(categorical_d)
data_dum['Fjob'] = data_dum['Fjob'].map(categorical_d)

# map the reason data


categorical_d= {'home': 0, 'reputation': 1, 'course': 2,'other': 3}
data_dum['reason'] = data_dum['reason'].map(categorical_d)

# map the guardian data


categorical_d = {'mother': 0, 'father': 1, 'other': 2}
data_dum['guardian'] = data_dum['guardian'].map(categorical_d)

Code cell <PZSeJw1Ab2Yn>


# %% [code]
data_dum.columns
Execution output
0KB
text/plain
Index(['sex', 'address', 'famsize', 'Pstatus', 'Medu', 'Fedu', 'Mjob', 'Fjob',
'reason', 'guardian', 'traveltime', 'studytime', 'failures',
'schoolsup', 'famsup', 'paid', 'activities', 'nursery', 'higher',
'internet', 'romantic', 'famrel', 'freetime', 'goout', 'Dalc', 'Walc',
'health', 'absences', 'G1', 'G2', 'G3', 'GradeAvg'],
dtype='object')

Code cell <CdE4pW6OYo5K>


# %% [code]
from sklearn.model_selection import train_test_split
x=data_dum.drop("G3",axis=1)
y=data_dum['G3']

Code cell <fV9JfD0lcUC1>


# %% [code]
data_dum['G3']
Execution output
0KB
text/plain
0 6
1 6
2 10
3 15
4 10
..
390 9
391 16
392 7
393 10
394 9
Name: G3, Length: 395, dtype: int64

Code cell <aa9Ckc5NTWni>


# %% [code]

X_train, X_test, y_train, y_test = train_test_split(x,y, test_size = 0.20,


random_state=44)

Code cell <jCfQjFKhYXZK>


# %% [code]
from sklearn.linear_model import LinearRegression

Code cell <FdsAu2ZaZB-m>


# %% [code]
L=LinearRegression()

Code cell <k-BzJyL8ZDrC>


# %% [code]
L.fit(X_train, y_train)
Execution output
0KB
text/plain
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

Code cell <hBcx_9tGZGDX>


# %% [code]
y_pred=L.predict(X_test)

Code cell <35XBb8PVcvm->


# %% [code]
print(L.score(X_test, y_test))
Execution output
0KB
Stream
1.0

Code cell <wRqSkq8Ic1yd>


# %% [code]

You might also like