23/01/2024, 15:39 Data Preprocessing.ipynb - Colaboratory
import pandas as pd
Data Loading
df=pd.read_csv("/content/studentdataset (1)
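The loading step can be tried without the original file. The sketch below is only an illustration: the CSV text is a hypothetical three-row stand-in (not the student dataset), and it assumes that blank fields are what produce the NaN values seen later in the notebook.

```python
import io
import pandas as pd

# Hypothetical stand-in for the CSV file; the real file is not reproduced here.
csv_text = """ID,class,gender,Maths,History
1141,A,male,,
1142,A,female,,
1236,A,female,82,88
"""

# read_csv accepts a path or any file-like object; empty fields are read as NaN.
sample = pd.read_csv(io.StringIO(csv_text))
print(sample.shape)
print(sample.isnull().sum())
```

Because Maths and History contain NaN, pandas stores them as float64, which matches the dtypes reported by the notebook.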
Data Exploration
print(type(df))
<class 'pandas.core.frame.DataFrame'>
print(df.head(10))
ID class gender race GPA Maths English Science computer \
0 1141 A male 1 73.47 NaN 81 87 60
1 1142 A female 1 71.22 NaN 50 51 51
2 1143 A female 2 74.56 NaN 48 71 60
3 1144 A female 1 72.89 NaN 72 38 60
4 1145 A female 1 70.11 NaN 45 63 60
5 1146 A male 3 65.04 NaN 60 39 61
6 1147 A male 4 77.11 NaN 43 52 63
7 1148 A female 5 64.75 NaN 38 60 63
8 1149 B female 5 77.92 NaN 60 66 68
9 1150 A female 5 76.50 NaN 61 60 69
History from1 from2 from3 from4 y
0 NaN A A A 3 0
1 NaN B A A 2 0
2 NaN C A A 0 1
3 NaN D A A 0 0
4 NaN E A A 0 0
5 NaN F B C 0 0
6 NaN G A A 0 1
7 NaN H B C 0 0
8 80.0 I B A 0 0
9 NaN H B A 0 0
print(df.tail(10))
ID class gender race GPA Maths English Science computer \
95 1236 A female 1 87.63 82.0 81 97 97
96 1237 A male 2 91.74 94.0 100 96 97
97 1238 A male 1 91.14 98.0 90 98 97
98 1239 A male 1 90.31 84.0 82 99 97
https://colab.research.google.com/drive/1vIKnipi5ArHZTw4oTEXC_s-Po0wlAVrW#scrollTo=3-HZ-kQXyJML&printMode=true 1/7
99 1240 B male 1 88.10 87.0 70 95 97
100 1241 A female 1 88.34 87.0 83 92 98
101 1242 B male 1 89.84 98.0 77 95 98
102 1243 B male 1 88.82 83.0 80 91 98
103 1244 A male 1 86.60 92.0 82 91 99
104 1245 A male 1 93.71 93.0 97 99 100
History from1 from2 from3 from4 y
95 88.0 J B A 2 0
96 95.0 C B S 0 2
97 83.0 AA B A 0 1
98 89.0 P B A 0 2
99 91.0 AB B A 0 0
100 93.0 M B A 0 1
101 96.0 A B A 0 1
102 93.0 T B A 0 2
103 94.0 S B A 0 2
104 97.0 K B A 0 2
print(df.info())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 105 entries, 0 to 104
Data columns (total 15 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 ID 105 non-null int64
1 class 105 non-null object
2 gender 105 non-null object
3 race 105 non-null int64
4 GPA 105 non-null float64
5 Maths 46 non-null float64
6 English 105 non-null int64
7 Science 105 non-null int64
8 computer 105 non-null int64
9 History 90 non-null float64
10 from1 105 non-null object
11 from2 105 non-null object
12 from3 105 non-null object
13 from4 105 non-null int64
14 y 105 non-null int64
dtypes: float64(3), int64(7), object(5)
memory usage: 12.4+ KB
None
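The trailing None above appears because info() prints its summary itself and returns None, which print() then displays. A minimal sketch on a hypothetical two-row frame shows this, along with one way to capture the summary as text:

```python
import io
import pandas as pd

# Hypothetical two-row frame, used only to demonstrate info().
sample = pd.DataFrame({"Maths": [None, 82.0], "English": [81, 50]})

# info() writes the summary to stdout and returns None.
result = sample.info()

# Passing a buffer captures the summary as a string instead of printing it.
buffer = io.StringIO()
sample.info(buf=buffer)
summary = buffer.getvalue()
```

Calling df.info() on its own, without print(), gives the same summary without the extra None line.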
print(df.describe())
ID race GPA Maths English Science \
count 105.000000 105.000000 105.000000 46.000000 105.000000 105.000000
mean 1193.000000 1.790476 82.957048 87.021739 71.961905 78.942857
std 30.454885 1.673867 6.053187 5.327034 12.197039 14.997326
min 1141.000000 1.000000 63.490000 80.000000 38.000000 17.000000
25% 1167.000000 1.000000 79.340000 82.000000 64.000000 71.000000
50% 1193.000000 1.000000 84.110000 87.000000 73.000000 83.000000
75% 1219.000000 1.000000 87.300000 92.000000 80.000000 91.000000
max 1245.000000 7.000000 93.710000 98.000000 100.000000 99.000000
computer History from4 y
count 105.000000 90.000000 105.000000 105.000000
mean 85.133333 87.011111 0.504762 0.714286
std 10.269509 6.336083 0.889293 0.828742
min 51.000000 75.000000 0.000000 0.000000
25% 80.000000 82.000000 0.000000 0.000000
50% 87.000000 87.500000 0.000000 0.000000
75% 92.000000 92.750000 0.000000 1.000000
max 100.000000 97.000000 3.000000 2.000000
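describe() summarises only the numeric columns by default, so class, gender and the from1 to from3 columns are absent from the output above. As a hedged sketch on a hypothetical four-row frame, the non-numeric columns can be summarised like this:

```python
import pandas as pd

# Hypothetical rows, used only to illustrate the call.
sample = pd.DataFrame({
    "class": ["A", "A", "B", "A"],
    "gender": ["male", "female", "female", "female"],
    "English": [81, 50, 48, 72],
})

# Excluding numeric columns leaves the text columns, reported as
# count / unique / top / freq.
text_summary = sample.describe(exclude="number")
print(text_summary)

# value_counts() gives the full frequency table for a single column.
print(sample["gender"].value_counts())
```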
print(df.dtypes)
ID int64
class object
gender object
race int64
GPA float64
Maths float64
English int64
Science int64
computer int64
History float64
from1 object
from2 object
from3 object
from4 int64
y int64
dtype: object
print(df.shape)
(105, 15)
Data Cleaning
df.isnull()
ID class gender race GPA Maths English Science computer History fr
0 False False False False False True False False False True Fa
1 False False False False False True False False False True Fa
2 False False False False False True False False False True Fa
3 False False False False False True False False False True Fa
4 False False False False False True False False False True Fa
... ... ... ... ... ... ... ... ... ... ...
100 False False False False False False False False False False Fa
101 False False False False False False False False False False Fa
102 False False False False False False False False False False Fa
103 False False False False False False False False False False Fa
104 False False False False False False False False False False Fa
105 rows × 15 columns
df.isnull().sum()
ID 0
class 0
gender 0
race 0
GPA 0
Maths 59
English 0
Science 0
computer 0
History 15
from1 0
from2 0
from3 0
from4 0
y 0
dtype: int64
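Expressing the counts as a share of the 105 rows helps judge how much imputation is involved. The sketch below reuses the two non-zero counts from the output above; the variable names are illustrative.

```python
import pandas as pd

# Counts copied from the df.isnull().sum() output; the frame has 105 rows.
n_rows = 105
missing = pd.Series({"Maths": 59, "History": 15})

# Percentage of rows with a missing value in each column.
missing_pct = (missing / n_rows * 100).round(1)
print(missing_pct)
```

On the full frame, df.isnull().mean() * 100 computes the same percentages directly. More than half of the Maths values are missing, so any single fill value will dominate that column.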
df.columns
Index(['ID', 'class', 'gender', 'race', 'GPA', 'Maths', 'English', 'Science',
'computer', 'History', 'from1', 'from2', 'from3', 'from4', 'y'],
dtype='object')
df.columns=['ID', 'class', 'Gender', 'Race', 'GPA', 'Maths', 'English', 'Science',
            'computer', 'History', 'From1', 'From2', 'From3', 'From4', 'Y']
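Assigning a full list to df.columns relies on every name being in the right position. An alternative, sketched here on a hypothetical one-row frame, is rename() with a mapping that lists only the names that change:

```python
import pandas as pd

# Hypothetical one-row frame with a subset of the original column names.
sample = pd.DataFrame(
    {"ID": [1141], "gender": ["male"], "race": [1], "from1": ["A"], "y": [0]}
)

# Only the listed columns are renamed; the rest keep their names.
# rename() returns a new frame and leaves `sample` unchanged.
renamed = sample.rename(
    columns={"gender": "Gender", "race": "Race", "from1": "From1", "y": "Y"}
)
print(list(renamed.columns))
```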
df.head()
ID class Gender Race GPA Maths English Science computer History From1
0 1141 A male 1 73.47 NaN 81 87 60 NaN A
1 1142 A female 1 71.22 NaN 50 51 51 NaN B
2 1143 A female 2 74.56 NaN 48 71 60 NaN C
3 1144 A female 1 72.89 NaN 72 38 60 NaN D
4 1145 A female 1 70.11 NaN 45 63 60 NaN E
print(df['Maths'])
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
...
100 87.0
101 98.0
102 83.0
103 92.0
104 93.0
Name: Maths, Length: 105, dtype: float64
Mean=df['Maths'].mean()
print(Mean)
87.02173913043478
df['Maths']=df['Maths'].fillna(Mean)
df.head()
ID class Gender Race GPA Maths English Science computer History F
0 1141 A male 1 73.47 87.021739 81 87 60 NaN
1 1142 A female 1 71.22 87.021739 50 51 51 NaN
2 1143 A female 2 74.56 87.021739 48 71 60 NaN
3 1144 A female 1 72.89 87.021739 72 38 60 NaN
4 1145 A female 1 70.11 87.021739 45 63 60 NaN
Mean=df['History'].mean()
df['History']=df['History'].fillna(Mean)
df.head()
ID class Gender Race GPA Maths English Science computer History
0 1141 A male 1 73.47 87.021739 81 87 60 87.011111
1 1142 A female 1 71.22 87.021739 50 51 51 87.011111
2 1143 A female 2 74.56 87.021739 48 71 60 87.011111
3 1144 A female 1 72.89 87.021739 72 38 60 87.011111
4 1145 A female 1 70.11 87.021739 45 63 60 87.011111
df.dtypes
ID int64
class object
Gender object
Race int64
GPA float64
Maths float64
English int64
Science int64
computer int64
History float64
From1 object
From2 object
From3 object
From4 int64
Y int64
dtype: object
df['History']=df['History'].astype(int)
df['Maths']=df['Maths'].astype(int)
df.head()
ID class Gender Race GPA Maths English Science computer History From1
0 1141 A male 1 73.47 87 81 87 60 87 A
1 1142 A female 1 71.22 87 50 51 51 87 B
2 1143 A female 2 74.56 87 48 71 60 87 C
3 1144 A female 1 72.89 87 72 38 60 87 D
4 1145 A female 1 70.11 87 45 63 60 87 E
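astype(int) discards the fractional part rather than rounding. For the filled means (87.021739 and 87.011111) both approaches give 87, but for other values they can differ, as this small sketch with illustrative numbers shows:

```python
import pandas as pd

# Illustrative values; only the first matches a value from the notebook.
scores = pd.Series([87.021739, 87.9, 82.0])

truncated = scores.astype(int)        # drops the fraction: 87.9 -> 87
rounded = scores.round().astype(int)  # rounds first:       87.9 -> 88

print(truncated.tolist())
print(rounded.tolist())
```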
df.dtypes
ID int64
class object
Gender object
Race int64
GPA float64
Maths int64
English int64
Science int64
computer int64
History int64
From1 object
From2 object
From3 object
From4 int64
Y int64
dtype: object
df=df.drop(['GPA'],axis=1)
df.head()
ID class Gender Race Maths English Science computer History From1 From2
0 1141 A male 1 87 81 87 60 87 A A
1 1142 A female 1 87 50 51 51 87 B A
2 1143 A female 2 87 48 71 60 87 C A
3 1144 A female 1 87 72 38 60 87 D A
4 1145 A female 1 87 45 63 60 87 E A
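Re-running the drop cell raises a KeyError once GPA is already gone. A hedged sketch on a hypothetical two-row frame shows an equivalent spelling with the columns keyword, plus errors="ignore" to make the cell safe to repeat:

```python
import pandas as pd

# Hypothetical two-row frame containing the column to be removed.
sample = pd.DataFrame({"ID": [1141, 1142], "GPA": [73.47, 71.22], "Maths": [87, 87]})

# columns=[...] is equivalent to passing the labels with axis=1.
trimmed = sample.drop(columns=["GPA"])

# errors="ignore" skips labels that are no longer present.
trimmed_again = trimmed.drop(columns=["GPA"], errors="ignore")
print(list(trimmed_again.columns))
```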