Data Preprocessing.ipynb - Colaboratory
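import pandas as pd
# df is created from the source data file in a step not shown in this export (the file path is not given)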
print(df.head(10))
ID class gender race GPA Maths English Science computer \
0 1141 A male 1 73.47 NaN 81 87 60
1 1142 A female 1 71.22 NaN 50 51 51
2 1143 A female 2 74.56 NaN 48 71 60
3 1144 A female 1 72.89 NaN 72 38 60
4 1145 A female 1 70.11 NaN 45 63 60
5 1146 A male 3 65.04 NaN 60 39 61
6 1147 A male 4 77.11 NaN 43 52 63
7 1148 A female 5 64.75 NaN 38 60 63
8 1149 B female 5 77.92 NaN 60 66 68
9 1150 A female 5 76.50 NaN 61 60 69
print(df.tail(10))
ID class gender race GPA Maths English Science computer \
95 1236 A female 1 87.63 82.0 81 97 97
96 1237 A male 2 91.74 94.0 100 96 97
97 1238 A male 1 91.14 98.0 90 98 97
98 1239 A male 1 90.31 84.0 82 99 97
23/01/2024, 15:39 Data Preprocessing.ipynb - Colaboratory
99 1240 B male 1 88.10 87.0 70 95 97
100 1241 A female 1 88.34 87.0 83 92 98
101 1242 B male 1 89.84 98.0 77 95 98
102 1243 B male 1 88.82 83.0 80 91 98
103 1244 A male 1 86.60 92.0 82 91 99
104 1245 A male 1 93.71 93.0 97 99 100
df.info()  # info() prints its summary directly and returns None, so wrapping it in print() only adds a stray "None"
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 105 entries, 0 to 104
Data columns (total 15 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 ID 105 non-null int64
1 class 105 non-null object
2 gender 105 non-null object
3 race 105 non-null int64
4 GPA 105 non-null float64
5 Maths 46 non-null float64
6 English 105 non-null int64
7 Science 105 non-null int64
8 computer 105 non-null int64
9 History 90 non-null float64
10 from1 105 non-null object
11 from2 105 non-null object
12 from3 105 non-null object
13 from4 105 non-null int64
14 y 105 non-null int64
dtypes: float64(3), int64(7), object(5)
memory usage: 12.4+ KB
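The `dtypes:` line at the end of the `info()` summary is a tally of column types. Below is a minimal sketch that reproduces that tally and splits columns into numeric and non-numeric groups. The `toy` frame and its values are illustrative stand-ins, not rows from this dataset.

```python
import pandas as pd

# Illustrative stand-in for the notebook's df (values are made up)
toy = pd.DataFrame({
    "ID": [1141, 1142, 1143],
    "class": ["A", "A", "B"],
    "GPA": [73.47, 71.22, 74.56],
    "Maths": [None, 82.0, 94.0],  # None becomes NaN, so the column is float64
})

# Count how many columns share each dtype, like the "dtypes:" line of info()
dtype_counts = toy.dtypes.astype(str).value_counts()
print(dtype_counts)

# Split column names into numeric and everything else (e.g. text columns)
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
other_cols = toy.select_dtypes(exclude="number").columns.tolist()
print(numeric_cols)
print(other_cols)
```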
print(df.describe())
ID race GPA Maths English Science \
count 105.000000 105.000000 105.000000 46.000000 105.000000 105.000000
mean 1193.000000 1.790476 82.957048 87.021739 71.961905 78.942857
std 30.454885 1.673867 6.053187 5.327034 12.197039 14.997326
min 1141.000000 1.000000 63.490000 80.000000 38.000000 17.000000
25% 1167.000000 1.000000 79.340000 82.000000 64.000000 71.000000
50% 1193.000000 1.000000 84.110000 87.000000 73.000000 83.000000
75% 1219.000000 1.000000 87.300000 92.000000 80.000000 91.000000
max 1245.000000 7.000000 93.710000 98.000000 100.000000 99.000000
print(df.dtypes)
ID int64
class object
gender object
race int64
GPA float64
Maths float64
English int64
Science int64
computer int64
History float64
from1 object
from2 object
from3 object
from4 int64
y int64
dtype: object
print(df.shape)
(105, 15)
df.isnull()
ID class gender race GPA Maths English Science computer History from1
0 False False False False False True False False False True False
1 False False False False False True False False False True False
2 False False False False False True False False False True False
3 False False False False False True False False False True False
4 False False False False False True False False False True False
... ... ... ... ... ... ... ... ... ... ...
100 False False False False False False False False False False False
101 False False False False False False False False False False False
102 False False False False False False False False False False False
103 False False False False False False False False False False False
104 False False False False False False False False False False False
df.isnull().sum()
ID 0
class 0
gender 0
race 0
GPA 0
Maths 59
English 0
Science 0
computer 0
History 15
from1 0
from2 0
from3 0
from4 0
y 0
dtype: int64
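Raw counts are easier to judge as proportions. With 105 rows, the 59 missing Maths values make up roughly 56% of that column, and the 15 missing History values make up roughly 14%. Below is a small sketch that reports both the count and the percentage. It uses a made-up `toy` frame, not the notebook's `df`.

```python
import numpy as np
import pandas as pd

# Made-up example data with gaps in two columns
toy = pd.DataFrame({
    "Maths": [np.nan, np.nan, 87.0, 92.0],
    "History": [np.nan, 80.0, 85.0, 90.0],
    "English": [81, 50, 48, 72],
})

missing_count = toy.isnull().sum()       # number of NaN per column
missing_pct = toy.isnull().mean() * 100  # share of NaN per column, as a percentage
report = pd.DataFrame({"missing": missing_count, "pct": missing_pct.round(1)})
print(report)
```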
df.columns
Index(['ID', 'class', 'gender', 'race', 'GPA', 'Maths', 'English', 'Science',
'computer', 'History', 'from1', 'from2', 'from3', 'from4', 'y'],
dtype='object')
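Between this `df.columns` cell and the next `df.head()`, the column names change. `gender`, `race`, `from1` to `from4` and `y` become `Gender`, `Race`, `From1` to `From4` and `Y`, but the cell that makes this change is not included in this export. A rename along the following lines would produce the names seen later. The `rename_map` dictionary is a hypothetical reconstruction from the later outputs, not the author's code.

```python
import pandas as pd

# Empty frame with the original column names, for illustration only
toy = pd.DataFrame(columns=['ID', 'class', 'gender', 'race', 'GPA', 'Maths', 'English',
                            'Science', 'computer', 'History', 'from1', 'from2', 'from3',
                            'from4', 'y'])

# Hypothetical mapping inferred from the column names in later outputs
rename_map = {
    'gender': 'Gender', 'race': 'Race',
    'from1': 'From1', 'from2': 'From2', 'from3': 'From3', 'from4': 'From4',
    'y': 'Y',
}
toy = toy.rename(columns=rename_map)  # columns not in the mapping are left unchanged
print(list(toy.columns))
```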
df.head()
ID class Gender Race GPA Maths English Science computer History From1
print(df['Maths'])
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
...
100 87.0
101 98.0
102 83.0
103 92.0
104 93.0
Name: Maths, Length: 105, dtype: float64
maths_mean = df['Maths'].mean()  # mean() skips NaN by default, so this is the mean of the 46 observed values
print(maths_mean)
87.02173913043478
df['Maths'] = df['Maths'].fillna(maths_mean)
df.head()
ID class Gender Race GPA Maths English Science computer History From1
history_mean = df['History'].mean()  # mean of the 90 non-null History values
df['History'] = df['History'].fillna(history_mean)
df.head()
ID class Gender Race GPA Maths English Science computer History
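Each of the two columns is filled with its own mean. Because `Series.mean()` skips NaN by default, the fill value is the mean of the observed entries only. Below is a minimal sketch of the same mean imputation, using illustrative values.

```python
import numpy as np
import pandas as pd

# Illustrative column with two gaps
toy = pd.DataFrame({"Maths": [np.nan, np.nan, 80.0, 90.0, 100.0]})

maths_mean = toy["Maths"].mean()                 # NaN entries are ignored: (80 + 90 + 100) / 3
toy["Maths"] = toy["Maths"].fillna(maths_mean)   # every gap receives that same value
print(maths_mean)
print(toy["Maths"].tolist())
```

Every imputed entry gets the same value. As a result, the column mean is unchanged but its spread shrinks. In this notebook, 59 of the 105 Maths rows end up at 87.02.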
df.dtypes
ID int64
class object
Gender object
Race int64
GPA float64
Maths float64
English int64
Science int64
computer int64
History float64
From1 object
From2 object
From3 object
From4 int64
Y int64
dtype: object
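# astype(int) truncates the fractional part (87.02 -> 87); it only succeeds because the NaNs were filled above
df['History'] = df['History'].astype(int)
df['Maths'] = df['Maths'].astype(int)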
df.head()
ID class Gender Race GPA Maths English Science computer History From1
df.dtypes
ID int64
class object
Gender object
Race int64
GPA float64
Maths int64
English int64
Science int64
computer int64
History int64
From1 object
From2 object
From3 object
From4 int64
Y int64
dtype: object
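`astype(int)` truncates rather than rounds. The filled Maths value 87.02 becomes 87, and a value such as 86.60 would become 86. The sketch below contrasts plain truncation with rounding first. It also shows that the cast raises an error if any NaN remains, which is why the `fillna()` cells have to run before it. The series values are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([87.02, 86.60, 93.71])
truncated = s.astype(int).tolist()        # drops the fractional part
rounded = s.round().astype(int).tolist()  # rounds to the nearest whole number first
print(truncated)  # [87, 86, 93]
print(rounded)    # [87, 87, 94]

# Casting a column that still holds NaN fails with a ValueError
try:
    pd.Series([87.0, np.nan]).astype(int)
    cast_failed = False
except ValueError:
    cast_failed = True
print(cast_failed)  # True
```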
df = df.drop(columns=['GPA'])  # same as drop(['GPA'], axis=1)
df.head()
ID class Gender Race Maths English Science computer History From1 From2
0 1141 A male 1 87 81 87 60 87 A A
1 1142 A female 1 87 50 51 51 87 B A
2 1143 A female 2 87 48 71 60 87 C A
3 1144 A female 1 87 72 38 60 87 D A
4 1145 A female 1 87 45 63 60 87 E A
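The steps after the rename (mean imputation of Maths and History, the integer cast, and dropping GPA) can be bundled into one function so that they are applied the same way every time. `preprocess` below is a hypothetical helper, sketched under the assumption that the column names match this notebook. It works on a copy, so the input frame is left untouched.

```python
import numpy as np
import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper bundling the notebook's cleaning steps."""
    out = df.copy()  # avoid modifying the caller's frame
    for col in ["Maths", "History"]:
        # Fill gaps with the column mean, then truncate to int as the notebook does
        out[col] = out[col].fillna(out[col].mean()).astype(int)
    return out.drop(columns=["GPA"])

# Illustrative input, not rows from the dataset
raw = pd.DataFrame({
    "ID": [1141, 1142, 1143],
    "GPA": [73.47, 71.22, 88.10],
    "Maths": [np.nan, 80.0, 90.0],
    "History": [70.0, np.nan, 81.0],
})
clean = preprocess(raw)
print(clean)
```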