# Create an “Academic performance” dataset of students and perform the following operations using Python.
# Scan all variables for missing values and inconsistencies. If there are missing values or
# inconsistencies, use any suitable technique to deal with them.
import pandas as pd
import opendatasets as od
import matplotlib.pyplot as plt
import numpy as np
od.download("https://www.kaggle.com/datasets/sankha1998/student-semester-result")
Please provide your Kaggle credentials to download this dataset. Learn more: http://bit.ly/kaggle-creds
Your Kaggle username:
Your Kaggle Key:
Downloading student-semester-result.zip to .\student-semester-result
100%|██████████| 2.41k/2.41k [00:00<00:00, 413kB/s]
df = pd.read_csv("student-semester-result/data.csv")
print(df)
      1st   2nd   3rd   4th   5th  College Code  Gender     Roll  Roll no.  \
0    8.11  7.68  7.11  7.43  8.18           115  Female      NaN   17020.0
1    6.48  5.98  4.15  4.29  4.96           115    Male      NaN   17021.0
2    8.41  8.24  7.52  8.25  7.75           115  Female      NaN    1702.0
3    7.33  6.83  6.33  6.79  6.89           115    Male      NaN   17023.0
4    7.89  7.34  7.22  7.32  7.46           115    Male      NaN   17024.0
..    ...   ...   ...   ...   ...           ...     ...      ...       ...
173  7.48  7.55  7.67  7.39  8.65           241       F  17048.0       NaN
174  7.30  6.41  6.59  7.11  7.38           241       M  17049.0       NaN
175  6.30  6.28  5.89  5.71  6.50           241       M  17050.0       NaN
176  7.04  7.10  6.81  7.00  6.92           241       M  17051.0       NaN
177  6.70  6.81  6.52  5.39  7.00           241       M  17052.0       NaN

     Subject Code
0              16
1              16
2              16
3              16
4              16
..            ...
173            28
174            28
175            28
176            28
177            28

[178 rows x 10 columns]
# Scan all variables for missing values and inconsistencies. If there are missing values or
# inconsistencies, use any suitable technique to deal with them.
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 178 entries, 0 to 177
Data columns (total 10 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   1st           176 non-null    float64
 1   2nd           174 non-null    float64
 2   3rd           176 non-null    float64
 3   4th           173 non-null    float64
 4   5th           172 non-null    float64
 5   College Code  178 non-null    int64
 6   Gender        177 non-null    object
 7   Roll          132 non-null    float64
 8   Roll no.      46 non-null     float64
 9   Subject Code  178 non-null    int64
dtypes: float64(7), int64(2), object(1)
memory usage: 14.0+ KB
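In the `df.info()` output, `Roll` has 132 non-null values and `Roll no.` has 46, which add up to exactly 178; in the printed rows, college 115 fills `Roll no.` while college 241 fills `Roll`. That points to one roll number split across two columns. A minimal sketch of merging them, assuming no row fills both columns; the small frame reuses a few printed roll numbers, and the merged column name `Roll_Number` is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample in the shape of the printed rows: each row carries
# its roll number in exactly one of the two columns
df = pd.DataFrame({
    "Roll": [np.nan, np.nan, 17048.0, 17049.0],
    "Roll no.": [17020.0, 17021.0, np.nan, np.nan],
})

# Check the assumption first: no row should fill both columns
assert not (df["Roll"].notna() & df["Roll no."].notna()).any()

# combine_first keeps "Roll" where present and falls back to "Roll no."
df["Roll_Number"] = df["Roll"].combine_first(df["Roll no."]).astype("int64")
df = df.drop(columns=["Roll", "Roll no."])
print(df)
```

If the assertion fails on the real data, some rows disagree and need a closer look before merging.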
df.isnull().sum()
1st               2
2nd               4
3rd               2
4th               5
5th               6
College Code      0
Gender            1
Roll             46
Roll no.        132
Subject Code      0
dtype: int64
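Raw counts are easier to judge as a share of the 178 rows. A short sketch that reports the percentage of missing values per column with `isnull().mean()`, using a hypothetical three-column sample rather than the real file:

```python
import numpy as np
import pandas as pd

# Hypothetical sample; in the notebook this would be the Kaggle data frame
df = pd.DataFrame({
    "1st": [8.11, np.nan, 8.41, 7.33],
    "Gender": ["Female", "Male", None, "Male"],
    "Subject Code": [16, 16, 16, 16],
})

# The mean of a boolean mask is the fraction of True values,
# so this gives the share of missing entries per column
missing_pct = df.isnull().mean() * 100
print(missing_pct)

# Keep only the columns that actually have gaps
print(missing_pct[missing_pct > 0])
```

On the full dataset, `Roll no.` (132 missing out of 178) would come out at about 74.2%, which makes it a poor candidate for imputation and a better candidate for merging with `Roll`.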
# Calculate the mean value of each semester column (NaN entries are skipped)
avg_1st_Marks = df["1st"].astype("float64").mean(axis=0)
avg_2nd_Marks = df["2nd"].astype("float64").mean(axis=0)
avg_3rd_Marks = df["3rd"].astype("float64").mean(axis=0)
avg_4th_Marks = df["4th"].astype("float64").mean(axis=0)
avg_5th_Marks = df["5th"].astype("float64").mean(axis=0)
print("Average marks of 1st Paper:", avg_1st_Marks)
print("Average marks of 2nd Paper:", avg_2nd_Marks)
print("Average marks of 3rd Paper:", avg_3rd_Marks)
print("Average marks of 4th Paper:", avg_4th_Marks)
print("Average marks of 5th Paper:", avg_5th_Marks)
Average marks of 1st Paper: 7.038863636363637
Average marks of 2nd Paper: 6.943390804597701
Average marks of 3rd Paper: 6.6225
Average marks of 4th Paper: 7.027745664739886
Average marks of 5th Paper: 7.432558139534884
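The five separate `mean()` calls can be collapsed into one: selecting the semester columns and calling `DataFrame.mean()` returns a Series with one mean per column. NaN values are skipped by default (`skipna=True`), so each average is taken over that column's non-null values only (176 for `1st`, per `df.info()`). A sketch on a hypothetical two-column sample:

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one gap in each column
df = pd.DataFrame({
    "1st": [8.11, 6.48, np.nan, 7.33],
    "2nd": [7.68, np.nan, 8.24, 6.83],
})

cgpa_columns = ["1st", "2nd"]
# DataFrame.mean() returns one mean per column and skips NaN by default
avg_marks = df[cgpa_columns].mean()
for col, avg in avg_marks.items():
    print(f"Average marks of {col} Paper:", avg)
```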
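# Replace NaN in the "1st" to "5th" columns with each column's mean value.
# Assigning the result back avoids chained inplace calls such as
# df["1st"].replace(..., inplace=True), which recent pandas versions warn about
# and which may not update df under Copy-on-Write.
df["1st"] = df["1st"].fillna(avg_1st_Marks)
df["2nd"] = df["2nd"].fillna(avg_2nd_Marks)
df["3rd"] = df["3rd"].fillna(avg_3rd_Marks)
df["4th"] = df["4th"].fillna(avg_4th_Marks)
df["5th"] = df["5th"].fillna(avg_5th_Marks)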
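Because `fillna` also accepts a Series keyed by column name, the five replacements can be done in one statement. The sketch below, on a hypothetical sample, fills each selected column with its own mean and leaves unselected columns untouched:

```python
import numpy as np
import pandas as pd

# Hypothetical sample with gaps in the score columns and in Gender
df = pd.DataFrame({
    "1st": [8.11, 6.48, np.nan, 7.33],
    "2nd": [7.68, np.nan, 8.24, 6.83],
    "Gender": ["Female", "Male", None, "Male"],
})

cgpa_columns = ["1st", "2nd"]
# Passing a Series to fillna fills each column with the value stored
# under that column's label; Gender is not selected, so it keeps its gap
df[cgpa_columns] = df[cgpa_columns].fillna(df[cgpa_columns].mean())
print(df.isnull().sum())
```

Swapping `.mean()` for `.median()` in the same line is a common alternative when a column has outliers; this sketch does not establish which suits this dataset better.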
df.isnull().sum()
1st               0
2nd               0
3rd               0
4th               0
5th               0
College Code      0
Gender            1
Roll             46
Roll no.        132
Subject Code      0
dtype: int64
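After the fill, `Gender` still has one missing value, and the printed rows use two labelling schemes (`Female`/`Male` for college 115, `F`/`M` for college 241). One way to handle both, sketched on a hypothetical sample: map the long labels onto the short ones, then fill the remaining gap with the most frequent label. Filling with the mode is an assumption of this sketch, not something the notebook does.

```python
import pandas as pd

# Hypothetical sample mixing the two labelling schemes plus one gap
df = pd.DataFrame({"Gender": ["Female", "Male", "F", "M", None, "M"]})

# Map the long labels onto the short ones (choosing F/M is arbitrary)
df["Gender"] = df["Gender"].replace({"Female": "F", "Male": "M"})

# Fill the remaining gap with the most frequent label (mode ignores NaN)
most_common = df["Gender"].mode()[0]
df["Gender"] = df["Gender"].fillna(most_common)
print(df["Gender"].value_counts())
```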
# Apply data transformations to at least one of the variables. The purpose of this
# transformation should be one of the following:
# to change the scale for better understanding of the variable,
# to convert a non-linear relation into a linear one, or
# to decrease the skewness and bring the distribution closer to a normal distribution.
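Which of these purposes a transformation serves can be checked numerically. A small sketch, on a hypothetical sample of first-semester scores taken from the printed rows, showing that dividing by a positive constant (as a percentage rescaling does) leaves the sample skewness unchanged; such a rescaling therefore serves the first purpose only, and reducing skew would need a non-linear transform.

```python
import pandas as pd

# Hypothetical sample of first-semester scores taken from the printed rows
scores = pd.Series([8.11, 6.48, 8.41, 7.33, 7.89, 7.48, 7.30, 6.30, 7.04, 6.70])

# Rescale to a percentage of the highest score
scaled = scores / scores.max() * 100

# Sample skewness is unchanged by multiplying by a positive constant,
# so this kind of rescaling cannot make a distribution more symmetric
print(round(scores.skew(), 6), round(scaled.skew(), 6))
```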
max_1st = df['1st'].max()
max_2nd = df['2nd'].max()
max_3rd = df['3rd'].max()
max_4th = df['4th'].max()
max_5th = df['5th'].max()
print(max_1st, max_2nd, max_3rd, max_4th, max_5th)
9.15 9.21 9.59 9.31 9.46
cgpa_columns = ['1st', '2nd', '3rd', '4th', '5th']
max_values = [max_1st, max_2nd, max_3rd, max_4th, max_5th]
# Express each semester score as a percentage of that semester's highest score
for col, max_value in zip(cgpa_columns, max_values):
    df[col + '_Percentage'] = (df[col] / max_value) * 100
print(df)
      1st   2nd   3rd   4th   5th  College Code  Gender     Roll  Roll no.  \
0    8.11  7.68  7.11  7.43  8.18           115  Female      NaN   17020.0
1    6.48  5.98  4.15  4.29  4.96           115    Male      NaN   17021.0
2    8.41  8.24  7.52  8.25  7.75           115  Female      NaN    1702.0
3    7.33  6.83  6.33  6.79  6.89           115    Male      NaN   17023.0
4    7.89  7.34  7.22  7.32  7.46           115    Male      NaN   17024.0
..    ...   ...   ...   ...   ...           ...     ...      ...       ...
173  7.48  7.55  7.67  7.39  8.65           241       F  17048.0       NaN
174  7.30  6.41  6.59  7.11  7.38           241       M  17049.0       NaN
175  6.30  6.28  5.89  5.71  6.50           241       M  17050.0       NaN
176  7.04  7.10  6.81  7.00  6.92           241       M  17051.0       NaN
177  6.70  6.81  6.52  5.39  7.00           241       M  17052.0       NaN

     Subject Code  1st_Percentage  2nd_Percentage  3rd_Percentage  \
0              16       88.633880       83.387622       74.139729
1              16       70.819672       64.929425       43.274244
2              16       91.912568       89.467970       78.415016
3              16       80.109290       74.158523       66.006257
4              16       86.229508       79.695983       75.286757
..            ...             ...             ...             ...
173            28       81.748634       81.976113       79.979145
174            28       79.781421       69.598263       68.717414
175            28       68.852459       68.186754       61.418144
176            28       76.939891       77.090119       71.011470
177            28       73.224044       73.941368       67.987487

     4th_Percentage  5th_Percentage
0         79.806660       86.469345
1         46.079484       52.431290
2         88.614393       81.923890
3         72.932331       72.832981
4         78.625134       78.858351
..              ...             ...
173       79.377014       91.437632
174       76.369495       78.012685
175       61.331901       68.710359
176       75.187970       73.150106
177       57.894737       73.995772

[178 rows x 15 columns]
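The loop divides each column by its own maximum, so every semester is rescaled so that its top score becomes exactly 100 (all scores here are positive, so the results fall between 0 and 100). The same column-wise division can be written without a loop, because dividing a DataFrame by a Series aligns the Series index with the column labels. A sketch on a hypothetical sample whose last row holds the maxima reported above (9.15 and 9.21); the `_Percentage` suffix follows the notebook.

```python
import pandas as pd

# Hypothetical sample; the last row holds the per-column maxima
df = pd.DataFrame({
    "1st": [8.11, 6.48, 9.15],
    "2nd": [7.68, 5.98, 9.21],
})

cgpa_columns = ["1st", "2nd"]
# max() returns a Series indexed by column name, so div() divides
# each column by its own maximum (alignment is on column labels)
percentages = df[cgpa_columns].div(df[cgpa_columns].max()) * 100
df = df.join(percentages.add_suffix("_Percentage"))
print(df.round(6))
```

For row 0 this gives 8.11 / 9.15 × 100 ≈ 88.63, matching the `1st_Percentage` value printed for row 0 in the notebook output.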