Data Wrangling, 2

Lab experiment: Data Science and Big Data Analytics

Uploaded by

yashisolanki02


Create an "Academic performance" dataset of students and perform the following operations using Python.

# Scan all variables for missing values and inconsistencies. If there are missing values and/or
# inconsistencies, use any of the suitable techniques to deal with them.

import pandas as pd
import opendatasets as od
import matplotlib.pylab as plt
import numpy as np

od.download("https://www.kaggle.com/datasets/sankha1998/student-semester-result")

Please provide your Kaggle credentials to download this dataset. Learn more: http://bit.ly/kaggle-creds
Your Kaggle username:
Your Kaggle Key:
Downloading student-semester-result.zip to .\student-semester-result
100%|██████████| 2.41k/2.41k [00:00<00:00, 413kB/s]

df = pd.read_csv("student-semester-result/data.csv")
print(df)

      1st   2nd   3rd   4th   5th  College Code  Gender     Roll  Roll no.  \
0    8.11  7.68  7.11  7.43  8.18           115  Female      NaN   17020.0
1    6.48  5.90  4.15  4.29  4.96           115    Male      NaN   17021.0
2    8.41  8.24  7.52  8.25  7.75           115  Female      NaN   17022.0
3    7.33  6.83  6.33  6.79  6.89           115    Male      NaN   17023.0
4    7.89  7.34  7.22  7.32  7.46           115    Male      NaN   17024.0
..    ...   ...   ...   ...   ...           ...     ...      ...       ...
173  7.48  7.55  7.67  7.39  8.65           241       F  17048.0       NaN
174  7.30  6.41  6.59  7.11  7.38           241       M  17049.0       NaN
175  6.30  6.28  5.89  5.71  6.50           241       M  17050.0       NaN
176  7.04  7.10  6.81  7.00  6.92           241       M  17051.0       NaN
177  6.70  6.81  6.52  5.39  7.00           241       M  17052.0       NaN

     Subject Code
0              16
1              16
2              16
3              16
4              16
..            ...
173            28
174            28
175            28
176            28
177            28

[178 rows x 10 columns]

# Scan all variables for missing values and inconsistencies. If there are missing values and/or
# inconsistencies, use any of the suitable techniques to deal with them.
df.info()

RangeIndex: 178 entries, 0 to 177
Data columns (total 10 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   1st           176 non-null    float64
 1   2nd           174 non-null    float64
 2   3rd           176 non-null    float64
 3   4th           173 non-null    float64
 4   5th           172 non-null    float64
 5   College Code  178 non-null    int64
 6   Gender        177 non-null    object
 7   Roll          132 non-null    float64
 8   Roll no.      46 non-null     float64
 9   Subject Code  178 non-null    int64
dtypes: float64(7), int64(2), object(1)
memory usage: 14.0+ KB

df.isnull().sum()

1st               2
2nd               4
3rd               2
4th               5
5th               6
College Code      0
Gender            1
Roll             46
Roll no.        132
Subject Code      0
dtype: int64

# calculate the mean value for all subject columns
avg_1st_Marks = df["1st"].astype("float64").mean(axis=0)
avg_2nd_Marks = df["2nd"].astype("float64").mean(axis=0)
avg_3rd_Marks = df["3rd"].astype("float64").mean(axis=0)
avg_4th_Marks = df["4th"].astype("float64").mean(axis=0)
avg_5th_Marks = df["5th"].astype("float64").mean(axis=0)
print("Average marks of 1st Paper:", avg_1st_Marks)
print("Average marks of 2nd Paper:", avg_2nd_Marks)
print("Average marks of 3rd Paper:", avg_3rd_Marks)
print("Average marks of 4th Paper:", avg_4th_Marks)
print("Average marks of 5th Paper:", avg_5th_Marks)

Average marks of 1st Paper: 7.038863636363637
Average marks of 2nd Paper: 6.943390804597701
Average marks of 3rd Paper: 6.6225
Average marks of 4th Paper: 7.027745664739886
Average marks of 5th Paper: 7.432558139534884

# replace NaN by mean value in the "1st" to "5th" columns
# (assign the result back instead of calling replace(..., inplace=True) on a column
# selection, which newer pandas versions warn about and may not apply to df)
df["1st"] = df["1st"].replace(np.nan, avg_1st_Marks)
df["2nd"] = df["2nd"].replace(np.nan, avg_2nd_Marks)
df["3rd"] = df["3rd"].replace(np.nan, avg_3rd_Marks)
df["4th"] = df["4th"].replace(np.nan, avg_4th_Marks)
df["5th"] = df["5th"].replace(np.nan, avg_5th_Marks)

df.isnull().sum()

1st               0
2nd               0
3rd               0
4th               0
5th               0
College Code      0
Gender            1
Roll             46
Roll no.        132
Subject Code      0
dtype: int64

# Apply data transformations on at least one of the variables. The purpose of this
# transformation should be one of the following reasons:
# to change the scale for better understanding of the variable,
# to convert a non-linear relation into a linear one, or
# to decrease the skewness and convert the distribution into a normal distribution.
max_1st = df['1st'].max()
max_2nd = df['2nd'].max()
max_3rd = df['3rd'].max()
max_4th = df['4th'].max()
max_5th = df['5th'].max()
print(max_1st, max_2nd, max_3rd, max_4th, max_5th)

9.15 9.21 9.59 9.31 9.46

# express each semester score as a percentage of the highest score in that semester
cgpa_columns = ['1st', '2nd', '3rd', '4th', '5th']
max_values = [max_1st, max_2nd, max_3rd, max_4th, max_5th]
for col, max_value in zip(cgpa_columns, max_values):
    df[col + '_Percentage'] = (df[col] / max_value) * 100
print(df)

      1st   2nd   3rd   4th   5th  College Code  Gender     Roll  Roll no.  \
0    8.11  7.68  7.11  7.43  8.18           115  Female      NaN   17020.0
1    6.48  5.90  4.15  4.29  4.96           115    Male      NaN   17021.0
2    8.41  8.24  7.52  8.25  7.75           115  Female      NaN   17022.0
3    7.33  6.83  6.33  6.79  6.89           115    Male      NaN   17023.0
4    7.89  7.34  7.22  7.32  7.46           115    Male      NaN   17024.0
..    ...   ...   ...   ...   ...           ...     ...      ...       ...
173  7.48  7.55  7.67  7.39  8.65           241       F  17048.0       NaN
174  7.30  6.41  6.59  7.11  7.38           241       M  17049.0       NaN
175  6.30  6.28  5.89  5.71  6.50           241       M  17050.0       NaN
176  7.04  7.10  6.81  7.00  6.92           241       M  17051.0       NaN
177  6.70  6.81  6.52  5.39  7.00           241       M  17052.0       NaN

     Subject Code  1st_Percentage  2nd_Percentage  3rd_Percentage  \
0              16       88.633880       83.387622       74.139729
1              16       70.819672       64.060803       43.274244
2              16       91.912568       89.467970       78.415016
3              16       80.109290       74.158523       66.006257
4              16       86.229508       79.695983       75.286757
..            ...             ...             ...             ...
173            28       81.748634       81.976113       79.979145
174            28       79.781421       69.598263       68.717414
175            28       68.852459       68.186754       61.418144
176            28       76.939891       77.090119       71.011470
177            28       73.224044       73.941368       67.987487

     4th_Percentage  5th_Percentage
0         79.806660       86.469345
1         46.079484       52.431290
2         88.614393       81.923890
3         72.932331       72.832981
4         78.625134       78.858351
..              ...             ...
173       79.377014       91.437632
174       76.369495       78.012685
175       61.331901       68.710359
176       75.187970       73.150106
177       57.894737       73.995772

[178 rows x 15 columns]
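The mean imputation and max-based rescaling in this notebook can also be written without one variable per column. The following is a minimal sketch, not the notebook's own code: the `scores` frame and its values are made up for illustration, and only two of the five mark columns are shown.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the semester-result data; these values are
# illustrative and are not taken from the Kaggle file.
scores = pd.DataFrame({
    "1st": [8.0, 6.0, np.nan, 7.0],
    "2nd": [7.5, np.nan, 8.5, 6.5],
})
mark_columns = ["1st", "2nd"]

# Mean imputation: DataFrame.mean() skips NaN by default, and fillna()
# given a Series fills each column with the value under its own label.
column_means = scores[mark_columns].mean()
scores[mark_columns] = scores[mark_columns].fillna(column_means)

# Rescale every mark column to a percentage of its own maximum.
percentages = scores[mark_columns].div(scores[mark_columns].max()) * 100
scores = scores.join(percentages.add_suffix("_Percentage"))

print(scores)
```

Working on a list of column names keeps the imputation and the scaling in step if another semester column is added later. Note that dividing by the maximum only changes the scale; it does not reduce skewness, so a different transformation would be needed for that goal.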
