Assignment 2

The document outlines a data cleaning process for a dataset named 'academics.csv' using Python libraries such as pandas, seaborn, and scikit-learn. It loads the data, checks for missing values (none are found) and imputes any that would exist, visualizes outliers in the 'raisedhands' column with a boxplot, caps them using the 1.5 × IQR rule, and applies a Yeo-Johnson power transformation to that column. Finally, the cleaned dataset is saved to a new CSV file named 'cleaned_academics.csv'.

Uploaded by princethakur545454

ASSIGNMENT - 2

In [2]: # Import required libraries

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.preprocessing import PowerTransformer

In [3]: # Load the dataset

file_path = "academics.csv"
data = pd.read_csv(file_path)
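To try the loading step without the original academics.csv, one option is to read a few rows from an in-memory string. This is a minimal sketch: the two rows are copied from the head() output of the dataset, and the name `SAMPLE_CSV` and the column-count check are illustrative additions, not part of the assignment.

```python
import io

import pandas as pd

# Two rows copied from the head() output of academics.csv (hypothetical
# stand-in for the real file so the example runs anywhere).
SAMPLE_CSV = """gender,NationalITy,PlaceofBirth,StageID,GradeID,SectionID,Topic,Semester,Relation,raisedhands,VisITedResources,AnnouncementsView,Discussion,ParentAnsweringSurvey,ParentschoolSatisfaction,StudentAbsenceDays,Class
M,KW,KuwaIT,lowerlevel,G-04,A,IT,F,Father,15,16,2,20,Yes,Good,Under-7,M
M,KW,KuwaIT,lowerlevel,G-04,A,IT,F,Father,10,7,0,30,No,Bad,Above-7,L
"""

# read_csv accepts any file-like object, so StringIO replaces the file path.
sample = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Fail fast if the file does not have the 17 columns the notebook expects.
EXPECTED_COLUMNS = 17
assert sample.shape[1] == EXPECTED_COLUMNS, sample.columns.tolist()
print(sample.shape)  # (2, 17)
```

With the real file, the same assertion can be run on `data` right after `pd.read_csv(file_path)`.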

In [4]: # Check the first few rows

print(data.head())

  gender NationalITy PlaceofBirth    StageID GradeID SectionID Topic  \
0      M          KW       KuwaIT lowerlevel    G-04         A    IT
1      M          KW       KuwaIT lowerlevel    G-04         A    IT
2      M          KW       KuwaIT lowerlevel    G-04         A    IT
3      M          KW       KuwaIT lowerlevel    G-04         A    IT
4      M          KW       KuwaIT lowerlevel    G-04         A    IT

  Semester Relation raisedhands VisITedResources AnnouncementsView  \
0        F   Father          15               16                 2
1        F   Father          20               20                 3
2        F   Father          10                7                 0
3        F   Father          30               25                 5
4        F   Father          40               50                12

  Discussion ParentAnsweringSurvey ParentschoolSatisfaction  \
0         20                   Yes                     Good
1         25                   Yes                     Good
2         30                    No                      Bad
3         35                    No                      Bad
4         50                    No                      Bad

  StudentAbsenceDays Class
0            Under-7     M
1            Under-7     M
2            Above-7     L
3            Above-7     L
4            Above-7     M

In [5]: # Display data information (info() prints its report directly and returns None)

data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 480 entries, 0 to 479
Data columns (total 17 columns):
 #   Column                    Non-Null Count  Dtype
---  ------                    --------------  -----
 0   gender                    480 non-null    object
 1   NationalITy               480 non-null    object
 2   PlaceofBirth              480 non-null    object
 3   StageID                   480 non-null    object
 4   GradeID                   480 non-null    object
 5   SectionID                 480 non-null    object
 6   Topic                     480 non-null    object
 7   Semester                  480 non-null    object
 8   Relation                  480 non-null    object
 9   raisedhands               480 non-null    int64
 10  VisITedResources          480 non-null    int64
 11  AnnouncementsView         480 non-null    int64
 12  Discussion                480 non-null    int64
 13  ParentAnsweringSurvey     480 non-null    object
 14  ParentschoolSatisfaction  480 non-null    object
 15  StudentAbsenceDays        480 non-null    object
 16  Class                     480 non-null    object
dtypes: int64(4), object(13)
memory usage: 63.9+ KB
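The info() report splits the columns into 4 integer columns and 13 object (text) columns. Mean imputation only makes sense for the numeric ones and mode imputation for the categorical ones, so it can help to get both lists programmatically. A minimal sketch follows; `split_columns` is a hypothetical helper and the small frame only mimics the dtypes of academics.csv.

```python
import pandas as pd


def split_columns(df: pd.DataFrame) -> tuple[list[str], list[str]]:
    """Return (numeric, non-numeric) column names, preserving column order."""
    numeric = df.select_dtypes(include="number").columns.tolist()
    # Everything that is not numeric (object or string columns) is treated
    # as categorical, like the 13 object columns reported by info().
    categorical = df.select_dtypes(exclude="number").columns.tolist()
    return numeric, categorical


# Tiny frame with the same kinds of dtypes as academics.csv.
demo = pd.DataFrame({
    "gender": ["M", "F", "M"],
    "raisedhands": [15, 20, 10],
    "Discussion": [20, 25, 30],
    "Class": ["M", "M", "L"],
})
numeric_cols, categorical_cols = split_columns(demo)
print(numeric_cols)      # ['raisedhands', 'Discussion']
print(categorical_cols)  # ['gender', 'Class']
```

On the real data, `split_columns(data)` would be expected to return the four int64 columns listed above and the thirteen object columns.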

In [6]: # Check for missing values

data.isnull().sum()

Out[6]: gender 0
NationalITy 0
PlaceofBirth 0
StageID 0
GradeID 0
SectionID 0
Topic 0
Semester 0
Relation 0
raisedhands 0
VisITedResources 0
AnnouncementsView 0
Discussion 0
ParentAnsweringSurvey 0
ParentschoolSatisfaction 0
StudentAbsenceDays 0
Class 0
dtype: int64
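Raw counts are easy to misread on larger datasets; expressing them as a share of the rows makes it clearer whether a column should be imputed or dropped. The sketch below is an illustrative extension, not a step from the assignment: `missing_report` is a hypothetical helper and the demo frame deliberately contains gaps (academics.csv has none).

```python
import numpy as np
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Count missing values per column and express them as a percentage."""
    counts = df.isnull().sum()
    percent = (counts / len(df) * 100).round(1)
    return pd.DataFrame({"missing": counts, "percent": percent})


# Hypothetical frame with one gap in each column (4 rows -> 25% each).
demo = pd.DataFrame({
    "raisedhands": [15, np.nan, 10, 30],
    "gender": ["M", "F", None, "M"],
})
report = missing_report(demo)
print(report)
```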

In [11]: # Fill missing values in 'raisedhands' with the column mean (no-op here: the check above found none)

data['raisedhands'] = data['raisedhands'].fillna(data['raisedhands'].mean())
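Because 'raisedhands' has no missing values (and an int64 column cannot hold NaN at all), this cell changes nothing in the actual data. To see what mean imputation does when gaps exist, here is a small sketch on a made-up series. Note that the NaN forces a float dtype, so the filled values come back as floats.

```python
import numpy as np
import pandas as pd

# Made-up values with one gap; mean() skips NaN, so it is (10 + 20 + 30) / 3.
raised = pd.Series([10, np.nan, 20, 30], name="raisedhands")
filled = raised.fillna(raised.mean())
print(filled.tolist())  # [10.0, 20.0, 20.0, 30.0]
```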

In [13]: # Fill missing values in 'gender' with the most frequent value (mode); also a no-op here

data['gender'] = data['gender'].fillna(data['gender'].mode()[0])
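`Series.mode()` returns a Series, because several values can tie for most frequent, which is why the cell indexes it with `[0]`. A minimal sketch on made-up categorical data:

```python
import pandas as pd

# Made-up gender column with one missing entry; "M" appears most often.
gender = pd.Series(["M", None, "F", "M"], name="gender")

# mode() ignores missing values by default and may return several values;
# [0] picks the first one.
most_common = gender.mode()[0]
filled = gender.fillna(most_common)
print(filled.tolist())  # ['M', 'M', 'F', 'M']
```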

In [14]: # Check for missing values again after imputation

data.isnull().sum()

Out[14]: gender 0
NationalITy 0
PlaceofBirth 0
StageID 0
GradeID 0
SectionID 0
Topic 0
Semester 0
Relation 0
raisedhands 0
VisITedResources 0
AnnouncementsView 0
Discussion 0
ParentAnsweringSurvey 0
ParentschoolSatisfaction 0
StudentAbsenceDays 0
Class 0
dtype: int64

In [19]: # Visualize outliers

# Boxplot for 'raisedhands' (pass the DataFrame by keyword)
sns.boxplot(data=data, x='raisedhands')
plt.title('Boxplot of Raised Hands')
plt.show()

In [20]: # Handle outliers

# Cap outliers using IQR (Interquartile Range)
Q1 = data['raisedhands'].quantile(0.25)
Q3 = data['raisedhands'].quantile(0.75)
IQR = Q3 - Q1
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR
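A worked example makes the IQR rule concrete. It is the same 1.5 × IQR limit that seaborn's boxplot uses by default (`whis=1.5`) to decide which points are drawn individually as outliers. The sketch uses made-up numbers, and `iqr_bounds` is a hypothetical helper wrapping the cell above. With pandas' default linear interpolation, Q1 = 3 and Q3 = 7 for this series, so IQR = 4 and the bounds are 3 − 6 = −3 and 7 + 6 = 13.

```python
import pandas as pd


def iqr_bounds(s: pd.Series, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) limits Q1 - k*IQR and Q3 + k*IQR."""
    q1 = s.quantile(0.25)
    q3 = s.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Made-up data: eight ordinary values and one extreme one.
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 100])
low, high = iqr_bounds(values)
print(low, high)  # -3.0 13.0

# Values outside the bounds are the ones a boxplot would show as outliers.
outliers = values[(values < low) | (values > high)]
print(outliers.tolist())  # [100]
```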

In [21]: # Cap values outside bounds (values below/above the limits are set to the limit)

data['raisedhands'] = data['raisedhands'].clip(lower=lower_bound, upper=upper_bound)
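Capping (winsorizing) keeps every row and only pulls extreme values back to the nearest bound, unlike dropping outlier rows. The sketch below reuses the made-up series from the IQR example and shows that pandas' `Series.clip` and `np.clip` on the underlying array give the same values.

```python
import numpy as np
import pandas as pd

values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 100])
low, high = -3.0, 13.0  # 1.5 x IQR bounds for this series (Q1 = 3, Q3 = 7)

# pandas method: values outside [low, high] are replaced by the bound.
capped = values.clip(lower=low, upper=high)

# NumPy equivalent on the underlying array.
capped_np = np.clip(values.to_numpy(), low, high)

print(capped.tolist())
print(np.array_equal(capped.to_numpy(), capped_np))  # True
```

Only the value 100 changes (to 13); the row count stays the same.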

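In [22]: # Apply a Yeo-Johnson PowerTransformer to 'raisedhands'

# fit_transform expects 2-D input and returns a 2-D array of shape (n, 1);
# ravel() flattens it back to a single column. With the default
# standardize=True, the result is also scaled to zero mean and unit variance.
transformer = PowerTransformer(method='yeo-johnson')
data['raisedhands'] = transformer.fit_transform(data[['raisedhands']]).ravel()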
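For reference, the Yeo-Johnson transform for a given λ is ((y+1)^λ − 1)/λ for y ≥ 0 (log(y+1) when λ = 0) and −((1−y)^(2−λ) − 1)/(2−λ) for y < 0 (−log(1−y) when λ = 2). PowerTransformer estimates λ by maximum likelihood and then standardizes; the sketch below only applies the formula for a fixed λ, so it is an illustration of the math, not a replacement for scikit-learn. `yeo_johnson` is a hypothetical helper name.

```python
import numpy as np


def yeo_johnson(y, lmbda: float) -> np.ndarray:
    """Yeo-Johnson transform for a fixed lambda (no fitting, no standardizing)."""
    y = np.asarray(y, dtype=float)
    out = np.empty_like(y)
    pos = y >= 0
    neg = ~pos
    # Non-negative values: Box-Cox-like transform of (y + 1).
    if lmbda != 0:
        out[pos] = ((y[pos] + 1) ** lmbda - 1) / lmbda
    else:
        out[pos] = np.log1p(y[pos])
    # Negative values: mirrored transform with exponent (2 - lambda).
    if lmbda != 2:
        out[neg] = -(((1 - y[neg]) ** (2 - lmbda)) - 1) / (2 - lmbda)
    else:
        out[neg] = -np.log1p(-y[neg])
    return out


print(yeo_johnson([-2.0, 0.0, 3.0], 1.0))  # lambda = 1 leaves values unchanged
print(yeo_johnson([np.e - 1], 0.0))        # log(e) = 1
```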

In [25]: # Save the cleaned dataset to a CSV file

data.to_csv("cleaned_academics.csv", index=False)
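Passing `index=False` keeps the row index out of the file, so reading it back does not produce an extra "Unnamed: 0" column. A quick round-trip check is a cheap way to confirm the saved file matches the cleaned frame; this sketch uses an in-memory buffer and a made-up two-row frame instead of cleaned_academics.csv.

```python
import io

import pandas as pd

# Made-up stand-in for the cleaned data.
cleaned = pd.DataFrame({"gender": ["M", "F"], "raisedhands": [-0.5, 0.5]})

buffer = io.StringIO()
cleaned.to_csv(buffer, index=False)  # no index column is written
buffer.seek(0)
reloaded = pd.read_csv(buffer)

# Raises AssertionError if columns, dtypes, or values differ.
pd.testing.assert_frame_equal(cleaned, reloaded)
print(reloaded.columns.tolist())  # ['gender', 'raisedhands']
```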

This notebook was converted with convert.ploomber.io
