Data Manipulation With Python Pandas 1700003764

Uploaded by

bukkyojo60

index

November 14, 2023

1 Data Manipulation using Python Pandas

——Asad Mujeeb——

[ ]: import pandas as pd

[2]: df = pd.read_csv(r"C:\Users\Asad Raja\OneDrive\Desktop\code\StudentsPerformance.csv")

print(df.head())

gender race/ethnicity parental level of education lunch

0 female group B bachelor's degree standard \
1 female group C some college standard
2 female group B master's degree standard
3 male group A associate's degree free/reduced
4 male group C some college standard

test preparation course math score reading score writing score

0 none 72 72 74
1 completed 69 90 88
2 none 90 95 93
3 none 47 57 44
4 none 76 78 75

[ ]:

[4]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 8 columns):
 #   Column                       Non-Null Count  Dtype
---  ------                       --------------  -----
 0   gender                       1000 non-null   object
 1   race/ethnicity               1000 non-null   object
 2   parental level of education  1000 non-null   object
 3   lunch                        1000 non-null   object
 4   test preparation course      1000 non-null   object
 5   math score                   1000 non-null   int64
 6   reading score                1000 non-null   int64
 7   writing score                1000 non-null   int64
dtypes: int64(3), object(5)
memory usage: 62.6+ KB

[5]: df.describe()

[5]: math score reading score writing score

count 1000.00000 1000.000000 1000.000000
mean 66.08900 69.169000 68.054000
std 15.16308 14.600192 15.195657
min 0.00000 17.000000 10.000000
25% 57.00000 59.000000 57.750000
50% 66.00000 70.000000 69.000000
75% 77.00000 79.000000 79.000000
max 100.00000 100.000000 100.000000

[6]: df.shape

[6]: (1000, 8)

[7]: df.columns

[7]: Index(['gender', 'race/ethnicity', 'parental level of education', 'lunch',

'test preparation course', 'math score', 'reading score',
'writing score'],
dtype='object')

[8]: df.values

[8]: array([['female', 'group B', "bachelor's degree", …, 72, 72, 74],

['female', 'group C', 'some college', …, 69, 90, 88],
['female', 'group B', "master's degree", …, 90, 95, 93],
…,
['female', 'group C', 'high school', …, 59, 71, 65],
['female', 'group D', 'some college', …, 68, 78, 77],
['female', 'group D', 'some college', …, 77, 86, 86]],
dtype=object)

[10]: df.index

[10]: RangeIndex(start=0, stop=1000, step=1)

[15]: df.sort_values("lunch", ascending=False).head()

[15]: gender race/ethnicity parental level of education lunch

0 female group B bachelor's degree standard \

611 female group C some college standard
581 female group E some high school standard
583 female group D associate's degree standard
584 female group D some college standard

test preparation course math score reading score writing score

0 none 72 72 74
611 none 58 59 66
581 none 77 79 80
583 completed 73 75 80
584 none 69 77 77

[17]: df.sort_values(["lunch", "writing score"]).head()

[17]: gender race/ethnicity parental level of education lunch

59 female group C some high school free/reduced \
596 male group B high school free/reduced
327 male group A some college free/reduced
980 female group B high school free/reduced
211 male group C some college free/reduced

test preparation course math score reading score writing score

59 none 0 17 10
596 none 30 24 15
327 none 28 23 19
980 none 8 24 23
211 none 35 28 27

[18]: df.sort_values(["lunch","gender"], ascending= [True, False]).head()

[18]: gender race/ethnicity parental level of education lunch

3 male group A associate's degree free/reduced \
7 male group B some college free/reduced
8 male group D high school free/reduced
18 male group C master's degree free/reduced
24 male group D bachelor's degree free/reduced

test preparation course math score reading score writing score

3 none 47 57 44
7 none 40 43 39
8 completed 64 64 67
18 completed 46 42 46
24 completed 74 71 80

[20]: df["parental level of education"].head()

[20]: 0 bachelor's degree
1 some college
2 master's degree
3 associate's degree
4 some college
Name: parental level of education, dtype: object

[21]: df[["math score", "reading score"]].head()

[21]: math score reading score

0 72 72
1 69 90
2 90 95
3 47 57
4 76 78

[26]: cols_to_subset = ["math score", "reading score"]

df[cols_to_subset].head()

[26]: math score reading score

0 72 72
1 69 90
2 90 95
3 47 57
4 76 78

[27]: df["math score"] > 76

[27]: 0 False
1 False
2 True
3 False
4 False
…
995 True
996 False
997 False
998 False
999 True
Name: math score, Length: 1000, dtype: bool

[30]: df[df["reading score"] < 30].head()

[30]: gender race/ethnicity parental level of education lunch

59 female group C some high school free/reduced \
76 male group E some high school standard
211 male group C some college free/reduced
327 male group A some college free/reduced

596 male group B high school free/reduced

test preparation course math score reading score writing score

59 none 0 17 10
76 none 30 26 22
211 none 35 28 27
327 none 28 23 19
596 none 30 24 15

[33]: df[df["parental level of education"] == "high school"].head()

[33]: gender race/ethnicity parental level of education lunch

8 male group D high school free/reduced \
9 female group B high school free/reduced
12 female group B high school standard
16 male group C high school standard
20 male group D high school standard

test preparation course math score reading score writing score

8 completed 64 64 67
9 none 38 60 50
12 none 65 81 73
16 none 88 89 86
20 none 66 69 63

[35]: df[df["race/ethnicity"] < "group C"].head()

[35]: gender race/ethnicity parental level of education lunch

0 female group B bachelor's degree standard \
2 female group B master's degree standard
3 male group A associate's degree free/reduced
5 female group B associate's degree standard
6 female group B some college standard

test preparation course math score reading score writing score

0 none 72 72 74
2 none 90 95 93
3 none 47 57 44
5 none 71 83 78
6 completed 88 95 92

[46]: is_lunch = df["lunch"].isin(["standard"])

df[is_lunch].head()

[46]: gender race/ethnicity parental level of education lunch

0 male group B bachelor's degree standard \
1 male group C some college standard

2 male group B master's degree standard
4 male group C some college standard
5 male group B associate's degree standard

test preparation course math score reading score writing score

0 none 72 72 74
1 completed 69 90 88
2 none 90 95 93
4 none 76 78 75
5 none 71 83 78
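The filters above each test one condition. As a minimal sketch, assuming a small hypothetical DataFrame `toy` (not the StudentsPerformance data), an `isin` mask can be combined with a comparison using `&`; each condition needs its own parentheses because `&` binds more tightly than `>`.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only.
toy = pd.DataFrame({
    "lunch": ["standard", "free/reduced", "standard", "standard"],
    "reading score": [72, 17, 95, 57],
})

# Combine two boolean masks element-wise: standard lunch AND reading score above 60.
mask = toy["lunch"].isin(["standard"]) & (toy["reading score"] > 60)
selected = toy[mask]
print(selected)
```

Using `|` in place of `&` would keep rows that satisfy either condition.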

[47]: df["percentage"] = (df["math score"] + df["reading score"] + df["writing score"]) / 300

[48]: df.head()

[48]: gender race/ethnicity parental level of education lunch

0 male group B bachelor's degree standard \
1 male group C some college standard
2 male group B master's degree standard
3 male group A associate's degree free/reduced
4 male group C some college standard

test preparation course math score reading score writing score

0 none 72 72 74 \
1 completed 69 90 88
2 none 90 95 93
3 none 47 57 44
4 none 76 78 75

percentage
0 0.726667
1 0.823333
2 0.926667
3 0.493333
4 0.763333
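Dividing the summed scores by 300 gives a fraction between 0 and 1, as the output above shows. If a value on a 0 to 100 scale is wanted instead, one possible sketch is to take the row-wise mean of the three score columns. This assumes each test is marked out of 100, which the division by 300 suggests but the notebook does not state. The `toy` frame below is hypothetical and only reuses three rows of scores printed earlier.

```python
import pandas as pd

# Hypothetical toy frame: scores copied from the first three rows shown by df.head().
toy = pd.DataFrame({
    "math score": [72, 69, 90],
    "reading score": [72, 90, 95],
    "writing score": [74, 88, 93],
})

score_cols = ["math score", "reading score", "writing score"]

# Assumption: every test is out of 100, so the row mean is already a 0-100 percentage.
toy["percentage"] = toy[score_cols].mean(axis=1)
print(toy)
```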

[3]: df["reading score"].mean()

[3]: 69.169

[5]: df["reading score"].min()

[5]: 17

[6]: df["reading score"].max()

[6]: 100

[8]: def Func(column):
    # 30th percentile of the column
    return column.quantile(0.3)

df["reading score"].agg(Func)

[8]: 62.0

[9]: df[['reading score', "writing score"]].agg(Func)

[9]: reading score 62.0

writing score 60.0
dtype: float64

[10]: def func1(column):
    # 30th percentile
    return column.quantile(0.3)

def func2(column):
    # 40th percentile
    return column.quantile(0.4)

df["reading score"].agg([func1, func2])

[10]: func1 62.0

func2 66.0
Name: reading score, dtype: float64
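The output labels above come from the function names `func1` and `func2`. As a sketch only, a small factory can build quantile functions with more descriptive names. The helper `make_quantile` and the `scores` Series are hypothetical and not part of the original notebook.

```python
import pandas as pd


def make_quantile(q):
    """Return a function that computes the q-th quantile of a Series."""
    def quantile_fn(column):
        return column.quantile(q)
    # pandas labels each result with the function's __name__, e.g. "q30".
    quantile_fn.__name__ = f"q{round(q * 100)}"
    return quantile_fn


# Hypothetical toy data, invented for illustration only.
scores = pd.Series([10, 20, 30, 40, 50], name="reading score")
result = scores.agg([make_quantile(0.3), make_quantile(0.4)])
print(result)
```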

[11]: # cumulative sum

df["math score"].cumsum()

[11]: 0 72
1 141
2 231
3 278
4 354
…
995 65823
996 65885
997 65944
998 66012
999 66089
Name: math score, Length: 1000, dtype: int64

[13]: df.drop_duplicates(subset =["gender", "race/ethnicity"])

[13]: gender race/ethnicity parental level of education lunch

0 female group B bachelor's degree standard \
1 female group C some college standard
3 male group A associate's degree free/reduced
4 male group C some college standard
7 male group B some college free/reduced

8 male group D high school free/reduced
14 female group A master's degree standard
29 female group D master's degree standard
32 female group E master's degree free/reduced
34 male group E some college standard

test preparation course math score reading score writing score

0 none 72 72 74
1 completed 69 90 88
3 none 47 57 44
4 none 76 78 75
7 none 40 43 39
8 completed 64 64 67
14 none 50 53 58
29 none 62 70 75
32 none 56 72 65
34 none 97 87 82

[14]: df["gender"].value_counts()

[14]: gender
female 518
male 482
Name: count, dtype: int64

[15]: df["gender"].value_counts(sort = True)

[15]: gender
female 518
male 482
Name: count, dtype: int64

[16]: df["reading score"].value_counts(normalize=True)

[16]: reading score

72 0.034
74 0.033
64 0.032
67 0.030
73 0.030
…
28 0.001
26 0.001
17 0.001
32 0.001
40 0.001
Name: proportion, Length: 72, dtype: float64
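With `normalize=True`, `value_counts` returns each count divided by the number of non-missing values, so the proportions sum to 1. A minimal sketch on a hypothetical Series `genders` (invented data, not the real column):

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only.
genders = pd.Series(["female", "male", "female", "female"], name="gender")

counts = genders.value_counts()                      # absolute frequencies
proportions = genders.value_counts(normalize=True)   # relative frequencies

# With no missing values, the proportions equal counts divided by the Series length.
manual = counts / len(genders)
print(proportions)
```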

[25]: df[df["race/ethnicity"] == "group B"]["math score"].max()

[25]: 97

[26]: df.groupby("gender")["math score"].mean()

[26]: gender
female 63.633205
male 68.728216
Name: math score, dtype: float64

[28]: df.groupby("race/ethnicity")["writing score"].agg(["min", "max", "sum"])

[28]: min max sum

race/ethnicity
group A 19 97 5578
group B 15 96 12464
group C 10 100 21637
group D 32 100 18378
group E 22 100 9997
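When several aggregations are applied to one column, pandas also accepts keyword arguments ("named aggregation") to choose the output column names. The sketch below uses a hypothetical `toy` frame, and the labels `lowest`, `highest` and `total` are arbitrary.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only.
toy = pd.DataFrame({
    "gender": ["female", "female", "male", "male"],
    "writing score": [74, 88, 44, 75],
})

# Each keyword becomes a column name; each value names the aggregation to apply.
summary = toy.groupby("gender")["writing score"].agg(
    lowest="min",
    highest="max",
    total="sum",
)
print(summary)
```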

[29]: df.groupby(["gender", "race/ethnicity"])["math score"].mean()

[29]: gender race/ethnicity

female group A 58.527778
group B 61.403846
group C 62.033333
group D 65.248062
group E 70.811594
male group A 63.735849
group B 65.930233
group C 67.611511
group D 69.413534
group E 76.746479
Name: math score, dtype: float64

[5]: df.pivot_table(values = "math score", index = "race/ethnicity")

[5]: math score

race/ethnicity
group A 61.629213
group B 63.452632
group C 64.463950
group D 67.362595
group E 73.821429

[8]: import numpy as np

df.pivot_table(values="reading score", index="race/ethnicity", aggfunc=np.median)

[8]: reading score

race/ethnicity
group A 64
group B 67
group C 71
group D 71
group E 74

[9]: df.pivot_table(values="writing score", index="race/ethnicity", aggfunc=[np.mean, np.median])

[9]: mean median

writing score writing score
race/ethnicity
group A 62.674157 62
group B 65.600000 67
group C 67.827586 68
group D 70.145038 72
group E 71.407143 72

[11]: df.pivot_table(values="writing score", index="gender", columns="race/ethnicity", aggfunc=np.median)

[11]: race/ethnicity group A group B group C group D group E

gender
female 67.0 71.5 74.0 76.0 76.0
male 60.0 60.0 62.0 67.0 70.0

[13]: df.pivot_table(values="writing score", index="gender", columns="race/ethnicity",
                     fill_value=0, margins=True, aggfunc=np.mean)

[13]: race/ethnicity group A group B group C group D group E

gender
female 67.861111 70.048077 71.777778 75.023256 75.536232 \
male 59.150943 60.220930 62.712230 65.413534 67.394366
All 62.674157 65.600000 67.827586 70.145038 71.407143

race/ethnicity All
gender
female 72.467181
male 63.311203
All 68.054000
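A pivot table with `index` and `columns` is closely related to a two-key `groupby` followed by `unstack`. The sketch below, on a hypothetical `toy` frame, shows both routes producing the same grid of means; it leaves out `margins` and `fill_value`.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only.
toy = pd.DataFrame({
    "gender": ["female", "female", "male", "male", "female", "male"],
    "race/ethnicity": ["group A", "group B", "group A", "group B", "group A", "group B"],
    "writing score": [74, 88, 44, 75, 70, 65],
})

# Route 1: pivot_table with one key on the rows and one on the columns.
pivoted = toy.pivot_table(values="writing score", index="gender",
                          columns="race/ethnicity", aggfunc="mean")

# Route 2: group by both keys, then move the inner key to the columns.
grouped = toy.groupby(["gender", "race/ethnicity"])["writing score"].mean().unstack()

print(pivoted)
```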

[ ]:
