Data Manipulation With Python Pandas
[ ]: import pandas as pd
print(df.head())
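The cell that creates `df` is not shown in this export. As a stand-in for trying the calls that follow, here is a minimal sketch that builds a small frame with the same eight column names that `df.info()` reports. The row values are illustrative placeholders, not the real data, and the name `sample` is hypothetical; in the notebook itself `df` was presumably loaded from a file whose path is not shown.

```python
import pandas as pd

# Hypothetical stand-in rows; only the column names mirror the df.info() listing.
sample = pd.DataFrame(
    {
        "gender": ["female", "male", "female", "male"],
        "race/ethnicity": ["group B", "group C", "group A", "group D"],
        "parental level of education": [
            "some college",
            "high school",
            "master's degree",
            "associate's degree",
        ],
        "lunch": ["standard", "free/reduced", "standard", "standard"],
        "test preparation course": ["none", "completed", "none", "none"],
        "math score": [72, 69, 90, 47],
        "reading score": [72, 90, 95, 57],
        "writing score": [74, 88, 93, 44],
    }
)

print(sample.head())   # first rows of the frame
print(sample.shape)    # (number of rows, number of columns)
```

Any of the inspection calls below (`info()`, `describe()`, `shape`, `columns`, `values`, `index`) can be run against `sample` in the same way as against `df`.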
[4]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 8 columns):
 #   Column                       Non-Null Count  Dtype
---  ------                       --------------  -----
 0   gender                       1000 non-null   object
 1   race/ethnicity               1000 non-null   object
 2   parental level of education  1000 non-null   object
 3   lunch                        1000 non-null   object
 4   test preparation course      1000 non-null   object
 5   math score                   1000 non-null   int64
 6   reading score                1000 non-null   int64
 7   writing score                1000 non-null   int64
dtypes: int64(3), object(5)
memory usage: 62.6+ KB
[5]: df.describe()
[6]: df.shape
[6]: (1000, 8)
[7]: df.columns
[8]: df.values
[10]: df.index
611 female group C some college standard
581 female group E some high school standard
583 female group D associate's degree standard
584 female group D some college standard
[20]: 0 bachelor's degree
1 some college
2 master's degree
3 associate's degree
4 some college
Name: parental level of education, dtype: object
[27]: 0 False
1 False
2 True
3 False
4 False
…
995 True
996 False
997 False
998 False
999 True
Name: math score, Length: 1000, dtype: bool
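The input for output [27] is missing, but a boolean Series named `math score` is what a comparison on that column returns. The sketch below assumes a simple "greater than" test; the cutoff `THRESHOLD` is a hypothetical value chosen for illustration, and the five scores are the differences between consecutive running totals in output [11] of this notebook.

```python
import pandas as pd

scores = pd.DataFrame({"math score": [72, 69, 90, 47, 76]})

# THRESHOLD is a hypothetical cutoff; the notebook's actual condition is not shown.
THRESHOLD = 80

# Comparing a column with a scalar yields one True/False per row.
mask = scores["math score"] > THRESHOLD

# Indexing the frame with the mask keeps only the rows where it is True.
high = scores[mask]

print(mask)
print(high)
```

Passing the mask back into `df[...]` is the usual way to get row subsets like the fragments printed in the following cells.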
596 male group B high school free/reduced
2 male group B master's degree standard
4 male group C some college standard
5 male group B associate's degree standard
[48]: df.head()
percentage
0 0.726667
1 0.823333
2 0.926667
3 0.493333
4 0.763333
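The cell that adds the `percentage` column is not shown. One computation that would give fractions between 0 and 1 like these is the sum of the three score columns divided by the maximum possible total. That is an assumption, as is a maximum of 100 per test; the two rows below are illustrative values only.

```python
import pandas as pd

# Illustrative rows, not taken from the real data set.
sample = pd.DataFrame(
    {
        "math score": [72, 69],
        "reading score": [72, 90],
        "writing score": [74, 88],
    }
)

score_cols = ["math score", "reading score", "writing score"]

# Assumption: each test is marked out of 100, so the maximum total is 300.
max_total = 100 * len(score_cols)

# Row-wise sum (axis=1) divided by the maximum gives a fraction in [0, 1].
sample["percentage"] = sample[score_cols].sum(axis=1) / max_total

print(sample)
```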
[3]: 69.169
[5]: 17
[6]: 100
[8]: def Func(column):
         # 30th percentile of the column
         return column.quantile(0.3)

     df["reading score"].agg(Func)
[8]: 62.0
def func2(column):
    # 40th percentile of the column
    return column.quantile(0.4)
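The two helpers above can be passed to `agg` one at a time or together in a list. The following is a minimal sketch on a small made-up Series; the names `pct30` and `pct40` and the five values are hypothetical and only mirror `Func` and `func2`.

```python
import pandas as pd


def pct30(column):
    # 30th percentile (pandas interpolates linearly by default)
    return column.quantile(0.3)


def pct40(column):
    # 40th percentile
    return column.quantile(0.4)


# Made-up values for illustration.
reading = pd.Series([10, 20, 30, 40, 50], name="reading score")

# A single function returns a scalar.
single = reading.agg(pct30)

# A list of functions returns a Series indexed by the function names.
both = reading.agg([pct30, pct40])

print(single)
print(both)
```

Giving the helpers descriptive names pays off here, because the function names become the labels of the result.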
[11]: 0 72
1 141
2 231
3 278
4 354
…
995 65823
996 65885
997 65944
998 66012
999 66089
Name: math score, Length: 1000, dtype: int64
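Output [11] looks like a running total of the `math score` column, although its input cell is missing. The sketch below assumes it came from `Series.cumsum()`; the five inputs are the differences between consecutive totals in the output above, and `cummax()` is added only as a related example.

```python
import pandas as pd

# First five math scores implied by the running totals 72, 141, 231, 278, 354.
math = pd.Series([72, 69, 90, 47, 76], name="math score")

# Cumulative sum: each entry is the total of all values up to that row.
running_total = math.cumsum()

# Cumulative maximum: the highest value seen so far at each row.
running_max = math.cummax()

print(running_total)
print(running_max)
```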
8 male group D high school free/reduced
14 female group A master's degree standard
29 female group D master's degree standard
32 female group E master's degree free/reduced
34 male group E some college standard
[14]: df["gender"].value_counts()
[14]: gender
female 518
male 482
Name: count, dtype: int64
[15]: gender
female 518
male 482
Name: count, dtype: int64
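`value_counts()` can also report shares instead of raw counts. This is a small sketch that rebuilds a Series with the same 518/482 split shown above and uses the `normalize=True` option; whether the notebook used that option in cell [15] is not known.

```python
import pandas as pd

# Rebuild a gender column with the counts reported above.
gender = pd.Series(["female"] * 518 + ["male"] * 482, name="gender")

# Raw counts, sorted from most to least frequent by default.
counts = gender.value_counts()

# normalize=True divides each count by the total number of rows.
shares = gender.value_counts(normalize=True)

print(counts)
print(shares)
```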
[25]: df[df["race/ethnicity"] == "group B"]["math score"].max()
[25]: 97
[26]: gender
female 63.633205
male 68.728216
Name: math score, dtype: float64
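Output [26] has the shape of a grouped mean: one `math score` value per `gender`. Its input is not shown, so the following is only a sketch of the usual `groupby` pattern on a made-up four-row frame.

```python
import pandas as pd

# Made-up rows for illustration.
sample = pd.DataFrame(
    {
        "gender": ["female", "male", "female", "male"],
        "math score": [60, 70, 66, 68],
    }
)

# Split the rows by gender, select one column, and average within each group.
by_gender = sample.groupby("gender")["math score"].mean()

# Several statistics at once: one row per group, one column per statistic.
summary = sample.groupby("gender")["math score"].agg(["min", "max", "mean"])

print(by_gender)
print(summary)
```

Compared with filtering each group by hand, as in cell [25], `groupby` computes the statistic for every group in one call.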
df.pivot_table(values="reading score", index="race/ethnicity", aggfunc="median")
race/ethnicity All
gender
female 72.467181
male 63.311203
All 68.054000
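The `All` row and column in the output above are what `pivot_table` adds when `margins=True` is passed. The call that produced it is not shown, so this is a sketch under that assumption, using a made-up frame with two genders and two groups.

```python
import pandas as pd

# Made-up rows for illustration.
sample = pd.DataFrame(
    {
        "gender": ["female", "female", "male", "male"],
        "race/ethnicity": ["group A", "group B", "group A", "group B"],
        "reading score": [70, 80, 60, 50],
    }
)

# Rows are genders, columns are groups, cells are mean scores.
# margins=True appends an "All" row and column holding the overall means.
table = sample.pivot_table(
    values="reading score",
    index="gender",
    columns="race/ethnicity",
    aggfunc="mean",
    margins=True,
)

print(table)
```

The bottom-right `All` cell is the mean over every row of the frame, not the mean of the group means.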