DS - Assig-03-Part-I - Jupyter Notebook
Advanc
#Measures of central tendency include mean, median, and the mode, whi
#We will cover the topics given below:
#Mean
#Median
#Mode
#Standard Deviation
#Variance
#Interquartile Range
#Skewness
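#A minimal sketch of the measures listed above, computed on a small hypothetical Series (toy values, not the loan dataset). The variable names are illustrative assumptions.

```python
import pandas as pd

# Hypothetical toy data, not taken from the loan dataset.
ages = pd.Series([24, 25, 26, 26, 28, 30, 32], name="Age")

mean_age = ages.mean()            # arithmetic mean
median_age = ages.median()        # middle value of the sorted data
mode_age = ages.mode().iloc[0]    # most frequent value (first one if tied)
std_age = ages.std()              # sample standard deviation (ddof=1)
var_age = ages.var()              # sample variance (ddof=1)
iqr_age = ages.quantile(0.75) - ages.quantile(0.25)  # interquartile range
skew_age = ages.skew()            # sample skewness

print(mean_age, median_age, mode_age, std_age, var_age, iqr_age, skew_age)
```

#pandas uses ddof=1 for std() and var() by default, so the variance is the square of the standard deviation reported here.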
(614, 14)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 614 entries, 0 to 613
Data columns (total 14 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Loan_ID 614 non-null object
1 Gender 601 non-null object
2 Age 513 non-null float64
3 Married 611 non-null object
4 Dependents 599 non-null object
5 Education 614 non-null object
6 Self_Employed 582 non-null object
7 ApplicantIncome 614 non-null int64
8 CoapplicantIncome 614 non-null float64
9 LoanAmount 592 non-null float64
10 Loan_Amount_Term 600 non-null float64
11 Credit_History 564 non-null float64
12 Property_Area 614 non-null object
13 Loan_Status 614 non-null object
dtypes: float64(5), int64(1), object(8)
memory usage: 67.3+ KB
None
In [3]: df.mean()
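#The frame above mixes object and numeric columns. In recent pandas versions (2.0 and later), calling mean() on such a frame without numeric_only=True raises a TypeError. A minimal sketch on a hypothetical toy frame (df_toy is an assumption, not the loan data) showing column-wise and row-wise means:

```python
import pandas as pd

# Hypothetical toy frame with one object column and two numeric columns.
df_toy = pd.DataFrame({
    "Gender": ["m", "f", "f", "m"],
    "Age": [24.0, 25.0, 26.0, 27.0],
    "ApplicantIncome": [5000, 3000, 4000, 6000],
})

# Column-wise mean, restricted to numeric columns.
col_means = df_toy.mean(numeric_only=True)

# Row-wise mean across the numeric columns of each row.
row_means = df_toy.mean(axis=1, numeric_only=True)

print(col_means)
print(row_means)
```

#Note that a row-wise mean mixes columns with different units (age and income here), so it is mainly useful as an API illustration.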
In [4]: print(df.loc[:,'Age'].mean())
print(df.loc[:,'ApplicantIncome'].mean())
32.10136452241716
5403.459283387622
Out[5]: 0 1249.000000
1 1316.000000
2 575.333333
3 908.000000
4 1088.666667
5 1713.500000
6 722.500000
7 1017.166667
8 1017.833333
9 4092.833333
dtype: float64
In [6]: df.median(axis=1)
Out[6]: 0 35.0
1 360.0
2 45.5
3 240.0
4 85.5
...
609 48.0
610 45.0
611 246.5
612 116.0
613 79.5
Length: 614, dtype: float64
In [7]: df.median()
Out[7]: Age 30.0
ApplicantIncome 3812.5
CoapplicantIncome 1188.5
LoanAmount 128.0
Loan_Amount_Term 360.0
Credit_History 1.0
dtype: float64
In [8]: #compute the median of a specific column
print(df.loc[:,'Age'].median())
print(df.loc[:,'ApplicantIncome'].median())
df.median(axis = 1)[0:10]
30.0
3812.5
Out[8]: 0 35.0
1 360.0
2 45.5
3 240.0
4 85.5
5 313.5
6 227.5
7 259.0
8 264.0
9 354.5
dtype: float64
In [9]: df.mode()
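#The output of df.mode() is not shown here. As a sketch of what to expect: mode() returns a DataFrame rather than a single row, because a column can have several modes, and shorter columns are padded with NaN. The toy frame below is a hypothetical example, not the loan data.

```python
import pandas as pd

# Hypothetical toy frame: Gender has one mode, Age has two.
df_toy = pd.DataFrame({
    "Gender": ["m", "f", "f", "m", "f"],
    "Age": [24, 25, 25, 24, 30],
})

modes = df_toy.mode()               # one row per mode, NaN-padded
age_modes = df_toy["Age"].mode()    # Series holding every mode of Age

print(modes)
print(age_modes.tolist())
```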
In [12]: print(df.loc[:,'Age'].std())
print(df.loc[:,'ApplicantIncome'].std())
#calculate the standard deviation of the first ten rows
df.std(axis = 1)[0:10]
7.732178229043358
6109.041673387174
Out[12]: 0 2575.928085
1 1921.240355
2 1195.703252
3 1219.011567
4 2409.946528
5 2430.485610
6 974.539224
7 1373.763650
8 1569.760799
9 6081.668239
dtype: float64
Out[14]: nan
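#The cell that produced this nan is not shown. One common way to get nan, offered here only as an assumption, is to compute a statistic on a column with missing values using a function that does not skip them. A minimal sketch of variance and interquartile range on a hypothetical column with a missing value:

```python
import numpy as np
import pandas as pd

# Hypothetical toy column with one missing value.
loan = pd.Series([100.0, 120.0, np.nan, 150.0, 200.0], name="LoanAmount")

# np.percentile does not skip NaN, so the result is nan.
naive_iqr = np.percentile(loan, 75) - np.percentile(loan, 25)

# pandas skips NaN by default in quantile() and var().
q1, q3 = loan.quantile([0.25, 0.75])
iqr = q3 - q1
variance = loan.var()

print(naive_iqr, iqr, variance)
```

#np.nanpercentile is the NumPy counterpart that ignores NaN, if staying in NumPy is preferred.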
In [3]: print(df.skew())
Age 0.712146
ApplicantIncome 6.539513
CoapplicantIncome 7.491531
LoanAmount 2.677552
Loan_Amount_Term -2.362414
Credit_History -1.882361
dtype: float64
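#A positive skew means a long right tail, which pulls the mean above the median. That matches ApplicantIncome above (mean about 5403, median 3812.5, skew about 6.54). A minimal sketch on hypothetical toy values:

```python
import pandas as pd

# Hypothetical right-skewed incomes: one large value stretches the tail.
income = pd.Series([2000, 2500, 3000, 3500, 20000], name="ApplicantIncome")

skewness = income.skew()
mean_income = income.mean()
median_income = income.median()

print(skewness, mean_income, median_income)
```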
In [5]: df.describe()
In [6]: df.describe(include='all')
Age
24.0 22 21 21 19 22 20 22
26.0 87 86 87 87 87 81 87
27.0 23 23 22 22 23 22 23
28.0 4 4 4 4 4 2 4
30.0 53 53 53 52 53 52 53
31.0 23 22 23 20 23 23 23
32.0 18 18 18 18 18 18 18
35.0 7 7 7 6 7 7 7
37.0 18 17 18 18 18 16 18
38.0 23 23 23 23 23 23 23
40.0 22 20 22 22 22 20 22
42.0 4 4 4 4 4 4 4
43.0 45 44 45 42 45 40 45
45.0 31 31 30 30 31 30 31
46.0 6 6 6 6 6 5 6
47.0 18 17 18 18 18 18 18
50.0 1 1 1 1 1 1 1
56.0 1 1 1 1 1 1 1
In [8]: df.groupby('Age')['ApplicantIncome']
Out[9]: Age
24.0 109185
25.0 538078
26.0 423267
27.0 178232
28.0 12637
30.0 314638
31.0 174155
32.0 73848
35.0 64332
37.0 99041
38.0 170038
40.0 87414
42.0 28751
43.0 205937
45.0 142359
46.0 38018
47.0 138701
50.0 4106
56.0 6540
Name: ApplicantIncome, dtype: int64
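#Selecting a column from a groupby only builds a lazy grouped object. A per-group result like the one above needs an aggregation such as sum(). A minimal sketch on a hypothetical toy frame (not the loan data):

```python
import pandas as pd

# Hypothetical toy frame with two age groups.
df_toy = pd.DataFrame({
    "Age": [24.0, 24.0, 25.0, 25.0, 25.0],
    "ApplicantIncome": [3000, 5000, 2000, 4000, 6000],
})

grouped = df_toy.groupby("Age")["ApplicantIncome"]  # no computation yet

income_sum = grouped.sum()                      # total income per age
summary = grouped.agg(["count", "sum", "mean"])  # several statistics at once

print(income_sum)
print(summary)
```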
Gender Age
0 m 24
1 f 25
2 f 26
3 m 27
4 f 28
5 m 30
6 m 32
In [13]: f_filter = df_sample['Gender']=='f'
print(df_sample[f_filter])
m_filter = df_sample['Gender']=='m'
print(df_sample[m_filter])
Gender Age
1 f 25
2 f 26
4 f 28
Gender Age
0 m 24
3 m 27
5 m 30
6 m 32
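#The two boolean filters above can also be written as a single groupby over Gender. This sketch rebuilds df_sample from the values printed above. The construction code itself is an assumption, since the original cell is not shown.

```python
import pandas as pd

# Rebuilt from the printed output; the original construction is not shown.
df_sample = pd.DataFrame({
    "Gender": ["m", "f", "f", "m", "f", "m", "m"],
    "Age": [24, 25, 26, 27, 28, 30, 32],
})

# Boolean-mask filtering, as in the cell above.
f_rows = df_sample[df_sample["Gender"] == "f"]

# Equivalent split with groupby: one sub-frame per gender.
for gender, group in df_sample.groupby("Gender"):
    print(gender)
    print(group)

# Per-group statistic in one call.
mean_age_by_gender = df_sample.groupby("Gender")["Age"].mean()
print(mean_age_by_gender)
```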