DS - Assig-03-Part-I - Jupyter Notebook

The document discusses descriptive statistics measures used in data science including mean, median, mode, standard deviation, variance, interquartile range and skewness. These measures are calculated on a loan dataset to analyze the central tendency and dispersion of variables.


In [1]:  #Descriptive Statistics is the building block of data science.

Advanc

#Measures of central tendency include mean, median, and the mode, whi

#We will cover the topics given below:

#Mean
#Median
#Mode
#Standard Deviation
#Variance
#Interquartile Range
#Skewness

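Before loading the dataset, here is a minimal sketch of these measures on a small hand-made list, using Python's built-in statistics module (the values are illustrative only and not taken from the loan data):

import statistics as st

ages = [25, 30, 30, 38, 45]      # illustrative values only
print(st.mean(ages))             # arithmetic mean: 33.6
print(st.median(ages))           # middle value of the sorted list: 30
print(st.mode(ages))             # most frequent value: 30
print(st.stdev(ages))            # sample standard deviation
print(st.variance(ages))         # sample variance
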
In [1]:  import pandas as pd


import numpy as np
import statistics as st

# Load the data
df = pd.read_csv("loan_data.csv")
print(df.shape)
print(df.info())

(614, 14)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 614 entries, 0 to 613
Data columns (total 14 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Loan_ID 614 non-null object
1 Gender 601 non-null object
2 Age 513 non-null float64
3 Married 611 non-null object
4 Dependents 599 non-null object
5 Education 614 non-null object
6 Self_Employed 582 non-null object
7 ApplicantIncome 614 non-null int64
8 CoapplicantIncome 614 non-null float64
9 LoanAmount 592 non-null float64
10 Loan_Amount_Term 600 non-null float64
11 Credit_History 564 non-null float64
12 Property_Area 614 non-null object
13 Loan_Status 614 non-null object
dtypes: float64(5), int64(1), object(8)
memory usage: 67.3+ KB
None
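Several columns have fewer than 614 non-null entries (Age, LoanAmount, Credit_History, and others), which matters later when computing statistics. A minimal sketch to count the missing values per column:

# Number of missing values in each column
print(df.isnull().sum())
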
In [3]:  df.mean()

Out[3]: Age 32.101365
ApplicantIncome 5403.459283
CoapplicantIncome 1621.245798
LoanAmount 146.412162
Loan_Amount_Term 342.000000
Credit_History 0.842199
dtype: float64
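Note that df.mean() silently drops the non-numeric columns here; depending on the pandas version, calling mean() on a DataFrame that still contains object columns may instead raise a TypeError. A hedged alternative is to restrict the calculation explicitly (the same keyword applies to median(), std(), var(), and skew()):

df.mean(numeric_only=True)
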

In [4]:  print(df.loc[:,'Age'].mean())
print(df.loc[:,'ApplicantIncome'].mean())

32.10136452241716
5403.459283387622

In [5]:  df.mean(axis = 1)[0:10]

Out[5]: 0 1249.000000
1 1316.000000
2 575.333333
3 908.000000
4 1088.666667
5 1713.500000
6 722.500000
7 1017.166667
8 1017.833333
9 4092.833333
dtype: float64
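A row-wise mean across columns measured in different units (age, income, loan amount, term) is rarely meaningful on its own. A minimal sketch of a more interpretable row-wise mean restricted to comparable columns, here the two income columns (the column choice is ours, for illustration):

df[['ApplicantIncome', 'CoapplicantIncome']].mean(axis=1)[0:10]
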

In [6]:  df.median(axis=1)

Out[6]: 0 35.0
1 360.0
2 45.5
3 240.0
4 85.5
...
609 48.0
610 45.0
611 246.5
612 116.0
613 79.5
Length: 614, dtype: float64

In [7]:  df.median()
Out[7]: Age 30.0
ApplicantIncome 3812.5
CoapplicantIncome 1188.5
LoanAmount 128.0
Loan_Amount_Term 360.0
Credit_History 1.0
dtype: float64
In [8]:  #to compute a median of a some column
print(df.loc[:,'Age'].median())
print(df.loc[:,'ApplicantIncome'].median())

df.median(axis = 1)[0:10]

30.0
3812.5

Out[8]: 0 35.0
1 360.0
2 45.5
3 240.0
4 85.5
5 313.5
6 227.5
7 259.0
8 264.0
9 354.5
dtype: float64

In [9]:  df.mode()

Out[9]:       Loan_ID Gender   Age Married Dependents Education Self_Employed  ApplicantIncome
0    LP001002   Male  25.0     Yes          0  Graduate            No           2500.0
1    LP001003    NaN   NaN     NaN        NaN       NaN           NaN              NaN
2    LP001005    NaN   NaN     NaN        NaN       NaN           NaN              NaN
3    LP001006    NaN   NaN     NaN        NaN       NaN           NaN              NaN
4    LP001008    NaN   NaN     NaN        NaN       NaN           NaN              NaN
..        ...    ...   ...     ...        ...       ...           ...              ...
609  LP002978    NaN   NaN     NaN        NaN       NaN           NaN              NaN
610  LP002979    NaN   NaN     NaN        NaN       NaN           NaN              NaN
611  LP002983    NaN   NaN     NaN        NaN       NaN           NaN              NaN
612  LP002984    NaN   NaN     NaN        NaN       NaN           NaN              NaN
613  LP002990    NaN   NaN     NaN        NaN       NaN           NaN              NaN

614 rows × 14 columns
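Because every Loan_ID is unique, each one is a mode, which is why the Loan_ID column is filled down while the other columns only have a value in row 0. A minimal sketch to keep just the first mode per column:

# First mode of every column as a single Series
df.mode().iloc[0]
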

Measures of Dispersion


In [10]:  #Measure the Standard deviation
df.std()

Out[10]: Age 7.732178
ApplicantIncome 6109.041673
CoapplicantIncome 2926.248369
LoanAmount 85.587325
Loan_Amount_Term 65.120410
Credit_History 0.364878
dtype: float64

In [12]:  print(df.loc[:,'Age'].std())
print(df.loc[:,'ApplicantIncome'].std())

#calculate the row-wise standard deviation for the first ten rows
df.std(axis = 1)[0:10]

7.732178229043358
6109.041673387174

Out[12]: 0 2575.928085
1 1921.240355
2 1195.703252
3 1219.011567
4 2409.946528
5 2430.485610
6 974.539224
7 1373.763650
8 1569.760799
9 6081.668239
dtype: float64
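As a side note, pandas' std() computes the sample standard deviation (ddof=1) by default, while NumPy's np.std() defaults to the population version (ddof=0); a quick sketch of the comparison:

print(df['ApplicantIncome'].std())            # pandas, ddof=1
print(np.std(df['ApplicantIncome'], ddof=1))  # matches pandas
print(np.std(df['ApplicantIncome'], ddof=0))  # population std, slightly smaller
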

In [13]:  #easure the Variance


df.var()

Out[13]: Age 5.978658e+01
ApplicantIncome 3.732039e+07
CoapplicantIncome 8.562930e+06
LoanAmount 7.325190e+03
Loan_Amount_Term 4.240668e+03
Credit_History 1.331362e-01
dtype: float64
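The variance is simply the square of the standard deviation, which gives a quick consistency check against the previous output (sketch):

print(df['ApplicantIncome'].std() ** 2)   # ~3.732e+07
print(df['ApplicantIncome'].var())        # same value
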

In [14]:  #Measures the Interquartile Range (IQR)


from scipy.stats import iqr
iqr(df['Age'])

Out[14]: nan
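The result is nan because the Age column has 101 missing values and scipy's iqr() propagates NaN by default. A minimal sketch of two ways around this, dropping the NaNs first or computing the quartiles directly (the describe() output further below shows the Age quartiles as 25 and 38, i.e. an IQR of 13):

print(iqr(df['Age'].dropna()))                              # IQR of the non-missing ages
print(df['Age'].quantile(0.75) - df['Age'].quantile(0.25))  # same result via pandas quantiles
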
In [3]:  print(df.skew())

Age 0.712146
ApplicantIncome 6.539513
CoapplicantIncome 7.491531
LoanAmount 2.677552
Loan_Amount_Term -2.362414
Credit_History -1.882361
dtype: float64

In [4]:  #The skewness values can be interpreted in the following manner:



#Highly skewed distribution: If the skewness value is less than −1 or

#Moderately skewed distribution: If the skewness value is between −1

#Approximately symmetric distribution: If the skewness value is betwe
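A minimal sketch of a small helper that labels each numeric column according to these rules of thumb (the skew_label function is ours, for illustration):

def skew_label(s):
    # Rule-of-thumb classification of a skewness value
    if abs(s) > 1:
        return 'highly skewed'
    elif abs(s) > 0.5:
        return 'moderately skewed'
    return 'approximately symmetric'

df.skew(numeric_only=True).apply(skew_label)
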

In [5]:  df.describe()

Out[5]:         Age  ApplicantIncome  CoapplicantIncome  LoanAmount  Loan_Amount_Term  Credit_History
count    513.000000       614.000000         614.000000  592.000000         600.00000      564.000000
mean      32.101365      5403.459283        1621.245798  146.412162         342.00000        0.842199
std        7.732178      6109.041673        2926.248369   85.587325          65.12041        0.364878
min       24.000000       150.000000           0.000000    9.000000          12.00000        0.000000
25%       25.000000      2877.500000           0.000000  100.000000         360.00000        1.000000
50%       30.000000      3812.500000        1188.500000  128.000000         360.00000        1.000000
75%       38.000000      5795.000000        2297.250000  168.000000         360.00000        1.000000
max       56.000000     81000.000000       41667.000000  700.000000         480.00000        1.000000

In [6]:  df.describe(include='all')

Out[6]:      Loan_ID Gender         Age Married Dependents Education Self_Employed  ApplicantIncome
count        614    601  513.000000     611        599       614           582       614.000000
unique       614      2         NaN       2          4         2             2              NaN
top     LP001157   Male         NaN     Yes          0  Graduate            No              NaN
freq           1    489         NaN     398        345       480           500              NaN
mean         NaN    NaN   32.101365     NaN        NaN       NaN           NaN      5403.459283
std          NaN    NaN    7.732178     NaN        NaN       NaN           NaN      6109.041673
min          NaN    NaN   24.000000     NaN        NaN       NaN           NaN       150.000000
25%          NaN    NaN   25.000000     NaN        NaN       NaN           NaN      2877.500000
50%          NaN    NaN   30.000000     NaN        NaN       NaN           NaN      3812.500000
75%          NaN    NaN   38.000000     NaN        NaN       NaN           NaN      5795.000000
max          NaN    NaN   56.000000     NaN        NaN       NaN           NaN     81000.000000


In [7]:  df.groupby('Age').count()

Out[7]:       Loan_ID  Gender  Married  Dependents  Education  Self_Employed  ApplicantIncome  ...
Age
24.0       22      21       21          19         22             20               22
25.0      107     105      107         105        107            102              107
26.0       87      86       87          87         87             81               87
27.0       23      23       22          22         23             22               23
28.0        4       4        4           4          4              2                4
30.0       53      53       53          52         53             52               53
31.0       23      22       23          20         23             23               23
32.0       18      18       18          18         18             18               18
35.0        7       7        7           6          7              7                7
37.0       18      17       18          18         18             16               18
38.0       23      23       23          23         23             23               23
40.0       22      20       22          22         22             20               22
42.0        4       4        4           4          4              4                4
43.0       45      44       45          42         45             40               45
45.0       31      31       30          30         31             30               31
46.0        6       6        6           6          6              5                6
47.0       18      17       18          18         18             18               18
50.0        1       1        1           1          1              1                1
56.0        1       1        1           1          1              1                1

In [8]:  df.groupby('Age')['ApplicantIncome']

Out[8]: <pandas.core.groupby.generic.SeriesGroupBy object at 0x000001CE88961AF0>
In [9]:  df.groupby('Age')['ApplicantIncome'].sum()

Out[9]: Age
24.0 109185
25.0 538078
26.0 423267
27.0 178232
28.0 12637
30.0 314638
31.0 174155
32.0 73848
35.0 64332
37.0 99041
38.0 170038
40.0 87414
42.0 28751
43.0 205937
45.0 142359
46.0 38018
47.0 138701
50.0 4106
56.0 6540
Name: ApplicantIncome, dtype: int64
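groupby() can also compute several aggregates in one pass; a minimal sketch combining the count and sum from the previous cells with the group mean:

df.groupby('Age')['ApplicantIncome'].agg(['count', 'mean', 'sum'])
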

In [12]:  data = {'Gender':['m','f','f','m','f','m','m'],'Age':[24,25,26,27,28,


df_sample = pd.DataFrame(data)
df_sample

Out[12]:   Gender  Age
0      m   24
1      f   25
2      f   26
3      m   27
4      f   28
5      m   30
6      m   32

In [13]:  f_filter = df_sample['Gender']=='f'
print(df_sample[f_filter])

m_filter = df_sample['Gender']=='m'
print(df_sample[m_filter])

Gender Age
1 f 25
2 f 26
4 f 28
Gender Age
0 m 24
3 m 27
5 m 30
6 m 32

In [ ]:
