Data Frames and Charts 2: 2.1 Dealing With Missing Values

The document discusses exploring and visualizing data using Pandas and Seaborn in Python. It loads automobile mileage data, cleans missing values, and explores the schema. It then demonstrates various plots - bar plots to compare average sale prices by age and role, histograms and density plots of sale price distributions, box plots to identify outliers, scatter plots to show relationships between variables, pair plots to visualize multivariate relationships, and heatmaps to view correlations. The goal is to gain insights from data through visualization.

Uploaded by

Pratyush Barua

Data Frames and Charts 2

2.1 Dealing With Missing Values

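import pandas as pd
# The file is whitespace-delimited and has no header row.
# A raw string keeps '\s' from being treated as a string escape.
autos = pd.read_csv('auto-mpg.data', sep = r'\s+', header = None)
autos.head( 5 )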

        0  1        2  ...   6  7                          8
0  18.000  8  307.000  ...  70  1  chevrolet chevelle malibu
1  15.000  8  350.000  ...  70  1          buick skylark 320
2  18.000  8  318.000  ...  70  1         plymouth satellite
3  16.000  8  304.000  ...  70  1              amc rebel sst
4  17.000  8  302.000  ...  70  1                ford torino

5 rows × 9 columns

autos.columns = ['mpg', 'cylinders', 'displacement',
                 'horsepower', 'weight', 'acceleration',
                 'year', 'origin', 'name']

autos.head( 5 )

      mpg  cylinders  displacement  ...  year  origin                       name
0  18.000          8       307.000  ...    70       1  chevrolet chevelle malibu
1  15.000          8       350.000  ...    70       1          buick skylark 320
2  18.000          8       318.000  ...    70       1         plymouth satellite
3  16.000          8       304.000  ...    70       1              amc rebel sst
4  17.000          8       302.000  ...    70       1                ford torino

5 rows × 9 columns

Next, we look at the schema of the DataFrame. Note that horsepower is read as object rather than a numeric type, so we convert it to numeric below.

autos.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 398 entries, 0 to 397
Data columns (total 9 columns):
mpg 398 non-null float64
cylinders 398 non-null int64
displacement 398 non-null float64
horsepower 398 non-null object
weight 398 non-null float64
acceleration 398 non-null float64
year 398 non-null int64
origin 398 non-null int64
name 398 non-null object
dtypes: float64(4), int64(3), object(2)
memory usage: 28.1+ KB

autos["horsepower"] = pd.to_numeric( autos["horsepower"], errors = 'coerce' )

autos.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 398 entries, 0 to 397
Data columns (total 9 columns):
mpg 398 non-null float64
cylinders 398 non-null int64
displacement 398 non-null float64
horsepower 392 non-null float64
weight 398 non-null float64
acceleration 398 non-null float64
year 398 non-null int64
origin 398 non-null int64
name 398 non-null object
dtypes: float64(5), int64(3), object(1)
memory usage: 28.1+ KB
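
The non-null count for horsepower drops from 398 to 392 because errors = 'coerce' replaces every entry that cannot be parsed as a number with NaN instead of raising an error. A minimal sketch on a made-up column (the values and the '?' placeholder are illustrative assumptions, not rows read from auto-mpg.data):

```python
import pandas as pd

# Hypothetical toy column: numeric strings plus one unparseable placeholder.
raw = pd.Series(['130.0', '165.0', '?', '150.0'])

# errors='coerce' turns anything that cannot be parsed into NaN.
clean = pd.to_numeric(raw, errors='coerce')

n_missing = int(clean.isnull().sum())
print(clean.dtype, n_missing)
```

The column becomes float64 because NaN is a floating-point value, which matches the dtype change seen in the second autos.info() output.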

autos[autos.horsepower.isnull()]

        mpg  cylinders  displacement  ...  year  origin                  name
32   25.000          4        98.000  ...    71       1            ford pinto
126  21.000          6       200.000  ...    74       1         ford maverick
330  40.900          4        85.000  ...    80       2  renault lecar deluxe
336  23.600          4       140.000  ...    80       1    ford mustang cobra
354  34.500          4       100.000  ...    81       2           renault 18i
374  23.000          4       151.000  ...    82       1        amc concord dl

6 rows × 9 columns

autos = autos.dropna(subset = ['horsepower'])

autos[autos.horsepower.isnull()]

mpg  cylinders  displacement  ...  year  origin  name

0 rows × 9 columns
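
As a small illustration of what dropna(subset = ...) does, the sketch below uses a made-up three-row frame (the column values are assumptions chosen for the example). Only rows with a missing value in the listed column are removed, and the surviving rows keep their original index labels.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with missing values in two different columns.
toy = pd.DataFrame({
    'mpg': [25.0, np.nan, 21.0],
    'horsepower': [np.nan, 90.0, 88.0],
})

# Drop only the rows where 'horsepower' is missing.
kept = toy.dropna(subset=['horsepower'])

print(kept)
```

The row with a missing mpg survives because mpg is not in subset. If a contiguous index is needed afterwards, kept.reset_index(drop=True) renumbers the rows.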

2.2 Exploration using Visualization Plots

2.2.1 Drawing Plots

import matplotlib.pyplot as plt

import seaborn as sn
%matplotlib inline

import warnings
warnings.filterwarnings('ignore')

2.2.2 Bar Plot

import pandas as pd
pd.set_option('display.float_format', lambda x: '%.3f' % x)
ipl_auction_df = pd.read_csv( 'IPL IMB381IPL2013.csv' )

# Average sold price for each age category
soldprice_by_age = ipl_auction_df.groupby('AGE')['SOLD PRICE'].mean().reset_index()
sn.barplot(x = 'AGE', y = 'SOLD PRICE', data = soldprice_by_age);

ipl_auction_df.groupby('AGE')['SOLD PRICE'].mean()

# Average sold price for each combination of age category and playing role
soldprice_by_age_role = ipl_auction_df.groupby(['AGE', 'PLAYING ROLE'])['SOLD PRICE'].mean().reset_index()

# Both frames have a 'SOLD PRICE' column, so merge adds the suffixes _x and _y
soldprice_comparison = soldprice_by_age_role.merge(soldprice_by_age, on = 'AGE', how = 'outer')
soldprice_comparison.rename( columns = { 'SOLD PRICE_x': 'SOLD_PRICE_AGE_ROLE',
                                         'SOLD PRICE_y': 'SOLD_PRICE_AGE' }, inplace = True )

sn.barplot(x = 'AGE', y = 'SOLD_PRICE_AGE_ROLE', hue = 'PLAYING ROLE', data = soldprice_comparison);
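
The rename step above relies on the _x and _y suffixes that merge adds when both frames carry a column with the same name. A minimal sketch on a made-up four-row frame (the toy values are assumptions, not rows from the IPL file) shows where those names come from:

```python
import pandas as pd

# Hypothetical toy data using the same column names as the IPL frame.
toy = pd.DataFrame({
    'AGE': [1, 1, 2, 2],
    'PLAYING ROLE': ['Batsman', 'Bowler', 'Batsman', 'Batsman'],
    'SOLD PRICE': [100, 300, 200, 400],
})

by_age = toy.groupby('AGE')['SOLD PRICE'].mean().reset_index()
by_age_role = toy.groupby(['AGE', 'PLAYING ROLE'])['SOLD PRICE'].mean().reset_index()

# Left frame's 'SOLD PRICE' becomes 'SOLD PRICE_x', right frame's becomes 'SOLD PRICE_y'.
merged = by_age_role.merge(by_age, on='AGE', how='outer')

print(merged)
```

Each (AGE, PLAYING ROLE) row ends up next to the overall average for its age category, which is what makes the side-by-side comparison possible.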

2.2.3 Histogram

plt.hist( ipl_auction_df['SOLD PRICE'] );

plt.hist( ipl_auction_df['SOLD PRICE'], bins = 20 );
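
The bins argument controls how many equal-width intervals the value range is split into; plt.hist uses 10 when it is not given. The counts behind the bars can be inspected without drawing anything by calling np.histogram directly. The prices below are made-up values for illustration only:

```python
import numpy as np

# Hypothetical sold prices, not taken from the IPL file.
prices = np.array([20, 50, 225, 400, 450, 700, 900, 1350, 1500, 1800]) * 1000.0

# Same binning with 10 and with 20 equal-width bins.
counts10, edges10 = np.histogram(prices, bins=10)
counts20, edges20 = np.histogram(prices, bins=20)

print(counts10)
print(counts20)
```

More bins give narrower bars and a finer view of the distribution, at the cost of noisier counts per bar.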

2.2.4 Distribution or Density plot

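# Note: distplot is deprecated since seaborn 0.11; sn.histplot(..., kde = True) is the current equivalent.
sn.distplot( ipl_auction_df['SOLD PRICE']);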

2.2.5 Box Plot

box = sn.boxplot(ipl_auction_df['SOLD PRICE']);

box = plt.boxplot(ipl_auction_df['SOLD PRICE']);

[item.get_ydata()[0] for item in box['caps']]

[20000.0, 1350000.0]

[item.get_ydata()[0] for item in box['whiskers']]

[225000.0, 700000.0]

[item.get_ydata()[0] for item in box['medians']]

[437500.0]

Which players are outliers? These are the players whose sold price lies above the upper cap of 1350000.

ipl_auction_df[ipl_auction_df['SOLD PRICE'] > 1350000.0][['PLAYER NAME',
                                                          'PLAYING ROLE',
                                                          'SOLD PRICE']]

       PLAYER NAME PLAYING ROLE  SOLD PRICE
15       Dhoni, MS    W. Keeper     1500000
23     Flintoff, A   Allrounder     1550000
50        Kohli, V      Batsman     1800000
83   Pietersen, KP      Batsman     1550000
93       Sehwag, V      Batsman     1800000
111  Tendulkar, SR      Batsman     1800000
113     Tiwary, SS      Batsman     1600000
127   Yuvraj Singh      Batsman     1800000
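
The cap values come from the usual 1.5 × IQR rule, assuming the default whis = 1.5 of plt.boxplot: each whisker ends at the most extreme data point that still lies within 1.5 interquartile ranges of the box, and anything beyond is drawn as an outlier. A minimal sketch of that rule on made-up numbers (not the IPL prices):

```python
import numpy as np

# Hypothetical data with one obvious outlier.
values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], dtype=float)

q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1

# Fences: limits beyond which a point counts as an outlier.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Caps: the most extreme data points that are still inside the fences.
inside = values[(values >= lower_fence) & (values <= upper_fence)]
lower_cap, upper_cap = inside.min(), inside.max()

outliers = values[(values < lower_fence) | (values > upper_fence)]
print(lower_cap, upper_cap, outliers)
```

Applied to the values reported above, the box runs from 225000 to 700000, so the IQR is 475000 and the upper fence is 700000 + 1.5 × 475000 = 1412500. The upper cap of 1350000 is then the largest sold price that does not exceed that fence.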

2.2.6 Comparing Distributions

Using distribution plots

sn.distplot( ipl_auction_df[ipl_auction_df['CAPTAINCY EXP'] == 1]['SOLD PRICE'],
             color = 'y',
             label = 'Captaincy Experience')
sn.distplot( ipl_auction_df[ipl_auction_df['CAPTAINCY EXP'] == 0]['SOLD PRICE'],
             color = 'r',
             label = 'No Captaincy Experience');
plt.legend();

Using box plots

sn.boxplot(x = 'PLAYING ROLE', y = 'SOLD PRICE', data = ipl_auction_df);

2.2.7 Scatter Plot

ipl_batsman_df = ipl_auction_df[ipl_auction_df['PLAYING ROLE'] == 'Batsman']

plt.scatter(x = ipl_batsman_df.SIXERS,
y = ipl_batsman_df['SOLD PRICE']);
plt.xlabel('SIXERS')
plt.ylabel('SOLD PRICE');
sn.regplot( x = 'SIXERS',
y = 'SOLD PRICE',
data = ipl_batsman_df );
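
sn.regplot overlays a fitted regression line on the scatter plot; with its default settings this is a straight line. To read off the slope and intercept of such a line, a least-squares fit can be computed separately. The sketch below uses np.polyfit on made-up points; it illustrates the idea and is not necessarily the routine seaborn calls internally.

```python
import numpy as np

# Hypothetical points lying exactly on y = 2x + 1.
sixers = np.array([0.0, 1.0, 2.0, 3.0])
price = np.array([1.0, 3.0, 5.0, 7.0])

# Degree-1 least-squares fit returns [slope, intercept].
slope, intercept = np.polyfit(sixers, price, 1)

print(slope, intercept)
```

On the real data, the slope would estimate how much the sold price changes per additional six, under the assumption that the relationship is roughly linear.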

2.2.8 Pair Plot

influential_features = ['SR-B', 'AVE', 'SIXERS', 'SOLD PRICE']

# 'height' replaces the older 'size' argument (renamed in seaborn 0.9)
sn.pairplot(ipl_auction_df[influential_features], height=2)
<seaborn.axisgrid.PairGrid at 0x1a1b188860>

2.2.9 Correlations and Heatmaps

ipl_auction_df[influential_features].corr()

            SR-B   AVE  SIXERS  SOLD PRICE
SR-B       1.000 0.584   0.425       0.184
AVE        0.584 1.000   0.705       0.397
SIXERS     0.425 0.705   1.000       0.451
SOLD PRICE 0.184 0.397   0.451       1.000

sn.heatmap(ipl_auction_df[influential_features].corr(), annot=True);
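
By default, corr() computes the Pearson correlation coefficient for every pair of columns, which is why the matrix is symmetric with 1.000 on the diagonal. A minimal sketch on made-up numbers (not the IPL data) compares corr() with the coefficient computed by hand:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with two numeric columns.
toy = pd.DataFrame({
    'SIXERS': [0, 5, 10, 20, 40],
    'SOLD PRICE': [50, 100, 90, 300, 450],
})

corr = toy.corr()  # Pearson by default

# Manual Pearson r: covariance divided by the product of standard deviations.
dx = toy['SIXERS'] - toy['SIXERS'].mean()
dy = toy['SOLD PRICE'] - toy['SOLD PRICE'].mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

print(corr.loc['SIXERS', 'SOLD PRICE'], r_manual)
```

In the matrix above, SIXERS has the strongest correlation with SOLD PRICE (0.451) among the three performance features, and SR-B the weakest (0.184).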
