
CSL 410 L17


Program: B.Tech (CSE), IV Semester, II Year

CSL-410: Data Science using Python

Unit No. 2
View and Filter Data from data frames

Lecture No. 17

Dr. Sanjay Jain

Associate Professor, CSA/SOET
Outline
• Preview and examine data in a Pandas DataFrame
• Filter Data Frames Based on Value Condition
• Examples
• References
Student Effective Learning Outcomes (SELO)
01: Ability to understand subject-related concepts clearly along with
contemporary issues.
02: Ability to use updated tools, techniques and skills for effective domain
specific practices.
03: Understanding available tools and products and ability to use them
effectively.
Preview and examine data in a Pandas DataFrame
• Once you have data in Python, check that it has loaded correctly and that
the expected columns and rows are present.
• Print the data: In a Jupyter notebook, typing the name of the data frame
on the last line of a cell displays it as a nicely formatted table. Printing
is a convenient way to preview your loaded data: you can confirm that the
column names were imported correctly, that the data formats are as
expected, and whether any values are missing.

<SELO: 1> <Reference No.: R1,R4>

Preview and examine data in a Pandas DataFrame
• Print the data: This is an excellent way to preview data; note, however,
that pandas truncates long output. By default at most 60 rows are shown
(display.max_rows) and, in a notebook, at most 20 columns
(display.max_columns).
• If you'd like to change these limits, you can edit the defaults using the
Pandas display options.
pd.options.display.XX = value
• pd.options.display.width: the width of the display in characters. Use this
if your display is wrapping rows over more than one line.
• pd.options.display.max_rows: maximum number of rows displayed.
• pd.options.display.max_columns: maximum number of columns
displayed.
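• The options listed above can be set and read back as in the following minimal sketch. The specific values (100, 30, 120) are chosen only for illustration and are not recommendations from the lecture.

```python
import pandas as pd

# Raise the display limits (illustrative values only).
pd.options.display.max_rows = 100
pd.options.display.max_columns = 30
pd.options.display.width = 120

# get_option reads a setting back by its dotted name.
current_max_rows = pd.get_option("display.max_rows")
print(current_max_rows)  # 100

# reset_option restores the default for a single option.
pd.reset_option("display.max_columns")
```

• These settings affect only how a data frame is displayed; the data itself is unchanged.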

<SELO: 1> <Reference No.: R1,R4>

DataFrame rows and columns with .shape
• The shape attribute gives the size of the data set: 'shape' returns a
tuple with the number of rows and the number of columns in the
DataFrame. Another descriptive attribute is 'ndim', which gives the
number of dimensions of the data; for a DataFrame this is always 2.
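• A minimal sketch of both attributes, using a small made-up data frame (the contents of df are an assumption for illustration, not the lecture's data set):

```python
import pandas as pd

# Hypothetical data, used only to illustrate the attributes.
df = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Algeria"],
    "year": [2002, 2002, 2002],
})

# shape and ndim are attributes, so they take no parentheses.
n_rows, n_cols = df.shape
print(df.shape)  # (3, 2)
print(df.ndim)   # 2
```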

<SELO: 1> <Reference No.: R1,R4>

Preview DataFrames with head() and tail()
• The DataFrame.head() function in Pandas, by default, shows you the top 5
rows of data in the DataFrame. The opposite is DataFrame.tail(), which
gives you the last 5 rows.

<SELO: 1> <Reference No.: R1,R4>

Preview DataFrames with head() and tail()
• Passing a number to head() or tail() returns that many rows instead of
the default 5; for example, head(3) returns the first three rows.
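• The behaviour of head() and tail() can be sketched on a made-up 10-row data frame (df and its columns are assumptions for illustration):

```python
import pandas as pd

# Hypothetical frame with 10 rows, numbered 0 to 9.
df = pd.DataFrame({"n": range(10), "square": [i * i for i in range(10)]})

first_five = df.head()     # default: rows 0..4
last_five = df.tail()      # default: rows 5..9
first_three = df.head(3)   # rows 0..2
last_two = df.tail(2)      # rows 8..9

print(first_three)
print(last_two)
```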

<SELO: 1> <Reference No.: R1,R4>

Data types (dtypes) of columns
• Many DataFrames have mixed data types, that is, some columns are
numbers, some are strings, and some are dates etc. Internally, CSV files do
not contain information on what data types are contained in each column;
all of the data is just characters.
• Pandas infers the data types when loading the data, e.g. if a column
contains only numbers, pandas will set that column’s data type to
numeric: integer or float.
• We can check the types of each column in our example with
the ‘.dtypes’ property of the dataframe.
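• As a rough illustration of type inference, the sketch below reads a tiny CSV from an in-memory string. The CSV text and its column names are made up for this example; only 'Item Code' echoes a column name used in the lecture.

```python
import io
import pandas as pd

# Hypothetical CSV text: in the file itself every value is just characters.
csv_text = "Item Code,Item,Price\n101,pen,1.5\n102,book,12.0\n"
data = pd.read_csv(io.StringIO(csv_text))

# pandas infers one data type per column while loading:
# 'Item Code' becomes an integer column, 'Price' a float column,
# and 'Item' a text column.
print(data.dtypes)
```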

<SELO: 1> <Reference No.: R1,R4>

Change the Data types (dtypes) of columns
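• To change the data type of a specific column, use the .astype() method.
It returns a converted copy and leaves the DataFrame unchanged, so assign
the result back if you want to keep it. For example, to see the 'Item Code'
column as a string, use:
data['Item Code'].astype(str)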
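• A minimal sketch of making the conversion permanent, assuming a made-up data frame named data with an integer 'Item Code' column:

```python
import pandas as pd

# Hypothetical frame; 'Item Code' is the column named in the slide.
data = pd.DataFrame({"Item Code": [101, 102, 103], "Price": [1.5, 12.0, 3.25]})

# astype() returns a new Series; assign it back to keep the conversion.
data["Item Code"] = data["Item Code"].astype(str)
print(data["Item Code"].tolist())  # ['101', '102', '103']
```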

<SELO: 1> <Reference No.: R1,R4>

Describing data with .describe()
• For numeric columns, describe() returns basic statistics: the value count,
mean, standard deviation, minimum, maximum, and the 25th, 50th, and 75th
percentiles of the data in a column.
• For string columns, describe() returns the value count, the number of
unique entries, the most frequently occurring value (‘top’), and the
number of times the top value occurs (‘freq’).
• To describe a single column, select it with a string inside the [] brackets
and call describe() on it, for example data['Item Code'].describe().

<SELO: 1> <Reference No.: R1,R4>

Describing data with .describe()
• Use the describe() function to get basic statistics on columns in your
Pandas DataFrame. Note the differences between columns with numeric
datatypes, and columns of strings and characters.

<SELO: 1> <Reference No.: R1,R4>

Describing data with .describe()
• If describe() is called on the entire DataFrame, by default statistics are
returned only for the columns with numeric data types, in DataFrame format.
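• The three uses of describe() discussed above can be sketched on a small made-up data frame (the 'Item' and 'Price' columns are assumptions for illustration):

```python
import pandas as pd

# Hypothetical data: one text column and one numeric column.
data = pd.DataFrame({
    "Item": ["pen", "pen", "book", "ink"],
    "Price": [1.0, 2.0, 3.0, 4.0],
})

numeric_summary = data["Price"].describe()  # count, mean, std, min, percentiles, max
text_summary = data["Item"].describe()      # count, unique, top, freq
frame_summary = data.describe()             # numeric columns only, as a DataFrame

print(numeric_summary["mean"])       # 2.5
print(text_summary["top"])           # pen
print(list(frame_summary.columns))   # ['Price']
```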

<SELO: 1> <Reference No.: R1,R4>

Filter Data Frames Based on Value Condition
• One of the biggest advantages of having the data as a Pandas Dataframe is
that Pandas allows us to slice and dice the data in multiple ways.
• Often, you may want to subset a pandas dataframe based on one or more
values of a specific column. Essentially, we would like to select rows based
on one value or multiple values present in a column.
• Let us take an example using a Pandas dataframe to filter or select rows
based on the values of one or more columns. Let us first load the gapminder
data into pandas as a dataframe.
# load pandas
import pandas as pd
data_url = 'http://bit.ly/2cLzoxH'
# read data from url as pandas dataframe
gapminder = pd.read_csv(data_url)

<SELO: 1> <Reference No.: R1,R4>

Filter Data Frames Based on Value Condition
• This data frame has 1704 rows and 6 columns. One of the columns is
year. Let us look at the first three rows of the data frame.

print(gapminder.head(3))
       country  year         pop continent  lifeExp   gdpPercap
0  Afghanistan  1952   8425333.0      Asia   28.801  779.445314
1  Afghanistan  1957   9240934.0      Asia   30.332  820.853030
2  Afghanistan  1962  10267083.0      Asia   31.997  853.100710

<SELO: 1> <Reference No.: R1,R4>

Filter Data Frames Based on Value Condition
• How to Select Rows of Pandas Dataframe Based on a Single Value of a
Column?
• One way to filter rows in Pandas is to use a boolean expression. We first
create a boolean variable by taking the column of interest and checking
whether its value equals the specific value that we want to select/keep.
• For example, let us filter (subset) the dataframe on the year value 2002.
This condition produces a boolean Series that is True where the value of
year equals 2002 and False otherwise.
# is year equal to 2002?
# is_2002 is a boolean Series with True or False for each row
is_2002 = gapminder['year']==2002
# filter rows for year 2002 using the boolean variable
gapminder_2002 = gapminder[is_2002]
print(gapminder_2002.shape)
print(gapminder_2002.head())
<SELO: 1> <Reference No.: R1,R4>
Filter Data Frames Based on Value Condition
• How to Select Rows of Pandas Dataframe Based on a Single Value of a
Column?
• Output:
(142, 6)
    country  year         pop continent  lifeExp    gdpPercap
Afghanistan  2002  25268405.0      Asia   42.129   726.734055
    Albania  2002   3508512.0    Europe   75.651  4604.211737
    Algeria  2002  31287142.0    Africa   70.994  5288.040382
     Angola  2002  10866106.0    Africa   41.003  2773.287312
  Argentina  2002  38331121.0  Americas   74.340  8797.640716

<SELO: 1> <Reference No.: R1,R4>

Filter Data Frames Based on Value Condition
• How To Filter rows using Pandas chaining?
• We can also use Pandas method chaining to access a dataframe's column
and select rows as in the previous example. Chaining makes it easy to
combine one Pandas command with another Pandas command or with
user-defined functions.
• Here we call the Pandas eq() method on the year series to check
element-wise equality and keep the rows corresponding to the year
2002.
# filter rows for year 2002 using the boolean expression
gapminder_2002 = gapminder[gapminder.year.eq(2002)]
print(gapminder_2002.shape)
(142, 6)
• In the above example, we checked for equality (year==2002) and kept the
rows matching a specific value. We can use any other comparison operator
like “less than” and “greater than” and create boolean expression to filter
rows of pandas dataframe.
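• The other comparison operators mentioned above can be sketched as follows. The small sample frame is a stand-in built from a few gapminder rows shown in these slides, and the thresholds are chosen only for illustration. gt() is the chained counterpart of >, just as eq() is of ==.

```python
import pandas as pd

# Small stand-in for gapminder, built from rows shown in the slides.
sample = pd.DataFrame({
    "country": ["Afghanistan", "Afghanistan", "Afghanistan", "Albania"],
    "year": [1952, 1957, 2002, 2002],
    "lifeExp": [28.801, 30.332, 42.129, 75.651],
})

before_2002 = sample[sample["year"] < 2002]       # "less than"
long_lived = sample[sample["lifeExp"] > 40]       # "greater than"
also_long_lived = sample[sample.lifeExp.gt(40)]   # chained equivalent of >

print(before_2002.shape)               # (2, 3)
print(long_lived["country"].tolist())  # ['Afghanistan', 'Albania']
```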
<SELO: 1> <Reference No.: R1,R4>
Filter Data Frames Based on Value Condition
• How to Select Rows of Pandas Dataframe Whose Column Value Does
NOT Equal a Specific Value?
• Sometimes, you may want to keep the rows of a data frame whose value in
a column does not equal something. Let us filter our gapminder
dataframe to the rows whose year column is not equal to 2002, i.e. all
the years' data except for the year 2002.
# filter rows where year is not equal to 2002 (attribute access)
gapminder_not_2002 = gapminder[gapminder.year != 2002]
# equivalent, using bracket access to the column
gapminder_not_2002 = gapminder[gapminder['year']!=2002]
gapminder_not_2002.shape
(1562, 6)

<SELO: 1> <Reference No.: R1,R4>

Filter Data Frames Based on Value Condition
• How to Select Rows of Pandas Dataframe Whose Column Value is
NOT NA/NAN?
• Often you may want to filter a Pandas dataframe so that only the rows
whose value in a certain column is NOT NA/NaN are kept.
• We can use the Pandas notnull() method to filter based on the NA/NaN
values of a column.
# keep only the rows of the dataframe whose year value is not NA/NaN
gapminder_no_NA = gapminder[gapminder.year.notnull()]
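• A minimal sketch of what notnull() returns, using a made-up data frame with one missing year (df and its values are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing year.
df = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Algeria"],
    "year": [2002, np.nan, 2002],
})

has_year = df["year"].notnull()  # True where a value is present
df_no_na = df[has_year]          # keeps only the rows with a year

print(has_year.tolist())  # [True, False, True]
print(df_no_na.shape)     # (2, 2)
```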

<SELO: 1> <Reference No.: R1,R4>

Filter Data Frames Based on Value Condition
• How to Select Rows of Pandas Dataframe Based on a list?
• In the previous example, we selected rows based on a single value, i.e. year
== 2002.
• However, often we may have to select rows using multiple values present
in an iterable or a list. For example, let us say we want to select rows for
the years [1952, 2007].
• The Pandas isin() method allows us to select rows using a list or
any iterable. Called on a single column, it returns a boolean Series that is
True where the value matches and False where it does not.
# To select rows whose column value is in a list
years = [1952, 2007]
gapminder.year.isin(years)

<SELO: 1> <Reference No.: R1,R4>

Filter Data Frames Based on Value Condition
• How to Select Rows of Pandas Dataframe Based on a list?
• We can use the boolean array to select the rows like before
gapminder_years= gapminder[gapminder.year.isin(years)]
gapminder_years.shape
(284, 6)
• We can make sure our new data frame contains rows for only the two
years specified in the list. Let us use the Pandas unique() function to get
the unique values of the column “year”.
gapminder_years.year.unique()
array([1952, 2007])

<SELO: 1> <Reference No.: R1,R4>

Filter Data Frames Based on Value Condition
• How to Select Rows of Pandas Dataframe Based on Values NOT in a
list?
• We can also select rows based on values of a column that are not in a list or
any iterable. We create the boolean variable just like before, but now we
negate it by placing ~ in front.
• For example, to get the rows of the gapminder data frame whose continent
value is not in the continents list, we use
continents = ['Asia','Africa', 'Americas', 'Europe']
gapminder_Ocean = gapminder[~gapminder.continent.isin(continents)]
gapminder_Ocean.shape
(24, 6)
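• Both list-based selections can be tried without downloading the full data set. The sketch below uses a small made-up sample frame with the same column names as gapminder; the rows are assumptions for illustration.

```python
import pandas as pd

# Small hypothetical stand-in for gapminder.
sample = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Australia", "Australia"],
    "year": [1952, 2002, 1952, 2007],
    "continent": ["Asia", "Europe", "Oceania", "Oceania"],
})

# Keep rows whose year is in the list.
years = [1952, 2007]
in_years = sample[sample["year"].isin(years)]

# Keep rows whose continent is NOT in the list: ~ negates the boolean mask.
continents = ["Asia", "Africa", "Americas", "Europe"]
not_listed = sample[~sample["continent"].isin(continents)]

print(in_years["year"].unique().tolist())          # [1952, 2007]
print(not_listed["continent"].unique().tolist())   # ['Oceania']
```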

<SELO: 1> <Reference No.: R1,R4>

Filter Data Frames Based on Value Condition
• How to Select Rows of Pandas Dataframe using Multiple Conditions?
• We can combine multiple conditions using the & operator to select rows from
a pandas data frame. For example, we can combine the above two conditions
to get the Oceania data for the years 1952 and 2007.
gapminder[~gapminder.continent.isin(continents) &
          gapminder.year.isin(years)]
• Now we have the rows corresponding to the Oceania continent for the
years 1952 and 2007.
    country  year         pop continent  lifeExp    gdpPercap
  Australia  1952   8691212.0   Oceania   69.120  10039.59564
  Australia  2007  20434176.0   Oceania   81.235  34435.36744
New Zealand  1952   1994794.0   Oceania   69.390  10556.57566
New Zealand  2007   4115771.0   Oceania   80.204  25185.00911
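• When conditions are written with comparison operators rather than isin(), each comparison needs its own parentheses, because & binds more tightly than == or >. The sketch below illustrates this on a small made-up sample frame. It also shows the | (or) operator, which the slides do not cover.

```python
import pandas as pd

# Small hypothetical stand-in for gapminder.
sample = pd.DataFrame({
    "country": ["Australia", "Australia", "New Zealand", "Albania"],
    "year": [1952, 2002, 2007, 2007],
    "continent": ["Oceania", "Oceania", "Oceania", "Europe"],
})

# AND: both conditions must hold for a row to be kept.
oceania_after_2000 = sample[(sample["continent"] == "Oceania") & (sample["year"] > 2000)]

# OR: a row is kept if either condition holds.
europe_or_1952 = sample[(sample["continent"] == "Europe") | (sample["year"] == 1952)]

print(oceania_after_2000["country"].tolist())  # ['Australia', 'New Zealand']
print(europe_or_1952["country"].tolist())      # ['Australia', 'Albania']
```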

<SELO: 1> <Reference No.: R1,R4>

Learning Outcomes

The students have learned and understood the following:

•Preview and examine data in a Pandas DataFrame

•Filter Data Frames Based on Value Condition
References

1. Data Science with Python by Aaron England, Mohamed Noordeen Alaudeen, and Rohan Chopra. Packt Publishing, July 2019.
2. https://intellipaat.com/blog/what-is-data-science/
3. https://onlinecourses.nptel.ac.in/noc20_cs36/
Thank you
