Pandas AI ML Python Software Engineering

This lesson covers data manipulation using the Pandas library in Python, focusing on its features, data structures like Series and DataFrame, and methods for handling missing values. Key functionalities include creating data structures, accessing elements, performing vectorized operations, and executing various data operations. The lesson also highlights the importance of Pandas in data analysis and its compatibility with multiple file formats.

Uploaded by

Vijay Yadav

Data Science with Python

Lesson 7— Data Manipulation with Pandas

What You Will Learn

Pandas and its features

Different data structures of Pandas

Creating Series and DataFrame with data inputs

Viewing, selecting, and accessing elements in a data structure

Handling vectorized operations

Learning how to handle missing values

Analyzing data with different data operation methods

Why Pandas

NumPy is great for mathematical computing. Then why do we need Pandas?

Pandas adds several functionalities on top of NumPy:

• Intrinsic data alignment
• Data structures handling major use cases
• Data operation functions
• Data standardization functions
• Functions for handling missing data

Pandas Features

The various features of Pandas make it an efficient library for data scientists.

• Powerful data structures
• Fast and efficient data wrangling
• High-performance merging and joining of data sets
• Intelligent and automated data alignment
• Easy data aggregation and transformation
• Tools for reading/writing data

Data Structures

The four main data structures of Pandas are:

Series
• One-dimensional labeled array
• Supports multiple data types

DataFrame
• Two-dimensional labeled array
• Supports multiple data types
• Input can be a Series
• Input can be another DataFrame

Panel
• Three-dimensional labeled array
• Supports multiple data types
• Items: axis 0
• Major axis: rows
• Minor axis: columns

Panel 4D (Experimental)
• Four-dimensional labeled array
• Supports multiple data types
• Labels: axis 0
• Items: axis 1
• Major axis: rows
• Minor axis: columns

Note: Panel and Panel 4D have been removed from current Pandas releases (Panel as of version 0.25), so only Series and DataFrame are available there.

Understanding Series

Series is a one-dimensional array-like object containing data and labels (or index).

Data:           4   11   21   36
Label (index):  0    1    2    3

Data alignment is intrinsic and is not broken unless the program changes it explicitly.
Series

Series can be created with different data inputs:

Data Input
• ndarray
• dict
• scalar
• list

Data Types
• Integer
• String
• Python Object
• Floating Point

Data:           2   3   8   4
Label (index):  0   1   2   3

How to Create Series

Key points to note while creating a series are as follows:

• Import Pandas as it is the main library
• Apply the syntax and pass the data elements as arguments
• Import NumPy while working with ndarrays

Basic Method

S = pd.Series(data, index = [index])

Create Series from List
This example shows you how to create a series from a list:

• Import the libraries
• Pass the list as an argument

The output shows the data values, the index, and the data type. No index was specified for the data, but notice that a default integer index is assigned automatically.
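
A minimal sketch of these steps, assuming the sample values 4, 11, 21, 36 from the Series illustration; the variable name first_series is hypothetical and the lesson's original code is not reproduced here.

```python
import pandas as pd

# Pass a plain Python list as the data argument (sample values are illustrative)
first_series = pd.Series([4, 11, 21, 36])

# Printing shows the default integer index, the data values, and the data type
print(first_series)
```

Because no index argument is given, Pandas labels the elements 0 to 3.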
Create Series from ndarray

This example shows you how to create a series from an ndarray:

• Create an ndarray for countries
• Pass the ndarray as an argument

The output shows the countries, the index, and the data type.
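
The same idea could be sketched as follows; the country names are hypothetical sample data and the variable names are assumptions.

```python
import numpy as np
import pandas as pd

# Build an ndarray of country names (hypothetical sample data)
countries = np.array(["Luxembourg", "Norway", "Japan", "Switzerland"])

# Pass the ndarray as the data argument; a default integer index is assigned
s_countries = pd.Series(countries)
print(s_countries)
```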
Create Series from dict

A series can also be created with dict data input for faster operations.
• Create a dict for countries and their GDP
• Pass the dict as an argument: the countries are passed as the index and the GDP as the actual data value

The output shows the countries, their GDP, and the data type.
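
A short sketch of a dict input, using placeholder labels and numbers rather than real GDP figures; all names here are assumptions.

```python
import pandas as pd

# Placeholder figures for illustration only, not real GDP data
gdp_by_country = {"Country A": 100, "Country B": 250, "Country C": 175}

# The dict keys become the index labels and the dict values become the data
s_gdp = pd.Series(gdp_by_country)
print(s_gdp)
```

Values can then be looked up by label, for example s_gdp["Country B"].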
Create Series from Scalar

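• Pass a scalar as the data input
• Pass an index; the scalar is repeated for every index label

The output shows the data, the index, and the data type.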
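
As a rough illustration, assuming a scalar of 5.0 and five hypothetical labels:

```python
import pandas as pd

# The scalar 5.0 is repeated once for every label in the index
s_scalar = pd.Series(5.0, index=["a", "b", "c", "d", "e"])
print(s_scalar)
```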
Accessing Elements in Series
Data can be accessed through indexers such as loc (label-based) and iloc (position-based) by passing a label, an element position, or an index range.
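
A small sketch of both indexers, assuming a hypothetical series with string labels:

```python
import pandas as pd

s = pd.Series([4, 11, 21, 36], index=["a", "b", "c", "d"])

by_label = s.loc["b"]           # label-based lookup
by_position = s.iloc[0]         # position-based lookup
position_range = s.iloc[1:3]    # positions 1 and 2; the end position is excluded
label_range = s.loc["b":"c"]    # labels "b" through "c"; the end label is included
```

Note the difference between the two ranges: position slices exclude the end, while label slices include it.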
Vectorized Operations in Series

Vectorized operations are performed element by element. When two series are combined, for example by adding them, their elements are aligned by index label.

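
The following sketch adds two series with matching labels and then two series with partly different labels; the values and labels are hypothetical.

```python
import pandas as pd

first = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])
second = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

# Matching labels: values are added label by label
same_labels = first + second

# Partly different labels: a label missing from either series yields NaN
third = pd.Series([10, 20, 30, 40], index=["a", "d", "b", "k"])
mixed_labels = first + third
print(mixed_labels)
```

In the second addition, "c" exists only in first and "k" only in third, so both results are NaN.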
Knowledge Check

How is an index for data elements assigned while creating a Pandas series? Select all that apply.

a. Created automatically

b. Needs to be assigned

c. Once created cannot be changed or altered

d. Index is not applicable as series is one-dimensional

The correct answer is a, b.

Explanation: Data alignment is intrinsic in Pandas data structures and happens automatically. One can also assign an index to the data elements.

Knowledge Check

What will the result be in vector addition if a label is not found in a series?

a. Marked as zeros for missing labels

b. Labels will be skipped

c. Marked as NaN for missing labels

d. Will throw an exception, index not found

The correct answer is c.

Explanation: The result will be marked as NaN (Not a Number) for missing labels.

DataFrame

DataFrame is a two-dimensional labeled data structure with columns of potentially different types.

Data Input
• ndarray
• dict
• scalar
• list

Data Types
• Integer
• String
• Python Object
• Floating Point

Data:           2   3   8   4
                5   8  10   1
Label (index):  0   1   2   3

Create DataFrame from Lists
Let’s see how you can create a DataFrame from lists:

Pass the list to the DataFrame
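
A minimal sketch, assuming a list of rows with hypothetical names and scores; the column names are also assumptions.

```python
import pandas as pd

# Each inner list is one row (hypothetical names and scores)
rows = [["Alice", 85], ["Bob", 72], ["Carol", 90]]

# Pass the list to the DataFrame; column names are optional but helpful
df_from_lists = pd.DataFrame(rows, columns=["name", "score"])
print(df_from_lists)
```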

Create DataFrame from dict
This example shows you how to create a DataFrame from a series of dicts:

• dict one and dict two: the individual dicts
• Entire dict: the dict that is passed to the DataFrame
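
One way this could look, assuming the entire dict nests the two dicts; the original example is not reproduced in the text, so the names and scores below are hypothetical.

```python
import pandas as pd

test_1 = {"Alice": 85, "Bob": 72}   # dict one (hypothetical scores)
test_2 = {"Alice": 78, "Bob": 88}   # dict two (hypothetical scores)
scores = {"test_1": test_1, "test_2": test_2}   # entire dict

# Outer keys become column names; inner keys become the row index
df_from_dicts = pd.DataFrame(scores)
print(df_from_dicts)
```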
View DataFrame

You can view a DataFrame by referring to a column name or with the describe function.
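
Both ways of viewing could be sketched like this, using a hypothetical DataFrame of test scores:

```python
import pandas as pd

df = pd.DataFrame({"test_1": [85, 72, 90], "test_2": [78, 88, 95]})

one_column = df["test_1"]   # refer to a column by name; returns a Series
summary = df.describe()     # count, mean, std, min, quartiles, and max per column
print(summary)
```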
Create DataFrame from dict of Series
Create DataFrame from ndarray

• Create an ndarray with years
• Create a dict with the ndarray
• Pass this dict to a new DataFrame
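
These three steps might look as follows; the years and the column name "year" are illustrative assumptions.

```python
import numpy as np
import pandas as pd

years = np.array([1980, 1984, 1988, 1992])   # ndarray with years (illustrative)
year_dict = {"year": years}                  # dict holding the ndarray
df_years = pd.DataFrame(year_dict)           # pass the dict to a new DataFrame
print(df_years)
```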

Create DataFrame from DataFrame

Create a DataFrame from a DataFrame
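
A brief sketch, assuming a hypothetical DataFrame of scores; copy=True is used here so that the new DataFrame holds its own data.

```python
import pandas as pd

original = pd.DataFrame({"test_1": [85, 72, 90], "test_2": [78, 88, 95]})

# Pass the existing DataFrame as the data argument; copy=True requests independent data
duplicate = pd.DataFrame(original, copy=True)

# Changing the new DataFrame leaves the original untouched
duplicate.loc[0, "test_1"] = 0
```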
Demo 01—View and Select Data
Demonstrate how to view and select data in a DataFrame.
Missing Values

Various factors may lead to missing data values:

• Data not provided by the source
• Software issue
• Data integration issue
• Network issue
Handle Missing Values
It’s difficult to operate on a dataset when it has missing values or uncommon indices.
Handle Missing Values with Functions
The dropna function drops all entries whose value is missing (NaN), such as those produced by uncommon indices.
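
A small sketch of dropna, assuming two hypothetical series whose indices only partly overlap:

```python
import pandas as pd

first = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])
second = pd.Series([10, 20, 30, 40], index=["a", "d", "b", "k"])

total = first + second      # labels "c" and "k" are uncommon, so they become NaN
cleaned = total.dropna()    # keep only the entries that have a value
print(cleaned)
```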
The fillna function fills all the missing values, such as those at uncommon indices, with a given number instead of dropping them. For example, the missing values can be filled with zero.


Handle Missing Values with Functions- Example
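
As an illustration of filling missing values with zero, here is a sketch with two hypothetical series whose indices only partly overlap:

```python
import pandas as pd

first = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])
second = pd.Series([10, 20, 30, 40], index=["a", "d", "b", "k"])

total = first + second     # labels "c" and "k" have no partner, so they become NaN
filled = total.fillna(0)   # replace every NaN with zero instead of dropping it
print(filled)
```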
Data Operation
Data operation can be performed through various built-in methods for faster data processing.
Data Operation with Functions
While performing data operations, custom functions can be applied to every element with the applymap method (renamed to DataFrame.map in Pandas 2.1).

• Declare a custom function
• Test the function
• Apply the function to the DataFrame
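
A sketch of these steps; add_bonus is a hypothetical custom function and the scores are made up. Since newer Pandas versions name the method map rather than applymap, the sketch picks whichever is available.

```python
import pandas as pd

def add_bonus(score):
    """Hypothetical custom function: add five bonus points to a score."""
    return score + 5

# Test the function on a single value
assert add_bonus(80) == 85

df = pd.DataFrame({"test_1": [85, 72, 90], "test_2": [78, 88, 95]})

# DataFrame.map replaces applymap from Pandas 2.1 onward; fall back on older versions
apply_elementwise = df.map if hasattr(df, "map") else df.applymap
boosted = apply_elementwise(add_bonus)
print(boosted)
```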

Data Operation with Statistical Functions
This example shows data operations with different statistical functions.

• Create a DataFrame with two tests
• Apply the max function to find the maximum score
• Apply the mean function to find the average score
• Apply the std function to find the standard deviation for both the tests
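
These statistical functions could be tried on a hypothetical DataFrame with two tests:

```python
import pandas as pd

df_tests = pd.DataFrame({"test_1": [85, 72, 90], "test_2": [78, 88, 95]})

highest = df_tests.max()    # maximum score per test
average = df_tests.mean()   # average score per test
spread = df_tests.std()     # sample standard deviation per test (ddof=1 by default)
print(highest, average, spread, sep="\n")
```

Each call returns a Series with one value per column.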
Data Operation Using Groupby
This example shows how to operate on data using the groupby function.

• Create a DataFrame with the first and last names of former presidents
• Group the DataFrame by the first name
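
A possible version of this example; the selection of presidents and the column names "first" and "last" are assumptions.

```python
import pandas as pd

presidents = pd.DataFrame({
    "first": ["George", "Bill", "Ronald", "Jimmy", "George"],
    "last": ["Bush", "Clinton", "Reagan", "Carter", "Washington"],
})

grouped = presidents.groupby("first")    # group rows that share a first name
georges = grouped.get_group("George")    # all rows in the "George" group
counts = grouped.size()                  # number of rows per first name
print(counts)
```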

Data Operation – Sorting
This example shows how to sort data:

• Sort values by first name
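
Sorting could be sketched with sort_values, assuming a hypothetical DataFrame with "first" and "last" columns:

```python
import pandas as pd

presidents = pd.DataFrame({
    "first": ["George", "Bill", "Ronald", "Jimmy", "George"],
    "last": ["Bush", "Clinton", "Reagan", "Carter", "Washington"],
})

# sort_values returns a new DataFrame ordered by the given column
sorted_by_first = presidents.sort_values("first")
print(sorted_by_first)
```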

Demo 02—Data Operations
Demonstrate how to perform data operations.
Data Standardization
This example shows how to standardize a dataset.
• Create a function to return the standardized value
• Apply the function to the entire dataset

The standardization is applied to the entire DataFrame.
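
A sketch under the assumption that standardizing means computing z-scores (subtract the mean, divide by the sample standard deviation); the lesson's own function is not shown in the text, and the scores are hypothetical.

```python
import pandas as pd

def standardize(column):
    """Return z-scores: subtract the column mean, divide by its standard deviation."""
    return (column - column.mean()) / column.std()

df_tests = pd.DataFrame({"test_1": [85, 72, 90], "test_2": [78, 88, 95]})

# apply calls the function once per column of the DataFrame
standardized = df_tests.apply(standardize)
print(standardized)
```

After standardization, each column has a mean of 0 and a standard deviation of 1.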
Knowledge Check

What is the result of DataFrame[3:9]?

a. Series with sliced index from 3 to 9

b. dict of index position 3 and index position 9

c. DataFrame of sliced rows index from 3 to 9

d. DataFrame with data elements at index 3 to index 9

The correct answer is c.

Explanation: This is the DataFrame slicing technique for indexing or selecting data elements. When a user passes the range 3:9, the rows at positions 3 through 8 are sliced and displayed as output; as with standard Python slicing, the end position 9 is excluded.
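
The slicing behavior can be checked with a hypothetical twelve-row DataFrame:

```python
import pandas as pd

df_numbers = pd.DataFrame({"value": range(100, 112)})   # 12 rows, index 0 to 11

# Row slice by position: starts at position 3 and stops before position 9
sliced = df_numbers[3:9]
print(sliced)
```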
Knowledge Check

What does the fillna() method do?

a. Fills all NaN values with zeros

b. Fills all NaN values with one

c. Fills all NaN values with values mentioned in the parenthesis

d. Drops NaN values from the dataset

The correct answer is c.

Explanation: fillna is one of the basic methods to fill NaN values in a dataset with a desired value by passing that value in the parentheses.
File Read and Write Support

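Pandas provides a reader function and a writer method for each supported format:

• HDF: read_hdf, to_hdf
• Excel: read_excel, to_excel
• Clipboard: read_clipboard, to_clipboard
• CSV: read_csv, to_csv
• HTML: read_html, to_html
• JSON: read_json, to_json
• pickle: read_pickle, to_pickle
• SQL: read_sql, to_sql
• Stata: read_stata, to_stata
• SAS: read_sas (Pandas reads SAS files but provides no to_sas writer)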
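
As one illustration of the read and write pairs, the sketch below writes a hypothetical DataFrame to CSV text and reads it back; an in-memory buffer stands in for a file on disk.

```python
import io
import pandas as pd

df = pd.DataFrame({"name": ["Alice", "Bob"], "score": [85, 72]})

buffer = io.StringIO()           # in-memory text buffer used instead of a file path
df.to_csv(buffer, index=False)   # write the DataFrame as CSV without the index column
buffer.seek(0)                   # rewind so the text can be read from the start
restored = pd.read_csv(buffer)   # read the CSV text back into a DataFrame
print(restored)
```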
Pandas SQL operation
Activity—Sequence it Right!
The code here is buggy. You have to correct its sequence to debug it. To do that, click any two code
snippets, which you feel are out of place, to swap their places.

Assignment
Quiz
Quiz 1

Which of the following data structures is used to store three-dimensional data?

a. Series

b. DataFrame

c. Panel

d. PanelND

The correct answer is c.

Explanation: Panel is a data structure used to store three-dimensional data.

Quiz 2

Which method is used for label-location indexing by label?

a. iat

b. iloc

c. loc

d. std

The correct answer is c.

Explanation: The loc method is used for label-location indexing by label; iat is strictly integer location and iloc is integer-location-based indexing by position.
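
The three indexers can be compared on a hypothetical DataFrame with name labels:

```python
import pandas as pd

df_scores = pd.DataFrame(
    {"test_1": [85, 72, 90], "test_2": [78, 88, 95]},
    index=["Alice", "Bob", "Carol"],
)

by_label = df_scores.loc["Bob", "test_2"]   # row and column selected by label
by_position = df_scores.iloc[1]             # second row selected by position
single_value = df_scores.iat[2, 0]          # one value at row position 2, column position 0
```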
Quiz 3

While viewing a DataFrame, the head() method will ____.

a. return only the first row

b. return only headers or column names of the DataFrame

c. return the first five rows of the DataFrame

d. throw an exception as it expects parameter(number) in parenthesis

The correct answer is c.

Explanation: The default value is 5 if nothing is passed in the head method, so it will return the first five rows of the DataFrame.
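
A quick check of the default, using a hypothetical ten-row DataFrame:

```python
import pandas as pd

df_numbers = pd.DataFrame({"value": range(10)})

first_five = df_numbers.head()    # no argument: the first five rows
first_two = df_numbers.head(2)    # an explicit number of rows can also be passed
```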
Key Takeaways

Let us take a quick recap of what we have learned in the lesson:

• Pandas is an open source library for data analysis and is an efficient data wrangling tool in Python.
• The four main data structures of Pandas are Series, DataFrame, Panel, and Panel 4D.
• DataFrame is a two-dimensional labeled data structure with columns of potentially different data types.
• To access data elements in a series, the 'loc' and 'iloc' indexers can be used.
• The 'iat' indexer enables selection of elements in a DataFrame by index position and returns the corresponding data element.
• Missing data values in Pandas can be resolved through built-in methods such as dropna and fillna.
• Pandas supports multiple file formats for data analysis, such as Excel, PyTables, Clipboard, HTML, pickle, dta, SAS, SQL, JSON, and CSV.
This concludes “Data Manipulation with Pandas.”
The next lesson is “Machine Learning with SciKit Learn.”
