Pandas AI ML Python Software Engineering
Why Pandas
[Diagram labels: NumPy, Pandas]
• Data structures
• Data operation functions
• Handling of major use cases
• Data standardization functions
• Functions for handling missing data
Pandas Features
The various features of Pandas make it an efficient library for data scientists:
• Powerful data structure
• Intelligent and automated data alignment
• Easy data aggregation and transformation
Series is a one-dimensional array-like object containing data and labels (or index).
Data:          4  11  21  36
Label (index): 0   1   2   3
Data alignment is intrinsic: the link between each label and its data value is preserved unless the program changes it explicitly.
Series
Data Input
• ndarray
• dict
• scalar
• list
Data Types
• Integer
• String
• Python Object
• Floating Point
Example: data 2 3 8 4 with label (index) 0 1 2 3
How to Create Series
Basic Method
S = pd.Series(data, index=index)
Example data: 4 11 21 36
Create Series from List
This example shows you how to create a series from a list:
[Code screenshot callouts: import libraries, data value, index, data type]
No index was specified for the data, but notice that pandas assigns a default integer index, so data alignment is done automatically.
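A minimal sketch of creating a Series from a list. The sample values are an assumption (they reuse the numbers from the Series diagram), since the original code is not reproduced in the text.

```python
import pandas as pd

# Assumed sample values, taken from the Series diagram.
data = [4, 11, 21, 36]

# No index is passed, so pandas assigns default integer labels.
s = pd.Series(data)

print(s)        # data values shown next to their index labels
print(s.index)  # the automatically created index
print(s.dtype)  # an integer dtype
```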
Create Series from ndarray
[Code screenshot callouts: countries, index, data type]
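A small sketch of the ndarray case. The country names are hypothetical placeholders; the values used on the original slide are not shown in the text.

```python
import numpy as np
import pandas as pd

# Hypothetical country names stored in a NumPy array.
countries = np.array(["Luxembourg", "Norway", "Japan", "Switzerland"])

# The array becomes the data; a default integer index is attached.
s_countries = pd.Series(countries)

print(s_countries)
```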
Create Series from dict
A Series can also be created from a dict. The dict keys become the index labels, which allows fast lookups by label.
[Code screenshot callouts: dict of countries and their GDP, GDP, country, data type]
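A sketch of the dict case, assuming a hypothetical dict of countries and GDP values. The numbers are made up purely for illustration and are not real statistics.

```python
import pandas as pd

# Made-up GDP figures keyed by country (illustrative only).
gdp_by_country = {"Luxembourg": 500, "Norway": 400, "Japan": 300}

# Keys become the index labels, values become the data.
s_gdp = pd.Series(gdp_by_country)

print(s_gdp)
print(s_gdp["Norway"])  # look up a value by its label
```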
Create Series from Scalar
[Code screenshot callouts: scalar input, index, data, data type]
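A sketch of the scalar case, assuming an arbitrary value and hypothetical index labels. When the data is a scalar, an index is supplied and the value is repeated for every label.

```python
import pandas as pd

# Assumed scalar value and labels, chosen only for illustration.
s_scalar = pd.Series(5.0, index=["a", "b", "c", "d", "e"])

print(s_scalar)  # the same value appears once per index label
```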
Accessing Elements in Series
Data can be accessed through indexers such as loc (by label) and iloc (by integer position), by passing a single label or position, or a range.
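A short sketch of element access, assuming a hypothetical Series with string labels. It contrasts position-based and label-based access, including how each treats the end of a range.

```python
import pandas as pd

# Hypothetical Series with explicit string labels.
s = pd.Series([4, 11, 21, 36], index=["a", "b", "c", "d"])

first = s.iloc[0]              # by integer position
by_label = s.loc["c"]          # by label
pos_slice = s.iloc[1:3]        # positions 1 and 2; the end position is excluded
label_slice = s.loc["b":"d"]   # labels b through d; the end label is included

print(first, by_label)
print(pos_slice)
print(label_slice)
```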
Vectorized Operations in Series
a. Created automatically
b. Needs to be assigned
d. Index is not applicable as series is one-dimensional
Explanation: Data alignment is intrinsic in Pandas data structures and happens automatically. One can also assign an index to data elements.
KNOWLEDGE CHECK
What will the result be in vector addition if a label is not found in a series?
a. Marked as zeros for missing labels
d. Will throw an exception, index not found
Explanation: The result will be marked as NaN (Not a Number) for missing labels.
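The behavior can be sketched with two hypothetical Series whose labels only partly overlap. The use of Series.add with fill_value at the end is an addition for illustration, showing one way to avoid NaN when that is preferred.

```python
import pandas as pd

# Hypothetical Series: labels b and c are shared, a and d are not.
s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])

# Addition aligns on labels; labels found in only one Series give NaN.
total = s1 + s2

# Treat a missing label as 0 instead of producing NaN.
filled = s1.add(s2, fill_value=0)

print(total)
print(filled)
```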
DataFrame
DataFrame is a two-dimensional labeled data structure with columns of potentially different types.
Data Input
• ndarray
• dict
• scalar
• list
Data Types
• Integer
• String
• Python Object
• Floating Point
Example: data rows 2 3 8 4 and 5 8 10 1 with labels (index) 0 1 2 3
Create DataFrame from Lists
Let’s see how you can create a DataFrame from lists:
[Code screenshot callout: entire dict]
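A minimal sketch, assuming the lists are grouped in a dict whose keys become the column names. The column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical data: each key is a column name, each list is a column.
data = {
    "product": ["pen", "notebook", "eraser"],
    "price": [1.5, 3.0, 0.5],
    "quantity": [10, 4, 25],
}

# Passing the entire dict builds one column per key.
df = pd.DataFrame(data)

print(df)
```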
View DataFrame
You can view a DataFrame column by referring to its column name, or view summary statistics with the describe function.
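Both ways of viewing can be sketched on a small hypothetical DataFrame (the column names and values are assumptions).

```python
import pandas as pd

# Hypothetical DataFrame used only for illustration.
df = pd.DataFrame({
    "product": ["pen", "notebook", "eraser"],
    "price": [1.5, 3.0, 0.5],
    "quantity": [10, 4, 25],
})

# Refer to a column by name to view it as a Series.
prices = df["price"]
print(prices)

# describe() summarizes the numeric columns (count, mean, std, min, quartiles, max).
summary = df.describe()
print(summary)
```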
Create DataFrame from dict of Series
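A sketch of the dict-of-Series case, assuming two hypothetical Series with partly different labels. The resulting index is the union of the labels, and positions with no value are NaN.

```python
import pandas as pd

# Hypothetical Series; label d exists only in the second one.
columns = {
    "q1": pd.Series([10, 20, 30], index=["a", "b", "c"]),
    "q2": pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"]),
}

# Each Series becomes a column; rows are aligned on the index labels.
df_series = pd.DataFrame(columns)

print(df_series)
```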
Create DataFrame from ndarray
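A sketch of the ndarray case, reusing the two rows of numbers from the DataFrame diagram. The column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Two rows of values, as in the DataFrame diagram.
arr = np.array([[2, 3, 8, 4],
                [5, 8, 10, 1]])

# Column names are assumed; rows get a default integer index.
df_arr = pd.DataFrame(arr, columns=["w", "x", "y", "z"])

print(df_arr)
```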
The fillna function fills the values at uncommon (non-matching) indices, which would otherwise be NaN, with a specified number instead of dropping them.
Explanation: This is a DataFrame slicing technique that selects data elements by index. When a user passes the range 3:9 by position, the rows at positions 3 through 8 are sliced and displayed as output; with label-based slicing through loc, label 9 is also included.
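The difference between the two kinds of slicing can be sketched on a hypothetical DataFrame with the default integer index. The column name and values are assumptions.

```python
import pandas as pd

# Hypothetical DataFrame with 12 rows, labeled 0..11 by default.
df = pd.DataFrame({"value": range(100, 112)})

by_position = df.iloc[3:9]  # positions 3..8; the end position is excluded
by_label = df.loc[3:9]      # labels 3..9; the end label is included

print(by_position)
print(by_label)
```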
KNOWLEDGE CHECK
What does the fillna() method do?
Explanation: fillna is one of the basic methods for replacing NaN values in a dataset; the desired replacement value is passed as its argument.
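A minimal sketch of fillna on a hypothetical DataFrame containing NaN values. The replacement value 0 is an arbitrary choice.

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame with two missing values.
df_nan = pd.DataFrame({"a": [1.0, np.nan, 3.0],
                       "b": [np.nan, 5.0, 6.0]})

# fillna returns a new DataFrame; the original is left unchanged.
df_filled = df_nan.fillna(0)

print(df_filled)
```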
File Read and Write Support
Read             Write
read_csv         to_csv
read_excel       to_excel
read_hdf         to_hdf
read_clipboard   to_clipboard
read_html        to_html
read_json        to_json
read_pickle      to_pickle
read_sql         to_sql
read_stata       to_stata
read_sas         (no corresponding writer in Pandas)
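A sketch of one read/write pair, to_csv and read_csv. To keep it self-contained it writes to an in-memory buffer rather than a file on disk; the DataFrame contents are hypothetical.

```python
import io

import pandas as pd

# Hypothetical DataFrame to write out.
df_out = pd.DataFrame({"name": ["a", "b"], "score": [1, 2]})

# Write CSV text into an in-memory buffer (a file path works the same way).
buffer = io.StringIO()
df_out.to_csv(buffer, index=False)

# Rewind the buffer and read the CSV back into a new DataFrame.
buffer.seek(0)
df_in = pd.read_csv(buffer)

print(df_in)
```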
Pandas SQL operation
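A sketch of a SQL round trip, assuming an in-memory SQLite database from the Python standard library. The table name, column names, and values are hypothetical.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database, used only for this illustration.
conn = sqlite3.connect(":memory:")

# Hypothetical data written to a table named "places".
df_sql = pd.DataFrame({"id": [1, 2, 3], "city": ["Oslo", "Lima", "Pune"]})
df_sql.to_sql("places", conn, index=False)

# Run a SQL query and load the result into a DataFrame.
result = pd.read_sql("SELECT id, city FROM places WHERE id > 1 ORDER BY id", conn)
conn.close()

print(result)
```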
QUIZ 1
Which of the following data structures is used to store three-dimensional data?
a. Series
b. DataFrame
c. Panel
d. PanelND
QUIZ 2
Which method is used for label-location indexing by label?
a. iat
b. iloc
c. loc
d. std
Explanation: The loc method is used for label-based indexing; iat accesses a single value strictly by integer position, and iloc is integer-location-based indexing by position.
QUIZ 3
While viewing a DataFrame, the head() method will ____.
Explanation: If no argument is passed to the head method, the default value is 5, so it returns the first five rows of the DataFrame.
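The default can be checked with a short sketch on a hypothetical 20-row DataFrame.

```python
import pandas as pd

# Hypothetical DataFrame with 20 rows.
df_long = pd.DataFrame({"n": range(20)})

first_five = df_long.head()    # no argument: the first five rows
first_three = df_long.head(3)  # explicit argument: the first three rows

print(first_five)
print(first_three)
```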
Key Takeaways