On Data Handling Using Pandas-I
Code No-065
CLASS-XII
2020-2021
By
ARYA K S PGT(IP)
Blue Print:
Unit No.   Unit Name           Marks
4          Societal Impacts    8
           Practical           30
           Total               100
Unit 1
Data Handling using Pandas and Data Visualization
Syntax:
Properties of Series:
• A Series holds data of a single (homogeneous) data type.
• The size of a Series is immutable.
• The values in a Series are mutable.
Creation of Series:
We can create a pandas Series in the following ways:
● From arrays
● From Lists
● From Dictionaries
● From scalar value
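The four creation routes listed above can be sketched as follows. This is a minimal illustration, not the original slide code; the sample values and index labels are made up for the example.

```python
import numpy as np
import pandas as pd

# From a list: pandas assigns the default integer index 0..n-1.
s_list = pd.Series([10, 20, 30])

# From a NumPy array, here with custom index labels.
s_array = pd.Series(np.array([1.5, 2.5, 3.5]), index=["a", "b", "c"])

# From a dictionary: the keys become the index labels.
s_dict = pd.Series({"Ravi": 99, "Kunal": 98})

# From a scalar value: the value is repeated for every index label given.
s_scalar = pd.Series(7, index=["x", "y", "z"])

print(s_list, s_array, s_dict, s_scalar, sep="\n\n")
```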
From Lists :
Output:
From arrays :
Output:
From Dictionary:
Output:
From Scalar Value:
Output:
Mathematical Operations on Series:
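A short sketch of arithmetic on Series, assuming two small example Series (the values and labels are illustrative only). Operations between two Series are aligned on index labels, so a label present in only one Series produces NaN unless a fill value is supplied.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["a", "b", "d"])

# Element-wise addition, matched by label; "c" and "d" have no partner.
total = s1 + s2

# The method form lets us treat a missing partner as 0 instead of NaN.
filled = s1.add(s2, fill_value=0)

# An operation with a scalar applies to every element.
doubled = s1 * 2

print(total, filled, doubled, sep="\n\n")
```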
Mathematical Operations on Series (cont…):
Output:
Head and Tail functions on Series:
The head() and tail() functions return the first and last n rows respectively.
Syntax:
<Series name>.head(n)
<Series name>.tail(n)
n - number of rows to return
The default value of n is 5.
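For instance, on a ten-element Series (example values chosen only for illustration):

```python
import pandas as pd

s = pd.Series(range(10, 20))   # values 10..19, default index 0..9

first_five = s.head()          # n defaults to 5
last_three = s.tail(3)

print(first_five)
print(last_three)
```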
Selection, Indexing and Slicing on Series:
Selection: We can select a value from the series by using its
corresponding index.
Syntax:
<Series name>[<index number>]
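A minimal sketch of selection, using made-up marks. With the default index the number in the brackets is the index itself; with custom labels, the label goes in the brackets, and `.iloc` can be used when a position is meant explicitly.

```python
import pandas as pd

marks = pd.Series([99, 98, 97])        # default index 0, 1, 2
second = marks[1]                      # value stored at index 1

labelled = pd.Series([99, 98], index=["Ravi", "Kunal"])
ravi = labelled["Ravi"]                # select by index label
first = labelled.iloc[0]               # select by position, explicitly

print(second, ravi, first)
```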
Output:
Indexing:
The Series.index attribute is used to get or set the index labels of the
given series.
Syntax:
<Series name>.index
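Reading and then replacing the index labels might look like this (sample data assumed for the example):

```python
import pandas as pd

s = pd.Series([99, 98])

default_labels = list(s.index)   # get: the default labels
s.index = ["Ravi", "Kunal"]      # set: one new label per element
new_labels = list(s.index)

print(default_labels, new_labels)
print(s["Kunal"])
```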
Indexing (cont...):
Output:
Slicing:
A slicing operation extracts part of the series based on the given
parameters.
Syntax:
<Series name>[<start>:<stop>:<step>]
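Note: start, stop and step are optional.
Default values: start=0, stop=n (the length of the series; the stop position itself is excluded, so the slice ends at position n-1), step=1
Note: a slice with integer bounds works on positions (the default 0 to n-1 numbering), not on custom index labels.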
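The slicing rules can be tried out with a small example (values and labels are illustrative). The last line shows label-based slicing for contrast, where the stop label is included.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50, 60])

middle = s[1:4]        # positions 1, 2, 3; the stop position is excluded
every_other = s[::2]   # start and stop omitted, step of 2
backwards = s[::-1]    # a negative step reverses the series

labelled = pd.Series([10, 20, 30], index=["a", "b", "c"])
by_position = labelled[0:2]     # integer bounds still mean positions
by_label = labelled["a":"b"]    # label bounds include the stop label

print(middle, every_other, backwards, by_position, by_label, sep="\n\n")
```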
Data Frames
Data Frames:
A Data Frame is a two-dimensional (2-D) data structure defined in
pandas, consisting of rows and columns.
A Data Frame stores an ordered collection of columns, and different
columns can store data of different types.
Example:
S.No. Name Age Marks
1 Ravi 25 99
2 Kunal 26 98
Characteristics of Data Frames:
➢ It has two indices (two axes)
○ Row index (axis=0) ->known as index
○ Column index (axis=1) ->known as column-name
➢ Each value in the Data Frame is identified by the
combination of its row index and column index.
➢ Indices can be of any type.
➢ Different columns can hold data of different types.
➢ Values are mutable.
➢ Size is mutable.
Creation of Data Frames:
Syntax:
<Data Frame Name> = pandas.DataFrame(
    <2D data structure>,
    columns=<column sequence>,
    index=<index sequence>, ...)
We can create a Data Frame from many sources, such as:
(i) Two dimensional dictionaries
(ii) Two dimensional ndarrays(NumPy arrays)
(iii) Series type object
(iv) Another Dataframe object
(v) Text/CSV files
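A minimal sketch of the in-memory routes above, reusing the Ravi/Kunal example table. It is an illustration under assumed data, not the original slide code; creation from CSV files is covered separately.

```python
import numpy as np
import pandas as pd

# From a list of rows, naming the columns explicitly.
df_list = pd.DataFrame([["Ravi", 25, 99], ["Kunal", 26, 98]],
                       columns=["Name", "Age", "Marks"])

# From a 2-D NumPy array, with column names and a row index.
df_array = pd.DataFrame(np.array([[25, 99], [26, 98]]),
                        columns=["Age", "Marks"], index=["Ravi", "Kunal"])

# From Series objects: each Series becomes one column.
age = pd.Series([25, 26], index=["Ravi", "Kunal"])
marks = pd.Series([99, 98], index=["Ravi", "Kunal"])
df_series = pd.DataFrame({"Age": age, "Marks": marks})

# From another Data Frame; copy=True gives an independent copy.
df_copy = pd.DataFrame(df_list, copy=True)

print(df_list, df_array, df_series, df_copy, sep="\n\n")
```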
Creating Data frame from List:
Output:
Creating Data frame from array:
Output:
Creating Data frame from Series:
Output:
Creating Data frame from another Data frame:
Output:
(i) Two dimensional dictionaries
We can create a Data Frame from a two-dimensional (nested) dictionary:
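In a nested dictionary, the outer keys become the column names and the inner keys become the row index. A small sketch, where the subject names and the Physics marks are hypothetical:

```python
import pandas as pd

# Outer keys -> columns, inner keys -> row index labels.
marks = {
    "Maths":   {"Ravi": 99, "Kunal": 98},
    "Physics": {"Ravi": 95, "Kunal": 91},
}
df = pd.DataFrame(marks)

print(df)
```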
Output:
Creating Data frame from dictionary of Series:
Output:
(v) Text/CSV files:
We can create a Data Frame from text/CSV files by using the
read_csv() function.
Syntax:
<data frame name> = pandas.read_csv(filepath_or_buffer, sep=',',
    delimiter=None, header='infer', names=None,
    index_col=None, usecols=None, …)
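To keep the example self-contained, the sketch below reads CSV text from an in-memory buffer (`io.StringIO`) instead of a file on disk; a file path can be passed in the same position. The CSV contents are assumed for illustration.

```python
import io
import pandas as pd

csv_text = "S.No.,Name,Age,Marks\n1,Ravi,25,99\n2,Kunal,26,98\n"

# usecols keeps only the listed columns; index_col makes one of them the row index.
df = pd.read_csv(io.StringIO(csv_text),
                 usecols=["S.No.", "Name", "Marks"], index_col="S.No.")

# A file without a header row and with ';' as the separator:
# header=None stops the first row being used as names, names supplies them.
df2 = pd.read_csv(io.StringIO("1;Ravi\n2;Kunal\n"),
                  sep=";", header=None, names=["S.No.", "Name"])

print(df)
print(df2)
```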
(v) Text/CSV files (cont..):
Output:
Accessing values in dataframe:
Accessing a particular value:
<Data frame name>[<column name>][<index>]
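A short sketch of reading a single value, with assumed sample data. The bracket form from the syntax above works for reading; `.loc` and `.at` take the row label and column name together and are the usual choice when a value also has to be assigned.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Ravi", "Kunal"], "Marks": [99, 98]})

v1 = df["Marks"][1]        # column first, then row index
v2 = df.loc[1, "Marks"]    # row label and column name together
v3 = df.at[0, "Name"]      # .at is meant for single values

print(v1, v2, v3)
```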
Output:
NaN value in Python:
NaN, standing for "not a number", is a special floating-point value used to
represent any value that is undefined or unrepresentable. For
example, 0/0 is undefined as a real number and is, therefore,
represented by NaN in floating-point arithmetic. pandas also uses NaN to mark missing values.
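A few properties of NaN can be checked directly. This is a minimal sketch with made-up values; note that NaN is detected with `isna()`, because NaN does not compare equal to anything, including itself.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

missing = s.isna()             # True where the value is NaN
total = s.sum()                # NaN is skipped by default
same = (np.nan == np.nan)      # False: NaN is not equal to itself

# Dividing 0.0 by 0.0 element-wise in pandas gives NaN.
ratio = pd.Series([0.0]) / pd.Series([0.0])

print(missing.tolist(), total, same)
print(ratio)
```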
Iteration on Dataframes:
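The original listings for this part are not available, so the sketch below assumes the three iterators usually taught together: `iterrows()`, `items()` and `itertuples()`. Note that `items()` replaces `iteritems()`, which was removed in pandas 2.0.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Ravi", "Kunal"], "Marks": [99, 98]})

# iterrows(): yields (row index, row as a Series) pairs.
for idx, row in df.iterrows():
    print(idx, row["Name"], row["Marks"])

# items(): yields (column name, column as a Series) pairs.
for col_name, col in df.items():
    print(col_name, col.tolist())

# itertuples(): yields one named tuple per row; Index holds the row label.
for t in df.itertuples():
    print(t.Index, t.Name, t.Marks)
```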
Output:
iteritems() (named items() in pandas 2.0 and later):
Output:
itertuples():
Output:
Iterating over Columns: In order to iterate over columns, we
create a list of the data frame's column names and then iterate
through that list to pull out each data frame column.
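The steps just described can be sketched as follows (sample data assumed):

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Ravi", "Kunal"], "Age": [25, 26], "Marks": [99, 98]})

# list(df) gives the column names in order.
columns = list(df)

for col in columns:
    # df[col] pulls out one column as a Series.
    print(col, df[col].tolist())
```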
Operations on rows and columns:
● Add
● Select
● Delete
● Rename
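For columns, the four operations above might look like this. It is a sketch under assumed data; the `Grade` column and the new name `Score` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Ravi", "Kunal"], "Age": [25, 26], "Marks": [99, 98]})

# Select: one column gives a Series, a list of columns gives a Data Frame.
names = df["Name"]
subset = df[["Name", "Marks"]]

# Add: assign a sequence with one value per row to a new column name.
df["Grade"] = ["A", "A"]

# Delete: drop() returns a new Data Frame without the listed columns.
df = df.drop(columns=["Age"])

# Rename: map old column names to new ones.
df = df.rename(columns={"Marks": "Score"})

print(df)
```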
Column selection:
Output:
Column addition:
Output:
Column Deletion:
Output:
Column Rename:
Output:
Row selection:
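A sketch of the row operations (select, add, delete, rename), assuming a small Data Frame with row labels `r1` and `r2`; the labels and the added row are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Ravi", "Kunal"], "Marks": [99, 98]},
                  index=["r1", "r2"])

# Select: loc uses the row label, iloc uses the row position.
by_label = df.loc["r1"]
by_position = df.iloc[0]

# Add: assigning to a label that does not exist yet appends a row.
df.loc["r3"] = ["Asha", 97]

# Delete: drop() returns a new Data Frame without the listed rows.
df = df.drop(index="r2")

# Rename: map old row labels to new ones.
df = df.rename(index={"r1": "first"})

print(df)
```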
Output:
Row Addition:
Output:
Row Deletion:
Output:
Row Rename:
Output:
Head and Tail functions in Data Frames:
head(n):
Returns the first n rows.
tail(n):
Returns last n rows.
Default value for n is 5
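For example, on an eight-row Data Frame (the `Roll` and `Marks` values are made up):

```python
import pandas as pd

df = pd.DataFrame({"Roll": range(1, 9), "Marks": range(91, 99)})

top = df.head()       # first 5 rows, since n defaults to 5
bottom = df.tail(2)   # last 2 rows

print(top)
print(bottom)
```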
Indexing using Labels in Data Frames: We can make one of
the columns the row index of the data frame by using the
set_index() function.
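A minimal sketch, assuming the example table used earlier. `set_index()` returns a new Data Frame and leaves the original unchanged.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Ravi", "Kunal"], "Age": [25, 26], "Marks": [99, 98]})

# The Name column becomes the row index of the new Data Frame.
by_name = df.set_index("Name")

# Rows can now be looked up by name.
kunal_marks = by_name.loc["Kunal", "Marks"]

print(by_name)
print(kunal_marks)
```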
Output:
Boolean indexing in Data Frames: Boolean indexing helps us
to select the data from the Data Frames using a boolean vector.
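A boolean vector has one True/False value per row; only the rows marked True are kept. A short sketch with assumed data:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Ravi", "Kunal"], "Age": [25, 26], "Marks": [99, 98]})

# A comparison on a column produces the boolean vector.
mask = df["Marks"] > 98
high = df[mask]

# Conditions are combined with & (and) or | (or), each in parentheses.
both = df[(df["Marks"] > 90) & (df["Age"] > 25)]

# A plain list of booleans works as well.
first_only = df[[True, False]]

print(high, both, first_only, sep="\n\n")
```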
Joining, Merging and Concatenation on Data Frames:
Merge:
The pandas.merge() function is used for merging two data frames.
Its most commonly used arguments are:
● Data frame names - the two data frames to merge
● how - the type of merge: left, right, inner (the default) or
outer
● on - the name of the common column to merge on
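A sketch of an inner and a left merge on a common column. The two Data Frames and the extra student `Asha` are hypothetical example data.

```python
import pandas as pd

students = pd.DataFrame({"S.No.": [1, 2, 3], "Name": ["Ravi", "Kunal", "Asha"]})
marks = pd.DataFrame({"S.No.": [1, 2, 4], "Marks": [99, 98, 90]})

# inner: keep only the S.No. values present in both data frames.
inner = pd.merge(students, marks, how="inner", on="S.No.")

# left: keep every row of the first data frame; missing marks become NaN.
left = pd.merge(students, marks, how="left", on="S.No.")

print(inner)
print(left)
```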
Merge (cont..):
Join: The join() method combines data frames using their index.
Use <dataframe 1>.join(<dataframe 2>) to join.
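For instance, two data frames that share the same row labels (example data assumed):

```python
import pandas as pd

ages = pd.DataFrame({"Age": [25, 26]}, index=["Ravi", "Kunal"])
marks = pd.DataFrame({"Marks": [99, 98]}, index=["Ravi", "Kunal"])

# Rows are matched on the index labels, not on a column.
joined = ages.join(marks)

print(joined)
```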
Concatenation: pandas.concat(<List of data frames>) joins the data
frames end to end, along the rows by default.
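A small sketch of concatenation with assumed data. By default the original row labels are kept; `ignore_index=True` renumbers them, and `axis=1` places the data frames side by side instead.

```python
import pandas as pd

a = pd.DataFrame({"Name": ["Ravi"], "Marks": [99]})
b = pd.DataFrame({"Name": ["Kunal"], "Marks": [98]})

stacked = pd.concat([a, b])                        # row labels 0, 0
renumbered = pd.concat([a, b], ignore_index=True)  # row labels 0, 1
side_by_side = pd.concat([a, b], axis=1)           # columns placed next to each other

print(stacked, renumbered, side_by_side, sep="\n\n")
```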
Importing/Exporting Data between CSV files and Data
Frames:
Import data from CSV file to Data Frame: We can import data
from a CSV file to a Data Frame by using the read_csv() function.
Output:
Export data from Data Frame to CSV File: We can export data
from a Data Frame to a CSV file by using the to_csv() function.
Syntax:
<data frame name>.to_csv(<File Path>,.....)
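A minimal round trip, writing a Data Frame to a CSV file and reading it back. The file name `students.csv` is hypothetical, and a temporary folder is used so the example leaves nothing behind.

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({"Name": ["Ravi", "Kunal"], "Age": [25, 26], "Marks": [99, 98]})

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "students.csv")   # hypothetical file name

    # index=False leaves the row index out of the file.
    df.to_csv(path, index=False)

    with open(path) as f:
        csv_text = f.read()

    # Read the same file back into a new Data Frame.
    round_trip = pd.read_csv(path)

print(csv_text)
print(round_trip)
```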
Thank you