On Data Handling Using Pandas-I


INFORMATICS PRACTICES

Code No-065
CLASS-XII
2020-2021

By

ARYA K S PGT(IP)
Blue Print:

Unit No.  Unit Name                                          Marks
1         Data Handling using Pandas and Data Visualization  30
2         Database Query using SQL                           25
3         Introduction to Computer Networks                  7
4         Societal Impacts                                   8
          Practical                                          30
          Total                                              100
Unit 1
Data Handling using Pandas and Data
Visualization

(Data Handling using Pandas –I)


Module: A module is a file that contains Python functions. It is a
.py file holding executable Python code or statements.
Package: A package is a namespace that contains multiple
packages or modules. It is a directory that contains a special
file, __init__.py.
The __init__.py file tells Python to treat the directory that
contains it as a package.
Library: A library is a collection of various packages. Conceptually,
there is no difference between a package and a Python library.

Framework: A framework is a collection of various libraries that
determines the flow of the code.
Pandas:
Pandas is one of the most popular open-source Python libraries used for
data analysis.
Pandas offers two main data structures for analysing data:
● Series
● DataFrames
Installation of pandas:

pip install pandas


Series:
A Series is a one-dimensional labelled array defined in pandas that can
store data of any type.

Syntax:

<Series Name>=<pd>.Series(<list name>, ...)


Example:
5 15 16 4 34

Properties of Series:
• A series contains data of a single (homogeneous) type.
• The size of the series is immutable.
• The values in the series are mutable.
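The example values and two of these properties can be checked with a short sketch. The variable name s is only illustrative.

```python
import pandas as pd

s = pd.Series([5, 15, 16, 4, 34])   # values from the example above
print(s.dtype)                      # one dtype for the whole Series

s[1] = 50                           # values can be changed in place
print(s.tolist())
```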
Creation of Series:
We can create a pandas series in following ways-

● From arrays
● From Lists
● From Dictionaries
● From scalar value
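The four routes listed above can be sketched as follows. The values and variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

from_list = pd.Series([10, 20, 30])                  # default index 0, 1, 2
from_array = pd.Series(np.array([1.5, 2.5, 3.5]))    # NumPy array
from_dict = pd.Series({"a": 1, "b": 2, "c": 3})      # keys become index labels
from_scalar = pd.Series(7, index=["x", "y", "z"])    # scalar repeated for each label

print(from_dict)
print(from_scalar)
```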
From Lists :

Output:
From arrays :

Output:
From Dictionary:

Output:
From Scalar Value:

Output:
Mathematical Operations on Series:
Mathematical Operations on Series (cont…):

Output:
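Arithmetic operators on Series work element by element, and pandas matches the elements by index label. A minimal sketch with made-up values:

```python
import pandas as pd

a = pd.Series([10, 20, 30], index=["x", "y", "z"])
b = pd.Series([1, 2, 3], index=["x", "y", "z"])

total = a + b        # element-wise addition
diff = a - b
product = a * b
ratio = a / b        # true division, so the result is float
scaled = a * 2       # a scalar is applied to every element

# Labels missing from one operand give NaN in the result.
partial = a + pd.Series([1], index=["x"])
```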
Head and Tail functions on Series:
The head() and tail() functions return the first and last n rows respectively.
Syntax:
<Series name>.head(n)
<Series name>.tail(n)
n is the number of rows to return.
The default value of n is 5.
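A quick sketch of both calls on a ten-element series (the data is illustrative):

```python
import pandas as pd

s = pd.Series(range(1, 11))   # values 1..10

first_five = s.head()         # n defaults to 5
last_three = s.tail(3)
```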
Selection, Indexing and Slicing on Series:
Selection: We can select a value from the series by using its
corresponding index.
Syntax:
<Series name>[<index number>]
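As a sketch, selection works with the default integer index as well as with custom labels. The .loc and .iloc accessors make the intent explicit (label versus position). The names and values here are hypothetical.

```python
import pandas as pd

s = pd.Series([5, 15, 16, 4, 34])                    # default index 0..4
labelled = pd.Series([90, 85], index=["ravi", "kunal"])

third = s[2]                      # label 2 of the default index
by_label = labelled["kunal"]      # custom label
by_loc = labelled.loc["ravi"]     # label-based access
by_position = labelled.iloc[0]    # position-based access
```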

Output:
Indexing:
Series.index attribute is used to get or set the index labels for the
given series.

Syntax:
<Series name>.index
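A minimal sketch of reading and then replacing the index labels. The new labels must match the length of the series.

```python
import pandas as pd

s = pd.Series([5, 15, 16])
print(s.index)                 # the default integer index

s.index = ["a", "b", "c"]      # assign new labels
print(s.index.tolist())
```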
Indexing (cont...):

Output:
Slicing:
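Slicing on a series extracts a part of the series based on the given
parameters.
Syntax:
<Series name>[<start>:<stop>:<step>]
Note: start, stop and step are optional.
Default values: start=0, stop=n (the length of the series; the element at stop is excluded), step=1
Note: slicing with integers uses positions (the default 0-based index), not the index labels.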
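The slicing forms can be tried out as below. This is only a sketch: the labels a..e are made up, and the last line shows that a slice written with labels, unlike a positional one, includes its stop label.

```python
import pandas as pd

s = pd.Series([5, 15, 16, 4, 34], index=["a", "b", "c", "d", "e"])

middle = s[1:4]          # positions 1, 2, 3 (stop is excluded)
every_other = s[::2]     # positions 0, 2, 4
reverse = s[::-1]        # negative step walks backwards
by_label = s["b":"d"]    # label slice: "d" is included
```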
Data Frames
Data Frames:
A Data Frame is a two-dimensional (2-D) data structure defined in
pandas which consists of rows and columns.
A Data Frame stores an ordered collection of columns that can
hold data of different types.

Example:
S.No.  Name   Age  Marks
1      Ravi   25   99
2      Kunal  26   98
Characteristics of Data Frames:
➢ It has two indices (two axes)
○ Row index (axis=0) ->known as index
○ Column index (axis=1) ->known as column-name
➢ Value in the Data Frame will be identifiable by the
combination of row index and column index.
➢ Indices can be of any type
➢ Columns can hold data of different types.
➢ Values are mutable
➢ Size is mutable
Creation of Data Frames:
Syntax:
<Data Frame Name> = pandas.DataFrame(<2D data structure>,
    columns=<column sequence>,
    index=<index sequence>, ...)
We can create Data Frame in many ways, such as-
(i) Two dimensional dictionaries
(ii) Two dimensional ndarrays(NumPy arrays)
(iii) Series type object
(iv) Another Dataframe object
(v) Text/CSV files
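Routes (i) to (iv) can be sketched with the Ravi/Kunal example data. The variable names are illustrative, and reading from CSV is covered separately.

```python
import numpy as np
import pandas as pd

# (i) dictionary of lists: keys become column names
df_dict = pd.DataFrame({"Name": ["Ravi", "Kunal"], "Age": [25, 26], "Marks": [99, 98]})

# (ii) 2-D NumPy array with explicit column and index labels
df_array = pd.DataFrame(np.array([[25, 99], [26, 98]]),
                        columns=["Age", "Marks"], index=["Ravi", "Kunal"])

# (iii) Series object: becomes a single column named after the Series
df_series = pd.DataFrame(pd.Series([99, 98], name="Marks"))

# (iv) another DataFrame: copy=True makes an independent copy
df_copy = pd.DataFrame(df_dict, copy=True)
```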
Creating Data frame from List:

Output:
Creating Data frame from array:

Output:
Creating Data frame from Series:

Output:
Creating Data frame from another Data frame:

Output:
(i) Two dimensional dictionaries
We can create Dataframe from Two dimensional dictionaries-

➢ Creating Dataframe from list of dictionaries

➢ Creating Dataframe from dictionary of Series
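Both dictionary-based forms are sketched below with made-up records. In a list of dictionaries each dictionary is one row. In a dictionary of Series each Series is one column.

```python
import pandas as pd

rows = [{"Name": "Ravi", "Age": 25}, {"Name": "Kunal", "Age": 26, "Marks": 98}]
df_from_rows = pd.DataFrame(rows)        # keys missing from a row become NaN

columns = {
    "Age": pd.Series([25, 26], index=["Ravi", "Kunal"]),
    "Marks": pd.Series([99, 98], index=["Ravi", "Kunal"]),
}
df_from_series = pd.DataFrame(columns)   # Series are aligned on their index labels
```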


Creating Dataframe from list of dictionaries:

Output:
Creating Data frame from dictionary of Series:

Output:
(v) Text/CSV files:
We can Create Dataframe from Text/CSV Files by using
read_csv() function.
Syntax:
<data frame name> = pandas.read_csv(filepath_or_buffer, sep=',',
    delimiter=None, header='infer', names=None,
    index_col=None, usecols=None, …)
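A hedged sketch of a few of these parameters. To stay self-contained it reads from an in-memory string rather than a real file. The CSV text is hypothetical, and a file path would be passed in the same position.

```python
import io
import pandas as pd

# In-memory stand-in for a CSV file (hypothetical data).
csv_text = "SNo,Name,Age,Marks\n1,Ravi,25,99\n2,Kunal,26,98\n"

df = pd.read_csv(io.StringIO(csv_text), sep=",", header="infer",
                 index_col="SNo",                    # use SNo as the row index
                 usecols=["SNo", "Name", "Marks"])   # read only these columns
print(df)
```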
(v) Text/CSV files (cont..):

Output:
Accessing values in dataframe:
Accessing a particular value:
<Data frame name>[<column name>][<index>]

Accessing a group of values:


<Data frame name>.loc[<index>, <column name>]
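Both access styles can be sketched on a small frame (the data is illustrative). Note that .loc takes the row label and the column label inside one pair of brackets.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Ravi", "Kunal"], "Age": [25, 26], "Marks": [99, 98]},
                  index=[1, 2])

one_value = df["Marks"][1]                    # column first, then row label
same_value = df.loc[1, "Marks"]               # row label and column label together
subset = df.loc[[1, 2], ["Name", "Marks"]]    # several rows and columns at once
```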
Accessing values in dataframe (cont…):

Output:
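NaN value in Python:
NaN, standing for "not a number", is a special floating-point value used to
represent any value that is undefined or unrepresentable. For
example, 0/0 is undefined as a real number and is, therefore,
represented by NaN in floating-point arithmetic (NumPy returns NaN for
0.0/0.0, whereas plain Python raises ZeroDivisionError). In pandas, NaN
also marks missing data.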
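A small sketch of how NaN behaves in practice, using np.nan to stand for a missing entry (the values are made up):

```python
import math
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

missing = s.isna()                 # True where the value is NaN
is_nan = math.isnan(s[1])          # NaN is an ordinary float value
equal_to_itself = np.nan == np.nan # NaN never compares equal, even to itself
total = s.sum()                    # pandas skips NaN by default
```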
Iteration on Dataframes:

In a pandas DataFrame we can iterate in two ways:

● Iterating over rows


● Iterating over columns
Iterating over rows :

To iterate over a DataFrame, we can use the following functions −
● iterrows() − iterate over the rows as (index, Series) pairs
● iteritems() − iterate over the columns as (column name, Series) pairs; this method is called items() in recent pandas (iteritems() was removed in pandas 2.0)
● itertuples() − iterate over the rows as namedtuples
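The three iterators can be compared on a tiny frame. Because iteritems() was removed in pandas 2.0, this sketch calls items(), which yields the same (column name, Series) pairs. The data is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Ravi", "Kunal"], "Marks": [99, 98]})

# iterrows(): each row as (index label, Series)
row_pairs = [(idx, row["Name"]) for idx, row in df.iterrows()]

# itertuples(): each row as a namedtuple with Index plus one field per column
row_tuples = [(t.Index, t.Name, t.Marks) for t in df.itertuples()]

# items(): each column as (column name, Series)
column_pairs = [(name, col.tolist()) for name, col in df.items()]
```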
iterrows():

Output:
iteritems():

Output:
itertuples():

Output:
Iterating over Columns: To iterate over columns, we
create a list of the DataFrame's column names and then iterate
through that list to pull out each column.
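A minimal sketch of that approach, with illustrative data:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Ravi", "Kunal"], "Marks": [99, 98]})

column_names = list(df.columns)       # list of the column names
for name in column_names:
    print(name, df[name].tolist())    # pull out each column by its name
```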
Operations on rows and columns:

● Add

● Select

● Delete

● Rename
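For columns, the four operations might look like the sketch below. The column names are hypothetical, and drop() and rename() return new DataFrames rather than changing the original.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Ravi", "Kunal"], "Age": [25, 26]})

names = df["Name"]                            # select one column (a Series)
df["Marks"] = [99, 98]                        # add a column
df = df.drop(columns=["Age"])                 # delete a column
df = df.rename(columns={"Marks": "Score"})    # rename a column
```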
Column selection:

Output:
Column addition:

Output:
Column Deletion:

Output:
Column Rename:

Output:
Row selection:

Output:
Row Addition:

Output:
Row Deletion:

Output:
Row Rename:

Output:
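The row operations named in the preceding slides (selection, addition, deletion, rename) can be sketched as follows. The row labels r1..r3 and the student "Asha" are made up. A row is added through .loc here because DataFrame.append() was removed in pandas 2.0.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Ravi", "Kunal"], "Marks": [99, 98]}, index=["r1", "r2"])

first_row = df.loc["r1"]               # select a row by label (a Series)
df.loc["r3"] = ["Asha", 97]            # add a row under a new label
df = df.drop(index=["r2"])             # delete a row
df = df.rename(index={"r3": "r2"})     # rename a row label
```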
Head and Tail functions in Data Frames:

head(n):
Returns the first n rows.
tail(n):
Returns last n rows.
Default value for n is 5
Indexing using Labels in Data Frames: We can make one of
the columns the row index of the data frame by using the
function set_index().
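A short sketch with illustrative data. By default set_index() returns a new DataFrame and leaves the original unchanged.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Ravi", "Kunal"], "Age": [25, 26], "Marks": [99, 98]})

by_name = df.set_index("Name")                 # Name becomes the row index
kunal_marks = by_name.loc["Kunal", "Marks"]    # rows can now be looked up by name
```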

Output:
Boolean indexing in Data Frames: Boolean indexing helps us
to select the data from the Data Frames using a boolean vector.
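As a sketch, a comparison on a column produces the boolean vector, and indexing with it keeps only the rows marked True. The data is made up.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Ravi", "Kunal", "Asha"], "Marks": [99, 98, 72]})

mask = df["Marks"] > 90      # boolean Series, one entry per row
high = df[mask]              # rows where the mask is True

# Conditions are combined with & (and) or | (or), each in parentheses.
both = df[(df["Marks"] > 90) & (df["Name"] != "Ravi")]
```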
Joining, Merging and Concatenation on Data Frames:
Merge:
The pandas.merge() function is used for merging two data frames.
Its most commonly used arguments are:
● The two data frames to merge
● how - the type of merge: 'left', 'right', 'inner' (the default) or
'outer'
● on - the name of the common column to merge on
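The effect of how can be seen on two small frames that share an SNo column. The data is hypothetical and chosen so that one SNo is missing from each side.

```python
import pandas as pd

students = pd.DataFrame({"SNo": [1, 2, 3], "Name": ["Ravi", "Kunal", "Asha"]})
marks = pd.DataFrame({"SNo": [1, 2, 4], "Marks": [99, 98, 60]})

inner = pd.merge(students, marks, how="inner", on="SNo")   # SNo present in both
left = pd.merge(students, marks, how="left", on="SNo")     # every student row is kept
right = pd.merge(students, marks, how="right", on="SNo")   # every marks row is kept
```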
Merge (cont..):
Join: The join() method combines data frames using their index.
Use <dataframe 1>.join(<dataframe 2>) to join.
Concatenation: Use pandas.concat(<list of data frames>) to
concatenate data frames.
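A minimal sketch of both calls with made-up data. join() places columns side by side by matching index labels, while concat() by default stacks rows one after another.

```python
import pandas as pd

ages = pd.DataFrame({"Age": [25, 26]}, index=["Ravi", "Kunal"])
marks = pd.DataFrame({"Marks": [99, 98]}, index=["Ravi", "Kunal"])

joined = ages.join(marks)              # columns matched on the index

more = pd.DataFrame({"Age": [24]}, index=["Asha"])
stacked = pd.concat([ages, more])      # rows appended in order
```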
Importing/Exporting Data between CSV files and Data
Frames:
Import data from CSV file to Data Frame: We can import data
from a CSV file into a Data Frame by using the read_csv() function.

Output:
Export data from Data Frame to CSV File: We can export data
from a Data Frame to a CSV file by using the to_csv() function.
Syntax:
<data frame name>.to_csv(<File Path>,.....)
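A hedged sketch of an export followed by a read-back. The file name students.csv and the temporary directory are assumptions made so that the example does not touch real files.

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({"Name": ["Ravi", "Kunal"], "Marks": [99, 98]})

# Write into a temporary directory (hypothetical file name).
path = os.path.join(tempfile.mkdtemp(), "students.csv")
df.to_csv(path, index=False)      # index=False leaves out the row labels

round_trip = pd.read_csv(path)    # read the file back to check its contents
```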
Thank you
