Class XII Informatics Practices PPT on Data Handling Using Pandas-I
4 Societal Impacts 8
Practical 30
Total 100
Unit 1
Data Handling using Pandas and Data Visualization
Syntax:
Properties of Series:
• Series will contain homogeneous data type.
• Size of the series is immutable.
• Values in the series are mutable.
Creation of Series:
We can create a pandas Series in the following ways-
● From arrays
● From Lists
● From Dictionaries
● From scalar value
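The four routes listed above can be sketched as follows. This is a minimal illustration, not the code from the original slides; the sample values and the variable names (s_list, s_array, s_dict, s_scalar) are assumptions.

```python
import numpy as np
import pandas as pd

# From a list: the default index is 0, 1, 2, ...
s_list = pd.Series([10, 20, 30])

# From a NumPy array, with custom labels supplied through `index`
s_array = pd.Series(np.array([1.5, 2.5, 3.5]), index=["a", "b", "c"])

# From a dictionary: keys become the index, values become the data
s_dict = pd.Series({"Ravi": 99, "Kunal": 98})

# From a scalar: the value is repeated once per index label
s_scalar = pd.Series(5, index=["x", "y", "z"])

print(s_list, s_array, s_dict, s_scalar, sep="\n\n")
```

When a scalar is used, an index should be supplied so that pandas knows how many times to repeat the value.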
From Lists :
Output:
From arrays :
Output:
From Dictionary:
Output:
From Scalar Value:
Output:
Mathematical Operations on Series:
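A minimal sketch of arithmetic on Series, with illustrative values that are not taken from the original slides. The operators work element by element, and elements are matched by index label rather than by position.

```python
import pandas as pd

s1 = pd.Series([10, 20, 30], index=["a", "b", "c"])
s2 = pd.Series([1, 2, 3], index=["a", "b", "c"])

total = s1 + s2        # element-wise addition, matched by index label
difference = s1 - s2
product = s1 * s2
quotient = s1 / s2     # true division gives floating-point results
scaled = s1 * 2        # a scalar is applied to every element

# Labels present in only one operand produce NaN in the result
s3 = pd.Series([100, 200], index=["a", "z"])
mismatched = s1 + s3

print(total, mismatched, sep="\n\n")
```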
Mathematical Operations on Series (cont…):
Output:
Head and Tail functions on Series:
head() and tail() functions return the first and last n rows
respectively.
Syntax:
<Series name>.head(n)
<Series name>.tail(n)
n - number of rows
Default value of n is 5
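A short sketch of head() and tail() on a Series, using made-up values from 1 to 10:

```python
import pandas as pd

s = pd.Series(range(1, 11))   # values 1..10, default index 0..9

first_five = s.head()         # n defaults to 5
first_three = s.head(3)
last_two = s.tail(2)

print(first_three)
print(last_two)
```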
Selection, Indexing and Slicing on Series:
Selection: We can select a value from the series by using
its corresponding index.
Syntax:
<Series name>[<index number>]
Output:
Indexing:
Series.index attribute is used to get or set the index
labels for the given series.
Syntax:
<Series name>.index
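The selection and index syntax above might be used as in this sketch. The values and the new labels are illustrative assumptions.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40])   # default index 0..3

value = s[2]                      # select the value whose index label is 2
print(value)                      # 30

print(s.index)                    # RangeIndex(start=0, stop=4, step=1)

# Setting new labels: the sequence must be as long as the Series
s.index = ["a", "b", "c", "d"]
print(s["c"])                     # 30
```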
Indexing (cont...):
Output:
Slicing:
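Slicing a series extracts a part of the series based
on the given parameters.
Syntax:
<Series name>[<start>:<stop>:<step>]
Note: start, stop and step are optional
Default values: start=0, stop=n (the length of the series), step=1
The element at position stop is not included, so the last
position a slice can reach is n-1.
Note: slicing with integers uses positions (the default
index), not the index labels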
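A minimal sketch of slicing, assuming a Series with string labels so that positions and labels are easy to tell apart. The label-based slice in the final statement is an addition for contrast and is not covered in the slide text.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=["a", "b", "c", "d", "e"])

middle = s[1:4]          # positions 1, 2, 3 (position 4 is excluded)
every_other = s[::2]     # start and stop omitted, step of 2
reversed_s = s[::-1]     # negative step walks backwards

# Slicing with labels, by contrast, includes both end points
by_label = s["b":"d"]

print(middle.tolist(), every_other.tolist(), by_label.tolist())
```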
Data Frames
Data Frames:
A Data Frame is a two-dimensional (2-D) data structure
defined in pandas which consists of rows and columns.
A Data Frame stores an ordered collection of columns that
can hold data of different types.
Example:
S.No. Name Age Marks
1 Ravi 25 99
2 Kunal 26 98
Characteristics of Data Frames:
➢ It has two indices (two axes)
○ Row index (axis=0) ->known as index
○ Column index (axis=1) ->known as column-name
➢ Value in the Data Frame will be identifiable
by the combination of row index and
column index.
➢ Indices can be of any type
➢ Column can have data of different types.
➢ Value is mutable
➢ Size is mutable
Creation of Data Frames:
Syntax:
<Data Frame Name> = pandas.DataFrame(
<2D data structure>,
columns=<column sequence>,
index=<index sequence>, ...)
We can create Data Frame in many ways, such as-
(i) Two dimensional dictionaries
(ii)Two dimensional ndarrays(NumPy arrays)
(iii) Series type object
(iv)Another Dataframe object
(v)Text/CSV files
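Several of these routes can be sketched as below, reusing the Ravi/Kunal sample data from the example table. The variable names are assumptions, and the original slide code is not reproduced here.

```python
import numpy as np
import pandas as pd

rows = [["Ravi", 25, 99], ["Kunal", 26, 98]]
cols = ["Name", "Age", "Marks"]

# From a nested list, with explicit column names and row labels
df_list = pd.DataFrame(rows, columns=cols, index=[1, 2])

# From a two-dimensional NumPy array
arr = np.array([[25, 99], [26, 98]])
df_array = pd.DataFrame(arr, columns=["Age", "Marks"])

# From a Series: the Series name becomes the column name
names = pd.Series(["Ravi", "Kunal"], name="Name")
df_series = pd.DataFrame(names)

# From another DataFrame
df_copy = pd.DataFrame(df_list)

print(df_list)
```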
Creating Data frame from List:
Output:
Creating Data frame from array:
Output:
Creating Data frame from Series:
Output:
Creating Data frame from another Data frame:
Output:
(i) Two dimensional dictionaries
We can create Dataframe from Two dimensional
dictionaries-
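A sketch of three dictionary-based forms, again with the sample data from the example table. Which form the original slide used is not known, so all three are shown as assumptions.

```python
import pandas as pd

# Dictionary of lists: keys -> column names, list positions -> rows
df1 = pd.DataFrame({"Name": ["Ravi", "Kunal"], "Age": [25, 26], "Marks": [99, 98]})

# Dictionary of dictionaries: outer keys -> columns, inner keys -> row index
df2 = pd.DataFrame({
    "Age": {"Ravi": 25, "Kunal": 26},
    "Marks": {"Ravi": 99, "Kunal": 98},
})

# Dictionary of Series: rows are aligned on the Series index labels
df3 = pd.DataFrame({
    "Age": pd.Series([25, 26], index=["Ravi", "Kunal"]),
    "Marks": pd.Series([99, 98], index=["Ravi", "Kunal"]),
})

print(df2)
```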
Output:
Creating Data frame from dictionary of Series:
Output:
(v) Text/CSV files:
We can Create Dataframe from Text/CSV Files by
using read_csv() function.
Syntax:
<data frame name>
=pandas.read_csv(filepath_or_buffer, sep=',',
delimiter=None, header='infer', names=None,
index_col=None, usecols=None, …)
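A minimal sketch of read_csv(). To keep the example self-contained it reads from an in-memory text buffer (io.StringIO) instead of a file on disk; the CSV contents and column names are illustrative assumptions.

```python
import io
import pandas as pd

csv_text = "SNo,Name,Age,Marks\n1,Ravi,25,99\n2,Kunal,26,98\n"

# Default behaviour: first line is the header, comma is the separator
df = pd.read_csv(io.StringIO(csv_text))

# index_col picks the row labels, usecols limits the columns that are read
df_indexed = pd.read_csv(
    io.StringIO(csv_text),
    index_col="SNo",
    usecols=["SNo", "Name", "Marks"],
)

print(df)
print(df_indexed)
```

With a real file, the first argument would be the path to that file in place of the buffer.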
(v) Text/CSV files (cont..):
Output:
Accessing values in dataframe:
Accessing a particular value:
<Data frame name>[<column name>][<index>]
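The access pattern above might look like this in practice. The data is illustrative, and the .loc form is shown only as an equivalent alternative that the slide does not mention.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Ravi", "Kunal"], "Marks": [99, 98]}, index=[1, 2])

# Column first, then the row label within that column
marks_kunal = df["Marks"][2]

# .loc[row label, column name] reaches the same cell in one step
same_value = df.loc[2, "Marks"]

print(marks_kunal, same_value)
```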
Output:
NaN variable in Python:
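NaN, standing for "not a number", is a special floating-point
value (not a separate data type) used to represent a value that is
undefined or unrepresentable. For example, 0/0 is undefined as a
real number; NumPy floating-point division gives NaN for it, while
plain Python raises ZeroDivisionError. Pandas also uses NaN to
mark missing data.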
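A short sketch of how NaN behaves, using only standard calls (float("nan"), math.isnan, numpy.nan and Series.isna). The sample Series is an assumption.

```python
import math
import numpy as np
import pandas as pd

nan = float("nan")
is_equal = nan == nan            # NaN never compares equal, even to itself
print(is_equal)                  # False
print(math.isnan(nan))           # True

# In pandas, NaN marks a missing entry and forces a float dtype
s = pd.Series([99, np.nan, 98])
missing = s.isna().tolist()
print(missing)                   # [False, True, False]
print(s.dtype)                   # float64
```

Because NaN is not equal to itself, missing values are tested with isna() rather than with ==.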
Iteration on Dataframes:
Output:
iteritems() (removed in pandas 2.0; use items() instead):
Output:
itertuples():
Output:
Iterating over Columns: In order to iterate over columns,
we create a list of the data frame's columns and then
iterate through that list to pull out each column.
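The iteration styles above can be sketched as follows. iterrows() is assumed to be the row-wise method the first iteration slide had in mind, and items() is used in place of iteritems(), which is no longer available from pandas 2.0 onwards. The data is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Ravi", "Kunal"], "Marks": [99, 98]})

# Row by row: each step gives (index label, row as a Series)
row_pairs = [(idx, row["Name"]) for idx, row in df.iterrows()]

# Column by column: each step gives (column name, column as a Series)
col_pairs = [(name, col.tolist()) for name, col in df.items()]

# Rows as named tuples: fields are Index plus one field per column
tuple_marks = [t.Marks for t in df.itertuples()]

# Looping over a list of the column names
col_names = []
for name in list(df.columns):
    col_names.append(name)
    print(name, df[name].tolist())

print(row_pairs, col_pairs, tuple_marks)
```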
Operations on rows and columns:
● Add
● Select
● Delete
● Rename
Column selection:
Output:
Column addition:
Output:
Column Deletion:
Output:
Column Rename:
Output:
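The column operations named above (selection, addition, deletion, rename) can be sketched like this. The Grade and Bonus columns are hypothetical additions used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Ravi", "Kunal"], "Age": [25, 26], "Marks": [99, 98]})

# Selection: one name gives a Series, a list of names gives a DataFrame
names = df["Name"]
subset = df[["Name", "Marks"]]

# Addition: assign to a new column name
df["Grade"] = ["A", "A"]
df["Bonus"] = df["Marks"] + 1

# Deletion: drop() returns a new DataFrame, del removes in place
df = df.drop(columns=["Age"])
del df["Bonus"]

# Rename: map old column names to new ones
df = df.rename(columns={"Marks": "Score"})

print(df)
```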
Row selection:
Output:
Row Addition:
Output:
Row Deletion:
Output:
Row Rename:
Output:
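The row operations named above can be sketched in a similar way. The extra rows ("Asha", "Meena") are made-up sample data, and pandas.concat() is used for adding a data frame of rows because DataFrame.append() is not available in pandas 2.0 and later.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Ravi", "Kunal"], "Marks": [99, 98]}, index=[1, 2])

# Selection: by row label with loc, by position with iloc
row_by_label = df.loc[1]
row_by_position = df.iloc[0]

# Addition: assign to a new label, or concatenate another DataFrame
df.loc[3] = ["Asha", 97]
extra = pd.DataFrame({"Name": ["Meena"], "Marks": [96]}, index=[4])
df = pd.concat([df, extra])

# Deletion: drop the row whose label is 2
df = df.drop(index=2)

# Rename: map old row labels to new ones
df = df.rename(index={1: "first"})

print(df)
```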
Head and Tail functions in Data Frames:
head(n):
Returns the first n rows.
tail(n):
Returns last n rows.
Default value for n is 5
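A brief sketch with an eight-row data frame of made-up roll numbers and marks:

```python
import pandas as pd

df = pd.DataFrame({
    "Roll": range(1, 9),
    "Marks": [90, 85, 70, 65, 88, 92, 79, 81],
})

top_default = df.head()    # first 5 rows
top_two = df.head(2)
last_three = df.tail(3)

print(top_two)
print(last_three)
```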
Indexing using Labels in Data Frames: We can make one
of the columns the row index (row labels) of the data frame by
using the function set_index().
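A minimal sketch of set_index(), with the sample data from the earlier table. reset_index() is included as an assumption about what a reader may want next; the slide text does not mention it.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Ravi", "Kunal"], "Age": [25, 26], "Marks": [99, 98]})

# set_index() returns a new DataFrame; the original is left unchanged
by_name = df.set_index("Name")
kunal_marks = by_name.loc["Kunal", "Marks"]

# reset_index() turns the row labels back into an ordinary column
restored = by_name.reset_index()

print(by_name)
```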
Output:
Boolean indexing in Data Frames: Boolean indexing helps
us to select the data from the Data Frames using a
boolean vector.
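A sketch of boolean indexing. The third row ("Asha") and the thresholds are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Ravi", "Kunal", "Asha"],
    "Age": [25, 26, 24],
    "Marks": [99, 98, 72],
})

# A comparison on a column yields a boolean Series (the boolean vector)
mask = df["Marks"] > 90
high = df[mask]

# Conditions are combined with & (and) or | (or), each in parentheses
both = df[(df["Marks"] > 90) & (df["Age"] < 26)]

# An explicit list of booleans, one per row, works as well
picked = df.loc[[True, False, True]]

print(high)
```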
Joining, Merging and Concatenation on Data Frames:
Merge:
pandas.merge() method is used for merging two data
frames. Its most commonly used arguments are:
● Data frame names - the two data frames to merge
● how - the type of merge: left, right, inner (the
default) or outer
● on - the common column name(s) to merge on
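A minimal sketch of merging on a common column. The two data frames and the RollNo column are assumptions made for illustration.

```python
import pandas as pd

students = pd.DataFrame({"RollNo": [1, 2, 3], "Name": ["Ravi", "Kunal", "Asha"]})
marks = pd.DataFrame({"RollNo": [2, 3, 4], "Marks": [98, 72, 65]})

# inner: only the RollNo values present in both data frames
inner = pd.merge(students, marks, on="RollNo", how="inner")

# left: every row of the first data frame, NaN where marks are missing
left = pd.merge(students, marks, on="RollNo", how="left")

# right: every row of the second data frame, NaN where the name is missing
right = pd.merge(students, marks, on="RollNo", how="right")

print(inner, left, right, sep="\n\n")
```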
Merge (cont..):
Join: The join() method combines data frames using their
index. Use <dataframe 1>.join(<dataframe 2>) to join.
Concatenation: Use pandas.concat(<list of data frames>)
to concatenate data frames.
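Join and concatenation can be sketched as follows, with small illustrative data frames. The ignore_index option is an addition not mentioned in the slide text.

```python
import pandas as pd

# join(): columns are placed side by side, matched on the row index
ages = pd.DataFrame({"Age": [25, 26]}, index=["Ravi", "Kunal"])
scores = pd.DataFrame({"Marks": [99, 98]}, index=["Ravi", "Kunal"])
joined = ages.join(scores)

# concat(): data frames are stacked one below the other by default
a = pd.DataFrame({"Name": ["Ravi"], "Marks": [99]})
b = pd.DataFrame({"Name": ["Kunal"], "Marks": [98]})
stacked = pd.concat([a, b])                         # keeps original labels 0, 0
renumbered = pd.concat([a, b], ignore_index=True)   # fresh labels 0, 1

print(joined)
print(renumbered)
```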
Importing/Exporting Data between CSV files and Data
Frames:
Import data from CSV file to Data Frame:We can import
data from CSV File to Data Frame by using read_csv()
function.
Output:
Export data from Data Frame to CSV File:We can export
data from Data Frame to CSV File by using to_csv()
function.
Syntax:
<data frame name>.to_csv(<File
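A small sketch of exporting with to_csv(). When no file path is given, to_csv() returns the CSV text as a string, which keeps this example self-contained; the file name in the comment is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Ravi", "Kunal"], "Marks": [99, 98]})

# index=False leaves the row labels out of the exported data
csv_text = df.to_csv(index=False)
print(csv_text)

# Writing to a file instead (hypothetical file name):
# df.to_csv("students.csv", index=False)
```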
Python module- A python module is a python script file(.py file) containing variables, python classes, functions,
statements etc.
Python Library/package- A Python library is a collection of modules that together cater to a specific type of need or
application. The advantage of using libraries is that we can directly use their functions/methods for a specific
task instead of rewriting the code for it. A library is brought into a program with the import statement,
as-
import libraryname
at the top of the python code/script file.
Some examples of Python Libraries-
1. Python standard library- It is a collection of modules which is normally distributed along with the Python installation.
Some of them are-
a. math module- provides mathematical functions
b. random module- provides functions for generating pseudo-random numbers.
c. statistics module- provides statistical functions
2. Numpy (Numerical Python) library- It provides functions for working with large multi-dimensional arrays(ndarrays)
and matrices. NumPy provides a large set of mathematical functions that can operate quickly on the entries of the
ndarray without the need of loops.
3. Pandas (PANel + DAta) library- Pandas is a fast, powerful, flexible and easy to use open source data analysis and
manipulation tool. Pandas is built on top of NumPy, relying on ndarray and its fast and efficient array based
mathematical functions.
4. Matplotlib library- It provides functions for plotting and drawing graphs.
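A brief sketch of importing and using some of the libraries listed above. The values are illustrative; random and Matplotlib are left out so that the output is fixed and no plotting window is needed.

```python
import math
import statistics

import numpy as np
import pandas as pd

# Standard library modules
root = math.sqrt(16)
average = statistics.mean([2, 4, 6])

# NumPy: one expression operates on every entry, with no explicit loop
arr = np.array([1, 2, 3, 4])
squares = arr ** 2

# Pandas builds on NumPy: a Series can hand back its data as an ndarray
s = pd.Series(arr)
underlying = s.to_numpy()

print(root, average, squares, type(underlying).__name__)
```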
Data Structure- Data structure is the arrangement of data in such a way that permits efficient access and
modification.
Pandas Data Structures- Pandas offers the following data structures-
a) Series - 1D array
b) DataFrame - 2D array
c) Panel - 3D array (not in syllabus; removed from pandas in version 0.25)
Series- Series is a one-dimensional array with homogeneous data.
Key features of Series-
• A Series has only one dimension, i.e. one axis of data values (1D).
• Each element of the Series can be associated with an index/label that can be used to access the data value. By
default the index starts with 0,1,2,3… but it can be set to any other data type also.
• Series is data mutable i.e. the data values can be changed in-place in memory
• Series is size immutable i.e. once a series object is created in memory with a fixed number of elements, the
number of elements cannot be changed in place. The series name can be assigned a different set of values, but
it will then refer to a different location in memory.
• All the elements of the Series are homogeneous data i.e. their data type is the same.
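The features above might be observed with a sketch like this one. The values are illustrative, and pandas.concat() is used here as one way of producing a larger Series as a new object.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# Data mutable: a value can be replaced in place
s["b"] = 25

# A larger Series built with concat() is a new object; s keeps its length
bigger = pd.concat([s, pd.Series([40], index=["d"])])

# Homogeneous: one dtype for the whole Series. Mixed Python values
# fall back to the general-purpose object dtype
mixed = pd.Series([1, "two", 3.0])

print(s.tolist(), len(s), len(bigger), mixed.dtype)
```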
Structure
The most obvious difference between a Series and a DataFrame is their structure. A Series is a one-dimensional
object, while a DataFrame is two-dimensional. This means that a Series has only one index, while a DataFrame has
both row and column indexes.
Dimensions
Another key difference between Series and DataFrame is their dimensions. A Series has only one dimension, while a
DataFrame has two. This means that a Series has only one axis, while a DataFrame has both row and column axes.
Data Types
While both Series and DataFrame can hold any data type, they have some differences in how they handle data
types. A Series can hold only one data type at a time, while a DataFrame can hold multiple data types in different
columns. This means that a DataFrame can be thought of as a collection of Series, where each column is a Series.
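A short sketch of the point that each column of a DataFrame is a Series with its own data type, using illustrative data:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Ravi", "Kunal"],
    "Age": [25, 26],
    "Marks": [99.5, 98.0],
})

age = df["Age"]                        # one column taken out of the DataFrame
is_series = isinstance(age, pd.Series)

print(df.ndim, age.ndim)               # 2 1
print(df.dtypes)                       # one dtype listed per column
```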
Operations
Series and DataFrame also differ in how operations apply to them. Arithmetic operations can be performed directly
on both: on a Series they act element-wise along its single axis, while on a DataFrame they act on every column at
once. To restrict an operation on a DataFrame to particular data, select the columns or rows that you want to
operate on; for aggregations such as sum, specify the axis.
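A minimal sketch comparing arithmetic on a Series and on a DataFrame. The subject columns and marks are illustrative assumptions.

```python
import pandas as pd

s = pd.Series([1, 2, 3])
df = pd.DataFrame({"Maths": [90, 80], "Science": [70, 60]})

s_plus = s + 10                   # applied to every element of the Series
df_plus = df + 5                  # applied to every cell of the DataFrame

# Operating on selected columns only
df["Total"] = df["Maths"] + df["Science"]

# Aggregation along an axis: axis=0 works down columns, axis=1 across rows
col_sums = df.sum(axis=0)
row_sums = df[["Maths", "Science"]].sum(axis=1)

print(df_plus)
print(df)
```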