Python Pandas
Convenor:
Ms.Sylvia Mary D
(HOD)
Follow us on:
Instagram--https://instagram.com/pantechelearning?igshid=1fohp030onteu
Telegram--https://t.me/pantechelearning
YouTube--https://youtube.com/c/PantecheLearning
Podcasts--https://www.instagram.com/tv/COPQIigJZFi/?igshid=4gjhz4dlls1p
Python - Pandas
Pandas is an open-source Python library.
It provides highly efficient data structures and data
analysis tools for the Python programming language.
Python with pandas is used in a variety of domains
such as academics, finance, economics, and statistics, among others.
PRE-REQUISITES
To learn pandas, one should be familiar with basic
computer programming terminology.
A basic knowledge of another programming language is
essential.
Pandas relies on much of the functionality of NumPy.
A basic understanding of NumPy is necessary to
understand pandas.
Pandas
It is an open-source Python library used for data manipulation and data
analysis.
Python with pandas is used in a variety of domains such as statistics, finance,
and web analytics.
Using pandas, the following five steps of data processing can be accomplished:
Load
Organise
Manipulate
Model
Analyze
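The five steps above can be sketched on a tiny data set. This is only an illustration: the CSV text is a hypothetical stand-in for a real file, and the "model" step is reduced to a simple per-department summary.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
csv_text = """name,dept,percentage
Sam,ECE,78
Geetha,CSE,85
Kala,ECE,75
Mala,CSE,70
"""

# Load: read the data into a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# Organise: use the student name as the row index.
df = df.set_index("name")

# Manipulate: add a derived column (the 75 threshold is an assumption).
df["passed"] = df["percentage"] >= 75

# Model: a per-department mean, standing in for a real model.
dept_mean = df.groupby("dept")["percentage"].mean()

# Analyze: find the department with the highest mean percentage.
best_dept = dept_mean.idxmax()
print(dept_mean)
print(best_dept)
```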
Important Features Of Pandas:
Efficient DataFrame object with customized indexing.
Supports different file formats for loading data into in-memory data objects.
Aligns data and handles missing data.
Reshapes data sets.
Performs label-based slicing and indexing, and extracts subsets from
large datasets.
Can insert columns into and delete columns from a dataset.
Groups data for aggregation and transformation.
Merges and joins data with high performance.
Provides time-series functionality.
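Two of the features listed above, data alignment with missing-data handling and merging, can be sketched as follows. The variable names and values are illustrative assumptions, not part of the original material.

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20], index=["y", "z"])

# Addition aligns on index labels; "x" has no partner in b, so it becomes NaN.
total = a + b

# Handle the missing value by filling it with 0.
filled = total.fillna(0)
print(filled)

# Merging: join two small tables on a shared "name" column.
left = pd.DataFrame({"name": ["Sam", "Kala"], "dept": ["ECE", "ECE"]})
right = pd.DataFrame({"name": ["Sam", "Kala"], "percentage": [78, 75]})
merged = left.merge(right, on="name")
print(merged)
```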
PANDAS – ENVIRONMENT SETUP
The standard Python distribution does not include the pandas module.
Pandas can be installed using the Python package installer, pip:
pip install pandas
Alternatively, once Anaconda is installed, pandas is installed with it.
Anaconda is an open-source Python distribution for the SciPy stack.
It is available for Windows, Linux, and macOS.
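A quick way to confirm that the installation worked is to import the libraries and print their versions. This is a minimal check, assuming pandas and NumPy were installed into the interpreter being used.

```python
import numpy as np
import pandas as pd

# If either import fails, the package is not installed in this environment.
print("pandas:", pd.__version__)
print("numpy:", np.__version__)
```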
Pandas
It deals with the following three data structures.
Series
DataFrame
Panel (deprecated in pandas 0.20 and removed in later releases)
These data structures are built on top of NumPy
arrays, which makes them fast and efficient.
Dimension and Description:
A higher-dimensional data structure is a container of lower-dimensional
data structures.
A DataFrame is a container of Series, and a Panel is a container of DataFrames.
Series: a one-dimensional collection of similar elements. For example, a Series
can hold a collection of integers.
Points to Consider:
Collection of similar elements.
Size cannot be changed (i.e., it is size-immutable).
Values of the data can be changed (i.e., it is value-mutable).
Data Frame:
It is a heterogeneous collection of data elements, and
the size of the table can be changed.
The DataFrame is used in a variety of fields and is one of the most
useful data structures.
It is a 2D labelled, size-mutable tabular data structure.
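Size mutability can be sketched by adding a column to a DataFrame and then dropping it again. The column values are taken from the student example used in this material; the variable names are assumptions.

```python
import pandas as pd

# Columns hold different types: strings and integers.
df = pd.DataFrame({"Name": ["Sam", "Geetha"], "Percentage": [78, 85]})

# Grow the table by inserting a new column.
df["Dept"] = ["ECE", "CSE"]
shape_after_insert = df.shape

# Shrink it again by dropping that column.
df = df.drop(columns="Dept")
print(shape_after_insert, df.shape)
```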
Panel:
It is a 3D labelled, size-mutable array.
Arrays with two or more dimensions are difficult to build and handle.
With raw arrays, more burden is placed on the user to consider the orientation of the data when writing
functions.
Pandas data structures reduce this mental effort.
With a tabular DataFrame, it is more natural to think of the index (rows) and the columns rather than
axis 0 and axis 1.
All pandas data structures are value-mutable (values can be changed).
Except for Series, all are size-mutable.
Series is size-immutable.
DataFrame is widely used and is one of the most important data structures.
Panel is a less frequently used data structure.
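The index/columns way of thinking maps directly onto the axis argument, which accepts the names "index" and "columns" as well as 0 and 1. The subject names and marks below are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"math": [80, 90], "physics": [70, 60]},
    index=["Sam", "Kala"],
)

# axis="index" (axis 0) collapses the rows: one total per column.
per_subject = df.sum(axis="index")

# axis="columns" (axis 1) collapses the columns: one total per row.
per_student = df.sum(axis="columns")

print(per_subject)
print(per_student)
```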
Panel:
A Panel is a three-dimensional data structure with a
heterogeneous collection of data.
A Panel is hard to represent graphically.
It can be illustrated as a container of DataFrames.
Important Points:
Heterogeneous data
Size mutable
Data mutable
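Panel is not available in recent pandas releases, so the "container of DataFrames" idea is sketched here with a different technique: stacking DataFrames under a two-level index using pd.concat. The semester labels and the sem2 values are invented for illustration.

```python
import pandas as pd

sem1 = pd.DataFrame({"Percentage": [78, 85]}, index=["Sam", "Geetha"])
sem2 = pd.DataFrame({"Percentage": [80, 82]}, index=["Sam", "Geetha"])

# The dictionary keys become the outer level of a two-level row index.
stacked = pd.concat({"sem1": sem1, "sem2": sem2})

# Select one "layer" of the three-dimensional data by its outer key.
print(stacked.loc["sem2"])
```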
Series:
It is a one-dimensional data structure with a homogeneous
collection of elements.
For example, it can contain a collection of integers such as
10, 20, 30, 40, 50.
Pandas – Series:
A pandas Series can be created using the following constructor:
pandas.Series(data, index, dtype, copy)
Series:
The parameters of the constructor are as follows:
data: Takes various forms such as an ndarray, a list, or a constant.
index: Index labels must be hashable, and the index must be
the same length as the data.
The default is np.arange(n) if no index is passed.
dtype: Indicates the data type.
If no dtype is passed, the data type is inferred.
copy: Whether to copy the input data.
The default value is False.
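The four constructor parameters can be seen together in one call. This is a small sketch with made-up values; copy=True is chosen here so the effect of copying is visible.

```python
import numpy as np
import pandas as pd

values = np.array([1, 2, 3])

# data, index, dtype and copy passed explicitly.
s = pd.Series(data=values, index=["a", "b", "c"], dtype="float64", copy=True)

# The Series holds its own copy, so changing the source array does not affect it.
values[0] = 99
print(s)
```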
Series:
A Series can be created from various inputs such as:
Array
Dict
Scalar value or constant
Creation of Empty Series:
The most basic Series that can be created is an empty
Series.
Example:
# import the pandas library, aliased as pd
import pandas as pd
# pass dtype explicitly; recent pandas versions otherwise default to object
s = pd.Series(dtype="float64")
print(s)
Output:
Series([], dtype: float64)
Create a series from ndarray:
If the data passed is an ndarray, then the index passed must
be of the same length.
If no index is passed, then by default the index will be
range(n), where n is the array length:
[0, 1, 2, ..., len(array)-1].
Example 1:
# import the pandas library, aliased as pd
import pandas as pd
import numpy as np
data = np.array(['a', 'b', 'c', 'd'])
s = pd.Series(data)
print(s)
OUTPUT:
0 a
1 b
2 c
3 d
dtype: object
No index values are passed.
By default, indices ranging from 0 to
len(data)-1 are assigned, i.e., from 0 to 3.
EXAMPLE 2:
# import the pandas library, aliased as pd
import pandas as pd
import numpy as np
data = np.array(['b', 'c', 'd', 'a'])
s = pd.Series(data, index=[10, 11, 12, 13])
print(s)
Output:
10 b
11 c
12 d
13 a
dtype: object
Create a series from Dict:
A dictionary can be passed as an input.
If no index is specified, then the dictionary keys are used
to create the index (in insertion order in recent pandas versions; older versions sorted the keys).
If an index is passed, then the values in the data
corresponding to the labels in the index will be pulled
out.
EXAMPLE 1:
# import the pandas library, aliased as pd
import pandas as pd
import numpy as np
data = {'a': 0., 'b': 1., 'c': 2.}
s = pd.Series(data)
print(s)
OUTPUT:
a 0.0
b 1.0
c 2.0
dtype: float64
When an explicit index is passed together with the dictionary, the index order is preserved and any
label missing from the dictionary is filled with NaN (Not a Number).
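The case of a dictionary combined with an explicit index can be sketched as follows. The index labels are chosen for illustration; "d" is deliberately absent from the dictionary.

```python
import pandas as pd

data = {"a": 0.0, "b": 1.0, "c": 2.0}

# "d" is not a key in the dictionary, so its value becomes NaN.
s = pd.Series(data, index=["b", "c", "d", "a"])
print(s)
```

The resulting Series follows the order of the index passed, not the order of the dictionary keys.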
Create a Series From a Scalar:
If the data is a scalar value, then an index must be
provided.
The value will be repeated to match the length of the
index.
Example:
# import the pandas library, aliased as pd
import pandas as pd
import numpy as np
s = pd.Series(5, index=[0, 1, 2, 3])
print(s)
Output:
0 5
1 5
2 5
3 5
dtype: int64
Accessing data from series with position:
Data in a Series can be accessed by position, similar to an ndarray.
Example:
Retrieve the first element.
Counting starts from zero in the array.
This means that the first element is stored at position 0, and so on.
import pandas as pd
s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
# retrieve the first element by position
print(s.iloc[0])
Output:
1
Example 3:
Retrieve the first three elements in the Series.
If a colon is placed before an index (as in s[:3]), all items up to, but not
including, that position will be extracted.
If a colon is placed after an index (as in s[3:]), all items from that position
onwards will be extracted.
If two parameters with a colon between them are used, items
between these two index positions will be extracted.
The end index is excluded.
import pandas as pd
s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
# retrieve the first three elements
print(s[:3])
Output:
a 1
b 2
c 3
dtype: int64
Example 4:
Retrieve the last three elements:
import pandas as pd
s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
# retrieve the last three elements
print(s[-3:])
Output:
c 3
d 4
e 5
dtype: int64
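The two-parameter form of slicing is described above but not shown, so here is a short sketch. It uses .iloc to make the positional intent explicit; the variable name middle is an assumption.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

# Positions 1, 2 and 3 are returned; position 4 (the end) is excluded.
middle = s.iloc[1:4]
print(middle)
```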
Retrieve the Data using Label (Index)
A Series is like a fixed-size dict in that
values can be retrieved and set by index label.
Example: Retrieve a single element using an index label value.
import pandas as pd
s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
# retrieve a single element
print(s['a'])
Output:
1
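The dict-like behaviour also covers membership tests and setting a value by label. A brief sketch, with the new value 100 chosen arbitrarily:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

# Membership tests look at the index labels, like dict keys.
has_a = "a" in s
has_z = "z" in s

# Set a value by its label.
s["a"] = 100

# keys() returns the index labels.
labels = list(s.keys())
print(has_a, has_z, s["a"], labels)
```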
Example:
Retrieve multiple elements using a list of index label values.
import pandas as pd
s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
# retrieve multiple elements
print(s[['a', 'c', 'd']])
Output:
a 1
c 3
d 4
dtype: int64
Example 5:
If a label is not contained in the index, then an exception is raised.
import pandas as pd
s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
# attempt to retrieve a label that does not exist
print(s['f'])
Output:
KeyError: 'f'
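When a label may be absent, the exception can be avoided or handled. The sketch below shows two options: Series.get, which returns a default value, and a try/except block. The message text is an illustrative assumption.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

# Option 1: get() returns the default instead of raising.
value = s.get("f", default=None)

# Option 2: catch the KeyError explicitly.
try:
    s["f"]
    message = "label found"
except KeyError as err:
    message = f"label not found: {err}"

print(value)
print(message)
```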
Data Frame:
It is a two-dimensional array with different data elements (i.e., it is a
heterogeneous collection of data elements).
Data is stored in a tabular format in the form of rows and columns.
For example, consider the following DataFrame.
Name      Dept    Semester    Percentage
Sam       ECE     I           78
Geetha    CSE     II          85
Kala      ECE     III         75
Mala      CSE     IV          70
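The table above could be built in pandas roughly as follows, using a dictionary that maps each column name to its list of values.

```python
import pandas as pd

# Each dictionary key becomes a column; each list holds that column's values.
df = pd.DataFrame(
    {
        "Name": ["Sam", "Geetha", "Kala", "Mala"],
        "Dept": ["ECE", "CSE", "ECE", "CSE"],
        "Semester": ["I", "II", "III", "IV"],
        "Percentage": [78, 85, 75, 70],
    }
)
print(df)
```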
Features Of DataFrame:
Columns can be of different types.
The size of the DataFrame can be changed (i.e., it is size-mutable).
Labeled axes (rows and columns).
Various arithmetic operations can be performed on rows and columns.
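Arithmetic on a column applies to every row at once, and aggregate functions reduce a column to a single value. A short sketch using the percentages from the student table; the Fraction column is an invented example.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Sam", "Geetha", "Kala", "Mala"],
        "Percentage": [78, 85, 75, 70],
    }
)

# Element-wise arithmetic: divide every value in the column by 100.
df["Fraction"] = df["Percentage"] / 100

# Aggregation: the mean of the whole column.
mean_percentage = df["Percentage"].mean()
print(df)
print(mean_percentage)
```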
pandas.DataFrame:
A pandas DataFrame can be created using the following constructor.
pandas.DataFrame(data, index, columns, dtype, copy)
Parameters:
data: Takes various forms such as ndarray, Series, map, list, dict, constant,
and also another DataFrame.
index: The row labels to be used for the resulting frame. This parameter is
optional.
The default is np.arange(n) if no index is passed.
columns: The column labels. The default is np.arange(n)
if no column labels are passed.
dtype: Specifies a single data type to force on all columns.
copy: Whether to copy the data from the input. The default is False in older releases; in recent
releases the default is None, which copies dict input but not DataFrame or 2D ndarray input.
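The constructor parameters can be compared by building one frame with everything specified and one with the defaults. The row and column labels are hypothetical.

```python
import numpy as np
import pandas as pd

data = np.array([[78, 85], [75, 70]])

# All parameters given explicitly.
df = pd.DataFrame(
    data=data,
    index=["row1", "row2"],
    columns=["A", "B"],
    dtype="float64",
    copy=True,
)

# Only data given: row and column labels default to 0, 1, ...
default_df = pd.DataFrame(data)

print(df)
print(default_df)
```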
DataFrame Creation:
A pandas DataFrame can be created by using various
inputs such as:
Lists
Dict
Series
NumPy ndarrays
Another DataFrame
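Each of the input types listed above can be passed to the constructor. The following sketch shows one small frame per input type; the variable names and the "A"/"B" column labels are assumptions.

```python
import numpy as np
import pandas as pd

# From a list of rows, with column names supplied separately.
from_list = pd.DataFrame([["Sam", 78], ["Geetha", 85]], columns=["Name", "Percentage"])

# From a dict mapping column names to lists of values.
from_dict = pd.DataFrame({"Name": ["Sam", "Geetha"], "Percentage": [78, 85]})

# From a Series; its index becomes the row labels.
marks = pd.Series([78, 85], index=["Sam", "Geetha"])
from_series = pd.DataFrame({"Percentage": marks})

# From a 2D NumPy ndarray.
from_ndarray = pd.DataFrame(np.array([[78, 85], [75, 70]]), columns=["A", "B"])

# From another DataFrame.
from_frame = pd.DataFrame(from_dict)

print(from_list, from_series, from_ndarray, sep="\n\n")
```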
Data Frame Description:
The table contains the students' performance
department-wise and their percentage marks in each
semester. Data is represented in the form of rows and
columns. Each column denotes an attribute and each
row denotes a student/person.
Data Type Of Columns
Name – String
Dept – String
Semester – String
Percentage – Integer
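The column data types can be inspected with the dtypes attribute. In this sketch, text columns typically appear as object (or as a string dtype in newer pandas versions), so the checks use the helper functions in pandas.api.types rather than comparing type names.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Sam", "Geetha", "Kala", "Mala"],
        "Dept": ["ECE", "CSE", "ECE", "CSE"],
        "Semester": ["I", "II", "III", "IV"],
        "Percentage": [78, 85, 75, 70],
    }
)

# One data type per column.
print(df.dtypes)

# Check the types without depending on the exact dtype name.
name_is_text = pd.api.types.is_string_dtype(df["Name"])
percentage_is_int = pd.api.types.is_integer_dtype(df["Percentage"])
print(name_is_text, percentage_is_int)
```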
Further Information:
www.pantechelearning.com
CREDITS: This presentation template was created by Slidesgo,
including icons by Flaticon, and infographics & images by Freepik