Pandas

Department of Computer Engineering and Information Technology


College of Engineering Pune (COEP)
Forerunners in Technical Education
What are Python Libraries?

● A library is a collection of files (called modules) that contain pre-written code other developers have created for us.
● A module is a file consisting of Python code.
● A module can define functions, classes and variables. A module can also include runnable code.

function/module/package/library in Python

Function

Module

Package

Library

How to use Python Libraries?
NumPy: NumPy (Numerical Python) is a tool for scientific computing and for performing basic and advanced array operations.

SciPy: This library includes modules for linear algebra, integration, optimization, and statistics.

Pandas: Pandas is a library created to help developers work with "labeled" and "relational" data intuitively.

Matplotlib: Matplotlib is a plotting library that helps with visualizing data during analysis.

Pillow: Pillow is a friendly fork of PIL (Python Imaging Library) that is more user-friendly than the original.

Scikit-learn: This is a widely used machine learning library for data science projects based in Python.

Keras: Keras is a library for building and training neural networks.

PyTorch: PyTorch is a framework for data scientists who want to perform deep learning tasks easily.

TensorFlow: TensorFlow is a popular Python framework for machine learning and deep learning, which was developed at Google Brain.
Why Use Pandas?
❖ Create Data - We begin by creating our own data set for analysis.
❖ Get Data - We will learn how to read in the text file.
❖ Prepare Data - Here we will simply take a look at the data and make
sure it is clean.
❖ Analyze Data - We will simply find the most popular name in a specific
year.
❖ Present Data - Through tabular data and a graph, clearly show the end
user what is the most popular name in a specific year.

How to use Python Libraries?

Installation of pandas
If you have Python and pip already installed on a system, then installation of pandas is very easy.

First, we have to install pandas in our working environment. We can use pip, Python's package manager, which installs and manages Python packages.

Example: pip install pandas

How to use Python Libraries?

Import pandas

Once pandas is installed, import it in your applications with the import keyword. A program must import a library module before using it.

Example: import pandas

● Then refer to things from the module as module_name.thing_name
● Python uses . to mean “part of”.
Example: pandas.Series()
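
The module_name.thing_name pattern can be tried directly. The sketch below assumes pandas is installed; the alias pd is a widely used convention rather than a requirement, and the sample values are made up for illustration.

```python
import pandas            # import under the full module name
import pandas as pd      # import the same module under a shorter alias

# Both names refer to the same module object
same_module = pandas is pd

# "." means "part of": Series is part of the pandas module
s = pd.Series([1, 2, 3])

print(same_module, type(s).__name__)
```

Either spelling works; the examples that follow use the pd alias.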

What is pandas?

❖ Pandas is one of the most popular Python libraries used for data analysis.

We can analyze data in pandas with:

Series: A one-dimensional (1-D) array defined in pandas that can be used to store any data type.

DataFrame: A two-dimensional (2-D) data structure defined in pandas which consists of rows and columns.

How to use Python Libraries?

● A pandas Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, Python objects, etc.).
● The axis labels are collectively called the index.
● A pandas Series is comparable to a single column in an Excel sheet.

How to use Python Libraries?
● Labels need not be unique but must be of a hashable type.
● The object supports both integer and label-based indexing and provides a host of methods for performing operations involving the index.
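
As a rough illustration of the two points above, the following sketch builds a Series whose labels repeat and reads it both by label and by integer position. The values and labels are invented for this example.

```python
import pandas as pd

# Labels may repeat: "a" appears twice in this index
s = pd.Series([10, 20, 30], index=["a", "b", "a"])

by_label = s.loc["b"]        # label-based lookup, a single value
by_position = s.iloc[0]      # integer-position lookup, the first element
repeated = s.loc["a"]        # a repeated label returns every match as a Series

print(by_label, by_position, repeated.tolist())
```

Using loc for labels and iloc for positions keeps the two kinds of indexing explicit.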

pandas : series
● Series is a one-dimensional (1-D) array defined in pandas that can be used to store any data type.

● Example:
import pandas as pd
# Create a Series with Data and Index (Data and Index are placeholders)
a = pd.Series(Data, index = Index)

Here, Data can be:

● A scalar value, such as an integer or a string
● A Python dictionary of key-value pairs
● An ndarray
By default, Index is 0, 1, 2, …, (n-1), where n is the length of the data.
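
The ndarray case and the default index can be sketched as follows. This assumes NumPy is available (it is installed together with pandas); the array values are arbitrary.

```python
import numpy as np
import pandas as pd

values = np.array([2, 7, 4, 1])   # an ndarray used as Data
a = pd.Series(values)             # no index given, so 0..n-1 is used

default_index = a.index.tolist()
print(default_index)
```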

pandas : series

● Example:
import pandas as pd
a = pd.Series([2,7,4,1])
print(a)

● Example (same result, passing the values through the data keyword):
import pandas as pd
a = pd.Series(data=[2,7,4,1])
print(a)

● Output:
0    2
1    7
2    4
3    1
dtype: int64

pandas : series

● Example:
import pandas as pd
a = pd.Series([2,7,4,1],["A","B","C","D"])
print(a)

● Example (same result, using keyword arguments):
import pandas as pd
a = pd.Series(data=[2,7,4,1], index=["A","B","C","D"])
print(a)

● Output:
A    2
B    7
C    4
D    1
dtype: int64

pandas : series-> min/max

● Example:
import pandas as pd
a = pd.Series([2,7,4,1],["A","B","C","D"])
print(a.min())

● Output:
1

● Example:
import pandas as pd
a = pd.Series([2,7,4,1],["A","B","C","D"])
print(a.max())

● Output:
7

pandas : series-> scalar value

Create a Series from Scalar

If data is a scalar value, an index must be provided. The value will be repeated to match the length of the index.

● Example:
import pandas as pd
a = pd.Series(10,["A","B","C","D"])
print(a)

● Output:
A    10
B    10
C    10
D    10
dtype: int64

pandas : series-> dictionary

Create a Series from Dictionary

The dictionary keys become the index labels.

● Example:
import pandas as pd
calories = {"day1": 420, "day2": 380, "day3": 390}
myvar = pd.Series(calories)
print(myvar)

● Output:
day1    420
day2    380
day3    390
dtype: int64

pandas : DataFrame
● DataFrame is a widely used two-dimensional data structure with labeled axes (rows and columns).
● A DataFrame is a standard way to store data that has two different indexes, i.e., a row index and a column index.

Features of DataFrame
● Columns can be of different types
● Size is mutable
● Labeled axes (rows and columns)
● Arithmetic operations can be performed on rows and columns
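
The features listed above can be seen in a small sketch. The column names and values here are made up for illustration and are not taken from the slides.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Abc", "pqr"], "marks": [70.5, 81.0]})

# Columns can hold different types (text in one, floats in another)
marks_dtype = str(df["marks"].dtype)

# Size is mutable: a new column can be added after creation
df["bonus"] = [5, 10]

# Arithmetic works column-wise, element by element
df["total"] = df["marks"] + df["bonus"]

print(df)
```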

pandas : DataFrame
Structure: Let us assume that we are creating a data frame with students' data. We can think of it as an SQL table or a spreadsheet data representation.

pandas : DataFrame
pandas.DataFrame: A pandas DataFrame can be created using the following constructor:

pandas.DataFrame( data, index, columns, dtype)

data: The input data, which can take different forms such as ndarray, Series, map (dict), constants, lists, or arrays.

index: The row labels to be used for the resulting frame. Optional; defaults to np.arange(n) if no index is passed.

columns: The column labels. Optional; defaults to np.arange(n) if no column labels are passed.

dtype: Data type of each column.
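
A minimal sketch of the four constructor parameters follows, together with the defaults that apply when index and columns are omitted. The labels r1, r2, c1 and c2 are hypothetical names chosen for this example.

```python
import pandas as pd

# All four parameters given explicitly
df = pd.DataFrame(
    data=[[1, 2], [3, 4]],
    index=["r1", "r2"],        # row labels
    columns=["c1", "c2"],      # column labels
    dtype=float,               # force every column to float
)

# With index and columns omitted, both default to 0, 1, ..., n-1
default = pd.DataFrame([[1, 2], [3, 4]])

print(df)
print(default.index.tolist(), default.columns.tolist())
```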

pandas : DataFrame
Create an Empty DataFrame
The most basic DataFrame that can be created is an empty DataFrame.

# import the pandas library, aliased as pd
import pandas as pd
df = pd.DataFrame()
print(df)

Output:
Empty DataFrame
Columns: []
Index: []

pandas : DataFrame
Create a DataFrame from Lists
A DataFrame can be created using a single list or a list of lists.

import pandas as pd
data = [1,2,3,4,5]
df = pd.DataFrame(data)
print(df)

Output:
   0
0  1
1  2
2  3
3  4
4  5

pandas : DataFrame
Create a DataFrame from Lists
List with index

import pandas as pd
data = [1,2,3,4]
index=["A","B","C","D"]
df = pd.DataFrame(data,index)
print(df)

Output:
   0
A  1
B  2
C  3
D  4

pandas : DataFrame
Create a DataFrame from Lists
List with index and column name

import pandas as pd
data = [1,2,3,4]
index=["A","B","C","D"]
df = pd.DataFrame(data,index,columns=["Roll_no"])
print(df)

Output:
   Roll_no
A        1
B        2
C        3
D        4

pandas : DataFrame
Exercise 1:
We want to display like this:

   Name  Age
0   Abc   10
1   pqr   12
2   xyz   13

import pandas as pd
data=[ ["Abc", 10 ],[ "pqr",12],[ "xyz",13 ] ]
df=pd.DataFrame(data,columns=["Name","Age"])
print(df)

pandas : DataFrame
Create a DataFrame from List of Dicts
A list of dictionaries can be passed as input data to create a DataFrame. The dictionary keys are by default taken as column names.

import pandas as pd
df=pd.DataFrame([{"A":1,"B":2}])
print(df)

Output:
   A  B
0  1  2

pandas : DataFrame
Create a DataFrame from List of Dicts
A list of dictionaries can be passed as input data to create a DataFrame. The dictionary keys are by default taken as column names, and each dictionary becomes one row.

import pandas as pd
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data)
print(df)

Output:
   a   b     c
0  1   2   NaN
1  5  10  20.0

Note − Observe, NaN (Not a Number) is appended in missing areas. Because NaN is a floating-point value, column c is shown as float (20.0).

pandas : DataFrame
Create a DataFrame from Dict of ndarrays / Lists
All the ndarrays must be of the same length. If an index is passed, then the length of the index should be equal to the length of the arrays.

import pandas as pd
data = { 'Name': ['Tom', 'Jack', 'Steve', 'Ricky'], 'Age':[28,34,29,42] }
df = pd.DataFrame(data)
print(df)

Output:
    Name  Age
0    Tom   28
1   Jack   34
2  Steve   29
3  Ricky   42
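
The length rule can be checked with a short sketch. The index labels rank1 and rank2 are hypothetical; the second call deliberately passes lists of unequal length to show that pandas rejects them with a ValueError.

```python
import pandas as pd

# Index length (2) matches the length of each list (2)
data = {"Name": ["Tom", "Jack"], "Age": [28, 34]}
df = pd.DataFrame(data, index=["rank1", "rank2"])

# Lists of different lengths are rejected
try:
    pd.DataFrame({"Name": ["Tom", "Jack"], "Age": [28]})
    error = None
except ValueError as exc:
    error = exc

print(df)
print(type(error).__name__)
```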

Column Selection

pandas : DataFrame-> column selection
We will understand this by selecting a column from the DataFrame.

import pandas as pd
data = { 'Name':['Tom', 'Jack', 'Steve','asda'],
'Age':[2,34,29,42],
'Roll_no':[111,222,333,444] }
df = pd.DataFrame(data)
print(df)

Output:
    Name  Age  Roll_no
0    Tom    2      111
1   Jack   34      222
2  Steve   29      333
3   asda   42      444

print(df['Roll_no'])

Output:
0    111
1    222
2    333
3    444
Name: Roll_no, dtype: int64

Exercise:

How do we select multiple columns in a pandas DataFrame?

print(df[['Roll_no', 'Age']])
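
A self-contained version of this exercise might look like the sketch below, which reuses a shortened form of the student data. It also shows the difference in result type: a single column name gives a Series, while a list of names gives a DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Tom", "Jack"],
    "Age": [2, 34],
    "Roll_no": [111, 222],
})

one = df["Roll_no"]             # single name -> Series
many = df[["Roll_no", "Age"]]   # list of names -> DataFrame, in the order given

print(type(one).__name__, type(many).__name__)
print(many)
```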

Column Addition

pandas : DataFrame-> column addition
We will understand this by adding two columns of the DataFrame element-wise.

import pandas as pd
data = { 'Name':['Tom', 'Jack', 'Steve','asda'],
'Age':[2,34,29,42],
'Roll_no':[111,222,333,444],
'R1':[1,2,3,4]
}
df = pd.DataFrame(data)
print(df)

Output:
    Name  Age  Roll_no  R1
0    Tom    2      111   1
1   Jack   34      222   2
2  Steve   29      333   3
3   asda   42      444   4

print(df["Roll_no"]+df["R1"])

Output:
0    112
1    224
2    336
3    448
dtype: int64
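
The sum above is only printed. To keep it as a new column of the DataFrame, it can be assigned to a new column name, as in this sketch; the name Total is an assumption made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Tom", "Jack"],
    "Roll_no": [111, 222],
    "R1": [1, 2],
})

# Assigning to a name that does not exist yet appends a new column
df["Total"] = df["Roll_no"] + df["R1"]

print(df)
```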

Column Deletion

pandas : DataFrame-> column deletion
We will understand this by deleting a column from the DataFrame.

import pandas as pd
data = { 'Name':['Tom', 'Jack', 'Steve','asda'],
'Age':[2,34,29,42],
'Roll_no':[111,222,333,444],
'R1':[1,2,3,4]
}
df = pd.DataFrame(data)
print(df)

Output:
    Name  Age  Roll_no  R1
0    Tom    2      111   1
1   Jack   34      222   2
2  Steve   29      333   3
3   asda   42      444   4

df.pop('R1')
print(df)

Output:
    Name  Age  Roll_no
0    Tom    2      111
1   Jack   34      222
2  Steve   29      333
3   asda   42      444
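
Two details of column deletion are sketched below on a shortened copy of the data: pop removes the column in place and returns it as a Series, while drop (shown here as an alternative, not used in the slides) returns a new DataFrame and leaves the original untouched.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Tom", "Jack"],
    "Age": [2, 34],
    "R1": [1, 2],
})

removed = df.pop("R1")               # df loses R1; the column is returned
smaller = df.drop(columns=["Age"])   # new DataFrame; df still has Age

print(removed.tolist())
print(df.columns.tolist(), smaller.columns.tolist())
```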

Row Selection

pandas : DataFrame-> Row Selection
import pandas as pd
data={'one':[1,2,3],'two':[6,7,8]}
df=pd.DataFrame(data)
print(df)

Output:
   one  two
0    1    6
1    2    7
2    3    8

Rows can be selected by passing a row label to loc.

print(df.loc[0])

Output:
one    1
two    6
Name: 0, dtype: int64

print(df.loc[0].values)

Output:
[1 6]

Row Selection by integer location

pandas : DataFrame-> Row Selection
import pandas as pd
data={'one':[1,2,3],'two':[6,7,8]}
df=pd.DataFrame(data,index=["A1","B1","C1"])
print(df)

Output:
    one  two
A1    1    6
B1    2    7
C1    3    8

print(df.loc["A1"])

Output:
one    1
two    6
Name: A1, dtype: int64

Rows can also be selected by passing an integer location to iloc.

print(df.iloc[0])

Output:
one    1
two    6
Name: A1, dtype: int64
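
The difference between the two selectors becomes visible once the row labels are not integers. In this sketch, which reuses the DataFrame above, loc accepts only labels, so asking it for 1 raises a KeyError, while iloc accepts only positions.

```python
import pandas as pd

df = pd.DataFrame({"one": [1, 2, 3], "two": [6, 7, 8]},
                  index=["A1", "B1", "C1"])

by_label = df.loc["B1"]      # select by row label
by_position = df.iloc[1]     # select by integer position (second row)

# 1 is not a label in this index, so loc cannot find it
try:
    df.loc[1]
    label_error = None
except KeyError as exc:
    label_error = exc

print(by_label.tolist(), by_position.tolist(), type(label_error).__name__)
```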

Multiple Row Selection

pandas : DataFrame-> Multiple Row Selection
import pandas as pd
data={'one':[1,2,3],'two':[6,7,8]}
df=pd.DataFrame(data,index=["A1","B1","C1"])
print(df)

Output:
    one  two
A1    1    6
B1    2    7
C1    3    8

Multiple rows can be selected using the ':' (slice) operator.

print(df.iloc[0:2])

Output:
    one  two
A1    1    6
B1    2    7
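
One point worth checking when slicing is where the slice ends. As a sketch on the same data: an iloc slice stops before the end position, like a Python list, whereas a loc slice by label includes the end label.

```python
import pandas as pd

df = pd.DataFrame({"one": [1, 2, 3], "two": [6, 7, 8]},
                  index=["A1", "B1", "C1"])

by_position = df.iloc[0:2]      # positions 0 and 1; position 2 is excluded
by_label = df.loc["A1":"B1"]    # labels A1 through B1; B1 is included

print(by_position.index.tolist(), by_label.index.tolist())
```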

rank() function

● The rank() function is used to compute numerical data ranks (1 through n) along an axis.

● By default, equal values are assigned a rank that is the average of the ranks of those values.
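
The averaging rule for equal values can be seen in a small sketch. The percentages below are invented so that 70 appears twice; method="min" is shown only as a contrast to the default.

```python
import pandas as pd

per = pd.Series([70, 65, 80, 70])

# Default: the two 70s would occupy ranks 2 and 3, so each gets (2 + 3) / 2
average_ranks = per.rank().tolist()

# method="min" gives tied values the lowest rank of the group instead
min_ranks = per.rank(method="min").tolist()

print(average_ranks)
print(min_ranks)
```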

pandas : DataFrame-> rank function
import pandas as pd
data={'name':["X","Y","Z"],'per':[70,65,80]}
df=pd.DataFrame(data,index=["A1","B1","C1"])
print(df)

Output:
    name  per
A1     X   70
B1     Y   65
C1     Z   80

df["rank"]=df['per'].rank()
print(df)

Output:
    name  per  rank
A1     X   70   2.0
B1     Y   65   1.0
C1     Z   80   3.0

df["rank"]=df['per'].rank(ascending=False)
print(df)

Output:
    name  per  rank
A1     X   70   2.0
B1     Y   65   3.0
C1     Z   80   1.0
Sorting

sort_index()
import pandas as pd
g=pd.DataFrame(data=[4,5,7,2],index=["Z","M","L","A"],columns=["Roll"])
print(g)

Output:
   Roll
Z     4
M     5
L     7
A     2

print(g.sort_index())

Output:
   Roll
A     2
L     7
M     5
Z     4
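
Note that sort_index() returns a sorted copy by default. The sketch below, using the same data, suggests that the original DataFrame keeps its order unless the result is assigned back.

```python
import pandas as pd

g = pd.DataFrame(data=[4, 5, 7, 2], index=["Z", "M", "L", "A"],
                 columns=["Roll"])

sorted_g = g.sort_index()   # new DataFrame ordered by row label

print(sorted_g.index.tolist())   # sorted labels
print(g.index.tolist())          # original order is unchanged
```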

sort_values()
import pandas as pd
g=pd.DataFrame( data=[ [12,45,56,78],[14,21,33,53],[1,2,3,4],[9,7,5,2] ],
index=["Z","M","L","A"],columns=["R1","R2","R3","R4"])
print(g)

Output:
   R1  R2  R3  R4
Z  12  45  56  78
M  14  21  33  53
L   1   2   3   4
A   9   7   5   2

print(g.sort_values(by=["R1"]))

Output:
   R1  R2  R3  R4
L   1   2   3   4
A   9   7   5   2
Z  12  45  56  78
M  14  21  33  53
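
As a further sketch on the same data, sort_values() also accepts ascending=False for descending order, and the by argument can name any column.

```python
import pandas as pd

g = pd.DataFrame(
    data=[[12, 45, 56, 78], [14, 21, 33, 53], [1, 2, 3, 4], [9, 7, 5, 2]],
    index=["Z", "M", "L", "A"],
    columns=["R1", "R2", "R3", "R4"],
)

descending = g.sort_values(by="R1", ascending=False)   # largest R1 first
by_r4 = g.sort_values(by=["R4"])                        # smallest R4 first

print(descending.index.tolist())
print(by_r4.index.tolist())
```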

Thank You!
