Introduction to Python Pandas Library

Uploaded by

Vineet Pal

The Unique Computers

Neelmatha Lucknow
Python Pandas
What is Pandas?
Pandas is a Python library designed for working with "relational" or
"labeled" data. It aims to be a practical tool for real-world data analysis
in Python, and its flexibility makes it useful for data manipulation, data
exploration, data analysis, and preparing data for machine learning.

Pandas is built around two data structures, Series and DataFrame. A Series
is a one-dimensional labeled array that can hold data of any type, such as
integers, strings, and Python objects. A DataFrame is a two-dimensional data
structure that stores data in tabular form, using rows and columns.

Why Pandas?
Pandas simplifies many of the time-consuming, repetitive tasks involved in
working with data frames, such as:

• Importing datasets - from spreadsheets, comma-separated values (CSV)
files, and more.

• Data cleansing - dealing with missing values, which are represented as
NaN, NA, or NaT.

• Size mutability - columns can be added to and removed from a DataFrame.

• Data normalization - converting the data into a format suitable for
analysis.

• Data alignment - objects can be explicitly aligned to a set of labels.

• Merging and joining - datasets can be merged and joined.

• Reshaping and pivoting - datasets can be reshaped and pivoted as needed.

• Efficient manipulation and extraction - specific parts of large datasets
can be selected using label-based slicing, indexing, and subsetting.

• Statistical analysis - statistical operations can be performed on datasets.

• Data visualization - datasets can be plotted to uncover insights.

Applications of Pandas
The most common applications of Pandas are as follows:

• Data Cleaning: Pandas provides functions to clean messy data, deal with
incomplete or inconsistent data, handle missing values, remove duplicates,
and standardize formats for effective data analysis.

• Data Exploration: Pandas makes it easy to summarize statistics, find
trends, and visualize data using built-in plotting functions or integration
with Matplotlib and Seaborn.

• Data Preparation: Pandas can pivot, melt, convert variables, and merge
datasets on common columns to prepare data for analysis.

• Data Analysis: Pandas supports descriptive statistics, time series
analysis, group-by operations, and custom functions.

• Data Visualisation: Pandas has basic plotting capabilities of its own and
works with visualization libraries such as Matplotlib, Seaborn, and Plotly.

• Time Series Analysis: Pandas supports date/time indexing, resampling,
frequency conversion, and rolling statistics for time series data.

• Data Aggregation and Grouping: The Pandas groupby() function lets you
aggregate data, compute group-wise summary statistics, or apply functions
to groups.

• Data Input/Output: Pandas makes data import and export easy by reading
and writing CSV, Excel, JSON, SQL databases, and more.

• Machine Learning: Pandas works well with Scikit-learn for data
preparation, feature engineering, and model input data.

• Financial Analysis: Pandas is commonly used in finance for stock market
data analysis, financial indicator calculation, and portfolio optimization.

• Text Data Analysis: Pandas' string manipulation, regular expression, and
text mining functions help analyse textual data.

• Experimental Data Analysis: Pandas makes it easy to manipulate and
analyse large datasets, perform statistical tests, and visualize results.

Python Pandas Data Structures

Data structures in Pandas are designed to handle data efficiently. They allow
for the organization, storage, and modification of data in a way that
optimizes memory usage and computational performance. The Python Pandas
library provides two primary data structures for handling and analyzing data:

• Series

• DataFrame

Dimension and Description of Pandas Data Structures

Data Structure   Dimensions   Description
Series           1            A one-dimensional labeled homogeneous array,
                              size-immutable.
DataFrame        2            A two-dimensional labeled, size-mutable tabular
                              structure with potentially heterogeneously
                              typed columns.

Series

A Series is a one-dimensional labeled array that can hold any data type. It
can store integers, strings, floating-point numbers, etc. Each value in a
Series is associated with a label (index), which can be an integer or a string.

Name      Steve
Age       35
Gender    Male
Rating    3.5

Example

Consider the following Series, which stores a name, an age, a gender, and a
rating, each written as a string:

import pandas as pd
data = ['Steve', '35', 'Male', '3.5']
series = pd.Series(data, index=['Name', 'Age', 'Gender', 'Rating'])
print(series)

On executing the above program, you will get the following output:

Name      Steve
Age          35
Gender     Male
Rating      3.5
dtype: object

Key Points

Following are the key points related to the Pandas Series.

• Homogeneous data

• Size Immutable

• Values of Data Mutable

DataFrame

A DataFrame is a two-dimensional labeled data structure with columns that
can hold different data types. It is similar to a table in a database or a
spreadsheet. Consider the following data representing the performance
rating of a sales team:

Name    Age   Gender   Rating
Steve   32    Male     3.45
Lia     28    Female   4.6
Vin     45    Male     3.9
Katie   38    Female   2.78

Example

The above tabular data can be represented in a DataFrame as follows −


import pandas as pd

# Data represented as a dictionary
data = {
    'Name': ['Steve', 'Lia', 'Vin', 'Katie'],
    'Age': [32, 28, 45, 38],
    'Gender': ['Male', 'Female', 'Male', 'Female'],
    'Rating': [3.45, 4.6, 3.9, 2.78]
}

# Creating the DataFrame
df = pd.DataFrame(data)
print(df)

Output

On executing the above code you will get the following output:

    Name  Age  Gender  Rating
0  Steve   32    Male    3.45
1    Lia   28  Female    4.60
2    Vin   45    Male    3.90
3  Katie   38  Female    2.78

Key Points

Following are the key points related to the Pandas DataFrame:

• Heterogeneous data

• Size Mutable

• Data Mutable

Creation of Data Frames

Creating a New DataFrame

import pandas as pd
data={"name":["rahul","neha","amit"],
"age":[12,15,27],
"Salary":[1200,1500,1200]
}
df=pd.DataFrame(data)
print(df)

Reading CSV File

import pandas as pd
data=pd.read_csv("book.csv")
print(data)

Reading an Excel File

import pandas as pd
data=pd.read_excel("book1.xlsx")
print(data)
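
The reading examples above need book.csv or book1.xlsx on disk. As a minimal sketch that runs without any file, the same read_csv call can be pointed at in-memory text. The CSV content here is hypothetical and simply mirrors the small DataFrame created earlier.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file such as "book.csv"
csv_text = """name,age,Salary
rahul,12,1200
neha,15,1500
amit,27,1200
"""

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))
print(df)

# to_csv with no path returns the CSV as a string;
# index=False leaves out the row labels
round_trip = df.to_csv(index=False)
print(round_trip)
```

Passing a file name to to_csv instead (for example df.to_csv("out.csv", index=False), where out.csv is a made-up name) writes the same text to disk.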

Exploring Data in Pandas

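Pandas provides several methods for a first look at a DataFrame:

head() - returns the first rows (five by default)

tail() - returns the last rows (five by default)

info() - prints column names, dtypes, and non-null counts

describe() - returns summary statistics for numeric columns

isnull() - returns True wherever a value is missing

isnull().sum() - counts the missing values in each column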
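
The methods listed above can be tried on a small DataFrame. This is only a sketch: the column names and values are made up for illustration, and one Salary value is left missing on purpose so that isnull() has something to report.

```python
import numpy as np
import pandas as pd

# Hypothetical data; one Salary value is missing on purpose
df = pd.DataFrame({
    "name": ["rahul", "neha", "amit", "priya"],
    "age": [12, 15, 27, 31],
    "Salary": [1200.0, 1500.0, np.nan, 1800.0],
})

print(df.head(2))        # first two rows
print(df.tail(2))        # last two rows
df.info()                # prints dtypes and non-null counts directly
print(df.describe())     # summary statistics for the numeric columns
print(df.isnull())       # True where a value is missing

missing_per_column = df.isnull().sum()
print(missing_per_column)  # number of missing values in each column
```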

Dealing With Duplicate Values

data.duplicated()
To mark rows that repeat an earlier row

import pandas as pd
data = pd.read_excel("salary.xlsx")
print(data.duplicated())

data["Emp_ID"].duplicated()
To mark repeated values in one column

import pandas as pd
data = pd.read_excel("salary.xlsx")
print(data["Emp_ID"].duplicated())

data["Emp_ID"].duplicated().sum()
To count repeated values in one column

import pandas as pd
data = pd.read_excel("salary.xlsx")
print(data["Emp_ID"].duplicated().sum())

data.drop_duplicates("Emp_ID")
To drop rows whose Emp_ID has already appeared

import pandas as pd
data = pd.read_excel("salary.xlsx")
print(data.drop_duplicates("Emp_ID"))
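
The duplicate-handling calls above depend on salary.xlsx. A self-contained sketch with a made-up employee table shows what they return. The keep argument of drop_duplicates is not used in the notes above, but it is a standard option that controls which row of each duplicate group survives.

```python
import pandas as pd

# Hypothetical employee table; Emp_ID "E02" appears twice
data = pd.DataFrame({
    "Emp_ID": ["E01", "E02", "E02", "E03"],
    "Name": ["Amit", "Neha", "Neha", "Karan"],
})

# True only for a value that has already appeared earlier in the column
dup_mask = data["Emp_ID"].duplicated()
print(dup_mask)

dup_count = dup_mask.sum()
print(dup_count)

# keep="first" (the default) keeps the first row of each duplicate group,
# keep="last" keeps the last one, keep=False drops all of them
deduped = data.drop_duplicates(subset="Emp_ID", keep="first")
print(deduped)
```

Note that drop_duplicates returns a new DataFrame; the original data is left unchanged unless the result is assigned back.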

Working with missing values

data.isnull()
To print null values
import pandas as pd
data = pd.read_excel("salary.xlsx")
print(data.isnull())

data.isnull().sum()
To count null values
import pandas as pd
data = pd.read_excel("salary.xlsx")
print(data.isnull().sum())

data.dropna()
To delete rows that contain null values
import pandas as pd
data = pd.read_excel("salary.xlsx")
print(data)
print("\n\n\n")
print(data.dropna())

data.replace(np.nan, "hii")
To replace NaN everywhere in the DataFrame
import numpy as np
import pandas as pd
data = pd.read_excel("salary.xlsx")
print(data)
# replace() returns a new DataFrame, so print or assign the result
print(data.replace(np.nan, "hii"))

data["Salary"] = data["Salary"].replace(np.nan, 30000)
To replace NaN in a single column
import pandas as pd
import numpy as np
data = pd.read_excel("salary.xlsx")
data["Salary"] = data["Salary"].replace(np.nan, 30000)
print(data)

data["Salary"].mean()
To compute the mean of a column (NaN values are skipped)
import pandas as pd
data = pd.read_excel("salary.xlsx")
print(data["Salary"].mean())

data.bfill()
To fill each NaN with the next valid value below it
import pandas as pd
data = pd.read_excel("salary.xlsx")
print(data)
print("\n\n\n")
# fillna(method="bfill") is deprecated in recent Pandas; use bfill()
print(data.bfill())

data.ffill()
To fill each NaN with the last valid value above it
import pandas as pd
data = pd.read_excel("salary.xlsx")
print(data)
print("\n\n\n")
# fillna(method="ffill") is deprecated in recent Pandas; use ffill()
print(data.ffill())
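
To see how the missing-value options differ, the sketch below applies them to the same made-up Salary column. The names and numbers are hypothetical. Filling with the column mean is one common choice, shown here only as an illustration of combining mean() with fillna().

```python
import numpy as np
import pandas as pd

# Hypothetical data with two missing salaries
data = pd.DataFrame({
    "Name": ["Amit", "Neha", "Karan", "Priya"],
    "Salary": [45000.0, np.nan, 30000.0, np.nan],
})

dropped = data.dropna()              # rows containing NaN are removed
print(dropped)

# mean() skips NaN, so the mean here is (45000 + 30000) / 2
mean_filled = data["Salary"].fillna(data["Salary"].mean())
print(mean_filled)

forward = data["Salary"].ffill()     # copy the last valid value downward
print(forward)

backward = data["Salary"].bfill()    # copy the next valid value upward
print(backward)                      # the last NaN has nothing below it
```

A backward fill cannot fill a NaN in the last row, and a forward fill cannot fill one in the first row, because there is no neighbouring value to copy.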
Column transformation in Pandas

To create a new column

import pandas as pd
data = pd.read_excel("salary.xlsx")
print(data, "\n\n")
# Label each row according to whether a bonus was paid
data.loc[data["Bonus"] == 0, "GetBonus"] = "No Bonus"
data.loc[data["Bonus"] > 0, "GetBonus"] = "Bonus"
print(data)

To merge two columns

import pandas as pd
data = pd.read_excel("salary.xlsx")
print(data, "\n\n")
data["Full name"] = data["Name"] + " " + data["Last Name"]
print(data)

To add a calculated column

import pandas as pd
data = pd.read_excel("salary.xlsx")
print(data, "\n\n")
# Bonus is 20% of Salary
data["Bonus"] = (data["Salary"] / 100) * 20
print(data)

To extract the first letters of each value in a column

import pandas as pd
data = {"Month": ["January", "February", "March", "April"]}
a = pd.DataFrame(data)
print(a)

def extract(value):
    # Return the first three characters of the string
    return value[0:3]

a["Short_Months"] = a["Month"].map(extract)
print(a)
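
The same column transformations can be written without an Excel file and without a helper function. In this sketch the data is made up, np.where stands in for the two .loc assignments, and the .str accessor stands in for map(extract). Both are alternative ways to get the same result, not the method used in the notes above.

```python
import numpy as np
import pandas as pd

# Hypothetical data using the same column names as the examples above
data = pd.DataFrame({
    "Name": ["Amit", "Neha"],
    "Last Name": ["Verma", "Singh"],
    "Salary": [45000, 30000],
    "Bonus": [9000, 0],
    "Month": ["January", "February"],
})

# np.where(condition, value_if_true, value_if_false) labels every row at once
data["GetBonus"] = np.where(data["Bonus"] > 0, "Bonus", "No Bonus")

# String columns can be joined with +
data["Full name"] = data["Name"] + " " + data["Last Name"]

# .str[:3] slices each string, like the extract() function above
data["Short_Months"] = data["Month"].str[:3]

print(data)
```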

GroupBy In Pandas
Count of Gender by Department

import pandas as pd
data=pd.read_excel("Salary.xlsx")
print(data)
gp=data.groupby("Department").agg({"Gender":"count"})
print(gp)
Count of Emp_ID by Job Title

import pandas as pd
data=pd.read_excel("Salary.xlsx")
print(data)
gp=data.groupby("Job Title").agg({"Emp_ID":"count"})
print(gp)

Count of Emp_ID by Department and Gender

import pandas as pd
data=pd.read_excel("Salary.xlsx")
print(data)
gp=data.groupby(["Department","Gender"]).agg({"Emp_ID":"count"})
print(gp)
Maximum Age by Country
import pandas as pd
data=pd.read_excel("Salary.xlsx")
print(data)
print("\n\n\n")
a=data.groupby("Countries").agg({"Age":"max"})
print(a)
Maximum Age by Country and Gender
import pandas as pd
data=pd.read_excel("Salary.xlsx")
print(data)
print("\n\n\n")
a=data.groupby(["Countries","Gender"]).agg({"Age":"max"})
print(a)
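
The group-by examples above each compute one statistic from Salary.xlsx. As a sketch on made-up data, several statistics can be computed in a single agg() call using named aggregation, where each keyword becomes an output column. The output column names employees, oldest, and mean_age are chosen here for illustration.

```python
import pandas as pd

# Hypothetical employee data using the column names from the examples above
data = pd.DataFrame({
    "Emp_ID": ["E01", "E02", "E03", "E04", "E05"],
    "Department": ["IT", "IT", "HR", "HR", "HR"],
    "Gender": ["Male", "Female", "Female", "Male", "Female"],
    "Age": [34, 28, 45, 38, 26],
})

# Each keyword is new_column=(source_column, aggregation)
summary = data.groupby("Department").agg(
    employees=("Emp_ID", "count"),
    oldest=("Age", "max"),
    mean_age=("Age", "mean"),
)
print(summary)

# size() counts the rows in each group; reset_index turns the
# group labels back into ordinary columns
by_dept_gender = (
    data.groupby(["Department", "Gender"]).size().reset_index(name="count")
)
print(by_dept_gender)
```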

Merge Join and Concatenate in Pandas

Merge
On the basis of EEID
import pandas as pd
data1={"EEID":["A01","A02","A03","A04","A05","A06"],
"Name":["Amit","priya","Neha","Lovely","Karab","Mohit"],
"Age":[34,56,24,27,28,26]}
print(data1)
data2={"EEID":["A01","A02","A03","A04","A05","A06"],
"Salary":[45000,47000,30000,14200,42300,456600]}
print(data2)
print("\n\n\n")
df1=pd.DataFrame(data1)
df2=pd.DataFrame(data2)
print(df1)
print()
print(df2)
print()
print(pd.merge(df1,df2,on="EEID"))
Use of the how parameter
P1: inner join (keeps only the EEIDs found in both DataFrames)
import pandas as pd
data1={"EEID":["A01","A02","A03","A04","A05","A06"],
"Name":["Amit","priya","Neha","Lovely","Karab","Mohit"],
"Age":[34,56,24,27,28,26]}
print(data1)
data2={"EEID":["A01","A02","A03","A04","A05","A06"],
"Salary":[45000,47000,30000,14200,42300,456600]}
print(data2)
print("\n\n\n")
df1=pd.DataFrame(data1)
df2=pd.DataFrame(data2)
print(df1)
print()
print(df2)
print()
print(pd.merge(df1,df2,on="EEID", how="inner"))
P2: left join (reuses df1 and df2 from P1; keeps every row of df1)
print(pd.merge(df1,df2,on="EEID", how="left"))
P3: right join (reuses df1 and df2 from P1; keeps every row of df2)
print(pd.merge(df1,df2,on="EEID", how="right"))
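
In the programs above both DataFrames contain exactly the same EEIDs, so inner, left, and right all return the same rows. The sketch below uses made-up frames whose keys only partly overlap, which makes the difference visible. It also shows how="outer" with indicator=True, which the notes above do not cover but which is a standard merge option.

```python
import pandas as pd

# Hypothetical frames: A01 is only in df1, A04 is only in df2
df1 = pd.DataFrame({"EEID": ["A01", "A02", "A03"],
                    "Name": ["Amit", "Priya", "Neha"]})
df2 = pd.DataFrame({"EEID": ["A02", "A03", "A04"],
                    "Salary": [47000, 30000, 14200]})

inner = pd.merge(df1, df2, on="EEID", how="inner")  # keys in both frames
left = pd.merge(df1, df2, on="EEID", how="left")    # every key of df1
right = pd.merge(df1, df2, on="EEID", how="right")  # every key of df2

# outer keeps every key from either frame; indicator=True adds a
# _merge column saying where each row came from
outer = pd.merge(df1, df2, on="EEID", how="outer", indicator=True)

print(inner)
print(left)    # Salary is NaN for A01
print(right)   # Name is NaN for A04
print(outer)
```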

Concatenate
import pandas as pd
data1={"EEID":["A01","A02","A03","A04","A05","A06"],
"Name":["Amit","priya","Neha","Lovely","Karan","Mohit"]}
data2={"EEID":["A07","A08","A09","A010","A11","A12"],
"Name":["Atin","Pankaj","Alia","Suman","Sanjay","Karan"]}
df1=pd.DataFrame(data1)
df2=pd.DataFrame(data2)
print(df1)
print(df2)
print()
ndf=pd.concat([df1,df2])
print(ndf)
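
One detail of the concatenation above is that each DataFrame keeps its own row labels, so the combined index repeats. A short sketch with made-up two-row frames shows this, along with the ignore_index option that renumbers the rows.

```python
import pandas as pd

# Hypothetical two-row frames with the same columns
df1 = pd.DataFrame({"EEID": ["A01", "A02"], "Name": ["Amit", "Priya"]})
df2 = pd.DataFrame({"EEID": ["A07", "A08"], "Name": ["Atin", "Pankaj"]})

# Default: the original row labels are kept, so 0 and 1 appear twice
stacked = pd.concat([df1, df2])
print(stacked)

# ignore_index=True discards the old labels and numbers the rows 0..n-1
renumbered = pd.concat([df1, df2], ignore_index=True)
print(renumbered)
```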

Join
import pandas as pd
data1={"EEI":["A01","A02","A03","A04","A05","A06"],
"Name":["Amit","priya","Neha","Lovely","Karab","Mohit"]}
print(data1)
data2={"EEID":["A09","A02","A03","A010","A05","A06"],
"Salary":[45000,47000,30000,14200,42300,456600]}
print(data2)
print("\n\n\n")
df1=pd.DataFrame(data1)
df2=pd.DataFrame(data2)
print(df1)
print()
print(df2)
print()
# join() matches rows by index (0 to 5 here), not by the EEI/EEID columns
print(df1.join(df2))
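
Because join() aligns on the index, joining on an ID column requires making that column the index first. The following is a sketch under that assumption, using made-up frames that both name the key column EEID.

```python
import pandas as pd

# Hypothetical frames; A01 has no salary record and A09 has no name
df1 = pd.DataFrame({"EEID": ["A01", "A02", "A03"],
                    "Name": ["Amit", "Priya", "Neha"]})
df2 = pd.DataFrame({"EEID": ["A09", "A02", "A03"],
                    "Salary": [45000, 47000, 30000]})

# set_index moves EEID into the index, so join() matches rows by EEID
# how="left" (the default) keeps every row of the left frame
joined = df1.set_index("EEID").join(df2.set_index("EEID"), how="left")
print(joined)
```

If both frames kept EEID as an ordinary column, df1.join(df2) would raise an error about overlapping columns unless lsuffix or rsuffix were given.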
