05 Pandas

Pandas is a Python library designed for data analysis, providing flexible data structures like Series and DataFrame for easy manipulation of labeled data. It facilitates data wrangling, exploration, cleaning, and analysis through various functions and methods, including handling missing values and feature engineering techniques. The library also supports data visualization and statistical analysis, making it a fundamental tool for practical data analysis in Python.

Uploaded by

Rochit Limje


Pandas – Python’s Panel Data Analysis Library

• pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive.
• It aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python.
• It is very useful in data wrangling: the process of gathering, collecting, and transforming raw data into another format so that it can be understood, accessed, and analyzed in less time and used for decision-making. Data wrangling is also known as data munging.
• Series and DataFrame are the two data structures available to make data processing intuitive.
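
As a minimal sketch of the two structures named above (the student names and scores are made up for illustration), a Series is a one-dimensional labeled array and a DataFrame is a table whose columns are Series sharing one index:

```python
import pandas as pd

# A Series is a one-dimensional labeled array.
scores = pd.Series([72, 65, 90], index=["asha", "ben", "chen"], name="math_score")

# A DataFrame is a two-dimensional table; each column is a Series on a shared index.
students = pd.DataFrame(
    {
        "gender": ["female", "male", "female"],
        "math_score": [72, 65, 90],
    },
    index=["asha", "ben", "chen"],
)

print(scores["ben"])                        # label-based access on a Series
print(students.loc["chen", "math_score"])   # row label, then column label
print(type(students["math_score"]))         # selecting one column yields a Series
```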

Pandas as a data exploration tool
• Import the package
import pandas as pd
• Read file
df = pd.read_csv('C:\\Rekha\\InputFiles\\DAUP\\Student.csv')
• Explore/Analyze data
• df.size – Total number of values (rows × columns)
• df.shape – Number of rows and columns
• df.describe() – Descriptive statistics for the numeric columns
• df.info() – Displays the column names and their data types
• df.columns – Access column names
• df['gender'], df.gender – Access a column by name; df[0:3] – slice rows by position
• df["Team"].value_counts() – Frequency distribution
• df.Team.unique() – All unique values for the column
• Selection of data by applying conditions
• Using loc, iloc, query
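
The exploration calls above can be tried without the original file. The sketch below reads a small in-memory CSV instead of Student.csv; its rows and column names are hypothetical, since the slides do not show the real file's contents.

```python
import io
import pandas as pd

# Hypothetical stand-in for Student.csv (the real columns are not shown in the slides).
csv_text = """gender,group,Team,math_score
female,group A,Red,72
male,group B,Blue,65
female,group A,Red,90
male,group C,Blue,58
"""
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)                    # (number of rows, number of columns)
print(df.size)                     # rows * columns
print(df.columns.tolist())         # column names
df.info()                          # prints column names, non-null counts and dtypes
print(df.describe())               # summary statistics for numeric columns
print(df["Team"].value_counts())   # frequency of each team
print(df["Team"].unique())         # distinct team values
```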

• Cleanup the data
• Rename columns
• Replace values in the dataframe
• Drop columns
• Remove duplicates
• Address null values : Drop
• Prepare the Data
• Add new columns if necessary
• Format the Date Column
• Apply selection, Filter to analyze a subset of data
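
A minimal sketch of the preparation steps listed above, assuming a hypothetical 'joined' date column stored as text (the column names and values are illustrative, not from the original data set):

```python
import pandas as pd

# Hypothetical data; 'joined' and 'math_score' are illustrative column names.
df = pd.DataFrame(
    {
        "name": ["asha", "ben", "chen"],
        "joined": ["2023-01-15", "2023-06-30", "2024-02-01"],
        "math_score": [72, 65, 90],
    }
)

# Format the date column: parse the strings into datetime values.
df["joined"] = pd.to_datetime(df["joined"], format="%Y-%m-%d")

# Add a new column derived from an existing one.
df["join_year"] = df["joined"].dt.year

# Apply a filter to analyze a subset: students who joined in 2023.
subset = df[df["join_year"] == 2023]
print(subset)
```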

• Analyze the data
• Sorting data
• Aggregation using group by, pivot table, crosstab – sum, count, mean, max
• Visualization

Functions that we will be working on:
• shape, size, index, columns
• head, tail, info, describe
• Access columns: df['gender'], df.gender, or df[['gender', 'group']] (a list of columns)
• Access rows: df[:3]['gender'], df[0:3][['gender', 'group']]
• df.loc[2:5, ['gender', 'group']] (using labels; the end label is included)
• df.iloc[3:5, 0:3] (using integer positions; the end position is excluded)
• Form filters and query
• df[df.group.isin(['group A', 'group B'])]
• df.query('group == "group A" and math_score < 60')[:3]
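
To make the difference between these selectors concrete, here is a small sketch on made-up data (the rows are hypothetical; only the column names follow the slides):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "gender": ["female", "male", "female", "male", "female", "male"],
        "group": ["group A", "group B", "group A", "group C", "group B", "group A"],
        "math_score": [72, 65, 55, 58, 80, 40],
    }
)

by_label = df.loc[2:5, ["gender", "group"]]   # labels 2..5, end label included
by_position = df.iloc[3:5, 0:3]               # positions 3 and 4, end excluded
in_groups = df[df["group"].isin(["group A", "group B"])]
low_a = df.query('group == "group A" and math_score < 60')

print(len(by_label), len(by_position), len(in_groups), len(low_a))
```

Note that loc and iloc give different row counts for similar-looking slices because only loc includes the end of the range.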

• Renaming: df.rename(columns={'gender': 'Gender', 'group': 'Group'}, inplace=True)
• Replace values: df.replace({'gender': {'female': 'F', 'male': 'M'}}, inplace=True)
• Handling null values: df.isna().sum(axis=1) counts nulls per row, df.isna().sum(axis=0) counts nulls per column
• Drop rows or columns with nulls: df.dropna(axis=0) drops rows, df.dropna(axis=1) drops columns
• Drop duplicates: df.drop_duplicates(inplace=True)
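
The cleanup calls can also be chained without inplace=True, which returns a new DataFrame at each step. This is a sketch on hypothetical rows, not the author's data:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "gender": ["female", "male", "female", "male", "male"],
        "group": ["group A", "group B", "group A", None, "group B"],
        "math_score": [72.0, 65.0, 72.0, 58.0, 65.0],
    }
)

# Count missing values per column before cleaning.
nulls_per_column = df.isna().sum(axis=0)

cleaned = (
    df.rename(columns={"gender": "Gender", "group": "Group"})
      .replace({"Gender": {"female": "F", "male": "M"}})  # keyed on the renamed column
      .dropna(axis=0)          # drop rows containing any null
      .drop_duplicates()       # keep the first of each set of identical rows
      .reset_index(drop=True)
)
print(nulls_per_column)
print(cleaned)
```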

• df.nunique() – To obtain the count of unique values for each column
• df.group.value_counts() – To obtain the count for each of the unique values in the column 'group'
• Fill null values: df1['math_score'] = df1['math_score'].fillna(df1['math_score'].median())
• Grouping data:
df.groupby(by='gender').mean(numeric_only=True)
df.groupby(by=['group', 'gender']).sum(numeric_only=True)
df.groupby(['gender', 'group']).agg({'total': ['min', 'max', 'mean', 'std'], 'math_score': ['mean']})
• Adding new column (the 'total' column used in the aggregation above):
df['total'] = df.sum(axis=1, numeric_only=True)
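
A runnable sketch of the fill, total, and grouping steps on hypothetical scores. It assumes a recent pandas version, where aggregations such as mean need numeric_only=True when the frame also holds text columns:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "gender": ["female", "male", "female", "male"],
        "group": ["group A", "group A", "group B", "group B"],
        "math_score": [70.0, np.nan, 90.0, 50.0],
        "reading_score": [80.0, 60.0, 70.0, 40.0],
    }
)

print(df.nunique())  # distinct values per column

# Fill the missing math score with the column median (median of 70, 90, 50 is 70).
df["math_score"] = df["math_score"].fillna(df["math_score"].median())

# Add the total first so that the aggregation below can refer to it.
df["total"] = df.sum(axis=1, numeric_only=True)

mean_by_gender = df.groupby("gender").mean(numeric_only=True)
summary = df.groupby(["gender", "group"]).agg(
    {"total": ["min", "max", "mean"], "math_score": ["mean"]}
)
print(mean_by_gender)
print(summary)
```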

• df.sort_values(by=['math_score'], na_position='first', ascending=False)[:5]
• df.pivot_table(index=['group', 'gender'])
• pd.crosstab(df.gender, df.group, margins=True)
• pd.cut(df.Age, bins=bins, labels=bins[1:])
• df.corr(numeric_only=True)
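
The calls above, sketched on made-up data. The bin edges are an assumption (the slides do not define bins), and the pivot table names its values column explicitly so that only a numeric column is averaged:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "gender": ["female", "male", "female", "male"],
        "group": ["group A", "group A", "group B", "group B"],
        "Age": [15, 22, 31, 45],
        "math_score": [70, 60, 90, 50],
    }
)

# Two highest math scores.
top = df.sort_values(by=["math_score"], ascending=False)[:2]

# Mean math score for each (group, gender) pair.
pivot = df.pivot_table(index=["group", "gender"], values="math_score", aggfunc="mean")

# Counts of gender by group, with row and column totals labeled 'All'.
counts = pd.crosstab(df["gender"], df["group"], margins=True)

# Bin ages; each bin is labeled by its upper edge, as in labels=bins[1:].
bins = [0, 18, 30, 60]  # hypothetical bin edges
df["age_band"] = pd.cut(df["Age"], bins=bins, labels=bins[1:])

# Pairwise correlation between the numeric columns.
corr = df[["Age", "math_score"]].corr()
print(top, pivot, counts, df["age_band"], corr, sep="\n\n")
```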

Data Cleaning
• Exploring Data
• Shape, info, columns, indexes, describe, head, tail
• Filter the data
• Handling Missing Data
• Drop data (delete), or fill the values with the mean, median, or mode.
• Use an ML algorithm such as regression to predict a highly probable value.
• Handling Outliers
• Use a box plot or scatter plot to identify outliers, and handle them using the z-score or interquartile range (IQR) method.
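
The two outlier rules mentioned above might be sketched as follows. The scores are invented, and both cut-offs (2.5 standard deviations, 1.5 × IQR) are assumptions chosen for illustration rather than values from the slides:

```python
import pandas as pd

scores = pd.Series([50, 52, 51, 49, 53, 50, 48, 52, 51, 120], name="score")

# Z-score method: flag values far from the mean, measured in standard deviations.
# 2.5 is an assumed cut-off; 3 is also common on larger samples.
z_threshold = 2.5
z = (scores - scores.mean()) / scores.std()
z_outliers = scores[z.abs() > z_threshold]

# IQR method: flag values more than 1.5 * IQR outside the middle 50% of the data.
q1, q3 = scores.quantile(0.25), scores.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
iqr_outliers = scores[(scores < lower) | (scores > upper)]

print(z_outliers.tolist(), iqr_outliers.tolist())
```

On very small samples the z-score of any single point is bounded, so a strict threshold can fail to flag even an extreme value; the IQR rule does not have this limitation.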

Feature Engineering
• Feature Encoding Technique
• One-hot encoding, label encoding, ordinal encoding
• Feature Scaling
• Features have different ranges, magnitudes and units. Scaling brings features measured on different scales, such as salary and age, into a comparable range. This is known as feature normalization or feature scaling.
• Feature transformation:
• Converting numerical data to categorical data
• Splitting categorical data into multiple columns
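
One possible reading of the two feature transformations above, sketched with hypothetical columns: binning a numeric age into named groups, and splitting a combined text column into two. The bin edges, labels, and the "city, country" format are all assumptions.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "age": [12, 25, 67],
        "location": ["Pune, India", "Lyon, France", "Osaka, Japan"],
    }
)

# Numerical to categorical: bin ages into named groups (edges are illustrative).
df["age_group"] = pd.cut(
    df["age"], bins=[0, 18, 60, 120], labels=["minor", "adult", "senior"]
)

# Split one text column into multiple columns.
df[["city", "country"]] = df["location"].str.split(", ", expand=True)
print(df)
```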

Feature encoding

One-hot encoding transforms a categorical column into labels and splits it into multiple columns, one 0/1 indicator column per label.

Label encoding is also called integer encoding. Here, the unique values in a variable are replaced with a sequence of integer values.
For example
categories: red, green, and blue.
encoded value: red is 0, green is 1, and blue is 2.

Ordinal encoding is similar to label encoding, except there is an order to the encoding.
Example category: low, medium, high.

Library to use: sklearn.preprocessing
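
The slides point to sklearn.preprocessing for these encodings. To stay within pandas, the sketch below swaps in pandas equivalents instead (get_dummies, factorize, and an ordered Categorical); the data is made up.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "color": ["red", "green", "blue", "green"],
        "level": ["low", "high", "medium", "low"],
    }
)

# One-hot encoding: one 0/1 indicator column per category.
one_hot = pd.get_dummies(df["color"], prefix="color", dtype=int)

# Label (integer) encoding: codes follow order of first appearance,
# so here red=0, green=1, blue=2.
codes, categories = pd.factorize(df["color"])
df["color_code"] = codes

# Ordinal encoding: codes follow an explicitly stated order.
level_order = ["low", "medium", "high"]
df["level_code"] = pd.Categorical(
    df["level"], categories=level_order, ordered=True
).codes

print(one_hot)
print(df)
```

The integer assigned to each label depends on the tool: factorize numbers values in order of appearance, whereas sklearn's LabelEncoder sorts the classes first, so the same column can receive different codes.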

Feature Scaling
• Standard scaling or z-score normalization: derive each value from its z-score. It is best suited to normally distributed data. Suppose $\mu$ is the mean and $\sigma$ is the standard deviation of the feature column. Then the z-score of a value $x$ is as follows.

$z = \frac{x - \mu}{\sigma}$

• Min-max scaling: this method linearly transforms the original data into a given range. It preserves the relationships between the scaled data and the original data. If the data is not normally distributed and the standard deviation is very small, the min-max scaler may work better; note, however, that it is sensitive to outliers.
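
Both scalings can be sketched directly in pandas. The age and salary values are hypothetical, the min-max target range is assumed to be [0, 1], and the population standard deviation (ddof=0) is used, which matches what sklearn's StandardScaler computes:

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [20, 30, 40, 50], "salary": [20000, 40000, 60000, 80000]}
)

# Standard (z-score) scaling: subtract the column mean, divide by the column std.
standardized = (df - df.mean()) / df.std(ddof=0)

# Min-max scaling to [0, 1]: subtract the column min, divide by the column range.
min_max = (df - df.min()) / (df.max() - df.min())

print(standardized)
print(min_max)
```

After either transformation, age and salary sit on the same scale even though their original units differ by three orders of magnitude.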
