Pandas

A tutorial in Pandas

Uploaded by

hikmatbaniya20

Full Pandas Tutorial

Pandas is a powerful data manipulation and analysis library for Python. It is built
on top of NumPy and provides data structures such as DataFrames, which allow
you to efficiently work with structured data.
Here’s a detailed tutorial that will help you understand Pandas from basic
concepts to advanced techniques.

1. Installation
You can install Pandas using pip:

bashCopy code
pip install pandas

2. Importing Pandas
Pandas is typically imported with the alias pd:

pythonCopy code
import pandas as pd

3. Pandas Data Structures

3.1 Series
A Series is a one-dimensional labeled array capable of holding any data type.

pythonCopy code
s = pd.Series([1, 3, 5, 7, 9])
print(s)

You can specify an index for the Series:

pythonCopy code
s = pd.Series([1, 3, 5, 7], index=['a', 'b', 'c', 'd'])
print(s)
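
A custom index matters because Pandas matches values by label, not by position. The sketch below reuses the Series from above; the second Series, `other`, is a made-up example added only to illustrate label alignment.

```python
import pandas as pd

s = pd.Series([1, 3, 5, 7], index=['a', 'b', 'c', 'd'])

# The same element, reached by label and by position
by_label = s.loc['b']
by_position = s.iloc[1]

# Arithmetic aligns on labels: 'a' and 'c' have no partner in `other`,
# so the result holds NaN for them
other = pd.Series([10, 20], index=['b', 'd'])  # hypothetical second Series
total = s + other

print(total)
```

Labels present in only one of the two Series produce NaN rather than an error, which is worth checking for after any arithmetic between differently indexed objects.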

3.2 DataFrame
A DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous
tabular data structure with labeled axes (rows and columns).
You can create a DataFrame from a dictionary:

pythonCopy code
data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
'Age': [28, 24, 35, 32],
'City': ['New York', 'Paris', 'Berlin', 'London']}
df = pd.DataFrame(data)
print(df)

4. DataFrame Operations

4.1 Viewing Data

First few rows:

pythonCopy code
df.head() # By default, it shows the first 5 rows

Last few rows:

pythonCopy code
df.tail(3) # Shows the last 3 rows

DataFrame shape (rows, columns):

pythonCopy code
df.shape

Column names:

pythonCopy code
df.columns

Data types of each column:

pythonCopy code
df.dtypes

4.2 Selecting Data

Select a single column:

pythonCopy code
df['Name']

Select multiple columns:

pythonCopy code
df[['Name', 'Age']]

Select rows by label using .loc[]:

pythonCopy code
df.loc[0] # Select the row with index 0
df.loc[0:2, ['Name', 'City']] # Rows labeled 0 through 2 (inclusive), columns 'Name' and 'City'

Select rows by position using .iloc[]:

pythonCopy code
df.iloc[0] # First row
df.iloc[0:2, 0:2] # First two rows and first two columns
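
The difference between .loc[] and .iloc[] is easiest to see when the index labels are not 0, 1, 2, and so on. The following is a minimal sketch; the DataFrame `people` and its index labels 10 to 40 are assumptions chosen for illustration.

```python
import pandas as pd

people = pd.DataFrame(
    {'Name': ['John', 'Anna', 'Peter', 'Linda'],
     'Age': [28, 24, 35, 32]},
    index=[10, 20, 30, 40],  # hypothetical labels that differ from positions
)

by_label = people.loc[10:30]    # label slice: the row labeled 30 is included
by_position = people.iloc[0:2]  # position slice: position 2 is excluded

print(len(by_label), len(by_position))
```

Label slices include both endpoints, while position slices follow the usual Python half-open convention.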

Boolean indexing (filtering rows):

pythonCopy code
df[df['Age'] > 30]
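
Filters can be combined. The sketch below rebuilds the example data under the assumed name `people` and shows two common patterns: joining conditions with `&` and testing membership with `isin()`.

```python
import pandas as pd

people = pd.DataFrame({'Name': ['John', 'Anna', 'Peter', 'Linda'],
                       'Age': [28, 24, 35, 32],
                       'City': ['New York', 'Paris', 'Berlin', 'London']})

# Use & (and), | (or), ~ (not); each condition needs its own parentheses
over_30_in_london = people[(people['Age'] > 30) & (people['City'] == 'London')]

# isin() keeps rows whose value appears in the given list
in_paris_or_berlin = people[people['City'].isin(['Paris', 'Berlin'])]

print(over_30_in_london)
print(in_paris_or_berlin)
```

The Python keywords `and` and `or` do not work here, because they cannot be applied element by element to a Series.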

5. Data Cleaning

5.1 Handling Missing Data

Check for missing values:

pythonCopy code
df.isnull().sum() # Sum of null values in each column

Drop missing values:

pythonCopy code
df.dropna() # Drop rows with missing values

Fill missing values:

pythonCopy code
df.fillna(value=0) # Replace NaN with 0
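
The example DataFrame used so far has no missing values, so the calls above have nothing to act on. Here is a small sketch with gaps added on purpose; the `scores` data is invented for illustration.

```python
import numpy as np
import pandas as pd

scores = pd.DataFrame({'Name': ['John', 'Anna', 'Peter'],
                       'Score': [90.0, np.nan, 75.0],
                       'City': ['New York', None, 'Berlin']})  # hypothetical data

missing_per_column = scores.isnull().sum()  # count of missing values per column
dropped = scores.dropna()                   # new DataFrame without the incomplete row

# A dict fills each column with its own replacement value
filled = scores.fillna({'Score': scores['Score'].mean(), 'City': 'Unknown'})

# dropna() and fillna() return new objects; `scores` itself still has its gaps
print(scores['Score'].isnull().sum())
```

Because both methods return a new DataFrame, the result has to be assigned to a variable to be kept.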

5.2 Renaming Columns

pythonCopy code
renamed = df.rename(columns={'Name': 'Full Name'}) # Returns a new DataFrame; df keeps its 'Name' column for the later examples

5.3 Changing Data Types

pythonCopy code
df['Age'] = df['Age'].astype(float) # Convert Age to float

6. Data Transformation

6.1 Adding a New Column

pythonCopy code
df['Country'] = ['USA', 'France', 'Germany', 'UK']

6.2 Removing Columns

pythonCopy code
without_city = df.drop(columns=['City']) # Returns a new DataFrame without 'City'; df keeps the column for the later examples

6.3 Sorting Data

pythonCopy code
df.sort_values(by='Age', ascending=False) # Sort by 'Age' in descending order

6.4 Applying Functions to Data

On a Series, apply() calls a function on each element; on a DataFrame, it calls the function on each column (or on each row with axis=1).

Apply a function to a column:

pythonCopy code
df['Age'] = df['Age'].apply(lambda x: x + 1)
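
For simple arithmetic like this, a vectorized expression gives the same result and is usually faster. The sketch below compares the two and adds a row-wise apply(); the `people` data and the label format are assumptions made for illustration.

```python
import pandas as pd

people = pd.DataFrame({'Name': ['John', 'Anna'], 'Age': [28, 24]})

# Element-wise function call on a Series
with_apply = people['Age'].apply(lambda x: x + 1)

# The same result using vectorized arithmetic
vectorized = people['Age'] + 1

# Row-wise on a DataFrame: axis=1 passes each row to the function as a Series
labels = people.apply(lambda row: f"{row['Name']} ({row['Age']})", axis=1)

print(labels)
```

apply() is most useful when the logic cannot be written as a built-in vectorized operation.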

7. Grouping and Aggregation

Pandas makes it easy to group data and apply aggregations to it.

7.1 Grouping Data

You can group data based on one or more columns and apply an aggregate
function such as sum, mean, or count.

Group by one column and aggregate:

pythonCopy code
grouped = df.groupby('Country')['Age'].mean()
print(grouped)

Group by multiple columns:

pythonCopy code
grouped = df.groupby(['Country', 'City'])['Age'].mean()
print(grouped)

7.2 Aggregating Data

You can also apply multiple aggregation functions at once.

Multiple aggregations:

pythonCopy code
df.groupby('Country').agg({'Age': ['mean', 'sum'], 'Name': 'count'})
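
The dictionary form above produces two-level column names. Named aggregation is an alternative that yields flat, readable columns. This is a sketch under assumed data: the `people` DataFrame, including the extra person 'Mary', is invented so that one group has more than one row.

```python
import pandas as pd

people = pd.DataFrame({'Name': ['John', 'Mary', 'Anna', 'Peter'],
                       'Age': [28, 40, 24, 35],
                       'Country': ['USA', 'USA', 'France', 'Germany']})  # hypothetical data

# Each keyword becomes an output column: name=(source column, function)
summary = people.groupby('Country').agg(
    mean_age=('Age', 'mean'),
    total_age=('Age', 'sum'),
    n_people=('Name', 'count'),
)

print(summary)
```

The output column names (`mean_age`, `total_age`, `n_people`) are arbitrary and can be chosen to suit the report.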

8. Merging and Joining DataFrames

8.1 Concatenating DataFrames

You can concatenate DataFrames along a particular axis (row-wise or column-
wise).

Row-wise concatenation:

pythonCopy code
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})
pd.concat([df1, df2], axis=0)

Column-wise concatenation:

pythonCopy code
pd.concat([df1, df2], axis=1)
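
Concatenation keeps the original labels, which can leave duplicates in the index or the columns. The sketch below makes that visible and shows `ignore_index=True` as one way to renumber the rows.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

stacked = pd.concat([df1, df2], axis=0)                        # row labels repeat
renumbered = pd.concat([df1, df2], axis=0, ignore_index=True)  # fresh 0..n-1 index
side_by_side = pd.concat([df1, df2], axis=1)                   # column names repeat

print(list(stacked.index))
print(list(renumbered.index))
print(list(side_by_side.columns))
```

Duplicate labels are allowed, but they make later label-based selection ambiguous, so renumbering or renaming is often worthwhile.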

8.2 Merging DataFrames

Merge DataFrames based on a key column, similar to SQL joins.

Inner join (only matching keys):

pythonCopy code
df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'value1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['B', 'C', 'D'], 'value2': [4, 5, 6]})
merged_df = pd.merge(df1, df2, on='key', how='inner')

Left join:

pythonCopy code
pd.merge(df1, df2, on='key', how='left')

Outer join:

pythonCopy code
pd.merge(df1, df2, on='key', how='outer')
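
To compare the three join types side by side, the sketch below runs them on the same two DataFrames. The `indicator=True` option is an addition not used above; it adds a `_merge` column recording where each row came from.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'value1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['B', 'C', 'D'], 'value2': [4, 5, 6]})

inner = pd.merge(df1, df2, on='key', how='inner')  # keys found in both
left = pd.merge(df1, df2, on='key', how='left')    # every key from df1
outer = pd.merge(df1, df2, on='key', how='outer', indicator=True)  # every key from either

print(outer)
```

Rows with no match on one side get NaN in that side's columns, so `left` has a missing `value2` for key 'A'.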

9. Input and Output

9.1 Reading from CSV

Pandas can read data from a variety of file formats, with CSV being the most
common.

pythonCopy code
df = pd.read_csv('data.csv')

9.2 Writing to CSV

pythonCopy code
df.to_csv('output.csv', index=False)
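
A round trip can be tried without touching the disk. The sketch below writes to and reads from an in-memory text buffer (`io.StringIO`) in place of the file paths used above; the `people` data is an assumed example.

```python
import io

import pandas as pd

people = pd.DataFrame({'Name': ['John', 'Anna'], 'Age': [28, 24]})

# Write CSV text into an in-memory buffer; index=False leaves out the row labels
buffer = io.StringIO()
people.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

# Read the same text back into a new DataFrame
restored = pd.read_csv(io.StringIO(csv_text))

print(csv_text)
```

Without `index=False`, the row labels are written as an extra unnamed column, which then reappears as data when the file is read back.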

9.3 Reading from Excel

pythonCopy code
df = pd.read_excel('data.xlsx', sheet_name='Sheet1')

9.4 Writing to Excel

pythonCopy code
df.to_excel('output.xlsx', sheet_name='Sheet1', index=False)

10. Time Series Data

Pandas has robust support for time series data, including date indexing,
resampling, and time-based filtering.

10.1 Creating a Date Range

pythonCopy code
dates = pd.date_range('2024-01-01', periods=6, freq='D')
df = pd.DataFrame({'Date': dates, 'Value': [1, 2, 3, 4, 5, 6]})

10.2 Indexing by Date

You can set the index of the DataFrame to be a date column.

pythonCopy code
df.set_index('Date', inplace=True)

10.3 Resampling
You can resample time series data (e.g., daily to monthly).

pythonCopy code
df.resample('M').mean() # Month-end bins, mean of each month (pandas 2.2+ spells this alias 'ME')
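
Here is a small worked example of monthly resampling. It is a sketch under assumptions: the four dates are chosen to straddle a month boundary, and the month-start alias 'MS' is used because it is accepted by both older and newer pandas releases.

```python
import pandas as pd

# Four consecutive days: Jan 30, Jan 31, Feb 1, Feb 2 (hypothetical data)
dates = pd.date_range('2024-01-30', periods=4, freq='D')
ts = pd.DataFrame({'Value': [1, 2, 3, 4]}, index=dates)

# 'MS' groups by calendar month and labels each group with the month's first day
monthly = ts.resample('MS').mean()

print(monthly)
```

resample() needs a datetime-like index, which is why the dates are passed as the index here.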

10.4 Time-based Filtering

You can filter data based on a date range.

pythonCopy code
df.loc['2024-01-01':'2024-01-03'] # Rows within this date range (both endpoints included)
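
Date filtering can also be written as a boolean mask on the index. The sketch below rebuilds the six-day series under the assumed name `ts` and shows both styles.

```python
import pandas as pd

dates = pd.date_range('2024-01-01', periods=6, freq='D')
ts = pd.DataFrame({'Value': [1, 2, 3, 4, 5, 6]}, index=dates)

# Label slice with date strings: both endpoints are included
first_three = ts.loc['2024-01-01':'2024-01-03']

# Boolean mask on the index: strictly after January 4
after_jan_4 = ts[ts.index > pd.Timestamp('2024-01-04')]

print(first_three)
print(after_jan_4)
```

The mask form is handy when the condition is open-ended or combines several criteria.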

11. Pivot Tables

Pandas allows you to create pivot tables, which are useful for summarizing data.

11.1 Creating a Pivot Table

pythonCopy code
pivot = df.pivot_table(values='Age', index='Country', columns='City', aggfunc='mean')
print(pivot)
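
A pivot table is easier to read with data that has more than one row per cell. The sketch below uses an invented `people` DataFrame (the extra person and the city 'Boston' are assumptions) so that one cell holds an average and others stay empty.

```python
import pandas as pd

people = pd.DataFrame({'Name': ['John', 'Mary', 'Anna', 'Peter'],
                       'Age': [28, 40, 24, 35],
                       'Country': ['USA', 'USA', 'France', 'USA'],
                       'City': ['New York', 'New York', 'Paris', 'Boston']})  # hypothetical data

# One row per country, one column per city, mean age in each cell
pivot = people.pivot_table(values='Age', index='Country', columns='City', aggfunc='mean')

print(pivot)
```

Combinations that never occur in the data, such as France and Boston, appear as NaN; `pivot_table` accepts a `fill_value` argument if a default is preferred.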

12. Visualization with Pandas

Pandas plotting methods are built on Matplotlib, which must be installed for them to work.

12.1 Line Plot

pythonCopy code
df.plot(x='Date', y='Value') # 'Date' must be a column here; if it is the index, use df.plot(y='Value')

12.2 Bar Plot

pythonCopy code
df.plot(kind='bar', x='Name', y='Age')

12.3 Histogram

pythonCopy code
df['Age'].plot(kind='hist')

12.4 Scatter Plot

pythonCopy code
df.plot(kind='scatter', x='Age', y='Value') # Requires numeric 'Age' and 'Value' columns in the same DataFrame

13. Advanced Topics

13.1 MultiIndex
You can work with multiple levels of indexing (hierarchical indexing) in Pandas.

Creating a MultiIndex DataFrame:

pythonCopy code
arrays = [['bar', 'bar', 'baz', 'baz'], ['one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=('first', 'second'))
df = pd.DataFrame({'A': [1, 2, 3, 4]}, index=index)

Accessing data with MultiIndex:

pythonCopy code
df.loc[('bar', 'one')] # Row at first='bar', second='one'; the tuple marks both labels as row keys
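
A few more selection patterns on the same MultiIndex are sketched below. The variable name `mi` is an assumption used to keep this example separate from the `df` above.

```python
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz'], ['one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=('first', 'second'))
mi = pd.DataFrame({'A': [1, 2, 3, 4]}, index=index)

outer = mi.loc['bar']                 # all rows under 'bar'; the remaining index is 'second'
single = mi.loc[('baz', 'two'), 'A']  # one cell: row key as a tuple, then the column
cross = mi.xs('one', level='second')  # every row whose second-level label is 'one'

print(outer)
print(single)
print(cross)
```

`xs()` is convenient for selecting on an inner level, which a plain `.loc[]` with a single label cannot do.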

13.2 Handling Large Datasets

Reading large CSV files in chunks:

pythonCopy code
chunk_size = 1000
for chunk in pd.read_csv('large_file.csv', chunksize=chunk_size):
    process(chunk) # Placeholder: replace process() with your own per-chunk logic
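
A runnable version of the chunked pattern is sketched below. Instead of a real large file it reads a ten-row CSV held in memory, and the per-chunk work is a running sum; both choices are assumptions made to keep the example self-contained.

```python
import io

import pandas as pd

# Stand-in for a large file: a header plus the numbers 1 to 10 (hypothetical data)
csv_text = 'value\n' + '\n'.join(str(i) for i in range(1, 11))

total = 0
rows = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    # Each chunk is an ordinary DataFrame with at most 4 rows
    total += chunk['value'].sum()
    rows += len(chunk)

print(total, rows)
```

Only one chunk is held in memory at a time, so this pattern suits aggregations that can be updated incrementally.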

13.3 Categorical Data

Categorical data is a type of data with a fixed number of possible values
(categories).

pythonCopy code
df['Gender'] = df['Gender'].astype('category') # Assumes df has a 'Gender' column
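
Categories can also carry an order, which makes comparisons meaningful. The sketch below uses an invented `sizes` Series and an explicit `pd.CategoricalDtype`; neither comes from the example data above.

```python
import pandas as pd

sizes = pd.Series(['small', 'large', 'medium', 'small'])  # hypothetical data

# Declare the allowed values and their order explicitly
size_type = pd.CategoricalDtype(categories=['small', 'medium', 'large'], ordered=True)
ordered = sizes.astype(size_type)

codes = ordered.cat.codes                       # integer position of each value's category
at_least_medium = ordered[ordered >= 'medium']  # comparison follows the declared order

print(codes)
print(at_least_medium)
```

Each value is stored as a small integer code that points into the category list, which is why categorical columns with few distinct values tend to use less memory than plain strings.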
