
Data Analysis with Pandas

Prof. Murali Krishna Gurram

Dept. of Geo-Engineering & RDT
Centre for Remote Sensing, AUCE
Andhra University, Visakhapatnam – 530 003
Dt. 04/11/2024
Data Analysis with Pandas

a. An overview of the Pandas package
b. The Pandas data structure: Series
c. The DataFrame
d. The essential basic functionality:
 • Reindexing and altering labels
 • Head and tail
 • Binary operations
 • Functional statistics
 • Function application
 • Sorting
 • Indexing and selecting data
Overview of the Pandas Package in Python
What is Pandas?
• Pandas is an open-source Python library that provides high-performance data manipulation and analysis tools.
• Pandas is built on top of the NumPy library and is particularly useful for working with structured (tabular) data.
• The name "Pandas" is derived from "Panel Data," a term used in econometrics for data collected over time for the same individuals.
Why use Pandas?
• Data analysis often requires handling large datasets, and Pandas makes this easier by providing functions for manipulating data, performing statistical analysis, handling missing data, and more.
• Pandas is well suited to tasks such as data cleaning, data transformation, and data visualization, as well as more complex analyses like grouping and aggregation.
• Pandas is a versatile and essential library for data analysis in Python, providing tools for data manipulation, transformation, aggregation, and visualization.
• It serves as the backbone of many data science and machine learning workflows because of its flexibility and powerful functionality.
• Learning Pandas makes it possible to work efficiently with real-world data, conduct complex analyses, and prepare data for advanced applications such as machine learning.
Key Features of Pandas
• High-level data structures: Series and DataFrame.
• Easy handling of missing data.
• Powerful tools for data alignment and data manipulation.
• Flexible reshaping and pivoting of datasets.
• Time-series functionality.
• Integration with other Python libraries, such as Matplotlib and Seaborn, for data visualization.
The Pandas Data Structures - Series and DataFrame
1. Introduction to Pandas Data Structures
• Overview:
– Pandas provides two main data structures that simplify handling and analyzing structured data: Series and DataFrame.
– Understanding these structures is essential because they are the foundation of most data analysis tasks in Pandas.
2. Series
What is a Series?
• A Series is a one-dimensional labeled array that can hold any data type (integers, floats, strings, etc.).
• A Series is similar to a single column in a table or an Excel spreadsheet.
• Every element in a Series has an index label, allowing values to be accessed through their index.
• A Series can be created from various data types, including lists, dictionaries, NumPy arrays, and scalar values.
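The worked examples in this section build Series from lists and dictionaries only. A minimal sketch of two other common inputs follows: a scalar, which is repeated once for every label of an explicit index, and a NumPy array. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

# From a scalar: the value is repeated once for every label in the index
s_scalar = pd.Series(5, index=['a', 'b', 'c'])

# From a NumPy array: with no index given, the labels default to 0, 1, 2, ...
s_array = pd.Series(np.array([1.5, 2.5, 3.5]))

print(s_scalar)
print(s_array)
```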
Syntax and Practical Examples
import pandas as pd

Creating a Series from a List:

data = [10, 20, 30, 40] # a list with data elements
series = pd.Series(data)
print(series)

This will display a Series with default integer indexing starting from 0.
Creating a Series with Custom Index:
data = [10, 20, 30, 40] # a list with data elements
index = ['a', 'b', 'c', 'd'] # a list of labels to use as the index
series = pd.Series(data, index=index) # labels passed through the 'index' parameter
print(series)
The Series will have custom labels a, b, c, and d.

Creating a Series from a Dictionary:

data = {'a': 10, 'b': 20, 'c': 30} # a dictionary with key-value pairs
series = pd.Series(data)
print(series)
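Accessing Elements in a Series:
# Access by position with .iloc (relying on series[0] for positional access is deprecated in recent pandas versions)
print(series.iloc[0])

# Access by label with .loc (series['a'] also works)
print(series.loc['a'])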
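The difference between label-based and position-based access matters most when the index labels are themselves integers. A minimal sketch, using a made-up Series whose integer labels do not start at 0:

```python
import pandas as pd

# A Series whose index labels are integers that do not start at 0
s = pd.Series([100, 200, 300], index=[10, 20, 30])

print(s.loc[10])   # label-based: the element labelled 10 -> 100
print(s.iloc[0])   # position-based: the first element -> 100
print(s.iloc[-1])  # negative positions count from the end -> 300

# s[0] raises a KeyError here: with integer labels, [] looks up labels, and 0 is not one
```

Using .loc and .iloc explicitly removes any doubt about which kind of lookup is intended.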
Attributes and Methods of Series:
Attributes:
• series.index - Returns the index of the Series.
• series.values - Returns the values as a NumPy array.
• series.dtype - Shows the data type of the elements.

Methods:
• series.head(n) - Returns the first n elements.
• series.tail(n) - Returns the last n elements.
• series.sum() - Returns the sum of the Series.
• series.mean() - Calculates the mean of values in the Series.
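The attributes listed here can be inspected directly. A minimal sketch, assuming the dictionary-based Series from the earlier slide (labels 'a', 'b', 'c'):

```python
import pandas as pd

series = pd.Series({'a': 10, 'b': 20, 'c': 30})

print(series.index)   # the Index object holding the labels 'a', 'b', 'c'
print(series.values)  # [10 20 30]
print(series.dtype)   # int64
```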
Example
# Example: Summary statistics
print("Sum:", series.sum())
print("Mean:", series.mean())
print("First 2 elements:", series.head(2))
• Exercise 1: Series for Daily Temperatures
• Objective:
– Create a Series to represent daily temperatures for a week.
– Use custom indices (labels) to name each day of the week.
– Calculate and print the average temperature.
import pandas as pd

# List of daily temperatures
temperatures = [23, 25, 22, 26, 24, 28, 27]

# Custom indices for each day of the week
days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday',
        'Saturday', 'Sunday']

# Create the Series
temp_series = pd.Series(temperatures, index=days)

# Display the Series
print("Daily Temperatures:\n", temp_series)

# Calculate the average temperature
avg_temp = temp_series.mean()
print("\nAverage Temperature for the Week:", avg_temp)

Output
Daily Temperatures:
Monday       23
Tuesday      25
Wednesday    22
Thursday     26
Friday       24
Saturday     28
Sunday       27
dtype: int64

Average Temperature for the Week: 25.0
Exercise 2: Series for Stock Prices
• Objective:
– Create a Series using a dictionary where keys represent company names, and
values represent their stock prices.
– Access a specific stock price using the company's name as the label.
import pandas as pd

# Dictionary representing stock prices of various companies
stock_prices = {
    'Apple': 150,
    'Microsoft': 280,
    'Google': 2700,
    'Amazon': 3300,
    'Facebook': 340
}

# Create the Series
stock_series = pd.Series(stock_prices)

# Display the Series
print("Stock Prices:\n", stock_series)

# Access the stock price for Google
google_price = stock_series['Google']
print("\nStock Price of Google:", google_price)

Output
Stock Prices:
Apple         150
Microsoft     280
Google       2700
Amazon       3300
Facebook      340
dtype: int64

Stock Price of Google: 2700
Example 1: Monitoring Temperatures During Concrete Curing
Objective: Record the daily temperatures during a week of concrete curing and analyze the data.
Solution:
• Store the daily temperatures (in °C) in a Series indexed by day of the week.
• Calculate the average temperature for the week.
• Identify the days with the highest and lowest temperatures.
import pandas as pd

# Daily temperatures recorded during the concrete curing period (in °C)
temperatures = [22, 23, 25, 21, 24, 26, 23]
days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday',
        'Saturday', 'Sunday']

# Create a Series for temperatures
temperature_series = pd.Series(temperatures, index=days)

# Calculate the average temperature
avg_temperature = temperature_series.mean()

# Identify the highest and lowest temperatures
max_temp_day = temperature_series.idxmax()
min_temp_day = temperature_series.idxmin()

print("Daily Temperatures (°C):\n", temperature_series)
print("\nAverage Temperature (°C):", avg_temperature)
print("Highest Temperature on:", max_temp_day)
print("Lowest Temperature on:", min_temp_day)

Output
Daily Temperatures (°C):
Monday       22
Tuesday      23
Wednesday    25
Thursday     21
Friday       24
Saturday     26
Sunday       23
dtype: int64

Average Temperature (°C): 23.428571428571427
Highest Temperature on: Saturday
Lowest Temperature on: Thursday
Example 2: Monitoring Construction Site Inventory
• In this example, you can track the quantity of various materials
(e.g., cement, sand, gravel, steel) available at a construction site.
This is useful for managing inventory and knowing when to
reorder materials.

Objective:
• Create a Series representing quantities of different construction
materials.
• Check the quantity of a specific material.
• Calculate the total inventory.
import pandas as pd

# Material quantities at the construction site
materials = {
    'Cement Bags': 100,
    'Sand (cubic meters)': 50,
    'Gravel (cubic meters)': 30,
    'Steel (tons)': 20
}

# Create a Series for materials
inventory_series = pd.Series(materials)

# Access quantity of a specific material
cement_quantity = inventory_series['Cement Bags']

# Add up all quantities (the units differ: bags, cubic meters, tons,
# so this total is only a rough count, not a physical quantity)
total_inventory = inventory_series.sum()

print("Construction Site Inventory:\n", inventory_series)
print("\nQuantity of Cement Bags:", cement_quantity)
print("Total Inventory Quantity:", total_inventory)

Output
Construction Site Inventory:
Cement Bags              100
Sand (cubic meters)       50
Gravel (cubic meters)     30
Steel (tons)              20
dtype: int64

Quantity of Cement Bags: 100
Total Inventory Quantity: 200
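Example 2 notes that inventory tracking helps with knowing when to reorder materials. One way to sketch that check is to compare the stock against a reorder level for each material. The reorder levels below are hypothetical values chosen for illustration; giving each material its own level, in its own unit, avoids comparing bags against tons.

```python
import pandas as pd

materials = {
    'Cement Bags': 100,
    'Sand (cubic meters)': 50,
    'Gravel (cubic meters)': 30,
    'Steel (tons)': 20,
}
inventory_series = pd.Series(materials)

# Hypothetical reorder levels, one per material, each in that material's own unit
reorder_levels = pd.Series({
    'Cement Bags': 60,
    'Sand (cubic meters)': 40,
    'Gravel (cubic meters)': 35,
    'Steel (tons)': 25,
})

# Comparison operators need both Series to carry the same labels;
# the result is a boolean Series (True where stock is below its reorder level)
needs_reorder = inventory_series < reorder_levels

# Boolean indexing keeps only the entries marked True
to_reorder = inventory_series[needs_reorder]
print("Materials to reorder:\n", to_reorder)
```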
Example 3: Road Survey Traffic Volume
• In road design and traffic analysis, civil engineers often monitor
traffic flow on specific road segments. Here’s an example of
using a Series to represent the number of vehicles passing a
road segment over several days.

Objective:
• Create a Series to store traffic volume for each day of the week.
• Calculate the average traffic volume.
• Identify the day with peak traffic.
import pandas as pd

# Traffic volume (vehicles) recorded each day on a road segment
traffic_data = [1200, 1350, 1400, 1300, 1250, 1600, 1500]
days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday',
        'Saturday', 'Sunday']

# Create a Series for traffic volume
traffic_series = pd.Series(traffic_data, index=days)

# Calculate average traffic volume
avg_traffic = traffic_series.mean()

# Identify peak traffic day
peak_day = traffic_series.idxmax()

print("Daily Traffic Volume:\n", traffic_series)
print("\nAverage Traffic Volume:", avg_traffic)
print("Peak Traffic Day:", peak_day)

Output
Daily Traffic Volume:
Monday       1200
Tuesday      1350
Wednesday    1400
Thursday     1300
Friday       1250
Saturday     1600
Sunday       1500
dtype: int64

Average Traffic Volume: 1371.4285714285713
Peak Traffic Day: Saturday
Example 4: Structural Steel Beam Loads
• This example can help in analyzing the loads on different steel
beams in a structure. Each beam has a different load capacity,
and it’s essential to keep track of these values.

Objective:
• Create a Series for different beams and their load capacities (in
kN).
• Find the maximum load capacity among the beams.
• Calculate the average load capacity.
import pandas as pd

# Load capacity (kN) for different beams in a structure
beam_loads = {
    'Beam A': 45,
    'Beam B': 50,
    'Beam C': 55,
    'Beam D': 60,
    'Beam E': 52
}

# Create a Series for beam loads
beam_series = pd.Series(beam_loads)

# Find maximum load capacity
max_load = beam_series.max()

# Calculate average load capacity
avg_load = beam_series.mean()

print("Beam Load Capacities (kN):\n", beam_series)
print("\nMaximum Load Capacity (kN):", max_load)
print("Average Load Capacity (kN):", avg_load)

Output
Beam Load Capacities (kN):
Beam A    45
Beam B    50
Beam C    55
Beam D    60
Beam E    52
dtype: int64

Maximum Load Capacity (kN): 60
Average Load Capacity (kN): 52.4
3. DataFrame
What is a DataFrame?
• A DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous data structure.
• A DataFrame is similar to a table in SQL or an Excel spreadsheet, with rows and columns.
• Each column can contain data of a different type (e.g., integers, floats, strings).
• A DataFrame has two axes, rows and columns, each with its own labels.
• Creating a DataFrame from a Dictionary of Lists:
# Creating a DataFrame from a Dictionary of Lists
data = {
'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35],
'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
print(df)
• Creating a DataFrame from a List of Dictionaries:
# Creating a DataFrame from a List of Dictionaries
data = [
{'Name': 'Alice', 'Age': 25, 'City': 'New York'},
{'Name': 'Bob', 'Age': 30, 'City': 'Los Angeles'},
{'Name': 'Charlie', 'Age': 35, 'City': 'Chicago'}
]
df = pd.DataFrame(data)
print(df)
• Creating a DataFrame from a NumPy Array:
# Creating a DataFrame from a NumPy Array
import numpy as np
data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
df = pd.DataFrame(data, columns=['A', 'B', 'C'])
print(df)
Attributes and Methods of DataFrame
Attributes:
• df.columns - Returns column labels of the DataFrame.
• df.index - Returns row labels of the DataFrame.
• df.dtypes - Shows the data types of each column.

Basic Methods:
• df.head(n) - Returns the first n rows.
• df.tail(n) - Returns the last n rows.
• df.info() - Displays a summary of the DataFrame, including
column names, non-null counts, and data types.
• df.describe() - Provides summary statistics for numerical
columns.
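A minimal sketch of these attributes and methods on the Name/Age/City DataFrame from the earlier slide. It also uses df.shape, which the list does not mention, to report the number of rows and columns.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

print(df.columns)  # the column labels
print(df.index)    # the row labels (a RangeIndex from 0 to 2 here)
print(df.dtypes)   # one dtype per column
print(df.shape)    # (3, 3): number of rows, number of columns
print(df.head(2))  # the first two rows
```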
# Example: DataFrame Info and Summary Statistics
df.info()              # prints the summary itself; wrapping it in print() would also print "None"
print(df.describe())   # returns a DataFrame, so it is printed explicitly
Data Selection and Manipulation in DataFrames
Selecting Columns:
# (df here is the Name/Age/City DataFrame created earlier)
print(df['Name']) # Select a single column
print(df[['Name', 'City']]) # Select multiple columns

Selecting Rows by Index:
print(df.loc[0]) # Select the row whose index label is 0 (the first row here)
print(df.iloc[1]) # Select the second row by position
Adding and Removing Columns:
# Adding a new column
df['Salary'] = [50000, 60000, 70000]

# Removing a column
df.drop(columns=['City'], inplace=True)

Filtering Data:
# Filter rows where Age > 30
print(df[df['Age'] > 30])
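Filters can combine several conditions, and .loc can filter rows and pick columns in one step. A minimal sketch using the Name/Age/Salary data from this slide (after the City column has been dropped):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 70000],
})

# Combine conditions with & (and) or | (or); each condition needs its own parentheses
mask = (df['Age'] >= 30) & (df['Salary'] < 70000)
filtered = df[mask]
print(filtered)

# .loc takes a row selector (here a boolean mask) and a column selector together
subset = df.loc[df['Age'] > 25, ['Name', 'Salary']]
print(subset)
```

Python's plain `and`/`or` keywords do not work element-wise on a Series, which is why the operators `&` and `|` are used.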
Example 1: Analyzing Concrete Strength Test Results
Objective: Record the compressive strength of concrete samples tested over different days and analyze the data.
Solution:
• Record the compressive strength (in MPa) of concrete samples tested after curing for 7, 14, and 28 days.
• Use a Pandas DataFrame to store the data.
• Calculate the mean compressive strength for each curing period.
import pandas as pd
# Data: Compressive strength values (MPa) for 3 samples tested on different days
data = {
'Sample ID': ['S1', 'S2', 'S3'],
'7 Days': [18.5, 19.0, 18.0],
'14 Days': [24.0, 25.5, 24.5],
'28 Days': [32.0, 31.5, 33.0]
}
# Create DataFrame
df = pd.DataFrame(data)
# Set 'Sample ID' as the index
df.set_index('Sample ID', inplace=True)
# Calculate mean compressive strength for each testing period
mean_strength = df.mean()
print("Concrete Compressive Strength Data:\n", df)
print("\nAverage Compressive Strength (MPa) for each curing period:\n", mean_strength)
Expected Output:
Concrete Compressive Strength Data:
           7 Days  14 Days  28 Days
Sample ID
S1           18.5     24.0     32.0
S2           19.0     25.5     31.5
S3           18.0     24.5     33.0

Average Compressive Strength (MPa) for each curing period:
7 Days     18.500000
14 Days    24.666667
28 Days    32.166667
dtype: float64
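df.mean() averages down each column, one value per curing period. Passing axis=1 averages across each row instead, and DataFrame.div can express every reading relative to another column. A minimal sketch on the same concrete data; the ratio to the 28-day strength is an illustrative calculation, not part of the original exercise.

```python
import pandas as pd

data = {
    'Sample ID': ['S1', 'S2', 'S3'],
    '7 Days': [18.5, 19.0, 18.0],
    '14 Days': [24.0, 25.5, 24.5],
    '28 Days': [32.0, 31.5, 33.0],
}
df = pd.DataFrame(data).set_index('Sample ID')

# axis=1 averages across the columns, giving one value per sample (row)
sample_mean = df.mean(axis=1)

# Each reading as a fraction of that sample's 28-day strength:
# div(..., axis=0) divides every column by the '28 Days' Series, matching rows by label
ratio_to_28 = df.div(df['28 Days'], axis=0)

print(sample_mean)
print(ratio_to_28.round(2))
```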
Example 2: Structural Analysis of a Building Load
Objective: Calculate the total load on each floor of a building based on floor area and load per square meter.
Solution:
• Create a DataFrame where each row represents a floor with its area (in square meters) and load per square meter (in kN/m²).
• Calculate the total load on each floor and add it as a new column.
import pandas as pd

# Data: Floor area and load per square meter for each floor
data = {
'Floor': ['Ground', 'First', 'Second', 'Third', 'Fourth'],
'Area (m²)': [500, 400, 350, 300, 250],
'Load per m² (kN/m²)': [2.5, 2.2, 2.4, 2.3, 2.1]
}

# Create DataFrame
df = pd.DataFrame(data)

# Calculate total load for each floor

df['Total Load (kN)'] = df['Area (m²)'] * df['Load per m² (kN/m²)']

print("Building Load Analysis:\n", df)

Expected Output:
Building Load Analysis:
     Floor  Area (m²)  Load per m² (kN/m²)  Total Load (kN)
0   Ground        500                  2.5           1250.0
1    First        400                  2.2            880.0
2   Second        350                  2.4            840.0
3    Third        300                  2.3            690.0
4   Fourth        250                  2.1            525.0
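Once the per-floor totals are in a column, summary questions take one line each. A minimal sketch on the same data: the building total here is simply the sum of the values in the table, and the floor lookup uses idxmax() to find the row label of the largest total.

```python
import pandas as pd

df = pd.DataFrame({
    'Floor': ['Ground', 'First', 'Second', 'Third', 'Fourth'],
    'Area (m²)': [500, 400, 350, 300, 250],
    'Load per m² (kN/m²)': [2.5, 2.2, 2.4, 2.3, 2.1],
})
df['Total Load (kN)'] = df['Area (m²)'] * df['Load per m² (kN/m²)']

# Sum of the per-floor totals in the table
building_load = df['Total Load (kN)'].sum()

# idxmax() returns the row label of the largest value; .loc then reads that row's 'Floor'
heaviest = df.loc[df['Total Load (kN)'].idxmax(), 'Floor']

print("Total building load (kN):", round(building_load, 2))
print("Most heavily loaded floor:", heaviest)
```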
4. Summary and Best Practices
Key Points:
• A Series is a one-dimensional labeled array, suitable for representing a single column of data.
• A DataFrame is a two-dimensional, tabular data structure, useful for representing structured datasets.
Best Practices:
• Use appropriate data types to save memory.
• Leverage indexing in Series and DataFrames for efficient data selection.
• Clean and preprocess data before performing analyses.
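As an illustration of "use appropriate data types to save memory", here is a minimal sketch with made-up data. It converts a column of repeated strings to the 'category' dtype and downcasts small integers. The exact byte counts depend on the pandas version, so the sketch only compares them.

```python
import pandas as pd

# Made-up column with many repeated text values
cities = pd.Series(['Chicago', 'New York', 'Chicago', 'Boston'] * 1000)

# 'category' stores each distinct string once, plus a small integer code per row
cities_cat = cities.astype('category')

before = cities.memory_usage(deep=True)
after = cities_cat.memory_usage(deep=True)
print("original dtype bytes:", before)
print("category dtype bytes:", after)

# Non-negative integers that fit a smaller type can be downcast
counts = pd.Series([1, 2, 3, 250])
small_counts = pd.to_numeric(counts, downcast='unsigned')
print(small_counts.dtype)  # uint8
```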

The Essential Basic Functionality of Pandas
1. Reindexing and Altering Labels
• Reindexing conforms a DataFrame or Series to a new set of index labels: existing labels keep their data, new labels are filled with NaN (or a fill value you supply), and labels that are left out are dropped. Reindexing is critical when reshaping or merging data from different sources, because it aligns the data on a common index before any operations. It is particularly useful when working with time series data or merging multiple datasets with potentially mismatched indices.
• Altering labels means renaming rows or columns to give data attributes clearer, more descriptive names. This improves readability and makes data easier to work with, especially in multi-step analyses where column names need to be standardized.
Example
# Example: Reindexing
import pandas as pd

df = pd.DataFrame({'old_name': [1, 2, 3]}, index=['a', 'b', 'c'])
new_index = ['a', 'b', 'c', 'd']  # 'd' is a new label, so its row is filled with NaN
df = df.reindex(new_index)

# Renaming columns
df = df.rename(columns={'old_name': 'new_name'})
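reindex() and rename() take a few more options than the basic calls. A minimal sketch with a made-up daily sales Series shows fill_value for missing labels, reindexing to drop and reorder labels, and renaming row labels with a mapping.

```python
import pandas as pd

sales = pd.Series([120, 95, 130], index=['Mon', 'Tue', 'Wed'])

# Conform to a fuller set of labels; labels with no data get fill_value instead of NaN
week = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri']
sales_full = sales.reindex(week, fill_value=0)

# Reindexing can also drop and reorder labels
sales_subset = sales.reindex(['Wed', 'Mon'])

# rename() accepts a mapping for row labels as well as for columns
sales_renamed = sales.rename(index={'Mon': 'Monday'})

print(sales_full)
print(sales_subset)
print(sales_renamed)
```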
2. Head and Tail
• The head() and tail() methods display the first and last few rows (five by default) of a DataFrame, giving a quick snapshot of the data. This is essential for initial data exploration: you can confirm that the data loaded correctly and observe general characteristics (e.g., column names, data types, visible patterns).
Example
# Display the first 5 rows (a notebook shows the result; in a script, wrap it in print())
df.head(5)

# Display the last 5 rows
df.tail(5)
3. Binary Operations
• Binary operations perform element-wise arithmetic between two DataFrames or Series using operators such as +, -, *, and /. Before computing, Pandas aligns the operands on their index (and column) labels; a label present in only one operand produces NaN. Binary operations are fundamental wherever multiple data sources need to be combined or compared.
• For example, growth rates, differences, or ratios between two datasets can be calculated by aligning their indices and applying binary operations.
Example
# Element-wise addition
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [10, 20], 'B': [30, 40]})
result = df1 + df2  # A: 11, 22; B: 33, 44
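Alignment is easiest to see when the labels only partly overlap. A minimal sketch with two made-up monthly rainfall Series (the station names and values are hypothetical):

```python
import pandas as pd

# Monthly rainfall (mm) at two hypothetical stations; the months only partly overlap
station_a = pd.Series([80, 95, 60], index=['Jan', 'Feb', 'Mar'])
station_b = pd.Series([70, 85, 40], index=['Feb', 'Mar', 'Apr'])

# + aligns on labels first; a label missing from either side gives NaN
total = station_a + station_b

# The .add() method can treat a missing side as 0 instead
total_filled = station_a.add(station_b, fill_value=0)

print(total)
print(total_filled)
```

The same pattern applies to the other arithmetic methods (sub, mul, div), which also accept fill_value.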
4. Functional Statistics
• Functional statistics are built-in functions that summarize data, such as mean(), median(), std() (standard deviation), min(), max(), sum(), count(), and describe().
• They are foundational for exploring a dataset's distribution and central tendency, giving analysts a quick overview of its quantitative properties.
Example
import pandas as pd
df = pd.DataFrame({'column_name': [4, 8, 15, 16, 23, 42]})

# Get summary statistics (count, mean, std, min, quartiles, max)
df.describe()

# Calculate mean
df['column_name'].mean()
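When only a few specific statistics are needed, agg() computes them in one call. A minimal sketch on made-up concrete test data (the column names and values are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Strength (MPa)': [32.0, 31.5, 33.0, 30.5],
    'Slump (mm)': [75, 80, 70, 90],
})

# One row per statistic, one column per original column
stats = df.agg(['mean', 'median', 'min', 'max'])
print(stats)
```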
5. Function Application
• Function application lets you apply custom or built-in functions along the rows or columns of a DataFrame with apply(), or to every element with DataFrame.map() (called applymap() before pandas 2.1, where applymap() was deprecated).
• This is powerful for data transformation because it enables custom manipulations that go beyond the built-in functions, such as creating calculated fields, cleaning data, or performing custom analysis on each element or column.
Example
# Apply a function to each element of a column
import pandas as pd
df = pd.DataFrame({'column_name': [1, 2, 3]})
df['column_name'] = df['column_name'].apply(lambda x: x * 2)  # -> 2, 4, 6
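apply() works at two granularities: on a Series, the function receives one element at a time; on a DataFrame with axis=1, it receives one row at a time. A minimal sketch with hypothetical column names:

```python
import pandas as pd

df = pd.DataFrame({'length_m': [2.0, 3.5, 4.0], 'width_m': [1.0, 2.0, 2.5]})

# Series.apply: the function receives one element at a time
df['length_cm'] = df['length_m'].apply(lambda x: x * 100)

# DataFrame.apply with axis=1: the function receives one row (as a Series) at a time
df['area_m2'] = df.apply(lambda row: row['length_m'] * row['width_m'], axis=1)

print(df)
```

For a simple product like this one, the vectorized expression df['length_m'] * df['width_m'] gives the same result and is usually faster. apply(axis=1) is most useful when the per-row logic cannot be written as column arithmetic.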
6. Sorting
• Sorting organizes data by labels or by values: sort_index() sorts by the row (or, with axis=1, column) labels, and sort_values() sorts by the values in one or more columns.
• Sorting is useful for ranking data, creating ordered lists, and identifying top or bottom records, which are essential in exploratory data analysis and reporting.
Example
# Sort by column values (ascending by default)
import pandas as pd
df = pd.DataFrame({'column_name': [3, 1, 2]})
df = df.sort_values(by='column_name')
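A few common sorting variations, sketched on the beam loads from Series Example 4. The span values are hypothetical and were added only to provide a second sort key.

```python
import pandas as pd

beams = pd.DataFrame({
    'Beam': ['A', 'B', 'C', 'D', 'E'],
    'Load (kN)': [45, 50, 55, 60, 52],
    'Span (m)': [4, 5, 5, 6, 4],   # hypothetical spans
})

# Descending order by a single column
by_load = beams.sort_values(by='Load (kN)', ascending=False)

# Several keys: sort by span first, then by load within each span
by_span_then_load = beams.sort_values(by=['Span (m)', 'Load (kN)'])

# sort_index() restores the original row-label order
restored = by_load.sort_index()

# nlargest() is a shortcut for the top-n rows by one column
top2 = beams.nlargest(2, 'Load (kN)')
print(top2)
```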
7. Indexing and Selecting Data
• Indexing and selection extract specific rows, columns, or individual elements from a DataFrame or Series.
• Selection is done mainly with loc (label-based) and iloc (position-based) indexing.
• Working with only the relevant subset of a large dataset keeps the analysis focused and reduces the amount of data that later steps have to process.
Example
# Select by label with loc and by position with iloc
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['x', 'y', 'z'])
print(df.loc['y', 'B'])   # label-based: row 'y', column 'B' -> 5
print(df.iloc[0:2, 0])    # position-based: first two rows of the first column
8. Computational Tools
• Computational tools allow efficient data manipulation through vectorized operations and NumPy integration, which speed up data analysis.
• This includes arithmetic and aggregation applied to whole DataFrames and Series, across rows or columns.
• Vectorized operations are usually much faster than iterating over the data in Python, because the loop runs in compiled NumPy code instead of the Python interpreter.
Example
# Vectorized operation
import pandas as pd
df = pd.DataFrame({'column1': [1, 2, 3], 'column2': [4, 5, 6]})
df['new_column'] = df['column1'] * df['column2']  # row-by-row product: 4, 10, 18
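A minimal sketch showing that the vectorized expression computes the same thing as an explicit Python loop, and that NumPy functions accept a Series directly. No timings are shown here because they depend on data size and hardware.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'column1': [1.0, 4.0, 9.0], 'column2': [2.0, 3.0, 4.0]})

# Vectorized: one expression over whole columns; the loop runs inside NumPy
vectorized = df['column1'] * df['column2']

# The same result with an explicit Python loop, for comparison only;
# on large data this is typically much slower than the line above
looped = pd.Series(
    [a * b for a, b in zip(df['column1'], df['column2'])],
    index=df.index,
)

# NumPy universal functions (ufuncs) accept a Series and return a Series
roots = np.sqrt(df['column1'])
```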
9. Working with Missing Data
• Handling missing data uses methods such as isnull() to identify missing values, fillna() to replace them, and dropna() to drop them.
• Missing-data handling is essential to prevent errors in analysis, especially in statistical or machine learning applications, where missing values can affect model accuracy.
• Common approaches to NaNs are imputation (filling missing values with substitutes such as the mean or median) or dropping them, depending on the needs of the analysis.
Example
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': [1.0, np.nan, 3.0]})

# Option 1: replace missing values with 0
filled = df.fillna(0)

# Option 2: drop rows with missing values
dropped = df.dropna()
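A minimal sketch of the identify, impute, and drop steps on made-up sensor readings. Mean imputation is just one of the options the slide mentions; the column names are illustrative.

```python
import numpy as np
import pandas as pd

# Made-up sensor readings with gaps
df = pd.DataFrame({
    'Temp (°C)': [22.0, np.nan, 25.0, 24.0],
    'Humidity (%)': [60.0, 65.0, np.nan, np.nan],
})

# isnull() marks missing cells as True; sum() then counts them per column
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Impute each column with its own mean (fillna matches the Series labels to column names)
filled = df.fillna(df.mean())

# Drop only the rows where a particular column is missing
kept = df.dropna(subset=['Temp (°C)'])
```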
10. Advanced Uses of Pandas for Data Analysis
• Hierarchical indexing: a multi-level index (MultiIndex) lets you work with data hierarchically.
• This is especially useful for organizing and analyzing complex, high-dimensional datasets, because it lets analysts group data in a hierarchical structure.
• It is commonly used with time-series data where multiple variables are recorded at different times and indexed hierarchically.
Example
# Setting a hierarchical index from two existing columns
import pandas as pd
df = pd.DataFrame({'level1': ['A', 'A', 'B'], 'level2': [1, 2, 1], 'value': [10, 20, 30]})
df.set_index(['level1', 'level2'], inplace=True)

Panel Data: Pandas deprecated the Panel class (in version 0.20) and removed it (in version 0.25). Multi-dimensional data can instead be handled with DataFrames that use hierarchical indexing, or with the xarray library for more complex data.
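Once a hierarchical index is set, selection and aggregation can target one level at a time. A minimal sketch with hypothetical rainfall readings from two stations on two dates:

```python
import pandas as pd

# Hypothetical readings: two stations, two dates each
df = pd.DataFrame({
    'station': ['A', 'A', 'B', 'B'],
    'date': ['2024-11-01', '2024-11-02', '2024-11-01', '2024-11-02'],
    'rain_mm': [12.0, 0.0, 8.5, 3.0],
})
df = df.set_index(['station', 'date'])  # two-level (hierarchical) row index

# All rows for station 'A', selected on the outer level
station_a = df.loc['A']

# A single value, selected with an (outer, inner) tuple plus a column label
one_reading = df.loc[('B', '2024-11-02'), 'rain_mm']

# Aggregate over one level of the index
total_by_station = df.groupby(level='station')['rain_mm'].sum()
print(total_by_station)
```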
Q&A

No ratings yet
Setup Instructions
3 pages
BTCS-602 - (IOT) Internet of Things Notes
No ratings yet
BTCS-602 - (IOT) Internet of Things Notes
119 pages
Ge2155 Set 4
No ratings yet
Ge2155 Set 4
7 pages
Pharmaceutical Manufacturing Broucher
No ratings yet
Pharmaceutical Manufacturing Broucher
11 pages
