
Python 2.1.2

The document provides an overview of data manipulation using the Pandas library in Python, detailing its main data structures, Series and DataFrames. It covers data indexing and selection, operations on data, handling missing data, hierarchical indexing, and methods for combining datasets using concat() and append(). Key concepts include creating Series and DataFrames, performing arithmetic operations, detecting and filling missing values, and utilizing multi-level indexing.

Uploaded by

hritikp266

2. Data Manipulation with Pandas: Introducing Pandas Objects, Data Indexing and Selection,
Operating on Data in Pandas, Handling Missing Data, Hierarchical Indexing, Combining Datasets:
Concat and Append.

1. Introducing Pandas Objects

Pandas is a powerful and widely used library in Python for data manipulation and analysis. It
provides two main data structures:

1. Series: A one-dimensional labeled array, similar to a list, that can hold data of any
type (integers, strings, floats, etc.).
2. DataFrame: A two-dimensional labeled data structure, similar to a table in a
database, an Excel spreadsheet, or a dictionary of Series objects. It has both rows and
columns with labels.

Creating a Pandas Series

A Series can be created from a list, a NumPy array, or a dictionary. Here's an example of creating
a Series from a Python list:

import pandas as pd

# Create a Series from a list

data = [10, 20, 30, 40, 50]
series = pd.Series(data)

print(series)

Output:

0    10
1    20
2    30
3    40
4    50
dtype: int64

The index is automatically assigned as integers starting from 0.
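The default 0-based index can be replaced with your own labels. A dictionary can also serve directly as the source, in which case its keys become the index. The following is a minimal sketch of both options; the labels 'a', 'b', 'c', 'x', 'y' are arbitrary illustrative values.

```python
import pandas as pd

# Supply explicit labels instead of the default 0, 1, 2, ...
labeled = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(labeled['b'])  # label-based lookup of the value 20

# Build a Series from a dictionary: the keys become the index labels
from_dict = pd.Series({'x': 1.5, 'y': 2.5})
print(from_dict)
```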

Creating a DataFrame

A DataFrame can be created from a dictionary, lists, or NumPy arrays. Here's an example of
creating a DataFrame from a dictionary:

import pandas as pd

# Create a DataFrame from a dictionary

data = {
'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [24, 30, 35],
'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
print(df)

Output:

      Name  Age         City
0    Alice   24     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago

The DataFrame has both row labels (index) and column labels (column names).
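Lists and NumPy arrays carry no column names of their own, so you pass them through the columns parameter. You can also pass row labels through index. Both label sets can then be inspected directly. The following is a small sketch; the names 'X', 'Y', 'r1' to 'r3' are made up for illustration.

```python
import numpy as np
import pandas as pd

# From a list of row lists: column names are supplied separately
rows = [['Alice', 24], ['Bob', 30]]
df_rows = pd.DataFrame(rows, columns=['Name', 'Age'])

# From a 2-D NumPy array, with custom row labels as well
arr = np.array([[1, 2], [3, 4], [5, 6]])
df_arr = pd.DataFrame(arr, columns=['X', 'Y'], index=['r1', 'r2', 'r3'])

print(df_arr.index)    # the row labels
print(df_arr.columns)  # the column labels
print(df_arr.shape)    # (number of rows, number of columns)
```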

2. Data Indexing and Selection

Pandas provides multiple ways to select and index data from Series and DataFrames.

Selecting Data from a DataFrame

- Selecting a single column: You can access a column by using the column name.

# Select a single column

print(df['Name'])

Output:

0      Alice
1        Bob
2    Charlie
Name: Name, dtype: object

- Selecting multiple columns: Use a list of column names.

# Select multiple columns

print(df[['Name', 'Age']])

Output:

      Name  Age
0    Alice   24
1      Bob   30
2  Charlie   35

Selecting Rows by Index

You can select rows using .loc[] and .iloc[]:

- .iloc[] is used for integer-location based indexing (by position).

- .loc[] is used for label-based indexing.

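# Selecting by position (integer-based)

print(df.iloc[1])  # Select the second row (position 1)
# Selecting by label (index-based)
print(df.loc[1])   # Select the row with index label 1

Output (printed once for each call):

Name            Bob
Age              30
City    Los Angeles
Name: 1, dtype: object

The two calls return the same row here only because the default index labels 0, 1, 2 coincide with the row positions.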
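The difference between .iloc[] and .loc[] becomes visible once the index labels no longer match the positions. The following sketch uses a hypothetical 'Score' column with labels 10, 20, 30. It shows that position 1 and label 20 refer to the same row, and that the two indexers treat slice endpoints differently.

```python
import pandas as pd

# Hypothetical data whose index labels do NOT match the row positions
df = pd.DataFrame({'Score': [88, 92, 75]}, index=[10, 20, 30])

print(df.iloc[1])   # position 1 -> the row labeled 20
print(df.loc[20])   # label 20 -> the same row
# df.loc[1] would raise KeyError, because no row is labeled 1

# Slices differ too: iloc excludes the stop position, loc includes the stop label
print(df.iloc[0:2])   # positions 0 and 1 (labels 10 and 20)
print(df.loc[10:20])  # labels 10 through 20, inclusive
```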

3. Operating on Data in Pandas

Once you have selected data, Pandas allows you to perform a variety of operations.

Arithmetic Operations

Pandas supports arithmetic operations like addition, subtraction, multiplication, and division.
These operations can be performed element-wise on Series or DataFrames.

# Create a DataFrame
data = {'A': [10, 20, 30], 'B': [5, 15, 25]}
df = pd.DataFrame(data)

# Add 10 to each element

df = df + 10
print(df)

Output:

    A   B
0  20  15
1  30  25
2  40  35
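When both operands are pandas objects, "element-wise" means elements are paired by index label, not by position. Labels that appear in only one operand produce NaN. The method forms such as .add() accept a fill_value to use instead. This is a minimal sketch with made-up labels.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

# Values are matched by label: 'b' with 'b', 'c' with 'c'
total = s1 + s2
print(total)  # 'a' and 'd' exist in only one Series, so they become NaN

# The method form treats a missing operand as fill_value instead
print(s1.add(s2, fill_value=0))
```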

Applying Functions

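You can apply functions element-wise or column-wise using .apply(): Series.apply() calls the function on each value, while DataFrame.apply() calls it on each column (or on each row with axis=1).

# Apply a function to each value in column 'A'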

df['A'] = df['A'].apply(lambda x: x * 2)
print(df)

Output:

    A   B
0  40  15
1  60  25
2  80  35

In this example, the function lambda x: x * 2 was applied to every value in the 'A' column.

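Calling .apply() on a whole DataFrame passes one column at a time (as a Series) by default. Passing axis=1 sends one row at a time instead. The following sketch reuses the values from the example above; the particular functions (range per column, sum per row) are just illustrations.

```python
import pandas as pd

df = pd.DataFrame({'A': [40, 60, 80], 'B': [15, 25, 35]})

# Default axis=0: the lambda receives each column as a Series
col_range = df.apply(lambda col: col.max() - col.min())
print(col_range)

# axis=1: the lambda receives each row as a Series
row_sum = df.apply(lambda row: row['A'] + row['B'], axis=1)
print(row_sum)
```
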
4. Handling Missing Data

Missing data is common in real-world datasets. Pandas provides powerful tools for detecting,
removing, or replacing missing data.

Detecting Missing Data

Use isnull() to detect missing values and notnull() for the opposite.

import numpy as np
import pandas as pd

# Create a DataFrame with missing data (NaN)

data = {'Name': ['Alice', 'Bob', np.nan], 'Age': [24, np.nan, 35]}
df = pd.DataFrame(data)

# Check for missing data

print(df.isnull())

Output:

    Name    Age
0  False  False
1  False   True
2   True  False
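The Boolean mask from isnull() is often summarized rather than printed in full. Because True counts as 1, summing the mask gives the number of missing values per column. Combining it with .any(axis=1) picks out the affected rows. This is a sketch on the same example data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', np.nan], 'Age': [24, np.nan, 35]})

# Count NaNs in each column (True is summed as 1)
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Keep only the rows that contain at least one missing value
rows_with_missing = df[df.isnull().any(axis=1)]
print(rows_with_missing)
```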

Filling Missing Data

You can fill missing values using .fillna().

# Fill missing data with a default value

df_filled = df.fillna({'Name': 'Unknown', 'Age': df['Age'].mean()})
print(df_filled)

Output:

      Name   Age
0    Alice  24.0
1      Bob  29.5
2  Unknown  35.0

Here, missing values in the Name column are filled with 'Unknown', and missing values in the
Age column are filled with the mean of the Age column.

Dropping Missing Data

You can drop rows or columns that contain missing data using .dropna().

# Drop rows with missing data

df_dropped = df.dropna()
print(df_dropped)

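Output:

    Name   Age
0  Alice  24.0

Only row 0 remains: row 1 is missing its Age and row 2 is missing its Name, and by default dropna() removes every row that contains at least one NaN.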
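dropna() can be narrowed in a few ways. subset limits the check to certain columns, axis=1 drops columns instead of rows, and how='all' drops only rows that are entirely missing. The following sketch adds a hypothetical complete 'Id' column so that the axis=1 case leaves something behind.

```python
import numpy as np
import pandas as pd

# 'Id' is an illustrative column with no missing values
df = pd.DataFrame({'Id': [1, 2, 3],
                   'Name': ['Alice', 'Bob', np.nan],
                   'Age': [24, np.nan, 35]})

# Drop a row only if 'Age' is missing; NaNs elsewhere are ignored
print(df.dropna(subset=['Age']))

# Drop every column (axis=1) that contains any missing value
print(df.dropna(axis=1))

# Drop a row only if all of its values are missing
print(df.dropna(how='all'))
```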

5. Hierarchical Indexing

Hierarchical indexing allows you to have multiple levels of indexing, which can be helpful
when working with more complex data structures.

Creating a Hierarchical Index

You can create a multi-level index by passing a list of arrays to pd.MultiIndex.from_arrays().

# Create a DataFrame with multi-level index

arrays = [['A', 'A', 'B', 'B'], [1, 2, 1, 2]]
index = pd.MultiIndex.from_arrays(arrays, names=('Letter', 'Number'))

df = pd.DataFrame({'Data': [10, 20, 30, 40]}, index=index)

print(df)

Output:

               Data
Letter Number
A      1         10
       2         20
B      1         30
       2         40
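When the index is every combination of a few values, pd.MultiIndex.from_product() builds the same index without writing out the repeated arrays. One practical use of the extra level is reshaping. unstack() moves the inner level into the columns and turns the long table into a grid. The following is a sketch on the example data above.

```python
import pandas as pd

# All combinations: ('A', 1), ('A', 2), ('B', 1), ('B', 2)
index = pd.MultiIndex.from_product([['A', 'B'], [1, 2]], names=['Letter', 'Number'])
df = pd.DataFrame({'Data': [10, 20, 30, 40]}, index=index)

# Move the inner 'Number' level into the columns: rows A, B x columns 1, 2
wide = df['Data'].unstack()
print(wide)
```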

Selecting Data with Multi-level Index

You can use .loc[] to access data in a multi-level index DataFrame.

# Select data for 'A' with Number 2

print(df.loc[('A', 2)])

Output:

Data    20
Name: (A, 2), dtype: int64
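A full tuple is not the only way in. Passing just the outer label returns every row beneath it. The xs() method selects on an inner level by name. Adding a column name after the tuple returns a single scalar. This sketch rebuilds the example DataFrame so it runs on its own.

```python
import pandas as pd

arrays = [['A', 'A', 'B', 'B'], [1, 2, 1, 2]]
index = pd.MultiIndex.from_arrays(arrays, names=('Letter', 'Number'))
df = pd.DataFrame({'Data': [10, 20, 30, 40]}, index=index)

# Outer level only: all rows under 'A', now indexed by 'Number'
print(df.loc['A'])

# Inner level: every row whose 'Number' is 1
print(df.xs(1, level='Number'))

# Row tuple plus column name gives one value
print(df.loc[('B', 2), 'Data'])
```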

6. Combining Datasets: Concat and Append

Pandas provides the concat() function and, in versions before 2.0, the DataFrame.append() method to
combine data from different DataFrames.

Using concat() to Combine DataFrames

The concat() function can concatenate DataFrames along rows or columns.

# Concatenate DataFrames along rows

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

df_combined = pd.concat([df1, df2], ignore_index=True)

print(df_combined)

Output:

   A  B
0  1  3
1  2  4
2  5  7
3  6  8
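Along columns, concat() takes axis=1 and places the frames side by side, aligning rows on their index. The keys parameter adds an outer index level that records which input each row came from. In this sketch df2 is redefined with different column names, C and D, purely for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'C': [5, 6], 'D': [7, 8]})  # illustrative columns C, D

# axis=1: side by side, rows matched on the index (0 and 1)
side_by_side = pd.concat([df1, df2], axis=1)
print(side_by_side)

# keys=: label each block with an outer index level
labeled = pd.concat([df1, df1], keys=['first', 'second'])
print(labeled)
```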

Using append() to Add Rows to DataFrame

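The DataFrame.append() method was another way to add rows to a DataFrame, but it was deprecated in
pandas 1.4 and removed in pandas 2.0. In current versions, use concat(), which is also more
efficient and flexible.

# Append rows to a DataFrame
# DataFrame.append() no longer exists in pandas >= 2.0, so pd.concat() is used here.
# On pandas < 2.0 the original call still works:
#     df_appended = df1.append(df3, ignore_index=True)

df3 = pd.DataFrame({'A': [9, 10], 'B': [11, 12]})
df_appended = pd.concat([df1, df3], ignore_index=True)
print(df_appended)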

Output:

    A   B
0   1   3
1   2   4
2   9  11
3  10  12
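The efficiency advantage of concat() matters most in loops. Each concatenation builds a new DataFrame and copies the existing rows, so growing a frame one row at a time gets roughly quadratically slower. A common pattern is to collect the pieces in a plain Python list and concatenate once at the end. In this sketch, records is a hypothetical stream of rows.

```python
import pandas as pd

# Hypothetical records arriving one at a time
records = [{'A': 9, 'B': 11}, {'A': 10, 'B': 12}]

# Accumulate pieces in an ordinary list (list.append, not DataFrame.append)
pieces = [pd.DataFrame({'A': [1, 2], 'B': [3, 4]})]
for rec in records:
    pieces.append(pd.DataFrame([rec]))  # one-row DataFrame per record

# Concatenate once, instead of rebuilding the DataFrame on every iteration
result = pd.concat(pieces, ignore_index=True)
print(result)
```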

Summary of Key Concepts:

1. Pandas Objects: Series and DataFrames are the primary data structures.
2. Data Indexing and Selection: Pandas allows easy indexing and selection of data
using labels and positions.
3. Operating on Data: Element-wise operations and functions can be applied to Series
and DataFrames.
4. Handling Missing Data: Missing data can be detected, filled, or dropped.
5. Hierarchical Indexing: Pandas supports multi-level indexes to handle complex data.
6. Combining Datasets: Pandas provides concat() to combine multiple DataFrames; the older
append() method was removed in pandas 2.0.

Questions:

1. What are the two main data structures in Pandas, and how do they differ?
2. How can you fill missing values in a Pandas DataFrame with a default value or a
calculated value (like the mean)?
3. What is hierarchical indexing in Pandas, and how is it useful?
4. How do you access data from a multi-level indexed DataFrame in Pandas?
5. What is the difference between the concat() and append() functions in Pandas?
6. How do you concatenate DataFrames along rows using concat() in Pandas?
7. Explain how to add rows to an existing DataFrame using the append() function in
Pandas.

Pandas Handbook
No ratings yet
Pandas Handbook
33 pages
How To Use VLOOKUP in Excel For Dummies (2022 Tutorial)
No ratings yet
How To Use VLOOKUP in Excel For Dummies (2022 Tutorial)
9 pages
Higher Nationals in Computing: WEBG301: WEB Project Assignment
No ratings yet
Higher Nationals in Computing: WEBG301: WEB Project Assignment
62 pages
Training Report
86% (7)
Training Report
88 pages
Define Technical Settings For All Involved Systems: Prerequisites
No ratings yet
Define Technical Settings For All Involved Systems: Prerequisites
2 pages
Pandas Introduction: What Is Python Pandas Used For?
No ratings yet
Pandas Introduction: What Is Python Pandas Used For?
28 pages
Data Handling Using Pandas-1
No ratings yet
Data Handling Using Pandas-1
60 pages
IP 12th Chapter 3
No ratings yet
IP 12th Chapter 3
9 pages
Python Programming For Data Science
No ratings yet
Python Programming For Data Science
36 pages
Pandas
No ratings yet
Pandas
7 pages
FDS Module 2 Notes
No ratings yet
FDS Module 2 Notes
24 pages
Python Unit 3 4
No ratings yet
Python Unit 3 4
92 pages
Pandas
No ratings yet
Pandas
26 pages
Dataframe Ip
No ratings yet
Dataframe Ip
75 pages
Pandas
No ratings yet
Pandas
27 pages
Introduction To Pandas in Data Analytics
No ratings yet
Introduction To Pandas in Data Analytics
12 pages
Pandas: Import
100% (1)
Pandas: Import
13 pages
Eda Unit 2
No ratings yet
Eda Unit 2
65 pages
Unit 3
No ratings yet
Unit 3
10 pages
The Pandas Library
No ratings yet
The Pandas Library
39 pages
Python Data Frame New
No ratings yet
Python Data Frame New
32 pages
Exp 6
No ratings yet
Exp 6
9 pages
05getting Started With Pandas
No ratings yet
05getting Started With Pandas
44 pages
Pandas Notes
No ratings yet
Pandas Notes
44 pages
Pandas
No ratings yet
Pandas
4 pages
Lab-3 Pandas Library
No ratings yet
Lab-3 Pandas Library
14 pages
7 Days Analytics Course 3feiz7 4
No ratings yet
7 Days Analytics Course 3feiz7 4
8 pages
Unit 4
No ratings yet
Unit 4
36 pages
Phan1 Pandas Numpy Matplotlib
No ratings yet
Phan1 Pandas Numpy Matplotlib
158 pages
Pandas
No ratings yet
Pandas
63 pages
Python Unit Iv - Pandas
No ratings yet
Python Unit Iv - Pandas
36 pages
Pandas
No ratings yet
Pandas
94 pages
Data Frames
No ratings yet
Data Frames
60 pages
Python Pandas Dataframe: Parameter & Description
No ratings yet
Python Pandas Dataframe: Parameter & Description
12 pages
Getting Start With Pandas
No ratings yet
Getting Start With Pandas
11 pages
EDA Unit2
No ratings yet
EDA Unit2
99 pages
Pandas Tutorial
No ratings yet
Pandas Tutorial
9 pages
Lab 9
No ratings yet
Lab 9
9 pages
Unit-4Introduction To Pandas
No ratings yet
Unit-4Introduction To Pandas
44 pages
Pandas
No ratings yet
Pandas
13 pages
DataFrame Ac Win Final
No ratings yet
DataFrame Ac Win Final
30 pages
Python Pandas Presentation
No ratings yet
Python Pandas Presentation
32 pages
Python Pandas-Data Frames
No ratings yet
Python Pandas-Data Frames
41 pages
DAP 3 Module
No ratings yet
DAP 3 Module
62 pages
Unit III - Pandas - Data Manipulation Using Python
No ratings yet
Unit III - Pandas - Data Manipulation Using Python
15 pages
Pandas Class 12 Ncertttt
No ratings yet
Pandas Class 12 Ncertttt
48 pages
Data Handling Using Pandas-I-ORG
No ratings yet
Data Handling Using Pandas-I-ORG
44 pages
Data Wrangling and Analysis
100% (1)
Data Wrangling and Analysis
36 pages
Pandas
No ratings yet
Pandas
5 pages
Data Handing Using Pandas-I
100% (2)
Data Handing Using Pandas-I
46 pages
SBLC 1
No ratings yet
SBLC 1
23 pages
Session2-DM Using Pandas
No ratings yet
Session2-DM Using Pandas
51 pages
Python Pandas Demo PDF
100% (2)
Python Pandas Demo PDF
23 pages
JOINS
No ratings yet
JOINS
10 pages
Unit 4 DSE
No ratings yet
Unit 4 DSE
9 pages
Loki Temp PPT Pandas 2
No ratings yet
Loki Temp PPT Pandas 2
31 pages
Class 12 Practical File
No ratings yet
Class 12 Practical File
29 pages
3Y3Z2Xzqn7 U Y%K : 2. How To Create A Data Frame Using A Dictionary of Pre-Existing Columns or Numpy 2D Arrays?
No ratings yet
3Y3Z2Xzqn7 U Y%K : 2. How To Create A Data Frame Using A Dictionary of Pre-Existing Columns or Numpy 2D Arrays?
8 pages
Lecture 5
No ratings yet
Lecture 5
36 pages
Unit3 - 3) Pandas - Ipynb - Colab
No ratings yet
Unit3 - 3) Pandas - Ipynb - Colab
11 pages
Pandas Tutorial
No ratings yet
Pandas Tutorial
7 pages
Python Notes by Prof T
No ratings yet
Python Notes by Prof T
10 pages
Pandas DataFrame
No ratings yet
Pandas DataFrame
70 pages
The Essential R Reference
From Everand
The Essential R Reference
Mark Gardener
No ratings yet
SQL - Quick Guide
No ratings yet
SQL - Quick Guide
163 pages
Sa 3
No ratings yet
Sa 3
3 pages
9-MongoDB Limitations
No ratings yet
9-MongoDB Limitations
6 pages
Unit 9 Library and Information Networks and Consortia: 9.0 Objectives
No ratings yet
Unit 9 Library and Information Networks and Consortia: 9.0 Objectives
20 pages
The Operational Data Store - Tactical Analysis at Your Fingertips
86% (7)
The Operational Data Store - Tactical Analysis at Your Fingertips
64 pages
SPDD and SPAU
No ratings yet
SPDD and SPAU
5 pages
ITS 204 Midterm Exam
No ratings yet
ITS 204 Midterm Exam
16 pages
Micro Project Report: (Your Guide Name)
No ratings yet
Micro Project Report: (Your Guide Name)
16 pages
En Options Current GIM Book
No ratings yet
En Options Current GIM Book
120 pages
MySQL - Learn Data Analytics Together's Group
No ratings yet
MySQL - Learn Data Analytics Together's Group
96 pages
Library Management System
No ratings yet
Library Management System
3 pages
Resume AWS1
No ratings yet
Resume AWS1
2 pages
Terminal Examinationspring2021: Only For Teacher'S Use: Q. No. Marks Obtained 1 2 3
No ratings yet
Terminal Examinationspring2021: Only For Teacher'S Use: Q. No. Marks Obtained 1 2 3
13 pages
T SQL
No ratings yet
T SQL
8 pages
Drishti
No ratings yet
Drishti
12 pages
I.P. College, Campus-Ii, Bulandshahr: "BCA Previous Year Paper Management"
No ratings yet
I.P. College, Campus-Ii, Bulandshahr: "BCA Previous Year Paper Management"
23 pages
NATCAT Team TORS - Compressed
No ratings yet
NATCAT Team TORS - Compressed
14 pages
SQL Capstone Project
No ratings yet
SQL Capstone Project
12 pages
Quiz6 Solution PDF
No ratings yet
Quiz6 Solution PDF
3 pages
PL-SQL Questions and Answers
No ratings yet
PL-SQL Questions and Answers
10 pages
DBA Notes
67% (3)
DBA Notes
102 pages
Read Smart Card Chip Data With APDU Commands ISO 7816
0% (1)
Read Smart Card Chip Data With APDU Commands ISO 7816
1 page
Create A User, Grant Permission and Alter Its Password
No ratings yet
Create A User, Grant Permission and Alter Its Password
28 pages
ITP4903 Laboratory 8 (v2.1 - LWL) - Answer Sheet
No ratings yet
ITP4903 Laboratory 8 (v2.1 - LWL) - Answer Sheet
4 pages
Advanced Programming Using Visual Basic 2008 4th Edition by Julia Case Bradley, Anita Millspaugh ISBN 0073517224 9780073517223
100% (7)
Advanced Programming Using Visual Basic 2008 4th Edition by Julia Case Bradley, Anita Millspaugh ISBN 0073517224 9780073517223
81 pages
Nagaramesh Puligeti
No ratings yet
Nagaramesh Puligeti
2 pages