Python 2.1.2
Data Manipulation with Pandas: Introducing Pandas Objects, Data Indexing and Selection,
Operating on Data in Pandas, Handling Missing Data, Hierarchical Indexing, Combining Datasets:
Concat and Append.
1. Introducing Pandas Objects
Pandas is a powerful and widely used library in Python for data manipulation and analysis. It
provides two main data structures:
1. Series: A one-dimensional labeled array, similar to a list, that can hold data of any
type (integers, strings, floats, etc.).
2. DataFrame: A two-dimensional labeled data structure, similar to a table in a
database, an Excel spreadsheet, or a dictionary of Series objects. It has both rows and
columns with labels.
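Creating a Series
A Series can be created from a list, a NumPy array, or a dictionary. Here's an example of
creating a Series from a Python list:
import pandas as pd
# Build a Series from a Python list; pandas assigns the default integer index 0 to 4
series = pd.Series([10, 20, 30, 40, 50])
print(series)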
Output:
0 10
1 20
2 30
3 40
4 50
dtype: int64
Creating a DataFrame
A DataFrame can be created from a dictionary, lists, or NumPy arrays. Here's an example of
creating a DataFrame from a dictionary:
import pandas as pd
The DataFrame has both row labels (index) and column labels (column names).
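As a minimal sketch of the dictionary approach: each key becomes a column label and each list supplies that column's values. The Name and Age values follow the outputs shown in the selection examples of these notes; the City values for Alice and Charlie are placeholders assumed for illustration, since only Bob's 'Los Angeles' appears in the notes.

```python
import pandas as pd

# Each dictionary key becomes a column; each list holds that column's values.
# 'New York' and 'Chicago' are assumed placeholder values.
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [24, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
}
df = pd.DataFrame(data)

print(df)
print(df.index.tolist())    # row labels (default integer index)
print(df.columns.tolist())  # column labels
```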
2. Data Indexing and Selection
Pandas provides multiple ways to select and index data from Series and DataFrames.
Selecting a single column: You can access a column by using the column name.
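A minimal sketch of this, assuming a DataFrame df with Name, Age, and City columns (the City values other than 'Los Angeles' are assumed placeholders):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [24, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],  # placeholders except 'Los Angeles'
})

# Bracket indexing with a single column name returns that column as a Series
names = df['Name']
print(names)
```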
Output:
0 Alice
1 Bob
2 Charlie
Name: Name, dtype: object
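Selecting multiple columns: passing a list of column names inside the brackets returns a DataFrame that contains only those columns. A minimal sketch, assuming a DataFrame df with Name, Age, and City columns (City values other than 'Los Angeles' are assumed placeholders):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [24, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],  # placeholders except 'Los Angeles'
})

# A list of column names (note the double brackets) returns a DataFrame
subset = df[['Name', 'Age']]
print(subset)
```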
Output:
Name Age
0 Alice 24
1 Bob 30
2 Charlie 35
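Selecting a row: .loc selects by index label and .iloc selects by integer position. With the default integer index the two coincide, so either accessor returns the row shown in the output that follows. A minimal sketch using .loc, again with assumed placeholder City values other than 'Los Angeles':

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [24, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],  # placeholders except 'Los Angeles'
})

# Select the row whose index label is 1; the result is a Series
row = df.loc[1]
print(row)

# With the default integer index, position 1 is the same row
same_row = df.iloc[1]
```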
Output:
Name Bob
Age 30
City Los Angeles
Name: 1, dtype: object
3. Operating on Data in Pandas
Once you have selected data, Pandas allows you to perform a variety of operations.
Arithmetic Operations
Pandas supports arithmetic operations like addition, subtraction, multiplication, and division.
These operations can be performed element-wise on Series or DataFrames.
# Create a DataFrame
data = {'A': [10, 20, 30], 'B': [5, 15, 25]}
df = pd.DataFrame(data)
# Add 10 to every element (element-wise) and keep the result
df = df + 10
print(df)
Output:
A B
0 20 15
1 30 25
2 40 35
Applying Functions
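The apply() method calls a function on each element of a Series. A minimal sketch, assuming df already holds the values after adding 10 (A = 20, 30, 40 and B = 15, 25, 35), which is what the output implies:

```python
import pandas as pd

df = pd.DataFrame({'A': [10, 20, 30], 'B': [5, 15, 25]})
df = df + 10  # state assumed from the arithmetic example: A = 20, 30, 40

# Double every value in column 'A'; column 'B' is left unchanged
df['A'] = df['A'].apply(lambda x: x * 2)
print(df)
```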
Output:
A B
0 40 15
1 60 25
2 80 35
In this example, the function lambda x: x * 2 was applied to the 'A' column.
4. Handling Missing Data
Missing data is common in real-world datasets. Pandas provides powerful tools for detecting,
removing, or replacing missing data.
Use isnull() to detect missing values and notnull() for the opposite.
import numpy as np
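A minimal sketch of detection, assuming a small DataFrame with one missing Name and one missing Age (the layout that the output implies):

```python
import numpy as np
import pandas as pd

# Assumed example data: row 2 has no Name, row 1 has no Age
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', np.nan],
    'Age': [24, np.nan, 35],
})

# isnull() returns a Boolean DataFrame: True where a value is missing
mask = df.isnull()
print(mask)

# notnull() is the element-wise opposite
present = df.notnull()
```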
Output:
Name Age
0 False False
1 False True
2 True False
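Filling missing values: fillna() replaces missing entries with a constant or with a computed value such as a column mean. A minimal sketch, assuming the same layout as in the output above (one missing Name, one missing Age):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', np.nan],
    'Age': [24, np.nan, 35],
})

# Replace a missing Name with a fixed default
df['Name'] = df['Name'].fillna('Unknown')
# Replace a missing Age with the mean of the available ages; mean() skips NaN
df['Age'] = df['Age'].fillna(df['Age'].mean())
print(df)
```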
Output:
Name Age
0 Alice 24.0
1 Bob 29.5
2 Unknown 35.0
Here, missing values in the Name column are filled with 'Unknown', and missing values in the
Age column are filled with the mean of the Age column.
You can drop rows or columns that contain missing data using .dropna().
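A minimal sketch, assuming a DataFrame in which only Bob's Age is missing (which is what the output implies). By default dropna() drops rows; passing axis=1 drops columns instead:

```python
import numpy as np
import pandas as pd

# Assumed example data: only Bob's Age is missing
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [24, np.nan, 35],
})

# Drop every row that contains at least one missing value
cleaned = df.dropna()
print(cleaned)

# Drop every column that contains at least one missing value
complete_columns = df.dropna(axis=1)
```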
Output:
Name Age
0 Alice 24.0
2 Charlie 35.0
5. Hierarchical Indexing
Hierarchical indexing allows you to have multiple levels of indexing, which can be helpful
when working with more complex data structures.
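A minimal sketch of a two-level index, assuming the level names Letter and Number and the Data values that appear in the output; pd.MultiIndex.from_tuples() is one of several ways to build such an index:

```python
import pandas as pd

# Each tuple is one row label: (Letter, Number)
index = pd.MultiIndex.from_tuples(
    [('A', 1), ('A', 2), ('B', 1), ('B', 2)],
    names=['Letter', 'Number'],
)
df = pd.DataFrame({'Data': [10, 20, 30, 40]}, index=index)
print(df)
```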
Output:
               Data
Letter Number
A      1         10
       2         20
B      1         30
       2         40
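Accessing data: with a multi-level index, passing a tuple with one value per level to .loc selects a single row, while passing only the outer label selects every row under it. A minimal sketch, assuming the two-level (Letter, Number) index shown in the output above:

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('A', 1), ('A', 2), ('B', 1), ('B', 2)],
    names=['Letter', 'Number'],
)
df = pd.DataFrame({'Data': [10, 20, 30, 40]}, index=index)

# One value per level selects a single row, returned as a Series
row = df.loc[('A', 2)]
print(row)

# Only the outer level selects all rows for Letter 'A'
group_a = df.loc['A']
```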
Output:
Data 20
Name: (A, 2), dtype: int64
6. Combining Datasets: Concat and Append
Pandas provides concat() and, in versions before 2.0, append() to combine data from different
DataFrames.
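A minimal sketch of concatenating along rows with concat(). The frame names df1 and df2 and their values are assumed from the output that follows; ignore_index=True renumbers the rows 0 to 3 instead of repeating 0, 1, 0, 1:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

# Stack df2 below df1 (axis=0 is the default) and rebuild the index
result = pd.concat([df1, df2], ignore_index=True)
print(result)
```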
Output:
A B
0 1 3
1 2 4
2 5 7
3 6 8
The append() method was another way to add rows to a DataFrame. It was deprecated in pandas 1.4
and removed in pandas 2.0; concat() is more efficient and flexible, and it is the replacement in
current versions.
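Adding rows, sketched with concat() so that it also runs on pandas 2.0 and later, where DataFrame.append() is not available. On older versions the equivalent call was df1.append(df3, ignore_index=True). The frame names df1 and df3 and their values are assumed from the output that follows:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df3 = pd.DataFrame({'A': [9, 10], 'B': [11, 12]})

# Legacy form (pandas < 2.0 only): df1.append(df3, ignore_index=True)
# Equivalent that works on current versions:
result = pd.concat([df1, df3], ignore_index=True)
print(result)
```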
Output:
A B
0 1 3
1 2 4
2 9 11
3 10 12
Summary:
1. Pandas Objects: Series and DataFrames are the primary data structures.
2. Data Indexing and Selection: Pandas allows easy indexing and selection of data
using labels and positions.
3. Operating on Data: Element-wise operations and functions can be applied to Series
and DataFrames.
4. Handling Missing Data: Missing data can be detected, filled, or dropped.
5. Hierarchical Indexing: Pandas supports multi-level indexes to handle complex data.
6. Combining Datasets: Pandas provides concat() to combine multiple DataFrames; append()
did the same for rows in versions before 2.0.
Questions:
1. What are the two main data structures in Pandas, and how do they differ?
2. How can you fill missing values in a Pandas DataFrame with a default value or a
calculated value (like the mean)?
3. What is hierarchical indexing in Pandas, and how is it useful?
4. How do you access data from a multi-level indexed DataFrame in Pandas?
5. What is the difference between the concat() and append() functions in Pandas?
6. How do you concatenate DataFrames along rows using concat() in Pandas?
7. Explain how to add rows to an existing DataFrame using the append() function in
Pandas.