Lab Session 06: Perform Following Operations Using Pandas
A. Why might we need to add a new column to a DataFrame, and how can we do it in Pandas?
Answer: We might need to add a new column to a DataFrame for several reasons, including:
1. Data Transformation: Creating new features from existing ones, such as combining two columns or
applying a mathematical operation.
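2. Data Enrichment: Adding information from outside the DataFrame, such as looking up values in a
reference table or merging data from another source.
In Pandas, a new column is created by assigning a Series, a list of the same length, or a single scalar
value to a new column label, for example df['Age_in_5_years'] = df['Age'] + 5.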
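As a minimal sketch of both reasons above, the snippet below builds one column by combining two existing columns and another by looking values up in an outside table. The `city_to_state` dictionary and the `Label` and `State` column names are hypothetical, chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
})

# Data transformation: build a new column by combining two existing ones.
df['Label'] = df['Name'] + ' (' + df['City'] + ')'

# Data enrichment: bring in information from an outside source.
# city_to_state is a hypothetical lookup table standing in for that source.
city_to_state = {'New York': 'NY', 'Los Angeles': 'CA', 'Chicago': 'IL', 'Houston': 'TX'}
df['State'] = df['City'].map(city_to_state)

print(df)
```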
Lab Tasks:
a. Creating a DataFrame
Code:
import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [25, 30, 35, 40],
'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']
}
df = pd.DataFrame(data)
print("DataFrame:")
print(df)
Output:
DataFrame:
Name Age City
0 Alice 25 New York
1 Bob 30 Los Angeles
2 Charlie 35 Chicago
3 David 40 Houston
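The DataFrame above is built column by column from a dictionary of lists. A sketch of the equivalent row-by-row construction, assuming the data instead arrives as one dictionary per record (the variable names `by_column` and `by_row` are illustrative):

```python
import pandas as pd

# Column-oriented: one list per column (the form used above).
by_column = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [25, 30],
})

# Row-oriented: one dict per row; the keys become column names.
by_row = pd.DataFrame([
    {'Name': 'Alice', 'Age': 25},
    {'Name': 'Bob', 'Age': 30},
])

print(by_row.shape)  # (rows, columns)
print(by_column.equals(by_row))
```

`DataFrame.equals()` returns True only when both objects have the same shape, labels, values, and dtypes.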
b. concat()
Code:
data2 = {
'Name': ['Eve', 'Frank'],
'Age': [45, 50],
'City': ['Seattle', 'Boston']
}
df2 = pd.DataFrame(data2)
concat_df = pd.concat([df, df2], axis=0)
print("Concatenated DataFrame:")
print(concat_df)
Output:
Concatenated DataFrame:
Name Age City
0 Alice 25 New York
1 Bob 30 Los Angeles
2 Charlie 35 Chicago
3 David 40 Houston
0 Eve 45 Seattle
1 Frank 50 Boston
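The repeated index labels 0 and 1 in the output are carried over from the original DataFrames. If a fresh sequential index is preferred, `pd.concat()` accepts `ignore_index=True`; a small sketch with shortened versions of `df` and `df2`:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
df2 = pd.DataFrame({'Name': ['Eve', 'Frank'], 'Age': [45, 50]})

# ignore_index=True discards the original row labels and numbers the result 0..n-1
concat_df = pd.concat([df, df2], axis=0, ignore_index=True)
print(concat_df)
```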
c. Filtering rows with a condition
Code:
filtered_df = df[df['Age'] > 30]
print("Filtered DataFrame (Age > 30):")
print(filtered_df)
Output:
Filtered DataFrame (Age > 30):
Name Age City
2 Charlie 35 Chicago
3 David 40 Houston
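Conditions can also be combined. In the sketch below, which reuses the four-row data from task a, each comparison produces a boolean Series, and `&`, `|`, and `~` combine such Series element-wise. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
})

# Combine boolean Series with & (and), | (or), ~ (not).
# Parentheses are required because & and | bind more tightly than > and !=.
older_not_houston = df[(df['Age'] > 30) & (df['City'] != 'Houston')]

# isin() tests each value for membership in a list.
selected_cities = df[df['City'].isin(['New York', 'Chicago'])]

print(older_not_houston)
print(selected_cities)
```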
d. Adding a new column
Code:
df['Age_in_5_years'] = df['Age'] + 5
print("DataFrame with New Column:")
print(df)
Output:
DataFrame with New Column:
Name Age City Age_in_5_years
0 Alice 25 New York 30
1 Bob 30 Los Angeles 35
2 Charlie 35 Chicago 40
3 David 40 Houston 45
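Direct assignment modifies `df` in place and appends the new column at the end. Two alternatives from the pandas API are sketched below: `DataFrame.assign()`, which returns a new DataFrame and leaves the original untouched, and `DataFrame.insert()`, which places the column at a chosen position. The `Age_in_months` column is a hypothetical example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
})

# assign() returns a new DataFrame; df itself is not changed.
df_future = df.assign(Age_in_5_years=df['Age'] + 5)

# insert() modifies df in place and puts the column at position 1.
df.insert(1, 'Age_in_months', df['Age'] * 12)

print(df_future.columns.tolist())
print(df.columns.tolist())
```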
B. Given two DataFrames, df1 and df2, how would you concatenate them vertically and horizontally?
Vertically (stacking rows, axis=0):
df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [24, 30]})
df2 = pd.DataFrame({'Name': ['Charlie', 'David'], 'Age': [35, 40]})
df_vertical = pd.concat([df1, df2], axis=0)
print(df_vertical)
Output:
Name Age
0 Alice 24
1 Bob 30
0 Charlie 35
1 David 40
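The example above works cleanly because `df1` and `df2` share the same columns. As a sketch of what happens when they do not, assume a hypothetical `df3` that lacks an `Age` column:

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [24, 30]})
df3 = pd.DataFrame({'Name': ['Charlie'], 'City': ['Chicago']})  # hypothetical: no 'Age' column

# Default join='outer': keep every column and fill the gaps with NaN.
outer = pd.concat([df1, df3], axis=0, ignore_index=True)

# join='inner': keep only the columns present in every input.
inner = pd.concat([df1, df3], axis=0, ignore_index=True, join='inner')

print(outer)
print(inner)
```

With the default outer join, Charlie's `Age` and Alice's and Bob's `City` are NaN, and the `Age` column becomes floating point so it can hold NaN; the inner join keeps only `Name`.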
Horizontally (placing columns side by side, axis=1):
df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [24, 30]})
df2 = pd.DataFrame({'City': ['New York', 'Los Angeles']})
df_horizontal = pd.concat([df1, df2], axis=1)
print(df_horizontal)
Output:
Name Age City
0 Alice 24 New York
1 Bob 30 Los Angeles
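Horizontal concatenation matches rows by index label, not by position. The sketch below assumes a hypothetical `df2` whose index starts at 1, to show the NaN rows that result and one way to realign them:

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [24, 30]})
# Hypothetical: row labels 1 and 2 instead of 0 and 1
df2 = pd.DataFrame({'City': ['New York', 'Los Angeles']}, index=[1, 2])

# axis=1 lines rows up by index label, so the result has labels 0, 1, 2
# with NaN wherever one side has no row for that label.
misaligned = pd.concat([df1, df2], axis=1)
print(misaligned)

# Resetting df2's index lines the rows up by position again.
aligned = pd.concat([df1, df2.reset_index(drop=True)], axis=1)
print(aligned)
```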
C. How would you filter out rows where the value in the "Age" column is greater than 25?
Code:
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'], 'Age': [24, 30, 35, 40, 22]})
filtered_df = df[df['Age'] <= 25]
print(filtered_df)
Output:
Name Age
0 Alice 24
4 Eve 22
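Keeping the rows with `Age <= 25` is one way to filter out the rest. A sketch of two equivalent forms, negating the removal condition with `~` and writing the condition as a `DataFrame.query()` string:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
                   'Age': [24, 30, 35, 40, 22]})

kept_a = df[df['Age'] <= 25]      # keep the complement directly
kept_b = df[~(df['Age'] > 25)]    # negate the "remove" condition with ~
kept_c = df.query('Age <= 25')    # same condition as a query string

print(kept_a.equals(kept_b) and kept_a.equals(kept_c))
```

These agree here because `Age` has no missing values. With NaN present, `~(df['Age'] > 25)` would keep the NaN rows, while `df['Age'] <= 25` would drop them.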
D. If you have a DataFrame containing employee names and salaries, how would you add a new column
for a "Bonus" (10% of salary)?
Code:
df = pd.DataFrame({'Employee': ['Alice', 'Bob', 'Charlie', 'David'],
'Salary': [50000, 60000, 70000, 80000]})
df['Bonus'] = df['Salary'] * 0.10
print(df)
Output:
Employee Salary Bonus
0 Alice 50000 5000.0
1 Bob 60000 6000.0
2 Charlie 70000 7000.0
3 David 80000 8000.0
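Because `0.10` is a float, the `Bonus` column comes out as floating point (5000.0). If whole-number amounts are wanted, one option is to round and cast, as sketched below; `BONUS_RATE` and the `Total_Pay` column are hypothetical additions.

```python
import pandas as pd

df = pd.DataFrame({'Employee': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Salary': [50000, 60000, 70000, 80000]})

BONUS_RATE = 0.10  # hypothetical named constant for the 10% rate

# Salary * rate is a float Series; round to whole units, then cast to int.
df['Bonus'] = (df['Salary'] * BONUS_RATE).round().astype(int)
df['Total_Pay'] = df['Salary'] + df['Bonus']

print(df)
```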
E. Explain a real-world scenario where using Pandas operations like concatenation and filtering conditions
would be beneficial.
Answer: In a business scenario:
Concatenation would be used to combine sales data from multiple regions (e.g., North America and
Europe) into a single DataFrame for company-wide analysis.
Filtering would help identify high-performing products or employees. For example, you could select
employees earning above a certain salary when calculating bonuses, or select products whose revenue
exceeds a specific threshold.
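A minimal sketch of this scenario, with made-up regional sales figures: `pd.concat()` stacks the two regions, and a boolean condition keeps only the rows above a revenue threshold. The table contents, the `Region` column, and the threshold value are all hypothetical.

```python
import pandas as pd

# Hypothetical regional sales tables; the figures are invented for illustration.
north_america = pd.DataFrame({'Product': ['A', 'B', 'C'], 'Revenue': [12000, 4500, 9800]})
europe = pd.DataFrame({'Product': ['A', 'B', 'D'], 'Revenue': [8000, 11000, 3000]})

# Concatenation: stack both regions, tagging each row with its source region.
sales = pd.concat([north_america.assign(Region='North America'),
                   europe.assign(Region='Europe')],
                  ignore_index=True)

# Filtering: keep only rows above a revenue threshold.
THRESHOLD = 9000
high_revenue = sales[sales['Revenue'] > THRESHOLD]
print(high_revenue)
```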
Student's Signature