5. Why might we need to add a new column to a DataFrame, and how can we do it in Pandas?
A. We might need to add a new column to a DataFrame to:
- Perform calculations based on existing columns
- Add new data from an external source
- Transform existing data into a new format
- Create a new feature for data analysis or modeling
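To add a new column in Pandas, you can use any of the following methods:
1. Assign a new column directly: df['new_column'] = values (modifies df in place)
2. Use the assign method: df = df.assign(new_column=values) (returns a new DataFrame, so the result must be reassigned)
3. Use the insert method: df.insert(loc, 'new_column', values) (modifies df in place and places the column at position loc)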
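As a minimal sketch of the three methods listed above, the following uses a small made-up DataFrame; the column names ID and AgeNextYear and all values are assumptions chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({'Name': ['John', 'Anna'], 'Age': [28, 24]})

# 1. Direct assignment modifies df in place
df['Country'] = ['USA', 'UK']

# 2. assign() returns a new DataFrame, so the result is reassigned
df = df.assign(AgeNextYear=df['Age'] + 1)

# 3. insert() modifies df in place and puts the column at position 0
df.insert(0, 'ID', [101, 102])

print(df)
```

Direct assignment and assign() append the new column at the end, while insert() is the option to reach for when the column position matters.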
In Lab Task:
1. Creating a DataFrame
SOURCE CODE:
import pandas as pd
data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
'Age': [28, 24, 35, 32],
'Country': ['USA', 'UK', 'Australia', 'Germany']}
df = pd.DataFrame(data)
print(df)
OUTPUT:
2. concat()
SOURCE CODE:
import pandas as pd
df1 = pd.DataFrame({'Name': ['John', 'Anna'],
'Age': [28, 24],
'Country': ['USA', 'UK']})
df2 = pd.DataFrame({'Name': ['Peter', 'Linda'],
'Age': [35, 32],
'Country': ['Australia', 'Germany']})
df_concat = pd.concat([df1, df2])
print(df_concat)
OUTPUT:
3. Setting conditions
SOURCE CODE:
import pandas as pd
data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
'Age': [28, 24, 35, 32],
'Country': ['USA', 'UK', 'Australia', 'Germany']}
df = pd.DataFrame(data)
filtered_df = df[(df['Age'] > 25) & (df['Country'] != 'Australia')]
print(filtered_df)
OUTPUT:
2. Given two DataFrames, df1 and df2, how would you concatenate them vertically and horizontally?
SOURCE CODE:
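import pandas as pd
# Create two sample DataFrames with the same columns
df1 = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 22],
'City': ['New York', 'London', 'Paris']})
df2 = pd.DataFrame({'Name': ['David', 'Eve'],
'Age': [28, 24],
'City': ['Tokyo', 'Sydney']})
# Concatenate DataFrames vertically (row-wise)
df_vertical = pd.concat([df1, df2], ignore_index=True)
print("\nVertical Concatenation:\n", df_vertical)
# Concatenate DataFrames horizontally (column-wise); rows are aligned on the
# index, so positions missing from the shorter DataFrame become NaN
df_horizontal = pd.concat([df1, df2], axis=1)
print("\nHorizontal Concatenation:\n", df_horizontal)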
3. How would you filter out rows where the values in the “Age” column are greater than 25?
SOURCE CODE:
import pandas as pd
# Create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
'Age': [25, 30, 22, 28, 24],
'City': ['New York', 'London', 'Paris', 'Tokyo', 'Sydney']}
df = pd.DataFrame(data)
print(df)
# Keep only the rows where Age > 25 (use df[df['Age'] <= 25] to drop them instead)
filtered_df = df[df['Age'] > 25]
print("\nFiltered DataFrame (Age > 25):\n", filtered_df)
OUTPUT:
4. If you have a DataFrame containing employee names and salaries, how would you add a new
column for a "Bonus" (10% of salary)?
SOURCE CODE:
import pandas as pd
# Create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
'Age': [25, 30, 22, 28, 24],
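The listing above breaks off before the salary data appears. As a minimal sketch of the Bonus calculation the question asks for, the following assumes a hypothetical Salary column; the salary figures and the variable name employees are made up for illustration and are not taken from the original listing.

```python
import pandas as pd

# Hypothetical employee data; the salary figures are made up for illustration
employees = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Salary': [50000, 60000, 45000, 70000, 55000],
})

# Bonus is 10% of each employee's salary, computed for the whole column at once
employees['Bonus'] = employees['Salary'] * 0.10

print(employees)
```

The multiplication is applied element-wise to the whole Salary column, so no explicit loop over the rows is needed.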
5. Explain a real-world scenario where using Pandas operations like concatenation and filtering
conditions would be beneficial.
A. Customer Data Analysis for Marketing Campaigns
Suppose you're a marketing analyst at an online retail company, and you need to analyze customer data
to create targeted marketing campaigns.
You have three datasets:
1. Customer Information: Contains customer demographics, such as name, email, age, and location.
2. Purchase History: Contains customer purchase history, including product IDs, purchase dates,
and amounts.
3. Product Catalog: Contains product information, including product IDs, names, categories, and
prices.
You need to:
1. Combine the customer information and purchase history datasets.
2. Filter out customers who haven't made a purchase in the last 6 months.
3. Identify customers who have purchased products from specific categories (e.g., electronics, clothing).
4. Create targeted marketing campaigns based on customer demographics and purchase
behavior.
By using Pandas operations like concatenation and filtering conditions, you can:
- Efficiently analyze large datasets
- Identify specific customer segments
- Create targeted marketing campaigns
- Improve customer engagement and sales
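A rough sketch of this scenario is shown below. All table names, column names (such as customer_id and purchase_date), values, and the fixed reference date are assumptions made for illustration. Attaching customer details to purchases is done here with merge, which is a join and not a concatenation; concat is used only to stack two purchase extracts.

```python
import pandas as pd

# Hypothetical data; all names, columns, and values are assumptions
customers = pd.DataFrame({
    'customer_id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie'],
    'location': ['New York', 'London', 'Paris'],
})
purchases_q1 = pd.DataFrame({
    'customer_id': [1, 2],
    'category': ['electronics', 'clothing'],
    'purchase_date': pd.to_datetime(['2024-01-15', '2024-02-20']),
})
purchases_q3 = pd.DataFrame({
    'customer_id': [1, 3],
    'category': ['clothing', 'electronics'],
    'purchase_date': pd.to_datetime(['2024-07-10', '2024-08-05']),
})

# Concatenation: stack the two purchase extracts into one history table
purchases = pd.concat([purchases_q1, purchases_q3], ignore_index=True)

# Filtering: keep purchases from the 6 months before a fixed reference date
reference_date = pd.Timestamp('2024-09-01')
cutoff = reference_date - pd.DateOffset(months=6)
recent = purchases[purchases['purchase_date'] >= cutoff]

# Filtering: narrow to one category of interest
recent_electronics = recent[recent['category'] == 'electronics']

# Merge (a join, not a concatenation) attaches customer details
targets = recent_electronics.merge(customers, on='customer_id')
print(targets[['name', 'location', 'category', 'purchase_date']])
```

A fixed reference date keeps the example reproducible; in practice the cutoff would likely be computed from the current date.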