The document outlines a laboratory session for a Data Science course using Python and Pandas, detailing pre-lab and post-lab tasks. It includes explanations of key concepts such as DataFrames, the concat() function, and methods for filtering and adding columns. The lab tasks provide practical coding examples for creating DataFrames, concatenating them, and applying conditions to filter data.

REGD. NO. 238W1A5449 DATA SCIENCE USING PYTHON LABORATORY-23AI&DS4354 ACADEMIC YEAR: 2024-2025

Lab Session 06: Perform the following operations using pandas

Date of the Session: 17/02/2025 Time of the Session: 10:20 AM to 1:00 PM

Pre-Lab Task: Write answers before entering the lab.

Writing space for pre-lab task: (For Student’s use only)
1. What is a DataFrame in Pandas, and how is it different from a NumPy array?
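A. A Pandas DataFrame (DF) is a 2D, heterogeneous data structure with labeled axes. It differs from
a NumPy array in that:
- A DF has labeled rows and columns, while a plain NumPy array is indexed by integer position only
- Each DF column can hold its own data type, while a NumPy array has one data type for all elements
- A DF has built-in handling of missing data (for example isna(), fillna(), dropna())
- A DF has built-in data manipulation and analysis methods such as grouping, joining, and reshaping
NumPy arrays are ideal for fast numerical computation, while Pandas DataFrames are designed for data
manipulation, analysis, and visualization.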
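
As a rough illustration of these differences, the sketch below stores the same small table as a NumPy array and as a DataFrame. The sample values, the row labels r1 to r3, and the variable names are made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one text column and one numeric column with a gap.
names = ['John', 'Anna', 'Peter']
ages = [28, None, 35]

# A NumPy array has a single dtype; mixing text and numbers needs dtype=object.
arr = np.array([names, ages], dtype=object).T
print(arr.dtype)               # object
print(arr[1, 1])               # None (positional access only)

# A DataFrame keeps a separate dtype per column and labels both axes.
df = pd.DataFrame({'Name': names, 'Age': ages}, index=['r1', 'r2', 'r3'])
print(df.dtypes)
print(df.loc['r2', 'Age'])     # nan (the missing age, accessed by label)
print(df['Age'].isna().sum())  # 1
```

The missing age is stored as NaN in a float64 column, so numeric operations on 'Age' still work, while the text column keeps its own type.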

2. How can we create a DataFrame in Pandas using a dictionary?


A. Pass the dictionary to the pd.DataFrame() constructor. Each key becomes a column name and each
list of values becomes that column's data, so all lists must have the same length. Example:
import pandas as pd
data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Age': [28, 24, 35, 32],
        'Country': ['USA', 'UK', 'Australia', 'Germany']}
df = pd.DataFrame(data)
print(df)
OUTPUT:

3. What is the purpose of the concat() function in Pandas?


A. The concat() function in Pandas joins two or more DataFrames or Series along a particular axis.
This allows you to:
- Combine data from different sources into a single DataFrame
- Combine data whose columns or indices do not match exactly (missing entries are filled with NaN)
- Create a new DataFrame by stacking or joining existing ones
The concat() function can concatenate along either the rows (axis=0, the default) or the columns
(axis=1) of the DataFrames.
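
The two axes can be sketched as follows. The frames q1, q2, and extra are hypothetical and exist only to show the shape of each result.

```python
import pandas as pd

# Hypothetical sample frames; names and values are illustrative only.
q1 = pd.DataFrame({'Name': ['John', 'Anna'], 'Age': [28, 24]})
q2 = pd.DataFrame({'Name': ['Peter'], 'Age': [35]})
extra = pd.DataFrame({'Country': ['USA', 'UK']})

# axis=0 stacks rows; ignore_index=True renumbers the result from 0.
rows = pd.concat([q1, q2], axis=0, ignore_index=True)

# axis=1 places frames side by side, matching rows by index label.
cols = pd.concat([q1, extra], axis=1)

print(rows)
print(cols)
```

Without ignore_index=True the row-wise result would keep the original labels (0, 1, 0), which can make later label-based lookups ambiguous.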

4. How can we filter rows in a DataFrame based on a condition?


A. You can filter rows in a DataFrame based on a condition using the following methods:
- Boolean indexing: df[df['column_name'] > value]
- query() method: df.query('column_name > @value') (the @ prefix refers to a Python variable)
- .loc indexer: df.loc[df['column_name'] > value]
Each of these returns only the rows for which the condition is True.
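
A minimal sketch, using a made-up table and a cut-off of 25, suggests that the three approaches select the same rows:

```python
import pandas as pd

# Hypothetical sample data for the comparison.
df = pd.DataFrame({'Name': ['John', 'Anna', 'Peter', 'Linda'],
                   'Age': [28, 24, 35, 32]})

# Boolean indexing: the comparison yields a True/False mask, one entry per row.
by_mask = df[df['Age'] > 25]

# query(): the same condition written as a string expression.
by_query = df.query('Age > 25')

# .loc: the same mask, with the option to pick columns at the same time.
by_loc = df.loc[df['Age'] > 25, ['Name', 'Age']]

print(by_mask.equals(by_query))  # True
print(by_mask.equals(by_loc))    # True
```

The .loc form is usually the one to prefer when the filtered rows will be modified afterwards, because it selects rows and columns in a single step.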

LAB No.6 VELAGAPUDI RAMAKRISHNA SIDDHARTHA ENGINEERING COLLEGE



5. Why might we need to add a new column to a DataFrame, and how can we do it in Pandas?
A. We might need to add a new column to a DataFrame to:
- Perform calculations based on existing columns
- Add new data from an external source
- Transform existing data into a new format
- Create a new feature for data analysis or modeling
To add a new column in Pandas, you can use the following methods:
1. Assign a new column (modifies df in place): df['new_column'] = values
2. Use the assign() method (returns a new DataFrame): df = df.assign(new_column=values)
3. Use the insert() method (modifies df in place, at column position loc): df.insert(loc, 'new_column', values)
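
The difference between the three methods is easiest to see side by side. The sketch below uses hypothetical column names (Passed, Percent, ID) and an assumed pass mark of 40.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({'Name': ['John', 'Anna'], 'Score': [85, 90]})

# 1. Direct assignment changes df in place and appends the column at the end.
df['Passed'] = df['Score'] >= 40

# 2. assign() returns a new DataFrame; df itself stays unchanged.
with_pct = df.assign(Percent=df['Score'] / 100)

# 3. insert() changes df in place and puts the column at position 0.
df.insert(0, 'ID', [101, 102])

print(list(df.columns))        # ['ID', 'Name', 'Score', 'Passed']
print(list(with_pct.columns))  # ['Name', 'Score', 'Passed', 'Percent']
```

Because assign() does not modify the original, its result has to be stored in a variable, otherwise the new column is lost.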

In Lab Task:
1. Creating dataframe.
SOURCE CODE:
import pandas as pd
data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Age': [28, 24, 35, 32],
        'Country': ['USA', 'UK', 'Australia', 'Germany']}
df = pd.DataFrame(data)
print(df)
OUTPUT:

2. concat()
SOURCE CODE:
import pandas as pd
df1 = pd.DataFrame({'Name': ['John', 'Anna'],
                    'Age': [28, 24],
                    'Country': ['USA', 'UK']})
df2 = pd.DataFrame({'Name': ['Peter', 'Linda'],
                    'Age': [35, 32],
                    'Country': ['Australia', 'Germany']})
df_concat = pd.concat([df1, df2])
print(df_concat)
OUTPUT:

3. Setting conditions
SOURCE CODE:
import pandas as pd
data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Age': [28, 24, 35, 32],
        'Country': ['USA', 'UK', 'Australia', 'Germany']}
df = pd.DataFrame(data)
filtered_df = df[(df['Age'] > 25) & (df['Country'] != 'Australia')]
print(filtered_df)
OUTPUT:

4. Adding a new column


SOURCE CODE:
import pandas as pd
data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Score': [85, 90, 78, 92]}
df = pd.DataFrame(data)
# Map each score to a letter grade
df['Grade'] = df['Score'].apply(
    lambda x: 'A' if x >= 90 else 'B' if x >= 80 else 'C' if x >= 70
    else 'D' if x >= 60 else 'F'
)
print(df)
OUTPUT:

Post Lab Task:


1. Write a Python code snippet to create a Pandas DataFrame with at least three columns and five rows
SOURCE CODE:
import pandas as pd
# Create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
        'Age': [25, 30, 22, 28, 24],
        'City': ['New York', 'London', 'Paris', 'Tokyo', 'Sydney']}
df = pd.DataFrame(data)
print(df)
OUTPUT:

2. Given two DataFrames, df1 and df2, how would you concatenate them vertically and horizontally?
SOURCE CODE:
import pandas as pd
# Create two sample DataFrames with the same columns
df1 = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                    'Age': [25, 30, 22],
                    'City': ['New York', 'London', 'Paris']})
df2 = pd.DataFrame({'Name': ['David', 'Eve'],
                    'Age': [28, 24],
                    'City': ['Tokyo', 'Sydney']})
# Concatenate DataFrames vertically (row-wise)
df_vertical = pd.concat([df1, df2], ignore_index=True)
print("\nVertical Concatenation:\n", df_vertical)
# Concatenate DataFrames horizontally (column-wise); rows are matched by index,
# so the third row of df2's columns is filled with NaN
df_horizontal = pd.concat([df1, df2], axis=1)
print("\nHorizontal Concatenation:\n", df_horizontal)
OUTPUT:

3. How would you filter out rows where the values in the “Age” column are greater than 25?
SOURCE CODE:
import pandas as pd
# Create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
        'Age': [25, 30, 22, 28, 24],
        'City': ['New York', 'London', 'Paris', 'Tokyo', 'Sydney']}
df = pd.DataFrame(data)
print(df)
# Filter rows where Age > 25
filtered_df = df[df['Age'] > 25]
print("\nFiltered DataFrame (Age > 25):\n", filtered_df)
OUTPUT:

4. If you have a DataFrame containing employee names and salaries, how would you add a new
column for a "Bonus" (10% of salary)?
SOURCE CODE:
import pandas as pd
# Create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
        'Age': [25, 30, 22, 28, 24],
        'City': ['New York', 'London', 'Paris', 'Tokyo', 'Sydney']}
df = pd.DataFrame(data)
df['Salary'] = [60000, 75000, 50000, 45000, 55000]
df['Bonus'] = df['Salary'] * 0.10
print("\nDataFrame with Bonus:\n", df)
OUTPUT:

5. Explain a real-world scenario where using Pandas operations like concatenation and filtering
conditions would be beneficial.
A. Customer Data Analysis for Marketing Campaigns
Suppose you're a marketing analyst at an online retail company, and you need to analyze customer data
to create targeted marketing campaigns.
You have three datasets:
1. Customer Information: Contains customer demographics, such as name, email, age, and location.
2. Purchase History: Contains customer purchase history, including product IDs, purchase dates,
and amounts.
3. Product Catalog: Contains product information, including product IDs, names, categories, and
prices.
You need to:
1. Combine the customer information and purchase history datasets.
2. Filter out customers who haven't made a purchase in the last 6 months.
3. Identify customers who have purchased products from specific categories (e.g., electronics, clothing).
4. Create targeted marketing campaigns based on customer demographics and purchase behavior.
By using Pandas operations like concatenation and filtering conditions, you can:
- Efficiently analyze large datasets
- Identify specific customer segments
- Create targeted marketing campaigns
- Improve customer engagement and sales
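
The scenario above could be sketched roughly as follows. Every table, column name (customer_id, product_id, purchase_date, category), and value here is an assumption made for illustration, and the reference date is fixed arbitrarily. Note that concat() is used to stack purchase batches that share the same columns, while attaching customer and product details by a shared key is done with merge().

```python
import pandas as pd

# Hypothetical data; all column names and values are assumptions.
customers = pd.DataFrame({
    'customer_id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 22],
})
purchases_2024 = pd.DataFrame({
    'customer_id': [3],
    'product_id': ['P1'],
    'purchase_date': pd.to_datetime(['2024-03-01']),
})
purchases_2025 = pd.DataFrame({
    'customer_id': [1, 2],
    'product_id': ['P1', 'P2'],
    'purchase_date': pd.to_datetime(['2025-01-10', '2025-01-15']),
})
catalog = pd.DataFrame({
    'product_id': ['P1', 'P2'],
    'category': ['electronics', 'clothing'],
})

# Stack the purchase batches (same columns) into one history table.
purchases = pd.concat([purchases_2024, purchases_2025], ignore_index=True)

# Attach customer and product details by key.
full = purchases.merge(customers, on='customer_id').merge(catalog, on='product_id')

# Keep purchases made in the 6 months before a fixed reference date.
reference = pd.Timestamp('2025-02-17')
recent = full[full['purchase_date'] >= reference - pd.DateOffset(months=6)]

# Narrow the recent buyers down to one product category.
electronics = recent[recent['category'] == 'electronics']
print(electronics[['name', 'category', 'purchase_date']])
```

With this sample data only Alice remains, since Charlie's electronics purchase is older than 6 months and Bob bought clothing.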

Student’s Signature
(For Evaluator’s use only)
Comment of the Evaluator (if Any) Evaluator’s Observation

Marks Secured: out of

Signature of the Evaluator with Date:
