Batch 1

The document outlines three distinct analysis tasks using Python: Sales Analysis, Employee Performance Analysis, and Housing Market Analysis. Each task includes a code snippet for generating a dataset and a series of analytical steps to perform, such as calculating averages, creating plots, and identifying trends. The tasks focus on different domains, including sales revenue, employee performance scores, and housing prices.

Uploaded by

Ankit Sharma

Question 1: Sales Analysis

Problem Statement:

1. Use the following Python code snippet to generate the dataset:


import pandas as pd
import numpy as np

np.random.seed(42)
dates = pd.date_range(start="2023-01-01", end="2023-12-31", freq="D")
categories = ["Electronics", "Clothing", "Furniture"]
data = {
    "Date": np.tile(dates, 1),
    "Product_Category": np.random.choice(categories, len(dates)),
    "Revenue": np.random.randint(500, 5000, len(dates)),
    "Discount": np.random.uniform(0.05, 0.25, len(dates)),
    "Units_Sold": np.random.randint(1, 20, len(dates)),
}
sales_data = pd.DataFrame(data)
sales_data.to_csv("sales_data.csv", index=False)
print(sales_data.head())

Listing 1: Sales Data Generation

2. Perform the following analysis:

(a) Calculate the average revenue per product category.
(b) Identify the best-performing product category in terms of total revenue.
(c) Generate a time series plot of revenue over the year for each product category.
(d) Create a scatter plot of Units Sold vs. Revenue, colored by Product Category.
(e) Generate a heatmap showing the correlation between Revenue, Units Sold, and Discount.
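
One possible way to approach steps (a) to (e) is sketched below. It is not a model answer. The dataset is rebuilt in the same way as Listing 1 so the block runs on its own. Summing revenue by month for the time series, using matplotlib rather than another plotting library, and the headless "Agg" backend are all assumptions of this sketch, as are the variable names.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Rebuild the dataset as in Listing 1 so this sketch is self-contained.
np.random.seed(42)
dates = pd.date_range(start="2023-01-01", end="2023-12-31", freq="D")
categories = ["Electronics", "Clothing", "Furniture"]
sales_data = pd.DataFrame({
    "Date": np.tile(dates, 1),
    "Product_Category": np.random.choice(categories, len(dates)),
    "Revenue": np.random.randint(500, 5000, len(dates)),
    "Discount": np.random.uniform(0.05, 0.25, len(dates)),
    "Units_Sold": np.random.randint(1, 20, len(dates)),
})

# (a) Average revenue per product category.
avg_revenue = sales_data.groupby("Product_Category")["Revenue"].mean()

# (b) Best-performing category by total revenue.
total_revenue = sales_data.groupby("Product_Category")["Revenue"].sum()
best_category = total_revenue.idxmax()

# (c) Revenue over the year per category. Daily values are noisy, so this
# sketch sums them by month (an assumption, not something the task requires).
monthly = (
    sales_data.pivot_table(index="Date", columns="Product_Category",
                           values="Revenue", aggfunc="sum")
    .resample("MS")
    .sum()
)
fig_ts, ax_ts = plt.subplots()
monthly.plot(ax=ax_ts, marker="o")
ax_ts.set_xlabel("Month")
ax_ts.set_ylabel("Total revenue")
ax_ts.set_title("Monthly revenue by product category")

# (d) Units Sold vs. Revenue, one scatter series (and color) per category.
fig_sc, ax_sc = plt.subplots()
for name, group in sales_data.groupby("Product_Category"):
    ax_sc.scatter(group["Units_Sold"], group["Revenue"], label=name, alpha=0.6)
ax_sc.set_xlabel("Units sold")
ax_sc.set_ylabel("Revenue")
ax_sc.legend(title="Product category")

# (e) Correlation heatmap for Revenue, Units_Sold, and Discount.
corr = sales_data[["Revenue", "Units_Sold", "Discount"]].corr()
fig_hm, ax_hm = plt.subplots()
image = ax_hm.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
ax_hm.set_xticks(range(len(corr.columns)))
ax_hm.set_xticklabels(corr.columns)
ax_hm.set_yticks(range(len(corr.index)))
ax_hm.set_yticklabels(corr.index)
for i in range(corr.shape[0]):
    for j in range(corr.shape[1]):
        ax_hm.text(j, i, f"{corr.iloc[i, j]:.2f}", ha="center", va="center")
fig_hm.colorbar(image, ax=ax_hm)

print(avg_revenue)
print("Best category by total revenue:", best_category)
```

Each figure can be written to disk with, for example, fig_ts.savefig("revenue_by_month.png") (a hypothetical file name). Because every column is drawn independently at random, the correlations in (e) should be expected to sit close to zero.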

Question 2: Employee Performance Analysis

Problem Statement:

1. Use the following Python code snippet to generate the dataset:


import pandas as pd
import numpy as np

np.random.seed(42)
departments = ["HR", "Finance", "Marketing", "IT"]
data = {
    "Employee_ID": np.arange(1, 101),
    "Department": np.random.choice(departments, 100),
    "Years_of_Experience": np.random.randint(1, 25, 100),
    "Salary": np.random.randint(50000, 200000, 100),
    "Performance_Score": np.random.uniform(50, 100, 100),
}
employee_data = pd.DataFrame(data)
employee_data.to_csv("employee_data.csv", index=False)
print(employee_data.head())

Listing 2: Employee Performance Data Generation

2. Perform the following analysis:

(a) Calculate the average salary per department.
(b) Identify employees whose performance score is above 90 and group them by department.
(c) Create a bar plot showing the average salary per department.
(d) Generate a boxplot to visualize the distribution of Performance Score for each department.
(e) Plot a scatter plot of Years of Experience vs. Salary, and color the points based on Performance Score ranges (e.g., 50–70, 70–90, 90+).
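
A minimal sketch of steps (a) to (e) follows, offered as one reasonable reading of the tasks rather than the expected solution. The data is regenerated as in Listing 2 so the block is self-contained. The bin edges passed to pd.cut follow the example ranges in step (e); the variable names, the use of matplotlib, and the headless "Agg" backend are assumptions of this sketch.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Rebuild the dataset as in Listing 2 so this sketch is self-contained.
np.random.seed(42)
departments = ["HR", "Finance", "Marketing", "IT"]
employee_data = pd.DataFrame({
    "Employee_ID": np.arange(1, 101),
    "Department": np.random.choice(departments, 100),
    "Years_of_Experience": np.random.randint(1, 25, 100),
    "Salary": np.random.randint(50000, 200000, 100),
    "Performance_Score": np.random.uniform(50, 100, 100),
})

# (a) Average salary per department.
avg_salary = employee_data.groupby("Department")["Salary"].mean()

# (b) Employees scoring above 90, listed by department.
top = employee_data[employee_data["Performance_Score"] > 90]
top_by_dept = top.groupby("Department")["Employee_ID"].apply(list)

# (c) Bar plot of the average salary per department.
fig_bar, ax_bar = plt.subplots()
ax_bar.bar(avg_salary.index, avg_salary.to_numpy())
ax_bar.set_xlabel("Department")
ax_bar.set_ylabel("Average salary")

# (d) Boxplot of Performance_Score for each department.
grouped = list(employee_data.groupby("Department"))
fig_box, ax_box = plt.subplots()
ax_box.boxplot([g["Performance_Score"].to_numpy() for _, g in grouped])
ax_box.set_xticks(range(1, len(grouped) + 1))
ax_box.set_xticklabels([name for name, _ in grouped])
ax_box.set_ylabel("Performance score")

# (e) Years of Experience vs. Salary, colored by score range.
# Bins are (50, 70], (70, 90], (90, 100], with 50 itself included.
score_range = pd.cut(
    employee_data["Performance_Score"],
    bins=[50, 70, 90, 100],
    labels=["50-70", "70-90", "90+"],
    include_lowest=True,
)
fig_sc, ax_sc = plt.subplots()
for label in score_range.cat.categories:
    subset = employee_data[score_range == label]
    ax_sc.scatter(subset["Years_of_Experience"], subset["Salary"],
                  label=label, alpha=0.7)
ax_sc.set_xlabel("Years of experience")
ax_sc.set_ylabel("Salary")
ax_sc.legend(title="Performance score")

print(avg_salary)
print(top_by_dept)
```

Binning the scores with pd.cut keeps the color legend discrete, which matches the ranges named in step (e); passing the raw score to the c= argument of scatter with a colormap would be an alternative if a continuous scale is preferred.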

Question 3: Housing Market Analysis

Problem Statement:

1. Use the following Python code snippet to generate the dataset:


import pandas as pd
import numpy as np

np.random.seed(42)
locations = ["Urban", "Suburban", "Rural"]
data = {
    "House_ID": np.arange(1, 201),
    "Price": np.random.randint(50000, 500000, 200),
    "Bedrooms": np.random.randint(1, 6, 200),
    "Square_Feet": np.random.randint(500, 4000, 200),
    "Location": np.random.choice(locations, 200),
}
housing_data = pd.DataFrame(data)
housing_data.to_csv("housing_data.csv", index=False)
print(housing_data.head())

Listing 3: Housing Market Data Generation

2. Perform the following analysis:

(a) Identify and visualize outliers in Price using the IQR method.
(b) Calculate the average price for each Location.
(c) Generate a scatter plot of Price vs. Square Feet, colored by Bedrooms.
(d) Create a correlation matrix for Price, Bedrooms, and Square Feet. Visualize it using a heatmap.
(e) Generate a boxplot of Price by Location.
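
The steps above could be approached along the following lines. This is a sketch under stated assumptions, not a reference solution. The data is regenerated as in Listing 3 so the block stands alone. The 1.5 × IQR fences are the conventional choice for the IQR method; the variable names, the use of matplotlib, and the headless "Agg" backend are assumptions of this sketch.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Rebuild the dataset as in Listing 3 so this sketch is self-contained.
np.random.seed(42)
locations = ["Urban", "Suburban", "Rural"]
housing_data = pd.DataFrame({
    "House_ID": np.arange(1, 201),
    "Price": np.random.randint(50000, 500000, 200),
    "Bedrooms": np.random.randint(1, 6, 200),
    "Square_Feet": np.random.randint(500, 4000, 200),
    "Location": np.random.choice(locations, 200),
})

# (a) IQR method: flag prices beyond 1.5 * IQR from the quartiles.
q1 = housing_data["Price"].quantile(0.25)
q3 = housing_data["Price"].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
is_outlier = (housing_data["Price"] < lower) | (housing_data["Price"] > upper)
outliers = housing_data[is_outlier]
fig_out, ax_out = plt.subplots()
ax_out.boxplot(housing_data["Price"].to_numpy(), whis=1.5)  # fliers mark outliers
ax_out.set_ylabel("Price")

# (b) Average price for each location.
avg_price = housing_data.groupby("Location")["Price"].mean()

# (c) Price vs. Square Feet, colored by number of bedrooms.
fig_sc, ax_sc = plt.subplots()
points = ax_sc.scatter(housing_data["Square_Feet"], housing_data["Price"],
                       c=housing_data["Bedrooms"], cmap="viridis")
ax_sc.set_xlabel("Square feet")
ax_sc.set_ylabel("Price")
fig_sc.colorbar(points, ax=ax_sc, label="Bedrooms")

# (d) Correlation matrix for Price, Bedrooms, and Square_Feet, as a heatmap.
corr = housing_data[["Price", "Bedrooms", "Square_Feet"]].corr()
fig_hm, ax_hm = plt.subplots()
image = ax_hm.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
ax_hm.set_xticks(range(len(corr.columns)))
ax_hm.set_xticklabels(corr.columns)
ax_hm.set_yticks(range(len(corr.index)))
ax_hm.set_yticklabels(corr.index)
for i in range(corr.shape[0]):
    for j in range(corr.shape[1]):
        ax_hm.text(j, i, f"{corr.iloc[i, j]:.2f}", ha="center", va="center")
fig_hm.colorbar(image, ax=ax_hm)

# (e) Boxplot of Price by Location.
grouped = list(housing_data.groupby("Location"))
fig_box, ax_box = plt.subplots()
ax_box.boxplot([g["Price"].to_numpy() for _, g in grouped])
ax_box.set_xticks(range(1, len(grouped) + 1))
ax_box.set_xticklabels([name for name, _ in grouped])
ax_box.set_ylabel("Price")

print(f"IQR fences: [{lower:.0f}, {upper:.0f}], outliers found: {len(outliers)}")
print(avg_price)
```

Since the prices are drawn uniformly between fixed bounds, the IQR fences may well fall outside the data range, in which case the outlier table in (a) is empty and the boxplot shows no flier points. That is a property of the synthetic data, not a fault in the method.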
