
SMDM Project Business Report - Group Assignment-Copy2

This document discusses a dataset related to a marketing campaign for an auto company. It provides technical details of the dataset, performs preliminary data cleaning and analysis of the variables, explores relationships between variables through univariate and bivariate analysis, and evaluates statements made by employees regarding preferences for vehicle types.

Uploaded by

Basavaraj Ky

DSBA – JULY 2023

SMDM PROJECT
BUSINESS REPORT

Submitted by:
Sahil Shreshtha
Basavaraj K Y
Nikhil Upadyaya
Lalit Soundar Venkataraman
Mittal Shah
Ravi Kumar Ethiraju

Problem 1: Austo Motor Company

Austo Motor Company is a leading car manufacturer specializing in SUV, Sedan, and Hatchback models. In its recent board meeting, the members raised concerns about the efficiency of the marketing campaign currently in use. The board decided to rope in an analytics professional to improve the existing campaign.
1A. What is the important technical information about the dataset that a database administrator would be
interested in?
• The dataset has 1581 rows of data on business and salaried professionals, with 14 features (including both independent and dependent variables).
• The dataset has 1 float64 column, 5 integer columns and 8 object columns.

Looking at basic information about the data, 6 of the 14 variables are numerical and 8 are categorical. There are also a few null values in the Gender and Partner_salary variables.
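The float64 and object dtypes mentioned above are pandas dtype names, and in pandas `df.info()` and `df.isnull().sum()` report column types and null counts directly. As a dependency-free illustration of the same checks, the sketch below classifies each column as numerical or categorical and counts its nulls. The three rows are hypothetical, with made-up values, and only five of the 14 columns are included.

```python
# Hypothetical three-row sample using a subset of the report's column names;
# the values are made up for illustration.
rows = [
    {"Age": 30, "Gender": "Male", "Salary": 70000, "Partner_salary": 20000.0, "Make": "Sedan"},
    {"Age": 41, "Gender": None, "Salary": 80000, "Partner_salary": None, "Make": "SUV"},
    {"Age": 26, "Gender": "Female", "Salary": 55000, "Partner_salary": 0.0, "Make": "Hatchback"},
]


def summarize(rows):
    """Classify each column as numerical or categorical and count its nulls (None)."""
    summary = {}
    for col in rows[0]:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        # A column is treated as numerical if every non-null value is an int or float.
        numeric = all(isinstance(v, (int, float)) for v in present)
        summary[col] = {
            "kind": "numerical" if numeric else "categorical",
            "nulls": len(values) - len(present),
        }
    return summary


for col, info in summarize(rows).items():
    print(f"{col:15} {info['kind']:12} nulls={info['nulls']}")
```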

1B. Take a critical look at the data and do a preliminary analysis of the variables. Do a quality check of the data
so that the variables are consistent. Are there any discrepancies present in the data? If yes, perform preliminary
treatment of data.

Preliminary data analysis

From the above table, we found null values in the Gender and Partner_salary variables: Gender has 53 nulls and Partner_salary has 106 nulls.
The missing data were treated as follows:
• Gender: two spelling errors, ‘Femal’ and ‘Femle’, were found in the column, and both were corrected to ‘Female’. The 53 null values were then replaced with the mode of the column, ‘Male’, since male buyers are in the majority.
• Partner_salary: we used conditional imputation, because the salary variables are related by
Salary + Partner_salary = Total_salary
If Partner_working is ‘Yes’, the missing value is filled as Partner_salary = Total_salary - Salary.
If Partner_working is ‘No’, the missing value is filled as Partner_salary = 0.
After replacing the missing values in Gender and Partner_salary and correcting the spelling errors, the dataset is consistent and free of null values. Data cleaning is complete, and the dataset is ready for analysis.
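The three cleaning steps (fixing the Gender spelling errors, mode imputation for Gender, and conditional imputation for Partner_salary) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the project's actual code: the rows are hypothetical, and the ‘Yes’/‘No’ labels for Partner_working are assumed. The typo fix runs before the mode is computed, so the misspelled entries count towards ‘Female’.

```python
from collections import Counter

# Hypothetical rows: column names follow the report, values are made up.
rows = [
    {"Gender": "Male", "Partner_working": "Yes", "Salary": 60000, "Partner_salary": None, "Total_salary": 95000},
    {"Gender": "Femal", "Partner_working": "No", "Salary": 70000, "Partner_salary": None, "Total_salary": 70000},
    {"Gender": None, "Partner_working": "Yes", "Salary": 50000, "Partner_salary": 30000, "Total_salary": 80000},
    {"Gender": "Femle", "Partner_working": "No", "Salary": 65000, "Partner_salary": 0, "Total_salary": 65000},
    {"Gender": "Male", "Partner_working": "Yes", "Salary": 90000, "Partner_salary": 40000, "Total_salary": 130000},
    {"Gender": "Male", "Partner_working": "No", "Salary": 55000, "Partner_salary": 0, "Total_salary": 55000},
]

# Misspellings reported in the Gender column, mapped to the corrected label.
GENDER_FIXES = {"Femal": "Female", "Femle": "Female"}


def clean(rows):
    """Return a cleaned copy: fix Gender typos, mode-impute Gender, conditionally impute Partner_salary."""
    cleaned = [dict(row) for row in rows]  # copy so the input is left untouched

    # 1. Correct the spelling errors before computing the mode.
    for row in cleaned:
        row["Gender"] = GENDER_FIXES.get(row["Gender"], row["Gender"])

    # 2. Impute missing Gender with the most frequent (mode) value.
    mode = Counter(r["Gender"] for r in cleaned if r["Gender"] is not None).most_common(1)[0][0]
    for row in cleaned:
        if row["Gender"] is None:
            row["Gender"] = mode

    # 3. Conditional imputation using Salary + Partner_salary = Total_salary.
    for row in cleaned:
        if row["Partner_salary"] is None:
            if row["Partner_working"] == "Yes":
                row["Partner_salary"] = row["Total_salary"] - row["Salary"]
            else:
                row["Partner_salary"] = 0
    return cleaned


for row in clean(rows):
    print(row["Gender"], row["Partner_salary"])
```

In pandas, the same steps map onto `Series.replace`, `Series.fillna` with `Series.mode()[0]`, and a boolean mask on Partner_working.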

Numerical Description of Data


1C. Explore all the features of the data separately by using appropriate visualizations and draw insights that can
be utilized by the business.

Univariate Analysis
• Analyzing car buyers by Age

Observation: The younger age group (20-30) buys more cars than the middle-aged group (31-45) and the older age group (46-55). The buying pattern also fluctuates within the 35-40 age group.
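The age comparison above amounts to bucketing buyers into the three bands named in the observation and counting each band. A minimal sketch on a made-up list of ages; the band boundaries come from the text, and labelling any age outside them as ‘other’ is an assumption. In pandas, `pd.cut` performs the same kind of binning.

```python
from collections import Counter

# Age bands from the observation above (inclusive bounds).
AGE_BANDS = [("20-30", 20, 30), ("31-45", 31, 45), ("46-55", 46, 55)]


def age_band(age):
    """Map an age to its band label; ages outside every band are labelled 'other'."""
    for label, low, high in AGE_BANDS:
        if low <= age <= high:
            return label
    return "other"


def buyers_per_band(ages):
    """Count buyers in each age band."""
    return Counter(age_band(age) for age in ages)


# Hypothetical buyer ages, made up for illustration.
ages = [22, 25, 28, 30, 24, 33, 38, 47, 52]
print(buyers_per_band(ages))
```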

• Analyzing car buyers by Price

Observation: More cars are purchased in the lower price range than in the higher price range.

• Analyzing the salary distribution of car buyers

Observation: The highest number of car purchases falls within the salary range of 50,000 to 70,000.

• Analyzing car buyers by Partner_salary

Observation: Most car buyers have partners whose salaries are below 10,000, and the next most frequent range is 30,000 to 40,000.

• Analyzing car buyers by Total_salary

Observation: The majority of car buyers have a total income of 60,000 to 100,000, and the number of buyers declines across the 100,000 to 160,000 range.

• Number of cars for each Make

Count plot

Observation: From the above count plot, Sedans are bought more often than Hatchbacks and SUVs.
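A count plot is a bar chart of value counts. The sketch below computes the numbers behind such a plot (count and share per Make) for a hypothetical sample of six purchases; with pandas, `df["Make"].value_counts()` returns the same counts.

```python
from collections import Counter


def value_counts(rows, column):
    """Return (value, count, share) tuples sorted from most to least frequent."""
    counts = Counter(row[column] for row in rows)
    total = sum(counts.values())
    return [(value, n, n / total) for value, n in counts.most_common()]


# Hypothetical sample of purchases; the Make labels match the report.
rows = [{"Make": m} for m in ["Sedan", "Hatchback", "Sedan", "SUV", "Sedan", "Hatchback"]]
for value, n, share in value_counts(rows, "Make"):
    print(f"{value:10} {n:3} {share:.0%}")
```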

• Analyzing car buyers by Gender

Observation: From the above graph, male car buyers outnumber female buyers.
• Number of car buyers by Education level

Observation: More post-graduates than graduates purchased cars.

• Number of car buyers by Marital status

Observation: Married males buy more cars than females in any marital-status category.

• Number of car buyers by Profession and Gender

Observation: Salaried males buy more cars than any other profession and gender group.

• Analyzing car buyers by number of dependents

Observation: The number of dependents is bimodal, with two modes at 2 and 3.
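“Bimodal” here means two values tie for the highest frequency. Python's `statistics.multimode` (Python 3.8+) returns every most-frequent value, so it distinguishes a bimodal column from a unimodal one. The dependents list below is made up for illustration.

```python
from statistics import multimode


def dependents_modes(values):
    """Return every most-frequent value, sorted; more than one result means the data is multimodal."""
    return sorted(multimode(values))


# Hypothetical counts of dependents for nine buyers.
dependents = [2, 3, 1, 2, 3, 0, 4, 2, 3]
modes = dependents_modes(dependents)
print(modes, "bimodal" if len(modes) == 2 else "")
```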

• Average salary of car buyers based on their loan status

Observation: From the above graph, almost 50% of car buyers have a personal loan, while noticeably fewer have a house loan.
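The loan percentages above are simple proportions of ‘Yes’ answers. A minimal sketch, assuming the loan columns are named Personal_loan and House_loan and hold ‘Yes’/‘No’ strings (both assumptions about the dataset), on four made-up rows:

```python
def share_yes(rows, column):
    """Fraction of rows whose value in `column` is 'Yes'."""
    return sum(1 for row in rows if row[column] == "Yes") / len(rows)


# Hypothetical rows; the column names Personal_loan and House_loan are assumed.
rows = [
    {"Personal_loan": "Yes", "House_loan": "Yes"},
    {"Personal_loan": "No", "House_loan": "No"},
    {"Personal_loan": "Yes", "House_loan": "No"},
    {"Personal_loan": "No", "House_loan": "No"},
]
print(f"Personal loan: {share_yes(rows, 'Personal_loan'):.0%}")
print(f"House loan: {share_yes(rows, 'House_loan'):.0%}")
```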

1D. Understanding the relationships among the variables in the dataset is crucial for every analytical project.
Perform analysis on the data fields to gain deeper insights. Comment on your understanding of the data.

To understand the relationships between variables, we perform bivariate analysis.

• Relationship between level of education and type of car

Observation: Post-graduates have shown more interest in purchasing cars than graduates.

• Relationship between Profession and type of car

Observation: From the above graph, the majority of buyers in both the Business and Salaried classes prefer the Sedan.
• Relationship between Marital status and type of car
Observation: From the above bar plot of Make by Marital status, Sedan is the preferred type overall.

• Relationship between working partner and type of car

Observation: From the above graph, Sedan is generally the preferred type, whether or not the partner is working.

• Relationship between House loan and type of car

Observation: From the above graph, more than 50% of customers with a house loan prefer Sedan, followed by Hatchback and SUV. Among customers without a house loan, more than 41% prefer Sedan, again followed by Hatchback and SUV.
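Percentages such as “more than 50% of customers with a house loan prefer Sedan” come from a row-normalized cross-tabulation: each House_loan group is divided by its own size, not by the whole dataset. A stdlib sketch with made-up rows (the House_loan column name is assumed); in pandas, `pd.crosstab(df["House_loan"], df["Make"], normalize="index")` produces the same kind of table.

```python
from collections import Counter, defaultdict


def row_shares(rows, row_col, col_col):
    """Share of each col_col value within every row_col group (each group sums to 1)."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[row_col]][row[col_col]] += 1
    return {
        group: {value: n / sum(cnt.values()) for value, n in cnt.items()}
        for group, cnt in counts.items()
    }


# Hypothetical (House_loan, Make) pairs, made up for illustration.
pairs = [("Yes", "Sedan"), ("Yes", "Sedan"), ("Yes", "Hatchback"), ("Yes", "SUV"),
         ("No", "Sedan"), ("No", "Hatchback"), ("No", "SUV"), ("No", "Hatchback"), ("No", "Sedan")]
rows = [{"House_loan": h, "Make": m} for h, m in pairs]
for group, shares in row_shares(rows, "House_loan", "Make").items():
    print(group, {k: round(v, 2) for k, v in shares.items()})
```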
• Relationship between Salary and type of car
Observation: From the above graph, the average salary of customers who prefer SUV is higher than that of customers who prefer Sedan or Hatchback. This suggests that the SUV is positioned as a higher-end car.

• Relationship between Total_salary and type of car

Observation: From the above graph, the average total salary of customers who prefer SUV is higher than that of customers who prefer Sedan or Hatchback, which again suggests that the SUV is a higher-end car.
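The average-salary comparisons above are group means. A minimal sketch on hypothetical buyers (all values made up); pandas expresses the same computation as `df.groupby("Make")["Salary"].mean()`.

```python
from collections import defaultdict
from statistics import mean


def mean_by_group(rows, group_col, value_col):
    """Average of value_col within each group_col category."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(row[value_col])
    return {group: mean(values) for group, values in groups.items()}


# Hypothetical buyers; Salary and Total_salary values are made up.
rows = [
    {"Make": "SUV", "Salary": 90000, "Total_salary": 150000},
    {"Make": "SUV", "Salary": 100000, "Total_salary": 130000},
    {"Make": "Sedan", "Salary": 70000, "Total_salary": 100000},
    {"Make": "Sedan", "Salary": 60000, "Total_salary": 60000},
    {"Make": "Hatchback", "Salary": 50000, "Total_salary": 75000},
]
print(mean_by_group(rows, "Make", "Salary"))
print(mean_by_group(rows, "Make", "Total_salary"))
```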

• Analysis using correlation and heatmap

Observation: Age has a strong positive correlation with Salary, Total_salary and the Price of the vehicle.
o As age increases, salary and the price of the vehicle purchased tend to increase as well.
o Salary and Total_salary are highly correlated, as are Partner_salary and Total_salary. This is expected, since Total_salary = Salary + Partner_salary.
o The remaining pairs of variables show either very weak or negative correlation.

Multivariate analysis
Observation: From the above pair plot, most pairs of variables in the dataset show weak or no correlation. However, Salary and Total_salary, as well as Total_salary and Age, are correlated, among other pairs.

1E. Employees working on the existing marketing campaign have made the following remarks. Based on the data and your analysis, state whether you agree or disagree with their observations. Justify your answer based on the data available.

E1) Steve Roger says “Men prefer SUV by a large margin, compared to the women”.
Observation: From the above graph and table, Steve Roger's statement (E1) that “Men prefer SUV by a large margin, compared to the women” does not hold true.
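Because male buyers outnumber female buyers overall, raw SUV counts would favour men even if both genders chose SUVs equally often. One reasonable way to test E1 is to compare the SUV share within each gender. The sketch below does that on a made-up sample; it illustrates the comparison and is not a reconstruction of the table above.

```python
from collections import Counter


def suv_share_by_gender(rows):
    """Within each gender, the fraction of buyers whose Make is SUV."""
    totals = Counter(row["Gender"] for row in rows)
    suv = Counter(row["Gender"] for row in rows if row["Make"] == "SUV")
    return {gender: suv[gender] / n for gender, n in totals.items()}


# Hypothetical sample, made up for illustration: more male buyers overall,
# but a larger share of the female buyers choose an SUV.
sample = (
    [("Male", "Sedan")] * 5 + [("Male", "Hatchback")] * 3 + [("Male", "SUV")] * 2
    + [("Female", "SUV")] * 2 + [("Female", "Sedan")] * 2
)
rows = [{"Gender": g, "Make": m} for g, m in sample]
print(suv_share_by_gender(rows))
```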

E2) Ned Stark believes that a salaried person is more likely to buy a Sedan.

Observation: From the above graph, we conclude that statement E2 holds true.
Comparing the car types bought by the salaried class, Sedan has a higher count than SUV or Hatchback.
Hence, the probability of owning a Sedan within the salaried class is high.
E3) Sheldon Cooper does not believe either of them; he claims that a salaried male is an easier target for an SUV sale than for a Sedan sale.
Observation: From the above graph, we conclude that statement E3 does not hold true. A salaried male is an easier target for a Sedan sale than for an SUV sale.
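E2 and E3 are both statements about a conditional share: among the buyers matching a filter (salaried, or salaried and male), what fraction chose a given Make? A minimal sketch with a made-up sample; the Profession and Gender column names and the ‘Salaried’ label are assumed from the report's wording.

```python
def make_share(rows, make, where=None):
    """Fraction of the rows passing `where` (all rows if None) whose Make equals `make`."""
    subset = [row for row in rows if where is None or where(row)]
    return sum(1 for row in subset if row["Make"] == make) / len(subset)


def is_salaried(row):
    return row["Profession"] == "Salaried"


def is_salaried_male(row):
    return is_salaried(row) and row["Gender"] == "Male"


# Hypothetical (Profession, Gender, Make) records, made up for illustration.
sample = [
    ("Salaried", "Male", "Sedan"), ("Salaried", "Male", "Sedan"), ("Salaried", "Male", "SUV"),
    ("Salaried", "Female", "Sedan"), ("Salaried", "Female", "Hatchback"),
    ("Business", "Male", "Hatchback"), ("Business", "Female", "SUV"),
]
rows = [{"Profession": p, "Gender": g, "Make": m} for p, g, m in sample]

# E2: share of each Make among salaried buyers.
for make in ("Sedan", "SUV", "Hatchback"):
    print("Salaried:", make, round(make_share(rows, make, is_salaried), 2))

# E3: for salaried males, compare the SUV share against the Sedan share.
suv = make_share(rows, "SUV", is_salaried_male)
sedan = make_share(rows, "Sedan", is_salaried_male)
print("Salaried male SUV vs Sedan:", round(suv, 2), round(sedan, 2))
```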

Problem 2: A physiotherapist with a male football team is interested in studying the relationship between foot injuries and the positions at which the players play, based on the data collected:
                     Striker  Forward  Attacking Midfielder  Winger  Total
Players Injured           45       56                    24      20    145
Players Not Injured       32       38                    11       9     90
Total                     77       94                    35      29    235

2.1 What is the probability that a randomly chosen player would suffer an injury?

The likelihood of a randomly selected player experiencing an injury is 61.7%.
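As a worked check, with I denoting an injured player, the answer uses the Total column of the contingency table (145 injured players out of 235):

```latex
P(I) = \frac{145}{235} \approx 0.617
```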

2.2 What is the probability that a player is a forward or a winger?

The probability that a player is either a forward or a winger is approximately 0.523 or 52.3%.
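Writing F for Forward and W for Winger: the two positions are mutually exclusive, so their probabilities add. With column totals of 94 forwards and 29 wingers out of 235 players:

```latex
P(F \cup W) = P(F) + P(W) = \frac{94}{235} + \frac{29}{235} = \frac{123}{235} \approx 0.523
```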

2.3 What is the probability that a randomly chosen player plays in a striker position and has a foot injury?

The probability that a randomly chosen player plays in a striker position and has a foot injury is approximately
0.191 or 19.1%.
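This is a joint probability. Writing S for Striker and I for injured, it is read directly from the Injured row of the Striker column:

```latex
P(S \cap I) = \frac{45}{235} \approx 0.191
```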

2.4 What is the probability that a randomly chosen injured player is a striker?

The probability that a randomly chosen injured player is a striker is approximately 0.310 or 31%.
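Conditioning on injury (I) restricts the sample space to the 145 injured players, so the probability of being a striker (S) becomes:

```latex
P(S \mid I) = \frac{P(S \cap I)}{P(I)} = \frac{45/235}{145/235} = \frac{45}{145} \approx 0.310
```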

2.5 What is the probability that a randomly chosen injured player is either a forward or an attacking midfielder?

The probability that a randomly chosen injured player is either a forward or an attacking midfielder is
approximately 0.552 or 55.2%.
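Within the 145 injured players (I), forwards (F, 56 injured) and attacking midfielders (AM, 24 injured) are disjoint groups, so:

```latex
P(F \cup AM \mid I) = \frac{56 + 24}{145} = \frac{80}{145} \approx 0.552
```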
