SMDM Project Business Report - Group Assignment-Copy2
Submitted by:
Sahil Shreshtha
Basavaraj K Y
Nikhil Upadyaya
Lalit Soundar Venkataraman
Mittal Shah
Ravi Kumar Ethiraju
A look at the basic information about the data shows that, of the 14 variables, 6 are numerical and 8 are
categorical. There are also a few null values in the Gender and Partner_salary variables.
1B. Take a critical look at the data and do a preliminary analysis of the variables. Do a quality check of the data
so that the variables are consistent. Are there any discrepancies present in the data? If yes, perform preliminary
treatment of data.
The table above shows null values in the Gender and Partner_salary variables:
Gender has 53 nulls
Partner_salary has 106 nulls
To fill in the missing values:
For Gender, we fill the nulls with the majority category of its two values.
In this case the nulls are imputed with ‘Male’, since males are in the majority.
For Partner_salary,
we use conditional imputation, since other variables are related to it:
Salary + Partner_salary = Total_salary
If Partner_working is Yes, then
Partner_salary = Total_salary - Salary
If Partner_working is No, then Partner_salary = 0
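The conditional imputation described above can be sketched in pandas roughly as follows. This is a minimal illustration, not the code used for the report: the rows are invented toy values, and the ‘Yes’/‘No’ spelling of the Partner_working values is an assumption.

```python
import numpy as np
import pandas as pd

# Toy rows, invented purely for illustration (not the project data).
df = pd.DataFrame({
    "Salary": [50000, 60000, 70000, 80000],
    "Partner_working": ["Yes", "No", "Yes", "No"],  # value spelling is assumed
    "Partner_salary": [np.nan, np.nan, 30000.0, 0.0],
    "Total_salary": [75000, 60000, 100000, 80000],
})

missing = df["Partner_salary"].isna()
working = df["Partner_working"].eq("Yes")

# Partner works: recover the value from Total_salary - Salary.
df.loc[missing & working, "Partner_salary"] = df["Total_salary"] - df["Salary"]
# Partner does not work: the partner's salary is zero.
df.loc[missing & ~working, "Partner_salary"] = 0

print(df["Partner_salary"].tolist())
```

Only rows where Partner_salary is null are touched, so existing values are left as they are.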
Two spelling errors were found in the ‘Gender’ column: ‘Femal’ and ‘Femle’.
Both were corrected to ‘Female’.
The null values (53 in total) were replaced with the mode of the ‘Gender’ column.
NaN values in ‘Partner_salary’ where Partner_working is Yes were replaced using Partner_salary = Total_salary - Salary.
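The Gender clean-up described above could look something like the sketch below. The values are invented toy data, and mapping both misspellings to ‘Female’ is an assumption about the intended label.

```python
import numpy as np
import pandas as pd

# Toy values, invented for illustration only (not the project data).
gender = pd.Series(
    ["Male", "Femal", "Male", np.nan, "Femle", "Male", "Male"],
    name="Gender",
)

# Map the two misspellings onto the intended label (assumed to be 'Female').
gender = gender.replace({"Femal": "Female", "Femle": "Female"})

# Fill nulls with the most frequent category (the mode).
mode_value = gender.mode()[0]
gender = gender.fillna(mode_value)

print(gender.value_counts().to_dict())
```

Correcting the spellings before taking the mode matters, because misspelled rows would otherwise be counted as separate categories.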
With the missing values in Gender and Partner_salary replaced, the dataset is consistent, free from null
values, and its discrepancies are resolved. Data cleaning is complete and the dataset is ready for
analysis.
Univariate Analysis
Analyzing the car buyers by Age
Observation: The younger age group (20-30) tends to buy more cars than the middle-aged (31-45) and
older (46-55) age groups. The buying pattern also fluctuates for the age group between 35 and 40.
Observation: More cars are purchased in the lower price range than in the costlier range.
Analyzing the salary distribution of the car buyers
Observation: The highest number of cars is purchased within the salary range of 50,000 to 70,000.
Observation: Most car buyers have partners whose salaries are below 10,000, and the next most
frequent partner salary range falls between 30,000 and 40,000.
Observation: The majority of car buyers have a total income in the range of 60,000 to 100,000.
The number of car buyers declines within the total income range of 100,000 to 160,000.
Observation: From the above graph, we can conclude that more Sedans are bought than Hatchbacks
or SUVs.
Observation: From the above graph, we can say that there are more male car buyers than female
buyers.
Number of car buyers based on their education level
Observation: Post graduates have shown more interest in purchasing cars compared to graduates.
Observation: Salaried males account for more car purchases than any other combination of
profession and gender.
Observation: The number of dependents is bimodal, i.e. the data has two modes, 2 and 3.
1D. Understanding the relationships among the variables in the dataset is crucial for every analytical project.
Perform analysis on the data fields to gain deeper insights. Comment on your understanding of the data.
To understand the relationships between the variables, and the dataset itself, we perform
bivariate analysis.
Observation: Post graduates have shown more interest in purchasing cars compared to graduates.
Observation: The graph above shows that the majority of the population in the dataset prefers
Sedan, in both the Business and Salaried classes.
Relationship between car buyers' marital status and car type
Observation: The bar chart of Make against Marital status above shows a higher
preference for Sedan overall.
Observation: The graph above shows that the preference for Sedan is generally higher,
whether or not the partner is working.
Observation: The graph above shows that, among customers with a house loan, more than 50% prefer
Sedan, followed by Hatchback and SUV. Among customers without a house loan, more than 41% prefer
Sedan, followed by Hatchback and SUV.
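Row-wise proportions of this kind can be obtained with a normalized cross-tabulation. The sketch below uses invented toy rows, and the column names House_loan and Make are assumptions about how the dataset labels these fields.

```python
import pandas as pd

# Toy rows, invented for illustration only (not the project data).
# Column names 'House_loan' and 'Make' are assumed.
df = pd.DataFrame({
    "House_loan": ["Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No", "No"],
    "Make": ["Sedan", "Sedan", "Sedan", "Hatchback",
             "Sedan", "Sedan", "Hatchback", "SUV", "SUV"],
})

# normalize="index" turns each row into proportions that sum to 1,
# i.e. the share of each car type within each House_loan group.
shares = pd.crosstab(df["House_loan"], df["Make"], normalize="index")

print(shares.round(2))
```

Normalizing by row rather than over the whole table keeps the two groups comparable even when one group is much larger than the other.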
Relationship between Salary and Type of car
Observation: The graph above shows that the average salary of customers who prefer SUV is greater than that of
customers who prefer Sedan or Hatchback. This indirectly suggests that SUV is a high-range car.
Observation: The graph above shows that the average total salary of customers who prefer SUV is greater than that of
customers who prefer Sedan or Hatchback. This also indirectly suggests that SUV is a high-range car.
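The averages behind this comparison can be computed with a group-by, as in the following sketch. The rows are invented toy values rather than the project data, and Make is an assumed column name.

```python
import pandas as pd

# Toy rows, invented for illustration only (not the project data).
df = pd.DataFrame({
    "Make": ["SUV", "SUV", "Sedan", "Sedan", "Hatchback", "Hatchback"],
    "Salary": [90000, 80000, 60000, 70000, 50000, 40000],
    "Total_salary": [120000, 100000, 80000, 90000, 60000, 50000],
})

# Average Salary and Total_salary per car type, highest Salary first.
avg_by_make = (
    df.groupby("Make")[["Salary", "Total_salary"]]
    .mean()
    .sort_values("Salary", ascending=False)
)

print(avg_by_make)
```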
Multivariate analysis
Observation: The pair plot above shows weak or no correlation between most variables in the dataset.
However, some pairs of variables are correlated, such as Salary and Total Salary, and
Total Salary and Age.
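A correlation matrix puts numbers on what the pair plot shows visually. The sketch below is illustrative only: the rows are invented toy values (in thousands), and Total_salary is built as Salary + Partner_salary, as in the data cleaning step.

```python
import pandas as pd

# Toy rows in thousands, invented for illustration only (not the project data).
df = pd.DataFrame({
    "Salary": [40, 50, 60, 70, 80],
    "Partner_salary": [0, 20, 0, 30, 10],
})
df["Total_salary"] = df["Salary"] + df["Partner_salary"]

# Pearson correlation between every pair of numeric columns.
corr = df[["Salary", "Partner_salary", "Total_salary"]].corr()

print(corr.round(2))
```

Because Total_salary is the sum of the other two columns, a strong correlation with Salary is expected by construction.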
1E. Employees working on the existing marketing campaign have made the following remarks. Based on the data
and your analysis, state whether you agree or disagree with their observations. Justify your answer based on the
data available.
E1) Steve Roger says “Men prefer SUV by a large margin, compared to the women”.
Observation: The graph and table above show that statement E1, Steve Roger's claim that “Men
prefer SUV by a large margin, compared to the women”, does not hold true.
E2) Ned Stark believes that a salaried person is more likely to buy a Sedan.
Observation: From the above graph we can conclude that statement E2 holds true.
Comparing the car types preferred by the salaried class (SUV, Sedan and Hatchback), Sedan accounts
for the largest share.
Hence, the probability of owning a Sedan amongst the salaried class is high.
E3) Sheldon Cooper does not believe any of them; he claims that a salaried male is an easier target for a SUV
sale over a Sedan Sale.
Observation: From the above graph we can conclude that statement E3 does not hold true. A salaried
male is an easier target for a Sedan sale than for an SUV sale.
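Claims like E1 to E3 can be checked with cross-tabulations and filtered counts. The following sketch uses invented toy rows, so its numbers say nothing about the real dataset; the column names Profession and Make are assumptions.

```python
import pandas as pd

# Toy rows, invented for illustration only (not the project data).
# Column names 'Profession' and 'Make' are assumed.
df = pd.DataFrame({
    "Gender": ["Male", "Male", "Male", "Male",
               "Female", "Female", "Female", "Female"],
    "Profession": ["Salaried", "Salaried", "Salaried", "Business",
                   "Salaried", "Salaried", "Business", "Business"],
    "Make": ["Sedan", "Sedan", "SUV", "Hatchback",
             "SUV", "Sedan", "SUV", "Hatchback"],
})

# E1: share of each car type within each gender.
share_by_gender = pd.crosstab(df["Gender"], df["Make"], normalize="index")

# E2: car type counts among salaried buyers.
salaried_counts = df.loc[df["Profession"] == "Salaried", "Make"].value_counts()

# E3: car type counts among salaried male buyers.
is_salaried_male = (df["Gender"] == "Male") & (df["Profession"] == "Salaried")
salaried_male_counts = df.loc[is_salaried_male, "Make"].value_counts()

print(share_by_gender["SUV"].round(2).to_dict())
print(salaried_counts.to_dict())
print(salaried_male_counts.to_dict())
```

Comparing shares within each gender, rather than raw counts, avoids a misleading result when one gender has far more buyers than the other.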
                     Striker  Forward  Attacking Midfielder  Winger  Total
Players Injured           45       56                    24      20    145
Players Not Injured       32       38                    11       9     90
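2.1 What is the probability that a randomly chosen player would suffer an injury?
The probability that a randomly chosen player would suffer an injury is 145/235, approximately 0.617 or 61.7%.
2.2 What is the probability that a player is a forward or a winger?
The probability that a player is either a forward or a winger is approximately 0.523 or 52.3%.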
2.3 What is the probability that a randomly chosen player plays in a striker position and has a foot injury?
The probability that a randomly chosen player plays in a striker position and has a foot injury is approximately
0.191 or 19.1%.
2.4 What is the probability that a randomly chosen injured player is a striker?
The probability that a randomly chosen injured player is a striker is approximately 0.310 or 31%.
2.5 What is the probability that a randomly chosen injured player is either a forward or an attacking midfielder?
The probability that a randomly chosen injured player is either a forward or an attacking midfielder is
approximately 0.552 or 55.2%.
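The probabilities above can be reproduced from the contingency table with a few lines of arithmetic. In this sketch the injured counts and row totals come from the table, while the not-injured counts for strikers (32) and forwards (38) are assumptions inferred from the row total of 90 and the stated answer of 0.523.

```python
# Counts per position as (injured, not injured).
# 32 and 38 are assumptions inferred from the row total of 90 and the 0.523 answer.
counts = {
    "Striker": (45, 32),
    "Forward": (56, 38),
    "Attacking Midfielder": (24, 11),
    "Winger": (20, 9),
}

total_players = sum(inj + not_inj for inj, not_inj in counts.values())
total_injured = sum(inj for inj, _ in counts.values())


def position_total(position):
    """All players in a position, injured or not."""
    return sum(counts[position])


# Marginal and joint probabilities divide by all players;
# conditional probabilities divide by injured players only.
p_injured = total_injured / total_players
p_forward_or_winger = (position_total("Forward") + position_total("Winger")) / total_players
p_striker_and_injured = counts["Striker"][0] / total_players
p_striker_given_injured = counts["Striker"][0] / total_injured
p_fwd_or_am_given_injured = (
    counts["Forward"][0] + counts["Attacking Midfielder"][0]
) / total_injured

for name, value in [
    ("P(injured)", p_injured),
    ("P(forward or winger)", p_forward_or_winger),
    ("P(striker and injured)", p_striker_and_injured),
    ("P(striker | injured)", p_striker_given_injured),
    ("P(forward or attacking midfielder | injured)", p_fwd_or_am_given_injured),
]:
    print(f"{name} = {value:.3f}")
```

The key distinction is the denominator: 2.3 is a joint probability over all 235 players, whereas 2.4 and 2.5 condition on the 145 injured players.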