Detail Project Report SMDM

Uploaded by

Deepak Padiyar

Graded Project SMDM

Problem 1

A. What is the important technical information about the dataset that a database
administrator would be interested in? (Hint: Information about the size of the dataset and the
nature of the variables)

Comments:
Data frame provided: Austo Motor Company

• Total rows = 1581
• Total columns = 14
• Float type columns = 1
• Integer type columns = 5
• Object type columns = 8
• Total memory used = 173 KB

[Jupyter snapshot]

The above table shows that the dataset has 14 columns and 1581 rows. It also
shows that there are missing values in two of the columns, Gender and
Partner_salary.
➢ Column ‘Gender’ has 53 null values, whereas column ‘Partner_salary’ has 106 null values.
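
The figures above are the kind of summary pandas gives through `df.shape`, `df.dtypes`, `df.isnull().sum()` and `df.memory_usage()`. A minimal sketch follows, assuming a tiny hypothetical frame (four made-up rows, not the actual Austo data) and assuming the column names shown:

```python
import numpy as np
import pandas as pd

# Hypothetical miniature stand-in for the Austo data; the real file
# has 1581 rows and 14 columns. All values here are made up.
df = pd.DataFrame({
    "Age": [53, 32, 45, 28],
    "Gender": ["Male", None, "Female", "Male"],
    "Partner_salary": [70000.0, np.nan, 0.0, np.nan],
    "Make": ["SUV", "Sedan", "Hatchback", "Sedan"],
})

n_rows, n_cols = df.shape                         # size of the dataset
dtype_counts = df.dtypes.value_counts()           # number of columns per dtype
null_counts = df.isnull().sum()                   # missing values per column
memory_bytes = df.memory_usage(deep=True).sum()   # total memory in bytes

print(n_rows, n_cols)
print(dtype_counts)
print(null_counts[null_counts > 0])
```

On the real data, `df.info()` prints the same information in one table.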

B. Take a critical look at the data and do a preliminary analysis of the variables. Do a quality
check of the data so that the variables are consistent? Are there any discrepancies present
in the data?

✓ Statistical analysis of the data is shown below.

✓ Column ‘Gender’ has 53 null values, whereas column ‘Partner_salary’ has 106
null values.
✓ No duplicate rows were found.

✓ The unique values below were identified in column “Gender”.

✓ Two spelling errors were found in the column “Gender”: ‘Femal’ and ‘Femle’.

Both spelling errors have been corrected and replaced; the result of the correction
in the data values is shown below.

✓ The mode of the ‘Gender’ data is ‘Male’, as per the plot below.

✓ The null values (53 in total) are replaced with the mode of the ‘Gender’ column.

✓ Replaced NaN values with – Yes in ‘Partner_salary’, using the calculation below.

✓ After treatment of the ‘Gender’ and ‘Partner_salary’ columns, there are no null
values left in the data set, as per the above table.
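
The Gender clean-up described above (fix the two misspellings, then fill nulls with the mode) can be sketched as follows. The eight values are hypothetical; only the labels ‘Femal’, ‘Femle’, ‘Female’ and ‘Male’ come from the report.

```python
import pandas as pd

# Hypothetical Gender values containing both misspellings and one null.
gender = pd.Series(
    ["Male", "Femal", "Female", None, "Femle", "Male", "Male", "Male"]
)

# Map the two misspelled labels onto the correct one.
gender = gender.replace({"Femal": "Female", "Femle": "Female"})

# mode() returns a Series because ties are possible; take the first entry.
most_common = gender.mode()[0]

# Fill the remaining nulls with the most frequent label.
gender = gender.fillna(most_common)

print(gender.value_counts())
```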

Imputation for missing data

Since we are aware that there are missing values in Partner_salary and Gender, we need to
do imputation to make the data consistent.
There are several techniques for treating missing values in a data set, and the choice depends on
the type of data being dealt with. They are as below:
• Drop the missing values: the rows with missing values are dropped. This is suitable when
there are very few missing values.
• Impute with the mean value: for a numerical column, the missing values can be imputed with
the mean. Before replacing with the mean, it is advisable to check that the variable
does not have extreme values, i.e. outliers.
• Impute with the median value: for a numerical column, you can also replace the missing values
with the median. If the column has extreme values such as outliers, it is advisable to use the
median approach.
• Impute with the mode value: for a categorical column, you can replace the missing values with
the mode, i.e. the most frequent value.
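
To illustrate why the median is preferred when outliers are present, here is a small sketch with hypothetical salary values (one missing entry and one extreme value). It is an illustration of the two techniques listed above, not the calculation used in this report.

```python
import numpy as np
import pandas as pd

# Hypothetical salaries: one missing value and one extreme value.
salary = pd.Series([50000.0, 52000.0, 55000.0, np.nan, 400000.0])

# Mean imputation: the extreme value pulls the fill value upwards.
mean_filled = salary.fillna(salary.mean())

# Median imputation: the fill value stays close to the typical salaries.
median_filled = salary.fillna(salary.median())

print(mean_filled[3], median_filled[3])
```

Here the mean-based fill is 139250, far above the three typical salaries, while the median-based fill is 53500.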

✓ Checking outliers in the data as per the boxplots below.

[Boxplots: Age, Salary, No of Dependents, Price, Total Salary, Partner Salary]

• There are outliers in the ‘No_of Dependants’ column as well as in ‘Total_salary’, as per the above boxplots.
• I will treat the outliers for "Total_Salary" only, because a customer can genuinely have ‘0’
dependents, and treating the dependents column could mislead the analysis.
• The mean is taken for Total salary in order to avoid distorting the analysis, since the mean
gives an overall representation of the data.

Mean of the Total_salary is 79625.99620493359

Treating outliers (Total_salary)

Upper range = 149000

Lower range = 7400

Q1 = 25th percentile

Q3 = 75th percentile

Formulas used:
IQR = Q3 - Q1

lower range = Q1 - (1.5 * IQR)

upper range = Q3 + (1.5 * IQR)

As we can see from the above plot, the outliers have been treated; there are now no outliers in
Total salary.
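
The IQR rule above can be sketched in pandas as follows. The seven values are hypothetical, and both treatment options shown (capping at the bounds, or replacing with the mean) are assumptions for illustration; the exact replacement used on the real data is in the Jupyter file.

```python
import pandas as pd

# Hypothetical Total_salary values; the last one is an obvious outlier.
total_salary = pd.Series(
    [30000, 60000, 70000, 80000, 90000, 100000, 250000], dtype=float
)

q1 = total_salary.quantile(0.25)   # 25th percentile
q3 = total_salary.quantile(0.75)   # 75th percentile
iqr = q3 - q1

lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Flag values that fall outside [lower, upper].
is_outlier = ~total_salary.between(lower, upper)

# Option 1 (assumption): cap outliers at the nearest bound.
capped = total_salary.clip(lower=lower, upper=upper)

# Option 2 (assumption): replace outliers with the column mean.
mean_replaced = total_salary.mask(is_outlier, total_salary.mean())

print(lower, upper, int(is_outlier.sum()))
```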

C. Explore all the features of the data separately by using appropriate
visualizations and draw insights that can be utilized by the business.

Comments:

• Statistical analysis of the data, which helps to summarize the numeric values.
• Analyzing the Age Variable

With reference to the above boxplot for Austo Motor Company, we can say that the buying
pattern for cars varies with age group. The younger age group (20-30) tends to buy more cars
than the middle-aged group (31-45) and the older age group (46-55). There is also some
fluctuation in the buying pattern within the 35-40 age group: car sales in this
group are slightly better than in the remaining age groups, second only to the young age group.

• Analyzing the No of Dependents Variable

If we look at the histogram of the number of dependents in the dataset, we can see that the data
is bimodal, i.e. there are two modes in the number of dependents, namely 3 and 2.
A boxplot was also plotted in order to better understand the number of dependents
data. No median line is displayed in the boxplot because the 25th and 50th percentiles
of the observations have the same value, i.e. 2; as a result, the median and the lower
quartile overlap. In addition, from the above boxplot we can infer that there is an outlier
in the number of dependents column, which may or may not be treated, depending
on the business context.
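
The overlap of the median and the lower quartile can be checked numerically. The sketch below uses ten hypothetical dependents values (not the real column) chosen so that the 25th and 50th percentiles coincide:

```python
import pandas as pd

# Hypothetical number-of-dependents values.
dependents = pd.Series([0, 1, 2, 2, 2, 2, 3, 3, 3, 4])

# quantile() with a list returns one value per requested percentile.
q1, median, q3 = dependents.quantile([0.25, 0.5, 0.75])

# When q1 equals the median, the boxplot draws both at the same position,
# so no separate median line is visible inside the box.
print(q1, median, q3)
```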

• Analyzing the Salary Variable

From the above graph, we can see that the histogram for salary is right skewed.

In addition, the box plot of the salary variable reconfirms that the salary data
is right skewed, and shows that there are no outliers in the data.
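
Besides reading the histogram, right skew can be confirmed numerically: the mean lies above the median and `Series.skew()` is positive. A minimal sketch with hypothetical salary values:

```python
import pandas as pd

# Hypothetical salaries with a long right tail.
salary = pd.Series(
    [40000, 45000, 50000, 52000, 55000, 60000, 95000, 120000], dtype=float
)

mean_salary = salary.mean()
median_salary = salary.median()
skewness = salary.skew()   # sample skewness; positive means right skew

print(mean_salary, median_salary, skewness > 0)
```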

• Analyzing the Partner_salary Variable

From the above histogram plot, we can see that the partner salary data is right skewed.

• Analyzing the Total_salary Variable

From the above histogram, we can see that the total salary data is slightly right skewed.

Post outlier treatment, we can see that the boxplot does not show any outliers, and
the median is 80,000. However, on a closer look, the boxplot also shows
that the total salary data is slightly right skewed.

• Analyzing the Price Variable

From the above graph, we can see that the prices of cars are right skewed.

• Analyzing the Gender Category Variable

From the above graph, we can see that the majority of the observations in the Gender category
are Male, at 1252 (79.2% of the gender data), while Female
stands at 329 (20.8% of the gender data).
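
The counts and percentages above can be reproduced with `value_counts`. The sketch rebuilds a Gender column from the two counts reported here (1252 Male, 329 Female) rather than loading the real file:

```python
import pandas as pd

# Rebuild the cleaned Gender column from the reported counts.
gender = pd.Series(["Male"] * 1252 + ["Female"] * 329)

counts = gender.value_counts()
# normalize=True returns fractions; convert to percentages with one decimal.
share = (gender.value_counts(normalize=True) * 100).round(1)

print(counts)
print(share)
```

The same two calls apply to the other categorical columns (Profession, Education, and so on).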

• Analyzing the Profession Category Variable

From the above countplot, we can see that there are more Salaried customers than Business
customers in the dataset: 896 salaried and 685 business
customers.

• Analyzing the Marital_status Category Variable

From the above countplot, it can be seen that married customers outnumber single
customers in the dataset.

• Analyzing the Education Category Variable

From the above countplot, we can see that there are more Post Graduate customers than
Graduate customers in the dataset: 985 Post Graduates and
596 Graduates.

• Analyzing the Personal Loan Category Variable

From the countplot below, we can see that the numbers of customers with and without a
personal loan are more or less the same, with only a slight difference: 792 customers
have a personal loan and 789 do not.

• Analyzing the House Loan Category Variable

From the above graph, we can see that the number of customers not availing a house loan
(1054) is greater than the number of customers availing a house loan (527).

• Analyzing the Partner Working Category Variable

From the above countplot, we can see that customers whose partners are working outnumber
those whose partners are not working. The number of partners working stands at 868 (55% of the data),
while partners not working number 713 (45% of the data).

• Analyzing the Make Category Variable

From the above graph, we can see that the preference for Sedan (702) amongst the customers
is highest, followed by Hatchback (582) and then SUV (297) in the given dataset.

Conclusion: All the features of the data (both categorical and numeric) can be analysed
separately by univariate analysis.

D. Understanding the relationships among the variables in the dataset is
crucial for every analytical project. Perform analysis on the data fields to gain
deeper insights. Comment on your understanding of the data.

To understand the relationships between the variables, we need to do bivariate analysis.

BIVARIATE ANALYSIS:

1) Relationship between the level of education and type of car

From the above graph, the level of education does not have a significant impact on the type of car
owned by the individual.

2) Relationship between profession and type of car

From the above graph, we can see that the majority of customers in the dataset prefer
Sedan, in both the Business and Salaried classes.

3) Relationship between marital status and type of car

From the above bar plot of Make and Marital_status, we can see that there is a higher preference for
Sedan overall.

4) Relationship between working partner and type of car

From the above graph and the corresponding cross tab, we can see that, in general, the preference for
Sedan is on the higher side, whether the partner is working or not.

5) Relationship between House loan and type of car

From the above graph, we can see that, of the customers availing a house
loan, more than 50% prefer Sedan, followed by Hatchback and SUV, while of the
customers not availing a house loan, more than 41% prefer Sedan, followed by Hatchback and SUV.
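
Row-wise percentages such as "more than 50% of house-loan customers prefer Sedan" come from a cross tab normalised by row. A minimal sketch, assuming hypothetical column names `House_loan` and `Make` and nine made-up rows:

```python
import pandas as pd

# Hypothetical rows; the shares here do not match the real data.
df = pd.DataFrame({
    "House_loan": ["Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No", "No"],
    "Make": ["Sedan", "Sedan", "Sedan", "Hatchback",
             "Sedan", "Sedan", "Hatchback", "Hatchback", "SUV"],
})

# normalize="index" divides each row by its row total, so every row sums to 1.
share = pd.crosstab(df["House_loan"], df["Make"], normalize="index") * 100

print(share.round(1))
```

Each row then reads as "of the customers in this loan group, what percentage bought each make".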

6) Relationship between Salary and type of car

From the above bar plot, we can see that the average salary of the customers who prefer SUV is
greater than that of those who prefer Sedan or Hatchback. This indirectly implies that SUV is a high-range car.

7) Relationship between Total Salary and type of car

From the above bar plot, we can see that the average total salary of the customers (which includes
their partner's salary) who prefer SUV is greater than that of those who prefer Sedan or Hatchback.
This again indirectly implies that SUV is a high-range car.

8) Relationship between age of customers and type of car purchased

From the above bar plot, we can see that the average age of customers who buy SUV is the highest,
followed by the average age of customers purchasing Sedan and then Hatchback.
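
The bar plots in points 6 to 8 show group averages, which can be computed directly with `groupby`. A minimal sketch with six hypothetical customers (the column names are assumed):

```python
import pandas as pd

# Hypothetical customers; two per make.
df = pd.DataFrame({
    "Make": ["SUV", "SUV", "Sedan", "Sedan", "Hatchback", "Hatchback"],
    "Salary": [90000, 100000, 70000, 60000, 50000, 40000],
    "Age": [45, 50, 35, 33, 26, 28],
})

# Mean Salary and Age for each type of car.
avg = df.groupby("Make")[["Salary", "Age"]].mean()

print(avg.sort_values("Salary", ascending=False))
```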

9) Bivariate analysis using pairplot (please refer to the Jupyter file, as the full image could not be captured):

The pair plot displays the pairwise relationships between variables in the dataset. From the above
pair plot, we can see that for most pairs of variables there is
weak or no correlation. However, there is a correlation between
Salary and Total Salary, Total Salary and Age, etc. Each diagonal graph has the same
variable on both the x and y axes. The graphs above the diagonal are mirror
images of the graphs below the diagonal. The extent of correlation, however, is not
depicted in the pair plot; for that, a correlation function needs to be applied.

10) Bivariate analysis using correlation and heatmap:

To understand the extent of correlation between the numeric variables, the correlation was computed
and a heatmap plotted. The correlation function in Python gave
the following result:

From the table and heatmap below, we can see that there is some correlation (although weak)
between Total Salary and Age, and between Salary and Price, meaning that as Age increases, Total
Salary also increases; the same holds for Salary and Price (of the car). We also see a very strong
correlation between Age and the Price of the car, which implies that, as age
increases, the customer’s capacity to buy a higher-priced car increases. There is also a high
correlation between Salary and Total Salary and, similarly, between Partner Salary
and Total Salary, which is understandable.
Between the rest of the variables, the correlation is either very weak or negative.
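
The correlation table behind the heatmap is the output of `DataFrame.corr()`. A minimal sketch with six hypothetical customers follows; it assumes that Total_salary is the sum of Salary and Partner_salary, which is why those pairs correlate by construction.

```python
import pandas as pd

# Hypothetical numeric columns.
df = pd.DataFrame({
    "Age": [25, 30, 35, 40, 45, 50],
    "Salary": [40000, 50000, 55000, 65000, 70000, 80000],
    "Partner_salary": [0, 20000, 0, 30000, 25000, 0],
    "Price": [20000, 22000, 30000, 35000, 50000, 60000],
})
# Assumption: total salary is the customer's salary plus the partner's salary.
df["Total_salary"] = df["Salary"] + df["Partner_salary"]

# Pearson correlation between every pair of numeric columns.
corr = df.corr()

print(corr.round(2))
```

A heatmap is just a coloured rendering of this symmetric matrix, with 1.0 on the diagonal.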

E. Employees working on the existing marketing campaign have made the following
remarks. Based on the data and your analysis state whether you agree or disagree
with their observations. Justify your answer based on the data available.
E1) Steve Roger says “Men prefer SUV by a large margin, compared to the women”
To analyse the above statement, the following bar plot was plotted.

From the above graph and table, we can see that statement E1, i.e. Steve Roger saying “Men
prefer SUV by a large margin, compared to the women”, does not hold true.
E2) Ned Stark believes that a salaried person is more likely to buy a Sedan.

Data visualization for make of the car and profession

From the above graph, we can conclude that statement E2 holds true.
If we compare the preferences of the salaried class across SUV, Sedan and Hatchback,
Sedan is the preferred type.

Hence, the probability of owning a Sedan amongst the Salaried class is high.

E3) Sheldon Cooper does not believe any of them; he claims that a salaried male is an easier target
for a SUV sale over a Sedan Sale.

From the above graph and table analysis, we can conclude that statement E3 doesn’t hold
true. A salaried male is an easier target for a Sedan sale than for an SUV sale.
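
Claims like E3 involve two grouping fields at once (gender and profession). One way to check them is a cross tab with a two-level row index. The nine rows below are hypothetical and the column names are assumed:

```python
import pandas as pd

# Hypothetical (Gender, Profession, Make) records.
rows = (
    [("Male", "Salaried", "Sedan")] * 3
    + [("Male", "Salaried", "SUV")] * 1
    + [("Male", "Business", "Hatchback")] * 2
    + [("Female", "Salaried", "SUV")] * 2
    + [("Female", "Business", "Sedan")] * 1
)
df = pd.DataFrame(rows, columns=["Gender", "Profession", "Make"])

# Count of each Make for every (Gender, Profession) combination.
table = pd.crosstab([df["Gender"], df["Profession"]], df["Make"])

# Pick out the salaried-male row to compare Sedan against SUV.
salaried_male = table.loc[("Male", "Salaried")]

print(table)
print(salaried_male["Sedan"] > salaried_male["SUV"])
```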
F. From the given data, comment on the amount spent on purchasing automobiles across
the following categories. Comment on how a Business can utilize the results from this
exercise. Give justification along with presenting metrics/charts used for arriving at the
conclusions.
F1) Gender

As per the above graph, we can say that female customers have bought more expensive cars overall than
male customers.

F2) Personal_loan

With reference to the above plot, customers who do not take a personal loan buy more expensive cars
than those who avail a personal loan.
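
"Amount spent" can be read either as the average price per customer or as the total spend of the group, and the two can rank the groups differently. A minimal sketch with five hypothetical purchases shows both per gender; the same call works for Personal_loan:

```python
import pandas as pd

# Hypothetical purchases.
df = pd.DataFrame({
    "Gender": ["Female", "Female", "Male", "Male", "Male"],
    "Price": [50000, 40000, 30000, 35000, 25000],
})

# Number of purchases, average price and total spend per group.
spend = df.groupby("Gender")["Price"].agg(["count", "mean", "sum"])

print(spend)
```

In this made-up example the female group has the higher average price even though both groups have the same total spend, so it matters which metric a conclusion is based on.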

G. From the current data set comment if having a working partner leads to the
purchase of a higher-priced car.

From the above graph, we can conclude that it does not matter whether the partner is working or not:
customers buy their preferred car. There is, however, a marginal difference between the two groups,
which suggests that customers whose partner is not working buy slightly more expensive cars.

H. The main objective of this analysis is to devise an improved marketing strategy to send targeted
information to different groups of potential buyers present in the data. For the current analysis
use the Gender and Marital_status - fields to arrive at groups with similar purchase history.

❖ Female customers buy more SUVs than Sedans; Hatchback takes the last position in
the buying preference list for females.
(Based on the counts by Gender and Make.)

• Married customers buy more Sedans than Hatchbacks, and SUV is the last
choice.
• Single customers buy more Hatchbacks than Sedans, and SUV takes the last position
among their choices.

From the data and graph below, the following insights can be derived:

✓ There are a total of 1443 married and 138 single customers; hence there are more married
customers in the company's records.
✓ Married business professionals prefer to buy Sedan, followed by Hatchback; SUV
is their last choice.

• Single business professionals tend to buy more Hatchbacks than Sedans, with very few preferring
to buy an SUV.
• Salaried married customers prefer Sedan over Hatchback, and SUV is their last
choice.
• Salaried single customers prefer Hatchback, followed by Sedan, with fewer choosing SUV.
Overall, we can see that SUVs have very low popularity amongst the Male segment (both single and
married). The company can investigate further and try to find the reasons behind this in order
to improve its topline for SUVs in the Male category. Similarly, Sedan seems to be the top choice for
married males, and Hatchback enjoys popularity amongst single males. Based on this information
and analysis, the company can customise and provide targeted information
regarding festive offers, deals and new launches to the identified segments for their preferred car make,
in order to boost its topline and margins.
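
The segment view used in this section can be sketched as a count of purchases per (Gender, Marital_status, Make) combination, followed by the most popular make in each segment. The ten rows are hypothetical and only illustrate the technique:

```python
import pandas as pd

# Hypothetical (Gender, Marital_status, Make) records.
rows = (
    [("Male", "Married", "Sedan")] * 3
    + [("Male", "Married", "Hatchback")] * 1
    + [("Male", "Single", "Hatchback")] * 2
    + [("Male", "Single", "Sedan")] * 1
    + [("Female", "Married", "SUV")] * 2
    + [("Female", "Married", "Sedan")] * 1
)
df = pd.DataFrame(rows, columns=["Gender", "Marital_status", "Make"])

# One row per segment, one column per Make, values are purchase counts.
counts = (
    df.groupby(["Gender", "Marital_status", "Make"])
    .size()
    .unstack(fill_value=0)
)

# Most purchased make in each (Gender, Marital_status) segment.
top_make = counts.idxmax(axis=1)

print(counts)
print(top_make)
```

Targeted offers could then be keyed on `top_make` for each segment.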
