Business Project Report

The document analyzes data from Austo Motor Company to provide insights for improving their marketing strategies. It includes exploratory data analysis, descriptive statistics, and answers to business questions regarding customer preferences and purchasing behavior. The findings aim to enhance customer experience and optimize sales based on demographic factors.

PDS coded project

Austo Motor Company

Abstract
Exploring and analyzing the data of Austo Motor Company to provide
insights and answer the company's business questions, with the aim of
improving the business.

Swathi Karnamadakala
Contents
Introduction:
Exploratory Data Analysis:
  Descriptive statistics:
    Head:
    Tail:
    Shape:
    Info:
    Describe:
    Type:
  Data Check:
    Isnull:
    Duplicates:
    Unique:
    Unique Categorical (Percentage):
  Missing value detection and imputation:
  Incorrect values correction:
  Outlier Detection:
Univariate Analysis:
  Histogram
  Box Plot:
Bi-Variate Analysis:
  Count Plot(Bar):
  Scatter Plot
  Joint Plot
Multi-Variate Analysis:
Business Questions:
  Do men tend to prefer SUVs more compared to women?
  What is the likelihood of a salaried person buying a Sedan?
  What evidence or data supports Sheldon Cooper's claim that a salaried male is an easier target for a SUV sale over a Sedan sale?
  How does the amount spent on purchasing automobiles vary by gender?
  How much money was spent on purchasing automobiles by individuals who took a personal loan?
  How does having a working partner influence the purchase of higher-priced cars?
Actionable Insights & Recommendations

Introduction:
Austo Motor Company is a leading car manufacturer specializing in SUV,
Sedan, and Hatchback models. We analyze the company's data to improve the
efficiency of its existing marketing campaign, so that the company better
understands customer demand, which in turn will help improve the
customer experience.

Exploratory Data Analysis:


We begin by importing the required packages and loading the given data
into a pandas DataFrame in Python for analysis.
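
As a minimal sketch of this loading step, the snippet below reads CSV text into a DataFrame. The three rows are copied from the head and tail output shown later in this report; the file name mentioned in the comment is an assumption, since the report does not name the source file.

```python
import io

import pandas as pd

# Stand-in for the real file: three rows copied from the head/tail output
# in this report, stored as CSV text so the example runs on its own.
csv_text = """Age,Gender,Profession,Marital_status,Education,No_of_Dependents,Personal_loan,House_loan,Partner_working,Salary,Partner_salary,Total_salary,Price,Make
53,Male,Business,Married,Post Graduate,4,No,No,Yes,99300,70700,170000,61000,SUV
53,Femal,Salaried,Married,Post Graduate,4,Yes,No,Yes,95500,70300,165800,61000,SUV
22,Male,Business,Married,Graduate,4,No,No,No,32000,,32000,31000,Hatchback
"""

# With the real data this would be something like
# pd.read_csv("austo_automobile.csv"); that file name is hypothetical.
df = pd.read_csv(io.StringIO(csv_text))

# The empty Partner_salary field in the last row is read as NaN.
print(df.shape)
```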

Descriptive statistics:
This part presents basic details of the data: its head, tail, shape,
column information and descriptive statistics.

Head:
This function returns the first rows of the dataset. By default it
returns five rows; a different number can be passed as an argument if
necessary.

Output:
Age  Gender  Profession  Marital_status  Education      No_of_Dependents  Personal_loan  House_loan  Partner_working  Salary  Partner_salary  Total_salary  Price  Make
53   Male    Business    Married         Post Graduate  4                 No             No          Yes              99300   70700           170000        61000  SUV
53   Femal   Salaried    Married         Post Graduate  4                 Yes            No          Yes              95500   70300           165800        61000  SUV
53   Female  Salaried    Married         Post Graduate  3                 No             No          Yes              97300   60700           158000        57000  SUV
53   Female  Salaried    Married         Graduate       2                 Yes            No          Yes              72500   70300           142800        61000  SUV
53   Male    Salaried    Married         Post Graduate  3                 No             No          Yes              79700   60200           139900        57000  SUV

Tail:
This function returns the last rows of the dataset. By default it
returns five rows; a different number can be passed as an argument if
necessary.

Output:
Age  Gender  Profession  Marital_status  Education      No_of_Dependents  Personal_loan  House_loan  Partner_working  Salary  Partner_salary  Total_salary  Price  Make
22   Male    Salaried    Single          Graduate       2                 No             Yes         No               33300   0               33300         27000  Hatchback
22   Male    Business    Married         Graduate       4                 No             No          No               32000   NaN             32000         31000  Hatchback
22   Male    Business    Single          Graduate       2                 No             Yes         No               32900   0               32900         30000  Hatchback
22   Male    Business    Married         Graduate       3                 Yes            Yes         No               32200   NaN             32200         24000  Hatchback
22   Male    Salaried    Married         Graduate       4                 No             No          No               31600   0               31600         31000  Hatchback

Shape:
The shape attribute gives the number of rows and columns of the loaded
data as a pair. If necessary, the number of rows or the number of columns
can be read individually by indexing the result with 0 or 1.

Output:

(1581, 14)
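
The head, tail and shape calls described above can be sketched as follows. The small DataFrame is a toy stand-in for the loaded data, with values borrowed from the outputs above.

```python
import pandas as pd

# Toy stand-in for the loaded data (the real frame has 1581 rows, 14 columns).
df = pd.DataFrame({
    "Age": [53, 53, 22],
    "Make": ["SUV", "SUV", "Hatchback"],
})

first_rows = df.head(2)   # first two rows instead of the default five
last_rows = df.tail(1)    # last row only

# shape is an attribute (no parentheses) holding (rows, columns).
n_rows = df.shape[0]
n_cols = df.shape[1]
print(n_rows, n_cols)
```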

Info:
This function summarizes the loaded data: the number of entries, the
number of columns, each column's name, its non-null count and its data
type.

Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1581 entries, 0 to 1580
Data columns (total 14 columns):
 #   Column            Non-Null Count  Dtype
 0   Age               1581 non-null   int64
 1   Gender            1528 non-null   object
 2   Profession        1581 non-null   object
 3   Marital_status    1581 non-null   object
 4   Education         1581 non-null   object
 5   No_of_Dependents  1581 non-null   int64
 6   Personal_loan     1581 non-null   object
 7   House_loan        1581 non-null   object
 8   Partner_working   1581 non-null   object
 9   Salary            1581 non-null   int64
 10  Partner_salary    1475 non-null   float64
 11  Total_salary      1581 non-null   int64
 12  Price             1581 non-null   int64
 13  Make              1581 non-null   object
dtypes: float64(1), int64(5), object(8)
memory usage: 173.1+ KB

Describe:
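This function gives the statistical summary of the loaded data. For
numeric columns it reports the count, mean, standard deviation, minimum,
quartiles (the 50% value is the median) and maximum; for categorical
columns it reports the count, the number of unique values, the most
frequent value and its frequency.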

Output:
                  count  unique  top            freq  mean         std          min    25%    50%    75%    max
Age               1581   NaN     NaN            NaN   31.922201    8.425978     22     25     29     38     54
Gender            1528   4       Male           1199  NaN          NaN          NaN    NaN    NaN    NaN    NaN
Profession        1581   2       Salaried       896   NaN          NaN          NaN    NaN    NaN    NaN    NaN
Marital_status    1581   2       Married        1443  NaN          NaN          NaN    NaN    NaN    NaN    NaN
Education         1581   2       Post Graduate  985   NaN          NaN          NaN    NaN    NaN    NaN    NaN
No_of_Dependents  1581   NaN     NaN            NaN   2.457938     0.943483     0      2      2      3      4
Personal_loan     1581   2       Yes            792   NaN          NaN          NaN    NaN    NaN    NaN    NaN
House_loan        1581   2       No             1054  NaN          NaN          NaN    NaN    NaN    NaN    NaN
Partner_working   1581   2       Yes            868   NaN          NaN          NaN    NaN    NaN    NaN    NaN
Salary            1581   NaN     NaN            NaN   60392.22011  14674.82504  30000  51900  59500  71800  99300
Partner_salary    1475   NaN     NaN            NaN   20225.55932  19573.14928  0      0      25600  38300  80500
Total_salary      1581   NaN     NaN            NaN   79625.99621  25545.85777  30000  60500  78000  95900  171000
Price             1581   NaN     NaN            NaN   35597.72296  13633.63655  18000  25000  31000  47000  70000
Make              1581   3       Sedan          702   NaN          NaN          NaN    NaN    NaN    NaN    NaN

Type:
This shows the data type of every column of the loaded data.

Output:

Data Check:
Isnull:
Using this function, we can count the missing values in each column of
the loaded data.

Output:
Age 0
Gender 53
Profession 0
Marital_status 0
Education 0
No_of_Dependents 0
Personal_loan 0
House_loan 0
Partner_working 0
Salary 0
Partner_salary 106
Total_salary 0
Price 0
Make 0
dtype: int64
Duplicates:
Using this function, we can check whether the loaded data contains any
duplicated rows. The output is the number of duplicates found.

Output:
0
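
The two checks above (missing values and duplicates) can be sketched as below. The column names follow the report; the rows are a toy sample, not the real data.

```python
import numpy as np
import pandas as pd

# Toy sample with gaps in Gender and Partner_salary, mirroring the two
# columns that have missing values in the report.
df = pd.DataFrame({
    "Gender": ["Male", None, "Female", "Male"],
    "Partner_salary": [70700.0, np.nan, 0.0, np.nan],
    "Price": [61000, 57000, 27000, 31000],
})

# isnull() marks each missing cell; sum() counts them per column.
missing_per_column = df.isnull().sum()

# duplicated() marks rows that repeat an earlier row; sum() counts them.
duplicate_rows = df.duplicated().sum()

print(missing_per_column)
print(duplicate_rows)
```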

Unique:
Using this function, we can list the distinct values in each categorical
column together with the number of times each value occurs.

Output:
Gender
Male 1199
Female 329
Name: count, dtype: int64
--------------------------------------------------
Profession
Salaried 896
Business 685
Name: count, dtype: int64
--------------------------------------------------
Marital_status
Married 1443
Single 138
Name: count, dtype: int64
--------------------------------------------------
Education
Post Graduate 985
Graduate 596
Name: count, dtype: int64
--------------------------------------------------
Personal_loan
Yes 792
No 789
Name: count, dtype: int64
--------------------------------------------------
House_loan
No 1054
Yes 527
Name: count, dtype: int64
--------------------------------------------------
Partner_working
Yes 868
No 713
Name: count, dtype: int64
--------------------------------------------------
Make
Sedan 702
Hatchback 582
SUV 297
Name: count, dtype: int64
--------------------------------------------------
Unique Categorical (Percentage):
Using a for loop over the categorical columns and a function that
calculates proportions, we can find the share of each unique value in
each column.

Output:
Gender
Male 0.784686
Female 0.215314
Name: proportion, dtype: float64
--------------------------------------------------
Profession
Salaried 0.56673
Business 0.43327
Name: proportion, dtype: float64
--------------------------------------------------
Marital_status
Married 0.912713
Single 0.087287
Name: proportion, dtype: float64
--------------------------------------------------
Education
Post Graduate 0.623023
Graduate 0.376977
Name: proportion, dtype: float64
--------------------------------------------------
Personal_loan
Yes 0.500949
No 0.499051
Name: proportion, dtype: float64
--------------------------------------------------
House_loan
No 0.666667
Yes 0.333333
Name: proportion, dtype: float64
--------------------------------------------------
Partner_working
Yes 0.54902
No 0.45098
Name: proportion, dtype: float64
--------------------------------------------------
Make
Sedan 0.444023
Hatchback 0.368121
SUV 0.187856
Name: proportion, dtype: float64
--------------------------------------------------
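
A loop of the kind described above might look like the following sketch. Selecting the categorical columns with select_dtypes is an assumption about how the loop was set up; the rows are a toy sample.

```python
import pandas as pd

# Toy sample; column names follow the report.
df = pd.DataFrame({
    "Gender": ["Male", "Male", "Male", "Female"],
    "Make": ["SUV", "Sedan", "Sedan", "Hatchback"],
    "Price": [61000, 33000, 31000, 27000],
})

# Treat every non-numeric column as categorical.
categorical_columns = list(df.select_dtypes(exclude="number").columns)

for column in categorical_columns:
    print(df[column].value_counts())                 # counts per value
    print(df[column].value_counts(normalize=True))   # shares per value
    print("-" * 50)

gender_share = df["Gender"].value_counts(normalize=True)
```

The shares are fractions of the non-missing values, so multiplying by 100 gives percentages.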

Missing value detection and imputation:


In the given dataset there are a few missing values in two columns
(Gender and Partner_salary). These missing values are imputed using the
mean() and transform() functions.

Output:

Missing values detection output:(1)


Count Percentage
Gender 53 3.352309
Partner_salary 106 6.704617

Missing values imputation output:(2)


Count Percentage
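
One way an imputation with mean() and transform() could look is sketched below. Grouping by Partner_working and filling Gender with the most frequent value are both assumptions made for illustration; the report does not state which grouping was used or how the Gender gaps were filled.

```python
import numpy as np
import pandas as pd

# Toy sample; column names follow the report.
df = pd.DataFrame({
    "Gender": ["Male", "Male", None, "Female"],
    "Partner_working": ["Yes", "Yes", "No", "No"],
    "Partner_salary": [40000.0, np.nan, 0.0, np.nan],
})

# Numeric column: transform("mean") returns, for every row, the mean of
# that row's Partner_working group, which is then used to fill the gaps.
# (The grouping column is an assumption.)
group_mean = df.groupby("Partner_working")["Partner_salary"].transform("mean")
df["Partner_salary"] = df["Partner_salary"].fillna(group_mean)

# Categorical column: a mean is undefined, so this sketch falls back to
# the most frequent value (an assumption, not the report's stated method).
df["Gender"] = df["Gender"].fillna(df["Gender"].mode()[0])

print(df.isnull().sum())
```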

Incorrect values correction:


In the Gender column of the dataset, "Female" is misspelled in two rows
(once as "Femal" and once as "Femle"). We replaced those 2 values with the
correct spelling using the replace function.

The output is given below.

Output:

Output before correction


Gender Count
Male 1199
Female 327
Femal 1
Femle 1
Name: count, dtype: int64

Output After correction


Gender Count
Male 1199
Female 329
Name: count, dtype: int64
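
The correction can be sketched with a small series that contains the same two misspellings; the other values are a toy sample.

```python
import pandas as pd

# Toy Gender column containing the two misspellings found in the data.
gender = pd.Series(["Male", "Female", "Femal", "Femle", "Male"], name="Gender")

# Map each misspelling to the correct label; other values are untouched.
gender_clean = gender.replace({"Femal": "Female", "Femle": "Female"})

print(gender_clean.value_counts())
```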

Outlier Detection:
Outliers are checked with the interquartile range (IQR): a value is an
outlier if it lies below Q1 - 1.5 * IQR (lower whisker) or above
Q3 + 1.5 * IQR (upper whisker). For salary, partner salary and price the
check shows no outliers to impute or treat. For total salary about 1.7%
of the values are outliers, and these have been treated using the outlier
treatment methodology.

Output:

Salary:-
Q1 = 51900.0
Q3 = 71800.0
IQR = 19900.0
lower_whisker = 22050.0
upper_whisker = 101650.0

Outlier Percentage = 0.00


Partner_Salary:-
Q1 = 0.0
Q3 = 38000.0
IQR = 38000.0
lower_whisker = -57000.0
upper_whisker = 95000.0

Outlier Percentage = 0.00

Total_Salary:-
Q1 = 60500.0
Q3 = 95900.0
IQR = 35400.0
lower_whisker = 7400.0
upper_whisker = 149000.0

Outlier Percentage = 1.7077798861480076


Price:-
Q1 = 25000.0
Q3 = 47000.0
IQR = 22000.0
lower_whisker = -8000.0
upper_whisker = 80000.0

Outlier Percentage = 0.00
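
The whisker values above are consistent with the usual 1.5 * IQR rule, for example 60500 - 1.5 * 35400 = 7400 for Total_salary. A minimal sketch of such a check follows; the function names are hypothetical and the sample series are toy data chosen to have convenient quartiles.

```python
import pandas as pd


def iqr_bounds(values: pd.Series, k: float = 1.5):
    """Return (lower_whisker, upper_whisker) using the k * IQR rule."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def outlier_percentage(values: pd.Series) -> float:
    """Percentage of values lying outside the whiskers."""
    lower, upper = iqr_bounds(values)
    is_outlier = (values < lower) | (values > upper)
    return 100 * is_outlier.mean()


# Toy series whose quartiles equal the Total_salary quartiles in the report.
total_salary_sample = pd.Series([60500, 60500, 78000, 95900, 95900])
print(iqr_bounds(total_salary_sample))

# Toy series with one extreme value.
print(outlier_percentage(pd.Series([10, 11, 12, 13, 100])))
```

With the sample quartiles set to 60500 and 95900, the first call should reproduce the Total_salary whiskers listed above.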


Univariate Analysis:
Histogram
For the salary attribute, keeping salary on the x-axis, we create a
histogram using the hist plot function. The distribution of salary is
approximately normal. There are no outliers.

Output:

Salary:-

Price:-

For the price attribute, keeping price on the x-axis, we create a
histogram using the hist plot function. The distribution of price is
right skewed. There are no outliers.
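
A skew reading taken from a histogram can be cross-checked numerically. The sketch below uses the pandas skew() method on a toy price series (not the real data): a positive value indicates a longer right tail, and the mean lying above the median points the same way.

```python
import pandas as pd

# Toy price series with a long right tail (illustrative values only).
price = pd.Series([18000, 20000, 22000, 25000, 25000, 31000, 47000, 70000])

skewness = price.skew()      # > 0 suggests right skew, < 0 left skew
mean_price = price.mean()
median_price = price.median()

print(skewness, mean_price, median_price)
```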
Partner_salary:-

For the partner salary attribute, keeping partner salary on the x-axis,
we create a histogram using the hist plot function. The distribution of
partner salary is right skewed, and the inference is that most of the
partners are not working. There are no outliers.

Salary & Gender:-

Keeping salary on the x-axis and gender as the hue, we create a histogram
using the hist plot function. The plot also shows the KDE (kernel density
estimate) curve. The distribution of salary for each gender is
approximately normal. There are no outliers.

Since gender is given as the hue, Male is shown in blue and Female is
shown in orange.
Box Plot:
Gender & Salary:-

Keeping Gender on the x-axis and Salary on the y-axis, we created a box
plot using the box plot function. The plot shows outliers in the male
salary group, although the salary column taken as a whole has no outliers.
Make & Salary: -

Keeping Make on the x-axis and Salary on the y-axis, we create a box plot
using the box plot function.

From the plot we can infer that buying an SUV is more common, among both
men and women, when the salary is more than 60000. If the salary is less
than 60000, the preference shifts to either a Sedan or a Hatchback.

Partner_working & Price (Hue=Marital Status): -

Keeping Partner working on the x-axis, price on the y-axis and marital
status as the hue, we created a box plot using the box plot function.
From the plot we can infer the following price ranges for the automobile
purchased: 24K-47K if the person is married and the partner is working,
24K-49K if the person is married and the partner is not working, and
23K-40K if the person is single and the partner is not working.
Bi-Variate Analysis:
Count Plot(Bar):
Gender: -

Keeping gender on the x-axis, we create a count plot using the count plot
function to see the counts of males and females. From the plot we can see
that the count of males is higher than that of females.
Gender to Marital Status: -

Keeping Gender on the x-axis and Marital status as the hue, we created a
count plot using the count plot function. From the plot we can infer that
married males outnumber single males, and married females outnumber single
females. We can also observe that the count of married males is higher
than that of married females.
Gender & Age:

Keeping Age on the x-axis and gender as the hue, we created a count plot
using the count plot function. From the plot we can infer that buyers
(male and female) aged 22-30 years are more likely to purchase an
automobile than buyers aged 31-54 years. The distribution is right skewed.

Gender & Personal Loan: -

Keeping Gender on the x-axis and personal loan as the hue, we create a
count plot using the count plot function. From the plot we can infer that
more males have taken a personal loan than have not, while fewer females
have taken a personal loan than have not. The distribution is right
skewed.

Scatter Plot
Salary to Partner Salary(by Gender): -
Keeping salary on the x-axis, partner_salary on the y-axis and gender as
the hue, we have created a scatter plot using the scatter plot function.
From the plot we can infer that the data distribution is normal. The
majority of the data lies between 20K and 40K.

Salary & Price: -

Keeping Salary on the x-axis, price on the y-axis and gender as the hue,
we created a scatter plot using the scatter plot function. From the plot
we can infer that the data distribution is normal. The majority of the
data lies between 18K and 34K. As salary increases, the preference for
higher-priced cars increases.
Joint Plot
Salary & Partner Salary: -

Keeping Salary on the x-axis and partner salary on the y-axis, we created
a joint plot using the joint plot function. In the scatter portion of the
plot the data distribution is normal. The histogram for the salary
attribute is right skewed, and the histogram for partner salary is left
skewed. Partner salary mostly ranges between 20K and 40K.

Multi-Variate Analysis:
Heat Map:

We created a heat map for age, salary, price, partner salary and total
salary using the heatmap function. From the heat map we can infer that
the correlation between age and price is high (0.80).

The correlation between salary and partner salary is low (0.069).
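
The numbers shown in a heat map of this kind come from a correlation matrix. A minimal sketch of computing one with pandas is given below; the rows are a toy sample, so the resulting coefficients will not match the values reported above.

```python
import pandas as pd

# Toy sample; column names follow the report.
df = pd.DataFrame({
    "Age": [22, 25, 29, 38, 53],
    "Salary": [31600, 45000, 59500, 72500, 99300],
    "Price": [27000, 31000, 30000, 47000, 61000],
    "Make": ["Hatchback", "Hatchback", "Sedan", "Sedan", "SUV"],
})

# Keep only the numeric columns, then compute pairwise Pearson correlations.
numeric_columns = ["Age", "Salary", "Price"]
corr = df[numeric_columns].corr()

print(corr.round(2))
```

Each cell lies between -1 and 1; the matrix is symmetric with ones on the diagonal, and it is this matrix that a heatmap function would colour.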


Business Questions:

Do men tend to prefer SUVs more compared to women?

From the plot we can infer that men prefer Hatchback over Sedan and SUV.

Women prefer Sedan and SUV over Hatchback, which suggests that men do not
prefer SUVs more than women do.

Output:
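
The comparison for this question can also be checked numerically with a cross-tabulation of Gender against Make, normalized within each gender. The sketch below uses a toy sample, so its shares are illustrative and are not the report's results.

```python
import pandas as pd

# Toy sample; column names follow the report.
df = pd.DataFrame({
    "Gender": ["Male", "Male", "Male", "Male", "Female", "Female"],
    "Make": ["Hatchback", "Hatchback", "Sedan", "SUV", "SUV", "Sedan"],
})

# normalize="index" turns each row (gender) into shares that sum to 1,
# so the SUV column can be compared directly between men and women.
make_share = pd.crosstab(df["Gender"], df["Make"], normalize="index")

print(make_share)
```

Normalizing within gender matters here because the data contain far more men than women, so raw counts alone would overstate men's preference for every make.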
What is the likelihood of a salaried person buying a Sedan?
From the plot we can infer that the data is right skewed. Salaried people
whose salary is between 52K and 70K are likely to buy a Sedan.

Output:
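
A likelihood of this kind can be expressed as a conditional share: the number of salaried buyers who bought a Sedan divided by the number of salaried buyers. The sketch below shows the calculation on a toy sample; the resulting value is illustrative only.

```python
import pandas as pd

# Toy sample; column names follow the report.
df = pd.DataFrame({
    "Profession": ["Salaried", "Salaried", "Salaried", "Salaried",
                   "Business", "Business"],
    "Make": ["Sedan", "Sedan", "SUV", "Hatchback", "Sedan", "SUV"],
})

# Restrict to salaried buyers, then take the share who bought a Sedan.
salaried = df[df["Profession"] == "Salaried"]
p_sedan_given_salaried = (salaried["Make"] == "Sedan").mean()

print(p_sedan_given_salaried)
```

The same pattern, with an additional filter on Gender and a comparison of the SUV and Sedan shares, applies to the question about salaried males.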

What evidence or data supports Sheldon Cooper's claim that a salaried male is an easier target for a SUV sale over a Sedan sale?
From the plot we can infer that salaried males whose salary is between
52K and 70K are more likely to buy a Sedan. Men whose salary is between
69K and 87K, i.e. those with a higher salary, are more likely to buy an
SUV.

Output:
How does the amount spent on purchasing automobiles vary by gender?
From the plot we can infer that the majority of males are ready to spend
33K on purchasing an automobile. The majority of females are ready to
spend 45K on purchasing an automobile.

Output:
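
The spend by gender can be summarized by grouping Price on Gender. The sketch below uses a toy sample, so the figures it prints are illustrative and are not the report's results.

```python
import pandas as pd

# Toy sample; column names follow the report.
df = pd.DataFrame({
    "Gender": ["Male", "Male", "Female"],
    "Price": [30000, 36000, 45000],
})

# One row per gender with the number of purchases, total spend,
# average spend and median spend.
spend_by_gender = df.groupby("Gender")["Price"].agg(
    ["count", "sum", "mean", "median"]
)

print(spend_by_gender)
```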

How much money was spent on purchasing automobiles by individuals who took a personal loan?
From the plot we can infer that the majority of females have taken a
personal loan to purchase an automobile. The money spent is 44K.

Output:
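
The total spent by buyers who took a personal loan can be computed by filtering on Personal_loan and summing Price. The sketch below uses a toy sample with rows borrowed from the head and tail outputs, so the total is illustrative only.

```python
import pandas as pd

# Toy sample; column names follow the report.
df = pd.DataFrame({
    "Personal_loan": ["Yes", "No", "Yes"],
    "Gender": ["Female", "Male", "Male"],
    "Price": [61000, 57000, 24000],
})

# Keep only buyers with a personal loan, then add up what they spent.
loan_buyers = df[df["Personal_loan"] == "Yes"]
total_spent = loan_buyers["Price"].sum()

# Optional breakdown of the same total by gender.
spent_by_gender = loan_buyers.groupby("Gender")["Price"].sum()

print(total_spent)
print(spent_by_gender)
```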
How does having a working partner influence the purchase of higher-priced cars?
From the plot we can infer that having a working partner has little
influence on the purchase of higher-priced cars.

Output:

Actionable Insights & Recommendations


Men tend to spend less than women when purchasing an automobile, and men
prefer a Hatchback to the other makes. When a man earns more than 60K,
however, he prefers an SUV over the other makes.

A woman's car purchase depends on her salary range, but irrespective of
the range women tend to prefer an SUV over the other makes, and
irrespective of whether they are employed or unemployed they prefer to
take a personal loan to purchase an automobile.
