Business Project Report
Abstract
This report explores and analyzes the data of Austo Motor Company and
provides insights and answers to the company's business questions, with
the aim of improving the business.
Swathi Karnamadakala
Contents
Introduction:
Exploratory Data Analysis:
Descriptive statistics:
Head:
Tail:
Shape:
Info:
Describe:
Type:
Data Check:
Isnull:
Duplicates:
Unique:
Unique Categorical (Percentage):
Missing value detection and imputation:
Incorrect values correction:
Outlier Detection:
Univariate Analysis:
Histogram
Box Plot:
Bi-Variate Analysis:
Count Plot(Bar):
Scatter Plot
Joint Plot
Multi-Variate Analysis:
Business Questions:
Do men tend to prefer SUVs more compared to women?
What is the likelihood of a salaried person buying a Sedan?
What evidence or data supports Sheldon Cooper's claim that a salaried male is
an easier target for a SUV sale over a Sedan sale?
How does the amount spent on purchasing automobiles vary by gender?
How much money was spent on purchasing automobiles by individuals who
took a personal loan?
How does having a working partner influence the purchase of higher-priced
cars?
Actionable Insights & Recommendations
Introduction:
Austo Motor Company is a leading car manufacturer specializing in SUV,
Sedan, and Hatchback models. This analysis aims to improve the
efficiency of the company's existing marketing campaign by clarifying
what customers demand, which in turn will help improve the customer
experience.
Exploratory Data Analysis:
Descriptive statistics:
This part covers the basic details of the data: its head, tail, shape,
summary description, and other basic statistics.
Head:
The head function returns the first rows of the dataset. It returns five
rows by default; a different number of rows can be requested if
necessary.
Output:
Age  Gender  Profession  Marital_status  Education      No_of_Dependents  Personal_loan  House_loan  Partner_working  Salary  Partner_salary  Total_salary  Price  Make
53   Male    Business    Married         Post Graduate  4                 No             No          Yes              99300   70700           170000        61000  SUV
53   Femal   Salaried    Married         Post Graduate  4                 Yes            No          Yes              95500   70300           165800        61000  SUV
53   Female  Salaried    Married         Post Graduate  3                 No             No          Yes              97300   60700           158000        57000  SUV
53   Female  Salaried    Married         Graduate       2                 Yes            No          Yes              72500   70300           142800        61000  SUV
53   Male    Salaried    Married         Post Graduate  3                 No             No          Yes              79700   60200           139900        57000  SUV
Tail:
The tail function returns the last rows of the dataset. It returns five
rows by default; a different number of rows can be requested if
necessary.
Output:
Age  Gender  Profession  Marital_status  Education      No_of_Dependents  Personal_loan  House_loan  Partner_working  Salary  Partner_salary  Total_salary  Price  Make
22   Male    Salaried    Single          Graduate       2                 No             Yes         No               33300   0               33300         27000  Hatchback
22   Male    Business    Married         Graduate       4                 No             No          No               32000   NaN             32000         31000  Hatchback
22   Male    Business    Single          Graduate       2                 No             Yes         No               32900   0               32900         30000  Hatchback
22   Male    Business    Married         Graduate       3                 Yes            Yes         No               32200   NaN             32200         24000  Hatchback
22   Male    Salaried    Married         Graduate       4                 No             No          No               31600   0               31600         31000  Hatchback
Shape:
The shape attribute gives the number of rows and columns of the loaded
data. If necessary, the number of rows and the number of columns can be
read individually by indexing the result with 0 and 1 respectively.
Output:
(1581, 14)
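As a minimal sketch of the head, tail and shape calls described above, the snippet below uses a small hypothetical DataFrame as a stand-in for the Austo data. The values and the reduced set of columns are illustrative assumptions, not the real dataset.

```python
import pandas as pd

# Hypothetical miniature stand-in for the Austo dataset (illustrative values only).
df = pd.DataFrame({
    "Age": [53, 53, 40, 35, 30, 25, 22, 22],
    "Gender": ["Male", "Female", "Male", "Female", "Male", "Male", "Male", "Male"],
    "Price": [61000, 61000, 48000, 45000, 33000, 30000, 27000, 31000],
    "Make": ["SUV", "SUV", "Sedan", "Sedan", "Sedan",
             "Hatchback", "Hatchback", "Hatchback"],
})

first_rows = df.head()     # first 5 rows by default
first_three = df.head(3)   # any other row count can be passed
last_rows = df.tail()      # last 5 rows by default

# shape is an attribute holding a (rows, columns) tuple, not a method call.
n_rows, n_cols = df.shape

print(first_three)
print(last_rows)
print(df.shape[0], df.shape[1])  # rows and columns read individually
```

On the real data the same calls would be made on the DataFrame loaded from the company's file.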
Info:
Using this function, we can view details of the loaded data, including
the non-null count and data type of each column, the number of columns,
and the column names.
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1581 entries, 0 to 1580
Data columns (total 14 columns):
# Column Non-Null Count Dtype
0 Age 1581 non-null int64
1 Gender 1528 non-null object
2 Profession 1581 non-null object
3 Marital_status 1581 non-null object
4 Education 1581 non-null object
5 No_of_Dependents 1581 non-null int64
6 Personal_loan 1581 non-null object
7 House_loan 1581 non-null object
8 Partner_working 1581 non-null object
9 Salary 1581 non-null int64
10 Partner_salary 1475 non-null float64
11 Total_salary 1581 non-null int64
12 Price 1581 non-null int64
13 Make 1581 non-null object
dtypes: float64(1), int64(5), object(8)
memory usage: 173.1+ KB
Describe:
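Using this function, we can view a statistical summary of the loaded
data: the count, mean, standard deviation, minimum, quartiles (the 50%
value is the median) and maximum of numeric columns, and the number of
unique values, the most frequent value and its frequency for categorical
columns.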
Output:
Column            count  unique  top            freq  mean         std          min    25%    50%    75%    max
Age               1581   NaN     NaN            NaN   31.922201    8.425978     22     25     29     38     54
Gender            1528   4       Male           1199  NaN          NaN          NaN    NaN    NaN    NaN    NaN
Profession        1581   2       Salaried       896   NaN          NaN          NaN    NaN    NaN    NaN    NaN
Marital_status    1581   2       Married        1443  NaN          NaN          NaN    NaN    NaN    NaN    NaN
Education         1581   2       Post Graduate  985   NaN          NaN          NaN    NaN    NaN    NaN    NaN
No_of_Dependents  1581   NaN     NaN            NaN   2.457938     0.943483     0      2      2      3      4
Personal_loan     1581   2       Yes            792   NaN          NaN          NaN    NaN    NaN    NaN    NaN
House_loan        1581   2       No             1054  NaN          NaN          NaN    NaN    NaN    NaN    NaN
Partner_working   1581   2       Yes            868   NaN          NaN          NaN    NaN    NaN    NaN    NaN
Salary            1581   NaN     NaN            NaN   60392.22011  14674.82504  30000  51900  59500  71800  99300
Partner_salary    1475   NaN     NaN            NaN   20225.55932  19573.14928  0      0      25600  38300  80500
Total_salary      1581   NaN     NaN            NaN   79625.99621  25545.85777  30000  60500  78000  95900  171000
Price             1581   NaN     NaN            NaN   35597.72296  13633.63655  18000  25000  31000  47000  70000
Make              1581   3       Sedan          702   NaN          NaN          NaN    NaN    NaN    NaN    NaN
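The layout of the summary above, with one row per column and NaN wherever a statistic does not apply, is what pandas produces when describe is asked to include every column and the result is transposed. The sketch below assumes that is how the table was generated; the small DataFrame is hypothetical and only illustrates the pattern.

```python
import pandas as pd

# Hypothetical sample with one numeric and two categorical columns.
df = pd.DataFrame({
    "Age": [53, 40, 30, 22],
    "Gender": ["Male", "Female", None, "Male"],
    "Make": ["SUV", "Sedan", "Sedan", "Hatchback"],
})

# info() prints column names, non-null counts and dtypes.
df.info()

# include="all" summarizes numeric and categorical columns together;
# .T puts one dataset column on each row of the summary.
summary = df.describe(include="all").T
print(summary)
```

Numeric columns get NaN for unique, top and freq, while categorical columns get NaN for mean, std and the quantiles, which matches the pattern in the report's table.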
Type:
Using this function, we will be able to know the data type of all columns of
the loaded data.
Output:
Data Check:
Isnull:
Using this function, we can check the total number of missing values in
each column of the loaded data.
Output:
Age 0
Gender 53
Profession 0
Marital_status 0
Education 0
No_of_Dependents 0
Personal_loan 0
House_loan 0
Partner_working 0
Salary 0
Partner_salary 106
Total_salary 0
Price 0
Make 0
dtype: int64
Duplicates:
Using this function, we can check whether the loaded data contains any
duplicated rows.
Output:
0
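The two checks above can be sketched as follows. The DataFrame is a hypothetical sample with a few missing entries, not the Austo data, and the column selection is an assumption made for brevity.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with gaps in Gender and Partner_salary.
df = pd.DataFrame({
    "Gender": ["Male", None, "Female", "Male"],
    "Partner_salary": [70700.0, np.nan, 0.0, np.nan],
    "Price": [61000, 57000, 31000, 24000],
})

# isnull() flags each missing cell; sum() counts the flags per column.
missing_per_column = df.isnull().sum()

# duplicated() flags rows that repeat an earlier row; sum() counts them.
duplicate_rows = df.duplicated().sum()

print(missing_per_column)
print(duplicate_rows)
```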
Unique:
Using this function, we can list the unique values in each categorical
column together with the count of each value.
Output:
Gender
Male 1199
Female 329
Name: count, dtype: int64
--------------------------------------------------
Profession
Salaried 896
Business 685
Name: count, dtype: int64
--------------------------------------------------
Marital_status
Married 1443
Single 138
Name: count, dtype: int64
--------------------------------------------------
Education
Post Graduate 985
Graduate 596
Name: count, dtype: int64
--------------------------------------------------
Personal_loan
Yes 792
No 789
Name: count, dtype: int64
--------------------------------------------------
House_loan
No 1054
Yes 527
Name: count, dtype: int64
--------------------------------------------------
Partner_working
Yes 868
No 713
Name: count, dtype: int64
--------------------------------------------------
Make
Sedan 702
Hatchback 582
SUV 297
Name: count, dtype: int64
--------------------------------------------------
Unique Categorical (Percentage):
Using a for loop together with a function that calculates proportions,
we can find the share of each unique value in every categorical column.
The output reports proportions (fractions of 1) rather than percentages.
Output:
Gender
Male 0.784686
Female 0.215314
Name: proportion, dtype: float64
--------------------------------------------------
Profession
Salaried 0.56673
Business 0.43327
Name: proportion, dtype: float64
--------------------------------------------------
Marital_status
Married 0.912713
Single 0.087287
Name: proportion, dtype: float64
--------------------------------------------------
Education
Post Graduate 0.623023
Graduate 0.376977
Name: proportion, dtype: float64
--------------------------------------------------
Personal_loan
Yes 0.500949
No 0.499051
Name: proportion, dtype: float64
--------------------------------------------------
House_loan
No 0.666667
Yes 0.333333
Name: proportion, dtype: float64
--------------------------------------------------
Partner_working
Yes 0.54902
No 0.45098
Name: proportion, dtype: float64
--------------------------------------------------
Make
Sedan 0.444023
Hatchback 0.368121
SUV 0.187856
Name: proportion, dtype: float64
--------------------------------------------------
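A loop of the kind described for the Unique and Unique Categorical outputs might look like the sketch below. The column list and the data are hypothetical, and value_counts with normalize=True is assumed to be the function used to obtain the proportions.

```python
import pandas as pd

# Hypothetical sample; the real data has more categorical columns.
df = pd.DataFrame({
    "Gender": ["Male", "Male", "Male", "Female"],
    "Make": ["Sedan", "Sedan", "Hatchback", "SUV"],
    "Price": [33000, 35000, 27000, 61000],
})

# Categorical columns are listed explicitly here (an assumption for the sketch).
categorical_columns = ["Gender", "Make"]

counts = {}
shares = {}
for column in categorical_columns:
    counts[column] = df[column].value_counts()                # raw counts
    shares[column] = df[column].value_counts(normalize=True)  # fractions of 1
    print(counts[column])
    print(shares[column])
    print("-" * 50)
```

Multiplying the proportions by 100 gives percentages if that form is preferred.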
Output:
Output:
Outlier Detection:
For Salary, Partner_salary and Price, the outlier check finds no outliers
to impute or treat. For Total_salary, about 1.7% of the values are
outliers, and these have been treated using the outlier treatment
methodology. The whiskers are placed at Q1 - 1.5 x IQR and
Q3 + 1.5 x IQR.
Output:
Salary:-
Q1 = 51900.0
Q3 = 71800.0
IQR = 19900.0
lower_whisker = 22050.0
upper_whisker = 101650.0
Total_Salary:-
Q1 = 60500.0
Q3 = 95900.0
IQR = 35400.0
lower_whisker = 7400.0
upper_whisker = 149000.0
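The whisker values above follow from the quartiles by the usual 1.5 x IQR rule, which the sketch below reproduces from the quoted Q1 and Q3. The report does not say which treatment was applied to the Total_salary outliers; capping values at the whiskers is shown here purely as an assumption, and the toy series is hypothetical.

```python
import pandas as pd

def iqr_whiskers(q1, q3, k=1.5):
    """Return (lower, upper) whiskers for the k x IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles quoted in the report's output.
salary_bounds = iqr_whiskers(51900.0, 71800.0)
total_salary_bounds = iqr_whiskers(60500.0, 95900.0)
print(salary_bounds)
print(total_salary_bounds)

def cap_outliers(series, k=1.5):
    """Cap values at the whiskers (assumed treatment) and report the outlier share."""
    lower, upper = iqr_whiskers(series.quantile(0.25), series.quantile(0.75), k)
    outlier_share = ((series < lower) | (series > upper)).mean()
    return series.clip(lower=lower, upper=upper), outlier_share

# Hypothetical values with one extreme entry.
toy = pd.Series([60500, 70000, 78000, 90000, 95900, 171000], dtype="float64")
capped, share = cap_outliers(toy)
print(capped.max(), share)
```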
Output:
Univariate Analysis:
Histogram
Salary:-
Price:-
For the Price attribute, we create a histogram using the hist plot
function, keeping Price on the x-axis. The distribution of Price is right
skewed, and there are no outliers.
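A right skew read from a histogram can be cross-checked numerically: the sample skewness is positive and the mean sits above the median. The snippet below is a small sketch on hypothetical prices, not on the actual Price column.

```python
import pandas as pd

# Hypothetical prices chosen only to illustrate a long right tail.
price = pd.Series([18000, 20000, 22000, 25000, 31000, 47000, 70000], name="Price")

skewness = price.skew()        # positive for a right-skewed distribution
mean_price = price.mean()
median_price = price.median()

print(f"skewness = {skewness:.2f}")
print(f"mean = {mean_price:.0f}, median = {median_price:.0f}")
```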
Partner_salary:-
Since hue is set to Gender, the Male category is shown in blue and the
Female category is shown in orange.
Box Plot:
Gender & Salary:-
Keeping Gender on the x-axis and Salary on the y-axis, we created a box
plot using the box plot function. In the plot, the Male box shows
outliers, although the Salary column as a whole has none. This can happen
because the whiskers of a grouped box plot are computed separately for
each group.
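The difference between per-group and overall outliers can be illustrated with a short sketch. The salaries below are hypothetical and were chosen so that one male salary lies beyond the male whiskers while staying inside the whiskers of the full column; the 1.5 x IQR rule is the default whisker rule in common box plot implementations.

```python
import pandas as pd

# Hypothetical salaries: males tightly clustered with one high value,
# females widely spread.
df = pd.DataFrame({
    "Gender": ["Male"] * 6 + ["Female"] * 6,
    "Salary": [50000, 51000, 52000, 53000, 54000, 80000,
               30000, 45000, 60000, 75000, 90000, 95000],
})

def count_outliers(values, k=1.5):
    """Count values outside the k x IQR whiskers of the given series."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    outside = (values < q1 - k * iqr) | (values > q3 + k * iqr)
    return int(outside.sum())

overall_outliers = count_outliers(df["Salary"])
per_gender_outliers = df.groupby("Gender")["Salary"].apply(count_outliers)

print(overall_outliers)
print(per_gender_outliers)
```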
Make & Salary: -
Keeping Make on the x-axis and Salary on the y-axis, we create a box plot
using the box plot function.
From the plot we can infer that buying an SUV is more common among both
men and women when the salary is above 60000. When the salary is below
60000, the preference shifts to either a Sedan or a Hatchback.
Keeping Partner_working on the x-axis and Price on the y-axis, with hue
set to Marital_status, we created a box plot using the box plot function.
From the plot we can infer that when a person is married and the partner
is working, the purchase price of an automobile ranges between 24K and
47K. When the person is married and the partner is not working, the
purchase price ranges between 24K and 49K. When the person is single and
the partner is not working, the purchase price ranges between 23K and
40K.
Bi-Variate Analysis:
Count Plot(Bar):
Gender: -
Keeping Gender on the x-axis, we create a count plot using the count plot
function to compare the number of male and female customers. The plot
shows that the count of males is higher than that of females.
Gender to Marital Status: -
Keeping Age on the x-axis with hue set to Gender, we created a count plot
using the count plot function. From the plot we can infer that customers
(male and female) aged 22-30 years are more likely to purchase an
automobile than customers aged 31-54 years. The data distribution is
right skewed.
Keeping Gender on the x-axis with hue set to Personal_loan, we create a
count plot using the count plot function. From the plot we can infer that
more males have taken a personal loan than have not, while fewer females
have taken a personal loan than have not. The data distribution is right
skewed.
Scatter Plot
Salary to Partner Salary(by Gender): -
Keeping Salary on the x-axis and Partner_salary on the y-axis, with hue
set to Gender, we have created a scatter plot using the scatter plot
function. From the plot we can infer that the data distribution is
normal. The majority of the data ranges between 20K and 40K.
Keeping Salary on the x-axis and Price on the y-axis, with hue set to
Gender, we created a scatter plot using the scatter plot function. From
the plot we can infer that the data distribution is normal. The majority
of the data ranges between 18K and 34K. As salary increases, the
preference for higher-priced cars increases.
Joint Plot
Salary & Partner Salary: -
Keeping Salary on the x-axis and Partner_salary on the y-axis, we created
a joint plot using the joint plot function. From the scatter plot we can
infer that the data distribution is normal. The histogram for Salary is
right skewed, and the histogram for Partner_salary is left skewed.
Partner salary ranges between 20K and 40K.
Multi-Variate Analysis:
Heat Map:
Output:
Business Questions:
What is the likelihood of a salaried person buying a Sedan?
From the plot we can infer that the data is right skewed. People with a
salary between 52K and 70K are likely to buy a Sedan.
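Alongside the plot, the likelihood can be expressed as a conditional proportion: the share of salaried customers whose purchase was a Sedan. The sketch below shows one way to compute it with a normalized cross-tabulation. The rows are hypothetical, so the resulting number illustrates the method only and is not a finding about Austo's customers.

```python
import pandas as pd

# Hypothetical purchases (illustrative only).
df = pd.DataFrame({
    "Profession": ["Salaried", "Salaried", "Salaried", "Salaried",
                   "Business", "Business"],
    "Make": ["Sedan", "Sedan", "SUV", "Hatchback", "Sedan", "Hatchback"],
})

# normalize="index" turns each Profession row into proportions that sum to 1.
make_share = pd.crosstab(df["Profession"], df["Make"], normalize="index")

# P(Make = Sedan | Profession = Salaried)
p_sedan_given_salaried = make_share.loc["Salaried", "Sedan"]

print(make_share)
print(p_sedan_given_salaried)
```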
Output:
Output:
How does the amount spent on purchasing automobiles vary by
gender?
From the plot we can infer that the majority of males are ready to spend
33K on purchasing an automobile, while the majority of females are ready
to spend 45K on purchasing an automobile.
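A grouped summary of Price by Gender gives numbers to set beside the plot. The sketch below is an assumed approach on hypothetical rows; the choice of count, median, mean and sum as summary statistics is ours and is not taken from the report.

```python
import pandas as pd

# Hypothetical purchases (illustrative only).
df = pd.DataFrame({
    "Gender": ["Male", "Male", "Male", "Female", "Female"],
    "Price": [30000, 33000, 36000, 45000, 47000],
})

# One row per gender: number of purchases, typical spend and total spend.
spend_by_gender = df.groupby("Gender")["Price"].agg(["count", "median", "mean", "sum"])

print(spend_by_gender)
```

The median is less sensitive than the mean to a few very expensive purchases, which is why both are shown.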
Output:
Output:
How does having a working partner influence the purchase of
higher-priced cars?
From the plot we can infer that having a working partner has little
influence on the purchase of higher-priced cars.
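One way to put a number on this observation is to compare the share of higher-priced purchases between customers with and without a working partner. In the sketch below, "higher-priced" is defined as a price above the 75th percentile, which is our assumption and not a definition from the report. The rows are hypothetical.

```python
import pandas as pd

# Hypothetical purchases (illustrative only).
df = pd.DataFrame({
    "Partner_working": ["Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No"],
    "Price": [25000, 31000, 47000, 61000, 24000, 33000, 49000, 57000],
})

# Assumed definition: a car is higher-priced if it costs more than the
# 75th percentile of all prices.
threshold = df["Price"].quantile(0.75)
df["High_priced"] = df["Price"] > threshold

# Share of higher-priced purchases and median price per group.
high_share = df.groupby("Partner_working")["High_priced"].mean()
median_price = df.groupby("Partner_working")["Price"].median()

print(threshold)
print(high_share)
print(median_price)
```

Similar shares in both groups would support the reading that a working partner has little influence, while a clear gap would contradict it.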
Output: