
Project PDF

Uploaded by

trivedi.sundeep

BUSINESS ANALYSIS

REPORT
ADVANCED STATISTICS

SEPTEMBER 26, 2021

SANDYA V B
CONTENTS

PROBLEM 1: ANOVA
1) State the Null and Alternate Hypothesis for conducting one-way
ANOVA for both the variables ‘Manufacturer’ and ‘Technician’
individually.
2) Perform one-way ANOVA for variable ‘Manufacturer’ with respect to the
variable ‘Service Time’. State whether the Null Hypothesis is accepted or
rejected based on the ANOVA results.
3) Perform one-way ANOVA for variable ‘Technician’ with respect to the
variable ‘Service Time’. State whether the Null Hypothesis is accepted or
rejected based on the ANOVA results.
4) Analyse the effects of one variable on another with the help of an
interaction plot. What is an interaction between two treatments? [hint: use
the ‘pointplot’ function from the ‘seaborn’ graphical subroutine in Python]
5) Perform a two-way ANOVA based on the variables ‘Manufacturer’ &
‘Technician’ with respect to the variable ‘Service Time’ and state your
results.
6) Mention the business implications of performing ANOVA for this
particular case study.

PROBLEM 2: PCA
1) Perform Exploratory Data Analysis [both univariate and multivariate
analysis to be performed]. The inferences drawn from this should be
properly documented.
2) Scale the variables and write the inference for using the type of scaling
function for this case study.
3) Comment on the comparison between covariance and the correlation
matrix after scaling.
4) Check the dataset for outliers before and after scaling. Draw your
inferences from this exercise.
5) Build the covariance matrix, eigenvalues and eigenvector.
6) Write the explicit form of the first PC (in terms of Eigen Vectors)
7) Discuss the cumulative values of the eigenvalues. How does it help you
to decide on the optimum number of principal components? What do the
eigenvectors indicate? Perform PCA and export the data of the Principal
Component scores into a data frame.
8) Mention the business implication of using the Principal Component
Analysis for this case study.

Problem 1: ANOVA

The staff of a service centre for electrical appliances include three technicians who
specialize in repairing three widely used electrical appliances by three different
manufacturers. It was desired to study the effects of Technician and Manufacturer
on the service time. Each technician was randomly assigned five repair jobs on
each manufacturer's appliance and the time to complete each job (in minutes) was
recorded.

Dataset for Problem 1: Service..csv

Data Dictionary:
Problem 1 data consists of –

• Technician

• Manufacturer

• Job

• ServiceTime
1.1 State the Null and Alternate Hypothesis for conducting one-way
ANOVA for both the variables ‘Manufacturer’ and ‘Technician’
individually.

• There are 45 rows and 4 columns present in the dataset.


• There are no null values present.
• All the 4 variables are of int64 data type.

After having a look at the data, the null and alternate hypotheses can be stated
separately for the Manufacturer and Technician variables:
Manufacturer
Ho: The mean ‘ServiceTime’ is the same for all three manufacturers: µ1 = µ2 = µ3
Ha: At least one manufacturer has a different mean ‘ServiceTime’.
Technician
Ho: The mean ‘ServiceTime’ is the same for all three technicians: µ1 = µ2 = µ3
Ha: At least one technician has a different mean ‘ServiceTime’.

1.2 Perform one-way ANOVA for variable ‘Manufacturer’ with
respect to the variable ‘Service Time’. State whether the Null
Hypothesis is accepted or rejected based on the ANOVA results.

ONE-WAY ANOVA: test for factor Manufacturer on ServiceTime
variable:

• After performing one-way ANOVA for variable ‘Manufacturer’ with
respect to service time, the p-value obtained was 0.655197.
• This is greater than 0.05, hence the null hypothesis cannot be rejected.
• So there is no significant difference in mean service time between the manufacturers.
• Thus we can say that the factor ‘Manufacturer’ has no statistically significant effect
on ‘ServiceTime’.
Manufacturer
1 56.133333
2 56.600000
3 54.733333
Name: ServiceTime, dtype: float64

• The levels 1, 2 and 3 of the factor 'Manufacturer' are balanced.
• The means of the 'ServiceTime' variable look slightly different across levels; the
one-way ANOVA on factor Manufacturer tests whether these differences are
statistically significant.
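The F-statistic behind a one-way ANOVA such as the one reported above can be sketched as follows. This is a minimal illustration with NumPy on made-up service times, not the report's data, and the function name `one_way_anova_f` is hypothetical. The p-value then follows from the F distribution, for example via `scipy.stats.f.sf(f_stat, df_between, df_within)`.

```python
import numpy as np

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of 1-D samples."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their group mean
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Made-up service times (minutes) for three manufacturers, for illustration only
samples = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_stat, df_b, df_w = one_way_anova_f(samples)
print(f_stat, df_b, df_w)
```

A large F means the group means differ by more than the within-group scatter would explain; a small F, as the high p-value in the report suggests, means they do not.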
1.3 Perform one-way ANOVA for variable ‘Technician’ with
respect to the variable ‘Service Time’. State whether the Null
Hypothesis is accepted or rejected based on the ANOVA results.

ONE-WAY ANOVA: test for factor Technician on ServiceTime
variable:

• After performing one-way ANOVA for variable ‘Technician’ with
respect to service time, the p-value obtained was 0.624702.
• This is greater than 0.05, hence the null hypothesis cannot be rejected.
• So there is no significant difference in mean service time between the technicians.
• Thus we can say that the factor ‘Technician’ has no statistically significant effect
on ‘ServiceTime’.
Technician
1 55.333333
2 55.266667
3 56.866667
Name: ServiceTime, dtype: float64

• The levels 1, 2 and 3 of the factor 'Technician' are balanced.
• The means of the 'ServiceTime' variable look slightly different across levels; the
one-way ANOVA on factor Technician tests whether these differences are
statistically significant.
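The balance check and the table of level means can be reproduced with a few lines of standard-library Python. The sketch below uses made-up technician labels and service times, and the helper name `level_summary` is hypothetical.

```python
from collections import defaultdict
from statistics import mean

def level_summary(levels, values):
    """Map each factor level to (count, mean) of its observations."""
    buckets = defaultdict(list)
    for level, value in zip(levels, values):
        buckets[level].append(value)
    return {level: (len(obs), mean(obs)) for level, obs in sorted(buckets.items())}

# Made-up technician labels and service times, for illustration only
technician = [1, 1, 2, 2, 3, 3]
service_time = [50, 60, 52, 58, 55, 61]

summary = level_summary(technician, service_time)
# A factor is balanced when every level has the same number of observations
balanced = len({count for count, _ in summary.values()}) == 1
print(summary, balanced)
```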
1.4 Analyse the effects of one variable on another with the help of
an interaction plot. What is an interaction between two treatments?
[hint: use the ‘pointplot’ function from the ‘seaborn’ graphical
subroutine in Python]
• The different levels of the factors 'Technician' and 'Manufacturer' are balanced.
• In the table of means, the value for a particular level of one factor changes with
each level of the second factor.
• In the interaction plot we need to observe whether this change is the same or different
across levels in order to infer an interaction between the two factors.
• An interaction between two treatments means that the effect of one factor on the
response depends on the level of the other factor.
• Referring to the interaction plots, there appears to be some interaction between
Technician and Manufacturer (types 1, 2 and 3) in their effect on service time.
• We use a point plot to study this: if the lines are parallel, there is no interaction. If the
lines cross or are clearly not parallel, there is an interaction.
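The "parallel lines" rule can be checked numerically. The sketch below, on made-up cell means for a balanced design, subtracts the row and column effects from each cell mean; what remains is the interaction effect, which is zero everywhere exactly when the lines of the interaction plot are parallel. The function name `interaction_effects` is hypothetical.

```python
import numpy as np

def interaction_effects(cell_means):
    """Cell mean minus row mean minus column mean plus grand mean."""
    cell_means = np.asarray(cell_means, dtype=float)
    row_means = cell_means.mean(axis=1, keepdims=True)
    col_means = cell_means.mean(axis=0, keepdims=True)
    grand_mean = cell_means.mean()
    return cell_means - row_means - col_means + grand_mean

# Made-up cell means (rows: technicians, columns: manufacturers)
parallel = [[50, 55, 60], [52, 57, 62], [54, 59, 64]]  # lines shift but stay parallel
crossing = [[50, 55, 60], [60, 55, 50], [54, 59, 64]]  # second line reverses direction

no_interaction = np.allclose(interaction_effects(parallel), 0)
has_interaction = not np.allclose(interaction_effects(crossing), 0)
print(no_interaction, has_interaction)
```

Whether a non-zero interaction seen in a plot is statistically significant is a separate question, answered by the interaction term of the two-way ANOVA.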

1.5 Perform a two-way ANOVA based on the variables ‘Manufacturer’
& ‘Technician’ with respect to the variable ‘Service Time’ and state
your results.

• Let us take alpha = 0.05 (5%) for checking the hypotheses.
• While analysing a two-way ANOVA, we first look at the interaction term. If the
interaction effect is statistically significant, the main effects (the p-values of the
independent variables) should not be interpreted separately, because doing so could
be misleading.
• The p-value for Manufacturer is 0.656486, which is greater than 0.05, so there is no
statistically significant association between Manufacturer and Service Time.
• The p-value for Technician is 0.626250, which is greater than 0.05, so there is no
statistically significant association between Technician and Service Time.
• The p-value for the interaction Manufacturer*Technician is 0.236268, which is also
greater than 0.05. At the 5% level the interaction is therefore not statistically
significant, even though the interaction plot suggests that the relationship between
Manufacturer and Service Time may depend on the Technician.
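For a balanced design such as this one (3 technicians, 3 manufacturers, 5 jobs per cell), the two-way ANOVA table can be built directly from sums of squares. The sketch below uses NumPy on synthetic data, not the report's data. The function name `two_way_anova` and the source labels "A", "B" and "A:B" are hypothetical.

```python
import numpy as np

def two_way_anova(data):
    """Balanced two-way ANOVA with interaction.

    data has shape (a, b, n): a levels of factor A, b levels of factor B,
    n replicates per cell. Returns {source: (sum_sq, df, F)}.
    """
    data = np.asarray(data, dtype=float)
    a, b, n = data.shape
    grand = data.mean()
    mean_a = data.mean(axis=(1, 2))   # one mean per level of A
    mean_b = data.mean(axis=(0, 2))   # one mean per level of B
    cell = data.mean(axis=2)          # one mean per (A, B) cell

    ss = {
        "A": b * n * ((mean_a - grand) ** 2).sum(),
        "B": a * n * ((mean_b - grand) ** 2).sum(),
        "A:B": n * ((cell - mean_a[:, None] - mean_b[None, :] + grand) ** 2).sum(),
        "Residual": ((data - cell[:, :, None]) ** 2).sum(),
    }
    df = {"A": a - 1, "B": b - 1, "A:B": (a - 1) * (b - 1), "Residual": a * b * (n - 1)}
    ms_error = ss["Residual"] / df["Residual"]

    table = {}
    for source in ("A", "B", "A:B"):
        table[source] = (ss[source], df[source], (ss[source] / df[source]) / ms_error)
    table["Residual"] = (ss["Residual"], df["Residual"], None)
    return table

rng = np.random.default_rng(0)
# Synthetic data shaped like the study design: 3 x 3 x 5 observations
synthetic = rng.normal(loc=55, scale=8, size=(3, 3, 5))
table = two_way_anova(synthetic)
for source, (sum_sq, dof, f_stat) in table.items():
    print(source, round(sum_sq, 2), dof, f_stat)
```

Each F value is compared against the F distribution with its own degrees of freedom and the residual degrees of freedom (36 for this design) to obtain the p-values discussed above.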

1.6 Mention the business implications of performing ANOVA for this
particular case study.
After performing ANOVA on the data, none of the reported p-values (Manufacturer
0.656486, Technician 0.626250, interaction 0.236268) is below 0.05, so the analysis
gives no statistically significant evidence that service time depends on the
manufacturer, the technician, or their combination.

The interaction plot hints that the Service Time required by a Technician may vary
for products of different Manufacturers, but this pattern is not statistically
significant in this sample and is best treated as something to monitor rather than a
confirmed effect.

Problem 2: PCA
The ‘Hair Salon.csv’ dataset contains various variables used for the context of
Market Segmentation. This particular case study is based on various parameters of
a salon chain of hair products. You are expected to do Principal Component
Analysis for this case study according to the instructions given in the following
rubric.

Dataset for Problem 2: Hair Salon.csv

Data Dictionary:
2.1 Perform Exploratory Data Analysis [both univariate and
multivariate analysis to be performed]. The inferences drawn from
this should be properly documented.

• There are no null values present in the dataset.
• All the variables are of float64 data type except for the ID variable, which is of int64
data type.
• There are 100 rows and 13 columns present.
• The ID column is removed from the dataset because it serves no analytical purpose.
• There is no duplicate data present.
• From the box plot we see that there are a few outliers present in Ecom, SalesFImage,
OrdBilling and DelSpeed, which need to be treated.
2.2 Scale the variables and write the inference for using the type of
scaling function for this case study.

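• After scaling, all variables are on a common scale and their box plots are aligned.
Scaling changes only the units of each variable, so any outliers still need to be treated
separately.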
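The report does not state which scaling function was used. A minimal sketch, assuming z-score standardisation (subtract the column mean, divide by the column standard deviation), could look like this. The data are made up and the function name `zscore` is hypothetical; the sample standard deviation (`ddof=1`) is itself a choice, and some libraries default to `ddof=0`.

```python
import numpy as np

def zscore(x):
    """Standardise each column to mean 0 and sample standard deviation 1."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)

# Made-up values measured on very different scales
raw = np.array([[1.0, 100.0], [2.0, 300.0], [3.0, 200.0], [4.0, 400.0]])
scaled = zscore(raw)
print(scaled.mean(axis=0), scaled.std(axis=0, ddof=1))
```

Putting every variable on the same scale matters for PCA because otherwise variables with large numeric ranges dominate the covariance matrix.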
2.3 Comment on the comparison between covariance and the
correlation matrix after scaling.

• After scaling, the covariance matrix of the scaled data closely matches the
correlation matrix, because each standardised variable has unit variance (assuming
z-score scaling).
• From the correlation matrix post scaling we can infer that the relationships between
the variables are strong enough for us to move ahead with finding the
eigenvalues for PCA.
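The link between the two matrices can be verified on synthetic data. The sketch below assumes z-score scaling with the sample standard deviation; under that assumption the covariance matrix of the scaled data equals the correlation matrix of the original data.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic data: 100 rows, 3 columns on deliberately different scales
x = rng.normal(size=(100, 3)) * np.array([1.0, 10.0, 100.0])
z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)

cov_scaled = np.cov(z, rowvar=False)      # covariance of the standardised data
corr_raw = np.corrcoef(x, rowvar=False)   # correlation of the original data
matrices_match = np.allclose(cov_scaled, corr_raw)
print(matrices_match)
```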
2.4 Check the dataset for outliers before and after scaling. Draw your
inferences from this exercise.
Before Scaling:

After Scaling:

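• After scaling, the variables are on a common scale and the box plots are aligned.
Points that lie outside the whiskers before scaling remain outside them after a linear
rescaling, so scaling alone does not treat the outliers.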
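How box-plot outliers behave under scaling can be checked with the usual 1.5 × IQR whisker rule. The sketch below uses made-up values and assumes z-score scaling; the function name `iqr_outliers` is hypothetical.

```python
import numpy as np

def iqr_outliers(values):
    """Boolean mask of points outside the 1.5 * IQR box-plot whiskers."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)

# Made-up column with one extreme value
raw = np.array([4.0, 5.0, 5.5, 6.0, 6.5, 7.0, 20.0])
scaled = (raw - raw.mean()) / raw.std(ddof=1)

before = iqr_outliers(raw)
after = iqr_outliers(scaled)
print(before.sum(), after.sum())
```

Because a linear rescaling moves the quartiles and the data points by the same transformation, the same points are flagged before and after.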
2.5 Build the covariance matrix, eigenvalues and eigenvector.
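A minimal sketch of this step with NumPy, on synthetic data standing in for the scaled salon variables, is shown below. `np.linalg.eigh` is used because a covariance matrix is symmetric; it returns eigenvalues in ascending order, so they are re-sorted from largest to smallest.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=(100, 4))                      # synthetic stand-in data
z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)   # z-score scaling (an assumption)

cov = np.cov(z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # columns of eigvecs are the eigenvectors
order = np.argsort(eigvals)[::-1]        # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
print(eigvals)
```

Each eigenvalue is the variance captured along its eigenvector, and for standardised data the eigenvalues sum to the number of variables.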

2.6 Write the explicit form of the first PC (in terms of Eigen Vectors).
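The explicit form of a principal component is a weighted sum of the scaled variables, with the entries of the corresponding eigenvector as weights. The sketch below formats such an expression; the loadings are made up for illustration and are not the report's results, and the helper name `explicit_pc` is hypothetical.

```python
def explicit_pc(loadings, names, label="PC1"):
    """Write a component as a signed linear combination of variables."""
    terms = []
    for i, (weight, name) in enumerate(zip(loadings, names)):
        term = f"{abs(weight):.3f} * {name}"
        if i == 0:
            # The first term carries its sign without a leading "+"
            terms.append(f"-{term}" if weight < 0 else term)
        else:
            terms.append(f"- {term}" if weight < 0 else f"+ {term}")
    return f"{label} = " + " ".join(terms)

# Hypothetical loadings for two of the dataset's columns
formula = explicit_pc([0.5, -0.25], ["Ecom", "DelSpeed"])
print(formula)
```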

2.7 Discuss the cumulative values of the eigenvalues. How does it help
you to decide on the optimum number of principal components? What
do the eigenvectors indicate? Perform PCA and export the data of the
Principal Component scores into a data frame.
• We can see that the optimum number of principal components would be 7, as the curve
flattens gradually after that. The eigenvalues of the covariance matrix provide a clear
view of the same.
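One common way to turn cumulative eigenvalues into a component count is to keep the smallest number of components whose cumulative share of variance reaches a chosen threshold. The sketch below uses made-up eigenvalues and an 80% threshold, both assumptions; the report itself picks 7 components from the shape of the curve. The function name `components_for_variance` is hypothetical.

```python
import numpy as np

def components_for_variance(eigenvalues, threshold):
    """Smallest k whose cumulative explained-variance ratio reaches threshold."""
    eigenvalues = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    cumulative = np.cumsum(eigenvalues) / eigenvalues.sum()
    k = int(np.argmax(cumulative >= threshold)) + 1
    return k, cumulative

# Made-up eigenvalues, for illustration only
k, cumulative = components_for_variance([4.0, 3.0, 2.0, 1.0], threshold=0.8)
print(k, cumulative)
```

The principal component scores are then the scaled data multiplied by the first k eigenvectors, which can be stored as the columns of a data frame.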
2.8 Mention the business implication of using the Principal
Component Analysis for this case study.

The business implication can thus be summarised as follows:

A higher level of satisfaction relies on several factors, such as the product quality
score, the pricing and the delivery speed. The correlation heat map, however, points to
one more factor.

It can be inferred that the most business can be achieved if more effort is put
into Advertising. As the correlations of most factors appear to be strongly
affected by Advertising, it can have a large impact on the business and hence move
the satisfaction level in a positive direction.
