Adv Stats Proj

The document provides details about a business report on advance statistics including ANOVA, EDA, and PCA. It discusses using ANOVA to analyze how salary depends on education level and occupation using sample salary data. It performs one-way ANOVA for education level and occupation individually. It also analyzes the interaction between education and occupation using a two-way ANOVA and interaction plots. The implications for human resource departments are discussed. Principal component analysis is also proposed to analyze college data from another dataset.

BUSINESS REPORT

On Advance Statistics (ANOVA, EDA, PCA)

By- Zohaib Imam.

Problem 1A:
Salary is hypothesized to depend on educational qualification and occupation. To understand
the dependency, the salaries of 40 individuals [SalaryData.csv] are collected and each person’s
educational qualification and occupation are noted. Educational qualification is at three levels,
High school graduate, Bachelor, and Doctorate. Occupation is at four levels, Administrative and
clerical, Sales, Professional or specialty, and Executive or managerial. A different number of
observations are in each level of education – occupation combination.
[Assume that the data follows a normal distribution. In reality, the normality assumption may
not always hold if the sample size is small.]

Q.1.1) State the null and the alternate hypothesis for conducting one-way ANOVA for both Education and
Occupation individually.

Null and Alternate Hypothesis for Education.

H0: The mean 'Salary' is equal across all Education levels.

H1: At least one Education level has a mean 'Salary' that differs from the others.

Null and Alternate Hypothesis for Occupation.

H0: The mean 'Salary' is equal across all Occupation levels.

H1: At least one Occupation level has a mean 'Salary' that differs from the others.

Q.1.2) Perform one-way ANOVA for Education with respect to the variable ‘Salary’. State whether the null
hypothesis is accepted or rejected based on the ANOVA results.

Since the p-value is less than alpha (0.05), we reject the null hypothesis (H0): mean salary differs for at least one Education level.

Q.1.3) Perform one-way ANOVA for variable Occupation with respect to the variable ‘Salary’. State
whether the null hypothesis is accepted or rejected based on the ANOVA results.

Since the p-value is greater than alpha (0.05), we fail to reject the null hypothesis (H0): there is no
significant evidence that mean salary differs across Occupation levels.
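
The ANOVA tables themselves are not reproduced in this report. As a minimal sketch of how the two one-way tests could be run, assuming a statsmodels workflow and a small made-up sample in place of SalaryData.csv (the salary values and the helper name `one_way_p` are illustrative assumptions, not the report's data or code):

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Made-up stand-in for SalaryData.csv: salary is driven by Education only.
rng = np.random.default_rng(0)
edu_shift = {"HS-grad": 0.0, "Bachelors": 60000.0, "Doctorate": 120000.0}
occupations = ["Adm-clerical", "Sales", "Prof-specialty", "Exec-managerial"]

rows = []
for edu, shift in edu_shift.items():
    for occ in occupations:
        for _ in range(4):
            rows.append((edu, occ, 80000.0 + shift + rng.normal(0, 10000)))
df = pd.DataFrame(rows, columns=["Education", "Occupation", "Salary"])

def one_way_p(data, factor, alpha=0.05):
    """Fit Salary ~ C(factor) and return (p-value, reject H0?)."""
    model = ols(f"Salary ~ C({factor})", data=data).fit()
    table = anova_lm(model, typ=1)
    p = float(table.loc[f"C({factor})", "PR(>F)"])
    return p, p < alpha

p_edu, reject_edu = one_way_p(df, "Education")
p_occ, reject_occ = one_way_p(df, "Occupation")
print(f"Education:  p = {p_edu:.4g}, reject H0: {reject_edu}")
print(f"Occupation: p = {p_occ:.4g}, reject H0: {reject_occ}")
```

On this synthetic sample the Education test rejects H0 and the Occupation test does not, which mirrors the pattern reported above; the real p-values depend on the actual data.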
Q. 1.4) If the null hypothesis is rejected in either (1.2) or in (1.3), find out which class means are
significantly different. Interpret the result.

The Bachelors and Doctorate class means are significantly different from the other class means.
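
The report does not say which post-hoc procedure was used. One common choice is Tukey's HSD; the sketch below assumes that choice and uses made-up salaries, so the group means and the output are illustrative only:

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Made-up salaries for the three Education levels (12 people each).
rng = np.random.default_rng(1)
groups = np.repeat(["Bachelors", "Doctorate", "HS-grad"], 12)
true_means = np.repeat([160000.0, 200000.0, 80000.0], 12)
salary = true_means + rng.normal(0, 10000, size=groups.size)

# Compare every pair of class means at a family-wise alpha of 0.05.
tukey = pairwise_tukeyhsd(endog=salary, groups=groups, alpha=0.05)
print(tukey.summary())

# tukey.reject holds one boolean per pair: True means the pair differs.
n_significant = int(tukey.reject.sum())
print("significantly different pairs:", n_significant)
```

Each row of the summary is one pair of classes; a `reject` value of True marks a pair whose means differ significantly.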

Q. 1.5) What is the interaction between the two treatments? Analyze the effects of one variable on the
other (Education and Occupation) with the help of an interaction plot.
• As seen from the two interaction plots, there appears to be a strong interaction between
Doctorate and Bachelors in the Adm-clerical and Sales occupations.
• There is also some interaction between Bachelors and HS-grad in the Prof-specialty
occupation.
• There is little or no interaction between Doctorate and HS-grad in any of the occupations.
• This suggests that a Doctorate holder may not be strongly preferred for some job roles and
may be considered over-qualified, resulting in a wage on par with, or not significantly higher
than, that of a Bachelor's degree holder.
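
An interaction plot is a plot of cell means, so the same reading can be done on a table. The sketch below uses a tiny made-up sample (not the report's data): if the Doctorate-minus-Bachelors gap changes from one occupation to the next, the lines in the plot are not parallel, which is the visual sign of interaction.

```python
import pandas as pd

# Tiny made-up sample: two occupations, two education levels.
df = pd.DataFrame({
    "Occupation": ["Sales", "Sales", "Sales", "Sales",
                   "Prof-specialty", "Prof-specialty",
                   "Prof-specialty", "Prof-specialty"],
    "Education": ["Bachelors", "Bachelors", "Doctorate", "Doctorate",
                  "Bachelors", "Bachelors", "Doctorate", "Doctorate"],
    "Salary": [100000.0, 110000.0, 100000.0, 104000.0,
               90000.0, 100000.0, 150000.0, 160000.0],
})

# Mean salary per (Occupation, Education) cell: the points an interaction plot draws.
cell_means = df.groupby(["Occupation", "Education"])["Salary"].mean().unstack("Education")
print(cell_means)

# Gap between the two education lines within each occupation.
gap = cell_means["Doctorate"] - cell_means["Bachelors"]
print(gap)
```

Here the gap is negative in Sales and large in Prof-specialty, so the two lines would cross. If matplotlib is available, `cell_means.plot(marker="o")` should draw the corresponding plot.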

Q.1.6) Perform a two-way ANOVA based on the Education and Occupation (along with their interaction
Education*Occupation) with the variable ‘Salary’. State the null and alternative hypotheses and state your
results. How will you interpret this result?

H0: The mean 'Salary' is equal across all combinations of Education and Occupation (no Education
effect, no Occupation effect, and no interaction effect).

H1: At least one mean 'Salary' across the Education and Occupation combinations is unequal.
• For the variable Education, P(>F) is less than 0.05 (the significance level), so the null
hypothesis is rejected: Education has a significant effect on mean Salary.
• For the variable Occupation, P(>F) is greater than 0.05, so the null hypothesis cannot be
rejected: there is little to no statistical evidence that Occupation alone has a significant
effect on mean Salary.
• For the interaction term, i.e., C(Education):C(Occupation), P(>F) is less than 0.05,
indicating statistical evidence of an interaction between the two variables (consistent with
the earlier inference drawn from Fig 1B.1); the interaction has some significant effect on
Salary.
• However, more independent variables need to be incorporated to understand how well
the interaction of Education and Occupation estimates mean Salary. For example, years of work
experience is a likely independent variable to consider in future analyses.
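
The C(...) labels above suggest a statsmodels formula. A minimal sketch of a two-way ANOVA with interaction, under that assumption and on made-up data in which a Doctorate premium exists only in one occupation (so an interaction is built in on purpose):

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(2)
rows = []
for edu in ["HS-grad", "Bachelors", "Doctorate"]:
    for occ in ["Adm-clerical", "Sales", "Prof-specialty", "Exec-managerial"]:
        base = 100000.0
        # Built-in interaction: the Doctorate premium appears only in Prof-specialty.
        if edu == "Doctorate" and occ == "Prof-specialty":
            base += 80000.0
        for _ in range(4):
            rows.append((edu, occ, base + rng.normal(0, 8000)))
df = pd.DataFrame(rows, columns=["Education", "Occupation", "Salary"])

# "A * B" expands to both main effects plus the A:B interaction term.
model = ols("Salary ~ C(Education) * C(Occupation)", data=df).fit()
table = anova_lm(model, typ=2)
print(table)

p_interaction = float(table.loc["C(Education):C(Occupation)", "PR(>F)"])
print("interaction p-value:", p_interaction)
```

The row labelled `C(Education):C(Occupation)` is the interaction term; its PR(>F) column is the P(>F) value discussed in the bullets above.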
Q. 1.7) Explain the business implications of performing ANOVA for this particular case study.

Assuming the report is intended for the HR department of a company or an HR consulting firm,
the key takeaways are as follows:

• An employee's or graduate's salary depends significantly more on their level of education
than on their occupation or job role.
• Given the statistical conclusion about the interaction effect of education and occupation on
salary, job role still has some impact on salary despite occupation's lesser significance.
• It is also noteworthy that for a few occupations a higher salary may be awarded to a
Bachelor's degree holder than to a Doctorate counterpart. This points to an important
shortcoming of the dataset, which reduces the accuracy of the tests and analyses performed:
other important independent variables that affect salary, such as work experience,
specialization/domain, and industry, are missing.
• On average, a Doctorate would probably earn a higher salary than Bachelors
and HS-grads. However, being a Doctorate does not necessarily mean a significantly higher
salary than Bachelor's degree graduates/employees, as was observed in Fig
• This suggests that a Doctorate holder may not be strongly preferred for some job roles and
may be considered over-qualified, resulting in a wage on par with, or not significantly higher
than, that of a Bachelor's degree holder.
• Hence, HR professionals may need a more comprehensive approach to setting salary bands.
Similar job titles demand different salary packages across industries and job
requirements/descriptions. Nevertheless, work experience remains an important factor in
deciding salary.
• The ANOVA test indicates that occupation level coupled with higher educational
qualification has a significant impact on salary, even though occupation type/level alone
may not be as significant an influencer as education.

Problem 2:
The dataset Education - Post 12th Standard.csv contains information on various colleges. You
are expected to do a Principal Component Analysis for this case study according to the
instructions given. The data dictionary of the 'Education - Post 12th Standard.csv' can be found
in the following file: Data Dictionary.xlsx.

Q2.1) Perform Exploratory Data Analysis [both univariate and multivariate analysis to be performed].
What insight do you draw from the EDA?

Exploratory Data Analysis:

• The data set has 777 observations and 18 variables.
• Apart from the college name (‘Names’), there are no categorical variables in this data set.
• All the values are of int64 type except ‘Names’, which is of object datatype, and ‘S.F.Ratio’, which is of
float64 datatype.
• There are no missing values in the data set.
• There are no duplicate rows present in the data set.
• There are many outliers in the dataset which have not been treated, so the results of the
analyses may not be fully accurate. Although this simplistic approach may distort the output,
it gives a fair approximation for the purpose of drawing inferences.
• The maximum percentage of PhD faculty exceeds 100 (i.e., 103); this has been fixed by imputing
the median value. Similarly, the maximum graduation rate (i.e., 118), which ideally cannot exceed
100, has been corrected by imputing the median value.
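
The checks listed above can be sketched with pandas as follows. The five rows are made up, and the column names `PhD` and `Grad.Rate` are assumed from the data description; only the two out-of-range maxima (103 and 118) come from the report.

```python
import pandas as pd

# Made-up five-row stand-in for 'Education - Post 12th Standard.csv'.
df = pd.DataFrame({
    "Names": ["College A", "College B", "College C", "College D", "College E"],
    "PhD": [70, 103, 85, 60, 90],
    "Grad.Rate": [60, 118, 75, 55, 80],
    "S.F.Ratio": [12.5, 14.0, 9.8, 18.2, 11.1],
})

print(df.shape)    # (observations, variables)
print(df.dtypes)   # datatype of each column
n_missing = int(df.isnull().sum().sum())
n_duplicates = int(df.duplicated().sum())
print("missing values:", n_missing, "duplicate rows:", n_duplicates)

# Percentages cannot exceed 100: replace such values with the column median.
for col in ["PhD", "Grad.Rate"]:
    median = df[col].median()
    df[col] = df[col].mask(df[col] > 100, median)

print(df[["PhD", "Grad.Rate"]])
```

The median here is computed before the invalid value is replaced; computing it on the valid rows only is an equally defensible choice.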
Boxplot before treating outliers:

Boxplot after treating outliers:

Univariate Analysis:
Univariate analysis was done on top 10% schools, total applications, total acceptances, PhD, S.F.Ratio,
Grad Rate, and Room Board.

Univariate analysis of students enrolling from top 10% schools:

• The mean percentage of students in a university coming from the best (top 10%) schools is
27%; the median is 23%.
• More importantly, almost 120 universities out of 777 have students coming from the best
schools.
• Almost 20 universities have more than 50% of their student population coming from the
best schools.
• Approximately 10 universities each have 80% and 90% of their student population coming
from the top 10% schools.
Univariate analysis of total application:

• The mean number of applications is 3001, whereas the median is 1558.
• The maximum number of applications is 48094; roughly 10 institutions lie at this upper end.
• The distribution is highly right-skewed.
• Quite a few universities receive more than 10,000 applications.
• Comparing applications with acceptances, the mean acceptance rate (acceptances/total
applications) is 67% and the median is also 67%, so the two are comparable.
Univariate analysis of total acceptance:

• The mean number of acceptances is 2018, whereas the median is 1110.
• The maximum number of acceptances is 26330; fewer than 10 institutions lie at this upper end.
• The distribution is highly right-skewed.
Univariate analysis of faculties with PhD
Insights:

• The distribution is left-skewed.
• The mean percentage of faculty with a PhD is 72%, whereas the median is 75%.
• Approximately 20 universities have 100% PhD-qualified faculty.
• The lowest 25% of universities, ranked by PhD share, have about 62% or fewer PhD-qualified
faculty.
Univariate analysis of student to faculty ratio
Insights:

• The mean and median S.F. ratios are 14 and 13 respectively.
• The distribution is nearly normal.
• Approximately 120 universities have an S.F. ratio of 12.
• However, the maximum S.F. ratio is 40, which applies to roughly 3 universities.
• A few universities (approximately 5) have a very low S.F. ratio (<5).
Univariate analysis of graduation rate
Insights:

• The mean and median graduation rates are 65% each.
• The distribution is approximately normal.
• The lowest graduation rate is 10%, which accounts for fewer than 10 universities.
• Interestingly, approximately 40 universities have a 100% graduation rate.
Univariate analysis of boarding expenses
Insights:

• The mean and median boarding expenses are $4357 and $4200 respectively.
• The distribution is approximately normal.
• The highest expense is $8124, which accounts for fewer than 3 universities.
• At the 75th percentile the expense is $5050, which is comparable to the mean and median.
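
The mean, median, and skew statements in the univariate sections can be checked numerically. A minimal sketch on five made-up application counts (not the report's data), showing how a mean far above the median goes with a right-skewed distribution:

```python
import pandas as pd

# Made-up application counts: one very large value pulls the mean upward.
apps = pd.Series([100, 200, 300, 400, 5000], name="Apps")

mean = apps.mean()
median = apps.median()
skew = apps.skew()           # positive means right-skewed, negative left-skewed
q1, q3 = apps.quantile([0.25, 0.75])

print(f"mean={mean}, median={median}, skew={skew:.2f}, Q1={q1}, Q3={q3}")
```

A mean above the median with positive skew matches the pattern reported for applications and acceptances; the PhD variable shows the opposite pattern (mean below median, left skew).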
Multivariate Analysis:

Heatmap showing correlation coefficients.

Observation and inference:

• A few pairs have very high correlation, namely:
o Applications and acceptances
o Students from top 10% schools and graduation rate
o Terminal and PhD-qualified faculty
o Full-time undergraduate students and enrolment
o Students from top 10% schools and top 25% schools
• The heatmap exhibits the problem of multicollinearity, visible in the significant number
of highly correlated feature pairs. Multicollinearity is a problem because it undermines the
statistical significance of an independent variable or feature.
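
Highly correlated pairs can also be listed directly from the correlation matrix instead of being read off the heatmap. A minimal sketch on made-up columns; the 0.8 cut-off and the column names are assumptions for illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
apps = rng.normal(3000, 1000, 200)
df = pd.DataFrame({
    "Apps": apps,
    "Accept": 0.7 * apps + rng.normal(0, 50, 200),  # built to track Apps closely
    "S.F.Ratio": rng.normal(14, 3, 200),            # unrelated to the other two
})

corr = df.corr()

# Keep only the upper triangle so each pair appears once and the diagonal is dropped.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
pairs = upper.stack()
high = pairs[pairs.abs() > 0.8]
print(high)
```

Only the Apps and Accept pair survives the filter on this sample; on the real data the list would be expected to resemble the pairs named above.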
Q.2.2) Is scaling necessary for PCA in this case? Give justification and perform scaling.

• The main objective of scaling or standardization is to bring the data onto a common scale. It is a
data preprocessing step applied to the independent variables or features of the data. Scaling
also helps speed up the calculations in an algorithm.
• If the attributes share a well-defined, common meaning (say, latitude and longitude), then we should
not scale the data, because this will cause distortion.
• In this case, however, we have mixed numerical data in which each attribute is something entirely
different (like Room.Board and Grad.Rate) with different units attached (price, graduation rate, ...),
so the values are not directly comparable. Because PCA is driven by variance, unscaled features with
large numeric ranges would dominate the components, so we need to scale the data. Scaling can be
done with the z-score or with StandardScaler from sklearn.
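
A minimal sketch of the two scaling routes mentioned above, on a made-up two-column array (room and board cost, graduation rate). Note that StandardScaler divides by the population standard deviation (ddof=0), which the manual z-score below mirrors:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up rows: [Room.Board in dollars, Grad.Rate in percent].
X = np.array([[4000.0, 60.0],
              [5000.0, 70.0],
              [6000.0, 80.0]])

# Route 1: sklearn's StandardScaler.
X_scaled = StandardScaler().fit_transform(X)

# Route 2: manual z-score, (x - mean) / std, column by column.
manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_scaled)
print(np.allclose(X_scaled, manual))
```

After scaling, both columns have mean 0 and standard deviation 1, so neither dominates purely because of its units.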

Q.2.3) Comment on the comparison between the covariance and the correlation matrices from this data.

• Correlation is a scaled version of covariance; the two always have the same sign (positive,
negative, or 0). When the value is positive, the variables are said to be positively
correlated; when it is negative, the variables are said to be negatively correlated; and when
it is 0, the variables are said to be uncorrelated.
• In simple terms, correlation measures both the strength and the direction of the linear relationship
between two variables.
• Covariance measures how much two variables change in tandem. It indicates the direction of the
linear relationship between variables, but its magnitude depends on the units of the variables.
• For standardized (z-scored) data, the covariance matrix coincides with the correlation matrix.
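
The relationship between the two matrices can be verified numerically. A minimal sketch on made-up data with two columns on very different scales:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(0, 1, 100)
# Two related columns on very different scales (thousands vs. units).
X = np.column_stack([1000 * x + rng.normal(0, 300, 100),
                     x + rng.normal(0, 0.5, 100)])

cov = np.cov(X, rowvar=False)        # depends on the units of each column
corr = np.corrcoef(X, rowvar=False)  # unit-free, entries between -1 and 1

# Covariance of the z-scored data (sample std, ddof=1) equals the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
cov_of_scaled = np.cov(Z, rowvar=False)

print(cov)
print(corr)
print(np.allclose(cov_of_scaled, corr))
```

The off-diagonal covariance is large only because of the first column's scale, while the correlation reports the same relationship on a fixed scale.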

Q.2.4) Check the dataset for outliers before and after scaling. What insight do you derive here?

• During the univariate analysis we checked for outliers using boxplots; after standardizing,
we check for outliers again.

Boxplot of outliers after standardizing.

Observation:

• The scaled variables have similar maximum values and comparable minimum values.
• Each variable now has a mean of approximately 0 and a standard deviation of 1.
• The range of each variable is hence standardized, and all are unit-less quantities.
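
One further point worth checking: z-scoring is a linear rescaling, so the points a boxplot flags as outliers (the 1.5 x IQR rule) are the same before and after scaling. A minimal sketch on made-up application counts; the helper name `iqr_outlier_mask` is illustrative:

```python
import numpy as np

def iqr_outlier_mask(values):
    """Flag points beyond 1.5 * IQR from the quartiles (the boxplot rule)."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)

# Made-up application counts with one extreme value.
apps = np.array([500.0, 800.0, 1200.0, 1500.0, 1600.0, 2100.0, 3000.0, 48094.0])
scaled = (apps - apps.mean()) / apps.std()

before = iqr_outlier_mask(apps)
after = iqr_outlier_mask(scaled)
print(before.sum(), after.sum(), np.array_equal(before, after))
```

Scaling makes the variables comparable but does not treat outliers; those need a separate step such as capping or imputation.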

Q.2.5) Perform PCA and export the data of the Principal Component scores into a data frame.

• The cumulative % gives the percentage of variance accounted for by the first n components. For example,
the cumulative percentage for the second component is the sum of the percentages of variance for
the first and second components. It helps in deciding the number of components by identifying the
components that together explain most of the variance.

• In the above array we see that the first component explains 33.3% of the variance within our data set, while
the first two explain 62.1%, and so on. If we employ 7 components, we capture ~87.6% of the variance
within the data set.
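
A minimal sketch of this step with sklearn, on made-up data with five features (two correlated pairs plus one independent column). The data and the PC1..PC5 column names are illustrative assumptions; the percentages it prints are not the report's figures.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(5)
latent = rng.normal(size=(200, 2))
X = np.column_stack([
    latent[:, 0], latent[:, 0] + 0.1 * rng.normal(size=200),
    latent[:, 1], latent[:, 1] + 0.1 * rng.normal(size=200),
    rng.normal(size=200),
])

# Scale first, then fit PCA with all components retained.
X_scaled = StandardScaler().fit_transform(X)
pca = PCA()
scores = pca.fit_transform(X_scaled)

# Cumulative share of variance explained by the first n components.
cumulative = np.cumsum(pca.explained_variance_ratio_)
print(np.round(cumulative, 3))

# Export the principal component scores into a data frame.
scores_df = pd.DataFrame(scores, columns=[f"PC{i + 1}" for i in range(scores.shape[1])])
print(scores_df.head())
```

Each row of `scores_df` is one observation expressed in the new component axes; `scores_df.to_csv(...)` would write it out if a file is needed.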

Below are the principal component scores in a data frame:

Features marked with rectangular red box are the one having maximum loading on the respective
component. We consider these marked features to decide the context that the component represents.

Correlation between components and features:

Q.2.6) Extract the eigenvalues and eigenvectors [print both].

• Eigenvalues and eigenvectors are mainly used to capture the key information stored in a large
matrix.
• They provide a summary of a large matrix.
• Computation on a large matrix is slow and requires more memory and CPU;
eigenvectors and eigenvalues can improve efficiency in computationally intensive tasks by
reducing dimensions while ensuring that most of the key information is maintained.
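
The printed eigenvalues and eigenvectors are not reproduced in this report. A minimal sketch of how they could be extracted with NumPy from the covariance matrix of scaled data, using a made-up three-feature sample:

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(100, 3))
X[:, 1] += X[:, 0]            # make two of the features correlated

# Covariance of z-scored data (this is the correlation matrix).
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
corr = np.cov(Z, rowvar=False)

# eigh is meant for symmetric matrices; it returns eigenvalues in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(corr)
order = np.argsort(eigenvalues)[::-1]   # re-sort so the largest comes first
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]   # column i pairs with eigenvalues[i]

print("Eigenvalues:\n", eigenvalues)
print("Eigenvectors:\n", eigenvectors)
```

The eigenvalues sum to the number of features because the diagonal of a correlation matrix is all ones; the sign of each eigenvector is arbitrary and may differ between libraries.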
Q.2.7) Write down the explicit form of the first PC (in terms of the eigenvectors; use values with two
decimal places only).

• Explicit form of first PC;
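
The explicit form is a linear combination of the scaled features weighted by the first eigenvector. As a sketch of how that expression could be written out to two decimals, the helper below formats any loading vector; the loadings and the feature names in the example are hypothetical, not the report's values.

```python
def explicit_pc(loadings, feature_names, label="PC1"):
    """Write a component as a signed sum of 'coefficient * feature' terms."""
    terms = [f"{w:.2f} * {name}" for w, name in zip(loadings, feature_names)]
    # Join with '+' and tidy up '+ -' so negative loadings read naturally.
    return f"{label} = " + " + ".join(terms).replace("+ -", "- ")

# Hypothetical loadings for three hypothetical feature names.
example = explicit_pc([0.71, -0.50, 0.49], ["Apps", "Accept", "Enroll"])
print(example)
```

With real results, the loadings would be the first eigenvector (for example, the first row of `components_` from a fitted sklearn PCA) and the names would be the 17 numeric columns.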

Eigen Vectors:

Q.2.8) Consider the cumulative values of the eigenvalues. How does it help you to decide on the optimum
number of principal components? What do the eigenvectors indicate?

• The first eigenvalue explains 33.20% of the information represented by all 17 features.
• Similarly, the 1st and 2nd eigenvalues together explain 61.5% of the information, and so on.
• Considering up to the 8th eigenvalue, the total variance explained is 90.33%.
• Based on the business's guidance on the acceptable percentage of variance explained,
one can identify the optimum number of PCs.
• In this case, if 90.33% is assumed to be acceptable for the scope of the analysis, the first
8 PCs out of the 17 generated explain a significant amount of variance or, in simpler words,
represent 90.33% of the information present in the 17 numeric features of the original dataset.
• Another way of visualizing and approximating the optimal number of PCs is a
scree plot. Note: the y-axis represents the % of variance explained.

• The eigenvectors hold the coefficients of the new components: each principal component is obtained by
multiplying the eigenvector values by the corresponding features and summing.
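
The selection rule described above (take the smallest number of components whose cumulative share reaches an agreed threshold) can be sketched as follows. The helper name `components_needed` and the four eigenvalues are made up for illustration:

```python
import numpy as np

def components_needed(eigenvalues, threshold):
    """Smallest k such that the first k eigenvalues explain >= threshold of variance."""
    values = np.asarray(eigenvalues, dtype=float)
    cumulative = np.cumsum(values / values.sum())
    k = int(np.searchsorted(cumulative, threshold) + 1)
    return min(k, len(values))

# Made-up eigenvalues, already sorted from largest to smallest.
eigenvalues = [4.0, 3.0, 2.0, 1.0]   # cumulative shares: 0.4, 0.7, 0.9, 1.0
k_85 = components_needed(eigenvalues, 0.85)
k_50 = components_needed(eigenvalues, 0.50)
print(k_85, k_50)
```

The threshold itself is a business decision, as noted above; the code only automates reading the cumulative column.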
Q.2.9) Explain the business implication of using the Principal Component Analysis for this case study. How
many PCs help in the further analysis? [Hint: Write Interpretations of the Principal Components Obtained].

• PCA is a statistical technique that uses an orthogonal transformation to convert a set of observations of
possibly correlated variables into a set of values of linearly uncorrelated variables. It is a
well-established mathematical technique for reducing multidimensional data to lower dimensions
while retaining as much of the variation as possible.
• PCA can only be applied to continuous variables.
• There are 18 variables in the dataset (17 of them numeric); by applying PCA we reduce them to just 7
components, which capture 87.6% of the variance in the dataset.
