02 Stats Revision

Uploaded by

suresh.yalagudri
Copyright © All Rights Reserved

Revisit

Business Statistics
Case-let

            Candidate 1   Candidate 2
Skill 1          60            90
Skill 2          60            75
Skill 3          60            63
Skill 4          60            37
Skill 5          60            35
Mean             60            60

https://pxhere.com/en/photo/1446003

You want to select one of two candidates for a job in your company, based on their scores in 5 equally important skill sets. How are the candidates the same? How are they different from each other?
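You can answer the case-let by computing a centre and a spread for each candidate. Below is a minimal sketch in R, the language used in the lab sections. It uses only the scores in the table, and the variable names are illustrative.

```r
# Scores in the five equally important skills, from the case-let table
candidate1 <- c(60, 60, 60, 60, 60)
candidate2 <- c(90, 75, 63, 37, 35)

# Same centre: both candidates average 60
mean(candidate1)   # 60
mean(candidate2)   # 60

# Different spread: candidate 1 is perfectly consistent,
# candidate 2 swings widely from skill to skill
sd(candidate1)     # 0
sd(candidate2)     # sqrt(572), about 23.92
range(candidate2)  # 35 90
```

The means alone make the candidates look identical. The standard deviation shows how differently each one reaches that mean.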
Public
Case-let

• Suppose you are a doctor and a patient comes to you with a fever.
  ◦ How will you proceed?
  ◦ What data will you need?
  ◦ How will you organize the data?

Population
Sample

What is business statistics & business analytics?

Collection → Organization → Presentation → Analysis → Interpretation

• Business statistics is the science of the collection, organization, presentation, analysis and interpretation of numerical data. Business analytics is the combination of skills, technologies, applications and processes used to gain insight into businesses based on data and statistics.

Measures of Central Tendency
• Mean
• Median
• Mode
• …

Measures of Dispersion
• Standard Deviation
• Variance
• Range
• Interquartile Range

Data Distribution
• Skewness
• Kurtosis
• …
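Measures of Central Tendency

$$\text{Mean}\ (\bar{x}) = \frac{\sum x}{N}$$

$$\text{Median} = \left(\frac{n+1}{2}\right)^{\text{th}}\ \text{item}$$

For an even number of items, the median is the mean of the (n/2)th and (n/2+1)th items of the sorted data.

Mode = the most repeated value

Sales (mil $): 55, 63, 49, 55, 71, 63, 71, 63, 76
Sorted: 49, 55, 55, 63, 63, 63, 71, 71, 76 (Q1 = 55, Q2 = median = 63, Q3 = 71)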
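The central-tendency measures for the sales data can be checked in R. Here is a minimal sketch. Note that R's built-in `mode()` returns an object's storage mode, not the statistical mode. For that reason the sketch defines a small helper, `stat_mode`, which is a hypothetical name.

```r
# Sales (mil $) from the slide
sales <- c(55, 63, 49, 55, 71, 63, 71, 63, 76)

mean(sales)    # sum(x) / N = 566 / 9, prints 62.88889
median(sales)  # ((9 + 1) / 2)th = 5th sorted item: 63

# Statistical mode: the most frequent value(s)
stat_mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[counts == max(counts)])
}
stat_mode(sales)  # 63

# Quartiles with R's default method (type = 7): 55, 63, 71
quantile(sales, probs = c(0.25, 0.50, 0.75))
```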
Measures of Dispersion

• Standard Deviation
• Variance
• Range
• Interquartile Range
• Data Distribution
• Skewness
• Kurtosis

Standard deviation of population, Sales (mil $): 55, 63, 49, 55, 71, 63, 71, 63, 76 (sorted: 49, 55, 55, 63, 63, 63, 71, 71, 76)
Measures of Dispersion

Standard deviation of population, Sales (mil $):

Sales (x)   Mean (µ)   x - µ     (x - µ)²
55          62.889     -7.889     62.235
63          62.889      0.111      0.012
49          62.889    -13.889    192.901
55          62.889     -7.889     62.235
71          62.889      8.111     65.790
63          62.889      0.111      0.012
71          62.889      8.111     65.790
63          62.889      0.111      0.012
76          62.889     13.111    171.901
                       Σ(x - µ)² = 620.889

$$\sigma^2 = \frac{\sum (x-\mu)^2}{N} = \frac{620.889}{9} = 68.988$$

$$\sigma = \sqrt{68.988} \approx 8.31$$
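Below is a minimal sketch of the population standard deviation for the same sales data. R's `sd()` and `var()` use the sample formulas, which divide by N - 1. To reproduce the slide's 68.988 and 8.31, the sketch defines helpers that divide by N. The names `pop_var` and `pop_sd` are hypothetical.

```r
# Sales (mil $) from the slide
sales <- c(55, 63, 49, 55, 71, 63, 71, 63, 76)

# Population variance and SD: divide the sum of squared deviations by N
pop_var <- function(x) sum((x - mean(x))^2) / length(x)
pop_sd  <- function(x) sqrt(pop_var(x))

pop_var(sales)  # 620.889 / 9, about 68.988
pop_sd(sales)   # about 8.306

# R's built-in var() and sd() are sample statistics (divide by N - 1),
# so they are slightly larger for the same data
var(sales)      # 620.889 / 8, about 77.611
sd(sales)       # about 8.81

# Other dispersion measures from the outline
diff(range(sales))  # range: 76 - 49 = 27
IQR(sales)          # Q3 - Q1 = 71 - 55 = 16 (default quantile type 7)
```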
Software installations

“The most technologically efficient machine that man has ever invented is the book.”
- Northrop Frye
Data analysis software: R-programming

• Data scientists use many software packages for statistical data analysis, for example:
  ◦ IBM SPSS
  ◦ SAS
  ◦ Python Programming
  ◦ R-programming
Lab
• Install the R-programming software
• Install the RStudio software
• Introduction to R-programming
  ◦ Base R commands
  ◦ Read a file in RStudio
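For the last two lab steps, here is a minimal sketch of a few base R commands and of reading a CSV file. The file name `survey.csv` is hypothetical. The snippet first writes a tiny example file to a temporary folder, so it runs as-is on any machine.

```r
# A few base R commands (no packages needed)
ages <- c(55, 45, 45, 35, 55)  # c() combines values into a vector
length(ages)                   # number of values
summary(ages)                  # min, quartiles, mean, max

# Write a small example CSV so the read step has a file to open
path <- file.path(tempdir(), "survey.csv")
write.csv(
  data.frame(respondent = c("Manoj", "Sonali", "Krishna"),
             age        = c(55, 45, 45),
             gender     = c("Male", "Female", "Male")),
  path, row.names = FALSE
)

# Read the CSV file into a data frame
df <- read.csv(path)
str(df)   # column names and types
head(df)  # first rows
```

RStudio's Import Dataset option produces a similar data frame through menus.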
Levels of Measurement

“What gets measured, gets managed.”
- Peter Drucker
American Management Guru (1909 – 2005)
You are working for an automobile manufacturer. You want to study what customers think of their cars, and more…

Customer satisfaction survey:
• What is your name? _______________________
• Your age: ______
• Gender: Male / Female
• Income group: Below 5000 / 5001 to 10000 / 10001 to 25000 / 25001 to 50000 / 50001 and above
• Car make: Suzuki / Hundai / Tata / Honda / Toyota / Others
• Satisfaction rating: Highly Dissatisfied (1) / Dissatisfied (2) / Neither Satisfied Nor Dissatisfied (3) / Satisfied (4) / Highly Satisfied (5)

Levels of measurement:
Classification  Order  Distance  Natural Origin
Nominal         No     No        No
Ordinal         Yes    No        No
Interval        Yes    Yes       No
Ratio           Yes    Yes       Yes

CATEGORICAL: Nominal & Ordinal
CONTINUOUS: Ratio & Interval

Survey responses (column types: respondent = String, age = Ratio, gender = Nominal, incgrp = Ordinal, make = Nominal, csat = Interval):

respondent  age  gender  incgrp          make    csat
Manoj       55   Male    5000 to 10000   Hundai  2
Sonali      45   Female  25000 to 50000  Honda   4
Krishna     45   Male    25000 to 50000  Suzuki  5
Ganesh      350  Male    5000 to 10000   Suzuki  2
Sonia       55   Female  Above 50000     Hundai  5
Rupali      45   Female  25000 to 50000  Toyota  4
Sangram     45   Male    Above 50000     Toyota  3
Ram         75   Male    10000 to 25000  Tata    5
Kavita      55   Female  25000 to 50000  Honda
Shriram     25   Male    Above 50000     Others  4
Rahul       85   Male    10000 to 25000  Honda   1
Sanjay      55   Male    25000 to 50000  Hundai  5
Sangita     55   Female  10000 to 25000  Suzuki  4
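In R, each level of measurement maps naturally onto a data type. The sketch below uses three rows of the sample table. The mapping it assumes is a common convention, not something the slides prescribe:

- numeric for ratio and interval data
- unordered factor for nominal data
- ordered factor for ordinal data

```r
# Three rows of the sample survey table, one R type per level of measurement
df <- data.frame(
  respondent = c("Manoj", "Sonali", "Krishna"),        # String: character
  age        = c(55, 45, 45),                           # Ratio: numeric
  gender     = factor(c("Male", "Female", "Male")),     # Nominal: unordered factor
  incgrp     = factor(c("5000 to 10000", "25000 to 50000", "25000 to 50000"),
                      levels = c("5000 to 10000", "10000 to 25000",
                                 "25000 to 50000", "Above 50000"),
                      ordered = TRUE),                  # Ordinal: ordered factor
  make       = factor(c("Hundai", "Honda", "Suzuki")),  # Nominal: unordered factor
  csat       = c(2, 4, 5)                               # Interval: 1-5 rating scale
)
str(df)

# Ordinal values carry an order that R can compare; nominal values do not
df$incgrp[2] > df$incgrp[1]  # TRUE: "25000 to 50000" ranks above "5000 to 10000"
is.ordered(df$gender)        # FALSE
```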
Data cleaning and Missing value handling

• The first stage in any data analysis is data cleaning and planning how missing data will be handled.

Data cleaning

In the table below, Ganesh's age (350) is an implausible entry and Kavita's csat is missing.

respondent  age  gender  incgrp          make    csat
Manoj       55   Male    5000 to 10000   Hundai  2
Sonali      45   Female  25000 to 50000  Honda   4
Krishna     45   Male    25000 to 50000  Suzuki  5
Ganesh      350  Male    5000 to 10000   Suzuki  2
Sonia       55   Female  Above 50000     Hundai  5
Rupali      45   Female  25000 to 50000  Toyota  4
Sangram     45   Male    Above 50000     Toyota  3
Ram         75   Male    10000 to 25000  Tata    5
Kavita      55   Female  25000 to 50000  Honda
Shriram     25   Male    Above 50000     Others  4
Rahul       85   Male    10000 to 25000  Honda   1
Sanjay      55   Male    25000 to 50000  Hundai  5
Sangita     55   Female  10000 to 25000  Suzuki  4
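Here is a minimal sketch of how the two problems in this table might be found in R. It rebuilds only the respondent, age and csat columns. The 0 to 100 plausible-age bounds are an assumption chosen for illustration.

```r
# respondent, age and csat for the 13 responses; Kavita's csat is missing (NA)
df <- data.frame(
  respondent = c("Manoj", "Sonali", "Krishna", "Ganesh", "Sonia", "Rupali",
                 "Sangram", "Ram", "Kavita", "Shriram", "Rahul", "Sanjay", "Sangita"),
  age  = c(55, 45, 45, 350, 55, 45, 45, 75, 55, 25, 85, 55, 55),
  csat = c(2, 4, 5, 2, 5, 4, 3, 5, NA, 4, 1, 5, 4)
)

# Data cleaning: flag ages outside an assumed plausible range of 0 to 100
bad_age <- which(df$age < 0 | df$age > 100)
df[bad_age, c("respondent", "age")]  # Ganesh, 350

# Missing values: count NAs per column and list the incomplete rows
colSums(is.na(df))
df[!complete.cases(df), ]            # Kavita
```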
Data cleaning and Missing value handling

• After data cleaning and missing value handling, our data should look like this:

Missing value handling

respondent  age  gender  incgrp          make    csat
Manoj       55   Male    5000 to 10000   Hundai  2
Sonali      45   Female  25000 to 50000  Honda   4
Krishna     45   Male    25000 to 50000  Suzuki  5
Ganesh      35   Male    5000 to 10000   Suzuki  2
Sonia       55   Female  Above 50000     Hundai  5
Rupali      45   Female  25000 to 50000  Toyota  4
Sangram     45   Male    Above 50000     Toyota  3
Ram         75   Male    10000 to 25000  Tata    5
Kavita      55   Female  25000 to 50000  Honda
Shriram     25   Male    Above 50000     Others  4
Rahul       85   Male    10000 to 25000  Honda   1
Sanjay      55   Male    25000 to 50000  Hundai  5
Sangita     55   Female  10000 to 25000  Suzuki  4
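Here is a sketch of the corrections, under two assumptions. First, that 350 was a typing error for 35, as the cleaned table shows. Second, that the median is an acceptable fill-in for the missing rating. The slides do not say how Kavita's missing csat is handled, so two options are shown.

```r
# respondent, age and csat for the 13 responses; Kavita's csat is missing (NA)
df <- data.frame(
  respondent = c("Manoj", "Sonali", "Krishna", "Ganesh", "Sonia", "Rupali",
                 "Sangram", "Ram", "Kavita", "Shriram", "Rahul", "Sanjay", "Sangita"),
  age  = c(55, 45, 45, 350, 55, 45, 45, 75, 55, 25, 85, 55, 55),
  csat = c(2, 4, 5, 2, 5, 4, 3, 5, NA, 4, 1, 5, 4)
)

# Correct the data-entry error (350 -> 35), as in the cleaned table
df$age[df$respondent == "Ganesh"] <- 35

# Option A: keep the NA and tell each function to skip it
mean(df$csat, na.rm = TRUE)  # 44 / 12, about 3.67

# Option B (an assumption, not the slide's method): impute the median rating
med <- median(df$csat, na.rm = TRUE)  # 4
df$csat_imputed <- ifelse(is.na(df$csat), med, df$csat)
```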
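Interval measure: which is the better option?

• In general, we are satisfied with the services provided by XYZ company.

Option 1: NO (1), YES (2)
Option 2: NO (1), MAY BE (2), YES (3)
Option 3: HD (1), D (2), NSND (3), S (4), HS (5)
Option 4: ED (1), HD (2), D (3), NSND (4), S (5), HS (6), ES (7)

(ED = Extremely Dissatisfied, HD = Highly Dissatisfied, D = Dissatisfied, NSND = Neither Satisfied Nor Dissatisfied, S = Satisfied, HS = Highly Satisfied, ES = Extremely Satisfied)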
Exploratory Data Analysis & Visual Analytics

“If you think adventure is dangerous, try routine; it’s lethal.”
- Paulo Coelho
Brazilian lyricist & novelist, best known for his book ‘The Alchemist’

Lab test
Exploratory Data Analysis
Exploratory data analysis: NOMINAL measure

• How will you analyse the NOMINAL measure GENDER?

#table of gender
table(df$gender)

Female   Male
    59    105

#pie chart of gender
pie(table(df$gender))
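To report a nominal measure as percentages, `prop.table()` converts counts into shares. The full data set `df` is not included here, so this sketch rebuilds the 59 Female / 105 Male counts as a vector.

```r
# Rebuild the gender counts from the output (59 Female, 105 Male);
# with the real data this is simply table(df$gender)
gender <- rep(c("Female", "Male"), times = c(59, 105))
counts <- table(gender)

# Share of each category, in percent: 36.0 and 64.0
pct <- round(100 * prop.table(counts), 1)
pct

# Label the pie slices with their percentages
pie(counts, labels = paste0(names(counts), " (", pct, "%)"))
```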
Exploratory data analysis: NOMINAL measure

• How will you analyse the NOMINAL measure MAKE?

#table of make
table(df$make)

Honda  Hundai  Others  Suzuki  Tata  Toyota
   20      40       7      68    17      12

#pie chart of make
pie(table(df$make))
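With six makes, a pie chart is hard to read. A common alternative is to sort the counts and draw a bar chart. This sketch uses the counts from the output, with a named vector standing in for `table(df$make)`.

```r
# Counts by make from the output; with the real data use table(df$make)
make_counts <- c(Honda = 20, Hundai = 40, Others = 7,
                 Suzuki = 68, Tata = 17, Toyota = 12)

# Sort from most to least frequent, then draw a bar chart
sorted <- sort(make_counts, decreasing = TRUE)
sorted  # Suzuki 68, Hundai 40, Honda 20, Tata 17, Toyota 12, Others 7
barplot(sorted, ylab = "Respondents")
```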
Exploratory data analysis: ORDINAL measure

• How will you analyse the ORDINAL measure Income Group?

#table of IncGrp
table(df$IncGrp)

#barplot of IncGrp
barplot(table(df$IncGrp))
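One caveat for ordinal data: `table()` on plain text labels sorts them alphabetically, so the income groups lose their natural order. The sketch below shows the fix, an ordered factor. It uses hypothetical responses with the income labels from the sample table.

```r
# Hypothetical responses using the income labels from the sample table
incgrp <- c("5000 to 10000", "25000 to 50000", "25000 to 50000", "Above 50000",
            "10000 to 25000", "10000 to 25000", "25000 to 50000")

# Plain text is sorted alphabetically, so "5000 to 10000" lands after "25000 to 50000"
names(table(incgrp))

# An ordered factor keeps the groups in their true order
levels_inc <- c("5000 to 10000", "10000 to 25000", "25000 to 50000", "Above 50000")
incgrp_f <- factor(incgrp, levels = levels_inc, ordered = TRUE)
table(incgrp_f)  # 1, 2, 3, 1
barplot(table(incgrp_f))
```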
Exploratory data analysis: SCALE measure

• How will you analyse the SCALE measure AGE?

Raw (ungrouped) data → Grouped data → Data distribution (age as a RANDOM VARIABLE)

Raw (ungrouped) data, Age: 56, 31, 44, 25, 60, 46, 35, 35, 41, 45, 43, 55, 32, 45, 38, 38, 51, 47, …

Grouped data:
Age gp        Count
(Up to 10]        4
(10 to 20]       20
(20 to 30]       42
(30 to 40]       52
(40 to 50]       29
(50 to 60]       11
(60 to 70]        5
(70 to 80]        1
(80 to 90]

Data distribution:

#histogram of age
hist(df$age)
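The grouping step can be reproduced with `cut()`. This sketch runs only on the 18 ages listed. The full data set is longer, so the counts will not match the grouped table.

```r
# The first 18 ages listed on the slide
age <- c(56, 31, 44, 25, 60, 46, 35, 35, 41, 45, 43, 55, 32, 45, 38, 38, 51, 47)

# 10-year bins, open on the left and closed on the right:
# (30,40] holds ages above 30 up to and including 40
age_gp <- cut(age, breaks = seq(0, 90, by = 10))
table(age_gp)  # (20,30] 1, (30,40] 6, (40,50] 7, (50,60] 4, others 0

# Histogram with the same bins (hist() also closes bins on the right by default)
hist(age, breaks = seq(0, 90, by = 10))
```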

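Exploratory data analysis: SCALE measure

Normal data: Skewness = 0, Kurtosis = 0 (here kurtosis means excess kurtosis, as reported by psych::kurtosi)

-ve skewness: long tail to the left        +ve skewness: long tail to the right
+ve kurtosis: heavier tails than normal     -ve kurtosis: lighter tails than normal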
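Both shape measures can be computed from their moment definitions (g1 and excess g2). The sketch below does this in base R. The function names are hypothetical. By default, `psych::skew()` and `psych::kurtosi()` may apply a small-sample adjustment, so their results can differ slightly on the same data.

```r
# Moment-based skewness (g1)
skewness_g1 <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Moment-based excess kurtosis (g2): subtract 3 so a normal curve scores 0
excess_kurtosis_g2 <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}

skewness_g1(c(1, 2, 3, 4, 5))         # symmetric data: 0
skewness_g1(c(1, 1, 1, 2, 10))        # long right tail: positive
excess_kurtosis_g2(c(1, 2, 3, 4, 5))  # flat, light-tailed data: -1.3
```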
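Exploratory data analysis: RATIO measure

• How will you analyse a SCALE measure like AGE?

#central tendency
mean(df$age)
median(df$age)

#dispersion (sd and var are sample statistics: they divide by n - 1)
sd(df$age)
var(df$age)

#shape of the distribution; run install.packages("psych") first if needed
library(psych)
skew(df$age)
kurtosi(df$age)   #excess kurtosis: 0 for normal data

#spread and percentiles
range(df$age)
min(df$age)
max(df$age)
quantile(df$age, probs = c(.1, .25, .50, .75, .99))

#min, quartiles, mean and max in one call
summary(df$age)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  18.00   34.00   44.00   43.72   51.25   85.00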
Exploratory data analysis: INTERVAL measure

• Create constructs

Rating scale: ED (1), HD (2), D (3), NSND (4), S (5), HS (6), ES (7)

Construct for CS (Customer Satisfaction)
1 I am happy with the mileage of my car
2 My car gives high comfort while driving
3 My car is great value for money
4 My car is low on maintenance

Construct for CL (Customer Loyalty)
1 I will go for the same brand for my next purchase
2 I will be happy to recommend this model to my friends & relatives
3 I will discuss this model on social media
4 I am happy to participate in any social event organised by this brand

Each statement is rated on the 1 to 7 scale.
Exploratory data analysis: INTERVAL measure

• Create constructs

Customer Satisfaction
• I am happy with the mileage of my car
• My car gives high comfort while driving
• My car is great value for money
• My car is low on maintenance

Customer Loyalty
• I will go for the same brand for my next purchase
• I will be happy to recommend this model to my friends & relatives
• I will discuss this model on social media
• I am happy to participate in any social event organised by this brand
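The slides do not show how the item ratings are combined into one construct score. This sketch assumes a common approach: average each respondent's ratings on the construct's four items. The column names `cs1`-`cs4` and `cl1`-`cl4` and all the ratings are hypothetical. Naming the results `csat` and `loy` mirrors the variables in the cross-analysis slides.

```r
# Hypothetical 1-7 ratings for three respondents on the eight statements
items <- data.frame(
  cs1 = c(6, 4, 7), cs2 = c(5, 4, 6), cs3 = c(6, 3, 7), cs4 = c(5, 5, 6),
  cl1 = c(7, 3, 6), cl2 = c(6, 4, 6), cl3 = c(4, 2, 5), cl4 = c(3, 2, 5)
)

# Construct score = mean of each respondent's ratings on that construct's items
items$csat <- rowMeans(items[, c("cs1", "cs2", "cs3", "cs4")])
items$loy  <- rowMeans(items[, c("cl1", "cl2", "cl3", "cl4")])
items[, c("csat", "loy")]  # csat 5.5, 4.0, 6.5; loy 5.0, 2.75, 5.5
```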
Exploratory data analysis: cross of SCALE & CAT

• What is the mean Customer Satisfaction for each Make?

#make-wise mean CSAT
mwa <- aggregate(csat ~ make, FUN = mean, data = df)

#barplot of mean CSAT by make
barplot(csat ~ make, data = mwa)
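Here is a self-contained version of the make-wise CSAT code, using the make and csat columns of the 13 sample responses. It also shows a detail worth knowing: by default, the formula method of `aggregate()` drops rows with missing values.

```r
# make and csat for the 13 sample responses (Kavita's csat is NA)
df <- data.frame(
  make = c("Hundai", "Honda", "Suzuki", "Suzuki", "Hundai", "Toyota", "Toyota",
           "Tata", "Honda", "Others", "Honda", "Hundai", "Suzuki"),
  csat = c(2, 4, 5, 2, 5, 4, 3, 5, NA, 4, 1, 5, 4)
)

# Rows with NA are dropped, so Honda's mean uses only Sonali (4) and Rahul (1)
mwa <- aggregate(csat ~ make, FUN = mean, data = df)
mwa

# Bar chart of mean CSAT by make (barplot's formula interface needs R >= 4.0.0)
barplot(csat ~ make, data = mwa)
```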
Exploratory data analysis: cross of SCALE & SCALE

• Cross between Customer Satisfaction and Customer Loyalty

#scatter plot of loyalty against satisfaction
plot(loy ~ csat, data = df)
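A scatter plot shows the shape of the relationship, and `cor()` adds a single number for its strength. This sketch uses construct scores for six hypothetical respondents.

```r
# Hypothetical construct scores (1-7 scale) for six respondents
df <- data.frame(csat = c(2, 3, 4, 5, 6, 7),
                 loy  = c(1, 3, 3, 5, 6, 6))

# Scatter plot of loyalty against satisfaction
plot(loy ~ csat, data = df)

# Pearson correlation: strength of the linear relationship, from -1 to 1
r <- cor(df$csat, df$loy)
r  # about 0.96, a strong positive relationship
```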
Exploratory data analysis: cross of CAT & CAT

• Cross between Gender & Make

#contingency table of gender by make
table(df$gender, df$make)

         Honda Hundai Others Suzuki Tata Toyota
  Female     6     11      2     32    1      7
  Male      14     29      5     36   16      5

#stacked barplot
barplot(table(df$gender, df$make))
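Raw counts are hard to compare when the groups differ in size (59 women, 105 men). This sketch rebuilds the gender-by-make table and converts it to row percentages with `prop.table()`. It also draws a side-by-side bar chart.

```r
# Gender-by-make counts from the output; with the real data use
# table(df$gender, df$make)
tab <- as.table(rbind(
  Female = c(Honda = 6,  Hundai = 11, Others = 2, Suzuki = 32, Tata = 1,  Toyota = 7),
  Male   = c(Honda = 14, Hundai = 29, Others = 5, Suzuki = 36, Tata = 16, Toyota = 5)
))

addmargins(tab)                     # adds row and column totals ("Sum")
round(100 * prop.table(tab, 1), 1)  # % of each gender choosing each make

# Side-by-side bars with a legend
barplot(tab, beside = TRUE, legend.text = TRUE)
```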
Normal Distribution

“Data is the new oil.”
- Clive Humby
Normal distribution

• The normal distribution was first described by Abraham de Moivre (1667-1754) in 1733. It was rediscovered by Gauss in 1809 and by Laplace in 1812.
• De Moivre arrived at it as an approximation to the distribution of the number of occurrences of discrete events (the binomial distribution).
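Base R has three functions for the normal distribution:

- `dnorm()` gives the density.
- `pnorm()` gives the cumulative probability.
- `qnorm()` gives the quantiles.

The sketch below applies them to the standard normal (mean 0, sd 1), including the familiar 68-95-99.7 rule.

```r
# Standard normal distribution (mean 0, sd 1)
dnorm(0)      # height of the density at the mean, about 0.399
pnorm(1.96)   # P(Z <= 1.96), about 0.975
qnorm(0.975)  # inverse of pnorm, about 1.96

# Empirical rule: share of values within 1, 2 and 3 SDs of the mean
within_k_sd <- sapply(1:3, function(k) pnorm(k) - pnorm(-k))
round(within_k_sd, 4)  # 0.6827 0.9545 0.9973

# Draw the bell curve
curve(dnorm(x), from = -4, to = 4, ylab = "density")
```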
