02 Stats Revision
Business Statistics
Case-let
Skill    Candidate 1    Candidate 2
1        60             90
2        60             75
3        60             63
4        60             37
5        60             35
Mean     60             60
https://pxhere.com/en/photo/1446003
You want to select one of the two candidates for a job in your company based on their scores in 5
equally important skill sets. How are they the same? How are they different from each other?
Public
Case-let
Population
Sample
What is business statistics & business analytics?
Collection
Organization
Presentation
Analysis
Interpretation
Business statistics is the science of collection, organization, presentation, analysis and interpretation of numerical data. Business analytics is the use of a combination of skills, technologies, applications and processes to gain insight into businesses, based on data and statistics.
Measures of Central Tendency
Mean
Median
Mode
…
Measures of Dispersion
Standard Deviation
Variance
Range
Interquartile Range
Data Distribution
Skewness
Kurtosis
….
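Measures of Central Tendency
Mean: \( \bar{x} = \dfrac{\sum x}{N} \)
Median: the \( \left( \dfrac{n+1}{2} \right)^{\text{th}} \) item of the sorted data. For an even number of items, the median is the mean of the \( (n/2)^{\text{th}} \) and \( (n/2+1)^{\text{th}} \) items.
Mode: the most repeated number.
Sales (mil $), sorted: 49, 55, 55, 63, 63, 63, 71, 71, 76
Q1 = 55, Q2 = 63, Q3 = 71
Sales (mil $), as recorded: 55, 63, 49, 55, 71, 63, 71, 63, 76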
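The three measures can be checked in R on the nine sales figures from the slide. This is a minimal sketch: the names `sales` and `stat_mode` are introduced here for illustration only (base R has no built-in function for the statistical mode).

```r
# Sales (mil $) as recorded on the slide
sales <- c(55, 63, 49, 55, 71, 63, 71, 63, 76)

# Mean: sum of the values divided by how many there are
sales_mean <- sum(sales) / length(sales)   # same result as mean(sales)

# Median: the ((n + 1) / 2)th item of the sorted data when n is odd
sales_median <- median(sales)

# Mode: the most repeated value (hypothetical helper, not part of base R)
stat_mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[counts == max(counts)])
}
sales_mode <- stat_mode(sales)

print(round(sales_mean, 3))   # 62.889
print(sales_median)           # 63
print(sales_mode)             # 63
```

If two values tie for the highest count, `stat_mode` returns both of them.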
Measures of Dispersion
Standard Deviation
Variance
Range
Interquartile Range
Data Distribution
Skewness
Kurtosis
Standard deviation of population
Sales (mil $), sorted: 49, 55, 55, 63, 63, 63, 71, 71, 76
Q1 = 55, Q2 = 63, Q3 = 71
Sales (mil $), as recorded: 55, 63, 49, 55, 71, 63, 71, 63, 76
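Range and interquartile range for the same sales figures can be sketched in R as follows. The variable names are illustrative. Note that R's `range()` returns the minimum and maximum rather than their difference, and that `quantile()` supports several quartile conventions; the default is used here.

```r
sales <- c(55, 63, 49, 55, 71, 63, 71, 63, 76)

# Range: largest value minus smallest value
sales_range <- max(sales) - min(sales)

# Quartiles Q1, Q2, Q3 (R's default quantile method)
q <- quantile(sales, probs = c(0.25, 0.50, 0.75))

# Interquartile range: Q3 - Q1
sales_iqr <- IQR(sales)

print(sort(sales))
print(sales_range)   # 27
print(q)             # 55, 63, 71
print(sales_iqr)     # 16
```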
Measures of Dispersion
Standard deviation of population
Sales (mil $)   Mean     x-µ       (x-µ)²
55              62.889   -7.889    62.235
63              62.889   0.111     0.012
49              62.889   -13.889   192.901
55              62.889   -7.889    62.235
71              62.889   8.111     65.790
63              62.889   0.111     0.012
71              62.889   8.111     65.790
63              62.889   0.111     0.012
76              62.889   13.111    171.901
Total                              620.889
Variance: \( \sigma^2 = \dfrac{\sum (x-\mu)^2}{N} = \dfrac{620.889}{9} = 68.988 \)
Standard deviation: \( \sigma = \sqrt{68.988} = 8.31 \)
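The table can be reproduced step by step in R. This is a sketch with illustrative variable names; the point to notice is that R's built-in `var()` and `sd()` use the sample formulas (dividing by n - 1), so the population values have to be computed explicitly.

```r
sales <- c(55, 63, 49, 55, 71, 63, 71, 63, 76)
n  <- length(sales)
mu <- mean(sales)                 # 62.889

sq_dev  <- (sales - mu)^2         # the (x - mu)^2 column
pop_var <- sum(sq_dev) / n        # population variance
pop_sd  <- sqrt(pop_var)          # population standard deviation

# Built-in sd() divides by n - 1, so it is slightly larger
sample_sd <- sd(sales)

print(round(c(total = sum(sq_dev), pop_var = pop_var,
              pop_sd = pop_sd, sample_sd = sample_sd), 3))
```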
Software installations
“The most technologically efficient machine that man has ever invented is the book.”
- Northrop Frye
Data analysis software: R-programming
Data scientists use many software packages for statistical data analysis, including:
IBM SPSS
SAS
Python Programming
R-programming
Lab
Install R-programming software
Install R-studio software
Introduction to R-programming
BASE commands
Read file in R-studio
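A first session covering the last two lab items might look like the sketch below. The slides do not name the lab's data file, so the example writes a tiny CSV to a temporary location (two rows borrowed from the survey table later in the deck) and reads it back; in the lab you would pass the path of your own file to `read.csv()` instead.

```r
# A few BASE commands
x <- c(2, 4, 6)        # combine values into a vector
length(x)              # number of elements
sum(x)                 # total
getwd()                # folder R currently reads from and writes to

# Create a small CSV so the example runs on its own (illustrative file)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("respondent,age,gender",
             "Manoj,55,Male",
             "Sonali,45,Female"), csv_path)

# Read the file into a data frame
df <- read.csv(csv_path, stringsAsFactors = FALSE)

str(df)                # structure: column names and types
head(df)               # first rows
```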
Levels of Measurement
“What gets measured, gets managed.”
- Peter Drucker, American Management Guru (1909 – 2005)
Levels of Measurement
You are working for an automobile manufacturer. You want to study what car customers think of their cars, and more.

Customer satisfaction survey:
What is your name: _______________________
Your age: ______
Gender: √ Male / Female
Income gr.: Below 5000 / √ 5001 to 10000 / 10001 to 25000 / 25001 to 50000 / 50001 and above
Car make: Suzuki / √ Hundai / Tata / Honda / Toyota / Others
Sat. rating: Highly Dissatisfied (1) / Dissatisfied (2) / √ Neither satisfied nor dissatisfied (3) / Satisfied (4) / Highly Satisfied (5)

Level      Classification   Order   Distance   Natural Origin
Nominal    Yes              No      No         No
Ordinal    Yes              Yes     No         No
Interval   Yes              Yes     Yes        No
Ratio      Yes              Yes     Yes        Yes

CATEGORICAL: Nominal & Ordinal
CONTINUOUS: Ratio & Interval

Level        String   Ratio   Nominal   Ordinal          Nominal   Interval
respondent            age     gender    incgrp           make      csat
Manoj                 55      Male      5000 to 10000    Hundai    2
Sonali                45      Female    25000 to 50000   Honda     4
Krishna               45      Male      25000 to 50000   Suzuki    5
Ganesh                350     Male      5000 to 10000    Suzuki    2
Sonia                 55      Female    Above 50000      Hundai    5
Rupali                45      Female    25000 to 50000   Toyota    4
Sangram               45      Male      Above 50000      Toyota    3
Ram                   75      Male      10000 to 25000   Tata      5
Kavita                55      Female    25000 to 50000   Honda
Shriram               25      Male      Above 50000      Others    4
Rahul                 85      Male      10000 to 25000   Honda     1
Sanjay                55      Male      25000 to 50000   Hundai    5
Sangita               55      Female    10000 to 25000   Suzuki    4
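One way to carry these measurement levels into R is sketched below, using the first three respondents from the table. The mapping (plain factor for nominal, ordered factor for ordinal, numeric for interval and ratio) is a common convention rather than something the slides prescribe, and the level labels in `income_levels` are an assumption pieced together from the survey form and the data.

```r
df <- data.frame(
  respondent = c("Manoj", "Sonali", "Krishna"),
  age        = c(55, 45, 45),
  gender     = c("Male", "Female", "Male"),
  incgrp     = c("5000 to 10000", "25000 to 50000", "25000 to 50000"),
  make       = c("Hundai", "Honda", "Suzuki"),
  csat       = c(2, 4, 5),
  stringsAsFactors = FALSE
)

# Nominal: categories without order
df$gender <- factor(df$gender)
df$make   <- factor(df$make)

# Ordinal: categories with a natural order (labels assumed)
income_levels <- c("Below 5000", "5000 to 10000", "10000 to 25000",
                   "25000 to 50000", "Above 50000")
df$incgrp <- factor(df$incgrp, levels = income_levels, ordered = TRUE)

# Order comparisons are meaningful for ordinal data
df$incgrp[1] < df$incgrp[2]      # TRUE

str(df)
```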
Data cleaning and Missing value handling
The first stage in any data analysis is data cleaning and planning for missing-data handling.
Data cleaning
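The survey table contains two typical problems: an implausible age (350) and a missing satisfaction rating. The sketch below shows one possible treatment on six of those rows. The slides do not state which rules to apply, so the cut-off of 100 years and the use of median imputation are assumptions made purely for illustration.

```r
df <- data.frame(
  respondent = c("Krishna", "Ganesh", "Ram", "Kavita", "Shriram", "Rahul"),
  age        = c(45, 350, 75, 55, 25, 85),
  csat       = c(5, 2, 5, NA, 4, 1)
)

# 1. Detect problems
summary(df$age)            # a maximum of 350 is not a plausible age
colSums(is.na(df))         # missing values per column

# 2. Treat implausible values as missing (cut-off of 100 is an assumption)
df$age[df$age > 100] <- NA

# 3. Handle missing values; median imputation is only one possible choice
df$age[is.na(df$age)]   <- median(df$age, na.rm = TRUE)
df$csat[is.na(df$csat)] <- median(df$csat, na.rm = TRUE)

print(df)
```

Dropping incomplete rows with `na.omit(df)` is another common option; which one is appropriate depends on how much data is missing and why.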
Data cleaning and Missing value handling
After data cleaning and missing value handling, our data should look like this
Interval measure, what is a better option?
Option 1: 1 = NO, 2 = YES
Option 2: 1 = NO, 2 = MAY BE, 3 = YES
Option 3: 1 = HD, 2 = D, 3 = NSND, 4 = S, 5 = HS
Option 4: 1 = ED, 2 = HD, 3 = D, 4 = NSND, 5 = S, 6 = HS, 7 = ES
Exploratory Data Analysis
“If you think adventure is dangerous, try routine; it’s lethal.”
#table of gender
table(df$gender)
Female Male
59 105
#pieplot of gender
pie(table(df$gender))
Exploratory data analysis: NOMINAL measure
#table of make
table(df$make)
#pieplot of make
pie(table(df$make))
Exploratory data analysis: ORDINAL measure
#table of IncGrp
table(df$IncGrp)
#barplot of IncGrp
barplot(table(df$IncGrp))
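For an ordinal variable the bars are only readable if they appear in their natural order. If the income group is stored as plain text, `table()` sorts the labels as text; declaring an ordered factor fixes the order. The sketch below uses a handful of illustrative answers and assumed level labels.

```r
# Illustrative income-group answers (values taken from the survey table)
inc <- c("25000 to 50000", "5000 to 10000", "Above 50000",
         "10000 to 25000", "25000 to 50000", "5000 to 10000")

# As plain text, table() sorts the groups as text, not by income
names(table(inc))

# Declaring the order makes table() and barplot() follow it
income_levels <- c("Below 5000", "5000 to 10000", "10000 to 25000",
                   "25000 to 50000", "Above 50000")
inc_ord <- factor(inc, levels = income_levels, ordered = TRUE)

counts <- table(inc_ord)
print(counts)
barplot(counts, las = 2, cex.names = 0.7)
```

Groups with no respondents (here "Below 5000") still appear with a count of 0, which keeps the axis complete.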
Exploratory data analysis: SCALE measure
Age, raw (ungrouped): 56, 31, 44, 25, 60, 46, 35, 35, 41, 45, 43, 55, 32, 45, 38, 38, 51, 47, …
Grouped Data
Age gp       Count
(Upto 10]    4
(10 to 20]   20
(20 to 30]   42
(30 to 40]   52
(40 to 50]   29
(50 to 60]   11
(60 to 70]   5
(70 to 80]   1
(80 to 90]
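Grouping raw ages into classes like these can be sketched with `cut()`. Only the first 18 ages are visible on the slide, so the counts produced here cover those values alone and will not match the full table.

```r
# The first values of the Age column shown on the slide
age <- c(56, 31, 44, 25, 60, 46, 35, 35, 41, 45, 43, 55, 32, 45, 38, 38, 51, 47)

# Class boundaries 0, 10, ..., 90; right = TRUE gives intervals such as (10,20],
# so an age of exactly 20 falls in (10,20] and not in (20,30]
breaks <- seq(0, 90, by = 10)
age_gp <- cut(age, breaks = breaks, right = TRUE)

grouped <- table(age_gp)
print(grouped)
```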
#histogram of age
hist(df$age)
RANDOM VARIABLE
Data Distribution
mean(df$age)
median(df$age)
sd(df$age)
var(df$age)
library(psych)
skew(df$age)
kurtosi(df$age)
range(df$age)
min(df$age)
max(df$age)
quantile(df$age, probs = c(.1,.25,.50,.75,.99))
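Exploratory data analysis: SCALE measure
Normal data: Skewness = 0, Kurtosis = 0 (excess kurtosis)
-ve skewness: longer tail on the left; +ve skewness: longer tail on the right
+ve kurtosis: more peaked, heavier tails; -ve kurtosis: flatter, lighter tails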
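To see where the signs come from, skewness and excess kurtosis can be computed from central moments in base R. The helper names below are hypothetical, and the formulas are the plain moment-based definitions; packaged functions may apply small-sample adjustments, so their results can differ slightly.

```r
# Moment-based skewness: third central moment / (second central moment)^(3/2)
skewness_m <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Excess kurtosis: fourth central moment / (second central moment)^2, minus 3,
# so that normal data scores 0
excess_kurtosis_m <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}

symmetric    <- c(1, 2, 3, 4, 5)
right_tailed <- c(1, 1, 2, 2, 3, 10)

skewness_m(symmetric)          # 0: symmetric data
skewness_m(right_tailed)       # positive: long tail to the right
excess_kurtosis_m(symmetric)   # negative: flatter than the normal curve
```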
Exploratory data analysis: RATIO measure
mean(df$age)
median(df$age)
sd(df$age)
var(df$age)
library(psych)
skew(df$age)
kurtosi(df$age)
range(df$age)
min(df$age)
max(df$age)
quantile(df$age, probs = c(.1,.25,.50,.75,.99))
summary(df$age)
Create constructs
Construct for CS                                                           ED  HD  D  NSND  S  HS  ES
1  I am happy with the mileage of my car                                   1   2   3  4     5  6   7
2  My car gives high comfort while driving                                 1   2   3  4     5  6   7
3  My car is great value for money                                         1   2   3  4     5  6   7
4  My car is low on maintenance                                            1   2   3  4     5  6   7
Construct for CL
1  I will go for the same brand for my next purchase                       1   2   3  4     5  6   7
2  I will be happy to recommend this model to my friends & relatives       1   2   3  4     5  6   7
3  I will discuss this model on social media                               1   2   3  4     5  6   7
4  I am happy to participate in any social event organised by this brand   1   2   3  4     5  6   7
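A construct score is often formed by averaging the items that measure it. The slides do not say how the scores are built, so the averaging rule, the column names (`cs1` to `cs4`, `cl1` to `cl4`) and the three rows of responses below are all hypothetical, for illustration only.

```r
# Hypothetical responses of three respondents on the 7-point scale
# (cs1..cs4 = CS items, cl1..cl4 = CL items; names are assumed)
items <- data.frame(
  cs1 = c(6, 3, 7), cs2 = c(5, 4, 7), cs3 = c(6, 2, 6), cs4 = c(7, 3, 6),
  cl1 = c(6, 2, 7), cl2 = c(5, 3, 7), cl3 = c(4, 1, 6), cl4 = c(5, 2, 6)
)

# Construct score = average of the items that measure it
items$csat <- rowMeans(items[, c("cs1", "cs2", "cs3", "cs4")])
items$loy  <- rowMeans(items[, c("cl1", "cl2", "cl3", "cl4")])

print(items[, c("csat", "loy")])
```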
Exploratory data analysis: INTERVAL measure
Create constructs
#barplot
barplot(csat~make, data = mwa)
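The data frame `mwa` is not defined on the slides. One plausible reading is that it holds the make-wise average of `csat`, one row per make; under that assumption it could be built as sketched below, using eight rows from the survey table.

```r
# Eight rows from the survey table shown earlier
df <- data.frame(
  make = c("Hundai", "Honda", "Suzuki", "Suzuki",
           "Hundai", "Toyota", "Toyota", "Honda"),
  csat = c(2, 4, 5, 2, 5, 4, 3, 1)
)

# One row per make, holding the mean csat of that make (assumed meaning of mwa)
mwa <- aggregate(csat ~ make, data = df, FUN = mean)
print(mwa)

# Bar per make; recent versions of R also accept the formula form used on the slide
barplot(mwa$csat, names.arg = mwa$make, ylab = "Mean csat")
```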
Exploratory data analysis: cross of SCALE & SCALE
#scatter plot
plot(loy~csat, data = df)
Exploratory data analysis: cross of CAT & CAT
#table
table(df$gender,df$make)
        Honda Hundai Others Suzuki Tata Toyota
Female      6     11      2     32    1      7
Male       14     29      5     36   16      5
#barplot
barplot(table(df$gender,df$make))
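The cross-tabulation printed above can be rebuilt directly from its counts, which makes it possible to add totals and within-gender shares without the original data file. The object name `tab` is illustrative.

```r
# Rebuild the gender x make table from the counts in the output above
tab <- matrix(c( 6, 11, 2, 32,  1, 7,
                14, 29, 5, 36, 16, 5),
              nrow = 2, byrow = TRUE,
              dimnames = list(gender = c("Female", "Male"),
                              make = c("Honda", "Hundai", "Others",
                                       "Suzuki", "Tata", "Toyota")))
tab <- as.table(tab)

addmargins(tab)                          # adds row and column totals
round(prop.table(tab, margin = 1), 2)    # share of each make within a gender

# Side-by-side bars with a legend
barplot(tab, beside = TRUE, legend.text = TRUE)
```

The row totals (59 and 105) agree with the gender table earlier in the deck.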
Normal Distribution
“Data is the new oil.”
- Clive Humby
Normal distribution