
Basic Statistics II

This document discusses various numerical descriptive measures including measures of central tendency (mean, median, mode) and variation. It provides formulas and examples for calculating the mean, median, mode, and weighted mean. The mean can be affected by outliers while the median is not. The document also discusses calculating central tendency measures for grouped data and choosing the appropriate measure based on the data.

Uploaded by Abhijit Ash
Copyright © All Rights Reserved

Basic Statistics II

Dr. Sujeet Kumar Singh


Assistant Professor, IIM Jammu
Jammu 180016
Email: [email protected]
Numerical Descriptive Measures
Central tendency is the extent to which all the data values group around a typical or central value.
Variation is the amount of dispersion, or scattering, of the data values.
Shape (skewness) is the pattern of the distribution of values from the lowest value to the highest value.
Measure of Central Tendency: The Mean
• The arithmetic mean (AM), often just called the mean or the average, is the most common measure of central tendency.
• For a sample (or population) of size n with observations $X_1, X_2, \ldots, X_n$, the AM, denoted $\bar{X}$ (read "X-bar"), is given by
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$
where $\Sigma$ (read "sigma") denotes summation, $X_i$ is the i-th observation and n is the sample size.
• AM = sum of the values / number of values
Measure of Central Tendency: The Mean (continued)
• Disadvantage: the AM is affected by extreme values (outliers).
• Data 1: 1, 2, 3, 4, 5
  Mean = (1 + 2 + 3 + 4 + 5)/5 = 15/5 = 3
• Data 2: 1, 2, 3, 4, 20
  Mean = (1 + 2 + 3 + 4 + 20)/5 = 30/5 = 6
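A minimal Python sketch of the AM formula, applied to the two data sets above, makes the effect of the single extreme value concrete. The function name `arithmetic_mean` is illustrative and does not come from the slides.

```python
def arithmetic_mean(values):
    """Sum of the values divided by the number of values."""
    values = list(values)
    if not values:
        raise ValueError("mean of empty data is undefined")
    return sum(values) / len(values)

data1 = [1, 2, 3, 4, 5]
data2 = [1, 2, 3, 4, 20]  # same as data1 except one extreme value

print(arithmetic_mean(data1))  # 3.0
print(arithmetic_mean(data2))  # 6.0
```

Replacing 5 with 20 changes only one observation, yet it doubles the mean.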

Mean for grouped data
• The mean for grouped data is given by
$$\bar{X} = \frac{\sum f X}{n}$$
• where $\bar{X}$ = mean,
• $\sum f X$ = sum of the cross products of the frequency f of each class with the midpoint X of that class,
• n = total number of observations = $\sum f$ = total frequency.
• Solution:
Lower  Upper  Midpoint (X)  Frequency (f)  fX
0      1      0.5           1              0.5
1      2      1.5           4              6
2      3      2.5           8              20
3      4      3.5           7              24.5
4      5      4.5           3              13.5
5      6      5.5           2              11
Total                       25             75.5
Mean = 75.5/25 = 3.02
Example of Mean
Class interval
Lower  Upper  f    Midpoint (X)  fX
0      50     78   25            1950
50     100    123  75            9225
100    150    187  125           23375
150    200    82   175           14350
200    250    51   225           11475
250    300    47   275           12925
300    350    13   325           4225
350    400    9    375           3375
400    450    6    425           2550
450    500    4    475           1900
Total         600                85350
Mean = 85350/600 = 142.25
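The grouped-data formula can be sketched in Python as follows. The sketch assumes each class is given as a hypothetical `(lower, upper, frequency)` tuple and, as the formula does, uses the class midpoint as a stand-in for every value in that class. It reproduces both worked examples.

```python
def grouped_mean(classes):
    """Mean of grouped data: sum(f * x) / sum(f), with x the class midpoint.

    `classes` is a list of (lower, upper, frequency) tuples (illustrative format).
    """
    n = sum(f for _, _, f in classes)                          # total frequency
    sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)   # cross products f * x
    return sum_fx / n

first = [(0, 1, 1), (1, 2, 4), (2, 3, 8), (3, 4, 7), (4, 5, 3), (5, 6, 2)]
second = [(0, 50, 78), (50, 100, 123), (100, 150, 187), (150, 200, 82),
          (200, 250, 51), (250, 300, 47), (300, 350, 13), (350, 400, 9),
          (400, 450, 6), (450, 500, 4)]

print(round(grouped_mean(first), 2))   # 3.02
print(round(grouped_mean(second), 2))  # 142.25
```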
Weighted Mean
• A weighted mean is a kind of average. Instead of each data point contributing equally to the final mean, some data points contribute more weight than others.
• It is used to calculate an average that takes into account the importance of each value to the overall total.
• Let $X_1, X_2, \ldots, X_n$ be the data points with weights $w_1, w_2, \ldots, w_n$. Then the weighted mean is given by
$$\bar{X}_w = \frac{\sum_{i=1}^{n} w_i X_i}{\sum_{i=1}^{n} w_i}$$

• Example: find the average cost of labour per hour for each of the two products.

Grade of labour  Hourly wage  Labour hours (Product 1)  Labour hours (Product 2)
Unskilled        4            2                         4
Semiskilled      6            3                         3
Skilled          8            5                         2
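Weighted Mean (continued)
• Simple arithmetic mean = (4 + 6 + 8)/3 = 6.
  Using this, the labour cost of one unit of product 1 would be 6 × (2 + 3 + 5) = 60, and that of product 2 would be 6 × (4 + 3 + 2) = 54.
• Weighted average cost of labour per hour for product 1 = (4×2 + 6×3 + 8×5)/10 = 66/10 = 6.6
• Weighted average cost of labour per hour for product 2 = (4×4 + 6×3 + 8×2)/9 = 50/9 ≈ 5.56
• Labour cost per unit: product 1 = 6.6 × 10 = 66 and product 2 = 50. The simple AM therefore understates the labour cost of product 1 and overstates that of product 2.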
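A minimal Python sketch of the weighted-mean formula, using the wage table above with labour hours as weights. `weighted_mean` is a hypothetical helper name. Following the slide, the per-unit cost is taken as the weighted hourly rate times the total hours.

```python
def weighted_mean(values, weights):
    """sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

wages = [4, 6, 8]      # hourly wage: unskilled, semiskilled, skilled
hours_p1 = [2, 3, 5]   # labour hours per unit of product 1 (the weights)
hours_p2 = [4, 3, 2]   # labour hours per unit of product 2 (the weights)

rate_p1 = weighted_mean(wages, hours_p1)   # 66/10
rate_p2 = weighted_mean(wages, hours_p2)   # 50/9
cost_p1 = rate_p1 * sum(hours_p1)          # labour cost per unit of product 1
cost_p2 = rate_p2 * sum(hours_p2)          # labour cost per unit of product 2
print(round(rate_p1, 2), round(cost_p1, 2))
print(round(rate_p2, 2), round(cost_p2, 2))
```

Multiplying the weighted rate by the total hours gives back the exact wage bill, sum(wage × hours). The simple mean does not.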
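Geometric Mean
• Sometimes, when dealing with quantities that change over a period of time, we need to know an average rate of change. In such cases the AM is inappropriate.
• The GM of n quantities $X_1, X_2, \ldots, X_n$ is given by
$$GM = (X_1 X_2 \cdots X_n)^{1/n}$$
Example: Rs 100 is deposited in a savings account that earns the yearly interest rates below.
$GM = (7 \times 8 \times 10 \times 12 \times 18)^{1/5} = 120960^{1/5} \approx 10.388$
Year  Interest rate (%)  Amount at end of year (Rs)
1     7                  107.00
2     8                  115.56
3     10                 127.12
4     12                 142.37
5     18                 168.00
Amount after 5 years compounding at the GM rate of 10.388%: 163.91
Note that compounding at the GM of the rates themselves (163.91) does not reproduce the actual amount (168.00). The average rate that does is the GM of the growth factors (1 + r): $(1.07 \times 1.08 \times 1.10 \times 1.12 \times 1.18)^{1/5} \approx 1.1093$, i.e. about 10.93% a year.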

It cannot be computed for grouped data.
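The following sketch illustrates the interest-rate example; it is not part of the original slides. It computes the GM of the rates the way the slide does, and it also computes the GM of the growth factors (1 + r). Comparing the two against the actual final amount shows which one is the true average growth rate. `geometric_mean` is a hypothetical helper. `math.prod` requires Python 3.8 or later.

```python
import math

def geometric_mean(values):
    """n-th root of the product of n positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean needs positive values")
    return math.prod(values) ** (1 / len(values))

rates = [7, 8, 10, 12, 18]                  # yearly interest rates in %
gm_rates = geometric_mean(rates)            # GM of the rates, about 10.388
factors = [1 + r / 100 for r in rates]      # yearly growth factors 1 + r
gm_factor = geometric_mean(factors)         # GM of the factors, about 1.1093

actual = 100 * math.prod(factors)                     # actual amount, about 168.00
with_gm_of_rates = 100 * (1 + gm_rates / 100) ** 5    # about 163.91
with_gm_of_factors = 100 * gm_factor ** 5             # matches the actual amount
```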


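Harmonic Mean
• The HM of n quantities $X_1, X_2, \ldots, X_n$ is given by
$$HM = \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}}$$
• The HM of 2 and 3 is $\frac{2}{\frac{1}{2} + \frac{1}{3}} = \frac{12}{5} = 2.4$
• The HM of two quantities a and b is $\frac{2ab}{a + b}$
• HM is used when data are more dispersed.

It cannot be computed for grouped data.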
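Here is a minimal Python sketch of the HM formula and its two-value shortcut, checked on the 2 and 3 example. Both function names are illustrative.

```python
def harmonic_mean(values):
    """n divided by the sum of the reciprocals of n positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("harmonic mean needs positive values")
    return len(values) / sum(1 / v for v in values)

def harmonic_mean_two(a, b):
    """Shortcut for two quantities: 2ab / (a + b)."""
    return 2 * a * b / (a + b)

hm_general = harmonic_mean([2, 3])
hm_shortcut = harmonic_mean_two(2, 3)
```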


Disadvantages of Mean
• It may be affected by extreme values
• It is tedious to compute for large data sets
• It cannot be computed for open-ended classes
• It cannot be computed for categorical data

Relation

$AM \ge GM \ge HM$ (for positive data; equality holds only when all values are equal)
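Python's standard `statistics` module provides `fmean`, `geometric_mean` (Python 3.8+) and `harmonic_mean`. A quick check of the inequality on the interest rates from the GM example and on the values 2 and 3 could look like this.

```python
import statistics

for data in ([7, 8, 10, 12, 18], [2, 3]):
    am = statistics.fmean(data)
    gm = statistics.geometric_mean(data)
    hm = statistics.harmonic_mean(data)
    assert am >= gm >= hm          # AM >= GM >= HM for positive data
    print(data, round(am, 4), round(gm, 4), round(hm, 4))
```

For 2 and 3 this gives AM = 2.5, GM = √6 ≈ 2.4495 and HM = 2.4.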
Measure of Central Tendency: The Median
• In an ordered array, the median is the middle number, i.e., 50% of the data lie above it and 50% below it.
• Data 1: 1, 2, 3, 4, 5  Median = 3
• Data 2: 1, 2, 3, 4, 20  Median = 3
• It is not affected by outliers.

Arrange the numbers in ascending order; the middle entry is the median.
• If the number of values n is odd, the median is the $\frac{n+1}{2}$-th value.
• If n is even, the median is the average of the $\frac{n}{2}$-th and $\left(\frac{n}{2} + 1\right)$-th values.
Note that these expressions give the positions of the median, not the median itself.
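The position rules can be sketched in Python as follows. The code converts the 1-based positions above into 0-based list indices. `median` is an illustrative name. The standard library's `statistics.median` follows the same rule.

```python
def median(values):
    ordered = sorted(values)          # arrange in ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2                      # 0-based index of the middle (or upper-middle) value
    if n % 2 == 1:
        return ordered[mid]           # the ((n + 1)/2)-th value
    # average of the (n/2)-th and (n/2 + 1)-th values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([1, 2, 3, 4, 5]))    # 3
print(median([1, 2, 3, 4, 20]))   # 3
```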
Median for grouped data
The median for grouped data is given by
$$\text{Median} = L + \frac{\frac{n}{2} - m}{f} \times c$$
• L = lower limit of the median class
• n = total number of observations = $\sum f$
• m = cumulative frequency preceding the median class
• f = frequency of the median class
• c = class interval (width) of the median class
Median for grouped data (example)
• Find the median for the following continuous frequency distribution.
Class    Frequency  CF
0-100    5          5
100-200  8          13
200-300  4          17
300-400  6          23   (median class)
400-500  2          25
500-600  11         36
600-700  5          41
Total    41
Median position: 21st observation, which falls in the 300-400 class (CF rises from 17 to 23).
• Median = 300 + (41/2 - 17)/6 × 100 = 300 + (3.5/6) × 100 ≈ 358.33
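A minimal sketch of the grouped-median formula. Classes are passed as hypothetical `(lower, upper, frequency)` tuples in ascending order, and the median class is taken as the first class whose cumulative frequency reaches n/2, as in the formula.

```python
def grouped_median(classes):
    """L + ((n/2 - m) / f) * c for ordered (lower, upper, frequency) classes."""
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("empty distribution")
    half = n / 2
    cumulative = 0                       # m: cumulative frequency before the class
    for lower, upper, f in classes:
        if cumulative + f >= half:       # first class whose CF reaches n/2
            return lower + (half - cumulative) / f * (upper - lower)
        cumulative += f

classes = [(0, 100, 5), (100, 200, 8), (200, 300, 4), (300, 400, 6),
           (400, 500, 2), (500, 600, 11), (600, 700, 5)]
print(round(grouped_median(classes), 2))  # 358.33
```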
Median
Advantages
• Not affected by extreme values
• Can be computed for open-ended classes
• Can be computed for ordinal (ranked) categorical data
Disadvantages
• Arranging the data in order is time-consuming
• For estimating a population parameter, the mean is easier to work with
Measure of Central Tendency: The Mode
• The mode is the value that occurs most often
• Not affected by extreme values
• Can be computed for categorical or numerical data
• There may be no mode
• There may be several modes
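A small sketch of the mode that covers the cases in the list: one mode, several modes, categorical data and no mode. The "no mode" rule used here is an assumption for illustration: when every distinct value occurs equally often, no value stands out, so the function returns an empty list. `modes` is a hypothetical name. By contrast, the standard `statistics.multimode` would return all of the tied values in that case.

```python
from collections import Counter

def modes(values):
    """All most-frequent values; [] if every distinct value ties (treated as no mode)."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    winners = [v for v, c in counts.items() if c == top]
    # Assumed convention: if all distinct values tie, there is no mode.
    if len(winners) == len(counts) and len(counts) > 1:
        return []
    return winners

print(modes([1, 2, 2, 3]))             # [2]
print(modes([1, 1, 2, 2, 3]))          # [1, 2]  (several modes)
print(modes(["red", "blue", "red"]))   # ['red'] (categorical data)
print(modes([1, 2, 3]))                # []      (no mode)
```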
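Mode for grouped data
$$\text{Mode} = L + \frac{f_1 - f_0}{2 f_1 - f_0 - f_2} \times c$$
where L = lower limit of the modal class
f_1 = frequency of the modal class
f_0 = frequency of the class preceding the modal class
f_2 = frequency of the class succeeding the modal class
c = class interval (width) of the modal class
Class    Frequency
0-100    5
100-200  8
200-300  4
300-400  6
400-500  2
500-600  11   (modal class)
600-700  5
Mode = 500 + (11 - 2)/(2 × 11 - 2 - 5) × 100 = 500 + (9/15) × 100 = 560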
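The grouped-mode formula can be sketched in Python as follows. Classes are passed as hypothetical `(lower, upper, frequency)` tuples. The modal class is the first class with the highest frequency. It is an assumption of this sketch, not of the slides, that a missing neighbouring class (when the modal class is first or last) counts as frequency 0.

```python
def grouped_mode(classes):
    """L + (f1 - f0) / (2*f1 - f0 - f2) * c for ordered (lower, upper, frequency) classes."""
    freqs = [f for _, _, f in classes]
    k = freqs.index(max(freqs))                       # index of the modal class
    lower, upper, f1 = classes[k]
    f0 = freqs[k - 1] if k > 0 else 0                 # preceding class (assumed 0 if none)
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0    # succeeding class (assumed 0 if none)
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * (upper - lower)

classes = [(0, 100, 5), (100, 200, 8), (200, 300, 4), (300, 400, 6),
           (400, 500, 2), (500, 600, 11), (600, 700, 5)]
mode_value = grouped_mode(classes)   # 500 + 9/15 * 100
```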
Measure of Central Tendency: Which measure to use?
• The mean is generally used unless extreme values exist.
• The median is used when there are outliers in the data.
• In some situations it makes sense to use both the mean and the median.
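Interfractile Range
An interfractile range is the difference between the values of two fractiles of a data set.

Examples of fractiles: quartiles, deciles, percentiles

• Three quartiles, Q1, Q2 (the median) and Q3, divide an ordered data set into four equal parts.
• Interquartile range (IQR) = Q3 - Q1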
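As an illustration, the sketch below applies Python's `statistics.quantiles` to the data from the variance example that follows. Passing `n=4` returns the three quartile cut points. Keep in mind that textbooks and software use several quartile conventions. This sketch relies on the function's default `method="exclusive"`, and `method="inclusive"` or a hand rule may give slightly different quartiles.

```python
import statistics

data = [10, 12, 14, 15, 17, 18, 18, 24]
q1, q2, q3 = statistics.quantiles(data, n=4)   # default method="exclusive"
iqr = q3 - q1                                  # interquartile range
print(q1, q2, q3, iqr)
```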
Measure of Variation: The Variance
Measure of Variation: The Standard Deviation
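Example
Data: 10, 12, 14, 15, 17, 18, 18, 24
Mean: $\bar{X} = 128/8 = 16$
Sample variance: $S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{130}{7} \approx 18.57$, so $S \approx 4.31$
For grouped data, the sample variance is
$$S^2 = \frac{\sum f (X - \bar{X})^2}{n - 1}$$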
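A short Python sketch for the ungrouped data above. It computes the sample variance by hand with the n - 1 divisor and cross-checks the result with the standard library's `statistics.variance` and `statistics.stdev`, which also use n - 1.

```python
import math
import statistics

data = [10, 12, 14, 15, 17, 18, 18, 24]
n = len(data)
xbar = sum(data) / n                          # mean, 16
ss = sum((x - xbar) ** 2 for x in data)       # sum of squared deviations, 130
s2 = ss / (n - 1)                             # sample variance
s = math.sqrt(s2)                             # sample standard deviation

# cross-check with the standard library (also divides by n - 1)
assert abs(s2 - statistics.variance(data)) < 1e-9
assert abs(s - statistics.stdev(data)) < 1e-9
```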

Example: returns of funds (grouped data)
Lower limit  Upper limit  Midpoint (X)  f    fX    (X - X̄)   f(X - X̄)²
5            10           7.5           10   75    -9.8333   966.9444
10           15           12.5          12   150   -4.8333   280.3333
15           20           17.5          16   280   0.1667    0.4444
20           25           22.5          14   315   5.1667    373.7222
25           30           27.5          8    220   10.1667   826.8889
Total                                   60   1040            2448.3333
Mean = 1040/60 = 17.3333
Variance = 2448.3333/59 = 41.4972
SD = √41.4972 = 6.4418
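A minimal sketch of the grouped sample-variance formula that reproduces the returns-of-funds table. As elsewhere, classes are hypothetical `(lower, upper, frequency)` tuples and the class midpoint stands in for the values in each class.

```python
import math

def grouped_variance(classes):
    """Return (mean, sample variance) of grouped data: sum f (x - mean)^2 / (n - 1)."""
    n = sum(f for _, _, f in classes)
    mids = [((lo + hi) / 2, f) for lo, hi, f in classes]   # (midpoint, frequency)
    mean = sum(f * x for x, f in mids) / n
    ss = sum(f * (x - mean) ** 2 for x, f in mids)
    return mean, ss / (n - 1)

returns = [(5, 10, 10), (10, 15, 12), (15, 20, 16), (20, 25, 14), (25, 30, 8)]
mean, var = grouped_variance(returns)
sd = math.sqrt(var)
print(round(mean, 4), round(var, 4), round(sd, 4))
```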
Measure of Variation: Comparing Standard Deviations
Formulae (Pearson's coefficients of skewness): (Mean - Mode)/SD and 3(Mean - Median)/SD
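For illustration, the sketch below evaluates both Pearson coefficients on the ungrouped data from the variance example. It uses the median-based coefficient in its usual textbook form, which includes the factor 3. Pairing these formulas with this data set is our choice, not the slides'.

```python
import statistics

data = [10, 12, 14, 15, 17, 18, 18, 24]
mean = statistics.mean(data)       # 16
median = statistics.median(data)   # 16.0
mode = statistics.mode(data)       # 18
sd = statistics.stdev(data)        # sample SD, about 4.31

sk_mode = (mean - mode) / sd            # about -0.46: mode above mean, so negative
sk_median = 3 * (mean - median) / sd    # 0.0 here because mean equals median
```

The two coefficients can disagree for small data sets like this one, since the mode of 18 rests on a single repeated value.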
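Locating Extreme Outliers: Z-score
• $Z = \frac{X - \bar{X}}{S}$; a value whose Z-score is below -3 or above +3 is commonly treated as an extreme outlier.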
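A minimal Z-score sketch, assuming the common cut-off of |Z| > 3 (a rule of thumb, passed here as a parameter). `z_scores` and `extreme_outliers` are hypothetical names. One caveat: in a sample of n values, |Z| cannot exceed (n - 1)/√n. So the 8-value example data can never flag an outlier at this cut-off. The second data set is made up purely to show a flagged value.

```python
import statistics

def z_scores(values):
    mean = statistics.mean(values)
    s = statistics.stdev(values)              # sample standard deviation
    return [(x - mean) / s for x in values]

def extreme_outliers(values, threshold=3.0):
    """Values whose |Z| exceeds the (assumed) threshold."""
    return [x for x, z in zip(values, z_scores(values)) if abs(z) > threshold]

data = [10, 12, 14, 15, 17, 18, 18, 24]
hypothetical = [10] * 12 + [100]              # made-up data with one extreme value

print(round(z_scores(data)[-1], 3))           # Z-score of 24
print(extreme_outliers(data))                 # []
print(extreme_outliers(hypothetical))         # [100]
```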
Relation between two numerical variables
1. Covariance
2. Coefficient of correlation
Features of the coefficient of correlation
Interpreting the coefficient
Test Score #1 (X)  Test Score #2 (Y)
78                 82
92                 88
86                 91
95                 90
83                 92
85                 95
91                 89
86                 81
89                 96
88                 92

[Scatter plot: Test Score #1 (X) on the horizontal axis vs Test Score #2 (Y) on the vertical axis]

Correlation matrix
               Test Score #1  Test Score #2
Test Score #1  1
Test Score #2  0.293643524    1

r ≈ 0.294: there is a weak positive linear relationship between Test Scores 1 and 2. Students who scored high on the first test tended, to some extent, to score high on the second test.
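The correlation matrix entry can be reproduced with a short hand-coded sketch of the sample covariance and the coefficient of correlation, r = cov(X, Y)/(S_X S_Y). The function names are illustrative. Python 3.10+ also offers `statistics.covariance` and `statistics.correlation`.

```python
import math

def covariance(x, y):
    """Sample covariance: sum (x - x_bar)(y - y_bar) / (n - 1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

def correlation(x, y):
    """r = cov(x, y) / (s_x * s_y)."""
    return covariance(x, y) / (math.sqrt(covariance(x, x)) * math.sqrt(covariance(y, y)))

score1 = [78, 92, 86, 95, 83, 85, 91, 86, 89, 88]
score2 = [82, 88, 91, 90, 92, 95, 89, 81, 96, 92]
cov = covariance(score1, score2)    # about 7.02
r = correlation(score1, score2)     # about 0.2936
```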
