
Unit 3

This document discusses measures of central tendency, including mean, median, and mode. It provides definitions and formulas for calculating each. It also discusses related statistical concepts such as variance, standard deviation, weighted mean, and covariance. The key points are:
- Mean, median, and mode are the three main measures used to describe the central tendency, or typical value, of a data set.
- Mean is the average value, calculated by adding all values and dividing by the number of values. Median is the middle value when the values are sorted in order. Mode is the most frequent value.
- Other concepts discussed include variance (a measure of how spread out values are from the mean), standard deviation (the square root of variance), weighted mean, and covariance.

Uploaded by NamanJain
Copyright © All Rights Reserved

Business Statistics

Chapter 3
Central Tendency
• Any set of data follows a distribution, for example the ages of
students in a class.
• However, every such data set also has a Central Tendency: a typical
value around which the data are centred.
• For example, the ages of students in a Class 12 section may vary
between 18 and 20, but the central point may be 19.
• Mean, Median and Mode are the three measures of central
tendency.
• Mean is the arithmetic average of a data set. It is found by
adding the numbers in the data set and dividing by the number of
observations in the data set.
Central Tendency

• The Median is the middle number in a data set, determined by
sorting the numbers in ascending or descending order.
• The Mode is the value that occurs most often in a data set.
• The Range is the difference between the highest and lowest
values in a data set.
Mean
• The Mean is also known as the average of a data set.
• In other words, it is the sum of the numbers divided by the
number of observations: µ = ΣX/N, where X = the individual
values and N = the number of observations.
• If we consider the numbers 13, 18, 13, 14, 13, 16, 14, 21, 13,
the Mean = (13+18+13+14+13+16+14+21+13)/9 = 135/9 = 15
• Thus, the set of observations is centred on the number 15,
which is the mean.
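A minimal Python sketch of µ = ΣX/N applied to the nine observations above; the function name `arithmetic_mean` is ours, chosen for illustration.

```python
def arithmetic_mean(values):
    """Return the sum of the values divided by their count (µ = ΣX/N)."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)

data = [13, 18, 13, 14, 13, 16, 14, 21, 13]
print(arithmetic_mean(data))  # 15.0
```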
Median

• Let us sort the numbers shown on the previous slide in
ascending order:
• 13, 13, 13, 13, 14, 14, 16, 18, 21
• For n = 9 observations, the middle point is the (n + 1)/2 = 5th
observation.
• The Median is thus 14.
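The sorting-and-middle-position steps can be sketched in Python as below. The helper name `median` is hypothetical, and the even-count rule (average the two middle values) is the usual convention, which the slide does not cover because its example has an odd count.

```python
def median(values):
    """Middle value of the sorted data; for an even count, average the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # n = 9 gives index 4, i.e. the 5th observation
    return (ordered[mid - 1] + ordered[mid]) / 2

data = [13, 18, 13, 14, 13, 16, 14, 21, 13]
print(sorted(data))  # [13, 13, 13, 13, 14, 14, 16, 18, 21]
print(median(data))  # 14
```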


Percentiles
• The Median is the central point of the distribution, while the
quartiles divide it into quarters: the 1st and 3rd quartiles mark
the points below which 25% and 75% of the data lie.
• For example, a company may decide to pay its employees a salary
equal to the 3rd quartile of industry pay in order to attract the
best talent.
• The quartiles are the first quartile, or 25th percentile (Q1); the
second quartile, or 50th percentile (Q2, the same as the Median);
and the third quartile, or 75th percentile (Q3).
• The position i of the pth percentile in the sorted data is found
with i = (p/100)*n, where p is the percentile and n is the sample
size. If i is not a whole number, round it up to the next position.
• For the 75th percentile of the above data set:
i = (75/100)×9 = 6.75, so we take the 7th position, whose value is 16.
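A sketch of the position rule i = (p/100)*n in Python, assuming sorted data and 1-based positions. The function name is hypothetical, and the handling of a whole-number i (average the ith and (i+1)th values) is a common textbook convention assumed here, not stated on the slide. Other conventions exist and can give slightly different answers.

```python
import math

def percentile_by_position(values, p):
    """Return the p-th percentile using the index rule i = (p/100) * n.

    Assumed convention: if i is not a whole number, round up to the next
    position; if it is whole, average the i-th and (i+1)-th values.
    """
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)
    n = len(ordered)
    i = p * n / 100  # same as (p/100) * n, but avoids rounding error in p/100
    if i.is_integer():
        k = int(i)  # 1-based position
        return (ordered[k - 1] + ordered[k]) / 2
    return ordered[math.ceil(i) - 1]  # 1-based position -> 0-based index

data = [13, 18, 13, 14, 13, 16, 14, 21, 13]
print(percentile_by_position(data, 75))  # 16
```

With p = 50 the same function returns 14, matching the Median.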
Median where the data is distributed into classes
In the case of a frequency distribution, the median is calculated
as Median = l + h * ((n/2 - cf)/f)
Here
l = lower limit of the median class
h = width (length) of the class interval
n = total frequency of the observed data
cf = cumulative frequency of the class preceding the median class
f = frequency of the median class
The formula locates the median inside the median class by linear
interpolation, assuming the observations are spread evenly across that class.
This is best understood with an example.
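Median where the data is distributed into classes

| No. | Months of holding | No. of shareholders (f) | cf |
|---|---|---|---|
| 1 | Less than 2 | 4 | 4 |
| 2 | 2 to 4 | 7 | 11 |
| 3 | 4 to 6 | 10 | 21 |
| 4 | 6 to 8 | 12 | 33 |
| 5 | 8 to 10 | 14 | 47 |
| 6 | 10 to 12 | 6 | 53 |
| 7 | 12 to 14 | 15 | 68 |
| 8 | 14 to 16 | 13 | 81 |
| 9 | 16 to 18 | 1 | 82 |
| Total | | 82 | |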

Each line item belongs to a class. The median class is the first class whose
cumulative frequency reaches n/2 = 82/2 = 41; this is class 5 (8 to 10 months, cf = 47).
Lower limit l = 8; h = 2; n = 82; cf = 33; f = 14
Therefore, using the formula, the Median
= 8 + 2*((41-33)/14) = 8 + 16/14 ≈ 9.14 months
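A minimal Python sketch of the grouped-data median formula, run on the shareholding table. The function name is hypothetical, the classes are assumed to have equal width h, and "Less than 2" is assumed to start at 0 months (this does not affect the result, because the median falls in class 5).

```python
def grouped_median(lower_limits, width, freqs):
    """Median = l + h * ((n/2 - cf) / f) for classes of equal width h.

    lower_limits: lower limit of each class, in ascending order
    width: the class width h
    freqs: frequency f of each class
    """
    n = sum(freqs)
    half = n / 2
    cf = 0  # cumulative frequency of the classes before the current one
    for l, f in zip(lower_limits, freqs):
        if cf + f >= half:  # first class whose cumulative frequency reaches n/2
            return l + width * (half - cf) / f
        cf += f
    raise ValueError("no frequencies given")

lower = [0, 2, 4, 6, 8, 10, 12, 14, 16]  # assumption: "Less than 2" means 0 to 2
freq = [4, 7, 10, 12, 14, 6, 15, 13, 1]
print(round(grouped_median(lower, 2, freq), 2))  # 9.14
```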
Mode
• The number 13 occurs 4 times, and since this is the maximum, the
mode is 13.
• Mode is the most frequently occurring number in a set.
• It is also used to depict the central tendency of the distribution.
• At times the Mode is not located centrally (for example, in a
strongly skewed distribution), and hence may not represent the
central point. This is, however, not very common.
• In the previous example, the Mode (13) is close to both the Median (14)
and the Mean (15), and thus represents the Central Tendency to a large extent.
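A short Python sketch of finding the mode by counting occurrences with `collections.Counter`. The helper name `modes` is hypothetical. It returns a list because a data set can have more than one most-frequent value.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    if not counts:
        raise ValueError("the mode of an empty data set is undefined")
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]

data = [13, 18, 13, 14, 13, 16, 14, 21, 13]
print(modes(data))  # [13]
```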
Weighted Mean
• The weighted mean is a type of mean calculated by multiplying
each quantitative outcome by the weight (or probability) associated
with it, summing all the products, and dividing the sum by the total
of the weights. When the weights are probabilities that add up to 1,
this division leaves the sum unchanged.
• It is very useful when calculating a theoretically expected
outcome where each outcome has a different probability of
occurring, which is the key feature that distinguishes the
weighted mean from the arithmetic mean.
• For example, if one item sells 100 units priced at 40/-, another
sells 82 units priced at 30/-, and a third sells 91 units priced at
25/-, the weighted average price =
(100*40+82*30+91*25)/(100+82+91) = 8735/273 ≈ 32.00
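The price example can be checked with a small Python sketch of Σ(w·x)/Σw, where the units sold serve as the weights. The function name `weighted_mean` is ours.

```python
def weighted_mean(values, weights):
    """Σ(w·x) / Σw; with probabilities summing to 1 as weights, this is the expected value."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

prices = [40, 30, 25]
units_sold = [100, 82, 91]  # the units sold act as the weights
print(round(weighted_mean(prices, units_sold), 2))  # 32.0
```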
Use of Weighted Mean

• Where an item occurs multiple times, the number of occurrences is
taken as the weight, and each value is multiplied by its weight.
• In a stock market scenario, we calculate the expected value of a
stock by multiplying each price the stock may attain (x, y and so on)
by the probability of attaining that price, and summing the products.
• The probabilities may be estimated by dividing the number of days in
a year the stock traded at each price by the total number of days
observed (365 for a full calendar year).
• The expected value then gives the investor an indication of the
likely price, and hence of the returns.
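A sketch of the day-count approach in Python. The prices and day counts below are hypothetical, invented only to illustrate the calculation. Each probability is the number of days at a price divided by the total days, so the expected value is a weighted mean with the day counts as weights.

```python
def expected_price(days_at_price):
    """Σ price × P(price), with P(price) = days at that price / total days observed."""
    total_days = sum(days_at_price.values())  # 365 in the hypothetical data below
    # Σ price × (days / total) equals (Σ price × days) / total; dividing once
    # at the end keeps the arithmetic exact for whole-number inputs.
    return sum(price * days for price, days in days_at_price.items()) / total_days

# Hypothetical history: price -> number of days the stock traded at that price.
history = {100: 100, 110: 165, 120: 100}
print(expected_price(history))  # 110.0
```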
Variance of a Data-Set
• We now also need to study the variation of the data in a data set.
• Range is the simplest measure: the difference between the largest
and smallest values. For example, for the nine observations used for
the Mean, Range = 21 - 13 = 8.
• Variance is a measure of the variability, or spread, of the data
around the Mean.
• It measures how far each number in the set is from the mean, and
therefore how far the numbers are from one another.
• It is often denoted by the symbol σ².
• The square root of the variance is the standard deviation (σ). It is
generally preferred because it is in the same unit of measurement as
the data, and it helps determine how consistent an investment's
returns have been over a period of time.
• Variance is used by both analysts and traders to gauge the
volatility, and hence the risk, of stocks.
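The Range and Variance bullets can be worked through by hand in Python for the nine observations used for the Mean. The bullets do not fix the divisor, so this sketch shows both the population version (divide by N) and the sample version (divide by n - 1), which is the one the worked example later in this unit uses.

```python
import math

data = [13, 18, 13, 14, 13, 16, 14, 21, 13]

data_range = max(data) - min(data)                      # 21 - 13
mean = sum(data) / len(data)                            # 15.0
squared_deviations = [(x - mean) ** 2 for x in data]    # distance from the mean, squared
population_variance = sum(squared_deviations) / len(data)        # divide by N
sample_variance = sum(squared_deviations) / (len(data) - 1)      # divide by n - 1
std_dev = math.sqrt(sample_variance)                    # back in the data's own unit

print(data_range)                     # 8
print(round(population_variance, 2))  # 7.11
print(sample_variance)                # 8.0
print(round(std_dev, 2))              # 2.83
```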
The significance of Standard Deviation
• Standard Deviation (SD) is a numerical measure of the variation of
individual values from the mean. A low standard deviation
indicates that the values tend to be close to the mean (also
called the expected value), while a high standard deviation
indicates that the values are spread out over a wider range.
• Variance is the square of the SD.
• Because the SD is expressed in the same unit as the data, it is
easier to interpret and is usually the better measure.
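To illustrate the low-versus-high contrast, here is a Python sketch with two hypothetical data sets that share a mean of 10 but differ in spread. It uses the standard library's `statistics.stdev`, which computes the sample standard deviation.

```python
import statistics

# Two hypothetical data sets with the same mean (10) but different spread.
tight = [9, 10, 11]
wide = [0, 10, 20]

for name, values in (("tight", tight), ("wide", wide)):
    print(name, statistics.mean(values), statistics.stdev(values))
```

Both means are 10, but the standard deviation is 1 for the tight set and 10 for the wide set.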
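The formulae for Variance and Standard Deviation

$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{x})^2}{n - 1}, \qquad s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{x})^2}{n - 1}}$$

where X = each individual element in the distribution, x̄ is the mean,
and n = number of observations. For a whole population, the sum is
divided by N instead of n - 1, giving σ² and σ.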
An example of Standard Deviation
• A person, Vivek, posts on Facebook every day. The following are the
numbers of likes for his posts in a particular week.
• Elements are 1, 2, 3, 3, 6, 10
• Mean = (1+2+3+3+6+10)/6 = 25/6 ≈ 4.17
• Numerator (sum of squared deviations) = (1-4.17)² + (2-4.17)² + (3-4.17)² +
(3-4.17)² + (6-4.17)² + (10-4.17)² ≈ 54.83
• Denominator = n - 1 = 6 - 1 = 5
• Variance = 54.83/5 ≈ 10.97
• Standard Deviation = √10.97 ≈ 3.31
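The worked example can be cross-checked with Python's standard `statistics` module, whose `variance` and `stdev` functions use the same n - 1 denominator as the example.

```python
import statistics

likes = [1, 2, 3, 3, 6, 10]
mean = statistics.mean(likes)          # 25/6
variance = statistics.variance(likes)  # sample variance: divides by n - 1
sd = statistics.stdev(likes)           # square root of the sample variance

print(round(mean, 2), round(variance, 2), round(sd, 2))  # 4.17 10.97 3.31
```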
Covariance Matrix
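• Covariance is a measure of how much two random variables
change together.
• A Covariance Matrix holds the covariance between every pair of
columns of a data matrix; its diagonal holds the variance of each column.
• The Covariance Matrix is also known as the dispersion matrix or
variance-covariance matrix.
• The covariance between two jointly distributed real-valued
random variables X and Y is defined as:
Cov(X, Y) = E[(X - E[X])(Y - E[Y])], the expected product of their
deviations from their means. For sample data, it is estimated from the
corresponding scores in the two sets of data.
• Covariance shows the direction of the relationship (positive when the
variables move together, negative when they move in opposite directions),
but because its size depends on the units of the data, it does not
measure the strength of the relationship.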
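A pure-Python sketch of building a covariance matrix: entry [i][j] is the sample covariance (divisor n - 1) of column i with column j, so the diagonal holds the variances. The function names are hypothetical, and the two columns reuse the economic-growth and NIFTY figures from the example later in this unit.

```python
def sample_covariance(xs, ys):
    """Σ(x - x̄)(y - ȳ) / (n - 1)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long columns with at least two values")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

def covariance_matrix(columns):
    """Entry [i][j] is the covariance of column i with column j."""
    return [[sample_covariance(a, b) for b in columns] for a in columns]

growth = [2.1, 2.5, 4.0, 3.6]
nifty = [8, 12, 14, 10]
for row in covariance_matrix([growth, nifty]):
    print([round(v, 4) for v in row])
```

The matrix is symmetric, since Cov(X, Y) = Cov(Y, X).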
Formula for Covariance (each entry of the Covariance Matrix)

$$\text{Cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$$

Where:
n = Number of scores in each set of data
X̄ = Mean of the n scores in the first data set
Xi = ith raw score in the first set of scores
Ȳ = Mean of the n scores in the second data set
Yi = ith raw score in the second set of scores
Covariance between Economic growth and stock market indices

| Economic Growth (%) | NIFTY |
|---|---|
| 2.1 | 8 |
| 2.5 | 12 |
| 4.0 | 14 |
| 3.6 | 10 |

x = 2.1, 2.5, 4.0, and 3.6 (economic growth)
y = 8, 12, 14, and 10 (NIFTY)
x̄ = 3.05 (rounded to 3.1 in the table below) and ȳ = 11
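Covariance between Economic growth and stock market indices
We calculate the relevant values in the table below (using x̄ ≈ 3.1).

| xi | yi | xi – x̄ | yi – ȳ | (xi – x̄)(yi – ȳ) |
|---|---|---|---|---|
| 2.1 | 8 | -1.0 | -3 | 3.0 |
| 2.5 | 12 | -0.6 | 1 | -0.6 |
| 4.0 | 14 | 0.9 | 3 | 2.7 |
| 3.6 | 10 | 0.5 | -1 | -0.5 |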

Now summing the products of the two deviation columns (3.0 - 0.6 + 2.7 - 0.5 = 4.6)
and dividing by n - 1, which is equal to 3, we arrive at a covariance of about 1.53.
The covariance is positive, i.e., the two variables tend to move in the same direction.
Rounding x̄ to 3.1 does not change this sum, because the deviations yi – ȳ add up to zero.
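The worked covariance example can be reproduced row by row in Python. This sketch uses the exact mean x̄ = 3.05, so the individual xi – x̄ values differ slightly from the rounded table, but the sum of products is still 4.6.

```python
growth = [2.1, 2.5, 4.0, 3.6]  # x
nifty = [8, 12, 14, 10]        # y
n = len(growth)

x_bar = sum(growth) / n  # 3.05
y_bar = sum(nifty) / n   # 11.0

products = []
for x, y in zip(growth, nifty):
    dx, dy = x - x_bar, y - y_bar
    products.append(dx * dy)
    print(x, y, round(dx, 2), round(dy, 2), round(dx * dy, 4))

covariance = sum(products) / (n - 1)  # divide by n - 1 = 3
print(round(covariance, 4))  # 1.5333
```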
Coefficient of Correlation
The coefficient of correlation provides a measure of the relative strength of the
linear relationship between two numerical variables.
The sample coefficient of correlation is represented by the symbol 'r', which
ranges from -1 for a perfectly negative correlation to +1 for a perfectly positive
correlation.
It is calculated as

$$r = \frac{\text{Cov}(X, Y)}{S_X S_Y}$$

that is, the covariance of the two variables divided by the product of their
standard deviations S_X and S_Y.
In the previous example, the sample standard deviations of the 2 variables (Economic
growth and Stock Market indices) are about 0.90 and 2.58 respectively.
Hence, r = 1.53/(0.90*2.58) ≈ 0.66, which indicates a moderately strong positive
correlation.
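A minimal Python sketch of r = Cov(X, Y)/(S_X S_Y) using n - 1 in every denominator, applied to the economic-growth and NIFTY data. The function name `pearson_r` is ours. The divisors cancel, so using n throughout would give the same r.

```python
import math

def pearson_r(xs, ys):
    """r = Cov(X, Y) / (S_X * S_Y), with n - 1 in every denominator."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long columns with at least two values")
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return cov / (s_x * s_y)

growth = [2.1, 2.5, 4.0, 3.6]
nifty = [8, 12, 14, 10]
print(round(pearson_r(growth, nifty), 2))  # 0.66
```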
