Basics of Stats
Session 1
Introduction to statistics – What & Why
Scope of statistics and its applications in
marketing & managerial decision making
Demand analysis
Customer satisfaction/preference analysis
Sales and advertising analysis
Relationship between variables
Trend forecasting
Books….
Statistics for Management by Richard I. Levin & David S. Rubin,
Pearson, 7th Edition
Business Statistics by S. P. Gupta
Definition
Statistics is a standard method for
collecting, organizing, summarizing,
presenting, and analyzing data for drawing
conclusions and making decisions based
upon the analyses of these data.
Statistics are used extensively by
engineers, managers, governments, businesspeople,
etc. throughout the world.
Functions of Statistics
Presents facts in a definite form
Simplifies mass of figures
Facilitates comparison
Helps in formulating and testing
hypothesis
Helps in prediction
Helps in the formulation of suitable
policies.
Is a single statistical figure
conclusive?
Descriptive Statistics
Descriptive Statistics include the techniques
that are used to summarize and describe
numerical data. These methods can either
be graphical or involve computational
analysis.
Inferential Statistics
Inferential statistics include those techniques by
which decisions about a statistical population
are made based on an observed sample or,
possibly, on managerial judgment. Because such
statements are made under conditions of
uncertainty, the use of probability concepts
is required.
Discrete Variable
A discrete variable can have observed values only at
isolated points along a scale of values. In business
statistics, such data typically arise through the
process of counting; hence the values generally
are expressed as integers (whole numbers only).
Continuous Variable
A continuous variable can assume a value at any
fractional point along a specified interval of
values. Continuous data are generated by the
process of measuring.
Working on data….
Data Collection
Primary
Secondary
Data classification
Frequency distribution
Presentation of data
Populations and Samples
A population is the complete set of all
possible instances of a particular object,
for example, the students in this college.
A sample is a subset of the population,
for example, any one of the classes.
We use samples to draw conclusions
about the parent population.
Measures of Central Tendency
If you have to declare a single value to
represent a population or a sample, what do
you use?
The most commonly used value is the mean, also
called the average or the expected value.
Another common choice is the mode, the
most frequently occurring (most likely) value.
A third choice is the median, the “middle”
of the data set.
Measures of Central Tendency
Mean
This is the mathematical average of a set of numbers
Median
This is the middle value of a set of data that has been arranged
from lowest to highest
Mode
The value that occurs the most in a set of data
We can use expenditure as a good way of discussing these
three measures. Suppose we wanted to know the average
expenditure of NIFT students, and imagine that we took a
random sample of the monthly expenditure of NIFT students.
What is the Mean?
The mean is the sum of all of the
values in the data set divided by the
number of values.
The equation for calculating the mean is
the same for both samples and
populations.
$$\bar{x} = \frac{\sum x}{n}$$
Sample Mean
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Where:
$\bar{x}$ (x-bar) is the sample mean
$x_i$ are the data points
n is the sample size
Population Mean
$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$
Where:
μ is the population mean
$x_i$ are the data points
N is the total number of observations in the
population
Measures of Central Tendency
(ungrouped)
The sample gives these values:
5000, 6000, 30000, 110000, 15000,
6000, 17000, 13000, 12000, 11000,
8000, 6000, 15000, 6000, 11500
The Mean
This is the average.
Sum of values = 271500
Total N = 15
Mean = 271500 / 15 = 18100
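The calculation above can be reproduced in a few lines of Python. This is only a sketch: the variable names are illustrative, and the data are the fifteen expenditure values listed on the slide.

```python
from statistics import mean

# Monthly expenditure sample from the slide (15 NIFT students)
expenditure = [5000, 6000, 30000, 110000, 15000,
               6000, 17000, 13000, 12000, 11000,
               8000, 6000, 15000, 6000, 11500]

total = sum(expenditure)     # sum of all values
n = len(expenditure)         # number of values
sample_mean = total / n      # definition: sum divided by count

print(total, n, sample_mean)  # 271500 15 18100.0

# The standard library gives the same answer
assert sample_mean == mean(expenditure)
```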
Advantages /Disadvantages of the
Arithmetic Mean
Advantages:
1) Familiar and intuitively clear to most people
2) Every data set has one and only one mean
3) Useful for performing statistical procedures
Disadvantages:
1) May be affected by extreme values
2) Tedious to compute
3) Difficult to compute for a data set with open-ended classes
Solve it….
The average weekly wage for a group of 25
persons working in a factory was
calculated to be Rs. 378.40. It was later
discovered that one figure was misread
as Rs. 160 instead of the correct value, Rs.
200. Calculate the correct average wage.
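One possible way to work this exercise is to recover the reported total from the mean, swap the misread figure for the correct one, and divide again. The sketch below follows that reasoning; the variable names are illustrative.

```python
n = 25                    # number of workers
reported_mean = 378.40    # mean computed with the misread figure
wrong_value = 160         # figure as it was misread
correct_value = 200       # figure as it should have been

# Mean = total / n, so the reported total is mean * n
reported_total = reported_mean * n

# Remove the wrong figure and add the correct one
corrected_total = reported_total - wrong_value + correct_value
corrected_mean = corrected_total / n

print(round(corrected_mean, 2))  # 380.0
```

The correction adds Rs. 40 to the total, which raises the mean by 40 / 25 = Rs. 1.60.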
Mean of two or more groups
What is the Median?
If the data has been sorted (ascending or
descending), the median is the middle value (for
an odd number of points) or the average of the
two middle values (for an even number of points).
The median is used to characterize data sets with a
few extreme values that distort the relevance of
the mean, such as house values or family
incomes.
$$\text{Median} = \left(\frac{n+1}{2}\right)\text{th item in the sorted data array}$$
Measures of Central Tendency
(ungrouped)
The Median
The sample gives these values:
5000, 6000, 30000, 110000, 15000,
6000, 17000, 13000, 12000, 11000,
8000, 6000, 15000, 6000, 11500
This is the middle value of the sorted data:
5000, 6000, 6000, 6000, 6000, 8000,
11000, 11500, 12000, 13000, 15000, 15000,
17000, 30000, 110000
The median here is 11500.
In cases where there are two middle values,
we average the two.
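A minimal sketch of the median rule (sort, then take the middle item, or average the two middle items) might look like this in Python. The variable names are illustrative; the data are the slide's sample.

```python
from statistics import median

expenditure = [5000, 6000, 30000, 110000, 15000,
               6000, 17000, 13000, 12000, 11000,
               8000, 6000, 15000, 6000, 11500]

ordered = sorted(expenditure)
n = len(ordered)

if n % 2 == 1:
    # Odd count: the (n+1)/2-th item, which is index n // 2 when counting from 0
    manual_median = ordered[n // 2]
else:
    # Even count: average of the two middle items
    manual_median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print(manual_median)  # 11500

# The standard library applies the same rule
assert manual_median == median(expenditure)
```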
What is the Mode?
If the data is discrete, or has been grouped
into discrete intervals, the mode is that value
that occurs the most often.
In other words it is the value most likely to
occur.
Measures of Central Tendency
(ungrouped)
The Mode
The sample gives these values:
5000, 6000, 30000, 110000, 15000,
6000, 17000, 13000, 12000, 11000,
8000, 6000, 15000, 6000, 11500
This is the most numerous value:
5000, 6000, 6000, 6000, 6000, 8000,
11000, 11500, 12000, 13000, 15000, 15000,
17000, 30000, 110000
The Mode here is 6000.
Sometimes there is no mode…or even two modes!
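Counting occurrences is enough to find the mode. The sketch below uses the slide's sample plus two small made-up lists (labelled as such in the comments) to illustrate the two-mode and no-clear-mode cases. It assumes Python 3.8 or later for `statistics.multimode`.

```python
from collections import Counter
from statistics import multimode

expenditure = [5000, 6000, 30000, 110000, 15000,
               6000, 17000, 13000, 12000, 11000,
               8000, 6000, 15000, 6000, 11500]

counts = Counter(expenditure)
top_value, top_count = counts.most_common(1)[0]
print(top_value, top_count)      # 6000 4

# multimode returns every value that ties for the highest count
modes = multimode(expenditure)
print(modes)                     # [6000]

# Made-up illustrative lists, not from the slides
two_modes = multimode([1, 1, 2, 2, 3])   # [1, 2]: two modes
no_clear_mode = multimode([1, 2, 3])     # [1, 2, 3]: every value ties
```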
Measures of Central Tendency
(ungrouped)
So given these values…
5000, 6000, 6000, 6000, 6000, 8000,
11000, 11500, 12000, 13000, 15000, 15000,
17000, 30000, 110000
…what is the best measure of central
tendency for this random sample of
NIFT students?
Mean?...18100
Median?...11500
Mode?...6000
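One way to approach the question is to see how much each measure moves when the single extreme value (110000) is left out. The sketch below is only an illustration of that sensitivity, not a rule for choosing a measure.

```python
from statistics import mean, median

expenditure = [5000, 6000, 6000, 6000, 6000, 8000,
               11000, 11500, 12000, 13000, 15000, 15000,
               17000, 30000, 110000]

# Same sample with the extreme value removed
without_extreme = [v for v in expenditure if v != 110000]

mean_all = mean(expenditure)              # 18100
median_all = median(expenditure)          # 11500
mean_trimmed = mean(without_extreme)      # about 11535.71
median_trimmed = median(without_extreme)  # 11250.0

print(mean_all, round(mean_trimmed, 2))
print(median_all, median_trimmed)
```

Dropping one value changes the mean by more than 6500 but the median by only 250, which is why the median is often preferred for data with a few extreme values.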
Measure of central tendency
of grouped data
Grouped data,
Class limits
Class mid point
Class interval
Inclusive
Exclusive
Mean of grouped data
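$$\bar{x} = \frac{1}{n}\sum f x$$
where x is the class mid point, f is the class frequency, and n = Σf.
Median of grouped data
$$\text{Median} = L + \frac{N/2 - C.F.}{F} \times h$$
where L is the lower limit of the median class, N is the total frequency,
C.F. is the cumulative frequency of the class preceding the median class,
F is the frequency of the median class, and h is the class width.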
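The two grouped-data formulas can be sketched in Python as follows. The data are the traveling-expenses classes that appear later in these slides. Two assumptions are made here that the slides do not state: for inclusive classes such as 61 – 70 the lower limit L is taken as the class boundary (lower limit minus 0.5), and the class width h is 10.

```python
# (lower limit, upper limit, frequency) for each inclusive class
classes = [(61, 70, 2), (71, 80, 10), (81, 90, 20), (91, 100, 17), (101, 110, 1)]
h = 10  # class width (assumed: boundaries 60.5 to 70.5, and so on)

n = sum(f for _, _, f in classes)  # total frequency

# Grouped mean: sum of (frequency * mid point) divided by total frequency
grouped_mean = sum(f * (lo + hi) / 2 for lo, hi, f in classes) / n

# Grouped median: find the first class whose cumulative frequency reaches N/2
cumulative = 0  # cumulative frequency of the classes before the current one
for lo, hi, f in classes:
    if cumulative + f >= n / 2:
        L = lo - 0.5  # lower boundary of the median class (assumption)
        grouped_median = L + (n / 2 - cumulative) / f * h
        break
    cumulative += f

print(grouped_mean)               # 86.5
print(round(grouped_median, 2))   # 87.0
```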
Computation of Mode for grouped Data
Summary of
Central Tendency Measures
Range
The highest value minus the lowest value….
From our last example, the range would be:
110000 – 5000 = 105000
What is the Variance?
The Variance of a population is the sum of
the squares of the differences between the
mean and the individual data points divided
by the number of data points.
The Variance of a sample is the sum of the
squared differences divided by the number of
data points less one.
What is the Standard
Deviation?
This is, roughly, the typical distance of
your values from the mean.
The Standard Deviation is the square
root of the variance.
Computing Standard Deviation
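Population σ
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}}$$
Sample "s"
$$s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}}$$
The quantity under each square root is the variance.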
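The range, variance, and standard deviation definitions can be checked against the expenditure sample with a short Python sketch. The variable names are illustrative; the `statistics` functions are used only to confirm the hand-rolled calculation.

```python
from math import sqrt, isclose
from statistics import pvariance, variance, pstdev, stdev

expenditure = [5000, 6000, 30000, 110000, 15000,
               6000, 17000, 13000, 12000, 11000,
               8000, 6000, 15000, 6000, 11500]

data_range = max(expenditure) - min(expenditure)
print(data_range)  # 105000

n = len(expenditure)
m = sum(expenditure) / n

# Sum of squared differences between each data point and the mean
squared_diffs = sum((x - m) ** 2 for x in expenditure)

pop_variance = squared_diffs / n            # population: divide by the number of points
sample_variance = squared_diffs / (n - 1)   # sample: divide by the number of points less one

pop_sd = sqrt(pop_variance)                 # standard deviation = square root of variance
sample_sd = sqrt(sample_variance)

# Cross-check against the standard library
assert isclose(pop_variance, pvariance(expenditure))
assert isclose(sample_variance, variance(expenditure))
assert isclose(pop_sd, pstdev(expenditure))
assert isclose(sample_sd, stdev(expenditure))
```

Because the sample form divides by a smaller number, the sample standard deviation is always slightly larger than the population form computed on the same data.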
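$$S.D. = \sqrt{\frac{\sum f d^2}{\sum f} - \left(\frac{\sum f d}{\sum f}\right)^2} \times h$$
Where f is frequency; d is the deviation computed as
$$d_i = \frac{x_i - A}{h}$$
A is the assumed mean, x is the mid value of the class,
and h is the class width.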
SD for Grouped Data
The following data provides the traveling expenses (Rs)
of 50 NIFT students. Find Mean and SD.
Traveling expenses (Rs)    No. of Students
61 – 70                    2
71 – 80                    10
81 – 90                    20
91 – 100                   17
101 – 110                  1
S.D for Grouped Data
CI          Mid Values (x)   Freq. (f)   d = (x − A)/h   f × d   f × d²
61 – 70     65.5             2           -2              -4      8
71 – 80     75.5             10          -1              -10     10
81 – 90     85.5             20          0               0       0
91 – 100    95.5             17          1               17      17
101 – 110   105.5            1           2               2       4
Total                        50                          5       39
S.D for Grouped Data
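$$\bar{x} = A + \frac{\sum_{i=1}^{n} f_i d_i}{\sum_{i=1}^{n} f_i} \times h = 85.5 + \frac{5}{50} \times 10 = 86.5$$
$$S.D. = \sqrt{\frac{\sum_{i=1}^{n} f_i d_i^2}{\sum_{i=1}^{n} f_i} - \left(\frac{\sum_{i=1}^{n} f_i d_i}{\sum_{i=1}^{n} f_i}\right)^2} \times h = \sqrt{\frac{39}{50} - \left(\frac{5}{50}\right)^2} \times 10 \approx 8.77$$
With the sample divisor $\sum f_i - 1 = 49$ in place of $\sum f_i = 50$, the result is
$$\sqrt{\frac{39 - 5^2/50}{49}} \times 10 \approx 8.86$$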
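The step-deviation calculation for this table can be sketched in Python. The assumed mean A = 85.5 and class width h = 10 are read off the table (d = 0 at mid value 85.5). The sketch computes the standard deviation with both divisors: Σf gives about 8.77, and Σf − 1 gives about 8.86, the value quoted above.

```python
from math import sqrt

mid_values = [65.5, 75.5, 85.5, 95.5, 105.5]   # class mid points x
freqs = [2, 10, 20, 17, 1]                      # class frequencies f
A = 85.5                                        # assumed mean (mid value where d = 0)
h = 10                                          # class width

# Step deviations d = (x - A) / h
d = [(x - A) / h for x in mid_values]

sum_f = sum(freqs)
sum_fd = sum(f * di for f, di in zip(freqs, d))
sum_fd2 = sum(f * di ** 2 for f, di in zip(freqs, d))

grouped_mean = A + sum_fd / sum_f * h

# Divisor sum_f (population form, as in the grouped S.D. formula)
sd_population = sqrt(sum_fd2 / sum_f - (sum_fd / sum_f) ** 2) * h

# Divisor sum_f - 1 (sample form)
sd_sample = sqrt((sum_fd2 - sum_fd ** 2 / sum_f) / (sum_f - 1)) * h

print(sum_f, sum_fd, sum_fd2)                        # 50 5.0 39.0
print(round(grouped_mean, 2))                        # 86.5
print(round(sd_population, 2), round(sd_sample, 2))  # 8.77 8.86
```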
Coefficient of Variation
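Mean: Technician A = 40, Technician B = 160
Standard deviation: Technician A = 5, Technician B = 15
Solution
$$CV = \frac{\text{standard deviation}}{\text{mean}} \times 100$$
Technician A: CV = (5 / 40)(100) = 12.5%
Technician B: CV = (15 / 160)(100) = 9.4%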
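The comparison can be written as a small Python function. The function name is illustrative; the inputs are the two technicians' figures from the slide.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation as a percentage of the mean."""
    return sd / mean * 100

cv_a = coefficient_of_variation(5, 40)     # Technician A
cv_b = coefficient_of_variation(15, 160)   # Technician B

print(cv_a)            # 12.5
print(round(cv_b, 1))  # 9.4
```

Technician B has the larger standard deviation in absolute terms (15 against 5) but the smaller variation relative to the mean, which is the point of using the coefficient of variation.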
A Valuable Tool
The standard deviation is a relatively
recent invention. The idea goes back to
Gauss's work on the errors observed in
measured star positions; the term itself
was introduced later by Karl Pearson.
Today it is used in everything from
quality control to measuring risk in
financial investments.
Measures of Central Tendency and Dispersion
(Grouped Data)
Remember that grouped data is a collection
of data that has been placed into categories…
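[Figure: two distribution curves with the positions of the mode, median, and mean marked; the mode and mean appear in opposite order in the two panels, with the median between them.]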