Week One: Introduction To Quantitative Methods MBA 2013
CHITRAKALPA SEN AURO UNIVERSITY
WEEK ONE
Variables
A variable is a characteristic or condition that
can change or take on different values.
Most research begins with a general question
about the relationship between two variables
for a specific group of individuals.
Population
The entire group of individuals is called the
population.
For example, a researcher may be interested
in the relation between class size (variable 1)
and academic performance (variable 2) for the
population of third-grade children.
Sample
Usually populations are so large that a
researcher cannot examine the entire group.
Therefore, a sample is selected to represent
the population in a research study. The goal is
to use the results obtained from the sample to
help answer questions about the population.
Types of Variables
Variables can be classified as discrete or
continuous.
Discrete variables (such as class size) consist
of indivisible categories, and continuous
variables (such as time or weight) are
infinitely divisible into whatever units a
researcher may choose. For example, time
can be measured to the nearest minute,
second, half-second, etc.
Measuring Variables
To establish relationships between variables,
researchers must observe the variables and
record their observations. This requires that
the variables be measured.
The process of measuring a variable requires a
set of categories called a scale of
measurement and a process that classifies
each individual into one category.
4 Types of Measurement Scales
1. A nominal scale is an unordered set of
categories identified only by name. Nominal
measurements only permit you to determine
whether two individuals are the same or
different.
2. An ordinal scale is an ordered set of
categories. Ordinal measurements tell you
the direction of difference between two
individuals.
3. An interval scale is an ordered series of equal-sized
categories. Interval measurements identify the
direction and magnitude of a difference. The zero
point is located arbitrarily on an interval scale.
4. A ratio scale is an interval scale where a value of
zero indicates none of the variable. Ratio
measurements identify the direction and
magnitude of differences and allow ratio
comparisons of measurements.
Correlational Studies
The goal of a correlational study is to
determine whether there is a relationship
between two variables and to describe the
relationship.
A correlational study simply observes the two
variables as they exist naturally.
Experiments
The goal of an experiment is to demonstrate a
cause-and-effect relationship between two
variables; that is, to show that changing the
value of one variable causes changes to occur
in a second variable.
Other Types of Studies
Other types of research studies, known as
non-experimental or quasi-experimental, are
similar to experiments because they also
compare groups of scores.
These studies do not use a manipulated
variable to differentiate the groups. Instead,
the variable that differentiates the groups is
usually a pre-existing participant variable
(such as male/female) or a time variable (such
as before/after).
Data
The measurements obtained in a research
study are called the data.
The goal of statistics is to help researchers
organize and interpret the data.
Descriptive Statistics
Descriptive statistics are methods for
organizing and summarizing data.
For example, tables or graphs are used to
organize data, and descriptive values such as
the average score are used to summarize data.
A descriptive value for a population is called a
parameter and a descriptive value for a
sample is called a statistic.
Inferential Statistics
Inferential statistics are methods for using sample
data to make general conclusions (inferences) about
populations.
Because a sample is typically only a part of the whole
population, sample data provide only limited
information about the population. As a result,
sample statistics are generally imperfect
representatives of the corresponding population
parameters.
Sampling Error
The discrepancy between a sample statistic
and its population parameter is called
sampling error.
Defining and measuring sampling error is a
large part of inferential statistics.
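The ideas of parameter, statistic, and sampling error can be illustrated with a small simulation. This is only a sketch: the population below is made-up data (1,000 simulated scores), and the sample size of 25 and the seed are arbitrary assumptions.

```python
import random
import statistics

# Hypothetical population: 1,000 simulated scores (not data from the course).
rng = random.Random(42)
population = [rng.gauss(70, 10) for _ in range(1000)]

# Parameter: a descriptive value computed on the whole population.
population_mean = statistics.fmean(population)

# Statistic: the same descriptive value computed on a sample.
sample = rng.sample(population, 25)
sample_mean = statistics.fmean(sample)

# Sampling error: the discrepancy between the statistic and the parameter.
sampling_error = sample_mean - population_mean

print(f"parameter = {population_mean:.2f}")
print(f"statistic = {sample_mean:.2f}")
print(f"sampling error = {sampling_error:+.2f}")
```

Changing the seed or the sample size gives a different sample and therefore a different sampling error, which is the variability that inferential statistics tries to quantify.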
Arithmetic Mean
The arithmetic mean is a mathematical average and it is
the most popular measure of central tendency. It is
frequently referred to simply as the mean. It is obtained by
dividing the sum of the values of all observations in a series
($\sum X$) by the number of items ($N$) constituting the series.
Thus, the mean of a set of numbers $X_1, X_2, X_3, \ldots, X_N$
is denoted by $\bar{X}$ and is defined as
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_N}{N} = \frac{\sum X}{N}$$
Example: Arithmetic mean for monthly user
statistics in the University Library

Month      Working days   Total users   Average users per working day
Sep-2011   24             11618         484.08
Oct-2011   21             8857          421.76
Nov-2011   23             11459         498.22
Dec-2011   25             8841          353.64
Jan-2012   24             5478          228.25
Feb-2012   23             10811         470.04
Total      140            57064

Overall mean = 57064 / 140 = 407.6 users per working day
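The figures in the library example can be reproduced with a few lines of Python. This is a minimal sketch; the variable names are illustrative, and the numbers are copied from the table.

```python
# (month, working days, total users) copied from the library table.
usage = [
    ("Sep-2011", 24, 11618),
    ("Oct-2011", 21, 8857),
    ("Nov-2011", 23, 11459),
    ("Dec-2011", 25, 8841),
    ("Jan-2012", 24, 5478),
    ("Feb-2012", 23, 10811),
]

# Average users per working day within each month.
daily_averages = {month: round(users / days, 2) for month, days, users in usage}

# Overall mean: total users divided by total working days.
total_days = sum(days for _, days, _ in usage)
total_users = sum(users for _, _, users in usage)
overall_mean = total_users / total_days

for month, average in daily_averages.items():
    print(month, average)
print(total_days, total_users, round(overall_mean, 1))  # 140 57064 407.6
```

Note that the overall figure is the total number of users divided by the total number of working days. Averaging the six monthly averages would give a slightly different number, because the months do not have the same number of working days.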
Advantages of Mean:
It is easy to understand and simple to
calculate.
It is based on all the values.
It is rigidly defined .
It is easy to understand the arithmetic
average even if some of the details of
the data are lacking.
It is not based on the position in the
series.
Disadvantages of Mean:
It is affected by extreme values.
It cannot be calculated for open end
classes.
It cannot be located graphically.
It can give misleading conclusions, for example
when the data are strongly skewed.
It has an upward bias.
HARMONIC MEAN
2. Median
The median is the central value of the distribution,
that is, the value which divides the distribution into
two equal parts, each part containing an equal number
of items. Thus it is the central value of the
variable when the values are arranged in order
of magnitude.
Measures of Central Tendency
Median
First, if you have an odd number of scores pick the middle
score.
1 4 6 7 12 14 18
Median is 7
Second, if you have an even number of scores, take the
average of the middle two.
1 4 6 7 8 12 14 16
Median is (7+8)/2 = 7.5
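The two rules above (odd and even number of scores) can be written as a short function. This is a sketch for illustration; the function name `median` is our own choice, and Python's standard library offers an equivalent `statistics.median`.

```python
def median(scores):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(scores)          # arrange in order of magnitude
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:                    # odd count: pick the middle score
        return ordered[mid]
    # even count: average the two middle scores
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([1, 4, 6, 7, 12, 14, 18]))     # 7
print(median([1, 4, 6, 7, 8, 12, 14, 16]))  # 7.5
```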
Advantages of Median:
Median can be calculated in all distributions.
Median can be understood even by common
people.
Median can be ascertained even with the
extreme items.
It can be located graphically
It is most useful when dealing with qualitative data that can be ranked
Disadvantages of Median:
It is not based on all the values.
It is not capable of further mathematical
treatment.
It is affected by fluctuations of sampling.
With an even number of values, it may not be a
value that actually occurs in the data.
3. Mode
Mode is the most frequent value or score
in the distribution.
It is defined as the value of the item that
occurs most often in a series.
It corresponds to the highest point of the
frequency distribution curve.
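Finding the mode amounts to counting how often each value occurs. The sketch below uses made-up scores and a helper named `modes` (an illustrative name); it returns a list because a distribution can have more than one most frequent value.

```python
from collections import Counter


def modes(scores):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(scores)
    highest = max(counts.values())   # raises ValueError for empty data
    return sorted(value for value, count in counts.items() if count == highest)


print(modes([4, 8, 1, 6, 6, 2, 9, 3, 6, 9]))  # [6]
print(modes([1, 1, 2, 2, 3]))                 # [1, 2]
```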
Advantages of Mode :
Mode is readily comprehensible and
easily calculated
It is the best representative of data
It is not at all affected by extreme values.
The value of mode can also be
determined graphically.
It is usually an actual value of an
important part of the series.
Disadvantages of Mode :
It is not based on all observations.
It is not capable of further mathematical
manipulation.
Mode is affected to a great extent by
sampling fluctuations.
Choice of grouping has great influence
on the value of mode.
Dispersion
Measures of dispersion are descriptive
statistics that describe how similar a set of
scores are to each other
The more similar the scores are to each other, the
lower the measure of dispersion will be
The less similar the scores are to each other, the
higher the measure of dispersion will be
In general, the more spread out a distribution is,
the larger the measure of dispersion will be
Measures of Dispersion
There are three main measures of dispersion:
The range
The semi-interquartile range (SIR)
Variance / standard deviation
The Range
The range is defined as the difference
between the largest score in the set of data
and the smallest score in the set of data, $X_L - X_S$
What is the range of the following data:
4 8 1 6 6 2 9 3 6 9
The largest score ($X_L$) is 9; the smallest score
($X_S$) is 1; the range is $X_L - X_S = 9 - 1 = 8$
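The same calculation in Python, as a minimal sketch (the variable names are illustrative):

```python
scores = [4, 8, 1, 6, 6, 2, 9, 3, 6, 9]

x_l = max(scores)        # largest score
x_s = min(scores)        # smallest score
data_range = x_l - x_s   # range = largest minus smallest

print(x_l, x_s, data_range)  # 9 1 8
```

Because the range uses only the two most extreme scores, a single unusual value can change it considerably.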
Variance
Variance is defined as the average of the
squared deviations:
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$
What Does the Variance Formula
Mean?
First, it says to subtract the mean from each of
the scores
This difference is called a deviate or a deviation
score
The deviate tells us how far a given score is from
the typical, or average, score
Thus, the deviate is a measure of dispersion for a
given score
Why can't we simply take the average of
the deviates? That is, why isn't variance
defined as:
$$\sigma^2 = \frac{\sum (X - \mu)}{N}$$
This is not the formula
for variance!
One of the definitions of the mean was that it
always made the sum of the scores minus the
mean equal to 0
Thus, the average of the deviates must be 0
since the sum of the deviates must equal 0
To avoid this problem, statisticians square the
deviate scores prior to averaging them
Squaring the deviate scores makes every squared
score non-negative, so they can no longer cancel out
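A quick numerical check of this argument, using six illustrative scores (an assumption made for the sketch; any data set behaves the same way):

```python
scores = [9, 8, 6, 5, 8, 6]

mean = sum(scores) / len(scores)           # 7.0
deviates = [x - mean for x in scores]      # score minus mean
squared_deviates = [d ** 2 for d in deviates]

print(deviates)               # [2.0, 1.0, -1.0, -2.0, 1.0, -1.0]
print(sum(deviates))          # 0.0  -> averaging the deviates tells us nothing
print(sum(squared_deviates))  # 12.0 -> squaring removes the cancellation
```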
Variance is the mean of the squared deviation
scores
The larger the variance is, the more the scores
deviate, on average, away from the mean
The smaller the variance is, the less the scores
deviate, on average, from the mean
Standard Deviation
When the deviate scores are squared in
variance, their unit of measure is squared as well
E.g., if people's weights are measured in pounds,
then the variance of the weights would be expressed
in pounds$^2$ (or squared pounds)
Since squared units of measure are often
awkward to deal with, the square root of
variance is often used instead
The standard deviation is the square root of variance
Standard deviation = $\sqrt{\text{variance}}$
Variance = (standard deviation)$^2$
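The relationship between the two measures, and the difference in their units, can be seen with a small example. The weights below are hypothetical values chosen for the sketch; `statistics.pvariance` and `statistics.pstdev` are the population versions (dividing by N) from Python's standard library.

```python
import math
import statistics

weights = [150, 160, 170, 180, 190]  # hypothetical weights in pounds

variance = statistics.pvariance(weights)  # expressed in squared pounds
std_dev = statistics.pstdev(weights)      # expressed in pounds

print(variance)           # 200
print(round(std_dev, 2))  # 14.14
print(math.isclose(std_dev, math.sqrt(variance)))  # True
```

The standard deviation is on the same scale as the original scores, which is why it is usually easier to interpret than the variance.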
Computational Formula
When calculating variance, it is often easier to use
a computational formula which is algebraically
equivalent to the definitional formula:
$$\sigma^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N} = \frac{\sum (X - \mu)^2}{N}$$
$\sigma^2$ is the population variance, $X$ is a score, $\mu$ is
the population mean, and $N$ is the number of
scores
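The algebraic equivalence of the two formulas is stated above without proof. A short derivation, offered here as a sketch, expands the squared deviations and substitutes $\mu = \sum X / N$:

```latex
\begin{align*}
\sum (X - \mu)^2
  &= \sum X^2 - 2\mu \sum X + N\mu^2 \\
  &= \sum X^2 - \frac{2\left(\sum X\right)^2}{N} + \frac{\left(\sum X\right)^2}{N}
     \qquad \text{since } \mu = \frac{\sum X}{N} \\
  &= \sum X^2 - \frac{\left(\sum X\right)^2}{N}
\end{align*}
```

Dividing both sides by $N$ gives the computational formula for the variance.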
Computational Formula Example
$X$       $X^2$      $X - \mu$   $(X - \mu)^2$
9         81         2           4
8         64         1           1
6         36         -1          1
5         25         -2          4
8         64         1           1
6         36         -1          1
Σ = 42    Σ = 306    Σ = 0       Σ = 12
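$$\sigma^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N} = \frac{306 - \frac{42^2}{6}}{6} = \frac{306 - 294}{6} = \frac{12}{6} = 2$$
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{12}{6} = 2$$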
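Both routes to the variance can be checked in code with the six scores from the example. This is a minimal sketch; the variable names are illustrative.

```python
scores = [9, 8, 6, 5, 8, 6]
n = len(scores)
mu = sum(scores) / n  # population mean, 7.0

# Definitional formula: mean of the squared deviations.
definitional = sum((x - mu) ** 2 for x in scores) / n

# Computational formula: uses only the sum of X and the sum of X squared.
sum_x = sum(scores)
sum_x2 = sum(x * x for x in scores)
computational = (sum_x2 - sum_x ** 2 / n) / n

print(sum_x, sum_x2)                # 42 306
print(definitional, computational)  # 2.0 2.0
```

The computational version needs only the two running totals, which is why it was convenient for hand calculation.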
Measure of Skew
Skew is a measure of the asymmetry in the
distribution of scores
[Figure: three distribution curves labelled Positive Skew, Negative Skew, and Normal (skew = 0)]
If $s^3 < 0$, then the distribution has a negative
skew
If $s^3 > 0$, then the distribution has a positive
skew
If $s^3 = 0$, then the distribution is symmetrical
The more different $s^3$ is from 0, the greater
the skew in the distribution
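These notes do not give a formula for the skew statistic. One common choice, assumed here purely for illustration, is the standardized third moment: the mean cubed deviation divided by the cube of the standard deviation. The function name `skew` and the three small data sets are made up for the sketch.

```python
def skew(scores):
    """Standardized third moment (an assumed definition, see lead-in)."""
    n = len(scores)
    mu = sum(scores) / n
    m2 = sum((x - mu) ** 2 for x in scores) / n  # variance
    m3 = sum((x - mu) ** 3 for x in scores) / n  # mean cubed deviation
    return m3 / m2 ** 1.5                        # fails if all scores are equal


print(skew([1, 2, 3, 4, 5]))     # symmetrical data
print(skew([1, 1, 2, 2, 10]))    # long tail to the right
print(skew([1, 9, 10, 10, 10]))  # long tail to the left
```

With this definition the symmetrical data give zero, the right-tailed data give a positive value, and the left-tailed data give a negative value, matching the sign rules above.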
Kurtosis
Kurtosis measures whether the scores are
spread out more or less than they would be in
a normal (Gaussian) distribution
[Figure: three distribution curves labelled Mesokurtic ($s^4 = 3$), Leptokurtic ($s^4 > 3$), and Platykurtic ($s^4 < 3$)]
When the distribution is normally distributed,
its kurtosis equals 3 and it is said to be
mesokurtic
When the distribution is less spread out than
normal, its kurtosis is greater than 3 and it is
said to be leptokurtic
When the distribution is more spread out than
normal, its kurtosis is less than 3 and it is said
to be platykurtic
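As with skew, the notes do not state a formula for kurtosis. The sketch below assumes the standardized fourth moment (mean fourth-power deviation divided by the squared variance), which equals 3 for a normal distribution. The function name and both data sets are illustrative.

```python
def kurtosis(scores):
    """Standardized fourth moment (an assumed definition, see lead-in)."""
    n = len(scores)
    mu = sum(scores) / n
    m2 = sum((x - mu) ** 2 for x in scores) / n  # variance
    m4 = sum((x - mu) ** 4 for x in scores) / n  # mean fourth-power deviation
    return m4 / m2 ** 2                          # fails if all scores are equal


flat = [1, 2, 3, 4, 5]           # evenly spread scores
peaked = [0] * 8 + [-10, 10]     # most scores at the centre, two far out

print(round(kurtosis(flat), 2))    # 1.7 -> less than 3, platykurtic
print(round(kurtosis(peaked), 2))  # 5.0 -> greater than 3, leptokurtic
```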
$s^2$, $s^3$, & $s^4$
Collectively, the variance ($s^2$), skew ($s^3$), and
kurtosis ($s^4$) describe the shape of the
distribution