Ugc Net Commerce: Business Statistics & Research

The document provides information about business statistics and research, including definitions and characteristics of statistics. It discusses measures of central tendency such as the mean, median, and mode. Specifically, it describes the arithmetic mean in detail, including its formula, properties, and how to calculate it for different data sets. It also covers other averages like the weighted mean, geometric mean, and their uses and properties. The key aspects covered are definitions and measures of central tendency in statistics.

Uploaded by

Parth Tiwari

UNIT SNAPSHOT

UGC NET COMMERCE


Unit V

Business Statistics & Research

Statistics
The word “statistics”, which is in common use, is derived from the Latin word ‘status’ (a
political state). In the plural sense it refers to a group of numbers or figures that represent
some information of human interest.

According to A.L. Bowley, “Statistics are numerical statements of facts in any department of
enquiry placed in relation to each other.”

According to Croxton and Cowden, “Statistics may be defined as the collection, presentation,
analysis, and interpretation of numerical data.”

The important characteristics of statistics are given below:

1. Statistics are aggregates of facts.
2. Statistics are numerically expressed.
3. Statistics are affected to a marked extent by a multiplicity of causes.
4. Statistics are enumerated or estimated according to a reasonable standard of accuracy.
5. Statistics are collected for a predetermined purpose.
6. Statistics are collected in a systematic manner.
7. Statistics must be comparable to each other.

Functions of Statistics
The functions of statistics may be enumerated as follows:

(i) To present facts in a definite form


(ii) To simplify unwieldy and complex data
(iii) To use it as a technique for making comparisons
(iv) To enlarge individual experience
(v) To provide guidance in the formulation of policies
(vi) To enable measurement of the magnitude of a phenomenon

Main Limitations of Statistics

1. Qualitative aspects are ignored.
2. It does not deal with individual items.
3. It does not depict the entire story of a phenomenon.
4. It is liable to be misused.
5. Results are true only on average.
6. There are too many methods for studying a problem.
7. Statistical results are not always beyond doubt.

www.everstudy.co.in Query: [email protected]

Measures of Central Tendency


According to Prof Bowley, “Measures of central tendency (averages) are statistical constants
which enable us to comprehend in a single effort the significance of the whole.”

The main objectives of measures of central tendency are:

1) To condense data into a single value.

2) To facilitate comparisons between data.

• Averages provide the gist and give a bird’s eye view of a huge mass of unwieldy
numerical data.
• Averages are the typical values around which the other items of the distribution congregate.
• An average lies between the two extreme observations of the distribution and gives us an
idea about the concentration of the values in the central part of the distribution. This is why
averages are called measures of central tendency.
• Averages are also called measures of location, since they enable us to locate the
position or place of the distribution in question.

Essentials of a Good Average

Since an average represents the statistical data and is used for purposes of comparison, it must
possess the following properties.

1. It must be rigidly defined and not left to the mere estimation of the observer. If the definition
is rigid, the value of the average computed by different persons will be the same.

2. The average must be based upon all values given in the distribution.

3. It should be easily understood. The average should possess simple and obvious properties. It
should not be too abstract for the common people.

4. It should be capable of being calculated with reasonable care and rapidity.

5. It should be stable and unaffected by sampling fluctuations.

6. It should be capable of further algebraic manipulation.

7. It should not be unduly affected by extreme values.

Methods of Measuring Central Tendency

Different methods of measuring “Central Tendency” provide us with different kinds of averages.

The following are the main types of averages that are commonly used:

1. Mean

(i) Arithmetic mean

(ii) Weighted mean

(iii) Geometric mean

(iv) Harmonic mean

2. Median

3. Mode

1. Arithmetic Mean:
• The mean is the arithmetic average, and it is probably the measure of central tendency
that you are most familiar with.
• Calculating the mean is simple: add up all of the values and divide by the number of
observations in the dataset.
• The calculation of the mean incorporates all values in the data. If you change any value,
the mean changes.
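As a small illustration of the calculation described above, the sketch below computes the mean directly and with Python's standard `statistics` module. The marks are hypothetical values chosen only for the example.

```python
from statistics import mean

# Hypothetical marks of five students (illustrative data, not from the notes).
marks = [40, 50, 55, 60, 45]

# Direct method: sum of the values divided by the number of observations.
direct_mean = sum(marks) / len(marks)

# The standard-library helper gives the same answer.
library_mean = mean(marks)

# Changing any single value changes the mean.
changed = marks.copy()
changed[0] = 90
changed_mean = sum(changed) / len(changed)

print(direct_mean, library_mean, changed_mean)
```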

Mathematical Properties of the Arithmetic Mean:

1. The sum of the deviations of a given set of individual observations from the arithmetic
mean is always zero.
2. The sum of squares of deviations of a set of observations is the minimum when
deviations are taken from the arithmetic average.
3. If each value of a variable X is increased, decreased or multiplied by a constant k, the
arithmetic mean is also increased, decreased or multiplied by the same constant.
4. If we are given the arithmetic mean and number of items of two or more groups, we can
compute the combined average of these groups by applying the following formula:

Combined mean: $\bar{X}_{12} = \dfrac{n_1 \bar{X}_1 + n_2 \bar{X}_2}{n_1 + n_2}$

where $n_1, n_2$ are the numbers of items and $\bar{X}_1, \bar{X}_2$ the means of the two groups.
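The properties listed above can be checked numerically. The following is a minimal sketch with made-up observations; `combined_mean` is a hypothetical helper that assumes the usual formula (n1·mean1 + n2·mean2) / (n1 + n2) for two groups.

```python
# Hypothetical observations, used only to illustrate the properties.
x = [2, 4, 6, 8, 10]
x_bar = sum(x) / len(x)

# Property 1: deviations from the mean sum to zero.
sum_dev = sum(v - x_bar for v in x)


def sum_sq_dev(values, origin):
    """Sum of squared deviations of the values from a chosen origin."""
    return sum((v - origin) ** 2 for v in values)


# Property 2: the sum of squared deviations is smallest about the mean.
ss_about_mean = sum_sq_dev(x, x_bar)
ss_about_other = sum_sq_dev(x, x_bar + 1)


def combined_mean(n1, mean1, n2, mean2):
    """Property 4: mean of two groups pooled together."""
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)


group_a = [10, 20, 30]   # mean 20
group_b = [40, 50]       # mean 45
pooled = combined_mean(len(group_a), 20, len(group_b), 45)
direct = sum(group_a + group_b) / len(group_a + group_b)

print(sum_dev, ss_about_mean, ss_about_other, pooled, direct)
```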

Merits of Arithmetic mean:

• Simplicity
• Certainty
• Based on all values
• Algebraic treatment possible
• Basis of comparison
• Accuracy test possible
• No scope for estimated value

Demerits of Arithmetic mean:

• Effect of extreme values
• Mean value may not figure in the series
• Unsuitability
• Misleading conclusions
• Cannot be used in case of qualitative phenomenon
• Gets distorted by extreme values of the series

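Formulae for calculating the arithmetic mean:

Type of series | Direct method | Shortcut method | Step deviation method
Individual series | $\bar{X} = \frac{\sum X}{N}$ | $\bar{X} = A + \frac{\sum d}{N}$ | $\bar{X} = A + \frac{\sum d'}{N} \times i$
Discrete series | $\bar{X} = \frac{\sum fX}{\sum f}$ | $\bar{X} = A + \frac{\sum fd}{\sum f}$ | $\bar{X} = A + \frac{\sum fd'}{\sum f} \times i$
Continuous series | $\bar{X} = \frac{\sum fm}{\sum f}$ | $\bar{X} = A + \frac{\sum fd}{\sum f}$ | $\bar{X} = A + \frac{\sum fd'}{\sum f} \times i$

Here A is an assumed mean, d = X − A (or m − A for a continuous series, m being the class
mid-point), d' = d / i, and i is the common factor (usually the class width).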

2. Weighted Average

• A weighted average is a type of average where each observation in the data set is
multiplied by a predetermined weight before calculation.
• In calculating a simple average (arithmetic mean) all observations are treated equally and
assigned equal weight.
• A weighted average assigns weights that determine the relative importance of each data
point.

Why Weighted Average?

• It takes into account the relative importance of data points when calculating an average,
thereby making it more descriptive than a simple average.

• It smooths out data, which can improve accuracy.

• It is often used in finance to calculate the cost basis of stock portfolios, and in inventory
accounting and valuation.

Weighted Mean: $\bar{X}_w = \dfrac{\sum wX}{\sum w}$, where w is the weight attached to observation X.
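A weighted mean can be sketched as below, assuming the usual form (sum of weight × value) / (sum of weights). The purchase prices and share counts are hypothetical and stand in for the cost-basis use mentioned above.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of w * x divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical purchases: price paid per share and number of shares bought.
prices = [100, 120, 90]
shares = [10, 30, 60]

avg_cost = weighted_mean(prices, shares)       # cost basis per share
simple_avg = sum(prices) / len(prices)         # ignores how many were bought

print(avg_cost, round(simple_avg, 2))
```

The simple average treats the three prices equally, while the weighted figure reflects that most shares were bought at the lowest price.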

3. Geometric Mean:

• A geometric mean is a mean or average which shows the central tendency of a set of
numbers by using the product of their values.
• For a set of n observations, the geometric mean is the nth root of their product.
• The geometric mean G.M. of a set of numbers $x_1, x_2, \ldots, x_n$ is given as

$\text{G.M.} = (x_1 \cdot x_2 \cdots x_n)^{1/n}$

or, $\text{G.M.} = \left(\prod_{i=1}^{n} x_i\right)^{1/n} = \sqrt[n]{x_1 x_2 \cdots x_n}$.

The geometric mean of two numbers x and y is the square root of their product, $\sqrt{xy}$. For three
numbers x, y and z, it is the cube root of their product, i.e., $(xyz)^{1/3}$.

Relation between Geometric Mean and Logarithms

In order to make the calculation easier and less time consuming, we use logarithms in
the calculation of geometric means.

Since $\text{G.M.} = (x_1 \cdot x_2 \cdots x_n)^{1/n}$,

taking logs on both sides, we have

$\log \text{G.M.} = \frac{1}{n} \log (x_1 \cdot x_2 \cdots x_n)$

or, $\log \text{G.M.} = \frac{1}{n} (\log x_1 + \log x_2 + \cdots + \log x_n)$

or, $\log \text{G.M.} = \frac{1}{n} \sum_{i=1}^{n} \log x_i$

or, $\text{G.M.} = \operatorname{antilog}\left(\frac{1}{n} \sum_{i=1}^{n} \log x_i\right)$

For a grouped frequency distribution, the geometric mean G.M. is

$\text{G.M.} = \left(x_1^{f_1} \cdot x_2^{f_2} \cdots x_n^{f_n}\right)^{1/N}$, where $N = \sum_{i=1}^{n} f_i$

Taking logarithms on both sides, we get

$\log \text{G.M.} = \frac{1}{N} (f_1 \log x_1 + f_2 \log x_2 + \cdots + f_n \log x_n) = \frac{1}{N} \sum_{i=1}^{n} f_i \log x_i$.
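The logarithmic route above translates directly into code. This sketch uses natural logarithms, with `math.exp` playing the role of the antilog; the base does not matter as long as the same base is used both ways. The function names and sample values are illustrative only.

```python
import math


def geometric_mean(values):
    """G.M. = antilog of the mean of the logs (positive values only)."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean needs strictly positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def grouped_geometric_mean(values, freqs):
    """G.M. for a frequency distribution: antilog of (1/N) * sum(f * log x)."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean needs strictly positive values")
    total = sum(freqs)
    return math.exp(sum(f * math.log(x) for x, f in zip(values, freqs)) / total)


gm_two = geometric_mean([2, 8])                    # square root of 16
gm_three = geometric_mean([1, 3, 9])               # cube root of 27
gm_grouped = grouped_geometric_mean([2, 8], [3, 1])  # (2^3 * 8)^(1/4)

print(round(gm_two, 6), round(gm_three, 6), round(gm_grouped, 6))
```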

Properties of Geometric Means

1. The logarithm of geometric mean is the arithmetic mean of the logarithms of given
values

2. If all the observations assumed by a variable are equal to a constant, say K > 0, then the G.M. of the
observations is also K

3. The geometric mean of the ratio of two variables is the ratio of the geometric means of
the two variables

4. The geometric mean of the product of two variables is the product of their geometric
means

Geometric Mean of a Combined Group

Suppose $G_1$ and $G_2$ are the geometric means of two series of sizes $n_1$ and $n_2$ respectively. The
geometric mean G of the combined group is given by:

$\log G = \dfrac{n_1 \log G_1 + n_2 \log G_2}{n_1 + n_2}$

or, $G = \operatorname{antilog}\left[\dfrac{n_1 \log G_1 + n_2 \log G_2}{n_1 + n_2}\right]$

In general, for k groups with geometric means $G_i$ and sizes $n_i$, i = 1 to k, we have

$G = \operatorname{antilog}\left[\dfrac{n_1 \log G_1 + n_2 \log G_2 + \cdots + n_k \log G_k}{n_1 + n_2 + \cdots + n_k}\right]$

Specific uses of G.M.:

The geometric mean has certain specific uses. Some of them are:

1. It is used in the construction of index numbers.
2. It is also helpful in finding out compound rates of change, such as the rate of growth
of population in a country.
3. It is suitable where the data are expressed in terms of rates, ratios and percentages.
4. It is quite useful in computing the average rates of depreciation or appreciation.
5. It is most suitable when large weights are to be assigned to small items and small weights
to large items.

Advantages of Geometric Mean

• A geometric mean is based upon all the observations
• It is rigidly defined
• It is not much affected by fluctuations of sampling
• It gives more weight to small items

Disadvantages of Geometric Mean

• A geometric mean is not easily understandable by a non-mathematical person
• If any of the observations is zero, the geometric mean becomes zero
• If any of the observations is negative, the geometric mean may become imaginary

4. Harmonic Mean

• A simple way to define the harmonic mean is to call it the reciprocal of the arithmetic mean
of the reciprocals of the observations.

• The most important condition for it is that none of the observations should be zero.

• The harmonic mean is used in the averaging of ratios.

• The most common examples of ratios are those of speed and time, cost and unit of material,
work and time, etc.

• The harmonic mean (H.M.) of n observations is

$\text{H.M.} = \dfrac{1}{\frac{1}{n} \sum_{i=1}^{n} \frac{1}{x_i}}$

In the case of a frequency distribution, the harmonic mean is given by

$\text{H.M.} = \dfrac{1}{\frac{1}{N} \sum_{i=1}^{n} \frac{f_i}{x_i}}$, where $N = \sum_{i=1}^{n} f_i$
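The harmonic mean formulas above can be sketched as follows. The speed example is hypothetical: a vehicle covers the same distance at 40 km/h and again at 60 km/h, which is the classic case where the harmonic mean, not the arithmetic mean, gives the true average speed.

```python
def harmonic_mean(values):
    """Reciprocal of the arithmetic mean of the reciprocals."""
    if any(v == 0 for v in values):
        raise ValueError("harmonic mean is undefined when a value is zero")
    return len(values) / sum(1 / v for v in values)


def grouped_harmonic_mean(values, freqs):
    """H.M. for a frequency distribution: N divided by sum(f / x)."""
    if any(v == 0 for v in values):
        raise ValueError("harmonic mean is undefined when a value is zero")
    return sum(freqs) / sum(f / x for x, f in zip(values, freqs))


# Same distance travelled at 40 km/h and at 60 km/h (hypothetical trip).
avg_speed = harmonic_mean([40, 60])
naive_avg = (40 + 60) / 2

# Treating the two speeds as a frequency table with one leg each gives the same result.
grouped = grouped_harmonic_mean([40, 60], [1, 1])

print(round(avg_speed, 2), naive_avg, round(grouped, 2))
```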

Properties of Harmonic Mean

1. If all the observations taken by a variable are equal to a constant, say k, then the harmonic mean of
the observations is also k

2. For positive observations, the harmonic mean has the least value when compared to the geometric
mean and the arithmetic mean (H.M. ≤ G.M. ≤ A.M.)

Advantages of Harmonic Mean

• A harmonic mean is rigidly defined
• It is based upon all the observations
• It is not much affected by fluctuations of sampling
• More weight is given to smaller items

Disadvantages of Harmonic Mean

• Not easily understandable
• Difficult to compute

Relationship between Arithmetic Mean, Harmonic Mean, and Geometric Mean of Two
Numbers

For two numbers x and y, let x, a, y be a sequence of three numbers.

• If x, a, y is an arithmetic progression then 'a' is called the arithmetic mean.
• If x, a, y is a geometric progression then 'a' is called the geometric mean.
• If x, a, y is a harmonic progression then 'a' is called the harmonic mean.

Let AM = arithmetic mean,

GM = geometric mean,

and HM = harmonic mean.

The relationship between the three is given by the formula

AM × HM = GM²
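The relationship can be verified for any pair of positive numbers. A quick check with the hypothetical pair 4 and 9:

```python
import math

x, y = 4, 9   # any two positive numbers will do

am = (x + y) / 2            # arithmetic mean
gm = math.sqrt(x * y)       # geometric mean
hm = 2 * x * y / (x + y)    # harmonic mean of two numbers

# AM x HM equals GM squared, and the three means are ordered HM <= GM <= AM.
product = am * hm
print(am, gm, round(hm, 4), round(product, 6))
```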

(II) Median

• Median is the middle value of the series when arranged in order of magnitude.
• When a series is divided into more than two parts, the dividing values are called partition
values.

How to calculate Median?

The very first thing to be done with raw data is to arrange them in ascending or descending
order.

In layman’s terms:

For an odd number of observations: Median = the middle number

For example: 5, 8, 10, 6, 15

Arrange in ascending order: 5, 6, 8, 10, 15

As we have 5 numbers, the middle number will be the 3rd number, which can also be
calculated as

{(n+1)/2}th number = (5+1)/2 = 6/2 = 3rd number, which is 8

So the median is 8

For an even number of observations: there is no value exactly in the middle of the series. In
such a situation the median is conventionally taken to be halfway between the two
middle items.

For example: 19, 8, 9, 6, 12, 5

Ascending order: 5, 6, 8, 9, 12, 19

Here we have 6 numbers, so (n+1)/2 = (6+1)/2 = 3.5

So find the average of the 3rd and 4th numbers = (8+9)/2 = 8.5

So 8.5 is the median
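The two cases above can be sketched in a few lines. The odd case uses the five values of the sorted list shown earlier (5, 6, 8, 10, 15) and the even case uses the six-value example; `median_of` is an illustrative helper name.

```python
def median_of(values):
    """Middle value for an odd count, mean of the two middle values otherwise."""
    if not values:
        raise ValueError("median of an empty series is undefined")
    ordered = sorted(values)          # step 1: arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # the ((n + 1) / 2)th item
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_case = median_of([5, 8, 10, 6, 15])
even_case = median_of([19, 8, 9, 6, 12, 5])

print(odd_case, even_case)
```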

For grouped data the median is calculated using the formula:

$\text{Median} = l + \dfrac{\frac{N}{2} - c}{f} \times h$

where
l = lower limit of the median class,
c = cumulative frequency of the class preceding the median class,
f = frequency of the median class,
h = size of the median class interval,
N = total number of observations, i.e. the sum of the frequencies.
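Using the symbols just defined, a grouped median can be sketched as below. The sketch assumes the usual interpolation formula Median = l + ((N/2 − c) / f) × h; the class limits and frequencies are made up for illustration.

```python
def grouped_median(classes, freqs):
    """Median of a continuous series.

    classes: list of (lower, upper) class limits in ascending order.
    freqs:   frequency of each class.
    """
    total = sum(freqs)                      # N
    if total == 0:
        raise ValueError("empty distribution")
    half = total / 2
    cumulative = 0                          # c, frequency before the current class
    for (lower, upper), f in zip(classes, freqs):
        if cumulative + f >= half:          # first class whose cumulative count reaches N/2
            h = upper - lower
            return lower + (half - cumulative) / f * h
        cumulative += f
    raise ValueError("classes and frequencies do not match")


# Hypothetical distribution of marks.
classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
freqs = [5, 15, 20, 10]

med = grouped_median(classes, freqs)
print(med)
```

Here N = 50, so the median class is 20–30 (c = 20, f = 20, h = 10), giving 20 + (25 − 20)/20 × 10.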

Related Positional Measures:

The median divides the series into two equal parts.

Similarly, there are certain other measures which divide the series into a number of equal parts.

Quartiles:
Quartiles are the measures which divide the data into four equal parts; each portion contains
an equal number of observations.

• There are three quartiles.
• If a statistical series is divided into four equal parts, the end value of each part is called a
quartile and denoted by ‘Q’.
• The lower half of a data set is the set of all values that are to the left of the median value
when the data has been put into increasing order.
• The upper half of a data set is the set of all values that are to the right of the median value
when the data has been put into increasing order.

1. The first quartile, denoted by Q1, is the median of the lower half of the data
set. This means that about 25% of the numbers in the data set lie below Q1 and
about 75% lie above Q1.

2. The second quartile, also called the median and denoted by Q2, has 50% of the
items below it and 50% of the items above it.

3. The third quartile, denoted by Q3, is the median of the upper half of the
data set. This means that about 75% of the numbers in the data set lie below Q3
and about 25% lie above Q3.
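The median-of-halves description above can be sketched as follows. Note that this is one of several quartile conventions; positional rules such as "size of the (N+1)/4 th item" can give slightly different values on small data sets. The data are hypothetical.

```python
def median_of(ordered):
    """Median of an already sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves rule."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = ordered[:half]
    # For an odd count the median itself belongs to neither half.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median_of(lower), median_of(ordered), median_of(upper)


even_q = quartiles([7, 15, 36, 39, 40, 41])
odd_q = quartiles([1, 2, 3, 4, 5, 6, 7])

print(even_q, odd_q)
```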

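Formulae for calculating the median and partition values:

Measure | Individual and discrete series | Continuous series
Median | Size of the ((N + 1)/2)th item | Size of the (N/2)th item; Median = l + ((N/2 − c)/f) × h
First quartile (Q1) | Size of the ((N + 1)/4)th item | Size of the (N/4)th item; Q1 = l + ((N/4 − c)/f) × h
Third quartile (Q3) | Size of the (3(N + 1)/4)th item | Size of the (3N/4)th item; Q3 = l + ((3N/4 − c)/f) × h

In the continuous-series formulae, l, c, f and h refer to the class that contains the measure
being calculated.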

Deciles
Deciles divide the series into ten equal parts and are generally denoted by D.

• There are nine deciles, D1, D2, …, D9, which are called the first decile, second
decile and so on.

Percentiles
Percentiles divide the series into a hundred equal parts and are generally denoted by P.

Merits of Median:

(i) Simple measure of central tendency.
(ii) It is not affected by extreme observations.
(iii) Possible even when data is incomplete.
(iv) Median can be determined by graphic presentation of data.
(v) It has a definite value.
(vi) Simple to calculate and understand.
(vii) It is a positional value, not a calculated value.

Demerits of Median:

(i) Not based on all the items in the series, as it indicates the value of the middle items.
(ii) Not suitable for algebraic treatment.
(iii) Arranging the data in ascending order takes much time.
(iv) Affected by fluctuations of items.
(v) It cannot be computed exactly where the number of items in a series is even.

Mode
• Mode is that value of the variable which occurs or repeats itself the maximum number of
times.
• The mode is the most “fashionable” size in the sense that it is the most common and typical,
and is defined by Zizek as “the value occurring most frequently in a series of items and around
which the other items are distributed most densely.”
• In the words of Croxton and Cowden, the mode of a distribution is the value at the point
where the items tend to be most heavily concentrated.
• According to A.M. Tuttle, mode is the value which has the greatest frequency density in
its immediate neighborhood.
• In the case of individual observations, the mode is that value which is repeated the
maximum number of times in the series. The value of the mode can also be denoted by the letter
‘Z’.

Mode for continuous series:

$\text{Mode} = L + \dfrac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i$

where
L = lower limit of the class where the mode lies (the modal class),
i = class interval,
f0 = frequency of the class preceding the modal class,
f1 = frequency of the class where the mode lies,
f2 = frequency of the class succeeding the modal class.
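With the symbols defined above, the mode of a continuous series can be sketched as below, assuming the usual interpolation formula Mode = L + (f1 − f0) / (2·f1 − f0 − f2) × i and equal class widths. The distribution is hypothetical.

```python
def grouped_mode(lower_limits, width, freqs):
    """Mode of a continuous series with equal class widths."""
    k = max(range(len(freqs)), key=freqs.__getitem__)   # index of the modal class
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0                   # class before the modal class
    f2 = freqs[k + 1] if k < len(freqs) - 1 else 0      # class after the modal class
    denom = 2 * f1 - f0 - f2
    if denom == 0:
        raise ValueError("mode is ill-defined for this distribution")
    return lower_limits[k] + (f1 - f0) / denom * width


# Hypothetical classes 0-10, 10-20, ..., 40-50.
lower_limits = [0, 10, 20, 30, 40]
freqs = [5, 8, 12, 9, 6]

mode_value = grouped_mode(lower_limits, 10, freqs)
print(round(mode_value, 2))
```

The modal class is 20–30 (f1 = 12, f0 = 8, f2 = 9), so the mode is 20 + 4/7 × 10.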

Merits of mode:

(i) Simple and popular measure of central tendency.
(ii) It can be located graphically with the help of a histogram.
(iii) Less effect of marginal values.
(iv) No need of knowing all the items of the series.
(v) It is the most representative value in the given series.
(vi) It is less affected by extreme values.

Demerits of mode:

(i) It is an uncertain measure.
(ii) It is not capable of algebraic treatment.
(iii) Procedure of grouping is complex.
(iv) It is not based on all observations.
(v) For bi-modal and tri-modal series, it is difficult to calculate.
(vi) Its value is not based on each and every item of the series.
(vii) If items are identical, it is difficult to identify the modal value.

Relation among mean, median and mode (an empirical relation that holds approximately for
moderately skewed distributions):

Mode = 3 Median – 2 Mean

DISPERSION
➢ According to Dr. Bowley, “dispersion is the measure of the variation between items.”

➢ Dispersion refers to the variation of the items around an average.

➢ Measures of dispersion measure how spread out a set of data is.

Objectives of Dispersion
a) To determine the reliability of an average.
b) To compare the variability of two or more series.
c) It serves as the basis of other statistical measures such as correlation etc.
d) It serves as the basis of statistical quality control.

Properties of a good measure of Dispersion

a) It should be easy to understand.
b) It should be easy to calculate.
c) It should be rigidly defined.
d) It should be based on all observations.
e) It should not be unduly affected by extreme values.

Classification of Measures of Dispersion

Absolute measures | Relative measures
Range | Coefficient of range
Quartile deviation | Coefficient of quartile deviation
Standard deviation | Coefficient of standard deviation
Mean deviation | Coefficient of mean deviation

Range:
• It is the simplest method of studying dispersion. Range is the difference between the
largest value and the smallest value of a series.

• While computing range, we do not take into account the frequencies of different groups.

• If X max and X min are the two extreme observations, then

Range = X max – X min

Co-efficient of Range = (X max – X min) / (X max + X min)
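A short sketch of both quantities, using hypothetical observations:

```python
# Hypothetical observations.
data = [12, 7, 25, 18, 30]

x_max, x_min = max(data), min(data)

data_range = x_max - x_min                        # absolute measure
coeff_range = (x_max - x_min) / (x_max + x_min)   # relative measure, unit-free

print(data_range, round(coeff_range, 4))
```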

Merits of Range

1. It is simple to understand and easy to calculate.
2. It is widely used in statistical quality control.

Demerits of Range

1. It is affected by extreme values in the series.
2. It cannot be calculated in the case of open-end series.
3. It is not based on all items.

Quartile Deviation

✓ Quartile deviation takes into account only the values of the upper
quartile (Q3) and the lower quartile (Q1).

✓ The inter-quartile range is the difference between the upper quartile (Q3) and the lower
quartile (Q1).

✓ Quartile deviation is half of the inter-quartile range, and is therefore also called the
‘semi-inter-quartile range’.

✓ Quartile deviation and its coefficient can be obtained as:

Inter-quartile range = Q3 – Q1

Quartile deviation (semi-inter-quartile range) = (Q3 – Q1) / 2

Co-efficient of quartile deviation = (Q3 – Q1) / (Q3 + Q1)
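The three quantities follow directly from Q1 and Q3. The sketch below assumes the standard definitions: inter-quartile range = Q3 − Q1, quartile deviation = (Q3 − Q1)/2, and coefficient = (Q3 − Q1)/(Q3 + Q1). The quartile values are hypothetical.

```python
# Hypothetical quartiles of some series.
q1, q3 = 15, 40

iqr = q3 - q1                        # inter-quartile range
quartile_dev = iqr / 2               # quartile deviation (half the range)
coeff_qd = (q3 - q1) / (q3 + q1)     # relative measure, unit-free

print(iqr, quartile_dev, round(coeff_qd, 4))
```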

Merits of Q.D.

1. Easy to compute.
2. Less affected by extreme values.
3. Can be computed in open-ended series.
4. It overcomes the main drawbacks of range.

Demerits of Q.D.

1. Not based on all observations.
2. It ignores 50% of the data (the first 25% and the last 25%).
3. It is influenced by change in sample and suffers from instability.

Mean Deviation:
Mean deviation (also called average deviation) is the value obtained by taking the average of the
absolute deviations of the various items from a measure of central tendency (mean, median or
mode), that is, ignoring negative signs.
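The definition above amounts to averaging absolute deviations. A minimal sketch with hypothetical data; the coefficient shown assumes the usual definition (mean deviation divided by the average it was taken from).

```python
from statistics import mean, median


def mean_deviation(values, about):
    """Average of the absolute deviations from the chosen central value."""
    return sum(abs(v - about) for v in values) / len(values)


data = [2, 4, 6, 8, 10]   # hypothetical observations

md_from_mean = mean_deviation(data, mean(data))
md_from_median = mean_deviation(data, median(data))
coeff_md = md_from_mean / mean(data)

print(md_from_mean, md_from_median, coeff_md)
```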

Merits of Mean Deviation

1. Based on all observations.
2. It is less affected by extreme values.
3. Simple to understand and easy to calculate.
4. It is a good index of score density at the middle of the distribution.
5. Quartiles are useful in indicating the skewness of a distribution.

Demerits of Mean Deviation

1. It ignores ± signs in deviations.
2. It is difficult to compute when deviations come in fractions.
3. M.D. and its co-efficient taken from the mean (X), median (M) and mode (Z) often differ.

Standard Deviation:

“Standard deviation or S.D. is the square root of the mean of the squared deviations of the
individual scores from the mean of the distribution.”

In other words, standard deviation is calculated as the square root of the average of the squared
deviations taken from the actual mean.

✓ It is denoted by the Greek letter sigma, σ.

✓ It is also called root mean square deviation.

✓ The square of the standard deviation is called ‘variance’. It is denoted by σ².

Properties of Standard Deviation:

1. If each variate value is increased by the same constant value, the S.D. of the
distribution remains unchanged.
2. When a constant value is subtracted from each variate, the S.D. of the new
distribution remains unchanged.
3. If each observed value is multiplied by a constant value, the S.D. of the new observations
is also multiplied by the (absolute value of the) same constant.
4. If each observed value is divided by a constant value, the S.D. of the new observations is
also divided by the (absolute value of the) same constant.
5. Thus, to conclude, S.D. is independent of change of origin (addition, subtraction) but
dependent on change of scale (multiplication, division).
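The origin and scale properties can be checked with `statistics.pstdev`, which computes the population standard deviation. The data set is a hypothetical one chosen so that the S.D. is a round number.

```python
from statistics import pstdev

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical scores, mean 5

sd = pstdev(data)

# Change of origin: adding or subtracting a constant leaves S.D. unchanged.
sd_shifted = pstdev([x + 10 for x in data])

# Change of scale: multiplying by a constant multiplies S.D. by that constant.
sd_scaled = pstdev([3 * x for x in data])

# Variance is the square of the standard deviation.
variance = sd ** 2

print(sd, sd_shifted, sd_scaled, variance)
```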

Standard deviation for a sample: $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$

Standard deviation for a population: $\sigma = \sqrt{\dfrac{\sum (x - \mu)^2}{N}}$

The standard deviation for ungrouped data is defined as

$\sigma = \sqrt{\dfrac{\sum d^2}{N}}$

where d = deviation of individual scores from the mean
(some authors use ‘x’ for the deviation of individual scores from the mean);

∑ = sum total of;

N = total number of cases.

Computation of S.D. (grouped data):

$\sigma = \sqrt{\dfrac{\sum f d^2}{N}}$, where f is the frequency of each value or class and N = ∑f.

Co-efficient of standard deviation = S.D. / Mean
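The difference between the sample and population versions lies only in the divisor. The sketch below assumes the standard definitions (divide by N for a population, by n − 1 for a sample) and uses hypothetical data.

```python
import math


def population_sd(values):
    """Square root of the mean squared deviation (divisor N)."""
    n = len(values)
    m = sum(values) / n
    return math.sqrt(sum((v - m) ** 2 for v in values) / n)


def sample_sd(values):
    """Same sum of squared deviations, but with divisor n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample S.D. needs at least two observations")
    m = sum(values) / n
    return math.sqrt(sum((v - m) ** 2 for v in values) / (n - 1))


data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical scores, mean 5

sigma = population_sd(data)
s = sample_sd(data)
coeff_sd = sigma / (sum(data) / len(data))   # co-efficient of standard deviation

print(sigma, round(s, 4), coeff_sd)
```

The sample figure is always a little larger than the population figure for the same data, because the divisor is smaller.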

Merits of Standard Deviation

i. Rigidly defined and its value is always definite.
ii. Based on all observations.
iii. Takes algebraic signs into consideration.
iv. Amenable to further algebraic treatment.
v. It is less affected by fluctuations of sampling.
vi. It provides a standard unit of measure that possesses comparable
meaning from one test to another.
vii. Moreover, the normal curve is directly related to S.D.

Demerits of Standard Deviation

i. Difficult to understand and compute.
ii. Affected by extreme items.

Uses of S.D:

(i) When the most accurate, reliable and stable measure of variability is wanted.

(ii) When more weight is to be given to extreme deviations from the mean.

(iii) When coefficient of correlation and other statistics are subsequently computed.

(iv) When measures of reliability are computed.

(v) When scores are to be properly interpreted with reference to the normal curve.

(vi) When standard scores are to be computed.

(vii) When we want to test the significance of the difference between two statistics.

(viii) When coefficient of variation, variance, etc. are calculated.

Coefficient of Dispersion
• Whenever we want to compare the variability of two series which differ widely in
their averages, or whose units of measurement are different, we need to calculate the
coefficients of dispersion along with the measures of dispersion.

• The coefficients of dispersion (C.D.) are based on the different measures of dispersion.

• The coefficient of variation (C.V.) is 100 times the coefficient of dispersion based on
standard deviation.

C.V. = 100 × (S.D. / Mean)

• C.V. expresses σ as a percentage of the mean. It is thus a ratio which is
independent of the units of measurement.
• C.V. is restricted in its use owing to certain ambiguities in its interpretation. It is defensible
when used with ratio scales, that is, scales in which the units are equal and there is a true zero or
reference point.
• Two cases arise in the use of C.V. with ratio scales:

(1) When units are dissimilar, and

(2) When means are unequal, the units of the scale being the same.
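The first case (dissimilar units) can be illustrated with a small sketch. The heights and weights are hypothetical; both have the same standard deviation in their own units, yet their relative variability differs.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """C.V. = 100 * (S.D. / Mean), using the population S.D."""
    return 100 * pstdev(values) / mean(values)


# Hypothetical measurements in different units.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 60, 70, 80, 90]

cv_height = coefficient_of_variation(heights_cm)
cv_weight = coefficient_of_variation(weights_kg)

# Being unit-free, the two percentages can be compared directly.
print(round(cv_height, 2), round(cv_weight, 2))
```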

Types of Distributions

Bernoulli Distribution

• A Bernoulli distribution has only two possible outcomes, namely 1 (success) and 0
(failure), and a single trial.

• So a random variable X which has a Bernoulli distribution takes the value 1 with the
probability of success, say p, and the value 0 with the probability of failure, say q or 1 − p.

• For example, when an unbiased coin is tossed,

✓ the occurrence of a head denotes success, and

✓ the occurrence of a tail denotes failure.

• Probability of getting a head = probability of getting a tail = 0.5, since the two
outcomes of an unbiased coin are equally likely.

• The probability mass function is given by:

$P(X = x) = p^{x}(1-p)^{1-x}$, where $x \in \{0, 1\}$.

✓ It can also be written as

$P(X = x) = \begin{cases} 1 - p, & x = 0 \\ p, & x = 1 \end{cases}$

• The probabilities of success and failure need not be equal.

✓ For instance, consider a fight between me and the Undertaker. He is pretty much certain
to win, so the probability of my success is 0.15 while that of my failure is 0.85.

• Here, the probability of success (p) is not the same as the probability of failure.

The expected value of a random variable X from a Bernoulli distribution is found
as follows:

E(X) = 1*p + 0*(1-p) = p

The variance of a random variable from a Bernoulli distribution is:

Var(X) = E(X²) – [E(X)]² = p – p² = p(1-p)
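The mean p and variance p(1 − p) can be recovered directly from the probability mass function. The sketch below uses p = 0.15 from the fight example; `bernoulli_pmf` is an illustrative helper name.

```python
def bernoulli_pmf(x, p):
    """P(X = x) = p^x * (1 - p)^(1 - x) for x in {0, 1}, else 0."""
    if x not in (0, 1):
        return 0.0
    return p ** x * (1 - p) ** (1 - x)


p = 0.15   # probability of success from the example above

# Expected value and variance computed from the definition.
expected = sum(x * bernoulli_pmf(x, p) for x in (0, 1))
variance = sum((x - expected) ** 2 * bernoulli_pmf(x, p) for x in (0, 1))

print(expected, round(variance, 4), round(p * (1 - p), 4))
```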

Binomial Distribution

• A binomial distribution describes the number of successes in a fixed number of trials, where
only two outcomes are possible in each trial (such as success or failure, gain or loss, win or
lose) and where the probability of success is the same for all the trials.

• The outcomes need not be equally likely.

✓ For example, if the probability of success in an experiment is 0.2 then the probability of
failure can be easily computed as q = 1 – 0.2 = 0.8.

• The properties of a binomial distribution are:

1. Each trial is independent.

2. There are only two possible outcomes in a trial: either a success or a failure.

3. A total number of n identical trials are conducted.

4. The probability of success and failure is the same for all trials. (Trials are identical.)

The mathematical representation of the binomial distribution (probability mass
function) is given by:

$P(X = x) = \binom{n}{x} p^{x} q^{n-x}, \quad x = 0, 1, \ldots, n$, where $q = 1 - p$

The parameters of the binomial distribution are n and p.

The mean of a binomial distribution is given by:

Mean = µ = n*p

The variance of a binomial distribution is given by:

Variance = Var(X) = n*p*q
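The mean n·p and variance n·p·q can be checked numerically. The sketch assumes the standard binomial probability mass function P(X = x) = C(n, x) · p^x · q^(n − x) and uses the 20 coin tosses from the examples in these notes.

```python
from math import comb


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


n, p = 20, 0.5   # tossing a fair coin 20 times and counting tails

probs = [binomial_pmf(x, n, p) for x in range(n + 1)]

total = sum(probs)                                          # should be 1
mean = sum(x * pr for x, pr in enumerate(probs))            # should be n * p
variance = sum((x - mean) ** 2 * pr for x, pr in enumerate(probs))  # n * p * q

print(round(total, 6), round(mean, 6), round(variance, 6))
```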

Examples of binomial experiments


• Tossing a coin 20 times to see how many tails occur.

• Asking 200 people if they watch ABC news.

• Rolling a die to see if a 5 appears.

Examples which aren't binomial experiments


• Rolling a die until a 6 appears (not a fixed number of trials)

• Asking 20 people how old they are (not two outcomes)

• Drawing 5 cards from a deck for a poker hand (done without replacement, so not
independent)

Normal distribution:
• The normal distribution approximately describes the behaviour of many naturally occurring
quantities.
• The sum of a large number of small independent random variables often turns out to be
approximately normally distributed, which contributes to its widespread application.
• A normal distribution has the following characteristics:
1. The mean, median and mode of the distribution coincide.
2. The curve of the distribution is bell-shaped and symmetrical about the line x = µ.
3. The total area under the curve is 1.
4. The mean divides the curve into 2 equal parts.
5. Its quartile deviation, Q.D. ≈ 2/3 σ
6. Its mean deviation, M.D. ≈ 4/5 σ
7. The X axis is an asymptote to the curve (the curve approaches the axis ever more closely
but never touches it).
8. It is a unimodal distribution.
9. The area under the curve within the central limits is as under:

Limits | Area %
µ ± σ | 68.27
µ ± 1.96σ | 95
µ ± 2σ | 95.45
µ ± 3σ | 99.73

• The normal distribution is continuous, whereas the binomial distribution is discrete.
• However, as the number of trials becomes large, the shape of the binomial distribution
approaches that of the normal distribution.
• The PDF of a random variable X following a normal distribution is given by:

$f(x) = \dfrac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty$

The parameters of the normal distribution are µ (mean) and σ (standard deviation).

The mean and variance of a normally distributed random variable X are given by:

Mean = E(X) = µ

Variance = Var(X) = σ²
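The central areas in the table above can be reproduced with the error function from Python's `math` module. This sketch assumes the standard normal density and uses the identity that the area within µ ± kσ equals erf(k / √2); the function names are illustrative.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and S.D. sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def normal_cdf(x, mu, sigma):
    """P(X <= x), written in terms of the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


def central_area(k):
    """Area under the curve between mu - k*sigma and mu + k*sigma."""
    return math.erf(k / math.sqrt(2))


# Percentage of the area within 1, 1.96, 2 and 3 standard deviations.
areas = {k: round(100 * central_area(k), 2) for k in (1, 1.96, 2, 3)}
print(areas)

# The curve is symmetric: half of the area lies below the mean.
half = normal_cdf(50, 50, 10)
```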

Poisson distribution
• The Poisson distribution is a discrete probability distribution with a single parameter ‘m’
(its mean).

• It arises as the limiting form of the binomial distribution when the number of trials is
very large.

• If the probability of success “p” is small and the number of trials “n” is large, the binomial
distribution can be approximated by the Poisson distribution with m = np.

• As m increases, the distribution shifts to the right.

• Every Poisson distribution is skewed to the right, since most of the probability sits on small
counts. This is why the Poisson distribution has been called the probability distribution of
rare events.

Assumptions of Poisson distribution:

 A distribution is called Poisson distribution when the following assumptions are valid:

1. Any successful event should not influence the outcome of another successful event.

2. The probability of success over a short interval must equal the probability of success over a
longer interval.

3. The probability of success in an interval approaches zero as the interval becomes smaller.

Now, if any distribution validates the above assumptions then it is a Poisson distribution.

Some examples are

1. The number of emergency calls recorded at a hospital in a day.

2. The number of thefts reported in an area on a day.

3. The number of customers arriving at a salon in an hour.

4. The number of suicides reported in a particular city.

5. The number of printing errors on each page of a book.

Here, X (the number of events in the interval) is called a Poisson random variable, and the
probability distribution of X is called the Poisson distribution.

The probability distribution of X following a Poisson distribution is given by:

$$P(X = x) = \frac{e^{-\mu}\,\mu^{x}}{x!}, \quad x = 0, 1, 2, \ldots$$

The mean µ is the parameter of this distribution.

µ is also defined as λ times the length of the interval, where λ is the average number of events per unit interval. For an interval of unit length, µ = λ.

The mean and variance of X following a Poisson distribution:

Mean = E(X) = µ = λ
Variance = Var(X) = µ = λ

Standard deviation = SD(X) = √µ = √λ
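The equality of the mean and the variance can be verified numerically. The sketch below assumes the standard Poisson probability formula P(X = x) = e^(−µ) µ^x / x!, together with a hypothetical rate of 4 calls per hour observed over half an hour; none of these numbers come from the notes.

```python
from math import exp, factorial, sqrt

def poisson_pmf(x: int, mu: float) -> float:
    """P(X = x) for a Poisson variable with mean mu."""
    return exp(-mu) * mu**x / factorial(x)

rate_per_hour = 4.0         # hypothetical λ: average calls per hour
hours = 0.5                 # hypothetical length of the interval
mu = rate_per_hour * hours  # µ = λ × length of interval

# Sum over enough values of x that the neglected tail is negligible.
support = range(60)
mean = sum(x * poisson_pmf(x, mu) for x in support)
variance = sum((x - mean) ** 2 * poisson_pmf(x, mu) for x in support)
sd = sqrt(variance)

print(mean, variance, sd)
```

The computed mean and variance both come out equal to µ (up to rounding), and the standard deviation equals √µ.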

Exponential Distribution

 The exponential distribution is widely used in survival analysis.

 From the expected life of a machine to the expected life of a human, the exponential distribution
is used to model the time until an event occurs.

 Consider the call center example once more. What about the interval of time between the
calls?

 Here the exponential distribution comes to our rescue: it models the interval of time between
successive calls.

Other examples are:

1. Length of time between metro arrivals,

2. Length of time between arrivals at a gas station

3. The life of an Air Conditioner

A random variable X is said to have an exponential distribution if its PDF is:

$$f(x) = \lambda e^{-\lambda x} \ \text{for } x \ge 0, \qquad f(x) = 0 \ \text{for } x < 0$$

 Here λ > 0 is the parameter of the distribution, often called the rate
parameter.

 The distribution is supported on the interval [0, ∞).

 If a random variable X has this distribution, we write X ~ Exp(λ).

In survival analysis, λ is called the failure rate of a device at any time t, given that it has
survived up to t.

Mean and Variance of a random variable X following an exponential distribution:

Mean = E(X) = 1/λ

Variance = Var(X) = (1/λ)²

Standard Deviation = SD(X) = 1/λ

The standard deviation is equal to the mean.

 The greater the rate λ, the faster the curve drops.

 The lower the rate, the flatter the curve.
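The properties above can be sketched in a few lines of Python. The density used is the standard exponential PDF λe^(−λx); the cumulative distribution function 1 − e^(−λx) is not stated in the notes and is introduced here as a known result, and the rate of 0.5 calls per minute is a hypothetical value.

```python
from math import exp

def exponential_pdf(x: float, lam: float) -> float:
    """Density of Exp(lam) at x; zero for negative x."""
    return lam * exp(-lam * x) if x >= 0 else 0.0

def exponential_cdf(x: float, lam: float) -> float:
    """P(X <= x) for X ~ Exp(lam)."""
    return 1 - exp(-lam * x) if x >= 0 else 0.0

lam = 0.5                     # hypothetical rate: 0.5 calls per minute
mean = 1 / lam                # E(X) = 1/λ
variance = (1 / lam) ** 2     # Var(X) = (1/λ)²
sd = variance ** 0.5          # SD(X) = 1/λ, equal to the mean

# Probability of waiting more than 3 minutes for the next call.
p_wait_over_3 = 1 - exponential_cdf(3.0, lam)
print(mean, sd, p_wait_over_3)

# A higher rate starts higher at x = 0 and drops faster.
print(exponential_pdf(0.0, 2.0), exponential_pdf(3.0, 2.0))
print(exponential_pdf(0.0, 0.5), exponential_pdf(3.0, 0.5))
```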

Data collection:
 Collection of data is the first and most important stage in any statistical survey. The
method for collection of data depends upon various factors such as objective, scope, nature of
investigation and availability of resources.

Sources of Data

There are two sources of data in Statistics.

1. Statistical sources refer to data that are collected for some official purposes and include
censuses and officially conducted surveys.

2. Non-statistical sources refer to the data that are collected for other administrative purposes
or for the private sector.

➢ Statistical Survey: A statistical survey is normally conducted using a sample, and is therefore
also called a sample survey. It is the method of collecting sample data and analyzing it using
statistical methods in order to estimate population characteristics.

➢ Census: In contrast to a sample survey, a census collects data on every item of the population,
and the data are then analyzed. Data collection happens for a specific reference period. For
example, the Census of India is conducted every 10 years. Other censuses are conducted
roughly every 5-10 years. Data is collected using questionnaires that may be mailed to the
respondents. Responses can also be collected over other modes of communication such as the
telephone.

➢ Register: Registers are basically storehouses of statistical information from which data can
be collected and analysis can be made. Registers tend to be detailed and extensive. It is
beneficial to use data from here as it is reliable. Two or more registers can be linked
together based on common information for even more relevant data collection.

Types of Data

There are two types of data – primary data and secondary data.

1. Primary data is data collected for the first time, keeping in view the objective of the
survey. Interviews, questionnaires and telephone/mail surveys are all methods of collecting primary data.
2. Secondary data is any information that is used for the current investigation but was
originally collected and used by some other agency or person in a separate
investigation or survey.

 Both primary and secondary data may be collected either by census or by sampling
methods. An appropriate method can be adopted depending on how accurate the data needs to be
for the statistical survey.

Let’s learn data collection in detail:

Primary data
 Primary data is data collected by the investigator for the purpose of a
specific inquiry or study.
 Such data is original in character and is generated by a survey conducted by individuals
or a research institution or any organization.
 Primary data is likely to be more reliable. However, the cost of collecting such data is much
higher.
 Primary data is collected by either a census method or a sampling method.

Collection of primary data is done by a suitable method as per the following:

1. Direct personal observation: In the direct personal observation method, the investigator
collects data by having direct contact with the units of investigation. The accuracy of data
depends upon the ability, training and attitude of the investigator.

2. Indirect oral interview:


 Indirect oral interview is used when the area to be covered is large.
 The investigator collects the data from a third party or a witness or the head of an
institution.
 This method is generally used by the police department in cases related to enquiries on
the cause of fires, thefts or murders.
 Enquiry committees appointed by governments use this method to get people’s views and
every possible detail regarding the enquiry.
 This method is best suited when direct sources do not exist, cannot be relied upon, or would
be unwilling to take part.

3. Information through agencies:


 This method of collecting information through local agencies or correspondents is
generally adopted by newspapers and television channels.
 Local agents are appointed in different parts of the area under investigation.

 They send the desired information at regular intervals. This method is used where the
area to be covered is very large and periodic information is required.

4. Information through mailed questionnaires:


 Under this method, information is collected through questionnaires.
 The questionnaires are filled with questions pertaining to the investigation.
 They are sent to the respondents with a covering letter soliciting cooperation from the
respondents (respondents are the people who respond to questions in the questionnaire).
 The respondents are asked to give correct information and to mail the questionnaire back.

5. Information through a schedule filled by investigators:


 Information can be collected through schedules filled by investigators through personal
contact. In order to get reliable information, the investigator should be well trained, tactful,
unbiased and hard working.
 A schedule is suitable for an extensive area of investigation through investigator’s
personal contact.
 The problem of non-response is minimized. There is a difference between a schedule and
a questionnaire.
 A schedule is a form that the investigator fills personally, while surveying the units or
individuals from the sample (respondent).
 A questionnaire is a form sent (usually mailed) by an investigator to respondents. The
respondent has to fill it and then send it back to the investigator.

Secondary data:
 Any information that is used for the current investigation but is obtained from data
that has been collected and used by some other agency or person in a separate
investigation or survey is known as secondary data.
 Secondary data is available in published or unpublished form.

 In published form, secondary data is available in research papers, newspapers,
magazines, government publications, international publications, and websites.
 Secondary data was originally collected for a different purpose. Therefore, care should be
exercised while using it.
 The accuracy, reliability, objectives and scope of secondary data should be examined
thoroughly before use.
 Secondary data may be collected either by census or by sampling methods.

1. Published sources: The various sources of published data are:

➢ Reports and official publications of international and national organizations as well as
central and state governments
➢ Publications of several local bodies such as municipal corporations and district boards
➢ Financial and economic journals
➢ Annual reports of various companies
➢ Publications brought out by research agencies and research scholars
➢ Some of the journals (both academic and non-academic) are published at regular intervals
like yearly, monthly, weekly whereas, other publications are more ad hoc.
➢ The Internet is a powerful source of secondary data, which can be accessed at any time for
further analysis in the study.
2. Unpublished sources: Unpublished data such as records maintained by various
government and private offices, studies made by research institutions and scholars can
also be used where necessary.

Although the use of secondary data is economical in terms of expense, time and manpower
requirements, the researcher must be careful in choosing such secondary data.

Secondary data must possess the following characteristics:

1. Reliability of data

2. Suitability of the data

3. Adequacy of data

Basis | Primary data | Secondary data
Meaning | First-hand data gathered by the researcher himself | Data collected by someone else earlier
Data | Real-time data | Past data
Process | Very involved | Quick and easy
Source | Surveys, observations, experiments, questionnaires, personal interviews, etc. | Government publications, websites, books, journal articles, internal records, etc.
Cost effectiveness | Expensive | Economical
Collection time | Long | Short
Specific | Always specific to the researcher's needs | May or may not be specific to the researcher's needs
Available in | Crude form | Refined form
Accuracy and reliability | More | Relatively less

Questionnaire design
Questionnaire design is the process of designing the format and questions in the survey
instrument that will be used to collect data about a particular phenomenon.

In designing a questionnaire, all the various stages of survey design and implementation should
be considered.

Elements of Questionnaire design:


These include the following nine elements:

a. determination of goals, objectives, and research questions;
b. definition of key concepts;
c. generation of hypotheses and proposed relationships;
d. choice of survey mode (mail, telephone, face-to-face, Internet);
e. question construction;
f. sampling;
g. questionnaire administration and data collection;
h. data summarization and analysis;
i. conclusions and communication of results.

Points to be considered before forming the Questionnaire design


1. Initial considerations

i. Type of information required

ii. Type/nature of respondents

iii. Type and method by which survey is to be undertaken

2. Question content

i. Relevance of a question

ii. Clarity of a question

iii. Avoid ambiguous, leading, double-barrelled questions

iv. Ability and willingness of a respondent to answer the questions

3. Question phrasing

i. Style appropriate to target population

ii. Short, clear and unambiguous questions

iii. Avoid biased words and leading questions

iv. Avoid negative questions

v. Discourage guessing

vi. Do not take anything for granted on the part of the respondents

4. Types of questions

a. Closed-ended questions

i. Dichotomous

ii. Multiple choice (4 to 5 options; neutral point)

iii. Likert scale (Agree or disagree)

iv. Semantic differential (scale connecting bipolar words)

v. Importance scale (importance of some attribute)

vi. Rating scale (Excellent to poor)

b. Open-ended questions

i. Completely unstructured

ii. Word association (first word that comes to mind …)

iii. Sentence completion

iv. Story completion

v. Picture completion (filling balloons)

vi. Thematic Apperception Test (relate story to picture)

5. Question sequence

i. Logical order

ii. Avoid questions which suggest answers to later questions (bias)

6. Questionnaire layout

i. Good quality paper

ii. As short as possible (20-30 questions)

iii. Use lines, boxes, pictures, etc.

iv. Instructions kept to a minimum but user-friendly

v. Purpose of survey explained at the beginning and guarantee of confidentiality

vi. What is to be done with the completed questionnaire?

7. Pre-test, revision and final version of questionnaire

i. Uncover faults

ii. Misprints

iii. Grammatical mistakes

iv. Relevance of questions

v. Expected range of answers

Essentials of a good questionnaire


Success of this method of collection of data depends mainly on proper drafting of the
questionnaire. You have to keep the following points in mind while preparing a questionnaire:

1. The questionnaire should begin with an effort to awaken the respondent’s interest.
Important target questions should be asked in the middle of the questionnaire.

2. The respondent should not need much time to complete the questionnaire. It should be
short, not lengthy.

3. The questions asked should be well structured and unambiguous.

4. The questions asked should be in a proper logical sequence.

5. Questions should be unbiased and should not intrude on the privacy of the respondents.

6. The questionnaire should not require much writing.

7. Necessary instructions and a glossary should be given in the covering letter.

8. Questions involving technical jargon and mathematical calculations should be avoided.

9. All questions related to personal information (name, income, phone, address, etc.) of
the respondents should be either optional or asked in the last section of the questionnaire.

10. A pilot test should be conducted to detect weaknesses in the questionnaire designed.

Steps in Questionnaire Design:


The task of composing a questionnaire may be considered more an art than a science. It needs a
great deal of experience, expertise, and creativity. The steps are:

1. Determine the data to be collected
2. Determine the method to be used for data collection
3. Evaluate the contents of the questions
4. Decide on the type of questions and response format
5. Decide on the wording of questions
6. Decide on the questionnaire structure or physical format
7. Pretest, review and final draft

Statistical Survey
A Statistical Survey is a scientific process of collection and analysis of numerical data.

Surveys differ from each other with regard to their purpose, field of study, scope, and the source
of information. The standard requirements for any statistical study are:

• relevance

• timeliness

• accuracy of data gathered

Surveys are used by businesses to:

• Assess the level of their customer satisfaction

• Find out what products their customers choose

• Determine which section of the population is buying their products

Stages of Statistical surveys


Statistical surveys involve two stages namely –

1. Planning and
2. Execution.

Planning: A properly planned investigation can lead to the best results with the least cost and
time. There are five steps involved in planning the survey.

Steps involved in the planning phase:

1. Identify the nature of the problem
2. State the objectives of the investigation
3. Define the scope of the investigation
4. Identify the type of data
5. Organize the investigation


Execution phase:
 In the execution phase, controlled methods should be adopted at every stage of the survey to
check the accuracy, coverage, methods of measurement, analysis and interpretation.
 The collected data should be edited, classified, tabulated and presented in the form of
diagrams and graphs.
 The data should be carefully and systematically analyzed and interpreted.

Sampling – Concept, Process and Techniques

Sampling:
Sampling is the process of selecting a number of individuals for a study in such a way that the
individuals represent the larger group from which they were selected.

➢ Sample: A sample is “a smaller (but hopefully representative) collection of units from a
population used to determine truths about that population”
➢ Sampling Frame: A list of all elements or other units containing the elements in a
population. The sampling frame must be representative of the population
➢ Population: The larger group from which individuals are selected to participate in a
study
➢ Target population: A set of elements larger than or different from the population
sampled and to which the researcher would like to generalize study findings.

Process of sampling: The sampling process comprises several stages:

1. Defining the population of concern
2. Specifying a sampling frame, a set of items or events possible to measure
3. Specifying a sampling method for selecting items or events from the frame
4. Determining the sample size
5. Implementing the sampling plan
6. Sampling and collecting data
7. Reviewing the sampling process

Types of Sample:

Probability (Random) Samples | Non-Probability Samples
Simple random sample | Convenience sample
Systematic random sample | Purposive sample
Stratified random sample | Quota sample
Cluster sample |

1. Simple random sample:


➢ It is applicable when the population is small, homogeneous and readily available.
➢ All subsets of the frame of the same size are given an equal probability of selection.
➢ Each element of the frame thus has an equal probability of selection.
➢ It provides for the greatest number of possible samples. This is done by assigning a number
to each unit in the sampling frame.
➢ A table of random numbers or a lottery system is used to determine which units are to be
selected.

Pros:
 Estimates are easy to calculate.
 Simple random sampling is always an EPS (equal probability of selection) design, but not all
EPS designs are simple random sampling.

Cons:
 If the sampling frame is large, this method is impracticable.
 Minority subgroups of interest in the population may not be present in the sample
in sufficient numbers for study.
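The lottery idea behind simple random sampling can be sketched with Python's standard `random` module, which here plays the role of the random number table. The frame of 100 numbered units, the sample size of 10 and the function name are assumptions for illustration.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the frame, each subset of size n equally likely."""
    rng = random.Random(seed)   # a seed makes the draw reproducible
    return rng.sample(frame, n)

# Hypothetical sampling frame: units numbered 1 to 100.
frame = list(range(1, 101))
sample = simple_random_sample(frame, 10, seed=42)
print(sorted(sample))
```

Sampling is done without replacement, so no unit can appear twice in the sample.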

2. Systematic random sample:

 It is applicable when the given population is logically homogeneous.

 Systematic sampling relies on arranging the target population according to some
ordering scheme and then selecting elements at regular intervals through that ordered list.

 It involves a random start and then proceeds with the selection of every kth element from
then onwards. In this case, k=(population size/sample size).

 It is important that the starting point is not automatically the first in the list, but is instead
randomly chosen from within the first to the kth element in the list.

 In a systematic sample, after you decide the sample size, arrange the elements of the
population in some order and select terms at regular intervals from the list.

 A simple example would be to select every 10th name from the telephone directory (an
'every 10th' sample, also referred to as 'sampling with a skip of 10').

ADVANTAGES:
 Sample is easy to select
 Suitable sampling frame can be identified easily
 Sample is evenly spread over the entire reference population

DISADVANTAGES:
 A possible weakness of the method is an inherent periodicity in the list, which may compromise
the randomness of the sample, i.e. the sample may be biased.
 It is difficult to assess the precision of the estimate from one survey.
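Systematic selection with a random start can be sketched as below. The frame of 100 units and the sample size of 10 are hypothetical; the skip k is computed as population size divided by sample size, as described above, and the sketch assumes the sample size does not exceed the frame size.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Select every k-th unit after a random start within the first k units."""
    k = len(frame) // n              # sampling interval (skip)
    rng = random.Random(seed)
    start = rng.randrange(k)         # random start: index 0 .. k-1
    return frame[start::k][:n]       # every k-th unit, capped at n units

# Hypothetical ordered list of 100 units; a sample of 10 gives k = 10.
frame = list(range(1, 101))
sample = systematic_sample(frame, 10, seed=3)
print(sample)
```

If the ordered list has a cycle whose length matches k, every selected unit falls at the same point of the cycle, which is the periodicity problem noted above.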

3. Stratified random sample


 It is applicable when the population can be divided into groups based on characteristics of
importance for the research.
 The population is divided into two or more groups called strata, according to some
criterion, such as geographic location, grade level, age, or income, and subsamples are
randomly selected from each stratum.
 Every unit in a stratum has the same chance of being selected.
 Adequate representation of minority subgroups of interest can be ensured by stratification
& varying sampling fraction between strata as required.

ADVANTAGES:
 More accurate sample
 Can be used for both proportional and non-proportional samples
 Representation of subgroups in the sample

DISADVANTAGES:
 Identification of all members of the population can be difficult
 Identifying members of all subgroups can be difficult.
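A proportional stratified sample can be sketched as follows. The population of 80 urban and 20 rural units, the 10% sampling fraction and the function name are assumptions made for illustration; a non-proportional design would simply use a different fraction for each stratum.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, seed=None):
    """Draw a simple random subsample of the given fraction from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:                      # group units by stratum
        strata[stratum_of(unit)].append(unit)
    sample = []
    for name in sorted(strata):             # fixed order keeps runs reproducible
        members = strata[name]
        size = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, size))
    return sample

# Hypothetical population: 80 urban and 20 rural units.
population = [("urban", i) for i in range(80)] + [("rural", i) for i in range(20)]
sample = stratified_sample(population, lambda unit: unit[0], 0.10, seed=1)
print(sample)
```

Taking at least one unit per stratum is a design choice of this sketch; it guarantees that a small subgroup is never left out entirely.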

4. Cluster Sampling:

 In its common form, cluster sampling is an example of 'two-stage sampling'.

 It is the process of randomly selecting intact groups, not individuals, within the defined
population sharing similar characteristics.

 Clusters are locations within which an intact group of members of the population can be
found.

Examples: neighborhoods, school districts, schools, classrooms, etc.

Selection process

 In the first stage, a sample of areas is chosen.

 In the second stage, a sample of respondents within those areas is selected.

 The population is divided into clusters of homogeneous units, usually based on geographical
contiguity.

 Sampling units are groups rather than individuals.

 A sample of such clusters is then selected.

 All units from the selected clusters are studied.

There are two types of cluster sampling methods:

1. One-stage sampling: All of the elements within selected clusters are included in the
sample.
2. Two-stage sampling: A subset of elements within selected clusters is randomly selected
for inclusion in the sample.
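The difference between one-stage and two-stage cluster sampling can be sketched in code. The eight schools of 30 students each, the number of clusters chosen and the function name are hypothetical values used only for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, per_cluster=None, seed=None):
    """Randomly pick whole clusters; optionally subsample units within each one.

    per_cluster=None   -> one-stage: every unit of a chosen cluster is included.
    per_cluster=number -> two-stage: that many units are drawn per chosen cluster.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # stage 1: select clusters
    sample = []
    for name in chosen:
        members = clusters[name]
        if per_cluster is None:
            sample.extend(members)
        else:                                           # stage 2: select units
            sample.extend(rng.sample(members, min(per_cluster, len(members))))
    return chosen, sample

# Hypothetical population: 8 schools (clusters) of 30 students each.
clusters = {f"school_{i}": [f"s{i}_{j}" for j in range(30)] for i in range(8)}

chosen_one, one_stage = cluster_sample(clusters, 2, seed=7)
chosen_two, two_stage = cluster_sample(clusters, 2, per_cluster=5, seed=7)
print(chosen_one, len(one_stage))
print(chosen_two, len(two_stage))
```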

[Figure: one-stage sampling versus two-stage sampling]

5. Multi-Stage Sampling

 It is the combination of two or more of the methods described above.
 The population is divided into multiple clusters, and these clusters are then further divided and
grouped into various sub-groups (strata) based on similarity.
 One or more clusters can be randomly selected from each stratum. This process continues
until the cluster cannot be divided any further.
 For example, a country can be divided into states, cities, and urban and rural areas, and all the
areas with similar characteristics can be merged together to form a stratum.

Non-probability sampling:

1. Convenience Sampling

 It is the process of including whoever happens to be available at the time, that is, whoever is
readily available and convenient.

 It is sometimes also known as grab, opportunity, accidental or haphazard sampling.

 The researcher using such a sample cannot scientifically make generalizations about the
total population from this sample because it would not be representative enough.

 For example, if an interviewer were to conduct a survey at a shopping center early in the
morning on a given day, the people he/she could interview would be limited to those present
there at that time. Their views would not represent the views of other members of society in
that area, as they would if the survey were conducted at different times of day and several
times per week.

 This type of sampling is most useful for pilot testing.

 In social science research, snowball sampling is a similar technique, where existing study
subjects are used to recruit more subjects into the sample.

Advantages:
The sample is created quickly without adding any additional burden on the available resources.

Disadvantages:
Difficulty in determining how much of the effect (dependent variable) results from the cause
(independent variable).

2. Purposive sample:

 The researcher chooses the sample based on who they think would be appropriate for the
study.

 This is used primarily when there is a limited number of people that have expertise in
the area being researched
 It is the process whereby the researcher selects a sample based on experience or
knowledge of the group to be sampled
 It is also called “judgment” sampling

Advantages:
 Judgment sampling is less time consuming than other sampling techniques.
 Judgment sampling allows researchers to go directly to their target population of interest.

Disadvantages:
 Judgment sampling is prone to researcher bias.
 Potential for inaccuracy in the researcher’s criteria and resulting sample selections.

3. Quota Sampling

 Quota sampling is the non-probability equivalent of stratified sampling, discussed earlier.

 It starts with characterizing the population based on certain desired features and assigns a
quota to each subset of the population.

 The population is first segmented into mutually exclusive sub-groups, just as in stratified
sampling.

 Then judgment is used to select subjects or units from each segment based on a specified
proportion.

 For example, an interviewer may be told to sample 200 females and 300 males between
the ages of 45 and 60.

 It is this second step, selection by judgment rather than at random, that makes the technique
one of non-probability sampling.

 In quota sampling the selection of the sample is non-random.

Advantage:
 This process can be extended to cover several characteristics and varying degrees of complexity.

Disadvantage:
 People who are less accessible (more difficult to contact, more reluctant to participate) are
under-represented.

4. Snowball Sampling

 Just as the snowball rolls and gathers mass, the sample constructed in this way will grow
in size as you move through the process of conducting a survey.
 In this technique, you rely on your initial respondents to refer you to the next respondents
whom you may connect with for the purpose of your survey.
 Snowball sampling can be useful when you need the sample to reflect certain features
that are difficult to find.
 To conduct a survey of people who go jogging in a certain park every morning, for
example, snowball sampling would be a quick, accurate way to create the sample.

Advantage:
 The costs associated with this method are significantly lower, and you will end up with a
sample that is very relevant to your study.

Disadvantage:
 The clear downside of this approach is that you may restrict yourself to only a small, largely
homogeneous section of the population.

Hypothesis Testing
A hypothesis is an assumption that is tested to check whether the inference drawn from a
sample of data holds true for the entire population.

Hypothesis Testing Procedure

The following steps are followed in hypothesis testing:


1. Set up a Hypothesis:

 The first step is to establish the hypothesis to be tested.


 A statistical hypothesis is an assumption about the value of some unknown parameter;
the hypothesis provides a numerical value or range of values for the parameter.
 Two hypotheses about the population are constructed: the Null Hypothesis and the
Alternative Hypothesis.
 The Null Hypothesis, denoted by H0, asserts that there is no true difference between the
sample statistic and the population parameter, and that any observed difference is accidental,
caused by fluctuations in sampling. Thus, a null hypothesis states that

H0: there is no difference between the assumed and actual value of the parameter.

 The alternative hypothesis, denoted by H1, is the other hypothesis about the population,
which is taken to be true if the null hypothesis is rejected.

 Thus, if we reject H0 then the alternative hypothesis H1 (also written HA) gets accepted.

HYPOTHESIS TESTING

Null hypothesis, H0:
 States the hypothesized value of the parameter before sampling.
 The assumption we wish to test (or the assumption we are trying to reject).
 E.g. population mean µ = 20
 E.g. there is no difference between Coke and Diet Coke.

Alternative hypothesis, HA:
 All possible alternatives other than the null hypothesis.
 E.g. µ ≠ 20, µ > 20 or µ < 20
 E.g. there is a difference between Coke and Diet Coke.

2. Set up a Suitable Significance Level:

 Once the hypotheses about the population are constructed, the researcher has to decide the
level of significance, i.e. the maximum probability of rejecting the null hypothesis when it is
true that the researcher is willing to accept.
 The significance level is denoted by ‘α’ and is usually fixed before the samples are
drawn, so that the results obtained do not influence the choice.
 In practice, either a 5% or a 1% level of significance is taken.

3. Determining a Suitable Test Statistic:

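 After the hypotheses are constructed and the significance level is decided upon, the next
step is to determine a suitable test statistic and its distribution.
 Most test statistics take the following form:

$$\text{Test statistic} = \frac{\text{Sample statistic} - \text{Hypothesized value of the parameter}}{\text{Standard error of the statistic}}$$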

4. Determining the Critical Region:

 Before the samples are drawn, it must be decided which values of the test
statistic will lead to the acceptance of H0 and which will lead to its rejection.
 The values that lead to rejection of H0 form the critical region.

5. Performing Computations:

 Once the critical region is identified, we compute the required values (such as the sample mean
and standard deviation) from the random sample of size ‘n’.
 Then we apply the formula of the test statistic chosen in step (3) to check whether
the sample results fall in the acceptance region or the rejection region.

6. Decision-making:

 Once all the steps are performed, the statistical conclusions can be drawn, and the
management can take decisions.
 The decision involves either accepting the null hypothesis or rejecting it.

 The decision that the null hypothesis is accepted or rejected depends on whether the
computed value falls in the acceptance region or the rejection region.

Thus, to test the hypothesis, it is necessary to follow these steps systematically so that the
results obtained are reliable and the risk of the two statistical errors, viz. Type-I
error and Type-II error, is kept under control.
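The six steps can be traced through a small worked example. This is a sketch under assumed values: a hypothesized mean of 20, a known population standard deviation of 4, a sample of 64 observations with mean 21.2, a 5% significance level and a two-tailed Z-test. None of these numbers come from the notes.

```python
from math import sqrt
from statistics import NormalDist

# Step 1: hypotheses. H0: mu = 20, H1: mu != 20 (hypothetical values).
mu0 = 20.0

# Step 2: significance level, fixed before looking at the data.
alpha = 0.05

# Step 3: test statistic. With a known population sigma, use the Z statistic:
# z = (sample mean - hypothesized mean) / standard error.
sigma = 4.0          # assumed known population standard deviation
n = 64               # sample size
sample_mean = 21.2   # hypothetical sample result

# Step 4: critical region for a two-tailed test: |z| > z_critical.
z_critical = NormalDist().inv_cdf(1 - alpha / 2)

# Step 5: computations.
standard_error = sigma / sqrt(n)
z = (sample_mean - mu0) / standard_error

# Step 6: decision.
reject_h0 = abs(z) > z_critical
print(f"z = {z:.3f}, critical value = {z_critical:.3f}, reject H0: {reject_h0}")
```

Here the computed z lies beyond the critical value, so it falls in the rejection region and H0 is rejected at the 5% level.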

Type I and Type II Errors


Type I error refers to the situation when we reject the null hypothesis when it is true (H0 is
wrongly rejected).

For example

H0: there is no difference between the two drugs on average.


 Type I error will occur if we conclude that the two drugs produce different effects when
actually there isn’t a difference.
 The probability of making a Type I error when the null hypothesis is true as an equality is
called the level of significance.
 Applications of hypothesis testing that only control the Type I error are often called
significance tests.
 Prob(Type I error) = significance level = α

Type II error

 Type II error refers to the situation when we accept the null hypothesis when it is false.
 H0: there is no difference between the two drugs on average. Type II error will occur if
we conclude that the two drugs produce the same effect when actually there is a difference.
 Prob(Type II error) = β

 It is difficult to control for the probability of making a Type II error.


 Statisticians avoid the risk of making a Type II error by using “do not reject H0” and not
“accept H0”.
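The two error probabilities can be computed for a concrete case. The sketch assumes an upper-tailed Z-test of H0: µ = 20 with known σ = 4, n = 64 and α = 0.05; the true means of 21 and 22 used for β are hypothetical. It shows that α is fixed by the researcher, while β depends on the unknown true value of the parameter.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 20.0      # hypothesized mean under H0 (assumed)
sigma = 4.0     # assumed known population standard deviation
n = 64          # sample size
alpha = 0.05    # chosen significance level

standard_error = sigma / sqrt(n)
null_dist = NormalDist(mu0, standard_error)   # distribution of the sample mean under H0

# Reject H0 when the sample mean exceeds this cutoff (upper-tailed test).
cutoff = null_dist.inv_cdf(1 - alpha)

# Type I error: rejecting H0 although mu really equals mu0.
type_one = 1 - null_dist.cdf(cutoff)

def type_two_error(true_mean: float) -> float:
    """Probability of not rejecting H0 when the true mean is true_mean (> mu0)."""
    return NormalDist(true_mean, standard_error).cdf(cutoff)

print(f"alpha = {type_one:.3f}")
print(f"beta when true mean is 21: {type_two_error(21.0):.3f}")
print(f"beta when true mean is 22: {type_two_error(22.0):.3f}")
```

The further the true mean lies from the hypothesized value, the smaller β becomes.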

One tailed Test and Two Tail Test
Two tailed test
 Two tailed test will reject the null hypothesis if the sample mean is significantly higher or
lower than the hypothesized mean.
 Appropriate when H0 : µ = µ0 and HA: µ ≠ µ0

One Tail Test


 A one-sided test is a statistical hypothesis test in which the values for which we can reject
the null hypothesis, H0 are located entirely in one tail of the probability distribution.
 Lower tailed test will reject the null hypothesis if the sample mean is significantly lower
than the hypothesized mean.
 Appropriate when H0 : µ = µ0 and HA: µ < µ0

 Upper tailed test will reject the null hypothesis if the sample mean is
significantly higher than the hypothesized mean.
 Appropriate when H0 : µ = µ0 and HA: µ > µ0

T test:
 The t-statistic was introduced by W. S. Gosset under the pen name “Student”.
 He developed the t-test around 1905 for dealing with small samples in brewing quality control,
and published it in 1908.
 T test is used to compare two samples to determine if they came from the same
population



Conditions for T-test:
1. Limited sample size (n < 30)

2. Variables are approximately normally distributed

3. The sample observations are random and independent.

4. The population standard deviation is unknown.

5. If the standard deviation is known, it is best to use the Z-test.

Application of T Test
1. Test of hypothesis about a population mean (one-sample t-test)

2. Difference between two means in the case of independent samples (independent-samples t-test)

3. Difference between two means in the case of dependent samples (correlated/paired/repeated-measures t-test)

Degrees of Freedom and t test


 Degrees of freedom describe the number of scores in a sample that are free to vary.
 degrees of freedom = df = n-1

 The larger the degrees of freedom, the more closely the t-distribution approximates the normal distribution.
 The curve never touches the X axis.



Choosing the test:

 How many samples?
 One sample, population parameters known: Z-test
 One sample, population parameters unknown: one-sample t-test
 Two samples: how are the samples related?
 Independent samples: independent-samples t-test
 Dependent samples: dependent-samples t-test


One sample t-test:
H0 : µ = µ0
Test if a sample mean for a variable differs significantly from the given
population with a known mean

Unpaired- or independent samples- t-test:


H0 : µ1 = µ2
Test if the population means estimated by 2 independent samples differ
significantly (e.g. group of male and group of females)

Paired- or dependent- samples t-test:


H0 : µ1 = µ2
Test if the population means estimated by dependent samples differ significantly
(e.g. mean of pre and post treatment for same set of patients)

Test Statistics for T Test:

One sample T Test Independent Sample T Test Paired Sample T Test
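The formulas for these statistics are not reproduced here. As a rough illustration, the sketch below assumes the usual textbook forms: for one sample, t = (sample mean − µ0) / (s / √n) with df = n − 1, and for paired samples the same formula applied to the pairwise differences with µ0 = 0. The function names and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """Return (t, df) for a one-sample t-test of H0: mu = mu0."""
    n = len(sample)
    # statistics.stdev uses the n - 1 divisor (sample standard deviation).
    standard_error = stdev(sample) / sqrt(n)
    return (mean(sample) - mu0) / standard_error, n - 1


def paired_t(before, after):
    """Return (t, df) for a paired t-test, i.e. a one-sample test on differences."""
    differences = [a - b for a, b in zip(after, before)]
    return one_sample_t(differences, 0)


# Hypothetical data, for illustration only.
t_one, df_one = one_sample_t([12, 14, 15, 13, 16], mu0=12)
t_pair, df_pair = paired_t(before=[10, 12, 9, 11], after=[12, 14, 10, 14])
print(round(t_one, 3), df_one)
print(round(t_pair, 3), df_pair)
```

The computed t is then compared with the table value of t for the stated degrees of freedom and level of significance.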



Z test
 Given by Prof. Fisher

 The Z-test is applied to compare sample and population means to know if there’s a
significant difference between them.

 Z-Test is used when the coefficient of correlation is not zero

 Location tests (tests about a mean) are the most familiar z-tests

 The Z-test is based on the standard normal distribution

 The Z-test is also called the standard normal deviate test, standard normal test,
approximate test and large sample test

Conditions of Z Test
1. Data points should be independent from each other

2. Z-test is preferable when n is greater than 30

3. The variances of the samples should be the same

4. Population variance is known

5. All individuals must be selected at random from the population

6. All individuals must have equal chance of being selected

Application of Z Test
1. Test of significance for single mean

2. Test of significance for difference of means

3. Test of significance for difference of standard deviation (s.d.)

4. Testing a Claim about a Proportion

5. Testing difference of Two proportions



Conditions for acceptance/rejection of null hypothesis

1. If the Table value > Calculated value, we accept the Null Hypothesis

2. If the Table value < Calculated value, we Reject the Null Hypothesis

Table Values

Level of significance    0.10      0.05      0.01      0.005

1 Tailed Test            ±1.28     ±1.645    ±2.33     ±2.58

2 Tailed Test            ±1.645    ±1.96     ±2.58     ±2.81
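The table values above can be reproduced from the standard normal distribution. The sketch below is only an illustration using Python's standard library; the helper name `critical_z` is hypothetical. A two-tailed test splits the level of significance equally between the two tails, which is why its critical values are larger.

```python
from statistics import NormalDist


def critical_z(alpha, tails):
    """Return the positive critical z value for a one- or two-tailed test."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # Area left in each rejection tail.
    tail_area = alpha / tails
    return NormalDist().inv_cdf(1 - tail_area)


levels = [0.10, 0.05, 0.01, 0.005]
print([round(critical_z(a, 1), 3) for a in levels])  # one-tailed values
print([round(critical_z(a, 2), 3) for a in levels])  # two-tailed values
```

Rounded to two or three decimals, the results agree with the table.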

Test of significance for single mean

Test of significance for difference of means

Test of significance for difference of standard deviation (s.d.)

Testing a Claim about a Proportion

Testing difference of Two proportions
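The formulas for these applications are not reproduced here. As an illustration of two of them, the sketch below assumes the usual forms z = (sample mean − µ0) / (σ / √n) for a single mean and z = (p̂ − p0) / √(p0(1 − p0) / n) for a single proportion. Function names and numbers are hypothetical.

```python
from math import sqrt


def z_single_mean(sample_mean, mu0, sigma, n):
    """z statistic for H0: mu = mu0 when the population s.d. (sigma) is known."""
    return (sample_mean - mu0) / (sigma / sqrt(n))


def z_single_proportion(p_hat, p0, n):
    """z statistic for H0: p = p0, using the standard error under H0."""
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


# Hypothetical figures, for illustration only.
z_mean = z_single_mean(sample_mean=52, mu0=50, sigma=10, n=100)
z_prop = z_single_proportion(p_hat=0.56, p0=0.50, n=100)
print(round(z_mean, 2), round(z_prop, 2))
```

With a two-tailed test at the 0.05 level (table value 1.96), the first calculated value exceeds the table value, so H0 would be rejected; the second does not, so H0 would not be rejected.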

T test v/s Z Test


 Z-test is a statistical hypothesis test that follows a normal distribution while T-test follows a
Student’s T-distribution.



 A T-test is appropriate when handling small samples (n<30) while a Z-test is appropriate
when handling moderate to large samples (n > 30).

 T-test is more adaptable than Z-test since Z-test will often require certain conditions to be
reliable.

 Additionally, T-test has many methods that will suit any need. T-tests are more commonly
used than Z-tests.

 Z-tests are preferred to T-tests when standard deviations are known.

ANOVA Test:
 Analysis of Variance (ANOVA) is a parametric statistical technique used to compare
datasets.

 This technique was invented by R.A. Fisher, in 1920 and is thus often referred to as Fisher’s
ANOVA, as well.

 It is similar in application to techniques such as t-test and z-test, in that it is used to compare
means and the relative variance between them.

 However, analysis of variance (ANOVA) is best applied where more than 2 populations
or samples are meant to be compared.

 The F ratio is the larger variance divided by the smaller variance: F = (s1)² / (s2)²

 Because the larger variance is placed in the numerator, the value of F is always greater than or
equal to one.

 F can never be negative, because variances are squared quantities.

 F-tests mainly arise when models have been fitted to the data using least squares.

Assumptions of ANOVA test


1. The population must be close to a normal distribution.(Normality)

2. Samples must be independent.

3. Population variances must be equal. (Homogeneity)



4. Groups must have equal sample sizes.

Types of ANOVA

One way analysis: When we are comparing three or more groups based on one
factor variable, it is said to be one way analysis of variance (ANOVA).

For example, if we want to compare whether or not the mean output of three workers
is the same based on the working hours of the three workers.

Two way analysis: When there are two factor variables, it is said to be two
way analysis of variance (ANOVA).

For example, based on working condition and working hours, we can compare
whether or not the mean output of three workers is the same.

Steps in ANOVA

1. Define the null and alternative hypotheses

2. State alpha

3. Calculate degrees of freedom

4. State decision rule

5. Calculate test statistic
• Calculate variance between samples
• Calculate variance within the samples
• Calculate ratio F
• If F is significant, perform post hoc test
6. State results and conclusion
• The critical value is looked up from the F table
• If calculated F value > critical value, H0 is rejected; otherwise it is accepted



N- Total Observations (Total sample size)
K- Number of groups
SSb - Sum of Square between the groups
SSW - Sum of Square within the group
MSSW -Mean sum of Square within the group
MSSb - Mean sum of Square between the groups
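Using this notation, the calculation of the F ratio can be sketched as follows. This is a minimal illustration assuming the usual one-way ANOVA formulas (MSSb = SSb / (K − 1), MSSW = SSW / (N − K), F = MSSb / MSSW); the function name and the data are hypothetical.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for group in groups for x in group]
    n_total = len(all_values)          # N: total observations
    k = len(groups)                    # K: number of groups
    grand_mean = mean(all_values)
    # SSb: squared distance of each group mean from the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # SSW: squared distance of each observation from its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within, k - 1, n_total - k


# Hypothetical output of three workers, for illustration only.
f_ratio, df_b, df_w = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(round(f_ratio, 2), df_b, df_w)
```

The calculated F is then compared with the critical value from the F table for (K − 1, N − K) degrees of freedom.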
Post Hoc Analysis in ANOVA

 If we reject the null hypothesis, all we know is that there is a difference somewhere among
(between) the groups, but we do not know where the differences are.

 Additional tests called Post Hoc tests can be done to determine where differences lie.

 It may be between first and second or second and third or may be between all of them.

Chi-Square Test
✓ The chi-square test is an important test among the several tests of significance. It was
developed by the statistician Karl Pearson in 1900.


✓ A non parametric test.

✓ Measures the differences between what is observed(Oi) and what is expected(Ei)

✓ It is denoted by the sign χ²

✓ The distributions are positively skewed. The research hypothesis for the chi-square is
always a one-tailed test.

✓ As the number of degrees of freedom increases, the χ² distribution becomes more symmetrical.

Conditions for the application of χ² test


1. All the observation must be independent.

2. All the events must be mutually exclusive.

3. The data must be in the form of frequencies

4. The frequency data must have a precise numerical value and must be organized into
categories or groups.

5. Observations recorded and used are collected on a random basis.

6. No group should contain very few items, say less than 10.



7. The overall number of items must also be reasonably large. It should normally be at least
50.

Determining the Degrees of Freedom

✓ If there are two, three, or four classes, the degrees of freedom are 2-1, 3-1, and 4-1,
respectively.

df = n-1, where n = the number of classes

In a contingency table

df = (r – 1)(c – 1)

Where r = the number of rows

c = the number of columns

✓ 2×3 contingency table, d.f= (2-1) (3-1) = 2.

✓ 3×4 contingency table d.f=(3-1) (4-1) = 6,

➢ As the number of degrees of freedom increases, the χ² distribution becomes more symmetrical.


Types of Chi-Square Test:

Chi-square tests:

 Parametric: test of comparing variance
 Non-parametric: test of goodness of fit, test of independence, test of homogeneity

Test of comparing variance:  A chi-square test (Snedecor and Cochran, 1983) can be used to test
if the variance of a population is equal to a specified value. This test can be either a
two-sided test or a one-sided test.

Goodness of fit:  In the chi-square goodness of fit test, the term goodness of fit is used to
compare the observed sample distribution with the expected probability distribution.

Test of independence:  This test enables us to explain whether or not two attributes are
associated.

Yates's correction factor:  When the degree of freedom is 1, i.e. in a 2×2 contingency table,
and N < 50, adjust χ² by Yates's correction factor.

Test of homogeneity:  This test determines if two or more populations (or subgroups of a
population) have the same distribution of a single categorical variable.

Decision rule:

✓ If χ² (calculated) > χ² (tabulated), then the null hypothesis is rejected; otherwise it is
accepted.
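As an illustration of the test of independence, the sketch below assumes the usual statistic χ² = Σ (Oi − Ei)² / Ei, with each expected frequency taken as (row total × column total) / grand total and df = (r − 1)(c − 1). The function name and the contingency table are hypothetical, and Yates's correction is not applied.

```python
def chi_square_independence(table):
    """Return (chi-square, df) for an r x c contingency table of observed frequencies."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected frequency if the two attributes were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical 2x3 table of observed frequencies, for illustration only.
chi2, df = chi_square_independence([[10, 20, 30], [20, 20, 20]])
print(round(chi2, 3), df)
```

The calculated value is then compared with the tabulated χ² for the same degrees of freedom, following the decision rule above.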

Correlation:
 The degree of relationship between the variables under consideration is measured through
correlation analysis.
 The measure of correlation is called the correlation coefficient.
 The degree of relationship is expressed by a coefficient that ranges from −1 to +1
(−1 ≤ r ≤ +1).
 The direction of change is indicated by a sign.
 Correlation analysis gives an idea of the degree and direction of the relationship between
the two variables under study.
 Correlation is a statistical tool that helps to measure and analyze the degree of
relationship between two variables.
 Correlation analysis deals with the association between two or more variables.


Types of Correlation:
On the basis of Degree of Correlation
1. Positive Correlation: The correlation is said to be positive if the values of the two
variables change in the same direction. As X increases, Y increases; as X decreases, Y
decreases.

Example: expenses and sales, height and weight.

2. Negative Correlation: The correlation is said to be negative when the values of the
variables change in opposite directions. As X increases, Y decreases; as X decreases, Y
increases. Example: price and quantity demanded.
3. No Correlation: When a change in one variable is not accompanied by any change in the
other variable, there is said to be no correlation between the two.

Classification of correlation:

 On the basis of degree of correlation: positive correlation, negative correlation
 On the basis of number of variables: simple correlation, partial correlation, multiple correlation
 On the basis of linearity: linear correlation, non-linear correlation

On the basis of number of variables


Simple, Partial and Multiple Correlations:

 Whether the correlation is simple, partial or multiple depends on the number of
variables studied.
 The correlation is said to be simple when only two variables are studied.
 The correlation is either multiple or partial when three or more variables are studied.
 Multiple Correlation: The correlation is said to be multiple when three or more variables
are studied simultaneously.
 For example, if we want to study the relationship between the yield of wheat per acre and
the amount of fertilizers used and rainfall, it is a problem of multiple correlation.
 Partial Correlation: In the case of partial correlation we study more than two variables,
but consider only two of them as influencing each other, while the effect of the other
influencing variable is kept constant.
 For example, in the above case, if we study the relationship between the yield and the
fertilizers used during periods when a certain average temperature existed, it is a problem
of partial correlation.

On the basis of linearity.


1. Linear Correlation: The correlation is said to be linear when the amount of change in
one variable to the amount of change in another variable tends to bear a constant ratio.

For example, from the values of two variables given below, it is clear that the ratio of change
between the variables is the same:

X: 10 20 30 40 50

Y: 20 40 60 80 100

2. Non-Linear Correlation: The correlation is called non-linear or curvilinear when the
amount of change in one variable does not bear a constant ratio to the amount of change in the
other variable. For example, if the amount of fertilizers is doubled, the yield of wheat would
not necessarily be doubled.

Methods / Measures of correlation:


1. Scatter Diagram
2. Karl Pearson’s Coefficient of Correlation
3. Spearman’s Rank Correlation Coefficient;

Scatter Diagram:



 A scatter diagram is a graph of observed plotted points, where each point represents the
values of X and Y as a coordinate.
 It portrays the relationship between these two variables graphically.
 If the plotted points trend upward from left to right, the diagram shows
positive correlation.
 Similarly, if the points trend downward from left to right, it shows
negative correlation.
 The degree of slope will indicate the degree of correlation.

Karl Pearson's Coefficient of Correlation


 Pearson’s ‘r’ is the most common correlation coefficient.
 The coefficient of correlation ‘r’ measures the degree of linear relationship between two
variables, say x and y.
 Its value lies in the range −1 ≤ r ≤ +1.
 The degree of correlation is expressed by the value of the coefficient.
 The direction of change is indicated by the sign (−ve or +ve).

r(x, y) = Σxy / √(Σx² × Σy²), where x and y are the deviations of X and Y from their respective means.
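A minimal sketch of this calculation, assuming x and y in the formula denote deviations from the respective means. The function name `pearson_r` is hypothetical; the first data set is the X and Y series used earlier for linear correlation, the second is made up.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Karl Pearson's coefficient of correlation from deviations about the means."""
    mean_x, mean_y = mean(xs), mean(ys)
    dx = [x - mean_x for x in xs]   # deviations of X from its mean
    dy = [y - mean_y for y in ys]   # deviations of Y from its mean
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_x2 = sum(a * a for a in dx)
    sum_y2 = sum(b * b for b in dy)
    return sum_xy / sqrt(sum_x2 * sum_y2)


print(pearson_r([10, 20, 30, 40, 50], [20, 40, 60, 80, 100]))   # perfect positive
print(round(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 4))
```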

Properties of Coefficient of Correlation


 The value of the coefficient of correlation (r) always lies between −1 and +1.
 Such as: r = +1, perfect positive correlation
 r = −1, perfect negative correlation
 r = 0, no correlation
 The coefficient of correlation is independent of the origin and scale.
 By origin, it means that subtracting any non-zero constant from the given values of X and Y
leaves the value of “r” unchanged.
 By scale, it means that there is no effect on the value of “r” if the values of X and Y are
divided or multiplied by any positive constant.
 The coefficient of correlation is the geometric mean of the two regression coefficients.
 The coefficient of correlation is zero when the variables X and Y are independent.
However, the converse is not true.

Probable Error of Correlation Coefficient:


The probable error of the correlation coefficient helps in determining the accuracy and
reliability of the value of the coefficient, in so far as it depends on random sampling.

 The probable error of the correlation coefficient can be obtained by applying the following
formula:

P.E. = 0.6745 × (1 − r²) / √N

 r = coefficient of correlation, N = number of pairs of observations

 Probable error is used to:

1. Interpret the value of ‘r’

 If r < P.E., then it is not at all significant (no correlation).
 If r > 6 P.E., then r is highly significant.
 If P.E. < r < 6 P.E., then we cannot say anything about the significance of r.
2. Construct confidence limits within which the correlation in the population, ρ, is expected
to lie.
 By adding and subtracting the value of P.E. from the value of ‘r’, we get the upper limit
and the lower limit, respectively, within which the population correlation coefficient is
expected to lie. Symbolically, it can be expressed as:

ρ = r ± P.E.

where ρ (rho) denotes the correlation in the population.
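The two uses of the probable error can be sketched as below, assuming the usual formula P.E. = 0.6745 × (1 − r²) / √N and the thresholds P.E. and 6 P.E. The function names and the values r = 0.8, N = 64 are hypothetical.

```python
from math import sqrt


def probable_error(r, n):
    """Probable error of a correlation coefficient r based on n pairs."""
    return 0.6745 * (1 - r ** 2) / sqrt(n)


def interpret_r(r, n):
    """Classify r against P.E. and 6 P.E."""
    pe = probable_error(r, n)
    if abs(r) < pe:
        return "not significant"
    if abs(r) > 6 * pe:
        return "highly significant"
    return "inconclusive"


def limits(r, n):
    """Lower and upper limits r - P.E. and r + P.E. for the population correlation."""
    pe = probable_error(r, n)
    return r - pe, r + pe


print(round(probable_error(0.8, 64), 4))
print(interpret_r(0.8, 64))
print(tuple(round(v, 4) for v in limits(0.8, 64)))
```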

Conditions under which Probable error is used:


The probable Error can be used only when the following three conditions are fulfilled:

1. The data must approximate to the bell-shaped curve, i.e. a normal frequency curve.
2. The statistical measure for which the probable error is computed must have been taken from
a sample.
3. The sample items must be selected in an unbiased manner and must be independent of
each other.

Thus, the probable error is calculated to check the reliability of the value of coefficient
calculated from the random sampling.

Spearman’s Rank Correlation Coefficient


 Spearman’s Rank Correlation Coefficient is a non-parametric statistical measure
used to study the strength of association between two ranked variables.
 This method is applied to an ordinal set of numbers, which can be arranged in order,
i.e. one after the other, so that ranks can be given to each.
 When the variables under study are not capable of quantitative measurement but can be
arranged in serial order, Pearson’s correlation coefficient cannot be used; in such cases
Spearman’s rank correlation can be used.

R = 1 − 6ΣD² / (N(N² − 1))

 R = Rank correlation coefficient
 D = Difference of rank between paired items in the two series
 N = Total number of observations

 The value of R lies between −1 and +1, such that:

 R = +1: there is complete agreement in the order of ranks, and they move in the same
direction.
 R = −1: there is complete disagreement in the order of ranks, and they move in opposite
directions.
 R = 0: there is no association in the ranks.

Types of problems:

1. Where actual Ranks are assigned

2. Where ranks are not assigned

3. Equal Ranks or Tie in Ranks or where ranks are repeated

1. Where actual ranks are assigned :

An individual must follow the following steps to calculate the correlation coefficient:

a) Calculate the difference between the ranks (R1 − R2), denoted by D.
b) Square these differences to remove the negative signs and obtain their sum, ΣD².
c) Substitute the values obtained in the formula.
2. Where ranks are not assigned:
In case the ranks are not given, the individual may assign the ranks by taking either the
highest value or the lowest value as 1. Whichever criterion is chosen, the same method
should be applied to all the variables.
3. Equal Ranks or Tie in Ranks or when ranks are repeated:
a) In case the same rank is assigned to two or more entities, the ranks are assigned on
an average basis.
b) For example, if two individuals are ranked equal at third position, the rank of each shall
be calculated as: (3+4)/2 = 3.5
c) The formula to calculate the rank correlation coefficient when there is a tie in the ranks is:

R = 1 − 6[ΣD² + (m³ − m)/12 + (m³ − m)/12 + ...] / (N³ − N)

Where m = number of items whose ranks are common. One term (m³ − m)/12 is added for each group
of tied ranks.
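For the first type of problem (actual ranks assigned, no ties), steps a) to c) can be sketched as follows, assuming the usual formula R = 1 − 6ΣD² / (N(N² − 1)). The function name and the ranks are hypothetical; tied ranks would need the correction term and are not handled here.

```python
def spearman_rank(ranks1, ranks2):
    """Spearman's rank correlation for two lists of ranks without ties."""
    if len(ranks1) != len(ranks2):
        raise ValueError("both series must have the same number of items")
    n = len(ranks1)
    # Steps a) and b): differences of ranks, squared and summed.
    sum_d2 = sum((r1 - r2) ** 2 for r1, r2 in zip(ranks1, ranks2))
    # Step c): substitute into the formula.
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical ranks given by two judges to five candidates.
print(spearman_rank([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
print(spearman_rank([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))   # ranks in opposite order
```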

Regression
➢ Regression analysis is the scientific technique for making predictions.
➢ M.M. Blair has described regression analysis as a mathematical measure of the average
relationship between two or more variables in terms of the original units of the data.
➢ Regression Analysis: Regression analysis is a statistical tool used to determine the
probable change in one variable for a given amount of change in another. It is also used to
get a measure of the error involved in using the regression line as a basis for
estimation.
➢ It estimates the values of the dependent variable from the values of the independent
variable. This means the value of the unknown variable can be estimated from the known
value of another variable.
➢ Regression Line:
➢ The degree to which the variables are correlated with each other is reflected in the
regression line.
➢ The regression line is the single line that best fits the data, i.e. the line drawn through
the plotted points in such a manner that the distance from the line to the points is the
smallest.

The regression lines have the equations:

 Regression line of Y on X: Y − Ȳ = byx (X − X̄). This gives the most probable values of Y
from the given values of X.

 Regression line of X on Y: X − X̄ = bxy (Y − Ȳ). This gives the most probable values of X
from the given values of Y.

Here X̄ and Ȳ are the means of X and Y, and byx and bxy are the regression coefficients.

Regression Coefficient
 The constant ‘b’ in the regression equation (Ye = a + bX) is called as the Regression
Coefficient.
 It determines the slope of the line, i.e. the change in the value of Y corresponding to the
unit change in X and therefore, it is also called as a “Slope Coefficient.”
 The correlation coefficient is the geometric mean of the two regression coefficients.
 r² = byx × bxy
 r = ±√(byx × bxy), taking the common sign of the two regression coefficients

 The value of the coefficient of correlation cannot exceed unity, i.e. 1, so the product of the
two regression coefficients cannot exceed 1:

byx × bxy ≤ 1

 The sign of both the regression coefficients will be the same, i.e. they will be either both
positive or both negative.
 It is an absolute measure
 The average value of the two regression coefficients will be greater than or equal to the
value of the correlation coefficient.
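These properties can be checked numerically. The sketch below assumes the usual deviation formulas byx = Σxy / Σx² and bxy = Σxy / Σy², where x and y are deviations from the means; the function name and the data are hypothetical.

```python
from math import sqrt
from statistics import mean


def regression_coefficients(xs, ys):
    """Return (byx, bxy) computed from deviations about the means."""
    mean_x, mean_y = mean(xs), mean(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    byx = sum_xy / sum(a * a for a in dx)   # slope of the line of Y on X
    bxy = sum_xy / sum(b * b for b in dy)   # slope of the line of X on Y
    return byx, bxy


xs, ys = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]   # hypothetical data
byx, bxy = regression_coefficients(xs, ys)
r = sqrt(byx * bxy)                # both coefficients are positive here
intercept = mean(ys) - byx * mean(xs)   # 'a' in Ye = a + bX
print(round(byx, 2), round(bxy, 2), round(r, 4), round(intercept, 2))
print((byx + bxy) / 2 >= r)        # average of the coefficients vs. r
```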

Meaning of Probability
A probability is a measure of the likelihood that an event in the future will happen. It can only
assume a value between 0 and 1. A value near zero means the event is not likely to happen. A
value near one means it is likely. There are three ways of assigning probability: classical,
empirical, and subjective.

Basic Statement about probability:


1. The probability, P, of any event or state of nature occurring is greater than or equal to 0 and
less than or equal to 1.

2. The sum of the simple probabilities for all possible outcomes of an activity must equal 1.

3. The probability ‘p’ of the happening of an event is also known as the probability of success,
and ‘q’, the probability of the non-happening of the event, as the probability of failure.

4. If P(E) = 1, E is called a certain event & if P(E) = 0, E is called an impossible event

Simple Definitions:

❖ Trial & Event: Consider an experiment which, though repeated under essentially identical
conditions, does not give unique results but may result in any one of several possible
outcomes. The experiment is known as a Trial and the outcomes are known as Events or Cases.
Example: throwing a die is a Trial and getting 1 (2, 3, …, 6) is an event. Tossing a coin is a
Trial and getting Head (H) or Tail (T) is an event.

❖ A probability experiment is a chance process that leads to well-defined results called
outcomes.
❖ An outcome is the result of a single trial of a probability experiment.
❖ A sample space is the set of all possible outcomes of a probability experiment.
❖ An event is a collection of one or more outcomes of an experiment.

Three approaches to assigning probabilities:


1. Classical
2. Empirical
3. Subjective



Mathematical/ Classical/ priori Probability:

▪ Basic assumption of classical approach is that the outcomes of a random experiment are
“equally likely”.
▪ According to Laplace, a French mathematician: “Probability is the ratio of the number of
‘favorable’ cases to the total number of equally likely cases”.
▪ If the probability of occurrence of A is denoted by p(A), then by this definition, we have:

p(A) = number of favorable cases / total number of equally likely cases

Limitations of Classical definition:

1. Classical probability is often called a priori probability because, if one keeps using orderly
examples such as unbiased dice or fair coins, one can state the answer in advance (a priori)
without rolling a die, tossing a coin, etc.
2. Classical definition of probability is not very satisfactory because of the following reasons:
• It fails when the number of possible outcomes of the experiment is infinite.
• It is based on the cases which are “equally likely” and as such cannot be applied to
experiments where the outcomes are not equally likely.



Relative/ Statistical/ Empirical Probability:
▪ The empirical probability of an event is an "estimate" that the event will happen, based on
how often the event occurs after collecting data or running an experiment (in a large number of
trials).
▪ It is based specifically on direct observations or experiences.

Empirical Probability Formula:

P(E) = n(E) / n(S)

P(E) = probability that an event, E, will occur. n(E) = number of equally likely outcomes of E.
n(S) = number of equally likely outcomes of sample space S.

Limitations of Statistical/ Empirical method:

✓ The empirical probability P(A) defined earlier can never be obtained exactly in practice; we
can only attempt a close estimate of P(A) by making N sufficiently large.
✓ The experimental conditions may not remain essentially homogeneous and identical over a
large number of repetitions of the experiment.
✓ The relative frequency m/N may not attain a unique value, however large N
may be.

Subjective Probability:

The probability of a particular event happening that is assigned by an individual based on
whatever information is available. If there is little or no past experience or information on
which to base a probability, it may be arrived at subjectively.

Examples of subjective probability are:


1. Estimating the likelihood that the Singapore soccer team will make it to the World Cup.
2. Estimating the likelihood that your neighbor will be married before the age of 30.
3. Estimating the likelihood that the Singapore budget surplus will exceed $5B next year.

The Bayes Rule/ Theorem

➢ Bayes' Theorem is named after the English statistician Thomas Bayes (1702-1761), whose
formula was published posthumously in 1763.

➢ It shows the relation between one conditional probability and its inverse.

➢ Provide a mathematical rule for revising an estimate or forecast in light of experience and
observation.

➢ It is considered the foundation of the special statistical inference approach called the
Bayes’ inference.

➢ The Bayes’ theorem describes the probability of an event based on prior knowledge of
the conditions that might be relevant to the event.

Explanation: Bayes' theorem thus gives the probability of an event based on new information
that is, or may be related, to that event. The formula can also be used to see how the probability
of an event occurring is affected by hypothetical new information, supposing the new
information will turn out to be true.
In probability theory and statistics, Bayes' theorem (alternatively Bayes' law or Bayes' rule)
describes the probability of an event, based on conditions that might be related to the event.
➢ It is a theorem about conditional probabilities:

➢ The probability that an event A occurs given that another event B has already occurred is
equal to the probability that B occurs given that A has already occurred, multiplied by
the probability of occurrence of A and divided by the probability of occurrence of B:

P(A | B) = P(B | A) × P(A) / P(B)

Equivalently, P(A | B) = P(A ∩ B) / P(B).

Where A and B are events:

➢ P(A) and P(B) are the probabilities of A and B without regard to each other.

➢ P(A | B), a conditional probability, is the probability of observing event A given that B is
true.

➢ P(B | A) is the probability of observing event B given that A is true.

➢ P(A ∩ B) is the probability of both A and B occurring.
When to Apply Bayes' Theorem:
Bayes' theorem should be considered when the following conditions exist:
➢ Within the sample space, there exists an event B, for which P(B) > 0.
➢ The analytical goal is to compute a conditional probability of the form: P( Ak | B ).
➢ at least one of the two sets of probabilities described below should be known:

• P( Ak ∩ B ) for each Ak

• P( Ak ) and P( B | Ak ) for each Ak

Example of Bayes theorem:



➢ Marie is getting married tomorrow, at an outdoor ceremony in the desert. In recent years,
it has rained only 5 days each year.

➢ Unfortunately, the weatherman has predicted rain for tomorrow.

➢ When it actually rains, the weatherman correctly forecasts rain 90% of the time.

➢ When it doesn't rain, he incorrectly forecasts rain 10% of the time. What is the
probability that it will rain on the day of Marie's wedding?

➢ The sample space is defined by two mutually-exclusive events – it rains or it does not
rain.

➢ Additionally, a third event occurs when the weatherman predicts rain.

Notation for these events appears below.


Event A1 --It rains on Marie's wedding.
Event A2 --It does not rain on Marie's wedding.
Event B --The weatherman predicts rain.
✓ In terms of probabilities, we know the following:

✓ P( A1 ) = 5/365 = 0.0136986 [It rains 5 days out of the year.]

✓ P( A2 ) = 360/365 = 0.9863014 [It does not rain 360 days out of the year.]

✓ P( B | A1 ) = 0.9 [When it rains, the weatherman predicts rain 90% of the time.]

✓ P( B | A2 ) = 0.1 [When it does not rain, the weatherman predicts rain 10% of the time.]

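We want to know P( A1 | B ), the probability it will rain on the day of Marie's wedding, given a
forecast for rain by the weatherman. The answer can be determined from Bayes' theorem:

P( A1 | B ) = P( A1 ) P( B | A1 ) / [ P( A1 ) P( B | A1 ) + P( A2 ) P( B | A2 ) ]
P( A1 | B ) = (0.0137)(0.9) / [ (0.0137)(0.9) + (0.9863)(0.1) ] = 0.111

So even when the weatherman predicts rain, it rains only about 11% of the time.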
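As a check on this example, the sketch below applies Bayes' theorem with exact fractions, using the probabilities listed above. The function name `bayes_posterior` is hypothetical; the denominator is the total probability of B, i.e. the sum of P( Ak ) × P( B | Ak ) over the mutually exclusive events Ak.

```python
from fractions import Fraction


def bayes_posterior(priors, likelihoods, index):
    """Return P(A_index | B) from the priors P(Ak) and likelihoods P(B | Ak)."""
    # Total probability of B across all mutually exclusive events Ak.
    total = sum(p * l for p, l in zip(priors, likelihoods))
    return priors[index] * likelihoods[index] / total


priors = [Fraction(5, 365), Fraction(360, 365)]    # P(A1), P(A2)
likelihoods = [Fraction(9, 10), Fraction(1, 10)]   # P(B | A1), P(B | A2)
p_rain = bayes_posterior(priors, likelihoods, 0)
print(p_rain, round(float(p_rain), 3))
```

The exact result is 1/9, or about 0.111.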



The Kruskal-Wallis Test
• A nonparametric test that can be used to determine whether three or more independent
samples were selected from populations having the same distribution.

• It is also called Kruskal-Wallis H test.

• Kruskal-Wallis was presented by : William Kruskal and W. Allen Wallis

• The null and alternative hypotheses for the Kruskal- Wallis test are as follows.

• H0: There is no difference in the distribution of the populations.

• Ha: There is a difference in the distribution of the populations.

The conditions for using the Kruskal-Wallis test are :


1. Each sample must be randomly selected.

2. The size of each sample must be at least 5.

3. The observations should be independent.

4. The variable being compared should be measured on an ordinal scale or a continuous scale
(i.e., an interval or ratio scale).

If these conditions are met, the sampling distribution of the test statistic is approximated by
a chi-square distribution with k – 1 degrees of freedom, where k is the number of samples.
Given three or more independent samples, the test statistic H for the Kruskal-Wallis test is

H = [12 / (n(n + 1))] × (R1²/n1 + R2²/n2 + ... + Rk²/nk) − 3(n + 1)

Where,
k - Number of samples,
ni - size of the ith sample,
n - Sum of the sample sizes,
Ri - the sum of the ranks of the ith sample.
Test Statistic for the Kruskal-Wallis Test:
H0: the k distributions are identical versus
Ha: at least one distribution is different
When H0 is true, the test statistic H has an approximate chi-square distribution with df = k-1.
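A minimal sketch of the H calculation, assuming the usual formula H = [12 / (n(n + 1))] × Σ(Ri² / ni) − 3(n + 1) with all observations ranked together. The function name and the data are hypothetical, and tied values are rejected rather than given average ranks.

```python
def kruskal_wallis_h(samples):
    """Return (H, df) for the Kruskal-Wallis test on samples without tied values."""
    pooled = sorted(x for sample in samples for x in sample)
    if len(set(pooled)) != len(pooled):
        raise ValueError("tied values are not handled in this sketch")
    # Rank every observation in the pooled data, smallest value = rank 1.
    rank_of = {value: rank for rank, value in enumerate(pooled, start=1)}
    n = len(pooled)
    total = 0.0
    for sample in samples:
        rank_sum = sum(rank_of[x] for x in sample)   # Ri
        total += rank_sum ** 2 / len(sample)         # Ri^2 / ni
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    return h, len(samples) - 1


# Three hypothetical samples of five observations each.
h, df = kruskal_wallis_h([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15]])
print(round(h, 2), df)
```

H is then compared with the tabulated chi-square value for k − 1 degrees of freedom.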

Mann Whitney U-test


➢ The Mann Whitney U-test is a nonparametric test

➢ It is commonly portrayed as the non-parametric substitute for Student's t-test when
samples are not normally distributed.

➢ The Mann-Whitney U-test is used to test whether two independent samples of
observations are drawn from the same or identical distributions.

➢ An advantage with this test is that the two samples under consideration do not necessarily
need to have the same number of observations or instances.

The test has two important assumptions.


1. The first is that the two samples under consideration are random, and are independent
of each other, as are the observations within each sample.

2. The second is that the observations are numeric or ordinal (i.e. arranged in ranks/orders)

Hypothesis on Equality of Medians: This statistic is often used to test a hypothesis
regarding equality of medians. Since the U statistic tests whether two samples are drawn from
identical populations, it can also be used to test whether two group medians are equal, provided
the two distributions have a similar shape.

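$$U_1 = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - \sum R_1$$

$$U_2 = n_1 n_2 + \frac{n_2(n_2 + 1)}{2} - \sum R_2$$

Here n_1 and n_2 are the two sample sizes, and \sum R_1 and \sum R_2 are the sums of the ranks of
the first and second sample when both samples are ranked together.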
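
The two U statistics defined above can be computed with a short Python sketch. The function name `mann_whitney_u` and the data are hypothetical; ranks are assigned over the pooled observations, and tied values are assumed to share the mean of their ranks.

```python
def mann_whitney_u(sample1, sample2):
    """Return (U1, U2) computed from the rank sums of two independent samples."""
    pooled = list(sample1) + list(sample2)

    def rank(value):
        # Mean rank of `value` in the pooled data (handles ties).
        less = sum(1 for x in pooled if x < value)
        equal = sum(1 for x in pooled if x == value)
        return less + (equal + 1) / 2

    n1, n2 = len(sample1), len(sample2)
    r1 = sum(rank(v) for v in sample1)  # sum of ranks of sample 1
    r2 = sum(rank(v) for v in sample2)  # sum of ranks of sample 2
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return u1, u2


# Hypothetical data: two independent samples of unequal size.
group_a = [12, 15, 11, 18]
group_b = [14, 20, 22, 19, 25]
u1, u2 = mann_whitney_u(group_a, group_b)
print(u1, u2)  # prints 18.0 2.0
```

As a check, U1 + U2 always equals n1 × n2 (here 4 × 5 = 20). The smaller of the two values is the one usually compared with the tabulated critical value.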

Concept of Research:

▪ Research is an investigative process of finding a reliable solution to a problem through the
systematic selection, collection, analysis and interpretation of data relating to the problem.
▪ In other words, research covers all activities that help us discover new knowledge about
things around us.

Characteristics of Research
1. Research is a systematic and critical investigation into a phenomenon.

2. It adopts the scientific method.

3. It is objective and logical.

4. It is based upon observable experience or empirical evidence.

5. Research is directed towards finding answers to pertinent questions and solutions to
problems.

6. It emphasizes the development of generalizations, principles or theories.

Types of Research

Business research methods can be defined as “a systematic and scientific procedure of data
collection, compilation, analysis, interpretation, and implication pertaining to any business
problem”.
Types of research methods can be classified into several categories according to the nature and
purpose of the study and other attributes.

General Classification of Types of Research Methods


Types of research methods can be broadly divided into two categories: quantitative and
qualitative.

• Quantitative research “describes, infers, and resolves problems using numbers.
Emphasis is placed on the collection of numerical data, the summary of those data and
the drawing of inferences from the data”.

• Qualitative research, on the other hand, is based on words, feelings, emotions, sounds
and other non-numerical and unquantifiable elements. It has been noted that “information
is considered qualitative in nature if it cannot be analysed by means of mathematical
techniques. This characteristic may also mean that an incident does not take place often
enough to allow reliable data to be collected”.

Types of Research Methods According to Nature of the Study


Types of the research methods according to the nature of research can be divided into two
groups: descriptive and analytical.

• Descriptive research usually involves surveys and studies that aim to identify the facts.
In other words, descriptive research mainly deals with the “description of the state of
affairs as it is at present” and there is no control over variables in descriptive research.

• Analytical research, on the other hand, is fundamentally different in that “the
researcher has to use facts or information already available and analyse these in order to
make a critical evaluation of the material”.

Types of Research Methods According to the Purpose of the Study


According to the purpose of the study, types of research methods can be divided into two
categories: applied research and fundamental research.

Applied research is also referred to as action research, and fundamental research is
sometimes called basic or pure research.

The table below summarizes the main differences between applied research and fundamental
research. What the two have in common is the adoption of a systematic and scientific procedure
to conduct the study.
| Applied Research | Fundamental Research |
| Aims to solve a problem by adding to the field of application of a discipline | Aims to develop theory by adding to the basics of a discipline |
| Often several disciplines work together for solving the problem | Problems are analysed from the point of view of one discipline |
| Often researches individual cases without the aim to generalise | Generalisations are preferred |
| Aims to say how things can be changed | Forecasting approach is implemented |
| Acknowledges that other variables are constantly changing | Assumes that other variables do not change |
| Reports are compiled in a common language | Reports are compiled in the technical language of the discipline |

Differences between applied and fundamental research

Types of Research Methods according to Research Design


On the basis of research design the types of research methods can be divided into two groups –
exploratory and conclusive.
• Exploratory studies only aim to explore the research area and they do not attempt to
offer final and conclusive answers to research questions.

• Conclusive studies, on the contrary, aim to provide final and conclusive answers to
research questions.

The table below illustrates the main differences between exploratory and conclusive research
designs:

| | Exploratory research | Conclusive research |
| Structure | Loosely structured in design | Well structured and systematic in design |
| Methodology | Are flexible and investigative in methodology | Have a formal and definitive methodology that needs to be followed and tested |
| Hypotheses | Do not involve testing of hypotheses | Most conclusive researches are carried out to test the formulated hypotheses |
| Findings | Findings might be topic specific and might not have much relevance outside of researcher’s domain | Findings are significant as they have a theoretical or applied implication |

Research Design
▪ Research design is a pre-planned sketch for the explanation of a problem. It is the first
step to take and it guides the whole research.
▪ Research design refers to the plan, structure, and strategy of research: the
blueprint that will guide the research process.
▪ The study is conducted on the basis of this research design.
▪ It gives a clue as to how the further process will take place and how the research study
will be carried through classification, interpretation and suggestions.
▪ It is a guideline for the whole work.

A good research design should have the following features:
• It must be realistic, workable, appropriate and able to give the intended
information.
• It must be flexible, efficient and economical.
• It should be consistent with the research capability of the researcher, i.e. feasible for
the researcher.
• It must be based on, and consistent with, the purpose of the research
problem.
• It must be flexible so that it can be changed as the situation changes.
• It should be formulated after a critical study of the nature of the
problem.
• It should provide well-developed guidelines for all research steps.
• It should yield valid, reliable and generalizable results.
• It should cover the data collection and analysis techniques properly.
• It should be able to recommend appropriate methods for hypothesis formulation and
testing.

There are different types of research design, depending on the nature of the problem and the
objectives of the study. The following are the four types of research design.

1. Exploratory Research Design
2. Descriptive Research Design
3. Diagnostic Research Design
4. Experimental Research Design

Exploratory Research Design


• In exploratory research design the researcher relies on his or her own imagination and ideas.

• It is based on the researcher's personal judgment and on obtaining information about
something.

• The researcher looks for an unexplored situation and brings it to public attention.

• In this type of research there is no need for hypothesis formulation.

Descriptive Research Design


• In descriptive research design the researcher is interested in describing a particular situation
or phenomenon under study.

• It is a theoretical type of research design based on the collection, design and
presentation of the collected data.

• Descriptive research design covers the characteristics of people and materials, including
socio-economic characteristics such as age, education, marital status and income.

• Mostly qualitative data are collected, such as the knowledge, attitudes, beliefs and
opinions of people. Examples of such designs are newspaper articles, films,
dramas and documentaries.

Diagnostic Research Design


Here the researcher wants to know the root causes of the problem and describes the factors
responsible for the problematic situation. It is a problem-solving research design that consists
mainly of:

1. Emergence of the problem
2. Diagnosis of the problem
3. Solution for the problem and
4. Suggestion for the problem solution

Experimental Research Design


This type of research design is often used in the natural sciences, but it works differently in the
social sciences, because human behavior cannot be measured through test-tubes and microscopes. The
social researcher uses an experimental method in this type of research design. One group, the
experimental group, is exposed to the treatment (the independent variable), while the other, the
control group, is not. The result is obtained by comparing the outcome (the dependent variable) in
the two groups, which indicates whether a cause-and-effect relationship exists between the two
variables.

Research Report Writing

➢ Research reports are recorded data prepared by researchers or statisticians after analyzing
information gathered by conducting organized research, typically in the form of surveys
or qualitative methods.

➢ A report is a written statement prepared for the benefit of others, describing what has happened
or a state of affairs, normally based on investigation.

➢ A report is a piece of factual writing, usually based on some kind of research or real-life
experience.

Features of good Report Writing:


✓ It reflects clear thinking

✓ It is complete & self-explanatory

✓ It is comprehensive but compact

✓ It is accurate in all aspects

✓ It has a suitable format for readers

✓ It is supported by facts

✓ It has an impersonal style

✓ It has proper date & signature

✓ It has a reference to relevant details

✓ It follows an impartial approach

✓ It has all essential technical details

✓ It is presented in a lucid style

✓ It is a reliable document

✓ It is arranged in a logical manner

Steps in writing research report


Research reports are the product of slow, painstaking, accurate inductive work. The usual steps
involved in writing a report are:
1. logical analysis of the subject-matter;

2. preparation of the final outline;

3. preparation of the rough draft;

4. rewriting and polishing;

5. preparation of the final bibliography; and

6. writing the final draft.
