Statistics 1

Review

Measures of central tendency:

The single value which represents a group of values is termed a "measure of
central tendency", a measure of location, or an average.


Types of average:
1. Arithmetic Mean
2. Median
3. Mode
4. Geometric Mean
5. Harmonic Mean
Arithmetic Mean (A.M.): It is defined as the sum of the given observations divided
by the number of observations. The A.M. is measured in the same units as the
observations.
Let x1, x2, ..., xn be 'n' observations; then the A.M. is computed from the formula

A.M. = $\bar{x} = \frac{\sum x_i}{n}$, where $\sum x_i$ = sum of the given observations and
n = number of observations.

Median: The median is the middlemost item, the one that divides the distribution into two
equal parts when the items are arranged in ascending order of magnitude.
If the number of observations is odd, then the median is the middle value after
the values have been arranged in ascending or descending order of magnitude. In
the case of an even number of observations, there are two middle terms and the median is
obtained by taking the arithmetic mean of these two middle terms.

Mode: Mode is the value which occurs most frequently in a set of observations or

mode is the value of the variable which is predominant in the series.
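
The three averages can be computed directly. As a quick illustration, the following is a minimal Python sketch using the standard-library statistics module; the data values are made up so that the mode is unambiguous.

```python
# Minimal sketch of the three measures of central tendency, using Python's
# built-in statistics module. The data values are made up for illustration.
import statistics

observations = [4, 5, 5, 6, 7, 7, 7, 9]

mean = statistics.mean(observations)      # sum of observations / n = 50 / 8 = 6.25
median = statistics.median(observations)  # even n, so average of the two middle terms = 6.5
mode = statistics.mode(observations)      # most frequently occurring value = 7

print(mean, median, mode)
```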

Measures of Dispersion:
Dispersion means scattering of the observations among themselves or from a

central value (Mean/ Median/ Mode) of data. We study the dispersion to have an

idea about the variation.


Suppose that we have the distribution of the yields (kg per plot) of two groundnut
varieties from 5 plots each. The distribution may be as follows:
Variety 1: 46 48 50 52 54
Variety 2: 30 40 50 60 70

It can be seen that the mean yield for both varieties is 50 kg. But we cannot say
that the performances of the two varieties are the same. There is greater uniformity of
yields in the first variety, whereas there is more variability in the yields of the
second variety. The first variety may be preferred, since it is more consistent in yield
performance.

Types of dispersion:
1. Range
2. Quartile Deviation
3. Mean Deviation
4. Standard Deviation and Variance
5. Coefficient of Variation
6. Standard Error
Range: It is the difference between maximum value and minimum value.

Standard Deviation (σ): It is defined as the positive square root of the
arithmetic mean of the squares of the deviations of the given values from the
arithmetic mean. The square of the standard deviation is called the variance.

Let x1, x2, ..., xn be n observations; then the standard deviation is given by the
formula

S.D. (σ) = $\sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$, where A.M. = $\bar{x} = \frac{\sum x_i}{n}$ and n = number of observations.

Simplifying the above formula, we have

S.D. (σ) = $\sqrt{\frac{\sum x_i^2}{n} - \bar{x}^2}$

Example:
Calculate the S.D. for the values 5, 6, 7, 7, 9, 4, 5.
Here n = 7 and the mean is $\bar{x}$ = 43/7 ≈ 6.14, so

S.D. = $\sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$ = $\sqrt{\frac{16.86}{7}}$ ≈ 1.55.

Coefficient of Variation (C.V.):

The coefficient of variation is the ratio of the standard deviation to the
arithmetic mean, expressed as a percentage. The formula for the C.V. is

C.V. = $\frac{\text{S.D.}}{\text{A.M.}} \times 100$

The coefficient of variation will be small if the variation is small. Of two
groups, the one with the lower C.V. is said to be more consistent (a short
computational sketch follows the note below).
Note: 1. Standard deviation is an absolute measure of dispersion.
2. Coefficient of variation is a relative measure of dispersion.
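
To make the comparison of the two groundnut varieties concrete, here is a short Python sketch; the divisor n is used in the standard deviation, matching the definition given above.

```python
# Sketch: standard deviation and coefficient of variation for the two
# groundnut varieties above, using divisor n as in the definition given here.
import math

def std_dev(values):
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)

def coeff_variation(values):
    mean = sum(values) / len(values)
    return std_dev(values) / mean * 100   # expressed as a percentage

variety_1 = [46, 48, 50, 52, 54]
variety_2 = [30, 40, 50, 60, 70]

print(std_dev(variety_1), coeff_variation(variety_1))  # ~2.83 kg and ~5.66 %
print(std_dev(variety_2), coeff_variation(variety_2))  # ~14.14 kg and ~28.28 %
# Both varieties have mean 50 kg, but variety 1 has the lower C.V.,
# so it is the more consistent variety.
```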
NORMAL DISTRIBUTION

The Normal Distribution (N.D.) was first discovered by De Moivre as the limiting
form of the binomial model in 1733, and was later studied independently by Laplace
and Gauss.

The Normal distribution is probably the most important distribution in statistics. It
is a probability distribution of a continuous random variable and is often used to
model the distribution of discrete random variables as well as the distributions of
other continuous random variables. The basic shape of the normal distribution is that of
a bell; it has a single mode and is symmetric about its central value.

Definition: A random variable X is said to follow a Normal Distribution with
parameters μ and σ² if its density function is given by the probability law

f(x) = $\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$,  −∞ < x < ∞; −∞ < μ < ∞; σ > 0

where π = a mathematical constant, approximately 22/7 (≈ 3.1416)
e = the Naperian base, approximately 2.7183
μ = population mean
σ = population standard deviation
x = a given value of the random variable in the range −∞ < x < ∞
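
The density formula can be transcribed directly into code. The sketch below does so; the parameter values (μ = 50, σ = 5) are arbitrary illustrations, not taken from the text.

```python
# Sketch: the normal density f(x) written directly from the formula above.
# The parameter values mu = 50 and sigma = 5 are arbitrary illustrations.
import math

def normal_pdf(x, mu, sigma):
    coefficient = 1.0 / (sigma * math.sqrt(2 * math.pi))
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return coefficient * math.exp(exponent)

mu, sigma = 50, 5
print(normal_pdf(mu, mu, sigma))          # maximum height 1/(sigma*sqrt(2*pi)) ~ 0.0798
print(normal_pdf(mu + sigma, mu, sigma))  # smaller density one sigma away ~ 0.0484
```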
Characteristics of the Normal distribution and normal curve:

i. The curve is bell-shaped and symmetrical about the mean.

ii. The height of the normal curve is at its maximum at the mean. Hence the
mean and mode of the normal distribution coincide. Also, the number of
observations below the mean in a normal distribution is equal to the
number of observations above the mean. Hence the mean and median of the
N.D. coincide. Thus, for the N.D., mean = median = mode.

iii. As x moves away from μ, f(x) decreases rapidly; the maximum
probability density occurs at the point x = μ and is given by

p(x)max = $\frac{1}{\sigma\sqrt{2\pi}}$

The area under the normal curve is distributed as follows (a numerical check is
sketched after this list):

i) μ − σ < x < μ + σ covers 68.26% of the total area (0.6826)

ii) μ − 2σ < x < μ + 2σ covers 95.44% of the total area (0.9544)

iii) μ − 3σ < x < μ + 3σ covers 99.73% of the total area (0.9973)
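
These coverage figures can be checked numerically. The sketch below builds the normal cumulative distribution function from math.erf, so no external library is assumed.

```python
# Sketch: numerical check of the areas within 1, 2 and 3 standard deviations
# of the mean, using a normal CDF built from math.erf (standard library only).
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

mu, sigma = 0.0, 1.0
for k in (1, 2, 3):
    area = normal_cdf(mu + k * sigma, mu, sigma) - normal_cdf(mu - k * sigma, mu, sigma)
    print(k, round(area, 4))   # 1 -> 0.6827, 2 -> 0.9545, 3 -> 0.9973
```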

Standard Normal Distribution: If X is a normal random variable with mean μ and
standard deviation σ, then Z = $\frac{X - \mu}{\sigma}$ is a standard normal variate with zero mean
and standard deviation 1.
The probability density function of the standard normal variate z is

f(z) = $\frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$,  −∞ < z < ∞, and the total area under the curve is 1.

A graph representing the density function of the Normal probability distribution
is also known as a Normal Curve or a Bell Curve (see the figure below). To draw
such a curve, one needs to specify two parameters, the mean and the standard
deviation. The curve referred to below has a mean of zero and a standard deviation
of 1, i.e., (μ = 0, σ = 1). A Normal distribution with a mean of zero and a standard
deviation of 1 is also known as the Standard Normal Distribution.

[Figure: Standard Normal Distribution]
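
Standardising an observation is a one-line computation. The values below (μ = 50, σ = 5, x = 58) are made-up illustrations, not taken from the text.

```python
# Sketch: standardising a normal value with Z = (X - mu) / sigma.
# The population values mu = 50, sigma = 5 and the observation x = 58
# are made-up numbers for illustration.
def z_score(x, mu, sigma):
    return (x - mu) / sigma

print(z_score(58, 50, 5))   # 1.6 -> x lies 1.6 standard deviations above the mean
```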

Testing of Hypothesis

Introduction: Estimates based on sample values do not, in general, equal the true value
in the population, because of the inherent variation in the population. Different samples
will give different estimates of the true value. It has to be verified
whether the difference between the sample estimate and the population value is
due to sampling fluctuation or is a real difference. If the difference is due to sampling
fluctuation only, it can safely be said that the sample belongs to the population
under question; if the difference is real, we have every reason to believe that the
sample may not belong to the population under question. The following are a few
technical terms used in this context.

Hypothesis: An assumption made about any unknown characteristic is called a
hypothesis. It may or may not be true.
Ex: 1. μ = 2.3, where μ is the population mean.
2. σ = 2.1, where σ is the population standard deviation.
3. The population follows a Normal Distribution.

There are two types of hypotheses, namely the null hypothesis and the alternative
hypothesis.


Null Hypothesis: A null hypothesis is a statement about the population parameters. Such a
hypothesis, which is usually a hypothesis of no difference, is called the null hypothesis;
in other words, any statistical hypothesis under test is called the null hypothesis. It is
denoted by H₀.
1. H₀: μ = μ₀

2. H₀: μ₁ = μ₂

Alternative Hypothesis: Any hypothesis, which is complementary to the null

hypothesis, is called an alternative hypothesis, usually denoted by H1.


Ex: 1. H₁: μ ≠ μ₀

2. H₁: μ₁ ≠ μ₂

Population: In a statistical investigation the interest usually lies in the assessment

of the general magnitude and the study of variation with respect to one or more

characteristics relating to objects belonging to a group. This group of objects

under study is called the population or universe, i.e., the totality of all the objects under

study is called the population.

Sample: A finite subset of statistical objects in a population is called a sample and

the number of objects in a sample is called the sample size.

Parameter: A characteristic of the population values is known as a parameter. For
example, the population mean (μ) and the population variance (σ²).

In practice, parameter values are not known, and estimates based on the
sample values are generally used.

Statistic: A characteristic of the sample values is called a statistic. For example,
the sample mean ($\bar{x}$) and the sample variance (s²), where

$\bar{x} = \frac{\sum x_i}{n}$ and $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$

Sampling distribution: The distribution of a statistic computed from all possible

samples is known as sampling distribution of that statistic.

Standard error: The standard deviation of the sampling distribution of a statistic is

known as its standard error, abbreviated as S.E.

S.E.($\bar{x}$) = $\frac{\sigma}{\sqrt{n}}$, where σ = population standard deviation and n = sample size.
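
As a small illustration of the formula, the sketch below computes the standard error of the sample mean; the numbers (σ = 14.14, n = 5) simply echo the second groundnut variety used earlier and are only illustrative.

```python
# Sketch: standard error of the sample mean, S.E. = sigma / sqrt(n).
# The values sigma = 14.14 and n = 5 echo the second groundnut variety above
# and are used purely for illustration.
import math

def standard_error(sigma, n):
    return sigma / math.sqrt(n)

print(standard_error(14.14, 5))   # ~6.32
```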

Random sampling: If the sampling units in a population are drawn independently,
with an equal chance of being included in the sample, then the sampling is called
random sampling.

Simple Hypothesis: A hypothesis is said to be simple if it
completely specifies the distribution of the population. For instance, in the case of a
normal population with mean μ and standard deviation σ, a simple null hypothesis
is of the form H₀: μ = μ₀ with σ known; knowledge of μ would then be enough to
specify the entire distribution.

Composite Hypothesis: If the hypothesis does not specify the distribution of the
population completely, it is said to be a composite hypothesis. Following are some
examples:

H₀: μ < μ₀ and σ is known

H₀: μ > μ₀ and σ is known
Types of Errors:

In testing of statistical hypothesis, there are four possible types of decisions


1. Rejecting H 0 when H 0 is true

2. Rejecting H 0 when H 0 is false


3. Accepting H0 when H 0 is true

4. Accepting H0 when H 0 is false


The 1st and 4th possibilities lead to erroneous decisions. Statisticians give specific
names to these concepts, namely Type I error and Type II error respectively.

The above decisions can be arranged in the following table:

                      H₀ is true                H₀ is false
Rejecting H₀          Type I error (wrong)      Correct decision
Accepting H₀          Correct decision          Type II error (wrong)

Type I error: Rejecting H₀ when H₀ is true.

Type II error: Accepting H₀ when H₀ is false.

The probabilities of Type I and Type II errors are denoted by α and β respectively.
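
The meaning of α as the Type I error rate can be illustrated by simulation: draw many samples from a population in which H₀ is actually true and count how often a test at the 5% level rejects it. The population values and sample size below are made-up assumptions.

```python
# Sketch: estimating the Type I error rate by simulation. Samples are drawn
# from a normal population in which H0 (mu = 50) is actually true, and we
# count how often a two-sided z-test at the 5% level rejects H0.
# The population values and the sample size are made-up illustrations.
import math
import random

random.seed(1)
mu0, sigma, n = 50.0, 5.0, 25
critical_z = 1.96             # two-sided critical value at the 5% level
trials, rejections = 10000, 0

for _ in range(trials):
    sample = [random.gauss(mu0, sigma) for _ in range(n)]
    sample_mean = sum(sample) / n
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    if abs(z) >= critical_z:
        rejections += 1       # rejecting a true H0: a Type I error

print(rejections / trials)    # close to 0.05, the chosen level of significance
```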

Degrees of freedom: It is defined as the difference between the total number of

items and the total number of constraints.

If ‘n’ is the total number of items and ‘k’ the total number of constraints then the

degrees of freedom (d.f.) is given by d.f. = n- k

Level of significance (LOS): The maximum probability at which we would be willing
to risk a Type I error is known as the level of significance; that is, the size of the
Type I error is the level of significance. The levels of significance usually employed in
testing of hypotheses are 5% and 1%. The level of significance is always fixed in advance,
before collecting the sample information. An LOS of 5% means that the results obtained
will be true in 95 out of 100 cases and may be wrong in 5 out of 100 cases.

Critical value: While testing for the difference between the means of two
populations, our concern is whether the observed difference is too large to believe
that it has occurred just by chance. But then the question is: how much difference
should be treated as too large? Based on the sampling distribution of the means, it is
possible to define a cut-off or threshold value such that if the difference exceeds
this value, we say that it is not an occurrence by chance, and hence there is
sufficient evidence to claim that the means are different. Such a value is called the
critical value, and it is based on the level of significance.
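
When the test statistic follows the standard normal distribution, the critical value for a chosen level of significance can be read off with the standard library, as in this small sketch (statistics.NormalDist requires Python 3.8 or later).

```python
# Sketch: critical values of a two-sided test read from the standard normal
# distribution, using statistics.NormalDist (Python 3.8+).
from statistics import NormalDist

def two_sided_critical_value(level_of_significance):
    # Split the risk equally between the two tails and read off the cut-off.
    return NormalDist().inv_cdf(1 - level_of_significance / 2)

print(two_sided_critical_value(0.05))   # ~1.96
print(two_sided_critical_value(0.01))   # ~2.576
```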

Steps involved in test of hypothesis:

1. The null and alternative hypothesis will be formulated

2. Test statistic will be constructed

3. Level of significance will be fixed

4. The table (critical) values will be found out from the tables for a given level

of significance

5. The null hypothesis will be rejected at the given level of significance if the
value of the test statistic is greater than or equal to the critical value.
Otherwise the null hypothesis will be accepted.

6. In the case of rejection, the variation in the estimates will be called 'significant'
variation. In the case of acceptance, the variation in the estimates will be called
'not significant'. (A worked sketch of these steps is given below.)
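
As a worked sketch of these six steps, the following applies them to a one-sample z-test with a known population standard deviation; every number in it is a made-up illustration, not data from the text.

```python
# Sketch: the six steps above applied to a one-sample z-test with a known
# population standard deviation. All numbers are made-up illustrations.
import math
from statistics import NormalDist

# Step 1: formulate the hypotheses. H0: mu = 50 against H1: mu != 50.
mu0 = 50.0
sigma = 5.0                                  # assumed known
sample = [52, 55, 49, 53, 57, 51, 54, 56]    # made-up observations

# Step 2: construct the test statistic z = (x_bar - mu0) / (sigma / sqrt(n)).
n = len(sample)
x_bar = sum(sample) / n
z = (x_bar - mu0) / (sigma / math.sqrt(n))

# Step 3: fix the level of significance in advance.
alpha = 0.05

# Step 4: find the critical (table) value for that level.
critical = NormalDist().inv_cdf(1 - alpha / 2)   # ~1.96 for the 5% level

# Steps 5 and 6: compare the statistic with the critical value and conclude.
if abs(z) >= critical:
    print("Reject H0: the variation is significant.")
else:
    print("Accept H0: the variation is not significant.")
print(round(z, 3), round(critical, 3))
```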

*****
