Unit 4.
(II SEM)
INTRODUCTION TO PROBABILITY
Even in our day-to-day life we say or hear phrases like “It may rain today”,
“Probably I will get a first class in the examination”, “India might draw or win the
cricket series against Australia”, and so on. Each of these statements involves
an element of uncertainty or chance.
A random variable has a probability distribution, which specifies the probabilities of the
values it can take. Random variables can be discrete, continuous, or a mixture of both.
A discrete random variable takes one of a designated finite or countable list of values, and
its distribution is described by a probability mass function. A continuous random variable
can take any numerical value in an interval or set of intervals, and its distribution is
described by a probability density function. A mixed random variable combines discrete
and continuous parts.
A probability distribution is a formula or a table used to assign probabilities to each
possible value of a random variable X. A probability distribution may be
either discrete or continuous. A discrete distribution means that X can assume one of a
countable (usually finite) number of values, while a continuous distribution means
that X can assume one of an infinite (uncountable) number of different values.
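The difference between a distribution given as a table and one given as a formula can be sketched in a few lines of Python. The examples below (a fair die and a uniform variable on [0, 2]) are assumptions chosen for illustration, not taken from the text; the function names `uniform_pdf` and `uniform_prob` are hypothetical.

```python
from fractions import Fraction

# Discrete example (assumed for illustration): a fair six-sided die.
# Its distribution is a table that assigns a probability to each possible value.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# Continuous example (assumed for illustration): X uniform on [a, b] = [0, 2].
# Its distribution is a formula (a density); probabilities are areas under it.
def uniform_pdf(x, a=0.0, b=2.0):
    """Density of the uniform distribution on [a, b]."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def uniform_prob(lo, hi, a=0.0, b=2.0):
    """P(lo <= X <= hi) for X uniform on [a, b]: the area under the density."""
    lo, hi = max(lo, a), min(hi, b)
    return max(hi - lo, 0.0) / (b - a)

print(sum(die_pmf.values()))    # discrete probabilities sum to 1
print(die_pmf[3])               # P(X = 3) for the die
print(uniform_prob(0.5, 1.0))   # P(0.5 <= X <= 1.0)
print(uniform_prob(1.0, 1.0))   # a single point has probability 0 when X is continuous
```

For the discrete die, a single value carries positive probability; for the continuous variable only intervals do, which is why its distribution is described by a density rather than a table.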
Normal Probability Distributions
The normal or Gaussian probability distribution is the most popular and important
distribution because of its unique mathematical properties, which facilitate its
application to practically any physical problem in the real world: if not directly to
the distribution of the data, then to the sampling distribution, which is discussed in
the next section. It forms the basis for the development of many statistical
methods.
The normal probability distribution was discovered by Abraham De Moivre in 1733
as a way of approximating the binomial probability distribution when the number of
trials in a given experiment is very large. In 1774, Laplace studied the mathematical
properties of the normal probability distribution. Through a historical error, the
discovery of the normal distribution was attributed to Gauss, who first referred to it in
a paper in 1809.
In the nineteenth century, many scientists noted that measurement errors in a given
experiment followed a pattern (the normal curve of errors) that was closely
approximated by this probability distribution. The normal probability distribution is
formally defined as follows:
Normal Probability Distribution
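A continuous random variable X is normally distributed, or follows a normal probability distribution, if its
probability density function is given by
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad -\infty < x < \infty,$$
where μ is the mean and σ² > 0 is the variance.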
The universally accepted notation X ~ N(μ, σ²) is read as “the continuous random variable X is normally
distributed with population mean μ and population variance σ².” Of course, in real-world problems we do not
know the true population parameters; we estimate them from the sample mean and sample variance.
However, first, we must fully understand the normal probability distribution.
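The normal density f(x) = e^(−(x−μ)²/(2σ²)) / (σ√(2π)) can be evaluated directly. The sketch below is a minimal illustration; the function name `normal_pdf` and the parameter values μ = 50, σ² = 25 are assumptions chosen for the example.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma  # distance from the mean in standard deviations
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Illustrative parameters (assumed): mu = 50, sigma^2 = 25, so sigma = 5.
mu, sigma = 50.0, 5.0
for x in (40.0, 45.0, 50.0, 55.0, 60.0):
    print(x, round(normal_pdf(x, mu, sigma), 5))
```

The printed values rise to a maximum at x = μ and fall off equally on both sides, which is the bell shape described next.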
The graph of the normal probability distribution is a “bell-shaped” curve, as shown
in the figure. The constants μ and σ² are its parameters: μ is the true population
mean (or expected value) of the phenomenon described by the continuous random
variable X, and σ² is the true population variance of X.
Hence, σ is the population standard deviation of the continuous random variable X,
and the points located at μ − σ and μ + σ are the points of inflection, that is, where
the graph changes between cupping up (concave up) and cupping down (concave down).
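The inflection points at μ ± σ can be checked numerically by looking at the sign of the second derivative of the density. This sketch assumes illustrative values μ = 10, σ = 2 and uses a central finite difference, a standard numerical approximation rather than the exact derivative; the helper names are hypothetical.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def second_derivative(f, x, h=1e-3):
    """Central finite-difference approximation of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

mu, sigma = 10.0, 2.0  # illustrative values (assumed)

def density(x):
    return normal_pdf(x, mu, sigma)

# f'' < 0 (cupping down) between mu - sigma and mu + sigma,
# f'' > 0 (cupping up) outside, so the sign flips at mu - sigma and mu + sigma.
for x in (mu - 1.5 * sigma, mu - 0.5 * sigma, mu, mu + 0.5 * sigma, mu + 1.5 * sigma):
    print(x, "cupping down" if second_derivative(density, x) < 0 else "cupping up")
```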
The total area under the normal curve is equal to one. The random variable X can assume values anywhere from
minus infinity to plus infinity, but in practice we very seldom encounter problems in which random variables have
such a wide range.
The normal curve (the graph of the normal probability distribution) is symmetric with respect to the mean μ as
the central position. That is, the area between μ − k and μ (k units to the left of μ) is equal to the area
between μ and μ + k (k units to the right of μ):
$$P(\mu - k \le X \le \mu) = P(\mu \le X \le \mu + k).$$
[Figure B: Normal probability from the center, μ, to μ + k, that is, k units above the center: P(μ ≤ X ≤ μ + k).]
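The symmetry P(μ − k ≤ X ≤ μ) = P(μ ≤ X ≤ μ + k) can be verified with the normal cumulative distribution function, written here through the standard library's error function `math.erf`. The values μ = 100, σ = 15, k = 20 and the helper names are assumptions for illustration only.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for X ~ N(mu, sigma^2), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def prob_between(a, b, mu=0.0, sigma=1.0):
    """P(a <= X <= b) as a difference of CDF values."""
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)

mu, sigma, k = 100.0, 15.0, 20.0             # illustrative values (assumed)
left = prob_between(mu - k, mu, mu, sigma)   # area k units to the left of mu
right = prob_between(mu, mu + k, mu, sigma)  # area k units to the right of mu
print(round(left, 4), round(right, 4))
```

Both areas agree (up to floating-point rounding) for any choice of k, which is exactly what symmetry about μ means.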
There is not a unique normal probability distribution, since the mathematical formula of
the graph depends on two parameters, the mean μ and the variance σ². Figure C is a
graphical representation of the normal distribution for a fixed value of σ² with μ varying.
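The effect of changing μ while keeping σ² fixed can be sketched numerically: the curve keeps the same shape and peak height and simply moves so that its peak sits at μ. The grid, σ = 1 and the means 0, 2, 5 below are assumptions chosen for illustration.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

sigma = 1.0                                   # fixed sigma (sigma^2 = 1), assumed
grid = [i / 100 for i in range(-500, 1001)]   # x values from -5.00 to 10.00
peaks = []
for mu in (0.0, 2.0, 5.0):                    # illustrative means, assumed
    # Location and height of the highest point of the curve on the grid.
    peak_x = max(grid, key=lambda x: normal_pdf(x, mu, sigma))
    peaks.append((mu, peak_x, normal_pdf(peak_x, mu, sigma)))

for mu, peak_x, height in peaks:
    print(f"mu={mu}: peak at x={peak_x}, height={height:.4f}")
```

Each curve peaks at its own mean, and all three peaks have the same height 1/(σ√(2π)), so only the location of the curve changes.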
Carl Friedrich Gauss in 1809 used the normal distribution to solve the important statistical
problem of combining observations. Because Gauss played such a prominent role in
determining the usefulness of the normal probability distribution, the normal probability
distribution is often called the Gaussian distribution. Gauss and Laplace noticed that
measurement errors tend to follow a bell-shaped curve, a normal probability distribution.
Today, the normal probability distribution arises repeatedly in diverse areas of applications.
For example, in biology, it has been observed that the normal probability distribution fits
data on the heights and weights of human and animal populations, among others.
3.2.4 Normal probability distribution
We should also mention here that almost all basic statistical inference is based on the
normal probability distribution. The question that often arises is: how do we know whether
our data follow the normal distribution?
To answer this question, we have specific statistical procedures that we study in later
chapters, but at this point we can obtain some constructive indications of whether the
data follow the normal distribution by using descriptive statistics. That is, if the
histogram of our data can be capped with a bell-shaped curve (Fig. 3.2), if the
stem-and-leaf diagram is fairly symmetrical with respect to its center, and/or if invoking
the empirical rule “backward” (checking whether roughly 68%, 95%, and 99.7% of the
observations fall within one, two, and three standard deviations of the mean) holds,
we can obtain a good indication of whether our data follow the normal probability
distribution.
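Invoking the empirical rule “backward” amounts to counting how many observations fall within one, two and three sample standard deviations of the sample mean. The sketch below uses simulated data only (a normal sample and a right-skewed exponential sample, both assumptions with an arbitrary seed) to show the contrast; `empirical_rule_check` is a hypothetical helper name, and this is an informal check, not a formal test of normality.

```python
import random
import statistics

def empirical_rule_check(data):
    """Fraction of observations within 1, 2 and 3 sample SDs of the sample mean."""
    m = statistics.mean(data)
    s = statistics.stdev(data)
    n = len(data)
    return [sum(abs(x - m) <= k * s for x in data) / n for k in (1, 2, 3)]

rng = random.Random(0)  # simulated data only; seed chosen arbitrarily
bell_shaped = [rng.gauss(170, 8) for _ in range(5000)]  # assumed: normal sample
skewed = [rng.expovariate(1.0) for _ in range(5000)]    # assumed: right-skewed sample

print("reference :", [0.68, 0.95, 0.997])
print("bell data :", [round(p, 3) for p in empirical_rule_check(bell_shaped)])
print("skewed    :", [round(p, 3) for p in empirical_rule_check(skewed)])
```

For the bell-shaped sample the three fractions should land close to 68%, 95% and 99.7%; for the skewed sample the one-standard-deviation fraction is noticeably larger than 68%, a hint that those data are not normal.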
PROPERTIES OF A NORMAL DISTRIBUTION
The mean, mode and median are all equal.
The curve is symmetric about the center (i.e. around the mean, μ).
Exactly half of the area under the curve lies to the left of the center and exactly half lies to
the right.
The total area under the curve is 1.
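The properties listed above can be checked numerically for one particular normal curve. This sketch assumes μ = 3, σ = 2, approximates areas with the trapezoidal rule (a generic numerical method, not something prescribed by the text), and truncates the range at μ ± 10σ, where the remaining tail area is negligible. The helper names are hypothetical.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def area(f, a, b, n=20000):
    """Trapezoidal-rule approximation of the area under f on [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return total * h

mu, sigma = 3.0, 2.0                       # illustrative parameters, assumed
lo, hi = mu - 10 * sigma, mu + 10 * sigma  # tails beyond 10 sigma are negligible

def f(x):
    return normal_pdf(x, mu, sigma)

total_area = area(f, lo, hi)               # should be 1
left_half = area(f, lo, mu)                # area left of the center, so median = mu
right_half = area(f, mu, hi)               # area right of the center
mean = area(lambda x: x * f(x), lo, hi)    # E[X] = integral of x f(x) dx
grid = [mu + i / 1000 for i in range(-5000, 5001)]
mode = max(grid, key=f)                    # location of the highest point

print(round(total_area, 6), round(left_half, 6), round(right_half, 6))
print(round(mean, 6), mode)
```

The left area of one half places the median at μ, the integral of x·f(x) gives the mean μ, and the highest point of the curve (the mode) is also at μ, matching the first property in the list.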