Distributions and Sampling - Tuesday
Syllabus
• UNIT-I: Probability Distributions
• The manager of a departmental store informs that the probability that a customer
who is just browsing will eventually buy some items is 0.4. During the pre-lunch
session on a day, 7 customers are seen to browse in the department. Find:
• We know: P(0) = 0.0280, P(1) = 0.1306, P(2) = 0.2613, P(3) = 0.2903, P(4) =
0.1935, P(5) = 0.0774, P(6) = 0.0172, P(7) = 0.0016
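The probabilities for the store example can be checked with the binomial formula P(x) = C(n, x) p^x (1-p)^(n-x). The sketch below is only an illustration; the function name `binomial_pmf` is an assumption and does not come from the slides.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for n independent trials, each with success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 7, 0.4  # 7 browsing customers, each buys with probability 0.4
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]

for x, prob in enumerate(probs):
    print(f"P({x}) = {prob:.4f}")
```

Any "at least" or "at most" question about the 7 customers can then be answered by summing the relevant entries of `probs`.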
Expected value and standard deviation
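Expected value = n*p
Variance = n*p*(1-p)
Standard deviation = √(n*p*(1-p))
For the store example (n = 7, p = 0.4): expected value = 2.8, variance = 1.68, standard deviation ≈ 1.296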
Poisson distribution
• In many cases, the number of trials “n” and the probability “p” are not given
• There are many discrete phenomena that are represented by a Poisson process.
• A Poisson distribution is said to exist when we can observe discrete events in some
area of interest (which may be a continuous interval of time, space, length, etc.)
• The only condition for a Poisson distribution is that the expected number of successes (or events) must be
known; it is represented by λ. [Example: the average number of customers visiting a site is 5 per minute]
• If we know λ, we can find the probability of getting exactly 0 successes, exactly 1 success, 2, 3, 4, 5, …
and so on without upper limit [Example: probability of exactly 0 customers per minute, exactly 1 customer per minute,
exactly 2 customers per minute, 3 customers, …]
• So, x is the random variable denoting the number of successes or number of arrivals: x = 0, 1, 2, 3, 4, …
• (a) The minimum number of successes in a Poisson distribution is zero while there is no upper limit.
• (b) In calculating probabilities, the value of λ should be defined carefully. To illustrate, it is given
that, on an average, 12 accidents occur in a quarter of a year on a certain crossing. In this case, for
calculating the probability of
(ii) A certain number of accidents occurring over a two-month period, we should take λ = 8.
(iii) A certain number of accidents occurring over a three-month period, we should take λ = 12.
(iv) A certain number of accidents occurring over a one-and-a-half-month period, we should take λ = 6.
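The rescaling of the average in the accident illustration, and the probabilities that follow from it, can be sketched as below. This assumes the standard Poisson formula P(x) = e^(-λ) λ^x / x!, which is not written out on these slides; the names `poisson_pmf` and `scale_rate` are hypothetical helpers chosen for illustration.

```python
from math import exp, factorial


def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) when events occur at an average of lam per interval."""
    return exp(-lam) * lam**x / factorial(x)


def scale_rate(lam: float, base_period: float, new_period: float) -> float:
    """Rescale an average count from one interval length to another."""
    return lam * new_period / base_period


# 12 accidents per quarter (3 months), rescaled to other periods in months
for months in (2, 3, 1.5):
    lam = scale_rate(12, base_period=3, new_period=months)
    print(f"{months} months: average = {lam:g}, "
          f"P(no accidents) = {poisson_pmf(0, lam):.6f}")
```

The key point is that the average must refer to the same interval as the question being asked before any probability is computed.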
EXAMPLE:
• If, on an average, 2 customers arrive at a shopping mall per minute, what is the
probability that
• There are several phenomena which seem to follow the normal distribution very closely or
can be approximated by it
• When the number of observations is very large, the random variable is, in most cases,
approximately normally distributed
• y(x) = (1 / (σ√(2π))) · e^(−(x − μ)² / (2σ²))
• where, e = 2.7183
• π = 3.1416
• μ = expected value
• σ = standard deviation (the square root of the variance)
• x = a particular value of the random variable,
• y(x) = density for x
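As a minimal sketch, the density y(x) can be evaluated numerically. This assumes the usual normal density with expected value μ and standard deviation σ; the function name `normal_density` and the sample values are illustrative only.

```python
from math import exp, pi, sqrt


def normal_density(x: float, mu: float, sigma: float) -> float:
    """Height y(x) of the normal curve with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma  # distance from the mean in standard deviations
    return exp(-0.5 * z * z) / (sigma * sqrt(2 * pi))


# The curve is highest at the mean and symmetric about it.
for x in (-2, -1, 0, 1, 2):
    print(f"y({x}) = {normal_density(x, mu=0, sigma=1):.4f}")
```

Note that y(x) is a density, not a probability; probabilities come from areas under the curve.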
• Samples are taken and analysed not just for their sake but to learn about the
populations from which they are drawn.
• Economic: Sampling is done mainly for economic reasons, as it may be too
expensive or too time-consuming to attempt either a complete or a nearly
complete coverage in a statistical study.
• Destructive nature of tests: Where testing destroys the elements being
examined, a complete enumeration is not feasible
• Very large populations: When the population in question is very large in size or
is infinite, sampling is the only choice
Types of Sampling
1. Simple Random Sampling
2. Systematic Sampling
• The ratio of the population size, N, to the sample size, n, is calculated and
represented by k. Thus, k = N/n.
• Only the integer part of k is used; any fractional part is ignored.
• After this, an element is chosen randomly from the first k elements. This is the first
element selected in the sample.
• Every kth element after the one chosen is then included in
the sample.
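The selection steps above can be sketched as follows. This is an illustrative sketch, not a prescribed implementation: the function name `systematic_sample` is hypothetical, the population is assumed to be an ordered list, and stopping after exactly n elements is a choice made here.

```python
import random


def systematic_sample(population, n, rng=random):
    """Pick a random start among the first k elements, then every kth element."""
    N = len(population)
    k = N // n  # integer part of N/n; any fractional part is ignored
    if k < 1:
        raise ValueError("sample size n cannot exceed population size N")
    start = rng.randrange(k)  # index of the first selected element (0 .. k-1)
    return [population[start + i * k] for i in range(n)]


# Hypothetical population: units numbered 1 to 100, sample of 10
units = list(range(1, 101))
sample = systematic_sample(units, 10, rng=random.Random(0))
print(sample)
```

Only the starting element is random; once it is fixed, the rest of the sample is determined by k.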
3. Stratified Sampling
• In stratified sampling, the N elements of the population are first sub-divided into distinct and
mutually exclusive sub-populations, also called strata, according to some common
characteristic.
• For example, the employees of a large company can be divided by their rank, gender,
department, and so forth.
• After a population is divided into appropriate strata, a simple random sample is taken within
each stratum
• Stratified sampling is generally more efficient than simple random sampling or systematic sampling
because it ensures representation of individuals or items from every stratum of the population.
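A minimal sketch of stratified sampling is shown below. The slides do not say how many elements to draw from each stratum, so taking the same number from every stratum is an assumption made here; the function name `stratified_sample` and the employee data are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(items, stratum_of, per_stratum, rng=random):
    """Split items into strata, then take a simple random sample in each."""
    strata = defaultdict(list)
    for item in items:
        strata[stratum_of(item)].append(item)

    sample = []
    for members in strata.values():
        size = min(per_stratum, len(members))  # a stratum may be small
        sample.extend(rng.sample(members, size))
    return sample


# Hypothetical employees as (id, rank); every fifth one is a manager
employees = [(f"E{i:02d}", "manager" if i % 5 == 0 else "staff")
             for i in range(20)]
chosen = stratified_sample(employees, stratum_of=lambda e: e[1],
                           per_stratum=2, rng=random.Random(1))
print(chosen)
```

Every rank is guaranteed to appear in `chosen`, which a simple random sample of the same size would not ensure.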
4. Cluster Sampling
• In convenience sampling, the investigator or the field staff are free to
choose whichever respondents they find convenient.
• For example, the sample mean and standard deviation are represented by x̄ and s
respectively, while the population parameters are represented by μ and σ.
• Since a sample is only a part of the population, we do not expect a statistic value
to match exactly the corresponding parameter, except only by chance.
• Such an error is likely to occur because a sample is only a subset of
the population.
• However, this is not the only reason for errors. There are other reasons
that also cause errors.
• Sampling errors arise solely from the act of sampling and result from
the chance selection of sampling units
• This type of error occurs simply because only a part of the population is
observed, and it is expected to disappear when a census study is undertaken.
• Non-sampling errors may arise because of bias, vague definitions used in the data
collection, defective methods of data collection, incomplete coverage of the
population, wrong entries made in the questionnaire, etc.
(a) Take all possible samples of size n from a population of size N, having mean μ and standard
deviation σ.
(c) Tabulate the mean values and calculate the relative frequency of each value of mean by
dividing the frequency with which it appears by the total frequency (equal to the number of
samples). The relative frequency of each value indicates its probability.
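For a very small population, the sampling distribution of the mean can be built by brute force: enumerate every sample, compute each sample mean, and tabulate relative frequencies. The sketch below assumes sampling without replacement, which the slides do not specify; the population values and the function name `sampling_distribution_of_mean` are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def sampling_distribution_of_mean(population, n):
    """Map each possible sample mean to its relative frequency."""
    # Every sample of size n (without replacement) and its mean
    means = [Fraction(sum(s), n) for s in combinations(population, n)]
    counts = Counter(means)
    total = len(means)  # total frequency = number of samples
    return {m: Fraction(c, total) for m, c in sorted(counts.items())}


population = [2, 4, 6, 8]  # hypothetical population, N = 4
dist = sampling_distribution_of_mean(population, 2)

for mean, prob in dist.items():
    print(f"mean = {float(mean):g}, relative frequency = {prob}")

# Expected value of the sample mean
expected_mean = sum(m * p for m, p in dist.items())
print("expected value of the mean:", expected_mean)
```

In this small case the expected value of the sample mean equals the population mean, which is the property the sampling distribution is meant to show.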
Example
• Central Limit Theorem (CLT): If random samples of size n are drawn from any
population with mean μ and standard deviation σ, and if n is sufficiently large, then
the distribution of possible mean values will be approximately normal with expected
value μ and standard error σ/√n, regardless of the population distribution.
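The theorem can be checked informally by simulation. The sketch below draws repeated samples from an exponential population, chosen here only because it is clearly non-normal and has μ = 1 and σ = 1; the sample size, number of repetitions, seed, and function name are all assumptions for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev


def simulate_sample_means(n, repetitions, rng):
    """Mean of each of `repetitions` random samples of size n."""
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n
            for _ in range(repetitions)]


n = 40
rng = random.Random(42)  # fixed seed so the run is repeatable
sample_means = simulate_sample_means(n, repetitions=2000, rng=rng)

mu, sigma = 1.0, 1.0  # mean and standard deviation of the population
print("mean of sample means:", mean(sample_means), "vs mu =", mu)
print("sd of sample means:  ", stdev(sample_means),
      "vs sigma/sqrt(n) =", sigma / sqrt(n))
```

A histogram of `sample_means` would look roughly bell-shaped even though the individual observations are strongly skewed.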
Sampling Distribution of Mean and the Probabilities
2. When the population is not normally distributed but the sample size n is large enough.
• We can use normal area table to calculate probabilities involving the sample
mean.
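Instead of a printed normal area table, the same probability can be computed directly. The sketch below assumes the sample mean is approximately normal with mean μ and standard error σ/√n; the function name and the numbers used are hypothetical and are not taken from any example in these slides.

```python
from math import sqrt
from statistics import NormalDist


def prob_sample_mean_at_least(c, mu, sigma, n):
    """P(sample mean >= c), treating the sample mean as normal."""
    standard_error = sigma / sqrt(n)
    return 1 - NormalDist(mu, standard_error).cdf(c)


# Hypothetical population: mu = 100, sigma = 15; sample of n = 36
p = prob_sample_mean_at_least(105, mu=100, sigma=15, n=36)
print(f"P(sample mean >= 105) = {p:.4f}")
```

Here the standard error is 15/6 = 2.5, so 105 lies two standard errors above the mean; a larger n shrinks the standard error and makes the same cut-off less likely.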
Example
(a) A random sample of 10 batteries will have a mean life of 412 hours or
greater.
(b) A sample of 100 batteries selected randomly will have a mean life of at least
412 hours.
Example