Lecture 13

The document discusses different types of sampling methods used in statistical inference. It introduces simple random sampling, where every member of the population has an equal chance of being selected for the sample. Systematic sampling selects elements at uniform intervals from a sampling frame. Stratified sampling divides the population into homogeneous subgroups and samples separately from each subgroup. Cluster sampling selects clusters randomly and samples from the selected clusters. These sampling methods allow statisticians to make inferences about unknown population parameters from sample data.

Uploaded by

ABHIJIT SAHOO


Sampling and Sampling Distribution
Introduction
 Statistical inference is the process of converting data into information.
 Parameters describe populations.
 Parameters are almost always unknown.
 We take a random sample of a population to obtain the necessary data.
 We calculate one or more statistics from the sample data.
 For example, to estimate a population mean, we compute the sample mean.
 Although there is very little chance that the sample mean and the population mean are identical,
we would expect them to be quite close. However, for the purposes of statistical inference, we
need to be able to measure how close.
 The sampling distribution provides this service. It plays a crucial role in the process because the
measure of proximity it provides is the key to statistical inference.
Problem Introduction
 Although there are 200 million TV viewers in the United States and somewhat over half that
many TV sets, only about 1000 of those sets are sampled to determine what programs Americans
watch. Why select only about 1000 sets out of 100 million? Because time and the average cost
of an interview prohibit the rating companies from trying to reach millions of people, and since
polls are reasonably accurate, interviewing everybody is unnecessary. In this lecture, we will
examine questions such as:
 How many people should be interviewed?
 How should they be selected?
 How do we know when our sample accurately reflects the entire population?
Concept
 In statistical inference we are concerned with populations; the samples are of no interest
to us in their own right. We wish to use a known random sample to extract
information about the unknown population from which it is drawn.
 The information we extract is in the form of summary statistics: a sample mean, a sample
standard deviation, or other measures computed from the sample.
 A statistic such as the sample mean is considered an estimator of a population
parameter—the population mean.
 We will define sample estimators and population parameters. Then we explore the
relationship between statistics and parameters via the sampling distribution.
 Finally, we discuss desirable properties of statistical estimators.
Problem
Deans and other faculty members in professional schools often monitor how well the graduates
of their programs fare in the job market. Information about the types of jobs and their salaries
may provide useful information about the success of a program. In the advertisements for a
large university, the dean of the School of Business claims that the average salary of the
school’s graduates one year after graduation is $800 per week, with a standard deviation of
$100. A second-year student in the business school who has just completed his statistics course
would like to check whether the claim about the mean is correct. He does a survey of 25 people
who graduated one year earlier and determines their weekly salary. He discovers the sample
mean to be $750. To interpret his finding, he needs to calculate the probability that a sample of
25 graduates would have a mean of $750 or less when the population mean is $800 and the
standard deviation is $100. After calculating the probability, he needs to draw some conclusion.
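The student's calculation can be sketched as follows. This is a minimal illustration, assuming the sample mean is approximately normally distributed with standard error sigma / sqrt(n); the variable names are chosen here for illustration and are not part of the lecture.

```python
from math import sqrt
from statistics import NormalDist

# Figures from the dean's claim and the student's survey.
population_mean = 800.0   # claimed mean weekly salary ($)
population_sd = 100.0     # claimed standard deviation ($)
n = 25                    # number of graduates surveyed
sample_mean = 750.0       # observed sample mean ($)

# Standard error of the sample mean: sigma / sqrt(n).
standard_error = population_sd / sqrt(n)

# z-score of the observed sample mean if the dean's claim were true.
z = (sample_mean - population_mean) / standard_error

# P(sample mean <= 750), assuming a normal sampling distribution (an assumption).
probability = NormalDist(mu=population_mean, sigma=standard_error).cdf(sample_mean)

print(f"standard error = {standard_error:.1f}")
print(f"z = {z:.2f}")
print(f"P(mean <= 750) = {probability:.4f}")
```

Under these assumptions the standard error is $20, the z-score is -2.5, and the probability is roughly 0.006, so a sample mean this low would be unlikely if the dean's claim were correct.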
Reason to Sample
 Time saving
Example-A sample poll using the regular staff and field interviewers of a professional polling firm would take only 1 or 2 days.
Consider how much time the same poll would take for a population of 100 million.
 Cost Effective
Example-Public opinion polls and consumer testing organizations, such as Gallup Polls and Roper ASW, usually contact
fewer than 2000 of the nearly 60 million families in the US. One consumer panel-type organization charges about $40000 to mail
samples and tabulate responses in order to test a product (such as breakfast cereal or perfume). Testing the same product with all
60 million families would cost $1 billion.
 Physical Impossibility of checking all items
Example-It would be impossible to check all the water in a lake for determining the bacterial level.
 Destructive nature of some test
Example-In a plant, steel plates, wires, etc. must have a certain minimum tensile strength. To ensure the product meets the
minimum standard, the Quality Assurance Department selects a sample from the current production. Each sampled item is stretched until
it breaks, and the breaking point is noted. If all the units were tested, nothing would be available for sale. For the same reason,
only a sample of photographic film is selected and tested by Kodak to determine the quality of all the film produced.
Types of Sampling
Judgement Sampling
 Personal knowledge and opinion are used to identify the items from the population that are to be
included in the sample. The process of selecting a sample using judgmental sampling involves the
researchers carefully picking and choosing each individual to be a part of the sample. The
researcher’s knowledge is primary in this sampling process as the members of the sample are not
randomly chosen.
 Example-Consider a scenario where a panel wants to understand the factors that lead a
person to select ethical hacking as a profession. Ethical hacking is a skill that has recently been
attracting young people, and more and more of them are choosing it as a profession. Researchers who understand
what ethical hacking is will be able to decide who should form the sample to learn about it as a profession.
That is when judgmental sampling is implemented. The researchers can easily filter out the participants who
are eligible to be a part of the research sample.
Types of Sampling
Random or Probability Sampling
Every item in the population has a known, nonzero chance of being chosen for the sample (an equal chance in the case of
simple random sampling). The following methods can be adopted for random sampling:
i) Simple Random Sampling
ii) Systematic Sampling
iii) Stratified Sampling
iv) Cluster Sampling
v) Bootstrap Aggregating (Bagging)
Simple Random Sampling
 Simple random sampling selects samples by methods that allow each possible sample to have an
equal probability of being picked and each item in the entire population to have an equal chance of
being included in the sample.
 To obtain a random sample from the entire population, we need a list of all the elements in the population
of interest. Such a list is called a frame. The frame allows us to draw elements from the population by
randomly generating the numbers of the elements to be included in the sample.
 Simple random sampling can be viewed as the basis for the other random sampling techniques. With
simple random sampling, each unit of the frame is numbered from 1 to N. Next, a table of random
numbers or a random number generator is used to select n items into the sample. A random number
generator is usually a computer program that allows computer-calculated output to yield random numbers.
 Suppose we need a simple random sample of 100 people from a population of 7,000. We make a list of all
7,000 people and assign each person an identification number. This gives us a list of 7,000 numbers—our
frame for the experiment. Then we generate by computer or by other means a set of 100 random numbers
in the range of values from 1 to 7,000. This procedure gives every set of 100 people in the population an
equal chance of being included in the sample.
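The procedure above can be sketched in a few lines. This is only an illustration: the fixed seed and the variable names are assumptions made here for reproducibility, not part of the lecture.

```python
import random

N = 7000   # population size; the frame is the list of ID numbers 1..N
n = 100    # desired sample size

# A seeded generator so the example is reproducible (the seed is arbitrary).
rng = random.Random(42)

# The frame: one identification number per person.
frame = range(1, N + 1)

# Draw n distinct ID numbers; sample() selects without replacement,
# so every set of 100 IDs is equally likely.
sample_ids = rng.sample(frame, n)

print(sorted(sample_ids)[:10])  # first few selected IDs
```

Sampling without replacement matters here: it guarantees that no person is selected twice.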
Systematic Sampling
 In systematic sampling, elements are selected from the population at a uniform interval that is
measured in time, order or space. If we wanted to interview every twentieth student on a campus, we
would choose a random starting point in the first 20 names in the student directory and then pick
every twentieth name thereafter.
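 Unlike in simple random sampling, not every possible sample has an equal chance of being selected.
 Suppose you study sales of a particular product in a department store by sampling every Monday. If sales follow
a weekly pattern, the sample captures only Monday behaviour and does not represent the whole week.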
 Systematic sampling may require less time and sometimes results in lower costs than simple random
sampling.
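The student-directory example can be sketched as below. The directory contents and the function name `systematic_sample` are hypothetical, introduced only for illustration.

```python
import random

def systematic_sample(frame, k, rng):
    """Pick a random start among the first k elements, then take every k-th element."""
    start = rng.randrange(k)   # random index in 0..k-1
    return frame[start::k]

rng = random.Random(0)  # arbitrary seed, for reproducibility only

# Hypothetical student directory of 1,000 names.
directory = [f"student_{i:04d}" for i in range(1, 1001)]

# Interview every twentieth student.
sample = systematic_sample(directory, 20, rng)

print(len(sample), sample[:3])
```

Only the starting point is random; once it is chosen, the rest of the sample is fully determined, which is why only 20 distinct samples are possible in this setup.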
Stratified Sampling
 The technique of stratified sampling is useful when the population can be divided into relatively
homogeneous groups, or strata, and random sampling is then carried out within each stratum. The groups are
mutually exclusive and exhaustive of the population. Example-the strata may be students, people of a
certain community, males or females, socio-economic levels, affiliation to the manufacturing or service sector,
etc.
 Stratified sampling is used because it can reflect the characteristics of the target population more accurately.
Example-People in a certain economic class may be interested in buying sports cars; people in a specific region may
have a particular taste in music.
 We use stratified sampling when each group has small variation within itself but there is a large variation
between the groups. Example-The choice of perfume varies with gender, the choice of transport varies across
different classes, etc.
 Stratified sampling is necessary when the population is heterogeneous; creating homogeneous strata
before sampling is recommended for precise estimation of population parameters.
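A minimal sketch of stratified sampling follows, assuming proportional allocation (the same sampling fraction in every stratum). The strata, their sizes, and the function name `stratified_sample` are hypothetical; other allocation rules are possible.

```python
import random

def stratified_sample(strata, fraction, rng):
    """Draw a simple random sample of the given fraction from every stratum."""
    sample = {}
    for name, members in strata.items():
        # Proportional allocation, with at least one unit per stratum.
        size = max(1, round(len(members) * fraction))
        sample[name] = rng.sample(members, size)
    return sample

rng = random.Random(1)  # arbitrary seed, for reproducibility only

# Hypothetical population split into two mutually exclusive, exhaustive strata.
strata = {
    "manufacturing": [f"M{i}" for i in range(600)],
    "service": [f"S{i}" for i in range(400)],
}

sample = stratified_sample(strata, 0.10, rng)
for name, units in sample.items():
    print(name, len(units))
```

Because every stratum contributes to the sample, each group is represented in proportion to its share of the population.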
Cluster Sampling
 In cluster sampling, we divide the population into groups or clusters and then select a random sample of
these clusters.
 We assume that these clusters are representative of the population as a whole.
 Example- If a market research team is attempting to determine by sampling the average number of
television sets per household in a large city, they could use a city map to divide the territory into blocks
and then choose a certain number of blocks (clusters) for interviewing. Every household in each of these
blocks would be interviewed.
 A well-designed cluster sampling procedure can produce a more precise sample at considerably less cost
than that of simple random sampling.
 In both stratified and cluster sampling, the population is divided into well-defined groups. The major
difference is that in a stratified sample, all strata will be represented in the sample, whereas in a cluster
sampling, not all clusters will be represented.
 Cluster sampling is used when the number of clusters is large. For example, assume that we are interested in
the impact of demonetization on Indian industry. There are a large number of industrial sectors, so we can focus on
any two major sectors.
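The city-block example can be sketched as below. The block data are simulated and the function name `cluster_sample` is hypothetical; the sketch assumes every household in a selected block is interviewed, as in the example above.

```python
import random

def cluster_sample(clusters, num_clusters, rng):
    """Randomly choose whole clusters and keep every unit inside them."""
    chosen = rng.sample(sorted(clusters), num_clusters)
    return {name: clusters[name] for name in chosen}

rng = random.Random(2)  # arbitrary seed, for reproducibility only

# Hypothetical city: 50 blocks of 20 households each; each value is the
# number of television sets in one household (simulated data).
clusters = {
    f"block_{b:02d}": [rng.randint(0, 4) for _ in range(20)]
    for b in range(50)
}

selected = cluster_sample(clusters, 5, rng)

# Pool all households from the selected blocks and average their TV counts.
tv_counts = [tv for households in selected.values() for tv in households]
estimate = sum(tv_counts) / len(tv_counts)

print(sorted(selected), round(estimate, 2))
```

In contrast to stratified sampling, only 5 of the 50 blocks appear in the sample, so the estimate relies on those blocks being representative of the city.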
Bootstrap Aggregating (Bagging)
 Bootstrap Aggregating (Bagging) is sampling with replacement, used in machine learning approaches.
 In Bagging, several samples are generated with replacement from the population and analytical models are
developed using each sample.
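The idea can be sketched as below. This is a simplified illustration: the resamples are drawn from a simulated data set, and the "model" fitted to each resample is just the sample mean, used here as a stand-in for a real analytical model. All names are hypothetical.

```python
import random
from statistics import mean

def bootstrap_samples(data, num_samples, rng):
    """Generate resamples of the same size as data, drawn with replacement."""
    return [rng.choices(data, k=len(data)) for _ in range(num_samples)]

rng = random.Random(3)  # arbitrary seed, for reproducibility only

# Simulated data set of 25 observations (an assumption for illustration).
data = [rng.gauss(800, 100) for _ in range(25)]

resamples = bootstrap_samples(data, 200, rng)

# "Fit" one model per resample; here the model is simply the mean.
model_outputs = [mean(resample) for resample in resamples]

# Aggregate the individual models by averaging their outputs.
bagged_estimate = mean(model_outputs)

print(round(bagged_estimate, 1))
```

Because sampling is with replacement, a given observation may appear several times in one resample and not at all in another, which is what makes the individual models differ.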
Sampling Distribution
 Sampling distribution refers to the probability distribution of a statistic, such as the sample mean or the sample
standard deviation, computed from several random samples of the same size.

 A sampling distribution is the distribution of statistics that would be produced in repeated random
sampling from the same population.

 Sampling distributions are used to calculate the probability that sample statistics could have occurred by
chance and thus to decide whether something that is true of a sample statistic is also likely to be true of a
population parameter.

 They allow the researcher to draw conclusions about a population on the basis of descriptive statistics
about a sample.
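A sampling distribution can be approximated by simulation, as sketched below. The population here is simulated (normal with mean 800 and standard deviation 100, echoing the salary example), and the number of repeated samples is an arbitrary choice.

```python
import random
from statistics import mean, stdev

rng = random.Random(4)  # arbitrary seed, for reproducibility only

# Simulated population (an assumption for illustration).
population = [rng.gauss(800, 100) for _ in range(100_000)]

n = 25              # size of each sample
num_samples = 2000  # number of repeated samples

# Draw many random samples of the same size and record each sample mean.
sample_means = [mean(rng.sample(population, n)) for _ in range(num_samples)]

# The collection of sample means approximates the sampling distribution
# of the mean: centred near the population mean, with a spread close to
# sigma / sqrt(n) = 100 / 5 = 20.
print(round(mean(sample_means), 1), round(stdev(sample_means), 1))
```

The spread of the sample means is much smaller than the spread of the population itself, which is what lets a single sample mean serve as a useful estimate of the population mean.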
Sampling Distribution-Example

 Your sample says that a candidate gets support from 47% of respondents.

 Inferential statistics allow you to say that the candidate gets support from 47% of
the population with a margin of error of +/- 4%.
 This means that the support in the population is likely somewhere between 43%
and 51%.
 Margin of error is taken directly from a sampling distribution.
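How a margin of error follows from the sampling distribution can be sketched as below. The lecture does not state the sample size; n = 600 is an assumption chosen here because it yields a margin close to 4% at 95% confidence, and the normal approximation for a sample proportion is also assumed.

```python
from math import sqrt
from statistics import NormalDist

p_hat = 0.47        # sample proportion supporting the candidate
n = 600             # assumed sample size (not given in the lecture)
confidence = 0.95   # confidence level

# Critical value of the standard normal distribution (about 1.96 for 95%).
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Standard error of a sample proportion, then the margin of error.
standard_error = sqrt(p_hat * (1 - p_hat) / n)
margin = z * standard_error

lower, upper = p_hat - margin, p_hat + margin
print(f"margin = {margin:.3f}, interval = ({lower:.3f}, {upper:.3f})")
```

With these assumed inputs the margin comes out at roughly 0.04, giving an interval of about 43% to 51%.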

Figure: Sampling distribution centred on your sample mean of 47%; 95% of possible sample means lie between 43% and 51%.