
Stat Chapter 2

1) The document discusses sampling and sampling distributions in statistics. It defines key terms like population, sample, sampling unit and explains why sampling is commonly used instead of a full census. 2) There are two main types of sampling - probability sampling, where every unit has a known chance of selection, and non-probability sampling, where the chance of selection is unknown. 3) The sampling distribution is the probability distribution of all possible sample statistics like the mean, taken from the population. It allows us to understand the variability we expect from samples.

Uploaded by

Hamza Abdureman
Copyright: © All Rights Reserved

Statistics for finance Chapter -3- Sampling and Sampling Distributions

CHAPTER THREE

3. SAMPLING AND SAMPLING DISTRIBUTIONS

I. Sampling Theory

Basic Concepts of Some Statistical Terms:

Population: aggregation of the elements from which a sample is actually selected. It is the entire

group of individuals or objects under consideration.

Sample: it is a subgroup or part of the population selected by some method in order to estimate population characteristics.

Elementary unit (unit of analysis): an element or group of elements on which information is

required or it is the object that we observe or measure. Thus, persons, vehicles, households, farms

are examples of elementary units.

Sampling units: for the purpose of sample selection, the population is divided into a finite number of distinct, non-overlapping and identifiable units called sampling units.

Sample Frame: is a list of elements covering the survey population, and serves as a base for sample

selection.

Data: These are measurements or observations (values) recorded for each element.

Variable: is a characteristic or attribute that can assume different values.

Population parameters: These are numerical facts about, or descriptions of, a population (for example, the population mean).

Statistic: it is a characteristic of, or a fact about, a sample (for example, the sample mean).

THE NEED FOR SAMPLING:

The following points summarize the benefits of studying samples.

Sampling can save time and money. A sample study is usually less expensive than a census

study and produces results at a relatively faster speed. There could be resource (time, finance,

manpower, etc.) limitations which would make it difficult to study the whole population.

Sampling may enable more accurate measurements, because a sample study is generally conducted by trained and experienced investigators.


Sampling remains the only choice when a test involves the destruction of the item under study.

In some cases, tests may be destructive. For example, when we test the breaking strength of

materials, we must destroy them. A census would mean complete destruction of materials. In

such a case, we must sample.

Sampling provides much quicker results than does a census. When little time is available between recognizing the need for information and having to use it, sampling helps deliver the information in time.

Sampling is the only process possible if the population is infinite.

Sampling usually makes it possible to estimate the sampling error and thus assists in obtaining information about some characteristic of the population.

ERRORS IN SAMPLING

Sample results may not always be correct, because they are based on a partial or incomplete analysis of the population. The resulting error is referred to as the sampling error.

• Sampling error (estimation error): is caused by observing a sample instead of the whole population.

• Non-sampling errors: arise in both censuses and sample surveys because of biases and mistakes. The errors that occur in the collection, recording, and tabulation of data are called non-sampling errors.

TYPES OF SAMPLING

Several alternative ways to take a sample are available. The main alternative sampling plans may be

grouped into two categories: probability techniques and non-probability techniques.

In probability sampling, every element in the population has a known, nonzero probability of

selection. The simple random sample, in which each member of the population has an equal

probability of being selected, is the best-known probability sample.

In non-probability sampling, the probability of any particular member of the population being

chosen is unknown. The selection of sampling units in non-probability sampling is quite arbitrary,

as researchers rely heavily on personal judgment.


SAMPLING DISTRIBUTIONS

A sampling distribution is not the distribution of the individual values included in a single sample. It is the distribution of all the possible values of a sample statistic for a given sample size from a population.

In the previous chapter we defined a random variable as a numerical description of the outcome of an experiment. If we consider the process of selecting a simple random sample (a type of sampling in which every unit in the population has an equal and known chance of being selected) as an experiment, the sample mean \(\bar{x}\) is the numerical description of the outcome of the experiment. Thus, the sample mean \(\bar{x}\) is a random variable. As a result, just like other random variables, \(\bar{x}\) has a mean or expected value, a standard deviation, and a probability distribution. Because the various possible values of \(\bar{x}\) are the result of different simple random samples, the probability distribution of \(\bar{x}\) is called the sampling distribution of \(\bar{x}\).

Sampling Distribution of the Mean (\(\bar{x}\))

It is a probability distribution of all possible sample means of a given sample size. The sampling distribution of the mean is described by determining the mean of such a distribution, which is the expected value \(E(\bar{x})\), and the standard deviation of the distribution of sample means, designated by \(\sigma_{\bar{x}}\).

Example: Assume a population of size N = 4. The random variable X is the age of individuals, with values X: 18, 20, 22, 24 (years). Find the population mean and standard deviation.

\[ \mu = \frac{\sum X_i}{N} = \frac{18 + 20 + 22 + 24}{4} = 21 \]

\[ \sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} = \sqrt{\frac{20}{4}} = 2.236 \]
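The same calculation can be sketched in Python. This is only an illustration of the two formulas above; the variable names are chosen here and are not part of the chapter. Note that the sum of squared deviations is divided by N (population formula), not N - 1.

```python
import math

ages = [18, 20, 22, 24]  # population values from the example (years)

N = len(ages)
mu = sum(ages) / N  # population mean
# Population standard deviation: divide by N, not N - 1.
sigma = math.sqrt(sum((x - mu) ** 2 for x in ages) / N)

print(f"mu = {mu}, sigma = {sigma:.3f}")
```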

Example 2: ABC Industries has seven production employees (considered the population). The hourly earnings of each employee are given below.

Employee    Hourly Earnings (Br)
A           7
B           7
C           8
D           8
E           7
F           8
G           9

REQUIRED:

(1) Determine the population mean.

Ans: The population mean is Br 7.71, found by

\[ \mu = \frac{\sum X}{N} = \frac{7 + 7 + 8 + 8 + 7 + 8 + 9}{7} = \frac{54}{7} = \text{Br } 7.71 \]
(2) Determine the sampling distribution of the sample mean for samples of size 2.

To arrive at the sampling distribution of the sample mean, we need to select all possible samples of size 2 without replacement from the population, and then compute the mean of each sample. There are 21 possible samples:

\[ {}_{N}C_{n} = {}_{7}C_{2} = \frac{7!}{2!\,(7 - 2)!} = 21 \]

where N = 7 is the number of items in the population and n = 2 is the number of items in the sample.

Summarized results of the Sampling Distribution of the Sample Mean for n = 2

Sample Mean    Number of Means    Probability
7.00           3                  0.1429
7.50           9                  0.4286
8.00           6                  0.2857
8.50           3                  0.1429
Total          21                 1.00
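The table can be reproduced by listing every possible sample. The sketch below is an illustration only: it uses Python's `itertools.combinations` to enumerate the 21 samples of size 2 drawn without replacement, and the variable names are chosen here rather than taken from the chapter.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

earnings = [7, 7, 8, 8, 7, 8, 9]  # hourly earnings (Br) of employees A..G

# Every sample of size 2 drawn without replacement (order ignored).
samples = list(combinations(earnings, 2))

# Count how many samples produce each sample mean.
counts = Counter(Fraction(a + b, 2) for a, b in samples)

# Probability of each sample mean = number of samples / total samples.
distribution = {
    float(mean): Fraction(count, len(samples))
    for mean, count in sorted(counts.items())
}

for mean, prob in distribution.items():
    print(f"{mean:.2f}  {counts[Fraction(mean)]:>2}  {float(prob):.4f}")
```

Keeping the probabilities as exact fractions avoids the small rounding drift that appears when four-decimal values are added by hand.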
(3) What is the mean of the sampling distribution?

The mean of the sampling distribution of the sample mean is obtained by summing the various sample means and dividing the sum by the number of samples. The mean of all the sample means is usually written \(\mu_{\bar{x}}\). The \(\mu\) reminds us that it is a population value because we have considered all possible samples. The subscript \(\bar{x}\) indicates that it is the sampling distribution of the sample mean.

\[ \mu_{\bar{x}} = \frac{\text{Sum of all sample means}}{\text{Total number of samples}} = \frac{(7 \times 3) + (7.50 \times 9) + (8 \times 6) + (8.50 \times 3)}{21} = \frac{162}{21} = \text{Br } 7.71 \]

The following observations can be made about the population and the sampling distribution.

• The mean of the distribution of the sample mean (Br 7.71) is equal to the mean of the population.

• The range (spread) in the distribution of the sample mean is less than the range (spread) in the population values. The sample means range from Br 7.00 to Br 8.50, while the population values vary from Br 7.00 up to Br 9.00.



The Standard Deviation of the Sample Mean

Let us define the standard deviation of the sampling distribution of \(\bar{x}\). We will use the following notation.

\(\sigma_{\bar{x}}\) = the standard deviation of \(\bar{x}\)
\(\sigma\) = the standard deviation of the population
n = the sample size
N = the population size

It can be shown that the formula for the standard deviation of \(\bar{x}\) depends on whether the population is finite or infinite. The two formulas for the standard deviation of \(\bar{x}\) follow.

STANDARD DEVIATION OF \(\bar{x}\)

Finite population:

\[ \sigma_{\bar{x}} = \sqrt{\frac{N - n}{N - 1}} \left( \frac{\sigma}{\sqrt{n}} \right) \]

Infinite population:

\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]

As a general rule, when n < 0.05N, we can use the formula \(\sigma_{\bar{x}} = \sigma / \sqrt{n}\) even for a finite population.
Example:
Using the ABC Industries example above, determine the population standard deviation \(\sigma\) and obtain the standard deviation \(\sigma_{\bar{x}}\) of the variable \(\bar{x}\) for samples of size 2. Indicate any apparent relationship between \(\sigma_{\bar{x}}\) and \(\sigma\).
Solutions:
To determine the population standard deviation we will use the formula:

\[ \sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}} \]

\[ \sigma = \sqrt{\frac{(7 - 7.71)^2 + (7 - 7.71)^2 + (8 - 7.71)^2 + (8 - 7.71)^2 + (7 - 7.71)^2 + (8 - 7.71)^2 + (9 - 7.71)^2}{7}} \]

\[ \sigma = \sqrt{\frac{3.4287}{7}} = \sqrt{0.4898} \approx 0.6999 \quad \text{(population standard deviation)} \]

To obtain the standard deviation of the variable \(\bar{x}\) for samples of size 2, we apply the same formula to the 21 sample means, weighting each distinct sample mean by the number of samples that produce it:

\[ \sigma_{\bar{x}} = \sqrt{\frac{3(7 - 7.71)^2 + 9(7.50 - 7.71)^2 + 6(8 - 7.71)^2 + 3(8.50 - 7.71)^2}{21}} \]

\[ \sigma_{\bar{x}} = \sqrt{\frac{4.2861}{21}} = \sqrt{0.2041} \approx 0.4518 \quad \text{(standard deviation of the sample mean)} \]

OR
OR

When sampling is done without replacement from a finite population, as in the above example:

\[ \sigma_{\bar{x}} = \sqrt{\frac{N - n}{N - 1}} \left( \frac{\sigma}{\sqrt{n}} \right) \]

Thus,

\[ \sigma_{\bar{x}} = \sqrt{\frac{7 - 2}{7 - 1}} \left( \frac{0.6999}{\sqrt{2}} \right) = \sqrt{\frac{5}{6}} \, (0.4949) \approx 0.4518 \]
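The agreement between the two routes can be checked numerically. The sketch below is an illustration, with variable names chosen here: it computes the standard deviation of all 21 sample means directly and compares it with the finite population formula. Because it carries full precision, the last digit can differ from a hand calculation that rounds intermediate values.

```python
import math
from itertools import combinations

earnings = [7, 7, 8, 8, 7, 8, 9]  # ABC Industries population (Br per hour)
N, n = len(earnings), 2

mu = sum(earnings) / N
sigma = math.sqrt(sum((x - mu) ** 2 for x in earnings) / N)

# Route 1: standard deviation of all possible sample means.
means = [sum(s) / n for s in combinations(earnings, n)]
sd_enumerated = math.sqrt(sum((m - mu) ** 2 for m in means) / len(means))

# Route 2: finite population formula.
sd_formula = math.sqrt((N - n) / (N - 1)) * sigma / math.sqrt(n)

print(round(sd_enumerated, 4), round(sd_formula, 4))
```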

When sampling is done with replacement from a finite population, or when it is done from an infinite population, the appropriate formula is \(\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}\).
Example 1:

Suppose the mean of a very large population is \(\mu\) = 50 and the standard deviation of the measurements is \(\sigma\) = 12. We determine the sampling distribution of the sample means for a sample size of n = 36, in terms of the expected value and the standard error of the distribution, as follows:

\[ E(\bar{x}) = \mu = 50 \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{12}{\sqrt{36}} = 2.0 \]

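Example 2:

Suppose that in the above example the sample of n = 36 values was taken from a population of just 100 values. The sample thus constitutes 36 percent of the population, so the finite population formula applies. The expected value and standard error of the sampling distribution of the mean are:

\[ E(\bar{x}) = \mu = 50 \qquad \sigma_{\bar{x}} = \sqrt{\frac{N - n}{N - 1}} \left( \frac{\sigma}{\sqrt{n}} \right) = \sqrt{\frac{100 - 36}{100 - 1}} \left( \frac{12}{\sqrt{36}} \right) = (0.804)(2.0) \approx 1.61 \]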
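Examples 1 and 2 can be checked with a few lines of Python. The helper name `standard_error` and its `N=None` convention for an infinite population are illustrative choices made here, not part of the chapter. The sketch applies the finite population factor only when n is at least 5% of N, following the general rule stated earlier.

```python
import math

def standard_error(sigma, n, N=None):
    """Standard deviation of the sample mean (hypothetical helper).

    N=None means an infinite population or sampling with replacement.
    """
    se = sigma / math.sqrt(n)
    if N is not None and n >= 0.05 * N:
        # Finite population factor, used when n is not small relative to N.
        se *= math.sqrt((N - n) / (N - 1))
    return se

se_example_1 = standard_error(12, 36)         # very large population
se_example_2 = standard_error(12, 36, N=100)  # sample is 36% of the population

print(round(se_example_1, 3), round(se_example_2, 3))
```

In both cases the expected value of the sample mean stays at 50; only the standard error changes.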

Example 3:

As reported by MOFED, the mean living expense for a single family is 1,742 Birr. Assume a standard deviation of 568 Birr.

A. For samples of 25 single families, determine the mean and standard deviation of the variable \(\bar{x}\).

B. Repeat part (A) for a sample of size 500.
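One way to work Example 3 is sketched below. It assumes, since the chapter gives no population size, that the number of families is large enough for n < 0.05N to hold at both sample sizes, so the simple formula applies. The variable names are chosen here for illustration.

```python
import math

mu, sigma = 1742, 568  # Birr, from the example

# Assumption: n < 0.05N for both sample sizes, so no finite population factor.
results = {n: sigma / math.sqrt(n) for n in (25, 500)}

for n, se in results.items():
    print(f"n = {n}: mean of x-bar = {mu}, standard deviation of x-bar = {se:.2f}")
```

The mean of the sample mean is 1,742 Birr in both parts; the standard deviation shrinks as the sample size grows.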

Exercise:

The mean wage per hour for all 5,000 employees who work at a large company is $17.50 and the standard deviation is $2.90. Let \(\bar{x}\) be the mean wage per hour for a random sample of certain employees selected from this company. Find the mean and standard deviation of \(\bar{x}\) for a sample size of: (a) 30 (b) 75 (c) 200

Ans: (a) \(\mu_{\bar{x}} = \mu\) = $17.50 and \(\sigma_{\bar{x}}\) = $0.529

(b) \(\mu_{\bar{x}} = \mu\) = $17.50 and \(\sigma_{\bar{x}}\) = $0.335

(c) \(\mu_{\bar{x}} = \mu\) = $17.50 and \(\sigma_{\bar{x}}\) = $0.205

From the preceding calculations we observe that the mean of the sampling distribution of \(\bar{x}\) is always equal to the mean of the population, whatever the size of the sample. However, the value of the standard deviation of \(\bar{x}\) decreases from $0.529 to $0.335 and then to $0.205 as the sample size increases from 30 to 75 and then to 200.

N.B:

• The larger the sample size, the smaller the standard deviation of \(\bar{x}\).

• The smaller the standard deviation of \(\bar{x}\), the more closely the possible values of \(\bar{x}\) (the possible sample means) cluster around the mean of \(\bar{x}\).

• The mean of \(\bar{x}\) equals the population mean.

The Central Limit Theorem

If the population or process from which a sample is taken is normally distributed, then the

sampling distribution of the mean also will be normally distributed, regardless of sample size.

However, what if a population is not normally distributed? Remarkably, a theorem from

mathematical statistics still permits application of the normal distribution with respect to such

sampling distributions.

The central limit theorem states that as sample size is increased, the sampling distribution of the mean (and of other sample statistics as well) approaches the normal distribution in form, regardless of the form of the population distribution from which the sample was taken.

In selecting random samples of size n from a population, the sampling distribution of the sample

mean can be approximated by a normal distribution as the sample size becomes large.

The sample size is usually considered to be large if n > 30.
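The theorem can be illustrated with a small simulation. The sketch below runs under assumed settings, none of which come from the chapter: an exponential population with mean 1 and standard deviation 1 (chosen because it is strongly skewed), samples of size n = 36, 5,000 repetitions, and a fixed random seed.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

n = 36             # sample size (n > 30)
replicates = 5000  # number of simulated samples

# Exponential population with rate 1: right-skewed, mean 1, standard deviation 1.
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(replicates)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)

print("mean of sample means:", round(mean_of_means, 3))
print("sd of sample means:  ", round(sd_of_means, 3))
print("sigma / sqrt(n):     ", round(1 / n ** 0.5, 3))
```

Even though the population is far from normal, the simulated sample means should center near the population mean, with a spread close to the population standard deviation divided by the square root of n.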

Thus, according to the central limit theorem,

1. When n > 30, the shape of the sampling distribution of \(\bar{x}\) is approximately normal irrespective of the shape of the population distribution.

2. The mean of \(\bar{x}\), \(\mu_{\bar{x}}\), is equal to the mean of the population, \(\mu\).

3. The standard deviation of \(\bar{x}\) is \(\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}\).

Again, remember that the formula \(\sigma_{\bar{x}} = \sigma / \sqrt{n}\) applies when n < 0.05N.

Determining Probability Values for the Sample Mean

If the sampling distribution of the mean is normally distributed, either because the population is

normally distributed or because the central limit theorem is invoked, then we can determine

probabilities regarding the possible values of the sample mean, given that the population mean and

standard deviation are known. The process is similar to determining probabilities for individual

observations using the normal distribution, as described in chapter 2. In the present application,

however, it is the designated value of the sample mean that is converted into a value of z in order to

use the table of normal probabilities. This conversion formula uses the standard error of the mean

because this is the standard deviation for the \(\bar{x}\) variable. Thus, the conversion formula is

\[ z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} \]


Example:

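An auditor takes a random sample of size n = 36 from a population of 1,000 accounts receivable. The mean value of the accounts receivable for the population is \(\mu\) = Br 260, with the population standard deviation \(\sigma\) = Br 45. What is the probability that the sample mean will be less than Br 250?

Since n = 36 < 0.05N = 50, the standard error of the mean and the z value are

\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{45}{\sqrt{36}} = 7.5 \qquad z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{250 - 260}{7.5} = -1.33 \]

[Figure: normal curve of the mean account balance, with Br 250 and Br 260 on the horizontal axis corresponding to z = -1.33 and z = 0.]

Therefore:

\[ P(\bar{x} < 250) = P(z < -1.33) = 0.5000 - 0.4082 = 0.0918 \]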

Using the above example what is the probability that the sample mean will be within Br 15 of the

population mean?
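Both questions in this example can be approached with Python's `statistics.NormalDist`. The sketch below is one possible check, with variable names chosen here. It uses the unrounded z value, so results can differ slightly in the third or fourth decimal place from a table lookup with z rounded to two decimals.

```python
import math
from statistics import NormalDist

mu, sigma, n, N = 260, 45, 36, 1000  # values from the auditor example

assert n < 0.05 * N         # 36 < 50, so no finite population factor
se = sigma / math.sqrt(n)   # standard error of the mean
x_bar = NormalDist(mu, se)  # approximate sampling distribution of the mean

p_below_250 = x_bar.cdf(250)
p_within_15 = x_bar.cdf(mu + 15) - x_bar.cdf(mu - 15)

print(f"P(x-bar < 250)       = {p_below_250:.4f}")
print(f"P(245 < x-bar < 275) = {p_within_15:.4f}")
```

"Within Br 15 of the population mean" is read here as the interval from Br 245 to Br 275, which is two standard errors on either side of the mean.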


Sampling distribution of difference between two sample means


Another sampling distribution that you will soon encounter is that of the difference between two sample means. The difference between two sample means, \(\bar{x}_1 - \bar{x}_2\), is normally distributed if both populations are normal. Through the use of the laws of expected value and variance we derive the expected value and variance of the sampling distribution of \(\bar{x}_1 - \bar{x}_2\):

\[ E(\bar{x}_1 - \bar{x}_2) = \mu_1 - \mu_2 \]

and

\[ V(\bar{x}_1 - \bar{x}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} \]

Thus, it follows that in repeated independent sampling from two populations with means \(\mu_1\) and \(\mu_2\) and standard deviations \(\sigma_1\) and \(\sigma_2\), respectively, the sampling distribution of \(\bar{x}_1 - \bar{x}_2\) is normal with mean

\[ \mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2 \]

and standard deviation (which is the standard error of the difference between two means)

\[ \sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \]

where \(n_1\) and \(n_2\) are the two sample sizes.

Example:

Suppose that the starting salaries of MScs at Haramaya University (HU) are normally distributed,

with a mean of Br 62,000 and a standard deviation of Br 14,500. The starting salaries of MScs at the

Private University (PU) are normally distributed, with a mean of Br 60,000 and a standard deviation

of Br 18,300. If a random sample of 50 HU MScs and a random sample of 60 PU MScs are selected,

what is the probability that the sample mean starting salary of HU graduates will exceed that of the

PU graduates?

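Given:

Haramaya University: \(\mu_1\) = 62,000, \(\sigma_1\) = 14,500, \(n_1\) = 50

Private University: \(\mu_2\) = 60,000, \(\sigma_2\) = 18,300, \(n_2\) = 60

We want to determine \(P(\bar{x}_1 - \bar{x}_2 > 0)\). We know that \(\bar{x}_1 - \bar{x}_2\) is normally distributed with mean \(\mu_1 - \mu_2\) = 62,000 - 60,000 = 2,000 and standard deviation

\[ \sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{14{,}500^2}{50} + \frac{18{,}300^2}{60}} = \sqrt{9{,}786{,}500} \approx 3{,}128.3 \]

\[ P(\bar{x}_1 - \bar{x}_2 > 0) = P\left( Z > \frac{0 - 2{,}000}{3{,}128.3} \right) = P(Z > -0.64) \]

= 0.50 + 0.2389 = 0.7389

There is a 0.7389 probability that, for a sample of size 50 from the HU graduates and a sample of size 60 from the PU graduates, the sample mean starting salary of HU graduates will exceed the sample mean of PU graduates.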
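The salary example can be reproduced as a sketch in Python, using the mean and standard error of the difference between two sample means. Variable names are chosen here for illustration, and the unrounded z value is used, so the probability may differ from the table-based answer in the fourth decimal place.

```python
import math
from statistics import NormalDist

mu1, sigma1, n1 = 62_000, 14_500, 50  # Haramaya University
mu2, sigma2, n2 = 60_000, 18_300, 60  # Private University

mean_diff = mu1 - mu2
se_diff = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)

# P(x1-bar - x2-bar > 0) under the normal sampling distribution of the difference.
p_hu_exceeds = 1 - NormalDist(mean_diff, se_diff).cdf(0)

print(f"standard error = {se_diff:.1f}, probability = {p_hu_exceeds:.4f}")
```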


POPULATION AND SAMPLE PROPORTIONS


The concept of proportion is the same as the concept of relative frequency discussed in previous

Chapters and the concept of probability of success in a binomial experiment. The relative frequency

of a category or class gives the proportion of the sample or population that belongs to that category

or class. Similarly, the probability of success in a binomial experiment represents the proportion of

the sample or population that possesses a given characteristic.

The population proportion, denoted by p, is obtained by taking the ratio of the number of elements

in a population with a specific characteristic to the total number of elements in the population. The

sample proportion, denoted by p̂ (pronounced p hat), gives a similar ratio for a sample.

EXAMPLE:

Suppose a total of 789,654 families live in a city and 563,282 of them own homes. A sample of 240

families is selected from this city, and 158 of them own homes. Find the proportion of families who

own homes in the population and in the sample.

Solutions

For the population of this city,

N = population size = 789,654

X =families in the population who own homes = 563,282

The proportion of all families in this city who own homes is

\[ p = \frac{X}{N} = \frac{563{,}282}{789{,}654} \approx 0.71 \]

Now, suppose a sample of 240 families is taken from this city and 158 of them are homeowners. Then,

n = sample size = 240

x = families in the sample who own homes = 158

The sample proportion is

\[ \hat{p} = \frac{x}{n} = \frac{158}{240} \approx 0.66 \]

Sampling distribution of p̂

Just like the sample mean \(\bar{x}\), the sample proportion p̂ is a random variable. Hence, it possesses a probability distribution, which is called its sampling distribution.

Sampling Distribution of the Sample Proportion: the probability distribution of the sample proportion p̂ is called its sampling distribution. It gives the various values that p̂ can assume and their probabilities.

The value of p̂ calculated for a particular sample depends on which elements of the population are included in that sample.

Mean and Standard Deviation of p̂

The mean of p̂, which is the same as the mean of the sampling distribution of p̂, is always equal to the population proportion, p, just as the mean of the sampling distribution of \(\bar{x}\) is always equal to the population mean, \(\mu\).

Mean of the Sample Proportion: The mean of the sample proportion, p̂, is denoted by \(\mu_{\hat{p}}\) and is equal to the population proportion, p. Thus, \(\mu_{\hat{p}} = p\).

The standard deviation of p̂, denoted by \(\sigma_{\hat{p}}\), is given by the following formula. This formula is true only when the sample size is small compared to the population size. The sample size is said to be small compared to the population size if n < 0.05N.

Standard Deviation of the Sample Proportion: The standard deviation of the sample proportion, p̂, is denoted by \(\sigma_{\hat{p}}\) and is given by the formula

\[ \sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} \]

where p is the population proportion, q = 1 - p, and n is the sample size. This formula is used when n < 0.05N, where N is the population size.

However, if n > 0.05N, then \(\sigma_{\hat{p}}\) is calculated as follows:

\[ \sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} \sqrt{\frac{N - n}{N - 1}} \]

We use the concepts of the mean, standard deviation, and shape of the sampling distribution of p̂ to determine the probability that the value of p̂ computed from one sample falls within a given interval. The z value for p̂ is computed using the following formula.

\[ z = \frac{\hat{p} - p}{\sigma_{\hat{p}}} \]
EXAMPLE -1-

The proportion of a population with a characteristic of interest is p = 0.37. Find the mean and

standard deviation of the sample proportion p̂ obtained from random samples of size 1,600.
Ans: \(\mu_{\hat{p}} = p = 0.37\) and \(\sigma_{\hat{p}} = \sqrt{(0.37)(0.63) / 1{,}600} \approx 0.012\)

EXAMPLE -2-

A random sample of size 121 is taken from a population in which the proportion with the

characteristic of interest is p = 0.47. Find the indicated probabilities.

A. P (0.45 ≤ p̂ ≤ 0.50) Ans = 0.4154

B. P ( p̂ ≥ 0.50) Ans. = 0.2546
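The answers to Examples 1 and 2 can be checked with the sketch below. The helper name `sd_of_proportion` is hypothetical, and the code assumes n < 0.05N in both examples, since no population size is given. It uses unrounded z values, so the probabilities can differ from the table-based answers in the third or fourth decimal place.

```python
import math
from statistics import NormalDist

def sd_of_proportion(p, n):
    """Standard deviation of p-hat, assuming n < 0.05N (hypothetical helper)."""
    return math.sqrt(p * (1 - p) / n)

# Example 1: p = 0.37, n = 1,600
sd1 = sd_of_proportion(0.37, 1600)

# Example 2: p = 0.47, n = 121, normal approximation for p-hat
p_hat = NormalDist(0.47, sd_of_proportion(0.47, 121))
prob_a = p_hat.cdf(0.50) - p_hat.cdf(0.45)  # P(0.45 <= p-hat <= 0.50)
prob_b = 1 - p_hat.cdf(0.50)                # P(p-hat >= 0.50)

print(round(sd1, 3), round(prob_a, 4), round(prob_b, 4))
```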


Sampling Distribution of the Difference of Sample Proportions


When sampling is done from two populations with proportions \(p_1\) and \(p_2\) respectively, the sampling distribution of the difference of sample proportions \(\hat{p}_1 - \hat{p}_2\) approaches a normal distribution with mean \(p_1 - p_2\) and standard deviation \(\sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}\) as the sample sizes \(n_1\) and \(n_2\) increase.
Example:

Past experience shows that the proportions of defaulters (in tax payments) belonging to the business class and the professional class are 0.20 and 0.15 respectively. The results of a sample survey are:

                            Business class    Professional class
Sample size                 n1 = 400          n2 = 420
Proportion of defaulters    p̂1 = 0.21         p̂2 = 0.14

Find the probability of drawing two samples with a difference in the two sample proportions larger than what is observed.

Solution:

Given
\(p_1\) = 0.20, \(p_2\) = 0.15
\(q_1\) = 1 - 0.20 = 0.80, \(q_2\) = 1 - 0.15 = 0.85
\(n_1\) = 400, \(n_2\) = 420
\(\hat{p}_1\) = 0.21, \(\hat{p}_2\) = 0.14

Since the populations are infinite and the sample sizes are large, the central limit theorem applies, i.e.

\[ \sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}} = \sqrt{\frac{0.20 \times 0.80}{400} + \frac{0.15 \times 0.85}{420}} = \sqrt{0.0004 + 0.0003} \approx 0.0265 \]

So we can find the required probability using the standard normal variable

\[ z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}} \]

\[ P(\hat{p}_1 - \hat{p}_2 > 0.07) = P\left( Z > \frac{(0.21 - 0.14) - (0.20 - 0.15)}{0.0265} \right) \]

= P(Z > 0.75) = 0.2266
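The defaulters example can also be checked as a sketch in Python. Variable names are chosen here for illustration. Because the code keeps the unrounded standard error and z value, the probability it prints can differ slightly from the table-based 0.2266.

```python
import math
from statistics import NormalDist

p1, p2 = 0.20, 0.15          # population proportions of defaulters
n1, n2 = 400, 420            # sample sizes
observed_diff = 0.21 - 0.14  # difference between the two sample proportions

# Standard error of the difference of sample proportions.
se_diff = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
diff = NormalDist(p1 - p2, se_diff)

# Probability of a difference larger than the one observed.
p_larger = 1 - diff.cdf(observed_diff)

print(f"standard error = {se_diff:.4f}, P(difference > 0.07) = {p_larger:.4f}")
```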
