Note For Int To Statistics
1.4 Applications of statistics
Apart from helping elicit an intelligent assessment from a body of figures and facts, statistics is an indispensable tool for any scientific enquiry, right from the stage of planning the enquiry to the stage of conclusion. It applies to almost all sciences: pure and applied, physical, natural, biological, medical, agricultural and engineering. It also finds applications in social and management sciences, in commerce, business and industry.
Statistics is applicable:
• In almost all fields of human endeavor.
• In daily life, since almost all human beings are exposed to numerical facts.
• In some processes, e.g. the invention of certain drugs or the assessment of the extent of environmental pollution.
• In industries, especially in the quality control area.
I. Uses of Statistics
• Statistics presents facts in the form of numerical data.
• It condenses and summarizes a mass of data into a few presentable and precise figures.
• It facilitates comparison of data.
• It helps in formulating and testing hypotheses.
• It helps in predicting future trends.
• It helps in formulating policies.
1.5. Scales of measurement
Scales of measurement refer to ways in which variables or numbers are defined and categorized. Each scale of measurement has certain properties which in turn determine the appropriateness for use of certain statistical analyses. The four scales of measurement are nominal, ordinal, interval, and ratio.
1. Nominal Scales: Level of measurement which classifies data into mutually exclusive, all-inclusive categories in which no order or ranking can be imposed on the data.
✓ No arithmetic or relational operation can be applied.
✓ Thus it only gives names or labels to various categories.
Examples:
❖ Political party preference (Republican, Democrat, or Other)
❖ Sex (Male or Female)
❖ Marital status (married, single, widowed, divorced)
❖ Country code
2. Ordinal Scales: Level of measurement which classifies data into categories that can be ranked; however, differences between the ranks are not meaningful.
➢ Arithmetic operations are not applicable but relational operations are applicable.
➢ Ordering is the sole property of the ordinal scale.
Examples:
▪ Letter grades (A, B, C, D, F).
▪ Rating scales (Excellent, Very good, Good, Fair, Poor).
▪ Military status.
3. Interval Scales: Level of measurement which classifies data that can be ranked and in which differences are meaningful. However, there is no meaningful zero, so ratios are meaningless.
➢ All arithmetic operations except division are applicable.
➢ Relational operations are also possible.
Examples:
▪ IQ
▪ Temperature in °F
4. Ratio Scales: Level of measurement which classifies data that can be ranked, in which differences are meaningful, and in which there is a true zero. True ratios exist between the different units of measure.
➢ All arithmetic and relational operations are applicable.
Examples:
✓ Weight
✓ Height
✓ Number of students
✓ Age

Chapter Two
Methods of data collection and Presentation
Types of Data:
There are two types (sources) for the collection of data.
(1). Primary data
The primary data are the first-hand information collected, compiled and published by an organization for some purpose. They are the most original data in character and have not undergone any sort of statistical treatment.
They are data collected by conducting a survey to meet the specific needs of the problem at hand.
Example: Population census reports are primary data because they are collected, compiled and published by the population census organization.
(2). Secondary data
The secondary data are the second-hand information which has already been collected by someone (an organization) for some purpose and is available for the present study. The secondary data are not pure in character and have undergone some treatment at least once.
They are data taken from an already available published or unpublished source.
2.1 Methods of collection
There are three major methods of data collection:
1. Self-administered questionnaire
2. Direct investigation: measurement (observation) of the subject and interviewing (face-to-face, telephone, ...)
3. Use of documentary source
1. Self-administered questionnaire
The questionnaire is the main data collection instrument in a formal sample survey. Before examining the steps in designing a questionnaire we need to review the types of questions used in questionnaires. Depending on the amount of freedom given to the respondent in offering responses, there are two basic types of questions that can be used in questionnaires: open-ended questions and closed-ended questions.
The type of question to use is determined by the form of responses wanted, the nature of the respondents and their ability to answer the questions.
Open-ended questions: allow the respondent to answer freely in his or her own words.
Example: What do you think are the reasons for a high drop-out rate of village health committee members?
Closed-ended questions: a predetermined list of alternative responses is presented to the respondent for checking the appropriate one(s). This implies that the respondent's answers are restricted in some way to a limited range of alternatives.
2. Direct investigation
I. Measurement and/or observation
• Data can be obtained through direct observation or measurement.
• It provides accurate information but it is expensive and inconvenient.
• E.g. land area measurement, animal weight gain, physical examination, direct observation of work.
II. Interview
a) Face-to-face interview
b) Telephone interview
3. The use of documentary source
• Extracting information from existing resources.
• It is much less expensive than the other two sources.
• It is difficult to get the information needed when records are compiled in an unstandardized manner.
Example: Hospital records, professional institutes, official statistics, ...

2.2. Methods of Data Presentation
This topic introduces tabular and graphical methods commonly used to summarize both qualitative and quantitative data. Tabular and graphical summaries of data can be found in annual reports, newspaper articles and research studies. Everyone is exposed to these types of presentations, so it is important to understand how they are prepared and how they should be interpreted.
2.2.2. Classification of Data
The process of arranging data into homogeneous groups or classes according to some common characteristics present in the data is called classification.
For example: In the process of sorting letters in a post office, the letters are classified according to regions and further arranged according to zones, cities, etc.
2.2.4 Frequency distribution
A frequency distribution is the organization of raw data in table form, using classes and frequencies. There are three basic types of frequency distributions, and there are specific procedures for constructing each type. The three types are categorical, ungrouped and grouped frequency distributions.
The reasons for constructing a frequency distribution are as follows:
• To organize the data in a meaningful, intelligible way.
• To enable the reader to determine the nature or shape of the distribution.
• To facilitate computational procedures for measures of average and spread.
• To enable the researcher to draw charts and graphs for the presentation of data.
• To enable the reader to make comparisons between different data sets.
2.2.4.1. Categorical Frequency Distribution
The categorical frequency distribution is used for data which can be placed in specific categories, such as nominal or ordinal level data. For example, data such as political affiliation, religious affiliation, or major field of study would use a categorical frequency distribution.
The major components of a categorical frequency distribution are class, tally and frequency.
Example 2.1: Twenty-five army inductees were given a blood test to determine their blood type. The data set is given as follows:
A B B AB O
O O B AB B
B B O A O
A O O O AB
AB A O B A
Construct a frequency distribution for the data.
Solution:
A        B           C            D
Class    Tally       Frequency    Percent
A        ////        5            20
B        //// //     7            28
O        //// ////   9            36
AB       ////        4            16
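The counts in the solution of Example 2.1 can be reproduced with a few lines of Python. This is only a minimal sketch: the variable names are illustrative, and `collections.Counter` is just one convenient way to do the tallying.

```python
from collections import Counter

# Blood types of the 25 inductees, copied from Example 2.1.
blood_types = (
    "A B B AB O "
    "O O B AB B "
    "B B O A O "
    "A O O O AB "
    "AB A O B A"
).split()

counts = Counter(blood_types)   # class -> frequency (the tally step)
total = len(blood_types)        # total number of observations

print(f"{'Class':<6}{'Frequency':>10}{'Percent':>9}")
for blood_class in ("A", "B", "O", "AB"):
    frequency = counts[blood_class]
    percent = 100 * frequency / total   # percent = frequency / total * 100
    print(f"{blood_class:<6}{frequency:>10}{percent:>9.0f}")
```

The percent column is simply each frequency divided by the total number of observations, multiplied by 100.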
2.2.4.2 Ungrouped Frequency Distribution
When the data are numerical instead of categorical, the range of the data is small and each class is only one unit wide, the distribution is called an ungrouped frequency distribution.
The major components of this type of frequency distribution are class, tally, frequency and cumulative frequency. The steps are almost the same as those for a categorical frequency distribution.
Cumulative frequencies are used to show how many values are accumulated up to and including a specific class.
Example 2.2: The following data represent the number of days of sick leave taken by each of 50 workers of a company over the last 6 weeks.
2 0 0 5 8 3 4 1 0 0 7 1
7 1 5 4 0 4 0 1 8 9 7 0
1 7 2 5 5 4 3 3 0 0 2 5
1 3 0 2 4 5 0 5 7 5 1 1
0 2
A. Construct an ungrouped frequency distribution.
B. How many workers had at least 1 day of sick leave?
C. How many workers had between 3 and 5 days of sick leave?
Solution:
A. Since this data set contains only a relatively small number of distinct values, it is convenient to represent it in a frequency table which presents each distinct value along with its frequency of occurrence.
Days of sick leave    Frequency    Cumulative frequency
0                     12           12
1                     8            20
2                     5            25
3                     4            29
4                     5            34
5                     8            42
7                     5            47
8                     2            49
9                     1            50
B. Since 12 of the 50 workers had no days of sick leave, the answer is 50 - 12 = 38.
C. The answer is the sum of the frequencies for the values 3, 4 and 5, that is 4 + 5 + 8 = 17.
2.2.4.3. Grouped Frequency Distribution
Some of the basic terms most frequently used when dealing with frequency distributions are the following:
• Lower class limits are the smallest numbers that can belong to the different classes.
• Upper class limits are the largest numbers that can belong to the different classes.
• Class boundaries are the numbers used to separate classes, but without the gaps created by class limits.
• Class midpoints are the midpoints of the classes. Each class midpoint can be found by adding the lower class limit to the upper class limit and dividing the sum by 2.
• Class width is the difference between two consecutive lower class limits or two consecutive lower class boundaries.
When the range of the data is large, the data must be grouped so that each class is more than one unit wide. To construct this frequency distribution, we follow the steps below.
1. Identify the largest and the smallest values.
2. Find the number of classes by using Sturges' rule, $k = 1 + 3.322 \log_{10} n$, where $k$ is the number of classes and $n$ is the number of observations.
3. Find the range, $R = \text{largest value} - \text{smallest value}$, and find the class width by dividing the range by the number of classes: $W = R / k$, where $k$ is the number of classes.
Note that: Round the answer up to the nearest whole number if there is a remainder. For instance, a width of 5.5 is rounded up to 6.
4. Select the starting point as the lowest class limit. This is usually the lowest score (observation). Add the width to that score to get the lower class limit of the next class. Keep adding until you reach the desired number of classes calculated in step 2.
5. Find the upper class limits: subtract the unit of measurement from the lower class limit of the second class in order to get the upper limit of the first class. Then add the width to each upper class limit to get all upper class limits.
Unit of measurement: the difference between a value and the next possible value. For instance, for the data 28, 23, 52 the unit of measurement is one: take one datum arbitrarily, say 23; the next possible value is 24, so the unit of measurement is 24 - 23 = 1. If the data are 24.12, 30, 21.2, give priority to the datum with the most decimal places. Take 24.12 and guess the next possible value. It is 24.13. Therefore, the unit of measurement is 24.13 - 24.12 = 0.01.
Note that: 1 is the maximum value of the unit of measurement and is the value used when we have no clue about the data.
6. Find the class boundaries: lower class boundary = lower class limit - (unit of measurement)/2 and upper class boundary = upper class limit + (unit of measurement)/2. In short, the upper class boundary of one class is the lower class boundary of the next class.
Example 2.3: Consider the following set of data and construct the frequency distribution.
11 29 6 33 14 21 18 17 22 38
31 22 27 19 22 23 26 39 34 27
Steps
1. The largest value is 39 and the smallest value is 6.
2. The number of classes is $k = 1 + 3.322 \log_{10} 20 \approx 5.32$, rounded up to 6. The range is 39 - 6 = 33, so the class width is 33/6 = 5.5, rounded up to 6.
3. Select the starting point. Take the minimum, which is 6, then add the width 6 to it to get the lower class limit (LCL) of the next class.
6 12 18 24 30 36
4. Upper class limit. Since the unit of measurement is one, 12 - 1 = 11 is the upper class limit (UCL) of the first class.
11 17 23 29 35 41
5. Complete the frequency distribution as
Class limit         6-11       12-17       18-23       24-29       30-35       36-41
Class boundaries    5.5-11.5   11.5-17.5   17.5-23.5   23.5-29.5   29.5-35.5   35.5-41.5
Frequency           2          2           7           4           3           2
Example: Find the cumulative frequency of Example 2.3.
Cumulative frequency (less than type)    2    4    11    15    18    20

2.4. Relative Frequency Distribution
An important variation of the basic frequency distribution uses relative frequencies, which are easily found by dividing each class frequency by the total of all frequencies. A relative frequency distribution includes the same class limits as a frequency distribution, but relative frequencies are used instead of actual frequencies. The relative frequencies are sometimes expressed as percents.
A relative frequency distribution enables us to understand the distribution of the data and to compare different sets of data.
2.5 Class Mark (Mid-point)

2.6. Graphical and Diagrammatic Presentation of Data
We have discussed the techniques of classification and tabulation that help us in organizing the collected data in a meaningful fashion. However, this way of presenting statistical data does not always prove to be interesting to a layman. Too many figures are often confusing and fail to convey the message effectively.
One of the most effective and interesting alternative ways in which statistical data may be presented is through diagrams and graphs. There are several ways in which statistical data may be displayed pictorially, such as different types of graphs and diagrams.
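Before turning to diagrams, the grouped frequency distribution of Example 2.3 can be cross-checked with a short Python sketch. It follows the construction steps of Section 2.2.4.3 under two assumptions: the number of classes and the class width are both rounded up, and the unit of measurement is 1 because the data are whole numbers. The dictionary keys are illustrative names, not notation from these notes.

```python
import math

# Data of Example 2.3.
data = [11, 29, 6, 33, 14, 21, 18, 17, 22, 38,
        31, 22, 27, 19, 22, 23, 26, 39, 34, 27]

n = len(data)
lowest, highest = min(data), max(data)

# Sturges' rule, rounded up to a whole number of classes.
num_classes = math.ceil(1 + 3.322 * math.log10(n))
# Class width: range divided by the number of classes, rounded up.
width = math.ceil((highest - lowest) / num_classes)
unit = 1  # unit of measurement, assumed to be 1 for whole-number data

rows = []
cumulative = 0
for i in range(num_classes):
    lcl = lowest + i * width      # lower class limit
    ucl = lcl + width - unit      # upper class limit
    frequency = sum(lcl <= x <= ucl for x in data)
    cumulative += frequency       # less-than cumulative frequency
    rows.append({
        "limits": (lcl, ucl),
        "boundaries": (lcl - unit / 2, ucl + unit / 2),
        "midpoint": (lcl + ucl) / 2,
        "frequency": frequency,
        "relative": frequency / n,
        "cumulative": cumulative,
    })

for row in rows:
    print(row)
```

Each printed row carries the class limits, class boundaries, class mark, frequency, relative frequency and cumulative frequency of one class, so the same loop covers Sections 2.2.4.3 to 2.5.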
2.6.1. Diagrammatic display of data: Pie-chart and Bar charts
I. Pie chart
A pie chart can be used to compare the relation between the whole and its components. A pie chart is a circular diagram in which the area of a sector of the circle represents a component. Circles are drawn with radii proportional to the square root of the quantities because the area of a circle is $\pi r^2$. To construct a pie chart (sector diagram), we draw a circle with radius equal to the square root of the total. The total angle of the circle is $360^\circ$. The angle of each component is calculated by the formula
$\text{Angle of sector} = \frac{\text{component part}}{\text{total}} \times 360^\circ$
These angles are marked in the circle by means of a protractor to show the different components. The arrangement of the sectors is usually anti-clockwise.
Example 2.4: The following table gives the details of the monthly budget of a family. Represent these figures by a suitable diagram.
II. Bar Charts
The bar graph (simple bar chart, multiple bar chart and stratified or stacked bar chart) uses vertical or horizontal bars to represent the frequencies of a distribution. When drawing a bar chart, we have to consider the following two points:
• Make the bars the same width.
• Make the units on the axis that is used for the frequency equal in size.
A. A simple bar chart is used to represent data involving only one variable classified on a spatial, quantitative or temporal basis. In a simple bar chart, we make bars of equal width but variable length, i.e. the magnitude of a quantity is represented by the height or length of the bars. The following steps are undertaken in drawing a simple bar diagram:
• Draw two perpendicular lines, one horizontal and the other vertical, at an appropriate place on the paper.
• Take the basis of classification along the horizontal line (X-axis) and the observed variable along the vertical line (Y-axis), or vice versa.
• Mark bars of equal breadth for each class and leave a gap equal to, or not less than half of, that breadth between two classes.
• Finally, mark the values of the given variable to prepare the required bars.
Example 2.5: Draw simple bar diagram to represent the profits of a bank for 5 years.
Solution: The necessary computations are given below:
B. Multiple bar charts are used when two or more sets of inter-related data are represented (a multiple bar diagram facilitates comparison between more than one phenomenon). The technique of the simple bar chart is used to draw this diagram, but the difference is that we use different shades, colors, or dots to distinguish between different phenomena.
Example 2.6: Draw a multiple bar chart to represent the imports and exports of Canada (values in $) for the years 1991 to 1995.
... in the ratio of various components. This type of diagram shows the variation in different components within each class as well as between different classes. A sub-divided bar diagram is also known as a component bar chart or stacked chart.
Example 2.7: The table below shows the quantity in hundred kgs of Wheat, Barley and Oats produced on a certain farm during the years 1991 to 1994. Draw a stratified bar chart.
Solution: To make the component bar chart, first of all we have to take the year-wise total production. The required diagram is given below:
[Figure: histogram; horizontal axis: class boundaries 5.5, 11.5, 17.5, 23.5, 29.5, 35.5, 41.5]
A relative frequency histogram has the same shape and horizontal (x) scale as a histogram, but the vertical (y) scale is marked with relative frequencies instead of actual frequencies.
(2). Frequency Polygon
A frequency polygon uses line segments connected to points located directly above the class midpoint values. The heights of the points correspond to the class frequencies, and the line segments are extended to the left and right so that the graph begins and ends on the horizontal axis, at the positions where the previous and next class midpoints would be located.
Example 2.9: Take the data in Example 2.3.
[Figure: frequency polygon; horizontal axis: midpoints 2.5, 8.5, 14.5, 20.5, 26.5, 32.5, 38.5, 44.5]

Measures of central tendency are measures of the location of the middle or the center of a distribution. The definition of "middle" or "center" is purposely left somewhat vague so that the term "central tendency" can refer to a wide variety of measures.
❖ The tendency of statistical data to get concentrated at a certain value is called central tendency, and the various methods that determine the actual value at which the data tend to concentrate are called measures of central tendency.
❖ One of the most important objectives of statistical analysis is to get one single value that describes the characteristics of the entire data. Such a value is called the central value or average.
❖ When we want to make comparisons between groups of numbers it is good to have a single value that is considered to be a good representative of each group. This single value is called the average of the group.
3.2 Characteristics of a good measure of central tendency
(A typical average should possess the following):
It should be rigidly defined, which means that it should have a definite value.
It should be based on all observations under investigation.
It should not be affected by extreme observations.
It should be capable of further algebraic treatment.
It should be affected as little as possible by fluctuations of sampling, i.e. it should be stable with sampling.
It should be easy to calculate and simple to understand.
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
When the data are arranged or given in the form of a frequency distribution, i.e. there are $k$ variate values such that a value $X_i$ has a frequency $f_i$ ($i = 1, 2, \ldots, k$), then the arithmetic mean will be
$\bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{\sum_{i=1}^{k} f_i}$, where $k$ is the number of classes and $\sum_{i=1}^{k} f_i = n$.
• Arithmetic Mean for Grouped Data
If data are given in the shape of a continuous frequency distribution, then the arithmetic mean is obtained as follows:
3) Calculate the mean for the following age distribution.
Class     Frequency    Class mark    Frequency x class mark
6-10      15           8             120
11-15     23           13            299
16-20     35           18            630
21-25     12           23            276
26-30     9            28            252
31-35     6            33            198
Total     100                        1775
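As a cross-check on the grouped-mean calculation for the age distribution of exercise 3), the following minimal Python sketch recomputes the class marks and the two column totals. The class limits and frequencies are copied from that table; the variable names are illustrative.

```python
# (lower limit, upper limit, frequency) for each class of the age distribution.
classes = [(6, 10, 15), (11, 15, 23), (16, 20, 35),
           (21, 25, 12), (26, 30, 9), (31, 35, 6)]

total_frequency = 0
weighted_sum = 0
for lower, upper, frequency in classes:
    class_mark = (lower + upper) / 2        # midpoint of the class
    total_frequency += frequency            # running total of f_i
    weighted_sum += frequency * class_mark  # running total of f_i * class mark

# Grouped mean: sum of (frequency x class mark) divided by total frequency.
grouped_mean = weighted_sum / total_frequency
print(total_frequency, weighted_sum, grouped_mean)
```

The class mark stands in for every observation in its class, which is why the grouped mean is an approximation of the mean of the raw data.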
$\bar{X} = \frac{\sum_{i=1}^{k} f_i Y_i}{\sum_{i=1}^{k} f_i} = \frac{1775}{100} = 17.75$, where $Y_i$ is the class mark of the $i$-th class.
Weighted Mean
- When a proper importance is desired to be given to different data, a weighted mean is appropriate.
- Weights are assigned to each item in proportion to its relative importance.
- Let $X_1, X_2, \ldots, X_n$ be the values of the items of a series and $W_1, W_2, \ldots, W_n$ their corresponding weights; then the weighted mean, denoted $\bar{X}_w$, is defined as:
$\bar{X}_w = \frac{\sum_{i=1}^{n} X_i W_i}{\sum_{i=1}^{n} W_i}$
Example 3.5: A student obtained the following marks in his examinations: English 60, Biology 75, a third subject 63, Physics 59 and Chemistry 55. Find the student's weighted mean if weights 1, 2, 1, 3 and 3 respectively are allotted to the subjects.
Solution:
$\bar{X}_w = \frac{\sum_{i=1}^{n} X_i W_i}{\sum_{i=1}^{n} W_i} = \frac{60(1) + 75(2) + 63(1) + 59(3) + 55(3)}{1 + 2 + 1 + 3 + 3} = \frac{615}{10} = 61.5$
3.3.2 The Mode
- Mode is a value which occurs most frequently in a set of values.
- The mode may not exist and even if it does exist, it may not be unique.
- In the case of a discrete distribution, the value having the maximum frequency is the modal value.
- If in a set of observed values all values occur once or an equal number of times, there is no mode.
Examples:
1. Find the mode of 5, 3, 5, 8, and 9.
Mode = 5
2. Find the mode of 8, 9, 9, 7, 8, 2, and 5.
It is bimodal data: 8 and 9.
3. Find the mode of 4, 12, 3, 6, and 7.
No mode for this data.
- The mode of a set of numbers $X_1, X_2, \ldots, X_n$ is usually denoted by $\hat{X}$.
Mode for Grouped data.
If data are given in the shape of a continuous frequency distribution, the mode is defined as:
$\hat{X} = L_{mo} + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) W$
Where: $\hat{X}$ = the mode of the distribution
$L_{mo}$ = the lower class boundary of the modal class
$\Delta_1 = f_{mo} - f_1$
$\Delta_2 = f_{mo} - f_2$
$f_{mo}$ = frequency of the modal class
$f_1$ = frequency of the class preceding the modal class
$f_2$ = frequency of the class succeeding the modal class
$W$ = the size of the modal class
Note: The modal class is the class with the highest frequency.
Example 3.7: The following is the distribution of the size of certain farms selected at random from a district. Calculate the mode of the distribution.
Class     Frequency
6-10      15
11-15     23
16-20     35
21-25     12
26-30     9
31-35     6
Total     100
The modal class is the class with the highest frequency. Thus, the modal class is 16-20.
$\hat{X} = L_{mo} + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) W = 15.5 + \left(\frac{12}{12 + 23}\right) 5 = 17.21$
3.3.3 The Median
- In a distribution, the median is the value of the variable which divides it into two equal halves.
- In an ordered series of data the median is the observation lying exactly in the middle of the series. It is the middle-most value in the sense that the number of values less than the median is equal to the number of values greater than it.
- If $X_1, X_2, \ldots, X_n$ are the observations, then the numbers arranged in ascending order will be $X_{[1]}, X_{[2]}, \ldots, X_{[n]}$, where $X_{[i]}$ is the $i$-th smallest value: $X_{[1]} \le X_{[2]} \le \ldots \le X_{[n]}$.
- The median is denoted by $\tilde{X}$.
Median for grouped data.
- If data are given in the shape of a continuous frequency distribution, the median is defined as:
$\tilde{X} = L_{med} + \frac{W}{f_{med}} \left(\frac{n}{2} - f_c\right)$
Where: $L_{med}$ = the lower class boundary of the median class.
$f_{med}$ = the frequency of the median class.
$f_c$ = the cumulative frequency (less than type) preceding the median class.
$W$ = the size of the median class.
$n$ = the total number of observations.
Note: The median class is the class with the smallest cumulative frequency (less than type) greater than or equal to $n/2$.
Example 3.9: Find the median of the following distribution.
Class     Frequency    LCF
40-44     7            7
45-49     10           17
50-54     22           39
55-59     15           54
60-64     12           66
65-69     6            72
70-74     3            75
The median class is the class which contains the $\left(\frac{n}{2}\right)$-th observation, i.e. the $\frac{75}{2} = 37.5$-th observation. The smallest cumulative frequency greater than or equal to 37.5 is 39, so the median class is 50-54.
$\tilde{X} = L_{med} + \frac{W}{f_{med}} \left(\frac{n}{2} - f_c\right) = 49.5 + \frac{5}{22} (37.5 - 17) = 54.16$
Empirical relationship between $\bar{X}$, $\hat{X}$ and $\tilde{X}$
• $\bar{X} = \hat{X} = \tilde{X}$, for a symmetrical distribution.
• $\bar{X} - \hat{X} = 3(\bar{X} - \tilde{X})$, for a uni-modal, moderately skewed (asymmetrical) frequency distribution.

Chapter Four
➢ Measures of Variation (Dispersion)
➢ The scatter or spread of items of a distribution is known as dispersion or variation. In other words, the degree to which numerical data tend to spread about an average value is called dispersion or variation of the data.
➢ Measures of dispersion are statistical measures which provide ways of measuring the extent to which data are dispersed or spread out.
There are many types of measures of dispersion.
The Variance
Population Variance
If we divide the variation (the sum of squared deviations from the mean) by the number of values in the population, we get something called the population variance. This variance is the "average squared deviation from the mean".
$\text{Population variance} = \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}, \quad i = 1, 2, 3, \ldots, N$
Sample Variance
One would expect the sample variance to simply be the population variance with the population mean replaced by the sample mean. However, one of the major uses of statistics is to estimate the corresponding parameter, and that formula has the problem that the estimated value tends to underestimate the population variance. To counteract this, the sum of the squares of the deviations is divided by one less than the sample size.
$\text{Sample variance} = s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
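To see the effect of the two divisors side by side, here is a minimal Python sketch. The data are simply the 20 values of Example 2.3, reused for illustration and treated first as a whole population and then as a sample; the notes themselves do not do this. `statistics.pvariance` and `statistics.variance` are the standard-library functions for the two definitions and are used only as a cross-check.

```python
import statistics

# Values of Example 2.3, reused here purely as illustrative data.
data = [11, 29, 6, 33, 14, 21, 18, 17, 22, 38,
        31, 22, 27, 19, 22, 23, 26, 39, 34, 27]

n = len(data)
mean = sum(data) / n
# Sum of squared deviations from the mean (the "variation").
sum_sq_dev = sum((x - mean) ** 2 for x in data)

population_variance = sum_sq_dev / n      # divisor N
sample_variance = sum_sq_dev / (n - 1)    # divisor n - 1

print("population variance:", population_variance)
print("sample variance:    ", sample_variance)
print("library cross-check:", statistics.pvariance(data), statistics.variance(data))
```

Because the numerator is the same, the sample variance is always the larger of the two, by the factor n/(n - 1).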
I.e., the sample variance, denoted by $s^2$, of a set of $n$ observed values having a mean $\bar{x}$ is the sum of the squared deviations divided by $n - 1$.
The following steps are used to calculate the sample variance:
1. Find the arithmetic mean.
2. Find the difference between each observation and the mean.
3. Square these differences.
4. Sum the squared differences.
5. Since the data are a sample, divide the number (from step 4 above) by the number of observations minus one, i.e. $n - 1$ (where $n$ is the number of observations in the data set).
For the case of a frequency distribution it is expressed as:
$s^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^2}{n - 1}$
We usually use the following short-cut formulas.
$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n - 1}$, for raw data
$s^2 = \frac{\sum_{i=1}^{k} f_i x_i^2 - n\bar{x}^2}{n - 1}$, for a frequency distribution, where $\sum_{i=1}^{k} f_i = n$
Standard Deviation
There is a problem with variances. Recall that the deviations were squared. That means that the units were also squared. To get the units back to the same as the original data values, the square root must be taken.
$\text{Population standard deviation} = \sigma = \sqrt{\sigma^2}$

• The coefficient of variation (C.V) is defined as the ratio of the standard deviation to the mean, usually expressed as a percentage.
$C.V = \frac{S}{\bar{X}} \times 100\%$
• The distribution having the smaller C.V is said to be less variable or more consistent.
Examples:
1. An analysis of the monthly wages paid (in Birr) to workers in two firms A and B belonging to the same industry gives the following results.
Value          Firm A    Firm B
Mean wage      52.5      47.5
Median wage    50.5      45.5
Variance       100       121
In which firm, A or B, is there greater variability in individual wages?
Solution: Calculate the coefficient of variation for both firms. The standard deviations are $S_A = \sqrt{100} = 10$ and $S_B = \sqrt{121} = 11$.
$C.V_A = \frac{S_A}{\bar{X}_A} \times 100\% = \frac{10}{52.5} \times 100\% = 19.05\%$
$C.V_B = \frac{S_B}{\bar{X}_B} \times 100\% = \frac{11}{47.5} \times 100\% = 23.16\%$
Since $C.V_A < C.V_B$, in firm B there is greater variability in individual wages.

Chapter Five: Probability
5. Introduction
4. Sample Space (S): Set of all possible outcomes of a probability experiment.
Example: The sample space of a trial conducted by three tossings of a coin is
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
A sample space can be
• Countable (finite or infinite)
• Uncountable
5. Event (Sample Point): It is a subset of the sample space. It is a statement about one or more outcomes of a random experiment. It is denoted by a capital letter A, B, C, ...
For example, the event that there are exactly two heads in three tossings of a coin consists of the three points HTH, HHT and THH.
Remark: If S (sample space) has n members, then there are exactly $2^n$ subsets or events.
6. Equally Likely Events: Events which have the same chance of occurring.
7. Complement of an Event: The complement of an event A means non-occurrence of A and is denoted by $A'$, $A^c$ or $\bar{A}$; it contains those points of the sample space which don't belong to A.
8. Elementary (simple) Event: An event having only a single element or sample point.
9. Mutually Exclusive (Disjoint) Events: Two events which cannot happen at the same time.
10. Independent Events: Two events are said to be independent if the occurrence of one does not affect the probability of the other occurring.
11. Dependent Events: Two events are dependent if the first event affects the outcome or occurrence of the second event in such a way that the probability is changed.
5.3. Counting Rules
In order to calculate probabilities, we have to know
• The number of elements of an event
• The number of elements of the sample space.
That is, in order to judge what is probable, we have to know what is possible.
• In order to determine the number of outcomes, one can use several rules of counting.
- The addition rule
- The multiplication rule
- Permutation rule
- Combination rule
Addition Rule
If event A can occur in m possible ways and event B can occur in n possible ways, there are m + n possible ways for either event A or event B to occur, but only if there are no events in common between them.
In general, $n(A \text{ or } B) = n(A) + n(B) - n(A \cap B)$
• To list the outcomes of a sequence of events, a useful device called a tree diagram is used.
Example: A student goes to the nearest snack bar to have breakfast. He can take tea, coffee, or milk with bread, cake or sandwich. How many possibilities does he have?
Solutions:
[Tree diagram: each of Tea, Coffee and Milk branches into Bread, Cake and Sandwich]
➢ There are nine possibilities.
The Multiplication Rule:
If a choice consists of k steps of which the first can be made in $n_1$ ways, the second can be made in $n_2$ ways, ..., the $k$-th can be made in $n_k$ ways, then the whole choice can be made in $n_1 \times n_2 \times \cdots \times n_k$ ways.
Example 5.1: 1) A student has two shoes, three trousers and three jackets. In how many ways can he be dressed?
Permutation
An arrangement of n objects in a specified order is called a permutation of the objects.
Permutation Rules:
1. The number of permutations of n distinct objects taken all together is $n!$, where $n! = n \times (n-1) \times (n-2) \times \cdots \times 2 \times 1$.
2. The arrangement of n objects in a specified order using r objects at a time is called the permutation of n objects taken r objects at a time. It is written as $_nP_r$ and the formula is
$_nP_r = \frac{n!}{(n - r)!}$
25 26
3. The number of permutations of n objects in which k1 are alike, k2 are alike ---- etc is b) One particular Statistician should be included
n! c) Two particular Mathematicians can not be included on the committee.
nPr =
k1!k 2 !....k n ! 3. A committee of 5 people must be selected out 5men and 8 women. In how many
ways can the selection be made if there are three women on the committee?
Example 5.2: 1. Suppose we have the letters A, B, C, D.
a) How many permutations are there taking all four?
b) How many permutations are there taking two letters at a time?
2. How many different permutations can be made from the letters in the word "MISSISSIPPI"?
Combination
A selection of objects without regard to order is called a combination.
Example: Given the letters A, B, C, and D, list the permutations and combinations for selecting two letters.
Solutions:
Permutation:
AB BA CA DA
AC BC CB DB
AD BD CD DC
Combination:
AB BC
AC BD
AD DC
Note that in permutation AB is different from BA. But in combination AB is the same as BA.
Combination Rule
The number of combinations of r objects selected from n objects is denoted by nCr or $\binom{n}{r}$ and is given by the formula:
$$ {}_{n}C_{r} = \frac{n!}{(n-r)!\,r!} $$
Example 5.3: 1. In how many ways can a committee of 5 people be chosen out of 9 people?
2. Out of 5 Mathematicians and 7 Statisticians, a committee consisting of 2 Mathematicians and 3 Statisticians is to be formed. In how many ways can this be done?
5.4. Approaches to measuring Probability
There are three different conceptual approaches to the study of probability theory. These are:
• The classical approach.
• The frequency approach.
• The subjective approach.
The classical approach
This approach is used when:
- All outcomes are equally likely and mutually exclusive.
- The total number of outcomes is finite, say N.
Definition: If a random experiment with N equally likely outcomes is conducted and out of these N_A outcomes are favorable to the event A, then the probability that event A occurs, denoted P(A), is defined as:
$$ P(A) = \frac{N_A}{N} = \frac{n(A)}{n(S)} = \frac{\text{No. of outcomes favourable for } A}{\text{Total number of outcomes}} $$
Limitation: the classical approach cannot be used
✓ If it is not possible to enumerate all the possible outcomes for an experiment.
✓ If the sample points (outcomes) are not mutually independent.
✓ If the total number of outcomes is infinite.
✓ If the outcomes are not all equally likely.
Example 5.4: 1. A fair die is tossed once. What is the probability of getting
a) Number 4?
b) An odd number?
c) Number greater than 4?
2. A box of 80 candles consists of 30 defective and 50 non defective candles. If 10 of these candles are selected at random, what is the probability that:
a) All will be defective.
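The counting rules and the classical definition P(A) = n(A)/n(S) used in Examples 5.2 to 5.4 can be checked with a short Python sketch. This is only an illustration, not part of the original notes: the helper name `classical_probability` is hypothetical, and exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import comb, factorial, perm

letters = "ABCD"

# Example 5.2 (1): permutations of four letters, all at once and two at a time.
perms_all_four = factorial(4)        # 4! = 24
perms_two_at_a_time = perm(4, 2)     # 4!/(4-2)! = 12

# Example 5.2 (2): distinct arrangements of MISSISSIPPI.
# Divide 11! by the factorial of each repeated letter's count.
word = "MISSISSIPPI"
arrangements = factorial(len(word))
for letter in set(word):
    arrangements //= factorial(word.count(letter))

# Listing two-letter permutations and combinations of A, B, C, D.
ordered_pairs = ["".join(p) for p in permutations(letters, 2)]
unordered_pairs = ["".join(c) for c in combinations(letters, 2)]

# Example 5.3: committees.
committees_of_5_from_9 = comb(9, 5)
mixed_committees = comb(5, 2) * comb(7, 3)


def classical_probability(event, outcomes):
    """Classical probability: favourable outcomes / total outcomes."""
    favourable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favourable, len(outcomes))


# Example 5.4 (1): one toss of a fair die.
die = [1, 2, 3, 4, 5, 6]
p_four = classical_probability(lambda x: x == 4, die)
p_odd = classical_probability(lambda x: x % 2 == 1, die)
p_greater_than_4 = classical_probability(lambda x: x > 4, die)

# Example 5.4 (2a): all 10 selected candles are defective.
p_all_defective = Fraction(comb(30, 10), comb(80, 10))

print(perms_all_four, perms_two_at_a_time, arrangements)
print(len(ordered_pairs), len(unordered_pairs))
print(committees_of_5_from_9, mixed_committees)
print(p_four, p_odd, p_greater_than_4, float(p_all_defective))
```

Counting the candle selections with combinations assumes that every set of 10 candles is equally likely to be drawn, which is what "selected at random" is taken to mean here.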
may be very different from Daniel's. Abebe, using only his knowledge of the current team and past achievements, may rate the chances at 30%. Daniel, on the other hand, may rate the chances as 10% based on some inside knowledge he has about key players having to be sold in the next two months.
5.5. Some basic properties of probability
1) For any event A, P(A) ≥ 0
2) P(φ) = 0
3) For any events A and B, P(A∪B) = P(A) + P(B) − P(A∩B)
4) P(A′) = 1 − P(A), where A′ is the complement of A
Conditional Events: If the occurrence of one event has an effect on the occurrence of the other event, then the two events are conditional or dependent events.
Conditional probability of an event
The conditional probability of an event A given that B has already occurred is denoted by P(A|B). Since B is known to have occurred, it becomes the new sample space replacing the original sample space.
From this we are led to the definition
$$ P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) \neq 0 \qquad \text{or} \qquad P(A \cap B) = P(A \mid B)\,P(B) $$
• Toss a coin n times and count the number of heads.
• Number of children in a family.
• Number of car accidents per week.
• Number of defective items in a given company.
• Number of bacteria per two cubic centimeters of water.
2. Continuous random variable: a variable that can assume all values between any two given values.
Examples:
• Height of students at a certain college.
• Mark of a student.
• Life time of light bulbs.
• Length of time required to complete a given training.
Probability distribution:- consists of the values a random variable can assume and the corresponding probabilities of those values; equivalently, it is a function that assigns a probability to each value of the random variable.
A probability distribution can be discrete or continuous.
A) Discrete probability distribution:- is a formula, a table, a graph or other device used to specify all possible values of the discrete random variable (R.V) X along with their respective probabilities.
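As a small numerical check of the properties in Section 5.5 and of the definition of conditional probability, the sketch below uses one toss of a fair die. The events A (an even number) and B (a number greater than 3) are illustrative assumptions, not taken from the notes.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # one toss of a fair die
A = {2, 4, 6}                       # assumed event: an even number
B = {4, 5, 6}                       # assumed event: a number greater than 3


def prob(event):
    """Classical probability of an event inside the sample space."""
    return Fraction(len(event & sample_space), len(sample_space))


# Property 3: P(A u B) = P(A) + P(B) - P(A n B)
p_union = prob(A | B)
p_union_by_rule = prob(A) + prob(B) - prob(A & B)

# Property 4: P(A') = 1 - P(A)
p_complement = prob(sample_space - A)

# Conditional probability: B becomes the new sample space.
p_A_given_B = prob(A & B) / prob(B)

print(p_union, p_union_by_rule, p_complement, p_A_given_B)
```

Counting directly inside B gives the same answer: of the three outcomes 4, 5, 6, two are even, so P(A|B) = 2/3.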
Example: 1) Consider the experiment of tossing a coin three times. Let X be the number of heads. Construct the probability distribution of X.
2) A balanced die is tossed twice. Construct a probability distribution if
A) X is the sum of the number of spots in the two trials.
B) X is the absolute difference of the number of spots in the trials.
For a discrete random variable X taking integer values:
$$ P(a < X < b) = \sum_{x=a+1}^{b-1} P(x) $$
$$ P(a \le X < b) = \sum_{x=a}^{b-1} P(x) $$
$$ P(a < X \le b) = \sum_{x=a+1}^{b} P(x) $$
$$ P(a \le X \le b) = \sum_{x=a}^{b} P(x) $$
B) Continuous probability distribution
Definition: a non negative function f(x) is called the probability distribution of a continuous R.V X if the total area bounded by the curve and the X-axis is 1, and if the sub area bounded by the curve, the X-axis and the perpendiculars erected at any two points a and b gives the probability that X is between a and b.
1. Let a discrete random variable X assume the values X_1, X_2, …, X_n with the probabilities P(X_1), P(X_2), …, P(X_n) respectively. Then the expected value of X, denoted as E(X), is defined as:
$$ E(X) = X_1 P(X_1) + X_2 P(X_2) + \cdots + X_n P(X_n) = \sum_{i=1}^{n} X_i P(X_i) $$
and, if X is continuous with density f(x),
$$ E(X) = \int_x x\, f(x)\, dx $$
Mean and Variance of a random variable
Let X be a given random variable.
1. The expected value of X is its mean:
Mean of X = E(X)
2. The variance of X is given by:
$$ \text{Variance of } X = \mathrm{Var}(X) = E(X^2) - \left(E(X)\right)^2 $$
Where
$$ E(X^2) = \sum_{i=1}^{n} X_i^2\, P(X_i) \quad \text{if } X \text{ is discrete} $$
$$ E(X^2) = \int_x x^2 f(x)\, dx \quad \text{if } X \text{ is continuous} $$
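The discrete formulas above can be tried out on the coin and die examples. The following is a minimal sketch, assuming equally likely outcomes; the helper names `distribution`, `expectation` and `variance` are hypothetical and introduced only for illustration.

```python
from fractions import Fraction
from itertools import product


def distribution(outcomes, value):
    """Probability table {x: P(X = x)} for equally likely outcomes."""
    table = {}
    for outcome in outcomes:
        x = value(outcome)
        table[x] = table.get(x, 0) + Fraction(1, len(outcomes))
    return dict(sorted(table.items()))


def expectation(dist, g=lambda x: x):
    """E(g(X)) = sum of g(x) * P(X = x) over all values x."""
    return sum(g(x) * p for x, p in dist.items())


def variance(dist):
    """Var(X) = E(X^2) - (E(X))^2."""
    return expectation(dist, lambda x: x * x) - expectation(dist) ** 2


# Tossing a coin three times, X = number of heads.
coin_outcomes = list(product("HT", repeat=3))
heads = distribution(coin_outcomes, lambda o: o.count("H"))

# Tossing a balanced die twice.
die_outcomes = list(product(range(1, 7), repeat=2))
spot_sum = distribution(die_outcomes, sum)
spot_difference = distribution(die_outcomes, lambda o: abs(o[0] - o[1]))

print(heads)
print(expectation(heads), variance(heads))
print(expectation(spot_sum), expectation(spot_difference))
```

Using `Fraction` keeps every probability exact, so the table for the number of heads can be compared directly with a hand calculation.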
c. Var(x)
d. E(3x² − 2x)
1. Binomial Distribution
A binomial experiment is a probability experiment that satisfies the following four requirements, called the assumptions of a binomial distribution.
1. The experiment consists of n identical trials.
2. Each trial has only one of two possible mutually exclusive outcomes, success or failure.
3. The probability of each outcome does not change from trial to trial, and
4. The trials are independent, thus we must sample with replacement.
Examples of binomial experiments
• Tossing a coin 20 times to see how many tails occur.
• Asking 200 people if they watch BBC news.
• Registering a newly produced product as defective or non defective.
• Asking 100 people if they favor the ruling party.
• Rolling a die to see if a 5 appears.
Definition: The outcomes of the binomial experiment and the corresponding probabilities of these outcomes are called the Binomial Distribution.
Let p = probability of success and q = 1 − p = probability of failure on any given trial.
Then the probability of getting x successes in n trials becomes
$$ P(X = x) = \begin{cases} \binom{n}{x} p^{x} q^{\,n-x}, & x = 0, 1, 2, \ldots, n \\ 0, & \text{otherwise} \end{cases} $$
And this is sometimes written as
X ~ Bin(n, p)
When using the binomial formula to solve problems, we have to identify three things:
• The number of trials (n)
• The probability of a success on any one trial (p) and
• The number of successes desired (x).
Remark: If X is a binomial random variable with parameters n and p then
E(X) = np and var(X) = npq
Example:
1. The Department of Civil Service for the state of Gambella reports that 20% of the workforce is unemployed. From a sample of 14 workers, calculate the following probabilities using the formula for the binomial probability distribution (n = 14, p = 0.2):
– exactly three are unemployed: P(x = 3) = 0.250
– Note: the following are also examples of cumulative probability distributions:
– three or more are unemployed: P(x ≥ 3) = 0.250 + 0.172 + 0.086 + 0.032 + 0.009 + 0.002 = 0.551
– at least one of the workers is unemployed: P(x ≥ 1) = 1 − P(x = 0) = 1 − 0.044 = 0.956
– at most two of the workers are unemployed: P(x ≤ 2) = 0.044 + 0.154 + 0.250 = 0.448
Exercises
1. What is the probability of getting three heads by tossing a fair coin four times?
2. Suppose that an examination consists of six true and false questions, and assume that a student has no knowledge of the subject matter. The probability that the student will guess the correct answer to the first question is 30%. Likewise, the probability of guessing each of the remaining questions correctly is also 30%.
a) What is the probability of getting more than three correct answers?
b) What is the probability of getting at least two correct answers?
2. Poisson Distribution
- A random variable X is said to have a Poisson distribution if its probability distribution is given by:
$$ P(X = x) = \begin{cases} \dfrac{\lambda^{x} e^{-\lambda}}{x!}, & x = 0, 1, 2, \ldots \\ 0, & \text{otherwise} \end{cases} $$
Where λ is the average number of occurrences of an event in the unit length of interval or distance and x is the number of occurrences in a Poisson process.
- The Poisson distribution depends only on the average number of occurrences per unit of time or space.
- The Poisson distribution is used as a distribution of rare events, such as the number of misprints.
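The binomial and Poisson probability functions defined above can be evaluated directly. The sketch below is illustrative only: the function names are hypothetical, the binomial part re-computes the Gambella example (n = 14, p = 0.2), and the Poisson call uses an assumed rate of 4 events per interval.

```python
from math import comb, exp, factorial


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p); zero outside 0..n."""
    if x < 0 or x > n:
        return 0.0
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson variable with mean lam; zero for x < 0."""
    if x < 0:
        return 0.0
    return lam**x * exp(-lam) / factorial(x)


# Gambella example: 14 workers, 20% unemployment.
n, p = 14, 0.2
p_exactly_three = binomial_pmf(3, n, p)
p_at_most_two = sum(binomial_pmf(k, n, p) for k in range(0, 3))
p_three_or_more = 1 - p_at_most_two
p_at_least_one = 1 - binomial_pmf(0, n, p)

# Mean and variance from the remark E(X) = np, var(X) = npq.
mean = n * p
variance = n * p * (1 - p)

# Assumed illustration: 4 events in an interval whose average is 4 events.
p_four_events = poisson_pmf(4, 4.0)

print(round(p_exactly_three, 3), round(p_at_most_two, 3))
print(round(p_three_or_more, 3), round(p_at_least_one, 3))
print(mean, variance, round(p_four_events, 4))
```

Because the notes add up terms that were already rounded to three decimals, the value for three or more unemployed can differ from the unrounded calculation in the last decimal place.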
Further examples of rare events modelled by the Poisson distribution:
• Natural disasters like earthquakes.
• Accidents.
• Arrivals.
• Number of misprints per page.
- The process that gives rise to such events is called a Poisson process.
- If X is a Poisson random variable with parameter λ then
E(x) = λ, var(x) = λ
Example:
1. The Sylvania Urgent Care facility specializes in caring for minor injuries, colds, and flu. For the evening hours of 6-10 PM the mean number of arrivals is 4.0 per hour. What is the probability of 4 arrivals in an hour? P(X = 4) = (4^4)(e^−4)/4! = 0.1954.
2. If 1.6 accidents can be expected at an intersection on any given day, what is the probability that there will be 3 accidents on any given day?
3. A sales firm receives, on the average, 3 calls per hour on its toll-free number. For any given hour, what is the probability that it will receive at most 3 calls?
Common Continuous Probability Distributions
1. Normal Distribution
A random variable X is said to have a normal distribution if its probability density function is given by
$$ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\; e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}, \quad \text{where } -\infty < x < \infty,\; -\infty < \mu < \infty,\; \sigma > 0 $$
μ = E(x) and σ² = variance(x) are the parameters of the normal distribution.
Properties of Normal Distribution:
1. It is bell shaped, symmetrical about its mean, and mesokurtic. The maximum ordinate is at x = μ and is given by
$$ f(\mu) = \frac{1}{\sigma\sqrt{2\pi}} $$
2. It is asymptotic to the axis, i.e., it extends indefinitely in either direction from the mean.
3. It is a continuous distribution, i.e. there are no gaps or holes.
4. It is a family of curves, i.e., every unique pair of mean and standard deviation defines a different normal distribution. Thus, the normal distribution is completely described by two parameters: the mean μ and the standard deviation σ.
5. Total area under the curve sums to 1, i.e., the area of the distribution on each side of the mean is 0.5:
$$ \int_{-\infty}^{\infty} f(x)\, dx = 1 $$
6. It is unimodal, i.e., values mound up only in the center of the curve.
7. Median = Mean = Mode = μ, located at the center of the distribution.
8. The probability that a random variable will have a value between any two points is equal to the area under the curve between those points.
Note: To facilitate the use of the normal distribution, the following distribution, known as the standard normal distribution, was derived by using the transformation
$$ Z = \frac{X - \mu}{\sigma}, \qquad f(z) = \frac{1}{\sqrt{2\pi}}\; e^{-\frac{1}{2} z^{2}} $$
i.e. if X ~ N(μ, σ²) then Z ~ N(0, 1)
Properties of the Standard Normal Distribution:
Same as a normal distribution, but also...
• Mean is zero
• Variance is one
• Standard Deviation is one
- Areas under the standard normal distribution curve have been tabulated in various ways. The most common ones are the areas between Z = 0 and a positive value of Z.
- Given a normally distributed random variable X with mean μ and standard deviation σ,
P(a < X < b) = P((a − μ)/σ < (X − μ)/σ < (b − μ)/σ)
⇒ P(a < X < b) = P((a − μ)/σ < Z < (b − μ)/σ)
Examples:
1. Find the area under the standard normal distribution which lies
a) Between Z = 0 and Z = 0.96
Solution:
Area = P(0 < Z < 0.96) = 0.3315
Solution:
Area = P(Z < −0.35)
= 1 − P(Z > −0.35)
= 1 − 0.6368 = 0.3632
e) Between Z = −0.67 and Z = 0.75
Solution:
Area = P(−0.67 < Z < 0.75)
= P(−0.67 < Z < 0) + P(0 < Z < 0.75)
= P(0 < Z < 0.67) + P(0 < Z < 0.75)
= 0.2486 + 0.2734 = 0.5220
Solution:
P(Z < z) = 0.9868
= P(Z < 0) + P(0 < Z < z)
= 0.50 + P(0 < Z < z)
⇒ P(0 < Z < z) = 0.9868 − 0.50 = 0.4868
and from the table
P(0 < Z < 2.22) = 0.4868
⇒ z = 2.22
3. A random variable X has a normal distribution with mean 80 and standard deviation 4.8. What is the probability that it will take a value
b) Greater than 76.4
Solution
X is normal with mean μ = 80 and standard deviation σ = 4.8.
a)
P(X < 87.2) = P((X − μ)/σ < (87.2 − μ)/σ)
= P(Z < (87.2 − 80)/4.8)
= P(Z < 1.5)
= P(Z < 0) + P(0 < Z < 1.5)
= 0.50 + 0.4332 = 0.9332
b)
P(X > 76.4) = P((X − μ)/σ > (76.4 − μ)/σ)
= P(Z > (76.4 − 80)/4.8)
= P(Z > −0.75)
= P(Z > 0) + P(0 < Z < 0.75)
= 0.50 + 0.2734 = 0.7734
c)
P(81.2 < X < 86.0) = P((81.2 − μ)/σ < (X − μ)/σ < (86.0 − μ)/σ)
= P((81.2 − 80)/4.8 < Z < (86.0 − 80)/4.8)
= P(0.25 < Z < 1.25)
= P(0 < Z < 1.25) − P(0 < Z < 0.25)
= 0.3944 − 0.0987 = 0.2957
Area between 0 and z
Z    0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
0.0  0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1  0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2  0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3  0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4  0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5  0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6  0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7  0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8  0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9  0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0  0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1  0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2  0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3  0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4  0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5  0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6  0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7  0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8  0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9  0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0  0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1  0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2  0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3  0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4  0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5  0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6  0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7  0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.8  0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.9  0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
3.0  0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990
Sampling is that part of statistical practice concerned with the selection of individual observations intended to yield some knowledge about a population of concern, especially for the purposes of statistical inference. Before discussing the specific types of sampling methods, it is valuable to be acquainted with the following terms:
1. Sampling: The process or method of sample selection from the population.
Sampling can be done either with replacement or without replacement.
1.1) Sampling with replacement (swr): in this case, a unit is selected from a population with a known probability and the unit is returned to the population before the next selection is made (after recording its characteristic(s)). Thus, in this method, at each selection the population size remains constant, the probability at each selection or draw remains the same, and a unit has a chance of being selected more than once. There are $N^n$ possible samples of size n from a population of N units.
1.2) Sampling without replacement (swor): in this selection procedure, if a unit from a population of size N is selected, it is not returned to the population. Thus, for any subsequent selection, the population size is reduced by one. There are $\binom{N}{n}$ possible samples of size n from a population of N units.
2. Sampling unit: the ultimate unit to be sampled, or the elements of the population to be sampled.
Examples:
✓ If somebody studies the socio-economic status of households, the household is the sampling unit.
✓ If one studies the performance of freshman students in some college, the student is the sampling unit.
3. Sample size: The number of sampling units which are selected from a population. The sample size depends on a number of considerations, which are as follows.
a) The purpose for which the sample is drawn.
d) Resources allotted for the study in terms of time and money.
e) Precision required.
4. Study Unit: The unit on which information is collected.
5. Sampling Fraction (Sampling Interval): The ratio of the number of units in the sample to the number of units in the source population.
6. Sampling frame: The list of all the units in the source population from which a sample is to be taken.
Examples:
▪ List of households.
▪ List of students in the registrar office.
7. Errors in sample survey:
There are two types of errors
a) Sampling error:
- It is the discrepancy between the population value and the sample value due to the fact that the sample is not a perfect representation of the population.
- May arise due to inappropriate sampling techniques applied
b) Non sampling errors: are errors due to procedure bias such as:
✓ Incorrect responses (called response or observational error)
✓ Measurement or lack of preciseness of definition.
✓ Errors at different stages in processing the data, such as editing and tabulating of data.
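The counts of possible samples quoted under sampling with and without replacement (N^n and the binomial coefficient "N choose n") can be verified by listing every sample from a very small population. The four unit labels below are hypothetical, and the sketch assumes that draws with replacement are counted in order while samples without replacement are counted as unordered sets.

```python
from itertools import combinations, product
from math import comb

population = ["u1", "u2", "u3", "u4"]   # hypothetical population, N = 4
N, n = len(population), 2

# With replacement: every ordered sequence of n draws, repeats allowed.
samples_with_replacement = list(product(population, repeat=n))

# Without replacement: every unordered set of n distinct units.
samples_without_replacement = list(combinations(population, n))

# Sampling fraction as defined in the notes: sample size over population size.
sampling_fraction = n / N

print(len(samples_with_replacement), N**n)
print(len(samples_without_replacement), comb(N, n))
print(sampling_fraction)
```

Enumeration is only practical for tiny populations, since N^n grows very quickly; the formulas are what one would use in practice.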
Reasons for Sampling
❖ Reduced cost: The finances required to cover the whole population can hardly be made available.
❖ Greater speed: Studying the whole population takes too much time, and often the study becomes outdated by the time it is complete.
❖ Greater accuracy: Complete enumeration (census study) introduces many errors which are reduced or eliminated by sampling.
❖ The only option when the population is infinite: If the population is infinite or consists of an uncountable number of units, studying all of it is impossible.
Because of the above considerations, in practice we take a sample and draw conclusions about population values such as the population mean and population variance, known as parameters of the population.
Sometimes taking a census makes more sense than using a sample. Some of the reasons include:
➢ Universality
➢ Qualitativeness
➢ Detailedness
➢ Non-representativeness
Sampling Techniques
- There are two types of sampling techniques.
A. Random Sampling or probability sampling.
A probability sampling scheme is one in which every unit in the population has a known nonzero probability of being sampled and the process involves random selection. Probability sampling includes: Simple Random Sampling, Systematic Sampling, Stratified Sampling, Cluster Sampling or Multistage Sampling.
1. Simple Random Sampling:
➢ It is a method of selecting items from a population such that every possible sample of a specific size has an equal chance of being selected. In this case, sampling may be with or without replacement. Equivalently, all elements in the population have the same pre-assigned nonzero probability of being included in the sample.
➢ This could be accomplished by writing each study unit's name on a slip of paper and selecting an adequate number of them using the Lottery Method. It can also be done by assigning a number to each sampling unit; samples are then selected using a Table of Random Numbers or a computer application.
Table of Random Numbers: Tables of random numbers are tables of the digits 0, 1, 2, …, 9, each digit having an equal chance of selection at any draw. For convenience, the numbers are put in blocks of five. In using these tables to select a simple random sample, the steps are:
i. Number the units in the population from 1 to N (prepare the frame of the population).
ii. Then proceed in the following way:
If the first digit of N is a number between 5 and 9 inclusive, the following method of selection is adequate. Suppose N = 528 and we want n = 10.
Select three columns from the table of random numbers, say columns 25 to 27. Go down the three columns selecting the first 10 distinct numbers between 001 and 528. These are 36, 509, 364, 417, 348, 127, 149, 186, 439, and 329. Then the units with these roll numbers are our sample.
Note: If sampling is without replacement, reject all the numbers that come up more than once.
2. Stratified Random Sampling:
➢ The population is divided into non-overlapping and exhaustive groups called strata.
➢ A separate sample is taken from each stratum using Simple or Systematic Random Sampling techniques.
➢ Elements in the same stratum should be more or less homogeneous, while elements in different strata should differ.
➢ It is applied if the population is heterogeneous.
➢ The main advantage is that it improves the representativeness of the sample and allows reasonable comparison among strata. The major limitation is that it requires a separate sampling frame for each stratum.
➢ Some of the criteria for stratification are: Characteristics of the population (Sex, Age, ethnic origin and Occupation, etc.) and Geographical
3. Cluster sampling
❖ The population is divided into separate groups of elements called clusters. Each element of the population belongs to one and only one cluster.
❖ A simple random sample of the clusters is then taken. All elements within each sampled cluster form the sample.
❖ Cluster sampling tends to provide the best results when the elements within the clusters are heterogeneous.
❖ It is used in large geographic samples where no list is available of all the units in the population but the population boundaries can be well-defined.
❖ Cluster sampling must use a random sampling method at each stage. This may result in a somewhat larger sample than using a simple random sampling method, but it saves time and money.
❖ Cluster sampling is useful when it is difficult or costly to generate a simple random sample.
For example, to obtain information about the drug habits of all high school students in a state, you could obtain a list of all the school districts in the state and select a simple random sample of school districts. Then, within each selected school district, list all the high schools and select a simple random sample of high schools. Within each selected high school, list all high school classes, and select a simple random sample of classes. Then use the high school students in those classes as your sample.
4. Systematic Random Sampling:
This method selects units at a fixed interval throughout the sampling frame after a random start.
❖ It is obtained by numbering each subject of the population and then selecting every k-th number.
❖ Here are the steps you need to follow in order to achieve a systematic random sample:
▪ Number the units in the population from 1 to N,
▪ Decide on the n (sample size) that you need,
▪ Calculate the sampling interval k (k = N/n),
▪ Randomly select an integer between 1 and k, suppose it is j (1 ≤ j ≤ k),
▪ The j-th unit is selected first and then the (j + k)-th, (j + 2k)-th, etc., until the required sample size is reached.
The general advantage of Systematic Random Sampling is that it is easier and less time consuming to perform. In some situations it can also be conducted without a sampling frame. However, this method can be biased when there is a cyclic pattern in the order of the subjects.
▪ For example, to select a sample of 25 dorm rooms in your college dorm, make a list of all the room numbers in the dorm. Say there are 100 rooms. Divide the total number of rooms (100) by the number of rooms you want in the sample (25). The answer is 4. This means that you are going to select every fourth dorm room from the list. But you must first consult a table of random numbers. Pick any point on the table, and read across or down until you come to a number between 1 and 4. This is your random starting point. Say your random starting point is "3". This means you select dorm room 3 as your first room, and then every fourth room down the list (3, 7, 11, 15, 19, etc.) until you have 25 rooms selected.
B. Non Random Sampling or non probability sampling.
❖ It is a sampling technique in which the choice of individuals for a sample depends on convenience, personal choice or interest.
❖ It is any sampling method where some elements of the population have no chance of selection or where the probability of selection can't be accurately determined.
❖ In non probability sampling, the sample is less likely to be representative of the population, thus information about the relationship between sample and population is limited, making it difficult to extrapolate from the sample to the population.
❖ Non probability sampling is used when there is no sampling frame to conduct probability sampling, or when it is impossible to conduct probability sampling due to economic and feasibility factors.
❖ Non probability sampling is divided into Purposive, Convenience, Quota and Snowball Sampling.
A. Judgmental or Purposive Sampling: The researcher chooses the sample based on who he/she thinks would be appropriate for the study. Samples are taken based on previous knowledge of the population (from which the samples are taken) and the specific purpose of the study or investigation. Researchers use their personal judgment in selecting the sample(s).
B. Convenience Sampling: The selection of units from the population is based on easy
availability and/or accessibility.
D. Snowball Sampling: The researcher begins by identifying someone who meets the inclusion
criteria of the study. Then the study subject would be asked to recommend others who s/he may
know who also meet the criteria.
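To close, two of the probability sampling techniques described earlier (simple random sampling from numbered units, and systematic sampling with interval k = N/n) can be sketched with Python's standard `random` module in place of a table of random numbers. The function names and the seed value are assumptions made for illustration; the sketch also assumes that N is a multiple of n in the systematic case.

```python
import random


def simple_random_sample(N, n, rng):
    """Draw n distinct unit numbers from 1..N (sampling without replacement)."""
    return rng.sample(range(1, N + 1), n)


def systematic_sample(N, n, start=None, rng=None):
    """Take every k-th unit after a start j in 1..k, where k = N // n."""
    k = N // n
    if start is None:
        # Random start between 1 and k, as in the steps for systematic sampling.
        start = (rng or random).randint(1, k)
    return [start + i * k for i in range(n)]


rng = random.Random(2024)  # hypothetical seed, only to make the run repeatable

# Simple random sample of n = 10 units from a population numbered 1..528.
srs_units = simple_random_sample(528, 10, rng)

# Dorm room example: 100 rooms, 25 wanted, random start taken to be 3.
dorm_rooms = systematic_sample(100, 25, start=3)

print(sorted(srs_units))
print(dorm_rooms)
```

A pseudo-random generator plays the role of the random number table here; fixing the seed is a convenience for checking results and would normally be left out when drawing a real sample.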