Chapter 3
TADESSE AWOKE
History of statistics
Sir Ronald Aylmer Fisher (1890-1962)
Theory of estimation and statistical inference
Analysis of variance (Fisher's ANOVA)
Maximum likelihood estimation
Thomas Bayes ()
Probability theory
Prior and posterior distributions
Develop a hypothesis
Design experiments or other tests
Collect and record data
Peer review
Analyze and interpret data
Disseminate results
Public understanding of research
Scientific impact of research
Replication of results
Sampling
Research and Sample:
Research means to search or investigate exhaustively. It is a careful or diligent search; a studious inquiry or examination, especially investigation or experimentation aimed at the discovery and interpretation of facts, the revision of accepted theories or laws in the light of new facts, or the practical application of such new or revised theories or laws. It can also be the collection of information about a particular subject. In research terms, a sample is a group of people, objects, or items that are taken from a larger population for measurement. The sample should be representative of the population so that we can generalize the findings from the research sample to the population as a whole.
Sampling Strategy
The manner in which a sample is drawn is an important factor in determining how useful the sample will be for making inferences about the population from which it is drawn. It is quite possible to have a very large sample upon which no sound decision can be based. This happens when the respondents in the sample are not really similar to the population about which we want to make generalizations. To be useful, the sample must be representative of the population about which we wish to make generalizations.
Basic Terms
Population: a group of individuals (persons, objects, or items) from which samples are taken for measurement, for example a population of presidents or professors, books or students.
Census: obtained by collecting information about each member of a population.
Sample: obtained by collecting information about only some members of a population.
Sampling frame: the list of people from which the sample is taken. It should be comprehensive, complete and up to date. Examples of sampling frames: the Electoral Register, the Postcode Address File, a telephone book, and so on.
Probability samples: with probability sampling methods, each population element has a known (non-zero) chance of being chosen for the sample.
Non-probability samples: with non-probability sampling methods, we do not know the probability that each population element will be chosen, and/or we cannot be sure that each population element has a non-zero chance of being chosen.
What is Sampling?
Sampling is the act, process, or technique of selecting a suitable sample, or a representative part of a population, for the purpose of determining parameters or characteristics of the whole population. When dealing with people, a sample can be defined as a set of respondents (people) selected from a larger population for the purpose of a survey. The purpose of sampling is to draw conclusions about populations from samples; to do so, we must use inferential statistics, which enable us to determine a population's characteristics by directly observing only a portion (or sample) of the population.
We obtain a sample rather than a complete enumeration (a census) of the population for many reasons. Obviously, it is cheaper to observe a part rather than the whole, but we should prepare ourselves to cope with the dangers of using samples. There would also be difficulties in measuring whole populations because of:
1. the large size of many populations;
2. the inaccessibility of some of the population;
3. the destructiveness of the observation;
4. accuracy.
Error in Sampling
A sample is expected to mirror the population from which it comes; however, there is no guarantee that any sample will be precisely representative of the population. The uncertainty associated with an estimate that is based on data gathered from a sample of the population, rather than the full population, is known as sampling error. So why do sample estimates have uncertainty associated with them? There are two reasons:
Estimates of characteristics from the sample data can differ from those that would be obtained if the entire population were surveyed.
Estimates from one subset or sample of the population can differ from those based on a different sample from that same population.
Sampling Error
One of the most frequent causes of a sample being unrepresentative of the total population is sampling error. Sampling error comprises the differences between the sample and the population that are due solely to the particular participants who have been selected. Sampling error can make a sample unrepresentative of its population, and it is related to sample size: the larger the sample, the smaller the sampling error tends to be.
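The link between sampling error and sample size can be seen with a small simulation. This is a minimal sketch under hypothetical choices: a population of 10,000 values drawn from a normal distribution with mean 50 and SD 10, three sample sizes, and 500 repetitions each. For each sample size it draws repeated simple random samples and reports the standard deviation of the sample means, which approximates the standard error of the mean.

```python
import random
import statistics

def spread_of_sample_means(population, n, reps, rng):
    """Standard deviation of `reps` sample means, each from a simple random sample of size n."""
    means = [statistics.fmean(rng.sample(population, n)) for _ in range(reps)]
    return statistics.stdev(means)

rng = random.Random(42)  # fixed seed so the run is reproducible
# Hypothetical population: 10,000 values with mean about 50 and SD about 10
population = [rng.gauss(50, 10) for _ in range(10_000)]

spreads = {}
for n in (10, 100, 1000):
    spreads[n] = spread_of_sample_means(population, n, reps=500, rng=rng)
    print(f"n = {n:>4}: spread of sample means = {spreads[n]:.2f}")
```

The spread falls roughly in proportion to 1/sqrt(n), so a tenfold increase in sample size cuts the sampling error by about a factor of three.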
Causes of Non-Sampling Error
Non-sampling error can be caused by: the interviewer effect; the respondent effect; respondents knowing the purpose of the study; induced bias; and non-response. In general, non-sampling error can be grouped into two main types: systematic and random. Systematic error makes survey results unrepresentative of the target population by distorting the survey estimates in one direction. Random error can distort the results in either direction but tends to balance out on average. Thus, the total survey error is the sum of both sampling error and non-sampling error.
Advantages of Sampling
Sampling is a must in some situations. Its advantages are:
1. It saves time.
2. It reduces the study population to a reasonable size, so that expenses are greatly reduced.
3. Sometimes experiments can only be done on a sample basis.
4. It saves the source of data from being completely consumed.
5. Sample data can also be used to check the accuracy of census data.
Disadvantages of Sampling
1. If the sample is biased, not representative, or too small, the conclusions may not be valid and reliable.
2. In research, the respondents of the study must share common characteristics, which are the basis of the study.
3. If the population is very large and there are many sections and subsections, the sampling procedure becomes very complicated.
4. If the researcher does not possess the necessary skill and technical know-how in sampling procedures, the outcome may be seriously flawed.
Types of Sampling
There are many methods of sampling when doing research. One of the most important decisions that any researcher makes is how to obtain the type of participants needed for the study. The sample that we draw for our study determines the generalizability of our findings. When we draw our sample, we want a good representation of all the kinds of people in the population. In general, there are two methods of sampling: probability sampling methods and non-probability sampling methods.
[Figure: types of samples, divided into non-probability samples and probability samples (including stratified and cluster samples)]
Sampling Frame
The sampling frame is the population as it is defined and available through records. A number of probability sampling techniques can be used, depending on the complexity of the population we want to study. Where do we start? When we use probability sampling, we begin by defining our population.
Steps in selecting a sample using a table of random numbers:
1. Define the population.
2. Determine the desired sample size.
3. List the population from 1 to N.
4. Assign each individual on the list a consecutive number, using the same number of digits for all, such as 01-99 or 001-999.
5. Decide whether to read the table row-wise or column-wise.
6. Select an arbitrary starting number in the table of random numbers, at a defined row and column. Make sure that the selected number has the same number of digits as N.
7. If the selected number corresponds to the number assigned to an individual in the population, that individual is in the sample; otherwise, drop the number and proceed to the next number, row-wise or column-wise.
8. Repeat these steps until the desired sample size is reached.
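The manual procedure above can be mimicked in code. This is a minimal sketch, assuming the printed table is replaced by a seeded pseudo-random digit generator; the function name `random_number_table_sample` and the values N = 250 and n = 20 are hypothetical. Each "reading" has as many digits as N, and numbers outside 1..N or already chosen are dropped, as in steps 6 and 7.

```python
import random

def random_number_table_sample(N, n, rng):
    """Select n distinct units from a population numbered 1..N by imitating
    the random-number-table procedure."""
    if not 0 < n <= N:
        raise ValueError("sample size must be between 1 and N")
    width = len(str(N))  # digits per reading, e.g. 3 digits when N = 250
    chosen = []
    while len(chosen) < n:
        # One reading from the "table": a random string of `width` digits
        number = int("".join(rng.choice("0123456789") for _ in range(width)))
        # Keep it only if it matches a unit that has not been chosen yet
        if 1 <= number <= N and number not in chosen:
            chosen.append(number)
    return chosen

rng = random.Random(2024)
sample = random_number_table_sample(N=250, n=20, rng=rng)
print(sorted(sample))
```

In practice, `random.sample(range(1, N + 1), n)` produces an equivalent simple random sample in a single call.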
Example
Assume that the total number of patients who visited Gondar University Hospital in the last six months is N. We want to estimate the prevalence of TB among these patients. If we think that the patients who visited the hospital within the specified time period are homogeneous with respect to the variable of interest, and a list of the patients is available, then we can use simple random sampling to select the sample.
Systematic sampling requires the population to be homogeneous; however, the method does not require a sampling frame.
Hence, in the absence of a frame, this method is the best choice.
Steps in systematic sampling:
1. Define the population.
2. Determine the desired sample size, n.
3. List the population from 1 to N.
4. Determine the sampling interval k, where k = N/n.
5. Select a random number between 1 and k; denote this number by a.
6. Starting at a, take every kth number on the list until the desired sample is obtained. The selected units will then be a, a+k, a+2k, a+3k, and so on.
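The steps above translate directly into a few lines of code. This is a minimal sketch, assuming k is rounded down when N/n is not a whole number (the text does not say how to handle that case); the function name and the values N = 1000 and n = 50 are hypothetical.

```python
import random

def systematic_sample(N, n, rng):
    """Return the positions a, a+k, a+2k, ... of a 1-in-k systematic sample
    from a list numbered 1..N."""
    k = N // n                # sampling interval, rounded down
    a = rng.randint(1, k)     # random start between 1 and k (inclusive)
    return [a + i * k for i in range(n)]

rng = random.Random(7)
positions = systematic_sample(N=1000, n=50, rng=rng)
print(positions[:5], "...", positions[-1])
```

Because a is at most k, the last position a + (n - 1)k never exceeds nk, which is at most N.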
Let's say that we lined up our population into a neat sampling frame and selected every 3rd member. What would our sample look like? Does it look good? Since systematic random sampling is a type of probability sampling, the researcher must ensure that all members of the population have an equal chance of being selected as the starting point, or initial subject. The researcher must also be certain that the chosen constant interval between subjects does not reflect a pattern of traits present in the population. If such a pattern exists and it coincides with the interval set by the researcher, the randomness of the sampling technique is compromised.
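The danger of an interval that coincides with a pattern can be shown with a toy frame. This sketch assumes a hypothetical list of 90 members in which every third member is a supervisor ("S") and the other two are workers ("W"); with k = 3, every possible random start yields a sample made up of a single type.

```python
# Hypothetical frame with a repeating pattern of length 3: S, W, W, S, W, W, ...
frame = ["S", "W", "W"] * 30   # N = 90
k = 3                          # interval that coincides with the pattern

compositions = {}
for a in range(1, k + 1):                  # every possible random start a
    sample = frame[a - 1::k]               # members a, a+k, a+2k, ... (1-based)
    compositions[a] = sorted(set(sample))
    print(f"start a = {a}: sample contains only {compositions[a]}")
```

Start 1 gives only supervisors and starts 2 and 3 give only workers, so no start produces a sample that reflects the 1:2 mix in the population.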
There are two methods of getting the study subjects from each subgroup: proportional allocation and equal allocation. We use proportional allocation when the subgroups vary dramatically in size in the population.
The larger the subgroup in the population, the larger its sample size will be. However, equal allocation is used if the subgroups are approximately equal in size. Consider the following figure:
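The two allocation rules can be written as small functions. This is a minimal sketch with hypothetical strata names and sizes (N = 10,000, n = 200). Proportional allocation uses n_h = n × N_h / N; to make the allocations add up to n exactly, it assumes a largest-remainder rounding rule, which the text does not specify. Equal allocation gives every stratum the same number, with any remainder assigned to the first strata.

```python
def proportional_allocation(sizes, n):
    """n_h = n * N_h / N, rounded with the largest-remainder rule so the total is n."""
    total = sum(sizes.values())
    exact = {h: n * size / total for h, size in sizes.items()}
    alloc = {h: int(x) for h, x in exact.items()}  # round down first
    leftover = n - sum(alloc.values())
    # Give the remaining units to the strata with the largest fractional parts
    by_fraction = sorted(exact, key=lambda h: exact[h] - alloc[h], reverse=True)
    for h in by_fraction[:leftover]:
        alloc[h] += 1
    return alloc

def equal_allocation(sizes, n):
    """Same number of units from every stratum; any remainder goes to the first strata."""
    base, extra = divmod(n, len(sizes))
    return {h: base + (1 if i < extra else 0) for i, h in enumerate(sizes)}

# Hypothetical strata sizes (N = 10,000) and total sample size n = 200
sizes = {"urban": 6000, "rural": 3000, "semi_urban": 1000}
proportional = proportional_allocation(sizes, 200)
equal = equal_allocation(sizes, 200)
print("proportional:", proportional)
print("equal:       ", equal)
```

With these sizes, proportional allocation gives 120, 60 and 20, while equal allocation gives 67, 67 and 66. This is why proportional allocation suits strata of very different sizes.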
Advantages of stratified sampling over simple random sampling:
1. It can provide greater precision than a simple random sample of the same size.
2. Because it provides greater precision, a stratified sample often requires a smaller sample, which saves money.
3. A stratified sample can guard against an "unrepresentative" sample.
4. We can ensure that we obtain enough sample points to support a separate analysis of any subgroup.
The main disadvantage of a stratified sample is that it may require more administrative effort than a simple random sample.
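The precision advantage can be checked with a simulation. This is a minimal sketch under hypothetical assumptions: a population made of two equal-sized strata (5,000 units each) with very different means (20 and 80, SD 5). It compares how much the estimated population mean varies across 500 repetitions under simple random sampling and under stratified sampling with proportional allocation, both with n = 100.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed for reproducibility

# Hypothetical population: two equal-sized strata with very different means
stratum_a = [rng.gauss(20, 5) for _ in range(5000)]
stratum_b = [rng.gauss(80, 5) for _ in range(5000)]
population = stratum_a + stratum_b

def srs_estimate(n):
    """Mean of a simple random sample of size n from the whole population."""
    return statistics.fmean(rng.sample(population, n))

def stratified_estimate(n):
    """Proportional allocation: the strata are equal in size, so n/2 from each.
    With equal stratum weights, the estimate is the average of the stratum means."""
    half = n // 2
    mean_a = statistics.fmean(rng.sample(stratum_a, half))
    mean_b = statistics.fmean(rng.sample(stratum_b, half))
    return 0.5 * mean_a + 0.5 * mean_b

n, reps = 100, 500
srs_spread = statistics.stdev([srs_estimate(n) for _ in range(reps)])
strat_spread = statistics.stdev([stratified_estimate(n) for _ in range(reps)])
print(f"SRS:        spread of estimates = {srs_spread:.2f}")
print(f"Stratified: spread of estimates = {strat_spread:.2f}")
```

The gain comes from removing the between-strata variation from the estimate. When strata hardly differ, the two methods give similar precision.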
1. Purposive Sampling
When the desired population for the study is rare or very difficult to locate and recruit, purposive sampling may be the only option. For example, suppose you are interested in studying the cognitive processing speed of young adults who have suffered closed-head brain injuries in automobile accidents. This would be a difficult population to find.
2. Convenience Sampling
Convenience sampling selects a particular group of people but it does not come close to sampling all of a population. The sample would generalize only to similar programs in similar cities. Convenience sampling looks just like cluster sampling. The major difference is that the clusters of research participants are selected by convenience rather than by a random process.
3. Judgment Sampling
The researcher selects the sample based on his or her own judgment.
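4. Quota Sampling
Quota sampling is the non-probability equivalent of stratified sampling: the population is divided into subgroups and a quota is filled from each subgroup without random selection. It differs from stratified sampling, where the strata are filled by random sampling.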
5. Snowball Sampling
Snowball sampling is a special non-probability method used when the desired sample characteristic is rare. It relies on referrals from initial subjects to generate additional subjects: first identify someone who meets the criteria, then let him or her bring in others he or she knows. While this technique can dramatically lower search costs, it comes at the expense of introducing bias, because the technique itself reduces the likelihood that the sample will represent a good cross-section of the population.
Sample Size
Determining the sample size for a study is a crucial component of study design. The goal is to include enough subjects so that statistically significant results can be detected. Among the questions a researcher should ask when planning a survey or study is "How large a sample do I need?" The answer will depend on the aims, nature and scope of the study and on the expected results, all of which should be carefully considered at the planning stage. In general, sample size depends on:
1. the type of data analysis to be performed;
2. the desired precision of the estimates;
3. the kind and number of comparisons that will be made;
4. the number of variables that have to be examined simultaneously;
5. how heterogeneous the sampled population is.
There are three possible categories of outcome variables. The first is where the variable of interest has only two alternative responses: yes/no, dead/alive, vaccinated/not vaccinated and so on. The second category covers outcome variables with multiple, mutually exclusive alternative responses, such as marital status, religion, blood group and so on. For these two categories of outcome variables, the data are generally expressed as percentages or rates, so we use percentages (proportions) to compute the sample size. The third category covers continuous response variables such as birth weight, age at first marriage, blood pressure and serum uric acid level, for which numerical measurements are usually made. In this case the data are summarized in the form of means and standard deviations or their derivatives.
The population standard deviation can be obtained from a pilot study, from secondary data, or from the judgment of the researcher.
Maximum acceptable difference: this is the maximum amount of error that you are willing to accept.
Desired confidence level: the confidence level is your level of certainty that the sample mean does not differ from the true population mean by more than the maximum acceptable difference. Commonly we use a 95% confidence level. The sample size determination formula for a single population mean is then:

$$n = \left(\frac{z_{\alpha/2}\,\sigma}{w}\right)^{2}$$

where $\alpha$ is the level of significance, obtained as 1 - confidence level; $\sigma$ is the standard deviation of the population; $w$ is the maximum acceptable difference; and $z_{\alpha/2}$ is the value from the standard normal table for the given confidence level.
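The formula for a single population mean is easy to compute. This is a minimal sketch: the function name and the inputs (sigma = 10, w = 2) are hypothetical. It takes $z_{\alpha/2}$ from Python's `statistics.NormalDist` instead of a printed table, and it rounds n up because a fractional subject is not possible.

```python
import math
from statistics import NormalDist

def sample_size_mean(sigma, w, confidence=0.95):
    """n = (z_{alpha/2} * sigma / w)^2, rounded up to the next whole subject."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95% confidence
    return math.ceil((z * sigma / w) ** 2)

# Hypothetical inputs: sigma = 10 (e.g. from a pilot study), w = 2 units
print(sample_size_mean(sigma=10, w=2))  # 97
```

Here (1.96 × 10 / 2)^2 ≈ 96.04, which rounds up to 97.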
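The population proportion (p) can be obtained:
from a pilot study;
from the judgment of the researcher; or
simply by taking 50% (p = 0.5), which gives the largest sample size.
The other inputs are the maximum acceptable difference (margin of error, d) and the desired confidence level.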
Then the formula for the sample size of a single population proportion is:

$$n = \frac{z_{\alpha/2}^{2}\, p\,(1-p)}{d^{2}}$$
Example
An MPH student wants to conduct research on the prevalence of ANC utilization among mothers in Dabat district. Given that a previous study found the prevalence to be 45.7%, what sample size should he take to address his objective?
Solution: margin of error d = 5%. A confidence level of 95% gives $z_{\alpha/2} = 1.96$. Then, using the formula for a single population proportion:

$$n = \frac{(1.96)^{2} \times 0.457 \times (1 - 0.457)}{(0.05)^{2}} = \frac{3.8416 \times 0.248151}{0.0025} \approx 381.3$$

Rounding up, n = 382.
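The same calculation can be written as a short function. This is a minimal sketch: the function name is hypothetical, and $z_{\alpha/2}$ comes from `statistics.NormalDist` instead of the rounded table value 1.96. The second call shows the conservative choice p = 0.5, which is used when no prior estimate is available.

```python
import math
from statistics import NormalDist

def sample_size_proportion(p, d, confidence=0.95):
    """n = z_{alpha/2}^2 * p * (1 - p) / d^2, rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)

print(sample_size_proportion(p=0.457, d=0.05))  # ANC example: 382
print(sample_size_proportion(p=0.5, d=0.05))    # no prior estimate: 385
```

Because p(1 - p) is largest at p = 0.5, taking 50% never gives a smaller sample than any other assumed prevalence.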
Example
A new calcium channel blocker is to be tested for the treatment of patients with unstable angina. Its effect on heart rate is unknown. Suppose it is determined that a clinically important change in heart rate as a result of taking this medication is 5 beats per minute in either direction over the initial 48-hour period after taking the medication. What sample size is needed for a study of the change in heart rate if the study is to have 80% power, with α = 0.05, for detecting a change in heart rate of 5 beats per minute in either direction, assuming that the standard deviation is 10 beats/minute?
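One way to answer this example is sketched below. The text above does not give a formula that includes power, so this sketch assumes the standard one-sample (change-score) formula for a two-sided test, n = σ²(z_{1-α/2} + z_{1-β})² / Δ², where Δ is the clinically important change and 1 - β is the power. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_mean_change(sigma, delta, alpha=0.05, power=0.80):
    """Assumed one-sample formula for a two-sided test:
    n = sigma^2 * (z_{1-alpha/2} + z_{1-beta})^2 / delta^2, rounded up."""
    z = NormalDist().inv_cdf
    return math.ceil(sigma ** 2 * (z(1 - alpha / 2) + z(power)) ** 2 / delta ** 2)

# Heart-rate example: sigma = 10 beats/min, delta = 5 beats/min, alpha = 0.05, power = 0.80
print(sample_size_mean_change(sigma=10, delta=5))  # 32
```

With z values of about 1.96 and 0.84, this gives 100 × 2.80² / 25 ≈ 31.4, which rounds up to 32 patients.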
Some Considerations
Note that the formula gives the sample size to be taken from each group, and it does not take a continuity correction into account. A continuity correction brings normal-curve probabilities into closer agreement with binomial probabilities; applying it increases n slightly.
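The per-group calculation and its continuity correction can be sketched as follows. The two-proportion formula is not reproduced in the text above, so this sketch assumes one common form (the pooled-variance formula associated with Fleiss), together with the Fleiss continuity correction n' = (n/4)[1 + sqrt(1 + 4/(n|p1 - p2|))]². The function name is hypothetical. The usage line applies it to the calf-mortality figures in the next example.

```python
import math
from statistics import NormalDist

def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-group n for a two-sided comparison of two proportions (assumed
    pooled-variance form). Returns (uncorrected, continuity-corrected)."""
    z = NormalDist().inv_cdf
    z_a, z_b = z(1 - alpha / 2), z(power)
    p_bar = (p1 + p2) / 2
    diff = abs(p1 - p2)
    n = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
         + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / diff ** 2
    # Continuity correction (Fleiss): always increases n slightly
    n_cc = n / 4 * (1 + math.sqrt(1 + 4 / (n * diff))) ** 2
    return math.ceil(n), math.ceil(n_cc)

# Calf-mortality figures: p1 = 0.25, p2 = 0.40, alpha = 0.01, power = 0.95
n, n_cc = sample_size_two_proportions(0.25, 0.40, alpha=0.01, power=0.95)
print("per group, uncorrected:", n, " with continuity correction:", n_cc)
```

Other textbooks use slightly different variance terms in the numerator, so results from this sketch may differ by a few units from a given worked solution.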
63
Example
An investigator wants to determine whether the mortality rate in calves raised by farmers' wives differs from the mortality rate in calves raised by hired managers. He/she hypothesizes a calf mortality rate of 0.25 for calves raised by farmers' wives and 0.40 for calves raised by hired managers. The level of significance, alpha, is 0.01, and the desired power of the test is 0.95. How many calves should be included in the study? Solution: from the given information, the required sample size can be computed as follows:
Continuity correction
Example
The case-fatality rate among cancer patients undergoing standard therapy is 0.90, and it is 0.70 for cancer patients receiving a new treatment. Find the required sample size to test the hypothesis that the case-fatality rate differs between the groups, at a level of significance alpha = 0.05 and a desired power of 0.90. Assume that the multiplicative factor is 2. Solution: from the given information, the required sample size can be computed as:
Design Effect
The loss of efficiency caused by using cluster sampling instead of simple random sampling is called the design effect (deff). The design effect is the ratio of the variance under the sampling method actually used to the variance computed under the assumption of simple random sampling. A working definition is that the design effect is the factor by which the sampling variance for the sample plan exceeds that of a simple random sample of the same size; in other words, how much worse your sample is than a simple random sample.
Formula
The design effect depends on two correlations (within and between clusters) and is driven by the homogeneity within clusters, measured by the intra-class correlation (ICC). The intra-class correlation is the degree to which persons or households in the same cluster share the same characteristics, compared with another person or household selected at random from the whole population. Hence deff is affected by both the cluster size and the intra-class correlation:

$$\text{deff} = 1 + (m - 1)\rho$$

where $m$ is the cluster size and $\rho$ is the intra-class correlation. A rule of thumb is to aim for a deff of 2 or less. Then:

Sample size (clustered) = Sample size (unclustered) × deff
Example
The cluster size used gave an ICC of approximately 0.015. Using this ICC as an approximation, a chosen cluster size of 80 gives a design effect of 2.11. Hence the sample size is given by: sample size (clustered) = sample size (unclustered) × 2.11.
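The design effect calculation can be sketched in a few lines. This assumes the commonly used formula deff = 1 + (m - 1)ρ, where m is the cluster size and ρ is the intra-class correlation. The function names are hypothetical, and reusing n = 382 from the ANC example as the unclustered size is only for illustration.

```python
import math

def design_effect(m, icc):
    """deff = 1 + (m - 1) * icc, where m = cluster size and icc = intra-class correlation."""
    return 1 + (m - 1) * icc

def clustered_sample_size(n_srs, m, icc):
    """Inflate a simple-random-sample size by the design effect, rounding up."""
    return math.ceil(n_srs * design_effect(m, icc))

print(round(design_effect(80, 0.015), 3))     # 2.185
print(round(design_effect(80, 0.014), 3))     # 2.106
print(clustered_sample_size(382, 80, 0.015))  # 835
```

With ρ = 0.015 exactly and m = 80, this formula gives 1 + 79 × 0.015 = 2.185. The value 2.11 quoted in the example corresponds to ρ ≈ 0.014, which fits the example's "approximately 0.015".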
1. The main objective of sampling is to:
a. get a representative sample
b. compute summary measures from the sample
c. draw conclusions about the population
d. get information about the sample we selected
2. The population to which we have access for selecting the sample is known as the:
a. target population
b. theoretical population
c. study population
d. study subject
3. The error that arises due to bad luck is known as non-sampling error.
a. True
b. False
4. If the cases are very rare, which sampling technique is advisable?
a. systematic
b. quota
c. snowball
d. cluster
5. Assume that the population on which we want to conduct research is patients who are following ART in Gondar, and that these patients have similar characteristics with respect to the study variable. To select a sample of patients from all 5000 patients on follow-up, which sampling technique is more appropriate?
a. simple random
b. systematic
c. stratified
d. cluster