Determination of Sample Size and Samplin
Proceedings on Engineering
Sciences
www.pesjournal.net
T S Nanjundeswaraswamy1
Shilpa Divakar
ABSTRACT
Sample size, sampling method and sampling technique play a major role in social science, business, health science, agricultural science and survey research. An inappropriate sample size may lead to wrong inferences about the population. Specific sampling techniques are used for specific research problems because no single technique is appropriate for all problems. The significant factors considered in estimating sample size are population size, confidence level, the proportion of the outcome in the case of categorical variables, the standard deviation of the outcome in the case of ordinal variables, and the precision required from the research. The present paper gives an outline of the techniques commonly used in research to determine the sample size precisely. Probability sampling techniques are more suitable for health science related research.

Keywords: Sample size; Sampling method; Health science; Agricultural science; Survey research.
© 2021 Published by Faculty of Engineering
1 Corresponding author: T S Nanjundeswaraswamy
Email: [email protected]
Nanjundeswaraswamy and Divakar, Proceedings on Engineering Sciences, Vol. 03, No. 1 (2021) 25-32,
doi: 10.24874/PES03.01.003
Rao (2012) argued that sample size is a significant part of the design of both analytic and descriptive studies. Research has also concluded that appropriate sample size planning is mandatory to achieve the desired aim, whether a study seeks to establish a difference between groups or to estimate a quantity.

The present paper is an attempt to give an overview of commonly used techniques for sample size calculation in social science, business, health science, agricultural science and survey research.

2. SAMPLING

Brown (2006) defined the population as "the entire group of people that a particular study is interested in"; a sample is a part or portion of that population. The process of selecting a sample from a population is known as sampling. The sample is representative of the population, and the statistical characteristics of the population are inferred from the sample statistics. Any error in the sampling method directly affects the inferences drawn about the population. Sample size is therefore one of the prominent factors affecting the results and inferences of the whole research.

3. TYPES OF SAMPLING METHOD

Sampling methods are broadly classified into probability (random) sampling methods and non-probability (non-random) sampling methods.

3.1 Probability sampling method

3.1.3 Cluster Sampling

This method is also known as area sampling. The entire population to be analyzed is divided into smaller groups called clusters. Cluster samples offer more heterogeneity within groups and more homogeneity among groups. The method is most suitable for market research and agricultural research and is not suitable for medical research.

3.2 Non Probability sampling

In this method the following four techniques are preferred for accurate, unbiased sampling.

3.2.1 Judgement sampling

This method is preferred when only a limited number of respondents have the information that is needed. It involves selecting the sample respondents who are in the best position to provide the required information. The validity of the result depends on the proper judgment of the investigator in selecting the sample.

3.2.1.2 Convenience sampling

In this method samples are selected at the convenience of the investigator. It makes data collection on a particular issue very easy, but the probability of misinterpreting the population is higher. Marshall et al. (2013) argued that most present research does not take care over the sampling method and sample size, and that researchers therefore fail to draw accurate and reliable inferences about the population.
to consider for appropriateness: the required level of precision, the level of confidence or risk, and the degree of variability in the attributes to be measured.

Rao (2012) defined the sample size as an estimate of the number of subjects required to detect an association of a given effect size and variability, at a specified likelihood of making Type I (false-positive) and Type II (false-negative) errors. Browner et al. (1988) reported that a Type I error is more serious than a Type II error.

4.1 Level of statistical precision / Sampling Error

Cohen (1988) defined statistical precision as "the closeness between the calculated value and relevant population value". Thompson (2006), in Foundations of Behavioral Statistics: An Insight-Based Approach, suggested that statistical precision is normally estimated by the standard error in two ways: descriptively and inferentially.

Descriptively, precision can be estimated using the standard error, that is, the difference between the sample estimate and the population parameter.

$$ SEM = \frac{s}{\sqrt{n}} \tag{3} $$

SEM = Standard error of the mean
$s$ = Standard deviation
$n$ = Sample size

Inferentially, the standard error is commonly used in estimating the significance of differences between or among parameter estimates.

$$ t = \frac{M_T - M_C}{\sqrt{\dfrac{S_T^2}{n_T} + \dfrac{S_C^2}{n_C}}} \tag{4} $$

$M_T$ = Mean of the treatment group
$M_C$ = Mean of the control group
$S_T$ = Standard deviation of the treatment group
$S_C$ = Standard deviation of the control group
$n_T$ = Sample size of the treatment group
$n_C$ = Sample size of the control group
$t$ = t test statistic

The level of precision is the range in which the true value of the population is to be estimated; this range is expressed in percentage points, such as ±5. The researcher has to define the level of precision, or error, based on the type of research. In general, the level of precision is taken as ±10% for political polling research, ±5% for market research, and ±1% for manufacturing and medical research.

Cohen (1988) argued that sample size is one of the major factors affecting statistical precision.

4.2 Confidence / Risk Level

The confidence/risk level is the degree to which an assumption or number is likely to be true, that is, the probability that a random variable lies within the confidence interval of an estimate. The confidence level is based on the central limit theorem. According to the central limit theorem, when samples are drawn repeatedly from a population, the average of an attribute (such as the mean) obtained from those samples is equal to the true population value, and the values obtained from those samples are normally distributed about the true value.

4.2.1 Interval estimation for single mean

Confidence interval = Point estimator ± [(Z critical value) * (Standard error)]

$$ \text{Confidence Interval} = \bar{X} \pm Z_\alpha \cdot \frac{\sigma}{\sqrt{n}} \tag{5} $$

$$ \text{Upper Confidence Limit (UCL)} = \bar{X} + Z_\alpha \cdot \frac{\sigma}{\sqrt{n}} \tag{6} $$

$$ \text{Lower Confidence Limit (LCL)} = \bar{X} - Z_\alpha \cdot \frac{\sigma}{\sqrt{n}} \tag{7} $$

$\bar{X}$ = Sample mean
$Z_\alpha$ = Represents the preferred level of statistical significance
$\sigma$ = Sample standard deviation
$n$ = Sample size

4.2.2 Interval estimation for Differences of Two mean

Confidence interval = Difference between the two means ± [(Z critical value) * (Standard error of the difference between the two means)]

$$ \text{Confidence Interval} = (\bar{x}_1 - \bar{x}_2) \pm Z_\alpha \cdot \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \tag{8} $$

$\bar{x}_1$ = Sample mean of group 1
$\bar{x}_2$ = Sample mean of group 2
$Z_\alpha$ = Represents the preferred level of statistical significance
$\sigma_1$ = Standard deviation of group 1
$\sigma_2$ = Standard deviation of group 2
$n_1$ = Sample size of group 1
$n_2$ = Sample size of group 2

4.3 Degree of Variability

The degree of variability is the distribution of attributes in the population. For a more heterogeneous population, a larger sample size is required for a given precision level,
at the same time, for a homogeneous population a small sample size is sufficient to meet the given precision level.

The sample size is important because it affects the statistical power, and statistical power in turn influences the significance of the statistical test. Browner and Newman (1987) argued that the sensitivity of a test depends on its statistical power. Many studies provide evidence that an adequate sample size gives a statistical test enough power. Moher et al. (1994) and

In the census method the entire population is taken as the sample; this method is suitable only when the population size is very small, otherwise the cost associated with it is high. This method is very suitable for medical research because of its precision.

If the researcher is working in the same field or domain and literature is available, the sample size of similar studies can be replicated. The disadvantage of this method is that any error in the sample size determination of the previous research is carried forward.

A third method of determining sample size is based on published tables, which provide the sample size for predefined criteria.

Using formulas for a different combination of levels of

of the estimate and is based on the variability of the estimate. Let E denote the margin of error.

The minimum sample size is calculated for an allowable error at a given percentage risk: for 1% risk E = 1, for 2% risk E = 2; the value depends on the accuracy level required for the research.

$$ n = \frac{Z^2 \sigma^2}{E^2} \tag{10} $$

which is valid where $n_0$ is the sample size, $Z$ is the standard normal value corresponding to the area under the acceptance region of a normal distribution (1 – α), $e$ is the preferred level of precision, $p$ is the estimated proportion of an attribute that is present in the population, and $q$ is 1 – $p$.

5.2.2 Modified Cochran Formula for Small Populations

If the population is small, the sample size can be reduced slightly, because a given sample size provides proportionately more information for a small population than for a large one.

The sample size ($n_0$) can be adjusted as

$$ n = \frac{n_0}{1 + \dfrac{n_0 - 1}{N}} \tag{12} $$
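As a rough illustration of Eq. (10) and of the small-population adjustment in Eq. (12), the following Python sketch computes an initial sample size and then reduces it for a finite population. The function names and the example values (Z = 1.96, p = 0.5, e = 0.05, N = 1000, σ = 15, E = 3) are assumptions chosen for illustration, and the starting value n0 is taken from the standard Cochran expression n0 = Z²pq/e², which is assumed here to be the formula the definitions of p, q and e refer to.

```python
import math


def sample_size_for_mean(z: float, sigma: float, margin: float) -> float:
    """Eq. (10): n = Z^2 * sigma^2 / E^2 (E is the margin of error)."""
    return (z ** 2) * (sigma ** 2) / (margin ** 2)


def cochran_n0(z: float, p: float, e: float) -> float:
    """Assumed starting value: standard Cochran formula n0 = Z^2 * p * q / e^2."""
    q = 1.0 - p
    return (z ** 2) * p * q / (e ** 2)


def adjust_for_small_population(n0: float, population: int) -> float:
    """Eq. (12): n = n0 / (1 + (n0 - 1) / N)."""
    return n0 / (1.0 + (n0 - 1.0) / population)


# Illustrative values only (not taken from the paper).
n0 = cochran_n0(z=1.96, p=0.5, e=0.05)
n_small = adjust_for_small_population(n0, population=1000)
n_mean = sample_size_for_mean(z=1.96, sigma=15.0, margin=3.0)

# Sample sizes are rounded up to the next whole respondent.
print(math.ceil(n0), math.ceil(n_small), math.ceil(n_mean))  # 385 278 97
```

With these illustrative inputs the adjustment lowers the requirement from 385 to 278 respondents, which reflects the statement that a given sample carries proportionately more information about a small population.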
$n$ = Sample size in the case group,
$r$ = ratio of controls to cases,
$\bar{P}(1 - \bar{P})$ = measure of variability,
$Z_\beta$ = required power,
$Z_{\alpha/2}$ = required level of statistical significance,
$(P_1 - P_2)$ = difference in proportion, and

$$ \bar{P} = \frac{P_1 + P_2}{2} \tag{14} $$

5.4 For Observational Studies sample size for a case-control study under continuous exposure

Use the difference in means formula

$$ n = \left(\frac{r + 1}{r}\right) \cdot \frac{\sigma^2 \left(Z_\beta + Z_{\alpha/2}\right)^2}{\left(\bar{X}_1 - \bar{X}_2\right)^2} \tag{15} $$

$n$ = Sample size in the case group,
$r$ = ratio of controls to cases,
$\sigma$ = Standard deviation,
$Z_\beta$ = required power,
$Z_{\alpha/2}$ = required level of statistical significance,
$(\bar{X}_1 - \bar{X}_2)$ = difference in the mean.

5.5 Sample size if unequal numbers in each group

$$ n_1 = \frac{\left(Z_\beta + Z_{\alpha/2}\right)^2 \left(\sigma_1^2 + \dfrac{\sigma_2^2}{\lambda}\right)}{\delta^2} \tag{16} $$

$$ n = \frac{N}{1 + N e^2} \tag{18} $$

Where $n$ is the sample size, $N$ is the population size, and $e$ is the level of precision.

5.8 Rao

Rao (1985) presented some further calculations of sample size under different circumstances in a simple manner.

When a field survey is conducted to estimate the prevalence rate of a specific event or of cases

$$ n = \frac{4pq}{L^2} \tag{19} $$

Where $n$ is the required sample size and $p$ is the approximate prevalence rate for which the survey is to be conducted; knowledge of this is to be obtained from previous surveys or from a pilot survey. $q = 1 - p$ and $L$ is the permissible error in the estimate.

When conducting a research investigation on quantitative data, the sample size is calculated by the formula

$$ n = \frac{t_\alpha^2 \, s^2}{\varepsilon^2} \tag{20} $$

Where $n$ is the preferred sample size, $s$ is the standard deviation of the observations, $\varepsilon$ is the permissible error in the estimate of the mean and $t_\alpha$ is the value of t at the 5% level of significance.
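The formulas in Eq. (18), Eq. (19) and Eq. (20) are simple enough to check by hand; the short Python sketch below shows one way they might be coded. The function names and the example inputs (N = 2000, e = 0.05, p = 0.5, L = 0.05, t = 1.96, s = 10, ε = 2) are hypothetical and chosen only for illustration. For Eq. (19), p and L are assumed to be expressed in the same units (both as proportions here).

```python
import math


def yamane_sample_size(population: int, e: float) -> float:
    """Eq. (18): n = N / (1 + N * e^2)."""
    return population / (1.0 + population * e ** 2)


def rao_prevalence_sample_size(p: float, permissible_error: float) -> float:
    """Eq. (19): n = 4 * p * q / L^2, with q = 1 - p.

    p and L must use the same units (both proportions or both percentages).
    """
    q = 1.0 - p
    return 4.0 * p * q / permissible_error ** 2


def rao_quantitative_sample_size(t_alpha: float, s: float, epsilon: float) -> float:
    """Eq. (20): n = t_alpha^2 * s^2 / epsilon^2."""
    return (t_alpha ** 2) * (s ** 2) / (epsilon ** 2)


# Illustrative values only (not taken from the paper).
examples = {
    "Eq. (18), N=2000, e=0.05": yamane_sample_size(2000, 0.05),
    "Eq. (19), p=0.5, L=0.05": rao_prevalence_sample_size(0.5, 0.05),
    "Eq. (20), t=1.96, s=10, eps=2": rao_quantitative_sample_size(1.96, 10.0, 2.0),
}
for label, value in examples.items():
    # Report the raw value; in practice it is rounded up to a whole subject.
    print(f"{label}: n = {value:.2f}")
```

In practice each result would be rounded up to the next whole number of respondents, since a fractional subject cannot be sampled.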
When $p$ is unknown, it is generally best to set it at 0.5; $d$ is the margin of error, for which Bartlett et al. (2001) recommend using 5%.

If the estimate $n_0$ is greater than 5% of the overall population, make the following correction

$$ n_1 = \frac{n_0}{1 + \dfrac{n_0}{N}} \tag{23} $$

$n_1$ is the adjusted minimum estimated sample size and $n_0$ is the minimum estimated sample size. The remaining terms are $t$, the value of the t-distribution corresponding to the chosen alpha level (1.96 for .05); the estimate of the standard deviation; and $d$, the margin of error.

All the parameters in the above equation are in fact degrees of freedom; hence one is subtracted from each before it is incorporated into the equation.

5.13 One-sample t-test and Paired t-test

For testing the hypothesis H0: µ = k vs. H1: µ ≠ k with a two-tailed test, the formula is:

2
(Zβ +Zα )σ

$m$ = Size of the cluster
$\sigma_A^2$ = Variance between the clusters
$\sigma_W^2$ = Variance within cluster
$\rho$ = Intra cluster correlation coefficient
H0: µ1 = µ2 is given by

$$ n = \frac{\left(Z_{\alpha/2} + Z_\beta\right)^2 \left(2\sigma^2\right) \left[1 + (m - 1)\rho\right]}{(\mu_1 - \mu_2)^2} \tag{32} $$

$\mu_1$ = Mean of the intervention group
$\mu_2$ = Mean of the control group

5.14.4 Comparison of means of unequal cluster size

The number of subjects required per intervention group to test the hypothesis H0: µ1 = µ2 is given by

$$ n = \frac{\left(Z_{\alpha/2} + Z_\beta\right)^2 \left(2\sigma^2\right) \left[1 + (m_{max} - 1)\rho\right]}{(\mu_1 - \mu_2)^2} \tag{33} $$

$\mu_1$ = Mean of the intervention group
$\mu_2$ = Mean of the control group

5.15 Determination of sample size based on Probability Assessment method

Sample size determination based on power analysis may not be appropriate in clinical research for detecting small differences in the incidence rate of rare events; in such cases the probability assessment method is the most suitable and accurate way to determine the sample size. It is assumed that the means of the two groups come from independent, identically distributed Bernoulli random variables.

$$ n = \frac{Z_\beta^2 \left[P_1(1 - P_1) + P_2(1 - P_2)\right]}{(P_2 - P_1)^2} \tag{34} $$

$Z_\beta$ = Represents the preferred power (typically .84 for 80% power)
$P_1$ = Mean of group 1
$P_2$ = Mean of group 2

6. CONCLUSION

Sampling method and sample size play a vital role in research. The researcher draws inferences about the population from the collected sample data; if the sample is insufficient, the inferences will misrepresent the population, while if the sample size is too big it leads to excessive use of resources such as manpower, time and cost. The present paper is an attempt to suggest generalized sample size techniques for social science, business, health science, agricultural science and survey research.
References:
Bartlett, J. E., Kotrlik, J. W., & Higgins, C. C. (2001). Organizational research: Determining appropriate sample size in
survey research. Information Technology, Learning, and Performance Journal, 19(1), 43.
Brown, J. D. (2006). Statistics Corner. Questions and answers about language testing statistics: Generalizability from
second language research samples. Shiken: JALT Testing & Evaluation SIG Newsletter, 10(2), 24-27.
Browner, W. S., & Newman, T. B. (1987). Are all significant P values created equal? JAMA, 257(18), 2459-2463.
Browner, W. S., Newman, T. B., Cummings, S. R., & Hully, S. R. (1988). Getting ready to estimate sample size:
hypotheses and underlying principles. Designing clinical research, 2, 51-63.
Chow, S. C., Shao, J., Wang, H., & Lokhnygina, Y. (2017). Sample size calculations in clinical research. CRC Press.
Cochran, W. G. (1963). Sampling Techniques, 2nd Ed., New York: John Wiley and Sons, Inc.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. 2nd ed. Academic press
Dell, R. B., Holleran, S., & Ramakrishnan, R. (2002). Sample size determination. ILAR journal, 43(4), 207-213.
Dupont, W. D., & Plummer, W. D. (1990). Power and sample size calculations: a review and computer
program. Controlled clinical trials, 11(2), 116-128.
Evans, M., Hastings, N., & Peacock, B. (2000). Statistical Distributions, 3rd ed. New York: Wiley.
Freiman, J. A., Chalmers, T. C., Smith Jr, H., & Kuebler, R. R. (1978). The importance of beta, the type II error and
sample size in the design and interpretation of the randomized control trial: Survey of 71 negative trials. New
England Journal of Medicine, 299(13), 690-694.
Gupta, S. C., & Kapoor, V. K. (1970). Fundamental of mathematical statistics. SC Publication, New Delhi, India.
Hubrecht, R. C., & Kirkwood, J. (Eds.). (2010). The UFAW handbook on the care and management of laboratory and
other research animals. John Wiley & Sons.
Israel, Glenn D. (1992). Sampling the Evidence of Extension Program Impact. Program Evaluation and Organizational
Development, IFAS, University of Florida. PEOD-5. October.
Kish, L. (1965). Survey Sampling. New York: John Wiley and Sons, Inc. p. 78-94.
Lehr, W., Fraga, R. J., Belen, M. S., & Cekirge, H. M. (1984). A new technique to estimate initial spill size using a
modified Fay-type spreading formula. Marine Pollution Bulletin, 15(9), 326-329.
Marshall, B., Cardon, P., Poddar, A., & Fontenot, R. (2013). Does sample size matter in qualitative research?: A review
of qualitative interviews in IS research. Journal of Computer Information Systems, 54(1), 11-22.
Miaoulis, G., & Michener, R. D. (1976). An Introduction to Sampling. Dubuque, Iowa: Kendall/Hunt Publishing
Company.
Moher, D., Dulberg, C. S., & Wells, G. A. (1994). Statistical power, sample size, and their reporting in randomized
controlled trials. Jama, 272(2), 122-124.
Myers, J. L., Well, A. D., & Lorch Jr, R. F. (2013). Research design and statistical analysis. Routledge.
Rao, N. S. N. (1985). Elements of Health Statistics, First edition. R. Publication, Varanasi, India.
Rao, U. K. (2012). Concepts in sample size determination. Indian Journal of Dental Research, 23(5), 660.
Sathian, B., Sreedharan, J., Baboo, S. N., Sharan, K., Abhilash, E. S., & Rajesh, E. (2010). Relevance of sample size
determination in medical research. Nepal Journal of Epidemiology, 1(1), 4-10.
Singh, A. S., & Masuku, M. B. (2014). Sampling techniques & determination of sample size in applied statistics
research: An overview. International Journal of Economics, Commerce and Management, 2(11), 1-22.
Thompson, B. (2006). Foundations of behavioral statistics: An insight-based approach. Guilford Press.
Yamane, Taro. (1967). Statistics: An Introductory Analysis, 2nd Ed.