PC2 Sampling by Praveen Mathur
Praveen Mathur
B.Tech, MBA, JRF-UOR
Introduction
• Sampling is the selection of observations to
acquire some knowledge of a statistical
universe (population).
Cont…
• In a simple random sample where we had
sampled 100 units out of 1,000, suppose we
had a Rs 5,000 total overpayment in the
sample.
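• The estimated total overpayment for the population (the mean-per-unit estimate: population size N times the mean overpayment per sampled unit, ȳ) would then be:
$$\widehat{T} = N\bar{y} = 1000 \times \frac{5000}{100} = \text{Rs } 50{,}000$$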
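The arithmetic above can be checked with a minimal sketch; the function name `estimate_total` is a hypothetical label, and the figures are the ones from the example.

```python
def estimate_total(sample_total: float, sample_size: int, population_size: int) -> float:
    """Mean-per-unit estimate of a population total: N * (sample total / n)."""
    sample_mean = sample_total / sample_size  # average overpayment per sampled unit
    return population_size * sample_mean      # scale the average up to all N units


# Figures from the example: Rs 5,000 overpaid across 100 of 1,000 units.
estimate = estimate_total(sample_total=5_000, sample_size=100, population_size=1_000)
print(f"Estimated total overpayment: Rs {estimate:,.0f}")  # Rs 50,000
```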
Statistical Sampling in Audit
• It is more objective.
• Readily available
Disadvantages of Non-Statistical Sampling
• Cannot draw objectively valid statistical inferences from the
sample results
Errors
• Sampling errors
• Non-sampling errors
Bias in Sampling
Basis of Sampling: Two Important Laws
1. Law of Statistical Regularity
It lays down that a moderately large number of items chosen at
random from a large group are almost sure, on the average, to
possess the characteristics of the large group. That means if a
sample is taken at random from a population, it is likely to possess
almost the same characteristics as the population.
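2. Law of Inertia of Large Numbers
Other things being equal, the larger the sample, the more stable
and reliable the results are likely to be. For example, if a coin is
tossed 10 times, it is likely that we may not get exactly 5 heads
and 5 tails. But if the experiment is carried out 1,000 times, the
numbers of heads and tails would very likely each be close to 500,
i.e., the proportion of heads would be very near to 50%.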
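The coin-tossing argument can be illustrated with a short simulation, a sketch that assumes a fair coin and uses Python's seeded `random.Random` so runs are reproducible (the seed 42 is arbitrary). The exact count of heads still varies, but the proportion settles near 0.5 as the number of tosses grows.

```python
import random


def proportion_of_heads(tosses: int, seed: int = 42) -> float:
    """Simulate fair coin tosses and return the share that came up heads."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    heads = sum(rng.random() < 0.5 for _ in range(tosses))
    return heads / tosses


for n in (10, 1_000, 100_000):
    print(f"{n:>7} tosses: proportion of heads = {proportion_of_heads(n):.3f}")
```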
Stratified Sampling vs Cluster Sampling
Basis for comparison | Stratified sampling | Cluster sampling
Meaning | The population is divided into homogeneous segments (strata), and the sample is then taken at random from the segments. | The population is divided into naturally occurring groups called clusters, and clusters are selected at random.
Sample | Randomly selected individuals are taken from all the strata. | All the individuals are taken from randomly selected clusters.
Selection of population elements | Individually | Collectively
Objective | To increase precision and representation. | To reduce cost and improve efficiency.
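The difference in how elements are selected (individually from every stratum vs collectively from chosen clusters) can be sketched as follows. The invoice population, the branch names and the helper names are hypothetical, chosen only to make the contrast visible.

```python
import random
from collections import defaultdict


def group_by(units, key):
    """Split units into groups (strata or clusters) using a key function."""
    groups = defaultdict(list)
    for unit in units:
        groups[key(unit)].append(unit)
    return groups


def stratified_sample(units, key, per_stratum, rng):
    """Draw `per_stratum` units at random from EVERY stratum (individual selection)."""
    sample = []
    for members in group_by(units, key).values():
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample


def cluster_sample(units, key, n_clusters, rng):
    """Pick `n_clusters` clusters at random and keep ALL their units (collective selection)."""
    clusters = group_by(units, key)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]


# Hypothetical population: 10 invoices at each of four branches A-D.
invoices = [(branch, number) for branch in "ABCD" for number in range(1, 11)]

strat = stratified_sample(invoices, key=lambda inv: inv[0], per_stratum=2, rng=random.Random(1))
clust = cluster_sample(invoices, key=lambda inv: inv[0], n_clusters=2, rng=random.Random(1))
print(len(strat), sorted({b for b, _ in strat}))  # 8 invoices spread over all four branches
print(len(clust), sorted({b for b, _ in clust}))  # 20 invoices from just two branches
```

The stratified sample touches every branch, which is why it improves representation; the cluster sample visits only two branches, which is why it tends to be cheaper to collect.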
Systematic sampling
• After randomly selecting a starting point between 1 and k,
every kth unit is selected, where k (the sampling interval)
equals the population size divided by the sample size.
Uses
• Easier to extract the sample than with simple random sampling.
• Ensures cases are spread across the population.
Limitation
• Can be costly and time-consuming if the sample is
not conveniently located.
• Can’t be used where there is periodicity in the
population, because a cycle that matches the sampling
interval can bias the sample.
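Systematic selection can be sketched in a few lines. This assumes the sample size does not exceed the population size and uses integer division for k when the population is not an exact multiple of the sample size; `systematic_sample` is a hypothetical name.

```python
import random


def systematic_sample(population, sample_size, rng=random):
    """Random start in 1..k, then every k-th unit, where k = N // n."""
    k = len(population) // sample_size  # sampling interval
    start = rng.randint(1, k)           # random starting point (1-based)
    # Positions start, start + k, start + 2k, ... converted to 0-based indices.
    return [population[start - 1 + i * k] for i in range(sample_size)]


population = list(range(1, 1001))       # hypothetical units numbered 1..1,000
sample = systematic_sample(population, 100, random.Random(7))
print(sample[:5])  # a start somewhere in 1..10, then every 10th unit
```

Because every selected unit sits exactly k positions after the previous one, a population with a repeating pattern of length k would be sampled at the same phase every time, which is the periodicity problem noted above.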
Discovery sampling
A sample size is selected for discovery sampling. If the sample is
error free, then the entire data set or population is accepted as
error free. However, even if only a single error is discovered,
then the entire population is rejected.
The auditor is not as much interested in determining how many
errors there are in the population as in the fact that there was
an error. The auditor's concern would be that there is a
possibility that the internal control system could be
compromised.
The one error may be sufficient to require an examination of the
entire population or to formulate a new plan for further action.
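The accept/reject rule described above is simple to state in code. The sketch also adds the usual binomial approximation for the chance that a sample of n items contains at least one error when a fraction p of the population is in error, 1 − (1 − p)^n; that formula assumes independent draws (reasonable for large populations) and the 1% rate and 300-item sample are purely illustrative.

```python
def discovery_decision(errors_found: int) -> str:
    """Discovery sampling rule: accept only an error-free sample."""
    return "accept population" if errors_found == 0 else "reject population"


def detection_probability(error_rate: float, sample_size: int) -> float:
    """Chance that the sample contains at least one error, treating each sampled
    item as independently in error with probability `error_rate`
    (an approximation that holds well for large populations)."""
    return 1 - (1 - error_rate) ** sample_size


print(discovery_decision(0))   # accept population
print(discovery_decision(1))   # reject population
print(f"{detection_probability(0.01, 300):.3f}")  # about 0.951
```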
Attribute sampling
• Attribute sampling means that an item being sampled either will or won’t
possess certain qualities, or attributes. An auditor selects a certain number of
records to estimate how many times a certain feature will show up in a
population. When using attribute sampling, the sampling unit is a single record
or document. Auditors typically use attribute sampling to test internal controls.
• An example of an attribute sampling feature may be that per the client’s
internal control procedures, all purchases over $50 are supposed to be
authorized by a purchase order. So every purchase over $50 either will or
won’t be authorized by a purchase order.
• Here’s how you’d use attribute sampling to see whether the client’s internal
control is working properly: Your population consists of all vendor invoices for
purchases over $50, and the number of records you sample from that
population is set at 75 records. Looking through your sample, you see that 3 of
the 75 records aren’t supported by a purchase order. That gives you a sample
error rate of 4 percent (3/75), which is your estimate of the population error rate.
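The purchase-order example reduces to counting deviations in the sample. The list below is a hypothetical stand-in for the 75 vendor invoices, with `True` meaning the invoice is supported by a purchase order.

```python
# Hypothetical sample of 75 invoices over $50:
# True = supported by a purchase order, False = deviation.
sample = [True] * 72 + [False] * 3

deviations = sample.count(False)
sample_rate = deviations / len(sample)   # point estimate of the population error rate
print(f"{deviations} of {len(sample)} lack a purchase order: {sample_rate:.0%}")  # 3 of 75 ... 4%
```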
Variables sampling
• A sampling plan that uses the original measurements is
a variables sampling plan. The variables plan decision
rule will reject if the sample average of the
measurements goes outside of some calculated
acceptable range. Typically, the limits of that acceptable
range are calculated with the normal distribution.
• Variables data contains more information than attribute
data per data-point.
• When data points are measurements on a numerical scale,
like weight, diameter or tensile strength, you have
variables data.
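The decision rule above (reject if the sample average leaves an acceptable range computed from the normal distribution) can be sketched as follows. The sketch assumes a known target mean and a known process standard deviation, so the range is target ± z·σ/√n; the diameter figures are hypothetical. It uses `statistics.NormalDist` from the Python standard library.

```python
from math import sqrt
from statistics import NormalDist, fmean


def acceptance_limits(target_mean, sigma, n, confidence=0.95):
    """Acceptable range for a sample mean: target +/- z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided normal critical value
    half_width = z * sigma / sqrt(n)
    return target_mean - half_width, target_mean + half_width


def variables_decision(measurements, target_mean, sigma, confidence=0.95):
    """Reject when the sample average falls outside the acceptable range."""
    low, high = acceptance_limits(target_mean, sigma, len(measurements), confidence)
    return "accept" if low <= fmean(measurements) <= high else "reject"


# Hypothetical: part diameters with target 10.0 mm and known sigma 0.2 mm, n = 16.
print(acceptance_limits(10.0, 0.2, 16))             # roughly (9.902, 10.098)
print(variables_decision([10.05] * 16, 10.0, 0.2))  # accept
print(variables_decision([10.20] * 16, 10.0, 0.2))  # reject
```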
Monetary Unit Sampling
• Monetary unit sampling allows you to select
and analyse a small subset of the records in an
account, and based on the result estimate the
total amount of monetary misstatement in the
account.
• You can then compare the estimated
misstatement with the amount of misstatement
you judge to be tolerable, and reach a
conclusion about the account.
Cont…
• Auditors use monetary unit sampling to determine
the accuracy of financial accounts. With monetary
unit sampling, each dollar in a transaction is a
separate sampling unit. A transaction for $40, for
example, contains 40 sampling units. Auditors usually
use monetary unit sampling to sample and test
accounts receivable, loans receivable, and inventory.
• Define the sampling unit as an individual dollar in an
account balance.
• The auditor selects individual dollars; each selected dollar
identifies the record (account or transaction) that contains it.
The monetary unit sampling process: general steps
• Calculate the valid sample size for a monetary unit
sampling.
• Choose a sample selection method
• Draw the sample of records
• Perform your intended audit procedures on the
sampled data.
• Evaluate whether the observed levels of monetary
misstatement in the sampled data represent an
acceptable or unacceptable amount of misstatement
in the account as a whole
Understanding MUS using Example
• The audit client’s accounts receivable book value is
Rs 300,000, and the sample size is set at 96
records.
• Calculate the sampling interval by dividing book value
by sample size: 300,000 / 96 = 3,125.
• Arrange the client’s accounts receivable in an
ordered list using some sort of ordering sequence.
For example, you can arrange them alphabetically
by customer name or numerically by customer
number.
Cont…
• Pick a random number between 1 and 3,125.
• For this method to work correctly, the random
number must be no smaller than the smallest sampling
unit (1) and no larger than the sampling interval.
• Auditors usually use a random-number-generator
computer program to pick the random number. The
sampling unit and sampling interval limits are
programmed into the software before the task is run.
In this case, say the software selects the random
number 556.
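The selection step can be sketched as follows: build cumulative balances over the ordered list, then take the record whose cumulative range contains each selection point (random start, start + interval, and so on). The four-customer ledger is hypothetical and far smaller than the Rs 300,000 example; the function names are illustrative only.

```python
from bisect import bisect_left
from itertools import accumulate


def sampling_interval(book_value, sample_size):
    """Book value divided by sample size."""
    return book_value / sample_size


def mus_select(balances, interval, random_start):
    """Select the record holding every `interval`-th rupee, starting at `random_start`.

    `balances` is the ordered list of (customer, balance) pairs."""
    cumulative = list(accumulate(balance for _, balance in balances))  # running totals
    selections = []
    point = random_start
    while point <= cumulative[-1]:
        # First record whose cumulative balance reaches the selection point.
        idx = bisect_left(cumulative, point)
        selections.append((point, balances[idx][0]))
        point += interval
    return selections


print(sampling_interval(300_000, 96))  # 3125.0

# Hypothetical ordered ledger (not the client's data).
ledger = [("A", 1_000), ("B", 5_000), ("C", 200), ("D", 4_000)]
print(mus_select(ledger, interval=3_125, random_start=556))
# [(556, 'A'), (3681, 'B'), (6806, 'D'), (9931, 'D')]
```

Customer D is hit twice because its balance exceeds the interval, while the small balance C is skipped: larger balances are more likely to be selected. Applied to the full Rs 300,000 list with start 556, the last selection point is 556 + 95 × 3,125 = 297,431, giving exactly 96 selections.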
Monetary Unit Sampling Table
Customer Name | Customer Balance | Cumulative Balance | Sampling Item
• It looks like there may be a problem with the company's credit policy,
collections policy, or both. The owner needs to re-evaluate the credit and
collections policy and see if the policies need to be tightened up. Perhaps
they are offering credit to marginal credit customers and that needs to be
stopped. Perhaps they are not collecting aggressively enough.
Use of Aging Report in Audit
• An accounts receivable aging report is needed during
an audit to determine whether the company's
accounts receivable balance is properly valued.
• It is used as a gauge to determine the financial
health of a company's customers. If the accounts
receivable aging shows a company's receivables are
being collected much slower than normal, this is a
warning sign that business may be slowing down or
that the company is taking greater credit risk in its
sales practices.
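An aging report groups open receivables by how long they have been outstanding. The 30-day bands below are an assumption (firms set their own), and the invoices are hypothetical; a growing share in the older bands is the kind of warning sign described above.

```python
from datetime import date


def age_bucket(days_outstanding):
    """Assumed 30-day bands; adjust to the client's own aging categories."""
    if days_outstanding <= 30:
        return "0-30"
    if days_outstanding <= 60:
        return "31-60"
    if days_outstanding <= 90:
        return "61-90"
    return "over 90"


def aging_report(invoices, as_of):
    """Total open receivables by how long they have been outstanding."""
    report = dict.fromkeys(["0-30", "31-60", "61-90", "over 90"], 0)
    for invoice_date, amount in invoices:
        report[age_bucket((as_of - invoice_date).days)] += amount
    return report


# Hypothetical open invoices as (invoice date, amount in Rs).
open_invoices = [(date(2024, 3, 15), 10_000), (date(2024, 2, 10), 5_000), (date(2023, 12, 1), 2_000)]
print(aging_report(open_invoices, as_of=date(2024, 3, 31)))
# {'0-30': 10000, '31-60': 5000, '61-90': 0, 'over 90': 2000}
```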
Scatter diagram
• The scatter diagram graphs pairs of numerical data,
with one variable on each axis, to look for a
relationship between them. If the variables are
correlated, the points will fall along a line or curve. The
better the correlation, the tighter the points will hug
the line.
• E.g., variable A is the number of employees trained on
new software, and variable B is the number of calls to
the computer help line. You suspect that more training
reduces the number of calls. Plot number of people
trained versus number of calls.
Correlation and Regression
• Correlation and regression are the two analyses based
on multivariate distribution. A multivariate distribution
is a distribution of multiple variables.
• Correlation is described as the analysis which lets us
know the association or the absence of the relationship
between two variables ‘x’ and ‘y’.
• Regression analysis predicts the value of the
dependent variable based on the known value of the
independent variable, assuming an average
mathematical relationship between two or more
variables.
Correlation vs Regression
Basis for comparison | Correlation | Regression
Meaning | Correlation is a statistical measure which determines the co-relationship or association of two variables. | Regression describes how an independent variable is numerically related to the dependent variable.
Usage | To represent the linear relationship between two variables. | To fit a best line and estimate one variable on the basis of another variable.
Dependent and independent variables | No difference | Both variables are different.
Indicates | The correlation coefficient indicates the extent to which two variables move together. | Regression indicates the impact of a unit change in the known variable (x) on the estimated variable (y).
Objective | To find a numerical value expressing the relationship between variables. | To estimate values of a random variable on the basis of the values of a fixed variable.
Correlation Coefficient
• Statistic showing the degree of relation
between two variables.
• It is also called Pearson's correlation or
product moment correlation coefficient.
• It measures the nature (direction) and strength
of the relationship between two quantitative
variables.
Cont…
• If the sign is +ve this means the relation is
direct (an increase in one variable is
associated with an increase in the
other variable and a decrease in one variable
is associated with a
decrease in the other variable).
• While if the sign is -ve this means an inverse or
indirect relationship (which means an increase
in one variable is associated with a decrease in
the other).
• The value of r ranges from -1 to +1.
Cont…
• The value of r denotes the strength of the
association as follows. The sign gives only the
direction, so the bands apply to |r|.
• If r = 0, there is no association or
correlation between the two variables.
• If 0 < |r| < 0.25: weak correlation.
• If 0.25 ≤ |r| < 0.75: intermediate correlation.
• If 0.75 ≤ |r| < 1: strong correlation.
• If |r| = 1: perfect correlation.
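Formula for Correlation Coefficient
$$r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \dfrac{(\sum x)^2}{n}\right)\left(\sum y^2 - \dfrac{(\sum y)^2}{n}\right)}}$$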
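The computational formula for r translates directly into code. The sketch below also labels the result using the strength bands applied to the absolute value of r (since the sign only gives the direction). The trained-staff and help-line-call figures are hypothetical, chosen to echo the scatter-diagram example.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's product-moment r using the computational formula for r."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator


def strength(r):
    """Label |r| with the strength bands used in this presentation."""
    size = abs(r)
    if size == 0:
        return "no correlation"
    if size < 0.25:
        return "weak"
    if size < 0.75:
        return "intermediate"
    if size < 1:
        return "strong"
    return "perfect"


# Hypothetical data for the help-line example: staff trained vs. calls received.
trained = [5, 10, 15, 20, 25]
calls = [50, 42, 35, 30, 22]
r = pearson_r(trained, calls)
print(f"r = {r:.3f} ({strength(r)}, {'direct' if r > 0 else 'inverse'})")
# r = -0.997 (strong, inverse)
```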
Regression Analysis
• It is the technique concerned with
predicting some variables by knowing
others.
• The process of predicting variable Y using
variable X
• Tells you how values in y change as a
function of changes in values of x
Cont…
• Regression tells us how to draw the
straight line described by the correlation
[Figure: scatter plot with the fitted regression line; axis label: Wt (kg)]
Cont…
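• The regression equation describes the regression
line mathematically:
$$\hat{y} = a + bX$$
– a: intercept
– b: slope
• Written about the means of x and y, with the least-squares slope:
$$\hat{y} = \bar{y} + b(x - \bar{x}), \qquad b = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{(\sum x)^2}{n}}$$
[Figure: the line Y = bX + a on X and Y axes, where b = slope = change in Y / change in X, and a = Y-intercept]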
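A minimal least-squares sketch of these equations: the slope uses the same sums as the correlation formula, and the intercept comes from the fact that the line passes through the point of means. The data are hypothetical.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for y-hat = a + b*x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    a = sum_y / n - b * sum_x / n  # the line passes through (x-bar, y-bar)
    return a, b


def predict(a, b, x_new):
    """Estimated y for a known x."""
    return a + b * x_new


# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = fit_line(x, y)
print(f"y-hat = {a:.2f} + {b:.2f}x")                    # y-hat = 2.20 + 0.60x
print(f"prediction at x = 6: {predict(a, b, 6):.2f}")  # 5.80
```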
Normalization
• In statistics and applications of statistics, normalization can
have a range of meanings. In the simplest cases, normalization
of ratings means adjusting values measured on different scales
to a notionally common scale, often prior to averaging.
• In another usage in statistics, normalization refers to the
creation of shifted and scaled versions of statistics, where the
intention is that these normalized values allow the comparison
of corresponding normalized values for different datasets in a
way that eliminates the effects of certain gross influences, as in
an anomaly time series.
• Some types of normalization involve only a rescaling, to arrive
at values relative to some size variable.
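Two common normalizations can be sketched in a few lines: min-max rescaling to a common 0..1 scale (matching the "different scales to a common scale" usage) and z-scores, which shift by the mean and scale by the standard deviation. The two judges' ratings are hypothetical, and `min_max` assumes the values are not all equal.

```python
from statistics import fmean, pstdev


def min_max(values):
    """Rescale values to the common range 0..1 (assumes they are not all equal)."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]


def z_scores(values):
    """Shift by the mean and scale by the population standard deviation."""
    mean, sd = fmean(values), pstdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical ratings from two judges who use different scales.
judge_1 = [2, 3, 5]    # rated out of 5
judge_2 = [4, 6, 10]   # rated out of 10
print(min_max(judge_1), min_max(judge_2))  # both become [0.0, 0.333..., 1.0]
print(z_scores(judge_1))
```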
THANK YOU