PC2 Sampling by Praveen Mathur


Sampling

Praveen Mathur
B.Tech, MBA, JRF-UOR
Introduction
• Sampling is the selection of observations to
acquire some knowledge of a statistical
universe (population).

• From the characteristics of samples, we can


infer the characteristics of universes, if the
sample is representative of the universe.
Sampling Terminology
• The universe is the set of events or things of interest that the
researcher wishes to investigate.
• Samples are usually drawn by taking a subset of
sampling units from the total universe.
• Sampling units are non-overlapping collections of
elements from the universe that together cover the entire
universe.
Approaches of Sampling
1. Random Sampling (based on the laws of
probability; statistical)
In this approach, all elements (e.g., persons,
households) in the population have some
opportunity of being included in the sample, and
the mathematical probability that any one of them
will be selected can be calculated.
Cont…
2. Non-Random Sampling (purposive, judgment
or non-statistical)
In this type, population elements are selected on the
basis of their availability (e.g., because they volunteered)
or because of the researcher's personal judgment that
they are representative. The consequence is that an
unknown portion of the population is excluded (e.g.,
those who did not volunteer).
The Main Phases of the Sampling
Process
• Both Statistical and Non-Statistical methods

1. Planning the sample


2. Selecting the sample
3. Performing the tests
4. Evaluating the results
Statistical Sampling
• Statistical sampling is the type of sampling in which
samples are derived using the underlying laws of
probability. It assumes that:
1. All items of the population have an equal
chance of being selected in the sample.
2. There is no bias in the selection of items
for the sample.
3. The population is a homogeneous group.
Cont…
• Statistical sampling provides a measurable relationship between

the size of the sample and the degree of risk.

– The auditor can specify a definite degree of risk (assurance

level) using statistical sampling

• Ideally, we would like to gather information from the sample and
then estimate that value for the entire universe

• These estimates calculated from the sample data are called
statistics
Cont…
• In a simple random sample where we had
sampled 100 units out of 1000, suppose we
had a Rs 5,000 total overpayment from the
sample
• The estimated total overpayment for the population would then be:

$$\hat{\tau}_{op} = N\,\bar{y}_t = 1000 \times \frac{5000}{100} = \text{Rs } 50{,}000$$
Statistical Sampling in Audit

• The theory of statistical sampling is used in a large


number of situations where a characteristic of a
large mass of data is to be evaluated.
• In auditing, the auditor has to form his opinion
about a large mass of data. Therefore, it is possible
to apply Statistical sampling techniques in auditing.
Cont…
• It enables the auditor to form certain conclusions
about a class of transactions or an account balance as a whole.

• An auditor can apply sampling in carrying out both


compliance procedures (to evaluate the effectiveness
of the internal control system) and substantive
procedures (to obtain evidence regarding the
completeness, accuracy and validity of the data).
Cont…
• For sampling to be effective, population should be more
or less homogenous. But in auditing situations (as in
many cases), population can seldom be homogenous in all
respects.
• To make sampling more efficient, the total population in
such a situation is divided into several sub-populations
(each sub-population is called a 'stratum'), each of which is,
in itself, more homogenous in nature, size, importance or
other characteristics than the population as a whole.
• A sample is then selected out of each stratum.
• This process is called “Stratification”
Advantages of Statistical Sampling
• It enables the qualitative characteristics of the sample
results to be assessed.

• It Provides a more defensible expression of the test


results.

• It is more objective.

• It provides a quantitative evaluation of the population

• Ensures units from each main group are included and


may therefore be more reliably representative.
Disadvantages of Statistical
Sampling
• Requires random sample selection which may be more
costly and time consuming.

• Might require additional training costs for staff members to


use statistics or specialized software

• Need complete and accurate population listing.

• May not be practicable, if a country-wide sample would


involve lots of audit visits.
Advantages of Non-Statistical
Sampling
• Allows the auditor to inject his or her subjective judgment
in determining the sample size

• May be designed so that it is equally effective and efficient


as statistical sampling while being less costly

• If there is no sampling frame it may be the only way


forward.

• Readily available
Disadvantages of Non-Statistical
Sampling
• Cannot draw objectively valid statistical inferences from the
sample results

• Cannot quantitatively measure and express sampling risk.

• May be prone to volunteer bias.

• Sample results cannot be extrapolated to give population


results.

• Estimates of the sampling error and confidence limits


probably can’t be calculated.
Errors in Sampling
Margin of error or precision
The measure of the possible difference between the sample
estimate and the actual population value.
• It is the way of expressing the sampling error in a survey’s
results
• The larger the margin of error, the less faith one should
have that the poll's reported results are close to the "true"
figures, that is, the figures for the whole population
Cont…
• If the margins of error overlap, it means the
results are too close to call for the population
as a whole
– Think of election polls: if the survey results say 52%
favor X and 48% favor Y, with a +/-5% margin of error,
the race is too close to call. It is quite possible
that 48% favor X and 52% favor Y
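The overlap check in the poll example can be sketched in a few lines of Python. This is only an illustration: the helper names `interval` and `overlaps` are assumptions made for this sketch and are not part of the slides.

```python
def interval(estimate, margin):
    """Return the (low, high) range implied by an estimate and its margin of error."""
    return (estimate - margin, estimate + margin)

def overlaps(a, b):
    """True when two (low, high) ranges share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]

x = interval(52, 5)   # candidate X: 47 to 57
y = interval(48, 5)   # candidate Y: 43 to 53
print(overlaps(x, y))  # True, so the race is too close to call
```

With a smaller margin of error, such as +/-1%, the two ranges (51 to 53 and 47 to 49) no longer overlap.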
Cont…
Confidence level (1 minus the level of significance)
How certain you want to be that the population figure lies
within the sample estimate plus or minus its associated precision.

We normally use 99% or 95% confidence to provide
forceful conclusions; however, if you are only seeking an
indication of the likely population value, a lower level such as
90% is acceptable.
Types of Errors
• Sampling Error (random error)
• Non-Sampling Error
• Systematic Error or bias
Sampling Error (random error)

• Difference between survey result and population


value due to random selection of sample
• Influenced by:
• Sample size
• Sampling scheme
• Unlike non sampling errors and sampling bias, it
can be predicted, calculated, and accounted for.
Cont…
• Measures of sampling error:
• Confidence limits
• Standard error
• Coefficient of variation
• P values
• Others
• Use these measures to:
• Calculate sample size prior to sampling
• Determine how sure we are of result after
analysis
Cont…
• Confidence level -The certainty with which the
estimate lies within the margin of error.
• Margin of error- A measure of the difference
between the estimate from the sample and
the population value.
Bias in Sampling or Systematic Errors

• Bias in sampling occurs because we either sampled
the wrong people or collected the wrong data. Bias
can occur even if we select the sample randomly.
• Inaccurate response (information bias): the likelihood
of getting biased information, in other words
information that does not represent the whole
population.
• Selection bias: people were not sampled
with equal probability, and this has not been
accounted for in the analysis.
Causes of Bias in Surveys
• Surveying the wrong target is usually a result of not using the correct
procedures to choose your participants. For example, if you run
a snail mail survey for young adults or a smartphone survey for
older adults, both scenarios are likely to lead to a lower
response rate for your targeted population.
• Non-response is a type of bias that happens when some people
fail to respond to a survey. People may refuse to answer, or lack
the time or inclination to answer. Non-response bias can also
become a factor if you haven’t constructed your survey properly.
• Undercoverage occurs when some members of the population you
hoped to reach are inadequately represented among your respondents.
Cont…
• Voluntary response bias: Some surveys — like call in radio
shows — tend to attract very opinionated people. These
types of voluntary responses lead to an under-
representation of the general population in favour of
strong opinions.
• Volunteer Bias: This crops up frequently in clinical trials;
the people who volunteer for the trials may not represent
the population you are trying to target. For example, if
your study is for a new drug to treat diabetes and you offer
significant compensation, people with low socio-economic
background may make up the bulk of volunteers.
Non Sampling Error
• Non-sampling errors are errors caused by
factors other than those related to sample selection. The term
refers to the presence of any factor, whether systematic
or random, that results in the data values not
accurately reflecting the 'true' value for the
population.
• Non-sampling errors cover all other discrepancies,
including those that arise from a poor sampling
technique.
Types Of Non Sampling Errors
• Specification errors: These errors occur at planning stage due to
various reasons, e.g., inadequate and inconsistent specification of
data with respect to the objectives of surveys/census, omission or
duplication of units due to imprecise definitions, faulty method of
enumeration/interview/ambiguous schedules etc.
• Ascertainment errors: These errors occur at field stage due to
various reasons e.g., lack of trained and experienced
investigators, recall errors and other types of errors in data
collection, lack of adequate inspection and lack of supervision of
primary staff etc.
• Tabulation errors: These errors occur at tabulation stage due to
various reasons, e.g., inadequate scrutiny of data, errors in
processing the data, errors in publishing the tabulated results,
graphs etc.
Reasons for Non Sampling Error
• Questions might have been written poorly.
• Surveys did not go to the people best able to answer the questions
– Eg. The survey was intended to be completed by executive directors but
was completed by their assistants.
• The methods of interview and observation collection may be
inaccurate or inappropriate.
• The questionnaire, definitions and instructions may be ambiguous.
• The investigators may be inexperienced or not trained properly.
• The scrutiny of data is not adequate.
• The coding, tabulation etc. of the data may be erroneous.
• There can be errors in presenting and printing the tabulated
results, graphs etc.
Bias and sampling error

[Figure: diagram relating sampling error, non-sampling errors, and bias in sampling]
Basis of Sampling on Two
Important Laws
1. Law of Statistical Regularity
It lays down that a moderately large number of items chosen at
random from a large group are almost sure on the average to
possess the characteristics of the large group that means if a
sample is taken at random from population, it is likely to possess
almost the same characteristics as that of the population.

e.g. Average Height


Cont…
2. Law of Inertia of Large Numbers
• This law is a corollary of the law of statistical
regularity. Other things being equal, the larger the size
of the sample, the more accurate the results are likely to be.
• This is because large numbers are more
stable than small ones
Cont…
• E.g., if a coin is tossed 10 times we should expect an equal
number of heads and tails, i.e., 5 each.

• But because the experiment is tried only a small number of times, it is
likely that we will not get exactly 5 heads and 5 tails. The
result may be a combination of 9 heads and 1 tail, or 8 heads
and 2 tails, or 7 heads and 3 tails. But when the same experiment
is carried out 1,000 times, the result is very likely to be close to
50% heads and 50% tails, even though exactly 500 heads and 500
tails remains unlikely.


Cont…
• The basic reason for such likelihood is that the
experiment has been carried out a sufficiently large
number of times, so the possibility of variations in one
direction being offset by variations in the other direction is
greater. A run of 5 heads at one time may be
balanced by a run of 5 tails at another time,
and so on, and for the experiment as a whole the
proportions of heads and tails are likely to be more or less equal.
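The coin example can be checked exactly with the binomial distribution. The sketch below assumes a fair coin and uses a hypothetical helper, `prob_share_of_heads_between`, to compute the probability that the share of heads falls between 40% and 60%; the 40% to 60% band is chosen here purely for illustration.

```python
from math import comb

def prob_share_of_heads_between(n, low, high):
    """Exact probability that the share of heads in n fair tosses lies in [low, high]."""
    favourable = sum(comb(n, k) for k in range(n + 1) if low <= k / n <= high)
    return favourable / 2 ** n

for n in (10, 100, 1000):
    # The probability of landing near 50% grows with the number of tosses.
    print(n, round(prob_share_of_heads_between(n, 0.4, 0.6), 4))

# Exactly 500 heads in 1,000 tosses is still a rare outcome (about 0.025).
exact_half = comb(1000, 500) / 2 ** 1000
print(round(exact_half, 4))
```

The larger experiment is far more likely to land near 50% heads, even though hitting exactly half becomes less likely as the number of tosses grows.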
DIFFERENT TYPES OF
SAMPLING METHODS
Cluster sampling
• Units in the population can often be found in
geographical groups or clusters eg. schools,
households etc.
• A random sample of clusters is taken, then all
units within those clusters are examined.
Uses
• Quicker, easier and cheaper than other forms
of random sampling.
• Does not require complete population
information.
• Useful for face-to-face interviews.
• Works best when each cluster can be regarded
as a microcosm of the population.
Limitation
• Larger sampling error than other forms of
random sampling.
• If clusters are not small it can become
expensive.
• A larger sample size may be needed to
compensate for greater sampling error
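A minimal sketch of one-stage cluster sampling as described above: a random sample of clusters is drawn, then every unit inside the chosen clusters is examined. The function name `cluster_sample`, the `seed` parameter and the school data are all hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=0):
    """Pick n_clusters clusters at random and keep every unit inside them."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    chosen = rng.sample(sorted(clusters), n_clusters)
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units

# Made-up population: pupils grouped by school.
schools = {
    "School A": ["A1", "A2", "A3"],
    "School B": ["B1", "B2"],
    "School C": ["C1", "C2", "C3", "C4"],
    "School D": ["D1", "D2"],
}
chosen, units = cluster_sample(schools, 2)
print(chosen, len(units))
```

Only the list of clusters is needed up front, which is why the method does not require complete population information.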
Convenience
sampling
• Using those who are willing to volunteer, or cases which are
presented to you as a sample.
• It is also known as a “chunk”.
Uses
• Readily available.
• The larger the group, the more information is
gathered.
Limitation
• Sample results cannot be extrapolated to give
population results.
• May be prone to volunteer bias.
Judgmental
sampling

• Based on deliberate choice and excludes any random process


Uses
• Normal application is for small samples from a population that
is well understood and there is a clear method for picking the
sample.
• It is used to provide illustrative examples or case studies
Limitation
• It is prone to bias.
• The sample is small and can lead to credibility problems.
• Sample results cannot be extrapolated to give population
results
Multi-stage
sampling
• The sample is drawn in two or more stages (eg.
a selection of offices at the first stage and a
selection of claimants at the second stage).
Uses
• Usually the most efficient and practical way to
carry out large surveys of the public.
Limitation
• Complex calculations of the estimates and
associated precision.
Probability proportional to size
• Sampling units are selected with probability proportional to their size, giving a higher
chance of selection to the larger units (eg. the more
claimants at an office, the higher the office's chance of
selection).
Uses
• Where you want each element (eg. claimants at an office) to
have an equal chance of selection rather than each sampling
unit (eg. offices).
Limitation
• Can be expensive to get the information to draw the sample.
• Only appropriate if you are interested in the elements
Quota sampling
• The aim is to obtain a sample that is representative of the population.
• The population is stratified by important variables and the required
quota is obtained from each stratum
Uses
• It is a quick way of obtaining a sample.
• It can be fairly cheap.
• If there is no sampling frame it may be the only way forward.
• Additional information may improve the credibility of the results
Limitation
• Not random so stronger possibility of bias.
• Good knowledge of population characteristics is essential.
• Estimates of the sampling error and confidence limits probably can’t be
calculated
Stratified sampling
• The population is sub-divided into homogenous groups, for
example regions, size or type of establishment.
• The strata can have equal sizes or you may wish a higher
proportion in certain strata.
Uses
• Ensures units from each main group are included and may
therefore be more reliably representative.
• Should reduce the error due to sampling.
Limitation
• Selecting the sample is more complex and requires good
population information.
• The estimates involve complex calculations
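A minimal sketch of stratified sampling with proportional allocation, where the same fraction is drawn at random from each stratum. The function name `stratified_sample`, the `seed` parameter and the three size-based strata are assumptions made for illustration; the slides also allow a higher proportion in certain strata, which this sketch does not cover.

```python
import random

def stratified_sample(strata, fraction, seed=0):
    """Draw the same fraction at random from every stratum (at least one unit each)."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    sample = {}
    for name, units in strata.items():
        k = max(1, round(len(units) * fraction))
        sample[name] = rng.sample(units, k)
    return sample

# Made-up population of 100 establishments grouped by size.
strata = {
    "small": [f"S{i}" for i in range(60)],
    "medium": [f"M{i}" for i in range(30)],
    "large": [f"L{i}" for i in range(10)],
}
picked = stratified_sample(strata, 0.1)
print({name: len(units) for name, units in picked.items()})
# {'small': 6, 'medium': 3, 'large': 1}
```

Every stratum contributes at least one unit, which is the property the "Uses" list refers to.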
Stratified Sampling vs Cluster Sampling

Basis for Comparison | Stratified Sampling | Cluster Sampling
Meaning | Stratified sampling is one in which the population is divided into homogeneous segments, and then the sample is randomly taken from the segments. | Cluster sampling refers to a sampling method wherein the members of the population are selected at random from naturally occurring groups called 'clusters'.
Sample | Randomly selected individuals are taken from all the strata. | All the individuals are taken from randomly selected clusters.
Selection of population elements | Individually | Collectively
Homogeneity | Within group | Between groups
Heterogeneity | Between groups | Within group
Bifurcation | Imposed by the researcher | Naturally occurring groups
Objective | To increase precision and representation. | To reduce cost and improve efficiency.
Systematic sampling
• After randomly selecting a starting point in the population
between 1 and n, every nth unit is selected, where n equals
the population size divided by the sample size.
Uses
• Easier to extract the sample than simple random.
• Ensures cases are spread across the population
Limitation
• Can be costly and time consuming if the sample is
not conveniently located.
• Can’t be used where there is periodicity in the
population
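The selection rule for systematic sampling can be sketched as follows. The function name `systematic_sample` and its `start` parameter are hypothetical; passing `start` explicitly just makes the example repeatable, whereas in practice the start would be drawn at random.

```python
import random

def systematic_sample(items, sample_size, start=None, rng=random):
    """Pick every k-th item after a random start, where k = len(items) // sample_size."""
    interval = len(items) // sample_size
    if interval < 1:
        raise ValueError("sample_size cannot exceed the population size")
    if start is None:
        start = rng.randint(1, interval)  # 1-based starting position between 1 and k
    # Convert the 1-based start to a 0-based index and step through the list.
    return items[start - 1::interval][:sample_size]

population = list(range(1, 101))  # units numbered 1 to 100
print(systematic_sample(population, 10, start=7))
# [7, 17, 27, 37, 47, 57, 67, 77, 87, 97]
```

Because the step is fixed, any pattern in the list that repeats every k units will be over- or under-represented, which is the periodicity limitation noted above.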
Discovery sampling
A sample size is selected for discovery sampling. If the sample is
error free, then the entire data set or population is accepted as
error free. However, even if only a single error is discovered,
then the entire population is rejected.
The auditor is not as much interested in determining how many
errors there are in the population as in the fact that there was
an error. The auditor's concern would be that there is a
possibility that the internal control system could be
compromised.
The one error may be sufficient to require an examination of the
entire population or to formulate a new plan for further action.
Attribute sampling
• Attribute sampling means that an item being sampled either will or won’t
possess certain qualities, or attributes. An auditor selects a certain number of
records to estimate how many times a certain feature will show up in a
population. When using attribute sampling, the sampling unit is a single record
or document. Auditors typically use attribute sampling to test internal controls.
• An example of an attribute sampling feature may be that per the client’s
internal control procedures, all purchases over $50 are supposed to be
authorized by a purchase order. So every purchase over $50 either will or
won’t be authorized by a purchase order.
• Here’s how you’d use attribute sampling to see whether the client’s internal
control is working properly: Your population consists of all vendor invoices for
purchases over $50, and the number of records you sample from that
population is set at 75 records. Looking through your sample, you see that 3 of
the 75 records aren’t supported by a purchase order. That gives you a
sample error rate of 4 percent (3/75), which is your estimate of the population error rate.
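The arithmetic of the attribute sampling example, as a short sketch. The function name `sample_deviation_rate` and the 5 percent `tolerable_rate` are assumptions added for illustration; the slides do not state a tolerable rate.

```python
def sample_deviation_rate(deviations, sample_size):
    """Share of sampled records that fail the control being tested."""
    return deviations / sample_size

rate = sample_deviation_rate(3, 75)  # 3 of 75 invoices lack a purchase order
print(f"{rate:.0%}")  # 4%

tolerable_rate = 0.05  # hypothetical threshold, not taken from the slides
print("within tolerance" if rate <= tolerable_rate else "exceeds tolerance")
```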
Variables sampling
• A sampling plan that uses the original measurements is
a variables sampling plan. The decision rule of a variables plan
leads to rejection if the sample average of the
measurements falls outside a calculated
acceptable range. Typically, the limits of that acceptable
range are calculated using the normal distribution.
• Variables data contains more information per data point than attribute
data.
• If the data points are measurements on a numerical scale, you
have variables data, such as weight, diameter or tensile
strength.
Monetary Unit Sampling
• Monetary unit sampling allows you to select
and analyse a small subset of the records in an
account, and based on the result estimate the
total amount of monetary misstatement in the
account.
• You can then compare the estimated
misstatement to the misstatement amount
that you judge to be tolerable, and make a determination
regarding the account.
Cont…
• Auditors use monetary unit sampling to determine
the accuracy of financial accounts. With monetary
unit sampling, each dollar in a transaction is a
separate sampling unit. A transaction for $40, for
example, contains 40 sampling units. Auditors usually
use monetary unit sampling to sample and test
accounts receivable, loans receivable, and inventory.
• The sampling unit is defined as an individual dollar in an
account balance.
• The auditor selects individual dollars.
General steps of the monetary unit
sampling process
• Calculate a valid sample size for monetary unit
sampling.
• Choose a sample selection method
• Draw the sample of records
• Perform your intended audit procedures on the
sampled data.
• Evaluate whether the observed levels of monetary
misstatement in the sampled data represent an
acceptable or unacceptable amount of misstatement
in the account as a whole
Understanding MUS using Example
• The audit client’s accounts receivable book value is
Rs 300,000, and the sample size is set at 96
records.
• Calculate the sampling interval by dividing the book value
by the sample size: 300,000 / 96 = 3,125.
• Arrange the client’s accounts receivable in an
ordered list using some sort of ordering sequence.
For example, you can arrange them alphabetically
by customer name or numerically by customer
number.
Cont…
• Pick a random number between 1 and 3,125.
• For this method to work correctly, the random
number has to be less than the sampling interval and
greater than the smallest sampling unit.
• Auditors usually use a random-number-generator
computer program to pick the random number. The
sampling unit and sampling interval limits are
programmed into the software before the task is run.
In this case, say the software selects the random
number 556.
Monetary Unit Sampling Table
Customer Name | Customer Balance | Cumulative Balance | Sampling Item
ABC Electric | $435 | $435 |
Best Friend Cat Care | $785 | $1,220 | (1) $556
Brandy’s Grill | $1,510 | $2,730 |
Buddy’s Gas Station | $5,000 | $7,730 | (2) $3,681


Cont…
• First, pick the records to test: Take the alphabetically ordered list
shown in the Customer Name column, which lists every customer
balance by dollar amount, and count each dollar until you get to
556 (remember that the random number generator gave you
the number 556 in the previous step). The
cumulative dollar amount for ABC Electric is under $556.
• That tells you that the first sampling item is Best Friend Cat Care,
which at a cumulative total of $1,220 is the first customer in the
list with a cumulative balance over $556. The client gives you the
Best Friend Cat Care file. You go through these ordered invoices
(usually the invoices are ordered by date) to find the invoice with
the 556th dollar. That invoice is your item to sample.
Cont…
• To select your next invoice to sample, add the sampling
interval of $3,125 to your random number of $556. This
equals $3,681, which is your next sampled item dollar
amount. Brandy’s Grill at $2,730 cumulatively is under
$3,681, so skip past Brandy’s to Buddy’s. Follow the same
procedure you used for Best Friend Cat Care to find the Buddy’s
Gas Station invoice with the 3,681st dollar.
• Although the table only goes as far as Buddy’s, your client
has many more customers. To pick the next sampling item,
add the sampling interval of $3,125 to your prior sampling
item of $3,681, which equals $6,806, and so on until you
reach the last name in the customer list.
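The selection walk described above can be sketched in Python using the four customers from the table. The function name `mus_select` is hypothetical, and the sketch stops at Buddy's Gas Station only because the table does; a real list would continue with the remaining customers.

```python
from itertools import accumulate

def mus_select(balances, interval, random_start):
    """Return (selected_dollar, customer) pairs for monetary unit sampling.

    balances is a list of (customer, balance) pairs in the chosen order.
    """
    names = [name for name, _ in balances]
    cumulative = list(accumulate(amount for _, amount in balances))
    selections = []
    target = random_start
    index = 0
    while target <= cumulative[-1]:
        # Advance to the first customer whose cumulative balance reaches the target dollar.
        while cumulative[index] < target:
            index += 1
        selections.append((target, names[index]))
        target += interval
    return selections

receivables = [
    ("ABC Electric", 435),
    ("Best Friend Cat Care", 785),
    ("Brandy's Grill", 1510),
    ("Buddy's Gas Station", 5000),
]
interval = 300_000 // 96  # book value / sample size = 3125
for dollar, customer in mus_select(receivables, interval, 556):
    print(dollar, customer)
# 556 Best Friend Cat Care
# 3681 Buddy's Gas Station
# 6806 Buddy's Gas Station
```

With only these four rows, the third selected dollar (6,806) also falls inside Buddy's Gas Station, because its cumulative balance runs to $7,730. A balance larger than the sampling interval can be hit more than once.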
Cont…
• When you’re sampling, you’re looking for
misstatements. If an invoice should have been
entered for $986, for example, and it was
entered as $896, the misstatement is $90, about 9
percent of the transaction (1 − 896/986 ≈ 0.091).
If the total misstatements exceed
your assessed tolerance level, you have to
decide whether to perform other procedures.
Aging Analysis
What is the meaning of Aging?

• In accounting the term aging is often associated with a


company's accounts receivable. Accounts receivable arise
when a company provides goods or services on credit.
For example, a company may allow its customers to pay
for goods or services 30 days after they are delivered. If
customers do not pay as agreed, the company could
experience a cash problem.
• In order for the company to minimize its cash flow
problems and potential losses from customers who are
unable to pay, companies will routinely prepare an aging
of accounts receivable.
Aging analysis Report
• The aging report will list each customer's outstanding balance
and will then sort the total amount into columns such as:
Current, 1-30 days past due, 31-60 days past due, 61-90 days
past due, 91-120 days past due, and 120+ days past due. The
aging of accounts receivable allows managers to quickly see
which customers are behind in meeting the agreed upon terms.
An aging is usually a standard feature of accounting software.
• To prepare an accounts receivable aging report, credit sales and
cash collections data is needed for each customer granted credit.
• Some companies also do an aging of accounts payable. This
aging sorts the accounts payable amounts by due dates.
Use of Aging Report
• An accounts receivable aging report is used in normal
company operations to provide information for:
− Evaluating current credit policies
− Determining appropriate credit limits for new
customers
−Deciding whether to increase or decrease the credit
limit for existing customers
−Estimating bad debts
−Initiating collection procedures for overdue
accounts
An Example of an Aging Schedule and How to Analyse it

• Here is an example of an accounts receivables aging schedule for a


hypothetical company. This company has $100,000 in accounts
receivable. They offer a discount if customers pay their bills within 10
days, which is the discount period. That's why you see the first line of
the aging schedule as 0-10 days. Looking at the table, you can see that
20% of the firm's customers take the offered cash discount.
• The credit period for this firm is 30 days, so the second line of the aging
schedule is 11-30 days. For this company, 40% of the customers pay
their bills during the credit period but after the discount period. This
means that 60% of the firm's customers pay their bills on time, a
combination of the customers that take the discount and those who
pay during the credit period. That's only a little over half of the firm's
customers, and for most companies this is not enough.
Age of Account | Amount | % of Total Value of Receivables
0-10 days | $20,000 | 20%
11-30 days | $40,000 | 40%
31-60 days | $20,000 | 20%
61-90 days | $10,000 | 10%
Over 90 days | $10,000 | 10%
Total | $100,000 | 100%
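The percentages in the aging schedule can be recomputed from the amounts. The sketch below uses the figures from the schedule; the variable names are illustrative, and the shares are shares of receivable value, which the discussion treats as a stand-in for the share of customers.

```python
aging = {
    "0-10 days": 20_000,
    "11-30 days": 40_000,
    "31-60 days": 20_000,
    "61-90 days": 10_000,
    "Over 90 days": 10_000,
}
total = sum(aging.values())

for bucket, amount in aging.items():
    print(f"{bucket}: {amount / total:.0%}")

# The credit period is 30 days, so the first two buckets count as paid on time.
on_time_amount = aging["0-10 days"] + aging["11-30 days"]
on_time_share = on_time_amount / total
delinquent_share = (total - on_time_amount) / total
print(f"on time: {on_time_share:.0%}, delinquent: {delinquent_share:.0%}")
# on time: 60%, delinquent: 40%
```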
• A full 40% of the company's customers are delinquent
with their payments. 20% are 31-60 days delinquent,
10% are 61-90 days delinquent, and 10% of the
company's credit customers are over 90 days past-due.
That is a sizable percentage of delinquent accounts.
• Usually, if a customer is between 90-120 days past due
on a debt, that bill is seen as uncollectible or a bad
debt. In this example, this company has $10,000 in bad
debts out of $100,000 in accounts receivable. Bad
debts are tax-deductible, but companies would rather
not have them.
• This company is undoubtedly suffering from a cash flow perspective because
of these delinquencies. Their cash flow is probably low and they are having
to borrow short-term funds in order to cover these delinquent accounts with
regard to their working capital. This means they are paying interest on short-
term debt, which hurts their cash flow even more and negatively impacts
profitability.

• It looks like there may be a problem with the company's credit policy,
collections policy, or both. The owner needs to re-evaluate the credit and
collections policy and see if the policies need to be tightened up. Perhaps
they are offering credit to marginal credit customers and that needs to be
stopped. Perhaps they are not collecting aggressively enough.
Use of Aging Report in Audit
• An accounts receivable aging report is needed during
an audit to determine whether the company's
accounts receivable balance is properly valued.
• It is used as a gauge to determine the financial
health of a company's customers. If the accounts
receivable aging shows a company's receivables are
being collected much slower than normal, this is a
warning sign that business may be slowing down or
that the company is taking greater credit risk in its
sales practices
Scatter diagram
• The scatter diagram graphs pairs of numerical data,
with one variable on each axis, to look for a
relationship between them. If the variables are
correlated, the points will fall along a line or curve. The
better the correlation, the tighter the points will hug
the line.
• Eg. Variable A is the number of employees trained on
new software, and variable B is the number of calls to
the computer help line. You suspect that more training
reduces the number of calls. Plot number of people
trained versus number of calls.
Correlation and Regression
• Correlation and regression are two analyses based
on a multivariate distribution. A multivariate distribution
is described as a distribution of multiple variables.
• Correlation is described as the analysis which lets us
know the association or the absence of the relationship
between two variables ‘x’ and ‘y’.
• Regression analysis predicts the value of the
dependent variable based on the known value of the
independent variable, assuming an average
mathematical relationship between two or more
variables.
Correlation and Regression
Basis for Comparison | Correlation | Regression
Meaning | Correlation is a statistical measure which determines co-relationship or association of two variables. | Regression describes how an independent variable is numerically related to the dependent variable.
Usage | To represent linear relationship between two variables. | To fit a best line and estimate one variable on the basis of another variable.
Dependent and Independent variables | No difference | Both variables are different.
Indicates | Correlation coefficient indicates the extent to which two variables move together. | Regression indicates the impact of a unit change in the known variable (x) on the estimated variable (y).
Objective | To find a numerical value expressing the relationship between variables. | To estimate values of random variable on the basis of the values of fixed variable.
Correlation Coefficient
• Statistic showing the degree of relation
between two variables.
• It is also called Pearson's correlation or
product moment correlation coefficient.
• It measures the nature and strength of the
relationship between two variables of the quantitative
type.
Cont…
• If the sign is +ve this means the relation is
direct (an increase in one variable is
associated with an increase in the
other variable and a decrease in one variable
is associated with a
decrease in the other variable).
• While if the sign is -ve this means an inverse or
indirect relationship (which means an increase
in one variable is associated with a decrease in
the other).
• The value of r ranges between ( -1) and ( +1)
Cont…
• The value of r denotes the strength of the
association, as illustrated below (the ranges apply to the
magnitude of r, ignoring its sign).
• If r = 0, there is no association or
correlation between the two variables.
• If 0 < r < 0.25: weak correlation.
• If 0.25 ≤ r < 0.75: intermediate correlation.
• If 0.75 ≤ r < 1: strong correlation.
• If r = 1: perfect correlation
Formula for Correlation Coefficient

$$r = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)\left(\sum y^2 - \frac{(\sum y)^2}{n}\right)}}$$
Regression Analyses
• It is the technique concerned with
predicting some variables by knowing
others.
• The process of predicting variable Y using
variable X
• Tells you how values in y change as a
function of changes in values of x
Cont…
• Regression tells us how to draw the
straight line described by the correlation
[Figure: scatter plot with fitted regression line; horizontal axis Wt (kg) from 60 to 120, vertical axis from 80 to 220 (axis label not recoverable)]
Cont…
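• Regression equation describes the regression
line mathematically
– Intercept
– Slope

$$\hat{y} = a + bX$$

$$\hat{y} = \bar{y} + b(x - \bar{x})$$

$$b = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}}$$

[Figure: line Y = bX + a plotted against X, where b = slope = (change in Y) / (change in X) and a = Y-intercept]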
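The correlation coefficient and the regression slope are built from the same sums (of x, y, xy, x squared and y squared), so both can be computed together. This is a minimal sketch using only the standard library; the function name `correlation_and_line` and the data are made up, loosely following the training and help-line calls example from the scatter diagram slide.

```python
from math import sqrt

def correlation_and_line(xs, ys):
    """Return (r, a, b): Pearson's r and the fitted line y-hat = a + b * x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    sxy = sum_xy - sum_x * sum_y / n   # numerator shared by r and b
    sxx = sum_x2 - sum_x ** 2 / n
    syy = sum_y2 - sum_y ** 2 / n

    r = sxy / sqrt(sxx * syy)
    b = sxy / sxx                      # slope
    a = sum_y / n - b * sum_x / n      # intercept: the line passes through the two means
    return r, a, b

# Made-up data: help-line calls fall as more employees are trained.
trained = [1, 2, 3, 4, 5]
calls = [11, 9, 7, 5, 3]               # exactly 13 - 2 * trained
r, a, b = correlation_and_line(trained, calls)
print(r, a, b)  # -1.0 13.0 -2.0
```

Here r = -1 signals a perfect inverse relationship, and the slope b = -2 says each additional trained employee is associated with two fewer calls in this made-up data.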
Normalization
• In statistics and applications of statistics, normalization can
have a range of meanings. In the simplest cases, normalization
of ratings means adjusting values measured on different scales
to a notionally common scale, often prior to averaging.
• In another usage in statistics, normalization refers to the
creation of shifted and scaled versions of statistics, where the
intention is that these normalized values allow the comparison
of corresponding normalized values for different datasets in a
way that eliminates the effects of certain gross influences, as in
an anomaly time series.
• Some types of normalization involve only a rescaling, to arrive
at values relative to some size variable.
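Two common forms of normalization can be sketched as follows: min-max rescaling, which puts ratings from different scales on a common 0 to 1 scale, and z-scores, which are shifted and scaled versions of the values. The function names and the rating data are illustrative assumptions, and both functions assume the values are not all identical.

```python
from statistics import mean, pstdev

def min_max(values):
    """Rescale values to the range 0..1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_scores(values):
    """Shift by the mean and scale by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

ratings_out_of_5 = [1, 3, 5]
ratings_out_of_100 = [20, 60, 100]
print(min_max(ratings_out_of_5))    # [0.0, 0.5, 1.0]
print(min_max(ratings_out_of_100))  # [0.0, 0.5, 1.0]
print([round(z, 3) for z in z_scores(ratings_out_of_5)])
```

After rescaling, the two sets of ratings are directly comparable and can be averaged.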
THANK YOU
