Sampling and Data Analysis

This document discusses sampling and data analysis in food analysis. It covers selecting representative samples from a larger population for analysis. Specifically, it defines key terms like population, sample, and laboratory sample. It also discusses important considerations for developing a sampling plan, such as the purpose of analysis, nature of the measured property, and nature of the population. A good sampling plan ensures the laboratory sample accurately represents the overall population.

2. SAMPLING AND DATA ANALYSIS

2.1 Introduction
Analysis of the properties of a food material depends on the
successful completion of a number of different steps: planning
(identifying the most appropriate analytical procedure),
sample selection, sample preparation, performance of the
analytical procedure, statistical analysis of the measurements,
and data reporting. Most of the subsequent chapters describe
analytical procedures developed to provide information about
food properties, whereas this chapter focuses on the other
aspects of food analysis.

2.2 Sample Selection and Sampling Plans


A food analyst often has to determine the characteristics of
a large quantity of food material, such as the contents of a
truck arriving at a factory, a day's worth of production, or the
products stored in a warehouse. Ideally, the analyst would like
to analyze every part of the material to obtain an accurate
measure of the property of interest, but in most cases this is
practically impossible. Many analytical techniques destroy the
food, so there would be nothing left to sell if it were all
analyzed. In addition, many analytical techniques are time
consuming, expensive or labor intensive, so it is not
economically feasible to analyze large amounts of material. It
is therefore normal practice to select a fraction of the whole
material for analysis and to assume that its properties are
representative of the whole material. Selecting an appropriate
fraction of the whole material is one of the most important
stages of a food analysis procedure, and it can lead to large
errors when it is not carried out correctly.

Populations, Samples and Laboratory Samples. It is
convenient to define some terms used to describe the
characteristics of a material whose properties are going to be
analyzed.
• Population. The whole of the material whose properties we
are trying to estimate is usually referred to as the
"population".
• Sample. Usually only a fraction of the population, referred
to as the "sample", is selected for analysis. The sample may
consist of one or more sub-samples selected from different
regions within the population.
• Laboratory Sample. The sample may be too large to analyze
conveniently using a laboratory procedure, so only a fraction
of it is actually used in the final laboratory analysis. This
fraction is usually referred to as the "laboratory sample".
The primary objective of sample selection is to ensure that
the properties of the laboratory sample are representative of
the properties of the population; otherwise erroneous results
will be obtained. Analyzing a limited number of samples is of
great benefit because it reduces the time, expense and
personnel required to carry out the analytical procedure, while
still providing useful information about the properties of the
population. Nevertheless, one must always be aware that
analysis of a limited number of samples can only give an
estimate of the true value for the whole population.

Sampling Plans. To ensure that the estimated value obtained
from the laboratory sample is a good representation of the true
value of the population, it is necessary to develop a "sampling
plan". A sampling plan should be a clearly written document
containing the precise details an analyst uses to decide the
sample size, the locations from which the sample should be
selected, the method used to collect the sample, and the method
used to preserve it prior to analysis. It should also stipulate
how the procedures carried out during sampling are to be
documented. The choice of a particular sampling plan depends on
the purpose of the analysis, the property to be measured, the
nature of the total population and of the individual samples,
and the type of analytical technique used to characterize the
samples. For certain products and types of populations, sampling
plans have already been developed and documented by
organizations that authorize official methods, e.g., the
Association of Official Analytical Chemists (AOAC). Some of the
most important considerations when developing or selecting an
appropriate sampling plan are discussed below.
2.2.1 Purpose of Analysis
The first thing to decide when choosing a suitable sampling
plan is the purpose of the analysis. Samples are analyzed for a
number of different reasons in the food industry and this
affects the type of sampling plan used:
• Official samples. Samples may be selected by government
laboratories to meet official or legal requirements. These
samples are analyzed to ensure that manufacturers are supplying
safe foods that meet legal and labeling requirements. An
officially sanctioned sampling plan and analytical protocol are
often required for this type of analysis.
• Raw materials. Raw materials are often analyzed before
acceptance by a factory, or before use in a particular
manufacturing process, to ensure that they are of an
appropriate quality.
• Process control samples. A food is often analyzed during
processing to ensure that the process is operating efficiently.
If a problem develops during processing, it can then be quickly
detected and the process adjusted so that the properties of the
product are not adversely affected. Techniques used for process
control must be capable of producing precise results in a short
time. Manufacturers can either use analytical techniques that
measure the properties of foods on-line, or they can select and
remove samples and test them in a quality assurance laboratory.
• Finished products. Samples of the final product are usually
selected and tested to ensure that the food is safe, meets legal
and labeling requirements, and is of a high and consistent
quality. Officially sanctioned methods are often used for
determining nutritional labeling information.
• Research and Development. Samples are analyzed by food
scientists involved in fundamental research or in product
development. In many R&D situations a formal sampling plan is
not necessary, because only small amounts of materials with
well-defined properties are analyzed.
2.2.2 Nature of Measured Property
Once the reason for carrying out the analysis has been
established, it is necessary to specify clearly the particular
property that is going to be measured, e.g., color, weight,
presence of extraneous matter, fat content or microbial count.
The properties of foods can usually be classified as either
attributes or variables. An attribute is something that a
product either does or does not have, e.g., it does or does not
contain a piece of glass, or it is or is not spoilt. A variable,
on the other hand, is a property that can be measured on a
continuous scale, such as the weight, fat content or moisture
content of a material. Variable sampling usually requires fewer
samples than attribute sampling, because each measurement on a
continuous scale carries more information than a simple yes/no
result.
The type of property measured also determines the seriousness
of the outcome if the properties of the laboratory sample do not
represent those of the population. For example, if the property
measured is the presence of a harmful substance (such as
bacteria, glass or toxic chemicals), a sampling mistake has much
more serious consequences than if the property measured is a
quality parameter (such as color or texture). Consequently, the
sampling plan has to be much more rigorous for the detection of
potentially harmful substances than for the quantification of
quality parameters.
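Why attribute sampling for rare but serious defects demands large samples can be shown with a short calculation. The Python sketch below assumes that units are drawn at random from a lot large enough that each draw is independent (a binomial model; this assumption is not stated in the text) and finds the smallest number of units that must be examined to find at least one defective unit with a chosen probability. The function name `min_units_to_detect` is hypothetical.

```python
import math


def min_units_to_detect(defect_rate: float, confidence: float) -> int:
    """Smallest n with P(at least one defective among n units) >= confidence.

    Assumes each unit is defective independently with probability
    `defect_rate`, i.e. random sampling from a large lot.
    """
    if not (0 < defect_rate < 1 and 0 < confidence < 1):
        raise ValueError("defect_rate and confidence must lie strictly between 0 and 1")
    # P(no defectives in n units) = (1 - p)**n; require this to be <= 1 - confidence.
    return math.ceil(math.log(1 - confidence) / math.log(1 - defect_rate))


for rate in (0.10, 0.01):
    n = min_units_to_detect(rate, 0.95)
    print(f"defect rate {rate:.0%}: examine at least {n} units")
```

Under these assumptions, a lot with 10% defective units needs 29 units examined for a 95% chance of finding at least one defective, while a lot with 1% defective units needs 299: the rarer the defect, the larger the attribute sample.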

2.2.3 Nature of Population


It is extremely important to clearly define the nature of the
population from which samples are to be selected when
deciding which type of sampling plan to use. Some of the
important points to consider are listed below:
• A population may be either finite or infinite. A finite
population is one that has a definite size, e.g., a truckload
of apples, a tanker full of milk, or a vat full of oil. An
infinite population is one that has no definite size, e.g., the
output of a continuously operating conveyor belt from which
foods are selected periodically. Analysis of a finite
population usually provides information about the properties of
the population, whereas analysis of an infinite population
usually provides information about the properties of the
process. To facilitate the development of a sampling plan, it is
usually convenient to divide an "infinite" population into a
number of finite populations, e.g., all the products produced
by one shift of workers, or all the products produced in one
day.
• A population may be either continuous or compartmentalized. A
continuous population is one in which there is no physical
separation between the different parts of the population, e.g.,
liquid milk or oil stored in a tanker. A compartmentalized
population is one that is split into a number of separate
sub-units, e.g., boxes of potato chips in a truck, or bottles of
tomato ketchup moving along a conveyor belt. The number and size
of the individual sub-units determine the choice of a particular
sampling plan.
• A population may be either homogeneous or heterogeneous. A
homogeneous population is one in which the properties of the
individual samples are the same at every location within the
material (e.g., a tanker of well-stirred liquid oil), whereas a
heterogeneous population is one in which the properties of the
individual samples vary with location (e.g., a truck full of
potatoes, some of which are bad). If a population were
homogeneous, there would be no problem in selecting a sampling
plan, because every individual sample would be representative
of the whole population. In practice, most populations are
heterogeneous, so we must carefully select a number of
individual samples from different locations within the
population to obtain an indication of the properties of the
total population.
2.2.4 Nature of Test Procedure
The nature of the procedure used to analyze the food may
also determine the choice of a particular sampling
plan, e.g., the speed, precision, accuracy and cost per analysis,
or whether the technique is destructive or non-destructive.
Obviously, it is more convenient to analyze the properties of
many samples if the analytical technique used is capable of
rapid, low cost, nondestructive and accurate measurements.

2.2.5. Developing a Sampling Plan


After considering the above factors one should be able to
select or develop a sampling plan which is most suitable for a
particular application. Different sampling plans have been
designed to take into account differences in the types of
samples and populations encountered, the information
required and the analytical techniques used. Some of the
features that are commonly specified in official sampling
plans are listed below.
Sample size. The size of the sample selected for analysis
largely depends on the expected variation in properties within
the population, the seriousness of the outcome if a bad sample
is not detected, the cost of analysis, and the type of
analytical technique used. Given this information, it is often
possible to use statistical techniques to design a sampling
plan that specifies the minimum number of sub-samples that need
to be analyzed to obtain an accurate representation of the
population. Often the required sample size is impractically
large, so a process known as sequential sampling is used
instead. Here sub-samples selected from the population are
examined one after another until the results are sufficiently
definite from a statistical viewpoint. For example, sub-samples
are analyzed until the ratio of good ones to bad ones falls
within some statistically predefined limit that enables one to
confidently accept or reject the population.
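Sequential sampling can be sketched with Wald's sequential probability ratio test, one standard statistical way of setting the predefined accept and reject limits; the text does not name a specific procedure, so this choice is an assumption. Each sub-sample is classified as good or bad; `p0` is a defect rate at which the lot should be accepted, `p1` a defect rate at which it should be rejected, and `alpha` and `beta` are the accepted risks of wrongly rejecting a good lot and wrongly accepting a bad one. The function name and all parameter values are illustrative.

```python
import math


def sprt_attributes(results, p0, p1, alpha=0.05, beta=0.10):
    """Wald's sequential probability ratio test on good/bad sub-samples.

    results: booleans in the order examined (True = bad sub-sample).
    p0: defect rate at which the lot should be accepted.
    p1: defect rate at which the lot should be rejected (p1 > p0).
    alpha: risk of rejecting a lot whose defect rate is p0.
    beta: risk of accepting a lot whose defect rate is p1.
    Returns (decision, number of sub-samples examined).
    """
    if not 0 < p0 < p1 < 1:
        raise ValueError("require 0 < p0 < p1 < 1")
    accept_limit = math.log(beta / (1 - alpha))
    reject_limit = math.log((1 - beta) / alpha)
    llr = 0.0  # running log-likelihood ratio of "rate is p1" versus "rate is p0"
    n = 0
    for bad in results:
        n += 1
        llr += math.log(p1 / p0) if bad else math.log((1 - p1) / (1 - p0))
        if llr <= accept_limit:
            return "accept", n
        if llr >= reject_limit:
            return "reject", n
    return "continue", n  # evidence still inconclusive: examine more sub-samples


print(sprt_attributes([False] * 30, p0=0.01, p1=0.10))  # ('accept', 24)
print(sprt_attributes([True, True], p0=0.01, p1=0.10))  # ('reject', 2)
```

With these illustrative settings, 24 consecutive good sub-samples are enough to accept the lot, whereas two bad ones at the start are enough to reject it, so the number of sub-samples analyzed adapts to how clear-cut the evidence is.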
Sample location. In homogeneous populations it does not
matter where the sample is taken from, because all the
sub-samples have the same properties. In heterogeneous
populations the location from which the sub-samples are
selected is extremely important. In random sampling the
sub-samples are chosen randomly from any location within the
material being tested. Random sampling is often preferred
because it avoids human bias in selecting samples and because
it facilitates the application of statistics. In systematic
sampling the samples are drawn systematically with location or
time, e.g., every 10th box in a truck may be analyzed, or a
sample may be taken from a conveyor belt every minute. This
type of sampling is often easy to implement, but it is
important to be sure that there is no correlation between the
sampling rate and the sub-sample properties. In judgment
sampling the sub-samples are drawn from the whole population
using the judgment and experience of the analyst. This could
mean taking the sub-samples that are easiest to reach, such as
the boxes of product nearest the door of a truck.
Alternatively, the person who selects the sub-samples may know
from experience where the worst sub-samples are usually found,
e.g., near the doors of a warehouse where the temperature
control is poorer. It is not usually possible to apply proper
statistical analysis to this type of sampling, since the
sub-samples selected are not usually a good representation of
the population.
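Random and systematic selection of sub-units can be sketched in a few lines of Python. The truck load of 100 numbered boxes and the function names are hypothetical; a fixed seed is used only so that the random selection can be reproduced and documented in the sampling record.

```python
import random


def random_sample(units, k, seed=None):
    """Simple random sampling: every unit has the same chance of selection."""
    rng = random.Random(seed)  # a seed makes the selection reproducible for the record
    return rng.sample(units, k)


def systematic_sample(units, interval, start=0):
    """Systematic sampling: every `interval`-th unit, beginning at index `start`."""
    return units[start::interval]


boxes = list(range(1, 101))  # hypothetical truck load of 100 numbered boxes
print(systematic_sample(boxes, 10, start=9))  # boxes 10, 20, ..., 100
print(random_sample(boxes, 10, seed=1))
```

Choosing `start` at random (for example with `random.randrange(interval)`) is a common refinement, but it does not remove the risk, noted above, that the sampling interval coincides with a periodic pattern in the product stream.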
Sample collection. Samples may be collected either manually
or with specialized mechanical sampling devices. Manual sampling
may involve simply picking a sample from a conveyor belt or a
truck, or using special cups or containers to collect samples
from a tank or sack. The manner in which samples are collected
is usually specified in sampling plans.

2.3 Preparation of Laboratory Samples


Once we have selected a sample that represents the
properties of the whole population, we must prepare it for
analysis in the laboratory. The preparation of a sample for
analysis must be done very carefully in order to make accurate
and precise measurements.

2.3.1 Making Samples Homogeneous


The food material within the sample selected from the
population is usually heterogeneous, i.e., its properties vary
from one location to another. Sample heterogeneity may be
caused by variations in the properties of different units
within the sample (inter-unit variation) and/or by variations
within the individual units in the sample (intra-unit
variation). The units in the sample could be apples, potatoes,
bottles of ketchup, containers of milk, etc. An example of
inter-unit variation would be a box of oranges, some of good
quality and some of bad quality. An example of intra-unit
variation would be an individual orange, whose skin has
different properties from its flesh. For this reason it is
usually necessary to make samples homogeneous before they are
analyzed; otherwise it would be difficult to select a
representative laboratory sample from the sample. A number of
mechanical devices have been developed for homogenizing foods,
and the type used depends on the properties of the food being
analyzed (e.g., solid, semi-solid, liquid). Homogenization can
be achieved using mechanical devices (e.g., grinders, mixers,
slicers, blenders), enzymatic methods (e.g., proteases,
cellulases, lipases) or chemical methods (e.g., strong acids,
strong bases, detergents).
2.3.2. Reducing Sample Size
Once the sample has been made homogeneous, a smaller, more
manageable portion is selected for analysis. This is
usually referred to as a laboratory sample, and ideally it will
have properties which are representative of the population
from which it was originally selected. Sampling plans often
define the method for reducing the size of a sample in order to
obtain reliable and repeatable results.

2.3.3. Preventing Changes in Sample


Once we have selected our sample we have to ensure that it
does not undergo any significant changes in its properties
from the moment of sampling to the time when the actual
analysis is carried out, e.g., enzymatic, chemical, microbial or
physical changes. There are a number of ways these changes
can be prevented.
• Enzymatic Inactivation. Many foods contain active enzymes
that can cause changes in the properties of the food prior to
analysis, e.g., proteases, cellulases, lipases, etc. If the
action of one of these enzymes alters the characteristics of the
compound being analyzed, it will lead to erroneous data, so the
enzyme should be inactivated or eliminated. Freezing, drying,
heat treatment and chemical preservatives (or a combination) are
often used to control enzyme activity, with the method used
depending on the type of food being analyzed and the purpose of
the analysis.
• Lipid Protection. Unsaturated lipids may be altered by various
oxidation reactions. Exposure to light, elevated temperatures,
oxygen or pro-oxidants can increase the rate at which these
reactions proceed. Consequently, samples with high unsaturated
lipid contents usually need to be stored under nitrogen or some
other inert gas, in dark rooms or covered bottles, and at
refrigerated temperatures. Provided that they do not interfere
with the analysis, antioxidants may be added to retard
oxidation.
• Microbial Growth and Contamination. Microorganisms are present
naturally in many foods and, if they are not controlled, they
can alter the composition of the sample to be analyzed.
Freezing, drying, heat treatment and chemical preservatives (or
a combination) are often used to control the growth of microbes
in foods.
• Physical Changes. A number of physical changes may occur in a
sample, e.g., water may be lost due to evaporation or gained due
to condensation; fat or ice may melt or crystallize; structural
properties may be disturbed. Physical changes can be minimized
by controlling the temperature of the sample and the forces that
it experiences.

2.3.4. Sample Identification


Laboratory samples should always be labeled carefully so
that if any problem develops its origin can easily be
identified. The information used to identify a sample includes:
a) sample description, b) time the sample was taken, c) location
the sample was taken from, d) person who took the sample, and
e) method used to select the sample. The analyst should always
keep a detailed notebook clearly documenting the sample
selection and preparation procedures performed and recording the
results of any analytical procedures carried out on each sample.
Each sample should be marked with a code on its label that can
be cross-referenced to the notebook, so that the history of any
problem sample can be traced.
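The identification fields listed above map naturally onto a small record type. The Python sketch below is one hypothetical way to hold a notebook entry keyed by the label code; the class name, field names, code format and example values are all illustrative assumptions, not a prescribed format.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SampleRecord:
    """One notebook entry holding fields (a) to (e) listed above plus a label code."""
    code: str            # printed on the container label; links it to the notebook
    description: str     # (a) sample description
    taken_at: datetime   # (b) time the sample was taken
    location: str        # (c) location the sample was taken from
    taken_by: str        # (d) person who took the sample
    method: str          # (e) method used to select the sample

    def label(self) -> str:
        """Short text for the container label."""
        return f"{self.code} | {self.description} | {self.taken_at:%Y-%m-%d %H:%M}"


notebook: dict[str, SampleRecord] = {}

record = SampleRecord(
    code="S-0001",
    description="Tomato ketchup, 500 g bottle",
    taken_at=datetime(2024, 5, 3, 14, 30),
    location="Line 2, end of conveyor",
    taken_by="Analyst A",
    method="Systematic, every 10th bottle",
)
notebook[record.code] = record
print(record.label())  # S-0001 | Tomato ketchup, 500 g bottle | 2024-05-03 14:30
```

Making the record immutable (`frozen=True`) reflects the idea that sampling details, once written down, should not be silently edited.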

2.4. Data Analysis and Reporting


Food analysis usually involves making a number of repeated
measurements on the same sample to provide confidence that the
analysis was carried out correctly, to obtain a best estimate of
the value being measured, and to obtain a statistical indication
of the reliability of that value. A variety of statistical
techniques are available that enable us to obtain this
information about the laboratory sample from multiple
measurements.

2.4.1. Measure of Central Tendency of Data


The most commonly used parameter for representing the
overall properties of a number of measurements is the mean:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad (1)$$

Here n is the total number of measurements, $x_i$ are the
individually measured values and $\bar{x}$ is the mean value.
The mean is the best experimental estimate of the value that
can be obtained from the measurements. It does not necessarily
correspond to the true value of the parameter one is trying to
measure: there may be some form of systematic error in the
analytical method that means the measured value is not the same
as the true value (see below). Accuracy refers to how closely
the measured value agrees with the true value. The problem with
determining the accuracy is that the true value of the
parameter being measured is often not known. Nevertheless, it is
sometimes possible to purchase or prepare standards that have
known properties and to analyze these standards using the same
analytical technique as used for the unknown food samples. The
absolute error $E_{abs}$, which is the difference between the
measured value ($x_i$) and the true value ($x_{true}$), can then
be determined: $E_{abs} = x_i - x_{true}$. Because systematic
errors can go unnoticed unless the method is checked against
such standards, analytical instruments should be carefully
maintained and frequently calibrated to ensure that they are
operating correctly.
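Equation (1) and the absolute error can be computed directly. The Python sketch below uses hypothetical replicate measurements of a reference standard whose true value is assumed to be 5.00 g/100 g; the numbers and function names are illustrative only.

```python
def mean(values):
    """Equation (1): arithmetic mean of n measurements."""
    return sum(values) / len(values)


def absolute_errors(values, x_true):
    """E_abs = x_i - x_true for each measurement (positive = reading too high)."""
    return [x - x_true for x in values]


# Hypothetical replicate fat determinations (g/100 g) on a standard with x_true = 5.00
readings = [5.12, 5.08, 5.15, 5.10]
x_bar = mean(readings)
errors = absolute_errors(readings, 5.00)
print(f"mean = {x_bar:.3f}; error of mean = {x_bar - 5.00:+.3f}")
print("individual errors:", [round(e, 2) for e in errors])
```

If every reading on the standard is high, as in this made-up data, a systematic error (for example, an instrument that needs calibration) is the likely cause rather than random scatter.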

2.4.2. Measure of Spread of Data


The spread of the data is a measure of how close repeated
measurements are to each other. The standard deviation is the
most commonly used measure of the spread of experimental
measurements. It is determined by assuming that the experimental
measurements vary randomly about the mean, so that they can be
represented by a normal distribution. The standard deviation SD
of a set of experimental measurements is given by the following
equation:

$$SD = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}} \qquad (2)$$

For normally distributed data, the measured values fall within
the following ranges:
• about 68% of values lie within $\bar{x} - SD$ to $\bar{x} + SD$
• about 95% of values lie within $\bar{x} - 2SD$ to $\bar{x} + 2SD$
• more than 99% of values lie within $\bar{x} - 3SD$ to $\bar{x} + 3SD$
Another parameter that is commonly used to provide an
indication of the relative spread of the data around the mean is
the coefficient of variation, CV = (SD / $\bar{x}$) × 100%.
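The standard deviation and coefficient of variation can be computed as below. This Python sketch assumes the sample form of the standard deviation, with n - 1 in the denominator of Equation (2), and cross-checks it against the standard library's `statistics.stdev`, which uses the same convention. The replicate data are hypothetical.

```python
import math
import statistics


def standard_deviation(values):
    """Equation (2): sample standard deviation with n - 1 in the denominator."""
    n = len(values)
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))


def coefficient_of_variation(values):
    """CV = (SD / mean) x 100 %."""
    return standard_deviation(values) / (sum(values) / len(values)) * 100


data = [20.1, 20.5, 19.8, 20.2]  # hypothetical replicate measurements
sd = standard_deviation(data)
print(f"SD = {sd:.3f}, CV = {coefficient_of_variation(data):.2f} %")
# statistics.stdev uses the same n - 1 formula and serves as a cross-check
assert math.isclose(sd, statistics.stdev(data))
```

The CV is dimensionless, which makes it convenient for comparing the precision of methods that report values on different scales.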

2.4.3. Sources of Error


There are three common sources of error in any analytical
technique:
• Personal Errors (Blunders). These occur when the analytical
test is not carried out correctly: the wrong chemical reagent or
equipment might have been used; some of the sample may have been
spilt; a volume or mass may have been recorded incorrectly; etc.
It is partly for this reason that analytical measurements should
be repeated a number of times using freshly prepared laboratory
samples. Blunders are usually easy to identify and can be
eliminated by carrying out the analytical method again more
carefully.
• Random Errors. These produce data that vary in a
non-reproducible fashion from one measurement to the next, e.g.,
instrumental noise. This type of error determines the standard
deviation of a measurement. There may be a number of different
sources of random error, and their effects accumulate (see
Section 2.4.4, Propagation of Errors).
• Systematic Errors. A systematic error produces results that
consistently deviate from the true value in some systematic way,
e.g., measurements may always be 10% too high. This type of
error would occur if the volume delivered by a pipette differed
from its stipulated value. For example, a nominally 100 cm³
pipette may always deliver 101 cm³ instead of the correct value.
To make accurate and precise measurements it is important
when designing and setting up an analytical procedure to
identify the various sources of error and to minimize their
effects. Often, one particular step will be the largest source of
error, and the best improvement in accuracy or precision can
be achieved by minimizing the error in this step.
2.4.4. Propagation of Errors
Most analytical procedures involve a number of steps
(e.g., weighing, volume measurement, reading dials), and
there will be an error associated with each step. These
individual errors accumulate to determine the overall error in
the final result. For random errors there are a number of
simple rules that can be followed to calculate the error in the
final result:
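Addition (Z = X + Y) and subtraction (Z = X - Y):

$$\sigma_Z = \sqrt{\sigma_X^2 + \sigma_Y^2} \qquad (3)$$

Multiplication (Z = XY) and division (Z = X/Y):

$$\frac{\sigma_Z}{Z} = \sqrt{\left(\frac{\sigma_X}{X}\right)^2 + \left(\frac{\sigma_Y}{Y}\right)^2} \qquad (4)$$

Here, $\sigma_X$ is the standard deviation of the mean value X,
$\sigma_Y$ is the standard deviation of the mean value Y, and
$\sigma_Z$ is the standard deviation of the mean value Z. These
rules assume that the errors in X and Y are random and
independent of each other. They should be learnt and used when
calculating the overall error in a final result.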
As an example, let us assume that we want to determine the fat
content of a food and that we have previously measured the mass
of fat extracted from the food (ME) and the initial mass of the
food (MI):
ME = 3.1 ± 0.3 g
MI = 10.5 ± 0.7 g
% Fat Content = 100 × ME / MI
To calculate the mean and standard deviation of the fat content
we need to use the rule for division (Z = X/Y) given by
Equation 4. Initially, we assign values to the various
parameters in the appropriate propagation of error equation:
X = 3.1; $\sigma_X$ = 0.3
Y = 10.5; $\sigma_Y$ = 0.7
% Fat Content = Z = 100X/Y = 100 × 3.1/10.5 = 29.5%

$$\sigma_Z = Z\sqrt{\left(\frac{\sigma_X}{X}\right)^2 + \left(\frac{\sigma_Y}{Y}\right)^2} = 29.5\% \times \sqrt{\left(\frac{0.3}{3.1}\right)^2 + \left(\frac{0.7}{10.5}\right)^2} = 3.5\%$$

Hence, the fat content of the food is 29.5 ± 3.5%. In reality,
it may be necessary to carry out a number of different steps in
a calculation, some involving addition/subtraction and some
involving multiplication/division. When carrying out
multiplication/division calculations, it is necessary to ensure
that all appropriate addition/subtraction calculations have been
completed first.
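The two propagation rules and the worked fat-content example above can be reproduced in a few lines of Python. This sketch assumes, as Equations (3) and (4) do, that the random errors in the inputs are independent; the function names are hypothetical.

```python
import math


def sd_sum_or_difference(sd_x, sd_y):
    """Equation (3): SD of Z = X + Y or Z = X - Y."""
    return math.sqrt(sd_x ** 2 + sd_y ** 2)


def sd_product_or_quotient(z, x, sd_x, y, sd_y):
    """Equation (4): SD of Z = X * Y or Z = X / Y, from the relative SDs."""
    return abs(z) * math.sqrt((sd_x / x) ** 2 + (sd_y / y) ** 2)


# Worked example from the text: mass of extracted fat and initial mass of food (g)
me, sd_me = 3.1, 0.3
mi, sd_mi = 10.5, 0.7
fat = 100 * me / mi  # the factor 100 is exact, so it adds no error
sd_fat = sd_product_or_quotient(fat, me, sd_me, mi, sd_mi)
print(f"Fat content = {fat:.1f} ± {sd_fat:.1f} %")  # Fat content = 29.5 ± 3.5 %
```

If ME were itself obtained by difference (for example, flask plus fat minus empty flask), its SD would first be found with `sd_sum_or_difference`, and only then passed to the division rule, in line with the ordering advice above.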

2.4.5. Significant Figures and Rounding


The number of significant figures used in reporting a final
result is determined by the standard deviation of the
measurements. A final result is reported to the correct number
of significant figures when it contains all the digits that are
known to be correct, plus a final one that is uncertain. For
example, a reported value of 12.13 means that the 12.1 is known
to be correct but the final 3 is uncertain; it could be a 2 or a
4 instead.
For multiplication (Z = X × Y) and division (Z = X/Y), the
number of significant figures in the final result (Z) should
equal the number of significant figures in whichever of the
numbers used in the calculation (X or Y) has the fewest. For
example, 12.312 (5 significant figures) × 31.1 (3 significant
figures) = 383 (3 significant figures). For addition (Z = X + Y)
and subtraction (Z = X - Y), the significant figures in the final
result (Z) are determined by whichever number (X or Y) has its
last significant figure in the highest decimal column. For
example, 123.4567 (last significant figure in the "0.0001"
decimal column) + 0.31 (last significant figure in the "0.01"
decimal column) = 123.77 (last significant figure in the "0.01"
decimal column). Or, 1310 (last significant figure in the "10"
decimal column) + 12.1 (last significant figure in the "0.1"
decimal column) = 1320 (last significant figure in the "10"
decimal column).
When rounding numbers, round down if the first digit to be
dropped is less than 5, and round up if it is 5 or more, e.g.,
23.453 becomes 23.45; 23.455 becomes 23.46; 23.458 becomes
23.46. It is usually desirable to carry extra digits throughout
the calculations and then round off only the final result.
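The rounding rule and the significant-figure examples above can be applied with Python's standard `decimal` module. Decimal values built from strings are used because binary floats cannot store values such as 23.455 exactly, and the built-in `round()` rounds exact halves to the nearest even digit rather than always upwards. The helper names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_to_places(value, places):
    """Round half up to a fixed decimal column, e.g. places="0.01" or "1E1"."""
    return Decimal(value).quantize(Decimal(places), rounding=ROUND_HALF_UP)


def round_to_sig_figs(value, sig):
    """Round half up to `sig` significant figures."""
    d = Decimal(value)
    exponent = d.adjusted() - sig + 1  # decimal column of the last kept digit
    return d.quantize(Decimal(1).scaleb(exponent), rounding=ROUND_HALF_UP)


# Rounding examples from the text (values given as strings so they are exact)
for v in ("23.453", "23.455", "23.458"):
    print(v, "->", round_to_places(v, "0.01"))

# Multiplication: keep 3 significant figures (set by 31.1)
print(round_to_sig_figs(Decimal("12.312") * Decimal("31.1"), 3))  # 383
# Addition: last significant figure of 0.31 is in the 0.01 column
print(round_to_places(Decimal("123.4567") + Decimal("0.31"), "0.01"))  # 123.77
# Addition: last significant figure of 1310 is in the 10 column
print(round_to_places(Decimal("1310") + Decimal("12.1"), "1E1"))  # 1.32E+3, i.e. 1320
```

Following the advice above, intermediate results are kept at full precision and only the final value is passed through these helpers.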

2.4.6. Standard Curves: Regression Analysis


When carrying out certain analytical procedures it is
necessary to prepare standard curves that are used to
determine some property of an unknown material. A series of
calibration experiments is carried out using samples with
known properties and a standard curve is plotted from this
data. For example, a series of protein solutions with known
concentration of protein could be prepared and their
absorbance of electromagnetic radiation at 280 nm could be
measured using a UV-visible spectrophotometer. For dilute
protein solutions there is a linear relationship between
absorbance and protein concentration:

$$y = ax + b$$

A best-fit line is drawn through the data using regression
analysis; the line has a gradient of a and a y-intercept of b.
The concentration of protein in an unknown sample can then be
determined by measuring its absorbance: x = (y - b)/a, where in
this example x is the protein concentration and y is the
absorbance. How well the straight line fits the experimental
data is expressed by the coefficient of determination r² (the
square of the correlation coefficient r), which has a value
between 0 and 1. The closer the value is to 1, the better the fit
between the straight line and the experimental values: r² = 1 is
a perfect fit. Most modern calculators and spreadsheet programs
have routines that can be used to automatically determine r²,
the slope and the intercept of a set of data.
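The least-squares calculation behind a standard curve is short enough to write out. The Python sketch below fits y = ax + b, reports r², and inverts the curve with x = (y - b)/a as described above. The calibration data are invented for illustration and are not measured values; the function names are hypothetical.

```python
def linear_fit(x, y):
    """Least-squares fit of y = a*x + b; returns (a, b, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx               # gradient
    b = mean_y - a * mean_x     # y-intercept
    r_squared = sxy ** 2 / (sxx * syy)
    return a, b, r_squared


def concentration_from_absorbance(y, a, b):
    """Invert the standard curve: x = (y - b) / a."""
    return (y - b) / a


# Hypothetical calibration data: protein standards (mg/mL) and absorbance at 280 nm
conc = [0.0, 0.5, 1.0, 1.5, 2.0]
absorbance = [0.002, 0.331, 0.659, 0.990, 1.318]
a, b, r2 = linear_fit(conc, absorbance)
unknown = concentration_from_absorbance(0.500, a, b)
print(f"a = {a:.4f}, b = {b:.4f}, r^2 = {r2:.5f}, unknown = {unknown:.3f} mg/mL")
```

The inversion is only trustworthy inside the range covered by the standards and within the region where the absorbance is linear in concentration.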

2.4.7. Rejecting Data


When carrying out an experimental analytical procedure it
will sometimes be observed that one of the measured values is
very different from all of the other values, e.g., as the result
of a "blunder" in the analytical procedure. Occasionally, this
value may be treated as being incorrect, and it can be rejected.
There are certain rules based on statistics that allow us to
decide whether a particular point can be rejected or not. A test
called the Q-test is commonly used for this purpose:

$$Q = \frac{|X_{BAD} - X_{NEXT}|}{X_{HIGH} - X_{LOW}}$$

Here $X_{BAD}$ is the questionable value, $X_{NEXT}$ is the value
closest to $X_{BAD}$, $X_{HIGH}$ is the highest value of the data
set and $X_{LOW}$ is the lowest value of the data set. If the
Q-value is higher than the value given in a Q-test table for the
number of observations, the questionable value can be rejected:

Number of       Q-value for data rejection
observations    (90% confidence level)
3               0.94
4               0.76
5               0.64
6               0.56
7               0.51
8               0.47
9               0.44
10              0.41

For example, if five measurements were carried out and one
measurement was very different from the rest (e.g., 20, 22, 25,
50, 21), then Q = (50 - 25)/(50 - 20) = 0.83. Because this is
higher than the value of 0.64 given in the Q-test table for five
observations, the value of 50 can be rejected at the 90%
confidence level.
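The Q-test can be automated as follows. This Python sketch copies the 90% critical values from the table above, tests whichever end of the sorted data has the larger gap, and refuses data sets outside the table's range. The function name is hypothetical.

```python
# Critical Q-values at the 90 % confidence level, copied from the table above
Q_CRIT_90 = {3: 0.94, 4: 0.76, 5: 0.64, 6: 0.56, 7: 0.51, 8: 0.47, 9: 0.44, 10: 0.41}


def q_test(data):
    """Return (suspect value, Q, reject?) for the more extreme end of the data set."""
    values = sorted(data)
    n = len(values)
    if n not in Q_CRIT_90:
        raise ValueError("the table covers 3 to 10 observations only")
    spread = values[-1] - values[0]  # X_HIGH - X_LOW
    if spread == 0:
        raise ValueError("all values are identical; there is no outlier to test")
    q_low = (values[1] - values[0]) / spread     # gap at the low end
    q_high = (values[-1] - values[-2]) / spread  # gap at the high end
    suspect, q = (values[-1], q_high) if q_high >= q_low else (values[0], q_low)
    return suspect, q, q > Q_CRIT_90[n]


suspect, q, reject = q_test([20, 22, 25, 50, 21])
print(f"suspect = {suspect}, Q = {q:.2f}, reject = {reject}")  # suspect = 50, Q = 0.83, reject = True
```

The test is intended for a single suspect value; applying it repeatedly to the same data set, rejecting one point after another, is not what the table is designed for.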
