Lecture Notes For Week 5-1
Lecture outline
5 Methods of Biochemical Sample Preparation and Data Analysis
Lecture outline
Introduction
Lecture learning outcomes (LLOs)
5.1 Sample Selection and Sampling Plans
5.1.1 Purpose of Analysis
5.1.2 Nature of Measured Property
5.1.3 Nature of Population
5.1.4 Nature of Test Procedure
5.1.5 Developing a Sampling Plan
5.1.6 Preparation of Laboratory Samples
5.2 Data Analysis and Reporting
5.2.1 Sources of Error
5.2.2 Propagation of Errors
5.2.3 Significant Figures and Rounding
5.2.4 Standard Curves: Regression Analysis
5.2.5 Rejecting Data
Introduction
Biochemical sample preparation and data analysis represent the cornerstone of modern
biological research, providing the essential framework for understanding the intricacies of
cellular processes, molecular interactions, and the underlying mechanisms of health and
disease. These methodologies are indispensable in extracting meaningful information from
biological samples, whether derived from tissues, cells, or bodily fluids. The meticulous
preparation of samples ensures the reliability and reproducibility of experimental results,
while sophisticated data analysis techniques allow researchers to derive insightful
interpretations from the vast and complex datasets generated.
In this dynamic field, researchers employ a diverse array of techniques to prepare biological
samples for analysis. From the initial collection of specimens to the extraction of proteins,
nucleic acids, and metabolites, each step is carefully designed to preserve the integrity of
the biological material and to obtain accurate representations of its molecular constituents.
Subsequently, data analysis plays a pivotal role in translating raw experimental data into
meaningful biological insights. Techniques ranging from statistical analyses to advanced
computational approaches, including machine learning and artificial intelligence, enable
researchers to discern patterns, identify correlations, and unravel the complexities inherent
in biological systems.
Selection of an appropriate fraction of the whole material is one of the most important
stages of food analysis procedures and can lead to large errors when not carried out
correctly.
Population. The whole of the material whose properties we are trying to estimate is usually
referred to as the population.
Sample. Usually only a fraction of the population, referred to as the sample, is selected for
analysis. The sample may consist of one or more sub-samples selected from
different regions within the population.
Laboratory Sample. The sample may be too large to conveniently analyze using a
laboratory procedure and so only a fraction of it is actually used in the final laboratory
analysis. This fraction is usually referred to as the laboratory sample.
The primary objective of sample selection is to ensure that the properties of the laboratory
sample are representative of the properties of the population, otherwise erroneous results
will be obtained. Selection of a limited number of samples for analysis is of great benefit
because it allows a reduction in time, expense and personnel required to carry out the
analytical procedure, while still providing useful information about the properties of the
population. Nevertheless, one must always be aware that analysis of a limited number of
samples can only give an estimate of the true value of the whole population.
Sampling Plans. To ensure that the estimated value obtained from the laboratory sample
is a good representation of the true value of the population it is necessary to develop a
sampling plan. A sampling plan should be a clearly written document that contains precise
details that an analyst uses to decide the sample size, the locations from which the sample
should be selected, the method used to collect the sample, and the method used to
preserve it prior to analysis. It should also stipulate the required documentation of
procedures carried out during the sampling process. The choice of a particular sampling
plan depends on the purpose of the analysis, the property to be measured, the nature of
the total population and of the individual samples, and the type of analytical technique used
to characterize the samples. For certain products and types of population, sampling plans
have already been developed and documented by various organizations which authorize
official methods, e.g., the Association of Official Analytical Chemists (AOAC). Some of the
most important considerations when developing or selecting an appropriate sampling plan
are discussed below.
Raw materials. Raw materials are often analyzed before acceptance by a factory, or before
use in a particular manufacturing process, to ensure that they are of an appropriate quality.
Process control samples. A biochemical is often analyzed during processing to ensure that
the process is operating efficiently. Thus, if a problem develops during
processing, it can be detected quickly and the process adjusted so that the properties of
the product are not adversely affected. Techniques used to monitor process control must
be capable of producing precise results in a short time. Manufacturers can either use
analytical techniques that measure the properties of biochemicals on-line, or they can select
and remove samples and test them in a quality assurance laboratory.
Finished products. Samples of the final product are usually selected and tested to ensure
that the biochemical is safe, meets legal and labeling requirements, and is of a high and
consistent quality. Officially sanctioned methods are often used for determining nutritional
labeling.
it is or is not spoilt. On the other hand, a variable is some property that can be measured
on a continuous scale, such as the weight, fat content or moisture content of a material.
Variable sampling usually requires fewer samples than attribute sampling.
The type of property measured also determines the seriousness of the outcome if the
properties of the laboratory sample do not represent those of the population. For example,
if the property measured is the presence of a harmful substance (such as bacteria, glass or
toxic chemicals), then the seriousness of the outcome if a mistake is made in the sampling
is much greater than if the property measured is a quality parameter (such as color or
texture). Consequently, the sampling plan has to be much more rigorous for detection of
potentially harmful substances than for quantification of quality parameters.
A population may be either finite or infinite. A finite population is one that has a definite
size, e.g., a truckload of apples, a tanker full of milk, or a vat full of oil. An infinite population
is one that has no definite size, e.g., a conveyor belt that operates continuously, from which
biochemicals are selected periodically. Analysis of a finite population usually provides
information about the properties of the population, whereas analysis of an infinite
population usually provides information about the properties of the process. To facilitate
the development of a sampling plan it is usually convenient to divide an "infinite"
population into a number of finite populations, e.g., all the products produced by one shift
of workers, or all the samples produced in one day.
Sample size. The size of the sample selected for analysis largely depends on the expected
variations in properties within a population, the seriousness of the outcome if a bad sample
is not detected, the cost of analysis, and the type of analytical technique used. Given this
information it is often possible to use statistical techniques to design a sampling plan that
specifies the minimum number of subsamples that need to be analyzed to obtain an
accurate representation of the population. Often the size of the sample is impractically
large, and so a process known as sequential sampling is used. Here subsamples selected
from the population are examined sequentially until the results are sufficiently definite from
a statistical viewpoint. For example, sub-samples are analyzed until the ratio of good ones
to bad ones falls within some statistically predefined value that enables one to confidently
reject or accept the population.
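The notes do not name a specific sequential procedure. One standard way to make "keep examining sub-samples until the result is statistically definite" precise is Wald's sequential probability ratio test (SPRT), sketched below for pass/fail sub-samples. This is an illustrative technique chosen here, not necessarily the one used in any official sampling plan. The function name and the parameters `p0` (defect fraction regarded as acceptable), `p1` (defect fraction regarded as rejectable), `alpha` and `beta` are assumptions for the example.

```python
import math

def sequential_attribute_test(items, p0, p1, alpha=0.05, beta=0.10):
    """Wald SPRT on a stream of sub-samples (illustrative sketch).

    items: iterable of booleans, True when a sub-sample is defective ("bad").
    p0:    defect fraction considered acceptable (hypothetical choice).
    p1:    defect fraction considered rejectable (hypothetical choice).
    alpha: risk of rejecting an acceptable population.
    beta:  risk of accepting an unacceptable population.
    Returns (decision, number_examined); decision is "accept", "reject" or "continue".
    """
    if not 0 < p0 < p1 < 1:
        raise ValueError("require 0 < p0 < p1 < 1")
    upper = math.log((1 - beta) / alpha)   # evidence needed to reject the population
    lower = math.log(beta / (1 - alpha))   # evidence needed to accept the population
    step_bad = math.log(p1 / p0)           # each defective pushes towards rejection
    step_good = math.log((1 - p1) / (1 - p0))  # each good one pushes towards acceptance
    llr, n = 0.0, 0
    for defective in items:
        n += 1
        llr += step_bad if defective else step_good
        if llr >= upper:
            return "reject", n
        if llr <= lower:
            return "accept", n
    return "continue", n   # ran out of sub-samples before a decision was reached

decision, n = sequential_attribute_test([False] * 40, p0=0.02, p1=0.10)
print(decision, n)  # accept 27
```

With these settings, 27 consecutive good sub-samples are enough to accept. Two defectives found early are enough to reject. This is why sequential sampling usually needs far fewer sub-samples than a fixed plan.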
Sample location. In homogeneous populations it does not matter where the sample is
taken from because all the sub-samples have the same properties. In heterogeneous
populations the location from which the sub-samples are selected is extremely important.
In random sampling the sub-samples are chosen randomly from any location within the
material being tested. Random sampling is often preferred because it avoids human bias
in selecting samples and because it facilitates the application of statistics. In systematic
sampling the samples are drawn systematically with location or time, e.g., every 10th box
in a truck may be analyzed, or a sample may be chosen from a conveyor belt every minute.
This type of sampling is often easy to implement, but it is important to be sure that there is
not a correlation between the sampling rate and the sub-sample properties. In judgment
sampling the subsamples are drawn from the whole population using the judgment and
experience of the analyst. This could be the easiest sub-sample to get to, such as the boxes
of product nearest the door of a truck. Alternatively, the person who selects the sub-
samples may have some experience about where the worst sub-samples are usually found,
e.g., near the doors of a warehouse where the temperature control is not so good. It is not
usually possible to apply proper statistical analysis to this type of sampling, since the sub-
samples selected are not usually a good representation of the population.
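Random and systematic selection can be contrasted in a short Python sketch. It assumes the population is a list of numbered units (a hypothetical truckload of 100 boxes), and the helper names are illustrative. Fixing the random seed only makes the random draw reproducible, so it can be recorded in the sampling documentation.

```python
import random

def random_sample(units, k, seed=None):
    """Simple random sampling: k distinct units, each equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(units, k)

def systematic_sample(units, interval, start=0):
    """Systematic sampling: every `interval`-th unit, beginning at index `start`."""
    if interval < 1:
        raise ValueError("interval must be at least 1")
    return units[start::interval]

boxes = list(range(1, 101))                      # hypothetical boxes numbered 1..100
every_10th = systematic_sample(boxes, interval=10, start=9)
random_10 = random_sample(boxes, k=10, seed=42)  # seed recorded for traceability

print(every_10th)        # [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(sorted(random_10))
```

The systematic version is easier to carry out by hand. However, as noted above, it gives a biased picture if the property being measured happens to repeat with the same period as the sampling interval.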
Sample collection. Sample selection may either be carried out manually by a human being
or by specialized mechanical sampling devices. Manual sampling may involve simply
picking a sample from a conveyor belt or a truck, or using special cups or containers to
collect samples from a tank or sack. The manner in which samples are selected is usually
specified in sampling plans.
Enzymatic Inactivation. Many biochemicals contain active enzymes that can cause
changes in the properties of the biochemical prior to analysis, e.g., proteases, cellulases,
lipases, etc. If the action of one of these enzymes alters the characteristics of the compound
being analyzed, then it will lead to erroneous data, and it should therefore be inactivated
or eliminated. Freezing, drying, heat treatment and chemical preservatives (or a
combination) are often used to control enzyme activity, with the method used depending
on the type of biochemical being analyzed and the purpose of the analysis.
Physical Changes. A number of physical changes may occur in a sample, e.g., water may
be lost due to evaporation or gained due to condensation; fat or ice may melt or crystallize;
structural properties may be disturbed. Physical changes can be minimized by controlling
the temperature of the sample, and the forces that it experiences.
d) Sample Identification
Laboratory samples should always be labeled carefully so that if any problem develops its
origin can easily be identified. The information used to identify a sample includes:
a) Sample description,
The analyst should always keep a detailed notebook clearly documenting the sample
selection and preparation procedures performed and recording the results of any analytical
procedures carried out on each sample. Each sample should be marked with a code on its
label that can be correlated to the notebook. Thus, if any problem arises, it can easily be
identified.
a best estimate of the value being measured and a statistical indication of the reliability of
the value. A variety of statistical techniques are available that enable us to obtain this
information about the laboratory sample from multiple measurements.
$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
Here n is the total number of measurements, Xi are the individually measured values and X̄ is
the mean value.
The mean is the best experimental estimate of the value that can be obtained from the
measurements. It does not necessarily have to correspond to the true value of the
parameter one is trying to measure. There may be some form of systematic error in our
analytical method that means that the measured value is not the same as the true value.
Accuracy refers to how closely the measured value agrees with the true value. The problem
with determining accuracy is that the true value of the parameter being measured is often
not known. Nevertheless, it is sometimes possible to purchase or prepare standards that
have known properties and analyze these standards using the same analytical technique as
used for the unknown biochemical samples. The absolute error Eabs, which is the difference
between the measured value (Xi) and the true value (Xtrue), can then be determined:
$E_{abs} = X_i - X_{true}$
For these reasons, analytical instruments should be carefully maintained and frequently
calibrated to ensure that they are operating correctly.
Another parameter that is commonly used to provide an indication of the relative spread of
the data around the mean is the coefficient of variation:
$CV = \frac{SD}{\bar{X}} \times 100\%$
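The mean, standard deviation, coefficient of variation and absolute error can be computed from replicate measurements with Python's standard `statistics` module, as in the minimal sketch below. It assumes the sample standard deviation (n - 1 in the denominator), because the notes' own SD formula is not reproduced here. It also uses the mean as the measured value in the absolute error. The replicate values and the true value of 10.00 for a purchased standard are hypothetical.

```python
import statistics

def summarize(replicates):
    """Return mean, sample SD (n - 1 denominator) and CV (%) of replicate measurements."""
    mean = statistics.mean(replicates)
    sd = statistics.stdev(replicates)  # sample standard deviation
    cv = sd / mean * 100               # coefficient of variation, in percent
    return mean, sd, cv

# Hypothetical replicate analyses of a standard whose true value is taken as 10.00.
replicates = [10.2, 10.4, 10.1, 10.3]
x_true = 10.00

mean, sd, cv = summarize(replicates)
e_abs = mean - x_true  # absolute error, using the mean as the measured value
print(f"mean = {mean:.2f}, SD = {sd:.3f}, CV = {cv:.2f}%, E_abs = {e_abs:+.2f}")
# mean = 10.25, SD = 0.129, CV = 1.26%, E_abs = +0.25
```

A small CV here indicates good precision. The positive absolute error, however, shows that the method reads high relative to the standard, so precision alone does not guarantee accuracy.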
b) Random Errors
These produce data that vary in a non-reproducible fashion from one measurement to the
next, e.g., instrumental noise. This type of error determines the standard deviation of a
measurement. There may be a number of different sources of random error, and their effects
are cumulative.
c) Systematic Errors
A systematic error produces results that consistently deviate from the true answer in some
systematic way, e.g., measurements may always be 10% too high. This type of error would
occur if the volume of a pipette was different from the stipulated value. For example, a
nominally 100 cm³ pipette may always deliver 101 cm³ instead of the correct value.
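The difference between random and systematic errors can be seen in a small simulation of the pipette example. The sketch below assumes Gaussian random scatter with a hypothetical standard deviation of 0.2 cm³. It models the faulty pipette as a constant 1% over-delivery (101 cm³ instead of 100 cm³). The same random seed is used for both runs, so the only difference between them is the systematic error.

```python
import random
import statistics

def simulate_pipette(n, nominal=100.0, noise_sd=0.2, bias_factor=1.0, seed=0):
    """Simulate n volumes (cm³) delivered by a pipette of the given nominal volume.

    noise_sd:    random error, i.e. scatter between deliveries (hypothetical value).
    bias_factor: systematic error; 1.01 means every delivery is 1% too large.
    """
    rng = random.Random(seed)
    return [nominal * bias_factor + rng.gauss(0.0, noise_sd) for _ in range(n)]

random_only = simulate_pipette(10_000)               # random error only
biased = simulate_pipette(10_000, bias_factor=1.01)  # pipette that delivers 101 cm³

for label, volumes in [("random only", random_only), ("random + systematic", biased)]:
    print(f"{label:20s} mean = {statistics.mean(volumes):7.3f}  "
          f"SD = {statistics.stdev(volumes):.3f}")
```

Both runs have essentially the same standard deviation, but the biased run's mean sits 1 cm³ high. Averaging more replicates reduces the scatter of the mean, but it never removes the offset. Only calibration against a standard can reveal or correct it.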
To make accurate and precise measurements it is important when designing and setting
up an analytical procedure to identify the various sources of error and to minimize their
effects. Often, one particular step will be the largest source of error, and the best
improvement in accuracy or precision can be achieved by minimizing the error in this step.
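errors there are a number of simple rules that can be followed to calculate the error in the
final result:
Addition and subtraction (Z = X + Y or Z = X - Y): $\Delta Z = \sqrt{(\Delta X)^2 + (\Delta Y)^2}$
Multiplication and division (Z = X × Y or Z = X/Y): $\frac{\Delta Z}{Z} = \sqrt{\left(\frac{\Delta X}{X}\right)^2 + \left(\frac{\Delta Y}{Y}\right)^2}$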
Here, ΔX is the standard deviation of the mean value X, ΔY is the standard deviation of the
mean value Y, and ΔZ is the standard deviation of the mean value Z. These simple rules
should be learnt and used when calculating the overall error in a final result.
As an example, let us assume that we want to determine the fat content of a biochemical
(food) and that we have previously measured the mass of fat extracted from the
biochemical (ME) and the initial mass of the biochemical (MI):
ME = 3.1 ± 0.3 g
MI = 10.5 ± 0.7 g
$\%\ \text{Fat Content} = \frac{M_E}{M_I} \times 100$
To calculate the mean and standard deviation of the fat content we need to use the
multiplication rule (Z=X/Y) given by Equation 4. Initially, we assign values to the various
parameters in the appropriate propagation of error equation:
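X = 3.1; ΔX = 0.3
Y = 10.5; ΔY = 0.7
$\%\ \text{Fat Content} = Z = \frac{X}{Y} \times 100 = \frac{3.1}{10.5} \times 100 = 29.5\%$
$\Delta Z = Z\sqrt{\left(\frac{\Delta X}{X}\right)^2 + \left(\frac{\Delta Y}{Y}\right)^2} = 29.5 \times \sqrt{\left(\frac{0.3}{3.1}\right)^2 + \left(\frac{0.7}{10.5}\right)^2} \approx 29.5 \times 0.1175 \approx 3.5\%$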
Hence, the fat content of the biochemical is 29.5 ± 3.5%. It may be necessary to carry out a
number of different steps in a calculation, some that involve addition/subtraction and some
that involve multiplication/division. When carrying out multiplication/division calculations it
is necessary to ensure that all appropriate addition/subtraction calculations have been
completed first.
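The fat-content calculation can be checked with a short Python sketch of the two propagation rules. It assumes the errors in the measured quantities are independent, which is why they are combined in quadrature (`math.hypot`). The function names are illustrative. The flask weighings in the second part are hypothetical. They only show that an addition/subtraction step is propagated first, before its result enters a division.

```python
import math

def add_sub_error(dx, dy):
    """Standard deviation of Z = X + Y or Z = X - Y (independent errors)."""
    return math.hypot(dx, dy)

def mul_div_error(z, x, dx, y, dy):
    """Standard deviation of Z = X * Y or Z = X / Y (independent errors)."""
    return abs(z) * math.hypot(dx / x, dy / y)

# Worked example from the notes: ME = 3.1 ± 0.3 g, MI = 10.5 ± 0.7 g.
me, d_me = 3.1, 0.3
mi, d_mi = 10.5, 0.7
fat = me / mi * 100  # multiplying by the exact constant 100 leaves the relative error unchanged
d_fat = mul_div_error(fat, me, d_me, mi, d_mi)
print(f"Fat content = {fat:.1f} ± {d_fat:.1f} %")  # Fat content = 29.5 ± 3.5 %

# Hypothetical: if ME had been obtained by weighing a flask before and after extraction,
# the subtraction error would be propagated first and then used in the division.
flask_after, d_after = 53.4, 0.2
flask_before, d_before = 50.3, 0.2
me_by_diff = flask_after - flask_before
d_me_by_diff = add_sub_error(d_after, d_before)  # about 0.28 g
```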
one that is known to be uncertain. For example, a reported value of 12.13 means that the
12.1 is known to be correct but the final 3 is uncertain; it could be either a 2 or a 4
instead.
For multiplication (Z = X × Y) and division (Z = X/Y), the number of significant figures in
the final result (Z) should equal the smallest number of significant figures among the
numbers from which it was calculated (X or Y).
For example, 12.312 (5 significant figures) × 31.1 (3 significant figures) = 383 (3 significant
figures).
For addition (Z = X + Y) and subtraction (Z = X - Y), the last significant figure of the
final result (Z) lies in the same decimal column as that of the number (X or Y) whose last
significant figure is in the highest decimal column. For example, 123.4567 (last
significant figure in the "0.0001" decimal column) + 0.31 (last significant figure in the "0.01"
decimal column) = 123.77 (last significant figure in the "0.01" decimal column). Or, 1310
(last significant figure in the "10" decimal column) + 12.1 (last significant figure in the "0.1"
decimal column) = 1320 (last significant figure in the "10" decimal column).
When rounding, always round down if the first digit to be dropped is less than 5, and round
up if it is 5 or more, e.g., 23.453 becomes 23.45; 23.455 becomes 23.46; 23.458 becomes 23.46.
It is usually desirable to carry extra digits throughout the calculations and round off only
the final result.
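These significant-figure and rounding rules can be applied in Python with the standard `decimal` module. It works on exact decimal values and supports the round-half-up rule (`ROUND_HALF_UP`) described above. By contrast, the built-in `round()` on floats uses round-half-to-even, and binary floats cannot represent values such as 23.455 exactly, so its results may differ from the rule. The helper names below are illustrative, and values are passed as strings so that no binary rounding creeps in.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_half_up(value, places):
    """Round value (str or Decimal) to the decimal column given by places, e.g. "0.01" or "1E+1"."""
    return Decimal(value).quantize(Decimal(places), rounding=ROUND_HALF_UP)

def round_to_sig_figs(value, sig_figs):
    """Round a Decimal to the given number of significant figures, halves rounded up."""
    if value == 0:
        return value
    exponent = value.adjusted() - sig_figs + 1   # decimal column of the last digit kept
    return value.quantize(Decimal(1).scaleb(exponent), rounding=ROUND_HALF_UP)

# Multiplication: keep the smaller number of significant figures (3, from 31.1).
product = Decimal("12.312") * Decimal("31.1")
print(product, "->", round_to_sig_figs(product, 3))   # 382.9032 -> 383

# Addition: keep the highest "last significant" column among the inputs.
total = Decimal("123.4567") + Decimal("0.31")
print(total, "->", round_half_up(total, "0.01"))      # 123.7667 -> 123.77
big = Decimal("1310") + Decimal("12.1")
print(big, "->", round_half_up(big, "1E+1"))          # 1322.1 -> 1.32E+3 (i.e. 1320)

# Rounding rule: a dropped digit of 5 or more rounds up.
for s in ("23.453", "23.455", "23.458"):
    print(s, "->", round_half_up(s, "0.01"))          # 23.45, 23.46, 23.46
```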
A best-fit line, y = ax + b, is drawn through the data using regression analysis, where a is
the gradient and b is the y-intercept. The concentration of protein in an unknown sample can
then be determined by measuring its absorbance: x = (y - b)/a, where in this example x is the
protein concentration and y is the absorbance. How well the straight line fits the experimental
data is expressed by the coefficient of determination r² (the square of the correlation
coefficient r), which has a value between 0 and 1. The closer the value is to 1, the better the
fit between the straight line and the experimental values: r² = 1 is a perfect fit. Most modern
calculators and spreadsheet programs have routines that can be used to determine the slope,
the intercept and r² of a set of data automatically.
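An ordinary least-squares fit of a standard curve takes only a few lines of Python. The sketch below computes the gradient a, intercept b and r² from the usual sums of squares, then inverts the line to estimate an unknown concentration. The standard concentrations (in arbitrary units) and absorbances are hypothetical illustration values, not data from the notes, and the function names are illustrative.

```python
def fit_line(x, y):
    """Least-squares fit of y = a*x + b; returns (a, b, r_squared)."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need at least two paired points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx                 # gradient
    b = mean_y - a * mean_x       # y-intercept
    r_squared = sxy ** 2 / (sxx * syy)
    return a, b, r_squared

def concentration_from_absorbance(absorbance, a, b):
    """Invert the standard curve: x = (y - b) / a."""
    return (absorbance - b) / a

# Hypothetical protein standards (concentration units arbitrary) and their absorbances.
conc = [0.0, 1.0, 2.0, 3.0, 4.0]
absorb = [0.00, 0.21, 0.39, 0.61, 0.80]

a, b, r2 = fit_line(conc, absorb)
unknown = concentration_from_absorbance(0.50, a, b)
print(f"a = {a:.4f}, b = {b:.4f}, r^2 = {r2:.4f}")   # a = 0.2000, b = 0.0020, r^2 = 0.9993
print(f"unknown concentration = {unknown:.2f}")       # unknown concentration = 2.49
```

Only absorbances that fall within the range of the standards should be converted this way, because the line is not known to hold outside that range.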
$Q = \frac{|X_{BAD} - X_{NEXT}|}{X_{HIGH} - X_{LOW}}$
Here XBAD is the questionable value, XNEXT is the next closest value to XBAD, XHIGH is the highest
value of the data set and XLOW is the lowest value of the data set. If the Q-value is higher than
the value given in a Q-test table for the number of samples being analyzed, then the
questionable value can be rejected.
For example, if five measurements were carried out and one measurement was very
different from the rest (e.g., 20, 22, 25, 50, 21), its Q-value is |50 - 25|/(50 - 20) = 0.83,
so it could be rejected (because it is higher than the value of 0.64 given in the Q-test table
for five observations).
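The Q-test is simple to automate. In the minimal Python sketch below, Q is the gap between the suspect value and its nearest neighbour, divided by the range of the data. The suspect value must be the highest or lowest measurement. The critical value is passed in by the caller rather than built in. The 0.64 used here is the five-observation value quoted above, and for other sample sizes it should be read from the Q-test table in use. The function name is illustrative.

```python
def dixon_q(data, suspect):
    """Q = |X_BAD - X_NEXT| / (X_HIGH - X_LOW) for a suspect extreme value."""
    values = sorted(data)
    if len(values) < 3:
        raise ValueError("the Q-test needs at least three values")
    spread = values[-1] - values[0]
    if spread == 0:
        raise ValueError("all values are identical")
    if suspect == values[-1]:
        gap = values[-1] - values[-2]   # suspect is the highest value
    elif suspect == values[0]:
        gap = values[1] - values[0]     # suspect is the lowest value
    else:
        raise ValueError("the suspect value must be the highest or lowest")
    return gap / spread

data = [20, 22, 25, 50, 21]
q_crit = 0.64                  # tabulated value for five observations, as quoted above
q = dixon_q(data, 50)
reject = q > q_crit
print(f"Q = {q:.2f}; reject: {reject}")   # Q = 0.83; reject: True
```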