Lesson 1
PRE-TEST:
1. Describe the instrument/s that you will use for your research study and detail your
justification.
2. Enumerate the statistical techniques you will use and the justification for using such.
Learning Activities:
The learner shall be able to:
1. Collect data using appropriate instruments.
2. Present and interpret data in tabular and graphical forms.
3. Use statistical techniques to analyze data.
Table design
In order to ensure that your table is clear and easy to interpret there are a number of
design issues that need to be considered. These are listed below:
a. Since tables consist of rows and columns of information it is important to consider
how the data are arranged between the two. Most people find it easier to identify
patterns in numerical data by reading down a column rather than across a row. It is
easier to interpret the data if they are arranged according to their magnitude so there is
a numerical progression down the columns, although this may not always be possible.
b. If there are several columns or categories of information a table can appear complex
and become hard to read. It also becomes more difficult to list the data by magnitude
since the order that applies to one column may not be the same for others. In such
cases you need to decide which column contains the most important trend and this
should be used to structure the table. If the columns are equally important it is often
better to include two or more simple tables rather than using a single more complex
one.
c. Numbers in tables should be presented in their simplest format. This may mean
rounding values to avoid unnecessary decimal places, stating the units (e.g. Php 4.6
million rather than Php 4,600,000) or using scientific notation (e.g. 6.315 x 10^-2 rather
than 0.06315).
d. All tables should be presented with a title that contains enough detail that a reader
can understand the content without needing to consult the accompanying text. There
should also be information about the source of the data being used; this may be a
reference to a book or journal, or could indicate that the data are results from an
experiment carried out on a particular date.
e. Where more than one table is being presented it is standard practice to give each one
a unique reference number, and in larger pieces of work, such as dissertations, a list
of tables with their page numbers is usually provided in addition to the contents page.
f. The formatting of the table should not resemble a spreadsheet where each entry is
bounded by a box since this makes it difficult to read across rows or down columns.
However, the design of the table should help the reader interpret the data and so the
use of lines and/or bold text to separate headings from the body of data, or
highlighting/shading specific rows or columns may be effective. Avoid large gaps between
columns since this also makes it difficult to read along a row.
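Points (a) and (c) can be illustrated with a short sketch. The budget figures and the helper names (format_millions, sort_by_magnitude, render) are hypothetical and are not part of this lesson; they only show one way to order rows by magnitude and to state values in simpler units.

```python
# Hypothetical data: (category, amount in Php). Values are illustrative only.
rows = [
    ("Equipment", 4_600_000),
    ("Salaries", 12_300_000),
    ("Supplies", 800_000),
]

def format_millions(amount_php):
    """Express an amount in millions of pesos with one decimal place."""
    return f"Php {amount_php / 1_000_000:.1f} million"

def sort_by_magnitude(table_rows):
    """Order rows so values progress from largest to smallest down the column."""
    return sorted(table_rows, key=lambda row: row[1], reverse=True)

def render(table_rows):
    """Return the table as plain text, with a rule separating header from body."""
    lines = [f"{'Category':<12}{'Amount':>20}", "-" * 32]
    for name, amount in sort_by_magnitude(table_rows):
        lines.append(f"{name:<12}{format_millions(amount):>20}")
    return "\n".join(lines)

print(render(rows))
print(f"{0.06315:.3e}")  # scientific notation for a small value
```

The single rule under the header follows point (f): it separates headings from the body without boxing every cell.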
Graphs
Graphs are a good means of describing, exploring or summarizing numerical data
because the use of a visual image can simplify complex information and help to highlight
patterns and trends in the data. They are a particularly effective way of presenting a large
amount of data but can also be used instead of a table to present smaller datasets. Using
graphs can help depict data and well-made graphs convey information quickly. There are
many different graph types to choose from and a critical issue is to ensure that the graph type
selected is the most appropriate for the data. Having done this, it is then essential to ensure
that the design and presentation of the graph help the reader or audience interpret the data.
Types of graphs
Different types of graphs are used for different situations. For this reason, it helps to
know a little bit about what the available graphs are. Many times, the kind of data is what
determines the appropriate graphs to use.
a. Bar charts – are one of the most commonly used types of graph and are used to
display and compare the number, frequency or other measure (e.g. mean) for
different discrete categories or groups. The graph is constructed such that the heights
or lengths of the different bars are proportional to the size of the category they
represent. Since the x-axis (the horizontal axis) represents the different categories it
has no scale. The y-axis (the vertical axis) does have a scale and this indicates the
units of measurement. The bars can be drawn either vertically or horizontally
depending upon the number of categories and length or complexity of the category
labels. Because the bars are drawn to a common scale, they can be compared directly
with each other. Bar graphs can also be drawn in three dimensions, or grouped to
compare several sets of data about the same thing or location. Bars are often
arranged in order of frequency so that the more important categories are
emphasized.
Figure 2. Sample bar chart showing the depth of the Great Lakes
b. Histograms – are a special form of bar chart where the data represent continuous
rather than discrete categories. However, because a continuous category may have
a large number of possible values the data are often grouped to reduce the number
of data points. Unlike a bar chart, in a histogram both the x- and y-axes have a scale.
This means that it is the area of the bar that is proportional to the size of the category
represented and not just its height. Creating a histogram provides a visual
representation of data distribution. Histograms can display a large amount of data
and the frequency of the data values. The shape of the distribution can be read from
a histogram and the median can be estimated from it. In addition, it can show any
outliers or gaps in the data.
Figure 3. Sample Histogram
c. Pie charts – are a visual way of displaying how the total data are distributed between
different categories. They are generally best for showing information grouped into a
small number of categories and are a graphical way of displaying data that might
otherwise be presented as a simple table. Sometimes called a circle graph, a pie chart
represents the parts of a whole. Each section or slice of the pie represents a
percentage of the total. Segments are usually arranged clockwise from largest to
smallest, which makes the categories easy to compare.
Figure 4. Sample Pie chart
d. Line graphs – are usually used to show time series data, that is, how one or more
variables vary over a continuous period of time. In a line graph the x-axis represents
the continuous variable (for example year or distance from the initial measurement)
while the y-axis has a scale and indicates the measurement. Several data series can
be plotted on the same line chart and this is particularly useful for analyzing and
comparing the trends in different datasets. Line graphs provide an excellent way to
map independent and dependent variables that are both quantitative. When both
variables are quantitative, the line segment that connects two points on the graph
expresses a slope, which can be interpreted visually relative to the slope of other lines
or expressed as a precise mathematical formula.
Figure 5. Sample Line Graph
e. Scatter plots – are used to show the relationship between pairs of quantitative
measurements made for the same object or individual. Scatter plots are similar to
line graphs in that they start with mapping quantitative data points. The difference is
that with a scatter plot, the individual points are not connected directly with a line
but are instead used to express a trend. This trend can be seen directly through the
distribution of points or with the addition of a regression line, a statistical tool
used to express a trend in the data mathematically. In a scatter plot, a mark, usually
a dot or small circle, represents a single data point. With one mark (point) for every
data point, a visual distribution of the data can be seen. Depending on how tightly
the points cluster together, you may be able to discern a clear trend in the data.
Figure 6. Sample Scatter Plot
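Two ideas from the list above lend themselves to a small worked sketch: grouping continuous data into histogram bins so that bar area is proportional to frequency, and fitting a regression line to scatter-plot data. The function names and the sample values are assumptions made for illustration, and ordinary least squares is used here as one common way to fit the line.

```python
def histogram_density(values, edges):
    """Group continuous values into bins and return (count, density) per bin.

    Density is count / bin width, so the AREA of each bar (density * width)
    equals the count. This matters when bins have unequal widths.
    The last bin includes its right edge.
    """
    result = []
    for i in range(len(edges) - 1):
        lo, hi = edges[i], edges[i + 1]
        is_last = i == len(edges) - 2
        count = sum(1 for v in values if lo <= v < hi or (is_last and v == hi))
        result.append((count, count / (hi - lo)))
    return result

def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical measurements, for illustration only.
heights = [1.2, 1.8, 2.5, 2.9, 3.1, 4.7, 6.0]
print(histogram_density(heights, edges=[0, 2, 4, 8]))

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]  # these points lie exactly on y = 2x + 1
print(least_squares_line(xs, ys))
```

In the histogram example the last bin is twice as wide as the others, so its bar is drawn at half the height that its raw count alone would suggest.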
Figure 8. An easy way to arrange blocks is to put them side by side across the field. Letters
represent different treatments.
Blocking is a very powerful tool that is most effective if you can anticipate sources of
variation before you begin an experiment. For example, in an herbicide trial, one side of a
field may have a history of more severe weed problems. If you just scattered your treatments
randomly through the field, a lot of the variation in the data you collected could be due to the
increased weed pressure on one side of the field. Such variation would make it difficult to
determine how well each treatment worked. Because you know one side of the field will have
more weeds, you can remove that source of variation from the statistical analysis by blocking
and improve your chances of identifying differences among treatments.
The process of blocking follows a logical sequence. First, you determine that there is
something (weeds, drainage, sun/shadow, water, soil type, etc.) that is not uniform
throughout the experimental area (field, greenhouse, etc.) that may influence whatever you
are measuring (yield, plant height, etc.). Then you can arrange your treatments into blocks so
that the area within each block is as uniform as possible (see figure 7). Though the area within
a block should be relatively uniform, there may be large differences among the blocks, but
that is what makes blocking effective. Your goal is to maximize the differences among blocks
while minimizing the differences within a block.
The shape of the blocks is not important as long as the plots within a block are as
uniform as possible. Ideally, the only differences among plots within a block should be due to
the treatments. Blocks in field experiments are usually square or rectangular, but they may
be any shape. Blocks in the same experiment do not have to be the same shape; the shape of
individual blocks will be determined by variation in the field that you are trying to minimize.
If you are not sure what shape your blocks should be, square or nearly square blocks are
usually a safe choice.
Blocks may be arranged through the field in many ways. If the field is wide enough, an
easy way to arrange blocks is to place them side-by-side all the way down the field (see figure
8). But blocks do not have to be contiguous and may be scattered through the field in any way
that is convenient for you.
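The assignment of treatments to plots in a randomized complete block design can be sketched as follows. The function name, the treatment letters and the seed are hypothetical; the only idea taken from the text is that every treatment appears once in each block and is randomized separately within each block.

```python
import random

def randomized_complete_blocks(treatments, n_blocks, seed=None):
    """Return one independently shuffled copy of the treatment list per block."""
    rng = random.Random(seed)  # a fixed seed makes the layout reproducible
    layout = []
    for _ in range(n_blocks):
        order = list(treatments)
        rng.shuffle(order)
        layout.append(order)
    return layout

layout = randomized_complete_blocks(["A", "B", "C", "D"], n_blocks=3, seed=1)
for number, block in enumerate(layout, start=1):
    print(f"Block {number}: {' '.join(block)}")
```

Recording the seed alongside the field map is a convenient way to show later how the randomization was produced.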
Factorial Arrangement of Treatments
A factorial arrangement of treatments is not an experimental design, though you will
often hear it referred to as a factorial design or a factorial experiment. A factorial
arrangement of treatments means that the experiment is testing two or more factors at the
same time, and that the experiment includes all combinations of all factors. The term "factor"
is used to describe a group of treatments that have something in common. Fungicides,
sources of nitrogen, or corn hybrids could be considered factors in an experiment. Factors
may be defined broadly or narrowly in different experiments. All herbicides may be grouped
as a factor in one experiment, but pre-plant and post-plant herbicides may be treated as
separate factors in another experiment. A single-factor experiment tests one factor at a time;
a two-factor experiment tests two factors at once.
Figure 9. A 2x5 factorial arrangement of treatments in a randomized complete block design
(above) and in a split-plot design (below). A and B represent two levels of one factor, and
the numbers (1-5) represent five levels of a second factor. The combinations (e.g., 4A, 5B,
etc.) denote individual treatment combinations. Either experimental design could be used,
but the randomized complete block design is preferred unless the split-plot design is
required by some limitation on randomization.
Most simple on-farm experiments are single-factor experiments (in a Completely
Randomized or Randomized Complete Block design) and compare things such as crop
varieties or herbicides, but it is sometimes useful to test two or more factors at once. For
example, a two-factor experiment would allow you to compare the yields of five corn hybrids at
three planting dates.
This accomplishes three things at once:
➢ It allows you to compare the corn hybrids with each other.
➢ It allows you to evaluate the effect of planting date.
➢ It allows you to determine if varying the planting date changes the relative
performance of the hybrids (e.g. one hybrid may only perform well if planted early).
The first two could be done in separate single-factor experiments, but the third can
only be achieved by having both factors in a single experiment. This becomes especially
important if one factor can have a significant influence on the effect of the other factor. For
example, you might test soybean varieties as one factor and nematicides as another factor. If
a few varieties have good nematode resistance but others do not, they may appear equally
good when effective nematicides are used but varieties with resistance would appear much
better when nematicides are not used. In cases like this, the effect of one factor (variety) is
strongly influenced by the other factor (nematicide). When one factor influences the effect
of the other factor, there is said to be a significant interaction between the two factors. It can
be very important to know if there is an interaction between factors, because if there is an
interaction, you can make predictions or recommendations based on the results of single-
factor experiments only when all other factors are at the same levels they were at in the
experiment. If you change some factor not included in the experiment, the results from your
single-factor experiment may no longer be valid.
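The soybean and nematicide example can be made concrete with a few invented numbers. The mean yields below are hypothetical and do not come from any trial; the sketch only shows how an interaction appears as a difference between the effects of one factor at each level of the other. Whether such a difference is statistically significant still has to be decided by a formal analysis.

```python
# Hypothetical mean yields (arbitrary units); not data from a real trial.
mean_yield = {
    ("resistant", "nematicide"): 50.0,
    ("resistant", "none"): 48.0,
    ("susceptible", "nematicide"): 49.0,
    ("susceptible", "none"): 30.0,
}

def nematicide_effect(variety):
    """Yield gain from applying nematicide, for one variety."""
    return mean_yield[(variety, "nematicide")] - mean_yield[(variety, "none")]

def interaction_contrast():
    """Difference between the two nematicide effects.

    A value of zero would mean the nematicide changes yield by the same
    amount for both varieties, i.e. no interaction in these means.
    """
    return nematicide_effect("susceptible") - nematicide_effect("resistant")

print(nematicide_effect("resistant"))    # 2.0
print(nematicide_effect("susceptible"))  # 19.0
print(interaction_contrast())            # 17.0
```

With nematicide the two varieties look almost equal (50 versus 49), but without it they differ widely, which is the pattern described in the paragraph above.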
With a factorial arrangement of treatments, all values (or levels) of each factor must
be paired with all levels of the other factors. If you have two nematicides and five soybean
varieties, then your treatment list must include each variety with each nematicide for a total
of 10 treatments. This would be referred to as a "two by five factorial" to denote how many
factors were present in the experiment and how many levels of each factor were used. The
number of treatments increases quickly when you add more levels for a factor (if you used
three nematicides instead of two, you would have 15 treatments instead of 10), so choose
your levels carefully or the experiment can get too large to manage.
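The treatment list for a factorial arrangement is simply every combination of the factor levels. A minimal sketch, using placeholder level names (N1, V1, and so on) that are not from the lesson:

```python
from itertools import product

def factorial_treatments(*factors):
    """Return every combination of one level from each factor."""
    return list(product(*factors))

nematicides = ["N1", "N2"]                  # placeholder level names
varieties = ["V1", "V2", "V3", "V4", "V5"]  # placeholder level names

treatments = factorial_treatments(nematicides, varieties)
print(len(treatments))  # 10 treatments in a two by five factorial

# Adding a third nematicide raises the count to 3 x 5 = 15.
print(len(factorial_treatments(nematicides + ["N3"], varieties)))  # 15
```

Because the count is the product of the numbers of levels, each extra level of one factor adds as many treatments as there are combinations of the remaining factors.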
A factorial arrangement of treatments can be a very powerful tool, but because the
number of treatments can get very large it is best used when some reason exists to believe
that the factors may influence each other and have a significant interaction. If there is no
suspicion that the factors may influence each other, it is frequently easier and more thorough
to test the factors in separate experiments. A factorial arrangement of treatments can be
used with a completely randomized experimental design or a randomized complete block
design. The top half of figure 9 shows a factorial arrangement of treatments in a randomized
complete block design.
Split-Plot Experimental Design
A split-plot experimental design is a special design that is sometimes used with
factorial arrangements of treatments. This design usually is used when an experiment has at
least two factors and some constraint prevents you from randomizing the treatments into a
randomized complete block design. Such a constraint may be based on equipment limitations
or on biological considerations. For example, the equipment you have may make it difficult to
put out a soil fumigant in randomized complete blocks, but you may be able to put out the
fumigant so that all treatments within a block that get the fumigant will be clustered together
rather than scattered throughout the block. You can use a split-plot experimental design to
work around this limitation as long as you are able to randomize the other factors. There are
other situations when this design is appropriate, but a constraint on randomization is the
most likely to occur.
Suppose you want to test the effect of five fungicides to control Cylindrocladium Black
Rot on two varieties of peanut. In this test, you would have a 2x5 factorial arrangement of
treatments: The two factors would be varieties (2 levels of this factor) and fungicides (5 levels
of this factor). Because a factorial arrangement of treatments is not an experimental design,
you still have to select an experimental design that best meets your needs. If you are able to
randomize varieties and fungicides within a block, then you should pick a randomized
complete block design. If there is some reason why you cannot completely randomize the
treatments within each block, then you may be able to use a split-plot design to work around
that limitation. For example, you may have a six-row planter but only enough space in the
field to put out four-row plots. To resolve this dilemma, you could plant all of the plots that
have the same peanut variety together within a block and then randomize the five fungicide
treatments within each peanut variety.
In split-plot designs, the terms "whole plots" and "sub-plots" refer to the plots into
which the factors are randomized. As the names imply, whole plots are subdivided into
subplots. In figure 9, a whole plot would be the areas designated with A or B, and the subplots,
the subdivisions within the whole plots, are designated 1, 2, 3, 4, or 5. In this example, A and
B could represent two varieties (two levels of one factor) and the numbers could represent
different fungicides (five levels of a second factor). Each whole plot serves as a block for the
subplot treatments.
To assign treatments in a split-plot design, start by identifying where each block will
be. Then randomize the whole plot treatments within each block. The whole plot treatments
will be the levels of the factor that you are unable to randomize into a randomized complete
block design. The subplot treatments can then be randomized within each whole plot treatment
(see figure 9).
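The two-stage randomization described above can be sketched in a few lines. The function name and seed are hypothetical; the levels A and B and 1 to 5 follow the labels used in figure 9.

```python
import random

def split_plot_layout(whole_plot_levels, subplot_levels, n_blocks, seed=None):
    """Randomize whole plots within each block, then subplots within each whole plot."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        whole_plots = list(whole_plot_levels)
        rng.shuffle(whole_plots)          # stage 1: whole plots within the block
        block = []
        for whole in whole_plots:
            subplots = list(subplot_levels)
            rng.shuffle(subplots)         # stage 2: subplots within this whole plot
            block.append([f"{sub}{whole}" for sub in subplots])
        layout.append(block)
    return layout

layout = split_plot_layout(["A", "B"], [1, 2, 3, 4, 5], n_blocks=3, seed=7)
for number, block in enumerate(layout, start=1):
    print(f"Block {number}: " + " | ".join(" ".join(wp) for wp in block))
```

Unlike a randomized complete block layout, all subplots sharing a whole plot level stay next to each other, which is what accommodates the constraint on randomization.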
Experimental Analysis
The type of analysis to be conducted depends on the purpose and design of the trial
and the type of observations made. Statistical analysis is not required in all cases; nor is it
appropriate in certain situations. However, when a comparison between two products or
between one product and no treatment is required, statistical analysis must be provided to
support the interpretation of the data. Novel statistical analysis submitted in support of
experimental data should be accompanied by the raw data and the published literature that
references the statistical technique. This guideline cannot describe all analytical approaches
for all trial designs, but aims to provide some principles of analysis to assist applicants. If you
are not confident of your knowledge in this area, it is highly advisable to seek the assistance
of a competent statistician before starting trial work.
Typically, it is the variable that determines which broad type of analysis is required
(that is, parametric or non-parametric). If the variable is quantitative (binary, binomial,
discrete or continuous), parametric statistical methods should be used, such as analysis of
variance or linear or logistic regression. If the variable is qualitative (nominal or ordinal,
such as ranks or scores), non-parametric methods are required.
Before conducting a parametric analysis of variance, three assumptions should be met to
ensure that the analysis is valid:
➢ additivity of effects
➢ homogeneity of variance
➢ normality of the error.
If these three assumptions cannot be met, non-parametric methods may be preferred.
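The decision rule in this passage can be written out as a small function. This is a simplified sketch: the function name and the dictionary of assumption checks are hypothetical, the grouping of variable types follows the text above, and the text's "may be preferred" is treated here as a firm choice.

```python
QUANTITATIVE = {"binary", "binomial", "discrete", "continuous"}
QUALITATIVE = {"nominal", "ordinal"}
ASSUMPTIONS = ("additivity", "homogeneity of variance", "normality of error")

def choose_analysis(variable_type, assumptions_met):
    """Return 'parametric' or 'non-parametric' following the rule described above.

    assumptions_met maps each name in ASSUMPTIONS to True or False;
    a missing entry is treated as not met.
    """
    if variable_type in QUALITATIVE:
        return "non-parametric"
    if variable_type not in QUANTITATIVE:
        raise ValueError(f"unknown variable type: {variable_type!r}")
    if all(assumptions_met.get(name, False) for name in ASSUMPTIONS):
        return "parametric"
    return "non-parametric"

all_met = {name: True for name in ASSUMPTIONS}
print(choose_analysis("continuous", all_met))                         # parametric
print(choose_analysis("continuous", {**all_met, "additivity": False}))  # non-parametric
print(choose_analysis("ordinal", all_met))                            # non-parametric
```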
Parametric tests
Additivity requires that the sources of variability (e.g. treatments) are independent of
each other. Independence results in an additive (rather than, e.g., multiplicative or
logarithmic) effect on the response variable (e.g. pest population). The more variables
interact with each other, the greater the chance that the observed response is not the result
of the individual treatments, which may invalidate the observed results. Sometimes, effect results are not on a natural scale and
must be transformed to different scales (for example, probit or logit) to meet additivity
requirements. Methods to test additivity are available (such as Tukey’s test of additivity).
Homogeneity of variance requires that all the populations tested contain the same
level of variability. The less homogeneity between variances of populations being compared,
the less likely it is that a parametric method will be able to accurately produce a significant
result. There are many tests used to test homogeneity of variance, each with advantages and
disadvantages.
Normality requires the distribution of errors (variance around the mean) to be normal.
Normal distribution is important because the further the distributions are from normal, the
less validity any analysis of variance assessments will have, as there is a greater chance that a
significant result will be false (and vice versa). Standard tests and graphical displays are
available to demonstrate normality.
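As a rough first look at homogeneity of variance, the sample variances of the groups can be compared directly. The sketch below computes the ratio of the largest to the smallest variance, which is the statistic used in Hartley's F-max test; it does not look up critical values, so it is a screening aid rather than a formal test. The function name and the observations are hypothetical.

```python
from statistics import variance

def variance_ratio(groups):
    """Ratio of the largest to the smallest sample variance across groups."""
    variances = [variance(g) for g in groups]
    return max(variances) / min(variances)

# Hypothetical observations for three treatments.
groups = [
    [10.0, 12.0, 14.0],  # sample variance 4
    [20.0, 21.0, 22.0],  # sample variance 1
    [5.0, 9.0, 13.0],    # sample variance 16
]
print(variance_ratio(groups))  # 16.0
```

A ratio close to 1 is consistent with similar variability in all groups; a large ratio such as the one here suggests that the assumption deserves a formal check.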
Analysis of variance
When reporting the results of an analysis of variance (ANOVA), you should present a
table of means of each of the treatments, along with the standard error or confidence interval
(the variability around the mean). Presenting means with the variability of results can
overcome the difficulty of explaining statistically equivalent results when differences
between means are large (and vice versa).
Formal statistical tests, often F-tests, are usually also performed to demonstrate
any significant differences between treatments. Typically, study reports present an analysis to
compare all treatment means against each other. In considering the original objective of the
trial, this may not be necessary and may confuse the analysis and interpretation. For example,
not all treatments need be compared against each other, especially if the comparison of
interest is only a limited set of treatments, such as the new product versus the industry
standard at proposed label rates. If the trial is designed for this purpose, these matters should
be considered at the trial design stage (for example, t-tests may be appropriate). You should
consult appropriate texts and professional statisticians if you are unsure of the most
appropriate test or procedure.
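A minimal sketch of the two outputs discussed above, a table of treatment means with standard errors and an F statistic, is given below for the simplest case of a one-way analysis in a completely randomized design. The yields are hypothetical, the function names are assumptions, and a blocked or factorial trial would need additional terms that this sketch does not include.

```python
from math import sqrt
from statistics import mean, stdev

def treatment_summary(groups):
    """Return {name: (mean, standard error of the mean)} for each treatment."""
    return {
        name: (mean(obs), stdev(obs) / sqrt(len(obs)))
        for name, obs in groups.items()
    }

def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance."""
    all_obs = [x for obs in groups.values() for x in obs]
    grand_mean = mean(all_obs)
    ss_between = sum(
        len(obs) * (mean(obs) - grand_mean) ** 2 for obs in groups.values()
    )
    ss_within = sum(
        (x - mean(obs)) ** 2 for obs in groups.values() for x in obs
    )
    df_between = len(groups) - 1
    df_within = len(all_obs) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical yields, for illustration only.
yields = {
    "untreated": [4.0, 5.0, 6.0],
    "standard": [6.0, 7.0, 8.0],
    "new product": [8.0, 9.0, 10.0],
}

for name, (m, se) in treatment_summary(yields).items():
    print(f"{name:<12} mean = {m:.1f}  SE = {se:.2f}")
print(one_way_f(yields))  # (12.0, 2, 6)
```

The F value would then be compared with the F distribution on 2 and 6 degrees of freedom, using statistical tables or software, to judge significance.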
Non-parametric tests
Non-parametric methods may be required and are preferred when the data are
qualitative rather than quantitative or the three assumptions described above cannot be met.
However, non-parametric methods should be used with caution when analyzing small data
sets. There are a number of different non-parametric methods, many of which are suitable
only in certain situations. You may wish to refer to the EPPO’s Design and analysis of efficacy
evaluation trials (PP 1/152(4)) for references describing which test is relevant
to a particular type of data set.
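As one illustration of a rank-based method (chosen here as an example; the text above does not name a specific test), the Mann-Whitney U statistic for two independent groups can be computed by counting pairs. The scores are hypothetical, and judging significance would still require tables or statistical software.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return U for sample_a: each pair with a > b counts 1, each tie counts 0.5."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

# Hypothetical ordinal scores (e.g. a 1-5 damage rating), for illustration only.
treated = [1, 2, 2, 3]
untreated = [3, 4, 4, 5]

u_treated = mann_whitney_u(treated, untreated)
u_untreated = mann_whitney_u(untreated, treated)
print(u_treated, u_untreated)  # 0.5 15.5
```

The two U values always add up to the number of pairs (here 4 x 4 = 16), which provides a quick arithmetic check.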
Trial series
You may need to conduct separate but closely similar trials at different locations
and/or at different times. The series of trials can be analyzed together in certain
circumstances (for example, if they have the same methods, external impacting factors and
pest abundance and similar standard error) and for particular reasons (for example, to
estimate treatment effects over sites and years or to test potential confounding factors). Such
an analysis should not be conducted unless it has been planned for at the trial design stage
so that all requirements can be met. See EPPO guideline PP 1/152(4) for more
details.
Interpretation of Experimental Data
This section answers the question, “So what?” in relation to the results of the study.
What do the results of the study mean? This part is, perhaps, the most critical aspect of the
research report. It is often the most difficult to write because it is the least structured. This
section demands perceptiveness and creativity from the researcher.
How do we interpret the result(s) of our study?
1. Tie up the results of the study in both theory and application by pulling together the:
a. conceptual/theoretical framework
b. review of literature; and
c. the study’s potential significance for application
2. Examine, summarize, interpret and justify the results; then draw inferences. Consider
the following:
a. Conclude or summarize
• This technique enables the reader to get the total picture of the findings
in summarized form, and helps orient the reader to the discussion that follows.
b. Interpret
• Questions on the meaning of the findings, the methodology, the
unexpected results and the limitations and shortcomings of the study
should be answered and interpreted.
c. Integrate
• This is an attempt to put the pieces together.
• Often, the results of a study are disparate and do not seem to “hang
together”. In the discussion, attempt to bring the findings together to
extract meaning and principles.
d. Theorize
When the study includes a number of related findings, it occasionally becomes
possible to theorize.
• Integrate your findings into a principle;
• Integrate a theory into your findings; and
• Use these findings to formulate an original theory
e. Recommend or apply alternatives
REVIEW QUESTIONS FOR DISCUSSIONS
Direction: Answer the following questions:
1. What are the advantages and disadvantages of using survey instruments?
2. When preparing a questionnaire, what types of questions should be avoided?
3. When are tables effective in presenting data? Specify.
4. What is a good table design? Discuss briefly.
5. What are the different types of graphs for presenting data? Describe each in detail.
6. What are the most commonly used descriptive and inferential statistics in the analysis
of data? Discuss briefly.
7. What are the different experimental designs used to analyze the results? Discuss
briefly.
8. How do we interpret the result(s) of our study?