5. Basic Statistics


BASIC STATISTICS

Importance of Mathematics in Agriculture

• An understanding of statistics and the fundamentals of algebra is needed.
• Correlation, regression and predictive
modeling are needed to analyze and interpret
spatial data.
Independent and dependent variables
• A variable is a factor or event that can take different values; in other words, it varies.
• Example: soil pH ranges from 5 to 8, which makes soil pH a variable.
• A variable whose actual value is unknown is represented by a letter such as X.
• A dependent variable is an event or factor that is affected by, or depends on, other factors.
• An independent variable is a factor that does not depend on other variables for its value.
• Crop yield depends on many variables, such as soil pH and soil moisture.
• Soil pH and soil moisture are independent variables because yield does not change pH, although pH may change yield.
Discrete vs. Continuous Data
• Discrete: variables that can take only a finite number of values are called "discrete variables." People and machinery can be counted as discrete variables, as can temperature rounded to the nearest degree: 45.6 °C rounds to 46 °C.
• Continuous: continuous variables can take an infinite number of values between any two whole numbers.
[Figure: Discrete and continuous data]
Types of data
1. Nominal data
2. Ordinal data
3. Interval data
4. Ratio data
1. Nominal Data
- Nominal data are data that have no numerical value.
- They may be colors, shapes, or brands; even when numbers are used, the numbers carry no value and might as well be words or letters.
- A nominal value is used only as an identifier, without ranking or size.
- A person's social security number is also nominal data.
2. Ordinal Data
• Ordinal data are data that can be placed in a ranked sequence; they classify sets that have an order structure, although the gaps between ranks are not necessarily equal.

Example: Rank 1st, …, 100th, …


3. Interval data
• Interval data not only provide a ranked order but also a specific, consistent scale of measurement.
• Temperature is an example.
• The Fahrenheit scale is interval data because it provides an order as well as a consistent scale.
4. Ratio data
• Ratio data are real values that can be compared to each other as multiples (one value can be twice another), because the scale has a true zero.
• Example: a soil phosphorus level of 44 ppm is twice that of a soil with 22 ppm.
• By contrast, 44 °C is not twice 22 °C, because the Celsius scale is interval data.
• Likewise, a sports team ranked 100th is not "twice" a team ranked 50th, because ranks are ordinal data.
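To make the ratio-versus-interval distinction concrete, the sketch below converts the two Celsius temperatures to kelvin, a scale with a true zero, and compares the ratios. The helper name `celsius_to_kelvin` is illustrative, not from the original slides.

```python
def celsius_to_kelvin(c: float) -> float:
    """Convert degrees Celsius to kelvin (0 K is absolute zero, a true zero)."""
    return c + 273.15

# Phosphorus in ppm is ratio data: the ratio is meaningful.
phosphorus_ratio = 44 / 22

# Celsius is interval data: 44 / 22 = 2 is not a real "twice".
# On the kelvin scale the true ratio is only about 1.07.
true_temperature_ratio = celsius_to_kelvin(44) / celsius_to_kelvin(22)

print(f"phosphorus ratio: {phosphorus_ratio}")
print(f"temperature ratio in kelvin: {true_temperature_ratio:.3f}")
```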
SAMPLING
• Data cannot be collected on every event or object within a study area.
• The next best thing is to select, or sample, some of the events or objects to represent all of them.
• This is called sampling.
• Statistical sampling requires
1. A large number of samples to be representative of the population
2. Random sampling, where each area has an equal chance of being selected, so that the sample is unbiased
3. Sampling from the population, or homogeneous area, to which the results will be applied.
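Requirement 2 (each area has an equal chance of being selected) can be sketched with Python's `random.sample`, which draws without replacement so that every grid cell is equally likely to be chosen. The field of 100 numbered cells and the sample size of 10 are assumptions for illustration only.

```python
import random

# Hypothetical field divided into 100 numbered grid cells (0..99).
cells = list(range(100))

rng = random.Random(42)             # fixed seed so the example is reproducible
selected = rng.sample(cells, k=10)  # 10 distinct cells, each equally likely

print(sorted(selected))
```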
• Sampling techniques:
1. Grid tessellation
- It is typically used to establish a systematic pattern for determining regular or irregular sampling points, with a sample taken from each grid cell.
2. Grid point sampling
The process is the same as grid tessellation, but instead of assuming that the entire cell has the same nutrient value, the nutrient value is applied to the point at which the sample was taken.
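A regular grid tessellation can be sketched as a function that lays square cells over a rectangular field and returns one sampling point per cell (here, the cell centre). The function name, the field size, and the choice to ignore partial cells at the edge are all assumptions of this sketch.

```python
def grid_cell_centres(width_ft, height_ft, cell_ft):
    """Return (x, y) centres of square grid cells covering a rectangular field.

    Partial cells at the field edge are ignored in this sketch.
    """
    n_cols = int(width_ft // cell_ft)
    n_rows = int(height_ft // cell_ft)
    return [
        ((col + 0.5) * cell_ft, (row + 0.5) * cell_ft)
        for row in range(n_rows)
        for col in range(n_cols)
    ]

# Hypothetical 400 ft x 300 ft field with 100 ft cells -> 4 x 3 = 12 points.
points = grid_cell_centres(400, 300, 100)
print(len(points), points[0])  # 12 (50.0, 50.0)
```

With grid tessellation, the lab value from each point would be assigned to the whole cell; with grid point sampling, it would be kept at the (x, y) point itself.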
Number of samples
• The size of the grid cell can range from 1 to 10
acres depending on the variability and size of
the field.
• The greater the variability, the more samples
need to be taken and the smaller the grid cell
needs to be.
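The arithmetic behind the cell-size trade-off is simple: the number of samples is the field area divided by the cell area, and a square cell's side follows from 1 acre = 43,560 square feet. The 160-acre field below is hypothetical, and the function names are illustrative.

```python
import math

SQ_FT_PER_ACRE = 43_560  # 1 acre = 43,560 square feet

def samples_needed(field_acres: float, cell_acres: float) -> int:
    """One sample per grid cell; a partial cell still needs a sample."""
    return math.ceil(field_acres / cell_acres)

def cell_side_ft(cell_acres: float) -> float:
    """Side length of a square grid cell of the given area."""
    return math.sqrt(cell_acres * SQ_FT_PER_ACRE)

# Hypothetical 160-acre field; smaller cells suit more variable fields.
for cell in (1, 2.5, 5, 10):
    print(f"{cell} acre cells: {samples_needed(160, cell)} samples, "
          f"side about {cell_side_ft(cell):.0f} ft")
```

Going from 10-acre to 1-acre cells multiplies the number of samples by ten, which is the cost of resolving more variability.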
Unbiased samples
• Random sampling assures an unbiased sample.
• Three methods for assuring unbiased samples:
1. Centre method
Takes a sample in the center of the grid cell.
2. Offset method
Creates a diamond pattern by taking the sample a certain distance offset from the center.
3. Sample collection technique
Standard procedure calls for taking at least 10 samples from various locations within a 10-foot radius of the sampling point to create one composite sample.
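The composite-sample procedure (at least 10 cores within a 10-foot radius of the sampling point) can be sketched by drawing core locations uniformly inside a circle. Taking the square root of a uniform number for the radius keeps the points evenly spread over the disk rather than bunched at the centre. The sampling point at the centre of a hypothetical 330 ft cell (centre method) and the function name are assumptions.

```python
import math
import random

def composite_core_locations(x0, y0, n_cores=10, radius_ft=10.0, seed=None):
    """Random core locations uniformly distributed within radius_ft of (x0, y0)."""
    rng = random.Random(seed)
    cores = []
    for _ in range(n_cores):
        r = radius_ft * math.sqrt(rng.random())      # sqrt gives uniform area density
        theta = rng.uniform(0.0, 2.0 * math.pi)
        cores.append((x0 + r * math.cos(theta), y0 + r * math.sin(theta)))
    return cores

# Centre method: sampling point at the centre of a hypothetical 330 ft x 330 ft cell.
cores = composite_core_locations(165.0, 165.0, seed=1)
for x, y in cores:
    print(f"core at ({x:.1f}, {y:.1f}) ft")
```

The soil from all cores would then be mixed into one composite sample for that point.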
Frequency tables and Graphs
• Frequency is the number of times a value
occurs.
• Frequency tables and graphs can help people visualize the data.
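A frequency table can be built by counting how many times each value occurs. The sketch below uses `collections.Counter` on the list of values from the central-tendency example and prints a crude text bar graph; the table layout is just one possible choice.

```python
from collections import Counter

values = [13, 18, 13, 14, 13, 16, 14, 21, 13]
freq = Counter(values)  # value -> number of times it occurs

print("value  frequency  bar")
for value in sorted(freq):
    print(f"{value:>5}  {freq[value]:>9}  {'*' * freq[value]}")
```

The tallest bar (13, occurring 4 times) is also the mode.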
Statistical techniques
• Frequency graphs provide a method for visualizing data.
• Statistics are used to describe numerically what the frequency graph or curve looks like.
• We can determine the center of the curve (central tendency) and its width (dispersion).
• This is referred to as descriptive statistics.
• Inferential statistics are used when estimating data or making an inference about differences between data sets.
• The frequency graph can be used to visualize these comparisons, and statistics can be used to quantify them.
Central tendency
• Mean, median and mode provide basic
information regarding central tendencies.
• Mean is the average of all the numbers.
• Median is the "middle" value in the ordered list of numbers.
• Mode is the number that is repeated more
often than any other.
• Range is just the difference between the
largest and smallest values.
• Find the mean, median, mode, and range for the following list of values:
13, 18, 13, 14, 13, 16, 14, 21, 13

• The mean is the usual average, so:


(13 + 18 + 13 + 14 + 13 + 16 + 14 + 21 + 13) ÷ 9 = 15

• The median is the middle value, so I'll have to rewrite the list in order:
13, 13, 13, 13, 14, 14, 16, 18, 21
There are nine numbers in the list, so the middle one will be the (9 + 1) ÷ 2 = 10 ÷ 2 = 5th
number:
13, 13, 13, 13, 14, 14, 16, 18, 21. So the median is 14.

• The mode is the number that is repeated more often than any other, so 13 is the mode.

• The largest value in the list is 21, and the smallest is 13, so the range is 21 – 13 = 8.

Answer:

• mean: 15
median: 14
mode: 13
range: 8
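The same worked example can be checked with Python's standard `statistics` module; the range has no built-in function, so it is computed as the largest value minus the smallest.

```python
import statistics

values = [13, 18, 13, 14, 13, 16, 14, 21, 13]

mean = statistics.mean(values)          # 15
median = statistics.median(values)      # 14 (middle of the sorted list)
mode = statistics.mode(values)          # 13 (most frequent value)
value_range = max(values) - min(values)  # 21 - 13 = 8

print(f"mean: {mean}, median: {median}, mode: {mode}, range: {value_range}")
```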
Dispersion
• Besides the center of the data, we also want to know how widely the data are dispersed, i.e. what range of values they cover.
• Standard deviation (SD) shows how much variation
or dispersion exists from the average (mean), or
expected value.
• A low standard deviation indicates that the data points
tend to be very close to the mean; high standard
deviation indicates that the data points are spread out
over a large range of values.
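Formula:

Mean: $\bar{x} = \dfrac{\sum_{i=1}^{N} x_i}{N}$ (the sum of the X values divided by N, the number of values)

Standard Deviation (sample): $s = \sqrt{\dfrac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}}$

Population Standard Deviation: $\sigma = \sqrt{\dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$, where $\mu$ is the population mean

Variance: $s^2$ (the square of the standard deviation)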
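Applying these formulas to the values from the central-tendency example shows the difference between dividing by N − 1 (sample) and by N (population). The sketch computes both by hand and cross-checks them against `statistics.stdev` and `statistics.pstdev`.

```python
import math
import statistics

values = [13, 18, 13, 14, 13, 16, 14, 21, 13]

n = len(values)
mean = sum(values) / n                                # 15.0
squared_deviations = [(x - mean) ** 2 for x in values]  # these sum to 64

sample_variance = sum(squared_deviations) / (n - 1)    # s^2 = 64 / 8 = 8
sample_sd = math.sqrt(sample_variance)                  # s, about 2.83
population_sd = math.sqrt(sum(squared_deviations) / n)  # sigma = 8/3, about 2.67

# Cross-check against the standard library.
assert math.isclose(sample_sd, statistics.stdev(values))
assert math.isclose(population_sd, statistics.pstdev(values))

print(f"s^2 = {sample_variance}, s = {sample_sd:.2f}, sigma = {population_sd:.2f}")
```

The sample version is slightly larger because dividing by N − 1 corrects for estimating the mean from the same data.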
Correlation and regression
• Another purpose of statistical analysis is measuring relationships, i.e. trying to establish whether the value of one variable is related or connected to a second variable.
• The correlation coefficient is one way of measuring the relationship between paired variables.
• It shows whether, and how strongly, one variable is correlated with the second.
Simple linear regression
• Example:
• Relating the time each student spent studying to the student's final test score.
• To find the answer, the amount of time each student spent studying is paired with that student's score.
• If higher study times go with higher test scores, there is a strong positive correlation.
• If higher study times go with lower test scores, there is a strong negative correlation.
• If higher study times go with some high scores and some low scores, there is no correlation.
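The study-time example can be quantified with the Pearson correlation coefficient, r, which runs from −1 (perfect negative) through 0 (none) to +1 (perfect positive). The hours and scores below are made-up illustrative values, not data from the slides, and `pearson_r` is a hypothetical helper name.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two paired lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two paired lists of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical paired data: hours studied and final test score per student.
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 58, 61, 70, 74, 83]

r = pearson_r(hours, scores)
print(f"r = {r:.3f}")  # close to +1: strong positive correlation
```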
• One problem with the correlation coefficient is that it deals with only two factors and does not take into account the influence that other factors may have.
• A multivariate regression uses data sets that have two or more independent attribute values (the formula below uses three).
• The result of a multivariate regression is a formula that is used to predict the dependent variable from the independent variables.
Formula of multivariate regression:
$Y = A + BX_1 + CX_2 + DX_3$
Y = the dependent variable that we are trying to estimate or predict
A = the intercept, which can be thought of as the scale's starting point: the predicted value of Y when all the independent variables are zero
B, C, D = the coefficients giving the change in Y for a one-unit change in X1, X2, and X3 respectively, with the other variables held constant
X1, X2, X3 = the independent variables that we are using to describe, estimate, or predict the dependent variable Y
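The coefficients A, B, C, D are usually found by least squares. The sketch below solves the normal equations (XᵀX)β = Xᵀy with plain Gaussian elimination so it needs no third-party libraries. The data are hypothetical and generated exactly from Y = 2 + 3X1 − X2 + 0.5X3, so the fit should recover those coefficients; real data would leave some error.

```python
def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    A = [list(map(float, M[i])) + [float(b[i])] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("singular system: independent variables are collinear")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= factor * A[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (A[i][n] - sum(A[i][j] * x[j] for j in range(i + 1, n))) / A[i][i]
    return x

def fit_linear(rows, y):
    """Least-squares fit of y = A + B*x1 + C*x2 + ... via the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # prepend a column of 1s for the intercept A
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

# Hypothetical observations of (X1, X2, X3).
rows = [(1, 2, 0), (2, 1, 1), (3, 4, 1), (4, 3, 2), (5, 6, 3), (6, 5, 5)]
y = [2 + 3 * x1 - x2 + 0.5 * x3 for x1, x2, x3 in rows]

A, B, C, D = fit_linear(rows, y)
print(f"Y = {A:.2f} + {B:.2f}*X1 + {C:.2f}*X2 + {D:.2f}*X3")
```

For larger problems a numerical library's least-squares routine would be more robust than forming XᵀX directly.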
• The formula results in a linear relationship.
• Most relationships are not perfectly linear.
• If we plot all the points used in the formula, they will not line up exactly.
• The difference between the line and each point is the error (residual).
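The error between the line and each point can be computed directly. This sketch fits a simple (one-variable) least-squares line to made-up study-time data and lists each point's residual, observed minus predicted. With an intercept in the model, the residuals always sum to zero, so the sum of squared errors is the usual measure of fit.

```python
def simple_linear_fit(xs, ys):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: hours studied vs. final test score.
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 58, 61, 70, 74, 83]

a, b = simple_linear_fit(hours, scores)
residuals = [y - (a + b * x) for x, y in zip(hours, scores)]  # the "error" per point
sse = sum(e ** 2 for e in residuals)                          # sum of squared errors

print(f"score = {a:.2f} + {b:.2f} * hours")
for x, e in zip(hours, residuals):
    print(f"hours={x}: error {e:+.2f}")
print(f"sum of squared errors: {sse:.2f}")
```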
Test for significance
• Significance is a very important concept, because what may seem like a difference to us may not be a statistically significant difference.
• An example:
• Two sets of yield data: one from a no-till field (156 bushels) and one from a conventionally tilled field (166 bushels).
• Is this a real difference? We need to find out whether it is statistically different.
• A statistical test, e.g. a t-test, can be used to determine whether there is a statistically significant difference.
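A t-test needs replicate measurements, not just the two averages. The sketch below assumes five hypothetical plot yields per tillage system, chosen so the means match the 156 and 166 bushels above, and computes Welch's t statistic (a two-sample t-test that does not assume equal variances) and its degrees of freedom. Turning t into a p-value needs the t-distribution, which the standard library lacks, so the result is compared with a t-table instead.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical replicate plot yields (bushels); means are 156 and 166.
no_till = [150, 158, 155, 161, 156]
conventional = [160, 170, 165, 168, 167]

t, df = welch_t(no_till, conventional)
print(f"t = {t:.2f}, df = {df:.1f}")
```

For these made-up yields |t| is about 4.0 with roughly 8 degrees of freedom, well above the two-tailed 5% critical value of about 2.31, so the 10-bushel gap would be judged significant. With noisier replicates the same difference in means might not be.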
Research
• Research makes significant use of statistics.
• The ability to prove and disprove a hypothesis is
based on the objectivity provided by statistics.
• The objectivity is based on valid data collection,
the use of statistics, and the replication and
control of independent variables.
• Data should be taken in an unbiased manner, and samples should be taken accurately.
• A research project done only once, without replication, has little validity.
THANK YOU
