5. Basic Statistics
• The median is the middle value, so I'll have to rewrite the list in order:
13, 13, 13, 13, 14, 14, 16, 18, 21
There are nine numbers in the list, so the middle one is the (9 + 1) ÷ 2 = 10 ÷ 2 = 5th number:
13, 13, 13, 13, [14], 14, 16, 18, 21
So the median is 14.
• The mode is the number that is repeated more often than any other, so 13 is the mode.
• The largest value in the list is 21, and the smallest is 13, so the range is 21 – 13 = 8.
• mean: 15
median: 14
mode: 13
range: 8
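The four summary values above can be checked with Python's standard `statistics` module. This is a minimal sketch that uses the same nine numbers; the variable names (`data`, `value_range`) are our own choices for illustration.

```python
import statistics

# The nine values from the worked example
data = [13, 13, 13, 13, 14, 14, 16, 18, 21]

mean = statistics.mean(data)          # sum of values / count = 135 / 9
median = statistics.median(data)      # middle (5th) value of the sorted list
mode = statistics.mode(data)          # most frequently repeated value
value_range = max(data) - min(data)   # largest value minus smallest value

print(f"mean: {mean}, median: {median}, mode: {mode}, range: {value_range}")
```

`statistics.median` sorts the data itself, so the list does not have to be ordered first.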
• Besides the center of the data, we also want to know how
widely it is dispersed, that is, what range of values it covers.
• Standard deviation (SD) shows how much variation
or dispersion exists from the average (mean), or
expected value.
• A low standard deviation indicates that the data points
tend to be very close to the mean; a high standard
deviation indicates that the data points are spread out
over a large range of values.
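To make the low/high contrast concrete, here is a small sketch with two hypothetical data sets (invented for illustration) that share the same mean of 15 but differ in spread. It uses `statistics.stdev`, which computes the sample standard deviation.

```python
import statistics

# Two hypothetical data sets with the same mean (15) but different spread
close_to_mean = [14, 15, 15, 16]
spread_out = [5, 10, 20, 25]

low_sd = statistics.stdev(close_to_mean)   # values hug the mean, so SD is small
high_sd = statistics.stdev(spread_out)     # values lie far from the mean, so SD is large

print(statistics.mean(close_to_mean), statistics.mean(spread_out))
print(f"low SD: {low_sd:.2f}, high SD: {high_sd:.2f}")
```

Both sets have the same center, so only the standard deviation tells them apart.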
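Formula:
Standard deviation: $s = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
Variance: $s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
where $x_i$ are the data values, $\bar{x}$ is the mean, and $n$ is the number of values.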
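As a worked check, the sketch below applies the sample formulas $s^2 = \sum (x_i - \bar{x})^2 / (n - 1)$ and $s = \sqrt{s^2}$ step by step to the nine values from the median example. We assume the sample (n − 1) form because the notes use the symbol s; dividing by n instead would give the population variance.

```python
import math

# Same nine values as the median/mode example
data = [13, 13, 13, 13, 14, 14, 16, 18, 21]
n = len(data)

mean = sum(data) / n                                  # x-bar = 135 / 9 = 15
squared_deviations = [(x - mean) ** 2 for x in data]  # (x_i - x-bar)^2 for each value
variance = sum(squared_deviations) / (n - 1)          # s^2 = 64 / 8
sd = math.sqrt(variance)                              # s = square root of the variance

print(f"variance s^2 = {variance}, standard deviation s = {sd:.3f}")
```

The squared deviations sum to 64, so the variance is 64 / 8 = 8 and the standard deviation is √8 ≈ 2.83.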
Correlation and regression
• Another purpose of statistical analysis is
measuring relationships, that is, trying to establish
whether the value of one variable is related or
connected to a second variable.
• The correlation coefficient is one way of
measuring the relationship between paired data.
• It indicates whether one variable is correlated
with the second one.
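The notes do not say which correlation coefficient is meant; a common choice is Pearson's r, sketched below under that assumption. The function name `pearson_r` and the study-hours data are hypothetical, chosen only for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired values xs[i], ys[i]."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of paired deviations: positive when x and y move together
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of each variable's sum of squared deviations
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical paired data: hours studied and test score for five students
hours = [1, 2, 3, 4, 5]
scores = [55, 60, 70, 75, 85]
print(round(pearson_r(hours, scores), 3))
```

Pearson's r always lies between −1 and +1: values near +1 indicate a strong positive correlation, values near −1 a strong negative correlation, and values near 0 little or no linear correlation.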
Simple linear regression
• Example:
• The time each student spends studying compared with the student's final score.
• To find the answer, the amount of time that each
student spent studying would be paired with that
student's score.
• Higher values of time → higher test scores
(strong positive correlation).
• Higher values of time → lower test scores (strong
negative correlation).
• Higher values of time → some higher scores and
some lower scores (no correlation).
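A simple linear regression goes one step further than the correlation coefficient and fits a straight line, score = a + b × hours, to the paired data. The sketch below uses ordinary least squares, which we assume is the intended fitting method; `fit_line` and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope: change in score per extra hour of study
    a = mean_y - b * mean_x    # intercept: predicted score at 0 hours
    return a, b

# Hypothetical study-time data (hours) and final scores
hours = [1, 2, 3, 4, 5]
scores = [55, 60, 70, 75, 85]
a, b = fit_line(hours, scores)
print(f"score = {a:.1f} + {b:.1f} * hours")
```

A positive slope b corresponds to a positive correlation and a negative slope to a negative one.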
• One problem with the correlation coefficient is that it
only deals with two factors and does not take into
account the influence that other factors may have.
• A multivariate regression uses data sets that
have several independent attribute values (three in the formula below).
• The result of a multivariate regression is a formula
that is used to predict the dependent
variable based on two or more independent variables.
Formula of multivariate regression:
$Y = A + BX_1 + CX_2 + DX_3$
Y = the dependent variable that we are trying to
estimate or predict
A = the intercept, which could be thought of as the
scale's starting point: the predicted value of Y
when all the independent variables are zero.
B, C, D = the coefficients that give how much Y
changes for a one-unit change in X1, X2, and X3.
X1, X2, X3 = the independent variables
that we are using to describe, estimate, or predict
the dependent variable Y.
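The coefficients A, B, C and D of $Y = A + BX_1 + CX_2 + DX_3$ are usually estimated by least squares. The sketch below assumes that method and solves the normal equations $(X^T X)\beta = X^T y$ in plain Python. The data are hypothetical and were generated without noise from Y = 2 + 3X1 − X2 + 0.5X3, so the fit should recover those coefficients; `solve` and `fit_multiple_regression` are illustrative names.

```python
def solve(matrix, rhs):
    """Solve matrix @ v = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(map(float, row)) + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system
    v = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * v[c] for c in range(r + 1, n))
        v[r] = (aug[r][n] - s) / aug[r][r]
    return v

def fit_multiple_regression(rows, y):
    """Least-squares estimates [A, B, C, D] for Y = A + B*X1 + C*X2 + D*X3."""
    design = [[1.0, *row] for row in rows]  # leading 1 column produces the intercept A
    k = len(design[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical observations of (X1, X2, X3)
rows = [(1, 2, 0), (2, 1, 1), (3, 4, 2), (4, 3, 5), (5, 6, 3), (6, 5, 4)]
y = [2 + 3 * x1 - 1 * x2 + 0.5 * x3 for x1, x2, x3 in rows]

A, B, C, D = fit_multiple_regression(rows, y)
print(f"Y = {A:.2f} + {B:.2f}*X1 + {C:.2f}*X2 + {D:.2f}*X3")
```

With real, noisy data the recovered coefficients would only approximate the underlying ones. For larger problems a library routine such as NumPy's `numpy.linalg.lstsq` is numerically safer than forming the normal equations by hand.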
• The formula will result in a linear relationship.
• Most relationships are not perfectly linear.
• If we create a plot of all the points used
in the formula, they will not line up exactly.
• The difference between the line and each
point is the error.
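The error for each point can be computed directly as observed value minus predicted value. This sketch reuses hypothetical study-time data; the line score = 46.5 + 7.5 × hours is the ordinary least-squares line for exactly these five points.

```python
# Hypothetical study-time data (hours) and final scores
hours = [1, 2, 3, 4, 5]
scores = [55, 60, 70, 75, 85]
a, b = 46.5, 7.5   # intercept and slope of the least-squares line for these points

predicted = [a + b * x for x in hours]                # points on the line
errors = [y - p for y, p in zip(scores, predicted)]   # error = observed - predicted
sse = sum(e ** 2 for e in errors)                     # sum of squared errors

print("errors:", errors)
print("sum of squared errors:", sse)
```

For a least-squares line with an intercept, the errors always sum to zero; the line is the one that makes the sum of squared errors as small as possible.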
Test for significance
• Significance is a very important concept because
what may seem like a difference to us may not be a
statistically significant difference.
• An example:
• Two sets of yield data: one from
a no-till field (156 bushels) and one from a conventionally
tilled field (166 bushels).
• Is this a real difference? We need to find out whether it is statistically significant.
• A statistical test, e.g. a t-test, can be used to determine whether there is a
statistically significant difference.
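The notes give only the two mean yields, so the sketch below invents five hypothetical replicate yields per field with means of 156 and 166 bushels, then runs a two-sample (pooled) t-test. The pooled form assumes both fields have the same underlying variance. The critical value 2.306 is the two-tailed Student's t value for df = 8 at α = 0.05 and must be looked up again if the sample sizes change.

```python
import math

# Hypothetical replicate yields (bushels); only the means 156 and 166 come from the notes
no_till = [150, 154, 156, 158, 162]
conventional = [160, 164, 166, 168, 172]

def sample_variance(xs):
    """Sample variance with the (n - 1) denominator."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

n1, n2 = len(no_till), len(conventional)
mean1, mean2 = sum(no_till) / n1, sum(conventional) / n2
df = n1 + n2 - 2

# Pooled variance: weighted average of the two sample variances
pooled = ((n1 - 1) * sample_variance(no_till) + (n2 - 1) * sample_variance(conventional)) / df
t = (mean1 - mean2) / math.sqrt(pooled * (1 / n1 + 1 / n2))

T_CRITICAL = 2.306  # two-tailed, alpha = 0.05, df = 8 (from a t table)
print(f"t = {t:.2f}, df = {df}, significant: {abs(t) > T_CRITICAL}")
```

With these invented replicates |t| ≈ 3.54 exceeds 2.306, so the 10-bushel difference would be called significant; with more variable replicates the same difference in means might not be. In practice a library function such as SciPy's `scipy.stats.ttest_ind` also reports the p-value.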
• Research makes significant use of statistics.
• The ability to prove and disprove a hypothesis is
based on the objectivity provided by statistics.
• The objectivity is based on valid data collection,
the use of statistics, and the replication and
control of independent variables.
• Data should be collected in an unbiased manner, and
samples should be taken accurately.
• A research project done only once, without
replication, has little validity.