Quantitative Skills 2 Data Analysis
Data Analysis
Data analysis is one of the first steps
toward determining whether an
observed pattern has validity. Data
analysis also helps distinguish among
multiple working hypotheses.
Descriptive statistics serves to
summarize the data. It helps show the
variation in the data, standard errors,
best-fit functions, and confidence that
sufficient data have been collected.
Inferential statistics involves inferring
parameters in the natural population
from a sample.
Most of the data you will collect will fit
into two categories: measurements or
counts.
• median
• mode
• quartiles
• box-and-whisker plots
The median is the value separating the
higher half of a data sample from the
lower half. To find the median of a data
set, first arrange the data in order from
lowest to highest value and then select
the value in the middle.
Unordered: 5, 1, 3, 7, 2
Ordered: 1, 2, 3, 5, 7
median = 3
If there are two values in the middle of
an ordered data set, the median is
found by averaging those two values.
Unordered: 5, 1, 3, 7, 4, 2
Ordered: 1, 2, 3, 4, 5, 7
median = (3 + 4) / 2 = 3.5
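The two median examples above can be checked with a short Python sketch. The helper name median_by_hand is hypothetical and simply follows the steps described: sort the data, then take the middle value or average the two middle values.

```python
from statistics import median

def median_by_hand(values):
    """Sort the data, then pick the middle value (or average the two middle values)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of values: a single middle value exists.
        return ordered[mid]
    # Even number of values: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_sample = [5, 1, 3, 7, 2]
even_sample = [5, 1, 3, 7, 4, 2]

print(median_by_hand(odd_sample))   # 3
print(median_by_hand(even_sample))  # 3.5

# The standard library gives the same answers.
print(median(odd_sample), median(even_sample))
```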
The mode is the value that appears
most frequently in a data set.
3, 5, 1, 3, 7, 2
3 is the mode in this example
because it appears more
frequently than any other
number.
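As a small sketch, the mode can be found with Python's statistics module. The first list is the example from the slide; the second list, bimodal_sample, is hypothetical data made up here to show a data set with two modes.

```python
from statistics import mode, multimode

sample = [3, 5, 1, 3, 7, 2]
print(mode(sample))  # 3

# Hypothetical data with two equally frequent values (a bimodal data set).
bimodal_sample = [1, 2, 2, 3, 4, 5, 5, 6]
print(multimode(bimodal_sample))  # [2, 5]
```

multimode returns every value tied for most frequent, which is why it is used for the second list.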
A bimodal distribution
Data Analysis Flowchart:
Type of Data
• Parametric (normal distribution): mean, standard deviation, standard error
• Nonparametric (not a normal distribution): median, mode, quartiles
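One way to picture the flowchart is as a function that picks the summary statistics based on whether the data look normal. This is only a sketch: the function name describe and the is_normal flag are hypothetical, the sample values are made up, and deciding whether data are normal is left to the person looking at the histogram.

```python
from math import sqrt
from statistics import mean, median, multimode, quantiles, stdev

def describe(values, is_normal):
    """Return the descriptive statistics the flowchart suggests for each branch."""
    if is_normal:
        # Parametric branch: mean, standard deviation, standard error.
        sd = stdev(values)  # sample standard deviation (n - 1 in the denominator)
        return {
            "mean": mean(values),
            "standard deviation": sd,
            "standard error": sd / sqrt(len(values)),
        }
    # Nonparametric branch: median, mode, quartiles.
    # Note: quartile conventions differ; this uses the library's default method.
    q1, q2, q3 = quantiles(values, n=4)
    return {
        "median": median(values),
        "mode": multimode(values),
        "quartiles": (q1, q2, q3),
    }

# Hypothetical measurements, used only to show both branches.
print(describe([2, 4, 4, 4, 5, 5, 7, 9], is_normal=True))
print(describe([1, 2, 2, 3, 4, 6, 7], is_normal=False))
```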
Example of Data Analysis:
Do shady English ivy
leaves have a larger
surface area than sunny
English ivy leaves?
Since the data collected is in centimeters,
it is measurement data, not count data.
So the first step is to make a:
HISTOGRAM
Does the data resemble a normal
curve?
(No.)
A more rigorous statistical test will need to
be performed, but because the error bars
do not overlap there is a high probability
that the two populations are indeed
different from each other.
Example of Data Analysis:
Is 98.6°F actually the average body
temperature for humans?
Since the data collected is in degrees Fahrenheit,
it is measurement data, not count
data. So the first step is to make a:
HISTOGRAM
Does the data resemble a normal
curve?
(Close Enough)
Next, the appropriate statistical tools are
applied:
* Note that by convention, descriptive statistics are rounded
to one more decimal place than the data points themselves
have.
According to the 68–95–99.7 Rule, about 68%
of the values in a normal distribution lie
within one standard deviation of the mean.
This means that around 68% of the temperatures
should be between 97.51°F and 98.99°F.
Using the standard error, we can
say with about 68% confidence that the
true mean human body temperature lies
within 98.25 ± 0.06°F, which is our sample
mean plus or minus one standard error.
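The arithmetic behind these two ranges can be sketched in Python. The mean (98.25) and standard error (0.06) come from the slide. The standard deviation of 0.74 is an assumption, inferred from the stated range of 97.51 to 98.99. The sample size is not given, so the standard_error helper is shown for reference only, and both function names are hypothetical.

```python
from math import sqrt

def plus_minus(center, spread):
    """Return the (low, high) range center - spread to center + spread."""
    return center - spread, center + spread

def standard_error(sd, n):
    """Standard error of the mean: standard deviation divided by the square root of n."""
    return sd / sqrt(n)

mean_temp = 98.25  # from the slide
sd_temp = 0.74     # assumption: inferred from the 97.51 to 98.99 range
se_temp = 0.06     # from the slide

# About 68% of individual temperatures: mean plus or minus one standard deviation.
low, high = plus_minus(mean_temp, sd_temp)
print(f"{low:.2f} to {high:.2f}")  # 97.51 to 98.99

# About 68% confidence for the mean: mean plus or minus one standard error.
ci_low, ci_high = plus_minus(mean_temp, se_temp)
print(f"{ci_low:.2f} to {ci_high:.2f}")  # 98.19 to 98.31
```

The first range describes where individual measurements fall. The second, much narrower range describes how precisely the mean itself is known.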