2 - Data Analysis
• Data
• Measures of Variability
Data
sheffield.ac.uk
Obtaining Data:
Techniques and Methods
• Direct Observations
• Experiments
• Surveys
Sampling
The process in which a predetermined number of observations is taken from a
larger population.
psu.edu
Sampling Methods
Cluster Sampling
• simple random sample of groups
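Cluster sampling can be sketched in a few lines of Python. This is only an illustration: the `sections` dictionary and the `cluster_sample` helper are hypothetical names, and the sketch assumes that every unit in each chosen group is kept.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Draw a simple random sample of groups and keep every unit in them."""
    rng = random.Random(seed)
    # Sample group names, not individual units.
    chosen = rng.sample(sorted(clusters), n_clusters)
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units

# Hypothetical population: students grouped by class section.
sections = {
    "A": ["a1", "a2", "a3"],
    "B": ["b1", "b2"],
    "C": ["c1", "c2", "c3", "c4"],
    "D": ["d1", "d2"],
}

chosen, units = cluster_sample(sections, n_clusters=2, seed=1)
print("Chosen groups:", chosen)
print("Sampled units:", units)
```

The randomness is applied to the groups; the individuals are never sampled directly.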
colby.edu
Relative Frequency - the proportion of all given values that fall within the
interval. Usually expressed as a percentage.
Cumulative Frequency - the sum of the frequency for that class and all the
previous classes.
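The two definitions can be sketched as follows. The class frequencies in `freqs` are made-up values for illustration, and `relative_and_cumulative` is a hypothetical helper name.

```python
from itertools import accumulate

def relative_and_cumulative(frequencies):
    """Return relative frequencies (in percent) and cumulative frequencies."""
    total = sum(frequencies)
    relative = [100 * f / total for f in frequencies]
    cumulative = list(accumulate(frequencies))
    return relative, cumulative

freqs = [4, 6, 5, 3, 2]  # hypothetical class frequencies, 20 entries in total
relative, cumulative = relative_and_cumulative(freqs)
print(relative)    # [20.0, 30.0, 25.0, 15.0, 10.0]
print(cumulative)  # [4, 10, 15, 18, 20]
```

The last cumulative frequency always equals the total number of entries, and the relative frequencies sum to 100%.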
Graphical Organization & Summarization of Data
Round the class width up to 8.
3. The minimum data entry of 18 may be used for the lower limit of the first class. To
find the lower class limits of the remaining classes, add the width (8) to each lower
limit.
The lower class limits are 18, 26, 34, 42, and 50
The upper class limits are 25, 33, 41, 49, and 57
4. Make a tally mark for each data entry in the appropriate class.
5. The number of tally marks for a class is the frequency for that class.
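Steps 3 to 5 can be sketched in Python. The minimum (18), the width (8) and the five classes come from the slide; the `entries` list is hypothetical data, and the sketch assumes whole-number entries so that each upper limit is the next lower limit minus 1.

```python
def class_limits(minimum, width, n_classes):
    """Build (lower, upper) class limits for whole-number data."""
    lowers = [minimum + i * width for i in range(n_classes)]
    uppers = [low + width - 1 for low in lowers]
    return list(zip(lowers, uppers))

def tally(data, classes):
    """Count how many entries fall in each class."""
    counts = [0] * len(classes)
    for value in data:
        for i, (low, high) in enumerate(classes):
            if low <= value <= high:
                counts[i] += 1
                break
    return counts

classes = class_limits(minimum=18, width=8, n_classes=5)
print(classes)  # [(18, 25), (26, 33), (34, 41), (42, 49), (50, 57)]

entries = [18, 21, 25, 26, 30, 34, 34, 41, 44, 50, 54]  # hypothetical data
frequencies = tally(entries, classes)
print(frequencies)  # [3, 2, 3, 1, 2]
```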
Bar Graph
• a graphical representation of a frequency table for qualitative data.
• Frequencies or relative frequencies are represented on one axis of the graph.
• The various classes of data are labeled on the other axis.
Histogram
• a graphical representation of a frequency table; it displays quantitative data.
• The class intervals are marked off on the horizontal axis; frequencies or relative
frequencies are marked off on the vertical axis.
Frequency Polygon
• the geometric shape obtained by connecting with straight lines the midpoints of
the tops of adjacent bars of a histogram, that is, the points (class midpoint, frequency).
• Presenting data in pictorial or graphical form makes patterns easier to see
than in a table of numbers.
• Frequency polygons give an idea about the shape of the data and the trends that a
particular data set follows.
• This can be very useful in comparing different sets of data by superimposing one
on the other.
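The points of a frequency polygon can be computed as sketched below. The class limits are the ones from the frequency-table example; the frequencies are hypothetical, and `polygon_points` is an illustrative helper name.

```python
def polygon_points(classes, frequencies):
    """Return (class midpoint, frequency) pairs to be joined by straight lines."""
    return [((low + high) / 2, f) for (low, high), f in zip(classes, frequencies)]

classes = [(18, 25), (26, 33), (34, 41), (42, 49), (50, 57)]
frequencies = [3, 2, 3, 1, 2]  # hypothetical frequencies

points = polygon_points(classes, frequencies)
print(points)  # [(21.5, 3), (29.5, 2), (37.5, 3), (45.5, 1), (53.5, 2)]
```

Plotting two such point lists on the same axes is what superimposing data sets amounts to.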
Line graph
Scatter Plot
• a graphic display of data points in a two-dimensional plane.
• Each data point represents a single unit of observation on which two
measurements, X and Y, have been made.
• The values of each of the measurements are scaled on the X and Y axes,
respectively.
• Each data point is located in the plane at the intersection of its associated X and
Y values.
Measures of Location (Mean, Median and Mode)
Outlier - a value that is very different from the other data in the data set.
Outliers can skew results, particularly the mean.
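A minimal sketch of the three measures of location, using Python's standard `statistics` module and the ten-value data set from this deck. The extra value 50 is a hypothetical outlier added only to show its effect.

```python
import statistics

data = [1, 1, 2, 2, 2, 3, 3, 4, 5, 5]

mean_value = statistics.mean(data)      # 28 / 10 = 2.8
median_value = statistics.median(data)  # average of the two middle values, 2.5
mode_value = statistics.mode(data)      # most frequent value, 2
print(mean_value, median_value, mode_value)

# Hypothetical outlier: the mean moves a lot, the median only slightly.
with_outlier = data + [50]
mean_out = statistics.mean(with_outlier)
median_out = statistics.median(with_outlier)
print(mean_out, median_out)
```

With the outlier included, the mean rises from 2.8 to about 7.1, while the median only moves from 2.5 to 3.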
Measures of Variability
1, 1, 2, 2, 2, 3, 3, 4, 5, 5
Range - measures the total spread of a set of data and is computed from only
two numbers: the maximum and the minimum (range = maximum - minimum).
1, 1, 2, 2, 2, 3, 3, 4, 5, 5
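For the data set above, the range can be computed as sketched here:

```python
data = [1, 1, 2, 2, 2, 3, 3, 4, 5, 5]

# Only the largest and smallest values enter the calculation.
data_range = max(data) - min(data)
print(data_range)  # 4
```

Because only two values are used, a single extreme observation changes the range completely.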
Variance
• Variance is a numerical value that describes the variability of observations around their
arithmetic mean.
• Variance measures how far the outcomes vary from the mean.
• The population variance equals the average of the squared deviations from the mean.
• A deviation is the distance from any single measurement of a set to the mean of that set.
• It indicates how widely the individuals or observations in a group are spread out.
• Statisticians use variance to see how individual numbers relate to each other
within a data set, rather than using broader mathematical techniques such as
arranging numbers into quartiles.
• The advantage of variance is that it treats all deviations from the mean as the
same regardless of their direction.
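Population Variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
Sample Variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$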
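The difference between the two variances is only the divisor: N for a population, n - 1 for a sample. A minimal sketch using the ten-value data set from this deck (the helper names are illustrative), checked against `statistics.pvariance` and `statistics.variance` from the standard library:

```python
import statistics

def population_variance(values):
    """Average of the squared deviations from the mean (divide by N)."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)

def sample_variance(values):
    """Sum of squared deviations divided by n - 1."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / (len(values) - 1)

data = [1, 1, 2, 2, 2, 3, 3, 4, 5, 5]

pop_var = population_variance(data)  # 19.6 / 10, about 1.96
samp_var = sample_variance(data)     # 19.6 / 9, about 2.18
print(round(pop_var, 2), round(samp_var, 2))

# The standard library gives the same results.
print(statistics.pvariance(data), statistics.variance(data))
```

The sample variance is always slightly larger than the population variance computed from the same numbers.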
Standard Deviation (SD)
• The square root of the variance.
• This measurement is very useful for describing the spread or dispersion of
a set of data around the mean.
• Measures how far observations typically lie from the expected
value.
• Indicates how much the individual observations of a data set
differ from the mean.
Height of dogs
The heights (at the shoulders) are: 600 mm, 470 mm, 170 mm, 430 mm and 300 mm.
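Mean: $\mu = \frac{600 + 470 + 170 + 430 + 300}{5} = 394$ mm
Variance: $\sigma^2 = \frac{206^2 + 76^2 + (-224)^2 + 36^2 + (-94)^2}{5} = \frac{108520}{5} = 21704$ mm²
Standard Deviation: $\sigma = \sqrt{21704} \approx 147$ mm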
Now we can show which heights are within one Standard Deviation (147mm) of the
Mean:
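The check can be sketched in Python. The five heights are treated as the whole population (divisor N), which is what reproduces the 147 mm figure.

```python
import math

heights = [600, 470, 170, 430, 300]  # shoulder heights in mm

mean = sum(heights) / len(heights)                               # 394.0
variance = sum((h - mean) ** 2 for h in heights) / len(heights)  # 21704.0
sd = math.sqrt(variance)                                         # about 147.3

# Heights no more than one standard deviation away from the mean.
within_one_sd = [h for h in heights if abs(h - mean) <= sd]
outside_one_sd = [h for h in heights if abs(h - mean) > sd]

print(within_one_sd)   # [470, 430, 300]
print(outside_one_sd)  # [600, 170]
```

Three of the five dogs fall within one standard deviation of the mean; the tallest (600 mm) and the shortest (170 mm) fall outside it.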
• So, using the Standard Deviation we have a "standard" way of knowing what is
normal, and what is extra large or extra small.
• Rottweilers are tall dogs. And Dachshunds are a bit short, right?
Coefficient of Variation (CV)
• Indicates the degree of precision with which the treatments are
compared and is a good index of the reliability of the experiment.
• It expresses the experimental error as a percentage of the mean; thus the
higher the CV value, the lower the reliability of the experiment.
• As a rule of thumb, a CV below 10% is very good, 10-20% is good, 20-30% is acceptable, and
a CV above 30% is not acceptable.
• For field experiments a CV of 30% is tolerable; for laboratory/clinical experiments
the limit is 5%.
CV of population: $CV = \frac{\sigma}{\mu} \times 100\%$
CV of sample: $CV = \frac{s}{\bar{x}} \times 100\%$
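Both versions of the CV can be sketched with the standard `statistics` module. The dog heights from the earlier example are reused purely for illustration, and the function names are hypothetical.

```python
import statistics

def cv_population(values):
    """Population standard deviation as a percentage of the mean."""
    return statistics.pstdev(values) / statistics.mean(values) * 100

def cv_sample(values):
    """Sample standard deviation as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

heights = [600, 470, 170, 430, 300]  # shoulder heights in mm

cv_pop = cv_population(heights)
cv_samp = cv_sample(heights)
print(f"CV (population): {cv_pop:.1f}%")
print(f"CV (sample): {cv_samp:.1f}%")
```

Because the CV is a ratio, it has no units, which allows the variability of data sets measured on different scales to be compared.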
Next Meeting…
Probability