P299 Module 8 Notes
1. Continuous - the data can take any value within a specified range
2. Discrete - the data can take only certain values

Normal Distribution
a reasonable description of how many continuous variables are distributed in reality, from industrial process variation to intelligence test scores

Empirical Rule for Any Normal Distribution
About 68% of the data will fall within one standard deviation of the mean.
About 95% of the data will fall within two standard deviations of the mean.
About 99.7% of the data will fall within three standard deviations of the mean.

Z-Score
The process of making such comparisons is facilitated by converting raw scores (scores in their natural metric, for instance, weight measured in pounds or kilograms) into Z-scores, which express the value of the score in terms of units of the standard deviation.
sometimes referred to as normalized scores
facilitate comparison of scores from populations with different means and standard deviations.
distance of a data point from the mean, expressed in units of standard deviation.

4. There is a fixed number of trials, denoted as n.
Formula for Binomial Distribution

Variables
1. Independent - presumed to influence the value of the dependent variable
2. Dependent - represent an outcome of the study
3. Control - might influence the dependent variable but are not the main focus of interest
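
To make the Z-score and empirical-rule notes above concrete, here is a minimal Python sketch. The weights are made-up illustration data, the helper name `z_scores` is hypothetical, and the population standard deviation is assumed (the notes do not say whether population or sample SD is intended).

```python
from statistics import NormalDist, mean, pstdev

def z_scores(values):
    """Convert raw scores to Z-scores: (value - mean) / standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population SD; statistics.stdev gives the sample SD
    return [(v - mu) / sigma for v in values]

# Hypothetical weights, first in kilograms and then converted to pounds.
weights_kg = [60, 70, 80]
weights_lb = [w * 2.20462 for w in weights_kg]

z_kg = z_scores(weights_kg)
z_lb = z_scores(weights_lb)  # same Z-scores: the unit of measurement cancels out

# Share of a normal distribution within k standard deviations of the mean.
std_normal = NormalDist()  # mean 0, standard deviation 1
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

print(z_kg)
print(within)  # roughly 68%, 95%, and 99.7%
```

Because the Z-score divides out the standard deviation, the same weights expressed in kilograms or pounds produce the same Z-scores, which is what makes comparison across different metrics possible.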
3. Mode
refers to the most frequently
occurring value.
most often useful in describing
ordinal or categorical data.
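
As a small illustration of the mode on categorical data, the sketch below uses Python's standard library. The survey responses are invented for the example.

```python
from collections import Counter
from statistics import multimode

# Hypothetical categorical responses (illustration only).
responses = ["agree", "neutral", "agree", "disagree", "agree", "neutral"]

counts = Counter(responses)    # frequency of each category
modes = multimode(responses)   # every value tied for the highest frequency

print(counts)
print(modes)

# With a tie, multimode returns all of the most frequent values.
tied = multimode([1, 1, 2, 2, 3])
print(tied)
```

`multimode` is used here rather than `mode` so that ties are visible instead of one value being reported silently.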
Dispersion - refers to how variable or spread out data values are.

Measures of Dispersion
1. Range
the difference between the highest and lowest values.
If there are one or a few outliers in the data set, the range might not be a useful summary measure.
2. Interquartile Range
alternative measure of dispersion that is less influenced than the range by extreme values
the range of the middle 50% of the values in a data set, which is calculated as the difference between the 75th and 25th percentile values.
3. Variance
average of the squared deviations from the mean
5. Coefficient of Variation
a measure of relative variability that makes it possible to compare variability across variables measured in different units

Outliers
a data point or observation whose value is quite different from the others in the data set being analyzed.
a data point that seems to come from a different population or is outside the typical pattern of the other data points

Graphic Methods
1. Frequency Tables
useful when the actual values of the numbers in different categories, rather than the general pattern among the categories, are of primary interest.
an efficient way to present large quantities of data and represent a middle ground between text
(paragraphs describing the data values) and pure graphics (such as a histogram).
Absolute Frequency - raw numbers or counts for each category
Relative Frequency - displays the percent of the total represented by each category
Cumulative Frequency - shows the relative frequency for each category and those below it
2. Bar Chart
particularly appropriate for displaying discrete data with only a few categories
3. Pie Chart
shows graphically what proportion each part occupies of the whole
most useful when there are only a few categories of information and the differences among those categories are fairly large
and the least common the furthest to the right), and a cumulative frequency line is superimposed over the bars
5. Stem and Leaf Plot
divide your data into intervals (using your common sense and the level of detail appropriate to your purpose) and display each data point by using two columns.
The stem is the leftmost column and contains one value per row, and the leaf is the rightmost column and contains one digit for each case belonging to that row.
plot that displays the actual values of the data set but also assumes a shape indicating which ranges of values are most common.
not only tells us the actual values of the scores and their range but the basic shape of their distribution as well.
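
The stem-and-leaf description above can be sketched in a few lines of Python. This is a minimal illustration, assuming non-negative whole-number scores and intervals of ten (tens digit as the stem, ones digit as the leaf). The function name `stem_and_leaf` and the scores are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return stem-and-leaf rows for non-negative integers, using tens as stems."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # stem = tens and above, leaf = final digit
        rows[stem].append(leaf)
    # One row per stem that has data; stems with no values are omitted here.
    return [
        f"{stem:>2} | {' '.join(str(d) for d in leaves)}"
        for stem, leaves in sorted(rows.items())
    ]

# Hypothetical test scores (illustration only).
scores = [61, 64, 68, 70, 73, 73, 75, 82, 85, 91]
plot = stem_and_leaf(scores)
for line in plot:
    print(line)
```

Each row keeps the actual values (stem plus leaf), while the row lengths show which interval is most common, here the 70s.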
8. Bivariate Charts
Charts that display information
about the relationship between
two variables
Scatterplot - define each point
in a data set by two values,
commonly referred to as x and
y, and plot each point on a pair
of axes
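
To illustrate how a scatterplot places each (x, y) pair on a pair of axes, here is a rough text-only sketch in Python. It uses only the standard library so it can run anywhere; in practice a plotting library would normally be used. The function name `ascii_scatter`, the grid size, and the data points are all assumptions made for the example.

```python
def ascii_scatter(points, width=20, height=8):
    """Plot (x, y) pairs on a character grid; returns rows from top to bottom."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        # Scale each coordinate to a grid index (assumes x and y each vary).
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the plot
    # Left border stands in for the y axis, bottom border for the x axis.
    return ["|" + "".join(r) for r in grid] + ["+" + "-" * width]

# Hypothetical paired observations (illustration only).
points = [(1, 2), (2, 4), (3, 5), (4, 8)]
rows = ascii_scatter(points)
for line in rows:
    print(line)
```

The points rise from lower left to upper right, which is the kind of relationship between two variables a scatterplot is meant to reveal.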