Summaries Chapter 1-6
Chapter 1
1. The term statistics is used to refer to methods for organizing, summarizing, and interpreting data.
2. Scientific questions usually concern a population, which is the entire set of individuals one wishes
to study. Usually, populations are so large that it is impossible to examine every individual, so most research is
conducted with samples. A sample is a group selected from a population, usually for purposes of a research study.
3. A characteristic that describes a sample is called a statistic, and a characteristic that describes a population is called a
parameter. Although sample statistics are usually representative of corresponding population parameters, there is typically
some discrepancy between a statistic and a parameter. The naturally occurring difference between a statistic and a
parameter is called sampling error.
4. Statistical methods can be classified into two broad categories: descriptive statistics, which organize and summarize data,
and inferential statistics, which use sample data to draw inferences about populations.
5. A construct is a variable that cannot be directly observed. An operational definition defines a construct in terms of
external behaviours that are representative of the construct.
6. A discrete variable consists of indivisible categories, often whole numbers that vary in countable steps.
A continuous variable consists of categories that are infinitely divisible, with each score corresponding to an interval on
the scale. The boundaries that separate intervals are called real limits and are located exactly halfway between adjacent
scores.
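A minimal sketch of how real limits could be computed, assuming scores are measured in steps of a fixed size (the function name and the `unit` parameter are illustrative, not from the text):

```python
def real_limits(low, high=None, unit=1):
    """Return the (lower, upper) real limits of a score or a class interval.

    Assumes scores are recorded in steps of `unit`, so each real limit lies
    exactly halfway between adjacent scores.
    """
    if high is None:
        high = low  # a single score is an interval of one value
    half = unit / 2
    return (low - half, high + half)


single_score_limits = real_limits(8)       # limits around a score of 8
interval_limits = real_limits(40, 49)      # limits around the interval 40-49
```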
7. A measurement scale consists of a set of categories that are used to classify individuals. A nominal scale consists of
categories that differ only in name and are not differentiated in terms of magnitude or direction. In an ordinal scale, the
categories are differentiated in terms of direction, forming an ordered series. An interval scale consists of an ordered
series of categories that are all equal-sized intervals. With an interval scale, it is possible to differentiate direction and
distance between categories. Finally, a ratio scale is an interval scale for which the zero point indicates none of the
variable being measured. With a ratio scale, ratios of measurements reflect ratios of magnitude.
8. The correlational method examines relationships between variables by measuring two different variables for each
individual. This method allows researchers to measure and describe relationships, but cannot produce a cause-and-effect
explanation for the relationship.
9. The experimental method examines relationships between variables by manipulating an independent variable to create
different treatment conditions and then measuring a dependent variable to obtain a group of scores in each condition.
The groups of scores are then compared. A systematic difference between groups provides evidence that changing the
independent variable from one condition to another also caused a change in the dependent variable. All other variables
are controlled to prevent them from influencing the relationship. The intent of the experimental method is to demonstrate
a cause-and-effect relationship between variables.
10. Nonexperimental studies also examine relationships between variables by comparing groups of scores, but they do not
have the rigor of true experiments and cannot produce cause-and-effect explanations. Instead of manipulating a variable
to create different groups, a nonexperimental study uses a pre-existing participant characteristic (such as older/younger)
or the passage of time (before/after) to create the groups being compared.
11. In an experiment, the independent variable is manipulated by the researcher and the dependent variable
is the one that is observed to assess the effect of the treatment. The variable that is used to create the groups in a
nonexperimental study is a quasi-independent variable.
12. The letter X is used to represent scores for a variable. If a second variable is used, Y represents its scores. The letter N is
used as the symbol for the number of scores in a population; n is the symbol for a number of scores in a sample.
13. The Greek letter sigma (Σ) is used to stand for summation. Therefore, the expression ΣX is read “the sum of the scores.”
Summation is a mathematical operation (like addition or multiplication) and must be performed in its proper place in the
order of operations; summation occurs after parentheses, exponents, and multiplying/dividing have been completed.
SUMMATION NOTATION
A set of scores consists of the following values: 7, 3, 9, 5, 4
Compute ΣX: To compute ΣX, we simply add all of the scores in the group.
ΣX = 7 + 3 + 9 + 5 + 4 = 28
Compute (ΣX)²: The first step, inside the parentheses, is to compute ΣX. The second step is to square the
value for ΣX.
ΣX = 28 and (ΣX)² = (28)² = 784
Compute ΣX²: The first step is to square each score. The second step is to add the squared scores. The
computational table shows the scores and squared scores. To compute ΣX² we add the
values in the X² column.
X    X²
7    49
3     9
9    81
5    25
4    16
ΣX² = 49 + 9 + 81 + 25 + 16 = 180
Compute ΣX + 5: The first step is to compute ΣX. The second step is to add 5 points to the total.
ΣX = 28 and ΣX + 5 = 28 + 5 = 33
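The summation examples above can be checked with a short Python sketch. The variable names are illustrative. The last line, Σ(X + 5), is not part of the worked example; it is included only to show how parentheses change the order of operations.

```python
scores = [7, 3, 9, 5, 4]

sum_x = sum(scores)                           # ΣX: add all the scores
sum_x_squared = sum_x ** 2                    # (ΣX)²: sum first, then square
sum_of_squares = sum(x ** 2 for x in scores)  # ΣX²: square each score, then sum
sum_x_plus_5 = sum_x + 5                      # ΣX + 5: sum first, then add 5 once

# Contrast (an added illustration): with parentheses, 5 is added to every score
# before summing.
sum_of_x_plus_5 = sum(x + 5 for x in scores)  # Σ(X + 5)
```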
Chapter 2
1. The goal of descriptive statistics is to simplify the organization and presentation of data. One descriptive technique is
to place the data in a frequency distribution table or graph that shows exactly how many individuals (or scores) are
located in each category on the scale of measurement.
2. A frequency distribution table lists the categories that make up the scale of measurement (the X values) in one
column. Beside each X value, in a second column, is the frequency or number of individuals in that category. The table
may include a proportion column showing the relative frequency for each category: p = f/N. The table may also include a
percentage column showing the percentage associated with each X value: percentage = p(100) = (f/N)(100).
3. The cumulative percentage is the percentage of individuals with scores at or below a particular point in the
distribution. The cumulative percentage values are associated with the upper real limits of the corresponding scores
or intervals.
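As a rough sketch, the columns of a frequency distribution table (frequency, proportion, percentage, cumulative percentage) could be built as follows. The scores are made-up illustration data and the function name is hypothetical. Rows are listed from the lowest X upward so that the cumulative percentage accumulates naturally.

```python
from collections import Counter


def frequency_table(scores):
    """Build rows of X, f, p = f/N, percentage, and cumulative percentage."""
    counts = Counter(scores)
    n = len(scores)
    rows = []
    cumulative = 0
    for x in sorted(counts):
        f = counts[x]
        cumulative += f  # number of scores at or below this X
        rows.append({
            "X": x,
            "f": f,
            "p": f / n,
            "pct": 100 * f / n,
            "cum_pct": 100 * cumulative / n,
        })
    return rows


scores = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]  # hypothetical data, N = 10
table = frequency_table(scores)
for row in table:
    print(row)
```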
4. Percentiles and percentile ranks are used to describe the position of individual scores within a distribution. Percentile
rank gives the cumulative percentage associated with a particular score. A score that is identified by its rank is called a
percentile.
5. It is recommended that a frequency distribution table have a maximum of 10–15 rows to keep it simple. If the scores
cover a range that is wider than this suggested maximum, it is customary to divide the range into sections called class
intervals. These intervals are then listed in the frequency distribution table along with the frequency or number of
individuals with scores in each interval. The result is called a grouped frequency distribution. The guidelines for
constructing a grouped frequency distribution table are as follows:
a. There should be about 10 intervals.
b. The width of each interval should be a simple number (e.g., 2, 5, or 10).
c. The bottom score in each interval should be a multiple of the width.
d. All intervals should be the same width, and they should cover the range of scores with no gaps.
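The guidelines above can be sketched in Python, assuming whole-number scores and a width chosen by the reader. The data and the function name are hypothetical. Each interval starts at a multiple of the width, all intervals have the same width, and empty intervals are kept so there are no gaps.

```python
from collections import Counter


def grouped_frequencies(scores, width):
    """Return (bottom, top, frequency) for equal-width class intervals.

    Assumes integer scores; the bottom score of each interval is a
    multiple of `width`.
    """
    counts = Counter((x // width) * width for x in scores)
    lowest, highest = min(counts), max(counts)
    return [
        (bottom, bottom + width - 1, counts.get(bottom, 0))
        for bottom in range(lowest, highest + width, width)
    ]


scores = [53, 58, 61, 64, 67, 70, 72, 75, 75, 78, 81, 84, 86, 89, 93]
groups = grouped_frequencies(scores, width=5)
for bottom, top, f in groups:
    print(f"{bottom}-{top}: {f}")
```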
6. A frequency distribution graph lists scores on the horizontal axis and frequencies on the vertical axis. The type of
graph used to display a distribution depends on the scale of measurement used. For interval or ratio scales, you
should use a histogram or a polygon. For a histogram, a bar is drawn above each score so that the height of the bar
corresponds to the frequency. Each bar extends to the real limits of the score, so that adjacent bars touch. For a
polygon, a dot is placed above the midpoint of each score or class interval so that the height of the dot corresponds
to the frequency; then lines are drawn to connect the dots. Bar graphs are used with nominal or ordinal scales. Bar
graphs are similar to histograms except that gaps are left between adjacent bars.
7. Shape is one of the basic characteristics used to describe a distribution of scores. Most distributions can be classified
as either symmetrical or skewed. A skewed distribution that tails off to the right is positively skewed. If it tails off to
the left, it is negatively skewed.
8. A stem and leaf display is an alternative procedure for organizing data. Each score is separated into a stem (the first
digit or digits) and a leaf (the last digit). The display
consists of the stems listed in a column with the leaf for each score written beside its stem. A stem and leaf display is
similar to a grouped frequency distribution table; however, the stem and leaf display identifies the exact value of each
score and the grouped frequency distribution does not.
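A stem and leaf display could be sketched as below, assuming non-negative whole-number scores with the last digit as the leaf. The data and function name are made up for illustration.

```python
from collections import defaultdict


def stem_and_leaf(scores):
    """Map each stem (leading digits) to the sorted list of its leaves."""
    display = defaultdict(list)
    for x in sorted(scores):
        stem, leaf = divmod(x, 10)  # e.g. 83 gives stem 8, leaf 3
        display[stem].append(leaf)
    return dict(display)


scores = [83, 62, 71, 76, 85, 32, 56, 74]  # hypothetical data
display = stem_and_leaf(scores)
for stem in sorted(display):
    print(stem, "|", "".join(str(leaf) for leaf in display[stem]))
```

Unlike a grouped table, every original score can be read back from the display by joining a stem with one of its leaves.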
Chapter 3
1. The purpose of central tendency is to determine the single value that identifies the centre of the distribution and best
represents the entire set of scores. The three standard measures of central tendency are the mean, the median, and
the mode.
2. The mean is the arithmetic average. It is computed by adding all the scores and then dividing by the number of
scores. Conceptually, the mean is obtained by dividing the total (ΣX) equally among the number of individuals (N or n).
The mean can also be defined as the balance point for the distribution. The distances above the mean are exactly
balanced by the distances below the mean. Although the calculation for a population mean is the same as the
calculation for a sample mean, a population mean is identified by the symbol μ, and a sample mean is identified by M.
In most situations with numerical scores from an interval or a ratio scale, the mean is the preferred measure of central
tendency.
3. Changing any score in the distribution causes the mean to be changed. When a constant value is added to (or
subtracted from) every score in a distribution, the same constant value is added to (or subtracted from) the mean. If
every score is multiplied by a constant, the mean is multiplied by the same constant.
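The properties of the mean described above can be illustrated with a small sketch using Python's standard `statistics` module. The scores are hypothetical.

```python
import statistics

scores = [3, 7, 4, 6]  # hypothetical data
mean = statistics.mean(scores)

# Balance point: deviations above and below the mean cancel out.
sum_of_deviations = sum(x - mean for x in scores)

# Adding a constant to every score adds the same constant to the mean.
mean_plus_4 = statistics.mean([x + 4 for x in scores])

# Multiplying every score by a constant multiplies the mean by that constant.
mean_times_2 = statistics.mean([x * 2 for x in scores])
```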
4. The median is the midpoint of a distribution of scores. The median is the preferred measure of central tendency when
a distribution has a few extreme scores that displace the value of the mean. The median also is used when there are
undetermined (infinite) scores that make it impossible to compute a mean. Finally, the median is the preferred
measure of central tendency for data from an ordinal scale.
5. The mode is the most frequently occurring score in a distribution. It is easily located by finding the peak in a
frequency distribution graph. For data measured on a nominal scale, the mode is the appropriate measure of central
tendency. It is possible for a distribution to have more than one mode.
6. For symmetrical distributions, the mean will equal the median. If there is only one mode, then it will have the same
value, too.
7. For skewed distributions, the mode is located toward the side where the scores pile up, and the mean is pulled
toward the extreme scores in the tail. The median is usually located between these two values.
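As an illustration of how the three measures separate in a skewed distribution, here is a sketch with made-up, positively skewed data. Note that `statistics.median` simply averages the two middle scores when N is even, which is a simplification and may differ from the interpolated median some textbooks compute for continuous variables.

```python
import statistics

# Hypothetical positively skewed data: scores pile up on the left,
# with a tail of extreme scores on the right.
scores = [1, 2, 2, 2, 3, 3, 4, 5, 9, 19]

mean = statistics.mean(scores)         # pulled toward the tail
median = statistics.median(scores)     # midpoint of the ordered scores
mode = statistics.mode(scores)         # most frequent score
modes = statistics.multimode(scores)   # all modes, if there is more than one
```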
Chapter 4
1. The purpose of variability is to measure and describe the degree to which the scores in a distribution are spread out or
clustered together. There are four basic measures of variability: the range, the interquartile range, the variance, and the
standard deviation.
The range is the distance covered by the set of scores, from the smallest score to the largest score. The range is
completely determined by the two extreme scores and is considered to be a relatively crude measure of variability.
The interquartile range is more descriptive than the basic range because it trims the extreme scores and provides a range
that reflects the middle 50% of scores in the centre of the distribution.
Standard deviation and variance are the most commonly used measures of variability. Both of these measures are based
on the idea that each score can be described in terms of its deviation or distance from the mean. The variance is the mean
of the squared deviations. The standard deviation is the square root of the variance and provides a measure of the
standard distance from the mean.
2. To calculate variance or standard deviation, you first need to find the sum of the squared deviations, SS. Except for minor
changes in notation, the calculation of SS is identical for samples and populations. There are two methods for calculating
SS:
I. By definition, you can find SS using the following steps:
a. Find the deviation (X − μ) for each score.
b. Square each deviation.
c. Add the squared deviations.
This process can be summarized in a formula as follows:
Definitional formula: SS = Σ(X − μ)²
II. The sum of the squared deviations can also be found using a computational formula, which is especially useful
when the mean is not a whole number:
Computational formula: SS = ΣX² − (ΣX)²/N
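The two routes to SS can be compared with a short sketch. The scores and function names are hypothetical. Both functions should return the same value for any set of scores, up to floating-point rounding.

```python
def ss_definitional(scores):
    """SS by definition: find each deviation, square it, add the squares."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)


def ss_computational(scores):
    """SS by the computational route: sum of X squared minus (sum of X) squared over N."""
    n = len(scores)
    return sum(x ** 2 for x in scores) - sum(scores) ** 2 / n


scores = [1, 0, 6, 1]  # hypothetical data; mean = 2
ss_a = ss_definitional(scores)    # 1 + 4 + 16 + 1
ss_b = ss_computational(scores)   # 38 - 64/4
```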
3. Variance is the mean squared deviation and is obtained by finding the sum of the squared deviations and then dividing by
the number of scores. For a population, variance is σ² = SS/N.
For a sample, only n − 1 of the scores are free to vary (degrees of freedom or df = n − 1), so sample variance is
s² = SS/(n − 1) = SS/df.
Using n − 1 in the sample formula makes the sample variance an accurate and unbiased estimate of the population
variance.
4. Standard deviation is the square root of the variance. For a population, this is σ = √(SS/N). For a sample, it is
s = √(SS/(n − 1)).
5. Adding a constant value to every score in a distribution does not change the standard deviation. Multiplying every score
by a constant, however, causes the standard deviation to be multiplied by the same constant.
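The following sketch works through variance and standard deviation for a small hypothetical data set, treating it first as a population (divide SS by N) and then as a sample (divide SS by n − 1). It also checks the two transformation rules from point 5. The standard-library functions `statistics.pvariance`, `statistics.variance`, and `statistics.pstdev` are used only as a cross-check.

```python
import math
import statistics

scores = [1, 0, 6, 1]  # hypothetical data
n = len(scores)
mean = sum(scores) / n
ss = sum((x - mean) ** 2 for x in scores)  # sum of squared deviations

population_variance = ss / n       # divide by N
sample_variance = ss / (n - 1)     # divide by df = n - 1
population_sd = math.sqrt(population_variance)
sample_sd = math.sqrt(sample_variance)

# Adding a constant leaves the standard deviation unchanged;
# multiplying by a constant multiplies it by the same constant.
shifted_sd = statistics.pstdev([x + 10 for x in scores])
scaled_sd = statistics.pstdev([x * 3 for x in scores])
```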
6. Because the mean identifies the centre of a distribution and the standard deviation describes the average distance from
the mean, these two values should allow you to create a reasonably accurate image of the entire distribution. Knowing the
mean and standard deviation should also allow you to describe the relative location of any individual score within the
distribution.
7. Large variance can obscure patterns in the data and, therefore, can create a problem for inferential statistics.
Chapter 5
1. Each X value can be transformed into a z-score that specifies the exact location of X within the distribution. The sign of
the z-score indicates whether the location is above (positive) or below (negative) the mean. The numerical value of the
z-score specifies the number of standard deviations between X and μ.
2. The z-score formula is used to transform X values into z-scores. For a population: z = (X − μ)/σ
For a sample: z = (X − M)/s
3. To transform z-scores back into X values, it usually is easier to use the z-score definition rather than a formula. However,
the z-score formula can be transformed into a new equation. For a population: X = μ + zσ
For a sample: X = M + zs
4. When an entire distribution of scores (either a population or a sample) is transformed into z-scores, the result is a
distribution of z-scores. The z-score distribution will have the same shape as the distribution of raw scores, and it always
will have a mean of 0 and a standard deviation of 1.
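A brief sketch of the z-score transformation and its inverse, using a hypothetical population of six scores (mean 3, standard deviation 2). The function names are illustrative. Transforming the whole distribution should give z-scores with a mean of 0 and a standard deviation of 1.

```python
import statistics


def z_from_x(x, mean, sd):
    """Number of standard deviations between x and the mean (signed)."""
    return (x - mean) / sd


def x_from_z(z, mean, sd):
    """Convert a z-score back to a raw score."""
    return mean + z * sd


def standardize(scores):
    """Transform every score in a population into a z-score."""
    mu = statistics.mean(scores)
    sigma = statistics.pstdev(scores)  # population standard deviation
    return [z_from_x(x, mu, sigma) for x in scores]


scores = [0, 6, 5, 2, 3, 2]  # hypothetical population
z_scores = standardize(scores)
```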
5. When comparing raw scores from different distributions, it is necessary to standardize the distributions with a z-score
transformation. The distributions will then be comparable because they will have the same mean (0) and the same
standard deviation (1). In practice, it is necessary to transform only those raw scores that are being compared.
6. In certain situations, such as psychological testing, a distribution may be standardized by converting the original X values
into z-scores and then converting the z-scores into a new distribution of scores with predetermined values for the mean
and the standard deviation.
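The two-step standardization described in point 6 might look like the sketch below. The original distribution (mean 57, standard deviation 14) and the target distribution (mean 50, standard deviation 10) are assumed values chosen for illustration only.

```python
def rescale(x, old_mean, old_sd, new_mean, new_sd):
    """Convert x to a z-score, then to the matching score in the new distribution."""
    z = (x - old_mean) / old_sd    # step 1: position in the original distribution
    return new_mean + z * new_sd   # step 2: same position in the new distribution


# A score half a standard deviation above the old mean stays half a
# standard deviation above the new mean.
new_high = rescale(64, old_mean=57, old_sd=14, new_mean=50, new_sd=10)
new_low = rescale(43, old_mean=57, old_sd=14, new_mean=50, new_sd=10)
```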
7. In inferential statistics, z-scores provide an objective method for determining how well a specific score represents its
population. A z-score near 0 indicates that the score is close to the population mean and therefore is representative. A
z-score beyond +2.00 (or −2.00) indicates that the score is extreme and is noticeably different from the other scores in the
distribution.
Chapter 6
1. The probability of a particular event A is defined as a fraction or proportion:
p(A) = (number of outcomes classified as A) / (total number of possible outcomes)
2. Our definition of probability is accurate only for random samples. There are two requirements that must be satisfied for a
random sample:
a. Every individual in the population has an equal chance of being selected.
b. When more than one individual is being selected, the probabilities must stay constant. This means there must be
sampling with replacement.
3. All probability problems can be restated as proportion problems. The “probability of selecting a king from a deck of cards”
is equivalent to the “proportion of the deck that consists of kings.” For frequency distributions, probability questions can
be answered by determining proportions of area. The “probability of selecting an individual with an IQ greater than 108” is
equivalent to the “proportion of the whole population that consists of IQs greater than 108.”
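The card example can be restated as a proportion in a few lines of Python. This is only a sketch: the deck representation and the `probability` helper are illustrative choices, and every card is assumed equally likely to be selected.

```python
from fractions import Fraction

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = [(rank, suit) for rank in ranks for suit in suits]


def probability(event, outcomes):
    """Proportion of equally likely outcomes that are classified as the event."""
    favorable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favorable, len(outcomes))


p_king = probability(lambda card: card[0] == "K", deck)  # 4 kings out of 52 cards
```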
4. For normal distributions, probabilities (proportions) can be found in the unit normal table. The table provides a listing of
the proportions of a normal distribution that correspond to each z-score value. With the table, it is possible to move
between X values and probabilities using a two-step procedure:
a. The z-score formula (Chapter 5) allows you to transform X to z or to change z back to X.
b. The unit normal table allows you to look up the probability (proportion) corresponding to each z-score or the
z-score corresponding to each probability.
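The two-step procedure can be sketched with `statistics.NormalDist` (Python 3.8+) standing in for the printed unit normal table. The IQ mean of 100 and standard deviation of 15 are assumptions made for this example; the text does not state them.

```python
from statistics import NormalDist

MU, SIGMA = 100, 15          # assumed population parameters for IQ
unit_normal = NormalDist()   # standard normal: mean 0, standard deviation 1


def proportion_above(x, mu, sigma):
    """Step a: X to z. Step b: z to the proportion in the tail above z."""
    z = (x - mu) / sigma
    return 1 - unit_normal.cdf(z)


def score_for_proportion_below(p, mu, sigma):
    """Step b in reverse: proportion to z. Step a in reverse: z to X."""
    z = unit_normal.inv_cdf(p)
    return mu + z * sigma


p_above_108 = proportion_above(108, MU, SIGMA)
x_with_90_percent_below = score_for_proportion_below(0.90, MU, SIGMA)
```

Values from a printed table may differ slightly from these because tables round z to two decimal places.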
5. Percentiles and percentile ranks measure the relative standing of a score within a distribution. Percentile rank is the
percentage of individuals with scores at or below a particular X value. A percentile is an X value that is identified by its
rank. The percentile rank always corresponds to the proportion to the left of the score in question.
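For a normal distribution, percentile ranks and percentiles could be computed as in the sketch below. The distribution parameters (mean 100, standard deviation 15) and the function names are assumptions for illustration.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)  # assumed normal distribution of scores


def percentile_rank(x, dist):
    """Percentage of the distribution at or below x (proportion to the left)."""
    return 100 * dist.cdf(x)


def percentile(rank, dist):
    """The X value identified by a given percentile rank."""
    return dist.inv_cdf(rank / 100)


rank_of_115 = percentile_rank(115, iq)       # X one standard deviation above the mean
score_at_rank = percentile(rank_of_115, iq)  # converts the rank back to X
```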