Stat 102 Module 2

Uploaded by

jerichotrio525
Copyright
© © All Rights Reserved
Stat 102 Module 2: Organizing Data

Concept Code
Introductory concepts Concept Description
Chapter outline - Variables and data
- Organizing qualitative data
- Organizing quantitative data
- Distribution shapes
- Misleading graphs
Variables and data Concept Description
Variable - Characteristic that varies from one person or thing to another
Qualitative variable - Non-numerically valued variable
Quantitative variable - Numerically valued variable
Discrete variable - Quantitative variable whose possible values can be listed
Continuous variable - Quantitative variable whose possible values form some interval of numbers
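Summary of variables - [diagram not reproduced: variables are either qualitative or quantitative; quantitative variables are either discrete or continuous]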
Data - Values of a variable
Qualitative data - Values of a qualitative variable
Quantitative data - Values of a quantitative variable
Discrete data - Values of a discrete variable
Continuous data - Values of a continuous variable
Observation - Individual piece of data
Data set - Collection of all observations for a particular variable

Frequency distribution of qualitative data - Listing of the distinct values and their frequencies
- Steps in constructing a frequency distribution of qualitative data
• List the distinct values of the observations in the data set in the first column of a table
• For each observation, place a tally mark in the second column of the table in the row of the appropriate distinct value
• Count the tallies for each distinct value and record the totals in the third column of the table
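The tallying steps above can be sketched in Python. This is only an illustration: the function name `frequency_distribution` and the blood-type data are made up for the example and do not come from the module.

```python
from collections import Counter

def frequency_distribution(observations):
    """Return {distinct value: frequency}, in order of first appearance."""
    # Counter does the tallying: one count per occurrence of each distinct value.
    return dict(Counter(observations))

# Hypothetical data set: blood types of eight people.
blood_types = ["A", "O", "B", "O", "A", "O", "AB", "O"]
for value, count in frequency_distribution(blood_types).items():
    print(f"{value:<3}{count}")
```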
Relative-frequency distribution of qualitative data - Listing of the distinct values and their relative frequencies
- Steps in constructing a relative-frequency distribution of qualitative data
• Obtain a frequency distribution of the data
• Divide each frequency by the total number of observations
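A minimal sketch of the two steps above, assuming the observations are held in a Python list. The function name and the data are illustrative only.

```python
from collections import Counter

def relative_frequency_distribution(observations):
    """Return {distinct value: frequency / number of observations}."""
    counts = Counter(observations)   # step 1: frequency distribution
    n = len(observations)            # total number of observations
    return {value: count / n for value, count in counts.items()}  # step 2

# Hypothetical data set.
blood_types = ["A", "O", "B", "O", "A", "O", "AB", "O"]
rel = relative_frequency_distribution(blood_types)
```

The relative frequencies always add up to 1, which is a quick check on the arithmetic.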
Pie chart - Disk divided into wedge-shaped pieces proportional to the relative frequencies of the qualitative data
- Steps in constructing a pie chart
• Obtain a relative-frequency distribution of the data
• Divide a disk into wedge-shaped pieces proportional to the relative frequencies
• Label the slices with the distinct values and their relative frequencies
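"Proportional to the relative frequencies" can be made concrete by computing the central angle of each wedge. The sketch below assumes the relative-frequency distribution is already available as a dictionary; `wedge_angles` is a hypothetical helper name.

```python
def wedge_angles(relative_frequencies):
    """Map each distinct value to the central angle (degrees) of its wedge."""
    # A full disk is 360 degrees, so a wedge proportional to a relative
    # frequency f spans f * 360 degrees.
    return {value: f * 360 for value, f in relative_frequencies.items()}

# Hypothetical relative-frequency distribution.
rel = {"A": 0.25, "O": 0.5, "B": 0.125, "AB": 0.125}
angles = wedge_angles(rel)
```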
Bar chart - Displays the distinct values of the qualitative data on a horizontal axis and the relative frequencies of those values on a vertical axis
- Each distinct value is represented by a vertical bar whose height is equal to the relative frequency of that value
- Steps in constructing a bar chart
• Obtain a relative-frequency distribution of the data
• Draw a horizontal axis on which to place the bars and a vertical axis on which to display the relative frequencies
• For each distinct value, construct a vertical bar whose height equals the relative frequency of that value
• Label the bars with the distinct values, the horizontal axis with the name of the variable, and the vertical axis with "Relative frequency"
First quartile Q1 - Median of the lower half of the data
Third quartile Q3 - Median of the upper half of the data
Five-number summary - The set of five numbers for a data set, in the indicated order: minimum, Q1, median, Q3, maximum
Range - Difference between the maximum and minimum values
Interquartile range - Difference between the third and first quartiles (Q3 - Q1)
Box plot (box-and-whisker plot) - Method for graphically displaying the five-number summary
- Plot consists of a rectangle whose left- and right-hand sides correspond to the first and third quartiles
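The quartile definitions above can be sketched as follows. One assumption is needed that the notes do not spell out: when the number of observations is odd, this sketch counts the median in both the lower and the upper half. Other textbooks leave it out of both, which can give slightly different quartiles.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    xs = sorted(data)
    n = len(xs)
    half = (n + 1) // 2   # assumption: for odd n the median is in both halves
    lower, upper = xs[:half], xs[n - half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

# Hypothetical data set.
data = [2, 4, 4, 5, 7, 8, 9, 10, 12]
low, q1, med, q3, high = five_number_summary(data)
data_range = high - low   # range
iqr = q3 - q1             # interquartile range
```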
Simple random sampling Concept Description
Census - Obtaining information for the entire population of
interest
Take aways - You can often avoid the effort and expense of a study if
someone else has already done that study and published
the results
- Literature search is important before conducting a study
- One might use an information collection agency that
specializes in finding studies on specific topics
- If sampling is appropriate, you must decide how to select
the sample; you must choose the method for obtaining a
sample from the population
- Simple random sampling corresponds to our intuitive
notion of random selection by lot
Representative sample - Sample that reflects as closely as possible the relevant characteristics of the population under consideration
Probability sampling - Sampling in which a random device is used to decide which members of the population will constitute the sample, instead of leaving such decisions to human judgment
Simple random sampling - Sampling procedure for which each possible sample of a given size is equally likely to be the one obtained
Simple random sample - A sample obtained by simple random sampling
Simple random sampling with replacement - A member of the population can be selected more than once
Simple random sampling without replacement - A member of the population can be selected at most once
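The difference between sampling with and without replacement can be seen with Python's standard `random` module. The population (members numbered 1 to 20), the sample size, and the seed are all made up for this sketch.

```python
import random

population = list(range(1, 21))   # hypothetical population: members numbered 1..20
rng = random.Random(102)          # seeded only so the sketch is repeatable

# Without replacement: each member can be selected at most once.
srs_without = rng.sample(population, k=5)

# With replacement: a member can be selected more than once.
srs_with = rng.choices(population, k=5)
```

`sample` never repeats a member, while `choices` may return the same member several times.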
Table of random numbers - Table of randomly chosen digits
Random-number generators - Statistical software packages or graphing calculators used to obtain simple random samples
Other sampling designs Concept Description
Systematic random sampling - Takes less effort to implement than simple random sampling
- Steps for systematic random sampling:
• Divide the population size by the sample size and round the result down to the nearest whole number, m
• Use a random-number table or a similar device to obtain a number, k, between 1 and m
• Select for the sample those members of the population that are numbered k, k + m, k + 2m, ...
- Easier to execute than simple random sampling and usually provides comparable results
- Limitation: results can be poor when the listing of the members of the population has some kind of cyclical pattern
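A sketch of the systematic sampling steps, where m is the rounded-down quotient from the first step and k is the random starting number. The function name is hypothetical, and cutting the result off at the requested sample size is a choice made for this sketch, not something the notes specify.

```python
import random

def systematic_sample(population, sample_size, rng=random):
    """Select members numbered k, k + m, k + 2m, ... (1-based numbering)."""
    m = len(population) // sample_size   # step 1: divide and round down
    k = rng.randint(1, m)                # step 2: random start between 1 and m
    # Step 3: member number j sits at list index j - 1.
    picked = [population[j - 1] for j in range(k, len(population) + 1, m)]
    return picked[:sample_size]          # sketch's choice: stop at sample_size

# Hypothetical population of 100 members, sample of 10.
sample = systematic_sample(list(range(1, 101)), 10, random.Random(1))
```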
Cluster sampling - Particularly useful when the members of the population are widely scattered geographically
- Steps for cluster sampling:
• Divide the population into groups (clusters)
• Obtain a simple random sample of the clusters
• Use all members of the clusters obtained in Step 2 as the sample
- Ideally, each cluster should mirror the entire population
- Members of a cluster may be more homogeneous than the members of the entire population, which can cause problems
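The cluster sampling steps can be sketched as below, assuming the population has already been divided into named clusters. The city-block data and the function name are invented for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, rng=random):
    """clusters: dict mapping a cluster name to the list of its members."""
    # Step 2: simple random sample of the clusters themselves.
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Step 3: every member of each chosen cluster goes into the sample.
    return [member for name in chosen for member in clusters[name]]

# Hypothetical population already divided into clusters (step 1).
city_blocks = {
    "block1": ["a", "b"],
    "block2": ["c", "d", "e"],
    "block3": ["f"],
    "block4": ["g", "h"],
}
sample = cluster_sample(city_blocks, 2, random.Random(7))
```

Note that the final sample size depends on which clusters are drawn, because whole clusters are taken.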
Stratified sampling - Population is first divided into subpopulations (called strata)
- Sampling is done from each stratum
- Proportional allocation: strata are often sampled in proportion to their size
- Steps for stratified sampling (with proportional allocation):
• Divide the population into strata
• From each stratum, obtain a simple random sample of size proportional to the size of the stratum; that is, the sample size for a stratum equals the total sample size times the stratum size divided by the population size
• Use all members obtained in Step 2 as the sample
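Proportional allocation is just the formula in the second step: stratum sample size = total sample size × stratum size / population size. A sketch under made-up strata follows. Rounding the result to a whole number is this sketch's assumption, and in general the rounded sizes may not add up exactly to the total.

```python
import random

def stratified_sample(strata, total_sample_size, rng=random):
    """strata: dict mapping a stratum name to the list of its members."""
    population_size = sum(len(members) for members in strata.values())
    sample = []
    for members in strata.values():
        # Proportional allocation: total sample size * stratum size / population size.
        # Rounding is this sketch's choice, not part of the notes.
        size = round(total_sample_size * len(members) / population_size)
        sample.extend(rng.sample(members, size))
    return sample

# Hypothetical strata of sizes 50, 30 and 20.
strata = {
    "first-year": list(range(0, 50)),
    "second-year": list(range(50, 80)),
    "third-year": list(range(80, 100)),
}
sample = stratified_sample(strata, 10, random.Random(3))
```

With a total sample size of 10, the three strata contribute 5, 3 and 2 members.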
Multistage sampling - Used frequently by pollsters and government agencies
- Combines one or more of simple random sampling, systematic random sampling, cluster sampling, and stratified sampling
Organizing quantitative data Concept Description
Steps for organizing quantitative data - Group observations into classes (categories / bins)
- Treat the classes as distinct values of qualitative data
- Once quantitative data are grouped into classes, one can construct frequency and relative-frequency distributions of the data in exactly the same way as for qualitative data
Common methods in organizing quantitative data - Single-value grouping
- Limit grouping
- Cutpoint grouping
Methods in organizing quantitative data (single-value grouping) Concept Description
Single-value classes - classes in which each class represents a single possible value
Single-value grouping - method of grouping quantitative data via single-value classes
- uses the distinct values of the observations as the classes, analogous to the method used for qualitative data
- suitable for discrete data in which there are only a small number of distinct values
Methods in organizing quantitative data (limit grouping) Concept Description
Limit grouping - method of grouping quantitative data using class limits, where each class consists of a range of values
- useful when data are expressed as whole numbers and there are too many distinct values to employ single-value grouping
Lower class limit - smallest value that could go in a class
Upper class limit - largest value that could go in a class
Class width - difference between the lower limit of a class and the lower limit of the next-higher class
Class mark - average of the two class limits of a class
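A sketch of limit grouping for whole-number data. It builds classes from a starting lower limit and a class width, computes the average of the two class limits for each class (often called the class mark), and counts the observations per class. The helper names and the data are hypothetical.

```python
def limit_classes(lowest_limit, width, n_classes):
    """Return (lower limit, upper limit, class mark) for whole-number data."""
    classes = []
    for i in range(n_classes):
        lower = lowest_limit + i * width
        upper = lower + width - 1        # largest whole number in the class
        mark = (lower + upper) / 2       # average of the two class limits
        classes.append((lower, upper, mark))
    return classes

def limit_frequencies(data, classes):
    """Count how many observations fall between each pair of class limits."""
    return [sum(lower <= x <= upper for x in data) for lower, upper, _ in classes]

# Hypothetical whole-number data grouped into classes 30-39, 40-49, 50-59.
data = [31, 38, 42, 45, 47, 50, 33, 59]
classes = limit_classes(30, 10, 3)
freqs = limit_frequencies(data, classes)
```

The class width here is 10, the difference between consecutive lower limits (40 - 30), not 39 - 30.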
Methods in organizing quantitative data (cutpoint grouping) Concept Description
Class cutpoints - used to group quantitative data
- each class consists of a range of values
Lower class cutpoint - smallest value that could go in a class
Upper class cutpoint - smallest value that could go in the next-higher class (equivalent to the lower cutpoint of the next-higher class)
Class width - difference between the cutpoints of a class
Class midpoint - average of the two cutpoints of a class
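A sketch of cutpoint grouping. Because the upper cutpoint of a class is the smallest value that goes in the next-higher class, each class includes its lower cutpoint and excludes its upper cutpoint. The function names and the weights are made up for the example.

```python
def cutpoint_frequencies(data, cutpoints):
    """cutpoints: increasing list; class i runs from cutpoints[i] up to,
    but not including, cutpoints[i + 1]."""
    freqs = [0] * (len(cutpoints) - 1)
    for x in data:
        for i in range(len(freqs)):
            if cutpoints[i] <= x < cutpoints[i + 1]:
                freqs[i] += 1
                break
    return freqs

def class_midpoints(cutpoints):
    """Average of the lower and upper cutpoint of each class."""
    return [(lo + hi) / 2 for lo, hi in zip(cutpoints, cutpoints[1:])]

# Hypothetical continuous data with cutpoints 120, 140, 160, 180.
weights = [129.2, 155.2, 140.0, 167.3, 139.9, 178.7]
freqs = cutpoint_frequencies(weights, [120, 140, 160, 180])
mids = class_midpoints([120, 140, 160, 180])
```

The observation 140.0 lands in the class "140 to under 160", not in "120 to under 140".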
Data visualization (histogram) Concept Description
Histogram - displays the classes of the quantitative data on a horizontal axis and the frequencies of those classes on a vertical axis
- frequency of each class is represented by a vertical bar whose height is equal to the frequency of that class
- bars should be positioned so that they touch each other
- Steps to construct a histogram:
• obtain a frequency distribution of the data
• draw a horizontal axis on which to place the bars and a vertical axis on which to display the frequencies
• for each class, construct a vertical bar whose height equals the frequency of that class
• label the bars with the classes, the horizontal axis with the name of the variable, and the vertical axis with "Frequency"
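As a rough plain-text stand-in for the histogram steps, the sketch below draws one bar per class whose length equals the class frequency. The bars run sideways here only because that is easier to print; the function name and the frequency distribution are hypothetical.

```python
def text_histogram(class_labels, frequencies, variable_name):
    """Return a sideways text histogram: one row of '#' per class."""
    width = max(len(str(label)) for label in class_labels)
    lines = [f"{variable_name} (each # = 1 observation)"]
    for label, freq in zip(class_labels, frequencies):
        # The bar length equals the frequency of the class.
        lines.append(f"{str(label):>{width}} | {'#' * freq}")
    return "\n".join(lines)

# Hypothetical frequency distribution with classes labeled by lower limit.
print(text_histogram([30, 40, 50], [3, 3, 2], "Age"))
```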
Single-value grouping histogram - use the distinct values of the observations to label the bars, with each such value centered under its bar
Limit or cutpoint grouping histogram - use the lower class limits (or, equivalently, the lower class cutpoints) to label the bars
Frequency histogram - histogram that uses frequencies on the vertical axis
- vertical scale of a frequency histogram depends on the number of observations, making comparison more difficult
Relative-frequency histogram (percent histogram) - histogram that uses relative frequencies or percents on the vertical axis
- better than frequency histograms for comparing two data sets
- same vertical scale is used for all relative-frequency histograms (a minimum of 0 and a maximum of 1), making direct comparison easy
Data visualization (Dotplots and Stem-and-Leaf diagrams) Concept Description
Dotplot - useful for showing the relative positions of the data in a data set or for comparing two or more data sets
- Constructing a dotplot:
• Draw a horizontal axis that displays the possible values of the quantitative data
• Record each observation by placing a dot over the appropriate value on the horizontal axis
• Label the horizontal axis with the name of the variable
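The dotplot construction can be imitated in plain text. This sketch assumes whole-number data and stacks one dot per observation over its value; the function name and the data are invented for illustration.

```python
from collections import Counter

def dotplot(data, variable_name):
    """Return a text dotplot: dots stacked over each value on a horizontal axis."""
    counts = Counter(data)
    values = list(range(min(data), max(data) + 1))   # axis shows every whole value in range
    cell = max(len(str(v)) for v in values) + 1      # column width
    rows = []
    for level in range(max(counts.values()), 0, -1): # top row first
        row = "".join(("." if counts[v] >= level else " ").rjust(cell) for v in values)
        rows.append(row.rstrip())
    rows.append("".join(str(v).rjust(cell) for v in values))  # horizontal axis
    rows.append(variable_name)                                  # axis label
    return "\n".join(rows)

# Hypothetical data set.
print(dotplot([1, 2, 2, 3, 3, 3], "Number of siblings"))
```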
Stem-and-leaf plot - Method developed in the 1960s by Professor John Tukey of Princeton University
- This ingenious diagram is easier to construct than either a frequency distribution or a histogram and generally displays more information
- Each observation consists of a stem (all but the rightmost digit) and a leaf (the rightmost digit)
- Constructing a stem-and-leaf plot:
• Think of each observation as a stem, consisting of all but the rightmost digit, and a leaf, the rightmost digit
• Write the stems from smallest to largest in a vertical column to the left of a vertical rule
• Write each leaf to the right of the vertical rule in the row that contains the appropriate stem
• Arrange the leaves in each row in ascending order
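The construction above translates almost directly into code. This sketch assumes non-negative whole numbers. Showing stems that have no leaves is a choice made here, and the function name and data are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Return the rows of a stem-and-leaf diagram for non-negative whole numbers."""
    leaves = defaultdict(list)
    for x in data:
        stem, leaf = divmod(x, 10)   # stem: all but the rightmost digit; leaf: rightmost digit
        leaves[stem].append(leaf)
    rows = []
    # Stems from smallest to largest; stems with no leaves are still listed.
    for stem in range(min(leaves), max(leaves) + 1):
        ordered = "".join(str(leaf) for leaf in sorted(leaves[stem]))  # leaves ascending
        rows.append(f"{stem} | {ordered}".rstrip())
    return rows

# Hypothetical data set.
for row in stem_and_leaf([23, 35, 31, 48, 27, 31, 40]):
    print(row)
```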
Misleading graphs Concept Description
Improper scaling - The house on the left represents the number of homes built last year. Because the number of homes that will be built this year is double the number built last year, the developer makes the house on the right twice as tall and twice as wide as the house on the left. The drawing is misleading because the house on the right then has four times the area of the house on the left, not twice the area.
Truncated graphs - Truncating an axis causes the bars to be out of proportion and hence creates a misleading impression
Example of truncated graph - Because the bar for March is about three-fourths as large as the bar for January, a quick look at Fig. 2.14(a) might lead you to conclude that the unemployment rate dropped by roughly one-fourth between January and March.
- In reality, however, the unemployment rate dropped by less than one-thirteenth, from 5.4% to 5.0%.
- The vertical axis, which should start at 0%, starts at 4% instead. Thus the part of the graph from 0% to 4% has been cut off, or truncated.
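The arithmetic behind this example can be checked directly. The sketch uses only the numbers quoted above (5.4%, 5.0%, and an axis starting at 4%); the variable names are illustrative.

```python
january, march = 5.4, 5.0   # unemployment rates (%) from the example
axis_start = 4.0            # where the truncated vertical axis begins

# Ratio of the drawn bar heights when the axis starts at 4%.
drawn_ratio = (march - axis_start) / (january - axis_start)

# Ratio of the bar heights when the axis starts at 0%.
true_ratio = march / january

# Actual relative drop in the unemployment rate.
relative_drop = (january - march) / january

print(f"drawn bar ratio: {drawn_ratio:.2f}")
print(f"true bar ratio:  {true_ratio:.2f}")
print(f"relative drop:   {relative_drop:.3f} (1/13 is about {1/13:.3f})")
```

The drawn ratio is 1.0/1.4, which is where the "about three-fourths" impression comes from, while the real drop is 0.4/5.4.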
Distribution shapes Concept Description
Distribution of a data set - Table, graph, or formula that provides the values of the observations and how often they occur
Advantages of using smooth curves in distribution shapes - No need to worry about minor differences in shape
- Can concentrate on overall patterns, which allows us to classify most distributions by designating relatively few shapes
Common distribution shapes
Modality Description
Unimodal - Has one peak
Bimodal - Has two peaks
Multimodal - Has three or more peaks
Symmetry and skewness - When classifying distributions, we must be flexible
- Exact symmetry is not required to classify a distribution as symmetric
Symmetry and skewness Description
Right-skewed distribution - Rises to its peak rapidly and comes back toward the horizontal axis more slowly; its right tail is longer than its left tail
Left-skewed distribution - Rises to its peak slowly and comes back toward the horizontal axis more rapidly; its left tail is longer than its right tail
J-shaped distributions - Special types of right-skewed and left-skewed distributions
Population and sample distributions Description
Population data - Values of a variable for the entire population
Sample data - Values of a variable for a sample of the population
Population distribution - Distribution of population data
Sample distribution - Distribution of sample data
- For a simple random sample, the sample distribution
approximates the population distribution (i.e., the
distribution of the variable under consideration). The
larger the sample size, the better the approximation
tends to be.
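A small simulation can illustrate the last point. Everything in it is made up for the sketch: the population of 10,000 values, the seed, and the sample sizes. It compares the relative frequencies in a simple random sample with those in the population.

```python
import random
from collections import Counter

def relative_frequencies(values):
    counts = Counter(values)
    return {v: counts[v] / len(values) for v in sorted(counts)}

rng = random.Random(2024)   # seeded only so the sketch is repeatable

# Hypothetical population of 10,000 values: 50% ones, 30% twos, 20% threes.
population = [1] * 5000 + [2] * 3000 + [3] * 2000
pop_dist = relative_frequencies(population)

def max_gap(sample_size):
    """Largest absolute difference between sample and population relative frequencies."""
    sample = rng.sample(population, sample_size)   # simple random sample, no replacement
    samp_dist = relative_frequencies(sample)
    return max(abs(samp_dist.get(v, 0) - p) for v, p in pop_dist.items())

for n in (20, 200, 2000):
    print(n, round(max_gap(n), 3))
```

The gap tends to shrink as the sample size grows, although any single small sample can happen to land close to the population by chance.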
