Bustat Reviewer
Frequency Distributions
This is the summary of data presented in the form of class intervals and frequencies. It is constructed according to individual business researchers’ taste.

Ogive
An ogive (o-jive) is a cumulative frequency polygon. Construction begins by labeling the x-axis with the class endpoints and the y-axis with the frequencies. Ogives are most useful when the decision maker wants to see running totals. For example, if a comptroller is interested in controlling costs, an ogive could depict cumulative costs over a fiscal year.
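A minimal sketch of building a frequency distribution and the running totals an ogive would plot. The data values, the class endpoints, and the helper name `frequency_table` are all hypothetical and chosen only for illustration; the intervals are assumed to include the lower endpoint and exclude the upper one.

```python
from itertools import accumulate

# Hypothetical data: 20 order values (made up for illustration).
data = [12, 15, 17, 21, 22, 24, 25, 28, 29, 31,
        33, 34, 36, 38, 41, 42, 45, 47, 48, 49]

# Class endpoints picked by the analyst: [10, 20), [20, 30), [30, 40), [40, 50).
edges = [10, 20, 30, 40, 50]


def frequency_table(values, edges):
    """Return one row per class interval with its frequency and running total."""
    intervals = list(zip(edges, edges[1:]))
    counts = [sum(lo <= v < hi for v in values) for lo, hi in intervals]
    running = list(accumulate(counts))  # cumulative frequencies
    total = len(values)
    return [
        {
            "class": (lo, hi),
            "midpoint": (lo + hi) / 2,   # average of the two class endpoints
            "frequency": f,
            "relative": f / total,       # class frequency / total frequency
            "cumulative": cf,
        }
        for (lo, hi), f, cf in zip(intervals, counts, running)
    ]


rows = frequency_table(data, edges)

# Ogive points: class endpoint on the x-axis, cumulative frequency on the y-axis,
# starting from zero at the first lower endpoint.
ogive_points = [(edges[0], 0)] + [(r["class"][1], r["cumulative"]) for r in rows]

for r in rows:
    print(r["class"], r["frequency"], r["cumulative"])
```

Plotting `ogive_points` as a connected line would give the ogive for this made-up data set.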
Class Midpoint
Class midpoint or the class mark is the value halfway across the class interval. It can be calculated as the average of the two class endpoints.

Relative frequency
It is the proportion of the total frequency that is in any given class interval in a frequency distribution. It is calculated as the individual class frequency divided by the total frequency:
relative frequency = class frequency / total frequency

Cumulative frequency
is a running total of frequencies through the classes of a frequency distribution.

Dot Plots
A relatively simple statistical chart that is generally used to display continuous, quantitative data is the dot plot. In a dot plot, each data value is plotted along the horizontal axis and is represented on the chart by a dot. If multiple data points have the same value, the dots will stack up vertically.

Stem-and-Leaf Plots
Another way to organize raw data into groups besides using a frequency distribution is a stem-and-leaf plot. This technique is simple and provides a unique view of the data. A stem-and-leaf plot is constructed by separating the digits for each number of the data into two groups, a stem and a leaf.
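The stem-and-leaf construction described above can be sketched as follows. This assumes non-negative whole numbers, with the last digit as the leaf and the remaining digits as the stem; the scores and the function name `stem_and_leaf` are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group each number into a stem (leading digits) and a leaf (last digit)."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # e.g. 86 -> stem 8, leaf 6
        plot[stem].append(leaf)
    return dict(plot)


# Hypothetical test scores, made up for illustration.
scores = [86, 77, 91, 60, 55, 76, 92, 47, 88, 67, 23, 59, 72, 75, 83]
plot = stem_and_leaf(scores)

for stem in sorted(plot):
    print(stem, "|", " ".join(str(leaf) for leaf in plot[stem]))
```

Each printed row shows one stem followed by its leaves in ascending order, so the shape of the data is visible while the original values remain recoverable.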
Quantitative Data Graphs
are plotted along a numerical scale, while
Qualitative Data Graphs
are plotted using non-numerical categories.

Pie Chart
A pie chart is a circular depiction of data where the area of the whole pie represents 100% of the data and slices of the pie represent a percentage breakdown of the sublevels. Pie charts show the relative magnitudes of the parts to the whole.
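As a rough illustration of the percentage breakdown behind a pie chart, the sketch below converts category totals into percentages of the whole and the matching slice angles. The category names, the figures, and the helper `pie_slices` are hypothetical.

```python
def pie_slices(counts):
    """Return each category's share of the whole as a percent and as an angle."""
    total = sum(counts.values())
    return {
        label: {
            "percent": 100 * value / total,  # share of the whole pie
            "degrees": 360 * value / total,  # size of the slice
        }
        for label, value in counts.items()
    }


# Hypothetical sales by region, made up for illustration.
sales = {"North": 50, "South": 30, "East": 20}
slices = pie_slices(sales)

for label, s in slices.items():
    print(f"{label}: {s['percent']:.1f}% ({s['degrees']:.0f} degrees)")
```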
Bar graph
A bar graph or chart contains two or more categories along one axis and a series of bars, one for each category, along the other axis. Typically, the length of the bar represents the magnitude of the measure (amount, frequency, money, percentage, etc.) for each category.

Pareto Graphs
Pareto analysis is a quantitative tallying of the number and types of defects that occur with a product or service. Analysts use this tally to produce a vertical bar chart that displays the most common types of defects, ranked in order of occurrence from left to right. The bar chart is called a Pareto chart.

Cross Tabulation
It is a process for producing a two-dimensional table that displays the frequency counts for two variables simultaneously. It is also referred to as a contingency table or pivot table.

Scatter Plot
A scatter plot is a two-dimensional graph plot of pairs of points from two numerical variables.

Module 3: Descriptive Statistics

Mode
The most frequently occurring value in a data set
Applicable to all levels of data measurement (nominal, ordinal, interval, and ratio)
Bimodal
Data sets that have two modes
Multimodal
Data sets that contain more than two modes

Median
Middle value in an ordered array of numbers
Applicable for ordinal, interval, and ratio data
Not applicable for nominal data
Unaffected by extremely large and extremely small values

Arithmetic Mean
Commonly called ‘the mean’, it is the average of a group of numbers
Applicable for interval and ratio data
Not applicable for nominal or ordinal data
Affected by each value in the data set, including extreme values
Computed by summing all values in the data set and dividing the sum by the number of values in the data set

Percentiles
Measures of central tendency that divide a group of data into 100 parts
At least n% of the data lie below the nth percentile, and at most (100 - n)% of the data lie above the nth percentile
Example: the 90th percentile indicates that at least 90% of the data lie below it, and at most 10% of the data lie above it
The median and the 50th percentile have the same value.
Applicable for ordinal, interval, and ratio data
Not applicable for nominal data

Quartiles
Measures of central tendency that divide a group of data into four subgroups
Q1: 25% of the data set is below the first quartile
Q2: 50% of the data set is below the second quartile
Q3: 75% of the data set is below the third quartile
Q1 is equal to the 25th percentile
Q2 is located at the 50th percentile and equals the median
Q3 is equal to the 75th percentile
Quartile values are not necessarily members of the data set

Measures of variability
describe the spread or the dispersion of a set of data.

Range
The difference between the largest and the smallest values in a set of data
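A short sketch of the mode, median, mean, quartiles, and range using Python's standard `statistics` module. The data set is hypothetical. `statistics.quantiles` is called with its default method, and other quartile conventions can give slightly different cut points.

```python
import statistics

# Hypothetical data set, made up for illustration.
data = [3, 5, 5, 7, 8, 9, 11, 14, 14, 19]

mean = statistics.mean(data)        # sum of values / number of values
median = statistics.median(data)    # middle value of the ordered array
modes = statistics.multimode(data)  # every most-frequent value

# Three cut points that divide the data into four subgroups.
q1, q2, q3 = statistics.quantiles(data, n=4)

data_range = max(data) - min(data)  # largest value minus smallest value

print("mean:", mean)
print("median:", median)
print("modes:", modes)
print("quartiles:", q1, q2, q3)
print("range:", data_range)
```

This data set has two modes, so it is bimodal. Q2 equals the median, and neither value is a member of the data set.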
Module 4: Probability

Probability
occurrences are assigned to the inferential process under conditions of uncertainty

Methods of Assigning Probabilities
Classical method of assigning probability (rules and laws)
Relative frequency of occurrence (cumulated historical data)
Subjective probability (personal intuition or reasoning)

Subjective Probability
It comes from a person’s intuition or reasoning
Different individuals may correctly or incorrectly assign different numeric probabilities to the same event
Degree of belief in the results of the event
Useful for unique or single-trial experiments

Structure of Probability
Experiment: the process that produces an outcome
Event: an outcome of an experiment
Elementary event: an event that cannot be decomposed or broken down into other events
Sample Space: a complete roster/listing of all elementary events for an experiment
Trial: one repetition of the process
Set Notation: the use of braces to group members
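To make the terms above concrete, here is a minimal sketch that lists a sample space and applies the classical method. The two-dice experiment is a hypothetical example, and the calculation assumes every elementary event is equally likely; `classical_probability` is an illustrative helper name.

```python
from fractions import Fraction
from itertools import product

# Experiment: roll two six-sided dice (one roll of the pair is one trial).
# Sample space: every (first die, second die) pair, written as a Python set.
sample_space = set(product(range(1, 7), repeat=2))

# Event: the two dice sum to 7. It is made up of several elementary events.
event_sum_7 = {outcome for outcome in sample_space if sum(outcome) == 7}


def classical_probability(event, space):
    """Elementary events in the event / elementary events in the sample space."""
    return Fraction(len(event), len(space))


p_sum_7 = classical_probability(event_sum_7, sample_space)

print(len(sample_space))    # number of elementary events
print(sorted(event_sum_7))  # roster of the event
print(p_sum_7)
```

Python sets also use braces to group members, which mirrors the set notation described above.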
The UNION of x, y is formed by combining elements from both sets, and is denoted by x ∪ y.
An INTERSECTION is denoted by x ∩ y and contains the elements common to both sets.

Structure of Probability
Mutually Exclusive Events are events such that the occurrence of one precludes the occurrence of the other.
Independent Events are events such that the occurrence or nonoccurrence of one has no effect on the occurrence of the others.
Collectively Exhaustive Events are a listing of all possible elementary events for an experiment.
Complementary Events are two events, one of which comprises all the elementary events of an experiment that are not in the other event.

Sample Space
The set of all elementary events for an experiment
Methods of describing a sample space
roster or listing
Tree diagram
Set builder notation
Venn Diagram

Module 5: Sampling and Sampling Distribution

Reasons for Sampling
Sampling can save money.
Sampling can save time.
For given resources, sampling can broaden the scope of the data set.
Because the research process is sometimes destructive, the sample can save product.
If accessing the population is impossible, sampling is the only option.

Reasons for Taking a Census
Eliminate the possibility that a random sample is not representative of the population.
The person authorizing the study is uncomfortable with sample information.

Population Frame
A list, map, directory, or other source used to represent the population

Overregistration
the frame contains all members of the target population and some additional elements
Example: using the chamber of commerce membership directory as the frame for a target population of member businesses owned by women.

Underregistration
the frame does not contain all members of the target population.
Example: using the chamber of commerce membership directory as the frame for a target population of all businesses.

Random Sampling
• Every unit of the population has the same probability of being included in the sample.
• A chance mechanism is used in the selection process.
• Eliminates bias in the selection process
• Also known as probability sampling

Nonrandom Sampling
• Every unit of the population does not have the same probability of being included in the sample.
• Open to selection bias
• Not appropriate data collection methods for most statistical methods
• Also known as non-probability sampling

Random Sampling Techniques
Simple Random Sample
Stratified Random Sample
◦ Proportionate
◦ Disproportionate
Systematic Random Sample
Cluster (or Area) Sampling

Simple Random Sample
Number each frame unit from 1 to N.
Use a random number table or a random number generator to select n distinct numbers between 1 and N, inclusively.
Easier to perform for small populations
Cumbersome for large populations

Stratified Random Sample
Population is divided into non-overlapping sub-populations called strata
A random sample is selected from each stratum
Potential for reducing sampling error
Proportionate
the percentage of the sample taken from each stratum is proportionate to the percentage that each stratum is within the population
Disproportionate
proportions of the strata within the sample are different than the proportions of the strata within the population

Errors
Data from nonrandom samples are not appropriate for analysis by inferential statistical methods.
Sampling Error
occurs when the sample is not representative of the population
Non-sampling Errors
• Missing data, recording, data entry, and analysis errors
• Poorly conceived concepts, unclear definitions, and defective questionnaires
• Response errors occur when people do not know, will not say, or overstate in their answers

Cluster Sampling
Population is divided into non-overlapping clusters or areas
Each cluster is a miniature, or microcosm, of the population.
A subset of the clusters is selected randomly for the sample.
If the number of elements in the subset of clusters is larger than the desired value of n, these clusters may be subdivided to form a new set of clusters and subjected to a random selection process.
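The random sampling techniques described above (simple random, proportionate stratified, and cluster) can be sketched with Python's `random` module. The frame of 100 numbered units, the strata, the clusters, and all three function names are hypothetical, and the fixed seed is there only so the sketch is repeatable.

```python
import random


def simple_random_sample(N, n, rng):
    """Select n distinct unit numbers between 1 and N, inclusive."""
    return rng.sample(range(1, N + 1), n)


def proportionate_stratified_sample(strata, n, rng):
    """Sample each stratum in proportion to its share of the population.

    Rounding means the stratum sample sizes may not always sum exactly to n.
    """
    total = sum(len(units) for units in strata.values())
    return {
        name: rng.sample(units, round(n * len(units) / total))
        for name, units in strata.items()
    }


def cluster_sample(clusters, k, rng):
    """Randomly select k whole clusters and keep every unit inside them."""
    chosen = rng.sample(sorted(clusters), k)
    return {name: clusters[name] for name in chosen}


rng = random.Random(0)  # fixed seed, assumed only for repeatability

# Hypothetical frame of 100 businesses numbered 1..100, grouped by size.
strata = {
    "small": list(range(1, 61)),    # 60% of the frame
    "medium": list(range(61, 91)),  # 30% of the frame
    "large": list(range(91, 101)),  # 10% of the frame
}

# The same hypothetical frame split into five geographic areas of 20 units.
clusters = {f"area_{i}": list(range(20 * i + 1, 20 * i + 21)) for i in range(5)}

srs = simple_random_sample(N=100, n=10, rng=rng)
stratified = proportionate_stratified_sample(strata, n=10, rng=rng)
clustered = cluster_sample(clusters, k=2, rng=rng)

print(sorted(srs))
print({name: len(units) for name, units in stratified.items()})
print(sorted(clustered))
```

In the stratified draw, a sample of 10 takes 6, 3, and 1 units from strata that make up 60%, 30%, and 10% of the frame. In the cluster draw, the chance mechanism picks whole areas rather than individual units.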
Cluster Sampling
Advantages
• More convenient for geographically dispersed populations
• Reduced travel costs to contact sample elements
• Simplified administration of the survey
• Unavailability of sampling frame prohibits using other random sampling methods
Disadvantages
• Statistically less efficient when the cluster elements are similar
• Costs and problems of statistical analysis are greater than for simple random sampling.
Nonrandom Sampling
Convenience Sampling
sample elements are selected for the
convenience of the researcher
Judgment Sampling
sample elements are selected by the
judgment of the researcher
Quota Sampling
sample elements are selected until the
quota controls are satisfied
Snowball Sampling
survey subjects are selected based on
referral from other survey respondents.
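As an illustration of quota sampling, the sketch below accepts respondents in the order they arrive until each quota is filled. The respondent list, the categories, the quotas, and the function name `quota_sample` are hypothetical. No chance mechanism is involved, which is what makes the method nonrandom.

```python
def quota_sample(respondents, quotas):
    """Accept respondents in arrival order until every quota is satisfied.

    respondents: iterable of (respondent_id, category) pairs.
    quotas: dict mapping category -> number of respondents wanted.
    """
    remaining = dict(quotas)
    selected = []
    for respondent_id, category in respondents:
        if remaining.get(category, 0) > 0:
            selected.append(respondent_id)
            remaining[category] -= 1
        if not any(remaining.values()):
            break  # all quota controls are satisfied
    return selected


# Hypothetical arrivals, made up for illustration.
arrivals = [
    (1, "retail"), (2, "retail"), (3, "service"), (4, "retail"),
    (5, "service"), (6, "manufacturing"), (7, "service"),
]
quotas = {"retail": 2, "service": 2}

selected = quota_sample(arrivals, quotas)
print(selected)
```

Respondent 4 is skipped because the retail quota is already full, and selection stops once the service quota is met.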