
Bustat Reviewer

This document provides an overview of statistics, including what statistics is, who uses it, and the different types of variables and levels of measurement. It also summarizes common descriptive statistical techniques like frequency distributions, measures of central tendency (mean, median, mode), and quantitative data graphs (histogram, frequency polygon, ogive, dot plot, stem-and-leaf plot, pie chart, bar graph). Inferential statistics are used to make predictions or generalizations about populations based on samples. Both descriptive and inferential statistics can use parametric or non-parametric methods depending on the level of measurement of the data.


Statistics
• The science of collecting, organizing, presenting, and interpreting numerical data to assist in making more effective decisions.

Why study statistics?
1. Numerical information is everywhere.
2. Statistical techniques are used to make decisions that affect our daily lives.
3. Knowledge of statistical methods will help you understand how decisions are made and give you a better understanding of how they affect you.

What is meant by statistics?
• In the more common usage, statistics refers to numerical information.
• We often present statistical information in graphical form to capture the reader's attention and to portray a large amount of information.

Who uses statistics?
• Statistical techniques are used extensively in marketing, accounting, and quality control, and by consumers, professional sports people, hospital administrators, educators, politicians, physicians, etc.

Types of statistics

Descriptive statistics
• Methods of organizing, summarizing, and presenting data in an informative way.
• Describes the data with summary measures such as the mean, median, and mode.

Inferential statistics
• A decision, estimate, prediction, or generalization about a population, based on a sample.
• Inferential statistics go beyond description and allow for forecasting, experimentation, and prediction of outcomes.

Note: a population or sample may consist of individuals or objects.

Population
• A collection of all possible individuals, objects, or measurements of interest.

Sample
• A portion, or part, of the population of interest.

Why take a sample instead of studying every member of the population?
• Prohibitive cost of a census
• Destruction of the item being studied may be required
• It is not possible to test or inspect all members of the population being studied.

Types of Variables

Qualitative or Attribute variable
• The characteristic being studied is non-numeric.
• Ex. Gender, religious affiliation, type of automobile owned, state of birth, eye color.

Quantitative variable
• Information is reported numerically.
• Ex. Balance in your checking account, minutes remaining in class, or number of children in a family.

Quantitative variables - Classifications

Discrete variables
• Can only assume certain values, and there are usually “gaps” between values.

Continuous variables
• Can assume any value within a specified range.

Four levels of measurement

Nominal level
• Data that are classified into categories and cannot be arranged in any particular order.
• Ex. Eye color, gender, religious affiliation

Ordinal level
• Data arranged in some order, but the differences between data values cannot be determined or are meaningless.
• Ex. Rankings given during a taste test.

Interval level
• Similar to the ordinal level, with the additional property that meaningful amounts of difference between data values can be determined. There is no natural zero point.
• Ex. Temperature on the Fahrenheit scale

Ratio level
• The interval level with an inherent zero starting point. Differences and ratios are meaningful for this level of measurement.
• Ex. Monthly income of surgeons, or distance traveled by manufacturer’s representatives per month.
• Highest level of measurement
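The level of measurement decides which summary measures are meaningful. The sketch below encodes the applicability rules this reviewer states in Module 3 (mode for all four levels; median for ordinal, interval, and ratio; mean for interval and ratio). The function name `summarize`, the `APPLICABLE` table, and all data values are hypothetical and only for illustration.

```python
from statistics import mean, median, multimode

# Applicability rules as stated in Module 3 of this reviewer.
APPLICABLE = {
    "nominal": ("mode",),
    "ordinal": ("mode", "median"),
    "interval": ("mode", "median", "mean"),
    "ratio": ("mode", "median", "mean"),
}

# multimode returns a list, so bimodal and multimodal data are handled too.
MEASURES = {"mode": multimode, "median": median, "mean": mean}


def summarize(values, level):
    """Return only the measures that are meaningful for the given level."""
    return {name: MEASURES[name](values) for name in APPLICABLE[level]}


# Hypothetical data, one list per level of measurement.
eye_colors = ["brown", "blue", "brown", "green"]   # nominal
taste_ranks = [1, 2, 2, 3, 4]                      # ordinal
incomes = [30_000, 42_000, 42_000, 58_000]         # ratio

print(summarize(eye_colors, "nominal"))
print(summarize(taste_ranks, "ordinal"))
print(summarize(incomes, "ratio"))
```

For the nominal list only the mode is reported, because categories such as eye color cannot be ordered or averaged.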
Statistical techniques

Parametric statistics
• Require that data be interval or ratio.

Non-parametric statistics
• If the data are nominal or ordinal, non-parametric statistics must be used.
• Can also be used to analyze interval or ratio data.

Module 2: Descriptive Statistics

Ungrouped data
• Raw data, or data that have not been summarized in any way.

Grouped data
• Data that have been organized into a frequency distribution.

Frequency Distributions
• A summary of data presented in the form of class intervals and frequencies. It is constructed according to the individual business researcher’s taste.

Class Midpoint
• The class midpoint, or class mark, is the value halfway across the class interval. It can be calculated as the average of the two class endpoints.

Relative frequency
• The proportion of the total frequency that is in any given class interval in a frequency distribution.
• It is calculated as the individual class frequency divided by the total frequency.

Cumulative frequency
• A running total of frequencies through the classes of a frequency distribution.

Quantitative and Qualitative Data Graphs
• Quantitative data graphs are plotted along a numerical scale, while qualitative data graphs are plotted using non-numerical categories.

Quantitative Data Graphs

Histogram
• A histogram is a bar graph that represents a frequency distribution.
• The width of each bar represents the class interval and the height represents the corresponding frequency.
• There are no spaces between the bars.
• It is a series of contiguous rectangles that represent the frequency of data in given class intervals.

Frequency Polygons
• A frequency polygon, like the histogram, is a graphical display of class frequencies. However, instead of using bars or rectangles, each class frequency is plotted as a dot at the class midpoint, and the dots are connected by a series of line segments.

Ogive
• An ogive (o-jive) is a cumulative frequency polygon. Construction begins by labeling the x-axis with the class endpoints and the y-axis with the frequencies.
• Ogives are most useful when the decision maker wants to see running totals. For example, if a comptroller is interested in controlling costs, an ogive could depict cumulative costs over a fiscal year.

Dot Plots
• The dot plot is a relatively simple statistical chart that is generally used to display continuous, quantitative data.
• In a dot plot, each data value is plotted along the horizontal axis and is represented on the chart by a dot. If multiple data points have the same value, the dots stack up vertically.

Stem-and-Leaf Plots
• Another way to organize raw data into groups, besides using a frequency distribution, is a stem-and-leaf plot. This technique is simple and provides a unique view of the data.
• A stem-and-leaf plot is constructed by separating the digits of each number in the data into two groups, a stem and a leaf.

Pie Chart
• A pie chart is a circular depiction of data where the area of the whole pie represents 100% of the data and slices of the pie represent a percentage breakdown of the sublevels. Pie charts show the relative magnitudes of the parts to the whole.
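The graphs in this module all plot quantities taken from a frequency distribution: class frequencies (histogram heights), class midpoints (frequency polygon), cumulative frequencies (ogive), and relative frequencies (pie chart slices). A minimal sketch of how those columns can be tabulated follows. The function name, the equal-width classes of the form [low, high), and the data are assumptions made for illustration, and the sketch assumes every value falls inside one of the classes.

```python
def frequency_distribution(data, start, width, n_classes):
    """Tabulate equal-width classes [low, high) starting at `start`."""
    total = len(data)
    cumulative = 0
    rows = []
    for i in range(n_classes):
        low = start + i * width
        high = low + width
        freq = sum(1 for x in data if low <= x < high)
        cumulative += freq  # running total through the classes
        rows.append({
            "class": (low, high),
            "midpoint": (low + high) / 2,   # average of the two endpoints
            "frequency": freq,
            "relative": freq / total,       # class frequency / total frequency
            "cumulative": cumulative,
        })
    return rows


# Hypothetical ungrouped data.
ages = [21, 23, 25, 28, 31, 32, 34, 36, 38, 41, 44, 47]

table = frequency_distribution(ages, start=20, width=10, n_classes=3)
for row in table:
    print(row)
```

The choice of starting point and class width is up to the researcher, which is why two analysts can build different frequency distributions from the same raw data.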
Bar graph
• A bar graph or chart contains two or more categories along one axis and a series of bars, one for each category, along the other axis. Typically, the length of the bar represents the magnitude of the measure (amount, frequency, money, percentage, etc.) for each category.

Pareto Graphs
• Pareto analysis is a quantitative tallying of the number and types of defects that occur with a product or service. Analysts use this tally to produce a vertical bar chart that displays the most common types of defects, ranked in order of occurrence from left to right. The bar chart is called a Pareto chart.

Cross Tabulation
• A process for producing a two-dimensional table that displays the frequency counts for two variables simultaneously. It is also referred to as a contingency table or pivot table.

Scatter Plot
• A scatter plot is a two-dimensional graph of pairs of points from two numerical variables.

Module 3: Descriptive Statistics

Measures of Central Tendency: Ungrouped Data
• Measures of central tendency yield information about “particular places or locations in a group of numbers.”

Common Measures of Location
◦ Mode
◦ Median
◦ Mean
◦ Percentiles
◦ Quartiles

Mode
• The most frequently occurring value in a data set
• Applicable to all levels of data measurement (nominal, ordinal, interval, and ratio)

Bimodal
• Data sets that have two modes

Multimodal
• Data sets that contain more than two modes

Median
• Middle value in an ordered array of numbers
• Applicable for ordinal, interval, and ratio data
• Not applicable for nominal data
• Unaffected by extremely large and extremely small values

Arithmetic Mean
• Commonly called ‘the mean’; the average of a group of numbers
• Applicable for interval and ratio data
• Not applicable for nominal or ordinal data
• Affected by each value in the data set, including extreme values
• Computed by summing all values in the data set and dividing the sum by the number of values in the data set

Percentiles
• Measures of central tendency that divide a group of data into 100 parts
• At least n% of the data lie below the nth percentile, and at most (100 - n)% of the data lie above the nth percentile
• Example: the 90th percentile indicates that at least 90% of the data lie below it, and at most 10% of the data lie above it
• The median and the 50th percentile have the same value.
• Applicable for ordinal, interval, and ratio data
• Not applicable for nominal data

Quartiles
• Measures of central tendency that divide a group of data into four subgroups
• Q1: 25% of the data set is below the first quartile
• Q2: 50% of the data set is below the second quartile
• Q3: 75% of the data set is below the third quartile
• Q1 is equal to the 25th percentile
• Q2 is located at the 50th percentile and equals the median
• Q3 is equal to the 75th percentile
• Quartile values are not necessarily members of the data set
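The measures of location in this module can be checked on a small data set with Python's standard `statistics` module. This is only a sketch: the data are hypothetical, and `statistics.quantiles` uses one particular interpolation convention (its default "exclusive" method), so a textbook's hand-calculation rule for quartiles may give slightly different values.

```python
from statistics import mean, median, multimode, quantiles

# Hypothetical ungrouped data, already in an ordered array.
data = [3, 5, 5, 7, 8, 9, 11, 14]

modes = multimode(data)            # list of the most frequent value(s)
med = median(data)                 # middle value (average of the two middle values here)
avg = mean(data)                   # sum of the values / number of values
q1, q2, q3 = quantiles(data, n=4)  # three cut points dividing the data into four subgroups

print("mode(s):", modes)
print("median:", med)
print("mean:", avg)
print("Q1, Q2, Q3:", q1, q2, q3)

# Replace the largest value with an extreme one: the median does not move,
# but the mean is pulled upward.
extreme = data[:-1] + [140]
print("median with extreme value:", median(extreme))
print("mean with extreme value:", mean(extreme))
```

With this data Q2 coincides with the median, and Q3 falls between two observations, which illustrates that quartile values need not be members of the data set.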
Measures of variability
• Describe the spread or the dispersion of a set of data.

Range
• The difference between the largest and the smallest values in a set of data
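A small sketch of the range, shown next to the interquartile range (Q3 - Q1) that the next entry defines, so the effect of an extreme value on each can be compared. The data and function names are hypothetical, and the quartiles come from `statistics.quantiles` with its default method, which is one convention among several.

```python
from statistics import quantiles


def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


def interquartile_range(values):
    """Q3 - Q1, using the default quartile convention of statistics.quantiles."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1


# Hypothetical data, with and without one extreme value.
data = [3, 5, 5, 7, 8, 9, 11, 14]
with_extreme = [3, 5, 5, 7, 8, 9, 11, 140]

print(value_range(data), value_range(with_extreme))
print(interquartile_range(data), interquartile_range(with_extreme))
```

In this example the single extreme value changes the range substantially while the interquartile range stays the same, since only the middle half of the data enters Q3 - Q1.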

Interquartile Range
• Range of values between the first and third quartiles
• Range of the “middle half”
• Less influenced by extremes
Formula: Interquartile Range = Q3 - Q1

Module 4: Probability

Probability
• Probabilities of occurrence are assigned in the inferential process under conditions of uncertainty.

Methods of Assigning Probabilities
• Classical method of assigning probability (rules and laws)
• Relative frequency of occurrence (cumulated historical data)
• Subjective probability (personal intuition or reasoning)

Classical Probability
• Number of outcomes leading to the event divided by the total number of outcomes possible
• Each outcome is equally likely
• Determined a priori, before performing the experiment
• Applicable to games of chance
• Objective: everyone correctly using the method assigns an identical probability

Relative Frequency Probability
• Under the relative frequency of occurrence method, the probability of an event is equal to the number of times the event has occurred in the past divided by the total number of opportunities for the event to have occurred
• Frequency of occurrence is based on what has happened in the past
• Based on historical data
• Computed after performing the experiment
• Number of times an event occurred divided by the number of trials
• Objective: everyone using the method assigns an identical probability

Subjective Probability
• Comes from a person’s intuition or reasoning
• Different individuals may correctly or incorrectly assign different numeric probabilities to the same event
• Degree of belief in the results of the event
• Useful for unique or single-trial experiments

Structure of Probability
• Experiment: the process that produces an outcome
• Event: an outcome of an experiment
• Elementary event: an event that cannot be decomposed or broken down into other events
• Sample Space: a complete roster/listing of all elementary events for an experiment
• Trial: one repetition of the process
• Set Notation: the use of braces to group members
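Python's built-in sets are also written with braces, so the sample space and events can be expressed directly and the classical and relative frequency methods computed from them. This is a minimal sketch: the die experiment, the function names, and the defect counts are hypothetical examples, not taken from the reviewer.

```python
from fractions import Fraction

# Experiment: one roll of a fair six-sided die.
# Sample space: roster of all elementary events.
sample_space = {1, 2, 3, 4, 5, 6}

# Event: the roll is even (made up of three elementary events).
even = {2, 4, 6}


def classical_probability(event, space):
    """Outcomes leading to the event / total outcomes possible.

    Assumes every elementary event in `space` is equally likely.
    """
    return Fraction(len(event & space), len(space))


def relative_frequency_probability(times_occurred, trials):
    """Times the event occurred in the past / number of trials."""
    return Fraction(times_occurred, trials)


print(classical_probability(even, sample_space))   # known before any roll is made
print(relative_frequency_probability(12, 200))     # hypothetical: 12 defects in 200 past units
```

The classical value is fixed by the structure of the experiment, whereas the relative frequency value changes as more historical data accumulate.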
• The UNION of x, y is formed by combining elements from both sets, and is denoted by x ∪ y.
• The INTERSECTION of x, y contains the elements common to both sets, and is denoted by x ∩ y.
• Mutually Exclusive Events: events such that the occurrence of one precludes the occurrence of the other.
• Independent Events: the occurrence or nonoccurrence of one has no effect on the occurrence of the others.
• Collectively Exhaustive Events: a listing of all possible elementary events for an experiment.
• Complementary Events: two events, one of which comprises all the elementary events of an experiment that are not in the other event.

Sample Space
• The set of all elementary events for an experiment
Methods of describing a sample space
• Roster or listing
• Tree diagram
• Set builder notation
• Venn diagram

Module 5: Sampling and Sampling Distribution

Reasons for Sampling
• Sampling can save money.
• Sampling can save time.
• For given resources, sampling can broaden the scope of the data set.
• Because the research process is sometimes destructive, the sample can save product.
• If accessing the population is impossible, sampling is the only option.

Reasons for Taking a Census
• Eliminate the possibility that a random sample is not representative of the population.
• The person authorizing the study is uncomfortable with sample information.

Population Frame
• A list, map, directory, or other source used to represent the population

Overregistration
• The frame contains all members of the target population and some additional elements.
• Example: using the chamber of commerce membership directory as the frame for a target population of member businesses owned by women.

Underregistration
• The frame does not contain all members of the target population.
• Example: using the chamber of commerce membership directory as the frame for a target population of all businesses.

Random sampling
• Every unit of the population has the same probability of being included in the sample.
• A chance mechanism is used in the selection process.
• Eliminates bias in the selection process
• Also known as probability sampling

Nonrandom Sampling
• Not every unit of the population has the same probability of being included in the sample.
• Open to selection bias
• Not an appropriate data collection method for most statistical methods
• Also known as non-probability sampling

Random Sampling Techniques
• Simple Random Sample
• Stratified Random Sample
◦ Proportionate
◦ Disproportionate
• Systematic Random Sample
• Cluster (or Area) Sampling

Simple Random Sample
• Number each frame unit from 1 to N.
• Use a random number table or a random number generator to select n distinct numbers between 1 and N, inclusive.
• Easier to perform for small populations
• Cumbersome for large populations

Stratified Random Sample
• Population is divided into non-overlapping subpopulations called strata
• A random sample is selected from each stratum
• Potential for reducing sampling error

Proportionate
• The percentage of the sample taken from each stratum is proportionate to the percentage that each stratum is within the population.
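The simple random sample steps (number the frame units 1 to N, then select n distinct numbers) and a proportionate allocation across strata can be sketched as follows. The function names, the strata sizes, and the use of Python's `random` module as the random number generator are assumptions for illustration. Rounding each stratum's share to a whole number can make the allocations sum to slightly more or less than n for other inputs.

```python
import random


def simple_random_sample(N, n, seed=None):
    """Select n distinct unit numbers between 1 and N, inclusive."""
    rng = random.Random(seed)  # seed only makes the illustration repeatable
    return sorted(rng.sample(range(1, N + 1), n))


def proportionate_allocation(strata_sizes, n):
    """Split a sample of size n across strata in proportion to stratum size."""
    total = sum(strata_sizes.values())
    return {name: round(n * size / total) for name, size in strata_sizes.items()}


# Hypothetical frame of 1,000 businesses divided into three strata.
strata = {"small": 600, "medium": 300, "large": 100}

print(simple_random_sample(N=1000, n=10, seed=1))
print(proportionate_allocation(strata, n=50))
```

Each stratum's allocation would then be drawn as its own simple random sample from the units in that stratum.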
Disproportionate
• Proportions of the strata within the sample are different from the proportions of the strata within the population.

Cluster Sampling
• Population is divided into non-overlapping clusters or areas.
• Each cluster is a miniature, or microcosm, of the population.
• A subset of the clusters is selected randomly for the sample.
• If the number of elements in the subset of clusters is larger than the desired value of n, these clusters may be subdivided to form a new set of clusters and subjected to a random selection process.

Advantages
• More convenient for geographically dispersed populations
• Reduced travel costs to contact sample elements
• Simplified administration of the survey
• Unavailability of a sampling frame prohibits using other random sampling methods
Disadvantages
• Statistically less efficient when the cluster elements are similar
• Costs and problems of statistical analysis are greater than for simple random sampling.

Nonrandom Sampling

Convenience Sampling
• Sample elements are selected for the convenience of the researcher.

Judgment Sampling
• Sample elements are selected by the judgment of the researcher.

Quota Sampling
• Sample elements are selected until the quota controls are satisfied.

Snowball Sampling
• Survey subjects are selected based on referral from other survey respondents.

Errors
• Data from nonrandom samples are not appropriate for analysis by inferential statistical methods.

Sampling Error
• Occurs when the sample is not representative of the population

Non-sampling Errors
• Missing data, recording, data entry, and analysis errors
• Poorly conceived concepts, unclear definitions, and defective questionnaires
• Response errors occur when people do not know, will not say, or overstate in their answers
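Sampling error, as defined in this module, can be illustrated by comparing a sample mean with the mean of the population the sample was drawn from. The sketch below uses a simulated population; the population parameters, the seed, and the function name are all hypothetical, and the mean is only one of many statistics that could be compared.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the illustration is repeatable

# Hypothetical population of 10,000 measurements.
population = [rng.gauss(50, 10) for _ in range(10_000)]
population_mean = mean(population)


def sampling_error(sample_size):
    """Sample mean minus population mean for one simple random sample."""
    sample = rng.sample(population, sample_size)
    return mean(sample) - population_mean


for n in (10, 100, 1000):
    print(n, sampling_error(n))
```

Even with random selection, an individual sample rarely matches the population exactly; taking a census (a sample equal to the whole population) removes this error, which is one of the reasons for a census listed earlier in the module.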
