Statistics - Review
(A Review)
Problem 1
Measures of central tendency are
Value   8   7   6   5   4   3   2   1
f       8  12  10   4   3   0   0   1
A. 4.75    B. 6.34    C. 6.5    D. 7
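The question stem above is cut off, so this minimal Python sketch computes all three measures of central tendency for the frequency table. It uses the standard-library `statistics` module after expanding the table into raw observations. The `weighted_mean` line gets the same mean directly from the table as Σfx / Σf.

```python
from statistics import mean, median, mode

# Frequency table from Problem 1: value -> frequency f
table = {8: 8, 7: 12, 6: 10, 5: 4, 4: 3, 3: 0, 2: 0, 1: 1}

# Expand the table into raw observations (each value repeated f times)
data = [value for value, f in table.items() for _ in range(f)]

n = len(data)                 # total frequency, i.e. sum of f
sample_mean = mean(data)      # arithmetic mean of the observations
sample_median = median(data)  # middle of the ordered data (average of the two middle values when n is even)
sample_mode = mode(data)      # most frequent value

# The same mean computed directly from the table: sum(f * x) / sum(f)
weighted_mean = sum(value * f for value, f in table.items()) / sum(table.values())

print(n, round(sample_mean, 2), round(weighted_mean, 2), sample_median, sample_mode)
# prints: 38 6.34 6.34 7.0 7
```

With these numbers, option B (6.34) is the mean, and option D (7) is both the median and the mode.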
Problem 7
A. 18 C. 20
B. 19 D. 21
Problem 9
The correct answer is C.
The position of the median = (n + 1) / 2 = (131 + 1) / 2 = 66. The value in the 66th position is 20; therefore, the median = 20.
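The (n + 1) / 2 rule gives the position of the median in an ordered array. The 131 values behind Problem 9 are not shown, so this sketch uses small hypothetical lists. It also shows how the rule handles an even n. The helper names `median_position` and `median_of_ordered` are illustrative only.

```python
import math

def median_position(n):
    """Position of the median in an ordered array of n values: (n + 1) / 2."""
    return (n + 1) / 2

def median_of_ordered(values):
    """Median of an already ordered list, located with the (n + 1) / 2 rule.

    For odd n the position is a whole number. For even n it ends in .5,
    so the median is the average of the two values on either side.
    """
    pos = median_position(len(values))
    below = values[math.floor(pos) - 1]  # 1-based position -> 0-based index
    above = values[math.ceil(pos) - 1]
    return (below + above) / 2

print(median_position(131))             # 66.0, as in Problem 9
print(median_of_ordered([1, 3, 5]))     # 3.0
print(median_of_ordered([1, 3, 5, 7]))  # 4.0
```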
What is statistics?
SAMPLE
A sample is the portion of a population selected for analysis.
PARAMETER
A parameter is a numerical measure that describes a
characteristic of a population.
Example: Average monthly income of all Filipino families
STATISTIC
A statistic is a numerical measure that describes a characteristic
of a sample.
Example: Average monthly income of 10 selected families
SOME TERMINOLOGY
SAMPLING BIAS
A sampling method is biased if not every member of the population
has an equal likelihood of being in the sample.
RANDOM SAMPLE
A random sample is one in which each member of the
population has an equal probability of being chosen.
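A simple random sample can be drawn with Python's standard `random` module. `random.Random.sample` picks `k` distinct members so that every member has the same chance of selection. The population of 100 numbered members below is hypothetical, for illustration only.

```python
import random

# Hypothetical population of 100 members, labelled 1..100 (illustration only)
population = list(range(1, 101))

rng = random.Random(2024)              # seeded only so the draw is reproducible
sample = rng.sample(population, k=10)  # each member equally likely; no repeats
print(sorted(sample))
```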
QUANTITATIVE VARIABLE
Provides information in which a count or quantity is most
important; its values are numbers obtained by counting or measuring.
QUALITATIVE VARIABLE
A characteristic whose values can be placed into well-defined categories
or groups, where the grouping does not depend on order.
Statistical data
• The collection of data that are relevant to the problem
being studied is commonly the most difficult,
expensive, and time-consuming part of the entire
research project.
• Statistical data are usually obtained by counting or
measuring items.
– Primary data are collected specifically for the analysis
desired
– Secondary data have already been compiled and are
available for statistical analysis
• A variable is an item of interest that can take on
many different numerical values.
• A constant has a fixed numerical value.
Types of Variables
Categorical (qualitative) variables have
values (data) that can only be placed into
categories, such as “yes” and “no.”
Ex. Course (area of specialization)
Data
• Categorical (defined categories)
  Examples: Marital Status, Political Party, Eye Color
• Numerical
  – Discrete (counted items; whole numbers only)
    Examples: Number of Children, Defects per hour
  – Continuous (measured characteristics; can take decimal values)
    Examples: Weight, Voltage
Levels of Measurement
A nominal scale classifies data into distinct categories
in which no ranking is implied.
Categorical Data (figure excerpt: ATM 16%, Internet 24%)
Bar and Pie Charts
• Bar charts and Pie charts are often used for
categorical data
• Length of bar or size of pie slice shows the
frequency or percentage for each category
• A bar chart can be horizontal or vertical.
• Vertical bar chart – the horizontal axis shows the
categories and the vertical axis the corresponding
frequencies.
• Horizontal bar chart – the x-axis represents the
frequency and the y-axis the category.
Organizing Categorical Data:
Bar Chart
In a horizontal bar chart, each category is shown as a bar whose length
represents the amount, frequency, or percentage of values falling into
that category.
Figure: Banking Preference (horizontal bar chart; categories: Internet, In person at branch, Drive-through service at branch, Automated or live telephone, ATM)
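To make the idea concrete, this sketch draws a text-only horizontal bar chart. Each category gets one row, and the bar length is proportional to its percentage. The percentages come from the Banking Preference pie chart in this section. Pairing each percentage with its category is our reading of the extracted figure, and `horizontal_bar_chart` is a hypothetical helper.

```python
# Banking Preference percentages as read from the pie chart in this section
# (the category-to-percentage pairing is our reading of the extracted figure)
preference = {
    "ATM": 16,
    "Automated or live telephone": 2,
    "Drive-through service at branch": 17,
    "In person at branch": 41,
    "Internet": 24,
}

def horizontal_bar_chart(data, points_per_mark=2):
    """Text horizontal bar chart: one row per category, with bar length
    proportional to the value (one '#' for every `points_per_mark` points)."""
    label_width = max(len(name) for name in data)
    rows = []
    for name, value in data.items():
        bar = "#" * (value // points_per_mark)
        rows.append(f"{name:<{label_width}} | {bar} {value}%")
    return "\n".join(rows)

print(horizontal_bar_chart(preference))
```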
The pie chart is a circle broken up into slices that represent categories.
The size of each slice of the pie varies according to the percentage in
each category.
Figure: Banking Preference (pie chart)
ATM                                16%
Automated or live telephone         2%
Drive-through service at branch    17%
In person at branch                41%
Internet                           24%
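Each slice's size is proportional to its percentage, so its central angle is (percentage / total) × 360°. This sketch computes those angles for the Banking Preference percentages. It uses our reading of which percentage belongs to which category.

```python
# Banking Preference percentages (our reading of the pie chart above)
preference = {
    "ATM": 16,
    "Automated or live telephone": 2,
    "Drive-through service at branch": 17,
    "In person at branch": 41,
    "Internet": 24,
}

total = sum(preference.values())
# Central angle of each slice: share of the total times 360 degrees
angles = {name: pct / total * 360 for name, pct in preference.items()}

for name, angle in angles.items():
    print(f"{name}: {angle:.1f} degrees")
```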
Organizing Categorical Data:
Pareto Diagram
Figure: Pareto Chart for Banking Preference. Bars (bar graph) show the % in each category in descending order: In person at branch, Internet, Drive-through service at branch, ATM, Automated or live telephone. A line graph shows the cumulative %. Both vertical axes run from 0% to 100%.
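The numbers behind a Pareto diagram can be produced by sorting the categories from largest to smallest and then accumulating the percentages. This sketch does that for the Banking Preference data, using our reading of the pie chart percentages. `pareto_table` is a hypothetical helper name.

```python
def pareto_table(counts):
    """Return (category, % of total, cumulative %) rows, largest category first."""
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    total = sum(counts.values())
    rows, running = [], 0
    for name, value in ordered:
        running += value
        rows.append((name, value * 100 / total, running * 100 / total))
    return rows

# Banking Preference percentages (our reading of the pie chart in this section)
preference = {
    "ATM": 16,
    "Automated or live telephone": 2,
    "Drive-through service at branch": 17,
    "In person at branch": 41,
    "Internet": 24,
}

for name, pct, cum_pct in pareto_table(preference):
    print(f"{name:<32} {pct:5.1f}%  cumulative {cum_pct:5.1f}%")
```

The bars use the middle column and the cumulative line uses the last column, which rises to 100% at the smallest category.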
Tables and Charts for Numerical Data
Numerical Data
• Stem-and-Leaf Display
• Histogram
• Polygon
• Ogive
Organizing Numerical Data:
Ordered Array
An ordered array is a sequence of data, in rank order, from the
smallest value to the largest value.
• Shows the range (minimum value to maximum value)
• May help identify outliers (unusual observations)
Age of Surveyed College Students
Day Students:   16 17 17 18 18 18 19 19 20 20 21 22 22 25 27 32 38 42
Night Students: 18 18 19 19 20 21 23 28 32 33 41 45
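Here is a short sketch of building an ordered array and reading off its range, using the surveyed students' ages. `ordered_summary` is a hypothetical helper name.

```python
def ordered_summary(ages):
    """Sort values into an ordered array and report (minimum, maximum, range)."""
    ordered = sorted(ages)  # smallest to largest
    return ordered[0], ordered[-1], ordered[-1] - ordered[0]

day = [16, 17, 17, 18, 18, 18, 19, 19, 20, 20, 21, 22, 22, 25, 27, 32, 38, 42]
night = [18, 18, 19, 19, 20, 21, 23, 28, 32, 33, 41, 45]

for label, ages in (("Day", day), ("Night", night)):
    low, high, spread = ordered_summary(ages)
    print(f"{label} students: min {low}, max {high}, range {spread}")
# Day students: min 16, max 42, range 26
# Night students: min 18, max 45, range 27
```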
Stem-and-Leaf Display
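A stem-and-leaf display splits each value into a stem (here, the tens digit) and a leaf (the units digit). It then lists the leaves beside each stem. This sketch applies that split to the day students' ages from the ordered array. It assumes non-negative, at most two-digit integers, and `stem_and_leaf` is a hypothetical helper.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted leaves (units digits)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return dict(sorted(stems.items()))

day = [16, 17, 17, 18, 18, 18, 19, 19, 20, 20, 21, 22, 22, 25, 27, 32, 38, 42]

for stem, leaves in stem_and_leaf(day).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
# 1 | 6 7 7 8 8 8 9 9
# 2 | 0 0 1 2 2 5 7
# 3 | 2 8
# 4 | 2
```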
The number of classes depends on the number of values in the data. With
a larger number of values, typically there are more classes. In general, a
frequency distribution should have at least 5 but no more than 15 classes.
To determine the width of a class interval, divide the range (highest
value – lowest value) of the data by the number of classes desired.
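Here is the width rule applied to the 20 values in the grouped frequency example that follows. Rounding the quotient up to a whole number is an assumption on our part. It is consistent with the class width of 10 used in that table. `class_width` is a hypothetical helper.

```python
import math

def class_width(values, num_classes):
    """Range (highest - lowest) divided by the number of classes,
    rounded up to a whole number (the rounding is an assumption)."""
    data_range = max(values) - min(values)
    return math.ceil(data_range / num_classes)

data = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
        32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

print(class_width(data, 5))  # range 58 - 12 = 46; 46 / 5 = 9.2, rounded up to 10
```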
Organizing Numerical Data:
Grouped Frequency Distribution
24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
32, 13, 12, 38, 41, 43, 44, 27, 53, 27
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44,
46, 53, 58
                                  Relative               Cumulative  Cumulative
Class                 Frequency  Frequency  Percentage   Frequency   Percentage
10 but less than 20        3        0.15         15           3           15
20 but less than 30        6        0.30         30           9           45
30 but less than 40        5        0.25         25          14           70
40 but less than 50        4        0.20         20          18           90
50 but less than 60        2        0.10         10          20          100
Total                     20        1.00        100
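The table above can be reproduced by counting how many values fall in each "lower but less than upper" class and then accumulating the counts. This minimal sketch does that. The parameters `start=10`, `width=10`, and `num_classes=5` match the table, and `grouped_frequency` is a hypothetical helper.

```python
def grouped_frequency(values, start, width, num_classes):
    """Rows of (lower, upper, f, %, cumulative f, cumulative %) for classes
    "lower but less than upper", each `width` wide, beginning at `start`."""
    n = len(values)
    rows = []
    cumulative = 0
    for i in range(num_classes):
        lower = start + i * width
        upper = lower + width
        f = sum(1 for v in values if lower <= v < upper)  # upper bound excluded
        cumulative += f
        rows.append((lower, upper, f, f * 100 / n, cumulative, cumulative * 100 / n))
    return rows

data = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
        32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

for lower, upper, f, pct, cum_f, cum_pct in grouped_frequency(data, 10, 10, 5):
    print(f"{lower} but less than {upper}: {f}  {pct:.0f}%  {cum_f}  {cum_pct:.0f}%")
```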
Why Use a Frequency Distribution?
Figure: Histogram (vertical axis: Frequency; horizontal axis labels: 5, 15, 25, 35, 45, 55, More)
(In a percentage histogram the vertical axis would be defined to show the percentage of observations per class.)
Organizing Numerical Data:
The Polygon
Class                 Class Midpoint   Frequency
10 but less than 20         15             3
20 but less than 30         25             6
30 but less than 40         35             5
40 but less than 50         45             4
50 but less than 60         55             2
Figure: Frequency Polygon: Daily High Temperature (vertical axis: Frequency, 0 to 7; horizontal axis: Class Midpoints, 5 to 65)
(In a percentage polygon the vertical axis would be defined to show the percentage of observations per class.)
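A frequency polygon plots each class frequency at the class midpoint, (lower + upper) / 2, and joins the points. This sketch computes those points for the table above. It also adds a zero-frequency class at each end so the polygon meets the horizontal axis. That is a common convention, and we assume it here because it matches the 5 to 65 axis range in the figure. `polygon_points` is a hypothetical helper.

```python
def polygon_points(start, width, freqs):
    """(midpoint, frequency) pairs for a frequency polygon, with an empty
    class added at each end so the line returns to the horizontal axis."""
    points = [(start - width / 2, 0)]            # empty class before the first
    for i, f in enumerate(freqs):
        lower = start + i * width
        points.append((lower + width / 2, f))    # midpoint = (lower + upper) / 2
    points.append((start + len(freqs) * width + width / 2, 0))  # empty class after the last
    return points

print(polygon_points(10, 10, [3, 6, 5, 4, 2]))
# [(5.0, 0), (15.0, 3), (25.0, 6), (35.0, 5), (45.0, 4), (55.0, 2), (65.0, 0)]
```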
Graphing Cumulative Frequencies:
The Ogive (Cumulative % Polygon)
Class                 Lower Class Boundary   % Less Than Lower Class Boundary
10 but less than 20            10                          0
20 but less than 30            20                         15
30 but less than 40            30                         45
40 but less than 50            40                         70
50 but less than 60            50                         90
60 but less than 70            60                        100
Figure: Ogive: Daily High Temperature (vertical axis: Cumulative Percentage, 0 to 100; horizontal axis: Lower Class Boundary, 10 to 60)
(In an ogive, the percentage of the observations less than each lower class boundary is plotted against the lower class boundaries.)
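The ogive points pair each lower class boundary with the percentage of observations below it. That percentage is the running total of the frequencies of all earlier classes. This sketch uses the class frequencies 3, 6, 5, 4, 2 from the grouped frequency distribution. `ogive_points` is a hypothetical helper.

```python
from itertools import accumulate

def ogive_points(start, width, freqs):
    """(lower class boundary, % of observations less than that boundary),
    including the boundary just past the last class, where the value is 100%."""
    n = sum(freqs)
    below = [0, *accumulate(freqs)]  # observations below each successive boundary
    return [(start + i * width, count * 100 / n) for i, count in enumerate(below)]

print(ogive_points(10, 10, [3, 6, 5, 4, 2]))
# [(10, 0.0), (20, 15.0), (30, 45.0), (40, 70.0), (50, 90.0), (60, 100.0)]
```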
Ungrouped Frequency Distribution