Statistics - Review

The document provides a review of statistics, focusing on measures of central tendency, variability, and data organization. It discusses various statistical concepts, including descriptive and inferential statistics, types of variables, and methods for data collection and analysis. Additionally, it covers how to represent data visually through tables and charts, emphasizing the importance of proper statistical methods in decision-making.

Uploaded by

Claire Maratas
Copyright
© All Rights Reserved

Statistics

(A Review)
Problem 1
Measures of central tendency are

A. inferential statistics that identify the best single
value for representing a set of data.
B. descriptive statistics that identify the best single
value for representing a set of data.
C. inferential statistics that identify the spread of
the scores in a data set.
D. descriptive statistics that identify the spread of
the scores in a data set.
Problem 1
The correct answer is B.
Descriptive statistics that identify the best
single value for representing a set of data.

Central tendency is a descriptive measure for
describing an entire set of data.
Problem 2
Given the following data set, what is
the value of the median?
2, 4, 3, 6, 1, 8, 9, 2, 5, 7
A. 2
B. 4.7
C. 4.5
D. 10
Problem 2

The correct answer is C.


First the data must be ordered (1 2 2 3
4 5 6 7 8 9). Then, since there is an
even number of observations, the
median is found by averaging the two
middlemost scores: (4 + 5)/2 = 4.5.
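The ordering-and-averaging procedure described above can be sketched in Python. This is a minimal illustration rather than part of the original review; the function name `median` is chosen here only for the example.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)          # step 1: order the data
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd n: the single middle score
        return ordered[mid]
    # even n: average the two middlemost scores
    return (ordered[mid - 1] + ordered[mid]) / 2


scores = [2, 4, 3, 6, 1, 8, 9, 2, 5, 7]
print(median(scores))  # 4.5
```

The standard library function `statistics.median` returns the same value for this data set.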
Problem 3
Which of the following is not a
characteristic of the mean?
A. It is affected by extreme scores.
B. It minimizes the sum of squared
deviations.
C. The sum of the deviations about the
mean is 0.
D. It is best used with ordinal data.
Problem 3

The correct answer is D.


The mean should not be used with
ordinal data. It is best for ratio and
interval data.
Problem 4
A measure of central tendency tells us, using a
single value, the best representation for an
entire set of scores. A measure of variability
tells us
A. if the high and low extreme scores cancel
each other out.
B. if the mean is greater than the mode.
C. how well the measure of central tendency
represents the entire set of scores.
D. whether or not to compute percentiles.
Problem 4
The correct answer is C.
Although a measure of central tendency is the
best representation for a set of scores, it still
may not be a very good representation. If the
data are heterogeneous then they will be
difficult to describe accurately with a single
value and the measure of variability will tell
us that.
Problem 5
Given the following set of data, what is the
variance?
2, 6, 8, 3, 7, 9, 1, 4
A. 40
B. 5
C. 2.74
D. 7.5
Problem 5
The correct answer is D.
First calculate the mean (40/8 = 5). Then
calculate deviation scores by subtracting the
mean from each score (-3, 1, 3, -2, 2, 4, -4, -1).
Then square each deviation score (9, 1, 9, 4, 4,
16, 16, 1) and add them (60). This sum is SS
(sum of squares). To get the variance, treating
the eight scores as the whole population, divide
SS by n, which is 8: 60/8 = 7.5.

7.5 is the variance.
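The same steps can be checked with a short Python sketch. It follows the solution in dividing SS by n (the population variance); the sample variance, which divides by n - 1, is shown only for comparison and is not what the answer key uses. The variable names are illustrative.

```python
scores = [2, 6, 8, 3, 7, 9, 1, 4]

n = len(scores)
mean = sum(scores) / n                     # 40 / 8 = 5.0
deviations = [x - mean for x in scores]    # each score minus the mean
ss = sum(d ** 2 for d in deviations)       # sum of squares (SS) = 60.0

population_variance = ss / n               # divide by n: 7.5
sample_variance = ss / (n - 1)             # divide by n - 1: about 8.57

print(population_variance, round(sample_variance, 2))
```

Note that choice C (2.74) is the square root of 7.5, that is, the standard deviation rather than the variance.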


Problem 6
Seven airline passengers in economy class on the
same flight paid an average of P2,730 per ticket.
Because the tickets were purchased at different
times and from different sources, the prices varied.
The first five passengers paid P2015, P2100, P3335,
P2500, and P1850. The sixth and seventh tickets
were purchased by a couple who paid identical
fares. What price did each of them pay?
A. 7,310 C. 3,655
B. 5,310 D. 4,265
Problem 6
The correct answer is C.

Multiply 2,730 by 7 to find the total of all
seven prices (19,110). Add the five known
prices: P2015 + P2100 + P3335 + P2500 + P1850
= P11,800. Subtract this sum from the total
(19,110 - 11,800 = 7,310), then divide the
difference by 2: 7,310/2 = 3,655.
Problem 7
What is the value of the mean of this data set?

value 8 7 6 5 4 3 2 1
f 8 12 10 4 3 0 0 1

A. 4.75 C. 6.5
B. 6.34 D. 7
Problem 7

The correct answer is B.

Using ΣfX = 241 and Σf = 38, the mean is
241/38, or about 6.34.
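As a quick check of ΣfX and Σf, the frequency table from the problem can be summed in Python. This is only a sketch; the list names are chosen for the example.

```python
values = [8, 7, 6, 5, 4, 3, 2, 1]
freqs = [8, 12, 10, 4, 3, 0, 0, 1]

sum_fx = sum(f * x for x, f in zip(values, freqs))   # ΣfX
sum_f = sum(freqs)                                   # Σf
mean = sum_fx / sum_f

print(sum_fx, sum_f, round(mean, 2))  # 241 38 6.34
```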
Problem 8
The donations (in pesos) given by 10 parents
to the graduating batch's project were as
follows: 500, 200, 100, 1000, 350, 500, 250,
300, 1500, 5000. Which measure of central
tendency would be most appropriate to
describe the average donation by 10
parents?
A. Mean C. Mode
B. Median D. Mean Absolute Deviation
Problem 8
The correct answer is B.

The median is resistant to extreme
observations: it depends only on the relative
standing of the values, not on how large the
largest ones are. The P5000 donation is an
extreme value that pulls the mean upward, so
in this case the median, not the mean, is the
right choice of central tendency.
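To see how strongly the one large donation affects each measure, the mean and median can be compared with and without it. This sketch uses Python's standard `statistics` module; dropping the P5000 donation is done here purely for illustration and is not part of the original problem.

```python
from statistics import mean, median

donations = [500, 200, 100, 1000, 350, 500, 250, 300, 1500, 5000]

print(mean(donations))      # mean: 970
print(median(donations))    # median: 425 (average of 350 and 500)

# Drop the extreme donation to see how much each measure moves.
trimmed = [d for d in donations if d != 5000]
print(round(mean(trimmed), 2))   # mean falls to about 522.22
print(median(trimmed))           # median moves only to 350
```

The mean shifts by several hundred pesos when a single value is removed, while the median moves by 75, which is why the median describes the typical donation better here.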
Problem 9

A. 18 C. 20
B. 19 D. 21
Problem 9
The correct answer is C.
The position of the median = (131+1) / 2
= 66. The 66th position = 20, therefore
the median = 20.
What is statistics?

• A branch of mathematics taking and
transforming numbers into useful
information for decision makers
• Methods for processing & analyzing
numbers
• Methods for helping reduce the uncertainty
inherent in decision making
What is statistics?
Statistics is the process of:
1. Collecting data
2. Organizing data in tables
3. Summarizing data via graphs and
calculations
4. Analyzing the data
5. Interpreting the results
6. Making estimates and predictions
7. Formulating generalizations and
making inferences
Why Study Statistics?
Decision Makers Use Statistics to:

1. Present and describe business data and
information properly
2. Draw conclusions about large groups of
individuals or items, using the information
collected from subsets of the individuals or
items under study.
3. Make reliable forecasts about a business
activity
4. Improve business processes
Branches of Statistics

The branch of mathematics that transforms data
into useful information for decision makers.

Descriptive Statistics
Collecting, summarizing, and describing sample data

Inferential Statistics
Drawing conclusions and/or making decisions
concerning a population based only on sample data
SOME TERMINOLOGIES
POPULATION
A population consists of all the items or individuals about which
you want to draw a conclusion.

SAMPLE
A sample is the portion of a population selected for analysis.

PARAMETER
A parameter is a numerical measure that describes a
characteristic of a population.
Example: Average monthly income of Filipino Families

STATISTIC
A statistic is a numerical measure that describes a characteristic
of a sample.
Example: Average monthly income of 10 selected families
SOME TERMINOLOGIES
SAMPLING BIAS
A sampling method is biased if not every member of the population
has an equal likelihood of being in the sample.

RANDOM SAMPLE
A random sample is one in which each member of the
population has an equal probability of being chosen.

QUANTITATIVE VARIABLE
Provides information in which a count or quantity is most
important

QUALITATIVE VARIABLE
Is a characteristic that can be placed into well-defined categories
or groups that do not depend on order.
Statistical data
• The collection of data that are relevant to the problem
being studied is commonly the most difficult,
expensive, and time-consuming part of the entire
research project.
• Statistical data are usually obtained by counting or
measuring items.
– Primary data are collected specifically for the analysis
desired
– Secondary data have already been compiled and are
available for statistical analysis
• A variable is an item of interest that can take on
many different numerical values.
• A constant has a fixed numerical value.
Types of Variables
• Categorical (qualitative) variables have
values (data) that can only be placed into
categories, such as “yes” and “no.”
Ex. Course (area of specialization)

• Numerical (quantitative) variables have
values (data) that represent quantities.
Ex. Number of subjects enrolled
Types of Data

Data
• Categorical (defined categories)
  Examples: Marital Status, Political Party, Eye Color
• Numerical
  – Discrete (counted items; whole numbers only)
    Examples: Number of Children, Defects per hour
  – Continuous (measured characteristics; can take decimal values)
    Examples: Weight, Voltage
Levels of Measurement
• A nominal scale classifies data into distinct categories
in which no ranking is implied.

Categorical Variables          Categories
Personal Computer Ownership    Yes / No
Type of Stocks Owned           Growth, Value, Mutual, Balance, Equity, Other
Internet Provider              Globe / PLDT / SUN / SMART / SKY CABLE
Levels of Measurement

• An ordinal scale classifies data into distinct
categories in which ranking is implied.

Categorical Variable             Ordered Categories
Student class designation        Freshman, Sophomore, Junior, Senior
Product satisfaction             Satisfied, Neutral, Unsatisfied
Faculty rank                     Professor, Associate Professor, Assistant Professor, Instructor
Standard & Poor’s bond ratings   AAA, AA, A, BBB, BB, B, CCC, CC, C, DDD, DD, D
Student Grades                   A, B, C, D, F
DATA MANAGEMENT
Categorical Data Are Summarized
By Tables & Graphs

Categorical Data
• Tabulating Data
  – Summary Table
• Graphing Data
  – Bar Charts
  – Pie Charts
  – Pareto Diagram
Organizing Categorical Data:
Summary Table

• A summary table indicates the frequency, amount, or percentage of items
in a set of categories so that you can see differences between categories.
This is also known as a categorical frequency distribution table.

Banking Preference                 Percent
ATM                                16%
Automated or live telephone         2%
Drive-through service at branch    17%
In person at branch                41%
Internet                           24%
Bar and Pie Charts
• Bar charts and Pie charts are often used for
categorical data
• Length of bar or size of pie slice shows the
frequency or percentage for each category
• A bar chart can be horizontal or vertical.
• Vertical bar graph – the horizontal axis is for the
category and the vertical axis for the corresponding
frequency
• Horizontal bar graph – the x-axis represents the
frequency and the y-axis the category
Organizing Categorical Data:
Bar Chart
• In a horizontal bar chart, a bar shows each category, the length of
which represents the amount, frequency or percentage of values
falling into a category.

[Figure: horizontal bar chart, "Banking Preference". Vertical axis: categories (Internet, In person at branch, Drive-through service at branch, Automated or live telephone, ATM); horizontal axis: percentage, 0% to 50%.]


Organizing Categorical Data:
Bar Chart

• In a vertical bar chart, a bar shows each category,
the height of which represents the amount, frequency
or percentage of values falling into a category.
Organizing Categorical Data:
Pie Chart

• The pie chart is a circle broken up into slices that represent categories.
The size of each slice of the pie varies according to the percentage in
each category.

[Figure: pie chart, "Banking Preference". Slices: ATM 16%, Automated or live telephone 2%, Drive-through service at branch 17%, In person at branch 41%, Internet 24%.]
Organizing Categorical Data:
Pareto Diagram

• Used to portray categorical data (nominal
scale)
• A vertical bar chart, where categories are
shown in descending order of frequency
• A cumulative polygon is shown in the same
graph
• Used to separate the “vital few” from the “trivial
many”
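The descending order and cumulative line of a Pareto diagram can be computed from the banking-preference percentages given in the summary table earlier in this review. The following Python sketch is an illustration only; the variable names are chosen for the example and no plotting library is assumed.

```python
from itertools import accumulate

# Percentages from the banking-preference summary table.
preference = {
    "ATM": 16,
    "Automated or live telephone": 2,
    "Drive-through service at branch": 17,
    "In person at branch": 41,
    "Internet": 24,
}

# Pareto ordering: categories in descending order of frequency.
ordered = sorted(preference.items(), key=lambda item: item[1], reverse=True)

# Running total of the percentages (the cumulative polygon).
cumulative = list(accumulate(pct for _, pct in ordered))

for (category, pct), cum in zip(ordered, cumulative):
    print(f"{category:<32} {pct:>3}% {cum:>4}%")
```

The first three categories (In person at branch, Internet, Drive-through service at branch) already account for 82% of responses, which is the "vital few" reading of the chart.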

Organizing Categorical Data:
Pareto Diagram
[Figure: Pareto Chart for Banking Preference. Left vertical axis: % in each category (bar graph), 0% to 100%; right vertical axis: cumulative % (line graph), 0% to 100%; horizontal axis, in descending order: In person at branch, Internet, Drive-through service at branch, ATM, Automated or live telephone.]
Tables and Charts for
Numerical Data
Numerical Data
• Ordered Array
• Stem-and-Leaf Display
• Frequency Distributions and Cumulative Distributions
  – Histogram
  – Polygon
  – Ogive
Organizing Numerical Data:
Ordered Array
• An ordered array is a sequence of data, in rank order, from the
smallest value to the largest value.
• Shows range (minimum value to maximum value)
• May help identify outliers (unusual observations)

Age of Surveyed College Students
Day Students:   16 17 17 18 18 18 19 19 20 20 21 22 22 25 27 32 38 42
Night Students: 18 18 19 19 20 21 23 28 32 33 41 45
Stem-and-Leaf Display

• A simple way to see how the data are
distributed and where concentrations of
data exist

METHOD: Separate the sorted data series
into leading digits (the stems) and
the trailing digits (the leaves)
Organizing Numerical Data:
Stem and Leaf Display
• A stem-and-leaf display organizes data into groups
(called stems) so that the values within each group
(the leaves) branch out to the right on each row.

Age of Surveyed College Students
Day Students:   16 17 17 18 18 18 19 19 20 20 21 22 22 25 27 32 38 42
Night Students: 18 18 19 19 20 21 23 28 32 33 41 45

Day Students        Night Students
Stem  Leaf          Stem  Leaf
1     67788899      1     8899
2     0012257       2     0138
3     28            3     23
4     2             4     15
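The stem-and-leaf method (sort the data, then split each value into a leading digit and a trailing digit) can be sketched in Python as follows. The function name `stem_and_leaf` is hypothetical, and the sketch assumes two-digit whole-number ages, as in the student data.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group values by tens digit (stem); the units digits are the leaves."""
    rows = defaultdict(list)
    for v in sorted(values):               # sort first so leaves are in order
        stem, leaf = divmod(v, 10)         # e.g. 27 -> stem 2, leaf 7
        rows[stem].append(str(leaf))
    return {stem: "".join(leaves) for stem, leaves in sorted(rows.items())}


day = [16, 17, 17, 18, 18, 18, 19, 19, 20, 20, 21, 22, 22, 25, 27, 32, 38, 42]
night = [18, 18, 19, 19, 20, 21, 23, 28, 32, 33, 41, 45]

for name, ages in (("Day Students", day), ("Night Students", night)):
    print(name)
    for stem, leaves in stem_and_leaf(ages).items():
        print(f"  {stem} | {leaves}")
```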
Organizing Numerical Data:
Frequency Distribution
• The frequency distribution is a summary table in which the data are
arranged into numerically ordered classes.

• You must give attention to selecting the appropriate number of class
groupings for the table, determining a suitable width of a class grouping,
and establishing the boundaries of each class grouping to avoid
overlapping.

• The number of classes depends on the number of values in the data. With
a larger number of values, typically there are more classes. In general, a
frequency distribution should have at least 5 but no more than 15 classes.

• To determine the width of a class interval, you divide the range (Highest
value – Lowest value) of the data by the number of class groupings desired.
Organizing Numerical Data:
Grouped Frequency Distribution

A manufacturer of insulation randomly selects 20 winter
days and records the daily high temperature:

24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
32, 13, 12, 38, 41, 43, 44, 27, 53, 27
Organizing Numerical Data:
Grouped Frequency Distribution
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44,
46, 53, 58

Class                  Frequency   Relative Frequency   Percentage
10 but less than 20        3             .15                15
20 but less than 30        6             .30                30
30 but less than 40        5             .25                25
40 but less than 50        4             .20                20
50 but less than 60        2             .10                10
Total                     20            1.00               100
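The table above can be reproduced with a short Python sketch. It uses the five classes shown in the table; rounding the computed width of 9.2 up to 10 and starting the first class at 10 are assumptions made here to match those classes, not rules stated in the review.

```python
temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
         32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

# Suggested class width: range divided by the desired number of classes.
data_range = max(temps) - min(temps)     # 58 - 12 = 46
raw_width = data_range / 5               # 9.2
width, start, n_classes = 10, 10, 5      # assumed: round up to 10, start at 10

n = len(temps)
table = []
for i in range(n_classes):
    lower = start + i * width
    upper = lower + width
    # "lower but less than upper": lower bound included, upper bound excluded
    freq = sum(lower <= t < upper for t in temps)
    table.append((lower, upper, freq, freq / n, 100 * freq / n))

for lower, upper, freq, rel, pct in table:
    print(f"{lower} but less than {upper}: {freq:>2}  {rel:.2f}  {pct:.0f}")
```

Using half-open classes (lower bound included, upper bound excluded) is what keeps the class groupings from overlapping.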
Tabulating Numerical Data:
Cumulative Frequency
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44,
46, 53, 58

Class                  Frequency   Percentage   Cumulative Frequency   Cumulative Percentage
10 but less than 20        3           15                3                     15
20 but less than 30        6           30                9                     45
30 but less than 40        5           25               14                     70
40 but less than 50        4           20               18                     90
50 but less than 60        2           10               20                    100
Total                     20          100
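The cumulative columns are running totals of the frequency column. A minimal Python sketch, using `itertools.accumulate` from the standard library and the class frequencies from the table, is shown here; the variable names are illustrative.

```python
from itertools import accumulate

classes = ["10 but less than 20", "20 but less than 30", "30 but less than 40",
           "40 but less than 50", "50 but less than 60"]
freqs = [3, 6, 5, 4, 2]
total = sum(freqs)                                  # 20 observations

cum_freqs = list(accumulate(freqs))                 # running total of frequencies
cum_pcts = [100 * c / total for c in cum_freqs]     # running total as a percentage

for label, f, cf, cp in zip(classes, freqs, cum_freqs, cum_pcts):
    print(f"{label}: {f:>2} {cf:>3} {cp:>5.0f}")
```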
Why Use a Frequency Distribution?

• It condenses the raw data into a more useful
form
• It allows for a quick visual interpretation of the
data
• It enables the determination of the major
characteristics of the data set including where
the data are concentrated / clustered
Frequency Distributions: Some Tips

• Different class boundaries may provide different pictures for
the same data (especially for smaller data sets)

• Shifts in data concentration may show up when different class
boundaries are chosen

• As the size of the data set increases, the impact of alterations
in the selection of class boundaries is greatly reduced

• When comparing two or more groups with different sample
sizes, you must use either a relative frequency or a
percentage distribution
Organizing Numerical Data:
The Histogram

• A vertical bar chart of the data in a frequency distribution is
called a histogram.

• In a histogram there are no gaps between adjacent bars.

• The class boundaries (or class midpoints) are shown on the
horizontal axis.

• The vertical axis is either frequency, relative frequency, or
percentage.

• The heights of the bars represent the frequency, relative
frequency, or percentage.
Organizing Numerical Data:
The Histogram

Class                  Frequency   Relative Frequency   Percentage
10 but less than 20        3             .15                15
20 but less than 30        6             .30                30
30 but less than 40        5             .25                25
40 but less than 50        4             .20                20
50 but less than 60        2             .10                10
Total                     20            1.00               100

[Figure: Histogram: Daily High Temperature. Vertical axis: Frequency (0 to 7); horizontal axis: class midpoints 5, 15, 25, 35, 45, 55, More.]

(In a percentage histogram the vertical axis would be defined to show the
percentage of observations per class)
Organizing Numerical Data:
The Polygon

• A percentage polygon is formed by having the midpoint of
each class represent the data in that class and then connecting
the sequence of midpoints at their respective class
percentages.

• The cumulative percentage polygon, or ogive, displays the
variable of interest along the X axis, and the cumulative
percentages along the Y axis.

• Useful when there are two or more groups to compare.


Graphing Numerical Data:
The Frequency Polygon

Class                  Class Midpoint   Frequency
10 but less than 20         15              3
20 but less than 30         25              6
30 but less than 40         35              5
40 but less than 50         45              4
50 but less than 60         55              2

[Figure: Frequency Polygon: Daily High Temperature. Vertical axis: Frequency (0 to 7); horizontal axis: Class Midpoints (5, 15, 25, 35, 45, 55, 65).]

(In a percentage polygon the vertical axis would be defined to
show the percentage of observations per class)
Graphing Cumulative Frequencies:
The Ogive (Cumulative % Polygon)
Class                  Lower class boundary   % less than lower boundary
10 but less than 20           10                        0
20 but less than 30           20                       15
30 but less than 40           30                       45
40 but less than 50           40                       70
50 but less than 60           50                       90
(upper boundary)              60                      100

[Figure: Ogive: Daily High Temperature. Vertical axis: Cumulative Percentage (0 to 100); horizontal axis: Lower Class Boundary (10 to 60).]

(In an ogive the percentage of the observations less than each lower class
boundary is plotted versus the lower class boundaries.)
Ungrouped Frequency Distribution

• When a frequency distribution table
lists all of the individual categories
(X values) it is called a regular or
ungrouped frequency distribution.
• When the range of data is small, the
data must be grouped into classes
that are not more than one unit in
width.
Example
Consider the scores of 25 students on their 15-item exam
in Statistics.

Since the range is small, classes consisting of
a single data value can be used.
Example Cont.
Class Tally Frequency
4 // 2
5 /// 3
6 / 1
7 ///// 5
8 ///// /// 8
9 //// 4
10 / 1
11 / 1
