Data Management

This document discusses data management and statistical techniques. It covers: 1. The goals of understanding statistical tools to process data, using linear regression and correlations to predict variables, and advocating statistical data in decision making. 2. Definitions of statistics, why statistics are studied, and the divisions of descriptive and inferential statistics. 3. Key concepts including populations, samples, parameters, statistics, and determining appropriate sample sizes using formulas from Cochran and Yamane.

Uploaded by Aereal Mari


DATA MANAGEMENT

Our target learning outcomes are to: a) use a variety of statistical tools to process and
manage numerical data; b) use the methods of linear regression and correlation to predict
the value of a variable given certain conditions; and c) advocate the use of statistical data in
making important decisions.

UNIT 1: INTRODUCTION TO STATISTICS

PowerPoint 19: Definition of Statistics and terminologies

What is Statistics?
Statistics is the science of collecting, organizing, presenting, analyzing, and
interpreting data to assist in making more effective decisions.

Why study statistics?

Data is everywhere. Statistical techniques are used to make many decisions that
affect our lives. No matter what your career, you will make professional decisions that involve
data. An understanding of statistical methods will help you make these decisions effectively.

A. Divisions of Statistics
1. Descriptive Statistics. It deals with the methods of organizing, summarizing, and
presenting a mass of data to yield meaningful information. It includes anything done
to the data to summarize or describe it, without any attempt to draw inferences or
conclusions beyond the gathered data.
Activities:
• Collect data; e.g., Survey
• Present data; e.g., Tables and graphs
• Summarize data; e.g., Sample mean
2. Inferential Statistics. It is concerned with generalizing about a population or other
groups of data based on the study of a sample. It comprises the methods
concerned with analyzing a subset of the data to make predictions or inferences
about the entire set of data.
Activities:
• Estimation; e.g., Estimate the population mean weight using the
sample mean weight
• Hypothesis testing; e.g., Test the claim that the population mean
weight is 70 kg

Property of and for the exclusive use of SLU. Reproduction, storing in a retrieval system, distributing, uploading or posting online, or transmitting in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise of any part of this document, without the prior written permission of SLU, is strictly prohibited.
B. Population and Sample
1. Population. It consists of the totality of the observations with which we are concerned.
It refers to the entire group of people, objects, or reactions that share a particular
quality or combination of qualities. A population can be either finite or infinite.
• A parameter is any numerical value describing a characteristic of a population. It is
usually represented by a Greek letter (e.g., μ for the population mean).
Examples:
• If we consider all math classes to be the population, then the
average number of points earned per student over all the math
classes is an example of a parameter.
• There are 35,000 students enrolled in a university and 15% of
them are enrolled in math. The figure of 15% is a parameter
because it is based on the entire population of all enrolled
students.
2. Sample. It refers to a finite number of objects selected from the population. It is a
collection of some elements of a population that is representative of the entire
population.
• A statistic is any numerical value describing a characteristic of a sample. It is
usually represented by an ordinary letter of the English alphabet (e.g., x̄ for the sample mean).
Example:
• If we consider one math class to be a sample of the population
of all math classes, then the average number of points earned
by students in that one math class at the end of the term is an
example of a statistic. The statistic is an estimate of a population
parameter, in this case the mean.
• An institution polled 2.3 million adults in the Philippines, and 80%
said that they would vote in the presidential election. That figure of 80%
is a statistic because it is based on a sample, not the entire
population of all adults in the Philippines.

An illustration below is given to differentiate population and sample.

C. Sample Size Determination

The number of respondents or subjects that form a sample is termed the sample size.
1. Cochran (1977) presented a set of formulas that can be used to determine the
sample size. Separate formulas apply to a finite and known population size, N, and to an
infinite or unknown population size. For an infinite or unknown population size, the formulas are:

Estimating a Population Mean:
$$n \ge \frac{(Z_{\alpha/2})^2\, s^2}{e^2}$$

Estimating a Population Proportion:
$$n \ge \frac{(Z_{\alpha/2})^2\, pq}{e^2}$$

where n is the sample size, $Z_{\alpha/2}$ is the two-tailed z-score corresponding to the level of
significance, s is the known standard deviation, e is the margin of error, p is the past
estimate of the population proportion, and $q = 1 - p$.
NOTE
a. The level of significance, $\alpha$, usually takes one of the standard values 0.01,
0.05, or 0.10. Theoretically, the level of significance is the probability of a
Type I error in hypothesis testing.
b. The following table presents the values of $Z_{\alpha/2}$ corresponding to the standard
values of $\alpha$:

$\alpha$      $Z_{\alpha/2}$
0.01          2.575
0.05          1.96
0.10          1.645

c. The standard deviation, s, can be estimated from a pilot data set or the value
can be adopted from a previous study that considered the same or similar
population.
d. In the same manner as s, p can be the past estimate of the population
proportion or can be computed from a pilot data set.
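The two Cochran formulas for an infinite or unknown population size, $n \ge (Z_{\alpha/2})^2 s^2/e^2$ for a mean and $n \ge (Z_{\alpha/2})^2 pq/e^2$ for a proportion (the latter is the formula used in Example 78), can be checked with a short Python sketch. The names `cochran_mean`, `cochran_proportion`, `round_up`, and `Z_TABLE` are illustrative assumptions, not part of the original material; the critical values come from the table in note b, and results are rounded up to a whole number, as the round-off rule in this unit recommends.

```python
import math

# Two-tailed critical values Z_{alpha/2} from the table in note b
Z_TABLE = {0.01: 2.575, 0.05: 1.96, 0.10: 1.645}


def round_up(x):
    """Round a computed sample size up to the next whole number.

    round(x, 6) first strips floating-point noise such as 999.9999999999998,
    so values that are mathematically whole are not pushed up by one.
    """
    return math.ceil(round(x, 6))


def cochran_mean(s, e, alpha=0.05):
    """n >= (Z_{alpha/2})^2 s^2 / e^2 (infinite or unknown population size)."""
    z = Z_TABLE[alpha]
    return round_up(z ** 2 * s ** 2 / e ** 2)


def cochran_proportion(p, e, alpha=0.05):
    """n >= (Z_{alpha/2})^2 p q / e^2 with q = 1 - p (infinite or unknown population size)."""
    z = Z_TABLE[alpha]
    q = 1 - p
    return round_up(z ** 2 * p * q / e ** 2)


# Example 78: p = 0.41, e = 0.03, 95% confidence (alpha = 0.05)
print(cochran_proportion(0.41, 0.03))  # 1033
```

For instance, with s = 5 and e = 1 at α = 0.05, `cochran_mean(5, 1)` returns 97 (96.04 rounded up).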

2. Yamane’s Formula (Simplified Formula for Proportions)
If the behavior of the population is not certain or the researcher is not familiar
with the population's behavior, Taro Yamane's formula (1967) may be used. The formula is:

$$n = \frac{N}{1 + Ne^2}$$

where N is the population size and e is the margin of error.

Example 78: 41% of Jacksonville residents said that they had been in a hurricane. How
many adults should be surveyed to estimate the true proportion of adults who have been
in a hurricane, with a 95% confidence interval and a 3% margin of error?
Solution:
41% is a past estimate of the population proportion, and the population size is unknown. Hence,
we use Cochran's formula for estimating a population proportion.
$\alpha = 0.05$
$p = 0.41$
$q = 1 - 0.41 = 0.59$
$Z_{\alpha/2} = 1.96$
$e = 0.03$

$$n \ge \frac{(Z_{\alpha/2})^2\, pq}{e^2} = \frac{(1.96)^2 (0.41)(0.59)}{(0.03)^2}$$

$$n \ge 1{,}032.54 \approx 1{,}033$$

Hence, at least 1,033 adults should be surveyed.

Example 79: From a population of 10,000 individuals of a certain town, what sample size
is needed in order to get an accurate result for a certain study using a margin of error of
3%?
Solution:

$$n = \frac{N}{1 + Ne^2} = \frac{10{,}000}{1 + (10{,}000)(0.03)^2} = \frac{10{,}000}{10} = 1{,}000$$

Hence, the sample size needed in order to get an accurate result for the
study using a margin of error of 3% is 1,000 individuals.
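Yamane's formula, $n = N/(1 + Ne^2)$, is easy to script. The sketch below is illustrative (the function name `yamane` is an assumption); it rounds the result up, following the round-off rule in this unit, after removing floating-point noise.

```python
import math


def yamane(N, e):
    """Yamane's formula n = N / (1 + N e^2), rounded up to a whole number."""
    n = N / (1 + N * e ** 2)
    # round(n, 6) removes float noise (e.g. 999.9999999999998 -> 1000.0) before ceil
    return math.ceil(round(n, 6))


print(yamane(10_000, 0.03))  # Example 79: 1000
print(yamane(3_900, 0.05))   # 363
```

For a population of 3,900 and a 5% margin of error, the formula gives 363 (362.79 rounded up), which happens to equal the required sample size in Example 80.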

D. Sampling Techniques

Sampling is the process of selecting units, such as people, organizations, or objects, from
a population of interest in order to study them and fairly generalize the results back to the
population from which the sample was taken. The two types of sampling are:
1. Random Sampling Techniques
Members from the population are selected in such a way that each individual
member in the population has an equal chance of being selected.
a. Simple Random Sampling. Every case in the population being sampled has an equal
chance of being chosen. It is an equal probability of selection method (EPSEM).
Basic Steps:
1. Make a list of the population units and number them from 1 to N,
where N is the population size.
2. Select n random numbers from 1 to N using some random process.
3. Employ any of the following selection procedures:
• Draw lots
• Lottery
• Usage of gadgets like the calculator or computer to generate
random numbers
• Table of Random Numbers
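Step 2 (selecting n random numbers from 1 to N) is what a computer random-number generator does. Below is a minimal Python sketch using the standard library's `random.Random.sample`, which draws without replacement so that no unit is chosen twice; the function name and the optional `seed` parameter (for a reproducible draw) are illustrative assumptions.

```python
import random


def simple_random_sample(N, n, seed=None):
    """Return n distinct unit numbers chosen from 1..N, each unit equally likely."""
    if not 0 < n <= N:
        raise ValueError("sample size must satisfy 0 < n <= N")
    rng = random.Random(seed)                 # seeded generator for a reproducible draw
    chosen = rng.sample(range(1, N + 1), n)   # sampling without replacement
    return sorted(chosen)


print(simple_random_sample(500, 10, seed=42))
```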
b. Systematic Random Sampling. We select a starting point randomly and then
select every kth (such as every 50th) element in the population until the desired
sample size is achieved.
Basic Steps:
1. Construct the sampling frame.
2. Determine the sample size.
3. Determine the sample interval, k: $k = \frac{N}{n}$
4. Identify the random start, r, using simple random sampling (SRS), where $1 \le r \le k$.
5. Commencing on the random start, select every kth item until the
desired sample size is reached.
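The five steps translate directly into code. In this sketch (function name hypothetical), the interval k is taken as N // n, that is, N/n rounded down when it is not a whole number; that rounding choice is an assumption, made so that every selected unit number stays within 1 to N.

```python
import random


def systematic_sample(N, n, seed=None):
    """Systematic random sample of n unit numbers from 1..N."""
    if not 0 < n <= N:
        raise ValueError("sample size must satisfy 0 < n <= N")
    k = N // n                  # Step 3: sample interval (N/n rounded down)
    rng = random.Random(seed)
    r = rng.randint(1, k)       # Step 4: random start with 1 <= r <= k
    # Step 5: r, r + k, r + 2k, ... until n units are selected
    return [r + j * k for j in range(n)]


print(systematic_sample(1000, 20, seed=7))  # interval k = 50
```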

c. Stratified Random Sampling. We subdivide the population into at least two different
subgroups (or strata) so that subjects within the same subgroup share the same
characteristics (such as gender or age bracket), then we draw a sample from each
subgroup (or stratum).

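We use proportional allocation to draw a sample from each stratum to
reach the desired sample size. The sample size for stratum h is $n_h = \frac{N_h}{N}\, n$,
where $N_h$ is the size of stratum h, N is the population size, and n is the desired total sample size.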

Example 80: Suppose a school has five departments composed of the following numbers
of students. Determine the number of students from each department to be part of the
sample when the researcher needs 363 respondents.
Solution:

Department                      $N_h$    $n_h$
Business Administration (BA)    1,500    140
Management (M)                  1,200    112
Finance (F)                       850     80
Entrepreneurship (E)              200     19
Culinary Arts (CA)                150     14
Total                           3,900    365

$$n_{BA} = \frac{1{,}500}{3{,}900}(363) = 139.62 \approx 140$$
$$n_{M} = \frac{1{,}200}{3{,}900}(363) = 111.69 \approx 112$$
$$n_{F} = \frac{850}{3{,}900}(363) = 79.12 \approx 80$$
$$n_{E} = \frac{200}{3{,}900}(363) = 18.62 \approx 19$$
$$n_{CA} = \frac{150}{3{,}900}(363) = 13.96 \approx 14$$

Hence, 140 students from Business Administration, 112 students from Management, 80 students
from Finance, 19 students from Entrepreneurship, and 14 students from Culinary Arts are part
of the sample. Because each department's share is rounded up, the total is 365, slightly more
than the required 363 respondents.
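Proportional allocation, $n_h = (N_h/N)\,n$, can be checked against Example 80 with a short sketch. The function name is hypothetical; like the worked example, it rounds every stratum up, so the total can slightly exceed the requested n.

```python
import math


def proportional_allocation(strata, n):
    """Allocate a total sample n across strata in proportion to stratum sizes N_h."""
    N = sum(strata.values())
    # n_h = (N_h / N) * n, rounded up; round(..., 6) strips floating-point noise first
    return {name: math.ceil(round(N_h / N * n, 6)) for name, N_h in strata.items()}


departments = {"BA": 1500, "M": 1200, "F": 850, "E": 200, "CA": 150}
allocation = proportional_allocation(departments, 363)
print(allocation)                # {'BA': 140, 'M': 112, 'F': 80, 'E': 19, 'CA': 14}
print(sum(allocation.values()))  # 365
```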

Sample Size Round-off
Rule: When the calculated sample size is not a whole number, it should be
rounded up to the next higher whole number. Rounding up is the conservative choice: it
ensures that the sample is never smaller than the size the formula requires. For instance, if a
sample size calculation determined that 2,006.083 data points were necessary to represent the
population, then 2,007 data points should be taken.

d. Cluster Random Sampling. Divide the population into sections (or clusters), then
randomly select some of those clusters, and then choose all members from those
selected clusters.
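A minimal sketch of cluster sampling, assuming the clusters are given as a Python dictionary that maps each cluster name to its list of members (the class-section data below are hypothetical): choose some clusters at random, then keep every member of each chosen cluster.

```python
import random


def cluster_sample(clusters, m, seed=None):
    """Randomly select m clusters and return all members of the selected clusters."""
    rng = random.Random(seed)
    # sorted() gives a fixed order, so the same seed always picks the same clusters
    chosen = rng.sample(sorted(clusters), m)
    return {name: clusters[name] for name in chosen}


sections = {  # hypothetical clusters: class sections and their students
    "Section A": ["Ana", "Ben", "Carl"],
    "Section B": ["Dina", "Eli"],
    "Section C": ["Fe", "Gio", "Hana", "Ivan"],
    "Section D": ["Jun", "Kim"],
}
print(cluster_sample(sections, 2, seed=1))
```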

e. Multi-Stage Sampling. This method uses several stages or phases in getting random
samples from the general population. It is commonly used when the research is national in scope:
• Divide the country into regions.
• Divide regions into municipalities and cities.
• Divide municipalities and cities into barangays.
• Divide barangays into sitios or sections.

2. Non-Random Sampling Techniques

a. Accidental or Haphazard or Convenience sampling. It is one of the most common
methods of sampling, but its results are normally biased because the researcher
considers his/her own convenience in the collection of the data.
b. Purposive sampling. It is based on certain criteria laid down by the researcher. People
who satisfy the criteria are interviewed. The subcategories of purposive sampling are:
1. Modal Instance Sampling. When we do modal instance sampling, we are
sampling the most frequent (typical) cases. The problem with modal instance sampling is
identifying the “modal” case.

2. Expert Sampling. It involves the assembling of a sample of persons with known
or demonstrable experience and expertise in some area.
3. Quota Sampling. Items are selected nonrandomly according to some fixed
quota.
4. Snowball Sampling. Begin by identifying someone who meets the criteria for
inclusion in your study. Then ask them to recommend others they may know
who also meet the criteria.

E. Statistical Data
Statistical data are the raw materials of research or of any statistical investigation. They are
usually obtained by counting or measuring items. Data are categorized as follows:

1. According to Description:
a) Qualitative (Categorical) Data are generally described by words or letters. They are not
as widely used as quantitative data because many numerical techniques do not
apply to qualitative data. For example, it does not make sense to find an average
hair color of a population.
• The gender (male, female) of survey respondents
• The numbers 24, 28, 17, 54, and 31 sewn on the shirts of the
basketball team are categorical data. These numbers are
substitutes for names. They do not count or measure anything.
Qualitative data can be separated into two subgroups:
1. Dichotomic takes the form of a word with two options, such as gender -
male or female.
2. Polynomic takes the form of a word with more than two options, such as
education - primary school, secondary school and university.
b) Quantitative (Numerical) Data are always numbers and are the result of counting
or measuring attributes of a population.
• The ages (in years) of survey respondents
• The distance traveled
• The number of children in a family
Quantitative data can be separated into two subgroups:
1. Discrete data are the result of counting. They are expressed as whole numbers and are
always exact.
• The numbers of eggs that hens lay are discrete data because
they represent counts.
• The number of students of a given ethnic group in a class.
• The number of books on a shelf.

2. Continuous data are the result of measuring. They are not necessarily whole numbers.
• The amounts of milk from cows are continuous data because
they are measurements that can assume any value over a
continuous span. During a year, a cow might yield an amount of milk that can be any value
between 0 and 7000 liters. It would be possible to get
5678.1234 liters because the cow is not restricted to the
discrete amounts of 0, 1, 2, . . . , 7000 liters.
• Distance traveled
• Weight of luggage

2. According to Source:
a. Primary data refers to information gathered directly from an original
source or based on direct or first-hand experience, using methods like
surveys, interviews, or experiments.
b. Secondary data refers to information taken from published or unpublished
materials that have been previously gathered by other individuals, researchers, or
agencies.

3. According to level of measurement

a. Nominal Scale. It involves categorizing cases according to the presence or absence
of some attribute. It is generally used for the purpose of classification. Data gathered
from variables measured at a nominal level can be categorized but cannot be
ranked, as there are no quantitative differences between and among them.
• gender
• religious affiliation
• eye color
b. Ordinal Scale. It is the simplest scale that orders people, objects, or events along
some continuum. Values of variables measured at the ordinal level offer at least a
rough indication of quantitative differences; they can be categorized and
ranked, but the numbers are used only to place objects in order.
• year level
• job position
c. Interval Scale. It is a scale on which zero is arbitrary; a zero value does not reflect the
absence of the attribute. Data gathered from variables measured at an interval scale can be
categorized, ranked, and added or subtracted.
• IQ scores
• temperature (in °C or °F)
d. Ratio Scale. It possesses all of the characteristics of interval scales but has a true zero
point. Thus, a value of 0 on the scale indicates the total absence of the property
being measured. For values at this level, differences and ratios are both meaningful.

• Distances (in km) traveled by cars (0 km represents no distance traveled, and
400 km is twice as far as 200 km.)
• Prices of books (P0.00 represents no cost, and a P300.00 book costs
twice as much as a P150.00 book.)
• height
• weight

UNIT 2: METHODS OF DATA GATHERING AND PRESENTATION

PowerPoint 20: Methods of Data Gathering

A. Methods of Data Gathering

1. Interview (Direct) Method – a method of person-to-person exchange between the
interviewer and the interviewee.
Positive:
1. It provides consistent and more precise information since clarification
may be given by the interviewee.
2. Questions may be repeated or modified to suit the interviewee’s
level of understanding.
Negative:
1. Time-consuming
2. Expensive
3. Limited field coverage
2. Questionnaire (Indirect) Method – in this method written responses are given to
prepared questions. A questionnaire is used to elicit answers to the problems of the
study. Questionnaires may be mailed or hand-carried.
Positive:
1. Inexpensive
2. Can cover a wide area in a shorter span of time.
3. Respondents may feel a greater sense of freedom to express views and
opinions because their anonymity is maintained.
Negative:
1. There’s a strong possibility of non-response, especially when
questionnaires are mailed.
2. Questions not easily understood may not be answered.
3. Observation Method – the investigator observes the behavior of the subject or
respondent. It is used, for example, when the subjects cannot talk or write.
Positive:
• The recording of behavior at the appropriate time and situation is
made possible.

4. Experiment Method - this method is used when the objective is to determine the
cause-and-effect relationship of certain phenomena under controlled conditions. It
is usually used by scientific researchers.
5. Registration Method – this method of gathering information is enforced by law.
• registration of births
• deaths
• vehicles
• licenses
Positive:
1. Information is kept systematized.
2. Information is always made available to the public.

Characteristics of a Good Question

1. A good question is unbiased. Questions must not be worded in a manner that influences
the answer of a respondent in a certain way, that is, to favor a certain response or be
against it. An unbiased question is stated in neutral language, and there is no element of
pressure.
2. A good question must be clear and simply stated. A question that is simple and clear is
easier to understand and is more likely to be answered truthfully.
3. Questions must be precise. They must not be vague; the question should indicate clearly
how the answers must be given.
4. Good questionnaires lend themselves to easy analysis.

B. Methods of Data Presentation

PowerPoint 21: Methods of Presentation of Data

1. Textual Presentation – This type of presentation incorporates data in a set of narrative
sentences or paragraphs. It emphasizes and compares important figures. However, it
can be tedious to read, especially if it consists of lengthy paragraphs and some figures
or words are repeated many times.

2. Tabular Presentation – This is a systematic way of categorizing related data in rows
and columns. This methodical arrangement, called a statistical table, presents data more
concisely and in greater detail than textual or graphical form.

3. Graphical Method – This method presents quantitative data in pictorial form, producing
a device often referred to as a graph or chart. Graphs have visual appeal that can attract
and hold the reader’s interest.
Kinds of Graphs and Diagrams
1. Bar Graph. A bar graph uses bars of equal width to show frequencies of
categories of qualitative data. The vertical scale represents frequencies or
relative frequencies. The horizontal scale identifies the different categories of
qualitative data. It is best used to show large changes over time or across categories.

2. Frequency Polygon. One type of statistical graph involves the class midpoints.
A frequency polygon uses line segments connected to points located directly
above class midpoint values. A variation of the basic frequency polygon is the
relative frequency polygon, which uses relative frequencies (proportions or
percentages) for the vertical scale. When trying to compare two data sets, it
is often very helpful to graph two relative frequency polygons on the same
axes.

3. Ogive. Another type of statistical graph called an ogive (pronounced “oh-
jive”) involves cumulative frequencies. Ogives are useful for determining the
number of values below some particular value, as illustrated in Example 3. An
ogive is a line graph that depicts cumulative frequencies. An ogive uses class
boundaries along the horizontal scale, and cumulative frequencies along the
vertical scale.
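An ogive plots "less than" cumulative frequencies against class boundaries. The sketch below computes those points; the class boundaries and frequencies are hypothetical, and starting the curve at the lowest boundary with a cumulative frequency of 0 is a common convention assumed here rather than something stated in the text.

```python
from itertools import accumulate


def ogive_points(boundaries, frequencies):
    """Pair each upper class boundary with its cumulative frequency.

    boundaries must have exactly one more entry than frequencies
    (the lower boundary of the first class plus each class's upper boundary).
    """
    if len(boundaries) != len(frequencies) + 1:
        raise ValueError("need one more boundary than class frequencies")
    cumulative = list(accumulate(frequencies))
    # The curve starts at the lowest boundary, where no values lie below it
    return [(boundaries[0], 0)] + list(zip(boundaries[1:], cumulative))


# Hypothetical classes 60-69, 70-79, 80-89, 90-99 with boundaries at .5
print(ogive_points([59.5, 69.5, 79.5, 89.5, 99.5], [2, 5, 8, 5]))
# [(59.5, 0), (69.5, 2), (79.5, 7), (89.5, 15), (99.5, 20)]
```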

4. Pie Chart. A pie chart is a graph that depicts qualitative data as slices of a circle, in which
the size of each slice is proportional to the frequency count for the category.
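Because each slice is proportional to its frequency, the central angle of a slice is (frequency / total) × 360°. Here is a short sketch with hypothetical survey counts; the function name is illustrative.

```python
def pie_angles(counts):
    """Central angle (in degrees) of each pie slice: frequency / total * 360."""
    total = sum(counts.values())
    return {category: 360 * freq / total for category, freq in counts.items()}


# Hypothetical counts of respondents' preferred mode of transport
print(pie_angles({"Jeepney": 9, "Tricycle": 6, "Walking": 3}))
# {'Jeepney': 180.0, 'Tricycle': 120.0, 'Walking': 60.0}
```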

5. A stemplot (or stem-and-leaf plot) represents quantitative data by separating
each value into two parts: the stem (such as the leftmost digit) and the leaf
(such as the rightmost digit).
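A text stemplot for whole numbers can be produced by splitting each value into its tens part (stem) and units digit (leaf). This is a minimal sketch, assuming non-negative integer data; the quiz scores from Example 84 serve as input, and the function name is illustrative.

```python
from collections import defaultdict


def stemplot(values):
    """Stem-and-leaf lines for non-negative integers: stem = value // 10, leaf = last digit."""
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        leaves[stem].append(str(leaf))
    # Include every stem between the smallest and largest, even if it has no leaves
    return [f"{stem} | " + " ".join(leaves.get(stem, []))
            for stem in range(min(leaves), max(leaves) + 1)]


for line in stemplot([90, 92, 93, 88, 95, 88, 97, 87, 98]):  # Example 84 scores
    print(line)
# 8 | 7 8 8
# 9 | 0 2 3 5 7 8
```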

UNIT 3: DESCRIPTIVE STATISTICAL MEASURES

A. Measures of Central Tendencies

PowerPoint 22: Measures of Central Tendencies and other Locations

These are numerical values that tend to locate, in some sense, the middle of a set of data
when it is arranged in increasing or decreasing order. The term average is often
associated with these measures: the mean, median, mode, and midrange.

1. Mean ($\mu$ or $\bar{x}$)
a. Arithmetic Mean. It is obtained by adding all the observations and dividing the sum
by the number of observations; thus, it is called the computational average.
1. Population Mean: If $x_1, x_2, \ldots, x_N$ represent a finite population of size N,
the population mean is given by
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$
2. Sample Mean: If $x_1, x_2, \ldots, x_n$ represent a finite sample of size n, the
sample mean is given by
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

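Example 81: Suppose you chose ten people who entered the campus and whose ages
are as follows: 15, 25, 18, 20, 25, 18, 18, 20, 20, 25. What is the mean age of this sample?
Solution:
$$\bar{x} = \frac{15 + 25 + 18 + 20 + 25 + 18 + 18 + 20 + 20 + 25}{10} = \frac{204}{10} = 20.40$$
The mean age of the sample is 20.40.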
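The arithmetic mean is a one-line computation. This sketch reproduces Example 81 both directly from the definition (sum divided by n) and with the standard library's `statistics.mean`.

```python
from statistics import mean

ages = [15, 25, 18, 20, 25, 18, 18, 20, 20, 25]  # Example 81

sample_mean = sum(ages) / len(ages)  # x-bar = (sum of x_i) / n
print(f"{sample_mean:.2f}")          # 20.40
print(mean(ages))                    # 20.4
```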

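b. Weighted Mean. If the data values $x_1, x_2, \ldots, x_k$ have assigned weights $w_1, w_2, \ldots, w_k$,
respectively, the mean is given by
$$\bar{x} = \frac{\sum_{i=1}^{k} w_i x_i}{\sum_{i=1}^{k} w_i}$$

Example 82: A student was taking 5 subjects last semester. Find his average if his final
grades were as follows:

Weight    Final Grade
3         1.75
5         2.50
3         2.25
2         1.50
4         3.0

Solution:
$$\bar{x} = \frac{3(1.75) + 5(2.50) + 3(2.25) + 2(1.50) + 4(3.0)}{3 + 5 + 3 + 2 + 4} = \frac{39.5}{17} \approx 2.32$$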
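The weighted mean formula can be verified in the same way. The sketch below reproduces Example 82 with the weights 3, 5, 3, 2, 4 and the grades taken from its solution; the helper name is illustrative.

```python
def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights w_i."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


grades = [1.75, 2.50, 2.25, 1.50, 3.0]   # Example 82 final grades
weights = [3, 5, 3, 2, 4]                # corresponding weights
print(round(weighted_mean(grades, weights), 2))  # 2.32
```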

Characteristics of Mean
1. It is used for interval and ratio measurements.
2. All the scores or measurements are considered in the computation of the mean.
3. Very high or very low scores or measurements affect the mean.

2. Mode ($\hat{\mu}$ or $\hat{x}$)
It is the value in the distribution with the highest frequency. It locates the point
where the observation values occur with the greatest density. It can be used for
quantitative as well as qualitative data.
A data set can have one mode, more than one mode, or no mode.
• When two data values occur with the same greatest frequency, each
one is a mode and the data set is bimodal.
• When more than two data values occur with the same greatest
frequency, each is a mode and the data set is said to be multimodal.
• When no data value is repeated, we say that there is no mode.

Example 83: Observe the given ungrouped data below:

a. 1, 2, 3, 4, 5, 6, 7 (no mode)
b. 15.2, 12.3, 4.6, 12.3, 6.5, 12.3, 5.5 ($\hat{x} = 12.3$)
c. 15, 12, 4, 15, 4, 6, 5 ($\hat{x} = 15$ and $\hat{x} = 4$; bimodal)
d. 3, 4, 5, 1, 3, 2, 4, 5, 7, 10 ($\hat{x} = 3$, $\hat{x} = 4$, and $\hat{x} = 5$; multimodal)

Characteristics of Mode
1. It is very easy to compute but is seldom used because it is very unstable.
2. It is used when a rough or quick estimate of a central value is wanted.
3. It is most appropriate for the nominal scale, as a measure of popularity.
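Because a data set can have one mode, several modes, or none, a helper should return all modes. This sketch (helper name illustrative) uses the standard library's `statistics.multimode` (Python 3.8+) and, following the convention in this unit, reports no mode when no value is repeated; the inputs are the data sets of Example 83.

```python
from statistics import multimode


def modes(data):
    """All values with the highest frequency; [] when no value is repeated."""
    if not data:
        return []
    top = multimode(data)               # every value tied for the greatest frequency
    if list(data).count(top[0]) == 1:   # no value repeated -> no mode
        return []
    return sorted(top)


print(modes([1, 2, 3, 4, 5, 6, 7]))                    # []
print(modes([15.2, 12.3, 4.6, 12.3, 6.5, 12.3, 5.5]))  # [12.3]
print(modes([15, 12, 4, 15, 4, 6, 5]))                 # [4, 15]
print(modes([3, 4, 5, 1, 3, 2, 4, 5, 7, 10]))          # [3, 4, 5]
```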

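3. Median
It is a value that divides the distribution into two equal parts (after arranging the
values in ascending or descending order). As such, it is a positional average. For ordered
data $x_1, x_2, \ldots, x_n$, the median is defined by
$$\text{Median} = \begin{cases} x_{(n+1)/2}, & \text{if } n \text{ is odd} \\ \dfrac{x_{n/2} + x_{n/2+1}}{2}, & \text{if } n \text{ is even} \end{cases}$$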

Example 84: During the first marking period, Nicole's math quiz scores were 90, 92, 93, 88,
95, 88, 97, 87, and 98. What was the median quiz score?
Solution:
Ordering the data from least to greatest, we get:
87, 88, 88, 90, 92, 93, 95, 97, 98
Since n = 9 (odd), the median is the $\frac{9+1}{2} = 5$th value:
$$\text{Median} = x_5 = 92$$
The median quiz score is 92. (Four quiz scores were higher than 92 and four
were lower.)

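Example 85: The ages of 10 college students are listed below. Find the median.
18, 24, 20, 35, 19, 23, 26, 23, 19, 20
Solution:
Ordering the data from least to greatest, we get:
18, 19, 19, 20, 20, 23, 23, 24, 26, 35
Since n = 10 (even), the median is the average of the 5th and 6th values:
$$\text{Median} = \frac{x_5 + x_6}{2} = \frac{20 + 23}{2} = 21.5$$
The median age of the college students is 21.5.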
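The median rule for odd and even n can be sketched directly. This version (helper name illustrative) sorts the data, picks the middle value or averages the two middle values, and is cross-checked against the standard library's `statistics.median` on Examples 84 and 85.

```python
from statistics import median


def median_of(data):
    """Middle value of the ordered data; average of the two middle values when n is even."""
    xs = sorted(data)
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]                   # x_{(n+1)/2} in 1-based positions
    return (xs[mid - 1] + xs[mid]) / 2   # (x_{n/2} + x_{n/2+1}) / 2


scores = [90, 92, 93, 88, 95, 88, 97, 87, 98]    # Example 84
ages = [18, 24, 20, 35, 19, 23, 26, 23, 19, 20]  # Example 85
print(median_of(scores), median(scores))  # 92 92
print(median_of(ages), median(ages))      # 21.5 21.5
```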

Characteristics of Median
1. It is used for ordinal or ranked measurements.
2. Only the middle scores or measurements are considered in the computation of the
median.

3. Very high or very low scores do not affect the median.
4. It is appropriate when there are extreme cases, that is, when the distribution is markedly skewed.
5. It is used when we want to know whether cases fall within the upper half or the lower
half of a distribution.

B. Measures of Relative Location

These measures, also known as quantiles or fractiles, are values below which a specific
fraction or percentage of the observations in a given data set must fall. These include
percentiles, deciles, and quartiles.

1. The Percentiles
Percentiles are values that divide a set of observations into 100 equal parts. These
values, denoted by $P_1, P_2, \ldots, P_{99}$, are such that 1% of the data falls below $P_1$, 2% falls
below $P_2$, ..., and 99% falls below $P_{99}$. The kth percentile, $P_k$ ($k = 1, 2, 3, \ldots, 99$),
can be determined using the following procedure:
a. Arrange the data in increasing order and compute the value of the index
$i = \left(\frac{k}{100}\right) n$, where n is the number of observations.
b. If i is an integer, $P_k = \frac{x_i + x_{i+1}}{2}$. If i is not an integer, use the rounded-up value for i
and take $P_k = x_i$. Note that $x_i$ here pertains to the ith score in the ordered data set.

Example 86. As part of a quality-control study aimed at improving a production line, the
weights (in ounces) of 50 bars of soap are measured. The results are as follows, sorted from
smallest to largest. Find the 43rd percentile and the 10th percentile.

11.6 12.6 12.7 12.8 13.1 13.3 13.6 13.7 13.8 14.1
14.3 14.3 14.6 14.8 15.1 15.2 15.6 15.6 15.7 15.8
15.8 15.9 15.9 16.1 16.2 16.2 16.3 16.4 16.5 16.5
16.5 16.6 17.0 17.1 17.3 17.3 17.4 17.4 17.4 17.6
17.7 18.1 18.3 18.3 18.3 18.5 18.5 18.8 19.2 20.3

Solution

a. 43rd Percentile
We compute the index i. Note that k = 43 and n = 50.
Then $i = \left(\frac{43}{100}\right) 50 = 21.5 \approx 22$ (round up).
From the data set, $x_{22} = 15.9$.
Hence, we have $P_{43} = x_{22} = 15.9$,
so 43% of the values lie below 15.9.

b. 10th Percentile, $P_{10}$

Notice that the data are already arranged in increasing order. We compute the
index i. Note that k = 10 and n = 50, so $i = \left(\frac{10}{100}\right) 50 = 5$.

Since i is an integer, $P_{10} = \frac{x_5 + x_6}{2}$. From the data set, $x_5 = 13.1$ and $x_6 = 13.3$
(the fifth and sixth values in the data set). Thus, we have
$$P_{10} = \frac{x_5 + x_6}{2} = \frac{13.1 + 13.3}{2} = 13.2.$$
This means that 10% of the bars of soap weigh less than 13.2 ounces.
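The index rule in the procedure above can be written as one function that handles percentiles and, by changing the number of parts, deciles and quartiles as well. This is a sketch of this document's rounding convention only; other textbooks and software (for example, interpolation-based methods) can give slightly different values. The function name, the `SOAP` list name, and the `parts` parameter are illustrative assumptions.

```python
import math

# Example 86: weights (in ounces) of 50 bars of soap
SOAP = [11.6, 12.6, 12.7, 12.8, 13.1, 13.3, 13.6, 13.7, 13.8, 14.1,
        14.3, 14.3, 14.6, 14.8, 15.1, 15.2, 15.6, 15.6, 15.7, 15.8,
        15.8, 15.9, 15.9, 16.1, 16.2, 16.2, 16.3, 16.4, 16.5, 16.5,
        16.5, 16.6, 17.0, 17.1, 17.3, 17.3, 17.4, 17.4, 17.4, 17.6,
        17.7, 18.1, 18.3, 18.3, 18.3, 18.5, 18.5, 18.8, 19.2, 20.3]


def quantile_by_index(data, k, parts=100):
    """k-th percentile (parts=100), decile (parts=10), or quartile (parts=4)."""
    if not 1 <= k < parts:
        raise ValueError("k must satisfy 1 <= k < parts")
    xs = sorted(data)
    n = len(xs)
    i = k * n / parts      # index i = (k / parts) * n; computing k * n first avoids float error
    if i == int(i):        # integer index: average x_i and x_{i+1}
        i = int(i)
        return (xs[i - 1] + xs[i]) / 2   # 1-based x_i is xs[i - 1]
    return xs[math.ceil(i) - 1]          # otherwise round i up and take x_i


print(quantile_by_index(SOAP, 43))           # P43 -> 15.9
print(quantile_by_index(SOAP, 10))           # P10 -> about 13.2
print(quantile_by_index(SOAP, 9, parts=10))  # D9  -> about 18.4
```

Because the decile index $(k/10)n$ equals the percentile index $(10k/100)n$, `quantile_by_index(SOAP, k, parts=10)` always matches `quantile_by_index(SOAP, 10 * k)`, which agrees with the relation $D_9 = P_{90}$ noted in Example 87.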

2. The Deciles
Deciles are values that divide a set of observations into ten equal parts. These values,
denoted by $D_1, D_2, \ldots, D_9$, are such that 10% of the data falls below $D_1$, 20% falls below
$D_2$, ..., and 90% falls below $D_9$. The kth decile, $D_k$ ($k = 1, 2, \ldots, 9$), can be determined
using the following procedure:
a. Arrange the data in increasing order and compute the value of the index
$i = \left(\frac{k}{10}\right) n$, where n is the number of observations.
b. If i is an integer, $D_k = \frac{x_i + x_{i+1}}{2}$. If i is not an integer, use the rounded-up value for i
and take $D_k = x_i$.

Example 87. Compute the 1st and 9th deciles of the data set from the previous example.

Solution

a. 9th Decile

Let us compute the index $i$, given that $k = 9$ and $n = 50$:

$i = \left(\frac{9}{10}\right) 50 = 45$

Since $i$ is an integer, $D_9 = \frac{x_{45} + x_{46}}{2}$. From the data set, $x_{45} = 18.3$ and $x_{46} = 18.5$.
Thus, we have $D_9 = \frac{18.3 + 18.5}{2} = 18.4$.
Hence, 90% of the bars of soap weigh less than 18.4 ounces. Also, $D_9 = P_{90}$.

b. First Decile

Let us compute the index $i$ given that $k = 1$ and $n = 50$: $i = \left(\frac{1}{10}\right) 50 = 5$.
Since $i$ is an integer, $D_1 = \frac{x_5 + x_6}{2}$. From the data set, $x_5 = 13.1$ and $x_6 = 13.3$. Thus, we have $D_1 = \frac{13.1 + 13.3}{2} = 13.2$.
This means that 10% of the bars of soap weigh less than 13.2 ounces.

3. The Quartiles
Quartiles are values that divide a set of observations into four equal parts. These values, denoted by $Q_1$, $Q_2$, and $Q_3$, are such that 25% of the data falls below $Q_1$, 50% falls below $Q_2$, and 75% falls below $Q_3$. The $k$th quartile, $Q_k$ ($k = 1, 2, 3$), can be determined using the following procedure:
a. Arrange the data in increasing order and compute the value of the index $i = \left(\frac{k}{4}\right) n$, where $n$ is the number of observations.
b. If $i$ is an integer, $Q_k = \frac{x_i + x_{i+1}}{2}$. If $i$ is not an integer, use the rounded-up value for $i$ and take $Q_k = x_i$.

Example 88. Compute the 2nd and the 3rd quartiles of the data set from the previous
example.
Solution

a. 2nd Quartile, $Q_2$

Let us compute the index $i$ given that $k = 2$ and $n = 50$: $i = \left(\frac{2}{4}\right) 50 = 25$.

Since $i$ is an integer, $Q_2 = \frac{x_{25} + x_{26}}{2}$. From the data set, $x_{25} = 16.2$ and $x_{26} = 16.2$.
Thus, we have $Q_2 = \frac{16.2 + 16.2}{2} = 16.2$.
This means that about 50% of the bars of soap weigh less than 16.2 ounces.
Note that the value of $Q_2$ represents the median of the data and that $Q_2 = P_{50} = D_5$.

b. 3rd Quartile, $Q_3$

We compute the index $i$. Note that $k = 3$ and $n = 50$: $i = \left(\frac{3}{4}\right) 50 = 37.5 \approx 38$ (round up, since 37.5 is not an integer). Hence, we have $Q_3 = x_{38} = 17.4$.
This means that approximately 75% of the bars of soap weigh less than 17.4 ounces. We also say that $Q_3 = P_{75}$.
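Because the decile and quartile procedures use the same index rule with 10 or 4 in place of 100, a single helper can cover all three. The sketch below assumes a hypothetical function `location_measure(data, k, parts)`; it reproduces the values from Examples 87 and 88 and checks equalities noted in the text, such as $D_9 = P_{90}$ and $Q_2 = P_{50} = D_5$.

```python
# Weights (in ounces) of the 50 bars of soap from Example 86.
WEIGHTS = [
    11.6, 12.6, 12.7, 12.8, 13.1, 13.3, 13.6, 13.7, 13.8, 14.1,
    14.3, 14.3, 14.6, 14.8, 15.1, 15.2, 15.6, 15.6, 15.7, 15.8,
    15.8, 15.9, 15.9, 16.1, 16.2, 16.2, 16.3, 16.4, 16.5, 16.5,
    16.5, 16.6, 17.0, 17.1, 17.3, 17.3, 17.4, 17.4, 17.4, 17.6,
    17.7, 18.1, 18.3, 18.3, 18.3, 18.5, 18.5, 18.8, 19.2, 20.3,
]


def location_measure(data, k, parts):
    """k-th location measure when the data are split into `parts` equal parts
    (100 for percentiles, 10 for deciles, 4 for quartiles), using i = (k / parts) * n."""
    xs = sorted(data)
    n = len(xs)
    numerator = k * n
    if numerator % parts == 0:
        i = numerator // parts              # integer index: average x_i and x_(i+1)
        return (xs[i - 1] + xs[i]) / 2
    i = -(-numerator // parts)              # otherwise round the index up
    return xs[i - 1]


d1 = location_measure(WEIGHTS, 1, 10)
d9 = location_measure(WEIGHTS, 9, 10)
q2 = location_measure(WEIGHTS, 2, 4)
q3 = location_measure(WEIGHTS, 3, 4)

print("D1 =", d1, " D9 =", d9, " Q2 =", q2, " Q3 =", q3)
# The matching percentile uses the same index, so these comparisons hold.
print(d9 == location_measure(WEIGHTS, 90, 100))                                        # D9 = P90
print(q2 == location_measure(WEIGHTS, 50, 100) == location_measure(WEIGHTS, 5, 10))    # Q2 = P50 = D5
print(q3 == location_measure(WEIGHTS, 75, 100))                                        # Q3 = P75
```

Under this rule the equalities hold for any data set, not just this one, because $\frac{k}{10} n$ and $\frac{10k}{100} n$ (or $\frac{k}{4} n$ and $\frac{25k}{100} n$) are the same index.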

UNIT 4: MEASURES OF VARIABILITY / DISPERSION

PowerPoint 23: Measures of Dispersion

The measures of central tendency and relative location do not by themselves give an adequate description of the data. It is also very important for us to know how the observations spread out from the average. The measures of variability (dispersion) indicate the extent to which individual items in a series are scattered about the average. They are used to determine the extent of the scatter so that steps may be taken to control the existing variation. After going through this unit, you are expected to know how to calculate descriptive measures that explain the consistency or variability of data using a scientific calculator.

For example, let us consider the following measurements from two samples of data:

Sample A 0.97 1.00 0.89 1.03 1.11

Sample B 1.06 1.01 0.88 0.91 1.14

Both samples have the same mean, $\bar{x}_A = \bar{x}_B = 1.00$. However, looking closely at the values, the measurements in sample A are more uniform; that is, they are closer to each other than the measurements in sample B. This is what we will quantify in this unit. There are two general classifications of the measures of variability: (1) measures of absolute dispersion and (2) measures of relative dispersion.

Measures of variability indicate the extent to which individual items in a series are scattered about the average. They are used to determine the extent of the scatter so that steps may be taken to control the existing variation. The general classifications of measures of variation are:
a) Measures of Absolute Dispersion
b) Measures of Relative Dispersion

A. Measures of Absolute Dispersion. These are expressed in the units of the observations. They cannot be used to compare the variations of two data sets when the averages of these data sets differ or when the observations differ in units of measurement.
1. Range. It is the difference between the largest and smallest values. It gives an idea of the spread of the data set.
$\text{Range} = \text{Highest Value} - \text{Lowest Value}$
However, the range does not use the concept of deviation. It is a quick but rough measure of variation, since it considers only the highest and the lowest values of the observations. It is affected by outliers and ignores all the other values in the data set. Thus, the range is not a very accurate measure of variability.
2. Variance and Standard Deviation are the most common and useful measures of variability. These two measures provide information about how the data vary about the mean.
The variance, $\sigma^2$ or $s^2$, is a measure of variation that considers the position of each observation relative to the mean of the set.

• Given a finite population $X_1, X_2, \ldots, X_N$, the population variance $\sigma^2$, which is exact, can be calculated using either of the following formulas:

$\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$ or $\sigma^2 = \frac{N \sum X_i^2 - \left(\sum X_i\right)^2}{N^2}$

The two formulas generate the same result, but the second is computationally more convenient because it eliminates the step of computing the deviations from the mean. When computing by hand or with a calculator, the second formula is recommended since it does not require computing the mean first, and it avoids the round-off errors introduced when a rounded mean is used to form the deviations.

• On the other hand, given a random sample $x_1, x_2, \ldots, x_n$, the sample variance $s^2$ can be computed using

$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$ or $s^2 = \frac{n \sum x_i^2 - \left(\sum x_i\right)^2}{n(n - 1)}$

The standard deviation, $\sigma$ or $s$, is the square root of the variance.

• The population standard deviation $\sigma$ is the square root of the population variance:

$\sigma = \sqrt{\frac{N \sum X_i^2 - \left(\sum X_i\right)^2}{N^2}}$

• The sample standard deviation $s$ is the square root of the sample variance:

$s = \sqrt{\frac{n \sum x_i^2 - \left(\sum x_i\right)^2}{n(n - 1)}}$

The variables in the formulas above are defined as follows:

$\sigma$ = population standard deviation
$s$ = sample standard deviation
$X_i$ and $x_i$ = $i$th observation
$\mu$ = population mean
$\bar{x}$ = sample mean
$N$ = population size
$n$ = sample size

If the data are clustered around the mean, the variance and the standard deviation will be relatively small. If the data are widely scattered about the mean, the variance and the standard deviation will be relatively large.
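The two sample-variance formulas can be checked against each other in a short Python sketch, applied here to Samples A and B from the beginning of this unit; the function names are illustrative, not from any library. Sample A comes out with the smaller variance, which quantifies the observation that its values are more uniform. One caution for code rather than hand computation: in floating-point arithmetic the deviation form is generally the more numerically stable one, because the shortcut form subtracts two large, nearly equal quantities when the data are large relative to their spread.

```python
import math

SAMPLE_A = [0.97, 1.00, 0.89, 1.03, 1.11]
SAMPLE_B = [1.06, 1.01, 0.88, 0.91, 1.14]


def sample_variance_deviation(xs):
    """s^2 = sum((x - mean)^2) / (n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def sample_variance_shortcut(xs):
    """s^2 = (n * sum(x^2) - (sum x)^2) / (n * (n - 1))."""
    n = len(xs)
    total = sum(xs)
    total_of_squares = sum(x * x for x in xs)
    return (n * total_of_squares - total ** 2) / (n * (n - 1))


for name, xs in (("A", SAMPLE_A), ("B", SAMPLE_B)):
    v_dev = sample_variance_deviation(xs)
    v_short = sample_variance_shortcut(xs)
    # Both formulas agree up to floating-point rounding.
    print(name, round(v_dev, 5), round(v_short, 5), "s =", round(math.sqrt(v_dev), 5))
```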

Let us take note of the following:
1. The standard deviation has the same unit as the raw data, so it is preferable to use the standard deviation as a measure of variability instead of the variance, whose unit is the square of the unit of the raw data.
2. In the formula for $s$, we divide by $n - 1$ (instead of $n$) to make the sample variance an unbiased estimator of the population variance (an estimator is unbiased if its average value is equal to the parameter it is estimating). Hence, it is critical to determine whether the data set is from a population or a sample, because the formula to be used differs.
3. There is a big difference between $\left(\sum x_i\right)^2$ and $\sum x_i^2$! The first, $\left(\sum x_i\right)^2$, means we add up all the $x_i$ values first, then square the sum. The second, $\sum x_i^2$, means we square each of the $x_i$ values first, then add them up. To illustrate, suppose our data set is 4, 3, 6, and 7. This gives us:
$\sum x_i = 4 + 3 + 6 + 7 = 20$
$\left(\sum x_i\right)^2 = (4 + 3 + 6 + 7)^2 = 20^2 = 400$
$\sum x_i^2 = 4^2 + 3^2 + 6^2 + 7^2 = 16 + 9 + 36 + 49 = 110$
See the difference there!
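The gap between $\left(\sum x_i\right)^2$ and $\sum x_i^2$, and the effect of dividing by $N$ versus $n - 1$, can be seen directly with the small data set 4, 3, 6, 7. The sketch below cross-checks the hand formulas against Python's standard `statistics` module, whose `pvariance` divides by $N$ and whose `variance` divides by $n - 1$; treating the same four values both as a population and as a sample is purely for illustration.

```python
import statistics

data = [4, 3, 6, 7]
n = len(data)

sum_x = sum(data)                           # add first ...
square_of_sum = sum_x ** 2                  # ... then square
sum_of_squares = sum(x ** 2 for x in data)  # square first, then add

# Treating the data as a whole population (shortcut form divides by N^2).
population_variance = (n * sum_of_squares - square_of_sum) / n ** 2
# Treating the data as a sample (shortcut form divides by n(n - 1)).
sample_variance = (n * sum_of_squares - square_of_sum) / (n * (n - 1))

print(sum_x, square_of_sum, sum_of_squares)  # 20 400 110
print(population_variance, statistics.pvariance(data))
print(sample_variance, statistics.variance(data))
```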

A scientific calculator can easily compute these quantities using its Statistics mode. If you are not familiar with the statistical functions of your calculator, you can watch the video demonstrations for the Casio 991EX and Casio 991ES models at the following links:

https://www.loom.com/share/85e2345a662446c9bcab064213aeb381
https://www.loom.com/share/a08e8f19258148e0ba91ef945a39521a

Example 89. A high school teacher at a small private school assigns trigonometry practice problems to be worked online. Students must use a password to access the problems, and the log-in and log-off times are automatically recorded for the teacher. At the end of the week, the teacher examines the amount of time each student spent working on the assigned problems. The data, in minutes, are given below. Find the range, standard deviation, and variance of the data.
15, 28, 25, 48, 22, 43, 49, 34, 22, 33, 27, 25, 22, 20, 39

Solution
x       x²
15      225
28      784
25      625
48      2304
22      484
43      1849
49      2401
34      1156
22      484
33      1089
27      729
25      625
22      484
20      400
39      1521
Total   452     15160

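For the standard deviation:

$s = \sqrt{\frac{n \sum x_i^2 - \left(\sum x_i\right)^2}{n(n - 1)}}$

$s = \sqrt{\frac{15(15160) - (452)^2}{15(15 - 1)}} = \sqrt{\frac{227400 - 204304}{210}} = \sqrt{\frac{23096}{210}}$

$s = 10.48718$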

For the variance:
The variance is the square of the standard deviation, that is, the quantity under the square root above:

$s^2 = \frac{23096}{210} = 109.98095$

For the range:
The longest time is 49 minutes and the shortest is 15 minutes. Hence, the range is 49 − 15 = 34 minutes.
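The three answers for Example 89 can be cross-checked with Python's standard `statistics` module, whose `variance` and `stdev` use the sample ($n - 1$) formulas, matching the treatment of the data as a sample above. This is a sketch for checking the hand and calculator work, not a replacement for it.

```python
import math
import statistics

minutes = [15, 28, 25, 48, 22, 43, 49, 34, 22, 33, 27, 25, 22, 20, 39]

data_range = max(minutes) - min(minutes)
s2 = statistics.variance(minutes)   # sample variance, divides by n - 1
s = statistics.stdev(minutes)       # sample standard deviation

# The shortcut formula from the worked solution should give the same variance.
n = len(minutes)
shortcut_s2 = (n * sum(x * x for x in minutes) - sum(minutes) ** 2) / (n * (n - 1))

print("range:", data_range)
print("variance:", round(s2, 5), "shortcut:", round(shortcut_s2, 5))
print("standard deviation:", round(s, 5), "check:", round(math.sqrt(shortcut_s2), 5))
```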

Uses of the Variance and Standard Deviation

1. Variances and standard deviations can be used to determine the spread of the data.
If the variance or standard deviation is large, the data are more dispersed. This
information is useful in comparing two (or more) data sets to determine which is more
(most) variable.
2. The measures of variance and standard deviation are used to determine the
consistency of a variable. For example, in the manufacture of fittings, such as nuts
and bolts, the variation in the diameters must be small, or the parts will not fit together.
3. The variance and standard deviation are used to determine the number of data
values that fall within a specified interval in a distribution.
4. Finally, the variance and standard deviation are used quite often in inferential statistics.
These uses will be shown in later chapters of this textbook.

B. Measures of Relative Dispersion. These are used to compare the dispersion of two data sets when the averages of these data sets differ or when the observations differ in units of measurement. They are unitless.
1. Coefficient of Variation. It indicates how large the standard deviation is in relation to the mean. It can be used to compare the variation of different variables with different units. The larger the coefficient of variation, the more dispersed the observations are. For a sample,
$CV = \left(\frac{s}{\bar{x}}\right) 100\%$

Example 90: If we have a standard deviation of 1.5 and a mean of 5, what is the coefficient
of variation?
Solution

$\bar{x} = 5$
$s = 1.5$
$CV = \left(\frac{1.5}{5}\right) 100\% = 30\%$

In other words, the standard deviation is 30% of the mean. When comparing two or more data sets, the general rule of thumb is: the higher the coefficient of variation, the higher the variability of the data set. The data set with the highest coefficient of variation can be said to have the highest variation.

Example 91: The mean number of car sales over a 3-month period is 87, and the standard deviation is 5. The mean of the commissions is P261,250.00, and the standard deviation is P38,650.00. Compare the variations of the two.
Solution

The coefficients of variation are

Sales: $CV = \left(\frac{5}{87}\right) 100\% = 5.75\%$

Commissions: $CV = \left(\frac{38{,}650.00}{261{,}250.00}\right) 100\% = 14.79\%$

Since the coefficient of variation is larger for commissions, the commissions are more
variable than sales.
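The coefficient of variation is a one-line computation. The sketch below uses a hypothetical helper, `coefficient_of_variation`, to redo Examples 90 and 91; it assumes the mean is nonzero, since the ratio is undefined otherwise.

```python
def coefficient_of_variation(std_dev, mean):
    """Return CV = (s / mean) * 100, expressed as a percentage."""
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined when the mean is 0")
    return std_dev / mean * 100


cv_example_90 = coefficient_of_variation(1.5, 5)
cv_sales = coefficient_of_variation(5, 87)
cv_commissions = coefficient_of_variation(38_650.00, 261_250.00)

print(f"Example 90: {cv_example_90:.2f}%")
print(f"Sales: {cv_sales:.2f}%  Commissions: {cv_commissions:.2f}%")
print("More variable:", "commissions" if cv_commissions > cv_sales else "sales")
```

Because the units cancel in the ratio, the peso amounts and the car counts can be compared directly.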

UNIT 5: CORRELATION AND REGRESSION

PowerPoint 24: Correlation and Regression

A. Correlation
It measures the strength of the association or relationship between variables. The variables are not designated as dependent or independent, so
$\text{correl}(X, Y) = \text{correl}(Y, X)$.
Correlation does not imply causation (a cause-and-effect relationship).
It assumes that the association is linear, that is, one variable increases or decreases by a fixed amount for a unit increase or decrease in the other.

Pearson Correlation Coefficient
• denoted by $r$
• used to measure the degree of linear association or relationship
• measured on a scale that varies from +1 through 0 to −1
• formula:

$r = \frac{n \sum xy - \sum x \sum y}{\sqrt{\left[n \sum x^2 - \left(\sum x\right)^2\right]\left[n \sum y^2 - \left(\sum y\right)^2\right]}}$

The absolute value of $r$ is interpreted as follows; the sign of $r$ tells whether the correlation is positive or negative.

|r|           Interpretation
1.0           Perfect positive/negative correlation
0.80-0.99     Very strong positive/negative correlation
0.60-0.79     Strong positive/negative correlation
0.40-0.59     Moderate positive/negative correlation
0.20-0.39     Weak positive/negative correlation
0.01-0.19     Very weak positive/negative correlation
0.0           No correlation

[Figure: scatter plots labeled "Perfect positive correlation", "Perfect negative correlation", "No correlation", and "Use other measures of correlation"]
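The interpretation table can be encoded as a small lookup. In this sketch (the function name is hypothetical) each band's lower boundary is treated as inclusive, so any $|r|$ from 0.80 up to but not including 1.0 counts as very strong. That is an assumption for values that fall between the printed ranges, such as 0.795 or 0.995.

```python
def describe_correlation(r):
    """Map a Pearson r to the verbal scale of the interpretation table."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)
    if size == 0:
        return "No correlation"
    direction = "positive" if r > 0 else "negative"
    if size == 1:
        label = "Perfect"
    elif size >= 0.80:
        label = "Very strong"
    elif size >= 0.60:
        label = "Strong"
    elif size >= 0.40:
        label = "Moderate"
    elif size >= 0.20:
        label = "Weak"
    else:
        label = "Very weak"
    return f"{label} {direction} correlation"


print(describe_correlation(0.9625))
print(describe_correlation(-0.45))
```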

Example 92: Given the following data on the number of hours of study ($x$) for an examination and the scores ($y$) received by a random sample of 10 students, compute the Pearson correlation coefficient.
Solution

Student   x    y     xy     y²      x²
1         8    56    448    3136    64
2         5    44    220    1936    25
3         11   79    869    6241    121
4         13   72    936    5184    169
5         10   70    700    4900    100
6         5    54    270    2916    25
7         18   94    1692   8836    324
8         15   85    1275   7225    225
9         2    33    66     1089    4
10        8    65    520    4225    64
Total     95   652   6996   45688   1121

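$r = \frac{n \sum xy - \sum x \sum y}{\sqrt{\left[n \sum x^2 - \left(\sum x\right)^2\right]\left[n \sum y^2 - \left(\sum y\right)^2\right]}}$

$r = \frac{10(6996) - (95)(652)}{\sqrt{\left[10(1121) - (95)^2\right]\left[10(45688) - (652)^2\right]}} = \frac{8020}{\sqrt{(2185)(31776)}}$

$r = 0.9625$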
There is a very strong positive linear relationship between the number of hours
of study (𝑥) for an examination and the scores (𝑦) received by a random sample of
10 students.
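The computational formula for $r$ translates directly into code. This sketch, built around a hypothetical `pearson_r` helper, forms the same column totals as the table in Example 92 and then applies the formula.

```python
import math

hours = [8, 5, 11, 13, 10, 5, 18, 15, 2, 8]        # x: hours of study
scores = [56, 44, 79, 72, 70, 54, 94, 85, 33, 65]  # y: examination scores


def pearson_r(xs, ys):
    """Pearson r via the computational (sums) formula."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = pearson_r(hours, scores)
print(round(r, 4))
```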

Example 93: Consider the scores obtained in Math and Statistics by 10 students. Compute the Pearson correlation coefficient.

Student      1   2   3   4   5   6   7   8   9   10
Math Score   5   8   10  12  12  14  15  16  18  20
Stat Score   2   7   8   9   10  12  14  10  16  12
Solution

Student          1    2    3    4    5    6    7    8    9    10   Total
Math Score (x)   5    8    10   12   12   14   15   16   18   20   130
Stat Score (y)   2    7    8    9    10   12   14   10   16   12   100
xy               10   56   80   108  120  168  210  160  288  240  1440
x²               25   64   100  144  144  196  225  256  324  400  1878
y²               4    49   64   81   100  144  196  100  256  144  1138

$r = \frac{10(1440) - (130)(100)}{\sqrt{\left[10(1878) - (130)^2\right]\left[10(1138) - (100)^2\right]}} = \frac{1400}{\sqrt{(1880)(1380)}}$

$r = 0.8692$

There is a very strong positive linear relationship between math and stat scores.

B. Regression
Regression is used to examine the relationship between one dependent and one independent variable and to predict the dependent variable ($Y$) when the independent variable ($X$) is known.
It finds the best line (the regression line) for predicting $Y$ from $X$.
The Regression Line
It is the line that comes as close as possible to all the data points at once. In the least-squares method used here, it is the line that minimizes the sum of the squared vertical distances between the data points and the line.
The Regression Equation
It is an equation that represents the relationship between one dependent and one independent variable:
$y = a + bx$
The slope is
$b = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - \left(\sum x\right)^2}$

The y-intercept is
$a = \frac{\sum y}{n} - b\left(\frac{\sum x}{n}\right)$

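The slope and intercept formulas can be sketched as a small function, here given the hypothetical name `fit_line`. To keep the check transparent, the usage below uses made-up points (not from this module) that lie exactly on the line $y = 1 + 2x$, so the formulas should recover $a = 1$ and $b = 2$.

```python
def fit_line(xs, ys):
    """Least-squares line y = a + b x using the sums formulas for b and a."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denominator
    a = sum_y / n - b * (sum_x / n)
    return a, b


# Hypothetical points lying exactly on y = 1 + 2x.
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)
```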
Coefficient of Determination ($R^2$). It is the square of the correlation coefficient. It is interpreted as the proportion of the variance in the dependent variable that is predictable from the independent variable, so it measures how well the regression line fits the data.
$R^2 = 1$ (all points lie exactly on a straight line, with no points scattered about the line) means that the dependent variable is predicted without error from the independent variable $X$.
$R^2 = 0$ means that the dependent variable cannot be predicted (linearly) from the independent variable $X$.
An $R^2$ between 0 and 1 indicates the extent to which the dependent variable is predictable.
An $R^2$ of 0.10 means that 10 percent of the variance in $Y$ is predictable from $X$; an $R^2$ of 0.20 means that 20 percent is predictable; and so on.

$R^2 = \frac{SSR}{SSY} \times 100$ (expressed as a percentage)

where:
$SSR = b \, SP_{XY}$ ($b$ is the slope of the regression line)
$SP_{XY} = \sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n}$
$SSY = \sum y_i^2 - \frac{\left(\sum y_i\right)^2}{n}$
$SSX = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}$

Example 94: The paired data below consist of the costs of advertising (in pesos) and the number of products sold (in units).
Solution

Cost (x)    No. of Products Sold (y)    xy    x²    y²
9,000.00 85,000.00 765,000,000.00 81,000,000.00 7,225,000,000.00
2,000.00 52,000.00 104,000,000.00 4,000,000.00 2,704,000,000.00
3,000.00 55,000.00 165,000,000.00 9,000,000.00 3,025,000,000.00
4,000.00 68,000.00 272,000,000.00 16,000,000.00 4,624,000,000.00
2,000.00 67,000.00 134,000,000.00 4,000,000.00 4,489,000,000.00
5,000.00 86,000.00 430,000,000.00 25,000,000.00 7,396,000,000.00
9,000.00 83,000.00 747,000,000.00 81,000,000.00 6,889,000,000.00
10,000.00 73,000.00 730,000,000.00 100,000,000.00 5,329,000,000.00
Total 44,000.00 569,000.00 3,347,000,000.00 320,000,000.00 41,681,000,000.00

1. Plot a scatter diagram

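2. Find the equation of the regression line to predict the number of products sold from advertising expenditures.

$b = \frac{8(3{,}347{,}000{,}000) - (44{,}000)(569{,}000)}{8(320{,}000{,}000) - (44{,}000)^2} = \frac{1{,}740{,}000{,}000}{624{,}000{,}000} \approx 2.7885$

$a = \frac{569{,}000}{8} - 2.7885\left(\frac{44{,}000}{8}\right) = 71{,}125 - 15{,}336.75 = 55{,}788.25$

Thus, the equation is $y = 55788.25 + 2.7885x$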

3. Estimate the number of products sold when the advertising cost is P4,500.

$y = 55788.25 + 2.7885x$
Number of products sold $= 55788.25 + 2.7885(4{,}500) = 55788.25 + 12548.25 = 68{,}336.50$ units

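4. Determine the coefficient of determination.

$SP_{XY} = 3{,}347{,}000{,}000 - \frac{(44{,}000)(569{,}000)}{8} = 217{,}500{,}000$
$SSY = 41{,}681{,}000{,}000 - \frac{(569{,}000)^2}{8} = 1{,}210{,}875{,}000$
$SSR = b \, SP_{XY} = 2.7885(217{,}500{,}000) = 606{,}498{,}750$
$R^2 = \frac{606{,}498{,}750}{1{,}210{,}875{,}000} \times 100 \approx 50.09\%$

Therefore, about 50.09% of the variance in the number of products sold is predictable from the cost of advertising.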
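The Example 94 workflow (slope, intercept, prediction, and $R^2$ from $SSR$ and $SSY$, but not the scatter diagram) can be reproduced in a short Python sketch. It keeps the slope unrounded, so the intercept comes out near 55,788.46 rather than the 55,788.25 obtained by rounding the slope to 2.7885 first; the prediction and $R^2$ differ from the hand values only through that rounding.

```python
cost = [9_000, 2_000, 3_000, 4_000, 2_000, 5_000, 9_000, 10_000]        # x: advertising cost
sold = [85_000, 52_000, 55_000, 68_000, 67_000, 86_000, 83_000, 73_000]  # y: products sold

n = len(cost)
sum_x, sum_y = sum(cost), sum(sold)
sum_xy = sum(x * y for x, y in zip(cost, sold))
sum_x2 = sum(x * x for x in cost)
sum_y2 = sum(y * y for y in sold)

# Slope and intercept of y = a + b x.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * (sum_x / n)

# Prediction at an advertising cost of 4,500.
predicted = a + b * 4_500

# Coefficient of determination from the sums of squares.
sp_xy = sum_xy - sum_x * sum_y / n
ssy = sum_y2 - sum_y ** 2 / n
ssr = b * sp_xy
r_squared_percent = ssr / ssy * 100

print(f"y = {a:.2f} + {b:.4f}x")
print(f"predicted units sold: {predicted:,.2f}")
print(f"R^2 = {r_squared_percent:.2f}%")
```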

Learning Reinforcement 6

Directions: Write your solutions and answers on a clean sheet of paper, or you may print this page and answer there. Submit images of your HANDWRITTEN SOLUTIONS as a single PDF file in the submission bin for this activity in the Classroom. You may use image-scanning apps on your phone (CamScanner or Tap Scanner) to combine several images into one PDF file, or place your images in a document and save it as a PDF file.

A. Classify the following statements as to whether they belong to the area of descriptive statistics or inferential statistics.
___________________ 1. At most 5% of SLU students are smokers.
___________________ 2. Assuming that less than 20% of the Kalinga coffee beans were
destroyed by a typhoon these past months, we should expect an
increase of no more than P30 for a kilogram of coffee by the end
of the year.
___________________ 3. An employee generalized that the average monthly salary of a
regular employee in a certain company is P12,000.
___________________ 4. A study found that, of all customers who received a gift
certificate from a store, 75% went back to the store to shop.
___________________ 5. The average grade in statistics of 50 students is 83.60.
B. At what level are the following variables measured?
_______________ 1. The scores of students in a statistics quiz
_______________ 2. The birth order of children in the family
_______________ 3. Weights of a sample of bags of raw materials for the
production of a certain product, measured in grams.
_______________ 4. The natural eye color of a sample of 100 children.
_______________ 5. The final grade of graduate students taking up Statistics.
C. Classify the following variables as quantitative or qualitative variables. If the variable
is quantitative, identify whether it is discrete or continuous.
_______________ _______________ 1. The type of payment used by customers
_______________ _______________ 2. The evaluation rating of instructors
_______________ _______________ 3. The classification of employees in a
company
_______________ _______________ 4. The weekly allowance of students

_______________ _______________ 5. The length of telephone calls made by
students to their parents

D. In each of the following situations, a random sample must be obtained. Determine whether cluster, stratified, or systematic random sampling would be appropriate. Explain in detail how the sampling is to be conducted. Do not discuss expected results or conclusions.
1. A large convenience store chain wishes to determine its customers’ level of
satisfaction with regard to their service.
2. A nationwide survey on charter change is to be conducted. (Note: there are
seventeen regions in the Philippines.)
3. An educational researcher wants to compare the career goals of male and
female students of Otto Hahn University, which has 10,000 students.
4. A social researcher wants to determine whether electronic engineers who
work in the communications field earn more than those who are in
semiconductor industries.
5. A market analyst would like to compare the durability, in terms of mean time
before wear, of two leading brands of car tires.
E. The numbers of incorrect answers on a true-or-false competency test for a random sample of 15 students were recorded as follows: 2, 1, 3, 0, 1, 3, 6, 0, 3, 3, 5, 2, 1, 4, and 2. Find the a. mean, b. median, c. mode, d. range, e. standard deviation, and f. variance.
F. A corporation administers an aptitude test to all new sales representatives.
Management is interested in the extent to which this test is able to predict their
eventual success. The accompanying table records average weekly sales (in
thousands of pesos) and aptitude test scores for a random sample of eight
representatives.

1. Plot a scatter diagram

2. Estimate the linear regression of weekly sales on aptitude test scores.
3. Estimate the weekly sales when the test score is 70.
4. Determine the coefficient of determination

Well done! It’s time to answer Quiz 1!

No ratings yet
Audit Sampling (ACP323)
16 pages
