Data Management 1 Merged 1

Uploaded by

ranchiematters


DATA MANAGEMENT (STATISTICS)

Objectives
1. Recognize the basic terms of statistics.
2. Determine and apply the measures of central tendency, variability, and
position.
3. Apply the measures of central tendency, and variability in normal
distribution.
4. Determine the linear regression and correlation of the set of data.

Lesson Proper
Introduction
Data management is a process by which information is acquired and processed to ensure the
accessibility and reliability of the data for its users. One of the most important tools in processing and
managing such information is statistics. Statistics is utilized in most areas of human endeavor. It is
usually used in education, research, business, agriculture, and other fields and even in everyday life
activities.

LESSON 1. Statistics Introduction and Definition

Statistics is the science of conducting studies to collect, organize, summarize, analyze, and draw
conclusions from data. It is used in almost all fields of human endeavor, such as sports,
education, health, and research, among others. Statistical analysis is used to manipulate,
summarize, and investigate data in order to produce information that is useful for
decision-making.
Data, or pieces of information, may be collected by conducting a survey, interview, observation, or
experiment. The data gathered can be properly organized and presented graphically by a line graph,
bar graph, or pictograph, or with the aid of a statistical table known as a frequency distribution table
(FDT). A concise and meaningful conclusion is obtained from the analysis and interpretation of data.
Relevant information can be deduced from the analysis of numerical descriptions, and predictions about
the whole population may be made based on a small group. The work of statistics covers a wide
area of concern. Thus, statistics is subdivided into two branches, namely descriptive statistics and
inferential statistics.

Sir Ronald Aylmer Fisher (February 17, 1890 - July 29, 1962) was a British statistician and
geneticist who pioneered the application of statistical procedures to the design of scientific
experiments. He is considered the Father of Modern Statistics.
Source: https://www.adelaide.edu.au/library/special/mss/fisher

In 1909, he was awarded a scholarship to study mathematics at the University of Cambridge. In 1912,
he graduated with a B.A. in astronomy, and he continued to study astronomy and physics at the
university, including the theory of errors, which connected him to statistics.
From 1914 to 1919, he taught high school mathematics and physics while continuing his
research in statistics and genetics. In 1918, he published an important paper in which
he used powerful statistical tools to reconcile inconsistencies between Charles Darwin’s
ideas of natural selection and the rediscovered experiments of the Austrian botanist Gregor Mendel.
In 1919, he became statistician for the Rothamsted Experimental Station and did statistical
work associated with plant-breeding experiments, which led to theories about gene
dominance and fitness. From 1943 until 1957, he was Balfour Professor of Genetics at
Cambridge. He investigated the linkage of genes for different traits and developed methods of
multivariate analysis to deal with such questions. To avoid bias in the selection of experimental
materials, which leads to inaccurate and misleading results, he introduced the principle of
randomization. In this way, random selection is used to diminish the effects of variability in
experimental materials.
One of the most important achievements of Fisher is the concept of analysis of variance,
or ANOVA.
Types of Statistics
1. Descriptive Statistics
Consists of methods for the collection, organization, summarization, and presentation of
data or information.
Example: construction of graphs, charts, and tables, and the calculation of various descriptive
measures such as averages, measures of variation, and percentiles

2. Inferential Statistics
Consists of methods for drawing and measuring the reliability of conclusions
about a population based on information obtained from a sample of the
population.
After the data are collected, organized, summarized, and presented (descriptive statistics),
inferential statistics is used to determine the findings and draw conclusions.

This means that descriptive statistics and inferential statistics are interrelated. Descriptive
statistics is used to organize and summarize the information obtained from a sample before
an inferential analysis is carried out.

Descriptive statistics leads us to the appropriate inferential method.

Population and Sample

Population
The collection of all individuals or items under consideration in a statistical study.
Sample
That part of the population from which information is obtained.
For example, consider a study about Statistics University, which has 6,589 students. The 6,589
students are the population. If the researcher randomly selects class A, which has 44 students,
then those 44 students are the sample. The sample serves as the representative of the population.

Before we proceed with the discussion, let us first define some basic operational terms
in statistics:
Variable – a characteristic or attribute that can assume different values; any characteristic,
number, or quantity that can be measured or counted. It is also called a data item.
The information collected for variables describes the situation.
Example: age, sex, business income and expenses, birth, expenditure,
class grades, and eye color, among others

Types of Variables
1. Numeric Variables/ Quantitative Variables
Have values that describe a measurable quantity as a number, like ‘how many’
or ‘how much’. These are quantifiable variables. Data collected for a
numeric variable are called quantitative data.
a. Continuous Variable
Observations can take any value between a certain set of real numbers. The value
given to an observation for a continuous variable can include values as small as the
instrument of measurement allows.
Examples: height, time, age, and temperature
Height can be 1.62 m, time can be 3.5 hours (3 hours and 30 minutes), age can be 16 3/4
years old (16 years and 9 months), and temperature can be 36 2/5 °C or 36.40 °C.

b. Discrete Variable
Observations can take a value based on a count from a set of distinct whole values. A
discrete variable cannot take the value of a fraction between one value and the
next closest value.
Examples: number of registered cars, number of business locations,
and number of children in a family, all of which are measured
in whole units (i.e. 1, 2, 3 cars)

2. Categorical Variables/ Qualitative Variables

Have values that describe a 'quality' or 'characteristic' of a data unit, like 'what type' or 'which
category'. Categorical variables fall into mutually exclusive (in one category or in another) and
exhaustive (include all possible options) categories. Therefore, categorical variables are
qualitative variables and tend to be represented by a non-numeric value. Data collected is
called qualitative data.
a. Ordinal Variable
Observations can take a value that can be logically ordered or ranked.
The categories associated with ordinal variables can be ranked higher or lower
than another, but do not necessarily establish a numeric difference between
each category.
Examples: academic grades (i.e. A, B, C), clothing size (i.e. small,
medium, large, extra-large) and attitudes (i.e. strongly agree,
agree, disagree, and strongly disagree)
b. Nominal Variable
Observations can take a value that is not able to be organized in a logical
sequence.
Examples: sex, business type, eye color, religion and brand

Source: Australian Bureau of Statistics (2013)

Data

Data – values (measurements or observations) that the variables can assume.

Variables whose values are determined by chance are called
random variables.
Data Set – collection of data
Data Value or Datum – each value in the data set
Quantitative data – data from numeric/ quantitative variables; quantifiable data
Qualitative data – data from categorical/ qualitative variables; non – numeric
Discrete data – data from discrete variables; non – fraction data
Continuous data – data from continuous variable; data from the set of real numbers.

For example, the grades of 5 students in Statistics are 94, 75, 82.5, 74.9, and 89.
In this example, the grade of a student is the variable. As a numeric variable, it is
classified as a continuous variable since it can be represented by a decimal or fraction.
Furthermore, 94, 75, 82.5, 74.9, and 89 form the data set. Each value is a data value or datum
(e.g. 94 is a data value or datum). These data are continuous data since they can take values
from the set of real numbers.

Moreover, besides being qualitative or quantitative, variables can also be classified by how they
are categorized, counted, or measured: the measurement scale or level of measurement.

Level of Measurement
1. Nominal level of measurement
Classifies data into mutually exclusive (no overlapping) categories in which no order
or ranking can be imposed on the data. Nominal data are countable.
Example: gender, zip codes; political party; religion; nationality

2. Ordinal level of measurement

Classifies data into categories that can be ranked; however, precise
differences between the ranks do not exist. Ordinal data contain more information
than nominal data. They consist of distinct categories in which order is implied.
Values in one category are larger or smaller than values in other categories
(e.g. rating: excellent, good, fair, poor).
Example: evaluation (superior, average, poor); ranking (first, second, etc.); letter
grades (A, B, C, D, E, F)

3. Interval level of measurement

Ranks data, and precise differences between units of measure do exist; however, there
is no meaningful zero. Set of numerical measurements in which the distance between
numbers is of a known, constant size.
Example: IQ level; temperature

There is a meaningful difference of 1 point between an IQ of 109 and an IQ of 110.

Temperature is another example of interval measurement, since there is a meaningful
difference of 1°F between each unit, such as 72 and 73°F.

One property is lacking in the interval scale: There is no true zero. For example, IQ tests
do not measure people who have no intelligence. For temperature, 0°F does not mean
no heat at all.

4. Ratio level of measurement

Possesses all the characteristics of interval measurement, and there exists a true
zero or non - arbitrary zero point. In addition, true ratios exist when the same
variable is measured on two different members of the population. Consists of
numerical measurements where the distance between numbers is of a known,
constant size
Example: height; weight; area; number of phone calls
A true zero or non-arbitrary zero point exists: zero weight, height, area, or
phone calls is meaningful, since it implies that the thing being measured does not exist.

For example, if one person can lift 200 pounds and another can lift 100 pounds, then
the ratio between them is 2 to 1. Put another way, the first person can lift twice as
much as the second person.

There is not complete agreement among statisticians about the classification of data into one
of the four categories. For example, some researchers classify IQ data as ratio data rather
than interval. Also, data can be altered so that they fit into a different category. For instance,
if the incomes of all professors of a college are classified into the three categories of low,
average, and high, then a ratio variable becomes an ordinal variable.

LESSON 2. Data Collection and Sampling Techniques

Sampling techniques were developed to mathematically determine the most effective way to acquire
a sample that would accurately reflect the population of the study.

The most common mathematical formula for determining the sample size in reference to the
population is Slovin’s formula, which was introduced by Slovin in 1960. To this day, it is still
unknown who Slovin really is; the names associated with the formula include Mark Slovin,
Michael Slovin, and Kulkol Slovin.

Slovin’s Formula

$$n = \frac{N}{1 + Ne^2}$$

where:

n is the sample size

N is the population size
e is the margin of error (e.g. 0.01, 0.05, 0.1, etc)

Use Slovin’s formula if you have no idea about the population’s behavior. Slovin’s formula
determines the sample size in proportion to the population size. It is applicable only
when estimating a population proportion and when the confidence coefficient is 95%.
There are other sampling formulas that can be used to determine sample sizes in relation to
the characteristics of the variables.

In most educational and scientific research, a 0.05 margin of error (level of significance)
is used most of the time.

The margin of error tells by how many percentage points your results may differ
from the real population value. For example, a 0.05 (5%) level of significance
corresponds to a 0.95 (95%) confidence level with respect to the real population value.

Example: Assume that a certain study is to be conducted in a community with 6,518 residents.
Determine the number of respondents of the study at a 5% level of significance using Slovin’s
formula.
Solution: From the assumption, N = 6,518 is the population size and e = 0.05 (5%) is the margin
of error. Therefore:

$$n = \frac{N}{1 + Ne^2}$$

$$n = \frac{6{,}518}{1 + 6{,}518(0.05)^2}$$

$$n = \frac{6{,}518}{1 + 6{,}518(0.0025)}$$

$$n = \frac{6{,}518}{1 + 16.295}$$

$$n = \frac{6{,}518}{17.295}$$

$$n = 376.87 \approx 377$$

This implies that, using Slovin’s formula, the sample size is 377 respondents.
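
The computation above can also be checked with a short Python sketch. The function name `slovin_sample_size` is hypothetical, and rounding up to the next whole respondent with `math.ceil` is an assumption made here; the worked example only shows 376.87 being rounded to 377.

```python
import math


def slovin_sample_size(population_size: int, margin_of_error: float) -> int:
    """Slovin's formula n = N / (1 + N * e^2), rounded up to a whole respondent."""
    raw = population_size / (1 + population_size * margin_of_error ** 2)
    # Assumption: round up, since a fraction of a respondent cannot be surveyed.
    return math.ceil(raw)


# Community of 6,518 residents with a 0.05 margin of error, as in the example.
n = slovin_sample_size(6518, 0.05)
print(n)  # 377
```

Changing `margin_of_error` to 0.01 or 0.1 shows how strongly the sample size depends on the chosen margin.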

Sampling Techniques
Sampling techniques are methods of identifying who will be the respondents of the study
(sample). For instance, in the previous example, how to identify the 377 respondents? Here
comes the sampling techniques.

Types of Sampling Techniques

1. Probability/ Random Sampling Techniques
All members of the population have an equal chance of being selected to be part of
the sample.
a. Simple Random Sampling Technique (e.g. fishbowl method or lottery method,
table of random numbers, or computer)
In the fishbowl method, names are placed inside a bowl or box, and the target
respondents are picked one by one until the target number of respondents is obtained.

Example: If we are going to select 5 out of 10 using simple random sampling,
the names or codes of the 10 members of the population are placed inside
the bowl or box. Then 5 names or codes are picked one by one.
The selected 5 will be the respondents.

i. Simple Random Sampling Technique with Replacement

Here, each member can be selected or picked more than once because
the name is put back after it is picked.

ii. Simple Random Sampling Technique without Replacement

Here, a name or code that has been picked is not placed again in the bowl
or box.
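
The difference between the two variants can be sketched in Python with the standard `random` module. The member codes and the fixed seed below are hypothetical and serve only to make the illustration repeatable; a real draw would not fix the seed.

```python
import random

# Hypothetical codes for the 10 members of the population.
population = [f"member_{i}" for i in range(1, 11)]
rng = random.Random(2020)  # assumption: fixed seed so the draw can be repeated

# Without replacement: a picked code is not returned to the bowl,
# so no member can appear twice.
without_replacement = rng.sample(population, k=5)

# With replacement: a picked code goes back into the bowl,
# so the same member may be picked more than once.
with_replacement = rng.choices(population, k=5)

print(without_replacement)
print(with_replacement)
```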

b. Systematic Random Sampling Technique

Obtained by selecting every kth member of the population, where k is a counting
number.

Example: For the sake of illustration, let us limit the population size. Suppose the
population size is 10 and the sample size is 5. How can we obtain the 5
samples?
Solution: Step 1. Divide the population size by the sample size.

$$k = \frac{\text{population size}}{\text{sample size}} = \frac{10}{5} = 2$$

This implies that every 2nd member will be selected.
Step 2. Arrange the population in order, and
randomly select the first sample.

Assuming that, by simple random selection (fishbowl or
lottery), we have chosen 4, then 4 is the first sample.

1 2 3 4 5 6 7 8 9 10

From 4, every 2nd member is selected until the 5 target samples are obtained. So:

Member: 4     5   6   7   8   9   10
Count:  start 1st 2nd 1st 2nd 1st 2nd

The samples so far are 4, 6, 8, and 10, which is only 4 samples;
the target of 5 samples has not been reached. We need another sample.
Continue counting in a cycle, going back to the start of the list:

Member: 4     5   6   7   8   9   10  1   2   3
Count:  start 1st 2nd 1st 2nd 1st 2nd 1st 2nd 1st

The target of 5 samples is now obtained: the 4th, 6th, 8th, 10th, and 2nd members.
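
The counting-in-a-cycle procedure can be sketched in Python as follows. The function name `systematic_sample` and its parameters are hypothetical; the sketch assumes the population size is a whole multiple of the sample size, as in the example.

```python
def systematic_sample(population, sample_size, start_index):
    """Pick every k-th member starting at start_index, wrapping around the list."""
    k = len(population) // sample_size  # sampling interval, here 10 // 5 = 2
    return [
        population[(start_index + step * k) % len(population)]
        for step in range(sample_size)
    ]


members = list(range(1, 11))  # population numbered 1 to 10
# Member 4 sits at index 3 because Python lists start counting at 0.
picked = systematic_sample(members, sample_size=5, start_index=3)
print(picked)  # [4, 6, 8, 10, 2]
```

The modulo operation `%` plays the role of "continue counting in a cycle": after member 10, the count wraps back to member 1.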

c. Stratified Random Sampling Technique

Obtained by dividing the population into subgroups or strata according
to some characteristic relevant to the study. (There can be several
subgroups.) Then subjects are selected at random from each subgroup.

Example: The town has 250 homeowners of which 25, 175, and 50 are
upper income, middle income, and low income, respectively.
Explain how we can obtain a sample of 20 homeowners,
using stratified sampling with proportional allocation,
stratifying by income group.

Mathematics in the Modern World – Data Management (Part 1) – Madrazo, A. (2020), [email protected] | 9
Solution:
Step 1. Divide the population into subpopulations (strata).
Stratum 1: upper income (25)
Stratum 2: middle income (175)
Stratum 3: lower income (50)

Step 2. Allocate the sample size to each stratum in proportion to the stratum size.

$$\text{stratum sample size} = (\text{sample size}) \left( \frac{\text{stratum size}}{\text{population size}} \right)$$

Stratum 1: upper income (25)    $20 \cdot \frac{25}{250} = 2$
Stratum 2: middle income (175)  $20 \cdot \frac{175}{250} = 14$
Stratum 3: lower income (50)    $20 \cdot \frac{50}{250} = 4$

Step 3. Select from each stratum the number of members computed in Step 2,
and use them as the sample.

This implies that the upper income stratum contributes 2 samples, the middle
income stratum 14, and the lower income stratum 4.

Samples from each stratum can be obtained by either simple
random sampling or systematic random sampling.

Interpretation: This stratified sampling procedure ensures that no

income group is missed. It also improves the precision
of the statistical estimates (because the homeowners
within each income group tend to be homogeneous)
and makes it possible to estimate the separate
opinions of each of the three strata (income groups).
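
The proportional allocation in Step 2 can be sketched in Python. The function name `proportional_allocation` is hypothetical. Rounding each share with `round` is an assumption that works here because every share is a whole number; in general the rounded shares may not add up exactly to the sample size and would need adjusting.

```python
def proportional_allocation(strata_sizes, sample_size):
    """Allocate sample_size across strata in proportion to each stratum's size."""
    population_size = sum(strata_sizes.values())
    return {
        name: round(sample_size * size / population_size)
        for name, size in strata_sizes.items()
    }


# The 250 homeowners from the example, grouped by income.
strata = {"upper income": 25, "middle income": 175, "lower income": 50}
allocation = proportional_allocation(strata, sample_size=20)
print(allocation)  # {'upper income': 2, 'middle income': 14, 'lower income': 4}
```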

d. Cluster Random Sampling Technique

Obtained by dividing the population into sections or clusters, then
selecting one or more clusters at random and using all members of the
selected cluster(s) as the members of the sample. Clusters could be formed by
geographic area or by schools in a large district. Cluster sampling is used
when the population is large or when it involves subjects residing in a
large geographic area.

Example: To save time, the planner decided to use cluster sampling.

The residential portion of the city was divided into 947
blocks, each containing 20 homes. Explain how the planner
used cluster sampling to obtain a sample of 300 homes.
Solution:
Step 1. Divide the population into groups (cluster)

The planner used the 947 blocks as the clusters, thus dividing
the population (residential portion of the city) into 947 groups.

Step 2. Obtain a simple random sample of the clusters.

The planner numbered the blocks (clusters) from 1 to 947 and

then used a table of random numbers to obtain a simple
random sample of 15 of the 947 blocks.

Step 3. Use all the members of the clusters obtained in Step 2 as the
sample

The sample consisted of the 300 homes comprising the 15

sampled blocks:

15 blocks × 20 homes per block = 300 homes.

Interpretation. The planner used cluster sampling to obtain a sample of
300 homes: 15 blocks of 20 homes per block. Each of the
three interviewers was then assigned 5 of these 15
blocks. This method gave each interviewer 100 homes
to visit (5 blocks of 20 homes per block) but saved much
travel time because an interviewer could complete the
interviews on an entire block before driving to another
neighborhood. The report was finished on time.
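
The planner's three steps can be sketched in Python. The planner in the example used a table of random numbers; here the standard `random` module is swapped in, and the seed and variable names are hypothetical.

```python
import random

rng = random.Random(947)  # assumption: fixed seed so the draw can be repeated

block_ids = range(1, 948)  # Step 1: the 947 blocks are the clusters
homes_per_block = 20

# Step 2: simple random sample of 15 clusters, without replacement.
sampled_blocks = rng.sample(block_ids, k=15)

# Step 3: every home in each sampled block becomes part of the sample.
sample_homes = [
    (block, home)
    for block in sampled_blocks
    for home in range(1, homes_per_block + 1)
]
print(len(sample_homes))  # 300
```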

e. Multi-Stage Sampling Technique

Most large-scale surveys combine two or more of simple random
sampling, systematic random sampling, cluster sampling, and stratified
sampling. This is frequently used by pollsters and government agencies.

2. Nonprobability/ Nonrandom Sampling Techniques

In these techniques, not all members of the population have an equal chance of
being selected to be part of the sample.

a. Convenience or Accidental Sampling Technique

The samples are determined in the most convenient way.

For instance, in a survey about Facebook users, a researcher
using the convenience sampling technique could send private
messages to Facebook friends who are online. Not all Facebook
friends have an equal chance of being part of the sample:
a person who is offline has no chance of being one of
the respondents.

b. Quota Sampling Technique

Ensures equal or proportionate representation of the subjects,
depending on which trait is considered as the basis of the quota. The usual
bases of quota are age, gender, education, race, religion, and socio-economic
status.

Example: The basis of the quota is college year level, and the research needs
equal representation with a sample size of 100. Assuming four
year levels, the researcher must select 25 from each year level.

c. Volunteer or self –selected Sampling Technique

Respondents decide for themselves to be part of the sample.

d. Purposive/ Purposeful or Judgmental/ Judgement or Selective or

Deliberate Sampling Technique
The researcher selects samples who fulfil the criteria for inclusion in
the population, based on the researcher’s knowledge.

For example, in a study about experiences of post-disaster depression
among people living in earthquake-affected areas,
the respondents are the people who are
victims of an earthquake and are suffering from post-disaster
depression.

e. Snowball/ Networking Sampling Technique

Used to identify potential subjects in studies where subjects are hard
to locate. It works like a chain referral, so it is also known as the chain
referral sampling technique.

After observing the initial subject, the researcher asks the subject
to help identify people with a similar trait of
interest. It is like asking subjects to nominate another person with the same
trait. The same process is repeated until a sufficient number of subjects is
obtained.

f. Expert Sampling Technique

Samples are chosen for their expertise.

For example, in a study about volcanoes, you would consult
volcanologists.

LESSON 3. Measures of Central Tendency

Measures of central tendency are descriptive measures that indicate where the center, or the
most typical value, of a data set lies. They are often called averages. The three most important
measures of central tendency are the mean, median, and mode. The mean and median
apply only to quantitative data, whereas the mode can be used with either quantitative
or qualitative data.
Statistic – a characteristic or measure obtained by using the data values from a sample.
Parameter – a characteristic or measure obtained by using all the data values from a
specific population.

Data Classification
a. Ungrouped/ Small Data – if there are 30 data values or fewer.
b. Grouped/ Large Data – if there are more than 30 data values.

a) Ungrouped/ Small Data

Suppose, Carmella’s scores in seven 100 - item tests are 78, 96, 85, 91, 70, 79, and 96.
Determine the mean, median, and mode.

1. Mean
It is the sum of the observations divided by the number of observations.
Among the three, this is the most reliable. It is also called the average.

x̄ – mean of a sample. Read as x-bar.

μ – mean of a population. A Greek letter pronounced as mu.

$$\bar{x} = \frac{\sum x}{n} = \frac{x_1 + x_2 + x_3 + \cdots + x_{n-1} + x_n}{n}$$

$$\mu = \frac{\sum x}{N} = \frac{x_1 + x_2 + x_3 + \cdots + x_{N-1} + x_N}{N}$$

Where:
x is the individual datum,
n is the sample size,
N is the population size.

$$\bar{x} = \frac{\sum x}{n} = \frac{78 + 96 + 85 + 91 + 70 + 79 + 96}{7}$$

$$\bar{x} = \frac{595}{7} = 85$$

The mean described above is the arithmetic mean. Besides this, there
are other types of mean, such as the weighted mean and the combined/ compound
mean.
2. Median
When the data are arranged in increasing or decreasing order, the median is the middlemost number.
• If the number of observations is odd, then the median is the
observation exactly in the middle of the ordered list.
• If the number of observations is even, then the median is the mean of
the two middle observations in the ordered list.
In both cases, if we let n denote the number of observations, then the median
is at position (n + 1)/2 in the ordered list. The median is denoted by x̃ (read as x-tilde).
Let us consider the data given above, arranged in increasing order. Since
the number of data values is 7, which is odd, the first condition applies.
70 78 79 85 91 96 96
1st 2nd 3rd 4th 5th 6th 7th
The position is $\frac{n+1}{2} = \frac{7+1}{2} = \frac{8}{2} = 4$, so the middlemost number is
the 4th term, which is 85.
$$\tilde{x} = 85$$
To illustrate the second condition, where the number of data values is even, let us take
the same data and add another number. Suppose the additional number
is 68.
68 70 78 79 85 91 96 96
1st 2nd 3rd 4th 5th 6th 7th 8th
The median is the average of the two numbers at the center, 79 and 85.
$$\tilde{x} = \frac{79 + 85}{2} = \frac{164}{2} = 82$$
The position is $\frac{n+1}{2} = \frac{8+1}{2} = \frac{9}{2} = 4.5$. The position of 82 as the median is
the 4.5th. This means that 82 is halfway between the 4th and the 5th terms.
The median is also the most stable measure among the three because it is not
affected by outliers (extremes). Outliers are data values that are either extremely
high or extremely low.
Let us consider the same example again, but this time we change
the highest value, the lowest value, or both.
70 78 79 85 91 96 96
In the given data, 85 is the median.
1 78 79 85 91 96 96
We changed the lowest number from 70 to 1, but the median is still 85.
70 78 79 85 91 96 500
We changed the highest number from 96 to 500, and the median is still 85.
1 78 79 85 91 96 500
We changed both the lowest and the highest, and the median is still 85.
3. Mode
• The most frequent data.
• If no value occurs more than once, then the data set has no mode.
• Otherwise, any value that occurs with the greatest frequency is a mode
of the data set.
• Denoted by x̂ (read as x-hat).

The given data above is:

70 78 79 85 91 96 96
There are two 96 and other data appear only once. Therefore, the mode is 96
(unimodal).
𝑥̂ = 96
What if the set of value is:
70 79 79 85 91 96 96
Both 79 and 96 appear twice and the other data values appear once; therefore the
modes are 79 and 96 (bimodal).
x̂ = 79 and 96
Types of Modes
• Unimodal - one mode
• Bimodal -two modes
• Trimodal - three modes
• Multimodal -4 and above number of modes
For Carmella’s original seven scores, the mean, median, and mode are 85, 85, and 96,
respectively. (The median of 82 applies only to the eight-score list that includes 68.)
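
The ungrouped results above can be verified with Python’s standard `statistics` module. This is a minimal sketch; the variable names are hypothetical, and `statistics.multimode` (available from Python 3.8) is used so that data sets with more than one mode are reported in full.

```python
import statistics

scores = [78, 96, 85, 91, 70, 79, 96]  # Carmella's seven test scores

mean_score = statistics.mean(scores)      # 595 / 7
median_score = statistics.median(scores)  # 4th value of the ordered list
modes = statistics.multimode(scores)      # every most frequent value

# Even number of observations: the median is the mean of the two middle values.
median_even = statistics.median(scores + [68])

# Outliers change the mean a lot but leave the median untouched.
with_outliers = [1, 78, 79, 85, 91, 96, 500]
median_outliers = statistics.median(with_outliers)
mean_outliers = statistics.mean(with_outliers)

# Two values tie for the highest frequency, so the data set is bimodal.
bimodal = statistics.multimode([70, 79, 79, 85, 91, 96, 96])

print(mean_score, median_score, modes)  # 85 85 [96]
print(median_even)                      # 82.0
print(median_outliers)                  # 85
print(bimodal)                          # [79, 96]
```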

b) Grouped/ Large Data
The ages of the first 50 persons who enter the mall were tallied, as shown below.
Determine the mean, median, and mode of their ages.
Age Frequency
10 – 19 5
20 – 29 20
30 – 39 10
40 – 49 7
50 – 59 8
Total n=50
In the table above, the age groups are the classes.
1. Mean
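$$\bar{x} = \frac{\sum fx}{n} = \frac{f_1 x_1 + f_2 x_2 + f_3 x_3 + \cdots + f_{k-1} x_{k-1} + f_k x_k}{n}$$

$$\mu = \frac{\sum fx}{N} = \frac{f_1 x_1 + f_2 x_2 + f_3 x_3 + \cdots + f_{k-1} x_{k-1} + f_k x_k}{N}$$

Where:
x̄ is the sample mean
μ is the population mean
n is the sample size
N is the population size
f is the class frequency
x is the class mark
k is the number of classes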

To start, let us first complete the table below. In each class, for instance class
10 – 19, the smaller value is the lower limit (10 in the given class) and the larger value
is the upper limit (19 in the given class). The class mark is the average of the lower
limit and upper limit of the class. In the lowest class (the class with the lowest values), the
class mark is:
$$\frac{10 + 19}{2} = \frac{29}{2} = 14.5$$
You could repeat the same process for the other classes. However, there is an alternative way
to continue the process, which uses the class interval.
𝐶𝑙𝑎𝑠𝑠(𝐴𝑔𝑒) 𝐹𝑟𝑒𝑞𝑢𝑒𝑛𝑐𝑦 (𝑓) 𝐶𝑙𝑎𝑠𝑠 𝑚𝑎𝑟𝑘 (𝑥) 𝑓𝑥
10 – 19 5 14.5
20 – 29 20
30 – 39 10
40 – 49 7
50 – 59 8
Total n=50

The class interval is the difference between succeeding lower limits or between
succeeding upper limits. For example, 10 and 20 are the lower limits of two
succeeding classes, and 19 and 29 are the upper limits of the same two classes.
Their difference is the class interval:
20 – 10 = 29 − 19 = 10
This is also true for the other classes.
To continue, just add the class interval to the preceding class mark.
𝐶𝑙𝑎𝑠𝑠(𝐴𝑔𝑒) 𝐹𝑟𝑒𝑞𝑢𝑒𝑛𝑐𝑦 (𝑓) 𝐶𝑙𝑎𝑠𝑠 𝑚𝑎𝑟𝑘 (𝑥) 𝑓𝑥
10 – 19 5 14.5 5 ∙ 14.5 = 72.5
20 – 29 20 14.5+10=24.5 490
30 – 39 10 34.5 345
40 – 49 7 44.5 311.5
50 – 59 8 54.5 436
Total 𝑛 = 50 ∑ 𝑓𝑥 = 1,655

The fx column is the product of the frequency (f) and the class mark (x). Add all the
products to get ∑fx. Hence, to get the mean:
$$\bar{x} = \frac{\sum fx}{n} = \frac{1{,}655}{50} = 33.1$$
This implies that the average age of the persons who come to the mall is more or less 33 years
old.
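
The table-based computation of the grouped mean can be sketched in Python. The tuple layout `(lower limit, upper limit, frequency)` and the variable names are hypothetical choices made for this illustration.

```python
# Each class is (lower limit, upper limit, frequency), taken from the age table.
classes = [(10, 19, 5), (20, 29, 20), (30, 39, 10), (40, 49, 7), (50, 59, 8)]

n = sum(freq for _, _, freq in classes)  # total frequency

# Class mark x = (lower limit + upper limit) / 2; sum_fx adds up f * x per class.
sum_fx = sum(freq * (lower + upper) / 2 for lower, upper, freq in classes)

grouped_mean = sum_fx / n
print(n, sum_fx, grouped_mean)
```

For the age table this gives n = 50 and ∑fx = 1,655, so the printed mean is 1,655 / 50.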
2. Median
$$\tilde{x} = LB_{me} + \left( \frac{\frac{n}{2} - cf_b}{f_{me}} \right) i$$
Where:
x̃ is the median
LB_me is the lower boundary of the median class
n is the sample size
cf_b is the summation of frequencies before the median class (the classes
lower than the median class); cf stands for cumulative frequency
f_me is the frequency of the median class
i is the class interval
Let us use the previous results. Add another column for the summation of
frequencies. If you are only going to find the median, you can disregard the 3rd
column (class mark) and 4th column (fx).
𝐶𝑙𝑎𝑠𝑠𝑒𝑠 𝑓 𝑥 𝑓𝑥 cf
10 – 19 5 14.5 72.5 5
20 – 29 20 24.5 490 5+20=25
30 – 39 10 34.5 345 25+10=35
40 – 49 7 44.5 311.5 35+7=42
50 – 59 8 54.5 436 42+8=50
Total 𝑛 = 50 ∑ 𝑓𝑥 = 1,655

First, divide the sample size by 2.
$$\frac{n}{2} = \frac{50}{2} = 25 \quad (\text{25th term})$$
Observe the last column: class 10 – 19 holds the 1st to 5th terms, class 20 – 29
holds the 6th to 25th terms, class 30 – 39 holds the 26th to 35th terms, and so
on. Since the 25th term belongs to class 20 – 29, the median class is
class 20 – 29.

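Classes f x fx ∑f
10 – 19 5 14.5 72.5 5
20 – 29 20 24.5 490 25
30 – 39 10 34.5 345 35
40 – 49 7 44.5 311.5 42
50 – 59 8 54.5 436 50
Total n = 50 ∑fx = 1,655

In this table, f_me = 20 (the frequency of the median class 20 – 29) and cf_b = 5
(the cumulative frequency of the class before the median class).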

The last variable with no value yet is LB_me. This is the average of the lower
limit of the median class, which is 20 in this case, and the upper limit of the
class just before the median class, which is 19 in this case. So:
$$LB_{me} = \frac{19 + 20}{2} = \frac{39}{2} = 19.5$$
Then compute the median. The value of the class interval (i) is 10,
the same as what we used earlier to determine the mean.
$$\tilde{x} = LB_{me} + \left( \frac{\frac{n}{2} - cf_b}{f_{me}} \right) i$$
$$\tilde{x} = 19.5 + \left( \frac{\frac{50}{2} - 5}{20} \right) 10$$
$$\tilde{x} = 19.5 + \left( \frac{25 - 5}{20} \right) 10$$
$$\tilde{x} = 19.5 + \left( \frac{20}{20} \right) 10$$
$$\tilde{x} = 19.5 + (1)10$$
$$\tilde{x} = 19.5 + 10 = 29.5$$
Therefore, the median is 29.5. The middlemost age is about 29.5 years.
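
As a cross-check of the grouped median formula, here is a minimal Python sketch. The function name `grouped_median` is hypothetical. It takes the class interval from the difference between succeeding lower limits (10 for this table), and it assumes whole-number class limits, so the lower boundary is the lower limit minus 0.5 (19.5 for class 20 – 29).

```python
def grouped_median(classes):
    """Median of grouped data given (lower limit, upper limit, frequency) rows."""
    n = sum(freq for _, _, freq in classes)
    interval = classes[1][0] - classes[0][0]  # difference of succeeding lower limits
    cumulative_before = 0  # cf_b: frequencies of the classes already passed
    for lower, _, freq in classes:
        if cumulative_before + freq >= n / 2:  # this class holds the (n/2)-th term
            lower_boundary = lower - 0.5  # assumption: whole-number class limits
            return lower_boundary + (n / 2 - cumulative_before) * interval / freq
        cumulative_before += freq


ages = [(10, 19, 5), (20, 29, 20), (30, 39, 10), (40, 49, 7), (50, 59, 8)]
median_age = grouped_median(ages)
print(median_age)  # 19.5 + ((25 - 5) / 20) * 10
```

With LB_me = 19.5, cf_b = 5, f_me = 20, and i = 10, the sketch evaluates to 29.5.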

3. Mode
$$\hat{x} = LB_{mo} + \left( \frac{f_{mo} - f_b}{2f_{mo} - f_b - f_a} \right) i$$
Where:
x̂ is the mode
LB_mo is the lower boundary of the modal class
f_mo is the frequency of the modal class
f_b is the frequency before the modal class, that is, the frequency of the
class immediately lower than the modal class
f_a is the frequency after the modal class, that is, the frequency of the
class immediately higher than the modal class
i is the class interval

Class 20 – 29 has the highest frequency, so it is the modal class.
If two or more classes share the highest frequency, then all of those
classes are modal classes.

Classes f x fx ∑f
10 – 19 5 14.5 72.5 5
20 – 29 20 24.5 490 25
30 – 39 10 34.5 345 35
40 – 49 7 44.5 311.5 42
50 – 59 8 54.5 436 50
Total n = 50 ∑fx = 1,655

In this table, f_b = 5 (class 10 – 19), f_mo = 20 (modal class 20 – 29), and
f_a = 10 (class 30 – 39).

The frequency of the modal class is 20. The frequency of the class before the modal class
(the lower class immediately next to the modal class) is 5, and the frequency of the class
after the modal class (the higher class immediately next to the modal class) is 10. The class
interval is also 10 (as in the mean and median). The lower boundary of the modal class is
obtained by the same process as the lower boundary of the median class: it is the average of
the lower limit of the modal class and the upper limit of the class immediately below it.
$$LB_{mo} = \frac{20 + 19}{2} = \frac{39}{2} = 19.5$$
Then, compute the mode using the class interval $i = 10$.

$$\hat{x} = LB_{mo} + \left(\frac{f_{mo} - f_b}{2f_{mo} - f_b - f_a}\right) i$$

$$\hat{x} = 19.5 + \left(\frac{20 - 5}{2(20) - 5 - 10}\right) 10$$

$$\hat{x} = 19.5 + \left(\frac{15}{40 - 15}\right) 10$$

$$\hat{x} = 19.5 + \left(\frac{15}{25}\right) 10$$

$$\hat{x} = 19.5 + \left(\frac{3}{5}\right) 10$$

$$\hat{x} = 19.5 + \frac{30}{5}$$

$$\hat{x} = 19.5 + 6$$

$$\hat{x} = \mathbf{25.5}$$

The mode is 25.5. The most common age of those who enter the mall is more or less 26 years old.
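The mode formula for grouped data can be verified the same way. The sketch below is illustrative only: `grouped_mode` is a hypothetical helper name, it assumes equal-width classes in ascending order, it uses the first class with the highest frequency when there are ties, and it treats a missing neighboring class as having frequency 0 (a common convention, not stated in the module). The class interval is taken as 10, the difference between successive lower limits.

```python
def grouped_mode(classes):
    """Mode of grouped data.

    classes: list of (lower_limit, upper_limit, frequency) tuples in
    ascending order, with equal class widths (an assumption of this sketch).
    """
    freqs = [f for _, _, f in classes]
    k = freqs.index(max(freqs))                     # first class with the highest frequency
    f_mo = freqs[k]
    f_b = freqs[k - 1] if k > 0 else 0              # assumed 0 when no lower class exists
    f_a = freqs[k + 1] if k < len(freqs) - 1 else 0 # assumed 0 when no higher class exists
    width = classes[1][0] - classes[0][0]           # class interval i
    gap = classes[1][0] - classes[0][1]             # gap between adjacent class limits
    lower_boundary = classes[k][0] - gap / 2
    # Note: the denominator is zero if f_mo == f_b == f_a (a flat plateau of equal frequencies).
    return lower_boundary + (f_mo - f_b) * width / (2 * f_mo - f_b - f_a)


# Age table from the example: (lower limit, upper limit, frequency)
ages = [(10, 19, 5), (20, 29, 20), (30, 39, 10), (40, 49, 7), (50, 59, 8)]
print(grouped_mode(ages))  # 25.5
```

Here f_mo = 20, f_b = 5, and f_a = 10, so the function evaluates 19.5 + (15)(10)/25 = 25.5.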

References
Almukkahal, R., et al. (2016). CK-12 Advanced Probability and Statistics Concepts. Flexbook: next generation textbook.
Australian Bureau of Statistics (2013). What is Variable? Retrieved 04 June 2020 from https://www.abs.gov.au/websitedbs/a3121120.nsf/home/statistical+language+-+what+are+variables#:~:text=A%20variable%20is%20any%20characteristics,type%20are%20examples%20of%20variables.
Bluman, A. G. (2018). Elementary Statistics: A Step by Step Approach, Tenth Edition, ISBN 978-1-259-75533. McGraw-Hill Education, New York City, USA. Retrieved 03 June 2020 from https://b-ok.asia/book/5009088/f236d3
Dataceutics, Inc. (2018). Sir Ronald Aylmer Fisher – The Father of Modern Statistics. Retrieved 06 June 2020 from https://www.dataceutics.com/blog/2018/7/24/sir-ronald-aylmer-fisher-the-father-of-modern-statistics
Encyclopaedia Britannica, Inc. (2020). Sir Ronald Aylmer Fisher. Retrieved 06 June 2020 from https://www.britannica.com/science/physical-anthropology
Gupta, S. (2014). Sampling Methods. Retrieved 06 June 2020 from https://www.slideshare.net/shubhanshug1/seminar-sampling-methods?qid=d1f11eda-cdd5-44b8-81de-f0cd88637e6e&v=&b=&from_search=1
Ratner, B. (2009). The correlation coefficient: Its values range between +1/−1, or do they? Springer Nature Switzerland. Retrieved 17 June 2020 from https://doi.org/10.1057/jt.2009.5
Tejada, J. J. & Punzalan, R. B. (2012). On the Misuse of Slovin’s Formula. The Philippine Statistician, Vol. 61, No. 1, pp. 129 – 136. Retrieved 06 May 2020 from https://www.psai.ph/docs/publications/tps/tps_2012_61_1_9.pdf
Weiss, N. A. (2012). Elementary Statistics, 8th Edition, ISBN 978-0-321-69123-1. Pearson Education, Inc., Boston, USA. Retrieved 03 June 2020 from https://b-ok.asia/book/1236722/d339a2
http://onlinestatbook.com/2/calculators/normal_dist.html

Appendix A
http://onlinestatbook.com/2/calculators/normal_dist.html

Appendix B
http://onlinestatbook.com/2/calculators/normal_dist.html
