
ZGE1104 Chapter 4 Data Management-1


ZGE 1104

Volume 4
2022

Mathematics in the Modern World
MANALANG, MENDOZA
ZGE1104 MATHEMATICS IN THE MODERN WORLD

Chapter IV. Data Management

A. Introduction to Statistics

Statistics is a branch of science concerned with the methods of collecting,
organizing, presenting, analyzing and interpreting data, and then drawing conclusions
based on the data.

Descriptive Statistics summarizes or describes a collection of data. It is a set
of methods for describing the data that we have collected.

Some of the most commonly used statistical treatments are percentages,
measures of central tendency (such as the mean, median and mode), measures of
variation (such as the range, average deviation, standard deviation, variance and
coefficient of variation) and measures of skewness and kurtosis.

Inferential Statistics is used to draw conclusions and make predictions based on the
analysis of numeric data. It is a set of methods used to make a generalization,
estimate, prediction or decision.

A pair consisting of one measure of central tendency and one measure of variation can
be used to draw a conclusion; a commonly used pair is the mean and the standard deviation.

Exercise: Classify whether each statement belongs to the area of Descriptive Statistics
or Inferential Statistics.

1. Ninety-two percent of the class are between 16 and 18 years old.
2. Ninety-five percent of the class may pass Basic Statistics.
3. According to the local survey, the top three popular courses are: Psychology
(23%), Hospitality (19%) and Computer-related courses (10%).
4. The normal blood sugar level of a human is 70 mg/dL to 120 mg/dL.
5. Drinking pineapple juice may boost our immune system.


In the study of statistics, two terms are commonly used: population and sample.

Population is defined as the complete or entire collection of elements (persons or
things) to be studied, while sample refers to a representative part, or finite number
of elements, chosen from the population.

In relation to population and sample, the next step is to differentiate a parameter
from a statistic.

A parameter is a numerical value calculated from a population. A statistic is a number
that describes a set of observations in a sample.
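
The distinction can be illustrated with a minimal Python sketch. The ages and variable names below are hypothetical and chosen only for illustration: the mean of the whole list plays the role of a parameter, while the mean of the chosen sub-list is a statistic.

```python
import statistics

# Hypothetical population: the ages of all 10 members of a small club.
population_ages = [16, 17, 17, 18, 18, 18, 19, 20, 21, 26]

# Hypothetical sample: 4 members chosen from that population.
sample_ages = [16, 18, 19, 21]

# Parameter: computed from every element of the population.
population_mean = statistics.mean(population_ages)

# Statistic: the same measure, computed from the sample only.
sample_mean = statistics.mean(sample_ages)

print("Parameter (population mean):", population_mean)  # 19
print("Statistic (sample mean):", sample_mean)          # 18.5
```

A different sample would generally give a different statistic, while the parameter stays fixed for a given population.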

Variables are characteristics or values that vary across individuals. A variable can be
qualitative or quantitative.

Qualitative Variables, also known as categorical variables, are used to represent
character, class or kind rather than amount. Some examples of qualitative variables
are gender, religion, nationality, favorite color and birthplace.

Quantitative Variables are variables that can be measured on a numeric or
quantitative scale. They can be classified as discrete or continuous.

A quantitative variable is discrete if its values are obtained by counting, so it
takes natural (counting) numbers. Some examples of discrete variables are the number
of students enrolled in STA111, the number of iPad units in a store and the number of
buildings in Metro Manila.

A quantitative variable is continuous if its values are obtained by measuring, so it
can take decimal or fractional values. Some examples of continuous variables are
height, weight, length, width and the speed of a bullet.


Levels of measurement are used to determine the statistical tool that can be used
to describe data. There are four levels of measurement: Nominal, Ordinal, Interval
and Ratio.

The first level is called the Nominal level. At this level, names are assigned to
objects for the purpose of identifying them or assigning them to a group or category.
The data cannot be arranged in an ordering system. Examples of data at this level are
religion, nationality or race, gender, birthplace and course.

The second level is the Ordinal level. At this level, words or numbers are
assigned to objects to represent the rank or order among them. It implies ranking,
order or inequalities. Examples are class rank, contest winners, degree of burn
and cancer stages.

Interval level is the third level of measurement. It refers to quantitative
measurements used to identify and rank, but at this level the differences between two
items can also be determined. Operations such as multiplication and division are not
meaningful because interval scales do not have a true zero point. An example of
interval data is temperature in degrees Celsius or Fahrenheit.

Lastly, the fourth level of measurement is the Ratio level. It is similar to the
interval level, but a ratio scale has a true zero point, so operations such as
multiplication and division are meaningful. Examples of data at the ratio level are
income, age, height, weight, area and volume.

Sampling is the process of choosing elements, such as persons, objects or groups,
from a known population of interest to be included in a study in order to generate
a fair result. Sampling is done to reduce cost, since it is less expensive to conduct
a survey on a sample than on the whole population. Another advantage of using a sample
instead of a population is that data can be obtained faster. Also,
greater scope and accuracy are expected since the volume of work in encoding
and computing is reduced.
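
One common way to choose the elements is simple random sampling, in which every element has the same chance of being selected. The text above does not name a specific method, so the sketch below is only an illustration under that assumption; the student ID numbers and the sample size are hypothetical.

```python
import random

# Hypothetical population: ID numbers of 500 enrolled students.
population = list(range(1, 501))

# A seeded generator so that the same draw can be repeated.
rng = random.Random(2022)

# Draw 50 distinct IDs; sample() selects without replacement.
sample = rng.sample(population, k=50)

print(len(sample), "students selected out of", len(population))
```

Surveying the 50 selected students instead of all 500 is what reduces the cost and the volume of encoding work described above.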

B. Measures of Central Tendency


Measures of Central Tendency are descriptive measures that are used to describe
the center of a set of data arranged numerically. Three different types of “average”
will be discussed: the mean, the median and the mode.

The most commonly used measure of central tendency is the mean. It is also
called the computed average. It is defined as the sum of the values divided by the total
number of items.

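The median is the middle value in a set of data arranged in order. It is the value that
divides the distribution into two equal parts, with one half of the values lower than the
median and the other half higher than the median. When the number of values is even,
the median is the mean of the two middle values.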

The third measure of central tendency is the mode. It is easily found by inspection. It
is the value in the distribution whose frequency is higher than that of any other value.

A distribution with only one mode is called unimodal, while if it has two modes, it
is called bimodal. If it has more than two modes, the distribution is called multimodal.
The mode does not exist in a distribution if no value is repeated.
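
The three measures can be computed with a short Python sketch. The data set is hypothetical, and `find_modes` and `describe_modality` are illustrative helper names, not part of any library; `find_modes` follows the convention stated above that no mode exists when no value is repeated.

```python
import statistics
from collections import Counter

def find_modes(values):
    """Return every value with the highest frequency, or [] if nothing repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # no value is repeated, so the mode does not exist
    return sorted(v for v, c in counts.items() if c == highest)

def describe_modality(modes):
    """Name the distribution by its number of modes."""
    names = {0: "no mode", 1: "unimodal", 2: "bimodal"}
    return names.get(len(modes), "multimodal")

scores = [4, 7, 7, 9, 10, 11]  # hypothetical data, already in order

mean_value = statistics.mean(scores)      # 48 / 6 = 8
median_value = statistics.median(scores)  # mean of the middle values 7 and 9
modes = find_modes(scores)                # only 7 is repeated

print(mean_value, median_value, modes, describe_modality(modes))
```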

Exercise: Determine the mean, median and mode of the given set of data.

1. 8, 10, 13, 13, 16
2. 2, 5, 3, 8, 5, 7, 2
3. 12, 10, 15, 14, 11, 18
4. 1, 9, 10, 2, 9, 4, 2, 1
5. 3, 6, 4, 4, 6, 3, 6, 3, 4

Best use of the mean, median and mode.

The mean is computed if the values are on an interval or ratio scale. The mean is
influenced by outliers that may be at the extremes of the data set. The median is
used for an ordinal scale. Unlike the mean, the median is not influenced by outliers at
the extremes of the data set. The mode is practical for nominal data. In some cases,
the mode may not exist or may not be very meaningful.
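
The effect of an outlier on the mean and the median can be seen in a small sketch. The incomes below are hypothetical values, in thousands, chosen only to make the contrast visible.

```python
import statistics

# Hypothetical monthly incomes, in thousands.
incomes = [20, 22, 25, 27, 30]

# The same data with one extreme value (an outlier) added.
incomes_with_outlier = incomes + [300]

print(statistics.mean(incomes), statistics.median(incomes))
# mean 24.8, median 25

print(statistics.mean(incomes_with_outlier), statistics.median(incomes_with_outlier))
# the mean jumps to about 70.7, while the median only moves to 26.0
```

A single extreme value pulls the mean far from most of the data, while the median barely changes.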


C. Measures of Dispersion

Measures of Dispersion or Variability describe the spread or scatter of the
values around the mean.

The range is the difference between the highest and lowest values (observations).

The average deviation is the average of the absolute distances of the values from the
mean. The formula is given by

$$AD = \frac{\sum |x - \bar{x}|}{n}$$

where $\bar{x}$ is the mean, $x$ are the values and $n$ is the number of values.
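
As a worked illustration of the range and the average deviation, the following sketch applies both definitions to a small hypothetical data set; the variable names are illustrative only.

```python
values = [2, 4, 6, 8, 10]  # hypothetical data

# Range: highest value minus lowest value.
data_range = max(values) - min(values)  # 10 - 2 = 8

# Average deviation: mean of the absolute distances from the mean.
n = len(values)
mean = sum(values) / n  # 30 / 5 = 6.0
average_deviation = sum(abs(x - mean) for x in values) / n  # (4+2+0+2+4) / 5 = 2.4

print("Range:", data_range)
print("Average deviation:", average_deviation)
```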

Variance measures how much variability there is in the entire distribution. The
standard deviation is the most commonly used measure of dispersion. It is the
positive square root of the variance. The formulas are as follows:

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \quad \text{(Variance)}$$

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \quad \text{(Standard Deviation)}$$
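
The two formulas can be checked by hand and against Python's standard `statistics` module, whose `variance` and `stdev` functions also divide by n - 1. The data set is hypothetical.

```python
import math
import statistics

values = [2, 4, 6, 8, 10]  # hypothetical data

n = len(values)
mean = sum(values) / n  # 6.0

# Sum of squared deviations from the mean: 16 + 4 + 0 + 4 + 16 = 40
sum_of_squares = sum((x - mean) ** 2 for x in values)

variance = sum_of_squares / (n - 1)  # 40 / 4 = 10.0
std_dev = math.sqrt(variance)        # positive square root of the variance

print("Variance:", variance, "| library:", statistics.variance(values))
print("Standard deviation:", std_dev, "| library:", statistics.stdev(values))
```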

D. Kinds of Distribution

In a symmetrical or normal distribution, the mean, median and mode are all equal and
fall at the same point.

In a positively skewed distribution, the extreme scores are larger, thus the mean
is larger than the median.

In a negatively skewed distribution, the order of the measures of central tendency
is the opposite of that in a positively skewed distribution, with the mean being
smaller than the median, which is smaller than the mode.


Skewness measures the degree of asymmetry of a distribution. One of the formulas for
skewness is given by

$$Sk = \frac{3(\text{mean} - \text{median})}{\text{standard deviation}} = \frac{3(\bar{x} - md)}{s}$$

When Sk = 0, the distribution is Normal or Symmetrical; when Sk > 0, the distribution
is Positively Skewed; and when Sk < 0, the distribution is Negatively Skewed.
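
A minimal sketch of this skewness formula and the three-way classification follows. The function names and data sets are hypothetical, and the standard deviation is assumed to be the sample version with n - 1 in the denominator.

```python
import statistics

def pearson_skewness(values):
    """Sk = 3 * (mean - median) / standard deviation."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1)
    return 3 * (mean - median) / s

def classify(sk):
    """Label a distribution by the sign of its skewness."""
    if sk > 0:
        return "positively skewed"
    if sk < 0:
        return "negatively skewed"
    return "symmetrical"

right_tailed = [2, 3, 3, 4, 5, 6, 12]  # hypothetical: one large extreme score
balanced = [1, 2, 3, 4, 5]             # hypothetical: mean equals median

for data in (right_tailed, balanced):
    sk = pearson_skewness(data)
    print(data, round(sk, 2), classify(sk))
```

In the first data set the mean (5) is larger than the median (4), so Sk is positive, which agrees with the description of a positively skewed distribution.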

E. Hypothesis Testing

A statistical hypothesis is a conjecture concerning one or more populations whose
veracity can be established using sample data.

Parametric tests are applied to data that are normally distributed. Moreover, it is
assumed that the variables are measured at either the interval or the ratio level.
Nonparametric tests do not require a normal distribution, and the variables of interest
are at the nominal or ordinal level.



F. Correlation

Correlation measures the strength of the linear association between two quantitative
variables: the independent variable and the dependent variable. Independent
variables are variables that can be manipulated or controlled, while dependent
variables are those that are observed or measured rather than controlled.

The most commonly used technique for calculating the coefficient of correlation is
the Pearson Product Moment Correlation Coefficient. The formula is given by

$$r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{\left[N\sum X^2 - \left(\sum X\right)^2\right]\left[N\sum Y^2 - \left(\sum Y\right)^2\right]}}$$

where $X$ = the observed data from the independent variable, $Y$ = the observed data
from the dependent variable, $N$ = sample size and $r$ = degree of relationship
between $X$ and $Y$.

The correlation coefficient ranges from -1 to +1. A value of -1.00 represents a
perfect negative correlation, while a value of +1.00 represents a perfect positive
correlation. If the value is equal to 0.00, it means that there is no linear
relationship between the variables.
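
The Pearson formula can be computed directly from the five sums it uses. The sketch below is one straightforward way to do so; the function name `pearson_r` and the paired data (hours studied and quiz scores) are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson product moment correlation coefficient from raw sums."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

hours_studied = [1, 2, 3, 4, 5]  # hypothetical X values
quiz_scores = [2, 4, 5, 4, 5]    # hypothetical Y values

r = pearson_r(hours_studied, quiz_scores)
print(round(r, 4))  # 30 / sqrt(1500), about 0.7746
```

Here N = 5, ΣX = 15, ΣY = 20, ΣXY = 66, ΣX² = 55 and ΣY² = 86, so the numerator is 330 - 300 = 30 and the denominator is the square root of 50 × 30 = 1500. The function divides by zero if either variable is constant, a case this sketch does not handle.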

