Data Management
Objectives/Outcomes
USE & APPLICATION OF STATISTICS
The study of statistics can be organized in
different ways.
Population & Sample
A population is the collection of all individuals or items
under consideration in a statistical study. A parameter is a numerical value that describes a population; it is computed from data gathered on the entire population.
A sample is the part of the population from which information is obtained. A statistic is a numerical value computed from the data gathered from a sample.
Sampling is the process of selecting a group that represents the entire population as closely as possible.
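The difference between a parameter and a statistic can be sketched in a few lines. This is only an illustration: the population below is hypothetical, generated with a fixed seed, and simple random sampling (`random.Random.sample`) is assumed as the sampling method.

```python
import random
import statistics

# Hypothetical population of 1,000 scores, generated only for illustration.
rng = random.Random(42)  # fixed seed so the run is repeatable
population = [rng.randint(50, 100) for _ in range(1000)]

# Parameter: computed from every member of the population.
mu = statistics.mean(population)

# Sampling: draw 30 members without replacement (simple random sampling).
sample = rng.sample(population, k=30)

# Statistic: computed from the sample only; it estimates the parameter.
x_bar = statistics.mean(sample)

print(f"population mean (parameter): {mu:.2f}")
print(f"sample mean (statistic):     {x_bar:.2f}")
```

Running it with a different seed changes the sample mean but not the population mean, which is the practical difference between a statistic and a parameter.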
Two Branches of Statistics
1. Descriptive Statistics deals with collecting, describing, and analyzing a set of data without drawing conclusions about a larger group.
Descriptive statistics are tabular, graphical, and numerical summaries of data.
The purpose of descriptive statistics is to facilitate the presentation and interpretation of data.
If a researcher uses data gathered on a group to describe or reach conclusions about that same group, the statistics are called descriptive statistics.
There are a number of items that belong to descriptive statistics, such as:
The average, or measure of the center of a data set, consisting of the mean, median, mode, or midrange
The spread of a data set, which can be measured with the range or standard deviation
Overall descriptions of data, such as the five-number summary
Measurements such as skewness and kurtosis
The exploration of relationships and correlation between paired data
The presentation of statistical results in graphical form
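Two of the items above, the midrange and the five-number summary, can be computed with Python's standard `statistics` module. This sketch reuses the scores from Example 1. Quartile conventions differ between textbooks; the `"inclusive"` method of `statistics.quantiles` is one choice and is assumed here, so other references may report slightly different Q1 and Q3.

```python
import statistics

scores = [75, 86, 83, 82, 84, 90, 88, 83, 87, 91]  # data from Example 1

# Midrange: average of the highest and lowest values.
midrange = (max(scores) + min(scores)) / 2

# Five-number summary: minimum, Q1, median, Q3, maximum.
# Assumption: the "inclusive" quartile method (one of several conventions).
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
five_number_summary = (min(scores), q1, q2, q3, max(scores))

print(midrange)             # 83.0
print(five_number_summary)  # (75, 83.0, 85.0, 87.75, 91)
```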
2. Inferential Statistics is concerned with the analysis of a subset of data, leading to predictions or inferences about the entire set of data without examining every individual in the population.
If a researcher gathers data from a sample and
uses the statistics generated to reach conclusions
about the population from which the sample was
drawn, the statistics are called inferential statistics.
There are a number of items that belong to inferential statistics, such as:
A confidence interval, which gives a range of plausible values for an unknown population parameter, computed from a sample.
Tests of significance, or hypothesis testing, in which a claim about the population is tested by analyzing a sample.
Other inferential methods include linear regression analysis, logistic regression analysis, ANOVA, correlation analysis, structural equation modeling, and survival analysis.
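As a rough illustration of a confidence interval, the sketch below computes a 95% interval for the mean daily income of all San Pedro St. street vendors from the six-vendor sample used in the worked example at the end of this section. It assumes incomes are approximately normally distributed, and it hard-codes the critical value t(0.025, df = 5) = 2.571 from a standard t-table, because Python's standard library has no t-distribution quantile function.

```python
import math
import statistics

incomes = [560, 320, 440, 650, 200, 490]  # street-vendor sample, Php

n = len(incomes)
x_bar = statistics.mean(incomes)
s = statistics.stdev(incomes)        # sample standard deviation (n - 1)
standard_error = s / math.sqrt(n)

# Assumption: t critical value for 95% confidence with df = n - 1 = 5,
# taken from a t-table rather than computed.
t_crit = 2.571
margin = t_crit * standard_error
lower, upper = x_bar - margin, x_bar + margin

print(f"95% CI for the population mean: Php {lower:.2f} to Php {upper:.2f}")
```

With only six vendors the interval is wide, which reflects how much uncertainty a small sample leaves about the population mean.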
Statisticians and researchers are interested in particular variables of a sample or population.
Two kinds of variables
1. Qualitative variables
are variables that can be placed into
distinct categories, according to some
characteristic or attribute.
2. Quantitative variables
are numerical and can be ordered or
ranked.
Quantitative variables can be further classified into two groups: discrete and continuous.
Discrete variables
are variables that can take on only distinct, separate values (a finite or countable number of values), usually obtained by counting.
Continuous variables
are variables that can assume an infinite number of values in a given interval, determined through measurement instead of counting.
Two Ways to Describe Group Performance
1. Measure of Central Tendency
2. Measure of Variability / Measure of Dispersion
I. Measures of Central Tendency are used to determine the average performance of a group of scores. They provide a convenient way of describing a set of scores with a single number that represents the performance of the group.
A. Mean refers to the arithmetic average.
B. Median refers to the centermost score when
the scores in the distribution are arranged
according to magnitude.
C. Mode refers to the score or scores that occur most frequently in the score distribution.
MEAN ($\bar{x}$)
It is the sum of all values in a data set divided by the number of values summed:
$\bar{x} = \frac{\sum x}{n}$
MEDIAN
It is the middlemost value in the data set when the values are arranged in rank order. It divides the distribution into two equal parts; when the number of values is even, it is the average of the two middle values.
= 5.05
MODE
It is the value that occurs most frequently, that is, the value with the highest frequency in the data set.
b. 3.4 2.2 3.5 3.4 2.2 2.6 2.1 3.9 2.2 3.4
Mode = 3.4 and 2.2 (bimodal; each occurs three times)
c. 105 200 159 110 225 170 115 250 285 190
No mode (every value occurs only once)
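The mode examples above can be checked with `statistics.multimode`, which returns every value tied for the highest frequency. On its own, `multimode` would return all ten values for data set c, so the hypothetical helper `modes` below applies the convention used in these notes: if no value repeats, there is no mode.

```python
import statistics

def modes(values):
    """Return the mode(s); an empty list means "no mode"."""
    # multimode lists every value tied for the highest frequency,
    # in the order each value first appears.
    candidates = statistics.multimode(values)
    # Convention used above: if every value occurs only once, there is no mode.
    if values and values.count(candidates[0]) == 1:
        return []
    return candidates

b = [3.4, 2.2, 3.5, 3.4, 2.2, 2.6, 2.1, 3.9, 2.2, 3.4]
c = [105, 200, 159, 110, 225, 170, 115, 250, 285, 190]
print(modes(b))  # [3.4, 2.2]  (bimodal)
print(modes(c))  # []          (no mode)
```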
Example 1:
Find the mean, median and mode of the given ungrouped data.
75, 86, 83, 82, 84, 90, 88, 83, 87, 91
Solution:
Mean
$\bar{x} = \frac{75 + 86 + 83 + 82 + 84 + 90 + 88 + 83 + 87 + 91}{10} = \frac{849}{10}$
Mean = 84.9
Median
75, 82, 83, 83, 84, 86, 87, 88, 90, 91 (arranged in ascending order)
Median $= \frac{84 + 86}{2} = \frac{170}{2} = 85$
Mode
75, 86, 83, 82, 84, 90, 88, 83, 87, 91
Mode = 83 (unimodal)
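The mean and median steps of Example 1 can be written out directly. The function names `mean` and `median` below are illustrative, not from the notes; they follow the definitions given earlier (sum divided by count; middle value of the ordered scores, or the average of the two middle scores when the count is even).

```python
def mean(values):
    # Sum of all values divided by the number of values.
    return sum(values) / len(values)

def median(values):
    # Arrange the scores according to magnitude, then take the middle one;
    # with an even count, average the two middle scores.
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

scores = [75, 86, 83, 82, 84, 90, 88, 83, 87, 91]
print(mean(scores))    # 84.9
print(median(scores))  # 85.0
```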
II. Measures of Variability indicate the spread of scores in a group. Each is a single value used to describe the spread of the scores in a distribution. They are also known as measures of variation or dispersion.
There are many ways to describe variability or spread, including:
Range
Variance and Standard Deviation
1. Range: The range is the difference between the maximum and minimum values of a data set. The maximum is the largest value in the data set, and the minimum is the smallest value. The range is easy to calculate, but it is strongly affected by extreme values.
Find the range of the given ungrouped data.
75, 86, 83, 82, 84, 90, 88, 83, 87, 91
R = Highest Score − Lowest Score
R = 91 − 75
Range = 16
Example
R = 23 – 5
R = 18
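A short sketch of the range formula, R = Highest Score − Lowest Score, also shows why the range is sensitive to extreme values. The extra score of 30 is hypothetical, added only to demonstrate the effect of a single outlier.

```python
def score_range(values):
    # R = highest score - lowest score
    return max(values) - min(values)

scores = [75, 86, 83, 82, 84, 90, 88, 83, 87, 91]
print(score_range(scores))        # 16

# Hypothetical extreme value: one low score nearly quadruples the range.
with_outlier = scores + [30]
print(score_range(with_outlier))  # 61
```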
2. Variance and Standard Deviation
Standard Deviation is the most important measure of variation or dispersion. It describes the typical distance of the scores from the mean; more precisely, it is the square root of the average squared deviation from the mean.
A low standard deviation indicates that the data points tend to be very close to the mean. A high standard deviation indicates that the data points are spread out over a large range of values.
Variance is one of the most important measures of
variation. It shows variation about the mean.
A small variance indicates that the data points tend to be
very close to the mean, and to each other. A high
variance indicates that the data points are very spread
out from the mean, and from one another. Variance is
the average of the squared distances from each point to
the mean.
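VARIANCE
It is the average of the squared deviations of the values about the mean.
POPULATION VARIANCE ($\sigma^2$)
$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
where $x$ – individual value
$\mu$ – population mean
$N$ – population size
SAMPLE VARIANCE ($s^2$)
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
where $x$ – individual value
$\bar{x}$ – sample mean
$n$ – sample size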
Steps in Solving Variance of Ungrouped Data
1. Solve for the mean value.
2. Subtract the mean value from each score.
3. Square the difference between the mean and each score.
4. Find the sum.
5. Solve for the population variance or sample variance using the formula for ungrouped data:
Population Variance: $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
Sample Variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
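The five steps above translate almost line for line into code. In this sketch the Example 1 scores are treated as either a population or a sample purely to show both denominators; the function name `variance` is illustrative, and the result is cross-checked against the standard library's `statistics.pvariance` and `statistics.variance`.

```python
import statistics

def variance(values, sample=False):
    """Variance of ungrouped data, following the five steps."""
    n = len(values)
    mean = sum(values) / n                    # Step 1: solve for the mean
    deviations = [x - mean for x in values]   # Step 2: subtract the mean from each score
    squared = [d ** 2 for d in deviations]    # Step 3: square each difference
    total = sum(squared)                      # Step 4: find the sum
    # Step 5: divide by N for a population, or by n - 1 for a sample
    return total / (n - 1) if sample else total / n

scores = [75, 86, 83, 82, 84, 90, 88, 83, 87, 91]
print(variance(scores))               # population variance (divide by N)
print(variance(scores, sample=True))  # sample variance (divide by n - 1)

# Cross-check against the standard library.
assert abs(variance(scores) - statistics.pvariance(scores)) < 1e-9
assert abs(variance(scores, sample=True) - statistics.variance(scores)) < 1e-9
```

Dividing by n − 1 instead of N makes the sample variance slightly larger, which compensates for measuring deviations from the sample mean rather than the unknown population mean.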
The standard deviation can be thought of as a "standard"
way of knowing what is normal (typical), what is very
large, and what is very small in the data set.
Standard Deviation
Population standard deviation: $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
Sample standard deviation: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Steps in Solving Standard Deviation of Ungrouped Data
1. Solve for the mean value.
2. Subtract the mean value from each score.
3. Square the difference between the mean and each score.
4. Find the sum of step 3.
5. Solve for the population standard deviation or sample standard deviation using the corresponding formula.
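Because the standard deviation is the square root of the variance, the same steps apply with one extra square root at the end. This sketch (function name `std_dev` is illustrative) again uses the Example 1 scores and compares the result with `statistics.pstdev` (population) and `statistics.stdev` (sample).

```python
import math
import statistics

def std_dev(values, sample=False):
    """Standard deviation of ungrouped data: square root of the variance."""
    n = len(values)
    mean = sum(values) / n                          # step 1
    total = sum((x - mean) ** 2 for x in values)    # steps 2 to 4
    divisor = n - 1 if sample else n                # step 5: N or n - 1
    return math.sqrt(total / divisor)

scores = [75, 86, 83, 82, 84, 90, 88, 83, 87, 91]
sigma = std_dev(scores)           # treating the scores as a population
s = std_dev(scores, sample=True)  # treating the scores as a sample

# The standard library's pstdev (population) and stdev (sample) agree.
assert math.isclose(sigma, statistics.pstdev(scores))
assert math.isclose(s, statistics.stdev(scores))
print(round(sigma, 2), round(s, 2))
```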
Population Variance
$\sigma^2 = \frac{\sum (x - \mu)^2}{N} = \frac{186}{10} = 18.6$
Sample Variance
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{186}{10 - 1} = \frac{186}{9} = 20.67$
Population Standard Deviation
$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} = \sqrt{\frac{186}{10}} = \sqrt{18.6} = 4.31$
Sample Standard Deviation
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{186}{10 - 1}} = \sqrt{\frac{186}{9}} = 4.55$
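The arithmetic of this worked example can be reproduced from the sum of squared deviations alone, using the values given above (sum of squares 186, ten scores). Only the final printed values are rounded, to two decimals as in the slides.

```python
import math

sum_sq = 186  # sum of squared deviations from the worked example
n = 10        # number of scores (N for a population, n for a sample)

pop_var = sum_sq / n           # population variance
samp_var = sum_sq / (n - 1)    # sample variance
pop_sd = math.sqrt(pop_var)    # population standard deviation
samp_sd = math.sqrt(samp_var)  # sample standard deviation

print(round(pop_var, 2), round(samp_var, 2), round(pop_sd, 2), round(samp_sd, 2))
# 18.6 20.67 4.31 4.55
```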
Example
A sample of six street vendors along San Pedro St. was surveyed, and their average daily incomes were as follows: Php 560, Php 320, Php 440, Php 650, Php 200, and Php 490.
$\bar{x} = \frac{2660}{6} = 443.33$
Income ($x$) | $x - \bar{x}$ | $(x - \bar{x})^2$
200 | 200 − 443.33 = −243.33 | 59,209.49
320 | 320 − 443.33 = −123.33 | 15,210.29
440 | 440 − 443.33 = −3.33 | 11.09
490 | 490 − 443.33 = 46.67 | 2,178.09
560 | 560 − 443.33 = 116.67 | 13,611.89
650 | 650 − 443.33 = 206.67 | 42,712.49
$\sum x = 2660$ | | $\sum (x - \bar{x})^2 = 132{,}933.34$
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{132{,}933.34}{6 - 1} = 26{,}586.67$
STANDARD DEVIATION
The square root of the variance.
$s = \sqrt{26{,}586.67} = 163.05$
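The street-vendor table can be regenerated with a short script. One assumption differs from the hand computation: the script keeps the mean unrounded (443.333...), whereas the table rounds it to 443.33 first, so individual squared deviations differ slightly (for example 59,211.11 instead of 59,209.49). The sample variance and standard deviation still round to the same values.

```python
import math

incomes = [200, 320, 440, 490, 560, 650]  # average daily income, Php

n = len(incomes)
x_bar = sum(incomes) / n  # 2660 / 6, kept unrounded

print(f"{'Income (x)':>10} {'x - x̄':>10} {'(x - x̄)²':>12}")
total = 0.0
for x in incomes:
    dev = x - x_bar
    total += dev ** 2
    print(f"{x:>10} {dev:>10.2f} {dev ** 2:>12.2f}")

s_squared = total / (n - 1)  # sample variance, n - 1 = 5
s = math.sqrt(s_squared)     # sample standard deviation
print(f"s² = {s_squared:,.2f}")  # s² = 26,586.67
print(f"s  = {s:,.2f}")          # s  = 163.05
```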