Class Module - APPLIED STATS - ICFV
ECO 2209
TABLE OF CONTENTS
PREFACE
INTRODUCTION
DATA PRESENTATION
MEASURES OF CENTRAL TENDENCY
MEASURES OF VARIABILITY
PROBABILITY
NORMAL DISTRIBUTION
SIMPLE LINEAR REGRESSION
HYPOTHESIS TESTING
ANALYSIS OF VARIANCE
CHI-SQUARE TEST
REFERENCES
PREFACE
This module illustrates statistical concepts and tools through easy-to-understand
examples and exercises, so that students can apply them to different business and
economic problems. Moreover, statistical principles are, as much as possible,
explained in words rather than formulas. The author hopes that this module will
help students enrolled in this subject, as future practitioners in the field of
business and economics, to appreciate the importance of “statistical thinking” in
transforming data into meaningful information that is indispensable to
decision-making.
Week 1
I. INTRODUCTION
2. Basic Concepts about Data. Data collected are classified as either quantitative or
qualitative.
a. Quantitative (numerical) – data whose sizes are meaningful. These types of data
may be further classified as discrete or continuous.
Discrete – data that can be counted. These are values that can be put
into one-to-one correspondence with a subset of the set of counting numbers
(e.g. number of students enrolled in this subject).
b. Qualitative (categorical) – data that answer the question “what kind.” These data
can either be ordered or unordered.
c. Interval – data whose zero point is arbitrary and does not indicate the absence
of the quantity being measured (e.g. temperature measured in either Celsius or
Fahrenheit).
d. Ratio – data that have all the features of an interval scale and, in addition, an
absolute, fixed, and non-arbitrary zero point (e.g. per capita GNP or GDP).
It is important to know the level of measurement for the following reasons:
(1) it helps one decide how to interpret the data; and (2) it helps one decide
which statistical analysis is appropriate for the assigned values.
Types of Data
a. Primary (Raw) – any set of data or information that is directly collected from
the source (e.g. IATF’s COVID-19 statistics announced via national
television).
Weeks 2 and 4
5. Textual Form. Presenting the data in the form of words, sentences and
paragraphs. It allows one to present qualitative data that cannot be presented in
graphical or tabular forms (e.g. commonly used words during an interview). The
most common way of presenting data in textual form is the word cloud (See
Figure 1). Recommended software: MS Excel (add-in function) and
Polleverywhere.com (web-based analytical tool).
a. Line Graphs. These display continuous data and are useful for predicting
future events over time (historical or time period analysis).
b. Bar Graphs. These display categorical data and compare quantities using
solid bars.
d. Frequency Table. Presents the data in summary form by aggregating the data,
in particular by choosing suitable non-overlapping classes, tallying (or counting)
the data into these classes, and presenting the counts in tabular form.
e. Stem and Leaf Display. In this form, the data are organized from the least value
to the greatest value. It is constructed by splitting each data value into two parts,
a stem (one or more of the leading digits) and a leaf (the remaining digits)
(See Figure 2).
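The two tabular displays above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scores are invented, the helper names `frequency_table` and `stem_and_leaf` are hypothetical, the data are whole numbers, and the stem is taken to be the tens digit.

```python
from collections import Counter, defaultdict

# Hypothetical exam scores, invented for illustration only.
scores = [52, 67, 71, 58, 63, 75, 81, 66, 79, 84, 68, 73]


def frequency_table(data, start, width, n_classes):
    """Tally whole-number data into non-overlapping classes of equal width."""
    counts = Counter((x - start) // width for x in data)
    table = []
    for k in range(n_classes):
        lower = start + k * width
        # Upper limit shown as lower + width - 1, which assumes integer data.
        table.append((lower, lower + width - 1, counts.get(k, 0)))
    return table


def stem_and_leaf(data):
    """Split each value into a stem (tens digit) and a leaf (units digit)."""
    rows = defaultdict(list)
    for x in sorted(data):          # organize from least to greatest
        rows[x // 10].append(x % 10)
    return dict(rows)


for lower, upper, count in frequency_table(scores, start=50, width=10, n_classes=4):
    print(f"{lower}-{upper}: {count}")

for stem, leaves in stem_and_leaf(scores).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

The class width and starting point are choices made by the analyst; other choices give a different, equally valid table.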
Weeks 5 to 6
a. Mean. The most commonly used measure of central tendency is the average
(also called the mean or arithmetic mean). Simply, the mean or average of a
data set is the sum of the data divided by the number of data.
b. Mode. The value of a variable that occurs most frequently. It is also referred
to as the nominal average.
c. Median. The “middle observation” when the data set is sorted (in either
increasing or decreasing order), or also termed as the “central value of a
distribution.”
d. Percentiles, Quartiles. Quartiles are values that separate the sorted data
into four equal groups and percentiles are values that separate the sorted
data into 100 equal groups.
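As a rough illustration of these measures, the sketch below uses Python's standard `statistics` module on an invented data set. Note that several conventions exist for computing quartiles; `statistics.quantiles` uses one of them by default, so hand-computed quartiles may differ slightly.

```python
import statistics

# Hypothetical data set, invented for illustration only.
data = [2, 4, 4, 5, 7, 9, 10, 12]

mean = statistics.mean(data)        # sum of the data divided by the number of data
median = statistics.median(data)    # middle observation of the sorted data
mode = statistics.mode(data)        # value that occurs most frequently

# n=4 returns the three cut points that separate the sorted data into four groups.
q1, q2, q3 = statistics.quantiles(data, n=4)

print("mean:", mean)
print("median:", median)
print("mode:", mode)
print("quartiles:", q1, q2, q3)
```

The second quartile coincides with the median, which is a quick check that the quartiles were computed on the same sorted data.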
Weeks 7 to 8
A single summary number from the measures of central tendency, such as the mean,
is not enough to provide a clear picture of the distribution of a list. Several lists
of data may have the same mean, but the spread of the lists may be different. Thus,
calculating other features of the data, such as a measure of spread or variation, may
also be important. This can be done by computing the following:
a. Range. The difference between the largest and smallest values in the list
(Formula: Range = largest or maximum value – smallest or minimum value).
b. Inter-quartile range. The difference between the upper and lower quartiles
(Formula: Inter-quartile Range = Upper Quartile (Q3) – Lower Quartile (Q1)). A
five-number summary consists of the smallest value, the lower quartile, the
median, the upper quartile, and the largest value.
Formula: Mean Absolute Deviation = ∑ | Xi – Mean| / no. of observations
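The measures of spread in this section (range, inter-quartile range, five-number summary, and mean absolute deviation) can be computed as in the sketch below. The data are invented, and the quartiles follow the default convention of `statistics.quantiles`, which may differ slightly from a hand computation.

```python
import statistics

# Hypothetical data set, invented for illustration only.
data = [2, 4, 4, 5, 7, 9, 10, 12]

# Range = maximum value - minimum value
data_range = max(data) - min(data)

# Inter-quartile range = upper quartile (Q3) - lower quartile (Q1)
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Five-number summary: smallest, Q1, median, Q3, largest
five_number_summary = (min(data), q1, q2, q3, max(data))

# Mean absolute deviation = sum of |Xi - mean| / number of observations
mean = statistics.mean(data)
mad = sum(abs(x - mean) for x in data) / len(data)

print("range:", data_range)
print("IQR:", iqr)
print("five-number summary:", five_number_summary)
print("mean absolute deviation:", mad)
```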
Week 9
V. Probability
The formulas for nPk and nCk are called counting formulas since they can be
used to count the number of possible permutations or combinations in a
given situation without having to list them all.
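To see that the counting formulas agree with an explicit listing, the sketch below compares `math.perm` and `math.comb` (available in Python 3.8 and later) against the lists produced by `itertools`. The five items and the choice k = 3 are arbitrary values picked for illustration.

```python
import math
from itertools import combinations, permutations

items = ["A", "B", "C", "D", "E"]   # hypothetical items
n, k = len(items), 3

# Counting formulas: nPk = n! / (n - k)!  and  nCk = n! / (k! (n - k)!)
n_permutations = math.perm(n, k)
n_combinations = math.comb(n, k)

# The same counts obtained the long way, by listing every arrangement.
listed_permutations = list(permutations(items, k))
listed_combinations = list(combinations(items, k))

print("5P3 =", n_permutations, "listed:", len(listed_permutations))
print("5C3 =", n_combinations, "listed:", len(listed_combinations))
```

Listing becomes impractical quickly as n grows, which is why the formulas are preferred.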
Week 9
2) mean and median are equal; both located at the center of the distribution;
c. Measure of Skewness. In order to assess whether a distribution is skewed
or asymmetric, one may calculate some measure of shape. One method is
to compute the ratio (Upper Quartile – Median) / (Median – Lower Quartile),
which equals 1 for symmetric data. Another measure of skewness is the
difference Mean – Median, which is zero for symmetric data, positive for
right-skewed data, and negative for left-skewed data.
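Both skewness measures can be checked on a small example. The sketch below uses an invented data set with a long right tail; the quartiles follow the default convention of `statistics.quantiles`, so the ratio may differ slightly under another quartile rule.

```python
import statistics

# Hypothetical right-skewed data (a few large values stretch the upper tail).
data = [1, 2, 3, 4, 6, 9, 15]

q1, median, q3 = statistics.quantiles(data, n=4)
mean = statistics.mean(data)

# Quartile ratio: (Upper Quartile - Median) / (Median - Lower Quartile).
# Undefined when the median equals the lower quartile (division by zero).
quartile_ratio = (q3 - median) / (median - q1)

# Difference Mean - Median: positive for right-skewed data.
mean_minus_median = mean - median

print("quartile ratio:", quartile_ratio)
print("mean - median:", mean_minus_median)
```

Here both measures point the same way: the ratio is above 1 and the mean exceeds the median, which is what one would expect for right-skewed data.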
Weeks 11 to 12
b.1. Using a table of Critical Values. The 95% Critical Values of the
Sample Correlation Coefficient Table can be used to give a good idea
of whether the computed value of r is significant or not. Compare r to
the appropriate critical value in the table. If r is not between the positive
and negative critical values, then the correlation coefficient is
significant. If r is significant, then one may want to use the line for
prediction.
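Formula: $\hat{Y} = \hat{a} + \beta X$
$\hat{a} = \frac{\sum Y_i}{n} - \beta \frac{\sum X_i}{n} = \bar{Y} - \beta \bar{X}$
$\beta = \frac{n \sum X_i Y_i - \sum X_i \sum Y_i}{n \sum X_i^2 - \left( \sum X_i \right)^2} = r \frac{S_y}{S_x}$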
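As a worked illustration, the sketch below computes the slope and intercept of the least-squares line from the raw sums, then checks that the slope also equals r times the ratio of the standard deviations. The five (X, Y) pairs are invented for illustration.

```python
import math
import statistics

# Hypothetical paired observations, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)
sum_y2 = sum(yi ** 2 for yi in y)

# Slope: (n*sum(XY) - sum(X)*sum(Y)) / (n*sum(X^2) - (sum(X))^2)
slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

# Intercept: mean of Y minus slope times mean of X
intercept = sum_y / n - slope * sum_x / n

# Sample correlation coefficient r and sample standard deviations
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
s_x, s_y = statistics.stdev(x), statistics.stdev(y)

print("slope:", slope)
print("intercept:", intercept)
print("r * Sy / Sx:", r * s_y / s_x)   # should match the slope

# Predicted Y for a new X value (here X = 6, chosen arbitrarily)
print("prediction at X = 6:", intercept + slope * 6)
```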
Formula: $SSR = \sum (\hat{y}_i - \bar{y})^2$
Formula: $\sigma_{est} = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n}}$
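The regression sum of squares and the standard error of the estimate can be computed as in the sketch below. The data are invented, the line is fitted by ordinary least squares, and the standard error divides by n as in the formula given here (some textbooks divide by n – 2 instead).

```python
import math

# Hypothetical paired observations, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
intercept = y_bar - slope * x_bar

# Fitted values on the regression line
y_hat = [intercept + slope * xi for xi in x]

# SSR: sum of squared deviations of the fitted values from the mean of y
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)

# Sum of squared residuals, used in the standard error of the estimate
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
sigma_est = math.sqrt(sse / n)

# Total variation in y; for a least-squares fit, SST = SSR + SSE
sst = sum((yi - y_bar) ** 2 for yi in y)

print("SSR:", ssr)
print("standard error of estimate:", sigma_est)
print("SSR / SST:", ssr / sst)
```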
Weeks 13 to 14
reject the null hypothesis. In this case, the result is statistically
significant at the 5% level.
Weeks 15 to 16
Weeks 17 to 18
X. Chi-square Tests
determine the goodness of fit to a hypothesized distribution (Steps
will be discussed in class lecture).
REFERENCES:
Albert, J.R.G. (2008). Basic Statistics for the Tertiary Level. 1st edition. Rex Bookstore Inc.
Manila, Philippines.