
APPLIED STATISTICS FOR BUSINESS AND ECONOMICS

ECO 2209

Ivan Cassidy F. Villena

TABLE OF CONTENTS
PREFACE
INTRODUCTION
DATA PRESENTATION
MEASURES OF CENTRAL TENDENCY
MEASURES OF VARIABILITY
PROBABILITY
NORMAL DISTRIBUTION
SIMPLE LINEAR REGRESSION
HYPOTHESIS TESTING
ANALYSIS OF VARIANCE
CHI-SQUARE TEST

PREFACE
This module attempts to illustrate statistical concepts and tools through easy-to-understand examples and exercises, so that students can apply them to different business and economic problems. Moreover, statistical principles are, as much as possible, explained in words rather than formulas. The author hopes that this module helps students enrolled in this subject, and as future practitioners in the field of business and economics, to appreciate the importance of "statistical thinking" in transforming data into meaningful information that is indispensable to decision-making.

Week 1

I. INTRODUCTION

1. Definition of Statistics. Statistics (singular) involves the collection, organization, analysis, and presentation of data that are subject to variability, and the ways these data can be processed into useful information in the presence of uncertainty.

2. Basic Concepts about Data. Collected data are classified as either quantitative or qualitative.

a. Quantitative (numerical) – data whose sizes are meaningful. This type of data may be further classified as discrete or continuous.

Discrete – data that can be counted; values that can be put into one-to-one correspondence with a subset of the set of counting numbers (e.g. the number of students enrolled in this subject).

Continuous – data that can be measured (e.g. the exact height of a student enrolled in this subject).

b. Qualitative (categorical) – data that answer the question "what kind." These data can be either ordered or unordered.

3. Importance of Statistics. The primary objective of statistics is to transform data into meaningful and useful information by describing, explaining, predicting, and/or controlling some phenomenon of interest (e.g. determining whether a vaccine is effective).

4. Types of Data and Levels of Measurement. To understand data better, it is important to know what a variable is. A variable is a characteristic of a unit of observation or subject that can take on different values for different units/subjects, or for the same unit/subject at different periods.

a. Nominal – the simplest scale of measurement, where a value or unit of data is assigned to one of at least two qualitative classes or categories (e.g. sex, civil status). The categories are exhaustive and mutually exclusive.

b. Ordinal – involves placing values or codes in some rank or order to create an ordinal-scale variable (e.g. social class: upper, middle, and lower).

c. Interval – data with a zero point that is arbitrary and carries no inherent meaning (e.g. temperature measured in either Celsius or Fahrenheit).

d. Ratio – has all the features of an interval scale plus an absolute, fixed, and non-arbitrary zero point (e.g. per capita GNP or GDP).

It is imperative to understand the level of measurement for two reasons: (1) it helps one decide how to interpret the data; and (2) it helps one decide which statistical analyses are appropriate for the assigned values.

Types of Data

a. Primary (raw) – any set of data or information collected directly from the source (e.g. the IATF's COVID-19 statistics announced via national television).

b. Secondary – data provided by an organization or government agency in a convenient form, such as a written report or census data, or data processed and re-processed by individuals or entities other than the primary source of information (e.g. the PSA's Labor Force Survey and the BSP's Core Banking Statistics).

Weeks 2 to 4

II. DATA PRESENTATION

5. Textual Form. Presents the data in the form of words, sentences, and paragraphs. It allows one to present qualitative data that cannot be shown in graphical or tabular form (e.g. the words most commonly used during an interview). The most common textual presentation is the word cloud (see Figure 1). Recommended software: MS Excel (add-in function) and Polleverywhere.com (web-based analytical tool).

Figure 1. Example of a “Wordcloud”

6. Graphical Form. A method commonly used to analyze numerical data. It presents to readers and audiences the relationships among data, ideas, information, and concepts in a diagram. There are several graphical forms of data presentation:

a. Line Graphs. Used to display continuous data; useful for predicting future events over time (historical or time-period analysis).

b. Bar Graphs. Used to display categories of data; compares the data using solid bars to represent the quantities.

c. Histograms. Used to present the distribution of data. In statistics, the histogram is utilized to assess the "skewness" of data.

d. Frequency Table. Presents the data in summary form by aggregating them: choosing suitable non-overlapping classes, tallying (or counting) the data into these classes, and presenting them in tabular form.

e. Stem and Leaf Display. Organizes the data from the least value to the greatest value. It is constructed by splitting each data point into two parts: a stem (one or more of the leading digits) and a leaf (the remaining digits) (see Figure 2).

Figure 2. Example of a “Stem and Leaf” Graph

Weeks 5 to 6

III. Measures of Central Tendency

A measure of central tendency is an index of the central location of a distribution. It is a single value used to identify the center of the data, or the typical value.

a. Mean. The most commonly used measure of central tendency is the average (also called the mean or arithmetic mean). Simply, the mean or average of a data set is the sum of the data divided by the number of observations.

b. Mode. The value of a variable that occurs most frequently. It is also referred to as the nominal average.

c. Median. The "middle observation" when the data set is sorted (in either increasing or decreasing order); also termed the "central value of a distribution."

d. Percentiles, Quartiles. Quartiles are values that separate the sorted data into four equal groups, and percentiles are values that separate the sorted data into 100 equal groups (see the sketch below).
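As a quick illustration, the sketch below computes these measures with Python's standard statistics module (Python 3.8+ is assumed for quantiles()); the list of scores is hypothetical.

import statistics

scores = [70, 75, 75, 80, 82, 85, 88, 90, 90, 95]   # hypothetical exam scores

print(statistics.mean(scores))     # arithmetic mean: sum of the data / number of data
print(statistics.median(scores))   # middle observation of the sorted list
print(statistics.mode(scores))     # most frequent value (on ties, the first one seen)
print(statistics.quantiles(scores, n=4))         # quartiles Q1, Q2, Q3
print(statistics.quantiles(scores, n=100)[89])   # 90th percentile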

Weeks 7 to 8

IV. Measures of Variability/Variation

Resorting to a measure of central tendency, a single summary number such as the mean, is not enough to provide a clear picture of a distribution. Several lists of data may have the same mean while their spreads differ. Thus, computing other features of the data, namely measures of spread or variation, is also important. This can be done with the following:

a. Range. The difference between the largest and smallest values in the list (Formula: Range = largest (maximum) value – smallest (minimum) value).

b. Inter-quartile range. The difference between the upper and lower quartiles (Formula: IQR = Upper Quartile (Q3) – Lower Quartile (Q1)). A five-number summary consists of the smallest value, the lower quartile, the median, the upper quartile, and the largest value.

c. Mean Absolute Deviation. A measure of spread formed by taking the mean of the absolute deviations from the average.

Formula: Mean Absolute Deviation = ∑ |xᵢ – mean| / no. of observations

d. Variance. It is the mean of the squared deviations from the average.

Formula: σ² = ∑ (xᵢ – µ)² / no. of observations

Standard Deviation – it is the square root of the variance.
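The following sketch computes these measures of spread in plain Python, dividing by n as in the formulas above (population forms); the data list is hypothetical.

import statistics

data = [4, 8, 6, 5, 3, 7, 9, 5]   # hypothetical observations
n = len(data)
mean = sum(data) / n

value_range = max(data) - min(data)                 # Range = max - min
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1                                       # Inter-quartile range = Q3 - Q1
mad = sum(abs(x - mean) for x in data) / n          # Mean Absolute Deviation
variance = sum((x - mean) ** 2 for x in data) / n   # Variance (sigma squared)
std_dev = variance ** 0.5                           # Standard deviation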

Week 9

V. Probability

Probability theory, commonly known as "probability," is a branch of mathematics concerned with the analysis of random phenomena. Simply, it tells us how likely an event is to happen. The classic example for understanding probability is flipping a coin, where there are two possible outcomes: heads or tails (Basic Formula: Probability of an event P(A) = (no. of ways it can happen) / (total no. of outcomes)).

a. Permutations and Combinations. The various ways in which objects from a set may be selected, generally without replacement, to form subsets. Such a selection is called a permutation when the order of selection is a factor; otherwise, it is a combination.

Permutation: nPk = n! / (n – k)!

The symbol nPk reads "n permute k." The expression n! (read "n factorial") indicates that all the consecutive positive integers from 1 up to and including n are to be multiplied together, and 0! is defined to equal 1.

Combination: nCk = n! / (k! (n – k)!)

For combinations, k objects are selected from a set of n objects to produce subsets without ordering. The number of such subsets is denoted by nCk, read "n choose k." Since any k objects have k! arrangements, there are k! indistinguishable permutations for each choice of k objects.

The formulas for nPk and nCk are called counting formulas, since they can be used to count the number of possible permutations or combinations in a given situation without listing them all.
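A minimal sketch of the counting formulas, assuming Python 3.8+ (which provides math.perm and math.comb):

import math

n, k = 10, 3
print(math.factorial(n) // math.factorial(n - k))   # nPk = n!/(n-k)! = 720
print(math.perm(n, k))                              # built-in equivalent: 720
print(math.comb(n, k))                              # nCk = n!/(k!(n-k)!) = 120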

Week 9

VI. Normal Distribution

a. The Normal Distribution. When data are continuous, one associates the distribution with a curve rather than a histogram. There are many theoretical continuous distributions, and the normal distribution is one such continuous distribution. The normal distribution is characterized by two parameters: (1) the mean; and (2) the standard deviation of the distribution.

Normal distributions have the following features (see Figure 3):

1) a symmetric bell shape;

2) the mean and median are equal, both located at the center of the distribution;

3) about 68% of the data fall within one standard deviation of the mean;

4) about 95% fall within two standard deviations of the mean; and

5) about 99.7% fall within three standard deviations of the mean.

Figure 3. The Normal Distribution Curve
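The 68-95-99.7 percentages can be verified numerically. The sketch below is a minimal check using scipy.stats.norm (assuming SciPy is installed); the mean of 100 and standard deviation of 15 are arbitrary illustrative values.

from scipy.stats import norm

mu, sigma = 100, 15
for k in (1, 2, 3):
    # probability of falling within k standard deviations of the mean
    p = norm.cdf(mu + k * sigma, mu, sigma) - norm.cdf(mu - k * sigma, mu, sigma)
    print(f"within {k} SD: {p:.4f}")   # ~0.6827, 0.9545, 0.9973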

b. Why the normal distribution is the most important curve in statistics. One reason is that many variables are normally distributed, or at least approximately so, such as heights, weights, and examination scores. Another reason is that it is easy for statisticians to work with, because a number of inferential statistical tools are based on the assumption that the data come from normal distributions. But the most important reason is that the sample mean for large samples tends to be normally distributed regardless of the original population from which the sample values came.

c. Measure of Skewness. To assess whether a distribution is skewed or asymmetric, one may calculate a measure of shape. One method is to compute skewness as the ratio (Upper Quartile – Median) / (Median – Lower Quartile). Another measure of skewness is the difference Mean – Median, which is zero for symmetric data, positive for right-skewed data, and negative for left-skewed data.

d. Kurtosis. Measures whether the data are heavy-tailed or light-tailed relative to the normal distribution. Data sets with high (positive) kurtosis tend to have heavy tails, or outliers, while data sets with low (negative) kurtosis tend to have light tails, or a lack of outliers.

When the value of (excess) kurtosis is zero, the distribution is mesokurtic. A positive value indicates a leptokurtic distribution, and a negative value a platykurtic one.
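A sketch of these shape measures using scipy.stats (assuming SciPy is installed); note that scipy's kurtosis() reports excess kurtosis by default, so zero corresponds to mesokurtic. The data list is hypothetical.

from scipy.stats import skew, kurtosis

data = [2, 3, 3, 4, 4, 4, 5, 5, 6, 12]   # hypothetical right-skewed list
print(skew(data))       # positive here, indicating right (positive) skew
print(kurtosis(data))   # positive here, indicating a leptokurtic (heavy-tailed) shape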

Weeks 11 to 12

VII. Regression and Correlation

a. Pearson Product-Moment Correlation (r). Two variables are said to be associated if knowing the value of one of them tells us something about the value of the other. The degree to which two variables are associated is measured by a quantity known as the correlation coefficient. Two variables that are positively correlated tend to move together in the same direction (x+, y+). On the other hand, two variables are negatively correlated if their values tend to move in opposite directions (x+, y–) (see Figure 4 for the formula).

Figure 4. Pearson Product-Moment Correlation Formula
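A minimal sketch computing r, and a p-value for its significance (anticipating the next subsection), with scipy.stats.pearsonr; the x and y lists are hypothetical figures, e.g. advertising spend and sales.

from scipy.stats import pearsonr

x = [10, 12, 15, 17, 20, 22, 25]   # hypothetical advertising spend
y = [30, 34, 40, 44, 52, 55, 60]   # hypothetical sales

r, p_value = pearsonr(x, y)
print(r, p_value)   # r near +1 indicates a strong positive correlation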

b. Test of Significance of Correlation Coefficient.

b.1. Using a Table of Critical Values. The table of 95% critical values of the sample correlation coefficient can be used to give a good idea of whether the computed value of r is significant. Compare r to the appropriate critical value in the table: if r is not between the positive and negative critical values, then the correlation coefficient is significant, and one may want to use the line for prediction.

Process: Compare the computed r to the critical values associated with the degrees of freedom (df = n – 2). If r < the negative critical value or r > the positive critical value, then r is significant; otherwise, it is not.

URL link for the table of critical values:

https://www.statisticssolutions.com/table-of-critical-values-pearson-correlation/

c. Simple Linear Regression Analysis. When the points in a scatterplot generally cluster about a line, it is of interest to estimate that line in order to estimate the expected level of the continuous dependent variable for a known specific value of the independent variable. This statistical tool is called simple linear regression.

Formula: Ŷ = â + b̂X

b̂ = (n ∑ XᵢYᵢ – ∑ Xᵢ ∑ Yᵢ) / (n ∑ Xᵢ² – (∑ Xᵢ)²) = r (Sy / Sx)

â = ∑ Yᵢ / n – b̂ (∑ Xᵢ / n) = Ȳ – b̂ X̄
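A sketch of these formulas in plain Python; x and y are hypothetical paired observations.

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

sx, sy = sum(x), sum(y)
sxy = sum(xi * yi for xi, yi in zip(x, y))
sxx = sum(xi ** 2 for xi in x)

b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)   # slope b-hat
a = sy / n - b * (sx / n)                       # intercept a-hat = Ybar - b*Xbar
print(f"Y-hat = {a:.3f} + {b:.3f} X")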

d. Sum of Squares. A statistical tool used to identify the dispersion of the data, as well as how well the data fit the model, in regression analysis. It is one of the most important outputs of regression analysis. The general rule is that a smaller residual sum of squares indicates a better model, as there is less unexplained variation in the data.

Types of Sum of Squares

1. Total Sum of Squares. Measures the variation of the values of the dependent variable around the sample mean of the dependent variable.

Formula: TSS = ∑ (yᵢ – ȳ)² = SSR + SSE

2. Regression Sum of Squares. Describes how well the regression model represents the modeled data. A higher regression sum of squares, relative to the total, indicates that the model explains more of the variation in the data.

Formula: SSR = ∑ (ŷᵢ – ȳ)²

3. Residual Sum of Squares (Sum of Squared Errors). Measures the variation of the modeling errors; in other words, it depicts how much of the variation in the dependent variable cannot be explained by the regression model. Generally, a lower residual sum of squares indicates that the model explains the data well, while a higher residual sum of squares indicates that it explains the data poorly.

Formula: SSE = ∑ (yᵢ – ŷᵢ)²

4. The Standard Error of Estimate. Measures the accuracy of prediction.

Formula: σest = √( ∑ (yᵢ – ŷᵢ)² / n )

5. Coefficient of Determination. Measures the proportion of the variance in the dependent variable that is predictable from the independent variable. Values of 1 or 0 indicate that the regression line represents all or none of the variation in the data, respectively. A higher coefficient indicates a better goodness of fit for the observations.

Formula: r² = SSR / TSS, the square of the correlation coefficient, often expressed as a percentage
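The sketch below is a self-contained illustration, on hypothetical data, of the quantities defined above: it fits a least-squares line and then computes TSS, SSR, SSE, r², and the standard error of estimate.

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

# fit the least-squares line (same formulas as in the regression section)
sx, sy = sum(x), sum(y)
sxy = sum(xi * yi for xi, yi in zip(x, y))
sxx = sum(xi ** 2 for xi in x)
b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
a = sy / n - b * (sx / n)

y_hat = [a + b * xi for xi in x]
y_bar = sy / n

tss = sum((yi - y_bar) ** 2 for yi in y)                # total sum of squares
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # regression sum of squares
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # residual sum of squares

print(abs(tss - (ssr + sse)) < 1e-9)   # TSS = SSR + SSE holds
print(ssr / tss)                       # coefficient of determination r^2
print((sse / n) ** 0.5)                # standard error of estimate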

Weeks 13 to 14

VIII. Hypothesis Testing

a. Hypothesis Test: Basic Idea. Statistical hypothesis testing involves two competing claims (statements regarding a population parameter of interest) and making a decision to accept one of these claims on the basis of evidence, and of the uncertainty in that evidence. Hypothesis testing is subject to errors, the chances of which one would like to keep small.

1. Hypothesis Testing Using the p-value. This value represents the chance of generating a value as extreme as the observed value of the test statistic, or something more extreme, if the null hypothesis is true. One may use the p-value in combination with the level of significance to decide whether or not to reject the null hypothesis. If the p-value is less than the level of significance, usually 5% (p < 0.05), then one may reject the null hypothesis; in that case, the result is statistically significant at the 5% level.

2. Confidence Interval: Basic Idea. These are viewed as interval estimates for the population mean, as they provide a band (range) of values within which one is confident that the true value of the population mean lies. Such interval estimates also give us a sense of the precision as well as the accuracy of "point estimates." Whenever interval estimates are generated, the confidence level should be stated. Confidence levels, like chances, are between 0 and 100%. Commonly used values of the confidence level are 67%, 90%, 95%, and 99%.
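A minimal sketch of a 95% confidence interval for a population mean, using the t distribution from scipy.stats (assuming SciPy is installed); the sample is hypothetical.

from scipy.stats import t
import statistics

sample = [23, 25, 28, 30, 26, 27, 24, 29]   # hypothetical sample
n = len(sample)
mean = statistics.mean(sample)
se = statistics.stdev(sample) / n ** 0.5    # standard error of the mean
margin = t.ppf(0.975, df=n - 1) * se        # half-width at 95% confidence
print(f"95% CI: ({mean - margin:.2f}, {mean + margin:.2f})")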

3. One-Sample t Test. A statistical procedure used to determine whether a sample of observations could have been generated by a process with a specific mean (steps will be discussed in the class lecture).
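As a preview of those steps, a minimal sketch with scipy.stats.ttest_1samp, testing whether a hypothetical sample could come from a process with mean 25:

from scipy.stats import ttest_1samp

sample = [23, 25, 28, 30, 26, 27, 24, 29]   # hypothetical sample
t_stat, p_value = ttest_1samp(sample, popmean=25)
print(t_stat, p_value)   # reject H0 at the 5% level if p_value < 0.05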

4. Z Test for a Proportion. A statistical procedure used when the sample proportion has a distribution that can be readily approximated by a normal curve. The Z statistic can also be used for situations involving binary classification and counts of the binary classes (steps will be discussed in the class lecture).
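A minimal sketch of the normal-approximation computation behind a one-sample proportion test; the counts and the hypothesized proportion p0 are hypothetical.

from scipy.stats import norm

successes, n, p0 = 60, 100, 0.5
p_hat = successes / n
z = (p_hat - p0) / (p0 * (1 - p0) / n) ** 0.5   # z statistic under H0: p = p0
p_value = 2 * (1 - norm.cdf(abs(z)))            # two-sided p-value
print(z, p_value)   # here z = 2.0, p is about 0.0455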

Weeks 15 to 16

IX. Analysis of Variance (ANOVA)

a. ANOVA. Used when one would like to decide whether observed differences among more than two sample means can be attributed to chance, or whether they reflect actual differences among the means of the populations from which the data are sampled. One-way ANOVA is one statistical procedure for analyzing differences among sample means; the other is two-way ANOVA, which is used to compare mean differences between groups that have been split on two independent variables (steps will be discussed in the class lecture).
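As a preview, a minimal sketch of one-way ANOVA with scipy.stats.f_oneway; the three groups are hypothetical samples, e.g. sales under three different promotions.

from scipy.stats import f_oneway

group_a = [20, 22, 19, 24, 25]
group_b = [28, 30, 27, 26, 29]
group_c = [18, 20, 22, 19, 21]

f_stat, p_value = f_oneway(group_a, group_b, group_c)
print(f_stat, p_value)   # a small p-value suggests at least one group mean differs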

Weeks 17 to 18

X. Chi-square Tests

a. Chi-square Statistic. A summary measure of how far the observed counts in each category are from their expected values. One advantage of the test is that it may be employed to determine the independence of variables. It may also be used to determine the goodness of fit to a hypothesized distribution (steps will be discussed in the class lecture).
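As a preview, a minimal sketch of a chi-square test of independence on a hypothetical 2x2 contingency table, using scipy.stats.chi2_contingency:

from scipy.stats import chi2_contingency

observed = [[30, 10],   # hypothetical counts, e.g. rows: male / female
            [20, 40]]   # columns: prefers product A / product B
chi2, p_value, dof, expected = chi2_contingency(observed)
print(chi2, p_value)    # a small p-value suggests the variables are not independent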

