Bus Stat. 11
CLO 2: Recognize and apply the basic definitions and rules of probability theory
CLO 3: Read and interpret the results of linear correlation and regression
CLO 4: Independently solve mathematical problems applying computational skills and assessing the results.
Course Coverage
1. Introduction to Statistics
1.1. Definition of Statistical Terms
1.2. Levels of Measurements
2. Organization of Data and Sampling Methods
2.1. Methods of Data Collection
2.2. Sampling
2.3. Methods of Data Presentation
3. Measures of Central Tendency
3.1. Ungrouped Data
3.2. Grouped Data
4. Measures of Location of Data
4.3. Percentile, Quartile, Interquartile Range, and Decile
5. Measures of Dispersion
5.1. Ungrouped Data
5.2. Grouped Data
6. Hypothesis Testing
6.1. Basic Concepts of Statistical Hypothesis Testing
6.2. ANOVA
6.3. Chi-Square
7. Linear Correlation and Regression
7.1. Coefficient of correlation
7.2. Testing the Significance of the Correlation Coefficient
7.3. Linear Regression
7.4. Using Regression to Develop a Forecasting Trend Line
_______________________________________________________________________________________________________________________________________
_______________________________________________________________________________________________________________________________________
BUSINESS STATISTICS ERIC P. SUPANGA
1st Semester, S.Y. 2020-2021 Instructor
Bachelor of Business Administration Cp#: 09752410538
[email protected]
Page 1 of 48
Module 1
Statistics is a branch of mathematics that deals with the collection, organization, presentation, analysis,
and interpretation of data with the purpose of describing and drawing inferences about the numerical properties
of a population.
In statistics, a population is not only a group of people; it is any defined group or aggregate of objects,
animals, materials, measurements, “things”, “events”, or “happenings” of any kind. It is the collection of
all possible individuals, objects, or measurements of interest. Thus, a sack of rice, a whole pizza pie, or a set of
weights and heights can each be considered a population.
Since it would be impractical to study the whole population, as in the case of a sack of rice, it is
necessary to take just a sample of the population. Thus, a handful of rice is a sample of the population in a sack
of rice. A sample is defined as any subgroup of the population drawn by some appropriate method from the
population. It should be representative of the population; that is, the sample should show the properties of the
population.
Types of Variables
1. A qualitative variable is obtained from a qualitative population. When the characteristic or variable
being studied is nonnumeric, it is called a qualitative variable or an attribute. Examples: civil status, gender,
hair colour, etc.
2. A quantitative variable is one whose values can be reported numerically. Examples:
age, scores, height, length, weight, and anything else that can be quantified.
a. Discrete variables can assume only certain values, and there are usually gaps between the values.
Examples: the number of chairs in a room, the number of students in a class, the number of employees
in an office, etc.
b. Continuous variables can assume any value within a specific range. Examples: the weight of a
shipment of apples, the length of a lawn, the height of a man, etc.
1. Nominal level: In the nominal level of measurement, the observations can only be classified or counted.
There is no particular order to the labels. Examples: the number on the back of a basketball player's jersey,
which helps the referee identify the particular player; gender; civil status; etc.
2. Ordinal level: In the ordinal level of measurement, data categories are ranked or ordered.
Examples: the rating students give to a professor during the evaluation; the honors
given to students during graduation (first honor, second honor, etc.).
3. Interval level: The interval level of measurement includes all the characteristics of the ordinal
level, but in addition, the difference between values is a constant size. Examples: scores of students in
an examination, IQ scores, etc.
4. Ratio level: The ratio level is the highest level of measurement. It has all the characteristics
of the interval level, but in addition, the zero point is meaningful and the ratio between two numbers is
meaningful. Examples are wages, height, and weight. Money is a good example because having no
money corresponds to a true zero.
Module 1
Name __________________________________________________________
Exercise 1.1
Module 2
Organization of Data and Sampling Methods
Apply the basic statistical concepts and principles in the collection of data.
Frequency Distribution. Refers to the grouping of data into categories showing the number of
observations in each mutually exclusive category. It is a summary of data presented in the form of class intervals
and frequencies.
Example 1:
2 5 7 8 10
2 5 7 8 10
3 5 7 8 10
3 5 7 8 10
4 6 7 9 10
4 6 8 9 10
4 6 8 9 11
4 6 8 9 12
5 6 8 9 12
5 6 8 9 12
5 6 8 9 12
5 7 8 9 12
Steps in Organizing Data into a Frequency Distribution
Step 1. Determine the range.
The range is the difference between the highest and the lowest number in a set of data. For the data above,
the range is 12 - 2 = 10.
Step 2. Determine the number of classes the distribution will contain. One rule of thumb is to select between 5 and 15 classes.
To approximate the class width or size, divide the range by the desired number of classes.
10/5 = 2; normally, the class size is rounded to the nearest whole number.
Step 3. Tally. The following table summarizes the raw data or the ungrouped data into a frequency distribution.
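The three steps above can be traced on the 60 observations of Example 1. The following Python sketch is only an illustration: the variable names are hypothetical, and it assumes whole-number class limits with the first class starting at the lowest value.

```python
# Raw data from Example 1 (60 observations), copied row by row.
rows = """
2 5 7 8 10
2 5 7 8 10
3 5 7 8 10
3 5 7 8 10
4 6 7 9 10
4 6 8 9 10
4 6 8 9 11
4 6 8 9 12
5 6 8 9 12
5 6 8 9 12
5 6 8 9 12
5 7 8 9 12
"""
data = [int(value) for value in rows.split()]

# Step 1: range = highest value - lowest value.
low, high = min(data), max(data)
data_range = high - low

# Step 2: approximate class width = range / desired number of classes.
width = round(data_range / 5)

# Step 3: tally the observations that fall in each class.
# Assumption: integer class limits, first class starts at the lowest value.
freq = {}
lower = low
while lower <= high:
    upper = lower + width - 1
    freq[(lower, upper)] = sum(lower <= x <= upper for x in data)
    lower += width

for (lo, hi), f in freq.items():
    print(f"{lo} - {hi}: {f}")
```

With a width of 2 and a first class of 2 - 3, six classes are needed to reach the highest value, 12.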
Class Midpoint. The midpoint of each class, sometimes called the class mark. It is the value halfway
across the class interval and can be calculated as the average of the two class endpoints.
Example: The class mark of the class 2 - 3 is (2 + 3) / 2 = 2.5; the class mark of the class 4 - 5 is (4 + 5) / 2 = 4.5,
etc.
Relative Frequency is the proportion of the total frequency that falls in any given class interval of a frequency distribution.
Example: the relative frequency of a class with frequency 4 is 4/60 = 0.0667; the relative frequency of a class with
frequency 12 is 12/60 = 0.2000, etc.
Table 1.4 Example of class mark, relative frequency, and cumulative frequency
The cumulative frequency is a running total of frequencies through the classes of a frequency
distribution. Example: based on Table 1.4, the cumulative frequencies are 4; 4 + 12 = 16; 16 + 13 = 29; 29 + 19 =
48; 48 + 7 = 55; 55 + 5 = 60.
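As a rough check of the three definitions above, the sketch below computes class marks, relative frequencies, and cumulative frequencies in Python. The frequencies 4, 12, 13, 19, 7, 5 come from the cumulative-frequency example; the class limits beyond 2 - 3 and 4 - 5 are an assumption that continues the same width of 2.

```python
from itertools import accumulate

# Assumed class limits (width 2, starting at 2) and the frequencies
# used in the cumulative-frequency example.
classes = [(2, 3), (4, 5), (6, 7), (8, 9), (10, 11), (12, 13)]
freqs = [4, 12, 13, 19, 7, 5]

total = sum(freqs)                                   # total frequency
class_marks = [(lo + hi) / 2 for lo, hi in classes]  # average of the endpoints
relative = [f / total for f in freqs]                # proportion of the total
cumulative = list(accumulate(freqs))                 # running total

for (lo, hi), m, f, r, c in zip(classes, class_marks, freqs, relative, cumulative):
    print(f"{lo:>2} - {hi:<2}  M={m:<5} f={f:<3} rel={r:.4f}  cum={c}")
```

The last cumulative frequency always equals the total frequency, which is a quick way to catch a tallying error.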
Sampling Method
Sampling is widely used in business as a means of gathering useful information about a population.
Data are gathered from a sample, and conclusions are drawn about the population as part of the inferential statistics
process. The sample size is determined by the formula:
\[ n = \frac{N}{1 + N e^{2}} \]
Where
N = total population
n = sample size
e = margin of error
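Example: Given the following data, determine the sample size. The population (N) is 1,000 and the margin of error (e) is 5%.
\[ n = \frac{1{,}000}{1 + 1{,}000(0.05)^{2}} = \frac{1{,}000}{1 + 2.5} = \frac{1{,}000}{3.5} \approx 285.71 \]
n = 286 (rounded up)
Therefore, the sample size is 286. This means that 286 units will be drawn from the population of 1,000.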
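The sample-size computation above can be reproduced with a few lines of Python. This is a minimal sketch: the function name `sample_size` is hypothetical, and rounding up to the next whole unit is an assumption that matches the worked example.

```python
import math


def sample_size(population, margin_of_error):
    """Sample size n = N / (1 + N * e^2), rounded up to a whole unit."""
    n = population / (1 + population * margin_of_error ** 2)
    return math.ceil(n)  # assumption: always round up, as in the example


n = sample_size(1000, 0.05)
print(n)
```

A smaller margin of error increases the denominator more slowly, so the required sample grows; for instance, the same population with a 3% margin needs a larger sample than with 5%.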
a. Random sampling
b. Nonrandom sampling
a. Random sampling. In random sampling, every unit of the population has the same probability of being
selected into the sample.
1. Simple random sampling. It can be viewed as the basis for the other sampling techniques. With simple
random sampling, each unit of the frame is numbered from 1 to N (where N is the size of the
population). Then a table of random numbers or a random number generator is used to determine the
units to be included in the sample.
2. Stratified random sampling. A stratified random sampling method first divides the population
into homogeneous subgroups, called strata, from which simple random samples are then drawn.
3. Systematic random sampling. A random sampling method whereby every Kth item is
selected to produce a sample of size n from a population of size N.
Determining the value of K
K = N/n
Where
N = population size
n = sample size
K = size of interval for selection
4. Cluster (or area) random sampling. Cluster (or area) sampling involves dividing the
population into non-overlapping areas, or clusters.
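Of the random methods listed above, systematic sampling is the easiest to sketch in code because it follows directly from K = N/n. The Python example below is illustrative only: the function name and the `start` parameter are hypothetical, K is truncated to a whole number, and choosing the starting unit at random within the first interval is a common convention assumed here, not something the module states.

```python
import random


def systematic_sample(population, n, start=None):
    """Select every K-th unit, with K = N // n, from a 1-based start position."""
    N = len(population)
    k = N // n  # size of the selection interval (K = N / n, truncated)
    if start is None:
        # Assumption: random starting unit within the first interval 1..K.
        start = random.randint(1, k)
    return [population[start - 1 + j * k] for j in range(n)]


units = list(range(1, 101))           # a frame numbered 1 to N = 100
sample = systematic_sample(units, 10, start=3)
print(sample)                         # every 10th unit beginning with unit 3
```

Passing `start` explicitly makes the selection reproducible; leaving it out gives a different systematic sample on each run.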
b. Nonrandom sampling. Sampling techniques that select elements from the population by any
mechanism that does not involve a random selection process are called nonrandom sampling. The following
are the nonrandom sampling techniques.
1. In convenience sampling, elements for the sample are selected for the convenience of the
researcher. For example, a convenience sample of homes for door-to-door interviews might include
houses where people are at home, houses near the street, first-floor apartments, houses with
friendly people, etc. Using the telephone directory to gauge the popularity of the president of the
country is also an example of convenience sampling.
2. Judgment sampling or purposive sampling occurs when elements selected for the sample are
chosen by the judgment of the researcher. For example, when a researcher is studying the
activities of the retired employees of a certain company, judgment or purposive sampling is
needed because the subjects of the study must be people who have retired.
3. Quota sampling is similar to stratified sampling, except that instead of
randomly sampling from each stratum, the researcher uses a nonrandom sampling method to
gather data from a stratum until the desired quota of samples is filled. For example, suppose a
researcher stratifies the population into owners of different types of cars. The researcher then
interviews car owners until the quota for each brand is filled.
4. Snowball sampling. In snowball sampling, subjects are selected through referrals from other
survey respondents. The researcher identifies a person who fits the profile of subjects wanted
for the study. The researcher then asks this person for the names and locations of others who
would also fit the profile of the subjects wanted for the study.
Module 2
Name __________________________________________________________
1. The following data represent the number of passengers per flight in a sample of 50 flights from Legaspi
to Manila and then to Puerto Princesa. (Use 5 classes.)
23 34 66 67 13 58 19 17 65 17
25 20 47 28 16 38 44 29 48 29
69 34 35 60 37 52 80 59 51 33
48 46 23 38 52 50 17 57 41 77
45 47 49 19 32 64 27 61 70 19
57 23 35 18 21
26 51 47 29 21
46 43 29 23 39
50 41 19 36 28
31 42 52 29 18
28 46 33 28 20
Sampling Method
Exercise 2.2.
1. For each of the following research problems, determine what sampling method/s should be used.
___________________a. A city wide study of motels and hotels is being conducted.
___________________b. A study of consumer’s attitude and behavior.
___________________c. A researcher would like to determine the popularity of a candidate
___________________d. A study of the retired employees of private educational institutions.
2. For each of the following research problems, list some strata into which the variable can be divided.
___________________a. Age of the respondent
___________________b. Size of the company (sales volume)
___________________c. Geographic location
___________________d. Occupation of the respondents
___________________e. Types of business
3. For each of the following research projects, list at least one area or cluster that could be used in obtaining the
sample.
_________________________ A study of road conditions in the city
_________________________ A study of the effects of the cement factory on the place
Module 3
Objective
Definition. A measure of central tendency is a single value that summarizes a set of data. Measures of central
tendency yield information about the center, or middle part, of a group of numbers.
Mean
The arithmetic mean is the average of a group of numbers and is computed by summing all numbers and
dividing by the number of values. Because the arithmetic mean is so widely used, most statisticians refer to it
simply as the mean.
Population mean, formula (1):
\[ \mu = \frac{\sum x}{N} \]
Where:
μ represents the population mean. It is the Greek lowercase letter “mu”.
∑ is the Greek capital letter “sigma” and indicates the operation of adding.
∑x is the sum of the x values.
N is the number of values in the population.
Example 1. There are 12 automobile companies in Albay. Listed below is the number of patents granted by the
government to each Automobile company.
Solution:
This is a population because we are considering all the automobile companies of Albay that obtained patents. To
obtain the mean, we get the total number of patents granted and divide by the number of companies in Albay.
Using formula (1), we have,
How do we interpret the value of 195? The average number of patents received by an automobile company is
195. Because we consider all the companies receiving patents, this value is a population parameter.
Sample Mean
The mean is the sum of all the values divided by the total number of values.
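Sample mean:
\[ \bar{x} = \frac{\sum x}{n} \]
Where
x̄ = sample mean; ∑x = sum of the sample values; n = number of values in the sample
What is the mean interest rate on this sample of long-term bonds?
\[ \bar{x} = \frac{\sum x}{n} = \frac{9.50 + 7.25 + 6.50 + 4.75 + 12.00 + 8.30}{6} = \frac{48.3}{6} = 8.05 \]
The mean interest rate of the sample of long-term bonds is 8.05.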
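The same computation can be checked in Python. This is a small sketch using the six bond interest rates from the worked solution; the variable names are illustrative.

```python
# Interest rates of the six long-term bonds in the sample.
rates = [9.50, 7.25, 6.50, 4.75, 12.00, 8.30]

total = sum(rates)        # sum of the x values
n = len(rates)            # number of values in the sample
x_bar = total / n         # sample mean = sum of x / n

print(f"sum = {total:.2f}, n = {n}, mean = {x_bar:.2f}")
```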
The mean is affected by each and every value, which is an advantage. It is also a disadvantage,
because an extremely large or small value can pull the mean toward the extreme value.
The mean is the most commonly used measure of central tendency because it uses every data item in its computation, it
is a familiar measure, and it has mathematical properties that make it attractive to use in inferential statistical
analysis.
Mode
The mode is the most frequently occurring value in a set of data. If there are two modes in the set of
data, the data are said to be bimodal. A data set with more than two modes is referred to as multimodal.
Example 1. Data set: 15, 11, 14, 3, 21, 17, 22, 16, 19, 16, 19, 16, 5, 7, 16, 8, 9, 20, 4
Solution: 16 is the mode because it occurs most often in the data set (four times).
Example 2. Data set: 15, 11, 14, 3, 21, 17, 22, 16, 19, 22, 16, 5, 22, 7, 16, 8, 9, 20, 4
Solution: The data set is bimodal because 16 and 22 occur the same number of times; each appears
three times in the data set.
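Both mode examples can be verified with Python's standard `statistics` module. The sketch below uses `statistics.multimode`, which returns every value tied for the highest count, so it handles the unimodal and bimodal cases alike; the variable names are illustrative.

```python
from collections import Counter
from statistics import multimode

example_1 = [15, 11, 14, 3, 21, 17, 22, 16, 19, 16, 19, 16, 5, 7, 16, 8, 9, 20, 4]
example_2 = [15, 11, 14, 3, 21, 17, 22, 16, 19, 22, 16, 5, 22, 7, 16, 8, 9, 20, 4]

# multimode() returns a list of all most-frequent values.
modes_1 = sorted(multimode(example_1))
modes_2 = sorted(multimode(example_2))

# Counter shows how often each mode occurs.
count_2 = Counter(example_2)

print(modes_1)                                  # one mode
print(modes_2, [count_2[m] for m in modes_2])   # two modes, same count
```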
Median
Median is the middle value in an ordered array of numbers. The following steps are used to determine the
median.
Step 1. Arrange the observations in an ordered array.
Step 2. For an odd number of terms, find the middle term of the ordered array. It is the median.
Step 3. For an even number of terms, find the average of the two middle terms. The average is the median.
Example 1. Suppose a business researcher wants to determine the median for the following numbers:
15, 11, 14, 3, 21, 17, 22, 16, 19, 16, 5, 7, 16, 8, 9, 20, 4
Step 1. Arrange the numbers in an ordered array:
3, 4, 5, 7, 8, 9, 11, 14, 15, 16, 16, 16, 17, 19, 20, 21, 22
Step 2. Since the array contains 17 terms (an odd number of terms), the median is the middle number, or 15.
Step 3. If the number 22 is removed from the data set, the array would contain only 16 terms.
3, 4, 5, 7, 8, 9, 11, 14, 15, 16, 16, 16, 17, 19, 20, 21
Step 4. Now, for an even number of terms, the statistician determines the median by getting the average of the
two middle values, 14 and 15. The resulting median is (14 + 15)/2 = 14.5.
Note: Another way to locate the median is to find the (n + 1)/2th term in an ordered array.
For example, for the 17-term data set above, (n + 1)/2 = 18/2 = 9; that is, the 9th term is the median. The
median is 15.
3, 4, 5, 7, 8, 9, 11, 14, (15), 16, 16, 16, 17, 19, 20, 21, 22
If there is an even number of terms, as in the 16-term array, the median location is (16 + 1)/2 = 8.5; the median is
located halfway between the 8th and 9th terms, or the average of 14 and 15. Thus the median is (14 + 15)/2 = 14.5.
3, 4, 5, 7, 8, 9, 11, 14, 15, 16, 16, 16, 17, 19, 20, 21
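The odd and even cases of the median example can be reproduced in Python. This sketch uses `statistics.median` from the standard library and also computes the (n + 1)/2 location from the note; the variable names are illustrative.

```python
from statistics import median

data = [15, 11, 14, 3, 21, 17, 22, 16, 19, 16, 5, 7, 16, 8, 9, 20, 4]

ordered = sorted(data)                    # ordered array of 17 terms
location_odd = (len(ordered) + 1) / 2     # (n + 1)/2 = 9th term
median_odd = median(ordered)

# Remove 22 to obtain an even number of terms (16).
ordered_even = [x for x in ordered if x != 22]
location_even = (len(ordered_even) + 1) / 2   # 8.5: halfway between 8th and 9th
median_even = median(ordered_even)

print(location_odd, median_odd)
print(location_even, median_even)
```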
Population Mean
\[ \mu_{grouped} = \frac{\sum fM}{N} = \frac{\sum fM}{\sum f} = \frac{f_1 M_1 + f_2 M_2 + \dots + f_n M_n}{f_1 + f_2 + \dots + f_n} \]
Where
f = class frequency
N = total frequency
M = class mark
Example 1:
Step 3. Get the sum of the results from Step 2 (the summation of fM).
Step 4. Determine the mean by dividing ∑fM by ∑f: 564/60 = 9.4. Hence, μ = 9.4.
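The grouped mean of Example 1 can be recomputed with the short Python sketch below. The class intervals and frequencies are taken from the grouped-data table used later in this module for the population variance (classes 1 – 3 through 16 – 18), which is an assumption about which table Example 1 refers to; it reproduces ∑fM = 564 and μ = 9.4.

```python
# Assumed grouped data: (lower limit, upper limit, frequency).
table = [(1, 3, 4), (4, 6, 12), (7, 9, 13), (10, 12, 19), (13, 15, 7), (16, 18, 5)]

sum_f = 0     # total frequency, N
sum_fm = 0.0  # summation of f * M
for lower, upper, f in table:
    m = (lower + upper) / 2   # class mark M
    sum_f += f
    sum_fm += f * m           # Step 2: multiply class mark by frequency

mu = sum_fm / sum_f           # Step 4: divide the sum of fM by the sum of f
print(sum_f, sum_fm, mu)
```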
Sample MEAN
Example 2.
(1)              (2)         (3)   (4)
Class Interval   Frequency   M     fM
10 – 14          6           12    72
15 – 19          22          17    374
20 – 24          35          22    770
25 – 29          29          27    783
30 – 34          16          32    512
35 – 39          8           37    296
40 – 44          4           42    168
45 – 49          2           47    94
Total            ∑f = 122          ∑fM = 3,069
Sample mean: x̄ = ∑fM / ∑f = 3,069 / 122 ≈ 25.16
Formula: \[ Md = l_{md} + \left[ \frac{\frac{N}{2} - cf_p}{f_{med}} \right] (i) \]
cf_p = cumulative total of the frequencies up to but not including the frequency of the median class
Step 3: Determine the cumulative total of the frequencies up to but not including the frequency of the median
class (29).
Step 6: Substitute the values obtained from Steps 1 to 5 into the formula:
\[ Md = 9.5 + \left[ \frac{30 - 29}{19} \right](3) = 9.5 + \frac{3}{19} = 9.5 + 0.16 = 9.66 \]
Hence, Md = 9.66
Step 2. Determine N/2; the cumulative frequency up to but not including the frequency of the median class; the
lower boundary of the median class; the frequency of the median class; and the class size.
N/2 = 122/2 = 61
cf_p = 28
f_med = 35
l_md = 19.5
i = 5
\[ Md = l_{md} + \left[ \frac{\frac{N}{2} - cf_p}{f_{med}} \right] (i) \]
\[ Md = 19.5 + \left[ \frac{61 - 28}{35} \right](5) = 19.5 + \frac{33}{35}(5) = 19.5 + 4.71 = 24.21 \]
The median is 24.21.
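Both grouped-median computations above follow the same substitution, so they can be checked with one small Python function. This is a sketch; the function and parameter names are hypothetical labels for the symbols in the formula.

```python
def grouped_median(lower_boundary, total_freq, cf_before, f_median, class_size):
    """Md = l_md + ((N/2 - cf_p) / f_med) * i for grouped data."""
    return lower_boundary + (total_freq / 2 - cf_before) / f_median * class_size


# First example: l_md = 9.5, N = 60, cf_p = 29, f_med = 19, i = 3.
md_1 = grouped_median(9.5, 60, 29, 19, 3)

# Second example: l_md = 19.5, N = 122, cf_p = 28, f_med = 35, i = 5.
md_2 = grouped_median(19.5, 122, 28, 35, 5)

print(round(md_1, 2), round(md_2, 2))
```

The median class is the first class whose cumulative frequency reaches N/2; the function assumes that class has already been identified.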
Module 3
Name __________________________________________________________
Exercise 3.1
Exercise 3.2
Compute and interpret the different measures of central tendency. (Show your complete solution.)
1. A sample of households that subscribe to the United Bell Phone Company revealed the following
numbers of calls received last week.
52 43 30 38 30 42 12 46
34 46 32 18 41 5 39 37
2. The following data represent the number of passengers per flight in a sample of 50 flights from Legaspi
City to Manila.
23 46 66 67 13 58 19 17 65 17
25 20 47 28 16 38 44 29 48 29
69 34 35 60 37 52 80 59 51 33
48 46 23 38 52 50 17 57 41 77
45 47 49 19 32 64 27 61 70 19
Exercise 3.3
Compute and interpret the different measures of central tendency. (Show your complete solution.)
1. The air transport association recorded the following numbers of passengers arriving at and departing from the
busiest airport in Metro Manila. The following frequency distribution has been constructed.
Module 4
Objective:
Lesson 1.
\[ i = \frac{P}{100}(N) \]
Where
i = percentile location
P = percentile of interest
N = number of values in the data set
a. If i is a whole number, the Pth percentile is the average of the value at the ith location and the value at the
(i + 1)th location.
b. If i is not a whole number, the Pth percentile value is located at the whole-number part of i + 1.
First Quartile (Q1) separates the lowest one-fourth of the data from the upper three-fourths and is
equal to the 25th percentile; it is the value below which 25 percent of the observations occur. The Second Quartile,
or Q2 (the median), is the value below which 50 percent of the observations occur, and the Third Quartile,
labelled Q3, is the value below which 75 percent of the observations occur.
Deciles divide a set of observations into 10 equal parts. The deciles, labelled D1, D2, D3, . . . , D9, are the
values below which 10, 20, 30, . . . , 90 percent of the observations occur, respectively.
Percentiles divide the observations into 100 equal parts (P1, P2, P3, . . . , P99).
Determine the location of quartiles, deciles, and percentiles in terms of the location of the percentiles.
Where: i = location
P/100 = percentile
Quartiles
Step 1. Arrange the observations or data from the smallest to the largest or from the lowest to the highest value.
Step 2. Determine the location of the first quartile using the formula:
Q2 = median = (116 + 121)/2 = 118.5
c. The value of Q3 is determined by P75 as follows:
\[ i = \frac{75}{100}(8) = 6 \]
Because i is a whole number, P75 is the average of the 6th and the 7th numbers.
Example 2: The following shows the top 16 global marketing categories for advertising spending for a
recent year according to Advertising Age. Spending is given in millions of pesos. Determine the first,
second, and third quartiles for these data.
Category Ad Spending
Automobile 22,195
Personal Care 19,526
Entertainment & Media 9,538
Food 7,793
Drug 7,707
Electronics 4,023
Soft Drinks 3,916
Retail 3,576
Cleaners 3,571
Restaurants 3,553
Computers 3,247
Telephone 2,488
Financial 2,433
Beer, Wine & Liquor 2,050
Candy 1,137
Toys 699
Solution
\[ i = \frac{25}{100}(16) = 4 \]
Because i is a whole number, Q1 is the average of the 4th and 5th values from the bottom (counting from the
lowest value of the observations).
\[ Q_1 = \frac{2{,}433 + 2{,}488}{2} = 2{,}460.5 \]
Q2 = P50 = median; with 16 items, the median is the average of the 8th and 9th values: Q2 = (3,571 + 3,576)/2 = 3,573.5
Q3 = P75; i = (75/100)(16) = 12, so Q3 is the average of the 12th and 13th values: Q3 = (7,707 + 7,793)/2 = 7,750
Therefore, the values of the first, second, and third quartiles are 2,460.5, 3,573.5, and 7,750, respectively.
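The location rule used in this example (average two neighbours when i is a whole number, otherwise take the value at the whole-number part of i + 1) can be written as a short Python function. This is a sketch of the module's rule only; the function name is hypothetical, and statistical software often uses different interpolation rules, so its results may differ.

```python
def percentile(values, p):
    """P-th percentile by the location rule i = (P/100) * N."""
    ordered = sorted(values)           # arrange from lowest to highest
    n = len(ordered)
    if (p * n) % 100 == 0:             # i is a whole number
        i = p * n // 100
        return (ordered[i - 1] + ordered[i]) / 2   # average of i-th and (i+1)-th
    i = p * n // 100 + 1               # whole-number part of i, plus 1
    return ordered[i - 1]


# Ad spending for the 16 categories in Example 2 (millions of pesos).
ad_spending = [22195, 19526, 9538, 7793, 7707, 4023, 3916, 3576,
               3571, 3553, 3247, 2488, 2433, 2050, 1137, 699]

q1 = percentile(ad_spending, 25)
q2 = percentile(ad_spending, 50)
q3 = percentile(ad_spending, 75)
print(q1, q2, q3)
```

Deciles use the same function with P = 10, 20, . . . , 90, and the interquartile range is simply `q3 - q1`.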
Example 3. Shown below are the top 20 companies in the computer industry by sales in 2005.
Compute Q1, Q3, D5, D9, P30, and P60.
Solutions:
Because i is 15, the third quartile is the average of the 15th and 16th values from the bottom of the
distribution.
Q3 = (60,200 + 65,030)/2 = 62,615; thus, Q3 = Php 62,615.00
Decile
5th Decile
Step 2. Because i is 10, a whole number, D5 is the average of the 10th and 11th values from the bottom of the
distribution.
9th Decile
i = (90/100)(20) = 18
Because i = 18, a whole number, D9 is the average of the 18th and 19th values from the bottom of the
distribution.
Percentile
i = (30/100)(20) = 6. Because i = 6, P30 is the average of the 6th and 7th values from the bottom of the distribution.
Find P60
i = (60/100)(20) = 12, so P60 is the average of the 12th and 13th values from the bottom of the distribution.
Interquartile Range is the distance between the first and the third quartiles. Based on the above
example, where Q1 = 23.00 and Q3 = 58.1, the interquartile range is IQR = 58.1 - 23.00 = 35.1.
Module 4
Name __________________________________________________________
Exercise 4.1.
Shown below are the top 10 companies in the computer industry by sales in 2020.
Compute Q2, Q3, D7, D8, P40, and P20, and interpret each measure of location.
Company Sales(Php)
1 50
2 40
3 25
4 38
5 42
6 35
7 30
8 47
9 49
10 28
Module 5
Measures of Dispersion
Objective:
A. Ungrouped Data
Lesson 1. Range. The range is the difference between the largest (highest) value and the smallest (lowest)
value of a data set.
3,4,5,7,8,11,14,15,16,16,16,17,19,20,21
Step 1. Determine the highest (H) value and the lowest (L) value in the set of data.
H = 21 and L = 3
Step 2. Get the range (R) as the difference between the highest and lowest values: R = H - L; R = 21 - 3 = 18;
therefore, the range is 18.
Lesson 2. Mean Absolute Deviation (MAD) is the average of the absolute values of the deviations around the
mean for a set of numbers.
Formula: \[ MAD = \frac{\sum |x - \bar{x}|}{N} \]
N = number of values
∑ = summation
x = value (score)
x̄ = mean
Example: x = 5, 9, 16, 17, 18
Step 1. Determine the mean.
\[ \bar{x} = \frac{\sum x}{N} = \frac{65}{5} = 13 \]
Step 2. Subtract the mean from each value, as in the following table.
x      x - x̄    |x - x̄|
5      -8       8
9      -4       4
16     3        3
17     4        4
18     5        5
Step 3. Get the sum of the absolute values of the differences between the mean and the corresponding values.
∑|x - x̄| = 24
Step 4. Solve for the MAD using the formula:
\[ MAD = \frac{\sum |x - \bar{x}|}{N} = \frac{24}{5} = 4.8 \]
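The four MAD steps can be followed line by line in Python. This is a minimal sketch using the five values from the example; the variable names are illustrative.

```python
values = [5, 9, 16, 17, 18]

n = len(values)
mean = sum(values) / n                          # Step 1: mean = 65 / 5
deviations = [x - mean for x in values]         # Step 2: subtract the mean
abs_deviations = [abs(d) for d in deviations]   # absolute value of each deviation
total_abs = sum(abs_deviations)                 # Step 3: sum of absolute deviations
mad = total_abs / n                             # Step 4: divide by N

print(mean, deviations, total_abs, mad)
```

Without the absolute values the deviations would sum to zero, which is why the MAD (and the variance, by squaring) removes the signs first.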
Lesson 3. Variance is the average of the squared deviations about the arithmetic mean for a set of numbers. The
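population variance is denoted by σ².
Population variance: \[ \sigma^{2} = \frac{\sum (x - \mu)^{2}}{N} \]
Step 1. Determine the mean.
\[ \mu = \frac{\sum x}{N} = \frac{65}{5} = 13 \]
Step 2. Get the difference between each value and the mean, x − μ.
Squaring each difference and summing gives ∑(x − μ)² = 64 + 16 + 9 + 16 + 25 = 130, so
\[ \sigma^{2} = \frac{\sum (x - \mu)^{2}}{N} = \frac{130}{5} = 26 \]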
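The variance computation above can be reproduced in Python with the same five values (5, 9, 16, 17, 18). This is a sketch that mirrors the steps by hand rather than calling a library routine; the variable names are illustrative.

```python
values = [5, 9, 16, 17, 18]

n = len(values)
mu = sum(values) / n                          # population mean = 65 / 5
squared = [(x - mu) ** 2 for x in values]     # squared deviation of each value
sum_squared = sum(squared)                    # sum of squared deviations
variance = sum_squared / n                    # population variance (divide by N)

print(mu, squared, sum_squared, variance)
```

Dividing by N gives the population variance used here; a sample variance would divide by n - 1 instead.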
Example 3. Using the steps cited above, the results are summarized in the following table. Using the data from
the table, determine the variance.
Lesson 4. Standard Deviation is the square root of the variance. The empirical rule is used to state the
approximate percentage of values that lie within a given number of standard deviations from the mean of a set of
data if the data are normally distributed.
Example 4. A company produces a lightweight valve that is specified to weigh 1,355 grams. Unfortunately,
because of imperfections in the manufacturing process not all of the valves produced weigh exactly 1,355
grams. In fact, the weights of the valves produced are normally distributed with a mean of 1,365 grams and
standard deviation of 294 grams. Within what range of weights would approximately 95% of the valve weights
fall? Approximately 16% of the weights would be more than what value? Approximately 0.15% of the weights
would be less than what value?
Solution
Because the valve weights are normally distributed, the empirical rule applies. According to the
empirical rule, approximately 95% of the weights should fall within μ ± 2σ = 1,365 ± 2(294) = 1,365 ± 588.
Thus, approximately 95% should fall between 777 and 1,953. Approximately 68% of the weights should fall
within μ ± 1σ, and 32% should fall outside this interval. Because the normal distribution is symmetrical,
approximately 16% should be above μ + 1σ = 1,365 + 294 = 1,659. Approximately 99.7% of the weights should
fall within μ ± 3σ, and 0.3% should fall outside this interval. Half of these, 0.15%, should lie below μ − 3σ = 1,365 –
3(294) = 1,365 – 882 = 483.
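The empirical-rule arithmetic in Example 4 can be checked with a short Python sketch. The function name `empirical_rule_bounds` and the variable names are illustrative assumptions; only the mean (1,365) and standard deviation (294) come from the example.

```python
def empirical_rule_bounds(mean, sd, k):
    """Return the (lower, upper) values lying k standard deviations from the mean."""
    return mean - k * sd, mean + k * sd

mean_weight = 1365  # grams, mean from Example 4
sd_weight = 294     # grams, standard deviation from Example 4

# About 95% of weights fall within 2 standard deviations of the mean.
low_95, high_95 = empirical_rule_bounds(mean_weight, sd_weight, 2)
# About 16% of weights lie above mean + 1 standard deviation.
_, top_16_cutoff = empirical_rule_bounds(mean_weight, sd_weight, 1)
# About 0.15% of weights lie below mean - 3 standard deviations.
bottom_015_cutoff, _ = empirical_rule_bounds(mean_weight, sd_weight, 3)

print(low_95, high_95)      # 777 1953
print(top_16_cutoff)        # 1659
print(bottom_015_cutoff)    # 483
```

The same helper can be reused for any normally distributed variable by changing the mean, the standard deviation, and k.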
POPULATION
Example 1. The computation of the population mean, class mark, deviation from the mean, squared
deviation, and sum of the products of each frequency and its squared deviation.
From the frequency distribution, determine the following: a.) population mean (μ); b.) class mark (M);
c.) difference between the individual class mark and the mean (M − μ); d.) square of the difference, (M − μ)²; e.)
product of (M − μ)² and the corresponding frequency; f.) ∑f(M − μ)².
(1)        (2)   (3)   (4)    (5)       (6)        (7)
Class      f     M     fM     (M − μ)   (M − μ)²   f(M − μ)²
1 – 3      4     2     8      −7.4      54.76      219.04
4 – 6      12    5     60     −4.4      19.36      232.32
7 – 9      13    8     104    −1.4      1.96       25.48
10 – 12    19    11    209    1.6       2.56       48.64
13 – 15    7     14    98     4.6       21.16      148.12
16 – 18    5     17    85     7.6       57.76      288.80
Total      ∑f = 60     ∑fM = 564                   ∑f(M − μ)² = 962.4
Step 1. Determine the class mark (column 3) by adding the lower and upper limits of each class and dividing the
sum by 2; for example, (1 + 3)/2 = 2 is the class mark of the class 1 – 3.
Step 2. Multiply each class mark (M) by its corresponding frequency to obtain fM (column 4 of the
table).
Step 3. Determine the population mean (μ) by dividing the sum of fm by the total frequency.
\[ \mu = \frac{\sum fM}{N} = \frac{564}{60} = 9.4 \quad \text{(Population Mean)} \]
Step 4. Get the difference between the mean and the corresponding class mark (Column 5 of the table)
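Step 6. Multiply the result of step 5 by the corresponding frequency and get the sum: ∑f(M − μ)² = 962.4
Step 7. Divide the sum from step 6 by N to obtain the population variance.
\[ \sigma^2 = \frac{\sum f(M-\mu)^2}{N} = \frac{962.4}{60} = 16.04 \quad \text{(Variance)} \]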
Step 8. Determine the population Standard Deviation by getting the square root of the variance (from step 7)
\[ \sigma = \sqrt{\sigma^2} = \sqrt{16.04} = 4.00 \quad \text{(Standard Deviation)} \]
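The grouped-data steps above can be sketched in Python as a check on the table. The variable names are illustrative assumptions; the class limits and frequencies are taken from the frequency distribution in Example 1.

```python
import math

# (lower limit, upper limit, frequency) for each class in the table
classes = [(1, 3, 4), (4, 6, 12), (7, 9, 13), (10, 12, 19), (13, 15, 7), (16, 18, 5)]

freqs = [f for _, _, f in classes]
# Step 1: class mark = (lower limit + upper limit) / 2
class_marks = [(lo + hi) / 2 for lo, hi, _ in classes]

n_total = sum(freqs)                                      # N
sum_fm = sum(f * m for f, m in zip(freqs, class_marks))   # Step 2: sum of fM
mu = sum_fm / n_total                                     # Step 3: population mean

# Steps 4 to 6: frequency-weighted squared deviations from the mean
sum_sq_dev = sum(f * (m - mu) ** 2 for f, m in zip(freqs, class_marks))
variance = sum_sq_dev / n_total                           # Step 7: population variance
std_dev = math.sqrt(variance)                             # Step 8: standard deviation

print(n_total, sum_fm, round(mu, 2))
print(round(sum_sq_dev, 2), round(variance, 2), round(std_dev, 2))
```

Dividing by N gives the population variance; for a sample variance, the divisor would be N − 1 instead.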
Module 5
Measures of Dispersion
Name __________________________________________________________
Exercise 5.1
Determine the sample variance and standard deviation for the following data.
CI F
10-14 5
15-19 20
20-24 35
25-29 25
30-34 15
Total 100
Module 6
Hypothesis Testing
Objective At the end of module the students should be able to;
Hypothesis is a tentative, testable assertion regarding the occurrence of certain behaviours, phenomena, or
events; a prediction of study outcomes.
Two types of hypotheses that will be explored here
1. Null hypothesis
2. Alternative hypothesis
The null hypothesis states that the null condition exists; that is, there is nothing new happening. A research
hypothesis is a statement of what the researcher believes will be the outcome of an experiment or a study. Before
studies are undertaken, business researchers often have some idea or theory, based on experience or previous
work, as to how the study
will turn out. These ideas, theories, or notions established before an experiment or study is conducted are
research hypotheses. Some examples of research hypotheses:
                  Decision
Reality           Accept Ho          Reject Ho
Ho is True        Correct Decision   Type I Error
Ho is not True    Type II Error      Correct Decision
Ho: (μ1 = μ2): The degree of neighbourliness does not differ before and after the relocation.
To test the impact of forced relocation on neighbourliness, the researchers interview a random sample of
six individuals about their neighbourliness both before and after they were forced to move. The interviews yield
the following scores of neighbourliness (higher scores from 1 to 4 indicate greater neighbourliness).
Respondent   Before Move (X1)   After Move (X2)   Difference (D)   Squared Difference (D²)
A            2                  1                 1                1
B            1                  2                 −1               1
C            3                  1                 2                4
D            3                  1                 2                4
E            1                  2                 −1               1
F            4                  1                 3                9
TOTAL        ∑X1 = 14           ∑X2 = 8                            ∑D² = 20
Follow these steps:
Step 1. Find the mean for each point in time.
\[ \bar{x}_1 = \frac{\sum x_1}{n} = \frac{14}{6} = 2.33 ; \quad \bar{x}_2 = \frac{\sum x_2}{n} = \frac{8}{6} = 1.33 \]
Step 2. Find the standard deviation for the difference between time 1 and time 2.
\[ S_D = \sqrt{\frac{\sum D^2}{n} - (\bar{x}_1 - \bar{x}_2)^2} \]
Where:
$S_D$ = standard deviation of the distribution of before – after difference scores
D = after – move raw score subtracted from before – move raw score
n = number of cases or respondents of the study
Step 3. Substitute the values into the formula:
\[ S_D = \sqrt{\frac{20}{6} - (2.33 - 1.33)^2} \]
\[ S_D = \sqrt{3.33 - 1} \]
\[ S_D = \sqrt{2.33} \]
\[ S_D = 1.53 \]
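Steps 1 to 3 can be reproduced with a minimal Python sketch. The variable names are illustrative assumptions; the scores are those of respondents A to F in the table.

```python
import math

before = [2, 1, 3, 3, 1, 4]  # X1: scores before the move (respondents A to F)
after = [1, 2, 1, 1, 2, 1]   # X2: scores after the move
n = len(before)

# D = after-move score subtracted from before-move score
diffs = [b - a for b, a in zip(before, after)]
sum_d_sq = sum(d * d for d in diffs)

# Step 1: mean for each point in time
mean_before = sum(before) / n
mean_after = sum(after) / n

# Steps 2 and 3: standard deviation of the difference scores
s_d = math.sqrt(sum_d_sq / n - (mean_before - mean_after) ** 2)

print(sum_d_sq)                                   # 20
print(round(mean_before, 2), round(mean_after, 2))
print(round(s_d, 2))                              # 1.53
```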
Test of Difference Between Proportions
Example 1. A social psychologist is interested in how personality characteristics are expressed in the car someone
drives. He wonders whether men express a greater need for control than women by driving big cars. He takes a
sample of 200 males and 200 females over 18 and determines whether or not they drive a full – size car.
Consequently, the final sample sizes for analysis were as follows: 180 for men and 150 for women. The
following hypotheses were formulated:
Null Hypothesis: Ho: The proportions of men and women who drive big cars are equal.
Research Hypothesis: H1: The proportions of men and women who drive big cars are not equal.
Step 3. Translate the difference between proportions into units of the standard error of the difference.
\[ Z = \frac{P_1 - P_2}{S_{P_1 - P_2}} = \frac{.45 - .32}{.0539} = 2.41 \]
Step 4. Compare the obtained Z (2.41) with the critical value of Z (1.96). Because the obtained value (Z =
2.41) is greater than the critical value (Z = 1.96), we reject the null hypothesis. The difference between
sample proportions was statistically significant; the social psychologist was able to conclude that men and
women generally tend to drive different cars.
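The Z value in this example can be checked with the following Python sketch. It assumes that the standard error of the difference is computed from the pooled proportion, a standard formula that is not shown in the text above; under that assumption it reproduces the .0539 used in Step 3. The variable names are illustrative.

```python
import math

n1, p1 = 180, 0.45   # men: final sample size and sample proportion
n2, p2 = 150, 0.32   # women: final sample size and sample proportion

# Assumption: pooled proportion across both samples
p_pooled = (n1 * p1 + n2 * p2) / (n1 + n2)
q_pooled = 1 - p_pooled

# Assumption: standard error of the difference between proportions
se_diff = math.sqrt(p_pooled * q_pooled * (1 / n1 + 1 / n2))

# Step 3: difference between proportions in standard error units
z = (p1 - p2) / se_diff

critical_z = 1.96    # critical value used in Step 4
print(round(se_diff, 4), round(z, 2))
print("reject Ho" if abs(z) > critical_z else "do not reject Ho")
```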
p = population proportion
q = 1 – p
Substitute the values into the formula
\[ Z = \frac{.209 - .17}{\sqrt{\frac{(.17)(.83)}{550}}} = \frac{.039}{.016} = 2.44 \]
Step 4. Decision Rule
Reject the null hypothesis because the observed Z (2.44) is greater than the critical value of Z = 1.96. The
calculated test statistic is often referred to as the observed value.
Step 5. Business Implication. To make a managerial decision, the researcher has enough evidence to reject the
null hypothesis that milk is the breakfast beverage of 17% of children in the city. The researcher can conclude
that the proportion of children whose breakfast beverage is milk is more than 17%.
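The one-sample Z test for a proportion can be sketched in Python as follows. The variable names are illustrative assumptions; the values .209, .17, and 550 come from the substitution above. Because the sketch does not round the standard error to .016 before dividing, it gives a Z of about 2.43 rather than 2.44; the decision is the same.

```python
import math

p_hat = 0.209   # sample proportion
p0 = 0.17       # hypothesized population proportion (p)
n = 550         # sample size
q0 = 1 - p0     # q = 1 - p

# Standard error of the sample proportion under the null hypothesis
se = math.sqrt(p0 * q0 / n)

# Observed value of the test statistic
z = (p_hat - p0) / se

critical_z = 1.96   # critical value used in the decision rule
print(round(se, 3), round(z, 2))
print("reject Ho" if z > critical_z else "do not reject Ho")
```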
One Tailed and Two – Tailed Tests
One – Tailed Test. A one – tailed test rejects the null hypothesis at only one tail of the sampling distribution, that is,
when the rejection region is located at only one extreme of the range of values for the test statistic.
Two – Tailed Test. A two – tailed test rejects the null hypothesis at both tails of the sampling distribution, that is, when
the rejection region is located at both extremes of the distribution.
One – Tailed Test for:
Step 1. Find the mean for both the before and after tests.
\[ \bar{x}_1 = \frac{\sum x_1}{N_1} = \frac{523}{9} = 58.11 ; \quad \bar{x}_2 = \frac{\sum x_2}{N_2} = \frac{595}{9} = 66.11 \]
Step 2. Find the standard deviation of the difference
\[ S_D = \sqrt{\frac{\sum D^2}{N} - (\bar{x}_1 - \bar{x}_2)^2} \]
\[ S_D = \sqrt{\frac{1,070}{9} - (58.11 - 66.11)^2} \]
\[ S_D = \sqrt{118.89 - 64} \]
\[ S_D = \sqrt{54.89} \]
Step 7. Decision: Reject the null hypothesis since the computed value is more extreme in the negative
direction than the critical value (−1.86).
Step 8. Interpretation of the result. Since the null hypothesis is rejected, the remedial math program
has produced a statistically significant improvement in the math ability of the students.
Analysis of Variance (ANOVA)
Analysis of variance is a statistical test that makes a single overall decision as to whether a significant difference
is present among three or more sample means. This test statistic is used to compare several population means
simultaneously. The F ratio is the result of ANOVA, a statistical technique that indicates the size of the
between – groups mean square relative to the size of the within – groups mean square.
Sample Problem 1.
A company has three manufacturing plants, and company officials want to determine whether there is a
difference in the average age of workers at the three locations. The following data are the ages of five randomly
selected workers at each plant. Perform a one-way ANOVA to determine whether there is a significant
difference in the mean ages of the workers at the three plants.
Solution.
Step 1. State the null and Alternative Hypothesis
Ho: There is no significant difference in the average age of workers at the three locations.
H1: There is a significant difference in the average age of workers at the three locations.
Step 2. The appropriate test statistic is the F test calculated from ANOVA
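Step 3. The level of significance is 5%.
Step 4. The degrees of freedom for this problem are 3 – 1 = 2 for the numerator and 15 – 3 = 12 for the denominator.
\[ \bar{x}_1 = \frac{\sum x_1}{n} = \frac{141}{5} = 28.2 ; \quad \bar{x}_2 = \frac{160}{5} = 32 ; \quad \bar{x}_3 = \frac{124}{5} = 24.8 \]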
Notice that differences do exist; group 2 tends to have a higher age than groups 1 and 3.
Step 5.2. Find the sum of ages, sum of squared ages, number of subjects, and mean ages for all groups
combined.
\[ \sum x_{total} = \sum x_1 + \sum x_2 + \sum x_3 = 141 + 160 + 124 = 425 \]
\[ \sum x^2_{total} = \sum x_1^2 + \sum x_2^2 + \sum x_3^2 = 3,983 + 5,130 + 3,078 = 12,191 \]
\[ N_{total} = N_1 + N_2 + N_3 = 5 + 5 + 5 = 15 \]
\[ \bar{X}_{total} = \frac{\sum X_{total}}{N_{total}} = \frac{425}{15} = 28.33 \]
\[ SS_{total} = \sum x^2_{total} - N_{total}\,\bar{X}^2_{total} \]
= 12,191 – 15(28.33)²
= 12,191 – 12,038.83
= 152.17
Step 5.4. Find the within groups sum of squares.
\[ SS_{within} = \sum x^2_{total} - \sum N_{group}\,\bar{x}^2_{group} \]
=3–1
Step 5.7 Find the within groups degrees of freedom
\[ df_{within} = N_{total} - k = 15 - 3 = 12 \]
Step 5.11 Compare the obtained F ratio with the appropriate table F ratio. See Table 3 of Appendix A.
From the group sums and sums of squares in Step 5.2, SS within = 12,191 – (141²/5 + 160²/5 + 124²/5) = 12,191 –
12,171.4 = 19.6 and SS between = 12,171.4 – 425²/15 = 129.73, so F = (129.73/2) / (19.6/12) = 64.87 / 1.63 = 39.71.
Obtained F ratio = 39.71
Table F ratio = 3.88
df = 2 and 12
α = .05
Step 6. Formulate the decision rule.
To reject the null hypothesis at the 5% significance level with 2 and 12 degrees of freedom, our calculated F
ratio must exceed the table value of 3.88. Because we have obtained an F ratio of 39.71, we reject the null
hypothesis. The results obtained were statistically significant.
Step 7. Since the obtained F ratio (39.71) is greater than the critical F ratio (3.88), we can say that
there is a statistically significant difference in the average age of the workers at the three locations.
Sample Example 2.
A professor had students in a large marketing class rate his performance as excellent, good, fair, or poor. A
graduate student collected the ratings and assured the students that the professor would not receive them until after
course grades had been sent to the records office. The sample information is reported below.
Solution:
Step 1. State the null and the alternative hypotheses
Null Hypothesis: The mean scores are the same for the four ratings.
Ho: μ1 = μ2 = μ3 = μ4
Alternative Hypothesis: The mean scores are not all the same for the four ratings.
Ha: Not all of the means are equal.
If the null hypothesis is not rejected, we conclude that there is no difference in the mean course grades
based on the instructor rating. If Ho is rejected, we conclude that there is a difference in at least one pair
of mean rating, but at this point we do not know which pair or how many pairs differ.
Step 4.2: Find the sum of scores, sum of squared scores, number of subjects, and the
mean for all groups.
\[ \sum x_{total} = 349 + 391 + 510 + 414 = 1,664 \]
\[ \sum x^2_{total} = 30,561 + 30,811 + 37,338 + 28,634 = 127,344 \]
\[ N_{total} = 4 + 5 + 7 + 6 = 22 \]
\[ \bar{x}_{total} = \frac{\sum x_{total}}{N_{total}} = \frac{1,664}{22} = 75.64 \]
\[ SS_{within} = \sum x^2_{total} - \sum N_{group}\,\bar{x}^2_{group} \]
= 127,344 – [(4)(87.25)² + (5)(78.2)² + (7)(72.86)² + (6)(69)²]
= 127,344 – [30,450.25 + 30,576.2 + 37,160.06 + 28,566]
= 127,344 – 126,752.51
= 591.49
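Step 4.5: Find the between groups sum of squares.
\[ SS_{between} = \sum N_{group}\,\bar{x}^2_{group} - N_{total}\,\bar{x}^2_{total} = 126,752.51 - 22(75.64)^2 = 126,752.51 - 125,871.01 = 881.5 \]
Step 4.6: Find the between group degrees of freedom
\[ df_{between} = k - 1 = 4 - 1 = 3 \]
\[ MS_{between} = \frac{SS_{between}}{df_{between}} = \frac{881.5}{3} = 293.83 \]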
Step 4.10: Obtain F ratio.
\[ F = \frac{MS_{between}}{MS_{within}} = \frac{293.83}{32.86} = 8.94 \]
Step 4.11: Compare the obtained F ratio with the table value of the F ratio, or critical F.
Obtained F = 8.94
Critical F = 5.09 (Table 3, Appendix A)
df = 3 and 18
α = .01 or 1%
Step 5. Decision Rule
Since the obtained F ratio (8.94) is greater than the critical F ratio (5.09) at the 1% level of significance with
df (3 and 18), we can reject the null hypothesis. Therefore the result of the test is statistically significant.
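The ANOVA steps of Sample Example 2 can be sketched in Python from the group summaries alone. The list name `summaries` and the other variable names are illustrative assumptions; the group sizes, sums, and sums of squares are the ones shown in Step 4.2. The sketch keeps full precision instead of rounding the means to two decimals, so its F ratio (about 8.99) differs slightly from the hand-computed 8.94; the decision is unchanged.

```python
# (group size, sum of scores, sum of squared scores) for each of the four groups
summaries = [(4, 349, 30561), (5, 391, 30811), (7, 510, 37338), (6, 414, 28634)]

k = len(summaries)
n_total = sum(n for n, _, _ in summaries)
sum_x = sum(s for _, s, _ in summaries)
sum_x2 = sum(q for _, _, q in summaries)

# Sum over groups of N_group * (group mean)^2, written as (group sum)^2 / N_group
group_term = sum(s ** 2 / n for n, s, _ in summaries)

ss_within = sum_x2 - group_term
ss_between = group_term - sum_x ** 2 / n_total

df_between = k - 1
df_within = n_total - k

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_ratio = ms_between / ms_within

critical_f = 5.09   # table value for df = 3 and 18 at the 1% level
print(df_between, df_within, round(f_ratio, 2))
print("reject Ho" if f_ratio > critical_f else "do not reject Ho")
```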
Chi – Square (χ²) Test.
A nonparametric test of significance whereby expected frequencies are compared against observed frequencies.
Example 1. Suppose your instructor returns the exam and hands out the answer key. You construct a frequency
distribution of the correct responses to the 50 – item test as follows:
Correct answer   fo    fe    (fo − fe)²/fe
A                12    10    0.4
B                14    10    1.6
C                9     10    0.1
D                5     10    2.5
E                10    10    0.0
TOTAL            50    50    4.6
The one – way chi – square test can be used to determine whether the frequencies we observed previously differ
significantly from an even distribution (or any other distribution we might hypothesize).
Null Hypothesis: The instructor shows no tendency to assign any particular correct response from A to E.
Alternative Hypothesis: The instructor shows a tendency to assign particular correct responses from A to E.
Using the formula
\[ \chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} \]
Where
χ² = chi – square
fo = observed frequency
fe = expected frequency
∑ = summation
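As a minimal sketch, the one – way chi – square formula can be applied to the answer-key frequencies in Python. The variable names are illustrative assumptions; the observed frequencies are those of responses A to E, and the expected frequencies assume an even distribution over the five options.

```python
observed = [12, 14, 9, 5, 10]   # fo for correct answers A, B, C, D, E

# Under an even distribution, each option is expected equally often.
total = sum(observed)
expected = [total / len(observed)] * len(observed)   # fe = 10 for every option

# Chi-square: sum of (fo - fe)^2 / fe over all categories
chi_square = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

# Degrees of freedom for a one-way test: number of categories minus 1
df = len(observed) - 1

print(round(chi_square, 2), df)   # 4.6 4
```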
                         Political Orientation
Child rearing methods    Liberal    Conservative    Total
Permissive               5          10              15
Not Permissive           15         10              25
Total                    20         20              40
For the upper left cell in the table (permissive liberals):
\[ f_e = \frac{20(15)}{40} = 7.5 \]
For the upper right cell (permissive conservatives):
\[ f_e = \frac{20(15)}{40} = 7.5 \]
For the lower left cell (not permissive liberals):
\[ f_e = \frac{(20)(25)}{40} = 12.5 \]
df = (2 – 1)(2 – 1) = 1
Step 6. Compare the obtained χ² = 2.66 with the critical χ² (3.84) (Table 5, Appendix A).
Step 7. Decision: Because the computed value of χ² is less than the critical χ², we must accept the null
hypothesis and reject the research hypothesis. In short, the observed frequencies do not differ enough from the
frequencies expected by chance to indicate that an actual population difference exists.
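The two – way chi – square computation for the child rearing table can be sketched in Python as follows. The variable names are illustrative assumptions; the observed counts come from the table, and each expected frequency is (row total)(column total)/N, as in the cell computations above. Kept at full precision, the statistic is about 2.67, which the text reports as 2.66.

```python
# Rows: Permissive, Not Permissive; columns: Liberal, Conservative
observed = [[5, 10],
            [15, 10]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / n   # expected frequency for the cell
        chi_square += (fo - fe) ** 2 / fe

# df = (rows - 1)(columns - 1)
df = (len(observed) - 1) * (len(observed[0]) - 1)

critical_chi_square = 3.84   # table value for df = 1 at the .05 level
print(round(chi_square, 2), df)
print("reject Ho" if chi_square > critical_chi_square else "accept Ho")
```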
Module 6
Hypothesis Testing
Name __________________________________________________________
Exercise 6.1.
1. Suppose you are testing Ho: p = .45 versus Ha: p > .45. A random sample of 310 people produces a value
of p̂ = .465. Use α = .05 to test this hypothesis.
Black 4 8 7 3 5 4 5 4
White 3 5 4 7 5 5 6 4 3 2
Note: Follow all the steps and use two decimal places, round off if possible.
1. Compute a one – way ANOVA on the following data. Use .05 level of significance
1 2 3
2 5 3
1 3 4
3 6 5
3 4 5
2 5 3
1 5
Determine the observed F value. Compare the observed F value with the critical table F value and
decide whether to reject the null hypothesis.
Objectives
determine the correlation coefficient using the Pearson product – moment formula and interpret the results
use Spearman rank to find the rank – order correlation coefficient and interpret the result
interpret the results of regression
Correlation measures the association or the strength of the relationship between two variables say x and y.
Definitions.
Two variables are positively correlated if the values of both variables increase together.
Two variables are negatively correlated if the values of one variable increase while the values of
the other decrease.
Two variables are not correlated or they have zero correlation if one variable neither increases
nor decreases while the other increases.
Verbal Interpretation
The degree of correlation can be determined by the correlation coefficient. Its value is interpreted as
shown in the table below.
R Verbal Interpretation
0.00 No correlation
± 0.01 to ± 0.20 Slight Correlation
± 0.21 to ± 0.40 Low Correlation
± 0.41 to ± 0.70 Moderate Correlation
± 0.71 to ± 0.80 High Correlation
± 0.81 to ± 0.99 Very High Correlation
± 1.0 Perfect Correlation
The Pearson product – moment correlation coefficient (r) is the most familiar statistical tool for quantifying
the linear relationship between two random variables, x and y.
Formula
\[ r = \frac{N\sum xy - \sum x \sum y}{\sqrt{\left[N\sum x^2 - (\sum x)^2\right]\left[N\sum y^2 - (\sum y)^2\right]}} \]
1. State the null hypothesis (Ho) and the Alternative Hypothesis (Ha)
2. Determine the tabular value (TV), degree of freedom (df) = N – 2.
3. Determine the computed value (CV).
4. State the conclusion.
a. Decision: i. if the absolute value of the computed r is greater than the tabular value (|rc| > rt), reject Ho;
ii. if |rc| < rt, accept Ho.
Example 1.
Calculate and analyze the correlation coefficient between the number of study hours and the
number of sleeping hours of different students at the 0.05 level of significance.
Number of study hours (x)       2     4     6     8     10
Number of sleeping hours (y)    10    9     8     7     6
Solution:
1. Ho: There is no significant relationship between the number of study hours and the number of sleeping
hours of different students.
Ha: There is significant relationship between the number of study hours and the number of sleeping
hours of different students.
2. Tabular value: α =0.05 and df = N – 2 = 5 – 2 = 3: (3, 0.05) = 0.878
3. Computed Value:
Student    x          y          xy           x²           y²
1          2          10         20           4            100
2          4          9          36           16           81
3          6          8          48           36           64
4          8          7          56           64           49
5          10         6          60           100          36
N = 5      ∑x = 30    ∑y = 40    ∑xy = 220    ∑x² = 220    ∑y² = 330
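\[ r = \frac{N\sum xy - \sum x \sum y}{\sqrt{\left[N\sum x^2 - (\sum x)^2\right]\left[N\sum y^2 - (\sum y)^2\right]}} \]
\[ r = \frac{5(220) - (30)(40)}{\sqrt{\left[5(220) - (30)^2\right]\left[5(330) - (40)^2\right]}} = \frac{-100}{\sqrt{(200)(50)}} = \frac{-100}{100} = -1 \]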
4. Conclusion
The computed r = −1 has an absolute value of 1, which is greater than the tabulated value of 0.878, so reject Ho.
This implies that there is a significant relationship between the number of study hours and the number of
sleeping hours of different students. The result of r also implies a perfect negative correlation.
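The Pearson r computation for this example can be checked with a short Python sketch. The function name `pearson_r` is an illustrative assumption; the data are the study hours and sleeping hours of the five students, and 0.878 is the tabular value for df = 3 at the 0.05 level given in the solution.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation computed from raw sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

study_hours = [2, 4, 6, 8, 10]   # x
sleep_hours = [10, 9, 8, 7, 6]   # y

r = pearson_r(study_hours, sleep_hours)
tabular_value = 0.878            # df = N - 2 = 3, alpha = 0.05

print(r)                         # -1.0
# The decision compares the magnitude of r with the tabular value.
print("reject Ho" if abs(r) > tabular_value else "accept Ho")
```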
Linear Regression
Regression Analysis is a very powerful tool in the field of statistical analysis, specifically in predicting the
value of one variable from the given value of another variable when the variables are related to each other.
Therefore, it is used when predicting the behaviour of a variable. The regression equation explains the amount
of variation in the dependent variable y that is accounted for by the independent variable x. It is actually the
equation of a straight line.
The purpose of regression is to determine the trend of the two variables as related to each other, whether the
trend is rising or falling.
Formula;
y = a + bx
where: y = criterion measure
x = predictor
a = ordinate or the point where the regression line crosses the y – axis, and
b = beta weight or the slope of the line.
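To get the regression equation, the values of a and b are computed using the formulas below.
\[ a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n\sum x^2 - (\sum x)^2} \]
\[ b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} \]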
Example 1.
The data in the table represent the membership at a university mathematics club during the past 5 years.
Fit a line of the form y = a + bx to predict the membership 5 years from now.
Solution:
x          y           x²          xy
1          25          1           25
2          30          4           60
3          32          9           96
4          45          16          180
5          50          25          250
∑x = 15    ∑y = 182    ∑x² = 55    ∑xy = 611
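\[ a = \frac{(182)(55) - (15)(611)}{5(55) - (15)^2} = \frac{845}{50} = 16.9 \]
\[ b = \frac{5(611) - (15)(182)}{5(55) - (15)^2} = \frac{325}{50} = 6.5 \]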
Since you need to predict the membership five years from now, or at year 10, substitute 10 for x in the equation.
Thus, 5 years from now, y = 16.9 + 6.5(10) = 81.9 ≈ 82.
Therefore, five years from now, the club would have 82 members.
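The regression example can be reproduced with the Python sketch below. The function name `fit_line` is an illustrative assumption, and the formulas used for a and b are the standard least-squares expressions; with the membership data they give the line y = 16.9 + 6.5x used in the prediction.

```python
def fit_line(x, y):
    """Return (a, b) of the least-squares line y = a + b*x from raw sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_xy = sum(u * v for u, v in zip(x, y))
    denominator = n * sum_x2 - sum_x ** 2
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator   # y-intercept
    b = (n * sum_xy - sum_x * sum_y) / denominator        # slope
    return a, b

years = [1, 2, 3, 4, 5]            # x
members = [25, 30, 32, 45, 50]     # y

a, b = fit_line(years, members)
forecast = a + b * 10              # membership at year 10 (five years from now)

print(round(a, 2), round(b, 2))    # 16.9 6.5
print(round(forecast))             # 82
```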
Module 7
EXERCISES 7.1
Name __________________________________________________________
1. Solve the following by Pearson r and test the significance of the correlation at the 5% level. Interpret the
results.
X 12 14 10 12 14 12 11
Y 59 60 34 55 77 80 50
2. Use the regression analysis to predict the grade of a student in Mathematics if his grade in Science is:
A. 77
B. 65
C. 89