Unit 4: Normal Curve and Linear Regression


Unit 4: Data Management

Topic 4: Measures of Relative Position

Learning Objectives
Upon the completion of this topic, you are expected to:
A. Determine the z-score that corresponds to a given raw score;
B. Identify the location of a given data value in terms of its z-score and
quantiles;
C. Interpret the location of raw data in a box-and-whisker plot; and
D. Solve word problems involving the concepts of the measures of relative
position.

Presentation of Content

I. Z-score
What do you know about z-score? Do you know how z-score is determined?
A z-score indicates how many standard deviations a data point is from the
mean. A raw score can be converted to a z-score using the formula:

$$ z = \frac{x - \mu}{\sigma} $$

Where:
x = value of the raw data
μ = mean of the set of data to which the given value belongs
σ = standard deviation of the set of data to which the given value belongs
z = corresponding z-score of the raw data

Hence, we can convert a raw score to a z-score whenever we know the mean and
standard deviation of the set to which the raw score belongs.
How will you utilize the formula in converting a given raw score into its
corresponding z score?

Interpreting Z-scores
After determining the corresponding z-score of a given raw score, we need to
interpret it to be able to identify its location.

Here is how to interpret z-scores.


1. A z-score less than 0 represents a value below the mean.
2. A z-score greater than 0 represents a value above the mean.
3. A z-score equal to 0 represents a value equal to the mean. Thus, it is
found at the middle of the distribution.


Example:
a. A z-score equal to 1 represents a value that is 1 standard deviation
above the mean; a z-score equal to 2 represents a value 2 standard
deviations above the mean; and so on.
b. A z-score equal to -1 represents a value that is 1 standard deviation
below the mean; a z-score equal to -2 represents a value 2 standard
deviations below the mean; and so on.
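The conversion formula and the three interpretation rules above can be expressed as a short Python sketch. The function names `z_score` and `describe_location` are illustrative only, and the check that the standard deviation is positive is an added assumption rather than part of the module.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Return how many standard deviations x lies from the mean: z = (x - mean) / sd."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


def describe_location(z: float) -> str:
    """Apply the three interpretation rules: below, equal to, or above the mean."""
    if z < 0:
        return f"{abs(z):g} standard deviation(s) below the mean"
    if z > 0:
        return f"{z:g} standard deviation(s) above the mean"
    return "equal to the mean"


# Example: a raw score of 45 in a set with mean 50 and standard deviation 10
z = z_score(45, 50, 10)
print(z, "->", describe_location(z))
```

The sign of z alone decides "below" or "above"; its absolute value gives the distance in standard deviations.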

II. Quantiles
Do you know anything about quantiles? Aside from z-scores, we can use
quantiles as measures of location. Quantiles extend the concept of the median:
the ordered items in a distribution are divided into equal parts.

Types of Quantiles
There are three types of quantiles namely: quartiles, deciles, and percentiles.
A. Quartiles divide the distribution into four equal parts. The values that
divide the parts are called first, second, and third quartiles. These are
denoted by Q1, Q2, and Q3, respectively. The figure below represents a
set of observations divided into quartiles.

[Figure: a set of observations divided into four equal parts at Q1, Q2, and Q3]

B. Deciles divide the distribution into 10 equal parts. The values that
divide the parts are called the first, second, third, fourth, fifth, sixth,
seventh, eighth, and ninth deciles. These are denoted by D1, D2, D3, D4,
D5, D6, D7, D8, and D9, respectively.

[Figure: a set of observations divided into ten equal parts at D1 through D9]

C. Percentiles divide the set of observations into 100 equal parts. These are
the points or values separating the scores into 100 parts. A percentile
indicates the value below which a given percentage of the observations in
a group falls.

[Figure: percentile points P10, P20, P30, P40, P50, P60, P70, P80, and P90 marked along a set of observations]


Note: Quantiles are used in reporting scores from norm-referenced tests. For
example, if a score is at the 60th percentile (60 is the percentile rank), it is
the value below which 60% of the observations may be found.

Procedure in Determining Quantile Measures


To determine the value of the quantile of interest, the following guidelines can
help you.
1. Arrange the given observations from lowest to highest.
2. Determine the ordinal rank (n), or location, by applying the appropriate
formula below:

For quartiles: $$ n = \frac{Q}{4}(N + 1) $$
For deciles: $$ n = \frac{D}{10}(N + 1) $$
For percentiles: $$ n = \frac{P}{100}(N + 1) $$

Where:
n = the ordinal rank (location) of the quantile
Q = the number of the quartile of interest (1, 2, or 3)
D = the number of the decile of interest (1 to 9)
P = the number of the percentile of interest (1 to 99)
N = the number of observations

3. Locate the score corresponding to the obtained ordinal rank (n) in the
ordered distribution.
4. If the obtained location is not a whole number, interpolate between the
score at the whole-number part of the rank (the lower score) and the next
score (the upper score).
5. Subtract the lower score from the upper score.
6. Multiply the difference by the decimal part of the obtained location.
7. Add the product to the lower score.
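Steps 1 to 7 can be sketched in Python, assuming the (N + 1) rank formulas above. The function name `quantile` and its `k`/`parts` parameters are hypothetical, and clamping a rank below 1 or above N to the smallest or largest value is an assumption, because the procedure does not say what to do in that case. Spreadsheet and statistics packages may use a different rank formula and give slightly different values.

```python
import math


def quantile(data, k, parts):
    """Value of the k-th quantile when the data are split into `parts` equal parts.

    parts=4 gives quartiles (k = 1..3), parts=10 deciles (k = 1..9),
    parts=100 percentiles (k = 1..99).
    """
    values = sorted(data)                  # step 1: arrange from lowest to highest
    n_obs = len(values)
    rank = k * (n_obs + 1) / parts         # step 2: ordinal rank n = k/parts * (N + 1)
    # Assumption: ranks outside 1..N are clamped to the smallest/largest value.
    if rank <= 1:
        return float(values[0])
    if rank >= n_obs:
        return float(values[-1])
    whole = math.floor(rank)
    fraction = rank - whole                # decimal part of the location
    lower = values[whole - 1]              # step 3: score at the whole-number rank (1-based)
    upper = values[whole]                  # the next score up
    # steps 4-7: lower + fraction * (upper - lower); fraction is 0 for whole-number ranks
    return lower + fraction * (upper - lower)


data = [30, 34, 36, 36, 37, 38, 39, 40, 41, 42, 12, 15, 17, 20, 25, 27, 29, 30]
print(quantile(data, 2, 4))    # 2nd quartile
print(quantile(data, 2, 10))   # 2nd decile
```

Because the rank is computed with floating-point division, results such as the 2nd decile may print with a tiny rounding error; round to the precision you need.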

III. Box-and-Whisker Plot


The box-and-whisker plot is a type of graph used to display patterns in
quantitative data. It is a graphical method of displaying the variation in a set
of data.
In most cases, a histogram provides a sufficient display; however, a
box-and-whisker plot can provide additional detail while allowing multiple sets
of data to be displayed in the same graph. Some variants, which also mark
extreme values separately, are called box-and-whisker plots with outliers.
It makes use of the median, the first quartile, and the third quartile. When the
scores are arranged in ascending order, 25% of them lie below Q1 and 75% of
them lie below Q3.


Below is an example of a box-and-whisker plot generated using SPSS
(Statistical Package for the Social Sciences).

How does the box-and-whisker plot compare with the other measures of
relative position?

Note: About 50% of the data lie between Q1 and Q3. Thus, only about 25% of
the data lie at each end of the distribution (below Q1 or above Q3).

Skewness of the Distribution


We can determine the skewness of the distribution from the location of the box
along the line.
If the box is situated on the upper portion of the line, the longer whisker
extends toward the low values and the distribution is skewed to the left; if the
box is situated on the lower portion, the longer whisker extends toward the high
values and the distribution is skewed to the right. The line inside the box
represents the location of the median of the distribution.
The box-and-whisker plots below show distributions with different skewness.


From the previous box-and-whisker plots, we can say that the distribution of
Group A is skewed to the right, the distribution of Group B is symmetric, and
the distribution of Group C is skewed to the left.

How can we use a box-and-whisker plot to determine the location of an
observation in the distribution?

Procedure
A box-and-whisker plot is developed from five statistics.
1. Minimum value – the smallest value in the data set
2. First quartile – the value below which the lower 25% of the data are
contained
3. Median value – the middle value of the ordered data
4. Third quartile – the value above which the upper 25% of the data are
contained
5. Maximum value – the largest value in the data set

For example, given the following 16 data points, the five required statistics are
displayed.

Number   Raw Data   Statistics
1        50         Minimum (50)
2        51
3        52
4        54
                    1st Quartile (54.5)
5        55
6        55
7        56
8        58
                    Median (58)
9        58
10       59
11       60
12       62
                    3rd Quartile (62.5)
13       63
14       63
15       64
16       65         Maximum (65)


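Note: For a data set with an even number of values, the median is calculated
as the average of the two middle values. In this example, Q1 and Q3 are found
the same way, as the medians of the lower eight and the upper eight values:
Q1 = (54 + 55)/2 = 54.5 and Q3 = (62 + 63)/2 = 62.5. (The (N + 1) interpolation
procedure for quantiles would give slightly different values, 54.25 and 62.75.)
From the observations given above, the values of the five statistics are: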
1. Minimum value = 50
2. First quartile = 54.5
3. Median value = 58
4. Third quartile = 62.5
5. Maximum value = 65

Here is their representation in a box-and-whisker plot.

A boxplot splits the data set into quartiles. The body of the boxplot consists of
a "box" (hence the name), which goes from the first quartile (Q1) to the third
quartile (Q3).
Within the box, a horizontal line is drawn at Q2, the median of the data set.
Two vertical lines, called whiskers, extend from the bottom and top of the box.
The lower whisker goes from Q1 to the smallest non-outlier in the data set, and
the upper whisker goes from Q3 to the largest non-outlier.
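The five statistics in the example can be reproduced with the Python sketch below. It assumes Q1 and Q3 are the medians of the lower and upper halves of the ordered data, which is the convention that yields 54.5 and 62.5 for these 16 values. The name `five_number_summary` is illustrative, and leaving the middle value out of both halves when N is odd is only one of several conventions in use.

```python
from statistics import median


def five_number_summary(data):
    """Minimum, Q1, median, Q3, and maximum for a box-and-whisker plot.

    Q1 and Q3 are the medians of the lower and upper halves of the ordered
    data; for an odd number of values the middle value is left out of both
    halves. Needs at least two observations.
    """
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower_half = values[:half]
    upper_half = values[half + n % 2:]   # skip the middle value when n is odd
    return {
        "minimum": values[0],
        "q1": median(lower_half),
        "median": median(values),
        "q3": median(upper_half),
        "maximum": values[-1],
    }


scores = [50, 51, 52, 54, 55, 55, 56, 58, 58, 59, 60, 62, 63, 63, 64, 65]
print(five_number_summary(scores))
```

For the 16 scores this gives 50, 54.5, 58, 62.5, and 65, the same values as the table.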

If the data set includes one or more outliers, how will they be plotted on the
box-and-whisker plot?


Application

Activity 1
Directions: Based on what we have learned about z-scores, convert the
following raw scores to their corresponding z-scores. Use the formula
presented in the discussion and identify the location of each score.

     Raw Score   Mean   Standard Deviation   Z-score   Location
1.   24          20     2
2.   16          16     1
3.   18          24     8

Solution:
After accomplishing the previous activity, compare your answers to the
following solutions.

1. x = 24, μ = 20, σ = 2

$$ z = \frac{x - \mu}{\sigma} = \frac{24 - 20}{2} = \frac{4}{2} = 2 $$

Interpretation: The corresponding z-score of the raw score is 2. The raw score
lies 2 standard deviations above the mean.

2. x = 16, μ = 16, σ = 1

$$ z = \frac{x - \mu}{\sigma} = \frac{16 - 16}{1} = \frac{0}{1} = 0 $$

Interpretation: The corresponding z-score of the raw score is 0. The raw score
is equal to the mean.

3. x = 18, μ = 24, σ = 8

$$ z = \frac{x - \mu}{\sigma} = \frac{18 - 24}{8} = \frac{-6}{8} = -0.75 $$

Interpretation: The corresponding z-score of the raw score is −0.75. The raw
score lies 0.75 standard deviation below the mean.

Good job! Did you get the same answers? If not, what part do you need to
improve?

Can you determine the raw score given the mean, the standard deviation, and
the z-score? In what way?
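Rearranging z = (x − μ)/σ gives x = μ + zσ, so the raw score can be recovered when the mean, the standard deviation, and the z-score are all known. A small Python sketch (the name `raw_score` is illustrative), checked against the Activity 1 solutions:

```python
def raw_score(z: float, mean: float, sd: float) -> float:
    """Invert z = (x - mean) / sd to recover the raw score: x = mean + z * sd."""
    return mean + z * sd


# Check against the Activity 1 solutions
print(raw_score(2, 20, 2))      # item 1: z = 2, mean 20, sd 2
print(raw_score(-0.75, 24, 8))  # item 3: z = -0.75, mean 24, sd 8
```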

Activity 2
Let us have another activity. This time, you can seek the help of your friends
to answer the following problems.
Directions: Given the mean and standard deviation of the distribution, convert
the following raw scores to their corresponding z scores and interpret their
location relative to the distribution. Good luck!
1. mean = 120 standard deviation = 10 raw score = 100
2. mean = 50 standard deviation = 5 raw score = 55
3. mean = 35 standard deviation = 4 raw score = 40

You have just learned to measure the relative position of data using z-scores.
Congratulations!

Activity 3
Let us try to follow the procedure in determining a quantile value. Do your
best in answering the following:
1. Find the 30th percentile of the set {12, 15, 17, 20, 25, 27, 29, 30, 30,
34, 36, 36, 37, 38, 39, 40, 41, 42, 43}
2. Determine the 2nd quartile from the set {30, 34, 36, 36, 37, 38, 39, 40,
41, 42, 12, 15, 17, 20, 25, 27, 29, 30}
3. Determine the 2nd decile from the set {20, 25, 27, 29, 30, 30, 34, 36,
36, 12, 15, 17, 37, 38, 39, 40, 41, 42}

Have you determined the values of the quantiles? Good job!

Solutions:
You may compare yours to the following solutions.
1. Note: The observations are already arranged from lowest to highest.
The given are: P = 30
N = 19
n = unknown
Value of the 30th percentile = unknown

$$ n = \frac{P}{100}(N + 1) = \frac{30}{100}(19 + 1) = 6 $$

The 30th percentile is the 6th observation from the set of data which is 27.

2. We arrange the observations from lowest to highest as: {12, 15, 17, 20,
25, 27, 29, 30, 30, 34, 36, 36, 37, 38, 39, 40, 41, 42}.
The given are: Q = 2
N = 18
n = unknown
Value of the 2nd quartile = unknown

$$ n = \frac{Q}{4}(N + 1) = \frac{2}{4}(18 + 1) = 9.5 $$

The whole-number part of the ordinal rank (n) is 9, and the 9th observation is
30. The value next to it (the 10th observation) is 34, and their difference is 4.
The product of this difference and the decimal part of the ordinal rank, 0.5, is 2.
Adding this to the lower score gives 30 + 2 = 32. Thus, the value of the 2nd
quartile is 32.

3. We arrange the observations from lowest to highest as: {12, 15, 17, 20,
25, 27, 29, 30, 30, 34, 36, 36, 37, 38, 39, 40, 41, 42}
The given are: D = 2
N = 18
n = unknown
Value of the 2nd decile = unknown

$$ n = \frac{D}{10}(N + 1) = \frac{2}{10}(18 + 1) = 3.8 $$

The whole-number part of the ordinal rank (n) is 3, and the 3rd observation is
17. The value next to it (the 4th observation) is 20, and their difference is 3.
The product of this difference and the decimal part of the ordinal rank, 0.8, is
2.4. Adding this to the lower score gives 17 + 2.4 = 19.4. Thus, the value of the
2nd decile is 19.4.


Activity 4
Let us have another activity to test your understanding and mastery of the
topic.
This time, call for a friend to help you answer the items. Good luck!
1. The following are the scores of 19 students of the College of
Agriculture: 40, 32, 32, 30, 45, 44, 43, 35, 39, 23, 25, 36, 37, 28, 33,
27, 30, 29, and 20. Calculate Q1, D3, and P40.

2. Determine the 3rd quartile, 2nd decile, and 10th percentile of the
number of siblings of the 11 students of the College of Teacher
Education: 2, 1, 3, 7, 2, 6, 4, 5, 3, 4, 2.

You have just learned to determine the quantiles of observations to identify
their location relative to the set of data. Congratulations!

Activity 5
Identifying the Location of Observation

Given the box-and-whisker plot below:

A. Identify the location of the following scores relative to the five
statistics presented.
1. 48
2. 40
3. 35
4. 30
5. 20

B. Determine the skewness of the distribution.

Do your best in determining the location of the scores before proceeding so
you can compare your answers to the solution. Good luck!

If you are done answering the activity, you can now compare your answers to
the solutions.

Solution:
A. Identifying the Location of Data
1. 48 is found above the third quartile and below the maximum score
2. 40 is the median of the distribution
3. 35 is located above the first quartile and below the median
4. 30 is positioned just below the first quartile
5. 20 is the lowest score in the distribution

B. Skewness of the Distribution


The box of the box-and-whisker plot is situated on the upper portion of
the line. Thus, we can say that the distribution is skewed to the left.

Activity 6
With the concepts that you have learned, interpret the following
box-and-whisker plots. This time you can ask your friends for help in this activity.
Good luck!
1. Determine the skewness of the three groups.
2. Identify the location of the score 20 in the three distributions.


Feedback/ Assessment

Now, let us test your understanding of the measures of relative position.
Good luck!

Test I.
Directions: Supply the information required by each item.
1. Given the mean of the distribution as 30 with a standard deviation of 5,
determine the corresponding z-score of the following raw scores.
a. 15
b. 30
c. 35

2. Interpret the location of the following z-scores.


a. 0.5
b. 2.2
c. 1.8

Test II.
Directions: Determine whether the following are correct or not. Write True if
the statement is true and False if it is false.
_____1. The 3rd quartile corresponds to the 30th percentile.
_____2. The 25th percentile is the observation below which 75% of the
observations may be found.
_____3. The box-and-whisker plot uses five statistics, namely: Q1, maximum,
mean, Q3, and minimum.
_____4. If the box of the box-and-whisker plot is situated on the upper part of
the line, then the distribution is skewed to the right.
_____5. Outliers can lie inside the box of the box-and-whisker plot.

Test III.
Directions: Below is the list of daily allowances (in pesos) of 29 first-year
students in Cagayan State University. Determine the value of the:
1. 10th percentile
2. 3rd decile
3. 1st quartile

50 65 75 95 110 150
55 65 80 95 110 170
55 70 80 100 120 180
60 70 80 100 130 200
60 75 90 100 140


The rubric below will be used to evaluate your answers.


Understanding
   Exceeds Expectation (3 points): The given and the unknown quantities were identified and properly labelled.
   Meets Expectation (2 points): The given quantities were identified.
   Approaches Expectation (1 point): Some of the given quantities were not identified.
Solution
   Exceeds Expectation (3 points): The problem was solved efficiently and systematically using an appropriate method.
   Meets Expectation (2 points): The problem was solved using an appropriate method.
   Approaches Expectation (1 point): The problem was solved inefficiently using an inappropriate method.
Answer
   Exceeds Expectation (3 points): The problem was answered accurately.
   Meets Expectation (2 points): The requirements of the problem were provided.
   Approaches Expectation (1 point): The problem was not answered.

Test IV.
A. Directions: Given the box-and-whisker plot below:
1. Identify the location of these observations; and
a.) 30
b.) 35
c.) 40

2. Determine the skewness of the distribution.

The rubric below will be used to evaluate your answers.


Identifying the Location of Observation
   Exceeds Expectation (3 points): Provided detailed and accurate information on the location of the observation relative to the five statistics.
   Meets Expectation (2 points): Provided information on the location of the observation.
   Approaches Expectation (1 point): The information is insufficient to identify the location of the observation.
Determining the Skewness
   Exceeds Expectation (3 points): Provided detailed and accurate information on the skewness of the distribution.
   Meets Expectation (2 points): Determined the skewness of the distribution.
   Approaches Expectation (1 point): The information provided is insufficient to determine the skewness of the distribution.


B. Directions: Provide the information required by each item. Show all
pertinent solutions. The rubric below will be used to evaluate your
answers.

Understanding
   Exceeds Expectation (3 points): The given and the unknown quantities were identified and properly labelled.
   Meets Expectation (2 points): The given quantities were identified.
   Approaches Expectation (1 point): Some of the given quantities were not identified.
Solution
   Exceeds Expectation (3 points): The problem was solved efficiently and systematically using an appropriate method.
   Meets Expectation (2 points): The problem was solved using an appropriate method.
   Approaches Expectation (1 point): The problem was solved inefficiently using an inappropriate method.
Answer
   Exceeds Expectation (3 points): The problem was answered accurately.
   Meets Expectation (2 points): The requirements of the problem were provided.
   Approaches Expectation (1 point): The problem was not answered.

Problem: Albert's teacher revealed that the mean score on their previous exam
was 60, with a standard deviation of 10. Instead of the raw scores, she gave
the students their z-scores.
Albert got a z-score of −0.5. If the passing score is 50, did he pass the exam?
Why or why not?


Topic 5: Probabilities and Normal Distribution

Learning Objectives
Upon the completion of this topic, you are expected to:
A. Identify the properties of the normal distribution curve;
B. Determine the areas under the normal distribution curve given a portion of
the z table;
C. Determine the probability of cases in the normal distribution curve; and
D. Solve word problems involving the concepts of normal distribution.

Presentation of Content

I. Normal Distribution
A random variable x whose distribution has the shape of a normal curve is
called a normal random variable. Its probability density function is:

$$ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}} $$

Note: The random variable x is said to be normally distributed with mean μ and
standard deviation σ if its probability distribution is given by the equation above.
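To see the formula at work, the Python sketch below evaluates f(x) directly and compares it with `statistics.NormalDist.pdf` from the standard library (available since Python 3.8). The function name `normal_pdf` is illustrative, and the mean 90 and standard deviation 4 are borrowed from the example later in this topic.

```python
import math
from statistics import NormalDist


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Height of the normal curve: 1/(sigma*sqrt(2*pi)) * exp(-(x - mu)^2 / (2*sigma^2))."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)


# Cross-check against the standard library; 86 and 94 lie equally far from the mean
for x in (86, 90, 94):
    print(x, normal_pdf(x, 90, 4), NormalDist(mu=90, sigma=4).pdf(x))
```

The equal heights at 86 and 94 illustrate that the curve is symmetrical about the mean.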

The normal curve is a bell-shaped curve, and its probability distribution is
called the normal distribution. The values in the curve cluster around the
average value, and fewer values are found at increasing distances from the
average.


Properties of a Normal Distribution


The following are the properties of a normal distribution. Do not forget about
them.
1. The mean, median, and mode are equal and are located at the center of
the distribution.
2. The distribution is symmetrical about the mean.
3. The total area under the normal curve is 1 or 100%.
4. The tails extend infinitely in both directions, approaching but never
touching the horizontal axis.
5. The mean determines the location of the distribution, and the standard
deviation determines its dispersion.
6. For a normal curve, the area within:
a. one standard deviation of the mean is about 68%;
b. two standard deviations of the mean is about 95%; and
c. three standard deviations of the mean is about 99.7%.

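[Figure: normal curve with the horizontal axis marked at −2, −1, 0, 1, and 2 standard deviations from the mean]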

Note: The shape of the normal distribution depends only on two parameters:
the population mean and the population standard deviation.
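Property 6 can be checked numerically with the cumulative distribution function of the standard normal curve in Python's `statistics.NormalDist` (Python 3.8+). This is only a sketch for verification; the variable name `standard_normal` is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

for k in (1, 2, 3):
    # area between -k and +k standard deviations of the mean
    area = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within {k} SD: {area:.4f}")
```

The printed areas are about 0.6827, 0.9545, and 0.9973, which is why the three-standard-deviation figure is usually quoted as 99.7%.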

What variables are normally distributed?

You can observe how the mean and standard deviation of a distribution affect
the spread and location of its curve.

The first figure below shows normal distributions with the same mean but
different standard deviations, while the second figure presents distributions
with different means but the same standard deviation.


Figure 1: Two distributions with equal means but different standard deviations

Figure 2: Two distributions with different means but the same standard deviation

How can the mean and standard deviation of the distribution affect its shape?

Standard Scores
A standard score expresses the position of a raw score in terms of standard
deviations relative to the mean of the distribution.
Given the raw scores, we can convert them to their corresponding standard
scores or z-scores. This standardizes the distribution so that it has a mean of 0
and a standard deviation of 1; if the raw scores are normally distributed, their
standard scores follow the standard normal curve.
We can use the formula:
$$ z = \frac{x - \mu}{\sigma} $$
Where:
z = standard score
µ = mean
σ = standard deviation
x = raw score


Note: This is the same formula used in the previous topic on z-scores.


A standard normal curve is a normal distribution with a mean of 0 and a
standard deviation of 1 and all its raw scores are expressed in terms of
standard scores below or above the mean.
When the standard score is positive, it means that the raw score is above or
higher than the mean; if negative, it means that the raw score is below or lower
than the mean of the distribution.

Areas under the Normal Distribution Curve


To determine the areas under the normal curve, we shall convert the raw score
into its corresponding standard or z-score. Again, we will be using the
formula:

$$ z = \frac{x - \mu}{\sigma} $$
Where:
z = standard score
μ = mean
σ = standard deviation
x = raw score

After determining the z-score that corresponds to the raw score, we use a z table
to find the area between the mean (z = 0) and that z-score. Here is a portion of
the table. The row gives the ones and tenths digits of z, and the column gives
the hundredths digit; for example, the area for z = 0.32 is in row 0.3, column 0.02.

z (±)   0.00     0.01     0.02     0.03     0.04     0.05
0.0     0.0000   0.0040   0.0080   0.0120   0.0160   0.0199
0.1     0.0398   0.0438   0.0478   0.0517   0.0557   0.0596
0.2     0.0793   0.0832   0.0871   0.0910   0.0948   0.0987
0.3     0.1179   0.1217   0.1255   0.1293   0.1331   0.1368
0.4     0.1554   0.1591   0.1628   0.1664   0.1700   0.1736
0.5     0.1915   0.1950   0.1985   0.2019   0.2054   0.2088

How will we use the z table to determine the area under the normal curve?
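The entries of a z table like this one are areas between the mean (z = 0) and z. Assuming Python 3.8+, the sketch below recomputes them from `statistics.NormalDist().cdf`; `area_mean_to_z` is an illustrative name. Rounded to four decimals it gives the standard table entries, for example 0.1255 for z = 0.32 and 0.1591 for z = 0.41.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def area_mean_to_z(z: float) -> float:
    """Area under the standard normal curve between the mean (z = 0) and z.

    The curve is symmetric, so +z and -z give the same area; this is what
    the "z (±)" heading of the table expresses.
    """
    return abs(STANDARD_NORMAL.cdf(z) - 0.5)


# Rebuild rows 0.0 to 0.5, columns 0.00 to 0.05, rounded to four decimals
columns = [0.00, 0.01, 0.02, 0.03, 0.04, 0.05]
for row in [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]:
    cells = [f"{area_mean_to_z(row + col):.4f}" for col in columns]
    print(f"{row:.1f}", *cells)
```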

Application

Activity 1
Directions: Answer the following problem.
Problem: Suppose that in a given test, the mean is 45 and the standard
deviation is 5. If Mario obtained a score of 50, what is his standard score?


Solution:
Given:
μ = 45, x = 50, σ = 5, z = unknown

$$ z = \frac{x - \mu}{\sigma} = \frac{50 - 45}{5} = \frac{5}{5} = 1 $$

Interpretation: A standard score of 1 means that the score of 50 lies 1 standard
deviation above the mean of 45. This further indicates that the given observation
is greater than the mean of the distribution.

Did you get the answer correctly? Good job!

Activity 2
Directions: Given the mean and standard deviation of the distribution as 20
and 4 respectively, convert the following raw scores into their corresponding
standard scores and interpret their location relative to the mean.
1. 18
2. 20
3. 16
4. 24
5. 28

Activity 3
Using the z table, let us determine the areas of the following:
1. Between 0.1 and 0
2. Between 0.03 and 0
3. Between 0.3 and 0
4. Between 0.45 and 0
5. Between 0.32 and 0

Note: Remember that the standard score for the mean is 0.

Answers to Activity 3
Have you tried to answer the activity? Here are the answers.
1. Between 0.1 and 0 = 0.0398
2. Between 0.03 and 0 = 0.0120
3. Between 0.3 and 0 = 0.1179
4. Between 0.45 and 0 = 0.1736
5. Between 0.32 and 0 = 0.1255

Did you get all the items? You’re doing great!


Guidelines in Determining the Areas under the Normal Distribution Curve


Here are the guidelines to remember when determining areas under the normal
distribution curve. Read them carefully.
1. To determine the area to the right of a positive z-score, subtract the
area between the z-score and the mean from 0.5 (the area of each half
of the curve is 0.5).
2. To determine the area to the left of a positive z-score, add the area
between the z-score and the mean to 0.5.
3. To determine the area to the right of a negative z-score, add the area
between the z-score and the mean to 0.5.
4. To determine the area to the left of a negative z-score, subtract the
area between the z-score and the mean from 0.5.
5. To determine the area between two positive z-scores, subtract the
smaller of their areas from the mean from the larger one.
6. To determine the area between two negative z-scores, subtract the
smaller of their areas from the mean from the larger one.
7. To determine the area between a positive and a negative z-score, add
their areas from the mean.
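The seven guidelines are special cases of one idea: the area to the left of z, often written Φ(z). The Python sketch below (assuming Python 3.8+ and its `statistics.NormalDist`; the helper names are illustrative) computes the areas directly instead of going through the table. Applied to Activity 4, it gives 0.4602 and 0.3821 for items 1 and 2; for item 3 it gives about 0.0762 because it works with unrounded areas, whereas subtracting the rounded table entries gives 0.0761.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # Phi(z): area to the LEFT of z under the standard normal curve


def area_right(z: float) -> float:
    """Area to the right of z (guidelines 1 and 3)."""
    return 1.0 - PHI(z)


def area_left(z: float) -> float:
    """Area to the left of z (guidelines 2 and 4)."""
    return PHI(z)


def area_between(z1: float, z2: float) -> float:
    """Area between two z-scores given in either order (guidelines 5 to 7)."""
    low, high = sorted((z1, z2))
    return PHI(high) - PHI(low)


# Activity 4 items
print(f"{area_right(0.1):.4f}")
print(f"{area_left(-0.3):.4f}")
print(f"{area_between(-0.2, -0.4):.4f}")
```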

Can you follow the guidelines? Are there items that you are not sure with?

Activity 4
To understand the guidelines, let us determine the areas of the following. Try
to answer them before comparing your answers to the solutions that follow.
1. To the right of 0.1
2. To the left of –0.3
3. Between –0.2 and –0.4

How many of the guidelines did you apply? Congratulations! You can
compare now your answers to see how much you have understood.

Solutions to Activity 4
The following are the solutions to the previous activity.
1. Area to the right of 0.1 = unknown
Area between 0.1 and 0 = 0.0398
Subtract it from 0.5 = 0.4602

2. Area to the left of –0.3 = unknown


Area between 0.3 and 0 = 0.1179
Subtract it from 0.5 = 0.3821

3. Area between –0.2 and –0.4 = unknown


Area between 0.4 and 0 = 0.1554
Area between 0.2 and 0 = 0.0793
Subtract 0.0793 from 0.1554 = 0.0761


Note: The areas between 0.2 and 0.4 and –0.2 and –0.4 are equal since they
are symmetrical about the mean.

II. Probability Distribution


Do you know that we utilize the concepts of the areas under the normal
distribution curve in determining the proportion of cases in the curve?

Note: The probability of the occurrence of a case is its area under the curve!

Example:
If x is a normal random variable with a mean of 90 and standard deviation of
4, find the probability that x is:
1. Greater than 92
2. Less than 89
3. Between 89 and 92

Solution:
Given: mean = 90, standard deviation = 4

1. P(x > 92) = unknown
Convert the raw score 92 to a z-score:
$$ z = \frac{92 - 90}{4} = \frac{2}{4} = 0.5 $$
Determine the area to the right of 0.5: P(z > 0.5) = 0.5 − 0.1915 = 0.3085
The probability that x is greater than 92 is 30.85%.

2. P(x < 89) = unknown
Convert the raw score 89 to a z-score:
$$ z = \frac{89 - 90}{4} = \frac{-1}{4} = -0.25 $$
Determine the area to the left of −0.25: P(z < −0.25) = 0.5 − 0.0987 = 0.4013
The probability that x is less than 89 is 40.13%.

3. P(89 < x < 92) = unknown
Convert the raw score 89 to a z-score:
$$ z = \frac{89 - 90}{4} = \frac{-1}{4} = -0.25 $$
Convert the raw score 92 to a z-score:
$$ z = \frac{92 - 90}{4} = \frac{2}{4} = 0.5 $$
Determine the area between −0.25 and 0.5 (add their areas from the mean):
P(−0.25 < z < 0.5) = 0.0987 + 0.1915 = 0.2902
The probability that x is between 89 and 92 is 29.02%.
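The same three probabilities can be computed directly from a normal distribution with mean 90 and standard deviation 4, without standardizing by hand. This sketch assumes Python 3.8+ and its `statistics.NormalDist`; printed to four decimals it gives 0.3085, 0.4013, and 0.2902, which agree with the table-based working 0.5 − 0.1915, 0.5 − 0.0987, and 0.0987 + 0.1915.

```python
from statistics import NormalDist

x_dist = NormalDist(mu=90, sigma=4)  # the normal random variable x from the example

p_greater_92 = 1 - x_dist.cdf(92)              # P(x > 92)
p_less_89 = x_dist.cdf(89)                     # P(x < 89)
p_between = x_dist.cdf(92) - x_dist.cdf(89)    # P(89 < x < 92)

print(f"P(x > 92)      = {p_greater_92:.4f}")
print(f"P(x < 89)      = {p_less_89:.4f}")
print(f"P(89 < x < 92) = {p_between:.4f}")
```

Since the three events cover the whole number line with no overlap, their probabilities add up to 1, which is a quick way to check the answers.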

Congratulations! You finished learning the topics. Did you enjoy it?
What are other applications of the areas under the normal distribution curve in
the real life setting?


Feedback/ Assessment

Test I.
Directions: Supply the information required by each item.
_____1. It has the same value and location as the median and mode of the
normal distribution.
_____2. It is the total area under the normal distribution curve.
_____3. It is the shape of a normal distribution curve.
_____4. The shape of the normal curve depends on these parameters.
_____5. It is the equivalent standard score of the mean of the distribution.

Test II.
Directions: Given a portion of the z table, determine the following areas.

z (±)   0.00     0.01     0.02     0.03     0.04     0.05
0.0     0.0000   0.0040   0.0080   0.0120   0.0160   0.0199
0.1     0.0398   0.0438   0.0478   0.0517   0.0557   0.0596
0.2     0.0793   0.0832   0.0871   0.0910   0.0948   0.0987
0.3     0.1179   0.1217   0.1255   0.1293   0.1331   0.1368
0.4     0.1554   0.1591   0.1628   0.1664   0.1700   0.1736
0.5     0.1915   0.1950   0.1985   0.2019   0.2054   0.2088
1. Between 0.11 and 0
2. To the right of 0.12
3. Between 0.12 and 0.34

Test III.
Directions: Provide the information required by each item. Show all pertinent
solutions. The rubric below will be used to evaluate your answers.
Understanding
   Exceeds Expectation (3 points): The given and the unknown quantities were identified and properly labelled.
   Meets Expectation (2 points): The given quantities were identified.
   Approaches Expectation (1 point): Some of the given quantities were not identified.
Solution
   Exceeds Expectation (3 points): The problem was solved efficiently and systematically using an appropriate method.
   Meets Expectation (2 points): The problem was solved using an appropriate method.
   Approaches Expectation (1 point): The problem was solved inefficiently using an inappropriate method.
Answer
   Exceeds Expectation (3 points): The problem was answered accurately.
   Meets Expectation (2 points): The requirements of the problem were provided.
   Approaches Expectation (1 point): The problem was not answered.

Problem 1: Juan and Jimmy took a test in Geometry. Their teacher revealed
their scores in terms of standard scores: Juan has a z-score of 1, while Jimmy
has a z-score of 1.5. If the mean score of the class is 60 with a standard
deviation of 6, who got the higher raw score, and by how much?

Problem 2: In the Prelim Examination in the College of Teacher Education,
the mean score of the 30 students of Mr. Aguinaldo is 20 with a standard
deviation of 4. Assuming normality:
1. What percentage of cases falls between the mean and 22?
2. What is the probability that a score lies above 21?
3. What is the probability that a score lies between 18 and 21?


Topic 6: Linear Regression and Correlations

Learning Objectives
Upon the completion of this topic, you are expected to:
A. Recall concepts on linear correlation and the least-squares line;
B. Determine the value of the linear correlation coefficient;
C. Identify what relationship exists between two variables; and
D. Estimate the value of the dependent variable that corresponds to a given
value of the independent variable.

Presentation of Content

I. Linear Correlation
The linear correlation coefficient r measures the strength and direction of the
linear relationship between two variables (Larson and Farber, 2000; Pagala,
2011). We will use the formula below to determine the value of the linear
correlation coefficient.

$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} $$

Where:
n = number of ordered pairs
x = value of independent variable
y = value of dependent variable

How will you use the formula to determine the relationship of two variables?

From the formula, we will follow the following procedure:


1. Multiply each x value by its y value and compute the sum of the
products, ∑xy.
2. Multiply the sum of the products by the number of ordered pairs, n∑xy.
3. Determine the sum of the x values, ∑x.
4. Determine the sum of the y values, ∑y.
5. Multiply the sum of the x values by the sum of the y values, ∑x∑y.
6. Square the x values and take the sum, ∑x².
7. Multiply the sum of the squares of the x values by the number of
ordered pairs, n∑x².
8. Square the y values and take the sum, ∑y².
9. Multiply the sum of the squares of the y values by the number of
ordered pairs, n∑y².
10. Square the sum of the x values, (∑x)².
11. Square the sum of the y values, (∑y)².
12. Substitute the values into the formula to determine the value of the
coefficient.

Note: We can only employ correlation when data are in interval or ratio scale.
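Steps 1 to 12 map directly onto a short Python function. The sketch below uses only the standard library (`pearson_r` is an illustrative name) and, applied to the height and weight data from Activity 1 of this topic, gives r ≈ 0.641. It raises an error rather than dividing by zero if either variable has no spread; that guard is an added assumption.

```python
import math


def pearson_r(xs, ys):
    """Linear correlation coefficient using the computational formula above."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least two pairs")
    n = len(xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))      # step 1
    sum_x = sum(xs)                                  # step 3
    sum_y = sum(ys)                                  # step 4
    sum_x2 = sum(x * x for x in xs)                  # step 6
    sum_y2 = sum(y * y for y in ys)                  # step 8
    numerator = n * sum_xy - sum_x * sum_y           # steps 2 and 5
    spread_x = n * sum_x2 - sum_x ** 2               # steps 7 and 10
    spread_y = n * sum_y2 - sum_y ** 2               # steps 9 and 11
    if spread_x == 0 or spread_y == 0:
        raise ValueError("r is undefined when all x or all y values are equal")
    return numerator / math.sqrt(spread_x * spread_y)  # step 12


heights = [67, 70, 71, 70, 66, 69, 72, 78, 64, 65]
weights = [71, 70, 69, 68, 66, 65, 71, 70, 64, 65]
print(round(pearson_r(heights, weights), 3))
```

For these data the intermediate results are n∑xy − ∑x∑y = 632, n∑x² − (∑x)² = 1496, and n∑y² − (∑y)² = 649.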

II. Simple Regression Analysis


We start with the concept of simple regression analysis.
When only one independent variable is used, the analysis is referred to as
simple regression analysis.
The formal statements of the simple linear regression model is:

y = α + βx + ε
Where:
y = the value of dependent variable
α = the y—intercept
β = the slope of the regression line
x = the value of the independent variable
ε = the random error term
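A small simulation can make the roles of α, β and ε concrete. The sketch below is illustrative only: the values α = 2 and β = 0.5, the normally distributed error with standard deviation 0.3, and the function name `simulate_responses` are our assumptions, not part of the module. Each simulated y is the point on the line, α + βx, plus a random error.

```python
import random


def simulate_responses(alpha, beta, xs, sigma, seed=0):
    """Generate y = alpha + beta*x + e for each x, with e drawn from Normal(0, sigma)."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    return [alpha + beta * x + rng.gauss(0, sigma) for x in xs]


xs = list(range(1, 11))
ys = simulate_responses(alpha=2.0, beta=0.5, xs=xs, sigma=0.3)
for x, y in zip(xs, ys):
    line_value = 2.0 + 0.5 * x         # the part explained by the line, alpha + beta*x
    print(x, round(y, 2), "error term:", round(y - line_value, 2))
```

In a real regression problem only x and y are observed; α, β and the individual errors are unknown and must be estimated, which is what the method of least squares does.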

How can we apply the formula to predict values of the dependent variable?

Method of Least Squares


Since α and β are generally not known in a regression problem, they must be
estimated from sample data collected on the dependent variable y for a number
of values of the independent variable x.

Note: The standard approach to estimating α and β is the method of least
squares, which chooses the estimates that minimize the sum of the squared
errors (vertical deviations) of the data points from the line.

Sample estimates of α and β are denoted by $\hat{\alpha}$ and $\hat{\beta}$, respectively, and the
resulting regression line is called the sample least squares regression equation:

$$\hat{y} = \hat{\alpha} + \hat{\beta}x$$

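The line is chosen so that the sum of the squared deviations between the
observed points and the line is as small as possible. The formulas for $\hat{\beta}$
and $\hat{\alpha}$ that achieve this minimum are shown below:

$$\hat{\beta} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$$

$$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$$

Note: Here, $\bar{x}$ and $\bar{y}$ denote the sample means of x and y.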
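The deviation formulas translate directly into code. In this sketch the helper names and the five data points are hypothetical; the last two lines check the least squares property by showing that shifting either estimate slightly increases the sum of squared deviations.

```python
def least_squares_deviation(x, y):
    """Return (alpha_hat, beta_hat) from the deviation formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # sum of (x - x_bar)(y - y_bar)
    sxx = sum((xi - x_bar) ** 2 for xi in x)                         # sum of (x - x_bar)^2
    beta_hat = sxy / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat


def sum_squared_deviations(x, y, alpha, beta):
    """Sum of squared vertical deviations of the points from the line alpha + beta*x."""
    return sum((yi - (alpha + beta * xi)) ** 2 for xi, yi in zip(x, y))


x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
alpha_hat, beta_hat = least_squares_deviation(x, y)
print(round(alpha_hat, 2), round(beta_hat, 2))
best = sum_squared_deviations(x, y, alpha_hat, beta_hat)
print(best < sum_squared_deviations(x, y, alpha_hat + 0.1, beta_hat))  # True
print(best < sum_squared_deviations(x, y, alpha_hat, beta_hat + 0.05))  # True
```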


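Alternative Formula
The alternative formulas for $\hat{\beta}$ and $\hat{\alpha}$, which use the column sums directly, are as follows.

$$\hat{\beta} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}$$

$$\hat{\alpha} = \frac{\sum y - \hat{\beta}\sum x}{n}$$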
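Because the alternative formulas need only the column sums, they suit hand or calculator work. A minimal sketch follows (the function name and the five made-up points are ours); it returns $\hat{\alpha}$ = 0.05 and $\hat{\beta}$ = 1.99, the same values the deviation formulas give for these points.

```python
def least_squares_sums(x, y):
    """Return (alpha_hat, beta_hat) from the alternative (column-sum) formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    beta_hat = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    alpha_hat = (sum_y - beta_hat * sum_x) / n
    return alpha_hat, beta_hat


x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
alpha_hat, beta_hat = least_squares_sums(x, y)
print(round(alpha_hat, 2), round(beta_hat, 2))
```

Note that the numerator of $\hat{\beta}$ is the same quantity that appears in the numerator of r, so the sums from a correlation problem can be reused.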

Application

Activity 1
Now, let us apply what we have learned. Here is an activity in which we use
the formula given. Remember to follow the procedure in determining the linear
correlation coefficient. Try to solve the problem independently before
comparing your answers with the answers provided.

Problem: The heights and weights of 10 basketball players are given below.
Determine the value of the linear correlation coefficient.

Heights and weights of 10 basketball players

X (Height in inches):     67  70  71  70  66  69  72  78  64  65
Y (Weight in kilograms):  71  70  69  68  66  65  71  70  64  65
1. Ho: There is no significant relationship between height and weight.
2. Ha: There is a significant relationship between height and weight.
3. Level of significance: α = 0.05
4. Computed value: r = .641
5. p-value (Sig.) = .046 < .05
6. Decision rule: If the p-value is less than α = .05, reject Ho.
7. If the p-value is greater than α = .05, fail to reject Ho.
8. Conclusion: Since .046 < .05, reject Ho. There is a significant relationship
between height and weight.
Possible error: Type I (rejecting Ho when it is actually true).
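The p-value that SPSS reports comes from the usual t test for a correlation coefficient, t = r√(n − 2)/√(1 − r²), with n − 2 degrees of freedom. The sketch below reproduces it with only the Python standard library; integrating the t density numerically with Simpson's rule is our own substitute for a statistics package, and the function names are hypothetical. It assumes |r| < 1.

```python
import math


def t_density(t, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def correlation_p_value(r, n, steps=2000):
    """Return (t, two-tailed p-value) for Ho: no linear correlation."""
    df = n - 2
    t = abs(r) * math.sqrt(df) / math.sqrt(1 - r * r)
    # Simpson's rule (steps must be even) for the area under the density from 0 to t.
    h = t / steps
    area = t_density(0, df) + t_density(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_density(i * h, df)
    area *= h / 3
    # Each tail beyond |t| holds 0.5 - area, so the two-tailed p-value is twice that.
    return t, 2 * (0.5 - area)


t, p = correlation_p_value(0.641, 10)
print(f"t = {t:.2f}, p = {p:.3f}")
print("reject Ho" if p < 0.05 else "fail to reject Ho")
```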

Have you tried answering the problem? Great! Now, we can compare your
answers.

Solution:
We determine the values of the variables.

Height (X)   Weight (Y)   XY      X²      Y²
67           71           4,757   4,489   5,041
70           70           4,900   4,900   4,900
71           69           4,899   5,041   4,761
70           68           4,760   4,900   4,624
66           66           4,356   4,356   4,356
69           65           4,485   4,761   4,225
72           71           5,112   5,184   5,041
78           70           5,460   6,084   4,900
64           64           4,096   4,096   4,096
65           65           4,225   4,225   4,225
The values of the variables are:

$\sum xy = 47{,}050$, $n\sum xy = 470{,}500$
$\sum x = 692$, $\sum y = 679$, $\sum x \sum y = 469{,}868$
$\sum x^2 = 48{,}036$, $n\sum x^2 = 480{,}360$
$\sum y^2 = 46{,}169$, $n\sum y^2 = 461{,}690$
$(\sum x)^2 = 478{,}864$, $(\sum y)^2 = 461{,}041$
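The column totals can be checked with a short Python sketch. Note that the ten weights add up to 679; the variable names and the ASCII labels are our own.

```python
heights = [67, 70, 71, 70, 66, 69, 72, 78, 64, 65]  # X, height in inches
weights = [71, 70, 69, 68, 66, 65, 71, 70, 64, 65]  # Y, weight in kilograms
n = len(heights)

sum_x = sum(heights)
sum_y = sum(weights)
sum_xy = sum(x * y for x, y in zip(heights, weights))
sum_x2 = sum(x * x for x in heights)
sum_y2 = sum(y * y for y in weights)

totals = {
    "sum(xy)": sum_xy, "n*sum(xy)": n * sum_xy,
    "sum(x)": sum_x, "sum(y)": sum_y, "sum(x)*sum(y)": sum_x * sum_y,
    "sum(x^2)": sum_x2, "n*sum(x^2)": n * sum_x2,
    "sum(y^2)": sum_y2, "n*sum(y^2)": n * sum_y2,
    "sum(x)^2": sum_x ** 2, "sum(y)^2": sum_y ** 2,
}
for name, value in totals.items():
    print(f"{name} = {value:,}")
```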

We are now ready to substitute them in the formula.

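$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$

$$r = \frac{470{,}500 - (692)(679)}{\sqrt{\left[480{,}360 - (692)^2\right]\left[461{,}690 - (679)^2\right]}}$$

$$r = \frac{470{,}500 - 469{,}868}{\sqrt{\left[480{,}360 - 478{,}864\right]\left[461{,}690 - 461{,}041\right]}}$$

$$r = \frac{632}{\sqrt{(1{,}496)(649)}} = \frac{632}{\sqrt{970{,}904}} = \frac{632}{985.34} \approx 0.64$$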

The value of the linear correlation coefficient is 0.64.
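The substitution can be double-checked with a few lines of Python. This sketch simply plugs the column totals into the formula, using Σy = 679 (the sum of the ten weights), and reproduces r ≈ 0.64.

```python
import math

# Column totals for the ten basketball players (Activity 1).
n = 10
sum_xy, sum_x, sum_y = 47_050, 692, 679
sum_x2, sum_y2 = 48_036, 46_169

numerator = n * sum_xy - sum_x * sum_y     # 470,500 - 469,868 = 632
factor_x = n * sum_x2 - sum_x ** 2         # 480,360 - 478,864 = 1,496
factor_y = n * sum_y2 - sum_y ** 2         # 461,690 - 461,041 = 649
r = numerator / math.sqrt(factor_x * factor_y)
print(round(r, 2))  # 0.64
```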


What could be the meaning of the value we computed?


Correlations

                                Heights   Weight
Heights   Pearson Correlation   1         .641*
          Sig. (2-tailed)                 .046
          N                     10        10
Weight    Pearson Correlation   .641*     1
          Sig. (2-tailed)       .046
          N                     10        10

*. Correlation is significant at the 0.05 level (2-tailed).

Interpreting the Correlation Coefficient


After determining the correlation coefficient, we need to interpret its value.
The verbal interpretation of the degree of linear relationship for each range of
values is shown below.
Values Interpretation
±1.00 Perfect positive/ negative correlation
±0.91 to ±0.99 Very high positive/ negative correlation
±0.71 to ±0.90 High positive/ negative correlation
±0.51 to ±0.70 Moderate positive/ negative correlation
±0.31 to ±0.50 Low positive/ negative correlation
±0.01 to ±0.30 Slight positive/ negative correlation
0 No correlation
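The interpretation table can be written as a small lookup function. In this sketch (the function name is hypothetical), r is first rounded to two decimal places so that values falling between the listed ranges still land in a row; that rounding rule is our assumption, not part of the table.

```python
def interpret_r(r):
    """Verbal interpretation of a correlation coefficient, following the table above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    size = abs(round(r, 2))          # assumption: compare values rounded to 2 decimals
    if size == 0:
        return "No correlation"
    direction = "positive" if r > 0 else "negative"
    if size == 1.0:
        strength = "Perfect"
    elif size >= 0.91:
        strength = "Very high"
    elif size >= 0.71:
        strength = "High"
    elif size >= 0.51:
        strength = "Moderate"
    elif size >= 0.31:
        strength = "Low"
    else:
        strength = "Slight"
    return f"{strength} {direction} correlation"


print(interpret_r(0.64))   # Moderate positive correlation
print(interpret_r(-0.95))  # Very high negative correlation
```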

In Activity 1, the correlation coefficient is 0.64, which can be interpreted as a
moderate positive correlation. There is a moderate degree of positive
correlation between the height and weight of the ten basketball players: in this
sample, taller players tend to be heavier.

Awesome! Keep up the good work!

Activity 2
Let us put your understanding into practice. Below are the test results of 10
students in their Mathematics and English examinations. With a partner,
determine the linear correlation coefficient and interpret its value.
X (Score in Mathematics):  34  23  45  44  37  46  23  41  40  35
Y (Score in English):      35  21  43  42  32  45  23  47  43  37

Activity 3
Using the given formulas, try to determine the values of the variables to come
up with the least squares regression equation.

Problem:
Cagayan State University (CSU) officials wished to determine whether the CSU
College Admission (CSU-CAT) score is a good indicator of the Grade Point
Average (GPA) of 16 scholars selected at random from the first-year class.
Their GPA and CSU-CAT scores are shown below:


Student GPA (y) CAT Score (x)


1 2.45 85
2 2.59 92
3 1.95 57
4 2.11 64
5 1.94 73
6 2.12 62
7 2.71 54
8 2.63 56
9 2.79 59
10 2.05 70
11 2.96 61
12 1.67 106
13 1.79 75
14 1.86 69
15 1.85 76
16 1.52 85

How can one predict and estimate GPA from CAT scores?

Solution
Now, we need to obtain the equation for the line that best fits the sample
data.

Student   GPA (y)   CAT Score (x)   xy         y²      x²
1         2.45      85              208.25     6.00    7225
2         2.59      92              238.28     6.71    8464
3         1.95      57              111.15     3.80    3249
4         2.11      64              135.04     4.45    4096
5         1.94      73              141.62     3.76    5329
6         2.12      62              131.44     4.49    3844
7         2.71      54              146.34     7.34    2916
8         2.63      56              147.28     6.92    3136
9         2.79      59              164.61     7.78    3481
10        2.05      70              143.5      4.20    4900
11        2.96      61              180.56     8.76    3721
12        1.67      106             177.02     2.79    11236
13        1.79      75              134.25     3.20    5625
14        1.86      69              128.34     3.46    4761
15        1.85      76              140.6      3.42    5776
16        1.52      85              129.2      2.31    7225
Total     34.99     1,144           2,457.48   79.42   84,984

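Using the formulas:

$$\bar{y} = \frac{34.99}{16} = 2.187$$

$$\bar{x} = \frac{1{,}144}{16} = 71.50$$

$$\hat{\beta} = \frac{2{,}457.48 - \frac{(1{,}144)(34.99)}{16}}{84{,}984 - \frac{(1{,}144)^2}{16}} = \frac{-44.305}{3{,}188} = -0.0139$$

$$\hat{\alpha} = 2.187 - (-0.0139)(71.50) = 3.181$$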
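The same arithmetic in Python, starting from the sixteen (GPA, CAT score) pairs in the table. This sketch uses the sums form of the slope from the calculation above, together with $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$; variable names are our own.

```python
gpa = [2.45, 2.59, 1.95, 2.11, 1.94, 2.12, 2.71, 2.63,
       2.79, 2.05, 2.96, 1.67, 1.79, 1.86, 1.85, 1.52]   # y
cat = [85, 92, 57, 64, 73, 62, 54, 56,
       59, 70, 61, 106, 75, 69, 76, 85]                   # x
n = len(cat)

sum_x, sum_y = sum(cat), sum(gpa)                         # 1,144 and 34.99
sum_xy = sum(x * y for x, y in zip(cat, gpa))             # 2,457.48
sum_x2 = sum(x * x for x in cat)                          # 84,984

beta_hat = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
alpha_hat = sum_y / n - beta_hat * sum_x / n
print(f"beta_hat = {beta_hat:.4f}, alpha_hat = {alpha_hat:.3f}")
# beta_hat = -0.0139, alpha_hat = 3.181
```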

The fitted equation describing the relationship between GPA and CAT scores
is: GPA = 3.181 − 0.014x

To predict the future GPA of a student with a CAT score of 80:

GPA = 3.181 − 0.014(80) = 2.06
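Once the fitted line is known, prediction is a single evaluation. A minimal sketch follows; the function name is hypothetical and the default coefficients are taken from the fitted equation GPA = 3.181 − 0.014x.

```python
def predict_gpa(cat_score, alpha_hat=3.181, beta_hat=-0.014):
    """Predicted GPA from the fitted line GPA = 3.181 - 0.014x."""
    # The sample's CAT scores run from 54 to 106; predictions far outside that
    # range are extrapolations and should be treated with caution.
    return alpha_hat + beta_hat * cat_score


for score in (60, 80, 100):
    print(score, round(predict_gpa(score), 2))
```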

Congratulations! You just learned to predict the future Grade Point Average of
the students.

Activity 4
With a partner, determine the least squares regression equation that fits the
following set of observations.

Age (x):    10  12  11  26  28  21  22  18  16  15
Score (y):  32  30  34  39  38  32  29  28  25  20

Feedback/ Assessment

We will now test your understanding of linear regression and correlation.


Good luck!
Test I.
Directions: Identify the term being described by each item.
_____1. It is employed to determine whether a relationship exists between
variables measured on an interval or ratio scale.
_____2. It is the type of relationship that exists when the value of the
correlation coefficient is zero.
_____3. It is the value of the correlation coefficient when there is a perfect
negative correlation.
_____4. It is the symbol used in the formula for correlation coefficient that
represents the independent variable.
_____5. It is the symbol used in the formula for correlation coefficient that
represents the dependent variable.

Test II.
Directions: Write TRUE if the statement is correct and FALSE if the
statement is wrong on the space provided before each question.
_____1. Beta is the y-intercept in regression analysis.
_____2. In the regression analysis, it is the dependent variable that we want to
predict.
_____3. The slope of the regression line is denoted by alpha.
_____4. The ultimate goal of regression analysis is to predict or estimate the
value of one variable corresponding to a given value of another
variable.
_____5. The sample regression equation may be used to predict or estimate
outside the range of values of the independent variable represented in
the sample.

Test III.
Directions: Provide the information required by each problem below.
The rubric below will be used to evaluate your answers.

Understanding
  Exceeds Expectation (3 points): The given and the unknown were identified
  and properly labelled.
  Meets Expectation (2 points): The given were identified.
  Approaches Expectation (1 point): Some of the given were not identified.

Solution
  Exceeds Expectation (3 points): The problem was solved efficiently and
  systematically with the use of an appropriate solution.
  Meets Expectation (2 points): The problem was solved with the use of an
  appropriate solution.
  Approaches Expectation (1 point): The problem was solved inefficiently with
  the use of an inappropriate solution.

Answer
  Exceeds Expectation (3 points): The problem was answered accurately.
  Meets Expectation (2 points): The requirements of the problem were provided.
  Approaches Expectation (1 point): The problem was not answered.

Problem 1: The raw scores obtained by 10 students in a quiz are given below.
What relationship exists between their performance in Biology and in
Chemistry?

X (Biology):    12  11  19  20  15  17  18  12  14  15
Y (Chemistry):  16  17  13  19  15  16  19  10  15  13

Problem 2: The Dean of the College of Education wants to determine whether GPA
could be used to estimate the performance of the students in the Board
Licensure Examination for Professional Teachers (BLEPT). The scores are shown
below.

GPA (x):          89  92  91  86  88  91  92  88  86  85
BLEPT Score (y):  80  82  80  85  82  87  88  82  83  81

Summary

This unit deals with the concepts of data management.

• Averages such as the mean, median, and mode summarize a given set of
data into a single value.
• The extent to which the median and mean are good representatives of the
values in the original dataset depends upon the variability or dispersion in
the original data.
• Datasets are said to have high dispersion when they contain values
considerably higher and lower than the mean value.
• Dispersion within a dataset can be measured or described in several ways,
including the range, standard deviation, and variance.
• The location of data in a given distribution can be determined using the z-
score, quantiles, and Box Whisker’s plot.
• The normal curve is a bell-shaped curve, and its probability distribution is
called the normal distribution.
• The proportion of cases in a distribution can be determined through the
areas under the normal curve.
• Linear correlation measures the direction and strength of the linear
relationship between two quantitative variables.
• Linear regression analysis allows us to predict or estimate the value of one
variable that corresponds to a given value of another variable.

Reflection

A. How much have you learned in this unit? Are there things that you
didn’t understand?
o I cannot understand the topic on _____________________.
o Now, I understand what the topics are all about.

I think that these topics are:


o Easy
o Moderate
o Difficult

B. Directions: Write your thoughts on the things that you have learned
and what you still need to improve by completing the following.

I have learned that …


_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________

I still need to improve myself on ...


_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________

I can understand the topic better if …


_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________


_______________________________________________________________
_______________________________________________________________

References

Asaad, A. (2008). Statistics Made Simple for Researchers. Rex Book Store, Inc.

ASQ (Box and Whisker’s Plot). Retrieved from http://asq.org/learn-about-quality/data-collection-analysis-tools/overview/box-whisker-plot.html

Bourne, M. (n.d.). Interactive Mathematics (Normal Probability Distributions). Retrieved from https://www.intmath.com/

Everitt, B. (1999). Chance Rules: An Informal Guide to Probability, Risk, and Statistics. Copernicus.

Golfian (2016). 50 Popular Quotes and Sayings. Retrieved from http://www.golfian.com/50-popular-normality-quotes-and-sayings/16-normality-quotes/

Goldberg, S. (1986). Probability: An Introduction. New York: Dover.

Keynes, J. M. (1921). A Treatise on Probability. London: Macmillan.

Mamhot, M., Mamhot, A., & Adanza, J. (2013). Statistics for General Education. Purelybooks Trading & Publishing Corp.

Math Warehouse. Normal Distribution Curve and Graph. Retrieved from https://www.mathwarehouse.com/statistics/normal-distribution-curve-and-graph.php

Mises, R. (1964). Mathematical Theory of Probability and Statistics. New York: Academic Press.

Nalangan, L., & Casinillo, M. (2009). Laboratory Manual in Statistics 1 (Elementary Statistics). Rex Book Store, Inc.

Narag, E. (2010). Basic Statistics with Calculator and Computer Application. Rex Book Store, Inc.

Pagala, R. (2011). Statistics (Revised Edition). Mindshapers Co., Inc.


Pinterest. Retrieved from https://www.pinterest.ph/pin/359865826459701508/?lp=true

Socratic Statistics. How do I calculate and interpret a Z-score? Retrieved from https://socratic.org/questions/how-do-i-calculate-and-interpret-a-z-score

Stat Trek (2018). Measures of Position. Retrieved from https://stattrek.com/descriptive-statistics/measures-of-position.aspx?Tutorial=AP

Stat Trek (Boxplots). Retrieved from https://stattrek.com/statistics/charts/boxplot.aspx?Tutorial=AP

Stat Trek (Teach Yourself Statistics). Retrieved from https://stattrek.com/descriptive-statistics/measures-of-position.aspx?Tutorial=AP

Statistics How To (Probability and Statistics). Retrieved from https://www.statisticshowto.datasciencecentral.com/probability-and-statistics/descriptive-statistics/box-plot/
