
Gec104 M3 03

The document discusses the normal distribution and related statistical concepts. It provides the following key points: - The normal distribution is a bell-shaped curve that is symmetrical about the mean. It describes many natural phenomena. - Properties of the normal distribution include that the mean, median and mode are equal, and it is continuous and unimodal. - Other distributions can be skewed or kurtotic. Skewness indicates a pile up of values on one side of the mean. Kurtosis describes the peakedness of a distribution. - The z-score indicates how many standard deviations a value is from the mean. Common applications involve finding areas under the normal curve using z-scores and tables of the

Uploaded by

xaren carandang

Mathematics in the

Modern World
STATISTICS:
Normal Distribution and Correlation Analysis
Normal Distribution
• Identify the properties of the normal distribution
• Determine normal distributions
• Find the areas under the normal curve
• Transform a random variable to a standard normal
variable
• Appreciate the importance of normal distribution
through citing its application in everyday living.
Normal Distribution

• The normal curve represents a symmetrical distribution


of values.
• It is a curve that results in exactly the same proportion of
area under the curve on both sides of the mean.
• The mean, median, and mode are equal in the normal
distribution.
Normal Distribution

• The normal distribution is a continuous probability distribution. This
means that it generally uses either interval or ratio data.
• A histogram of the data can approximate a normal distribution.
• Drawing a bell-shaped curve over the histogram helps judge
whether the distribution is approximately normal or not.
Normal Distribution
• A normal distribution is bell-shaped.
• The mean, median, and mode are equal and located at the center.
• Unimodal
• The curve is continuous.
• The curve never touches the x-axis.
• The total area under normal distribution is 1 or 100%
Distribution
Normal Distribution
- bell-shaped curve
Skewed Distribution
- has values whose frequencies bunch up in one tail and stretch out in the
other
Shape of Distribution (Skewness)
• When raw data deviate from the normal distribution, we have a
skewed distribution.
• The distribution could have a negative skew or a positive skew.
• A negative skew (SK < 0) means that the majority of the
values tend to be at the high end of the x-axis, which results in
the median being greater than the mean and a more
representative measure of central tendency than the mean.
• A positive skew (SK > 0) means that the majority of the
values tend to be at the low end of the x-axis, which results in
the mean being greater than the median.
Shape of Distribution (Skewness)

• Negatively skewed (SK < 0) means that there are more scores above the
mean
• Positively skewed (SK > 0) indicates that there are more scores below
the mean
• Normal distribution (SK = 0) means that there are very few low scores
and very few high scores; the majority are near the average score
Statistic             Scores Analysis of Marco    Scores Analysis of Marvin
Mean                  185                         185
Standard Error        16.17611                    1.505545
Median                185                         185
Mode                  185                         185
Standard Deviation    39.62323                    3.687818
Sample Variance       1570                        13.6
Kurtosis              0.838675                    -0.90614
Skewness              0.607634                    0
Range                 115                         10
Minimum               135                         180
Maximum               250                         190
Sum                   1110                        1110
Count                 6                           6

Marco: (SK > 0) Positively skewed, more scores below the mean
Marvin: (SK = 0) Normal Distribution, majority got average scores
Kurtosis
• a statistical measure used to describe the degree to which
scores cluster in the tails or the peak of a frequency
distribution.
• The peak is the tallest part of the distribution, and the tails
are the ends of the distribution.
• a distribution might be symmetrical but still depart from
the normal pattern by being flatter or taller than the true
normal curve
Shape of Distribution (Kurtosis)
Kurtosis value can be negative (platykurtic), zero
(mesokurtic), or positive (leptokurtic).
• Platykurtic (flat distribution; K < 0)
• Mesokurtic (normal distribution; K = 0)
• Leptokurtic (thin distribution; K > 0)
Shape of Distribution (Kurtosis)
• platykurtic - flatter and broader than normal curve;
fewer values in the tails and fewer values close to the
mean (the curve has a flat peak and has more dispersed
scores with lighter tails)
• mesokurtic - moderate in breadth and curves with a
medium peaked height; a normal curve is mesokurtic
• leptokurtic - more peaked than normal curve; more
values in the distribution tails and more values close to
the mean (sharply peaked with heavy tails)
Statistic             Scores Analysis of Marco    Scores Analysis of Marvin
Mean                  185                         185
Standard Error        16.17611                    1.505545
Median                185                         185
Mode                  185                         185
Standard Deviation    39.62323                    3.687818
Sample Variance       1570                        13.6
Kurtosis              0.838675                    -0.90614
Skewness              0.607634                    0
Range                 115                         10
Minimum               135                         180
Maximum               250                         190
Sum                   1110                        1110
Count                 6                           6

Marco: (K > 0) Leptokurtic
Marvin: (K < 0) Platykurtic
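The skewness and kurtosis rows can be reproduced with a short sketch. It assumes the table was produced with the adjusted sample formulas that spreadsheet SKEW and KURT functions are commonly documented to use; the source does not say so. The six scores below are hypothetical: the original scores are not given, and this list was only chosen to be consistent with Marvin's mean, range, sum, and sample variance.

```python
import statistics

def sample_skewness(data):
    """Adjusted sample skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s)^3)."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    total = sum(((x - mean) / s) ** 3 for x in data)
    return n / ((n - 1) * (n - 2)) * total

def sample_excess_kurtosis(data):
    """Adjusted sample excess kurtosis (0 corresponds to a mesokurtic curve)."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    total = sum(((x - mean) / s) ** 4 for x in data)
    lead = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return lead * total - correction

# Hypothetical scores (an assumption, not taken from the source).
marvin_like = [180, 182, 185, 185, 188, 190]

print("skewness:", sample_skewness(marvin_like))
print("kurtosis:", sample_excess_kurtosis(marvin_like))
```

With these assumed scores the skewness is essentially zero (symmetric data) and the kurtosis is negative, which matches the platykurtic label given to Marvin's scores.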
Normal Distribution
• A bell-shaped curve symbolizes that there is one central peak.
• The rest of the data are on either side of the center tapering
off on the extremes.
• The following figure shows the normal distribution.
• It was stated that the normal distribution is symmetric about
the mean.
• This means that the area for a z-value is the same whether z is
positive or negative. Hence, the area of -z is equal to the area of +z.
Z – Score
• A Z-score is a numerical measurement that describes
a value's relationship to the mean of a group of values.
• Z-score is measured in terms of standard deviations from
the mean.
• If a Z-score is 0, it indicates that the data point's score is
identical to the mean score.
Find the area under Normal Curve
• from 0 to any value of Z: direct from the table
• between 2 -Zs or 2 +Zs: get the tabular values and subtract
• from one -Z and one +Z: get the tabular values and add
• to the right of +Z or to the left of -Z: get the tabular value and
subtract from 0.5
• to the right of -Z or to the left of +Z: get the tabular value and
add 0.5
Calculate the area
From 0 to any value of Z Direct from the table

P (-0.72 < z < 0)


the probability or area of z between 0 and -0.72

0.2642
Calculate the area
From 0 to any value of Z Direct from the table

P (0 < z < 1.83)


the probability or area of z between
0 and 1.83

0.4664
Calculate the area
From one +Z and one -Z: Get the tabular values and add

P (-2.58 < z < 2.58)

the probability or area of z between -2.58 and 2.58

Since the mean is included in the shaded region, the areas must be added.

0.4951 + 0.4951 = 0.9902


Calculate the area
to the left of one +Z or to the right of one -Z: Get the tabular values and add 0.5

P (z < 1.44)

the probability or area of z less than 1.44 or to the left of 1.44

Since the mean is included in the shaded region, and the area to the left of the
mean is shaded, 0.5 must be added to the value from the table.

0.4251 + 0.5000 = 0.9251


Calculate the area
to the left of one -Z or to the right of one +Z: Get the tabular values and subtract from 0.5

P (z > 1.95)

the probability or area of z greater than 1.95 or to the right of 1.95

Since the shaded area is on the extreme right, the area from the table must be
subtracted from 0.5.

0.5000 - 0.4744 = 0.0256


Calculate the area
Between 2 -Zs or 2 +Zs Get the tabular values and subtract

P (1.23 < z < 1.90)


the probability or area of z between
1.23 and 1.90

0.0806

0.4713 - 0.3907 = 0.0806
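The table lookups in the worked areas above can be cross-checked numerically. The sketch below uses Python's standard-library `statistics.NormalDist` in place of a printed z-table; the helper names are illustrative and do not come from the source.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution: mean 0, standard deviation 1

def table_area(z):
    """Area between 0 and z, the value a half-curve z-table lists."""
    return Z.cdf(abs(z)) - 0.5

def area_between(z1, z2):
    """Area between two z-values, whatever their signs."""
    low, high = min(z1, z2), max(z1, z2)
    return Z.cdf(high) - Z.cdf(low)

def area_left(z):
    """Area to the left of z."""
    return Z.cdf(z)

def area_right(z):
    """Area to the right of z."""
    return 1 - Z.cdf(z)

print(round(table_area(-0.72), 4))          # 0.2642
print(round(table_area(1.83), 4))           # 0.4664
print(round(area_left(1.44), 4))            # 0.9251
print(round(area_right(1.95), 4))           # 0.0256
print(round(area_between(1.23, 1.90), 4))   # 0.0806
print(area_between(-2.58, 2.58))
```

The last value can differ from the table-based 0.9902 in the fourth decimal place, because the table rounds each half to 0.4951 before the two halves are added.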


Find the Z score given the area under Normal Curve
• area from Z = 0: direct from the table
• area to the right of Z which is greater than 0.5: Z is negative, subtract 0.5 and
locate the tabular value
• area to the left of Z which is greater than 0.5: Z is positive, subtract 0.5 and
locate the tabular value
• area to the right of Z which is less than 0.5: Z is positive, subtract from 0.5
and locate the tabular value
• area to the left of Z which is less than 0.5: Z is negative, subtract from 0.5
and locate the tabular value
Find the value of z if areas are given

P (0 < z <z0) = 0.4251


between 0 and z is 0.4251

Since it is area from z = 0,


direct from the table

Obtain the exact or closest


value.

the z score is 1.44


Find the value of z if areas are given

P (z < z0) = 0.9868


to the left of z is 0.9868
Since it is area to the left of z
is greater than 0.5, z is
positive, subtract the 0.5 from
the area given
0.9868 - 0.5 = 0.4868
Obtain the exact or closest
value.
the z score is 2.22
Find the value of z if areas are given

P (z < z0) = 0.0031


to the left of z is 0.0031
Since it is area to the left of z
is less than 0.5, z is negative,
subtract the area given from
0.5
0.5 - 0.0031 = 0.4969
Obtain the exact or closest
value.
the z score is -2.74
Find the value of z if areas are given

P (z > z0) = 0.9911


to the right of z is 0.9911
Since it is area to the right of
z is greater than 0.5, z is
negative, subtract the 0.5 from
the area given
0.9911 - 0.5 = 0.4911
Obtain the exact or closest
value.
the z score is -2.37
Find the value of z if areas are given

P (z > z0) = 0.2734


to the right of z is 0.2734
Since it is area to the right of
z is less than 0.5, z is positive,
subtract the area given from
0.5
0.5 - 0.2734 = 0.2266
Obtain the exact or closest
value.
the z score is 0.60
Find the value of z if areas are given
P (z > z0) = 0.0125
to the right of z is 0.0125
Since the area to the right of z
is less than 0.5, z is positive,
subtract the area given from
0.5

0.5000 - 0.0125 = 0.4875


Obtain the exact or closest
value.
the z score is 2.24
Find the value of z if areas are given
Find the values of ± z0 such
that the area is 0.8452

Since the region is symmetric about the mean and there are two
values of z0 to be obtained, divide the area given by 2 to get
the area from 0 to z0

0.8452 ÷ 2 = 0.4226
Obtain the exact or closest
value.

the z score is ± 1.42
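The reverse lookups above (area given, z wanted) can be checked with the inverse of the normal cumulative distribution. This is a sketch using `statistics.NormalDist.inv_cdf` from the Python standard library rather than a printed table; the function names are illustrative assumptions, and results are rounded to two decimals to mirror the "closest table value" step.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def z_from_table_area(area):
    """Positive z whose area measured from z = 0 equals `area`."""
    return Z.inv_cdf(0.5 + area)

def z_from_left_area(area):
    """z such that the area to the left of z equals `area`."""
    return Z.inv_cdf(area)

def z_from_right_area(area):
    """z such that the area to the right of z equals `area`."""
    return Z.inv_cdf(1 - area)

def z_from_central_area(area):
    """Positive z0 such that the area between -z0 and +z0 equals `area`."""
    return Z.inv_cdf(0.5 + area / 2)

print(round(z_from_table_area(0.4251), 2))    # area from 0 is 0.4251
print(round(z_from_left_area(0.9868), 2))     # area to the left is 0.9868
print(round(z_from_right_area(0.9911), 2))    # area to the right is 0.9911
print(round(z_from_right_area(0.0125), 2))    # area to the right is 0.0125
print(round(z_from_central_area(0.8452), 2))  # central area is 0.8452
```

The sign rules from the slides fall out automatically here: an area to the right that is greater than 0.5 yields a negative z, and an area to the left that is greater than 0.5 yields a positive z.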


How to get z score

There are various applications of the normal distribution to
real-life problems. Such problems are first transformed to the
standard normal distribution, which makes use of the formula:

$$ z = \frac{\text{raw score} - \text{mean}}{\text{standard deviation}} $$
Practical Application

The mean and standard deviation grade of 13 students who


took the final exam last term are 34.08 and 7.62, respectively.
What is the probability that Edna will get more than 40 in the
final exam?

$$ z = \frac{\text{raw score} - \text{mean}}{\text{standard deviation}} = \frac{40 - 34.08}{7.62} = 0.78 $$
Practical Application
z = (raw score - mean) / standard deviation
z = (40 - 34.08) / 7.62 = 0.78
• Therefore, the area of 0.78 (0.2823) is to be subtracted from 0.5.
• The answer is 0.2177.
• This means that Edna has a 21.77% chance of getting more than 40 in
the final exam.
Practical Application

The mean and standard deviation grade of 13 students who took


the final exam last term are 34.08 and 7.62, respectively.
What is the probability that Edna will get a score between 30
and 40?

z = (raw score - mean) / standard deviation

z1 = (30 - 34.08) / 7.62 = -0.54
z2 = (40 - 34.08) / 7.62 = 0.78
Practical Application
z = (raw score - mean) / standard deviation
z1 = (30 - 34.08) / 7.62 = -0.54
z2 = (40 - 34.08) / 7.62 = 0.78
• Therefore, the areas of -0.54 (0.2054) and 0.78 (0.2823) are added.
• The answer is 0.4877.
• This means that Edna has a 48.77% chance of getting a score between 30
and 40.
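Both practical applications can be reproduced in a few lines. The sketch below rounds each z-score to two decimals, as a z-table lookup does, and assumes exam scores are normally distributed with the stated mean and standard deviation; `z_score` is an illustrative helper name.

```python
from statistics import NormalDist

Z = NormalDist()          # standard normal distribution
MEAN, SD = 34.08, 7.62    # mean and standard deviation from the problem

def z_score(raw, mean, sd, places=2):
    """z = (raw score - mean) / standard deviation, rounded like a table lookup."""
    return round((raw - mean) / sd, places)

z40 = z_score(40, MEAN, SD)   # 0.78
z30 = z_score(30, MEAN, SD)   # -0.54

p_more_than_40 = 1 - Z.cdf(z40)         # area to the right of z = 0.78
p_between = Z.cdf(z40) - Z.cdf(z30)     # area between z = -0.54 and z = 0.78

print(round(p_more_than_40, 4))   # 0.2177
print(round(p_between, 4))        # 0.4877
```

Skipping the two-decimal rounding of z gives slightly different probabilities, since a printed table can only be entered at two-decimal z-values.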
Linear
Correlation
Analysis
• Construct a scatterplot
• Determine the strength and direction of the correlation
coefficient
• Calculate the Pearson’s correlation coefficient
Correlation Analysis
used to measure the strength of relationship between two or
more variables
Correlation between variables can be seen by scatter diagram
• if the points seem to form a straight line, there is high
correlation
• if the points form a random pattern, there is a low
correlation or no correlation at all
Scatterplot (Scatter Diagram)
visual representation of the linear relationship
between the variables
It is a graph involving the x and y axes.
[Figure: three scatterplots titled Positive relationship, No relationship, and Negative relationship; one axis is labeled ENGLISH]
Correlation is POSITIVE when the values
increase together.

Correlation is NEGATIVE when one value


decreases as the other increases.
Linear Regression
• used for finding linear relationship between target (outcome)
and one or more predictors.
• commonly used for predictive analysis and modeling.
• For example, it can be used to quantify the relative impacts of age,
gender, and diet (the predictor variables) on height (the outcome
variable).
• often referred to simply as regression or ordinary least squares (OLS)
regression; with more than one predictor it is called multiple regression
Variable X Variable Y

Independent Dependent
Variable Variable
When one changes by a certain amount,
the other changes on an average by a
certain amount
Simple Linear Regression
• One quantitative dependent variable
• One quantitative independent variable
Multiple Linear Regression
• One quantitative dependent variable
• Many quantitative independent variables
Regression line

$$ \hat{y} = a + bx $$

a = y-intercept

$$ a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2} $$

b = slope

$$ b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2} $$
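The intercept and slope of the regression line come from the same five sums: n, Σx, Σy, Σxy, and Σx². Here is a minimal sketch of that computation; `fit_line` is an illustrative name, and the sample points are hypothetical, chosen to lie exactly on y = 1 + 2x so the result is easy to verify by eye.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    denom = n * sum_x2 - sum_x ** 2               # shared denominator
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denom  # y-intercept
    b = (n * sum_xy - sum_x * sum_y) / denom       # slope
    return a, b

# Hypothetical points on the line y = 1 + 2x (not from the source).
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

a, b = fit_line(xs, ys)
print(a, b)  # 1.0 2.0
```

The denominator is zero when every x is the same, so a production version would need to guard against that case.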
When the two sets of observations increase or decrease
together, the line slopes upwards from left to right

POSITIVE
RELATIONSHIP
When one set decreases as the other increases, the line
slopes downwards from left to right

NEGATIVE
RELATIONSHIP
NO RELATIONSHIP
Correlation Coefficient (r)
The r value indicates the strength and direction of the relationship.
It is used to determine if there is a linear relationship between two
variables.
It has a value from -1 to +1.
• r = -1.0 , perfect negative/inverse linear relationship
• r = 0 , no linear relationship
• r = 1.0 , perfect positive/direct linear relationship
Pearson Product Moment Correlation
Pearson - r
Pearson correlation is widely used in statistics
to measure the degree of the relationship
between linearly related variables.

$$ r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{\left[n(\sum x^2) - (\sum x)^2\right]\left[n(\sum y^2) - (\sum y)^2\right]}} $$

Assume that a proprietor of a fabrication shop wants to know if there is a
relationship between the number of hours on the lathe machine and the income
(Php in hundred thousand) for each month of a year. The results are as follows:

Month        Lathe (x)    Income (Y)
January      6            6
February     4.5          5.5
March        5.75         4
April        6.25         5
May          4            3.75
June         4.75         4.5
July         6.25         8
August       5.5          6.6
September    5            4.95
October      4.5          3.9
November     4.5          4.6
December     5.25         6
[Figure: scatterplot of Income (Y) against Lathe (x) for the twelve months]

It can be presumed that there is a positive relationship between the
number of hours on the lathe machine and the income per month.
$$ r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{\left[n(\sum x^2) - (\sum x)^2\right]\left[n(\sum y^2) - (\sum y)^2\right]}} $$

Month        X        Y       XY         X2          Y2
January      6        6       36         36          36
February     4.5      5.5     24.75      20.25       30.25
March        5.75     4       23         33.0625     16
April        6.25     5       31.25      39.0625     25
May          4        3.75    15         16          14.0625
June         4.75     4.5     21.375     22.5625     20.25
July         6.25     8       50         39.0625     64
August       5.5      6.6     36.3       30.25       43.56
September    5        4.95    24.75      25          24.5025
October      4.5      3.9     17.55      20.25       15.21
November     4.5      4.6     20.70      20.25       21.16
December     5.25     6       31.5       27.5625     36
TOTAL        62.25    62.8    332.175    329.3125    345.995
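For reference, the substitution step can be written out as follows. It uses the TOTAL row (Σx = 62.25, Σy = 62.8, Σxy = 332.175, Σx² = 329.3125, Σy² = 345.995) with n = 12 months in the standard Pearson formula; intermediate values are rounded for display.

```latex
\begin{aligned}
r &= \frac{n(\sum xy) - (\sum x)(\sum y)}
          {\sqrt{\left[n(\sum x^2) - (\sum x)^2\right]\left[n(\sum y^2) - (\sum y)^2\right]}} \\
  &= \frac{12(332.175) - (62.25)(62.8)}
          {\sqrt{\left[12(329.3125) - (62.25)^2\right]\left[12(345.995) - (62.8)^2\right]}} \\
  &= \frac{3986.1 - 3909.3}{\sqrt{(76.6875)(208.1)}}
   = \frac{76.8}{\sqrt{15958.66875}}
   \approx \frac{76.8}{126.33}
   \approx 0.608
\end{aligned}
```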

As with the scatterplot, the direction of the obtained value is positive.


Therefore, there is a positive relationship between the number of hours
on the lathe machine and the income per month.
Microsoft Excel can also be used to generate the Pearson correlation coefficient.

The generated value is 0.607943039
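The same coefficient can be reproduced outside a spreadsheet. The sketch below applies the computational Pearson formula to the twelve lathe and income pairs from the table; `pearson_r` is an illustrative name, not something taken from the source.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation from raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator

# Data from the table: hours on the lathe (x) and income (y), January to December.
lathe = [6, 4.5, 5.75, 6.25, 4, 4.75, 6.25, 5.5, 5, 4.5, 4.5, 5.25]
income = [6, 5.5, 4, 5, 3.75, 4.5, 8, 6.6, 4.95, 3.9, 4.6, 6]

r = pearson_r(lathe, income)
print(round(r, 6))  # 0.607943
```

The result agrees with the Excel value to the precision shown, and its positive sign matches the direction suggested by the scatterplot.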
