Lecture 4 - Normal and Nonnormal Dist - HS - 070323en

Biostatistics Lecture Notes UPNG

Uploaded by

Oxy Maine
Normal and Non-Normal Distribution

Elias Namosha

Division of Public Health, SMHS – UPNG,

Intro to Biostatistics-Health Sciences

07th March 2023


Public health uses epidemiology and
biostatistics to see population patterns
• Population patterns can be seen in different kinds of
distributions of physiological measurements
• When we take physiological measurements such as height or
systolic blood pressure, we can plot their distribution and
see that many follow a pattern called the normal
distribution. Other measures follow a non-normal
distribution.
• Each distribution has its own statistical and
mathematical characteristics, which help us understand
the characteristics of the population.
Mean, Median and Mode
▪Arithmetic mean – average value
▪Median – central value
▪Mode – most common value
▪The mean uses all the data, so it is sensitive to outliers
▪The mean has the best statistical properties
▪The mean is preferred for normally distributed data
▪The median is preferred for skewed data


Summary
E.g., Length of stay at hospital for
patients with Pneumonia
0, 2, 3, 4, 5, 5, 6, 7, 8, 9, 9, 9, 10, 10, 10,10, 10, 11,
12, 12, 12, 13, 14, 16, 18, 18, 19, 22, 27, 49

What is the MEAN length of stay @hospital?


What is the MEDIAN?
What is the MODE?

$$\text{Mean} = \frac{\text{Sum of values for each member of a given population}}{\text{Number of population members}} = \frac{360}{30} = 12$$

n = 30
Median position = (30 + 1) / 2 = 15.5, i.e., between the 15th and 16th positions
Value at 15th position = 10
Value at 16th position = 10
So median = (10 + 10) / 2 = 10
Mode = 10 (it occurs five times, more than any other value)
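The worked answers above can be checked with a few lines of Python. This is only a sketch using the standard `statistics` module; the variable name `stays` is an illustrative choice, and the data are the 30 lengths of stay listed in the example.

```python
import statistics

# Length of stay (days) for the 30 pneumonia patients in the example
stays = [0, 2, 3, 4, 5, 5, 6, 7, 8, 9, 9, 9, 10, 10, 10, 10, 10, 11,
         12, 12, 12, 13, 14, 16, 18, 18, 19, 22, 27, 49]

mean_stay = statistics.mean(stays)      # sum of values / number of values
median_stay = statistics.median(stays)  # average of the 15th and 16th values
mode_stay = statistics.mode(stays)      # most frequent value

print("n =", len(stays))
print("mean =", mean_stay)
print("median =", median_stay)
print("mode =", mode_stay)
```

The mean (12) is pulled above the median (10) by a few long stays such as 49 days, which is why the median is preferred for skewed data like these.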
Mode: the most common value
in a children’s age group distribution

Brits H et al., Pan Afr Med J 2020


What is the Normal Distribution?
•The Normal Distribution is a continuous probability
distribution that is symmetrical about the mean, so the
right side of the center is a mirror image of the left side.

•The normal distribution is often called the bell
curve because the graph of its probability density
looks like a bell.

•The normal distribution is also known as the Gaussian
distribution, after the German mathematician Karl Friedrich
Gauss, who described it in the early 19th century.
Normal Curve

mean=median=mode
•

3/7/2023
How do you know a
distribution is Normal?
To be considered normal, a data set (when graphed) must follow
a bell-shaped, symmetrical curve centered on the mean.
-A standard normal distribution is the special case with a mean
of 0 and a standard deviation of 1.
It must also adhere to the empirical rule, which gives the
percentage of the data that falls within (plus or minus) 1, 2,
and 3 standard deviations of the mean.
-The empirical rule says that in a normal distribution, about 68%
of the data points fall within ± one standard deviation of the
mean, about 95% fall within ± two standard deviations, and about
99.7% fall within ± three standard deviations.
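The percentages in the empirical rule are rounded areas under the normal curve. As a rough check, the sketch below computes them from the error function in Python's standard `math` module; the helper name `share_within` is an illustrative choice, not something from the lecture.

```python
import math

def share_within(k):
    """Share of a normal distribution lying within k SDs of its mean."""
    # For a normal variable, P(|Z| <= k) = erf(k / sqrt(2))
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within ±{k} SD: {share_within(k):.1%}")
```

The printed values are about 68.3%, 95.4% and 99.7%, which the rule rounds to 68%, 95% and 99.7%.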
Areas under the normal curve that lie between 1, 2
and 3 standard deviations (SD) on each side of the
mean.
Normal curves are symmetric
Skewed vs normal distributions
Left-skewed Right-skewed
Normal Distribution
▪This pattern occurs so often in the biological and
natural world that mathematicians have studied it and
found that if the observed measurement is the sum of
many small, independent random factors, the resulting
measurements will be distributed approximately normally,
as in the bell curve above: the Normal or Gaussian
distribution.

▪The distribution is completely defined by the
population mean, µ, and the population standard
deviation, SD.
▪These two values are all the information one needs to
describe the population fully if the distribution of
values follows a normal distribution
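The idea that a sum of many small independent random factors ends up roughly normal can be illustrated with a small simulation. This is a sketch under invented assumptions: each of 5000 simulated "people" gets a measurement that is the sum of 50 independent factors drawn uniformly between -1 and 1. These numbers, the fixed seed and the function names are all hypothetical choices made for illustration.

```python
import random
import statistics

def simulate_measurements(n_people=5000, n_factors=50, seed=0):
    """Each simulated measurement is the sum of many small independent factors."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [sum(rng.uniform(-1, 1) for _ in range(n_factors))
            for _ in range(n_people)]

def share_within(values, k):
    """Fraction of values lying within k standard deviations of their mean."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(abs(v - mu) <= k * sd for v in values) / len(values)

measurements = simulate_measurements()
for k in (1, 2, 3):
    print(f"within ±{k} SD: {share_within(measurements, k):.1%}")
```

Although no single factor is normally distributed, the shares within 1, 2 and 3 SDs come out close to the 68%, 95% and 99.7% expected of a normal curve.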
Areas under the normal curve that lie
between 1, 2, and 3 standard deviations
on each side of the mean.

In a normal distribution, about 95% of data values are
contained within the mean plus or minus two SDs. Don’t
worry about the math and the formula here, focus on the
concept.
The area under the curve from -1 x s to +1 x s (one standard
deviation, s, on either side of the mean) is about 68% of the total.
The remaining 32% is split evenly between the two tails, about
16% in each.
Do you remember standard
deviation? the empirical rule?
the standard normal distribution?

The standard deviation is a measure of how closely
grouped or how widely spread a set of data is around its mean.
The empirical rule says that in a normal distribution, about
68% of the data points fall within ± one standard deviation
of the mean and about 95% fall within ± two standard
deviations. A standard normal distribution has a mean of 0
and a standard deviation of 1.
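To make "closely grouped" versus "widely spread" concrete, the sketch below compares two small made-up sets of readings that share the same mean. The numbers are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import statistics

# Two hypothetical sets of five readings, both with a mean of 120
closely_grouped = [118, 119, 120, 121, 122]
widely_spread = [100, 110, 120, 130, 140]

sd_close = statistics.stdev(closely_grouped)  # sample standard deviation
sd_wide = statistics.stdev(widely_spread)

print("means:", statistics.mean(closely_grouped), statistics.mean(widely_spread))
print(f"SD, closely grouped: {sd_close:.2f}")
print(f"SD, widely spread:   {sd_wide:.2f}")
```

Both sets have the same mean, but the second has a standard deviation ten times larger because its values sit ten times further from that mean.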
So how DID that happen?
You get an F- on a paper and the teacher stands up in class
and announces, "I grade on the curve - deal with it."
What does it mean to "grade on the curve"?
Let's say a teacher gives a test to a class of students
and the mean (or average) score is 80 and the
standard deviation is 5.
According to the empirical rule, about 68% of the students'
scores should fall within ± one (1) standard deviation of
the mean, i.e., between 75 and 85. If you look at the curve,
those students will get C's (C is meant to show "average
performance").
Move out to two (2) standard deviations away from
the mean, and you have the B's to the right and the
D's to the left. Move out to three (3) standard
deviations from the mean and you have the A's to
the right and the F's to the left.
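The grading scheme just described can be written as a small function. This is a sketch of one reading of the slide, with the mean of 80 and SD of 5 from the example. Treating scores exactly on a boundary as the more central grade, and everything beyond two SDs as A or F, are assumptions made here. The name `curve_grade` is illustrative.

```python
def curve_grade(score, mean=80.0, sd=5.0):
    """Letter grade from how many SDs a score lies from the class mean."""
    z = (score - mean) / sd  # distance from the mean in SD units
    if z > 2:
        return "A"   # more than 2 SDs above the mean
    if z > 1:
        return "B"   # between 1 and 2 SDs above
    if z >= -1:
        return "C"   # within 1 SD of the mean (75 to 85 here)
    if z >= -2:
        return "D"   # between 1 and 2 SDs below
    return "F"       # more than 2 SDs below the mean

for score in (65, 72, 80, 87, 92):
    print(score, curve_grade(score))
```

With these settings a score of 80 earns a C, 87 a B and 92 an A, while 72 earns a D and 65 an F.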

[Figure: normal curve of test scores with the horizontal axis marked at 70, 75, 80, 85 and 90]
Are you getting irritated?

•Say you are in a class of 17 students and the mean
and standard deviation follow the curve above.
•17 x 68% = 11.56, so about 12 students will get a C, while the
B's, D's, A's and F's will be spread among the remaining 5 students.
Depending on the instructor's preferences, it usually
works out to one F, one A, two B's and one D.
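The 17 x 68% arithmetic can be extended to every grade band. The sketch below assumes the bands are C within ±1 SD, B and D between 1 and 2 SDs, and A and F beyond 2 SDs, and it uses exact normal-curve areas instead of the rounded 68% and 95%. The function names are illustrative.

```python
import math

def share_within(k):
    """Share of a normal distribution within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

def expected_grade_counts(class_size):
    """Expected (fractional) number of students in each grade band."""
    within1, within2 = share_within(1), share_within(2)
    shares = {
        "A": (1 - within2) / 2,        # beyond +2 SD
        "B": (within2 - within1) / 2,  # +1 to +2 SD
        "C": within1,                  # within ±1 SD
        "D": (within2 - within1) / 2,  # -1 to -2 SD
        "F": (1 - within2) / 2,        # beyond -2 SD
    }
    return {grade: class_size * share for grade, share in shares.items()}

counts = expected_grade_counts(17)
for grade, count in counts.items():
    print(f"{grade}: {count:.2f}")
```

The expected counts are fractional, so turning them into whole students is a rounding decision, which is where the instructor's preferences come in.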
Real life examples
The normal distribution is important because
lots of variables studied are normally
distributed:
Average height of people
Average age of people
Birthweight of babies
Blood pressure
Marks on the test, etc
Height
▪ The height of a population is a classic example
of a normal distribution.
▪ Most people in a specific
population are close to average height.
▪ The numbers of people taller and
shorter than average are almost equal,
and only a very small number of people
are either extremely tall or extremely short.
▪ Height is not determined by a single
factor: several genetic and environmental
factors each influence it. That is why it
follows the normal distribution.

Age distribution

Babies Birth Weight
▪ The normal birth weight of a
newborn ranges from 2.5 to 3.5 kg.
▪ The majority of newborns have a
normal birth weight, whereas only a
small percentage have a weight higher
or lower than normal.
▪ Hence, birth weight also follows the
normal distribution curve.
▪ In general, boys are usually a little
heavier than girls.
▪ The average birth weight for babies
is around 3.5 kg

Non-Normal Distribution
▪ When a population follows a normal distribution,
we can describe its location and variability
completely with two parameters: the mean (location)
and the standard deviation (variability).

▪ When there is evidence that a population does not
(even roughly) follow a normal distribution, it is more
appropriate to summarize the data using the median
and other percentiles, such as the lower and upper
quartiles: Median, Q1, Q3.

▪ We rarely observe all members of a population,
hence we estimate these parameters from a
sample drawn at random from the population
Non – Normal Distribution
Interquartile Range
Properties / Uses
Used with the median
Five-number summary for a box-and-whiskers
diagram:
– Maximum (100%, largest value)
– Third quartile (75%)
– Median (50%)
– First quartile (25%)
– Minimum (0%, smallest value)
Definition: the range spanned by the central 50% of a
distribution, i.e., IQR = Q3 − Q1
Box plot and whiskers
▪ In descriptive statistics, a box plot or boxplot is a
method for graphically depicting groups of
numerical data through their quantiles.

• Box plots may also have lines extending from the
boxes (whiskers) indicating variability outside the
upper and lower quartiles, hence the terms
box-and-whisker plot or box-and-whisker diagram.

• When data are not normally distributed (the setting for
non-parametric methods), box-and-whisker plots are used.

• The median is used instead of the mean.


A box and whiskers plot shows five
things:
1. the median (line inside the
box);
2-3. the boundaries of the
interquartile range (top and
bottom of the box);
4-5. the boundaries of the
absolute range (far points of the
“whiskers”)

Interquartile range
Quartiles divide the ordered data points into four parts,
or quarters, of more-or-less equal size. The data must be ordered
from smallest to largest to compute quartiles; as such, quartiles are
a form of order statistics. The three main quartiles are as follows:

•The first quartile (Q1) is defined as the middle number between
the smallest number (minimum) and the median of the data set. It
is also known as the lower quartile or 25th percentile, as 25% of the
data lies below this point.

•The second quartile (Q2) is the median of a data set; thus 50% of
the data lies below this point.

•The third quartile (Q3) is the middle value between the median and
the highest value (maximum) of the data set. It is also known as
the upper quartile or 75th percentile, as 75% of the data lies below
this point.
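[Figure: box plot with Q1 = 7, Q2 (median) and Q3 = 14 marked, so IQR = Q3 − Q1 = 7. The whiskers are limited to Q1 − 1.5 x IQR and Q3 + 1.5 x IQR; a point beyond these limits is plotted as an outlier.]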
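The quartile definitions can be tried on the length-of-stay data from the earlier hospital example, which give Q1 = 7, Q3 = 14 and IQR = 7 when each quartile is taken as the median of one half of the ordered data. The sketch below uses that "median of halves" convention; statistical software often uses slightly different quartile rules, so results may differ a little elsewhere. The function name is illustrative.

```python
import statistics

# Length of stay (days) for the 30 pneumonia patients in the earlier example
stays = [0, 2, 3, 4, 5, 5, 6, 7, 8, 9, 9, 9, 10, 10, 10, 10, 10, 11,
         12, 12, 12, 13, 14, 16, 18, 18, 19, 22, 27, 49]

def five_number_summary(values):
    """Minimum, Q1, median, Q3, maximum using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]        # values below the median position
    upper = data[(n + 1) // 2 :]  # values above the median position
    return (data[0], statistics.median(lower), statistics.median(data),
            statistics.median(upper), data[-1])

minimum, q1, median, q3, maximum = five_number_summary(stays)
iqr = q3 - q1
low_fence = q1 - 1.5 * iqr    # lower whisker limit
high_fence = q3 + 1.5 * iqr   # upper whisker limit
outliers = [v for v in stays if v < low_fence or v > high_fence]

print("five-number summary:", minimum, q1, median, q3, maximum)
print("IQR:", iqr)
print("whisker limits:", low_fence, high_fence)
print("outliers:", outliers)
```

With the 1.5 x IQR rule the whisker limits are -3.5 and 24.5 days, so the stays of 27 and 49 days would be drawn as outlier points.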
Box and Whiskers plots
Three curves with different skewing

[Figure: panels for Var D, Var B and Var E]
Boxplots
[Figure: a single box plot on a vertical axis running from 120 to 180, with Max, Q3, Median (Q2), Q1 and Min labelled]
• The maximum value sits at the top of the upper whisker (approx. 165)
• The minimum value sits at the bottom of the lower whisker (approx. 125)
• The whiskers extend outward from the 25th and 75th percentiles (the edges of the box)
• The median is shown as a line across the box
For more categories…

Boxplots
[Figure: box plots for Male and Female on a vertical axis running from 120 to 180]
For more categories…
Boxplots
[Figure: box plots for Male and Female within each age group (30-45, 46-59, 60+) on a vertical axis running from 120 to 180]

NB: Notice that among females aged 30-45, some values sit above
the top whisker, outside the plotted maximum. These three dots are called outliers.
Box and whiskers diagrams tell us a lot about distribution
curves. Note that A, B, and C are all normal curves. Also
note that D and E are similar to B, but skewed right and
left.
THE END
Thank you..
