
Complete Notes STATS

Epidemiology and biostatistics are quantitative sciences used in public health. Statistics involves collecting, organizing, and analyzing data, while biostatistics applies statistics to biological and health problems. Epidemiology studies the distribution and causes of diseases in populations. Quantitative methods play an important role in public health by addressing hypotheses, conducting studies, collecting and describing data, assessing evidence, and recommending interventions. Descriptive statistics summarize data while inferential statistics make predictions and generalizations to populations.

Uploaded by

silvestre bolos

 Epidemiology and biostatistics are the basic sciences of public health

 Statistics is a branch of applied mathematics which deals with the collection, organization, presentation, analysis and interpretation of data.
 Biostatistics is the application of statistics to problems in the biological sciences, health, and medicine
 Epidemiology is the study of the distribution and determinants of health, disease, or injury in human populations and the application of this
study to the control of health problems

ROLE OF QUANTITATIVE METHODS IN PUBLIC HEALTH

1. Address a public health question


 Generate a hypothesis
 Based on scientific rationale
 Based on observations or anecdotal evidence (not scientifically tested)
 Based on results of prior studies
 Examples of a hypothesis
 The risk of developing lung cancer has remained constant over the last five years
 The use of a cell phone is associated with developing a brain tumor
 Vioxx increases the risk of heart disease
2. Conduct a study
 Survey study is used to estimate the extent of the disease in the population
 Surveillance study is designed to monitor or detect specific diseases
 Observational studies investigate association between an exposure and a disease outcome
 They rely on “natural” allocation of individuals to exposed or non-exposed groups
 Experimental studies also investigate the association between an exposure, often therapeutic treatment, and disease outcome
 Individuals are “intentionally” placed into the treatment groups by the investigators
3. Collect data
 Numerical facts, measurements, or observations obtained from an investigation to answer a question
 Influences of temporal and seasonal trends on the reliability and accuracy of data
 Examples:
 The number of lung cancer cases from 1960–2000 in the United States
 The number of deaths from cardiovascular diseases in Whites and African Americans from 2000–2004
 The number of people with heart attacks among individuals having used Vioxx before 2004
4. Describe the observation/data
 Descriptive statistical methods provide an exploratory assessment of the data from a study
 Exploratory data analysis techniques
 Organization and summarization of data
 Tables
 Graphs
 Summary measures
5. Assess the strength of evidence for/against a hypothesis; evaluate the data
 Inferential statistical methods provide a confirmatory data analysis
 Generalize conclusions from data from part of a group (sample) to the whole group (population)
 Assess the strength of the evidence
 Make comparisons
 Make predictions
 Ask more questions; suggest future research
6. Recommend interventions or preventive programs
 The study results may support or fail to support the hypothesis, or sometimes fall into a grey area of “unsure”
 The study results appear in a peer-reviewed publication and/or are disseminated to the public by other means
 Consequently, the policy or action can range from developing specific regulatory programs to general personal behavioral changes

TYPES OF STATISTICS

A. Descriptive Statistics
 deals with the collection and presentation of data and collection of summarizing values to describe its group characteristics

B. Inferential Statistics
 deals with predictions and inferences based on the analysis and interpretation of the results of the information gathered by
the statistician

VARIABLES
 a characteristic or attribute, numerical or categorical, associated with the population being studied

 Types of Variables:
o Categorical or Qualitative Variables
 example: Gender, Eye color, Blood Type, Civil Status, Socio Economic Status
o Numerical - Valued or Quantitative Variables
 Discrete - is a variable whose values are obtained by counting
 Continuous - is a variable whose values are obtained by measuring such as temperature, distance, area, age,
height

SCALES OF MEASUREMENT

A. Nominal Scale
 Sex, Nationality
B. Ordinal Scale
 ordered, but the differences between values are not meaningful
 e.g., Likert scales: rate your degree of satisfaction on a scale of 1 to 5
 e.g., pain ratings
C. Interval Scale
 ordered, constant scale, but no natural zero
 e.g., temperature (C,F)
D. Ratio Scale
 ordered, constant scale, natural zero
 e.g., height, weight, age, length

SAMPLING TECHNIQUE

 Population
o is defined as groups of people, animals, places, things or ideas to which any conclusions based on characteristics of a sample
will be applied
 Sample
o subgroup of the population
 SLOVIN’S FORMULA:
$$n = \frac{N}{1 + N e^{2}}$$
where:
n – sample size
N – population size
1 – constant
e – sampling error (margin of error), expressed as a proportion
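
As a rough illustration, Slovin's formula n = N / (1 + N e^2) can be evaluated as in the sketch below. The function name `slovin_sample_size` and the choice to round up to the next whole unit are assumptions made for this example and are not part of the notes.

```python
import math


def slovin_sample_size(population_size: int, margin_of_error: float) -> int:
    """Return n = N / (1 + N * e^2), rounded up (rounding up is an assumption)."""
    if population_size <= 0:
        raise ValueError("population_size must be positive")
    if not 0 < margin_of_error < 1:
        raise ValueError("margin_of_error must be a proportion between 0 and 1")
    n = population_size / (1 + population_size * margin_of_error ** 2)
    return math.ceil(n)


# Population of 1,000 with a 5% margin of error: 1000 / 3.5 = 285.7, rounded up
print(slovin_sample_size(1000, 0.05))  # 286
```

A smaller margin of error gives a larger required sample, which is easy to check by calling the function with e = 0.01.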

STAGES IN THE SELECTION OF A SAMPLE

1. Define the target population


2. Select a sampling frame
3. Determine if a probability or nonprobability sampling method will be chosen
4. Plan procedure for selecting sampling units
5. Determine sample size
6. Select actual sampling units
7. Conduct fieldwork

 Types of Sampling Techniques


1. Probability Sampling
 the sample is a proportion (a certain percent) of the population and such sample is selected from the population by
means of some systematic way in which every element of the population has a chance of being included in the sample
 Randomization is a feature of the selection process rather than an assumption about the structure of the population
 More complex, time consuming and more costly
2. Non – Probability Sampling
 The sample is not a proportion of the population and there is no system in selecting the sample. The selection depends
upon the situation.
 No assurance is given that each item has a chance of being included as a sample
 There is an assumption that there is an even distribution of characteristics within the population, believing that any
sample would be representative

 Examples of Probability Sampling:

1. Simple Random Sampling


 Lottery Method
i. This is the most popular and simplest method
2. Stratified Random Sampling
 the population is split into non - overlapping groups (“strata”), then simple random sampling is done on each group to
form a sample
3. Systematic Sampling
 This method is widely employed because of its ease and convenience.
 A frequently used method of sampling when a complete list of the population is available: every k-th unit on the list is selected after a random start. It is also called Quasi-Random Sampling.
4. Cluster Sampling
 Used when the geographical area of the study is too big and the target population is too large
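
The four probability sampling methods listed above can be sketched with Python's standard `random` module. This is a minimal illustration, not a survey-grade implementation: the function names, the toy sampling frame, and the choice of proportional allocation for the strata are all assumptions made for the example.

```python
import random


def simple_random_sample(frame, n, rng):
    """Lottery method: every unit has the same chance of being drawn."""
    return rng.sample(frame, n)


def systematic_sample(frame, n, rng):
    """Take every k-th unit after a random start within the first interval."""
    k = len(frame) // n          # sampling interval
    start = rng.randrange(k)     # random start in 0 .. k-1
    return [frame[start + i * k] for i in range(n)]


def stratified_sample(strata, fraction, rng):
    """Simple random sample inside each non-overlapping stratum
    (proportional allocation is an assumption of this sketch)."""
    sample = []
    for units in strata.values():
        size = max(1, round(len(units) * fraction))
        sample.extend(rng.sample(units, size))
    return sample


def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters and keep every unit inside them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]


rng = random.Random(42)                     # fixed seed so the run is repeatable
frame = list(range(1, 101))                 # hypothetical list of 100 unit IDs
strata = {"urban": frame[:60], "rural": frame[60:]}
clusters = {f"village_{i}": frame[i * 5:(i + 1) * 5] for i in range(20)}

print(simple_random_sample(frame, 10, rng))
print(systematic_sample(frame, 10, rng))
print(stratified_sample(strata, 0.10, rng))
print(cluster_sample(clusters, 2, rng))
```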

 Examples of Non - Probability Sampling:

1. Convenience Sampling
 no system of selection; only those whom the researcher or interviewer meets by chance are included in the sample.
 process of picking out people in the most convenient and fastest way to immediately get their reactions to a certain hot
and controversial issue
 not representative of the target population because sample members are selected only if they can be accessed easily and conveniently.
 Advantage: easy to use
 Disadvantage: bias is present
 it could deliver accurate results when the population is homogeneous

2. Purposive Sampling
 the respondents are chosen based on their knowledge of the information desired.
o Quota Sampling
 a specified number of persons of certain types are included in the sample.
o Judgement Sampling
 sample is taken based on certain judgements about the overall population
DESCRIPTIVE STATISTICS
✔ Deals with the collection and presentation of data and collection of summarizing values to describe its group
characteristics

DATA
✔ gathered body of facts
✔ central thread of any activity
✔ Understanding the nature of data is most fundamental for proper and effective use of statistical skills

❖ TYPES OF DATA
o According to Source:
▪ Primary Data – interview, registration, experiment, questionnaire, etc.
▪ Secondary Data – book, journal, newspaper, thesis, dissertation, etc.

o According to Functional Relationship


▪ Independent Data – refers to any controlling data
▪ Dependent Data – refers to any data that is affected by controlling data

❖ Methods of Collecting Data


o Objective Methods
o Subjective Methods
o Use of Existing Records

❖ Methods in Presenting Data


o Textual
o Tabular
o Graphical

DESCRIPTIVE STATISTICS MEASURES


❖ Measure of Location
o summarizes a data set by giving a “typical value” within the range of the data values that describes its location
relative to entire data set
▪ Minimum and Maximum – MIN is the smallest value in the data set while MAX is the largest value
in the data set
▪ Mean – It is the average of the data
● Properties of the Mean
o Uniqueness
o Simplicity
o Affected by extreme values
▪ Median - Divides the observations into two equal parts
o If n is odd, the median is the middle number.
o If n is even, the median is the average of the 2 middle numbers
▪ Mode - Value that occurs most often
o Unimodal - A data set that has only one value that occurs with the greatest frequency
o Bimodal - If a data set has two values that occur with the same greatest frequency, both
values are mode
o Multimodal - If a data set has more than two values that occur with the same greatest
frequency, each value is used as the mode
o No mode - When no data value occurs more than once
▪ Percentiles - values that divide the distribution into 100 equal parts. P10 or tenth percentile locates
the point that is greater than 10 percent of the items in the distribution
▪ Deciles - values that divide a distribution into 10 equal parts. The 1st decile is the 10th percentile;
the 2nd decile is the 20th percentile….
▪ Quartiles - Divide an array into four equal parts, each part having 25% of the distribution of the data
values. The 1st quartile is the 25th percentile; the 2nd quartile is the 50th percentile, also the median
and the 3rd quartile is the 75th percentile.

❖ Measures of dispersion
o single value that is used to describe the spread of the distribution
o A measure of central tendency alone does not uniquely describe a distribution

o Types of Measures of Dispersion


▪ Absolute Measures of Dispersion
▪ Range – difference between the maximum and minimum value in a data set
▪ Interquartile Range – distance between the first quartile (Q1) and the third quartile (Q3)
▪ Variance – measures the scatter of the values about their mean
▪ Standard Deviation – is the square root of the variance
For a normal distribution, the share of values within a given number of standard deviations of the mean is:
● ±1 SD = 68.3%
● ±2 SD = 95.4%
● ±3 SD = 99.7%

▪ Relative Measure of Dispersion


▪ Coefficient of Variation – is a measure used to compare the dispersion in two sets of data
which is independent of the unit of the measurement

❖ Symmetry
o A distribution is said to be symmetric about the mean, if the distribution to the left of mean is the “mirror image”
of the distribution to the right of the mean
▪ Skewness - measure of symmetry, or more precisely, the lack of symmetry. A distribution, or data
set, is symmetric if it looks the same to the left and right of the center point
● Positively Skewed
● Negatively Skewed
● Symmetrical Distribution/Equal
▪ Kurtosis - measure of whether the data are peaked or flat relative to a normal distribution.
● Leptokurtic
● Mesokurtic (Normal)
● Platykurtic

PROBABILITY
✔ a branch of mathematics which deals with the study of possible outcomes of an event or set of events together with the
outcomes' relative likelihood and distributions
o Two types of probability:
1. Objective probability
a. Classical Probability – calculated by the process of abstract reasoning
b. Relative Frequency Probability – depends on the repeatability of some process and the
ability to count
2. Subjective probability – based upon an educated guess

❖ Three properties of probability theory


1. Given some process (or experiment) with n mutually exclusive outcomes (called events), E1, E2, ..., En,
the probability of any event Ei is assigned a nonnegative number. That is,
P(Ei) ≥ 0
2. The sum of the probabilities of the mutually exclusive outcomes is equal to 1.
P(E1) + P(E2) + ... + P(En) = 1
✔ This is the property of EXHAUSTIVENESS
3. Consider any two mutually exclusive events, Ei and Ej. The probability of the occurrence of either Ei
or Ej is equal to the sum of their individual probabilities.
P(Ei + Ej) = P(Ei) + P(Ej)

❖ Calculating the probability of an event


1. Conditional Probability
✔ The conditional probability of A given B, denoted P(A|B), is the probability that event A has occurred
in a trial of a random experiment for which it is known that event B has occurred.
2. Joint Probability
✔ Calculates the likelihood of two events occurring together and at the same point in time
3. The Multiplication Rule
4. The Addition Rule
5. Independent Events
✔ Events A and B are independent when P(A and B) = P(A) * P(B) holds, which is equivalent to P(A|B) = P(A), which in turn is true if and only if P(B|A) = P(B)
6. Complementary Events
7. Marginal Events
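
The rules in this list can be checked by brute-force counting on a small sample space. The sketch below uses two fair dice, which is an example chosen here and not taken from the notes; the event names and the helper `prob` are likewise illustrative. It relies on the standard definitions P(A|B) = P(A and B) / P(B), P(A or B) = P(A) + P(B) - P(A and B), and P(not A) = 1 - P(A).

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice
outcomes = list(product(range(1, 7), repeat=2))


def prob(event):
    """Classical probability: favourable outcomes / all outcomes."""
    favourable = [o for o in outcomes if event(o)]
    return Fraction(len(favourable), len(outcomes))


def first_is_six(o):
    return o[0] == 6


def sum_is_seven(o):
    return o[0] + o[1] == 7


p_a = prob(first_is_six)                                        # marginal P(A)
p_b = prob(sum_is_seven)                                        # marginal P(B)
p_joint = prob(lambda o: first_is_six(o) and sum_is_seven(o))   # joint P(A and B)
p_a_given_b = p_joint / p_b                                     # conditional P(A|B)
p_a_or_b = p_a + p_b - p_joint                                  # addition rule
p_not_a = 1 - p_a                                               # complementary event
independent = p_joint == p_a * p_b                              # multiplication rule test

print(p_a, p_b, p_joint, p_a_given_b, p_a_or_b, p_not_a, independent)
```

Using `Fraction` keeps every probability exact, so the independence check is an equality test and not a floating-point comparison.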

PROBABILITY DISTRIBUTION

 Frequency Distribution
 It is a listing of observed / actual frequencies of all the outcomes of an experiment that occurred when
experiment was done

 Probability Distribution
 it is a listing of the probability of all the possible outcomes that could occur if the experiment was done
o It can be described as:
 A diagram (Probability Tree)
 A table
 A mathematical formula

 Types of Probability Distribution


o Discrete Probability Distribution
1. Random variables can take only a finite or countable number of values.
2. Ex. No. of heads in two tosses
 Binomial Distribution
 Poisson Distribution
o Continuous Probability Distribution
1. Random variables can take any value within an interval
2. Ex. Height of students in the class
 Normal Distribution

BINOMIAL DISTRIBUTION
 There are certain phenomena in nature which can be identified as Bernoulli’s processes, in which:
o There is a fixed number of n trials carried out
o Each trial has only two possible outcomes say success or failure, true or false etc.
o Probability of occurrence of any outcome remains same over successive trials
o Trials are statistically independent
 expresses the probability of one set of alternatives – success (p) and failure (q)
o $P(X = r) = \binom{n}{r} p^{r} q^{n-r}$ (probability of r successes in n trials)
 n = no. of trials undertaken
 r = no. of successes desired
 p = probability of success
 q = probability of failure
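
A minimal sketch of the binomial probability, assuming q = 1 - p and using `math.comb` for the number of combinations of n trials taken r at a time. The function name `binomial_pmf` and the coin-toss numbers are illustrative choices, not part of the notes.

```python
import math


def binomial_pmf(r: int, n: int, p: float) -> float:
    """Probability of exactly r successes in n independent trials."""
    q = 1 - p                                   # probability of failure
    return math.comb(n, r) * p ** r * q ** (n - r)


# Probability of exactly 3 heads in 10 tosses of a fair coin: 120 / 1024
print(binomial_pmf(3, 10, 0.5))

# The probabilities over r = 0 .. n add up to 1
print(sum(binomial_pmf(r, 10, 0.5) for r in range(11)))
```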

POISSON DISTRIBUTION
 When there is a large number of trials, but a small probability of success, binomial calculation becomes impractical
 If λ = mean no. of occurrences of an event per unit interval of time/space, then the probability that it will occur exactly x times
is given by
 $P(x) = \frac{\lambda^{x} e^{-\lambda}}{x!}$, where e is Napier's constant, e ≈ 2.71828
 Characteristics of Poisson Distribution
1. It is a discrete distribution
2. Occurrences are statistically independent
3. Mean no. of occurrences in a unit of time is proportional to size of unit
4. It is always right skewed
5. The Poisson distribution is a good approximation to the binomial distribution when n ≥ 20 and p ≤ 0.05
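
The sketch below evaluates the standard Poisson probability, λ^x e^(-λ) / x!, and compares it with the exact binomial probability for a case with many trials and a small success probability. The values n = 100 and p = 0.02 (so λ = np = 2) are made up for illustration, as are the function names.

```python
import math


def poisson_pmf(x: int, lam: float) -> float:
    """Probability of exactly x occurrences when the mean count is lam."""
    return lam ** x * math.exp(-lam) / math.factorial(x)


def binomial_pmf(r: int, n: int, p: float) -> float:
    """Exact binomial probability, used here only for comparison."""
    return math.comb(n, r) * p ** r * (1 - p) ** (n - r)


n, p = 100, 0.02          # many trials, small probability of success
lam = n * p               # Poisson mean matched to the binomial mean

for x in range(6):
    print(x, round(binomial_pmf(x, n, p), 4), round(poisson_pmf(x, lam), 4))
```

The two columns should agree closely, which is the sense in which the Poisson distribution approximates the binomial in this regime.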

NORMAL DISTRIBUTION
 Also called the Gaussian Distribution
 Named after the mathematician and astronomer Carl Friedrich Gauss (1777–1855)
 It is symmetrical, unimodal (one peak)
 The tails are asymptotic to horizontal axis.
 X axis represents random variable like height, weight etc.
 Y axis represents its probability density function
 The total area under the curve is 1 (or 100%)
 Only two parameters are considered: Mean and Standard Deviation
 Area under the curve tells the probability
o The mean ±1 standard deviation covers approximately 68% of the area under the curve
o The mean ±2 standard deviations covers approximately 95.5% of the area under the curve
o The mean ±3 standard deviations covers approximately 99.7% of the area under the curve
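
These areas can be reproduced numerically. For a normal distribution, the area within k standard deviations of the mean equals erf(k / √2), where erf is the error function available as `math.erf`. The helper name `area_within` is an assumption of this sketch.

```python
import math


def area_within(k: float) -> float:
    """Area under a normal curve between mean - k*SD and mean + k*SD."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"mean ± {k} SD covers {area_within(k):.4%} of the area")
```

The result does not depend on the particular mean or standard deviation, because the interval is expressed in units of standard deviations.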

SAMPLING DISTRIBUTION
 distribution of values taken by the statistic in all possible samples of the same size from the same population.

 Population distribution vs. Sampling distributions


o There are three distinct distributions involved when we sample repeatedly and measure a variable of interest.
1. The population distribution gives the values of the variable for all the individuals in the population.
2. The distribution of sample data shows the values of the variable for all the individuals in the
sample.
3. The sampling distribution shows the statistic values from all the possible samples of the same size
from the population.

SAMPLE PROPORTION
 When we want information about the population proportion p of successes, we often take a simple random sample (SRS) and use the sample
proportion p̂ to estimate the unknown parameter p. The sampling distribution of p̂ describes how the statistic varies in
all possible samples from the population.
 The mean of the sampling distribution of p̂ is equal to the population proportion p. That is, p̂ is an unbiased estimator
of p.
 When the sample size n is large, the sampling distribution of p̂ is close to a Normal distribution with mean p and
standard deviation $\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$
 In practice, use this Normal approximation when both np ≥ 10 and n(1 - p) ≥ 10 (the Normal condition).
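
A small sketch of the two checks described above: the standard deviation of the sample proportion, taken here as the usual √(p(1 − p)/n), and the Normal condition. Function names and the example values of p and n are illustrative assumptions.

```python
import math


def sd_of_sample_proportion(p: float, n: int) -> float:
    """Standard deviation of the sampling distribution of the sample proportion."""
    return math.sqrt(p * (1 - p) / n)


def normal_condition_holds(p: float, n: int) -> bool:
    """True when both np >= 10 and n(1 - p) >= 10."""
    return n * p >= 10 and n * (1 - p) >= 10


print(sd_of_sample_proportion(0.5, 100))   # about 0.05
print(normal_condition_holds(0.5, 100))    # True: 50 successes and 50 failures expected
print(normal_condition_holds(0.05, 100))   # False: only 5 successes expected
```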

SAMPLE MEANS
 Sampling from Normal Population
o We have described the mean and standard deviation of the sampling distribution of the sample mean x̄ but not
its shape. That's because the shape of the distribution of x̄ depends on the shape of the population distribution.
o In one important case, there is a simple relationship between the two distributions. If the population distribution
is Normal, then so is the sampling distribution of x̄. This is true no matter what the sample size is.

 The Central Limit Theorem


o Most population distributions are not Normal. What is the shape of the sampling distribution of sample means
when the population distribution isn’t Normal?
o It is a remarkable fact that as the sample size increases, the distribution of sample means changes its shape:
it looks less like that of the population and more like a Normal distribution! When the sample is large enough,
the distribution of sample means is very close to Normal, no matter what shape the population distribution has,
as long as the population has a finite standard deviation.
o The central limit theorem (CLT) says that when n is large, the sampling distribution of the sample mean x̄ is
approximately Normal.
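
The central limit theorem can be explored with a short simulation. The sketch below draws repeated samples from an exponential population (strongly right skewed, with mean 1 and standard deviation 1) and summarizes the resulting sample means. The population, sample size, number of samples, and seed are all arbitrary choices made for this illustration. It uses the standard result that the sample mean has standard deviation σ/√n.

```python
import math
import random
import statistics


def simulate_sample_means(n, n_samples, rng):
    """Mean of each of n_samples samples of size n from an exponential population."""
    means = []
    for _ in range(n_samples):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(sum(sample) / n)
    return means


rng = random.Random(0)                 # fixed seed so the run is repeatable
means = simulate_sample_means(n=40, n_samples=2000, rng=rng)

print("mean of sample means:", statistics.mean(means))    # expected near 1
print("SD of sample means:  ", statistics.stdev(means))   # expected near 1/sqrt(40)
print("theoretical SD:      ", 1 / math.sqrt(40))
```

A histogram of `means` would look roughly bell shaped even though the population itself is far from Normal.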
What are Mean Median and Mode?
Mean, median and mode are all measures of central tendency in statistics. In
different ways they each tell us what value in a data set is typical or
representative of the data set.

The mean is the same as the average value of a data set and is found using a
calculation. Add up all of the numbers and divide by the number of numbers in
the data set.

The median is the central number of a data set. Arrange data points from
smallest to largest and locate the central number. This is the median. If there are
2 numbers in the middle, the median is the average of those 2 numbers.

The mode is the number in a data set that occurs most frequently. Count how
many times each number occurs in the data set. The mode is the number with
the highest tally. It's ok if there is more than one mode. And if all numbers occur
the same number of times there is no mode.

How to Find the Mean


1. Add up all data values to get the sum
2. Count the number of values in your data set
3. Divide the sum by the count

The mean is the same as the average value in a data set.

Mean Formula
The mean x̄ of a data set is the sum of all the data divided by the count n:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

How to Find the Median


The median is the data value separating the upper half of a data set from the
lower half.

• Arrange data values from lowest to highest value


• The median is the data value in the middle of the set
• If there are 2 data values in the middle the median is the mean of those 2
values.
Median Example
For the data set 1, 1, 2, 5, 6, 6, 9 the median is 5.

For the data set 1, 1, 2, 6, 6, 9 the median is 4. Take the mean of 2 and 6 or,
(2+6)/2 = 4.

How to Find the Mode


Mode is the value or values in the data set that occur most frequently.

For the data set 1, 1, 2, 5, 6, 6, 9 the mode is 1 and also 6.
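
The worked examples above can be reproduced with Python's standard `statistics` module. The helper `modes` is a hypothetical function written for this sketch; it follows the convention stated in these notes that a data set has no mode when every value occurs the same number of times.

```python
import statistics
from collections import Counter


def modes(values):
    """Most frequent value(s); empty list if all values are equally frequent."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []                      # "no mode" convention from the notes
    return [v for v, c in counts.items() if c == top]


odd_set = [1, 1, 2, 5, 6, 6, 9]
even_set = [1, 1, 2, 6, 6, 9]

print(statistics.mean(odd_set))      # 30 / 7, about 4.29
print(statistics.median(odd_set))    # 5
print(statistics.median(even_set))   # 4.0, the mean of 2 and 6
print(modes(odd_set))                # [1, 6]
print(modes([3, 7, 8]))              # []
```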

Outliers
Potential Outliers are values that lie above the Upper Fence or below the Lower
Fence of the sample set.
Upper Fence = Q3 + 1.5 × Interquartile Range
Lower Fence = Q1 − 1.5 × Interquartile Range

Quartiles
Quartiles mark each 25% of a set of data:

• The first quartile Q1 is the 25th percentile


• The second quartile Q2 is the 50th percentile
• The third quartile Q3 is the 75th percentile

The second quartile Q2 is easy to find. It is the median of any data set and it
divides an ordered data set into upper and lower halves.

The first quartile Q1 is the median of the lower half not including the value of Q2.
The third quartile Q3 is the median of the upper half not including the value of Q2.

How to Calculate Quartiles


1. Order your data set from lowest to highest values
2. Find the median. This is the second quartile Q2.
3. At Q2 split the ordered data set into two halves.
4. The lower quartile Q1 is the median of the lower half of the data.
5. The upper quartile Q3 is the median of the upper half of the data.
If the size of the data set is odd, do not include the median when finding the first
and third quartiles.

If the size of the data set is even, the median is the average of the middle 2
values in the data set. Add those 2 values, and then divide by 2. The median
splits the data set into lower and upper halves and is the value of the second
quartile Q2.

How to Find Interquartile Range


The interquartile range IQR is the range in values from the first quartile Q1 to the
third quartile Q3. Find the IQR by subtracting Q1 from Q3.

• IQR = Q3 - Q1
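
The quartile procedure above, together with the fences used to flag potential outliers, can be sketched as follows. It implements the rule described in these notes (the median is left out of both halves when the data set has an odd size); other quartile conventions exist and can give slightly different values. The function names and the data set with the value 40 are made up for illustration.

```python
import statistics


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = ordered[:half]
    # For odd n, skip the middle value so it belongs to neither half
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (statistics.median(lower),
            statistics.median(ordered),
            statistics.median(upper))


def potential_outliers(values):
    """Values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]


print(quartiles([1, 1, 2, 5, 6, 6, 9]))                 # (1, 5, 6)
print(potential_outliers([1, 1, 2, 5, 6, 6, 9, 40]))    # [40]
```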

How to Find the Minimum


The minimum is the smallest value in a sample data set.

Ordering a data set from lowest to highest value, x1 ≤ x2 ≤ x3 ≤ ... ≤ xn, the
minimum is the smallest value x1. The formula for minimum is:
$$\text{Min} = x_1$$

How to Find the Maximum


The maximum is the largest value in a sample data set.

Ordering a data set from lowest to highest value, x1 ≤ x2 ≤ x3 ≤ ... ≤ xn, the
maximum is the largest value xn. The formula for maximum is:
$$\text{Max} = x_n$$

How to Find the Range of a Set of Data
The range of a data set is the difference between the minimum and maximum.
To find the range, calculate xn minus x1.

Coefficient of Variation

The coefficient of variation describes dispersion of data around the mean. It is
the ratio of the standard deviation to the mean.

For a Population
$$CV = \frac{\sigma}{\mu}$$

For a Sample
$$CV = \frac{s}{\bar{x}}$$

Standard Deviation

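Standard deviation is a measure of dispersion of data values from the mean. It is
the square root of the sum of squared differences from the mean divided by the
size of the data set for a population, or by one less than the size of the data set
for a sample.

For a Population
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$

For a Sample
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$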

Variance

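Variance measures dispersion of data from the mean. It is the sum of squared
differences from the mean divided by the size of the data set for a population,
or by one less than the size of the data set for a sample.

For a Population
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

For a Sample
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$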

Midrange

The midrange of a data set is the average of the minimum and maximum values.
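
The dispersion measures in this part of the notes can be computed with the standard `statistics` module, which separates population versions (`pvariance`, `pstdev`, dividing by the size of the data set) from sample versions (`variance`, `stdev`, dividing by one less than the size). The data set below is made up for illustration, and `dispersion_summary` is a hypothetical helper name.

```python
import statistics


def dispersion_summary(values):
    """Range, midrange, variance, standard deviation and CV of a data set."""
    mean = statistics.mean(values)
    return {
        "range": max(values) - min(values),
        "midrange": (min(values) + max(values)) / 2,
        "population_variance": statistics.pvariance(values),
        "sample_variance": statistics.variance(values),
        "population_sd": statistics.pstdev(values),
        "sample_sd": statistics.stdev(values),
        "population_cv": statistics.pstdev(values) / mean,
        "sample_cv": statistics.stdev(values) / mean,
    }


data = [2, 4, 4, 4, 5, 5, 7, 9]        # illustrative values with mean 5
for name, value in dispersion_summary(data).items():
    print(f"{name}: {value}")
```

For this data set the population variance is 4 and the population standard deviation is 2, while the sample versions are slightly larger because they divide by 7 and not 8.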
EXAMPLES
