Intro To Statistics

This is my work on introductory statistics. It gives an all-round guide to different approaches to statistics.

Uploaded by jesusebagahj73
Copyright © All Rights Reserved

Concept of Frequency Distribution
Overview of Descriptive and Inferential Statistics, Scales of Measurement, Collection, Organization and Presentation of Data
Dr. A.G. Omisore
FWACP, MPH, M.B.Ch.B
STATISTICS
• Statistics is the science of data.
• It involves collecting, analysing/summarizing, interpreting and presenting data.
• It uses data to estimate the magnitude (level) of associations and to test hypotheses.
Statistics
• The discipline concerned with the treatment of numerical data derived from groups of individuals (P. Armitage).
• The science and art of dealing with variation in data through collection, classification and analysis in such a way as to obtain reliable results (J.M. Last).
STATISTICS
• Broadly divided into two:
- Descriptive
- Inferential
• Descriptive statistics deal with the description of the characteristic(s) of a finite population (or of the sample at hand).
• Inferential statistics make deductions from a sample of a population to the whole population.
Methods of Descriptive Statistics
Descriptive statistics summarize patterns in the responses of people in a sample, using:
• Frequency tables
• Diagrams (graphs/charts)
• Summary indices
INFERENTIAL STATISTICS
• Statistical inference is the act of generalizing from a sample to a population with a calculated degree of certainty. Inference is therefore a central step in data analysis.
• The two traditional forms of statistical inference are estimation and hypothesis testing. Estimation gives the most likely value (location) of a population parameter, and hypothesis testing ("significance" testing) provides an answer to a statistical question.
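To make estimation concrete, here is a minimal Python sketch of a 95% confidence interval for a population mean. The blood pressure readings are hypothetical (not from this lecture), and the t multiplier 2.262 for 9 degrees of freedom is taken from standard t tables; this is one common way to attach a measure of reliability to an estimate, not the only one.

```python
import math
import statistics

# Hypothetical systolic blood pressure readings (mmHg) from a sample of 10 adults.
sample = [118, 125, 132, 121, 140, 128, 135, 119, 127, 131]

n = len(sample)
mean = statistics.mean(sample)      # point estimate of the population mean
sd = statistics.stdev(sample)       # sample standard deviation (n - 1 denominator)
se = sd / math.sqrt(n)              # standard error of the mean

# Two-sided 95% t multiplier for n - 1 = 9 degrees of freedom (from t tables).
t_multiplier = 2.262
lower = mean - t_multiplier * se
upper = mean + t_multiplier * se

print(f"Estimated mean: {mean:.1f} mmHg")
print(f"95% confidence interval: {lower:.1f} to {upper:.1f} mmHg")
```

The interval, not just the single mean, is what tells the reader how much the estimate could vary from sample to sample.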
Examples of Inferential Statistics
• Measures of association (chi-square test)
• t-test (paired and independent)
• One-way analysis of variance (ANOVA)
• Multi-way ANOVA
• Regression analysis
• Correlation analysis
• Factor analysis, etc.

MEDICAL STATISTICS/BIOSTATISTICS
Provides appropriate methods for collecting, organizing, analyzing, interpreting and presenting medical and health data.
BENEFITS OF STATISTICS
• Has a central role in biomedical investigations.
• Organizes information on a wider and more formal basis (empirical evidence) than the exchange of anecdotes and personal experience.
• Takes into account the intrinsic variation inherent in most biological processes (e.g. blood pressure).
BENEFITS OF STATISTICS IN HEALTH
• Measurement of population health.
• Allows comparison of the state of health between one period and another, or between one location and another.
• Thus, statistical presentations/reports provide the basis for health planning.
• Actions taken on health issues usually depend on relevant health information.
BIOSTATISTICS IN PUBLIC HEALTH HISTORY
JOHN GRAUNT
• John Graunt: founder of vital statistics.
• His 1662 publication quantified patterns of disease and mortality in the population.
• The publication was based on the "Bills of Mortality", a weekly count of people who died, kept since 1592. He observed:
- An excess of male births
- High infant mortality
- Proportional mortality
- Seasonal variation in mortality
- Urban-rural variation
William Farr
• Compiler of statistical abstracts in Britain from 1839 to 1880.
• Annual counts of births, marriages and deaths were done.
• Used these counts as numerators and census data as the population (denominator) data.
• Thus, crude rates were obtained.
• Examined the effects of altitude, location (densely and sparsely populated areas) and …
John Snow (1854)
• Formulated and tested a hypothesis concerning the origins of cholera epidemics in London, decades before the organism that causes cholera was identified.
• Described as the father of field epidemiology.
• Used spot maps to show the distribution of cases.
• Demonstrated that water was the source of infection.
• Had the handle of the pump removed to control the epidemic.
PERTINENT DEFINITIONS IN STATISTICS
• POPULATION: a set of units (usually people, objects, transactions or events) that we are interested in studying.
• SAMPLE: a subset of the units of the population.
• VARIABLE: a characteristic or property of an individual population unit. A variable is that whose value varies or changes from unit to unit.
• DATA: obtained by measuring the values of one or more variables on the units in a sample.
Populations and samples
• Usually, data are collected on a sample from a larger group called the population (universe).
• Samples are not of interest in their own right, but for what they can tell us about the population.
• Statistics allow us to use the sample to make inferences about the population from which the sample was drawn.
• Different samples from the same population may give different results due to chance (sampling variation).
• Samples have to be drawn in a truly representative manner.
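Sampling variation can be seen directly with a small simulation. This minimal Python sketch assumes a hypothetical population of 5,000 ages (generated at random, not real data) and draws five simple random samples of 50 from it; each sample gives a slightly different mean even though the population is the same.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

# Hypothetical population: ages (years) of 5,000 people, between 0 and 90.
population = [random.randint(0, 90) for _ in range(5000)]
population_mean = statistics.mean(population)

# Five independent simple random samples of 50 people each (without replacement).
sample_means = [statistics.mean(random.sample(population, 50)) for _ in range(5)]

print(f"Population mean: {population_mean:.1f}")
for i, m in enumerate(sample_means, start=1):
    print(f"Sample {i} mean: {m:.1f}")
```

The spread of the sample means around the population mean is the chance (sampling) variation described above.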
Types of variables: Quantitative & Qualitative
• Numerical (quantitative) variables: measured on a naturally occurring numerical scale. Two types:
- Continuous, i.e. measured on a continuous scale including fractions and decimals, e.g. age, weight, height, etc.
- Discrete, i.e. can take only a limited number of values, usually whole numbers, e.g. episodes of diarrhoea in a child, number of men in a village.
Types of variables
• Categorical (qualitative) variables: non-numerical, e.g. place of birth, ethnic group, social class, gender.
- Dichotomous (binary): has only 2 possible outcomes, e.g. sex (M/F), survival status (alive/dead).
- Ordered categorical: has a natural ordering, but not in a numerical sense, e.g. social class (I, II, III, IV, V).
ELEMENTS OF STATISTICAL INFERENCE
• Population
• Sample
• One or more variables of interest
• Reliability
RELIABILITY
• Measures how good an inference is.
• In using a sample, we introduce an element of uncertainty into our inferences.
• As much as possible, it is important to determine and report the reliability of each inference.
RELIABILITY
• The measure of reliability that accompanies an inference separates the science of statistics from the art of fortune telling.
• Measure of reliability: a statement (usually quantified) about the degree of uncertainty associated with an inference.
NATURE OF STATISTICAL DATA
• Primary and secondary data
- Primary data: data originally collected in the process of a statistical inquiry.
- Secondary data: data collected by other individuals or organizations.
• A primary source is preferred to a secondary source.

SOURCES OF HEALTH DATA
• Census
• Vital registration systems
• Institutions (school health, hospitals, health centres, veterinary clinics)
• Notification centres/epidemiological surveillance (infectious diseases, cancer registries, etc.)
CENSUS
• National census: enumeration of the whole populace of a country, usually done every ten years.
• The last census in Nigeria was in 2006, and the population was 140,003,542.
• The North West zone (35,786,944) was the most populous, followed by the South West (27,581,992).
• Osun State had a population of 3,423,535, representing 2.45% of Nigeria's population.
• The census provides the population figures (denominators) needed to calculate crude or total population rates.
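The arithmetic behind these figures can be sketched in a few lines of Python. The population counts are those quoted above; the death count is purely hypothetical and only illustrates how a census denominator turns an event count (numerator) into a crude rate, in the spirit of Farr's approach.

```python
# Population figures quoted above (2006 census).
nigeria_population = 140_003_542
osun_population = 3_423_535

# Osun's share of the national population, as a percentage.
osun_share = osun_population / nigeria_population * 100
print(f"Osun share of Nigeria's population: {osun_share:.2f}%")  # 2.45%


def crude_rate(events, population, per=1000):
    """Crude rate: events in a period divided by the total population, times a multiplier."""
    return events / population * per


# HYPOTHETICAL number of deaths in one year, used only to show the calculation.
hypothetical_deaths = 34_000
rate = crude_rate(hypothetical_deaths, osun_population)
print(f"Crude death rate: {rate:.1f} per 1,000 population")
```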
Figure 1: Population pyramid, Oriade HDSS (proportionate percent population by sex, male and female, and by five-year age group from 0-4 to >=105 years).
VITAL STATISTICS
• Records of vital events (births, deaths, marriages and divorces) obtained by registration.
• Used for generating birth and mortality rates for whole populations or subgroups.
• Vital registration systems in Nigeria are non-existent or ineffective.
• Thus, there is a lack of relevant, up-to-date health and demographic information.
• Data on important indicators of development, e.g. the infant mortality rate (IMR), under-5 mortality rate (U-5MR) and maternal mortality ratio (MMR), are therefore estimated.
INSTITUTION-BASED RECORDS
• School health records
• Pre-employment screening (occupational setting)
• Hospital-based records
SURVEYS
• Usually ad hoc, but may be routine.
• Popular national surveys:
- NDHS (Nigeria Demographic and Health Survey)
- HIV sero-prevalence sentinel study among pregnant women
- NARHS (National HIV/AIDS and Reproductive Health Survey)
• Surveys can be conducted by individual researchers or groups of researchers, organizations, governments, etc.
NIGERIA DEMOGRAPHIC AND HEALTH SURVEY 2008
• The NDHS is a national sample survey designed to provide up-to-date information on: background characteristics of the respondents; fertility levels; nuptiality; sexual activity; fertility preferences; awareness and use of family planning methods; breastfeeding practices; nutritional status of mothers and young children; early childhood mortality and maternal mortality; maternal and child health; and awareness and behaviour regarding HIV/AIDS and other sexually transmitted infections.
• The target groups were women aged 15-49 years and men aged 15-59 years in randomly selected households across Nigeria.
• Information about children aged 0-5 years was also collected, including weight and height.
Data Collection and Organization
Methods of Data Collection
Quantitative and Qualitative Methodologies
• Quantitative: numbers, percentages, means. Explores "what?"
• Qualitative: explores "why?" and "how?"
• Quantitative examples: age at marriage, age, years of schooling, etc.
• Qualitative example: reasons for using condoms.

Common Quantitative Techniques
• Structured interview using a questionnaire
• Service statistics: information routinely collected; reference can be made to existing records in the system

Common Qualitative Techniques
• Focus group discussion
• In-depth interview
• Observation
- Direct
- Mystery client
- Ethnographic technique

Preparing for Data Collection
• Address any ethical concerns
• Prepare written guidelines for how data will be collected
• Pretest instruments
• Modify questionnaires based on the pretest
• Train all staff involved

Ethical Concerns
• Parental permission
• Informed consent
• Voluntary participation
• Confidentiality and privacy

Pilot-Testing Survey Questions
• Detects difficult questions
• Verifies the time needed to complete the questionnaire
• Builds competence in data collectors
• Uncovers problems in field procedures

Processing Quantitative Data
• Steps to organize your data for analysis:
* Field editing
* Coding
* Data entry and tabulation
* Data cleaning

Field Editing
• Involves systematically reviewing questionnaires for consistency and completeness.
• Systematic organization of data: recording the date and place of interview and other identifiers of the respondent.

Coding
• Process of organizing and assigning meaning to data, e.g. class of degree (CGPA):
Class of degree        Code
First Class            1
Second Class Upper     2
Second Class Lower     3
Third Class            4
Pass                   5
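A coding scheme like this one is simply a lookup from each category to its code. This minimal Python sketch uses the codes from the table above; the list of responses is hypothetical, and the `DEGREE_CODES` name is illustrative only.

```python
# Codes from the table above: class of degree -> numeric code.
DEGREE_CODES = {
    "First Class": 1,
    "Second Class Upper": 2,
    "Second Class Lower": 3,
    "Third Class": 4,
    "Pass": 5,
}

# Hypothetical raw responses as written on questionnaires.
responses = ["Second Class Upper", "Pass", "First Class", "Second Class Lower"]

# .get returns None for any response not in the coding scheme, flagging it for review.
coded = [DEGREE_CODES.get(r.strip()) for r in responses]
print(coded)  # [2, 5, 1, 3]
```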

Data Entry
• Data will usually be entered into a computer program prior to analysis.
• Statistical packages with data entry modules include:
- Epi Info (both DOS and Windows versions)
- SPSS/PC+ (DOS)
- SPSS Data Entry Builder (Windows)
- ISSA, etc.

Data Cleaning
• Checking for and correcting errors in data entry.
• Some software packages have built-in systems that check for data entry errors, e.g. the CHECK program in Epi Info.

Types of Data Entry Errors
• Missing data
• Inconsistent data
• Out-of-range values

Missing Data
• A respondent declines to answer a question.
• A data collector fails to ask or record a response.
• A data entry clerk skips a question.

Inconsistent Data
• A respondent may contradict himself or herself, creating inconsistency in reporting.

Out-of-Range Values
• Impossible or implausible data items, e.g. 30 recorded as the number of years of experience for a respondent who is 25 years old.
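The checks described above can be automated in a few lines, much as data entry programs do. This minimal Python sketch assumes hypothetical records stored as dictionaries (None meaning a blank field) and an assumed plausible age range of 0 to 110 years; it is not the Epi Info CHECK program, only an illustration of the same idea.

```python
# Hypothetical records after data entry; None means the field was left blank.
records = [
    {"id": 1, "age": 25, "years_experience": 3},
    {"id": 2, "age": 25, "years_experience": 30},    # experience exceeds age
    {"id": 3, "age": None, "years_experience": 5},   # missing age
    {"id": 4, "age": 142, "years_experience": 10},   # implausible age
]

# Assumed plausible range for age in this illustration.
AGE_MIN, AGE_MAX = 0, 110


def check_record(rec):
    """Return a list of problems found in one record."""
    problems = []
    # Missing data: any blank field.
    for field, value in rec.items():
        if value is None:
            problems.append(f"missing {field}")
    age, exp = rec["age"], rec["years_experience"]
    # Out-of-range value: outside the plausible range for the variable.
    if age is not None and not AGE_MIN <= age <= AGE_MAX:
        problems.append(f"age {age} out of range")
    # Impossible combination of two values.
    if age is not None and exp is not None and exp > age:
        problems.append(f"{exp} years of experience but only {age} years old")
    return problems


for rec in records:
    issues = check_record(rec)
    if issues:
        print(rec["id"], issues)
```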

ANALYSING QUANTITATIVE DATA
Two types of analysis:
* Descriptive
* Inferential
Three levels of analysis:
* Univariate level (uni = single variable)
* Bivariate level (bi = two variables)
* Multivariate level (multi = 3 or more variables)

SCALES OF MEASUREMENT
Measurement
• The way variables are measured is very important.
• Measurement is the assignment of numbers to the values of a variable.
• The level of measurement determines the choice of the relevant statistical method.
Scales of Measurement
• Nominal (non-numerical/qualitative)
• Ordinal (non-numerical/qualitative)
• Interval (numerical/quantitative)
• Ratio (numerical/quantitative)
Scales of Measurement
• Nominal scale: the lowest level of measurement. It merely classifies the measure into mutually exclusive, unordered categories and has no notion of numerical magnitude, e.g. gender (male, female), blood group (A, B, AB, O).
Nominal Variable
• Classifies persons or objects into two or more categories.
• Members of a category have at least one common characteristic.
• We cannot quantify or even rank-order the categories.
• For identification purposes, nominal variables are often represented by numbers.
• The values of the scale have no 'numeric' meaning in the way that you usually think about numbers.

Examples of Nominal Variables
Variable      Category    Assigned code
Sex           Male        1
              Female      2
Residence     Rural       1
              Urban       2
HIV status    Positive    0
              Negative    1

Scales of Measurement
• Ordinal scale: in addition to its nominal property, it has the ability to rank or order phenomena. It is defined by ordered categories, e.g. grades of pain (mild, moderate, severe), social class (I, II, III, IV, V).
Scales of Measurement
• Interval scale: measurements are expressed in numbers, but the starting (zero) point is arbitrary, depending largely on the units of measurement, e.g. temperature in degrees Celsius. Meaning can be physically attached to the difference between 2 measurements on this scale, but not to their ratio. (The ratio of any 2 intervals, i.e. differences, is independent of the unit of measurement and of the zero point.)
Scales of Measurement
• Ratio scale: has all 3 properties of the nominal, ordinal and interval scales and, in addition, has a true zero point. The ratio of any 2 measurements on the scale is physically meaningful, e.g. height (zero is the same in inches or cm), weight (in lb or kg), …
Scales of Measurement
• From the above properties of the different scales, one can recognize that the arithmetic operations of addition and multiplication are not possible on the nominal or ordinal scales; only addition (subtraction) is possible on the interval scale, while all operations are possible on the ratio scale.
• Notwithstanding the distinct nature of the scales, it is sometimes possible or necessary during statistical analysis to transform data from one scale to another, so as to remove inconvenient properties of the data that may invalidate the assumptions of a statistical method.
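The difference between interval and ratio scales can be checked numerically. This minimal Python sketch uses temperature (interval) and height (ratio) as illustrations; the specific readings are hypothetical. Converting units changes the ratio of two temperatures but not the ratio of two temperature differences, while the ratio of two heights is the same in any unit.

```python
def celsius_to_fahrenheit(c):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32


def cm_to_inches(cm):
    """Convert a length from centimetres to inches."""
    return cm / 2.54


# Interval scale (temperature): the zero point is arbitrary, so the ratio
# of two readings changes when the unit changes.
ratio_celsius = 20 / 10
ratio_fahrenheit = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)

# Differences (intervals) are meaningful: their ratio does not change.
diff_ratio_celsius = (30 - 20) / (20 - 10)
diff_ratio_fahrenheit = ((celsius_to_fahrenheit(30) - celsius_to_fahrenheit(20))
                         / (celsius_to_fahrenheit(20) - celsius_to_fahrenheit(10)))

# Ratio scale (height): a true zero, so the ratio survives a change of unit.
height_ratio_cm = 180 / 90
height_ratio_in = cm_to_inches(180) / cm_to_inches(90)

print(f"20 C vs 10 C: ratio {ratio_celsius:.2f} in C, {ratio_fahrenheit:.2f} in F")
print(f"Ratio of intervals: {diff_ratio_celsius:.2f} in C, {diff_ratio_fahrenheit:.2f} in F")
print(f"180 cm vs 90 cm: ratio {height_ratio_cm:.2f} in cm, {height_ratio_in:.2f} in inches")
```

Saying that 20 °C is "twice as hot" as 10 °C is therefore not meaningful, whereas saying 180 cm is twice 90 cm is.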
Presentation of Data
PRESENTATION OF DATA
• Tabular presentation of data
• Graphical or diagrammatic presentation of data
• Summary indices (next lecture)
TABULAR PRESENTATION
• Done in the form of frequency tables.
• Can be used for both quantitative and qualitative data.
• Definitions for a frequency table:
- CLASS: one of the groups into which data can be classified.
- CLASS FREQUENCY (CF): the number of observations (NOB) in the data set falling in a particular class.
- CLASS RELATIVE FREQUENCY: CF divided by the total NOB in the data set.
Example of a Frequency Table
Class (level of education)   Frequency (number)   Relative frequency
None                         254                  0.34
Primary                      201                  0.27
Secondary                    119                  0.16
Post-secondary               97                   0.13
Others                       75                   0.10
Total                        746                  1.00
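The relative frequencies in this table follow directly from the definition (CF divided by the total NOB). This minimal Python sketch recomputes them from the counts in the table; the dictionary layout is just one convenient choice.

```python
# Class frequencies (CF) from the table above: level of education -> number.
frequencies = {
    "None": 254,
    "Primary": 201,
    "Secondary": 119,
    "Post-secondary": 97,
    "Others": 75,
}

total = sum(frequencies.values())  # total number of observations (NOB)

# Class relative frequency = CF / total NOB.
relative = {level: cf / total for level, cf in frequencies.items()}

print(f"{'Class':<16}{'Frequency':>10}{'Relative freq.':>16}")
for level, cf in frequencies.items():
    print(f"{level:<16}{cf:>10}{relative[level]:>16.2f}")
print(f"{'Total':<16}{total:>10}{sum(relative.values()):>16.2f}")
```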
CLASS BOUNDARIES & INTERVALS
• A class (group) boundary lies midway between the upper limit of one class and the lower limit of the next class.
For example, for data in the class (group) labelled 7.1 - 7.3:
(a) The values 7.1 and 7.3 are the lower and upper limits of the class.
(b) The class boundaries are 0.05 below the lower class limit and 0.05 above the upper class limit (because the figures are recorded to 1 decimal place), i.e. 7.05 and 7.35.
(c) The class interval (width) is the difference between the upper and lower class boundaries, i.e. 7.35 - 7.05 = 0.3.
(d) Question: What are the class boundaries if the figures are between 7 and 8?
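The rule in (b) and (c) can be written as a small function: the boundaries lie half of one unit of the last recorded digit beyond each class limit. This minimal Python sketch assumes both limits are given as strings with the same number of decimal places, and uses `Decimal` so that values such as 7.05 are represented exactly; the function name `class_boundaries` is illustrative.

```python
from decimal import Decimal


def class_boundaries(lower_limit: str, upper_limit: str):
    """Return (lower boundary, upper boundary, class width) for one class.

    Assumes both limits are recorded to the same number of decimal places.
    The boundaries lie half a unit of the last recorded digit beyond the
    limits, e.g. 0.05 for data recorded to 1 decimal place.
    """
    lo, hi = Decimal(lower_limit), Decimal(upper_limit)
    # One unit in the last recorded decimal place (0.1 for "7.1").
    unit = Decimal(1).scaleb(lo.as_tuple().exponent)
    half = unit / 2
    lower_boundary, upper_boundary = lo - half, hi + half
    return lower_boundary, upper_boundary, upper_boundary - lower_boundary


print(class_boundaries("7.1", "7.3"))  # boundaries 7.05 and 7.35, width 0.30
```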
Graphical or diagrammatic presentation of Data
• Diagrams give a very clear and understandable picture of data.
• Comparisons can be made between different samples very easily.
• Diagrams also have visual impact.
• Diagrams can also be used in numerical statistical analysis, e.g. to locate the mean, mode, median, etc.
• Saves time and energy and is also economical.
• Data presented in diagrams are easily remembered.
• Data can be condensed with diagrams.
Guidelines for Diagrammatic Presentation
• The subject matter must be made clear under a broad heading.
• The size of the scale should be neither too big nor too small.
• Diagrams should be absolutely neat and clean.
• Simplicity: a diagram should convey meaning clearly and easily.
• The scale must be presented along with the diagram.
• Vertical diagrams should be preferred to horizontal diagrams.
TYPES OF GRAPHICAL PRESENTATION
• Dot plot
• Stem-and-leaf display
• Line graphs
• Bar chart: simple, multiple, component
• Pie chart
• Histogram
• Frequency polygon and ogive (cumulative frequency polygon)
• Scatter diagrams
Graphical Presentation (cont.)
• The dot plot and stem-and-leaf display will be demonstrated in class on a whiteboard.
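As a supplement to the whiteboard demonstration, here is a minimal Python sketch of a stem-and-leaf display. It assumes non-negative whole-number data (hypothetical ages, not from the lecture), with the tens digit as the stem and the units digit as the leaf; empty stems are still shown so gaps in the data remain visible.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return a stem-and-leaf display for non-negative whole numbers.

    Stem = tens digit(s), leaf = units digit.
    """
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    lines = []
    for stem in range(min(stems), max(stems) + 1):  # include empty stems
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}")
    return "\n".join(lines)


# Hypothetical ages of 15 clinic attendees.
ages = [23, 35, 41, 28, 37, 52, 44, 31, 26, 39, 45, 58, 33, 47, 29]
print(stem_and_leaf(ages))
```

Like a histogram turned on its side, the display shows the shape of the distribution while keeping every original value.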
Bar Charts
• A bar chart is made up of columns plotted on a graph.
• The columns are positioned over a label that represents a qualitative (categorical) variable.
• The height of the column indicates the size of the group defined by the column label.
Figure: Weight category of all respondents (simple bar chart). Normal weight 89.4%, overweight 7.7%, obese 2.9%.
Figure 3: Weight category of Osun State in-school adolescents based on location (multiple bar chart). Bars for urban and rural respondents by weight category (normal weight, overweight, obese); y-axis: %.
Figure 1: Proportion of urban and rural school children who experienced bullying in a 1-year period, Osun State, Nigeria, 2009 (component bar chart). Components: bullying, no bullying; y-axis: % of students.
Source: Omisore et al., 2010
PIE CHART
• Similar to a bar chart, but drawn as a circle.
• Each category is given a proportionate portion of the chart, based on the angle it occupies out of the total 360° of the circle.
Figure: Weight category of all respondents (pie chart). Normal weight 89.4%, overweight 7.7%, obese 2.9%.
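The angle of each sector is the category's share of the 360° circle. This minimal Python sketch computes the angles for the weight-category percentages shown in this pie chart (normal weight 89.4%, overweight 7.7%, obese 2.9%).

```python
# Percentages from the weight-category pie chart.
percentages = {"Normal weight": 89.4, "Overweight": 7.7, "Obese": 2.9}

# Sector angle = (percentage / 100) x 360 degrees.
angles = {category: pct / 100 * 360 for category, pct in percentages.items()}

for category, angle in angles.items():
    print(f"{category:<14}{angle:7.2f} degrees")
print(f"{'Total':<14}{sum(angles.values()):7.2f} degrees")
```

Because the percentages sum to 100, the angles sum to the full 360°.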
Histograms
• Like a bar chart, a histogram is made up of columns plotted on a graph, but there is no space between adjacent columns.
• The columns are positioned over a label that represents a quantitative variable.
• The column label can be a single value or a range of values.
• The height of the column indicates the size of the group defined by the column label.


Figure: Histogram.
Figure: Histogram with a normal distribution curve.
Figure: Frequency polygon.
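Both the histogram and the frequency polygon start from a frequency table of equal-width classes. This minimal Python sketch assumes hypothetical haemoglobin values (g/dL) and half-open classes of width 1 g/dL, such as [9, 10) and [10, 11); it prints a text histogram and the class midpoints that a frequency polygon would join.

```python
# Hypothetical haemoglobin levels (g/dL) for 20 patients.
values = [10.2, 11.5, 12.1, 9.8, 13.4, 12.7, 11.9, 10.8, 12.3, 14.1,
          11.2, 12.9, 13.0, 10.5, 11.7, 12.5, 13.8, 11.1, 12.0, 12.2]

# Equal-width, half-open classes: [9, 10), [10, 11), ..., [14, 15).
start, width, n_classes = 9.0, 1.0, 6
counts = [0] * n_classes
for v in values:
    index = int((v - start) // width)  # which class the value falls into
    counts[index] += 1

for i, count in enumerate(counts):
    lower = start + i * width
    midpoint = lower + width / 2  # frequency polygon plots count against this point
    bar = "#" * count             # adjacent text "columns" with height = frequency
    print(f"{lower:4.1f}-{lower + width:4.1f}  mid {midpoint:4.1f}  {count:2d} {bar}")
```

Joining the points (midpoint, frequency) with straight lines gives the frequency polygon for the same data.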
Scatter Plot
• Used to show the relationship between two quantitative variables.
• The strength of the relationship is indicated by how closely the dots approximate a straight line.
• It can also identify non-linear relationships.
• It can also detect outliers and skewed distributions.
Figure: Scatter plot, perfect positive correlation (r = +1.00).
Figure: Scatter plot, negative correlation (r = -0.54).
Figure: Scatter plot, no correlation.
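The r values quoted with these scatter plots are Pearson correlation coefficients. This minimal Python sketch computes r from its definition, assuming two hypothetical data sets: one exactly linear (height in inches and the same heights in centimetres), and one with a downward trend. Python 3.10+ also provides `statistics.correlation`, which computes the same coefficient.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: the same heights in inches and in centimetres (exact straight line).
height_in = [60, 62, 65, 68, 70]
height_cm = [h * 2.54 for h in height_in]
print(round(pearson_r(height_in, height_cm), 2))  # 1.0 (perfect positive correlation)

# Hypothetical data with a downward trend (negative correlation).
x = [1, 2, 3, 4, 5, 6]
y = [9, 7, 8, 5, 6, 3]
print(round(pearson_r(x, y), 2))
```

The closer the dots lie to a straight line, the closer r is to +1 or -1; r near 0 corresponds to the "no correlation" plot.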
PRACTICAL SESSIONS
CONCLUSION
• We live in a data world: everything we hear, see or do is often based on data (collected information).
• It is vital that we learn the rudiments of data collection, organization and presentation from now.
• The process of learning is the process of doing; there is nothing like hands-on experience.
• Whatever you have learnt, go and put it into practice.
THANK YOU
