
MD51 Lecture 1

To train practitioners of the biomedical sciences in the use and interpretation of statistical data analysis. Objectives explore and present data using tables, charts and graphs. To avoid misuse and abuse of statistics.

Uploaded by

Afiq Iqmal
Copyright
© Attribution Non-Commercial (BY-NC)

MD 5108 Biostatistics for Basic Research

Lecturer: Dr K. Mukherjee
Office: S16-06-100 Tel: 874 2764 Email: [email protected]

To train practitioners of the biomedical sciences in the use and interpretation of statistical data analysis.

Objectives

explore and present data using tables, charts and graphs
ability to do simple statistical calculations with a calculator
carry out data analysis using a statistical package such as SPSS
pick the right procedure for analysing a set of data
interpret results correctly and report findings
avoid misuse and abuse of statistics
understand statistical contents of papers in medical journals
judge claims and statements critically
discuss and communicate ideas in a quantitative manner

Teaching approach
nonmathematical introduction
explanation of concepts rather than proofs
emphasis on methodology and procedures
emphasis on use of a statistical package rather than manual calculation
emphasis on choosing the right procedure
emphasis on correct interpretation of results
examples from clinical research literature

Topic 1: What is statistics?


"A branch of mathematics dealing with the analysis and interpretation of masses of numerical data" (Merriam-Webster Dictionary)
"The field of study that involves the collection and analysis of numerical facts or data of any kind" (Oxford Dictionary)
"The study of how information should be employed to reflect on, and give guidance for action, in a practical situation involving uncertainty" (Vic Barnett)

Biostatistics: application of statistical methods to biology, medicine and the health sciences

Why the need for Statistics in Biomedicine?

Two main reasons:

Variation
attributes differ not only among individuals but also within the same individual over time

Sampling
biomedical research projects are mostly carried out on small numbers of study subjects
it is a challenging problem to project results from small-sample studies to individuals at large

Biological Variation
Necessitates the use of statistical methods in biomedicine to put numerical data into a context by which we can better judge their meaning

From sample to population


Statistical methods are used to produce statistical inferences about a population based on information from a sample derived from that population

[Diagram: inductive statistical methods lead from the sample to the population]

Altman (1991) Practical Statistics for Medical Research, Chapman and Hall.

Bailar & Mosteller (1986) Medical Uses of Statistics, NEJM Books.

Many studies have been done on misuse of statistics in medicine

From Altman (1991)

Schor and Karten (1966, J. Am. Med. Assoc.):

149 papers classed as analytical studies, in 3 issues of the 11 most frequently read medical journals
assessment criteria: validity with respect to
design of experiment
type of analysis performed
applicability of statistical test used

Findings of Schor and Karten:

28% of papers acceptable
68% deficient but acceptable if reviewed
4% unsalvageable

Lesson:
CARE must be exercised when reading scientific papers in biomedical journals!
Knowledge of basic biostatistics is required

"There are three kinds of lies: lies, damned lies and statistics" (Benjamin Disraeli)
"It is easy to lie with statistics, but it is easier to lie without them" (Frederick Mosteller)
"Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write." (H.G. Wells)

Types of statistical methods


1. Descriptive statistical methods
data collection and organization
summarizing data and describing its characteristics
presentation and publication

2. Exploratory data analysis
play around and get a feel of the data
preliminary analysis, often graphical
looking for patterns and possible relationships
are assumptions satisfied?
which model and procedure to use?

3. Inductive (inferential) statistical methods
statistical inferences about a population based on information from a sample derived from that population
estimation, confidence intervals
hypothesis testing
prediction, forecasting
classification

[Diagram: inductive statistical methods lead from the sample to the population]

Topic 2: Types of data


Sources of data, the raw materials of statistics
Routinely kept records, e.g., hospital medical records
Surveys
Experiments
Clinical trials
Databases
Published reports

Any characteristic that can be measured or classified into categories is called a variable

Types of variables
(1) Qualitative variables
cannot be measured numerically
categorical in nature, e.g., gender
categories must not overlap and must cover all possibilities

Nominal variables (no inherent ordering of categories)
M/F, Yes/No
Blood group (A, B, AB, O)
Ethnic group (Chinese, Malay, Indian, Others)

Ordinal variables (categories are ordered in some sense)
response to treatment: unimproved, improved, much improved
pain severity: no pain, slight pain, moderate pain, severe pain

(2) Quantitative variables
can be measured numerically, e.g., weight, height, concentration
can be continuous or discrete

a continuous variable can take on any value (subject to the precision of the measuring instrument) within some range or interval, e.g., weight, height, blood pressure, cholesterol level
a discrete variable is usually a count of something and hence takes on integer values only, e.g., number of admissions to NUH

Variable types and measurement types
have implications for how data should be displayed or summarized
determine the kind of statistical procedures that should be used
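To make the link between variable type and choice of summary concrete, here is a minimal Python sketch. The helper name `summarise`, the sample values, and the pairing of each scale with a particular summary (mode, median category, median, mean) are assumptions for illustration; they follow a common convention and are not prescribed by the lecture.

```python
from statistics import mean, median, mode

# Ordered categories for the pain-severity example on the slide.
ORDERED_PAIN = ["no pain", "slight pain", "moderate pain", "severe pain"]


def summarise(values, scale, order=None):
    """Return a (label, value) summary suited to the variable's scale.

    Hypothetical helper: the scale-to-summary mapping is an assumed convention.
    """
    if scale == "nominal":
        # No ordering, so only the most frequent category is meaningful.
        return ("mode", mode(values))
    if scale == "ordinal":
        # Rank each category by its position in `order`, then take the
        # middle rank (the upper middle when the count is even).
        ranks = sorted(order.index(v) for v in values)
        return ("median category", order[ranks[len(ranks) // 2]])
    if scale == "discrete":
        return ("median", median(values))
    if scale == "continuous":
        return ("mean", mean(values))
    raise ValueError(f"unknown scale: {scale}")


# Made-up data, one list per variable type.
blood_groups = ["A", "O", "B", "O", "AB", "O"]
pain = ["slight pain", "no pain", "severe pain", "moderate pain", "slight pain"]
admissions = [3, 0, 2, 5, 1]
heights_cm = [160.5, 172.0, 168.2, 181.3]

print(summarise(blood_groups, "nominal"))
print(summarise(pain, "ordinal", order=ORDERED_PAIN))
print(summarise(admissions, "discrete"))
print(summarise(heights_cm, "continuous"))
```

Averaging blood groups would be meaningless, which is why the nominal branch only reports the most common category.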

SUMMARY: Types of variables

Variable
  Qualitative or categorical
    Nominal (not ordered), e.g. ethnic group
    Ordinal (ordered), e.g. response to treatment
  Quantitative measurement
    Discrete (count data), e.g. number of admissions
    Continuous (real-valued), e.g. height

Measurement scales

Topic 3: Presenting data graphically


Advantages of graphical data display

Let the data speak for itself
Get a good feel of the data before formal analysis
Graphs and plots are easier to understand and interpret
Reveal patterns in the data which may shed light on the appropriate model/analysis to use, e.g.,
Skewed or symmetric distribution
Multiple peaks / modes
Are there any outliers?
Relationship between variables

Graphs for categorical data


Bar chart for world pharmaceutical spending, 1997
[Figure: bar chart; vertical axis % of world spending (0 to 35), horizontal axis Region]

Pie chart for world pharmaceutical spending, 1997
[Figure: pie chart with one slice per region]

Segmented bar chart for world pharmaceutical spending, 1997
[Figure: single segmented bar; vertical axis % of world spending (0 to 100), one segment per region]

Values read from the pie chart labels (% of world spending):

Region            %
Africa            1
Australasia       1
Canada            2
Europe           29
Japan            16
Latin America     8
Middle East       2
SE Asia & China   7
USA              34

Comparison of methods

Bar charts can be read more accurately and offer better distinction between close-together values
Pie charts are especially useful for showing percentage distribution
Pie charts can display large and small percentages simultaneously without a scale break
A single bar chart is preferable to a single segmented bar chart
A series of segmented bar charts is easier to read than a series of pie charts or ordinary bar charts

Bar chart for number of health professionals
[Figure: bar chart; vertical axis Number of workers (0 to 6000), horizontal axis Profession (Dentists, Doctors, Nurses, Pharmacists)]

Variation of the basic bar chart

Stacked bar chart for number of health professionals
[Figure: Number of workers (0 to 6000) by Profession, each bar split into Private and Public]

Clustered bar chart for number of health professionals
[Figure: Number of workers (0 to 4000) by Profession, separate bars for Private and Public]

Segmented bar charts by profession
[Figure: Percent by sector (0 to 100) by Profession, each bar split into Private and Public]

Plotting by sector rather than by profession
Look at the data from a different angle
Highlight different aspects of the data

Clustered bar chart of number of health professionals
[Figure: Number of workers (0 to 4000) by Sector (Private, Public), one bar per profession: Dentists, Doctors, Nurses, Pharmacists]

Stacked bar charts by sector
[Figure: Number of workers (0 to 6000) by Sector, each bar split by profession]

Percentage bar charts by sector
[Figure: Percent within sector (0 to 100) by Sector, one bar per profession]

Segmented bar charts by sector
[Figure: Percent within sector (0 to 100) by Sector, each bar split by profession]

A back to back bar chart

Source: JAMA, 1978, vol 239, no 21

Comparison of methods

A stacked bar chart is also a bar chart for the combined data
Some of the bars in a stacked bar chart are not aligned
Bars in clustered bar charts are aligned, but it is harder to visualize how the component bars would stack up
Back to back bar charts are applicable when there are only 2 groups; the aggregated bars are not aligned
A series of stacked or segmented bar charts is useful in showing a time trend

Time Trend
Starting the vertical axis at 8 rather than 0 visually exaggerates the increase in the number of prescriptions written per person
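The size of this distortion can be worked out directly: a bar's drawn height is its value minus the axis starting point. The sketch below uses hypothetical values (9 and 11 prescriptions per person), since the figures behind the original chart are not given here; the function name `apparent_ratio` is also an illustration only.

```python
def apparent_ratio(first, last, axis_start=0.0):
    """Ratio of drawn bar heights when the vertical axis starts at axis_start."""
    if axis_start >= min(first, last):
        raise ValueError("axis_start must lie below both values")
    return (last - axis_start) / (first - axis_start)


# Hypothetical values, NOT taken from the lecture's chart.
first_year, last_year = 9.0, 11.0

true_ratio = apparent_ratio(first_year, last_year, axis_start=0.0)
drawn_ratio = apparent_ratio(first_year, last_year, axis_start=8.0)

print(f"{true_ratio:.2f} {drawn_ratio:.2f}")  # prints "1.22 3.00"
```

With these assumed numbers a true increase of about 22% is drawn as a bar three times as tall.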

Stacked bar chart of yearly mortality rate per 1000 births

Pagano & Gauvreau (1999) Principles of Biostatistics, Duxbury.

Response under two treatments

Response to treatment   None   Partial   Complete   Total
Treatment A                3        15          9      27
Treatment B                2        22         30      54

A misleading bar chart
[Figure: clustered bar chart; vertical axis Frequency (0 to 30), horizontal axis Response to treatment (None, Partial, Complete), separate bars for treatments A and B]

By design, there are twice as many patients receiving treatment B

The response type percentages can be compared for the two treatments
[Figure: clustered bar chart; vertical axis Within treatment percentage (0 to 100), horizontal axis Treatment (A, B), separate bars for Response to treatment (None, Partial, Complete)]

Stacked bar charts for percentage figures
[Figure: stacked bar chart; vertical axis Within treatment percentage (0 to 100), horizontal axis Treatment (A, B), each bar split by Response to treatment (None, Partial, Complete)]
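The within-treatment percentages can be reproduced from the response table with a few lines of Python. The counts are those given in the table; the function name `within_group_percent` and the dictionary layout are assumptions for illustration.

```python
# Counts from the "Response under two treatments" table.
counts = {
    "A": {"None": 3, "Partial": 15, "Complete": 9},
    "B": {"None": 2, "Partial": 22, "Complete": 30},
}


def within_group_percent(table):
    """Convert each row of counts to percentages of that row's total."""
    result = {}
    for treatment, responses in table.items():
        total = sum(responses.values())
        result[treatment] = {r: 100.0 * n / total for r, n in responses.items()}
    return result


percent = within_group_percent(counts)
for treatment, row in percent.items():
    print(treatment, {r: round(p, 1) for r, p in row.items()})
# A {'None': 11.1, 'Partial': 55.6, 'Complete': 33.3}
# B {'None': 3.7, 'Partial': 40.7, 'Complete': 55.6}
```

On the raw counts, B has the taller bar in two of the three categories simply because it has twice as many patients. On the percentage scale, complete response is clearly more common under B (55.6% against 33.3%).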

Graphs for quantitative data
Histogram
Frequency polygon
Box plot

Histogram
Divide the range of the data into a suitably chosen number of intervals/bins, all of the same width
The number of observations that fall within each interval is plotted

Relative frequency histogram
Plot the proportions of observations that fall within the class intervals
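The binning step can be sketched in plain Python as follows. The data values are made up (they are not the 45-patient data set), and the choices of spanning exactly from the minimum to the maximum and of closing the last interval on the right are assumptions; statistical packages may place bin edges differently.

```python
def histogram(data, n_bins):
    """Count observations in n_bins equal-width intervals spanning the data."""
    lo, hi = min(data), max(data)
    if hi == lo:
        raise ValueError("all observations are equal; cannot form intervals")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in data:
        # Intervals are [left, right); the last one also includes the maximum.
        k = min(int((x - lo) / width), n_bins - 1)
        counts[k] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts


def relative_frequencies(counts):
    """Proportion of all observations falling in each interval."""
    n = sum(counts)
    return [c / n for c in counts]


# Made-up volumes for illustration only.
volumes = [40, 52, 58, 61, 67, 70, 75, 88, 93, 120]
edges, counts = histogram(volumes, n_bins=4)
proportions = relative_frequencies(counts)

print(edges)        # [40.0, 60.0, 80.0, 100.0, 120.0]
print(counts)       # [3, 4, 2, 1]
print(proportions)  # [0.3, 0.4, 0.2, 0.1]
```

The relative frequency version has the same shape as the frequency histogram; only the vertical scale changes.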

Wild & Seber (2000) Chance Encounters, Wiley.

Histogram of End-Systolic Volume for 45 Male Heart Attack Patients
[Figure: histogram; vertical axis Frequency (0 to 20), horizontal axis end-systolic volume (40 to 220 in steps of 20)]

Relative frequency polygon for SysVol
[Figure: frequency polygon; vertical axis Percent (0 to 40), horizontal axis SysVol (40 to 220 in steps of 20)]

Comparison of methods
Histograms are good at revealing distributional shape such as symmetry, skewness and number of peaks
Histograms are difficult to superimpose or draw side by side
Frequency polygons can be superimposed for easy comparison

Wild & Seber (2000, p.59)

Can be superimposed

Pagano & Gauvreau (1999)

Wild & Seber (2000)

Median and quartiles


Sort the data in increasing order

The median is the middle value (if n is odd) or the average of the two middle values (if n is even); it is a measure of the center of the data
Quartiles: divide the set of ordered values into 4 equal parts
Q2 = second quartile = median

first 25% | Q1 | second 25% | Q2 | third 25% | Q3 | fourth 25%

IQR = Interquartile range = Q3 - Q1
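A short Python sketch of these definitions follows. Quartile conventions differ between textbooks and packages; this version assumes Q1 and Q3 are the medians of the lower and upper halves of the sorted data (leaving out the middle value when n is odd), which may not match the convention used by SPSS or by the lecture. The data are made up.

```python
def median(sorted_values):
    """Middle value, or the average of the two middle values when n is even."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def quartiles(data):
    """Return (Q1, Q2, Q3) using the assumed median-of-halves convention."""
    values = sorted(data)
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = values[:half]
    # Skip the middle observation when n is odd.
    upper = values[half + 1:] if n % 2 == 1 else values[half:]
    return median(lower), median(values), median(upper)


data = [7, 1, 3, 9, 5, 11, 13, 15]  # made-up observations
q1, q2, q3 = quartiles(data)
iqr = q3 - q1

print(q1, q2, q3, iqr)  # 4.0 8.0 12.0 8.0
```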

Box plot

Draw a box from the lower quartile to the upper quartile, and a line to mark the position of the median
Extend lines from both edges of the box by 1.5 × IQR, then pull the lines back until they hit an observation
Observations more than 1.5 × IQR away from the lower or upper quartile are marked as outside values for further investigation and checking

How a boxplot is constructed (Wild & Seber, 2000, p.73)

5-Number Summary: min, lower quartile, median, upper quartile, max
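The construction rules above translate into a few lines of Python. This is a sketch under stated assumptions: quartiles come from the standard library's `statistics.quantiles` with `method="inclusive"`, which is one convention among several, and the data and the function name `box_plot_stats` are made up for illustration.

```python
from statistics import quantiles


def box_plot_stats(data, k=1.5):
    """Five-number summary, whisker ends and outside values for a box plot."""
    values = sorted(data)
    # Assumption: "inclusive" quartile convention; others give slightly
    # different quartiles and therefore slightly different fences.
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    # Whiskers are pulled back to the most extreme observations inside the fences.
    inside = [x for x in values if low_fence <= x <= high_fence]
    outside = [x for x in values if x < low_fence or x > high_fence]
    return {
        "five_number": (values[0], q1, q2, q3, values[-1]),
        "whiskers": (inside[0], inside[-1]),
        "outside": outside,
    }


sample = [1, 3, 5, 7, 9, 11, 13, 15, 40]  # made-up data
stats = box_plot_stats(sample)

print(stats["five_number"])  # (1, 5.0, 9.0, 13.0, 40)
print(stats["whiskers"])     # (1, 15)
print(stats["outside"])      # [40]
```

Here the upper fence is 13 + 1.5 × 8 = 25, so the upper whisker stops at 15 and the value 40 is flagged as an outside value.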

Dotplot for SysVol (end-systolic volume, a measure of the size of the heart)
[Figure: dotplot; horizontal axis SysVol, ticks 50 to 200]

Boxplot for SysVol
[Figure: boxplot; horizontal axis SysVol, ticks 20 to 220]

Advantages of box plot

quick visual summary of a data set
captures prominent features like location, spread, skewness and outliers
a series of box plots can easily be drawn side by side; not so for histograms

Columns: Brand name, Type, Taste, $/oz, $/lbProt, Cal, Sod, Prot/Fat (values listed column by column)

$/oz
0.11 0.17 0.11 0.15 0.1 0.11 0.21 0.2 0.14 0.14 0.23 0.25 0.07 0.09 0.1 0.1 0.19 0.11 0.19 0.17 0.12 0.12 0.12 0.1 0.11 0.13 0.1 0.09 0.11 0.15 0.13 0.1 0.18 0.09 0.07 0.08 0.06 0.08 0.05 0.07 0.08 0.08 0.07 0.09 0.06 0.07

$/lbProt
14.23 21.7 14.49 20.49 14.47 15.45 25.25 24.02 18.86 18.86 30.65 25.62 8.12 12.74 14.21 13.39 22.31 19.95 22.9 19.78 14.86 17.32 15.2 14.01 13.92 18.24 14.12 11.83 15.41 17.4 17.32 15.61 20.4 12.65 11.17 11.75 9.49 10.21 6.37 8.42 9.37 9 8.07 9.39 6.59 8.43

Cal
186 181 176 149 184 190 158 139 175 148 152 111 141 153 190 157 131 149 135 132 173 191 182 190 172 147 146 139 175 136 179 153 107 195 135 140 138 129 132 102 106 94 102 90 99 107

Sod
495 477 425 322 482 587 370 322 479 375 330 300 386 401 645 440 317 319 298 253 458 506 473 545 496 360 387 386 507 393 405 372 144 511 405 428 339 430 375 396 383 387 542 359 357 528

Prot/Fat
1 2 1 1 1 1 2 2 1 1 1 3 2 1 1 1 2 1 2 2 2 1 1 1 2 1 1 2 1 3 1 1 3 1 1 1 1 2 2 3 3 4 5 5 4 2

Brand name, Type, Taste

Happy Hill Supers Beef Bland Georgies Skinless Beef Beef Bland Special Market's Beef Premium Bland B Spike's Beef Beef Medium Hungry Hugh's Beef Jumbo Medium Beef Great Dinner Beef Beef Medium RJB Kosher Beef Beef Medium Wonder Kosher Beef Skinless Medium Bee Happy FatsBeef Jumbo Beef Medium Midwest Beef Beef Medium General Kosher Beef Beef Medium Wall's Kosher Beef Beef Lower Medium F Hickory Natural Beef Smoke Medium Smith BeefBeef Medium Premium Beef Beef Medium Family StoreSkinless Beef Beef Medium Sam's Kosher Beef Beef Medium Hammer Beef Beef Medium Athens Beef Beef Medium Regents Kosher Beef Beef Scrumpt. Really Big Meat Bland Biggest Jumbo Meat Bland Home Made Meat Bland Martha's Jumbo Meat Dinner Bland Hammer Premium Meat Bland Willie's Wieners Meat Bland Premium Hot Meat Dogs Medium Airport Wieners Meat Medium Judy's Favorite Meat Jumbos Medium Stick Lean Meat Supreme Jumbo Medium Stick Jumbo Meat Medium Fat Jack Jumbo Meat Medium Thin Jack Veal Meat Medium Top Grade Hot Meat Dogs Medium Blended w/Chicken&Beef Meat Scrumpt. Heaven Made Meat Scrumpt. Baked and Meat Smoked Scrumpt. Smart Person Poultry Chicken Bland Woods Park Poultry Chicken Medium Tony Turkey Poultry Medium Rose Garden Poultry Turkey Medium Low Fat Turkey Poultry Medium Special Market's Poultry Turkey Medium Caloryless Poultry Turkey Medium Heaven Made Poultry Lower Fat Medium McDowell'sPoultry Jumbo Chicken Medium

Dataset Hotdogs

Graphical Analysis of the Hotdogs data.

Parallel Box plots Can Be Quite Revealing


Rice (1995) Mathematical Statistics & Data Analysis, Duxbury Press.

[Figure: parallel box plots; horizontal axis labelled 1969 to 1972]

Reduction in concentration through time
Higher during winter months
Skewed toward higher values
Spread increases with level

(Parallel histograms much harder to visualise)
