Data and Its Presentation


Overview and scope of Biostatistics in Medical Research,
Types of data and their properties.

Dr. Jai Kishun


Assistant Professor
Department of BHI,
SGPGIMS, Lucknow
Email: [email protected]
What is Statistics?
• As a singular noun, statistics is “the science of figures”; as a plural noun, it means “figures”, i.e., numerical data or information.

• Statistics is a curious combination of mathematics, logic and judgement. It is the logical aspects that cause more difficulty than the mathematics: the principles of good design, and the concepts underlying data analysis and interpretation (Altman, 1991; page 9).
06/30/24 2
Conti..
• Statistics is the science that deals with the collection, classification and tabulation of numerical facts as the basis for the explanation, description and comparison of phenomena (Lovitt).
• Statistics is the science that deals with the
collection, analysis and interpretation of
numerical data (Croxton & Cowden).

Medical Statistics
•Deals with the application of statistical methods to the study of disease, disability, the efficacy of a vaccine or a new regimen, etc.
Health Statistics
•Deals with the application of statistical methods to varied information of public health importance.
Vital Statistics
•The ongoing collection, by government agencies, of data relating to vital events such as births, deaths, marriages and divorces, and to health- and disease-related states and events that are deemed reportable by local health authorities.
What is Biostatistics?
• Statistical methods that are used in biology, medicine and health sciences.
• By methods we mean the procedures applied to solve problems and answer questions arising in human biology and medicine.
• Biostatistics extends statistical theory and adapts it to problems of specific importance to the community of scientists, practitioners and policy makers interested in health and other aspects of the human community.
• Biostatistics may also be defined as the methods of collection, compilation, presentation, analysis and logical interpretation of biological data affected by a multiplicity of factors.
The principal applications of statistics in public health are:
1. Population estimation and forecasting
2. Surveys of population characteristics, health needs and problems
3. Analysis of health trends
4. Epidemiological research
5. Programme evaluation
6. Programme planning
7. Budget preparation and justification
8. Operational and administrative decision making
9. Health education
VARIABLE
• Variable: A variable is an attribute that describes a person, place, thing,
or idea. The value of the variable can "vary" from one entity to another.
- E.g. student grades, weight of a potato, # heads in 10 flips of a coin, etc.
• Typically denoted with a capital letter: X, Y, Z…
• Data are the observed values of a variable.
- E.g. student marks: {67, 74, 71, 83, 93, 55, 48}
• There are two types of data: (1) qualitative/categorical and (2) quantitative/numerical.
• Qualitative. Qualitative variables take on values that are names or labels.
- For example, the color of a ball (red, green, blue) or the breed of a dog (collie, shepherd, terrier) is a qualitative or categorical variable.
Cont…..
• Quantitative. Quantitative variables are numeric. They represent a
measurable quantity.
- For example, when we speak of the population of a city, we are
talking about the number of people in the city - a measurable attribute
of the city. Therefore, population would be a quantitative variable.
- Arithmetic operations can be performed on quantitative data, so it is meaningful to talk about 2*Height, or Price + $1, and so on.

- TWO TYPES OF QUANTITATIVE DATA:

1. Discrete: A variable that can take only exact, separate values (typically counts) and no fractional values is called a ‘discrete’ variable. E.g., the number of workers in a hospital, members of a family, students in a class, births in a certain year, or telephone calls in a month.

2. Continuous: If a variable can take on any value between its minimum value and its maximum value, it is called a continuous variable. Height, weight, rainfall, time, temperature, etc., are examples of continuous variables.
Example:

Discrete series                Continuous series
Marks   No. of students        Height (inches)   No. of students
40      12                     54-60             15
50      15                     60-66             14
60      16                     66-72             9
70      7                      72-78             12
Sources of Data
• Census of India
• Vital Registration System/Civil Registration System
• Sample Registration System (SRS)
• Official Statistics: NSSO
• Central Bureau of Health Intelligence (CBHI)
• Ministry of Health, Govt. of India
• Health Surveys
• Reports of International Agencies
• Hospital Records
• Disease Registries and Databases
Measurement Scales of Variables
• There are four measurement scales:
1. Categorical/Nominal Scale of Measurement
• The categorical/nominal scale of measurement only satisfies the
identity property of measurement. Values assigned to variables
represent a descriptive category, but have no inherent numerical
value with respect to magnitude.
E.g.: Gender is a variable that is measured on a nominal scale.
Individuals may be classified as "male" or "female", but neither
value represents more or less "gender" than the other. Religion
and political affiliation are other examples of variables that are
normally measured on a nominal scale.
Note: Arithmetic operations don’t make any sense (e.g. does
Widowed ÷ 2 = Married?!)

Conti….
2. Ordinal Scale of Measurement
• The ordinal scale has the property of both identity and
magnitude. Each value on the ordinal scale has a unique
meaning, and it has an ordered relationship to every other value
on the scale.
Example: Suppose you have a variable, economic status, with three categories (low, medium and high). In addition to being able to classify people into these three categories, you can order the categories as low, medium and high.
• The spacing between the values may not be the same across the levels of the variable.

Cont..
3. Interval Scale of Measurement: The interval scale of measurement has the properties of identity, magnitude, and equal intervals. It has no true (natural) zero: the zero point is arbitrary.
• A perfect example of an interval scale is the Fahrenheit scale to
measure temperature. The scale is made up of equal temperature
units, so that the difference between 40 and 50 degrees
Fahrenheit is equal to the difference between 50 and 60 degrees
Fahrenheit.
• With an interval scale, you know not only whether different
values are bigger or smaller, you also know how much bigger or
smaller they are.
- For example, suppose it is 60 degrees Fahrenheit on Monday and
70 degrees on Tuesday. You know not only that it was hotter on
Tuesday, you also know that it was 10 degrees hotter.
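Both properties of the interval scale can be checked numerically. The sketch below is a minimal Python illustration; the helper name `fahrenheit_to_celsius` is hypothetical, and the standard conversion °C = (°F − 32) × 5/9 is assumed. Equal steps in Fahrenheit stay equal after conversion to Celsius, but a ratio such as 80/40 does not survive the change of unit, because the zero point is arbitrary.

```python
def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert degrees Fahrenheit to degrees Celsius: C = (F - 32) * 5 / 9."""
    return (temp_f - 32) * 5 / 9

# Equal intervals: the 40->50 °F and 50->60 °F steps are the same size,
# and they remain the same size once converted to Celsius.
step_40_50 = fahrenheit_to_celsius(50) - fahrenheit_to_celsius(40)
step_50_60 = fahrenheit_to_celsius(60) - fahrenheit_to_celsius(50)

# Monday (60 °F) to Tuesday (70 °F): the difference itself is meaningful.
monday_f, tuesday_f = 60, 70
difference_f = tuesday_f - monday_f

# No true zero: the ratio 80/40 depends on the unit, so "80 °F is twice
# as hot as 40 °F" is not a meaningful statement on an interval scale.
ratio_f = 80 / 40
ratio_c = fahrenheit_to_celsius(80) / fahrenheit_to_celsius(40)

print(f"Tuesday was {difference_f} °F hotter than Monday")
print(f"Steps in °C: {step_40_50:.3f} and {step_50_60:.3f}")
print(f"Ratio in °F: {ratio_f:.1f}; same temperatures in °C: {ratio_c:.1f}")
```

The same pair of temperatures gives a ratio of 2 in Fahrenheit and about 6 in Celsius, which is why only differences are interpreted on this scale.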
Conti….
4. Ratio Scale of Measurement
• The ratio scale of measurement satisfies all four properties of measurement: identity, magnitude, equal intervals, and a true (absolute) zero.
• The weight of an object is an example of a ratio scale. Each value on the weight scale has a unique meaning, weights can be rank ordered, units along the weight scale are equal to one another, and the scale has a true zero.
• The weight scale has a true zero because zero means the complete absence of weight, and no object can have a negative weight. Ratios are therefore meaningful: an object of 20 kg is twice as heavy as one of 10 kg.
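By contrast with the interval scale, ratios on a ratio scale do not depend on the unit of measurement. A minimal Python sketch, assuming the approximate conversion factor 1 kg ≈ 2.20462 lb and two hypothetical weights chosen only for illustration (the helper `kg_to_lb` is likewise hypothetical):

```python
KG_TO_LB = 2.20462  # approximate number of pounds in one kilogram

def kg_to_lb(weight_kg: float) -> float:
    """Convert kilograms to pounds; 0 kg maps to 0 lb (a true zero)."""
    return weight_kg * KG_TO_LB

# Hypothetical weights, used only to illustrate the property.
heavier_kg, lighter_kg = 20.0, 10.0

ratio_kg = heavier_kg / lighter_kg
ratio_lb = kg_to_lb(heavier_kg) / kg_to_lb(lighter_kg)

# Both units share the same zero ("no weight"), so the ratio is unchanged:
# 20 kg is twice as heavy as 10 kg, whichever unit is used.
print(f"Ratio in kg: {ratio_kg:.2f}; ratio in lb: {ratio_lb:.2f}")
```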
Organizing and Displaying Data
Presentation of data
• Tabular
– Simple table
– Complex table
• Graphical
– For quantitative data: 1. Histogram 2. Frequency polygon 3. Frequency curve 4. Line chart 5. Normal distribution curve 6. Cumulative distribution curve 7. Scatter diagram
– For qualitative data: 1. Bar chart 2. Pictogram 3. Pie chart 4. Map diagram
Data Tabulation
• Tabulation: Tabulation is the systematic arrangement of the
statistical data in columns or rows. It involves the orderly and
systematic presentation of numerical data in a form designed to
explain the problem under consideration.
• Advantages of tabulation:
1. It is concise
2. There is no repetition of explanatory matter
3. Comparisons can be made easily
4. The important features can be highlighted
5. Errors in the data can be detected.
• Simple Tabulation: Tabulating the results of only one variable shows how often each response was given.
Frequency Tables
• Frequency: The number of times a value of a variable, e.g., height, weight, etc. (continuous) or number of students in a class (discrete), occurs in a given series of observations is termed the “frequency” of that value.
• Frequency Distribution: Tabulating a pool of data on a variable side by side with the respective frequencies of its values is called a ‘frequency distribution’ of those data.
• Cumulative frequency distribution: In a cumulative frequency distribution, each class interval is shown with its cumulative frequency, which is obtained by adding the frequency of that class interval to the sum of the frequencies of the preceding class intervals.

Cont…
There are two types of cumulative frequencies.

1. Less than (or, from below) cumulative frequency: In the less than type, the cumulative frequency of each class interval is obtained by adding the frequencies of the given class and all the preceding classes, when the classes are arranged in ascending order of the value of the variable.

2. More than (or, from above) cumulative frequency: In the more than type, the cumulative frequency of each class interval is obtained by adding the frequencies of the given class and all the succeeding classes. For a grouped frequency distribution, the cumulative frequencies are shown against the class-boundary points.

Ex: Construct the cumulative frequency distribution (both “less than” and “more than” types) from the following data:

Weight (kg)       10-19   20-29   30-39   40-49   50-59   Total
No. of students   6       10      25      15      12      68
Cont…
• Ex: Cumulative Frequency Distribution of Weights of 68 Patients

Class-boundary point   Cumulative frequency
(weight in kg)         “Less than”           “More than”
9.5                    0                     68
19.5                   6                     62
29.5                   10 + 6 = 16           52
39.5                   6 + 10 + 25 = 41      27
49.5                   41 + 15 = 56          12
59.5                   56 + 12 = 68          0
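Both cumulative columns of the table above can be reproduced with a short Python sketch. It assumes whole-kilogram class limits, so each class boundary lies 0.5 kg beyond its limit, exactly as in the worked example; `itertools.accumulate` from the standard library supplies the running totals.

```python
from itertools import accumulate

# Class limits (kg) and frequencies from the weight example above.
classes = [(10, 19), (20, 29), (30, 39), (40, 49), (50, 59)]
freqs = [6, 10, 25, 15, 12]

# Boundaries: 0.5 below the first lower limit, then 0.5 above each upper limit.
boundaries = [classes[0][0] - 0.5] + [upper + 0.5 for _, upper in classes]

total = sum(freqs)
# "Less than": nothing lies below the first boundary; then a running total.
less_than = [0] + list(accumulate(freqs))
# "More than": everything not yet counted lies above the boundary.
more_than = [total - count for count in less_than]

print("Boundary  Less than  More than")
for boundary, lt, mt in zip(boundaries, less_than, more_than):
    print(f"{boundary:>8}  {lt:>9}  {mt:>9}")
```

Computing the "more than" column as the total minus the "less than" column is equivalent to summing the succeeding classes, and it guarantees the two columns always add up to 68.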
Proportions and Percentages
• Proportion (P): a relative frequency obtained by dividing the frequency in each category by the total number of cases: $P = \frac{f}{N}$
• Percentage (%): a relative frequency obtained by dividing the frequency in each category by the total number of cases and multiplying by 100: $\% = P \times 100$
• f: frequency in a category; N: total number of cases
• Proportions and percentages are relative frequencies
Simple Frequency Distribution Table:
Distribution of 50 patients at the surgical department of Govt. hospital according to their ABO blood groups.

Blood group   Frequency   Cumulative frequency   Relative frequency   %
A             12          12                     0.24                 24
B             18          30                     0.36                 36
AB            5           35                     0.10                 10
O             15          50                     0.30                 30
Total         50                                 1.00                 100
Hypothetical Data
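The derived columns of the blood-group table follow directly from P = f/N and % = P × 100. Below is a minimal Python sketch using the table's hypothetical frequencies; the percentage is computed as f × 100 / N, which is the same quantity and avoids small floating-point rounding artefacts.

```python
from itertools import accumulate

# Frequencies from the hypothetical ABO blood-group table above.
blood_groups = {"A": 12, "B": 18, "AB": 5, "O": 15}

n = sum(blood_groups.values())  # N: total number of cases
cumulative = list(accumulate(blood_groups.values()))

rows = []
for (group, f), cum_f in zip(blood_groups.items(), cumulative):
    proportion = f / n        # relative frequency P = f / N
    percent = f * 100 / n     # percentage = P x 100
    rows.append((group, f, cum_f, proportion, percent))

for group, f, cum_f, proportion, percent in rows:
    print(f"{group:<5}{f:>4}{cum_f:>5}{proportion:>6.2f}{percent:>5.0f}")
print(f"Total{n:>4}")
```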
Complex Frequency Distribution Table
Distribution of 20 lung cancer patients at the chest department of Govt. hospital and 40 controls in May 2008 according to smoking.

                Lung cancer
Smoking         Cases          Controls       Total
                No.   %        No.   %        No.   %
Smoker          15    75       8     20       23    38.33
Non-smoker      5     25       32    80       37    61.67
Total           20    100      40    100      60    100

Hypothetical Data
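In the complex table, the percentages for cases and controls are column percentages (each count divided by its own column total), while the Total column uses the grand total of 60. A Python sketch of that calculation, using the hypothetical counts from the table; the dictionary layout is simply one convenient choice for illustration.

```python
# Counts from the hypothetical lung-cancer / smoking table above.
counts = {
    "Smoker": {"Cases": 15, "Controls": 8},
    "Non-smoker": {"Cases": 5, "Controls": 32},
}
groups = ["Cases", "Controls"]

# Column totals (20 cases, 40 controls) and the grand total (60).
col_totals = {g: sum(row[g] for row in counts.values()) for g in groups}
grand_total = sum(col_totals.values())

percentages = {}
for label, row in counts.items():
    row_total = sum(row.values())
    # Column percentage: each cell relative to its own column total.
    cells = {g: 100 * row[g] / col_totals[g] for g in groups}
    # Total column: the row total relative to the grand total.
    cells["Total"] = 100 * row_total / grand_total
    percentages[label] = cells

for label, cells in percentages.items():
    print(label, {k: round(v, 2) for k, v in cells.items()})
```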
Graphical Representation of data:
• Graphs are:
• Visual representations of data.
• A way to speed up the exploration of data.
• A way to locate important patterns in the data.
• Graphs are used in many aspects of our day-to-day lives, such as business presentations, newspaper articles, etc.

Types of graphical representation
• Charts
– Bar
– Histograms
– Pie
– Scatter
– Line
– Flow
– Box plots

Bar Graphs
• It is also called a columnar diagram. Bar diagrams are drawn as columns of equal width. The following rules are observed while constructing a bar diagram:
(a) All the bars or columns have the same width.
(b) All the bars are placed at equal intervals/distances.
(c) Bars are shaded with colours or patterns to make them distinct and attractive.

Types of Bar graphs
• Four types of bar graphs are used to represent different data sets:
• The simple bar diagram
• Compound bar diagram
• Multiple bar diagram
• Polybar diagram

The simple bar diagram
• A simple bar diagram is constructed for an
immediate comparison. It is advisable to
arrange the given data set in an ascending or
descending order and plot the data variables
accordingly. However, time series data are
represented according to the sequencing of the
time period.

[Figure: example of a simple bar diagram]
Compound bar diagram
• When the different components of one variable, or different variables of one component, are grouped together, they are represented by a compound bar diagram. In this method, the different variables are shown in a single bar divided into rectangles, one for each component.

[Figure: example of a compound bar diagram]
Multiple bar diagrams

• Multiple bar diagrams are constructed to represent two or more variables for the purpose of comparison.
• For example, a multiple bar diagram may be
constructed to show proportion of males and females
in the total, rural and urban population or the share of
canal, tube well and well irrigation in the total
irrigated area in different states.

[Figure: example of a multiple bar diagram]
Polybar diagram
• Line and bar graphs, usually drawn separately, may also be combined to depict data on closely associated characteristics, such as climatic data on mean monthly temperature and rainfall.

[Figure: example of a polybar diagram]
Histograms
• Histograms provide a graphical representation of data with bars whose heights indicate the number of observations in a certain range.
• A histogram is a bar graph that can be constructed using a frequency table.
• Put the class boundaries on the horizontal axis.
• The bars have the same width (when the class intervals are equal) and always touch; the edges of the bars are on the class boundaries.
• The height of the bar is the class frequency.
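The bar geometry described above can be computed from a grouped frequency table. The sketch below reuses the weight classes from the cumulative-frequency example and assumes whole-number class limits, so the boundaries lie 0.5 beyond the limits; it prints a crude text histogram rather than drawing a chart, to stay free of plotting libraries.

```python
# Class limits (kg) and frequencies reused from the weight example.
classes = [(10, 19), (20, 29), (30, 39), (40, 49), (50, 59)]
freqs = [6, 10, 25, 15, 12]

# Each bar runs from the lower to the upper class boundary, so
# adjacent bars share an edge and touch; its height is the frequency.
bars = [(lower - 0.5, upper + 0.5, f) for (lower, upper), f in zip(classes, freqs)]

for left, right, height in bars:
    # One '#' per observation, as a rough stand-in for the bar's height.
    print(f"{left:>5} - {right:<5} {'#' * height}")
```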
Histograms
[Figure: Histogram of systolic blood pressure; x-axis: Systolic Blood Pressure, y-axis: Frequency]
• The histogram shows the distribution of systolic blood pressure, together with its mean, standard deviation and total number of observations.
• Since most of the taller bars lie to the left, most people have a systolic blood pressure between 99.5 and 141.5.
• The distribution of the systolic blood pressure of these people is skewed right (the longer tail is on the right).
Frequency Polygon
• A frequency polygon is a graph that displays the data by using lines that connect points plotted for the frequencies at the midpoints of the classes. The frequencies give the heights of the points plotted at the midpoints.
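The plotted points of a frequency polygon are (class midpoint, frequency) pairs. A minimal Python sketch using the weight classes from the cumulative-frequency example; closing the polygon at zero on the midpoints of the empty classes just below and above the data is a common convention, assumed here rather than taken from the slide.

```python
# Class limits (kg) and frequencies reused from the weight example.
classes = [(10, 19), (20, 29), (30, 39), (40, 49), (50, 59)]
freqs = [6, 10, 25, 15, 12]

# Class midpoint = (lower limit + upper limit) / 2.
midpoints = [(lower + upper) / 2 for lower, upper in classes]
width = classes[1][0] - classes[0][0]  # equal class width: 10 kg here

# Anchor both ends on the x-axis at the neighbouring (empty) class midpoints.
points = (
    [(midpoints[0] - width, 0)]
    + list(zip(midpoints, freqs))
    + [(midpoints[-1] + width, 0)]
)

for x, y in points:
    print(f"({x}, {y})")
```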
Example of a Frequency Polygon
• Used to graph quantitative data only
• Formed by joining the midpoints of the tops of the bars
[Figure: Frequency polygon; x-axis: Number of Cigarettes Smoked per Day, y-axis: Frequency]


Cumulative frequency graph/Ogive
• A cumulative frequency graph or ogive is a graph that represents the cumulative frequencies for the classes in a frequency distribution.
Example of an Ogive
[Figure: Ogive; x-axis: Number of Cigarettes Smoked per Day, y-axis: Cumulative Frequency]


Pie graphs
• A pie diagram is another graphical method of representing data. A circle is drawn to depict the total value of the given attribute, and the circle is divided into sectors whose angles are proportional to the sub-sets of the data. Hence, it is also called a Divided Circle Diagram.
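Each sector's angle is its share of the full circle: angle = (f / N) × 360°. A sketch using the hypothetical blood-group frequencies from the simple frequency table, computed as f × 360 / N so the arithmetic stays exact for these values:

```python
# Hypothetical ABO blood-group frequencies from the frequency table.
blood_groups = {"A": 12, "B": 18, "AB": 5, "O": 15}
n = sum(blood_groups.values())

# Sector angle in degrees = (f / N) x 360.
angles = {group: f * 360 / n for group, f in blood_groups.items()}

for group, angle in angles.items():
    print(f"{group:<3} {angle:6.1f} degrees")
print(f"Total: {sum(angles.values()):.1f} degrees")
```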

Pie graphs
[Figure: Pie chart with seven slices]
• Seven slices of different colours represent 7 different countries.
• We can see the percentages that the slices represent.
• It is easy to identify the categories in which the percentage of respondents is higher and those in which it is lower.

Scatter plot
• A scatter plot shows the relationship between two factors (variables) of an experiment: each point displays one pair of values, one for each variable.

Scatterplots
Which scatterplots below show a linear trend?
[Figure: six example scatterplots, a) to f); panel labels include “Positive Correlation”, “Negative Correlation” and “Constant Correlation”]
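Whether a scatterplot shows a linear trend is often summarised numerically with Pearson's correlation coefficient r. This technique is not covered on the slide and is named here as an addition. A minimal Python sketch with small hypothetical data sets: a perfect positive trend, a perfect negative trend, and a U-shaped pattern in which y clearly depends on x but there is no linear trend (r = 0).

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: sum of cross-products over the product of spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]  # hypothetical values, for illustration only
r_positive = pearson_r(x, [2, 4, 6, 8, 10])  # y rises with x
r_negative = pearson_r(x, [10, 8, 6, 4, 2])  # y falls as x rises
r_curved = pearson_r(x, [4, 1, 0, 1, 4])     # U-shape: related, not linear

print(r_positive, r_negative, r_curved)  # 1.0 -1.0 0.0
```

The U-shaped case is a reminder that r measures only linear association, so a scatterplot should be inspected alongside it.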
Line Graph
• Line graphs are usually drawn to represent time-series data.
Example: temperature, rainfall, population growth, birth rates and death rates.

[Figure: example of a line graph]
Polygraph
• A polygraph is a line graph in which two or more variables are shown on the same diagram by different lines. It helps in comparing the data.
• Examples that can be shown as a polygraph are:
– Birth rates, death rates and life expectancy in one diagram.
– Sex ratio in different states or countries in one diagram.

[Figure: example of a polygraph]
Flow Maps/Chart
• A flow chart is a combination of a graph and a map. It is drawn to show the flow of commodities or people between places of origin and destination. It is also called a Dynamic Map.
• A transport map showing the number of passengers, vehicles, etc., is the best example of a flow chart.

Stem-and-Leaf Displays
• A stem-and-leaf display is a method of exploratory data analysis that is used to rank-order and arrange data into groups.
– Stem-and-leaf displays retain the original data values and individual information
– Stem-and-leaf diagrams provide a very effective way of ordering data by hand
– By looking at the display “sideways” you can see the shape of the distribution of the data

How to Make a Stem-and-Leaf Display
1. Divide the digits of each data value into two parts. The leftmost part is called the stem and the rightmost part is called the leaf.
– There are no firm rules for selecting the group of digits for the stem. Whichever group you choose, you must list all the possible stems from smallest to largest in the data collection.
2. Align all the stems in a vertical column from smallest to largest. Draw a vertical line to the right of all the stems.
3. Place all the leaves with the same stem in the same row as the stem, and arrange the leaves in increasing order.
4. Use a label (key) to indicate the magnitude of the numbers in the display. Show the decimal position in the label rather than in the stems or leaves.
– Always include a scale/key!
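Steps 1 to 4 can be sketched in Python for non-negative whole numbers, taking the tens digit(s) as the stem and the units digit as the leaf; this is one possible split, since step 1 notes there is no firm rule. The data are the student marks {67, 74, 71, 83, 93, 55, 48} used earlier to illustrate a variable, and the helper name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (tens) to its sorted leaves (units); whole numbers >= 0."""
    groups = defaultdict(list)
    for value in sorted(values):  # sorting puts the leaves in increasing order
        stem, leaf = divmod(value, 10)
        groups[stem].append(leaf)
    # List every stem from smallest to largest, even stems with no leaves.
    return {stem: groups.get(stem, []) for stem in range(min(groups), max(groups) + 1)}

marks = [67, 74, 71, 83, 93, 55, 48]  # student marks from the Variable slide
display = stem_and_leaf(marks)

for stem, leaves in display.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
print("Key: 4 | 8 = 48 marks")  # step 4: always include a key
```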
Example: Stem-and-Leaf Display
The marks of 40 students in a mid-semester examination were recorded in a table. How are their marks distributed?
[Figure: Stem-and-leaf display of marks obtained by students]
Stemplots include a key to help the user interpret the display correctly. The key in the stemplot above indicates that a stem of 2 with a leaf of 7 represents a mark of 27.
Looking at the example above, you should be able to describe the distribution of marks quickly. Most of the students' scores are clustered between 29 and 38, with the center in the neighborhood of 30. The scores range from a low of 10 (two students have marks less than 10) to a high of 50.


Back-to-Back Stem Plot
• A back-to-back stem plot is used to compare two sets of data.
– The stems are in the middle of the diagram, with the leaves extending out on both sides.
The back-to-back stem-and-leaf plot below shows the exam grades (out of 100) of two sections. The digit in the stem represents the tens and the digit in the leaf represents the ones. So, for example, 5 | 3 = 53, and so on.
[Figure: Back-to-back stem-and-leaf plot of exam grades for two sections]

Box and whisker plots
• A box and whisker plot is a way of summarising a set of data measured on an interval scale.
• It is a graphical representation of the range and the quartiles of the data.
• A box plot shows the most extreme values in the data set (maximum and minimum values), the lower and upper quartiles, and the median.
• It shows the shape of the distribution, the central value, and the variability.
[Figure: Box plot, Chart 1: Age of respondents, years; min = 4, Q1 = 18.5, median = 24, Q3 = 36.5, max = 47]
• Median: The line in the middle of the box represents the median, meaning half the cases have values greater than the median and half have values less than the median.
• Maximum: The upper whisker corresponds to the maximum age of the respondents (47 years).
• Minimum: The lower whisker corresponds to the minimum age of the respondents (4 years).
• Q1 and Q3: The bottom of the box indicates the 1st quartile, below which 25% of the cases lie, and the top of the box indicates the 3rd quartile, above which 25% of the cases lie.
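The five values a box plot displays (minimum, Q1, median, Q3, maximum) can be computed with Python's standard `statistics` module (Python 3.8 or later). Quartile conventions differ between textbooks and software packages, so Q1 and Q3 may differ slightly from other tools; this sketch assumes the `"exclusive"` method, which is the default of `statistics.quantiles`, and reuses the student marks from the Variable slide.

```python
import statistics

def five_number_summary(values):
    """Return min, Q1, median, Q3 and max: the values a box plot draws."""
    data = sorted(values)
    # Three cut points split the data into four equal-sized groups.
    q1, median, q3 = statistics.quantiles(data, n=4, method="exclusive")
    return {"min": data[0], "q1": q1, "median": median, "q3": q3, "max": data[-1]}

marks = [67, 74, 71, 83, 93, 55, 48]  # student marks from the Variable slide
summary = five_number_summary(marks)

print(summary)
print("Interquartile range (box height):", summary["q3"] - summary["q1"])
```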
The following summary table can be used to choose an appropriate graphical presentation.

Type of variable             Appropriate graphical presentation
Nominal/Categorical          Bar chart and pie chart
Ordinal                      Bar chart and histogram
Numerical scale, ungrouped   Histogram, frequency polygons, box plots, time line
Numerical scale, grouped     Histogram, bar charts, pie chart, box plots (may be misleading if the number of categories is small)
Thank You

Steps in SPSS
• From the menus choose:
• Graphs > Legacy Dialogs > Bar...
• Click Clustered and then click Define.
• Scroll down the source variable list and select a variable for the Category Axis.

