
Module 1

The document provides an overview of the fundamentals of statistics, particularly in the context of biostatistics, covering definitions, types of data, variables, and descriptive statistics. It discusses the importance of statistical methods in the medical field, including data collection techniques and classification of data. Additionally, it outlines the significance of tabulation and frequency distribution in organizing and interpreting statistical data.

Biostatistics

Deepu Tiwari
Faculty of Biostatistics
Jammu Institute of Ayurveda and Research,
Jammu
Module 1 : Fundamentals of Statistics

1. Definition of Statistics: Fundamentals of Statistics and
its applications to the biomedical field (Biostatistics),
use and misuse of Statistics.
2. Data – Definition, Types, Classification and presentation
3. Variables- Definition, Types.
4. Descriptive Statistics - Measures of Central tendency –
Mean, Median, Mode, Percentile.
5. Measures of Dispersion- Range, Quartile deviation,
Mean deviation, and Standard deviation and Co-efficient
of variation.

• Maximum marks for assessment of the module (formative
assessment): 25 marks
• Module marks for summative assessment (university
examination): 10 marks
Overview

• Scope of Statistics and Types of Data.
• Collection of Data.
• Classification and Tabulation of Data.
• Diagrammatic and Graphical Representation of
Statistical Data.
• Descriptive Statistics.
• Measures of Dispersion.
Definition of Statistics

Statistics is concerned with scientific methods for
collecting, organizing, summarizing, presenting,
analyzing and interpreting data.

The word statistics is used in two different forms:
singular and plural.
In its plural form, it means the numerical figures obtained
by measurement or counting in a systematic manner with a
definite purpose, such as the number of accidents on a
busy road of a city in a day, the number of people who
died of a chronic disease during a month in a state, and
so on.
In its singular form, it means the statistical theories and
methods of collecting, presenting, analyzing and
interpreting numerical figures.
Functions of Statistics
The functions of statistics can be elegantly expressed as
the 7 C’s.
Scope and Applications

There are two major divisions of statistical
methods, called Descriptive Statistics and
Inferential Statistics.
Descriptive statistics are used to consolidate a large
amount of information.
E.g. Averages, variance, etc. are descriptive
statistics.
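As a minimal illustration of how descriptive statistics consolidate data, the Python sketch below reduces a small sample to an average and a variance. The blood pressure readings are hypothetical values invented for this example; they are not data from the module.

```python
import statistics

# Hypothetical systolic blood pressure readings (mmHg) for 8 patients.
# These values are invented purely for illustration.
readings = [118, 122, 130, 125, 140, 135, 128, 126]

mean_bp = statistics.mean(readings)          # arithmetic mean (an average)
median_bp = statistics.median(readings)      # middle value of the ordered data
variance_bp = statistics.variance(readings)  # sample variance (divisor n - 1)

print(f"mean = {mean_bp}, median = {median_bp}, variance = {variance_bp:.2f}")
```

Eight readings are condensed into three numbers, which is the sense in which descriptive statistics "consolidate" information.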

Inferential statistics, on the other hand, are used when
we want to draw meaningful conclusions based on
sample data drawn from a large population.
E.g. One might want to test whether a recently
developed drug is more effective than the
conventional drug. Here, we test the effectiveness of
the drug only through a sample.
Statistics and Medicine

In the medical field, statistical methods are
extensively used. A look at the medical
journals shows the extent to which
statistical techniques play a key role.
Medical Statistics deals with the applications
of statistical methods, such as Tests of
Significance and Confidence Intervals, to
medicine and health science, including
Epidemiology and Public Health. Modern
statistical methods help medical
practitioners to understand how long a
patient affected by a dreaded disease will
survive and what are the factors that
Variables and Types of Data

Data: Information, especially facts or numbers,
collected for decision making is called data. Data
may be numerical or categorical. Data may also be
generated through a variable.
Variable: A variable is a characteristic that varies from
place to place, person to person, trial to trial
and so on.
E.g. Height, Domicile etc.
Quantitative Data: Quantitative data (variables)
are measurements that are collected or recorded
as numbers. E.g. Number of births and deaths,
density, BMI etc.
Qualitative Data: Qualitative data are
observations that cannot be measured on a
natural numerical scale.
E.g. Blood type, eye colour etc.
Nominal Scale: Distinct & unordered
categories. Any statistical analysis that relies on
ordering the categories or on arithmetic operations
is meaningless.
E.g. Gender, Profession etc.
Ordinal Scale: Distinct & ordered categories.
E.g. A doctor can describe the condition of a patient
in the hospital as good, fair, serious or critical and
assign the numbers 1 for good, 2 for fair, 3 for serious
and 4 for critical. The value here indicates only the
level of seriousness of the patient's condition.
Interval Scale: On an interval scale one can also
take numerical differences, but multiplication and
division are not meaningful. The numerical distance
between any two values on an interval scale is meaningful.
E.g. Temperature scales.
Ratio Scale: Continuous & measurable values.
E.g. Height, Weight etc.
Most statistical analyses are performed on
ratio-scale data.
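The difference between the interval and ratio scales can be sketched in a few lines of Python. The temperatures and weights below are hypothetical values chosen for illustration. The point is only that a ratio computed on an interval scale changes when the unit changes, while a ratio on a ratio scale does not.

```python
import math

def celsius_to_fahrenheit(c):
    """Convert a Celsius temperature to Fahrenheit."""
    return c * 9 / 5 + 32

# Interval scale: two hypothetical temperatures in Celsius.
t1_c, t2_c = 10.0, 20.0
t1_f, t2_f = celsius_to_fahrenheit(t1_c), celsius_to_fahrenheit(t2_c)

ratio_c = t2_c / t1_c  # ratio in Celsius
ratio_f = t2_f / t1_f  # ratio of the same two temperatures in Fahrenheit

# Ratio scale: two hypothetical weights in kilograms and in grams.
w1_kg, w2_kg = 30.0, 60.0
ratio_kg = w2_kg / w1_kg
ratio_g = (w2_kg * 1000) / (w1_kg * 1000)

print("temperature ratios agree:", math.isclose(ratio_c, ratio_f))
print("weight ratios agree:", math.isclose(ratio_kg, ratio_g))
```

Because the zero of a temperature scale such as Celsius is arbitrary, "twice as hot" has no fixed meaning, whereas "twice as heavy" holds in any unit of weight.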
Collection of Data

‘Statistics is the science of learning from experience’
- Bradley Efron
Categories and Source of Data
There are two categories of data, namely primary
data and secondary data.
Primary data are data collected for the first time,
from a survey, an observational study or an
experiment.
Secondary data are data extracted from existing
primary data.
Methods of collecting Primary data
(i) Direct Method
(ii) Indirect Method
(iii) Questionnaire Method
(iv) Local Correspondents Method
(v) Enumeration Method
Direct Methods:
There are four direct methods:
(a) Personal Contact Method
(b) Telephone Interviewing
(c) Computer Assisted Telephone Interviewing (CATI)
(d) Computer Administered Telephone Survey

Indirect Methods:
The indirect method is used in cases where it is delicate
or difficult to get the information from the respondents
because of unwillingness or indifference. The information
about the respondent is collected by interviewing a
third party who knows the respondent well.
E.g. Information on addiction, marriage proposals,
economic status etc.

Questionnaire Methods:
A questionnaire contains a sequence of questions
relevant to the study arranged in a logical order.
General guidelines for a good questionnaire:
(1) The wording must be clear and relevant to the study.
(2) The ability of the respondents to answer the
questions must be considered.
(3) Ask only the necessary questions so that the
questionnaire is not lengthy.
(4) Arrange the questions in a logical order.
(5) Questions which hurt the feelings of the
respondents should be avoided.
(6) Questions that require calculations are to be avoided.
(7) It must be accompanied by a covering letter
stating the purpose of the survey and guaranteeing the
confidentiality of the information provided.
Local Correspondents Method:
In this method, the investigator appoints local agents or
correspondents in different places. They collect the
information on behalf of the investigator in their locality
and transmit the data to the investigator. This method
is adopted by newspapers and government agencies.
am Manohar Lohia Institute of Medical Sciences,
Enumeration method:
In this method, trained enumerators or
interviewers take the schedules themselves, contact
the informants, get replies and fill them in in their own
handwriting. The preparation of voters' lists and the
collection of ration card information for public
distribution in India follow this method of data collection.

Secondary Data:
Secondary data are collected and processed by some other
agency, but the investigator uses them for their own study.
They can be obtained from published sources such as
government reports, documents, newspapers, books
written by economists or from any other source. Before
using secondary data, scrutiny must be done to
assess the suitability, reliability, adequacy and accuracy
of the data.
Sources of Secondary Data:
The secondary data come from two main sources.
The published sources include:
(a) Government Publications
(b) International Publications
(c) Publications of Research Institutes
(d) Journals, Magazines or Newspapers
Classification of Data
‘Knowledge is power, and data is just data. No matter
how much data you have on hand, if you don’t have a
way to make sense of it, you really have nothing at all’
- Unknown
Classification of Data:
The data that are unorganized or have not been
arranged in any way are called raw data.
Raw data must be presented in a condensed form and
must be classified according to homogeneity for the
purpose of analysis and interpretation.
An arrangement of raw data in an order of
magnitude or in a sequence is called an array.
Specifically, an arrangement of observations in an
ascending or a descending order of magnitude is said to
be an ordered array.
Thus, Classification is the process of arranging the
Types of Classification:
The raw data can be classified in various ways
depending on the nature of data.
The general types of classification are:
(i) Classification by Time or Chronological
Classification
(ii) Classification by Space or Spatial Classification
(iii) Classification by Attribute or Qualitative
Classification and
(iv) Classification by Size or Quantitative
Classification.
Rules for Classification:
There are certain rules to be followed for classifying the
data:
(i) The classes must be exhaustive, i.e., it should be
possible to include each of the data points in one or the
other group or class.
(ii) The classes must be mutually exclusive, i.e.,
there should not be any overlapping.
(iii) It must be ensured that the number of classes
is neither too large nor too small. (no. of
Tabulation of Data
A logical step after classifying the statistical data is to
present them in the form of tables. A table is a
systematic organization of statistical data in rows and
columns.
Advantages of Tabulation:
(i) It is a logical step in presenting statistical data after
classification.
(ii) It enables the reader to understand the required
information with ease, as the information is contained in
rows and columns with figures.
(iii) It enables the investigator to present the data
in a brief, condensed and compact form.
(iv) It makes comparison simple by displaying the data to
be compared in a single table.
(v) It makes the data points easy to remember if they are
properly placed in the form of a table.
(vi) It facilitates easy computation and helps easy
detection of errors and omissions.
Types of Table
Statistical tables can be classified under two general
categories, namely, general tables & summary tables.
General tables contain a collection of detailed
information including all that is relevant to the
subject or theme.
Summary tables are designed to serve some specific
purposes. They are smaller in size than general tables,
emphasize some aspect of the data and are generally
incorporated within the text. Summary tables are
also called derivative tables or interpretative tables
because they are derived from the general tables and aim
at analysis and inference.
The statistical tables may further be classified into
two broad classes namely simple tables & complex
tables.
A simple table summarizes information on a single
characteristic and is also called a univariate table.
E.g. The marks secured by a batch of students in a
class test are displayed as

A complex table summarizes complicated
information and presents it in two or more
interrelated categories.
Components of a Table:
Generally, a table should comprise the following
components:
(i) Table number and title
(ii) Stub (the headings of rows)
(iii) Caption (the headings of columns)
(iv) Body of the table
(v) Foot notes
(vi) Sources of data
General Precautions for Tabulation
The following points may be considered while
constructing statistical tables:
(i) A table must be as precise as possible and
easy to understand.
(ii) It must be free from ambiguity so that the main
characteristics of the data can be easily brought out.
(iii) Presenting a mass of data in a single table should
be avoided. Displaying all the data in a single table would
increase the chance of mistakes and
would make the table unwieldy. Such data may be
presented in more than one table such that each table
is complete and serves its purpose.
(iv) Figures presented in columns for comparison must
be placed as near to each other as possible.
Percentages, totals and averages must be kept close to
each other. Totals to be compared may be given in bold
type wherever necessary.
(v) Each table should have an appropriate short
and self-explanatory title indicating what exactly
(viii) The explanatory notes should always be given
as footnotes and must be complete so that they can be
understood at a later stage.
(ix) The column or row heads should indicate the
units of measurement, such as monetary units like
Rupees and other units such as meters, wherever
necessary.
(x) Column headings may be numbered for comparison
purposes. Items may be arranged in the order of
their magnitude, in alphabetical, geographical or
chronological order, or in any other arrangement suitable
for meaningful presentation.
(xi) Figures entered in a table should be as accurate as
possible. If the figures are approximate, this must
be properly indicated.
Frequency Distribution

A tabular arrangement of raw data by a certain number
of classes and the number of items (called frequency)
belonging to each class is termed a frequency
distribution.
The frequency distributions are of two types, namely,
(i) Discrete frequency distribution
(ii) Continuous frequency distribution.

Discrete Frequency Distribution:


Raw data may sometimes contain a limited number of
values, each of which appears many times.
Such data may be organized in a tabular form
termed a simple frequency distribution. Thus the
tabular arrangement of the data values along with their
frequencies is a simple frequency distribution.
A simple frequency distribution is formed using a tool
called a ‘tally chart’.
A tally chart is constructed using the following method:
(i) Examine each data value.
(ii) Record each occurrence of the value with the slash
symbol (/), called a tally bar or tally mark.
(iii) When a value occurs for the fifth time, put a
crossbar on the four existing tally bars to make a block
of 5 tally bars.
(iv) Find the frequency of the data value as the
total number of tally bars, i.e., tally marks,
corresponding to that value.
E.g. The marks obtained by 25 students in a test are
given as follows: 10, 20, 20, 30, 40, 25, 25, 30, 40, 20,
25, 25, 50, 15, 25,
30, 40, 50, 40, 50, 30, 25, 25, 15 and 40.
The following discrete frequency distribution represents
the given data:
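The tally procedure can be sketched in Python for the 25 marks listed above. This is a minimal illustration, not the module's own table. The helper name `tally` and the plain-text layout are assumptions made for this example. Plain text cannot draw a crossbar, so a completed block of five is printed as `/////`, with the fifth slash standing in for the crossbar.

```python
from collections import Counter

# Marks obtained by the 25 students, as listed above.
marks = [10, 20, 20, 30, 40, 25, 25, 30, 40, 20,
         25, 25, 50, 15, 25,
         30, 40, 50, 40, 50, 30, 25, 25, 15, 40]

def tally(count):
    """Return tally marks for a count, grouped in blocks of five."""
    blocks, rest = divmod(count, 5)
    groups = ["/////"] * blocks   # fifth slash stands in for the crossbar
    if rest:
        groups.append("/" * rest)
    return " ".join(groups)

# Counter records how many times each distinct mark occurs.
frequency = Counter(marks)

print(f"{'Marks':>5}  {'Tally':<12} {'Frequency':>9}")
for value in sorted(frequency):
    print(f"{value:>5}  {tally(frequency[value]):<12} {frequency[value]:>9}")
print(f"{'Total':>5}  {'':<12} {sum(frequency.values()):>9}")
```

The total of the frequency column should equal the number of observations, which is a quick check that no value was missed or counted twice.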



Continuous Frequency Distribution:
A large mass of data that is summarized in such a way
that the data values are distributed into groups, or
classes, or categories along with the frequencies is
known as a Continuous or Grouped frequency distribution.
Some terms related to a frequency distribution
are:
Class: If the observations of a data set are divided into
groups and the groups are bounded by limits, then each
group is called a class.
Class limits: The end values of a
class are called class limits. The smaller value of the
class limits is called the lower limit (L) and the larger
value is called the upper limit (U).
Class interval: The difference between the upper limit
and the lower limit is called the class interval (I). That is,
I = U - L.
Class boundaries: Class boundaries are the midpoints
between the upper limit of a class and the lower limit of
its succeeding class in the sequence. Therefore, each
class has an upper and a lower boundary.
Width: The width of a particular class is the difference
between the upper class boundary and the lower class
boundary.
Mid- point: Half of the difference between the
In the example, the interval 0 - 4 is a class with 0 as
the lower limit and 4 as the upper limit. The upper
boundary of this class is obtained as the midpoint of the
upper limit of this class and the lower limit of its
succeeding class, 5 - 9. Thus the upper boundary of the
class 0 - 4 is 4.5.
The lower boundary of this class is 0 - 0.5 = -0.5.
The lower boundary of the class 5 - 9 is clearly 4.5.
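The terms above can be checked with a short Python sketch. The classes 0 - 4 and 5 - 9 come from the text; the third class, 10 - 14, is a hypothetical addition for illustration. The midpoint is computed here as the average of the two class limits, which is the usual definition and an assumption of this sketch.

```python
# Inclusive class limits (L, U). The class 10-14 is hypothetical.
classes = [(0, 4), (5, 9), (10, 14)]

# Gap between the upper limit of one class and the lower limit of the next.
gap = classes[1][0] - classes[0][1]

summary = []
for lower, upper in classes:
    lower_boundary = lower - gap / 2   # halfway to the preceding class
    upper_boundary = upper + gap / 2   # halfway to the succeeding class
    summary.append({
        "limits": (lower, upper),
        "class_interval": upper - lower,              # I = U - L
        "boundaries": (lower_boundary, upper_boundary),
        "width": upper_boundary - lower_boundary,
        "midpoint": (lower + upper) / 2,              # average of the limits
    })

for row in summary:
    print(row)
```

For the class 0 - 4 this gives boundaries of -0.5 and 4.5, in agreement with the values worked out above.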
Inclusive and Exclusive Methods of Forming
Frequency Distribution:
Formation of frequency distribution is usually done by
two different methods, namely
(i) Inclusive method &
(ii) Exclusive method.
Inclusive method: In this method, both the lower and
upper class limits are included in the classes. The inclusive
type of classification may be used for a grouped
frequency distribution of a discrete variable, such as the
number of members in a family or the number of workers.
It cannot be used in the case of a continuous variable
like height or weight, where integral as well as
fractional values are permissible. Since both the upper
limit and the lower limit of a class are included in the
frequency calculation, this method is called the inclusive
method.
Exclusive method: In this method, the values which are
equal to the upper limit of a class are not included in that
class; instead they are included in the next
class. In other words, the upper limit is always excluded
from the class. Hence this method is called the exclusive
method.
E.g. The marks scored by 50 students in an examination
are given as follows:
23, 25, 36, 39, 37, 41, 42, 22, 26, 35, 34, 30, 29, 27,
47, 40, 31, 32, 43, 45, 34, 46, 23, 24, 27, 36, 41, 43,
39, 38, 28, 32, 42, 33, 46, 23, 34, 41, 40, 30, 45, 42,
39, 37, 38, 42, 44, 46, 29, 37.
It can be observed from this data set that the marks of
the 50 students vary from 22 to 47. If it is decided to divide
this group into 6 smaller groups, we can fix the
dividing lines at 25, 30, 35, 40, 45 and 50 marks.
Then, we form the six groups with the class limits 21 - 25,
26 - 30, 31 - 35, 36 - 40, 41 - 45 and 46 - 50.
The continuous frequency distribution formed by
inclusive and exclusive methods are displayed in
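A minimal Python sketch of both methods, applied to the 50 marks above, is given here. The inclusive classes are the six groups named in the text. The exclusive classes (20 - 25, 25 - 30, ..., 45 - 50) are an assumption made for this illustration, since the source does not list them. The function name `grouped_frequency` is likewise hypothetical.

```python
# Marks of the 50 students listed above.
marks = [23, 25, 36, 39, 37, 41, 42, 22, 26, 35, 34, 30, 29, 27,
         47, 40, 31, 32, 43, 45, 34, 46, 23, 24, 27, 36, 41, 43,
         39, 38, 28, 32, 42, 33, 46, 23, 34, 41, 40, 30, 45, 42,
         39, 37, 38, 42, 44, 46, 29, 37]

def grouped_frequency(data, classes, include_upper):
    """Count the values falling in each (lower, upper) class.

    include_upper=True  -> inclusive method: lower <= x <= upper
    include_upper=False -> exclusive method: lower <= x <  upper
    """
    table = {}
    for lower, upper in classes:
        if include_upper:
            table[(lower, upper)] = sum(lower <= x <= upper for x in data)
        else:
            table[(lower, upper)] = sum(lower <= x < upper for x in data)
    return table

# Classes named in the text (inclusive method).
inclusive_classes = [(21, 25), (26, 30), (31, 35), (36, 40), (41, 45), (46, 50)]
# Assumed classes for the exclusive method; not given in the source.
exclusive_classes = [(20, 25), (25, 30), (30, 35), (35, 40), (40, 45), (45, 50)]

inclusive = grouped_frequency(marks, inclusive_classes, include_upper=True)
exclusive = grouped_frequency(marks, exclusive_classes, include_upper=False)

for label, table in (("Inclusive", inclusive), ("Exclusive", exclusive)):
    print(label)
    for (lower, upper), count in table.items():
        print(f"  {lower:>2} - {upper:<2}  {count:>2}")
    print(f"  Total    {sum(table.values()):>2}")
```

Under the exclusive method a mark of 25 is counted in the class 25 - 30 rather than 20 - 25, which is why the two tables differ even though both totals are 50.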
True Class Intervals:
In the case of continuous variables, we take the classes
in such a way that there is no gap between successive
classes. The classes are defined in such a way that the
upper limit of each class is equal to lower limit of the
succeeding class. Such classes are known as true
classes.
Classes formed by the inclusive method are known as
not-true classes. We can convert not-true
classes into true classes by subtracting 0.5 from the
lower limit and adding 0.5 to the upper limit of each
class, giving
20.5 - 25.5, 25.5 - 30.5, 30.5 - 35.5, 35.5 - 40.5, 40.5 -
45.5, 45.5 - 50.5.
Open End Classes: When a class limit is missing either
at the lower end of the first class interval or at the
upper end of the last class, or when the limits are not
specified at both the ends, the frequency distribution is
Guidelines on Compilation of Continuous
Frequency Distribution:
(i) Each value in the data set must be
contained within one (and only one) class, and
overlapping classes must not occur.
(ii) The classes must be arranged in the order of their
magnitude.
(iii) Normally a frequency distribution may have 8 to
10 classes. It is not desirable to have fewer than 5 or
more than 15 classes.
(iv) Frequency distributions having equal class widths
throughout are preferable. When this is not possible,
classes with smaller or larger widths can be used. Open
ended classes are acceptable, but only as the first and
the last classes of the distribution.
(v) It should be noted that in a frequency
distribution, the first class should contain the lowest
value and the last class should contain the highest
value.
(vi) The number of classes may be determined
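Guidelines (i) and (iii) lend themselves to a mechanical check. The Python sketch below is one hedged way to do it for inclusive classes. The function name `check_classes`, the sample values and both sets of classes are hypothetical and chosen only to show a passing and a failing case.

```python
def check_classes(data, classes):
    """Return a list of problems found for inclusive (lower, upper) classes."""
    problems = []
    for x in data:
        # Classes that contain x when both limits are included.
        hits = [c for c in classes if c[0] <= x <= c[1]]
        if not hits:
            problems.append(f"{x} falls in no class")
        elif len(hits) > 1:
            problems.append(f"{x} falls in {len(hits)} classes (overlap)")
    # Guideline (iii): fewer than 5 or more than 15 classes is not desirable.
    if not 5 <= len(classes) <= 15:
        problems.append(f"{len(classes)} classes; 5 to 15 is recommended")
    return problems

# Hypothetical sample values and classes.
sample = [22, 25, 30, 47]
good = [(21, 25), (26, 30), (31, 35), (36, 40), (41, 45), (46, 50)]
bad = [(20, 25), (25, 30), (30, 35), (35, 40), (40, 45)]

print(check_classes(sample, good))
print(check_classes(sample, bad))
```

With the second set of classes, the values 25 and 30 each fall in two classes and 47 falls in none, so the check reports three problems.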
Bivariate Frequency Distribution:
The frequency distribution of a single variable is called
univariate distribution. When a data set consists of a
large mass of observations, they may be summarized by
using a two-way table. A two-way table is associated with
two variables, say X and Y. For each variable, a number of
classes can be defined keeping in view the same
considerations as in the univariate case. When there are
m classes for X and n classes for Y, there will be m × n
cells in the two-way table. The classes of one variable
may be arranged horizontally, and the classes of another
variable may be arranged vertically in the two way table.
By going through the pairs of values of X and Y, we can
find the frequency for each cell. The whole set of cell
frequencies will then define a bivariate frequency
distribution.
In other words, a bivariate frequency distribution is the
frequency distribution of two variables.
E.g. The table shows the frequency distribution of two
variables, namely, age and marks obtained by 50
students in an intelligence test. Classes defined for
marks are arranged horizontally (rows) and the classes
defined for age are arranged vertically (columns). Each
cell shows the frequency of the corresponding row and
column values.
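The construction of a two-way table can be sketched in Python as follows. The (age, marks) pairs and the class limits are hypothetical values invented for illustration; they are not the 50 observations of the example. The helper name `find_class` is also an assumption.

```python
from collections import Counter

# Hypothetical (age, marks) pairs for 10 students.
pairs = [(16, 12), (17, 25), (18, 33), (16, 27), (19, 38),
         (17, 14), (18, 22), (19, 31), (16, 18), (18, 36)]

mark_classes = [(10, 19), (20, 29), (30, 39)]  # m = 3 classes (rows)
age_classes = [(16, 17), (18, 19)]             # n = 2 classes (columns)

def find_class(value, classes):
    """Return the inclusive (lower, upper) class that contains value."""
    for lower, upper in classes:
        if lower <= value <= upper:
            return (lower, upper)
    raise ValueError(f"{value} falls in no class")

# Each pair contributes 1 to the cell given by its marks class and age class.
cells = Counter(
    (find_class(m, mark_classes), find_class(a, age_classes))
    for a, m in pairs
)

# Print the m x n table: marks classes as rows, age classes as columns.
header = "Marks \\ Age " + "".join(f"{lo}-{hi}".rjust(8) for lo, hi in age_classes)
print(header)
for mc in mark_classes:
    row = "".join(str(cells[(mc, ac)]).rjust(8) for ac in age_classes)
    print(f"{mc[0]}-{mc[1]}".ljust(12) + row)
```

A `Counter` returns 0 for a cell that never occurs, so empty cells of the table are printed as 0 without special handling.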
Diagrammatic and Graphical Representation
of Data
Diagrams: A diagram is a visual form of presenting
statistical data that highlights the basic facts and
relationships inherent in the data. It attracts
attention and is a quicker way of grasping the results,
which saves time. It is particularly useful in
presenting qualitative data.

Graphs: Quantitative data are represented by graphs.


Significance of Diagrams and Graphs:
Diagrams and graphs are extremely useful due to the
following reasons:
(i) They are attractive and impressive.
(ii) They make data simpler and more intelligible.
(iii) They are amenable to comparison.
(iv) They save time and labour.
(v) They make the data easier to remember.
Types of Diagrams:
A variety of diagrams are used to present
data. Some are:
Simple Bar Diagram:
A simple bar diagram can be drawn on either a horizontal
or a vertical base, but bars on a vertical base are more
common. Bars are erected along the axis with uniform
width, and the space between the bars must be equal. While
constructing a simple bar diagram, the scale is
determined in proportion to the highest value of the
variable. The bars can be coloured to make the diagram
attractive. This diagram is mostly drawn for a categorical
variable.
Multiple Bar Diagram: Multiple bar diagram is used for
comparing two or more sets of statistical data. Bars with
equal width are placed adjacently for each cluster of
values of the variable. There should be equal space
between clusters. In order to distinguish bars in each
cluster, they may be either differently coloured or
shaded. Legends should be provided.
Component Bar Diagram (Sub-divided Bar Diagram):
A component bar diagram is used for comparing two or
more sets of statistical data. But, unlike in a multiple bar
diagram, the bars are stacked in a component bar
diagram. In the construction of a
sub-divided bar diagram, bars are drawn with equal
width such that the heights of the bars are proportional
to the magnitude of the total frequency. The bars are
positioned with equal space between them.
Each bar is sub-divided into various parts in proportion
to the values of the components. The subdivisions are
distinguished by different colours or shades.
Pie Diagram: The pie diagram is a circular diagram. As
the diagram looks like a pie, it is given this name. A
circle, which has 360°, is divided into different sectors.
The angles of the sectors, subtended at the center, are
proportional to the magnitudes of the frequencies of the
components.
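Since each sector's angle is proportional to its frequency, the angle for a component is its frequency divided by the total frequency, multiplied by 360°. A minimal Python sketch of this calculation follows. The blood group counts are hypothetical values invented for illustration.

```python
# Hypothetical blood group counts among 60 donors.
counts = {"A": 18, "B": 15, "AB": 6, "O": 21}

total = sum(counts.values())

# Sector angle = (component frequency / total frequency) x 360 degrees.
angles = {group: count * 360 / total for group, count in counts.items()}

for group, angle in angles.items():
    print(f"{group:>2}: {counts[group]:>2} donors -> {angle:.1f} degrees")
print(f"Sum of angles: {sum(angles.values()):.1f} degrees")
```

The angles always add up to 360°, which is a convenient check before the sectors are drawn.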
Types of Graphs: The most commonly used graphs are:
(i) Histogram
(ii) Frequency Polygon
(iii) Frequency Curve
(iv) Cumulative Frequency Curves (Ogives)
Histogram
A histogram is a chart of adjacent (attached) bars
displaying a frequency distribution in visual form.
Take the classes along the X-axis and the frequencies along
the Y-axis.
Corresponding to each class interval, a vertical bar is
drawn whose height is proportional to the class
frequency.
Frequency Polygon: A frequency polygon is drawn after
drawing the histogram for a given frequency distribution.
The area covered under the polygon is equal to the
area of the histogram. The vertices of the polygon represent
the class frequencies. A frequency polygon helps to
identify the classes with higher frequencies. It
displays the tendency of the data.
Frequency Curve: A frequency curve is a smooth, free-hand
curve drawn to represent a frequency distribution.
It is drawn by smoothing the vertices of
the frequency polygon.
A frequency curve provides a better understanding of
the properties of the data than a frequency polygon
or a histogram.
Thanks
Everyone !
