6.descriptve PPHD

The document discusses descriptive statistics and related terminology. It covers topics like central tendency measures including mean, median and mode. It also discusses measures of variability and dispersion such as range, interquartile range, standard deviation and more. Various types of data and variables are also explained in the document.

Uploaded by

Sharad Khatake
Copyright
© © All Rights Reserved
Descriptive Statistics

- Dr Swati Ghonge, MBBS, MD, Community Medicine
Professor, Dr D Y Patil Medical College, Pimpri
Descriptive Statistics
At the end of the session, participants should be able to understand:
1. Some basic terminology
2. Types of data, with examples
3. Descriptive statistics vs inferential statistics
4. Measures of central tendency: mean, median, mode
5. Measures of variability: range, interquartile range, average deviation, standard deviation, variance, coefficient of variation, percentile, skewness
Statistics
▪ Definition of statistics:
Statistics is the science that deals with the collection, analysis, presentation and logical interpretation of numerical facts (data).

▪ Data means observations made at the beginning of an experiment, during it, and/or at the end of it.

▪ An observation may be a number or any other outcome.

▪ Main sources of data :


1. Experiments
2. Surveys
3. Records

▪ Types of data :
1. Qualitative data
2. Quantitative data
Some Basic Terminology
1. Characteristics: qualities and measurements
(height, weight, income, blood pressure, etc.)

a. Attribute = recorded in the data as a label or text (COUNTING).
Individuals are divided into groups or categories based on the attribute: two groups (dichotomous, e.g. Cured or Not Cured) or multiple categories (polychotomous, e.g. blood groups).

b. Variate = recorded as numerals (MEASUREMENTS).
Continuous variate: can take fractional values.
E.g. height, weight, Hb.
Discrete variate: cannot take fractional values; whole numbers only.
E.g. family size, number of hospital admissions.
Some Basic Terminology

POPULATION
- Sum total of all persons, objects or events about which we want to obtain information.
- PARAMETER (PP): only one parameter for a population for a given characteristic.
- Notation: X, P
- Dimensions of a population: size, geographic area, time frame, nature, structure (homogeneous: very little variation; heterogeneous: wide variation in the population).

SAMPLE
- Portion of the population selected by pre-decided and scientific methods in such a way that it REPRESENTS all units.
- STATISTIC (SS): as many statistics as there are samples drawn from a population.
- Notation: x, p
- Samples give predictions and ESTIMATES with reasonable accuracy about the POPULATION to be studied, when the sample is drawn from the population (P) with adequate SAMPLE SIZE and using correct SAMPLING METHODS.
Variables = the language of statistics, which is key to understanding.
- Every measurement or feature is a variable. (We are surrounded by variables.)
- Statistical work involves assessing the characteristics of a variable, comparing and contrasting two variables, and seeking associations between multiple variables.
Attributes of a variable are its type:
=> Continuous, discrete, nominal, ordinal, etc.
=> Dependent variable, or response variable.
=> Independent variable, or explanatory variable.
TYPES OF DATA
1. Qualitative data: enumeration (countable) data. It takes only discrete figures.
Examples: number of patients cured, ABO blood group system, religion, gender, diabetic vs non-diabetic, Urban-Poor.

2. Quantitative data: measurement data. It can be continuous (in fractions or decimals) or discrete.
Examples: pulse rate, BP, blood sugar level, GFR, Hb, height, weight, temperature.

▪ Methods of presentation :
1. Tabulation method
2. Graphical method
Data
- Why is it essential to know the type of data (type of variable and scale of measurement)?

= Because different types require separate statistical treatment (inferential statistics).

= It helps in selecting the test of significance.

= It helps in selecting the data presentation method.


Data

Quantitative (measurements in units) variables: interval and ratio scales.

Qualitative (categorical, countable) variables: nominal and ordinal scales.
Data
Nominal: a nominal variable consists of named categories, with no implied order among the categories. (Existential variables: the property either exists or doesn't exist.)

Ex: male/female, cancer present/absent, heart disease present/absent, dead/alive, hair colour, religion. (Coding can be 1, 2, 3, etc.)
Data
Ordinal: ranked/ordered categories, where the differences between categories cannot be considered equal.

Ex:
Grades of students: Excellent, Satisfactory, Unsatisfactory.
Grades of cancer: I, II, III, IV (Roman numerals).
Data
Interval: an interval variable, in addition to the ordinal level of measurement, has equal and fixed distances (intervals) between values. The origin is arbitrary.

Ex: temperature, IQ. (The ratio of IQ 130 to IQ 100 is not meaningful, i.e. zero is not meaningful here.)
There is no implication of ratio, in the sense that 30 degrees F is not twice as hot as 15 degrees F.
Data
Ratio: in addition to the interval level of measurement, it has a true zero point as its origin, therefore ratios are meaningful.

Ex: weight in kg, most laboratory values.


Data
VARIABLE TYPE    ASSUMPTION
NOMINAL          Named categories
ORDINAL          Same as nominal, plus ORDERED CATEGORIES
INTERVAL         Same as ordinal, plus EQUAL INTERVALS
RATIO            Same as interval, plus MEANINGFUL ZERO
Descriptive Statistics
Inferential Statistics
• Measures of central tendency are statistical measures which describe the position of a distribution.

• They are also called statistics of location, and are the complement of statistics of dispersion, which provide information concerning the variance or distribution of observations.

• In the univariate context, the mean, median and mode are the most commonly used measures of central tendency.
CENTRAL TENDENCY: the figure which represents the whole series is neither the lowest value in the series nor the highest; it lies somewhere between these two extremes.
1. The average represents all the measurements made on a group, and gives a concise description of the group as a whole.

2. When two or more groups are measured, the central tendency provides the basis of comparison between them.

3. A measure of central tendency is a typical value around which other figures congregate.
1. Arithmetic mean is a mathematical average and the most popular measure of central tendency. It is frequently referred to simply as the 'mean'. It is obtained by dividing the sum of the values of all observations in a series (ƩX) by the number of observations (N) constituting the series.
Thus, the mean of a set of numbers X1, X2, X3, ..., XN, denoted by X̄, is defined as

Mean = Sum of observations / Number of observations
     = ƩX / N
Q. Calculate the mean, median and mode.
The diastolic blood pressures (mm Hg) of 10 individuals were
83, 75, 81, 79, 71, 95, 75, 77, 84 and 90.

Mean = ƩX / n
= (83 + 75 + … + 90) / 10
= 810 / 10
= 81
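The question asks for all three measures, while the working above shows only the mean. As a minimal sketch, the three can be checked with Python's standard `statistics` module; the variable names are illustrative and not part of the lecture.

```python
import statistics

# Diastolic blood pressure (mm Hg) of the 10 individuals in the question
dbp = [83, 75, 81, 79, 71, 95, 75, 77, 84, 90]

mean_dbp = statistics.mean(dbp)      # sum of observations / number of observations
median_dbp = statistics.median(dbp)  # n is even, so the average of the two middle values
mode_dbp = statistics.mode(dbp)      # most frequently occurring observation

print("Mean:", mean_dbp)
print("Median:", median_dbp)
print("Mode:", mode_dbp)
```

With n = 10 the median falls between the 5th and 6th sorted observations (79 and 81), and 75 is the only value that occurs more than once.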
Advantages of a Mean
• It is easy to understand and simple to calculate.
• It is based on all the values.
• It is rigidly defined .
• It is easy to understand the arithmetic average
even if some of the details of the data are
lacking.
• It is not based on the position in the series.
Disadvantages of a Mean
• It is affected by extreme values.
• It cannot be calculated for open-ended classes.

• It cannot be located graphically .


• MEDIAN is the central value of the distribution, or the value which divides the distribution into two equal parts, each part containing an equal number of observations. Thus it is the central value of the variable when the values are arranged in order of magnitude.

• Connor has defined it as: “The median is that value of the variable which divides the group into two equal parts, one part comprising all values greater, and the other, all values less than the median.”
Calculation of Median – Discrete series:
i. Arrange the data in ascending or descending order.
ii. Calculate the cumulative frequencies.
iii. Apply the formula
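The three steps can be sketched in code for a discrete frequency series. The slide does not show the formula itself, so this sketch assumes the usual textbook rule of locating the (n + 1)/2-th item through the cumulative frequencies. The function name and the family-size data are hypothetical.

```python
def median_discrete(values, freqs):
    """Median of a discrete frequency series using cumulative frequencies."""
    pairs = sorted(zip(values, freqs))      # step i: ascending order
    n = sum(f for _, f in pairs)            # total number of observations

    # step ii: cumulative frequency for each value
    cumulative = []
    running = 0
    for value, freq in pairs:
        running += freq
        cumulative.append((value, running))

    def item(k):
        """Value of the k-th observation (1-based) in the ordered series."""
        for value, cum_freq in cumulative:
            if cum_freq >= k:
                return value

    # step iii (assumed rule): the (n + 1) / 2-th item; for even n,
    # average the two middle items
    if n % 2 == 1:
        return item((n + 1) // 2)
    return (item(n // 2) + item(n // 2 + 1)) / 2


# Hypothetical data: family size and number of households of that size
family_size = [1, 2, 3, 4, 5]
households = [3, 8, 12, 5, 2]
median_family_size = median_discrete(family_size, households)
print("Median family size:", median_family_size)
```

Here n = 30 and the cumulative frequencies are 3, 11, 23, 28, 30, so the 15th and 16th observations both fall in the group with family size 3.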


Ex 1: Find the median of 81, 75, 77, 71, 75, 95, 83, 84, 79.

Ans: Arrange the observations in increasing order of magnitude:

71, 75, 75, 77, 79, 81, 83, 84 and 95

N = 9

The middle (5th) observation is 79, which is the median.


Ex 2: The following are the pulse rates per minute of 10 healthy individuals:
82, 79, 60, 76, 63, 81, 68, 74, 60, 75.

Ans: First arrange the given data in ascending order of magnitude:
60, 60, 63, 68, 74, 75, 76, 79, 81, 82
Here n = 10, so the median is the average of the 5th and 6th observations:
Median = (74 + 75) / 2 = 74.5
Advantages of Median
Median can be calculated in all distributions.

Median can be understood even by common people.

Median can be ascertained even with the extreme observations.

It can be located graphically.

It is most useful when dealing with qualitative data.


Disadvantages of Median
It is not based on all the values.

It is not capable of further mathematical treatment.

It is affected by fluctuation of sampling.

In case of an even number of values, it may not be a value from the data.
MODE (MODEL-MOST FASHIONABLE)

Mode is the most frequently occurring observation in the given series.
Used mostly by shoe-makers and cloth makers.

Find the mode: 71, 75, 75, 77, 79, 81, 83, 84 and 95

The mode is the most frequently repeated observation in the given series of observations.

75 occurs twice and every other value occurs once, so the mode in this example is 75.
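Finding the mode amounts to counting how often each value occurs. A minimal sketch using `collections.Counter` follows; it returns a list so that a series with more than one most-frequent value is also handled, which is a design choice of this sketch and not something the slide discusses.

```python
from collections import Counter

observations = [71, 75, 75, 77, 79, 81, 83, 84, 95]

counts = Counter(observations)        # value -> number of occurrences
highest = max(counts.values())        # frequency of the most common value
modes = [value for value, count in counts.items() if count == highest]

print("Frequencies:", dict(counts))
print("Mode(s):", modes, "occurring", highest, "times")
```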


Measures of Variability / Dispersion / Scatteredness
TYPES OF VARIABILITY
1) Biological variability:
Ex: The same individuals show variation in pulse rate, B.P. and temperature.

2) Real variability:
Ex: A higher rate of coronary disease in bus drivers than in conductors, which may be due to the strain or tension involved in driving.

3) Experimental variability:
It may be due to methods or materials, e.g. a defective weighing machine.
It needs a trained interviewer or observer; an untrained one may introduce variability/error.

It is a non-sampling error.


Measures of Variability
• Range

• Interquartile range, percentile

• Mean (average) deviation

• Standard deviation (S.D.) and variance

• C.V. = (S.D. / Mean) × 100

• Shape of the normal distribution (normal curve), or skewness
Measures of Variability
• Standard error of mean.

• Standard error of difference between two means .

• Standard error of proportion.

• Standard error of difference between two proportions.

• Correlation coefficient
• RANGE
The interval from the lowest observation to the highest observation in the given series of observations. It is a very poor measure of variation in a sample.

• Ex:
1. Systolic blood pressure: 100-140 mm Hg
2. Diastolic blood pressure: 80-90 mm Hg
3. Fasting blood sugar: 80-120 mg/dL
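• INTERQUARTILE RANGE: Arrange the data in ascending or descending order and find the median, then divide the lower half and the upper half further at their own medians. These give the first quartile (Q1) and the third quartile (Q3); the interquartile range is Q3 − Q1.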
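Mean deviation: $\mathrm{MD} = \dfrac{\sum |x - \bar{x}|}{n}$

Variance: $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$

Standard deviation: $\mathrm{SD} = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$

Variance = (S.D.)²

Here $\bar{x}$ (also written m) is the mean and n is the number of observations.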
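As a minimal sketch, the mean deviation, variance and standard deviation can be computed directly from these definitions. The pulse-rate data from the median example are reused here; the function name is illustrative.

```python
import math

def dispersion(data):
    """Mean, mean deviation, sample variance and SD from their definitions."""
    n = len(data)
    mean = sum(data) / n
    mean_dev = sum(abs(x - mean) for x in data) / n           # average absolute deviation
    variance = sum((x - mean) ** 2 for x in data) / (n - 1)   # n - 1 in the denominator
    sd = math.sqrt(variance)                                  # SD is the square root of variance
    return mean, mean_dev, variance, sd


pulse = [60, 60, 63, 68, 74, 75, 76, 79, 81, 82]
mean, mean_dev, variance, sd = dispersion(pulse)

print(f"Mean = {mean:.1f}")
print(f"Mean deviation = {mean_dev:.2f}")
print(f"Variance = {variance:.2f}")
print(f"SD = {sd:.2f}")
```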
Ex:
• Heights of all students of a class
• 100 students
• Mean height = 17,452/100 = 174.52 cm
• Dispersion:
• - Range is 160-182 cm
• - Standard deviation is 11.5 cm

• We report this as 174.5 ± 11.5 cm (mean ± SD).

• SD gives the reader an idea about the variation in the sample.


3. Standard Deviation (S.D.)
If x1, x2, x3, ..., xn are n observations, then S.D. is defined as

$$\mathrm{SD} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}$$

where $\bar{x}$ is the mean of the given set of data. Equivalently,

$$\mathrm{SD} = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 / n}{n}}$$
It is the square root of the average of the squared deviations measured from the mean.
It measures the spread of the data about the mean. S.D. increases as the spread about the mean increases. An S.D. of 0 indicates that all the observations are identical.
Note: If n is less than 30, replace n in the denominator by (n − 1).
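The two forms of the S.D. formula give the same result, and the note about the denominator changes only the divisor. The sketch below checks both points on the diastolic blood pressure data used earlier; the function names and the `use_n_minus_1` flag are hypothetical and introduced only for illustration.

```python
import math

def sd_from_deviations(data, use_n_minus_1=False):
    """SD as the root of the averaged squared deviations from the mean."""
    n = len(data)
    mean = sum(data) / n
    divisor = n - 1 if use_n_minus_1 else n
    return math.sqrt(sum((x - mean) ** 2 for x in data) / divisor)

def sd_from_sums(data, use_n_minus_1=False):
    """Same SD from the sum of squares and the square of the sum."""
    n = len(data)
    sum_x = sum(data)
    sum_x2 = sum(x * x for x in data)
    divisor = n - 1 if use_n_minus_1 else n
    return math.sqrt((sum_x2 - sum_x ** 2 / n) / divisor)


dbp = [83, 75, 81, 79, 71, 95, 75, 77, 84, 90]

print(f"Divide by n:     {sd_from_deviations(dbp):.3f}  {sd_from_sums(dbp):.3f}")
print(f"Divide by n - 1: {sd_from_deviations(dbp, True):.3f}  {sd_from_sums(dbp, True):.3f}")
print("Identical observations:", sd_from_deviations([5, 5, 5]))
```

Dividing by n − 1 always gives a slightly larger value than dividing by n, and the difference shrinks as n grows.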
Merits of S.D.
• It is based on all the observations.
• It is a better measure of dispersion than Range
and M.D.(Mean Deviation)
• It is least affected by fluctuation of sampling.
• It is independent of change of origin.
• Ex. The following are the heights in cm of 5 students: 167, 170, 168, 175, 172. Find the SD.

Sr No.   x     x-m    (x-m)²
1.       167   -3.4   11.56
2.       170   -0.4   0.16
3.       168   -2.4   5.76
4.       175   4.6    21.16
5.       172   1.6    2.56
TOTAL    852          41.2

m = 852/5 = 170.4
SD = square root of (41.2/4) ≈ 3.21
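The worked table can be reproduced step by step in code. This is a minimal sketch that follows the same columns (x, x − m, (x − m)²) and divides by n − 1 because n is below 30.

```python
import math

heights = [167, 170, 168, 175, 172]   # heights in cm of the 5 students
n = len(heights)
m = sum(heights) / n                  # mean

# One row per student: x, deviation from the mean, squared deviation
rows = [(x, round(x - m, 1), round((x - m) ** 2, 2)) for x in heights]
sum_sq = sum((x - m) ** 2 for x in heights)
sd = math.sqrt(sum_sq / (n - 1))      # n < 30, so divide by n - 1

for x, dev, sq in rows:
    print(f"{x:5d} {dev:6.1f} {sq:7.2f}")
print(f"m = {m:.1f}, sum of squares = {sum_sq:.1f}, SD = {sd:.2f}")
```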
Applications of S.D.
= Most commonly used measure of variation.
= Also used to calculate variance and CV.
= The higher the SD, the higher the variation, provided the units of measurement are the same.
= Presuming that the data show a normal (Gaussian) distribution, we can make certain predictions about the distribution of values within the sample. The approximate predictions are:
About 68% of values will be within MEAN ± 1 SD
About 95% of values will be within MEAN ± 2 SD
About 99.7% of values will be within MEAN ± 3 SD.
This range is called the CONFIDENCE INTERVAL for the sample.
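For an exactly normal distribution, the proportion of values lying within k standard deviations of the mean can be computed from the error function as erf(k/√2). The sketch below uses `math.erf` from the standard library; the function name is illustrative.

```python
import math

def coverage_within(k):
    """Proportion of a normal distribution lying within mean ± k SD."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"Mean ± {k} SD covers {coverage_within(k) * 100:.2f}% of values")
```

These figures hold only when the data really follow a normal distribution; for skewed data the proportions can differ considerably.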
Applications of S.D.
= To determine the precision/consistency/reliability of an instrument. The reliability of the instrument can be determined by calculating the S.D. of repeated measurements on the same subject by the same instrument.

= S.D. provides the basis for most statistical inference procedures.
• 4. Coefficient of Variation:
Whenever we want to compare the variability of two or more series of data which differ in their averages or are measured in different units, the coefficient of variation is used.

It measures the variation/spread in the data relative to the size of the mean. It is independent of the unit of measurement.
It can also be used to compare the variability of two or more characteristics/attributes within the same series of data.

• C.V. = (S.D. / Mean) × 100
The following table shows the mean and SD for 10 subjects for the variables height and weight.

Variable       Mean    SD
Height (cm)    175.5   7.29
Weight (kg)    72.4    12.27

Which variable is more variable, weight or height?
C.V. (height) = 4.15%, C.V. (weight) = 16.95%. Here weight is more variable than height.
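The comparison can be checked with a one-line function. This is a minimal sketch using the means and SDs from the table; the function name is illustrative.

```python
def coefficient_of_variation(sd, mean):
    """C.V. = (S.D. / Mean) x 100, a unit-free percentage."""
    return sd / mean * 100


cv_height = coefficient_of_variation(7.29, 175.5)   # height in cm
cv_weight = coefficient_of_variation(12.27, 72.4)   # weight in kg

print(f"C.V. (height) = {cv_height:.2f}%")
print(f"C.V. (weight) = {cv_weight:.2f}%")
print("More variable:", "weight" if cv_weight > cv_height else "height")
```

Because the units cancel in the ratio, height in cm and weight in kg can be compared directly, which the two SDs on their own would not allow.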
Thanks
Summarizing Data by Graphical
method & Interpretation by Data
Presentation
Frequency distribution table
Table No.s 3
Simple bar diagram
Multiple bar diagram
Histogram
• Interval estimation can be contrasted with point
estimation.
• A point estimate is a single value given as the estimate of
a population parameter that is of interest, for example, the
mean of some quantity.
• An interval estimate specifies instead a range within
which the parameter is estimated to lie.
• Confidence intervals are commonly reported in tables or
graphs along with point estimates of the same
parameters, to show the reliability of the estimates
• The range within which the expected/ predicted value falls
is called the ‘precision’ of prediction and the chances of
predicted value falling in the range is called the ‘reliability’
of prediction.
• The reliability is expressed as the confidence level, and its converse is the significance level.

• That is, if the confidence level is 98%, the significance level is 2%. The confidence level tells how sure we can be; it is expressed as a percentage and represents how often the prediction lies within the confidence interval (i.e., range).
• So any prediction should balance between the precision
and confidence level.
Reference-
• 1. Textbook of Statistics by Mahajan.
• 2. Bare’s Essentials of Biostatistics
THANKS
