Basics of Statistics
• Nominal scale- the categories are merely names; they have no natural order.
– Eg- male/female, yes/no
• Ordinal scale- the categories can be put in order, but the difference between one pair of adjacent categories need not equal the difference between another pair.
– Eg- mild/ moderate/ severe
• Interval scale- the differences between values are comparable, but the variable has no absolute zero.
– Eg- temperature, time
• Ratio scale- the variable has an absolute zero, and differences between values are comparable.
– Eg- stress using the PSS, insomnia using the ISI
[Diagram: summary measures: Median & Mode (center); Range, Interquartile Range & Standard Deviation (dispersion); Kurtosis (shape)]
Measures of center
• Central tendency- in any distribution, the majority of observations pile up, or cluster, around a particular region.
– Includes- Mean, Median & Mode.
• Outlier- an observation that falls far from the rest of the data. The mean is highly influenced by outliers.
• We use sample mean, median & mode to estimate the population mean,
median & mode.
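The effect of an outlier on the mean versus the median can be sketched with Python's standard statistics module (the data below are hypothetical):

```python
import statistics

# Hypothetical sample, e.g. hospital stay in days
data = [2, 3, 3, 4, 5]

mean = statistics.mean(data)      # 3.4
median = statistics.median(data)  # 3
mode = statistics.mode(data)      # 3

# One extreme value drags the mean far more than the median
with_outlier = data + [40]
mean_out = statistics.mean(with_outlier)      # 9.5
median_out = statistics.median(with_outlier)  # 3.5
```

A single outlier of 40 nearly triples the mean while the median barely moves, which is why the median is preferred as a summary when outliers are present.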
Measures of dispersion
• Dispersion- the spread/ variability of values about the measures of central tendency. Measures of dispersion quantify the variability of the distribution.
• Measures include-
– Range
– Sample interquartile range
– Standard deviation
• Range- difference between the largest observed value in the data set and
the smallest one.
– So, while considering the range, a great deal of information is ignored.
• Interquartile range- difference between the first & third quartiles of the
variable.
– Percentiles- divide the observed values into hundredths/ 100 equal parts.
– Deciles- divide the observed values into tenths/ 10 equal parts.
– Quartiles- divide the observed values into 4 equal parts. Q1 divides the bottom 25% of observed values from the top 75%...
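A minimal illustration of quartiles, interquartile range and range, using `statistics.quantiles` on hypothetical data (`method="inclusive"` interpolates between data points, treating the sample as the whole population):

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

# n=4 cut points give the three quartiles Q1, Q2 (median), Q3
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

iqr = q3 - q1                       # spread of the middle 50% of the data
data_range = max(data) - min(data)  # uses only the two extreme values
```

Note how the range depends only on the two extreme observations, while the IQR describes the central half of the data.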
• Properties of a skewed distribution-
– Mean, median & mode fall at different points.
– Quartiles are not equidistant from the median.
– The curve is not symmetrical but is stretched more to one side.
• A distribution may be positively or negatively skewed. The limits for the coefficient of skewness are ± 3.
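One coefficient with those ± 3 limits is Pearson's second coefficient of skewness, 3 × (mean − median) / SD; a sketch on a hypothetical, positively skewed sample:

```python
import statistics

# Hypothetical positively skewed sample (long right tail)
data = [1, 2, 2, 3, 3, 3, 4, 10]

mean = statistics.mean(data)       # 3.5
median = statistics.median(data)   # 3.0
sd = statistics.stdev(data)

# Pearson's second coefficient of skewness; it always lies within +/- 3.
# Positive here because the outlier 10 pulls the mean above the median.
skew = 3 * (mean - median) / sd
```

A symmetric sample would give a value near 0; the right tail here produces a positive coefficient.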
• Test statistic- a statistic calculated from the sample data to test the null hypothesis.
• p-value- the probability, assuming H0 is true, of obtaining a test statistic at least as extreme as the one observed. The smaller the p-value, the more strongly the data contradict H0.
• When the p-value is ≤ 0.05, the data sufficiently contradict H0.
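To make the p-value concrete, the sign test computes it exactly from the binomial distribution; a self-contained sketch with hypothetical paired data (9 of 10 differences positive):

```python
from math import comb

# Sign test: suppose 9 of n = 10 paired differences are positive.
# Under H0 the median difference is 0, so each sign is '+' with
# probability 0.5 and the count of '+' signs is Binomial(10, 0.5).
n, k = 10, 9

def binom_pmf(i: int) -> float:
    # P(exactly i '+' signs) when p = 0.5
    return comb(n, i) * 0.5 ** n

# Two-sided p-value: total probability of all outcomes at least as
# extreme as (i.e. no more probable than) the observed one.
p_value = sum(binom_pmf(i) for i in range(n + 1)
              if binom_pmf(i) <= binom_pmf(k))
```

Here p_value = 22/1024 ≈ 0.021, which is below 0.05, so H0 (no median difference) would be rejected.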
Types of error
• Type I/ α error- rejecting a true null hypothesis.
– We may conclude that a difference is significant, when in fact there is no real difference.
– The probability of a type I error is α; its maximum allowed value is called the level of significance. Being the more serious error, it is kept low, mostly less than 5% (p < 0.05).
• Type II/ β error- failing to reject a false null hypothesis, i.e. missing a real difference.
• It is not possible to reduce both type I & II errors at once, so the α error is fixed at a tolerable limit & the β error is minimized by ↑ sample size.
Estimation of Sample size
• Too small a sample- fails to detect clinically important effects (lack of power).
• Too large a sample- identifies differences that have no clinical relevance.
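One common sample-size sketch for comparing two means uses standard normal quantiles: z = 1.96 for two-sided α = 0.05 and z = 0.84 for 80% power. The SD and difference figures below are illustrative assumptions, not from the text:

```python
from math import ceil

# n per group = 2 * (z_alpha + z_beta)^2 * sigma^2 / delta^2
z_alpha, z_beta = 1.96, 0.84
sigma = 10.0   # assumed SD of the outcome (hypothetical)
delta = 5.0    # smallest clinically important difference (hypothetical)

n_per_group = ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)
```

Halving the detectable difference delta would quadruple the required sample size, which is why the clinically important difference must be fixed before the study.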
Scale              Summary measure             Tests
Nominal            Mode                        Chi-Square test
Ordinal            Mode/ Median
Interval/ Ratio    Mean, Standard Deviation    t-test, ANOVA, Post hoc, Correlation, Regression

Types of t-test: One sample t-test, Independent t-test, Dependent t-test.
• Limitation (of correlation)- it says nothing about any cause & effect relationship.
– Beware of spurious/ nonsense correlations.
• Correlation-
– Strength/ degree of association.
• Regression-
– Nature of association (eg- if x & y are related, a given change in x produces, on average, a certain change in y).
– Expresses the linear relationship between variables.
– Regression coefficient- β
– Types- Linear, Non linear, Stepwise
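The distinction above can be sketched numerically: Pearson's r measures strength of association, while the regression coefficient β gives how much y changes per unit change in x (the data are hypothetical):

```python
import statistics

# Hypothetical (x, y) pairs with a roughly linear relationship
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mx, my = statistics.mean(x), statistics.mean(y)

sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-deviation
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)

r = sxy / (sxx * syy) ** 0.5   # Pearson's correlation: strength (-1 to +1)
beta = sxy / sxx               # regression coefficient: slope of y on x
alpha = my - beta * mx         # intercept of the fitted line
```

Here r ≈ 0.77 (a fairly strong positive association) and β = 0.6, i.e. y rises on average by 0.6 for each unit increase in x.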
• Advantages of non-parametric tests-
– When the data do not follow a normal distribution.
– When the average is better represented by the median.
– When the sample size is small.
– In the presence of outliers.
– Relatively simple to conduct.
Tests

Characteristic                                   Parametric test            Non-parametric test
Testing a mean against a hypothesized value      One sample t test          Sign test
Comparison of means of 2 groups                  Independent t test         Mann Whitney U test
Means of related samples                         Paired t test              Wilcoxon Signed rank test
Comparison of means of > 2 groups                ANOVA                      Kruskal Wallis test
Comparison of means of > 2 related groups        Repeated measures ANOVA    Friedman's test
Relationship between 2 quantitative variables    Pearson's correlation      Spearman's correlation
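As one example from the non-parametric column, the Mann-Whitney U statistic can be computed directly from its definition (hypothetical samples; the resulting U would then be compared against a critical value from tables):

```python
# Mann-Whitney U: non-parametric counterpart of the independent t test.
# U for a group counts, over all cross-group pairs, how often a value
# in that group exceeds a value in the other (ties count 0.5).
a = [3, 4, 2, 6]    # hypothetical group A
b = [9, 7, 5, 10]   # hypothetical group B

u_a = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
u_b = len(a) * len(b) - u_a   # the two U values always sum to n_a * n_b
u = min(u_a, u_b)             # the smaller U is compared with the table
```

A small U (here U = 1 out of a possible 16 pairs) indicates that the two groups barely overlap.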
Chi-Square test
• Used for analysis of categorical data.
• Other tests- Fisher exact probability test, McNemar’s test.
• Requirements of the Chi-Square test-
– Samples should be independent.
– Sample size should be reasonably large (n > 40).
– Expected cell frequencies should not be < 5.
• Designs used-
– Case studies
– Comparative designs
– Snapshots
– Retrospective & Longitudinal studies
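A hand computation of the Chi-Square statistic for a hypothetical 2 × 2 table that meets the requirements above (n > 40, all expected frequencies ≥ 5):

```python
# Hypothetical counts:        improved   not improved
#   treatment group              30           20
#   control group                15           35
observed = [[30, 20], [15, 35]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)  # total sample size n = 100

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand  # expected frequency
        chi2 += (o - e) ** 2 / e

# For a 2x2 table df = (2-1)*(2-1) = 1; the 0.05 critical value is 3.84.
```

Here chi2 ≈ 9.09 > 3.84, so the association between treatment and improvement would be declared significant at the 5% level.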
Statistical software packages
Quantitative research            Qualitative research
• SPSS by IBM                    • ATLAS.ti
• R by R Foundation              • NVivo