Basics of Statistics
• Nominal scale- the categories are merely names; they have no natural order.
– Eg- male/female, yes/no
• Ordinal scale- the categories can be put in order, but the difference between one pair of adjacent categories need not equal the difference between another pair.
– Eg- mild/ moderate/ severe
• Interval scale- the differences between values are comparable, but the variable has no absolute zero.
– Eg- temperature, time
• Ratio scale- the variable has an absolute zero, and differences between values are comparable.
– Eg- stress using the PSS, insomnia using the ISI
[Diagram: summary measures: Median & Mode (center); Range, Interquartile Range & Standard Deviation (dispersion); Kurtosis (shape)]
Measures of center
• Central tendency- in any distribution, the majority of observations pile up, or cluster, around a particular region.
– Includes- Mean, Median & Mode.
• Outlier- an observation that falls far from the rest of the data. The mean is highly influenced by outliers.
• We use sample mean, median & mode to estimate the population mean,
median & mode.
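The effect of an outlier on the mean versus the median can be sketched with Python's standard statistics module (the data below are hypothetical):

```python
import statistics

# Hypothetical sample, e.g. hospital stay in days
data = [2, 3, 3, 4, 5]

mean = statistics.mean(data)      # 3.4
median = statistics.median(data)  # 3
mode = statistics.mode(data)      # 3

# One extreme value drags the mean far more than the median
with_outlier = data + [40]
mean_out = statistics.mean(with_outlier)      # 9.5
median_out = statistics.median(with_outlier)  # 3.5
```

A single outlier of 40 nearly triples the mean while the median barely moves, which is why the median is preferred as a summary when outliers are present.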
Measures of dispersion
• Dispersion- the spread/ variability of values about the measures of central tendency. Measures of dispersion quantify the variability of the distribution.
• Measures include-
– Range
– Sample interquartile range
– Standard deviation
• Range- difference between the largest observed value in the data set and
the smallest one.
– So, while considering the range, a great deal of information is ignored.
• Interquartile range- difference between the first & third quartiles of the
variable.
– Percentiles- divide the observed values into hundredths/ 100 equal parts.
– Deciles- divide the observed values into tenths/ 10 equal parts.
– Quartiles- divide the observed values into 4 equal parts. Q1 divides the bottom 25% of observed values from the top 75%...
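A minimal illustration of quartiles, interquartile range and range, using `statistics.quantiles` on hypothetical data (`method="inclusive"` interpolates between data points, treating the sample as the whole population):

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

# n=4 cut points give the three quartiles Q1, Q2 (median), Q3
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

iqr = q3 - q1                       # spread of the middle 50% of the data
data_range = max(data) - min(data)  # uses only the two extreme values
```

Note how the range depends only on the two extreme observations, while the IQR describes the central half of the data.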
• Properties of a skewed distribution-
– Mean, median & mode fall at different points.
– Quartiles are not equidistant from the median.
– The curve is not symmetrical but is stretched more to one side.
• A distribution may be positively or negatively skewed. The limits for the coefficient of skewness are ± 3.
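One coefficient with those ± 3 limits is Pearson's second coefficient of skewness, 3 × (mean − median) / SD; a sketch on a hypothetical, positively skewed sample:

```python
import statistics

# Hypothetical positively skewed sample (long right tail)
data = [1, 2, 2, 3, 3, 3, 4, 10]

mean = statistics.mean(data)       # 3.5
median = statistics.median(data)   # 3.0
sd = statistics.stdev(data)

# Pearson's second coefficient of skewness; it always lies within +/- 3.
# Positive here because the outlier 10 pulls the mean above the median.
skew = 3 * (mean - median) / sd
```

A symmetric sample would give a value near 0; the right tail here produces a positive coefficient.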
• Test statistic- a statistic calculated from the sample data to test the null hypothesis.
• p-value- the probability, assuming H0 is true, of obtaining a test statistic at least as extreme as the one observed. The smaller the p-value, the more strongly the data contradict H0.
• When the p-value is ≤ 0.05, the data sufficiently contradict H0.
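To make the p-value concrete, the sign test computes it exactly from the binomial distribution; a self-contained sketch with hypothetical paired data (9 of 10 differences positive):

```python
from math import comb

# Sign test: suppose 9 of n = 10 paired differences are positive.
# Under H0 the median difference is 0, so each sign is '+' with
# probability 0.5 and the count of '+' signs is Binomial(10, 0.5).
n, k = 10, 9

def binom_pmf(i: int) -> float:
    # P(exactly i '+' signs) when p = 0.5
    return comb(n, i) * 0.5 ** n

# Two-sided p-value: total probability of all outcomes at least as
# extreme as (i.e. no more probable than) the observed one.
p_value = sum(binom_pmf(i) for i in range(n + 1)
              if binom_pmf(i) <= binom_pmf(k))
```

Here p_value = 22/1024 ≈ 0.021, which is below 0.05, so H0 (no median difference) would be rejected.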
Types of error
• Type I/ α error- rejecting a true null hypothesis.
– We may conclude that a difference is significant, when in fact there is no real difference.
– The probability of a type I error is α; its maximum allowed value is called the level of significance. Being the more serious error, it is kept low, mostly less than 5% (p < 0.05).
• Type II/ β error- failing to reject a false null hypothesis, i.e. missing a real difference.
• It is not possible to reduce both type I & II errors at once, so the α error is fixed at a tolerable limit & the β error is minimized by ↑ sample size.
Estimation of Sample size
• Too small a sample- fails to detect clinically important effects (lack of power).
• Too large a sample- identifies differences that have no clinical relevance.
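One common sample-size sketch for comparing two means uses standard normal quantiles: z = 1.96 for two-sided α = 0.05 and z = 0.84 for 80% power. The SD and difference figures below are illustrative assumptions, not from the text:

```python
from math import ceil

# n per group = 2 * (z_alpha + z_beta)^2 * sigma^2 / delta^2
z_alpha, z_beta = 1.96, 0.84
sigma = 10.0   # assumed SD of the outcome (hypothetical)
delta = 5.0    # smallest clinically important difference (hypothetical)

n_per_group = ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)
```

Halving the detectable difference delta would quadruple the required sample size, which is why the clinically important difference must be fixed before the study.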
Scale              Summary measure             Tests
Nominal            Mode                        Chi-Square test
Ordinal            Mode/ Median
Interval/ Ratio    Mean, Standard Deviation    t-test, ANOVA, Post hoc, Correlation, Regression

Types of t-test: One sample t-test, Independent t-test, Dependent t-test.
• Limitation (of correlation)- it says nothing about any cause & effect relationship.
– Beware of spurious/ nonsense correlations.
• Correlation-
– Strength/ degree of association.
• Regression-
– Nature of association (eg- if x & y are related, a given change in x produces, on average, a certain change in y).
– Expresses the linear relationship between variables.
– Regression coefficient- β
– Types- Linear, Non linear, Stepwise
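The distinction above can be sketched numerically: Pearson's r measures strength of association, while the regression coefficient β gives how much y changes per unit change in x (the data are hypothetical):

```python
import statistics

# Hypothetical (x, y) pairs with a roughly linear relationship
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mx, my = statistics.mean(x), statistics.mean(y)

sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-deviation
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)

r = sxy / (sxx * syy) ** 0.5   # Pearson's correlation: strength (-1 to +1)
beta = sxy / sxx               # regression coefficient: slope of y on x
alpha = my - beta * mx         # intercept of the fitted line
```

Here r ≈ 0.77 (a fairly strong positive association) and β = 0.6, i.e. y rises on average by 0.6 for each unit increase in x.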
• Advantages of non-parametric tests-
– When the data do not follow a normal distribution.
– When the average is better represented by the median.
– When the sample size is small.
– In the presence of outliers.
– Relatively simple to conduct.
Tests

Characteristic                                   Parametric test            Non-parametric test
Testing a mean against a hypothesized value      One sample t test          Sign test
Comparison of means of 2 groups                  Independent t test         Mann Whitney U test
Means of related samples                         Paired t test              Wilcoxon Signed rank test
Comparison of means of > 2 groups                ANOVA                      Kruskal Wallis test
Comparison of means of > 2 related groups        Repeated measures ANOVA    Friedman's test
Relationship between 2 quantitative variables    Pearson's correlation      Spearman's correlation
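As one example from the non-parametric column, the Mann-Whitney U statistic can be computed directly from its definition (hypothetical samples; the resulting U would then be compared against a critical value from tables):

```python
# Mann-Whitney U: non-parametric counterpart of the independent t test.
# U for a group counts, over all cross-group pairs, how often a value
# in that group exceeds a value in the other (ties count 0.5).
a = [3, 4, 2, 6]    # hypothetical group A
b = [9, 7, 5, 10]   # hypothetical group B

u_a = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
u_b = len(a) * len(b) - u_a   # the two U values always sum to n_a * n_b
u = min(u_a, u_b)             # the smaller U is compared with the table
```

A small U (here U = 1 out of a possible 16 pairs) indicates that the two groups barely overlap.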
Chi-Square test
• Used for analysis of categorical data.
• Other tests- Fisher exact probability test, McNemar’s test.
• Requirements of the Chi-Square test-
– Samples should be independent.
– Sample size should be reasonably large (n > 40).
– Expected cell frequencies should not be < 5.
• Designs used-
– Case studies
– Comparative designs
– Snapshots
– Retrospective & Longitudinal studies
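A hand computation of the Chi-Square statistic for a hypothetical 2 × 2 table that meets the requirements above (n > 40, all expected frequencies ≥ 5):

```python
# Hypothetical counts:        improved   not improved
#   treatment group              30           20
#   control group                15           35
observed = [[30, 20], [15, 35]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)  # total sample size n = 100

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand  # expected frequency
        chi2 += (o - e) ** 2 / e

# For a 2x2 table df = (2-1)*(2-1) = 1; the 0.05 critical value is 3.84.
```

Here chi2 ≈ 9.09 > 3.84, so the association between treatment and improvement would be declared significant at the 5% level.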
Statistical software packages
Quantitative research            Qualitative research
• SPSS by IBM                    • ATLAS.ti
• R by R Foundation              • NVivo