Mod 3 Statistical Methods
Uploaded by Dr Rakesh Thakor

Course Name: RESEARCH METHODOLOGY

Module 3: Statistical Methods

Statistics

• Statistics is a form of mathematical analysis that uses quantified models, representations and synopses for a given set of experimental data or real-life studies.
• Statistics studies methodologies to gather, review, analyze and draw conclusions from data.

Distributions

• A distribution in statistics is a function that shows the possible values for a variable and how often they occur.
• The distribution of a statistical data set (or a population) is a listing or function showing all the possible values (or intervals) of the data and how often they occur. When a distribution of categorical data is organized, you see the number or percentage of individuals in each group.
• The distribution of a variable consists not only of the values actually observed, but of all possible values.
• The distribution in statistics is defined by the underlying probabilities, not by the graph; the graph is just a visual representation.

Distributions - Types

Probability Distribution - A probability distribution is a function that describes the likelihood of obtaining the possible values that a random variable can assume. In other words, the values of the variable vary based on the underlying probability distribution. It is a function (or mapping) from events to probabilities.

Motivation:
Using historical data and experience (or assumptions), a probability distribution gives a convenient way to estimate or predict the probabilities of events.

Methods:
• Using histograms
• Using probability density functions
• Using cumulative distribution functions

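The histogram-based method can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the die-roll data is hypothetical, and the empirical CDF is built by accumulating the relative frequencies from the histogram.

```python
from collections import Counter

# Hypothetical sample of die rolls (assumed data, for illustration only)
rolls = [1, 3, 3, 6, 2, 3, 5, 6, 1, 4, 2, 6, 3, 5, 4, 1, 6, 2, 3, 5]

# Histogram: count how often each value occurs
histogram = Counter(rolls)

# Relative frequencies: an empirical estimate of the probabilities
n = len(rolls)
probs = {value: count / n for value, count in sorted(histogram.items())}

# Empirical cumulative distribution function: P(X <= x)
cdf = {}
running = 0.0
for value in sorted(probs):
    running += probs[value]
    cdf[value] = running

print(probs)
print(cdf)  # the final value is 1.0 by construction
```

With more data, the relative frequencies converge toward the underlying probabilities, which is what motivates estimating a distribution from a histogram in the first place.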
Distributions - Types

Probability distributions are generally divided into two classes.

A discrete probability distribution (applicable when the set of possible outcomes is discrete, such as a coin toss or a roll of a die) can be encoded by a discrete list of the probabilities of the outcomes, known as a probability mass function.

A continuous probability distribution (applicable when the set of possible outcomes can take on values in a continuous range, e.g. real numbers, such as the temperature on a given day) is typically described by a probability density function (with the probability of any individual outcome being exactly 0).

• Basic probability distributions, which can be shown in a probability distribution table.
• Binomial distributions, which count "successes" and "failures."
• Normal distributions, sometimes called the bell curve.

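The discrete/continuous contrast can be made concrete with Python's standard library. This is an illustrative sketch: the binomial parameters (n = 10 trials, p = 0.5) are assumptions chosen for the example, and the normal CDF is computed from the error function.

```python
import math

# Discrete case: binomial probability mass function, P(X = k).
# n = 10 and p = 0.5 are hypothetical parameters for illustration.
def binomial_pmf(k, n=10, p=0.5):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

# Continuous case: normal probability density function ...
def normal_pdf(x, mu=0.0, sigma=1.0):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# ... and cumulative distribution function, via the error function
def normal_cdf(x, mu=0.0, sigma=1.0):
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# A PMF sums to 1 over all outcomes; a PDF integrates to 1, and any
# single point has probability 0 in the continuous case.
total = sum(binomial_pmf(k) for k in range(11))
print(total)            # sums to 1.0
print(normal_cdf(0.0))  # 0.5 by symmetry of the bell curve
```

Note that the PMF assigns positive probability to individual outcomes, while the density must be integrated over an interval before it yields a probability.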
Statistical Inference

Statistical inference is the process of using data analysis to deduce properties of an underlying probability distribution.
Statistical inference makes propositions about a population, using data drawn from the population with
some form of sampling. Given a hypothesis about a population, for which we wish to draw inferences,
statistical inference consists of (first) selecting a statistical model of the process that generates the data and
(second) deducing propositions from the model.
Inferential statistical analysis infers properties of a population, for example by testing hypotheses and
deriving estimates. It is assumed that the observed data set is sampled from a larger population.
Descriptive statistics is solely concerned with properties of the observed data, and it does not rest on the
assumption that the data come from a larger population.
The conclusion of a statistical inference is a statistical proposition.

Statistical Inference

Some common forms of statistical proposition are the following:

• A point estimate, i.e. a particular value that best approximates some parameter of interest.
• An interval estimate, e.g. a confidence interval (or set estimate), i.e. an interval constructed from a dataset drawn from a population such that, under repeated sampling of such datasets, the intervals would contain the true parameter value with probability at the stated confidence level.
• A credible interval, i.e. a set of values containing, for example, 95% of posterior belief.
• Rejection of a hypothesis.
• Clustering or classification of data points into groups.

The ingredients used for making a statistical inference are:

• Sample size
• Variability in the sample
• Size of the observed differences

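The first two propositions above, a point estimate and an interval estimate, can be sketched with Python's standard library. The sample data is hypothetical, and the interval uses the normal-approximation critical value 1.96; for a sample this small, a t critical value (about 2.26 for 9 degrees of freedom) would be more accurate.

```python
import math
import statistics

# Hypothetical sample drawn from a larger population (assumed data)
sample = [4.8, 5.1, 5.6, 4.9, 5.3, 5.0, 5.4, 4.7, 5.2, 5.0]
n = len(sample)

# Point estimate: the sample mean approximates the population mean
mean = statistics.mean(sample)

# Interval estimate: an approximate 95% confidence interval for the mean,
# mean +/- z * (standard error), with z = 1.96 as an assumption
se = statistics.stdev(sample) / math.sqrt(n)
ci = (mean - 1.96 * se, mean + 1.96 * se)

print(mean, ci)
```

Under repeated sampling, roughly 95% of intervals constructed this way would contain the true population mean, which is exactly the confidence-level claim in the bullet above.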
Statistical Inference Procedure

The steps involved in an inferential statistics procedure are:

• Begin with a theory
• Create a research hypothesis
• Operationalize the variables
• Identify the population to which the study results should apply
• Formulate a null hypothesis for this population
• Draw a sample of subjects from the population and conduct the study
• Conduct statistical tests to see whether the properties of the collected sample differ sufficiently from what would be expected under the null hypothesis to justify rejecting it

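The final testing step can be sketched as a pooled two-sample t-test. Everything here is illustrative: the group scores are hypothetical data, and 2.14 is the approximate two-sided 5% critical value for 14 degrees of freedom (a t table or a statistics library would give the exact value and a p-value).

```python
import math
import statistics

# Hypothetical scores for two groups (e.g. treatment vs. control)
group_a = [82, 88, 75, 90, 85, 79, 91, 84]
group_b = [70, 74, 68, 77, 72, 69, 75, 71]

na, nb = len(group_a), len(group_b)
mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)

# Pooled variance, assuming equal population variances under the null
sp2 = ((na - 1) * statistics.variance(group_a)
       + (nb - 1) * statistics.variance(group_b)) / (na + nb - 2)

# Two-sample t statistic for the difference in means
t = (mean_a - mean_b) / math.sqrt(sp2 * (1 / na + 1 / nb))

# Reject the null hypothesis if |t| exceeds the critical value
# (~2.14 for 14 degrees of freedom at the 5% level; an assumption here)
reject_null = abs(t) > 2.14
print(t, reject_null)
```

The decision rule mirrors the wording of the last step: the sample difference must be large enough, relative to its variability, to be unlikely under the null hypothesis.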
Importance of Statistical Inference

• Inferential statistics is important for examining data properly.
• To reach an accurate conclusion, proper data analysis is important for interpreting the research results.
• It is widely used to make predictions about future observations in various fields.
• It helps us draw inferences about the population from the data.
• Statistical inference has a wide range of applications in different fields, such as:
• Business Analysis
• Artificial Intelligence
• Financial Analysis
• Fraud Detection
• Machine Learning
• Share Market
• Pharmaceutical Sector

Analysis of variance

• Analysis of variance (ANOVA) is a collection of statistical models and their associated estimation procedures (such as the "variation" among and between groups) used to analyze the differences among group means in a sample.
• ANOVA was developed by statistician and evolutionary biologist Ronald Fisher.
• ANOVA is based on the law of total variance, where the observed variance in a particular variable is partitioned into components attributable to different sources of variation.
• In its simplest form, ANOVA provides a statistical test of whether two or more population means are equal, and therefore generalizes the t-test beyond two means.

Analysis of variance

ANOVA is a form of statistical hypothesis testing heavily used in the analysis of experimental data.
A test result (calculated from the null hypothesis and the sample) is called statistically significant if it is
deemed unlikely to have occurred by chance, assuming the truth of the null hypothesis.
A statistically significant result, when a probability (p-value) is less than a pre-specified threshold (significance
level), justifies the rejection of the null hypothesis, but only if the a priori probability of the null hypothesis is
not high.
In the typical application of ANOVA, the null hypothesis is that all groups are random samples from the same
population.
For example, when studying the effect of different treatments on similar samples of patients, the null
hypothesis would be that all treatments have the same effect (perhaps none).
Rejecting the null hypothesis is taken to mean that the differences in observed effects between treatment
groups are unlikely to be due to random chance.

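The partition described above, observed variance split into components attributable to different sources, can be verified numerically. The three treatment groups below are hypothetical data; the identity SS_total = SS_between + SS_within is what the law of total variance guarantees.

```python
import statistics

# Hypothetical responses for three treatment groups (assumed data)
groups = [
    [6.1, 5.8, 6.4, 6.0],
    [7.2, 7.0, 6.8, 7.4],
    [5.5, 5.9, 5.3, 5.7],
]

all_obs = [x for g in groups for x in g]
grand_mean = statistics.mean(all_obs)

# Total sum of squares: every observation about the grand mean
ss_total = sum((x - grand_mean) ** 2 for x in all_obs)

# Between-group sum of squares: group means about the grand mean
ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
                 for g in groups)

# Within-group sum of squares: observations about their own group mean
ss_within = sum(sum((x - statistics.mean(g)) ** 2 for x in g)
                for g in groups)

# The partition: SS_total = SS_between + SS_within
print(ss_total, ss_between + ss_within)
```

A large SS_between relative to SS_within is precisely the evidence that leads ANOVA to reject the null hypothesis that all groups come from the same population.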
Analysis of variance

ANOVA is the synthesis of several ideas and it is used for multiple purposes.
It is difficult to define concisely or precisely.

"Classical" ANOVA for balanced data does three things at once:


• As exploratory data analysis, an ANOVA employs an additive data decomposition, and its sums of squares
indicate the variance of each component of the decomposition (or, equivalently, each set of terms of a linear
model).
• Comparisons of mean squares, along with an F-test ... allow testing of a nested sequence of models.
• Closely related to the ANOVA is a linear model fit with coefficient estimates and standard errors.

ANOVA is a statistical tool used in several ways to develop and confirm an explanation for the observed data.

Analysis of variance

• It is computationally elegant and relatively robust against violations of its assumptions.
• ANOVA provides strong (multiple-sample comparison) statistical analysis.
• It has been adapted to the analysis of a variety of experimental designs.
• ANOVA "has long enjoyed the status of being the most used (some would say abused) statistical technique in psychological research."
• ANOVA "is probably the most useful technique in the field of statistical inference."
• ANOVA is difficult to teach, particularly for complex experiments, with split-plot designs being notorious.

Classes of models

Fixed-effects models
The fixed-effects model (class I) of analysis of variance applies to situations in which the experimenter applies one or more treatments to the subjects of the experiment to see whether the response variable values change. This allows the experimenter to estimate the ranges of response variable values that the treatment would generate in the population as a whole.

Random-effects models
The random-effects model (class II) is used when the treatments are not fixed. This occurs when the various factor levels are sampled from a larger population. Because the levels themselves are random variables, some assumptions and the method of contrasting the treatments (a multivariable generalization of simple differences) differ from the fixed-effects model.

Classes of models

Mixed-effects models
• A mixed-effects model (class III) contains experimental factors of both fixed- and random-effects types, with appropriately different interpretations and analysis for the two types.
• Example: Teaching experiments could be performed by a college or university department to find a good introductory textbook, with each text considered a treatment. The fixed-effects model would compare a list of candidate texts. The random-effects model would determine whether important differences exist among a list of randomly selected texts. The mixed-effects model would compare the (fixed) incumbent texts to randomly selected alternatives.

Analysis of variance

• One-way ANOVA is used to test for differences among two or more independent groups (means), e.g. different levels of urea application in a crop, different levels of antibiotic action on several different bacterial species, or different levels of effect of some medicine on groups of patients. However, should these groups not be independent, and there is an order in the groups (such as mild, moderate and severe disease), or in the dose of a drug (such as 5 mg/mL, 10 mg/mL, 20 mg/mL) given to the same group of patients, then a linear trend estimation should be used. The one-way ANOVA is used to test for differences among at least three groups, since the two-group case can be covered by a t-test. When there are only two means to compare, the t-test and the ANOVA F-test are equivalent; the relation between ANOVA and t is given by F = t².
• Factorial ANOVA is used when the experimenter wants to study the interaction effects among the treatments.
• Repeated measures ANOVA is used when the same subjects are used for each treatment (e.g., in a longitudinal study).
• Multivariate analysis of variance (MANOVA) is used when there is more than one response variable.

One-way ANOVA

One-way analysis of variance (abbreviated one-way ANOVA) is a technique that can be used to compare the means of two or more samples (using the F distribution).
This technique can be used only for numerical response data, the "Y", usually one variable, and numerical or (usually) categorical input data, the "X", always one variable, hence "one-way".
The one-way ANOVA is used to test for differences among at least three groups, since the two-group case can be covered by a t-test.
When there are only two means to compare, the t-test and the F-test are equivalent; the relation between ANOVA and t is given by F = t².
An extension of one-way ANOVA is two-way analysis of variance, which examines the influence of two different categorical independent variables on one dependent variable.

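The relation F = t² for the two-group case can be checked directly. The data below is hypothetical; the F statistic is computed from the between- and within-group sums of squares, and the t statistic from the pooled variance.

```python
import math
import statistics

# Hypothetical data for two groups (assumed, for illustration)
g1 = [12.0, 14.5, 13.2, 15.1, 12.8]
g2 = [10.3, 11.0, 9.8, 10.9, 11.4]
n1, n2 = len(g1), len(g2)
m1, m2 = statistics.mean(g1), statistics.mean(g2)
grand = statistics.mean(g1 + g2)

# One-way ANOVA: F = between-group mean square / within-group mean square.
# With k = 2 groups, the between-group degrees of freedom is k - 1 = 1.
ss_between = n1 * (m1 - grand) ** 2 + n2 * (m2 - grand) ** 2
ss_within = (sum((x - m1) ** 2 for x in g1)
             + sum((x - m2) ** 2 for x in g2))
f_stat = (ss_between / 1) / (ss_within / (n1 + n2 - 2))

# Pooled two-sample t statistic for the same data
sp2 = ss_within / (n1 + n2 - 2)
t_stat = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

print(f_stat, t_stat ** 2)  # the two values coincide: F = t²
```

The equality is algebraic, not a numerical coincidence, which is why the two tests are described as equivalent in the two-group case.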
Multivariate Analysis of Variance (MANOVA)

• Multivariate analysis of variance (MANOVA) is a procedure for comparing multivariate sample means.
• As a multivariate procedure, it is used when there are two or more dependent variables.
• It is often followed by significance tests involving the individual dependent variables separately.
• MANOVA is a generalized form of univariate analysis of variance (ANOVA).
• It uses the covariance between outcome variables in testing the statistical significance of the mean differences.

THANK YOU
