1 Introduction
1. Descriptive Statistics
3. Correlation
4. Regression Analysis
7. Chi-Square Test
Each of these statistical procedures helps in making informed decisions based on data.
They are widely used in various domains such as healthcare, finance, social sciences, and
marketing to understand relationships, test hypotheses, and make predictions.
The representation of data makes complex statistical results easier to interpret, communi-
cate, and apply to decision-making processes. It helps transform raw data into meaningful
information through visual and numerical formats.
3 Normal Distribution
The normal distribution is a bell-shaped, symmetric probability distribution that is widely used
in statistics. It is also known as the Gaussian distribution. The key properties of a normal
distribution are:
• Mean (µ): The peak of the bell curve occurs at the mean. The data is symmetrically
distributed around the mean, which means that most of the values cluster around it.
• Standard deviation (σ): This determines the spread or width of the bell curve. A
smaller standard deviation results in a narrow curve, while a larger standard deviation
spreads the curve out.
The probability density function (PDF) of a normal distribution is given by:
f(x) = (1 / (σ√(2π))) exp( −(x − µ)² / (2σ²) )     (1)
Where:
• µ is the mean
• σ is the standard deviation
Some of the important properties of the normal distribution are listed below:
• Exactly half of the values lie to the right of the centre and exactly half lie to the left of the centre.
• The normal distribution is completely defined by its mean and standard deviation.
• The normal distribution curve has only one peak (i.e., it is unimodal).
• The curve approaches the x-axis but never touches it, and it extends indefinitely in both directions away from the mean.
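As a quick numerical check of Eq. (1), the sketch below evaluates the normal PDF in Python. The function name normal_pdf and the example values µ = 100, σ = 15 are illustrative assumptions, not taken from the text.

```python
import numpy as np

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Evaluate the normal (Gaussian) PDF of Eq. (1) at x."""
    coeff = 1.0 / (sigma * np.sqrt(2.0 * np.pi))
    return coeff * np.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# Illustrative values: mean 100, standard deviation 15
mu, sigma = 100.0, 15.0
x = np.array([70.0, 85.0, 100.0, 115.0, 130.0])
print(normal_pdf(x, mu, sigma))                      # the largest value occurs at x = mu
print(np.isclose(normal_pdf(mu - sigma, mu, sigma),  # symmetry about the mean
                 normal_pdf(mu + sigma, mu, sigma)))
```

Increasing sigma in this sketch spreads the curve out, in line with the property of the standard deviation listed above.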
4 Measure of Relationship
A population may involve one variable (univariate), two variables (bivariate), or many variables (multivariate), depending on the number of variables under study. A scatter plot is a useful graphical representation for getting an approximate idea of the relationship between two variables, as sketched below.
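A minimal sketch of such a scatter plot, using matplotlib and a made-up bivariate sample (the variable names and data here are purely illustrative), is:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical bivariate sample: X (hours studied) vs. Y (exam score)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 5.0 * x + rng.normal(0, 8, size=50)   # roughly linear, positive relationship

plt.scatter(x, y)
plt.xlabel("X (hours studied)")
plt.ylabel("Y (exam score)")
plt.title("Scatter plot of a bivariate sample")
plt.show()
```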
5 Covariance
In a bivariate population, where the two variables are represented by X and Y and the paired observations are represented by (x1, y1), (x2, y2), ....., (xn, yn), the covariance between X and Y is

Cov(X, Y) = σXY = (1/n) Σ_{i=1}^{n} (xi − x̄)(yi − ȳ)     (2)
• Covariance can take any value from −∞ to +∞. Negative values indicate a negative relationship and positive values indicate a positive relationship. A covariance of zero indicates either no relationship or a non-linear relationship.
The correlation coefficient r, obtained by dividing the covariance by the product of the standard deviations of X and Y, always lies between −1 and +1:
• r = +1: Perfect positive linear relationship (as one variable increases, the other increases proportionally).
• r = −1: Perfect negative linear relationship (as one variable increases, the other decreases proportionally).
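Both quantities can be computed directly from Eq. (2); the sketch below uses a small set of hypothetical paired observations. Note that np.cov divides by n − 1 by default, so bias=True is passed to match the 1/n definition above.

```python
import numpy as np

# Hypothetical paired observations (x_i, y_i)
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y = np.array([3.1, 5.9, 9.2, 11.8, 15.0])

# Covariance as in Eq. (2): (1/n) * sum of (x_i - xbar)(y_i - ybar)
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
print(cov_xy)                             # positive, so a positive relationship
print(np.cov(x, y, bias=True)[0, 1])      # same value via NumPy's 1/n normalisation

# Correlation coefficient r = Cov(X, Y) / (sigma_X * sigma_Y), always in [-1, +1]
r = cov_xy / (x.std() * y.std())
print(r, np.corrcoef(x, y)[0, 1])         # close to +1 for this nearly linear data
```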
Sampling: simple random sampling, stratified random sampling, systematic sampling. Testing of hypothesis: χ² (chi-square), t and F tests; Analysis of Variance; Covariance; Principal component analysis. Experimental design: Completely randomized block design, Randomized block design, Latin square design.
7 What is a hypothesis?
A hypothesis is a testable statement that provides an explanation for observed phenomena or
predicts the outcome of an experiment. It is often framed as an "if-then" statement, indicating the expected relationship between variables. For instance, "If the temperature increases, then the rate of photosynthesis will increase" illustrates a cause-and-effect relationship.
Scientific Definition: A scientific hypothesis is defined as a tentative, testable statement
that explains a phenomenon in the natural world. It is formulated based on prior knowledge,
observations, and existing theories. Importantly, a hypothesis must be structured in such a way
that it can be supported or refuted through experimentation or observation, embodying the
principles of testability and falsifiability.
7.1.1 Testability
A hypothesis must be testable, meaning it should be possible to demonstrate its validity through
empirical evidence. This characteristic ensures that the hypothesis can be supported or refuted
based on observation or experimentation.
7.1.2 Clarity
The language used in a hypothesis must be clear and precise. Ambiguities can lead to misinterpretation and unreliable conclusions. A well-formulated hypothesis articulates the relationship between variables in straightforward terms, making it comprehensible for researchers and stakeholders alike.
7.1.3 Specificity
A good hypothesis should be specific, focusing on particular variables and their relationships.
This specificity allows researchers to design targeted experiments and collect relevant data,
enhancing the reliability of the results.
7.1.4 Falsifiability
A hypothesis must be falsifiable, meaning there should be a possibility to prove it wrong. This
characteristic is essential for scientific inquiry, as it allows for the testing of predictions against
actual observations.
7.1.5 Simplicity
While a hypothesis should be simple, this simplicity does not diminish its significance or com-
plexity in terms of implications. A straightforward hypothesis facilitates understanding and
communication among researchers.
7.1.6 Empirical Grounding
Hypotheses should be grounded in existing knowledge or observations. They often arise from previous studies, theories, or general patterns observed in data, ensuring that they are relevant and informed by prior research.
7.1.7 Scope for Further Testing
An effective hypothesis should allow for additional testing and exploration. It should not only address the current research question but also open avenues for future investigations.
7.1.8 Relational Clarity
If the hypothesis is relational, it must clearly state the expected relationship between independent and dependent variables. For instance, it might predict how changes in one variable affect another.
7.1.9 Directionality
Some hypotheses may specify the direction of the relationship (directional hypotheses), while others may simply indicate that a relationship exists without specifying its nature (non-directional hypotheses). This characteristic helps clarify the research focus.
7.1.10 Consistency with Established Facts
A good hypothesis should align with known facts and established scientific principles. It should not contradict existing laws of nature or widely accepted theories unless there is strong evidence to suggest otherwise.
Conclusion: A well-constructed hypothesis is essential for guiding scientific research effectively. Its characteristics (testability, clarity, specificity, falsifiability, simplicity, empirical grounding, scope for further testing, relational clarity, directionality, and consistency with established facts) collectively contribute to its reliability and utility in advancing knowledge through systematic investigation.
8 Testing of Hypotheses
Hypothesis testing is a fundamental statistical method used to evaluate assumptions about
population parameters based on sample data. This process involves several critical steps and
concepts that guide researchers in determining the validity of their hypotheses.
Hypothesis testing is a key component of statistical inference, which allows researchers to
draw conclusions about populations based on sample data. It relies on probability theory to
quantify the uncertainty associated with sampling.
Hypothesis testing involves making an initial assumption about a population parameter (the
null hypothesis) and then using sample data to assess the validity of this assumption against an
alternative hypothesis.
If we are to compare method A with method B regarding its superiority, and we proceed on the assumption that both methods are equally good, then this assumption is termed the null hypothesis. That means there is no effect or difference (e.g., the population mean equals a specific value).
Suppose we want to test the hypothesis that the population mean (µ) is equal to the hypoth-
esised mean (µH0 ) = 100. Then we would say that the null hypothesis is that the population
mean is equal to the hypothesised mean and expressed as :
H0 : µ = µH0 = 100
However, if we think that method A is superior or that method B is inferior, we are then stating what is termed the alternative hypothesis. This indicates that there is an effect or difference (e.g., the population mean does not equal that value). If our sample results do not support the null hypothesis, we should conclude that something else is true. What we conclude upon rejecting the null hypothesis is known as the alternative hypothesis. In other words, the set of alternatives to the null hypothesis is referred to as the alternative hypothesis. If we accept H0, then we are rejecting Ha, and if we reject H0, then we are accepting Ha. For H0 : µ = µH0 = 100, we may consider three possible alternative hypotheses as in Table 2.
[Note: If a hypothesis is of the type µ = µH0, then we call such a hypothesis a simple (or specific) hypothesis, but if it is of the type µ ≠ µH0 or µ > µH0 or µ < µH0, then we call it a composite (or nonspecific) hypothesis.]
The null hypothesis and the alternative hypothesis are chosen before the sample is drawn
(the researcher must avoid the error of deriving hypotheses from the data that he collects and
then testing the hypotheses from the same data). In the choice of null hypothesis, the following
considerations are usually kept in view:
Figure 2: Table-2
(a) The alternative hypothesis is usually the one which one wishes to prove and the null hypothesis is the one which one wishes to disprove. Thus, a null hypothesis represents the hypothesis we are trying to reject, and the alternative hypothesis represents all other possibilities.
(b) If the rejection of a certain hypothesis when it is actually true involves great risk, it is
taken as null hypothesis because then the probability of rejecting it when it is true is α (the
level of significance) which is chosen very small.
(c) The null hypothesis should always be a specific hypothesis, i.e., it should not state 'about' or 'approximately' a certain value.
Generally, in hypothesis testing we proceed on the basis of null hypothesis, keeping the
alternative hypothesis in view. Why so? The answer is that on the assumption that null
hypothesis is true, one can assign the probabilities to different possible sample results, but this
cannot be done if we proceed with the alternative hypothesis. Hence the use of null hypothesis
(at times also known as statistical hypothesis) is quite frequent.
The level of significance, often denoted as alpha (α), is a critical concept in hypothesis testing. It represents the threshold for deciding whether to reject the null hypothesis. In other words, it is the probability that we make the error of rejecting the null hypothesis (H0) when the null hypothesis is true.
The level of significance is set specifically to increase or decrease the probability of rejecting H0 in error:
• Small α: less likely to incorrectly reject H0.
• Large α: more likely to incorrectly reject H0.
It is always some percentage (usually 5%) which should be chosen with great care, thought and reason. In case we take the significance level at 5 per cent, this implies that H0 will be rejected when the sampling result has a probability of less than 0.05 of occurring if H0 is true. In other words, the 5 per cent level of significance means that the researcher is willing to take as much as a 5 per cent risk of rejecting the null hypothesis when it (H0) happens to be true.
Figure 3: The level of significance
A decision rule is a set of criteria used to determine whether to reject or fail to reject the null
hypothesis based on the test statistic. It typically involves comparing the test statistic to a
critical value.
Critical value: This is a value determined from the distribution of the test statistic under
the null hypothesis. If the test statistic is more extreme than the critical value, we reject the
null hypothesis.
Example: Suppose we want to test if a new fertilizer increases crop yields. We can set up
the following hypotheses:
H0 : The new fertilizer has no effect on crop yields.
Ha : The new fertilizer increases crop yields.
We collect data on crop yields from plots treated with the new fertilizer and plots treated
with a control fertilizer. We can use a t-test to determine if the difference in mean yields is
statistically significant.
The decision rule might be:
If the t-statistic is greater than the critical value at a significance level of 0.05, reject the
null hypothesis. Otherwise, fail to reject the null hypothesis.
If we reject the null hypothesis, we can conclude that there is evidence to support the claim
that the new fertilizer increases crop yields.
In summary, decision rules and hypothesis testing are essential tools for making informed de-
cisions based on data. By carefully formulating hypotheses and applying appropriate statistical
tests, we can draw meaningful conclusions about populations.
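A sketch of the fertilizer example in Python is given below. The yield figures are invented for illustration, and scipy's two-sample t-test is used only to show how the decision rule (compare the t-statistic with the critical value, or the p-value with α) plays out in code.

```python
from scipy import stats

# Hypothetical crop yields (tonnes/hectare) for treated and control plots
treated = [5.2, 5.8, 6.1, 5.9, 6.3, 5.7, 6.0, 5.5]
control = [5.0, 5.1, 5.4, 4.9, 5.3, 5.2, 5.0, 5.1]

alpha = 0.05
# Two-sample t-test; alternative="greater" tests Ha: mean(treated) > mean(control)
t_stat, p_value = stats.ttest_ind(treated, control, alternative="greater")

# Equivalent decision via the critical value of the t-distribution
df = len(treated) + len(control) - 2
t_crit = stats.t.ppf(1 - alpha, df)

if t_stat > t_crit:          # same decision as p_value < alpha
    print(f"t = {t_stat:.2f} > {t_crit:.2f}: reject H0 (fertilizer appears to increase yield)")
else:
    print(f"t = {t_stat:.2f} <= {t_crit:.2f}: fail to reject H0")
```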
In hypothesis testing, there are two types of errors that can occur:
• Type I Error
A Type I error occurs when the null hypothesis is falsely rejected. This means we
conclude that there is a significant difference or effect when, in reality, there is none. It is
also known as a false positive.
Example: A medical test incorrectly indicates that a person has a disease when they
actually do not.
• Type II Error
A Type II error occurs when the null hypothesis is falsely accepted. This means we
fail to conclude that there is a significant difference or effect when, in reality, there is one.
It is also known as a false negative.
Example: A medical test fails to detect a disease that a person actually has.
The relationship between Type I and Type II errors is inverse. Increasing the probability of
detecting a true effect (reducing the Type II error rate) often increases the probability of falsely
rejecting a true null hypothesis (increasing the Type I error rate).
• Significance level (α): A lower α reduces the probability of a Type I error but increases
the probability of a Type II error.
• Sample size: A larger sample size reduces the probability of both Type I and Type II
errors.
• Effect size: A larger effect size (difference between the null and alternative hypotheses)
makes it easier to detect a true effect, reducing the probability of a Type II error.
• Power of the test: The power of a test is the probability of correctly rejecting a false
null hypothesis. A higher power reduces the probability of a Type II error.
Choosing the appropriate balance between Type I and Type II errors depends on the
specific context and the consequences of each type of error. For example, in medical
testing, a Type I error (false positive) might lead to unnecessary treatments, while a Type
II error (false negative) might miss a serious disease.
A two-tailed test rejects the null hypothesis if, say, the sample mean is significantly higher or
lower than the hypothesised value of the mean of the population. Such a test is appropriate
when the null hypothesis is some specified value and the alternative hypothesis is a value not
equal to the specified value of the null hypothesis.
Figure 4: Two Tailed Test
8.2 FLOW DIAGRAM FOR HYPOTHESIS TESTING
The above stated general procedure for hypothesis testing can also be depicted in the form of a flowchart for better understanding, as shown in Fig. 5.
The chi-square test is a statistical measure used in the context of sampling analysis for comparing a variance to a theoretical variance. It is also a statistical test commonly used to analyze categorical data, designed to determine whether there is a significant difference between observed and expected frequencies in one or more categories.
8.3 Types of Chi-Square Tests
• Chi-Square Test of Independence: This test assesses whether two categorical variables are independent of each other. For instance, it can evaluate if there is a relationship between gender and voting preference.
• Chi-Square Goodness-of-Fit Test: This test assesses whether the observed frequency distribution of a single categorical variable matches an expected (theoretical) distribution.
While useful, chi-square tests have limitations: They require large sample sizes for validity.
They cannot be used with small expected frequencies; alternative methods like Fisher’s Exact
Test may be more appropriate in such cases. The results can be sensitive to sample size; even
trivial associations may appear significant with large samples. In summary, the chi-square
test is an essential tool for analyzing categorical data, allowing researchers to draw conclusions
about relationships and distributions within their datasets while adhering to specific statistical
assumptions and methodologies.
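As an illustration of the test of independence described above, the following sketch applies scipy's chi2_contingency to a small, entirely hypothetical contingency table of gender versus voting preference.

```python
from scipy.stats import chi2_contingency

# Hypothetical contingency table: rows = gender, columns = voting preference
observed = [[45, 30, 25],    # e.g., male:   party A, party B, party C
            [35, 40, 25]]    # e.g., female: party A, party B, party C

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.2f}, dof = {dof}, p-value = {p_value:.3f}")

# Decision at the 5% level of significance
if p_value < 0.05:
    print("Reject H0: the two variables appear to be associated")
else:
    print("Fail to reject H0: no significant association detected")
```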
t-Test
• A t-test evaluates whether the means of two samples are statistically different from each other. It uses the t-statistic, which follows a t-distribution under the null hypothesis that assumes no difference between the group means. The test is applicable when the data are normally distributed and when sample sizes are typically less than 30.
• One-Sample T-Test: Compares the mean of a single sample to a known value (e.g.,
population mean). Example: Testing if the average height of students in a class is greater
than a specified height.
• Paired Sample T-Test: Compares means from the same group at different times or
under different conditions. Example: Measuring blood pressure before and after treatment
in the same patients.
• Test statistic: The one-sample t-statistic is computed as t = (x̄ − µ) / (s / √n), where x̄ = sample mean, µ = hypothesised population mean, s = sample standard deviation, and n = sample size.
• Shape: The t-distribution is symmetric about zero and has heavier tails compared to
the normal distribution. This reflects greater uncertainty in estimates derived from small
samples.
• Degrees of Freedom (DF): The shape of the t-distribution changes based on the degrees
of freedom, which is typically calculated as n-1 for a one-sample t-test, where n is the
sample size. As DF increases, the t-distribution approaches a normal distribution.
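The two properties above can be checked numerically. The sketch below (illustrative only) evaluates the t-distribution density at a few points for increasing degrees of freedom and compares it with the normal density.

```python
import numpy as np
from scipy.stats import t, norm

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
for df in (2, 10, 30, 100):
    # Heavier tails for small df; the values approach the normal PDF as df grows
    print(df, np.round(t.pdf(x, df), 4))
print("normal:", np.round(norm.pdf(x), 4))
```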
F-test
The F-test is a statistical method used to compare the variances of two populations or
samples to determine if they are significantly different from each other. It is particularly useful
in various applications, including analysis of variance (ANOVA) and regression analysis.
8.8 Key Features of the F-Test
8.8.1 Purpose:
• The F-test assesses whether the variances of two populations are equal, which is a crucial
assumption in many statistical analyses.
8.8.2 Hypotheses
• Null Hypothesis (H0): The variances of the two populations are equal (i.e., σ1² = σ2²).
• Alternative Hypothesis (H1): The variances of the two populations are not equal (i.e., σ1² ≠ σ2² for a two-tailed test).
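A minimal sketch of an F-test for the equality of two variances is shown below. The samples are hypothetical, and since scipy does not provide a ready-made two-sample variance F-test, the variance ratio and its p-value are formed by hand from the F distribution.

```python
import numpy as np
from scipy.stats import f

# Hypothetical samples from two populations
sample1 = np.array([12.1, 11.8, 12.5, 13.0, 12.2, 11.9, 12.7])
sample2 = np.array([11.5, 13.4, 10.9, 13.8, 12.0, 14.1, 11.2])

# Sample variances (ddof=1 gives the unbiased estimator s^2)
s1_sq, s2_sq = np.var(sample1, ddof=1), np.var(sample2, ddof=1)
F = s1_sq / s2_sq
df1, df2 = len(sample1) - 1, len(sample2) - 1

# Two-tailed p-value for H0: sigma1^2 = sigma2^2
p_one_sided = f.sf(F, df1, df2) if F > 1 else f.cdf(F, df1, df2)
p_value = 2 * p_one_sided
print(f"F = {F:.3f}, p-value = {p_value:.3f}")
```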
Unit - IV
Integration: Trapezoidal, Simpson, Weddle’s and Gaussian Quadrature methods;
Differentiation: Numerical derivative; Root Finding: Bisection and Newton-Raphson Method;
Differential Equation: (1st and 2nd order): Euler’s method, Runge-Kutta Method (4th order
algorithm), Least square fitting of a set of points to a straight line, Quadratic equation.
9 Integration
9.1 Trapezoidal rule
The trapezoidal rule, also known as the trapezium rule, is a method used in mathematics to
approximate the definite integral in numerical analysis. By splitting a curve into tiny trapezoids,
the trapezoidal rule is an integration technique that determines the area under a curve. The
area under the curve can be found by adding up all of the little trapezoids’ areas. In other words,
it is an integration rule used to calculate the area under a curve by dividing the curve into small
trapezoids.
By approximating the region under the graph of the function f(x) as a trapezoid and computing its area, the trapezoidal rule is used to evaluate the definite integral of the form ∫_a^b f(x) dx. Instead of using rectangles to calculate the area under a curve, the trapezoidal rule divides the overall area into little trapezoids.
This rule is used for approximating the definite integrals where it uses the linear approximations
of the functions. The trapezoidal rule takes the average of the left and the right sum.
Let y = f(x) be continuous on [a, b]. We divide the interval [a, b] into n equal subintervals,
each of width, h = (b - a)/n,
such that a = x0 < x1 < x2 < .... < xn = b
Area = (h/2) [y0 + 2(y1 + y2 + y3 + ..... + yn−1) + yn]
where y0, y1, y2, .... are the values of the function at x = x0, x1, x2, .... respectively, as shown in Fig. 7.
9.1.2 Derivation
1. Define the Integral: Consider a continuous function f(x) defined on the interval [a,b].
We want to approximate the integral:
I = ∫_a^b f(x) dx
2. Divide the Interval: Divide the interval [a,b] into n equal subintervals. The width of
each subinterval is given by:
h = (b − a)/n
Figure 7: Trapezoidal rule
3. Approximate Each Subinterval: The area under f(x) from xi to xi+1 can be approximated by the area of a trapezoid:
Ai = (h/2) [f(xi) + f(xi+1)]
4. Sum Up All Trapezoids: To find the total area under the curve from a to b, sum up all the trapezoidal areas:
I ≈ Σ_{i=0}^{n−1} Ai = Σ_{i=0}^{n−1} (h/2) [f(xi) + f(xi+1)]
5. Simplify: Each interior ordinate f(xi), for i = 1, ...., n−1, appears in two neighbouring trapezoids, so the sum collapses to
I ≈ (h/2) [f(x0) + 2 Σ_{i=1}^{n−1} f(xi) + f(xn)]
6. Final Formula: Thus, writing h = (b − a)/n, the final formula for the Trapezoidal Rule becomes:
I = ((b − a)/(2n)) [f(a) + 2 Σ_{i=1}^{n−1} f(a + ih) + f(b)]
The Trapezoidal Rule is a straightforward yet powerful technique for numerical integration,
providing a way to estimate areas under curves by approximating them with trapezoids.
Its effectiveness depends on how well the linear segments fit the actual curve of f(x). For
smoother functions, this method yields reasonably accurate results, especially as n, the
number of subdivisions, increases.
9.1.3 Examples
1. Using the Trapezoidal Rule formula, find the area under the curve y = x² between x = 0 and x = 4 using a step size of 1. Calculate the same using a step size of 0.5 and find the percentage difference in area between these two cases.
Solution:
Given: y = x2
h=1
Find the values of ’y’ for different values of ’x’ by putting the value of ’x’ in the equation
y = x2
x        0    1    2    3    4
y = x²   0    1    4    9    16
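The example can be completed numerically. The sketch below (the helper name trapezoidal is an illustrative choice) implements the composite trapezoidal formula derived above and evaluates the area for both step sizes.

```python
import numpy as np

def trapezoidal(f, a, b, h):
    """Composite trapezoidal rule: (h/2)[y0 + 2(y1 + ... + y_{n-1}) + yn]."""
    n = int(round((b - a) / h))
    x = a + h * np.arange(n + 1)
    y = f(x)
    return (h / 2.0) * (y[0] + 2.0 * np.sum(y[1:-1]) + y[-1])

f = lambda x: x ** 2
area_h1 = trapezoidal(f, 0.0, 4.0, 1.0)    # step size 1
area_h05 = trapezoidal(f, 0.0, 4.0, 0.5)   # step size 0.5
percent_diff = 100.0 * abs(area_h1 - area_h05) / area_h1
print(area_h1, area_h05, f"{percent_diff:.2f}%")
```

With h = 1 the rule gives 22.0 and with h = 0.5 it gives 21.5, a difference of roughly 2.3%; for comparison, the exact value of the integral is 64/3 ≈ 21.33.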
9.2 Simpson’s 1/3 Rule
To derive Simpson’s 1/3 Rule, we will explore how this numerical integration method approximates the area under a curve by fitting a quadratic polynomial to the function over a specified interval. This rule is particularly effective for functions that can be well-approximated by parabolas.
Simpson’s 1/3 rule assumes 3 equispaced data/interpolation/integration points (the endpoints f0 and f2, and the midpoint f1).
• The integration rule is based on approximating using Lagrange quadratic (second degree)
interpolation.
• The sub-interval is defined as [x0, x2], and the integration-point-to-integration-point spacing equals h = (x2 − x0)/2.
I = ∫_{x0}^{x2} f(x) dx ≈ ∫_{x0}^{x2} g(x) dx
where g(x) is the quadratic Lagrange interpolant through the points (x0, f0), (x1, f1) and (x2, f2). Taking x0 = 0 (so that x2 = 2h) and writing the interpolant in terms of the Lagrange basis polynomials L0(x), L1(x), L2(x),
I = ∫_0^{2h} { f0 L0(x) + f1 L1(x) + f2 L2(x) } dx
Carrying out the integration gives
I = (h/3) [f0 + 4 f1 + f2]
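A direct translation of this result into Python (the function name simpson_13 is an illustrative choice) is sketched below; since the rule is exact for quadratics, applying it to x² on [0, 4] reproduces the exact value 64/3.

```python
def simpson_13(f, x0, x2):
    """Simpson's 1/3 rule on one sub-interval [x0, x2]: I = (h/3)[f0 + 4*f1 + f2]."""
    h = (x2 - x0) / 2.0
    x1 = x0 + h                       # midpoint
    return (h / 3.0) * (f(x0) + 4.0 * f(x1) + f(x2))

# Example: integral of x^2 on [0, 4]; result is 64/3 ≈ 21.333
print(simpson_13(lambda x: x ** 2, 0.0, 4.0))
```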
9.3 Simpson’s 3/8 Rule
Simpson’s 3/8 Rule is a numerical integration method used to approximate the value of a definite integral. It is based on cubic interpolation and is generally more accurate than the simpler Simpson’s 1/3 Rule, especially for functions that can be well-approximated by cubic polynomials.
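For comparison, a minimal sketch of Simpson’s 3/8 rule over a single block of three equal sub-intervals, using the standard formula I = (3h/8)[f0 + 3f1 + 3f2 + f3] (stated here from standard references, since the document does not reproduce it), is:

```python
def simpson_38(f, x0, x3):
    """Simpson's 3/8 rule on [x0, x3] with three equal sub-intervals."""
    h = (x3 - x0) / 3.0
    x1, x2 = x0 + h, x0 + 2.0 * h
    return (3.0 * h / 8.0) * (f(x0) + 3.0 * f(x1) + 3.0 * f(x2) + f(x3))

# Example: integral of x^3 on [0, 2]; the rule is exact for cubics, giving 4.0
print(simpson_38(lambda x: x ** 3, 0.0, 2.0))
```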