
Research Methodology (MSc, 3rd Semester)

Dr. Kishora Nayak


Assistant Professor (Stage-I)
Department of Physics, Panchayat College, SU, Bargarh 768028, Odisha, India

26th June, 2024

Unit-I: Application of statistical concepts / procedures, data, diagrammatic representation of data, probability, Measure of central tendency, Measures of dispersion, Skewness and kurtosis. Normal distribution: Simple correlation, multiple correlation, regression analysis.

1 Introduction
The ....................

2 Application of statistical concepts and representation of data


Statistical concepts are applied in a wide range of fields, from business and economics to medicine, engineering, and social sciences. Here are a few examples of how statistical concepts are applied in various domains. In the context of statistical procedures, representation of data refers to the various ways in which data is organized, displayed, and communicated to help understand patterns, relationships, and key insights. The choice of how data is represented depends on the nature of the data and the statistical method being applied. Here is how data can be represented in different contexts.

1. Descriptive Statistics

• Application: Summarizing Data


– Descriptive statistics, such as mean, median, mode, variance, and standard de-
viation, are used to summarize and describe the main features of a dataset. The
mean (average) and standard deviation are used to represent the central ten-
dency and variability of a dataset. These values are typically presented as simple
numbers.
– Example: A company might use descriptive statistics to understand customer
satisfaction ratings by calculating the average score and identifying the most
common ratings.
2. Inferential Statistics

• Application: Drawing Conclusions from Samples


– Inferential statistics allows researchers to make predictions or inferences about a
population based on a sample of data.
– Example: In clinical trials, inferential statistics are used to determine whether
a new drug is more effective than a placebo based on a sample of patients.

3. Correlation

• Application: Understanding Relationships


– Correlation measures the strength and direction of the relationship between two
variables. The correlation can be represented via Scatter Plots and Line of Best
Fit.
– Example: A marketing firm might examine the correlation between advertising
spending and sales to assess how closely the two variables are related.

4. Regression Analysis

• Application: Predictive Modeling


– Regression analysis estimates the relationships between variables and is often
used to make predictions. A line of best fit (regression line) can be added to
show the predicted relationship.
– Example: A real estate company could use regression analysis to predict house
prices based on variables like location, size, and age of the property.

5. Time Series Analysis

• Application: Forecasting Future Events


– Time series analysis analyzes data points collected or recorded at specific time
intervals to forecast future trends. Time series data is typically represented using
line graphs, where the x-axis represents time and the y-axis represents the variable
being measured.
– Example: Economists use time series analysis to forecast future economic indi-
cators like unemployment rates, inflation, or stock prices.

6. ANOVA (Analysis of Variance)

• Application: Comparing Multiple Groups


– ANOVA is used to compare the means of three or more groups to determine if
they are statistically different. ANOVA results are often represented with box
plots or bar charts, showing the differences in the means of multiple groups.
– Example: In agricultural studies, ANOVA might be used to test whether different
fertilizer treatments result in significantly different crop yields.

7. Chi-Square Test

• Application: Testing Relationships between Categorical Variables


– The Chi-square test is used to determine if there is an association between two
categorical variables. In a chi-square test, data is typically represented in a con-
tingency table, which shows the frequency distribution of two categorical vari-
ables.
– Example: A sociologist might use a chi-square test to investigate whether edu-
cation level is related to voting preference.

Each of these statistical procedures helps in making informed decisions based on data.
They are widely used in various domains such as healthcare, finance, social sciences, and
marketing to understand relationships, test hypotheses, and make predictions.

• Common Forms of Data Representation:


– Tables: Organizing raw data or summary statistics in rows and columns for easy
comparison.
– Charts/Graphs: Visual tools like bar charts, pie charts, histograms, and line
graphs to show patterns or trends.
– Box Plots: Display the spread of data points, showing medians, quartiles, and
outliers.
– Heatmaps: Use color to represent the magnitude of values, often used in large
datasets.

The representation of data makes complex statistical results easier to interpret, communi-
cate, and apply to decision-making processes. It helps transform raw data into meaningful
information through visual and numerical formats.
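To make the descriptive-statistics application above concrete, the following minimal Python sketch summarizes a set of customer-satisfaction ratings; the ratings themselves are hypothetical and only the standard-library statistics module is used.

import statistics

# Hypothetical customer-satisfaction ratings on a 1-5 scale (illustrative only)
ratings = [4, 5, 3, 4, 4, 2, 5, 4, 3, 4]

print("Mean   :", statistics.mean(ratings))      # central tendency (average score)
print("Median :", statistics.median(ratings))    # middle value of the ordered data
print("Mode   :", statistics.mode(ratings))      # most common rating
print("Std dev:", statistics.pstdev(ratings))    # population standard deviation (spread)

Such numerical summaries are usually reported alongside one of the visual forms listed above, for example a bar chart of the rating frequencies.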

3 Normal Distribution
The normal distribution is a bell-shaped, symmetric probability distribution that is widely used
in statistics. It is also known as the Gaussian distribution. The key properties of a normal
distribution are:

• Mean (µ): The peak of the bell curve occurs at the mean. The data is symmetrically
distributed around the mean, which means that most of the values cluster around it.

• Standard deviation (σ): This determines the spread or width of the bell curve. A
smaller standard deviation results in a narrow curve, while a larger standard deviation
spreads the curve out.
The probability density function (PDF) of a normal distribution is given by:

f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[\frac{-(x - \mu)^2}{2\sigma^2}\right]    (1)
Where:

• µ is the mean

• σ is the standard deviation

Figure 1: Gaussian Distribution
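As a minimal sketch (assuming NumPy is available), the probability density function of Eq. (1) can be evaluated numerically; the check below also confirms that the total area under the curve is close to 1, as required by the properties listed in the next subsection.

import numpy as np

def normal_pdf(x, mu=0.0, sigma=1.0):
    # Probability density function of the normal distribution, Eq. (1)
    return np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

x = np.linspace(-10.0, 10.0, 2001)        # grid wide enough to capture nearly all the area
area = np.trapz(normal_pdf(x), x)         # numerical area under the curve

print("Peak value f(mu):", normal_pdf(0.0))   # about 0.3989 for mu = 0, sigma = 1
print("Total area      :", area)              # about 1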

3.1 Normal Distribution Properties

Some of the important properties of the normal distribution are listed below:

• In a normal distribution, Mean = Median = Mode.

• The total area under the curve should be equal to 1.

• The normally distributed curve should be symmetric at the centre.

• Exactly half of the values are to the right of the centre and exactly half of the values are to the left of the centre.

• The normal distribution should be defined by the mean and standard deviation.

• The normal distribution curve must have only one peak. (i.e., Unimodal)

• The curve approaches the x-axis but never touches it, even as it extends farther and farther away from the mean.
4 Measure of Relationship
There are populations having one variable (univariate); similarly, there are bivariate and multivariate populations, depending on the number of variables under study. A scatter plot is a useful graphical representation for getting some approximate idea of the relationship between two variables.

5 Covariance
In a bivariate population, where the two variables are represented by X and Y and the paired observations are represented by (x_1, y_1), (x_2, y_2), ..., (x_n, y_n), the covariance between X and Y is

Cov(X, Y) = \sigma_{XY} = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})    (2)

5.1 Properties of Covariance

• Cov(X+a, Y+b) = Cov(X, Y), where a, b are any constants.

• Cov(aX, bY) = abCov(X, Y)

• Covariance can take any value from −∞ to +∞. Negative values indicate a negative relationship and positive values indicate a positive relationship. A zero covariance indicates no relationship or some non-linear relationship. The sketch below illustrates the first two properties.
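A minimal sketch (assuming NumPy, with invented paired observations) that evaluates Eq. (2) directly and illustrates the properties listed above.

import numpy as np

def covariance(x, y):
    # Population covariance, Eq. (2): mean of the products of deviations from the means
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.mean((x - x.mean()) * (y - y.mean()))

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])   # hypothetical paired observations
y = np.array([1.0, 3.0, 2.0, 5.0, 6.0])

print("Cov(X, Y)    :", covariance(x, y))
print("Cov(X+3, Y+7):", covariance(x + 3, y + 7))   # unchanged by adding constants
print("Cov(2X, 3Y)  :", covariance(2 * x, 3 * y))   # scaled by the factor 2 * 3 = 6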

6 Simple Correlation or Karl Pearson coefficient of correlation


Simple correlation (also called Karl Pearson correlation coefficient or linear correlation coeffi-
cient) measures the strength and direction of a linear relationship between two variables (X and
Y). It is denoted by r and ranges from -1 to +1.

• r = +1: Perfect positive linear relationship (as one variable increases, the other increases
proportionally).

• r = -1: Perfect negative linear relationship (as one variable increases, the other decreases
proportionally).

• r = 0: No linear relationship between the variables.

The formula for the Pearson correlation coefficient is:

r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}} = \frac{Cov(X, Y)}{STDEV(X)\cdot STDEV(Y)} = \frac{\sigma_{XY}}{\sigma_X\,\sigma_Y}    (3)
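The following sketch (assuming NumPy; the advertising and sales figures are hypothetical, echoing the marketing example earlier) computes r from Eq. (3) and cross-checks it against np.corrcoef.

import numpy as np

def pearson_r(x, y):
    # Karl Pearson correlation coefficient, Eq. (3)
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

ad_spend = np.array([10.0, 20.0, 30.0, 40.0, 50.0])   # hypothetical advertising spending
sales    = np.array([15.0, 25.0, 34.0, 41.0, 58.0])   # hypothetical sales figures

print("r from Eq. (3) :", pearson_r(ad_spend, sales))
print("r (np.corrcoef):", np.corrcoef(ad_spend, sales)[0, 1])   # should agree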
Unit - II

Sampling: simple random sampling, stratified random sampling, systematic sampling. Testing of hypotheses: χ² (Chi-square), t and F-tests; Analysis of Variance; Covariance; Principal component analysis. Experimental design: Completely randomized block design, Randomized block design, Latin square design.

7 What is a hypothesis?
A hypothesis is a testable statement that provides an explanation for observed phenomena or
predicts the outcome of an experiment. It is often framed as an ”if-then” statement, indicating
the expected relationship between variables. For instance, ”If the temperature increases, then the rate of photosynthesis will increase” illustrates a cause-and-effect relationship.
Scientific Definition: A scientific hypothesis is defined as a tentative, testable statement
that explains a phenomenon in the natural world. It is formulated based on prior knowledge,
observations, and existing theories. Importantly, a hypothesis must be structured in such a way
that it can be supported or refuted through experimentation or observation, embodying the
principles of testability and falsifiability.

7.1 Characteristics of a Hypothesis

7.1.1 Testability

A hypothesis must be testable, meaning it should be possible to demonstrate its validity through
empirical evidence. This characteristic ensures that the hypothesis can be supported or refuted
based on observation or experimentation.

7.1.2 Clarity and Precision

The language used in a hypothesis must be clear and precise. Ambiguities can lead to misinter-
pretation and unreliable conclusions. A well-formulated hypothesis articulates the relationship
between variables in straightforward terms, making it comprehensible for researchers and stake-
holders alike.

7.1.3 Specificity

A good hypothesis should be specific, focusing on particular variables and their relationships.
This specificity allows researchers to design targeted experiments and collect relevant data,
enhancing the reliability of the results.

7.1.4 Falsifiability

A hypothesis must be falsifiable, meaning there should be a possibility to prove it wrong. This
characteristic is essential for scientific inquiry, as it allows for the testing of predictions against
actual observations.

7.1.5 Simplicity

While a hypothesis should be simple, this simplicity does not diminish its significance or com-
plexity in terms of implications. A straightforward hypothesis facilitates understanding and
communication among researchers.

7.1.6 Empirical Basis

Hypotheses should be grounded in existing knowledge or observations. They often arise from
previous studies, theories, or general patterns observed in data, ensuring that they are relevant
and informed by prior research.

7.1.7 Scope for Further Testing

An effective hypothesis should allow for additional testing and exploration. It should not only
address the current research question but also open avenues for future investigations.

7.1.8 Relationship Between Variables

If the hypothesis is relational, it must clearly state the expected relationship between indepen-
dent and dependent variables. For instance, it might predict how changes in one variable affect
another.

7.1.9 Directionality (if applicable)

Some hypotheses may specify the direction of the relationship (directional hypotheses), while
others may simply indicate that a relationship exists without specifying its nature (non-directional
hypotheses). This characteristic helps clarify the research focus.

7.1.10 Consistency with Established Facts

A good hypothesis should align with known facts and established scientific principles. It should
not contradict existing laws of nature or widely accepted theories unless there is strong evidence
to suggest otherwise.
Conclusion: A well-constructed hypothesis is essential for guiding scientific research effectively. Its characteristics (testability, clarity, specificity, falsifiability, simplicity, empirical grounding, scope for further testing, relational clarity, directionality, and consistency with established facts) collectively contribute to its reliability and utility in advancing knowledge through systematic investigation.
8 Testing of Hypotheses
Hypothesis testing is a fundamental statistical method used to evaluate assumptions about
population parameters based on sample data. This process involves several critical steps and
concepts that guide researchers in determining the validity of their hypotheses.
Hypothesis testing is a key component of statistical inference, which allows researchers to
draw conclusions about populations based on sample data. It relies on probability theory to
quantify the uncertainty associated with sampling.
Hypothesis testing involves making an initial assumption about a population parameter (the
null hypothesis) and then using sample data to assess the validity of this assumption against an
alternative hypothesis.

8.1 Basic concepts concerning testing of hypothesis

8.1.1 Null hypothesis (H0 ) and alternative hypothesis (Ha ):

If we are to compare method A with method B about its superiority and if we proceed on the
assumption that both methods are equally good, then this assumption is termed as the null
hypothesis. That means there is no effect or difference (e.g., the population mean equals a
specific value).
Suppose we want to test the hypothesis that the population mean (µ) is equal to the hypoth-
esised mean (µH0 ) = 100. Then we would say that the null hypothesis is that the population
mean is equal to the hypothesised mean and expressed as :

H0 : µ = µH0 = 100

However, if we think that method A is superior or that method B is inferior, we are then stating what is termed the alternative hypothesis. This indicates that there is an effect or difference (e.g., the population mean does not equal that value). If our sample results do not support the null hypothesis, we should conclude that something else is true. What we conclude on rejecting the null hypothesis is known as the alternative hypothesis. In other words, the set of alternatives to the null hypothesis is referred to as the alternative hypothesis. If we accept H0, then we are rejecting Ha, and if we reject H0, then we are accepting Ha. For H0: µ = µH0 = 100, we may consider three possible alternative hypotheses, as in Table 2.
[Note: If a hypothesis is of the type µ = µH0, then we call such a hypothesis a simple (or specific) hypothesis, but if it is of the type µ ≠ µH0 or µ > µH0 or µ < µH0, then we call it a composite (or nonspecific) hypothesis.]
The null hypothesis and the alternative hypothesis are chosen before the sample is drawn
(the researcher must avoid the error of deriving hypotheses from the data that he collects and
then testing the hypotheses from the same data). In the choice of null hypothesis, the following
considerations are usually kept in view:
Figure 2: Table-2

(a) The alternative hypothesis is usually the one which one wishes to prove, and the null hypothesis the one which one wishes to disprove. Thus, a null hypothesis represents the hypothesis we are trying to reject, and the alternative hypothesis represents all other possibilities.
(b) If the rejection of a certain hypothesis when it is actually true involves great risk, it is taken as the null hypothesis because then the probability of rejecting it when it is true is α (the level of significance), which is chosen very small.
(c) The null hypothesis should always be a specific hypothesis, i.e., it should not state that a parameter is about or approximately equal to a certain value.
Generally, in hypothesis testing we proceed on the basis of null hypothesis, keeping the
alternative hypothesis in view. Why so? The answer is that on the assumption that null
hypothesis is true, one can assign the probabilities to different possible sample results, but this
cannot be done if we proceed with the alternative hypothesis. Hence the use of null hypothesis
(at times also known as statistical hypothesis) is quite frequent.

8.1.2 The level of significance

The level of significance, often denoted as alpha (α), is a critical concept in hypothesis testing. It represents the threshold for deciding whether to reject the null hypothesis. In other words, it is the probability that we make the error of rejecting the null hypothesis (H0) when the null hypothesis is true.
The level of significance is set specifically to increase or decrease the probability of rejecting H0 in error:
Small α → Less likely to incorrectly reject H0
Large α → More likely to incorrectly reject H0
It is always some percentage (usually 5%) which should be chosen with great care, thought and reason. In case we take the significance level at 5 per cent, this implies that H0 will be rejected when the sample result has a probability of less than 0.05 of occurring if H0 is true. In other words, the 5 per cent level of significance means that the researcher is willing to take as much as a 5 per cent risk of rejecting the null hypothesis when it (H0) happens to be true.

Figure 3: The level of significance

8.1.3 Decision rule or test of hypothesis

A decision rule is a set of criteria used to determine whether to reject or fail to reject the null
hypothesis based on the test statistic. It typically involves comparing the test statistic to a
critical value.
Critical value: This is a value determined from the distribution of the test statistic under
the null hypothesis. If the test statistic is more extreme than the critical value, we reject the
null hypothesis.
Example: Suppose we want to test if a new fertilizer increases crop yields. We can set up
the following hypotheses:
H0 : The new fertilizer has no effect on crop yields.
Ha : The new fertilizer increases crop yields.
We collect data on crop yields from plots treated with the new fertilizer and plots treated
with a control fertilizer. We can use a t-test to determine if the difference in mean yields is
statistically significant.
The decision rule might be:
If the t-statistic is greater than the critical value at a significance level of 0.05, reject the
null hypothesis. Otherwise, fail to reject the null hypothesis.
If we reject the null hypothesis, we can conclude that there is evidence to support the claim
that the new fertilizer increases crop yields.
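A hedged sketch of the fertilizer example is given below; it assumes a reasonably recent SciPy (for stats.ttest_ind with the alternative argument) and uses purely hypothetical yield figures. Comparing the p-value with α is equivalent to comparing the t-statistic with the critical value in the decision rule above.

from scipy import stats

# Hypothetical crop yields per plot - illustrative numbers only
new_fertilizer = [52.1, 55.3, 54.0, 56.8, 53.5, 57.2, 55.9, 54.7]
control        = [50.2, 51.8, 49.5, 52.0, 50.9, 51.3, 49.8, 50.6]

# One-sided independent two-sample t-test: Ha says the new fertilizer increases yield
t_stat, p_value = stats.ttest_ind(new_fertilizer, control, alternative="greater")

alpha = 0.05
print("t-statistic:", round(t_stat, 3), " p-value:", round(p_value, 4))
if p_value < alpha:
    print("Reject H0: evidence that the new fertilizer increases crop yields")
else:
    print("Fail to reject H0: no significant evidence of an increase")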
In summary, decision rules and hypothesis testing are essential tools for making informed de-
cisions based on data. By carefully formulating hypotheses and applying appropriate statistical
tests, we can draw meaningful conclusions about populations.

8.1.4 Type I and Type II errors

In hypothesis testing, there are two types of errors that can occur:
• Type I Error
A Type I error occurs when the null hypothesis is falsely rejected. This means we
conclude that there is a significant difference or effect when, in reality, there is none. It is
also known as a false positive.
Example: A medical test incorrectly indicates that a person has a disease when they
actually do not.

• Type II Error
A Type II error occurs when the null hypothesis is falsely accepted. This means we
fail to conclude that there is a significant difference or effect when, in reality, there is one.
It is also known as a false negative.

Example: A medical test fails to detect a disease that a person actually has.

The relationship between Type I and Type II errors is inverse. Increasing the probability of
detecting a true effect (reducing the Type II error rate) often increases the probability of falsely
rejecting a true null hypothesis (increasing the Type I error rate).

Factors affecting Type I and Type II error rates:

• Significance level (α): A lower α reduces the probability of a Type I error but increases
the probability of a Type II error.

• Sample size: A larger sample size reduces the probability of both Type I and Type II
errors.

• Effect size: A larger effect size (difference between the null and alternative hypotheses)
makes it easier to detect a true effect, reducing the probability of a Type II error.

• Power of the test: The power of a test is the probability of correctly rejecting a false
null hypothesis. A higher power reduces the probability of a Type II error.
Choosing the appropriate balance between Type I and Type II errors depends on the
specific context and the consequences of each type of error. For example, in medical
testing, a Type I error (false positive) might lead to unnecessary treatments, while a Type
II error (false negative) might miss a serious disease.

8.1.5 Two-tailed (and One-tailed tests):

A two-tailed test rejects the null hypothesis if, say, the sample mean is significantly higher or
lower than the hypothesised value of the mean of the population. Such a test is appropriate
when the null hypothesis is some specified value and the alternative hypothesis is a value not
equal to the specified value of the null hypothesis.
Figure 4: Two Tailed Test
8.2 FLOW DIAGRAM FOR HYPOTHESIS TESTING

The above-stated general procedure for hypothesis testing can also be depicted in the form of a flow chart for better understanding, as shown in Fig. 5.

Chi-Square (χ²) Test

The chi-square test is a statistical measure used in the context of sampling analysis for comparing a variance to a theoretical variance. It is also commonly used to analyze categorical data: it is designed to determine if there is a significant difference between observed and expected frequencies in one or more categories.
8.3 Types of Chi-Square Tests

• Chi-Square Test of Independence: This test assesses whether two categorical variables
are independent of each other. For instance, it can evaluate if there is a relationship
between gender and voting preference.

• Chi-Square Test of Goodness of Fit: This test determines if the distribution of a


categorical variable matches an expected distribution. For example, it can be used to
test if a die is fair by comparing the observed frequency of each face with the expected
frequency.

8.3.1 Assumptions of the Chi-Square Test

– Independence: The observations must be independent of each other.


– Expected frequencies: No expected frequency should be less than 5.

8.3.2 Formula and Calculation

The chi-square statistic is calculated using the formula:

\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}

where O_i = observed frequency and E_i = expected frequency.
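A minimal sketch of the goodness-of-fit calculation for the fair-die example mentioned earlier; the observed counts are invented for illustration and only plain Python is used.

# Chi-square goodness-of-fit: H0 says all six faces of the die are equally likely
observed = [8, 12, 9, 11, 10, 10]            # hypothetical counts from 60 rolls
expected = [sum(observed) / 6] * 6           # 10 expected rolls per face under H0

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print("chi-square statistic:", chi_sq)       # here 1.0, with df = 6 - 1 = 5

# The tabulated critical value for alpha = 0.05 and 5 degrees of freedom is about 11.07;
# since the statistic is smaller, we fail to reject H0 (the die appears fair).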

8.3.3 When to Use a Chi-Square Test?

A chi-square test is appropriate under certain conditions:

• The data consist of categorical variables.

• The sample is randomly selected.

• Each category has at least five expected observations to ensure reliability.

• The observations are independent.
Figure 5: FLOW DIAGRAM FOR HYPOTHESIS TESTING
8.3.4 Limitations

While useful, chi-square tests have limitations: they require large sample sizes for validity; they cannot be used with small expected frequencies, in which case alternative methods like Fisher's Exact Test may be more appropriate; and the results can be sensitive to sample size, so even trivial associations may appear significant with large samples. In summary, the chi-square test is an essential tool for analyzing categorical data, allowing researchers to draw conclusions about relationships and distributions within their datasets while adhering to specific statistical assumptions and methodologies.

t-Test

• A t-test is a statistical method used to determine if there is a significant difference between


the means of two groups. It is particularly useful in hypothesis testing when the sample
sizes are small and the population variances are unknown. Here’s a detailed overview of
the t-test, including its types, assumptions, and applications.

• A t-test evaluates whether the means of two samples are statistically different from each
other. It uses the t-statistic, which follows a t-distribution under the null hypothesis that
assumes no difference between the group means. The test is applicable when data sets are
normally distributed and when sample sizes are typically less than 30.

• The t-distribution, also known as Student’s t-distribution, is a type of probability distribu-


tion that is symmetric and bell-shaped, similar to the normal distribution but with thicker
tails. This characteristic makes it particularly useful for smaller sample sizes where the
population variance is unknown.

8.4 Types of t-Tests

• One-Sample T-Test: Compares the mean of a single sample to a known value (e.g.,
population mean). Example: Testing if the average height of students in a class is greater
than a specified height.

• Independent Two-Sample T-Test: Compares the means of two independent groups.


Example: Comparing test scores between two different classes.

• Paired Sample T-Test: Compares means from the same group at different times or
under different conditions. Example: Measuring blood pressure before and after treatment
in the same patients.

8.5 Assumptions of T-Tests

• Data should be normally distributed.

• Samples should be independent (for independent t-tests).

• Homogeneity of variance (equal variances) for independent samples.

• The data should be continuous and measured on an interval or ratio scale.

Figure 6: t-distribution

8.6 t-test formula

The t-value is calculated using the formula:

t = \frac{\bar{x} - \mu}{s/\sqrt{n}}
where:

• x̄ = sample mean

• µ = population mean (under the null hypothesis)

• s = sample standard deviation

• n = sample size
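A small sketch (assuming SciPy; the height data are hypothetical) that applies the one-sample formula above and compares the result with scipy.stats.ttest_1samp.

import math
from scipy import stats

heights = [172.0, 168.5, 175.2, 170.1, 169.8, 174.3, 171.6, 173.0]   # hypothetical sample (cm)
mu0 = 170.0                                   # hypothesised population mean

n = len(heights)
xbar = sum(heights) / n
s = math.sqrt(sum((h - xbar) ** 2 for h in heights) / (n - 1))   # sample standard deviation
t_manual = (xbar - mu0) / (s / math.sqrt(n))                     # t = (xbar - mu) / (s / sqrt(n))

t_scipy, p_value = stats.ttest_1samp(heights, mu0)
print("t (manual):", round(t_manual, 4), " t (scipy):", round(t_scipy, 4))
print("two-tailed p-value:", round(p_value, 4), " df =", n - 1)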

8.7 Key Features of the t-Distribution

• Shape: The t-distribution is symmetric about zero and has heavier tails compared to
the normal distribution. This reflects greater uncertainty in estimates derived from small
samples.

• Degrees of Freedom (DF): The shape of the t-distribution changes based on the degrees
of freedom, which is typically calculated as n-1 for a one-sample t-test, where n is the
sample size. As DF increases, the t-distribution approaches a normal distribution.
F-test

The F-test is a statistical method used to compare the variances of two populations or
samples to determine if they are significantly different from each other. It is particularly useful
in various applications, including analysis of variance (ANOVA) and regression analysis.
8.8 Key Features of the F-Test

8.8.1 Purpose:

• The F-test assesses whether the variances of two populations are equal, which is a crucial
assumption in many statistical analyses.

• It can be one-tailed or two-tailed depending on the hypothesis being tested.

8.8.2 Hypotheses

• Null Hypothesis (H0 ): The variances of the two populations are equal (i.e., σ₁² = σ₂²).

• Alternative Hypothesis (H1 ): The variances of the two populations are not equal (i.e., σ₁² ≠ σ₂² for a two-tailed test).
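A hedged sketch of an F-test for equality of variances; it assumes SciPy for the F-distribution, and the two samples are invented for illustration.

import numpy as np
from scipy.stats import f as f_dist

sample1 = np.array([12.1, 14.3, 13.8, 15.2, 12.9, 14.7, 13.5])   # hypothetical data
sample2 = np.array([11.8, 12.2, 12.5, 11.9, 12.4, 12.1, 12.6])

var1 = sample1.var(ddof=1)            # sample variances (n - 1 in the denominator)
var2 = sample2.var(ddof=1)

# Conventionally place the larger variance in the numerator so that F >= 1
F = max(var1, var2) / min(var1, var2)
df1 = df2 = len(sample1) - 1          # both samples have 7 observations here

p_value = 2 * (1 - f_dist.cdf(F, df1, df2))   # two-tailed p-value for H0: equal variances
print("F =", round(F, 3), " p-value ~", round(p_value, 4))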
Unit - IV
Integration: Trapezoidal, Simpson, Weddle’s and Gaussian Quadrature methods;
Differentiation: Numerical derivative; Root Finding: Bisection and Newton-Raphson Method;
Differential Equation: (1st and 2nd order): Euler’s method, Runge-Kutta Method (4th order
algorithm), Least square fitting of a set of points to a straight line, Quadratic equation.

9 Integration
9.1 Trapezoidal rule

The trapezoidal rule, also known as the trapezium rule, is a method used in mathematics to
approximate the definite integral in numerical analysis. By splitting a curve into tiny trapezoids,
the trapezoidal rule is an integration technique that determines the area under a curve. The
area under the curve can be found by adding up all of the little trapezoids' areas. In other words, it is an integration rule used to calculate the area under a curve by dividing the curve into small trapezoids.
By approximating the region under the graph of the function f(x) as a trapezoid and computing its area, the trapezoidal rule is used to solve definite integrals of the form \int_a^b f(x)\,dx. Instead of using rectangles to calculate the area under a curve, the trapezoidal rule divides the overall area into little trapezoids.

9.1.1 Formulation of Trapezoidal rule

This rule is used for approximating the definite integrals where it uses the linear approximations
of the functions. The trapezoidal rule takes the average of the left and the right sum.
Let y = f(x) be continuous on [a, b]. We divide the interval [a, b] into n equal subintervals, each of width h = (b − a)/n, such that a = x_0 < x_1 < x_2 < ... < x_n = b. Then

Area = \frac{h}{2}\left[y_0 + 2(y_1 + y_2 + y_3 + ... + y_{n-1}) + y_n\right]

where y_0, y_1, y_2, ... are the values of the function at x = x_0, x_1, x_2, ..., respectively, as shown in Fig. 7.

9.1.2 Derivation

1. Define the Integral: Consider a continuous function f(x) defined on the interval [a,b].
We want to approximate the integral:
I = \int_a^b f(x)\,dx

2. Divide the Interval: Divide the interval [a, b] into n equal subintervals. The width of each subinterval is given by:

h = \frac{b - a}{n}

Figure 7: Trapezoidal rule

This creates points x_0, x_1, x_2, ..., x_n, where x_i = a + ih (i = 0, 1, 2, ..., n).

3. Approximate Each Subinterval: The area under f(x) from x_i to x_{i+1} can be approximated by the area of a trapezoid:

• The heights of the trapezoid at x_i and x_{i+1} are f(x_i) and f(x_{i+1}), respectively.

• The area of each trapezoid is given by: A_i = \frac{h}{2}\left[f(x_i) + f(x_{i+1})\right]

4. Sum Up All Trapezoids: To find the total area under the curve from a to b, sum up all the trapezoidal areas:

I \approx \sum_{i=0}^{n-1} A_i = \sum_{i=0}^{n-1} \frac{h}{2}\left[f(x_i) + f(x_{i+1})\right]

5. Rearranging the Summation: This can be rewritten as:

I = \frac{h}{2}\left[f(x_0) + 2f(x_1) + 2f(x_2) + ... + 2f(x_{n-1}) + f(x_n)\right]

6. Final Formula: Thus, the final formula for the Trapezoidal Rule becomes:

I = \frac{b - a}{2n}\left[f(a) + 2\sum_{i=1}^{n-1} f(a + ih) + f(b)\right]

The Trapezoidal Rule is a straightforward yet powerful technique for numerical integration,
providing a way to estimate areas under curves by approximating them with trapezoids.
Its effectiveness depends on how well the linear segments fit the actual curve of f(x). For
smoother functions, this method yields reasonably accurate results, especially as n, the
number of subdivisions, increases.

9.1.3 Examples

1. Using the Trapezoidal Rule formula, find the area under the curve y = x² between x = 0 and x = 4 using a step size of 1. Calculate the same using a step size of 0.5 and find the percentage difference in area between these two cases (see the sketch after this example).
Solution:
Given: y = x2
h=1
Find the values of ’y’ for different values of ’x’ by putting the value of ’x’ in the equation
y = x2

x        0   1   2   3   4
y = x²   0   1   4   9   16

Area = (h/2) [y0 + yn + 2 (y1 + y2 + y3 + ..... + yn−1 )]


= (1/2) [0 + 16 + 2 (1 + 4 + 9)]
= 0.5 [16 + 28]
= 22
Answer: Therefore, the area under the curve is 22 sq units.
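A minimal Python sketch of the composite trapezoidal rule; it reproduces the area 22 obtained above for h = 1 and also evaluates h = 0.5, so the percentage difference asked for in the problem can be read off directly.

def trapezoidal(f, a, b, n):
    # Composite trapezoidal rule: (h/2) [y0 + 2(y1 + ... + y_{n-1}) + yn]
    h = (b - a) / n
    total = f(a) + f(b) + 2 * sum(f(a + i * h) for i in range(1, n))
    return h * total / 2

f = lambda x: x ** 2
area_h1  = trapezoidal(f, 0.0, 4.0, 4)    # h = 1   -> 22.0
area_h05 = trapezoidal(f, 0.0, 4.0, 8)    # h = 0.5 -> 21.625

diff_percent = (area_h1 - area_h05) / area_h1 * 100
print(area_h1, area_h05, round(diff_percent, 2))   # roughly a 1.7 % difference
# (for reference, the exact value of the integral is 64/3, about 21.33)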

9.2 Simpson rule

9.2.1 Simpson’s 1/3 rule

To derive Simpson’s 1/3 Rule, we will explore how this numerical integration method approx-
imates the area under a curve by fitting a quadratic polynomial to the function over a
specified interval. This rule is particularly effective for functions that can be well-approximated
by parabolas.
Simpson's 1/3 rule assumes 3 equispaced data/interpolation/integration points (the endpoints f_0 and f_2, and the midpoint f_1).
• The integration rule is based on approximating using Lagrange quadratic (second degree)
interpolation.
• The sub-interval is defined as [x_0, x_2] and the integration-point-to-integration-point spacing equals h = \frac{x_2 - x_0}{2}.

Figure 8: Simpson 1/3 rule


• Lagrange quadratic interpolation over the sub-interval:

g(x) = f_0 V_0(x) + f_1 V_1(x) + f_2 V_2(x)

where (taking x_0 = 0, x_1 = h, x_2 = 2h):

V_0(x) = \frac{(x - x_1)(x - x_2)}{(x_0 - x_1)(x_0 - x_2)} = \frac{x^2 - 3hx + 2h^2}{2h^2}

V_1(x) = \frac{(x - x_0)(x - x_2)}{(x_1 - x_0)(x_1 - x_2)} = \frac{4hx - 2x^2}{2h^2}

V_2(x) = \frac{(x - x_0)(x - x_1)}{(x_2 - x_0)(x_2 - x_1)} = \frac{x^2 - hx}{2h^2}

• The integration rule is obtained by integrating g(x):

I = \int_{x_0}^{x_2} f(x)\,dx \approx \int_{x_0}^{x_2} g(x)\,dx

I = \int_{x_0=0}^{x_2=2h} \left\{f_0 V_0(x) + f_1 V_1(x) + f_2 V_2(x)\right\}\,dx

I = \frac{h}{3}\left[f_0 + 4f_1 + f_2\right]
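A short sketch of the composite Simpson's 1/3 rule (the number of subintervals must be even); applied to the same ∫ x² dx example used for the trapezoidal rule, it gives the exact value 64/3, since the rule is exact for polynomials of degree up to 3.

def simpson_one_third(f, a, b, n):
    # Composite Simpson's 1/3 rule: (h/3)[f0 + 4 f1 + 2 f2 + 4 f3 + ... + fn], n even
    if n % 2 != 0:
        raise ValueError("n must be even for Simpson's 1/3 rule")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return h * total / 3

print(simpson_one_third(lambda x: x ** 2, 0.0, 4.0, 4))   # 21.333..., i.e. the exact 64/3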

9.2.2 Simpson’s 3/8 rule

Simpson’s 3/8 Rule is a numerical integration method used to approximate the value of a
definite integral. It is based on cubic interpolation and is generally more accurate than the
simpler Simpson’s 1/3 Rule, especially for functions that can be well-approximated by cubic
polynomials.
