
CHE 212: Data Modelling and Analysis

Lecture No. 06

Table of Contents

1 Introduction

2 Interpolation
Linear Interpolation
Spline Interpolation
Other Types of Interpolation

3 Regression
Types of Regression
Goodness of Fit
Overfitting and Underfitting

4 Curve Fitting
Introduction
Curve Fitting Examples

Introduction

In this chapter, we study data modeling and analysis.
Topics include interpolation, regression, and curve fitting.
These techniques are essential for estimating unknown values from known data points.

Interpolation

Interpolation estimates unknown values between known data points.
E.g. estimating values from steam tables, pressure data, etc.
Methods include linear, polynomial, spline interpolation, and more.
Extrapolation refers to estimating values outside the known data range (see the sketch below).
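A minimal sketch of the distinction, using the same numpy.interp function as Example 1 below: by default np.interp does not extrapolate but clamps to the boundary values, and its optional left/right arguments control what is returned outside the known range.

import numpy as np

x = np.array([0.0, 1.0, 2.0])
y = np.array([0.0, 10.0, 20.0])

print(np.interp(1.5, x, y))                # 15.0 -- interpolation inside the range
print(np.interp(5.0, x, y))                # 20.0 -- outside the range, clamped to y[-1]
print(np.interp(5.0, x, y, right=np.nan))  # nan  -- flag extrapolation requests explicitly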

Interpolation

Figure: Interpolation vs Extrapolation

Linear Interpolation

Linear interpolation assumes a straight-line relationship between two points.
Formula:

$y = y_1 + \dfrac{y_2 - y_1}{x_2 - x_1}\,(x - x_1)$
Python’s numpy.interp() function performs linear interpolation.

Example 1: Linear Interpolation

Given data points:

x = [0, 2, 4, 6, 8, 10], y = [0, 4, 8, 16, 32, 64]

import numpy as np

x = np.array([0, 2, 4, 6, 8, 10])
y = np.array([0, 4, 8, 16, 32, 64])

x_new = 5
y_new = np.interp(x_new, x, y)

print(f"The estimated value at x = {x_new} is {y_new}.")

Example 1 (Continued): Interpolating at Multiple Values
x_new2 = [1, 3, 5, 7]
y_new2 = np.interp(x_new2, x, y)

for i in range(len(x_new2)):
    print(f"The estimated value at x = {x_new2[i]} is {y_new2[i]}.")

Figure: Linear interpolation at multiple values of x.

Spline Interpolation

Spline interpolation uses a series of polynomial functions to estimate intermediate values between data points.
Cubic splines are commonly used due to their smoothness, offering continuous first and second derivatives.
Unlike linear interpolation, cubic splines provide smooth curves, making them ideal for more complex or curved data relationships.
Ensures continuity of first and second derivatives at each data point.
Minimizes overall curvature across the dataset.

Example 2: Spline Interpolation

Given data points for heat capacity Cp of a substance at different temperatures:

Temperature (°C): [100, 125, 175, 200, 225, 275, 300, 400, 500]
Heat Capacity (J/mol·K): [25.15, 25.9, 26.7, 26.9, 27.3, 28.5, 29.1, 32, 37.0]

Estimate the heat capacity at intermediate temperatures T = [150, 250, 350, 450].
import numpy as np
from scipy.interpolate import CubicSpline

temperature = np.array([100, 125, 175, 200, 225, 275, 300, 400, 500])
heat_capacity = np.array([25.15, 25.9, 26.7, 26.9, 27.3, 28.5, 29.1, 32, 37.0])

cs = CubicSpline(temperature, heat_capacity)
temp_new = np.array([150, 250, 350, 450])
cp_interpolated = cs(temp_new)

for i in range(len(temp_new)):
    print(f"The estimated Cp at T = {temp_new[i]} °C is {cp_interpolated[i]:.2f} J/mol·K.")

Example 2: Plot

Figure: Spline interpolation for heat capacity at different temperatures.


Other Types of Interpolation

Polynomial Interpolation: Uses a single polynomial to pass through all data points; suitable for small datasets but prone to oscillation (Runge's phenomenon).
Nearest-Neighbor Interpolation: Assigns the value of the nearest data point to the interpolated point; simple, but leads to discontinuities.
Barycentric Interpolation: A stable form of polynomial interpolation, preferred for large datasets due to its numerical stability. A short sketch of all three follows.
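As a rough sketch of these alternatives, reusing the data from Example 1 (this assumes scipy is available; interp1d and BarycentricInterpolator are the scipy.interpolate tools used here):

import numpy as np
from scipy.interpolate import interp1d, BarycentricInterpolator

x = np.array([0, 2, 4, 6, 8, 10], dtype=float)
y = np.array([0, 4, 8, 16, 32, 64], dtype=float)

# Polynomial interpolation: one degree-5 polynomial through all six points
poly = np.polyfit(x, y, len(x) - 1)
print(np.polyval(poly, 5.0))

# Nearest-neighbor: piecewise constant, discontinuous between points
nearest = interp1d(x, y, kind='nearest')
print(nearest(5.0))

# Barycentric: numerically stable polynomial interpolation
bary = BarycentricInterpolator(x, y)
print(bary(5.0))

The polynomial and barycentric interpolants agree (they represent the same polynomial); barycentric evaluation is simply the better-conditioned way to compute it as the number of points grows.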

Introduction to Regression

Regression is a statistical technique to identify relationships between dependent and independent variables.
It models the conditional mean of the response variable given certain predictors.
Unlike interpolation, regression assumes noise in the data and provides a model that approximates the relationship between variables.

Regression vs Interpolation

Interpolation: Estimates unknown values between known data points; assumes no noise.
Regression: Models the entire dataset, assuming noise, and finds relationships between variables. A minimal contrast is sketched below.
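A minimal sketch of the contrast, with made-up noisy data (roughly y = 2x): the interpolant reproduces every observed point exactly, while the regression line smooths through the noise.

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.1, 1.9, 4.2, 5.8, 8.1])  # noisy samples of y ≈ 2x

# Interpolation passes through the observed point exactly
print(np.interp(2.0, x, y))  # 4.2, the noisy measurement itself

# Regression returns the fitted trend, not the noisy observation
m, c = np.polyfit(x, y, 1)
print(m * 2.0 + c)           # ≈ 4.0, the underlying linear trend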

Types of Regression

Linear Regression: Models a linear relationship between dependent and independent variables.
Polynomial Regression: Fits a polynomial of degree n to the data.
Multiple Linear Regression: Models the relationship between one dependent variable and multiple independent variables.

Linear Regression

A linear model is given by:

y = mx + c

where:
m is the slope (rate of change of y with respect to x).
c is the y-intercept (value of y when x = 0).
The goal is to find m and c that minimize the difference between predicted and actual values.
Methods: least squares, numpy.polyfit(), scikit-learn, statsmodels (a scikit-learn sketch follows).
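As a minimal sketch of the scikit-learn route, using the reaction-rate data from Example 3 below (note that scikit-learn expects a 2-D feature array):

import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([0, 20, 40, 60, 80, 100]).reshape(-1, 1)  # 2-D feature array
y = np.array([0.5, 2.5, 4.8, 7.0, 9.8, 12.5])

model = LinearRegression().fit(x, y)
print(f"Slope m = {model.coef_[0]:.2f}, intercept c = {model.intercept_:.2f}")

This recovers the same m and c as numpy.polyfit with degree 1, since both solve the same least-squares problem.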

Polynomial Regression

Extends linear regression by fitting a polynomial of degree n:

$y = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$

Useful when data shows curvature or a non-linear relationship.
Methods: numpy.polyfit(), scikit-learn, statsmodels.

Multiple Linear Regression

Models the relationship between multiple independent variables and one dependent variable:

$y = m_1 x_1 + m_2 x_2 + \cdots + m_n x_n + b$

Useful when several variables affect the outcome (a least-squares sketch follows).
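No worked example is given for this case, so here is a minimal sketch with made-up process data (a hypothetical yield against temperature and pressure), solved by ordinary least squares via numpy.linalg.lstsq:

import numpy as np

# Hypothetical data: yield as a function of temperature and pressure
x1 = np.array([300.0, 320.0, 340.0, 360.0, 380.0])  # temperature (K)
x2 = np.array([1.0, 2.0, 1.5, 3.0, 2.5])            # pressure (bar)
y  = np.array([10.2, 12.5, 13.1, 16.4, 16.8])       # yield (%)

# Design matrix [x1, x2, 1]; the last coefficient is the intercept b
X = np.column_stack([x1, x2, np.ones_like(x1)])
(m1, m2, b), *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"y = {m1:.3f}*x1 + {m2:.3f}*x2 + {b:.3f}")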

Types of Regression Models

Figure: Different types of regression models: (left) linear regression, (middle) polynomial regression, (right) multiple linear regression.

Example 3: Linear Regression

Given data for the relationship between temperature and reaction rate:

Temperature (°C): [0, 20, 40, 60, 80, 100]
Reaction Rate (mol/s): [0.5, 2.5, 4.8, 7.0, 9.8, 12.5]

Perform linear regression to model this relationship.
import numpy as np
import matplotlib.pyplot as plt

temperature = np.array([0, 20, 40, 60, 80, 100])
reaction_rate = np.array([0.5, 2.5, 4.8, 7.0, 9.8, 12.5])

# Perform linear regression (degree 1)
coefficients = np.polyfit(temperature, reaction_rate, 1)
slope, intercept = coefficients
print(f"Slope: {slope:.2f}, Intercept: {intercept:.2f}")

Example 3: Plotting Linear Fit
# Generate fitted values using the linear model
reaction_rate_fitted = np.polyval(coefficients, temperature)

# Plot the data and regression line
plt.figure(dpi=150)
plt.plot(temperature, reaction_rate, 'ro', label='Data')
plt.plot(temperature, reaction_rate_fitted, 'b-',
         label=f'Linear Fit: y = {slope:.2f}x + {intercept:.2f}')
plt.xlabel('Temperature (°C)')
plt.ylabel('Reaction Rate (mol/s)')
plt.legend()
plt.show()

Example 4: Polynomial Regression
We extend the linear regression example by fitting a 2nd-degree
polynomial.
# Perform quadratic regression (degree 2)
coefficients = np.polyfit(temperature, reaction_rate, 2)
a, b, c = coefficients
print(f"Quadratic Coefficients: a={a:.2f}, b={b:.2f}, c={c:.2f}")

# Generate fitted values using the quadratic model
reaction_rate_fitted = np.polyval(coefficients, temperature)

# Plot the data and regression curve (same plotting commands as in Example 3)

Evaluating Goodness of Fit

To determine how well the model fits the data, we use metrics such as:
Residuals Analysis: Measures differences between observed and
predicted values.
R-squared (R²): Proportion of variance in the dependent variable
explained by the model.
Mean Squared Error (MSE): Average squared difference between
observed and predicted values.

Python Code to Calculate R² and MSE

from sklearn.metrics import r2_score, mean_squared_error

# Linear regression (degree 1)
coef_lin = np.polyfit(temperature, reaction_rate, 1)
rate_lin_fit = np.polyval(coef_lin, temperature)

# Quadratic regression (degree 2)
coef_quad = np.polyfit(temperature, reaction_rate, 2)
rate_quad_fit = np.polyval(coef_quad, temperature)

# Calculate R-squared and MSE for both models
r2_lin = r2_score(reaction_rate, rate_lin_fit)
r2_quad = r2_score(reaction_rate, rate_quad_fit)
mse_lin = mean_squared_error(reaction_rate, rate_lin_fit)
mse_quad = mean_squared_error(reaction_rate, rate_quad_fit)

print(f"Linear Regression - R²: {r2_lin:.4f}, MSE: {mse_lin:.4f}")
print(f"Quadratic Regression - R²: {r2_quad:.4f}, MSE: {mse_quad:.4f}")

Goodness of Fit Results

Linear Regression: R² = 0.97, MSE = 0.49
Quadratic Regression: R² = 0.99, MSE = 0.09
Quadratic regression gives the better fit, with a higher R² and lower MSE.

Overfitting and Underfitting

Underfitting: Occurs when the model is too simple to capture underlying patterns in the data.
Overfitting: Happens when the model is too complex, fitting noise as well as the trend.
Use metrics like R-squared and MSE to evaluate the fit.
Start with a lower-degree polynomial and increase only as needed, as in the sketch below.
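A minimal sketch of that advice, reusing the reaction-rate data from Examples 3 and 4. Training MSE always falls as the degree rises, so a small, plateauing improvement (rather than MSE alone) is the signal to stop.

import numpy as np
from sklearn.metrics import mean_squared_error

temperature = np.array([0, 20, 40, 60, 80, 100])
reaction_rate = np.array([0.5, 2.5, 4.8, 7.0, 9.8, 12.5])

for degree in range(1, 6):
    coeffs = np.polyfit(temperature, reaction_rate, degree)
    fitted = np.polyval(coeffs, temperature)
    print(f"degree {degree}: MSE = {mean_squared_error(reaction_rate, fitted):.4f}")

# A degree-5 polynomial passes through all six points (MSE ≈ 0),
# but it is fitting noise, not the trend -- a classic overfit.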

Introduction to Curve Fitting

Curve fitting is used to find a curve (often non-linear) that best describes a dataset.
It involves using more complex functions (e.g., exponential, logarithmic) to model the data.
Curve fitting is used to precisely describe the shape of the data, while regression focuses on creating predictive models.

Steps for Curve Fitting in Python

1 Import necessary libraries like numpy, matplotlib, and scipy.optimize.curve_fit.
2 Define the function representing the model (e.g., exponential, logarithmic).
3 Define independent (x) and dependent (y) variables for the dataset.
4 Use curve_fit() to fit the model to the data.
5 Retrieve optimized parameters from the fitting process.
6 Plot the original data and fitted curve for visualization.
7 Evaluate the fit using metrics like R² or MSE.

Example 5: Curve Fitting (Arrhenius Equation)

In a chemical reaction, the reaction rate follows the Arrhenius equation:

$r(T) = A \cdot \exp\left(-\dfrac{E}{R \cdot T}\right)$

where:
r(T): reaction rate at temperature T,
A: pre-exponential factor,
E: activation energy,
R = 8.314 J/(mol·K) is the universal gas constant.
The goal is to fit the Arrhenius equation to the data and determine A and E.

Example 5: Plotting Arrhenius Fit

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

T = np.array([300, 350, 400, 450, 500, 550])
rate = np.array([0.0025, 0.0048, 0.0075, 0.0112, 0.0164, 0.0235])
R = 8.314

def arrhenius(T, A, E):
    return A * np.exp(-E / (R * T))

popt, pcov = curve_fit(arrhenius, T, rate)
A, E = popt
print(f"Fitted A: {A:.4e} mol/s, Fitted E: {E:.2f} J/mol")

# Generate fitted curve
T_fit = np.linspace(min(T), max(T), 100)
rate_fit = arrhenius(T_fit, *popt)

# Plot data and fitted curve
plt.scatter(T, rate, color='red', label='Experimental Data')
plt.plot(T_fit, rate_fit, label='Fitted Arrhenius Model', color='blue')
plt.xlabel('Temperature (K)')
plt.ylabel('Reaction Rate (mol/s)')
plt.title('Arrhenius Model Fitting')

Figure: Comparison of experimental data and Arrhenius fit.

Improving Curve Fitting

Initial Guesses: Providing good initial guesses for parameters can significantly improve fit accuracy and prevent errors, especially in complex models like exponential or Gaussian.
Initial guesses help the fitting process converge faster and lead to a better fit.
Example: In Gaussian peak fitting, appropriate initial guesses for amplitude, center, and width can improve the fit.

Example 6: Gaussian Curve Fitting

In a chemical engineering process, the absorption of a compound is measured at different wavelengths. The absorption forms a peak, which can be modeled using a Gaussian function:

$A(\lambda) = a \cdot \exp\left(-\dfrac{(\lambda - \lambda_0)^2}{2\sigma^2}\right)$

where:
A(λ) is the absorption at wavelength λ,
a is the peak amplitude (maximum absorption),
λ0 is the center of the peak,
σ is the standard deviation (related to the width of the peak).
The goal is to fit the Gaussian model to the experimental data and determine a, λ0, and σ.

Experimental Data and Initial Guess

Given data:

Wavelengths (nm): [400, 420, 440, 460, 480, 500, 520, 540, 560, 580, 600]
Absorption (AU): [0.12, 0.31, 0.82, 1.50, 1.85, 1.92, 1.67, 1.10, 0.65, 0.30, 0.12]

We will use an initial guess for the Gaussian parameters:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

# Data: wavelength (nm) and absorption (AU)
wavelengths = np.array([400, 420, 440, 460, 480, 500, 520, 540, 560, 580, 600])
absorption = np.array([0.12, 0.31, 0.82, 1.50, 1.85, 1.92, 1.67, 1.10, 0.65, 0.30, 0.12])

# Gaussian model function
def gaussian(x, a, x0, sigma):
    return a * np.exp(-(x - x0)**2 / (2 * sigma**2))

# Initial guess for the parameters [a, x0, sigma]
initial_guess = [2, 500, 30]

Fitting the Gaussian Model

Perform curve fitting using the initial guess:

# Perform curve fitting to find the best-fit a, x0, sigma
popt, pcov = curve_fit(gaussian, wavelengths, absorption, p0=initial_guess)

# Extract the optimized parameters
a, x0, sigma = popt

print(f"Fitted peak amplitude a: {a:.2f} AU")
print(f"Fitted center wavelength λ0: {x0:.2f} nm")
print(f"Fitted standard deviation σ: {sigma:.2f} nm")

Output:

Fitted peak amplitude a: 1.98 AU
Fitted center wavelength λ0: 495.29 nm
Fitted standard deviation σ: 42.22 nm

Plotting the Gaussian Fit
# Generate fitted curve using the optimized parameters
wavelengths_fit = np.linspace(min(wavelengths), max(wavelengths), 100)
absorption_fit = gaussian(wavelengths_fit, *popt)

# Plot the original data and the fitted Gaussian curve
plt.scatter(wavelengths, absorption, color='red', label='Experimental Data')
plt.plot(wavelengths_fit, absorption_fit, label='Fitted Gaussian Model', color='blue')

Figure: Comparison of experimental data and Gaussian fit.


Fitting Without Initial Guess

The curve_fit() function can also work without an initial guess, but providing a good one speeds up the optimization process and ensures accurate fitting, especially for non-linear models, as the sketch below illustrates.
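A minimal sketch, reusing gaussian, wavelengths, and absorption from Example 6. When p0 is omitted, curve_fit starts every parameter at 1.0; for this Gaussian that means a unit-height peak centred at 1 nm, far from the data, so the fit may converge slowly, land in a poor local minimum, or raise a RuntimeError.

# No initial guess: every parameter starts at 1.0
try:
    popt_default, _ = curve_fit(gaussian, wavelengths, absorption)
    print("no guess  :", popt_default)
except RuntimeError as err:
    print("no guess  : failed to converge:", err)

# With a rough guess read off the plot, convergence is fast and reliable
popt_guided, _ = curve_fit(gaussian, wavelengths, absorption, p0=[2, 500, 30])
print("with guess:", popt_guided)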

Thank You!
Any questions?
