Data Analytics
Srikanta Mishra
[email protected]
+1-614-424-5712
New insights about the reservoir from “data mining” can help increase operational efficiencies

[Figure: big data (volume, velocity, variety) → examine data → understand “what does the data say” → prediction / learning → make better decisions → actionable information]
With analytics, you discern not only what your customers want but also how much they’re willing to pay and what keeps them loyal. You look beyond compensation costs to calculate your workforce’s exact contribution to your bottom line. And you don’t just track existing inventories; you also predict and prevent future inventory issues.
[Figure: overlapping terminology: data analytics, machine learning, statistical learning, data mining, knowledge discovery]
Unsupervised Learning
• Data reduction and clustering
• PCA, k-means, self-organizing maps
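A minimal base-R sketch of two of the unsupervised methods above on synthetic data:

  set.seed(1)
  X <- matrix(rnorm(300), ncol = 3)  # synthetic data: 100 obs x 3 vars

  pca <- prcomp(X, scale. = TRUE)    # PCA on the correlation scale
  summary(pca)                       # variance explained by each PC

  km <- kmeans(X, centers = 3)       # k-means clustering with k = 3
  table(km$cluster)                  # cluster sizes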
Supervised Learning
• Regression and classification
• Random forest, SVM, neural nets, kriging
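A minimal sketch of one supervised method from the list above, a single-hidden-layer neural network regression; this assumes the CRAN package nnet and synthetic data:

  library(nnet)
  set.seed(1)
  df <- data.frame(x = runif(200))
  df$y <- sin(2 * pi * df$x) + rnorm(200, sd = 0.1)  # non-linear target

  nn <- nnet(y ~ x, data = df, size = 8, linout = TRUE, trace = FALSE)  # 8 hidden units
  predict(nn, data.frame(x = 0.25))  # prediction at a new input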
Predictive Maintenance
• Real-time prediction of system response (drilling, fluid injection)

Reservoir Management
• Identifying factors for improved performance
[Figure: Initial Well Potential (BOPD) vs. Net Pay (ft), 0–90 ft; fitted line y = 2.0626x + 97.397, R² = 0.5385]
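A minimal R sketch of fitting a straight line as in the figure above; the data here are synthetic stand-ins for the actual well data:

  set.seed(1)
  net_pay <- runif(50, 5, 90)                    # synthetic net pay, ft
  ipp <- 2 * net_pay + 100 + rnorm(50, sd = 30)  # synthetic initial potential, BOPD

  fit <- lm(ipp ~ net_pay)       # least-squares line y = a + b*x
  coef(fit)                      # intercept and slope
  summary(fit)$r.squared         # R² of the fit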
[Figure: regression diagnostics: residuals vs. X variable, and residuals vs. standard normal deviate (normal probability plot)]
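A minimal sketch of producing the two diagnostic plots above, continuing the fit object from the previous sketch (base R only):

  r <- resid(fit)
  plot(net_pay, r, ylab = "Residuals")   # residuals vs. predictor
  abline(h = 0, lty = 2)
  qqnorm(r); qqline(r)                   # residuals vs. standard normal deviate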
Robust measure for strength of association (linear/non-linear)

[Figure: correlation matrix for MMP and fluid/reservoir variables: T, C1, C2–C6, C7+, MW C5+, MW C7+, API, Vol, Int, V/I]
Multivariate Analysis
Ch. 5, Mishra and Datta-Gupta (2017)
• Eigenvalues: solution of |C − λI| = 0
  C = correlation matrix
  I = identity matrix
  λ = vector of eigenvalues

Correlation Matrix
       x1     x2     x3
x1  1.000  0.886  0.750
x2  0.886  1.000  0.889
x3  0.750  0.889  1.000

• An N (obs) × P (var) dataset has P eigenvalues

Eigenvalues
λ1 = 2.685
λ2 = 0.251
λ3 = 0.065

• Eigenvalue = variance of corresponding PC
• Σ λi = 3 = P (trace of C)
• Criteria for how many PCs to retain:
  ▪ Scree plot (keep all PCs above the “floor” level)
  ▪ Kaiser criterion (keep all PCs with eigenvalue > 1)
  ▪ Variance threshold (keep enough PCs to explain a chosen fraction of total variance)

[Figure: scree plot of eigenvalues with percent variance explained per PC]
• Eigenvectors: solution of (C − λiI) ui = 0
  C = correlation matrix
  I = identity matrix
  λi = eigenvalue for i-th PC
  ui = eigenvector for λi

Eigenvectors
     u1      u2      u3
  0.567   0.711   0.416
  0.597  −0.007  −0.802
  0.567  −0.703   0.429

• Eigenvectors are coefficients of variables in the linear equations defining the PCs
• They also define the rotation from original variable space to PC space
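A minimal base-R sketch reproducing the eigen-decomposition above from the 3×3 correlation matrix:

  C <- matrix(c(1.000, 0.886, 0.750,
                0.886, 1.000, 0.889,
                0.750, 0.889, 1.000), nrow = 3, byrow = TRUE)

  e <- eigen(C)
  e$values             # eigenvalues 2.685, 0.251, 0.065 = PC variances
  e$vectors            # columns are the eigenvectors u1, u2, u3
  sum(e$values)        # = 3 = P, the trace of C
  which(e$values > 1)  # Kaiser criterion: retain PC1 only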
Basic Concepts
Ch. 8, Mishra and Datta-Gupta (2017)
https://towardsdatascience.com/supervised-vs-unsupervised-learning-14f68e32ea8d
Regression Tree
▪ Partition multidimensional space into rectangular regions with constant values or class labels
  [Figure: splits X1 < t1, X2 < t2, X2 < t3 leading to regions R1, R2, R3, R4]

Random Forest
▪ Build ensemble of trees using random subsets of observations and predictors

Gradient Boosting Machine
▪ Build sequence of trees that address shortcomings of each previous fitted tree

Support Vector Machine
▪ Find hyperplane maximizing separation of data using a non-linear transform of parameter space

Artificial Neural Network
▪ Inputs mapped to outputs via hidden units using a sequence of linear and non-linear transforms

Gaussian Process Emulation
▪ Multidimensional interpolation using the trend and autocorrelation structure of the data
[Figure: workflow: full dataset used to train, then predict]
https://christophm.github.io/interpretable-ml-book/
Recall (confusion matrix): https://manisha-sirsat.blogspot.com/2019/04/confusion-matrix.html
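A minimal base-R sketch of building a confusion matrix and computing recall from hypothetical predicted and actual class labels:

  actual    <- factor(c(1, 1, 1, 0, 0, 1, 0, 1, 0, 0))
  predicted <- factor(c(1, 0, 1, 0, 1, 1, 0, 1, 0, 0))

  cm <- table(Predicted = predicted, Actual = actual)  # confusion matrix
  TP <- cm["1", "1"]; FN <- cm["0", "1"]
  TP / (TP + FN)   # recall: fraction of actual positives correctly identified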
Machine Learning Methods
• Variety of modeling methods
• Three types of model validation:
  − full training data
  − 10-fold cross-validation (CV)
  − held-out test data
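A minimal base-R sketch of 10-fold cross-validation for a regression model, reporting RMSE; the data and model here are hypothetical placeholders:

  set.seed(1)
  df <- data.frame(x = runif(200))
  df$y <- 3 * df$x + rnorm(200)                      # placeholder data
  folds <- sample(rep(1:10, length.out = nrow(df)))  # assign rows to 10 folds

  rmse <- sapply(1:10, function(k) {
    fit  <- lm(y ~ x, data = df[folds != k, ])        # train on 9 folds
    pred <- predict(fit, newdata = df[folds == k, ])  # predict held-out fold
    sqrt(mean((df$y[folds == k] - pred)^2))
  })
  mean(rmse)   # cross-validated RMSE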
Model Name   RMSE (×1,000 BBL)
M1           37.57
M2           37.45
M3a          36.21
M3b          36.15
LPM          47.12
QPM          40.03
SVR          39.00
RF           38.33
GBM          40.40
Crystalline Dolomite
• Identify vugs in a single well using image logs and core samples
• Output file is a Synthetic Vug Log, SVL (0–1): probability of vugs
• Random Forest
  ▪ R² = 0.96
  ▪ RMSE = 7.4 ft/hr
  ▪ Mean error ≈ 5%
• Linear regression
  ▪ R² = 0.42
  ▪ RMSE = 18.4 ft/hr
  ▪ Mean error ≈ 14%
• Classification tree analysis for identifying rock types from basic well log attributes
• Accounting for missing well logs
• Application for permeability prediction in Salt Creek field
• Identifying performance drivers and completion effectiveness for Marcellus shale wells
• Predictive model using ANN (Artificial Neural Networks)
• Role of different variables evaluated
• Building prognostic classifier for specific turbogenerator failures during startup
• Data from offshore facility – extraction of features
• RUSBoost and RF models
• Multi-fold validation approach for evaluation

[Figure: temperature (deg C) signals; test accuracy on validation set]
Example [5]
Arumugam et al., SPE-184062, 2016
• Processing of daily drilling data to identify drilling anomalies / best practices
  (example annotations: “Drill”, “Directional Drill”, “Connections increased”, “observed excess drag”, “observed fresh cuttings”)
▪ Information retrieval
▪ Knowledge management
Software Demo
• https://cran.r-project.org/
• https://rattle.togaware.com/
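A minimal sketch of getting started with the tools above; rattle provides a point-and-click data-mining GUI on top of R (installation from CRAN assumed):

  install.packages("rattle")   # from CRAN
  library(rattle)
  rattle()                     # launches the Rattle GUI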
[Figure: Random Forest demo output, N = 81]
▪ Data visualization/communication

• Beware the hype / manage expectations
• ML comes after posing the problem
• Don’t forget the physics
Regression/Classification Techniques
Ch. 8, Mishra and Datta-Gupta (2017)
[Figure: regression tree schematic: splits X1 < t1, X2 < t2, X2 < t3 partition the (X1, X2) plane at thresholds t1, t2, t3 into rectangular regions R1–R4; panels compare the true surface with the fitted regression tree]
• Advantages
▪ Interpretable
▪ Resistant to outliers
• Disadvantages
▪ Less accurate than other models
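A minimal sketch of fitting a regression tree like the schematic above, assuming the CRAN package rpart and synthetic data:

  library(rpart)
  set.seed(1)
  df <- data.frame(x1 = runif(300), x2 = runif(300))
  df$y <- ifelse(df$x1 < 0.5, 1, ifelse(df$x2 < 0.5, 2, 3)) + rnorm(300, sd = 0.1)

  tree <- rpart(y ~ x1 + x2, data = df)          # recursive binary splits
  print(tree)                                    # split rules and region means
  predict(tree, data.frame(x1 = 0.7, x2 = 0.3))  # prediction for a new point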
Random Forest
• Prediction
  ▪ Observation is passed through all of the trees in the ensemble
• Built-in cross-validation
  ▪ Since each tree sees only a subset of the data, the remaining observations are called out-of-bag samples
  ▪ For that tree, those out-of-bag samples are independent test data
• Advantages
▪ Can handle highly non-linear behavior
▪ Resistant to outliers
• Disadvantages
▪ Not easily interpretable
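A minimal sketch assuming the CRAN package randomForest, showing the built-in out-of-bag (OOB) error described above (continuing the synthetic df from the tree sketch):

  library(randomForest)
  rf <- randomForest(y ~ x1 + x2, data = df, ntree = 500)
  print(rf)                                    # reports the OOB error estimate
  predict(rf, data.frame(x1 = 0.7, x2 = 0.3))  # averages over all 500 trees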
Gradient Boosting Machine
• Prediction
  ▪ Observation is passed through all of the trees in the ensemble
• Algorithm
  ▪ Start from an initial constant model F_0(x), then repeat for m = 1, …, M:
    − Fit a model h_m(x) to the negative gradient of the loss; for squared error this is the residuals y − F_{m−1}(x)
    − Let F_m(x) = F_{m−1}(x) + h_m(x)
  ▪ Make predictions with the final model F_M(x)
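A minimal from-scratch sketch of this loop for squared-error loss, using rpart as the base learner h_m and the synthetic df from the earlier tree sketch; the shrinkage factor nu is an addition that is customary in practice:

  library(rpart)

  boost_fit <- function(df, M = 200, nu = 0.1) {
    Fm <- rep(mean(df$y), nrow(df))            # F_0: best constant model
    trees <- vector("list", M)
    for (m in 1:M) {
      df$r <- df$y - Fm                        # negative gradient = residuals
      trees[[m]] <- rpart(r ~ x1 + x2, data = df,
                          control = rpart.control(maxdepth = 2))
      Fm <- Fm + nu * predict(trees[[m]], df)  # F_m = F_{m-1} + nu * h_m
    }
    list(f0 = mean(df$y), trees = trees, nu = nu)
  }

  boost_predict <- function(model, newdata) {
    Fm <- rep(model$f0, nrow(newdata))
    for (tr in model$trees) Fm <- Fm + model$nu * predict(tr, newdata)
    Fm
  }

  gbm_model <- boost_fit(df)
  boost_predict(gbm_model, data.frame(x1 = 0.7, x2 = 0.3))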
• Advantages
▪ Invariant under all monotone transformations of the input variables
▪ Competitive accuracy
• Disadvantages
▪ Can easily overfit
▪ Can take a while to fit, but there are tricks for speeding this up
Support Vector Machine
▪ Alternate formulation via kernel functions:
  − Polynomial
  − Gaussian
  − Exponential
  − Hyperbolic Tangent
▪ Using kernels like the ones above can produce regression fits to non-linear surfaces
Kernel Trick - Schematic
• Advantages
▪ Can capture non-linear behavior
− Kernel function allows adaptability to many situations
▪ Accurate predictor compared to most methods
• Disadvantages
▪ Not easily interpretable
• Classification
  ▪ The separating hyperplane is β0 + xᵀβ = 0, with margin boundaries at β0 + xᵀβ = ±1
  ▪ Prediction is made using the sign of the hyperplane equation: Group A (Y = +1) vs. Group B (Y = −1)

[Figure: maximum-margin hyperplane separating Group A (Y = +1) from Group B (Y = −1)]
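A minimal sketch of an SVM classifier with a Gaussian (radial) kernel, assuming the CRAN package e1071 and synthetic two-class data:

  library(e1071)
  set.seed(1)
  df2 <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
  df2$y <- factor(ifelse(df2$x1^2 + df2$x2^2 > 1, "A", "B"))  # non-linear boundary

  m <- svm(y ~ x1 + x2, data = df2, kernel = "radial")  # kernel trick handles non-linearity
  predict(m, data.frame(x1 = 0, x2 = 0))                # predicted class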