
Theory of Digital Data Processing in Geophysics Class 23

1. Traditional DSP is based on elegant algorithms that require relatively few parameters. These algorithms can transform signals into a more useful representation. Neural networks can also be used to transform signals into more useful representations, but they do this using simple algorithms and vast numbers of highly optimized parameters.
2. With recent growth in data volumes and computing power, traditional DSP is becoming increasingly complemented by neural networks for 'big data' research.
3. Often traditional DSP and neural networks work together. For instance, traditional DSP may be used for preprocessing of data prior to the application of a neural network.[1] A small sketch of this idea follows below.

[1] This is also called data reduction. The basic idea is to transform the signal to highlight the most pertinent properties for a particular application.
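As a concrete illustration of this division of labor, the following minimal sketch (an assumption-laden example, not a recipe from these notes) uses SciPy's STFT to turn a raw waveform into a log-amplitude spectrogram that could then be handed to a neural network. The sampling rate, window settings, and the random test signal are placeholders.

```python
import numpy as np
from scipy.signal import stft

# Placeholder waveform: 60 s of data sampled at 100 Hz (assumed values).
fs = 100.0
t = np.arange(0, 60, 1 / fs)
x = np.random.randn(t.size)  # stand-in for a recorded seismogram

# Traditional DSP step: short-time Fourier transform of the raw signal.
f, tau, Z = stft(x, fs=fs, nperseg=128, noverlap=96)

# A common "data reduction": feed the log-amplitude spectrogram,
# rather than the raw samples, to the neural network.
features = np.log1p(np.abs(Z))
print(features.shape)  # (frequencies, time frames) ready as a network input
```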
4. Modern neural networks typically consist of large numbers of layers, but Figure 1 illustrates a simpler three-layer fully-connected neural network. In this classical architecture, the inputs are scaled by weights (the parameters of the model) and summed.[2] The sum is fed through a nonlinear activation function, which allows the neural network to learn much more complex representations from the data (a minimal numerical sketch follows the footnote below).

[2] If this were the only component, the summations and weights of the hidden and output layers could be combined into a single output layer (i.e., the architecture would collapse to a two-layer model). If the two-layer model had N inputs and N outputs, the consequence would be a weighted sum, identical to convolution (the DFT can also be implemented as a two-layer neural network).
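Below is a minimal NumPy sketch of the forward pass just described: each active node forms a weighted sum of its inputs and passes the sum through a nonlinear activation. The layer sizes, the tanh activation, and the random weights are illustrative assumptions rather than the exact network of Figure 1; the final lines restate footnote [2] in code, showing that without the nonlinearity the two weight matrices collapse into a single linear map.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 8 inputs, 4 hidden nodes, 2 outputs (assumed, not from Figure 1).
W1 = rng.normal(size=(4, 8))   # weights of the hidden layer
W2 = rng.normal(size=(2, 4))   # weights of the output layer
x = rng.normal(size=8)         # one input (e.g., 8 samples of a signal)

# Forward pass: weighted sum, then a nonlinear activation at each active node.
hidden = np.tanh(W1 @ x)
output = np.tanh(W2 @ hidden)

# Footnote [2] in code form: with no nonlinearity, the two layers are equivalent
# to a single weighted sum with matrix W2 @ W1.
linear_two_layer = W2 @ (W1 @ x)
collapsed = (W2 @ W1) @ x
assert np.allclose(linear_two_layer, collapsed)
```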
5. There is a wide variety of neural network architectures in use today (e.g., Figure 2).[3]

[3] Some of these use convolutional layers instead of fully-connected layers. Convolutional layers are based on correlation and have the advantage that they can learn local patterns and spatial hierarchies of patterns (i.e., each output does not depend on all of the input data); a minimal sketch follows below.
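To make footnote [3] concrete, the sketch below slides one small kernel across a 1-D signal, which is the correlation operation at the heart of a convolutional layer. The kernel length, its values, and the ReLU bias are arbitrary placeholders.

```python
import numpy as np

x = np.random.randn(256)             # a 1-D input signal
kernel = np.array([1.0, 0.0, -1.0])  # a tiny "learned" filter (values are illustrative)

# Cross-correlation of the kernel with the signal: each output sample depends
# only on a short local window of the input, unlike a fully-connected layer.
feature_map = np.correlate(x, kernel, mode="valid")

# A convolutional layer would add a bias and a nonlinear activation:
activated = np.maximum(feature_map + 0.1, 0.0)  # ReLU with an arbitrary bias
print(feature_map.shape, activated.shape)
```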
6. The weights of the neural network are determined through training. The flowchart in Figure 3 illustrates how this process works for a type of training called supervised learning. The network is exposed to a set of labeled data and learns the weights required to optimize the predictions of the network on the training data. It does this by minimizing a loss function.
Figure 1: An illustration of a fully connected neural network. In this representation, the input layer is passive; each node receives an input (e.g., X1i could represent the i-th sample of a signal) and sends that as output to each subsequent node. The nodes of the hidden layer and output layer are active (see inset).

7. In supervised learning, the loss function quantifies the difference between the predictions and the targets. A common loss function is the mean squared error:

   J(\vec{w}) = \frac{1}{N} \sum_{i=1}^{N} \left( y^{(i)} - y'^{(i)} \right)^2 \qquad (1)

The loss function is minimized by gradient descent using a learning rate that describes how rapidly to adjust the weights of the model (Figure 4); a minimal sketch of one such update follows below.
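The sketch below is one minimal way to realize the loop of Figure 3 for a toy linear model: form predictions, evaluate the mean squared error of Equation (1), and update the weights by gradient descent with a fixed learning rate. The data, model, learning rate, and number of epochs are all assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy supervised-learning problem: targets y come from a known linear rule plus noise.
X = rng.normal(size=(200, 3))                 # inputs (200 examples, 3 features)
w_true = np.array([0.5, -1.0, 2.0])
y = X @ w_true + 0.1 * rng.normal(size=200)   # true targets

w = np.zeros(3)        # model weights to be learned
learning_rate = 0.1    # how rapidly the weights are adjusted
N = len(y)

for epoch in range(50):
    y_pred = X @ w                          # predictions y'
    loss = np.mean((y - y_pred) ** 2)       # Equation (1)
    grad = -2.0 / N * X.T @ (y - y_pred)    # gradient of the loss w.r.t. w
    w -= learning_rate * grad               # gradient-descent update

print("learned weights:", w)  # should approach w_true
```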
8. When training a neural network, the data are split into training data and testing data.[4] For large datasets the split can be simple, but for small datasets it is hard to ensure that the training and testing datasets have the same statistical properties. To address this problem, K-fold cross-validation divides the dataset into k subsets, or folds. The model is then trained and evaluated k times, using a different fold as the test set each time (Figure 5). A minimal sketch of this splitting scheme follows below.

[4] The training data are often further subset into training and validation data. Validation data are used to tune the hyper-parameters of the model.
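A hand-rolled version of the K-fold splitting scheme is sketched below; the number of folds, the dataset size, and the placeholder for model training are assumptions, and in practice a library routine (e.g., from scikit-learn) would typically be used instead.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and split them into k roughly equal folds."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    return np.array_split(indices, k)

folds = k_fold_indices(n_samples=100, k=5)

for i, test_idx in enumerate(folds):
    # Use fold i for testing and all remaining folds for training.
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    # train_and_evaluate(train_idx, test_idx) would go here (placeholder).
    print(f"fold {i}: {len(train_idx)} training samples, {len(test_idx)} test samples")
```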
Figure 2: Sketches of deep neural networks (DNNs). The blue lines indicate inputs, and the orange lines indicate outputs. The length of the blue and orange lines represents the data dimension. The green lines indicate intermediate connections. (a) In a fully connected neural network (FCNN), the inputs of one layer are connected to every unit in the next layer. f stands for a nonlinear activation function. In (b–f), the details of the layers are omitted and only the shape of the network is represented. Note that (b) is a vanilla Convolutional Neural Network (CNN) that is a cascade of convolutional layers, pooling layers, and nonlinear layers. (c) is a Convolutional Autoencoder (CAE) that uses convolutional layers in the encoder and deconvolutional layers in the decoder.

9. The loss function is often analyzed as a function of epoch, where the model has been exposed to all of the training data once per epoch. While the model weights always improve the loss on the training data, an independent set of data (called validation data) is held out to assess the point at which the model begins overfitting. Figure 6 shows example training curves, in which overfitting begins after 3 epochs; a small numerical illustration follows below.
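As a small illustration of how curves like those in Figure 6 are read, the snippet below takes made-up per-epoch training and validation losses and reports the epoch at which the validation loss is lowest, i.e., the point beyond which further training overfits. The loss values are invented for the example.

```python
import numpy as np

# Hypothetical loss curves, one value per epoch (not real training results).
train_loss = np.array([1.00, 0.60, 0.40, 0.30, 0.24, 0.20, 0.17])
val_loss   = np.array([1.05, 0.70, 0.55, 0.50, 0.53, 0.58, 0.65])

best_epoch = int(np.argmin(val_loss))  # validation loss stops improving here
print(f"stop training after epoch {best_epoch}: "
      f"training loss keeps falling, but validation loss rises afterwards")
```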
10. Examples of neural networks are numerous. We'll consider two examples.[5]

(a) An approach to denoise data (e.g., Zhu et al., 2019) that relies on traditional DSP (the STFT) for pre-processing but extends traditional frequency filtering. A rough sketch of the time-frequency masking idea follows the footnote below.
(b) An approach to discriminate between certain types of seismic events
using the STFT and array processing for pre-processing (e.g., Ronac-
Giannone et al., 2024).

[5] When reading these papers, it helps me to review Figure 3 and to think through the input X, the predictions Y', and the true targets Y.
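The sketch below shows the general shape of time-frequency mask denoising in the spirit of example (a): compute the STFT, apply a mask (here a crude amplitude threshold standing in for a neural network's predicted mask), and invert back to the time domain. The synthetic signal, window settings, and threshold are placeholders; this is not the specific method of Zhu et al. (2019).

```python
import numpy as np
from scipy.signal import stft, istft

fs = 100.0
t = np.arange(0, 20, 1 / fs)
clean = np.sin(2 * np.pi * 5 * t) * np.exp(-0.2 * t)  # synthetic "signal"
noisy = clean + 0.5 * np.random.randn(t.size)         # add noise

# Traditional DSP pre-processing: STFT of the noisy record.
f, tau, Z = stft(noisy, fs=fs, nperseg=256, noverlap=192)

# Stand-in for a network-predicted mask: keep only high-amplitude coefficients.
mask = (np.abs(Z) > 0.05).astype(float)

# Apply the mask and transform back to the time domain.
_, denoised = istft(Z * mask, fs=fs, nperseg=256, noverlap=192)
print(denoised.shape)
```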

Figure 3: An illustration of the training process. The goal is to find the value of the weights that
minimize the loss, which is a measure of the difference between the predictions and true targets.

Figure 4: The learning rate is often adjusted as a function of epoch.

Figure 5: Two strategies commonly used for separating data into testing and training sets.

Figure 6: Training and validation loss from an example of training a neural network model.
