Nanospore
December 2024
Abstract
Nanopore sensing has emerged as a transformative technology with
applications spanning DNA sequencing, protein analysis, and chemical
sensing. Signal processing plays a pivotal role in enabling high-resolution
analysis and extracting meaningful information from the noisy and com-
plex signals generated by nanopores. This chapter explores recent ad-
vances in nanopore sensing, focusing on the evolving landscape of signal
processing techniques. Topics include noise reduction, feature
extraction, machine learning applications, and real-time analysis. The
discussion also delves into future directions, such as integrating
quantum computing and edge AI for improved accuracy and efficiency.
KEYWORDS: nanopore sensing, signal processing algorithm, pulse-like
signal, spike recognition, feature extraction, analyte identification, ma-
chine learning, neural network
1 Introduction
Nanopore sensing relies on the principle of detecting changes in ionic current as
biomolecules pass through a nanopore embedded in a membrane. The
simplicity of the technique belies the complexity of the signals generated. This
section provides an overview of nanopore sensing technology, its applications,
and the challenges associated with signal analysis. Nanopore sensing has
emerged as a transformative technology in fields ranging from genomics to
environmental monitoring. By leveraging nanoscale pores to analyze molecules
such as DNA, RNA, proteins, and small metabolites, this technology provides a
label-free, single-molecule analysis platform with unparalleled sensitivity and
resolution. Central to its success is the ability to process the ionic current signals
generated as molecules pass through the nanopore, translating these complex
signals into meaningful biological or chemical information.
Advancements in signal processing techniques have been pivotal in address-
ing key challenges in nanopore sensing, including noise reduction, feature ex-
traction, and real-time analysis. From traditional filtering methods to cutting-
edge machine learning algorithms, signal processing innovations have greatly
enhanced the accuracy, speed, and versatility of nanopore-based systems.
These
developments are not only improving current applications, such as DNA se-
quencing, but are also paving the way for future breakthroughs, including early
disease detection, drug discovery, and molecular diagnostics.
Nanopore sensors have been developed for decades to target multiple appli-
cations, including DNA sequencing, protein profiling, small chemical molecule
detection, and nanoparticle characterization. The nanopore sensor is inspired by
the Coulter cell counter and performs its task by matching its dimensions to those
of the analytes, whether molecules or nanoparticles. Thus, it possesses an
extremely succinct
structure, a nanoscale pore in an ultrathin membrane. Its sensing function is
based on a simple working principle: the passage of an analyte temporarily
blocks a size-proportional volume of the pore and induces a spike signal on the
monitoring ionic current at a given bias voltage. Information about passing
analytes is hidden in the corresponding current spikes, i.e., translocation spikes
distributed on the ionic current traces. By processing the signal and analyzing
the features of the spikes such as amplitude, width (duration), occurrence fre-
quency, and waveform, the properties of the analytes can be inferred, including
size, shape, charge, dipole moment, and concentration. Therefore, signal pro-
cessing is the crucial link to interpreting the signal by assigning the associated
features to relevant physical properties. In general, signal processing comprises
denoising, spike recognition, feature extraction, and analysis. A powerful signal
processing algorithm should be able to isolate signals from a noisy background,
extract useful information, and utilize the multidimensional information syn-
thetically to accurately derive the properties of the analytes. Low-pass filters
have been adopted as a simple approach to removing the background noise.
However, this approach risks filtering out the important high-frequency
components naturally present in signals representing rapid changes of ionic current
associated with translocation spikes that carry informative waveform details
related to the target analytes. Thus, self-adaptive filters and advanced cur-
rent level tracing algorithms have been developed. Traditional algorithms are
mainly based on a user-defined amplitude threshold as a criterion for detection
of translocation spikes. Clearly, the choice of this threshold determines how
successfully a spike is singled out and how good the quality of the subsequent
feature extraction is. However, the threshold is usually chosen based on the ex-
perience of individuals dealing with the data. It is, hence, a subjective process.
Moreover, using the extracted
Nanopores with confined chemical environments and designable nanosensing
interfaces are well suited to measuring individual entities at the nanoscale,
which cannot be analyzed individually in bulk systems. Recently, nanopores
have been applied to tasks ranging from DNA sequencing to the detection of
single liposomes by taking advantage of the electrokinetic transport of ions and
faradaic electron transfer in confinement. Once single analytes enter the confined
space of a nanopore electrode at an appropriate potential, they reside temporarily
in confinement and undergo an electrochemical reaction at the electrochemical
sensing interface, thus generating transient current signals from both faradaic
and non-faradaic responses. Compared with traditional nanoelectrodes, these
nanopore-based electrodes with well-defined structures provide the opportunity
to control transport and reactivity, which enables enhanced nanoscale
electrochemical measurements. Because of these features, nanopore
electrochemistry demonstrates advantages in single nanoparticle (NP) detection,
single biomolecule probing, and single cell analysis. With the further help of
advanced data processing such as machine learning (ML) methodologies, nanopore
electrochemistry reveals single-entity information regarding the diversity and
variability from complex and massive datasets. In this review, we highlight very
recent progress in nanopore electrochemistry. We will start by introducing
the construction and characterization of nanopore-based electrodes. Then, we
will review the literature on the applications of nanopore-based electrodes in
single NP collisions and single living cell sensing. Finally, we will recapitulate
advanced algorithms for nanopore electrochemical data analysis. This article
explores the latest advancements in signal processing for nanopore sensing and
discusses the emerging trends and challenges shaping the future of the field. By
examining the intersection of hardware innovation, computational methods, and
application-driven needs, we aim to provide a comprehensive perspective on the
state of the art in nanopore sensing and the exciting opportunities it holds.
Figure 1: Diagram of data process module for nanopore analysis
Figure 2: Schematic of the data process. The recorded experimental data contain
rapid blockages, open-pore currents, and noise components. The starting and
stopping points of a typical blockage (blue circles) were identified by the DBC
method
Figure 3: A typical threshold detection scheme involves finding the local baseline
and root mean square (rms) noise level
3 Fundamental Challenges in Nanopore Signal
Processing
3.1 Noise Sources:
Thermal Noise: Caused by random thermal motion of ions and molecules in
the nanopore environment, this noise is inherent and limits the baseline signal
stability.
Electrical Noise: Arising from electronic components such as amplifiers, re-
sistors, or capacitance in the measurement system, electrical noise impacts
signal clarity.
1/f Noise (Flicker Noise): A noise component whose power increases toward low
frequencies and that often dominates the low-frequency range of nanopore
measurements. It is commonly linked to material imperfections in nanopores or
electrodes.
Pore Blockages or Clogs: Partial or transient blockages of the nanopore by
contaminants or analytes can produce signal fluctuations unrelated to target
molecules.
Translocation Variability: Irregular speeds and orientations of molecules
passing through the nanopore introduce inconsistencies in the current signal.
Environmental Factors: Fluctuations in temperature, pH, and ionic concen-
tration can alter nanopore behavior and the resulting signal.
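To give a sense of scale for the thermal noise floor listed above, the Johnson-Nyquist relation gives an rms current noise of sqrt(4 k_B T B / R) for a resistance R measured over a bandwidth B. The sketch below is only illustrative: it assumes the pore behaves as an ideal resistor, and the resistance, bandwidth, and temperature values are hypothetical rather than taken from this chapter.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def thermal_noise_rms(resistance_ohm, bandwidth_hz, temperature_k=295.0):
    """RMS thermal (Johnson-Nyquist) current noise in amperes."""
    return math.sqrt(4.0 * K_B * temperature_k * bandwidth_hz / resistance_ohm)


# Hypothetical values: a 100 MOhm pore recorded with a 10 kHz bandwidth.
i_rms = thermal_noise_rms(100e6, 10e3)
print(f"{i_rms * 1e12:.2f} pA rms")
```

Because the noise grows with the square root of the bandwidth, quadrupling the recording bandwidth doubles this contribution, which is one reason filtering choices matter.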
3.3 Data Volume:
Data volume is a significant challenge in nanopore signal processing due to the
large amounts of data generated during molecular analysis. Nanopore-based
systems often operate at high data acquisition rates, especially when
monitoring single-molecule events in real-time. This results in vast datasets
that need to be efficiently processed, stored, and analyzed. The following
aspects highlight the challenges related to data volume:
Real-time Monitoring: Nanopore sensors generate continuous streams of
data as molecules translocate through the nanopore. For instance, in DNA
sequencing, each base in a DNA strand passing through the pore results in a
series of current changes that are recorded in real time. The data can be highly
granular, producing thousands of data points per second.
Multi-parameter Data: Nanopore sensing systems collect multiple types of
data simultaneously, such as current measurements, voltage, temperature, and
environmental parameters. These data streams can generate a huge volume of
information that must be processed together.
Data Storage Requirements: The volume of data generated in a nanopore
experiment is often too large for traditional data storage solutions. For
example, continuous real-time data from long experiments, such as DNA
sequencing runs, can produce gigabytes to terabytes of raw data. Storing this
data in a way that makes it accessible for further analysis requires scalable and
efficient storage solutions.
Managing Large Datasets for Visualization: Visualizing large datasets in a
meaningful way is challenging. Effective data visualization tools are needed to
provide researchers with insights without overwhelming them with raw data.
The ability to display meaningful patterns, translocation events, or molecular
characteristics in a user-friendly format is crucial.
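The storage figures discussed above follow from simple arithmetic on the acquisition settings. The sketch below shows the calculation; the sampling rate, sample width, channel count, and duration are hypothetical values chosen for illustration, not measurements from a specific instrument.

```python
def raw_data_bytes(sample_rate_hz, bytes_per_sample, channels, duration_s):
    """Uncompressed size of a recording in bytes."""
    return sample_rate_hz * bytes_per_sample * channels * duration_s


# Hypothetical setup: 250 kHz sampling, 16-bit samples, 1 channel, 1 hour.
size = raw_data_bytes(250_000, 2, 1, 3600)
print(f"{size / 1e9:.1f} GB")  # 1.8 GB
```

Scaling the same hypothetical settings to hundreds of parallel channels moves the total from gigabytes toward a terabyte per hour.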
analysis is used in nanopore systems to filter out high-frequency noise (e.g., from
electronic components or power supplies) and retain the signal corresponding
to molecule translocation, which typically occurs at lower frequencies.
Wavelet Transform: The Wavelet Transform (WT) is a powerful technique
for analyzing signals at multiple scales or resolutions. Unlike the Fourier
transform, which provides only frequency information, wavelets allow for both
time and frequency localization, making them well suited to non-stationary
signals. Wavelet-based filtering decomposes the signal into different frequency
bands and allows for selective filtering of noise while preserving the important
features of the translocation events, which may occur at various time scales. The
discrete wavelet transform (DWT) is particularly useful in nanopore sensing,
where the ionic current signal may exhibit sharp changes (e.g., molecule
translocations) that need to be preserved while other components (such as
baseline drift or high-frequency spikes) are removed.
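A minimal sketch of wavelet-based denoising is given here, assuming the Haar wavelet and soft thresholding of the detail coefficients. Both choices, along with the function names and the threshold value, are illustrative assumptions; practical pipelines often use other wavelet families and data-driven thresholds.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_forward(x):
    """One level of the orthonormal Haar DWT: (approximation, detail)."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert one Haar level."""
    out = []
    for a, d in zip(approx, detail):
        out.extend(((a + d) / SQRT2, (a - d) / SQRT2))
    return out


def soft_threshold(c, t):
    """Shrink a coefficient toward zero by t."""
    return math.copysign(max(abs(c) - t, 0.0), c)


def haar_denoise(signal, levels, threshold):
    """Denoise a signal whose length is divisible by 2**levels."""
    approx, details = list(signal), []
    for _ in range(levels):
        approx, d = haar_forward(approx)
        details.append([soft_threshold(c, threshold) for c in d])
    for d in reversed(details):
        approx = haar_inverse(approx, d)
    return approx
```

Small detail coefficients, which mostly carry noise, are set to zero, while the coarse approximation that carries the blockade level is left untouched.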
Kalman Filtering: Kalman filtering is a recursive, real-time estimation
technique that predicts the signal’s state from prior data and updates the
prediction as new measurements arrive, which makes it particularly useful in
noisy, dynamic environments. Kalman filters are used for noise reduction when
the true value of a rapidly changing signal must be estimated from previous
measurements while correcting for noisy fluctuations. In nanopore sensing,
Kalman filters help improve the signal-to-noise ratio (SNR) by predicting the
expected signal (based on past data) and adjusting it with real-time corrections.
This is especially helpful when monitoring continuous molecular translocation
events, where noise might distort the signal.
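The predict-and-update cycle described above can be sketched with a scalar Kalman filter. This example assumes a random-walk model for the underlying current, which is a simplification; the function name and the two variance parameters are hypothetical tuning knobs.

```python
def kalman_1d(measurements, process_var, meas_var):
    """Scalar Kalman filter with a random-walk state model."""
    x, p = measurements[0], meas_var  # initial estimate and its variance
    estimates = [x]
    for z in measurements[1:]:
        p += process_var              # predict: state unchanged, uncertainty grows
        k = p / (p + meas_var)        # Kalman gain
        x += k * (z - x)              # update with the new measurement
        p *= 1.0 - k                  # uncertainty shrinks after the update
        estimates.append(x)
    return estimates
```

A small `process_var` relative to `meas_var` gives heavy smoothing but a slow response to real current steps; a larger value tracks blockades faster at the cost of passing more noise.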
Moving Average Filtering: Moving average filtering is a simple and
commonly used technique that averages the values of a signal over a fixed window
of time. A sliding window of values is averaged to suppress rapid variations that
do not represent meaningful changes in the system, which smooths out short-term
fluctuations and reduces high-frequency noise. It is most effective when the
noise has a high frequency compared to the signal. In nanopore sensing, moving
average filters are used to smooth out high-frequency electrical noise or
fluctuations in the baseline current, helping to reveal the translocation events
more clearly.
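A minimal sketch of a centered moving average follows. The edge handling (shrinking the window near the ends) and the assumption of an odd window length are illustrative choices.

```python
def moving_average(signal, window):
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


print(moving_average([1, 2, 3, 4, 5], 3))  # [1.5, 2.0, 3.0, 4.0, 4.5]
```

The window should stay shorter than the briefest event of interest, since a window longer than a translocation spike averages the spike into the baseline and reduces its apparent amplitude.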
High-pass/Low-pass Filtering: High-pass and low-pass filters are used to
remove noise in specific frequency ranges. Low-pass filters allow signals below
a certain frequency to pass through while attenuating higher-frequency noise,
while high-pass filters do the opposite. These filters are used to isolate the
frequency range that corresponds to the nanopore signal (e.g., the frequency
range of molecular translocations) while removing unwanted higher or lower
frequency noise. In nanopore sensing, low-pass filters are typically used to
remove high-frequency noise from electronic components, while high-pass
filters may be applied to remove baseline drift or slow-changing trends that do
not correspond to molecular events.
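The two filter types can be sketched with a first-order (single-pole) recursive filter. This is an assumed, deliberately simple design; recorded data are often filtered with higher-order Bessel or Butterworth filters instead. The high-pass output here is formed as the input minus its low-pass version.

```python
import math


def low_pass(signal, cutoff_hz, sample_rate_hz):
    """First-order (single-pole) IIR low-pass filter."""
    dt = 1.0 / sample_rate_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)
    out = [signal[0]]
    for x in signal[1:]:
        out.append(out[-1] + alpha * (x - out[-1]))
    return out


def high_pass(signal, cutoff_hz, sample_rate_hz):
    """Complement of the low-pass output: removes slow drift."""
    smooth = low_pass(signal, cutoff_hz, sample_rate_hz)
    return [x - s for x, s in zip(signal, smooth)]
```

For a constant input the low-pass output equals the input and the high-pass output is zero, which is the behavior wanted for removing a steady baseline offset.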
4.2 Event Detection:
Event detection in nanopore sensing refers to identifying and extracting rele-
vant translocation events (e.g., DNA bases, proteins, or other analytes passing
through the nanopore) from the ionic current signal. This task is challenging
due to noise, signal complexity, and variability in the characteristics of translo-
cation events. Below are the key signal processing techniques used for event
detection in nanopore systems:
Threshold-Based Detection: Threshold-based detection involves setting a
predefined threshold for the ionic current. Events are detected when the
current exceeds or drops below this threshold. A fixed or adaptive threshold is
applied to the signal to identify deviations corresponding to translocation
events. This approach was commonly used in early nanopore systems for detecting
events such as DNA or protein translocations based on sharp current blockades.
Its main advantages are that it is simple, computationally efficient, and
effective for signals with high signal-to-noise ratios (SNR). Its main challenge
is poor performance on noisy signals or when events have variable amplitudes.
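A minimal sketch of threshold detection in the spirit of Figure 3 is shown here: the baseline and rms noise are estimated, and samples falling more than a chosen number of rms units below the baseline are grouped into events. For brevity it assumes a single global baseline taken from an event-free stretch at the start of the trace, rather than the local baseline a production scheme would track; the function and parameter names are hypothetical.

```python
import math


def detect_events(current, baseline_len, n_sigma):
    """Return (start, end) index pairs of blockades; end is exclusive.

    Baseline and rms noise come from the first `baseline_len` samples,
    which are assumed to contain open-pore current only.
    """
    ref = current[:baseline_len]
    baseline = sum(ref) / len(ref)
    rms = math.sqrt(sum((x - baseline) ** 2 for x in ref) / len(ref))
    threshold = baseline - n_sigma * rms
    events, start = [], None
    for i, x in enumerate(current):
        if x < threshold and start is None:
            start = i                      # event begins
        elif x >= threshold and start is not None:
            events.append((start, i))      # event ends
            start = None
    if start is not None:                  # trace ended inside an event
        events.append((start, len(current)))
    return events
```

The multiplier `n_sigma` is the subjective choice discussed in the introduction: a low value admits noise as false events, while a high value misses shallow blockades.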
Hidden Markov Models (HMMs): HMMs are probabilistic models that treat
the signal as a sequence of hidden states, each corresponding to a specific event
or baseline condition. The ionic current signal is modeled as transitions between
states (e.g., open-pore, translocation, noise), and the HMM predicts the most
likely sequence of states based on the observed data. HMMs are widely used in DNA
sequencing to identify base-specific current levels or to classify molecular
events. Their main advantages are that they handle noisy signals effectively and
can model variable event durations and amplitudes. Their main disadvantages are
that they are computationally expensive for real-time applications and require
careful parameter tuning.
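The state-sequence decoding described above is commonly done with the Viterbi algorithm. The sketch below assumes Gaussian emissions with a shared noise level and a single self-transition probability for every state; these simplifications, and all names and parameter values, are illustrative rather than drawn from a particular nanopore toolkit.

```python
import math


def viterbi(obs, means, sigma, p_stay):
    """Most likely state path for a Gaussian-emission HMM.

    All states share noise `sigma` and self-transition probability
    `p_stay`; the rest is split evenly over the other states.
    """
    n = len(means)
    log_stay = math.log(p_stay)
    log_move = math.log((1.0 - p_stay) / (n - 1))

    def log_emit(x, s):
        # Shared sigma, so constant terms can be dropped.
        return -0.5 * ((x - means[s]) / sigma) ** 2

    def log_trans(r, s):
        return log_stay if r == s else log_move

    score = [log_emit(obs[0], s) - math.log(n) for s in range(n)]
    back = []
    for x in obs[1:]:
        prev, score, ptr = score, [], []
        for s in range(n):
            best = max(range(n), key=lambda r: prev[r] + log_trans(r, s))
            ptr.append(best)
            score.append(prev[best] + log_trans(best, s) + log_emit(x, s))
        back.append(ptr)
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):             # trace the best path backwards
        state = ptr[state]
        path.append(state)
    return path[::-1]


trace = [100, 101, 99, 100, 80, 79, 92, 81, 100, 99]
print(viterbi(trace, means=[100, 80], sigma=5.0, p_stay=0.9))
# [0, 0, 0, 0, 1, 1, 1, 1, 0, 0]
```

In this toy trace the sample at 92 lies above a midpoint threshold of 90, so a plain threshold would split the blockade in two; the transition penalty keeps the decoded event in one piece.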
Machine Learning-Based Detection: Machine learning (ML) algorithms, both
supervised and unsupervised, are trained to identify patterns associated with
events in nanopore signals. Machine learning is increasingly used for real-time
event detection in complex biosensing applications, such as identifying
nucleotide bases in noisy sequencing data. Models (e.g., Support Vector
Machines, Neural Networks) are trained on labeled datasets to classify events
versus noise, while clustering algorithms like k-means are used to group signal
features into events and non-events. The main advantages are high accuracy in
distinguishing events from noise and the ability to adapt to complex, non-linear
signal patterns. The main challenges are that these methods require large,
high-quality labeled datasets for training and are computationally intensive.
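As a small illustration of the unsupervised route mentioned above, the sketch below runs k-means (Lloyd's algorithm) on a single scalar feature, such as the mean current of short windows, to separate event-like from baseline-like values. The deterministic initialization, the function name, and the toy feature values are assumptions made for this example, and it requires k of at least 2.

```python
def kmeans_1d(values, k=2, iters=50):
    """Lloyd's k-means on scalar features; returns (centers, labels)."""
    lo, hi = min(values), max(values)
    # Deterministic start: centers spread evenly over the data range.
    centers = [lo + (hi - lo) * j / (k - 1) for j in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign each value to its nearest center.
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        new = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new.append(sum(members) / len(members) if members else centers[j])
        if new == centers:                 # converged
            break
        centers = new
    return centers, labels


window_means = [100.2, 99.8, 100.1, 80.3, 79.9, 100.0, 80.1]
centers, labels = kmeans_1d(window_means)
print(labels)  # [1, 1, 1, 0, 0, 1, 0]
```

Here label 0 collects the low-current (blockade) windows and label 1 the open-pore windows; with real data, several features per window would normally be clustered together.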
Figure 7: General nanopore characteristics at 1 M salt. (a) Current–voltage characteristics of six individual
nanopores with nanopore diameters as indicated. All curves show a linear I–V dependence. (Inset) A transmission
electron microscopy image of the 15.6-nm-diameter nanopore. (b) Resistance values of 28 individual nanopores
as a function of nanopore diameter. The red line represents the resistance of a 25-nm-long cylinder. Resistance
values larger than 2.5 times the resistance indicated by the red line are shown in grey. (c) Current recordings and
histograms of two nanopores (at 100 mV) with substantially different resistance values, illustrating clear differences
in current noise. The nanopore diameters are 20.8 nm (bottom traces) and 22.0 nm (top traces). The current was
filtered at 10 and 1 kHz, as indicated. The black, grey, and blue histograms, shown on the right, are magnified
along the x axis to be visible on the same scale. (d) Current power spectral densities of the two nanopores used in
c, showing 1/f low-frequency noise of different magnitude and comparable high-frequency noise.
translocations) and comparing experimental signals to templates or simulated
signals. Its main advantage is that it is effective for detecting hidden
periodicity or sequence patterns; its main challenge is that it is less effective
for non-repetitive or irregular events.
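Comparing a signal to a template, as mentioned above, can be sketched with a sliding normalized cross-correlation. The function name and the toy template are hypothetical; the score is the Pearson correlation between the template and each signal window, so it is insensitive to baseline offset and amplitude scale.

```python
import math


def normalized_xcorr(signal, template):
    """Pearson correlation between the template and each signal window."""
    m = len(template)
    t_mean = sum(template) / m
    t_dev = [t - t_mean for t in template]
    t_norm = math.sqrt(sum(d * d for d in t_dev))
    scores = []
    for i in range(len(signal) - m + 1):
        window = signal[i:i + m]
        w_mean = sum(window) / m
        w_dev = [w - w_mean for w in window]
        w_norm = math.sqrt(sum(d * d for d in w_dev))
        denom = t_norm * w_norm
        dot = sum(a * b for a, b in zip(t_dev, w_dev))
        scores.append(dot / denom if denom else 0.0)  # flat windows score 0
    return scores
```

A score close to 1 marks a window whose shape matches the template; the index of the highest score gives the best alignment.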
Wavelet-Based Feature Extraction: Wavelet transforms decompose the signal
into multiple resolution levels, enabling both time and frequency localization.
Features include the wavelet coefficients at different scales and the energy of
specific wavelet sub-bands. Applications include identifying transient events
with short durations and extracting features for machine learning models in
nanopore sequencing. The main advantage is that non-stationary and
time-localized signals are handled effectively; the main challenge is that the
method is computationally intensive, especially for large datasets.
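The sub-band energy features named above can be sketched with the Haar wavelet, chosen here only because it is short to write out; the function name is hypothetical. Each level contributes the energy of its detail coefficients, and the final entry is the energy left in the coarsest approximation.

```python
import math


def haar_subband_energies(signal, levels):
    """Detail-band energies, finest level first, then approximation energy.

    The signal length must be divisible by 2**levels.
    """
    s = math.sqrt(2.0)
    approx, energies = list(signal), []
    for _ in range(levels):
        detail = [(approx[i] - approx[i + 1]) / s for i in range(0, len(approx), 2)]
        approx = [(approx[i] + approx[i + 1]) / s for i in range(0, len(approx), 2)]
        energies.append(sum(d * d for d in detail))
    energies.append(sum(a * a for a in approx))
    return energies
```

Because the transform is orthonormal, the energies sum to the energy of the original signal, so the resulting vector describes how that energy is distributed across time scales.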
In conclusion, statistical modeling is indispensable for nanopore signal pro-
cessing, providing the tools necessary to extract valuable information from noisy
and complex data. Techniques such as HMMs, GMMs, Bayesian inference, and
MLE have proven effective for detecting events, estimating parameters, and
classifying molecules. While traditional models remain widely used,
advancements in computational power and algorithm design are enabling the
integration of more sophisticated approaches, such as machine learning and
non-linear modeling, to further enhance the precision and efficiency of
nanopore data analysis.
This capability is particularly useful for real-time analysis, where rapid and
accurate event detection is essential for guiding experimental protocols.
Supervised learning also plays a significant role in molecular classification,
where the objective is to identify the type or properties of a molecule based on
its interaction with the nanopore. For example, molecules such as proteins,
metabolites, or DNA fragments produce characteristic signal patterns. Machine
learning algorithms, trained on labeled data, can classify these signals into
categories, enabling applications such as pathogen detection, biomarker
identification, and drug discovery. Techniques such as gradient-boosted decision
trees or ensemble learning models are often employed for such tasks, as they are
effective in handling high-dimensional and noisy datasets.
A key advantage of supervised learning in nanopore sensing is its ability to
adapt to domain-specific challenges, such as noise, baseline drift, and signal
variability. Models are trained to focus on signal features that are most relevant
to the target task while ignoring irrelevant variations. For example, feature
engineering approaches in traditional machine learning methods (e.g., SVMs or
random forests) involve extracting attributes like signal amplitude, duration,
frequency, and entropy, which are then used as inputs for classification. In
contrast, deep learning models like CNNs and RNNs can automatically learn
these features from raw signals, reducing the need for manual preprocessing
and improving performance.
Despite its advantages, supervised learning in nanopore sensing also presents
challenges. The performance of these models heavily depends on the availability
of large, high-quality labeled datasets, which can be difficult and
time-consuming to generate. The variability in nanopore experimental conditions
further complicates the generalization of models to new datasets or setups.
Additionally, the computational requirements of training and deploying complex
models, especially deep learning architectures, can be significant, making
real-time applications challenging.
In conclusion, supervised learning has revolutionized the field of nanopore
sensing by enabling accurate and efficient analysis of ionic current signals for
applications such as DNA sequencing, event detection, and molecular
classification. With continued advancements in algorithms, computational power,
and dataset curation, supervised learning is poised to further enhance the
capabilities of nanopore technologies, enabling new applications in genomics,
proteomics, and beyond.
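The hand-crafted attributes mentioned above for traditional classifiers (amplitude, duration, entropy) can be computed per event along the following lines. This is a sketch under stated assumptions: the function name, the histogram bin width used for the entropy, and the choice of Shannon entropy over binned blockade depths are all illustrative.

```python
import math
from collections import Counter


def event_features(event, baseline, sample_rate_hz, bin_width):
    """Hand-crafted features for one translocation event."""
    depth = [baseline - x for x in event]      # blockade depth per sample
    n = len(event)
    # Shannon entropy of the depth histogram (bins of width `bin_width`).
    counts = Counter(int(d // bin_width) for d in depth)
    entropy = -sum(c / n * math.log2(c / n) for c in counts.values())
    return {
        "mean_amplitude": sum(depth) / n,
        "max_amplitude": max(depth),
        "duration_s": n / sample_rate_hz,
        "entropy_bits": entropy,
    }


features = event_features([80, 80, 90, 90], baseline=100,
                          sample_rate_hz=1000, bin_width=5)
print(features["entropy_bits"])  # 1.0
```

A feature dictionary like this, computed for every detected event, forms one row of the input table for a classifier such as an SVM or random forest.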