Unit 4 Notes
Introduction on Deep Learning – DFF network CNN- RNN for Sequences – Biomedical Image
and Signal Analysis – Natural Language Processing and Data Mining for Clinical Data – Mobile
Imaging and Analytics – Clinical Decision Support System.
Introduction to Deep Learning
Deep learning is a branch of machine learning based on artificial neural networks. It is capable of
learning complex patterns and relationships within data, and we do not need to explicitly program
everything. Deep learning has become increasingly popular in recent years due to advances in processing
power and the availability of large datasets. It is built on artificial neural networks (ANNs), also
known as deep neural networks (DNNs). These neural networks are inspired by the structure and function
of the human brain's biological neurons, and they are designed to learn from large amounts of data.
1. Deep Learning is a subfield of Machine Learning that involves the use of neural networks to model and
solve complex problems. Neural networks are modeled after the structure and function of the human
brain and consist of layers of interconnected nodes that process and transform data.
2. The key characteristic of Deep Learning is the use of deep neural networks, which have multiple layers
of interconnected nodes. These networks can learn complex representations of data by discovering
hierarchical patterns and features in the data. Deep Learning algorithms can automatically learn and
improve from data without the need for manual feature engineering.
3. Deep Learning has achieved significant success in various fields, including image recognition, natural
language processing, speech recognition, and recommendation systems. Some of the popular Deep
Learning architectures include Convolutional Neural Networks (CNNs), Recurrent Neural Networks
(RNNs), and Deep Belief Networks (DBNs).
4. Training deep neural networks typically requires a large amount of data and computational resources.
However, the availability of cloud computing and the development of specialized hardware, such as
Graphics Processing Units (GPUs), has made it easier to train deep neural networks.
Deep learning is the branch of machine learning based on artificial neural network architectures.
An artificial neural network (ANN) uses layers of interconnected nodes called neurons that work together
to process and learn from the input data.
In a fully connected deep neural network, there is an input layer and one or more hidden layers connected
one after the other. Each neuron receives input from the previous layer neurons or the input layer. The
output of one neuron becomes the input to other neurons in the next layer of the network, and this process
continues until the final layer produces the output of the network. The layers of the neural network
transform the input data through a series of nonlinear transformations, allowing the network to learn
complex representations of the input data.
Artificial neural networks
Artificial neural networks are built on the principles of the structure and operation of human neurons. They
are also known as neural networks or neural nets. An artificial neural network's input layer, which is the
first layer, receives input from external sources and passes it on to the hidden layer, which is the second
layer. Each neuron in the hidden layer gets information from the neurons in the previous layer, computes
the weighted total, and then transfers it to the neurons in the next layer. These connections are weighted,
which means that the impacts of the inputs from the preceding layer are more or less optimized by giving
each input a distinct weight. These weights are then adjusted during the training process to enhance the
performance of the model.
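As a minimal illustration of the weighted total described above, here is a sketch in NumPy; the input values and layer sizes are made up for demonstration:

import numpy as np

# Inputs from the previous layer (e.g., 3 features)
x = np.array([0.5, -1.2, 2.0])

# Each neuron has one weight per input plus a bias;
# here the hidden layer has 4 neurons.
W = np.random.randn(4, 3)   # weight matrix: 4 neurons x 3 inputs
b = np.zeros(4)             # biases, one per neuron

# Weighted total for every neuron, then a nonlinear activation
z = W @ x + b               # weighted sums
a = np.maximum(0, z)        # ReLU activation passed to the next layer
print(a)

During training, W and b are the values that get adjusted to improve the model.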
Machine learning and deep learning are both subsets of artificial intelligence, but there are many
similarities and differences between them.
Machine Learning:
• Takes less time to train the model.
• A model is created from relevant features that are manually extracted from images to detect an object in the image.
• Can work on a CPU; requires less computing power compared to deep learning.
Deep Learning:
• Takes more time to train the model.
• Relevant features are automatically extracted from images; it is an end-to-end learning process.
• Requires a high-performance computer with a GPU.
Deep Learning models are able to automatically learn features from the data, which makes them well-
suited for tasks such as image recognition, speech recognition, and natural language processing. The most
widely used architectures in deep learning are feedforward neural networks, convolutional neural
networks (CNNs), and recurrent neural networks (RNNs).
Feedforward neural networks (FNNs) are the simplest type of ANN, with a linear flow of information
through the network. FNNs have been widely used for tasks such as image classification, speech
recognition, and natural language processing.
Convolutional Neural Networks (CNNs) are designed specifically for image and video recognition tasks. CNNs
are able to automatically learn features from images, which makes them well-suited for tasks such as
image classification, object detection, and image segmentation.
Recurrent Neural Networks (RNNs) are a type of neural network that is able to process sequential data,
such as time series and natural language. RNNs are able to maintain an internal state that captures
information about the previous inputs, which makes them well-suited for tasks such as speech recognition,
natural language processing, and language translation.
Computer vision
In computer vision, Deep learning models can enable machines to identify and understand visual data.
Some of the main applications of deep learning in computer vision include:
• Object detection and recognition: Deep learning models can be used to identify and locate objects
within images and videos, enabling applications such as self-driving cars, surveillance, and robotics.
• Image classification: Deep learning models can be used to classify images into categories such as
animals, plants, and buildings. This is used in applications such as medical imaging, quality control,
and image retrieval.
• Image segmentation: Deep learning models can be used to segment images into different regions,
making it possible to identify specific features within images.
Natural language processing (NLP):
In NLP, deep learning models enable machines to understand and generate human language.
Some of the main applications of deep learning in NLP include:
• Automatic text generation: Deep learning models can learn from a corpus of text, and new text such as
summaries and essays can be automatically generated using these trained models.
• Language translation: Deep learning models can translate text from one language to another,
making it possible to communicate with people from different linguistic backgrounds.
• Sentiment analysis: Deep learning models can analyze the sentiment of a piece of text, making it
possible to determine whether the text is positive, negative, or neutral. This is used in applications
such as customer service, social media monitoring, and political analysis.
• Speech recognition: Deep learning models can recognize and transcribe spoken words, making it
possible to perform tasks such as speech-to-text conversion, voice search, and voice-controlled
devices.
Reinforcement learning:
In reinforcement learning, deep learning is used to train agents to take actions in an environment so as to
maximize a reward. Some of the main applications of deep learning in reinforcement learning include:
• Game playing: Deep reinforcement learning models have been able to beat human experts at games
such as Go, Chess, and Atari.
• Robotics: Deep reinforcement learning models can be used to train robots to perform complex tasks
such as grasping objects, navigation, and manipulation.
• Control systems: Deep reinforcement learning models can be used to control complex systems such
as power grids, traffic management, and supply chain optimization.
Challenges in Deep Learning
Deep learning has made significant advancements in various fields, but there are still some challenges
that need to be addressed. Here are some of the main challenges in deep learning:
1. Data availability: Deep learning requires large amounts of data to learn from, and gathering enough
data for training is a major concern.
2. Computational resources: Training deep learning models is computationally expensive and typically
requires specialized hardware such as GPUs and TPUs.
3. Time-consuming: Depending on the available computational resources, training (especially on sequential
data) can take a very long time, even days or months.
4. Interpretability: Deep learning models are complex and work like a black box, so it is very difficult to
interpret their results.
5. Overfitting: When a model is trained repeatedly on the same data, it becomes too specialized to the
training data, leading to overfitting and poor performance on new data.
Advantages of Deep Learning:
1. High accuracy: Deep Learning algorithms can achieve state-of-the-art performance in various tasks,
such as image recognition and natural language processing.
2. Automated feature engineering: Deep Learning algorithms can automatically discover and learn
relevant features from data without the need for manual feature engineering.
3. Scalability: Deep Learning models can scale to handle large and complex datasets, and can learn
from massive amounts of data.
4. Flexibility: Deep Learning models can be applied to a wide range of tasks and can handle various
types of data, such as images, text, and speech.
5. Continual improvement: Deep Learning models can continually improve their performance as more
data becomes available.
Disadvantages of Deep Learning:
1. High computational requirements: Deep Learning models require large amounts of data and
computational resources to train and optimize.
2. Requires large amounts of labeled data: Deep Learning models often require a large amount of
labeled data for training, which can be expensive and time-consuming to acquire.
3. Interpretability: Deep Learning models can be challenging to interpret, making it difficult to
understand how they make decisions.
4. Overfitting: Deep Learning models can sometimes overfit to the training data, resulting in poor
performance on new and unseen data.
5. Black-box nature: Deep Learning models are often treated as black boxes, making it difficult to
understand how they work and how they arrived at their predictions.
In summary, while Deep Learning offers many advantages, including high accuracy and scalability,
it also has some disadvantages, such as high computational requirements, the need for large amounts
of labeled data, and interpretability challenges. These limitations need to be carefully considered
when deciding whether to use Deep Learning for a specific task.
Convolutional Neural Networks (CNN)
A Convolutional Neural Network (CNN) is a type of deep learning neural network architecture
commonly used in computer vision. Computer vision is a field of artificial intelligence that enables a
computer to understand and interpret images and other visual data. In a regular fully connected neural
network there are three types of layers:
1. Input Layers: It’s the layer in which we give input to our model. The number of neurons in
this layer is equal to the total number of features in our data (number of pixels in the case of an
image).
2. Hidden Layer: The input from the input layer is then fed into the hidden layer. There can be
many hidden layers depending on our model and data size. Each hidden layer can have a different
number of neurons, which is generally greater than the number of features. The output of each
layer is computed by matrix multiplication of the previous layer's output with the learnable
weights of that layer, adding learnable biases, and then applying an activation function, which
makes the network nonlinear.
3. Output Layer: The output from the hidden layer is then fed into a logistic function like sigmoid
or softmax which converts the output of each class into the probability score of each class.
Feeding the data into the model and obtaining the output from each layer as above is called
feedforward. We then calculate the error using an error function; some common error functions are cross-
entropy, squared loss, etc. The error function measures how well the network is performing. After that,
we backpropagate through the model by calculating the derivatives. This step, called backpropagation, is
used to minimize the loss.
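The feedforward/error/backpropagation loop above is what a framework runs during training. Here is a minimal, hedged sketch using tf.keras on random toy data; the dataset shape and layer sizes are assumptions for illustration only:

import numpy as np
import tensorflow as tf

# Toy data: 100 samples, 20 features, 3 classes (illustrative only)
X = np.random.rand(100, 20).astype("float32")
y = np.random.randint(0, 3, size=(100,))

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(20,)),  # hidden layer
    tf.keras.layers.Dense(3, activation="softmax"),                   # probability score per class
])

# Cross-entropy is the error function; fit() runs the feedforward
# pass, computes the loss, and backpropagates to update the weights.
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=16)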
CNN architecture
Convolutional Neural Network consists of multiple layers like the input layer, Convolutional layer, Pooling
layer, and fully connected layers.
Convolution Neural Networks, or covnets, are neural networks that share their parameters. Imagine you have
an image. It can be represented as a cuboid having a length and width (the dimensions of the image) and a
height (the channels, as images generally have red, green, and blue channels).
Now imagine taking a small patch of this image and running a small neural network, called a filter or kernel,
on it, with say K outputs, and representing them vertically. Now slide that neural network across the whole
image; as a result, we will get another image with different width, height, and depth. Instead of just the R,
G, and B channels, we now have more channels but smaller width and height. This operation is called
convolution. If the patch size is the same as that of the image, it is a regular neural network. Because
of this small patch, we have fewer weights.
● Convolution layers consist of a set of learnable filters (or kernels) having small widths and
heights and the same depth as that of input volume (3 if the input layer is image input).
● For example, if we have to run a convolution on an image with dimensions 34 x 34 x 3, the
possible filter sizes are a x a x 3, where 'a' can be 3, 5, or 7, but smaller than the
image dimension.
● During the forward pass, we slide each filter across the whole input volume step by step where
each step is called stride (which can have a value of 2, 3, or even 4 for high-dimensional
images) and compute the dot product between the kernel weights and patch from input volume.
● As we slide our filters we’ll get a 2-D output for each filter and we’ll stack them together as a
result, we’ll get output volume having a depth equal to the number of filters. The network will
learn all the filters.
A complete Convolution Neural Networks architecture is also known as covnets. A covnets is a sequence
of layers, and every layer transforms one volume to another through a differentiable function.
Types of layers:
Let's take an example by running a covnet on an image of dimension 32 x 32 x 3.
● Input Layers: It’s the layer in which we give input to our model. In CNN, Generally, the input
will be an image or a sequence of images. This layer holds the raw input of the image with
width 32, height 32, and depth 3.
● Convolutional Layers: This is the layer used to extract features from the input
dataset. It applies a set of learnable filters, known as kernels, to the input images. The
filters/kernels are small matrices, usually of shape 2 x 2, 3 x 3, or 5 x 5. Each kernel slides over
the input image data and computes the dot product between the kernel weights and the
corresponding input image patch. The output of this layer is referred to as feature maps. Suppose
we use a total of 12 filters for this layer; we'll get an output volume of dimension 32 x 32 x 12.
● Activation Layer: By adding an activation function to the output of the preceding layer,
activation layers add nonlinearity to the network. An element-wise activation function is applied
to the output of the convolution layer. Some common activation functions are ReLU:
max(0, x), Tanh, Leaky ReLU, etc. The volume remains unchanged, so the output volume
has dimensions 32 x 32 x 12.
● Pooling Layer: This layer is periodically inserted in the covnet, and its main function is to
reduce the size of the volume, which makes computation faster, reduces memory usage, and also
helps prevent overfitting. Two common types of pooling layers are max pooling and average
pooling. If we use a max pool with 2 x 2 filters and stride 2, the resultant volume will be of
dimension 16 x 16 x 12.
● Flattening: The resulting feature maps are flattened into a one-dimensional vector after the
convolution and pooling layers so they can be passed into a fully connected layer for
classification or regression.
● Fully Connected Layers: It takes the input from the previous layer and computes the final
classification or regression task.
● Output Layer: The output from the fully connected layers is then fed into a logistic function
such as sigmoid or softmax, which converts the output for each class into a probability score.
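Putting the layer walkthrough above together, here is a minimal tf.keras sketch of the same 32 x 32 x 3 example with 12 filters, ReLU, 2 x 2 max pooling, flattening, and a final softmax layer; the number of output classes is an assumption for illustration:

import tensorflow as tf

# 32 x 32 x 3 input -> 12 filters -> ReLU -> 2x2 max pool (stride 2) -> flatten -> dense
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(12, kernel_size=3, padding="same", activation="relu"),  # 32 x 32 x 12
    tf.keras.layers.MaxPooling2D(pool_size=2, strides=2),                          # 16 x 16 x 12
    tf.keras.layers.Flatten(),                                                     # 3072 values
    tf.keras.layers.Dense(10, activation="softmax"),                               # class probabilities
])
model.summary()  # prints the output shape of every layer, matching the dimensions above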
Example:
Let's consider an image and apply the convolution layer, activation layer, and pooling layer operations to
extract features from it.
Input image:
Step:
# Imports (assumed environment: TensorFlow and Matplotlib installed)
import tensorflow as tf
import matplotlib.pyplot as plt

# 'image' is assumed to be a 4-D float tensor of shape (1, height, width, 1);
# a random placeholder is used here, and the kernel is a simple edge detector.
image = tf.random.uniform((1, 64, 64, 1))
kernel = tf.constant([[-1., -1., -1.],
                      [-1.,  8., -1.],
                      [-1., -1., -1.]])
kernel = tf.reshape(kernel, (3, 3, 1, 1))   # (height, width, in_channels, out_channels)

plt.figure(figsize=(15, 5))

# Convolution layer
conv_fn = tf.nn.conv2d
image_filter = conv_fn(
    input=image,
    filters=kernel,
    strides=1,        # or (1, 1)
    padding='SAME',
)
plt.subplot(1, 3, 1)
plt.imshow(tf.squeeze(image_filter))
plt.axis('off')
plt.title('Convolution')

# Activation layer
relu_fn = tf.nn.relu
image_detect = relu_fn(image_filter)   # feature detection
plt.subplot(1, 3, 2)
plt.imshow(tf.squeeze(image_detect))
plt.axis('off')
plt.title('Activation')

# Pooling layer
pool = tf.nn.pool
image_condense = pool(input=image_detect,
                      window_shape=(2, 2),
                      pooling_type='MAX',
                      strides=(2, 2),
                      padding='SAME')
plt.subplot(1, 3, 3)
plt.imshow(tf.squeeze(image_condense))
plt.axis('off')
plt.title('Pooling')
plt.show()
Output: side-by-side plots of the image after the convolution, activation, and pooling stages.
Advantages of Convolutional Neural Networks (CNNs):
1. Good at detecting patterns and features in images, videos, and audio signals.
2. Robust to translation, rotation, and scaling of the input.
3. End-to-end training, no need for manual feature extraction.
4. Can handle large amounts of data and achieve high accuracy.
Deep Feed-Forward Networks vs. CNNs
Typically, a deep neural network will perform better than a shallow one. However, it is not always necessary to use a
deep network. The choice will largely depend on the task at hand.
If you are working with many inputs such as image data, then using a Deep Feed-Forward (DFF) or a
Convolutional Neural Network (CNN) would likely yield better results than a shallow Feed-Forward
network.
However, suppose your task is to do some basic classification with a limited number of inputs. In that
case, you may be better off using a shallow FF or even a tree-based algorithm such as XGBoost, Random
Forest, or a single Decision tree.
So, going back to the point of depth, the simple answer is that deeper networks tend to deliver better
performance on more complex tasks.
Everyday use cases: Deep Feed-Forward Neural Networks are good at complex tasks, so they are often
used for classification problems, including those of small, low-resolution images. However, a
Convolutional Neural Network (CNN) is a better choice if you are working with larger images.
Sequence Models
Sequence models are machine learning models that input or output sequences of data.
Sequential data includes text streams, audio clips, video clips, time-series data, etc.
1. Speech recognition: In speech recognition, an audio clip is given as an input and then the
model has to generate its text transcript. Here both the input and output are sequences of
data.
Speech recognition (Source: Author)
2. Video Activity Recognition: In video activity recognition, the model needs to identify the
activity in a video clip. A video clip is a sequence of video frames, so in the case of video
activity recognition the input is a sequence of frames.
These examples show that there are different applications of sequence models. Sometimes
both the input and output are sequences; in others, either the input or the output is a sequence.
Recurrent neural networks (RNNs) are a popular sequence model that has shown strong performance on
sequential data. An RNN is a neural network architecture that is specialized for processing sequential
data. RNNs are mostly used in the field of Natural Language Processing (NLP). An RNN maintains an
internal memory, which makes it very effective for machine learning problems that involve sequential
data. RNNs are also used in time-series predictions.
The main advantage of using RNNs instead of standard neural networks is that weights are shared across
time in an RNN, whereas features are not shared in standard neural networks. An RNN can remember its
previous inputs, but standard neural networks are not capable of remembering previous inputs.
Loss function
In an RNN, the loss function is defined based on the loss at each time step, typically summed over all time steps.
RNN Architectures
There are several RNN architectures based on the number of inputs and outputs,
1. One to Many Architecture: Image captioning is one good example of this architecture. In
image captioning, the model takes one image and then outputs a sequence of words. Here there is
one input and a sequence of outputs.
2. Many to One Architecture: Sentiment classification is one good example of this architecture.
In this case, the input is a sequence of words and the output is a binary classification.
3. Many to Many Architecture: There are two cases in many-to-many architectures.
● The first type is when the input length equals the output length. Named entity
recognition is one good example, where the number of words in the input equals the
number of words in the output.
● The second type of many-to-many architecture is when the input length does not equal
the output length. Machine translation is one good scenario for this type: the model
takes a sentence in one language and then converts it to another language. Here the input
length and output length are different.
RNN architectures
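As a sketch of the many-to-one case above, here is a minimal tf.keras sentiment-style model; the vocabulary size, sequence length, and layer widths are assumptions for illustration:

import tensorflow as tf

vocab_size = 10000  # assumed vocabulary size

# Many-to-one: a sequence of word indices in, one binary score out
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, 64),       # word indices -> dense vectors
    tf.keras.layers.SimpleRNN(32),                   # internal state carried across time steps
    tf.keras.layers.Dense(1, activation="sigmoid"),  # binary classification
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])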
Standard RNNs suffer from the vanishing gradient problem. When training a very deep network, the
gradients (the derivatives) decrease exponentially as they propagate down the layers. This is known as the
vanishing gradient problem. These gradients are used to update the weights of neural networks; when the
gradients vanish, the weights are barely updated, which can completely stop the neural network from
training. This vanishing gradient problem is a common issue when training RNNs on long sequences.
To overcome this vanishing gradient problem in RNNs, Long Short-Term Memory (LSTM) was
introduced as a modified version of the RNN hidden layer. LSTM has enabled RNNs to remember their
inputs over a long period of time. In an LSTM, in addition to the hidden state, a cell state is passed to the
next time step.
Internal structure of basic RNN and LSTM unit (Source: stanford.edu)
LSTM can capture long-range dependencies and can retain memory of previous inputs for extended time
durations. There are three gates in an LSTM cell, and memory manipulations in the LSTM are done using
these gates. LSTM utilizes these gates to control the flow of information into and out of the cell state:
● Forget Gate: The forget gate removes information that is no longer useful from the cell state.
● Input Gate: The input gate adds additional useful information to the cell state.
● Output Gate: The output gate selects useful information from the current cell state and presents
it as the output (the hidden state).
This gating mechanism has allowed the network to learn the conditions for when to remember, forget, or
output information. Google's voice search is one real-world example that has used the LSTM algorithm,
and LSTM is behind the success of such applications. Research has also shown how the LSTM algorithm
can improve the performance of machine learning models, and LSTM is widely used for tasks such as
speech recognition and time-series prediction.
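To make the gate mechanics concrete, here is a minimal NumPy sketch of a single LSTM time step; the stacked parameter layout and the sizes are illustrative assumptions, not a reference implementation:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b hold the stacked gate parameters."""
    n = h_prev.shape[0]
    z = W @ x + U @ h_prev + b            # all gate pre-activations at once
    f = sigmoid(z[0:n])                   # forget gate: drop stale cell-state info
    i = sigmoid(z[n:2*n])                 # input gate: admit new info
    o = sigmoid(z[2*n:3*n])               # output gate: expose cell state
    g = np.tanh(z[3*n:4*n])               # candidate cell values
    c = f * c_prev + i * g                # new cell state
    h = o * np.tanh(c)                    # new hidden state
    return h, c

# Example with 3 input features and a hidden size of 4
d, n = 3, 4
W = np.random.randn(4*n, d); U = np.random.randn(4*n, n); b = np.zeros(4*n)
h, c = lstm_step(np.random.randn(d), np.zeros(n), np.zeros(n), W, U, b)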
The motivation for using AI and NLP in healthcare is rooted in improving patient care and treatment
outcomes while reducing healthcare costs. The healthcare industry generates vast amounts of data,
including EMRs, clinical notes, and health-related social media posts, that can provide valuable insights
into patient health and treatment outcomes. However, much of this data is unstructured and difficult to
analyze manually.
Additionally, the healthcare industry faces several challenges, such as an aging population, rising costs,
and shortages of healthcare professionals. These challenges have led to a growing need for more efficient
and effective healthcare delivery.
By providing valuable insights from unstructured medical data, NLP can help to improve patient care and
treatment outcomes and support healthcare professionals in making more informed clinical decisions.
What is Natural Language Processing?
Natural Language Processing (NLP) is a subfield of Artificial Intelligence (AI) that deals with the interaction
between computers and human languages. It uses computational techniques to analyze, understand, and generate
human language.
Natural language processing is used in various applications, including speech recognition, machine translation,
sentiment analysis, and text summarization. Text summarization using NLP automates the process of reducing
lengthy text into shorter summaries, which is useful in domains such as financial research, media monitoring,
and question-answer bots. This technique saves time and effort in comprehending complex texts while retaining
essential information.
We will now explore the various NLP Techniques, libraries, and frameworks.
There are two commonly used families of techniques in the NLP industry:
1. Rule-based Techniques: rely on predefined rules and patterns to analyze and extract information from text.
2. Statistical Techniques: use machine learning algorithms to analyze and understand language.
Rule-based Techniques
These techniques involve creating a set of hand-crafted rules or patterns to extract meaningful information from
text. Rule-based systems typically work by defining specific patterns that match the target information, such as
named entities or specific keywords, and then extracting that information based on those patterns. Rule-based
systems are fast, reliable, and straightforward, but they are limited by the quality and number of rules defined,
and they do not generalize well to unseen text.
For example, a rule-based system for named entity recognition could be designed to identify proper nouns in
text and categorize them into predefined entity types, such as a person, location, organization, disease, drugs,
etc. The system would use a series of rules to identify patterns in the text that match the criteria for each entity
type, such as capitalization for person names or specific keywords for organizations.
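A minimal sketch of such a rule-based recognizer in Python; the drug lexicon and the capitalization pattern are made-up examples:

import re

text = "The patient was prescribed Aspirin 81 mg daily at Boston General Hospital."

# Hand-crafted rules: a keyword list for drugs and a pattern for organizations
drug_lexicon = {"aspirin", "ibuprofen", "metformin"}
org_pattern = re.compile(r"\b([A-Z][a-z]+ (?:General )?Hospital)\b")

entities = []
for token in re.findall(r"[A-Za-z]+", text):
    if token.lower() in drug_lexicon:
        entities.append((token, "DRUG"))          # matched by the keyword rule
for match in org_pattern.finditer(text):
    entities.append((match.group(1), "ORG"))      # matched by the capitalization rule

print(entities)  # [('Aspirin', 'DRUG'), ('Boston General Hospital', 'ORG')]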
Statistical Techniques
These techniques use machine learning models to learn language patterns from data. Machine learning models
can be trained on large amounts of annotated data, making them more flexible and scalable than rule-based
systems. Several types of machine learning models are used in NLP, including decision trees, random forests,
support vector machines, and neural networks.
For example, a machine learning model for sentiment analysis could be trained on a large corpus of annotated
text, where each text is tagged as positive, negative, or neutral. The model would learn the statistical patterns in
the data that distinguish between positive and negative text and then use those patterns to make predictions on
new, unseen text. The advantage of this approach is that the model can learn to identify sentiment patterns that
would be difficult to capture with hand-crafted rules.
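A minimal scikit-learn sketch of this statistical approach; the tiny corpus here stands in for a large annotated dataset:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# A tiny stand-in for a large annotated corpus
texts = ["I love this product", "Terrible experience", "Works great", "Awful service"]
labels = ["positive", "negative", "positive", "negative"]

# The model learns statistical patterns (word weights) that separate the classes
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)

print(clf.predict(["Works really well"]))  # predicts a label for unseen text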
Transfer Learning
These techniques are a hybrid approach combining the strengths of rule-based and machine-learning models.
Transfer learning uses a pre-trained machine learning model, such as a language model trained on a large corpus
of text, as a starting point for fine-tuning on a specific task or domain. This approach leverages the general
knowledge learned from the pre-trained model, reducing the amount of labeled data required for training and
improving performance on the target task.
For example, a transfer learning approach to named entity recognition could fine-tune a pre-trained language
model on a smaller corpus of annotated medical text. The model would start with the general knowledge learned
from the pre-trained model and then adjust its weights to match the medical text’s patterns better. This approach
would reduce the amount of labeled data required for training and result in a more accurate model for named
entity recognition on medical text.
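As a hedged sketch of this transfer-learning setup using the Hugging Face transformers library; the checkpoint name and label count are assumptions, and the actual fine-tuning loop is omitted:

from transformers import AutoTokenizer, AutoModelForTokenClassification

# Hypothetical fine-tuning setup: start from a pre-trained biomedical
# checkpoint and attach a fresh token-classification head for our labels.
model_name = "dmis-lab/biobert-v1.1"  # public BioBERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name, num_labels=3)  # e.g., O, DISEASE, DRUG

# From here, the model's weights would be adjusted on a small annotated
# medical corpus using a standard training loop (e.g., transformers.Trainer).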
Natural Language Processing (NLP) libraries and frameworks are software tools that help develop and deploy
NLP applications. Several NLP libraries and frameworks are available, each with strengths, weaknesses, and
focus areas.
These tools vary in terms of the complexity of the algorithms they support, the size of the models they can
handle, the ease of use, and the degree of customization they allow.
Large language models are trained on massive amounts of data and can generate human-like text and perform a
wide range of NLP tasks.
Here are some examples of large language models and a brief description of each:
GPT-3 (Generative Pretrained Transformer 3): Developed by OpenAI, GPT-3 is a large transformer-based
language model that uses deep learning algorithms to generate human-like text. It has been trained on a massive
corpus of text data, allowing it to generate coherent and contextually appropriate text responses based on a
prompt.
BERT (Bidirectional Encoder Representations from Transformers): Developed by Google, BERT is a
transformer-based language model that has been pre-trained on a large corpus of text data. It is designed to
perform well on a wide range of NLP tasks, such as named entity recognition, question answering, and text
classification.
RoBERTa (Robustly Optimized BERT Approach): Developed by Facebook AI, RoBERTa is a variant of BERT
that has been fine-tuned and optimized for NLP tasks. It has been trained on a larger corpus of text data and uses
a different training strategy than BERT, leading to improved performance on NLP benchmarks.
ELMo (Embeddings from Language Models): Developed by Allen Institute for AI, ELMo is a deep
contextualized word representation model that uses a bidirectional LSTM (Long Short-Term Memory) network
to learn language representations from a large corpus of text data. ELMo can be fine-tuned for specific NLP tasks.
ULMFiT (Universal Language Model Fine-Tuning): Developed by FastAI, ULMFiT is a transfer learning
method that fine-tunes a pre-trained language model on a specific NLP task using a small amount of task-specific
annotated data. ULMFiT has achieved state-of-the-art performance on a wide range of NLP benchmarks and is
widely used for transfer learning in NLP.
Why does clinical text need specialized NLP models?
Clinical text is often unstructured and full of abbreviations, acronyms, and domain-specific terminology,
making it difficult for traditional NLP models to understand and process. Additionally, clinical text often includes important
information such as disease, drugs, patient information, diagnoses, and treatment plans, which require
specialized NLP models that can accurately extract and understand this medical information.
Another reason clinical text needs different NLP models is that it contains a large amount of data spread across
different sources, such as EHRs, clinical notes, and radiology reports, which need to be integrated. This requires
models that can process and understand the text and link and integrate the data across different sources and formats.
Lastly, clinical text often contains sensitive patient information and needs to be protected by strict regulations
such as HIPAA. NLP models used to process clinical text must be able to identify and protect sensitive patient information.
The textual data within medicine requires a specialized Natural Language Processing (NLP) system capable of
extracting medical information from various sources such as clinical texts and other medical documents.
Here is a list of NLP libraries and models specific to the medical domain:
spaCy: It is an open-source NLP library that provides out-of-the-box models for various domains, including the
medical domain.
ScispaCy: A specialized version of spaCy that is trained specifically on scientific and biomedical text, which
makes it well suited for processing clinical and scientific documents.
BioBERT: A pre-trained transformer-based model specifically designed for the biomedical domain. It is pre-
trained on large-scale biomedical corpora such as PubMed abstracts and PMC full-text articles.
ClinicalBERT: Another pre-trained model designed to process clinical notes & discharge summaries from the
MIMIC-III database.
Med7: A transformer-based model that was trained on electronic health records (EHR) to extract seven key
categories of medication-related information: drug name, dosage, duration, form, frequency, route, and strength.
DisMod-ML: A probabilistic modeling framework for disease modeling that uses NLP techniques to process
medical text.
MEDIC: A rule-based NLP system for extracting medical information from text.
These are some of the popular NLP libraries and models that are specifically designed for the medical domain.
They offer a range of features, from pre-trained models to rule-based systems, and can help healthcare
organizations extract medical information from clinical text.
In our NER model, we will use spaCy and ScispaCy. These libraries are comparatively easy to run on Google
Colab infrastructure.
Medical text data can be obtained from various sources, such as electronic health records (EHRs), medical journals,
clinical notes, medical websites, and databases. Some of these sources provide publicly available datasets that can be
used for training NLP models, while others may require approval and ethical considerations before accessing the data.
1. Open-source medical corpora such as the MIMIC-III database is a large, openly accessible electronic health records
(EHRs) database from patients who received care at the Beth Israel Deaconess Medical Center between 2001 and
2012. The database includes information such as patient demographics, vital signs, laboratory tests, medications,
procedures, and notes from healthcare professionals, such as nurses and physicians. Additionally, the database
includes information on patients’ ICU stays, including the type of ICU, length of stay, and outcomes. The data in
MIMIC-III is de-identified and can be used for research purposes to support the development of predictive models
and clinical decision support tools.
2. The National Library of Medicine’s ClinicalTrials.gov website has clinical trial data & disease surveillance data.
3. The National Institutes of Health's National Library of Medicine and National Center for Biotechnology
Information (NCBI) provide access to biomedical literature and genomic databases.
4. Healthcare institutions and organizations such as hospitals, clinics, and pharmaceutical companies generate large
amounts of medical text data through electronic health records, clinical notes, medical transcription, and medical
reports.
5. Medical research journals and databases, such as PubMed and CINAHL, contain vast amounts of published medical
literature.
6. Social media platforms like Twitter can provide real-time insights into patient perspectives, drug reviews, and
experiences.
To train NLP models using medical text data, it is important to consider the data’s quality and relevance and ensure
that it is properly pre-processed and formatted. Additionally, it is important to adhere to ethical and legal
requirements when handling sensitive patient data.
Clinical data refers to information about individuals’ healthcare, including patient medical history, diagnoses,
treatments, lab results, imaging studies, and other relevant health information.
EHR/EMR data are linked to Demographic data (This includes personal information such as age, gender, ethnicity,
and contact information.), Patient-generated data (This type of data is generated by patients themselves, including
information collected through patient-reported outcome measures and patient-generated health data.)
Genomic Data: This type relates to an individual’s genetic information, including DNA sequences and markers.
Wearable Device Data: This data includes information collected from wearable devices such as fitness trackers and
heart monitors.
Each type of clinical data plays a unique role in providing a comprehensive view of a patient’s health and is used in
different ways by healthcare providers and researchers to improve patient care and inform treatment decisions.
Natural Language Processing (NLP) has been widely adopted in the healthcare industry and has several use cases.
Population Health: NLP can be used to process large amounts of unstructured medical data such as medical records,
surveys, and claims data to identify patterns, correlations, and insights. This helps in monitoring population health and
planning interventions.
Patient Care: NLP can be used to process patients’ electronic health records (EHRs) to extract vital information such
as diagnosis, medications, and symptoms. This information can be used to improve patient care and provide
personalized treatment.
Disease Detection: NLP can be used to process large amounts of text data, such as scientific articles, news articles,
and social media posts, to detect and track disease outbreaks.
Clinical Decision Support: NLP can be used to analyze patients' electronic health records to provide real-time decision support to healthcare
providers. This helps in providing the best possible treatment options and improving the overall quality of care.
Clinical Trial: NLP can process clinical trial data to identify correlations and potential new treatments.
Drugs Adverse Events: NLP can be used to process large amounts of drug safety data to identify adverse events
and potential drug interactions.
Precision Health: NLP can be used to process genomic data and medical records to identify personalized
treatment options.
Medical Professional’s Efficiency Improvement: NLP can automate routine tasks such as medical coding, data
entry, and claim processing, freeing medical professionals to focus on providing better patient care.
These are just a few examples of how NLP revolutionizes the healthcare industry. As NLP technology continues
to advance, we can expect to see more innovative uses of NLP in healthcare in the future.
We will develop a step-by-step Spacy pipeline using SciSpacy NER Model for Clinical Text.
Objective: This project aims to construct an NLP pipeline utilizing SciSpacy to perform custom Named Entity
Recognition (NER) on clinical text.
Outcome: The outcome will be extracting information regarding diseases, drugs, and drug doses from clinical text.
Solution Design:
Here is the high-level solution to extract entity information from clinical text. NER extraction is an important
NLP task in this pipeline.
en_ner_bc5cdr_md-0.5.1 is a spaCy model for named entity recognition (NER) in the biomedical domain.
The "bc5cdr" refers to the BC5CDR corpus, a biomedical text corpus used to train the model, which is
annotated with chemicals and diseases. The "md" in the name indicates a medium-sized model, and "0.5.1"
refers to the version of the model.
We will use the sample "transcription" text from mtsample.csv and annotate it using a rule-based pattern to
extract drug names and dosages.
Step-by-Step Code:
Install the spacy and scispacy packages. spaCy models are designed to perform specific NLP tasks, such as
tokenization, part-of-speech tagging, and named entity recognition.
The en_ner_bc5cdr_md-0.5.1 model is specifically designed to recognize named entities in biomedical text,
such as diseases and chemicals. This model can be useful for NLP tasks in the biomedical domain, such as
information extraction, text classification, and question answering.
Import Packages
import scispacy
import spacy
#Core models
import en_core_sci_sm
import en_core_sci_md
#NER specific models
import en_ner_bc5cdr_md
#Tools for extracting & displaying data
from spacy import displacy
import pandas as pd
Python Code:
# Load the transcription dataset (assumed to be available as mtsample.csv)
mtsample_df = pd.read_csv("mtsample.csv")

# Pick a specific transcription (row 10, column "transcription") and test the scispacy NER model
text = mtsample_df.loc[10, "transcription"]
nlp_sm = en_core_sci_sm.load()
doc = nlp_sm(text)

# Display the resulting entity extraction
displacy_image = displacy.render(doc, jupyter=True, style='ent')
Note the entities tagged here are mostly medical terms. However, these are generic entities.
Now load the medium model en_core_sci_md and pass the text through it:
nlp_md = en_core_sci_md.load()
doc = nlp_md(text)
#Display resulting entity extraction
displacy_image = displacy.render(doc, jupyter=True,style='ent')
Now load the NER-specific model en_ner_bc5cdr_md and pass the text through it:
nlp_bc = en_ner_bc5cdr_md.load()
doc = nlp_bc(text)
#Display resulting entity extraction
displacy_image = displacy.render(doc, jupyter=True,style='ent')
Process the clinical text by dropping NaN values and creating a smaller random sample for the custom entity
model.
mtsample_df.dropna(subset=['transcription'], inplace=True)
mtsample_df_subset = mtsample_df.sample(n=100, replace=False, random_state=42)
mtsample_df_subset.info()
mtsample_df_subset.head()
spaCy matcher: Rule-based matching resembles the use of regular expressions, but spaCy provides
additional capabilities. Using the tokens and relationships within a document enables you to identify
patterns that include entities recognized by NER models. The goal is to locate drug names and their
dosages in the text, which could help detect medication errors by comparing them against dosage
guidelines.
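A minimal sketch of such a matcher, reusing the nlp_bc model loaded earlier; the token pattern and unit list are illustrative assumptions:

from spacy.matcher import Matcher

matcher = Matcher(nlp_bc.vocab)

# Assumed pattern: a CHEMICAL entity followed by a number and a unit, e.g., "Aspirin 81 mg"
pattern = [
    {"ENT_TYPE": "CHEMICAL"},
    {"LIKE_NUM": True},
    {"LOWER": {"IN": ["mg", "ml", "mcg", "g", "units"]}},
]
matcher.add("DRUG_DOSE", [pattern])

doc = nlp_bc("The patient was given Aspirin 81 mg daily.")
for match_id, start, end in matcher(doc):
    print(doc[start:end].text)  # e.g., "Aspirin 81 mg"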
The output will display the entities extracted from the clinical text sample.
Now we can see the pipeline extracted Disease, Drugs(Chemicals), and Drugs-Doses information from the
clinical text.
There is some misclassification, but we can increase the model’s performance using more data.
We can now use these medical entities in various tasks like disease detection, predictive analysis, clinical
decision support systems, medical text classification, summarization, question answering, and many more.
Clinical Decision Support Systems (CDSS)
Several drivers have motivated the adoption of CDSSs in healthcare:
⚫ The increasing amount of knowledge and evidence released has made finding, retrieving,
collecting and validating the data a very complex and time-consuming process.
⚫ The limitation and lack of resources, especially human resources, in the health system also limit the
evidence available to healthcare teams and the time they have to use it. Under these circumstances,
providing a system that can facilitate the process of data retrieval, analysis and inference based on
best-practice recommendations could be very helpful.
⚫ Other drivers of CDSS usage such as increased computer literacy and acquiring of necessary skills of
using information and communication technology and its derivatives have also resulted in popularity
and acceptance of CDSSs in healthcare systems.
⚫ There are also some enabling drivers that facilitate the process of CDSS implementation and
application. For example, the availability of ICT infrastructure and facilities has made implementing
CDSSs possible, easy and relatively inexpensive. The existence of cheap, powerful and accessible
tools for data storage, management, recovery and analysis, in addition to internet, web and cloud
technologies, has made the design and implementation of all these components easier, cost-effective
and practical.
⚫ With electronic health records (EHR) in place and enormous improvements in computers’ capacities
and analytical capabilities, CDSSs are becoming an inevitable tool in health settings that intend to
provide highest standards of health service to their patients. CDSSs analyze data within EHRs
to provide alerts and reminders to assist health care providers in benefiting from the existing and
approved clinical guidelines at the point of care.
Preserving the same overall structure, under the hood CDSSs vary in the ways they come to a conclusion,
falling into two types: knowledge-based and nonknowledge-based systems.
Knowledge-based CDSS
Systems of this type are built on top of a knowledge base in which every piece of data is structured in the
form of if-then rules. For instance, if a new order for a blood test is placed and if the same blood test was
made within the past 24 hours, then a duplication is possible.
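That duplication rule can be sketched in a few lines of Python; the data structures here are illustrative, not a real rules engine:

from datetime import datetime, timedelta

def check_duplicate_order(new_test, order_history):
    """If the same blood test was placed within the past 24 hours,
    then flag a possible duplication (a simple if-then rule)."""
    cutoff = datetime.now() - timedelta(hours=24)
    for test_name, ordered_at in order_history:
        if test_name == new_test and ordered_at >= cutoff:
            return "ALERT: possible duplicate order for " + new_test
    return "OK"

history = [("CBC", datetime.now() - timedelta(hours=3))]
print(check_duplicate_order("CBC", history))  # triggers the alert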
The inference engine runs the built-in logic to combine the evidence-based rules with the patient’s
medical history and data on his or her current condition. The results come in the form of alerts, reminders,
diagnostic suggestions, a series of treatment options or ranked lists of possible solutions while the final
word rests with a human expert.
Nonknowledge-based CDSS
The core difference from the previous group consists of applying machine learning models. Rather than
consulting with a library of predefined if-then rules, such a system learns from past experiences and finds
patterns in historical data. These are the two techniques most widely used in such CDSSs:
⚫ Genetic algorithms (GA) reflecting the mechanics of natural selection described by Charles Darwin.
Just as species change from generation to generation to better fit their environment, GAs adapt to a
new task, producing a number of random solutions and then iteratively evaluating and improving
them until the most fitting option is found.
⚫ Artificial neural networks (ANN) that mimic human thinking. Similar to human brains, ANNs have
a set of "neurons" called "neurodes." They are linked to each other with weighted connections that act
as nerve synapses transmitting signals across the neural network.
⚫ Nonknowledge-based systems come with a promise to significantly cut healthcare costs and relieve
the pressure on medical experts. However, there are issues preventing their large-scale adoption.
They include a compute-intensive and time-consuming training process and the requirement of large
datasets needed to improve accuracy of models. But the main obstacle is the lack of interpretability
as systems can’t explain the reasoning behind generated decisions.
Key Components:
Drug selection
Statistics show that 7,000 to 9,000 US patients die annually because of medication errors. Besides that, a lot
more people suffer from complications caused by inappropriate medicines, ill-judged dosages, or drug
incompatibility, increasing treatment costs by over $40 billion a year.
The good news is that nearly 50 percent of medication errors happen at the first stage: ordering or
prescribing. So, mistakes can be spotted and prevented before they cause any harm. And that's where a decision
support tool comes in handy, eliminating risks from human factors like distraction that accounts for around 75
percent of medication errors.
Using critical patient data such as weight, age, allergy status, and current prescriptions, CDSSs may
automatically deal with the following tasks.
Drug allergy checking. The system matches an ordered medication against a patient’s list of documented
allergies, evaluates the probability of unwanted reactions, and generates alerts.
Basic guidance on dosage. Dosing errors account for over 60 percent of all the prescribing mistakes. But this
can be improved by a corresponding decision support module. In the simplest scenario, the software
component generates a patient-specific list of recommended dosing parameters for a particular medication. It
saves a clinician time on selection of the most appropriate dosage and frequency. The CDSS may as well alert
experts to exceeding dosing limits.
Checking for duplicate therapy. A duplicate therapy occurs when two or more drugs with the same active
ingredient are prescribed simultaneously. It leads to overdose and related adverse effects. The CDSS feature
addressing this problem compares a newly added drug with active ingredients of drugs in a patient’s profile. If
a match is detected, the system generates an alert.
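A minimal sketch of this active-ingredient comparison; the drug-to-ingredient map is a made-up stand-in for a real drug database:

# Hypothetical drug-to-ingredient map; a real CDSS would query a drug database
ACTIVE_INGREDIENTS = {
    "Tylenol": {"acetaminophen"},
    "NyQuil": {"acetaminophen", "dextromethorphan", "doxylamine"},
}

def check_duplicate_therapy(new_drug, patient_drugs):
    """Alert if the new drug shares an active ingredient with a current one."""
    new_ingredients = ACTIVE_INGREDIENTS.get(new_drug, set())
    for current in patient_drugs:
        shared = new_ingredients & ACTIVE_INGREDIENTS.get(current, set())
        if shared:
            return f"ALERT: {new_drug} and {current} both contain {', '.join(sorted(shared))}"
    return "OK"

print(check_duplicate_therapy("NyQuil", ["Tylenol"]))  # duplicate acetaminophen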
Drug interactions checking. A drug's interaction with other substances may change its expected effect.
Based on clinical documentation at hand, a CDSS considers interactions of a newly prescribed
medication with
• other drugs in a patient’s list (drug-drug interactions or DDI),
• food and beverage (drug-food interactions),
• herbals,
• ethanol,
• testing (if the medication can affect the results of laboratory tests),
• pregnancy and lactation, and
• a patient’s disease that can also affect the drug’s performance (drug-disease interactions).
Today, many Computerized Provider Order Entry (CPOE) systems come equipped with drug safety
components that perform duplicate therapy, DDI and drug-dose checking. But you may find
separate decision support modules to complete existing software as well. The example of a
single-task solution is a drug allergy checker by PEPID which can be integrated with any EHR or
other healthcare information system.
Diagnostic support
CDSSs for disease identification are called diagnostic decision support systems (DDSSs) or
medical diagnosis systems (MDSs). They compare information on a patient's condition with a
knowledge base and generate a list of possible diagnoses.
A specific example of a DDSS is a solution utilizing deep learning for diagnostic imaging. It
would traditionally focus on a specific problem area — say, lung abnormalities or a particular
type of cancer. Similar to other CDS tools, AI-fueled programs work as a second pair of eyes and
make suggestions and alerts — rather than come to a final conclusion.
Cost containment
Integrated in a CPOE system, decision support tools may decrease treatment costs by suggesting
cheaper drug alternatives or spotting test duplications. Studies revealed that CDSSs save hospital
units hundreds of thousands of dollars per year by alerting to cases of excessive medical testing.
Clinical management
Some clinics employ decision support software to enhance adherence to clinical guidance.
Similar to information about drugs and diseases, hospital rules can be encoded into a knowledge-
based CDSS in the form of IF-THEN-ELSE pieces of information. Such solutions perform
various tasks, from prompting nurses to take specific measurements according to a protocol to
informing doctors about patients who don’t follow their treatment plans.
Under Medicare's Appropriate Use Criteria (AUC) program, prior to ordering an expensive imaging test
for a Medicare patient, a physician must consult a clinical decision support mechanism. Without a verdict
made by a CDS tool ("appropriate," "may be appropriate," or "rarely appropriate"), service providers
won't receive reimbursement for their procedures.
Now the program undergoes the so-called educational and testing period with no penalties
charged while the full implementation is scheduled for 2021. At the end of the day, when all
technological challenges are solved, the AUC initiative is expected to benefit both providers and
patients by enhancing the quality of care and reducing expenses on tests.
No matter the primary declared reason for implementation, all CDSSs are built with a matter-of-
course intention to cut healthcare costs while improving patient safety. The next part will
describe the advances of existing systems in achieving this ambitious goal.