Deep Learning Workflow
Introduction
Successfully using deep learning requires more than just knowing how to build
neural networks; we also need to know the steps required to apply them in
real-world settings effectively.
In this article, we cover the workflow for a deep learning project: how we build
out deep learning solutions to tackle real-world tasks. The deep learning workflow includes the following steps:
Acquiring data
Preprocessing
Splitting and balancing the dataset
Evaluation
Hyperparameter tuning
Part 1: Acquiring data
The first question to ask when starting a project is: “Can we get enough labeled data?” The more labeled data we have, the better our
model can be. Our ability to acquire data can make or break our solution. Not
only is getting data usually the most important part of a deep learning project, it is often one of the most time-consuming steps as well. Fortunately, public dataset repositories host thousands of large, labeled data sources. Working with these curated datasets can spare us much of the collection and labeling work.
Existing Databases
In some cases, our organization may have a large dataset on hand. Often,
this data is stored in a relational database management system (RDBMS). In this case, we can build our specific dataset using SQL queries.
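As a rough sketch, assuming a SQLite database and a hypothetical customer_reviews table, we could pull a labeled dataset into pandas like this:

```
# Sketch: building a dataset from a relational database.
# The database file, table, and column names are hypothetical.
import sqlite3
import pandas as pd

connection = sqlite3.connect("company_data.db")   # assumed database file
query = """
    SELECT review_text, rating
    FROM customer_reviews
    WHERE rating IS NOT NULL
"""
dataset = pd.read_sql_query(query, connection)    # run the query, get a DataFrame
connection.close()
```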
Web scraping/APIs
Online news, social media posts, and search results represent rich streams of
data, which we can leverage for our deep learning projects. We do this via Web
scraping: the extraction of data from websites. While scraping and collecting data, we need to be mindful of copyright, privacy, and consent issues. There are many tools to web scrape in Python, including
BeautifulSoup. Many sites, like Reddit and Twitter, have Python Application
Programming Interfaces (APIs). We can use APIs to gather data from different
applications. While some APIs are free, others are paid services.
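For illustration, a minimal scraping sketch with requests and BeautifulSoup might look like the following; the URL and CSS selector are placeholders, not real endpoints:

```
# Sketch: scraping headline text from a web page.
import requests
from bs4 import BeautifulSoup

response = requests.get("https://example.com/news")          # placeholder URL
soup = BeautifulSoup(response.text, "html.parser")

# Collect the text of every element matching a (hypothetical) headline selector
headlines = [tag.get_text(strip=True) for tag in soup.select("h2.headline")]
```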
Depending on the size of the dataset, we may be able to directly write our
scraped data to raw data files (e.g., .txt or .csv). However, for larger datasets, we may need to store the scraped data in a database as we collect it.
Crowd-sourced Labeling
For many tasks, it’s much easier to acquire data than it is to find labeled data.
For example, it is much easier to scrape the raw text from an entire Reddit
subreddit than to correctly label the contents of each Reddit post. When we need labels, one possibility would be for us to go through our own data and annotate each example ourselves. However, this quickly becomes impractical for large datasets. Instead, we can turn to crowd-sourcing platforms like Amazon Mechanical Turk. We can utilize these sites to pay “gig” workers to label each example for us.
Part 2: Preprocessing
Once we have built our dataset, we need to preprocess it into useful features
for our deep learning models. At a high-level, we have three primary goals
when preprocessing data for neural networks: we want to 1) clean our data, 2)
handle categorical features and text, and 3) scale our real-valued features.
Cleaning data
Often, our datasets contain noisy examples, extra features, and missing data. In the cleaning step, we remove unnecessary features, fill in missing data, and filter out noisy examples.
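A minimal cleaning sketch with pandas, using hypothetical file and column names, might look like this:

```
# Sketch: basic cleaning of a tabular dataset with pandas.
import pandas as pd

df = pd.read_csv("raw_data.csv")                      # assumed raw data file
df = df.drop(columns=["internal_id"])                 # drop an unnecessary feature
df["age"] = df["age"].fillna(df["age"].median())      # fill in missing data
df = df[df["review_text"].str.len() > 0]              # filter out empty (noisy) examples
```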
Scaling features
Because we initialize neural networks with small weights to stabilize training,
our models will struggle when faced with input features that have large values. To address this, we typically normalize features so that they are between 0 and 1, or standardize them so that they have a mean of 0 and a standard deviation of 1.
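For example, scikit-learn provides MinMaxScaler and StandardScaler for these two transformations; the toy matrix below is just for illustration:

```
# Sketch: normalization vs. standardization with scikit-learn.
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]])   # toy feature matrix

X_normalized = MinMaxScaler().fit_transform(X)     # each feature rescaled to [0, 1]
X_standardized = StandardScaler().fit_transform(X) # each feature has mean 0, std 1
```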
Handling categorical features and text
We can handle categorical variables by assigning each option its own unique integer (or its own one-hot vector). When our input is text, we need to handle a few extra processing steps before encoding our words as integers. These steps include tokenizing our data (splitting our text into individual words/tokens) and padding our data (adding padding tokens so that every example has the same length).
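A short sketch of these steps using the Keras preprocessing utilities, on a toy corpus, could look like this:

```
# Sketch: tokenizing and padding text with Keras utilities.
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

texts = ["the model works well", "the model fails on rare classes"]  # toy corpus

tokenizer = Tokenizer(num_words=10000)            # keep the 10,000 most frequent words
tokenizer.fit_on_texts(texts)
sequences = tokenizer.texts_to_sequences(texts)   # words -> integer IDs
padded = pad_sequences(sequences, maxlen=10)      # pad/truncate so every sequence has length 10
```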
Part 3: Splitting and balancing the dataset
Before training, we split our data into two datasets: training and validation. In certain cases, we
also create a third holdout dataset, referred to as the test set. When we don’t
do this, we often use the terms “validation” and “test” sets interchangeably.
We train our model on the training dataset and we evaluate it on the validation
dataset. If we have defined a third holdout test set, we test our model on this
dataset after we have finished selecting our model and tuning our hyperparameters. This final check protects us from choosing a model and hyperparameters that only happen to work well on the data we chose for our
validation set.
When splitting our dataset, there are two major considerations: the size of our
splits, and whether we will stratify our data. After we split our data, we may also need to balance our training set (more on this below). A function such as scikit-learn's train_test_split divides our data into training and validation datasets and lets us specify the size of our validation data.
A purely random split can cause problems for classification; it’s very possible that more instances of our minority classes
end up in either the training or the validation set. In the first case, our
validation metrics will not accurately capture our model’s ability to classify the
minority class. In the second case, the model will overestimate the probability of the majority classes, since it sees too few minority examples during training.
The solution is to use a stratified split: a split that ensures the training and
validation sets have the same proportion of examples from each class.
If we pass the stratify parameter our array of labels, the function will compute the proportion of each class, and
ensure that this ratio is the same in our training and validation data.
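A sketch of such a split with scikit-learn's train_test_split, assuming a feature matrix X and a label array y already exist, might look like this:

```
# Sketch: stratified train/validation split with scikit-learn.
from sklearn.model_selection import train_test_split

X_train, X_val, y_train, y_val = train_test_split(
    X, y,
    test_size=0.2,      # reserve 20% of the data for validation
    stratify=y,         # keep the same class proportions in both splits
    random_state=42,
)
```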
If we train on heavily imbalanced data, our resulting model will be heavily biased towards predicting those
majority classes. This is especially problematic because usually we care
much more about identifying instances of the minority classes (like rare cases of disease or fraud).
There are two main approaches to dealing with imbalanced training data: oversampling our minority classes and undersampling our majority classes. Both change the class distribution our model sees, so they should be used with utmost caution, and it’s best to have a domain expert on hand to weigh
in.
Almost always, we only correct the imbalance in our training data, and leave
the validation data as is. In order to only augment our training data, we need to oversample only after the data has been split.
We never oversample our data before we split it. If we do, copies of our testing
data can sneak into our training data. This is called an information leak.
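As an illustration, assuming our training split lives in a pandas DataFrame train_df with a binary "label" column (both names hypothetical), we could oversample the minority class like this:

```
# Sketch: oversampling the minority class of the TRAINING split only.
import pandas as pd
from sklearn.utils import resample

minority = train_df[train_df["label"] == 1]
majority = train_df[train_df["label"] == 0]

minority_upsampled = resample(
    minority,
    replace=True,                 # sample with replacement
    n_samples=len(majority),      # match the majority class count
    random_state=42,
)
balanced_train = pd.concat([majority, minority_upsampled])
```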
Part 4: Selecting a model
With our data prepared, we build our network, starting by choosing a reasonable number of layers.
For each layer, we also need to select a reasonable number of hidden units.
There is no absolute science to choosing the right size for each layer, nor the right number of layers.
Usually, we create each layer with between 32 and 512 hidden units.
We also need to choose a starting learning rate; a common default is 0.01.
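A minimal Keras sketch of such a model, assuming a purely hypothetical input size and a binary classification task, might look like this:

```
# Sketch: a small fully-connected network in Keras.
from tensorflow import keras
from tensorflow.keras import layers

num_features = 20  # assumed input dimensionality

model = keras.Sequential([
    layers.Dense(64, activation="relu", input_shape=(num_features,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),      # binary classification output
])
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.01),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)
```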
Part 5: Evaluation
As we train, we want to evaluate our model on the validation set. When we provide a validation set at training time, Keras handles this
automatically. Our performance on the validation set gives us a sense for how our model will behave on data it has never seen. Accuracy is a natural first metric to check, but if our data set is heavily imbalanced, accuracy (and even AUC) will be less
meaningful. In this case, we likely want to consider metrics like precision and
recall. F1-score is another useful metric that combines both precision and
recall. A confusion matrix can help visualize which data points are being misclassified.
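Putting these pieces together, a sketch of training with a validation set and then computing these metrics (reusing the hypothetical names from the earlier sketches) could look like this:

```
# Sketch: training with a validation set, then computing imbalance-aware metrics.
from sklearn.metrics import classification_report, confusion_matrix

model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),   # Keras reports validation loss/metrics every epoch
    epochs=20,
    batch_size=32,
)

y_pred = (model.predict(X_val) > 0.5).astype(int)    # threshold the sigmoid outputs
print(confusion_matrix(y_val, y_pred))
print(classification_report(y_val, y_pred))          # precision, recall, and F1 per class
```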
Part 6: Hyperparameter tuning
While training and evaluating our model, we explore different learning rates, batch sizes, and model architectures.
As we tune our parameters, we should watch our loss and metrics, and be on the lookout for a few warning signs:
Unstable learning means that we likely need to reduce our learning rate.
A large gap between training and validation performance means we are overfitting, and should reduce the size of our model, or add regularization (such as dropout).
Poor performance on both the training and the test set means that we are underfitting, and may need a larger model or a higher learning rate.
Critically, because neural network weights are randomly initialized, your scores will vary somewhat between training runs.
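A bare-bones tuning loop, assuming a hypothetical build_model helper that returns a compiled Keras model and the data arrays from earlier sketches, might look like this:

```
# Sketch: a simple grid search over learning rates and batch sizes.
import itertools

best_score, best_config = 0.0, None
for learning_rate, batch_size in itertools.product([0.1, 0.01, 0.001], [16, 32, 64]):
    model = build_model(learning_rate=learning_rate)   # hypothetical helper
    model.fit(X_train, y_train, epochs=10, batch_size=batch_size, verbose=0)
    _, accuracy = model.evaluate(X_val, y_val, verbose=0)
    if accuracy > best_score:
        best_score, best_config = accuracy, (learning_rate, batch_size)

print("Best configuration:", best_config, "validation accuracy:", best_score)
```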
Once our results are satisfactory, we are ready to use our model!
If we made a holdout test set, separate from our validation data, now is when
we use it to test out our model. The holdout test set provides a final guarantee of how well our model will perform on new, unseen data.
Part 7: Deployment
Finally, we need to think about how our model will actually be used. This is especially true in industry settings, where our networks will be used by our coworkers and customers, or will work behind the scenes in our products. Deploying a model raises a few key questions.
How will we handle the compute requirements for running our models?
It takes a significant amount of computation to evaluate a single input using a
neural network, let alone manage traffic from many different users. As a result,
we typically package our model into a container and host that container where it can access powerful computing resources. Cloud
platforms like AWS, GCP and Azure are great places to start. These platforms
provide flexible hosting services for applications that can scale up to meet
changing demand.
How will we pass inputs into the model?
A common approach for interfacing with our model over the web is Flask, a
Python-based web framework. Flask can handle requests and pass inputs to
our model.
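A minimal Flask sketch, assuming a saved Keras model and a hypothetical JSON request format, might look like this:

```
# Sketch: a Flask endpoint that passes JSON input to a trained model.
import numpy as np
from flask import Flask, request, jsonify
from tensorflow import keras

app = Flask(__name__)
model = keras.models.load_model("model.h5")   # assumed path to a saved model

@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON body like {"features": [0.1, 0.5, ...]}
    features = np.array(request.json["features"]).reshape(1, -1)
    prediction = model.predict(features)
    return jsonify({"prediction": float(prediction[0][0])})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```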
How will we run the code and manage dependencies, wherever we host our
application? (Optional)
This can depend on where we host our model. However, a popular approach is to use containers, such as Docker. Containers are a way to package up our code and its dependencies (e.g., the correct version of TensorFlow) in such a way that our application can run reliably in any environment.
Conclusion
In this article, we covered the general workflow for a deep learning project. We
covered a lot of material, so don’t sweat every detail. Our goal is to provide a
sense for the overarching flow of a deep learning project, from data acquisition through deployment. In practice, over the course of a deep learning project, we often pivot back and forth between different steps, revisiting earlier decisions as we learn more about our problem and our data.