
Deep Learning and Neural Networks

Stacking is an ensemble learning technique that combines predictions from multiple models to improve accuracy, while random forests aggregate outputs from numerous decision trees to reduce overfitting and enhance performance. Random forests utilize bootstrapping and feature bagging to create diverse datasets and models, making them effective for both regression and classification tasks. Deep learning, a subset of machine learning, employs multi-layered neural networks to automate feature extraction and improve predictive capabilities, enabling advancements in AI applications.

Uploaded by musavvirk04
Copyright © All Rights Reserved

Stacking

Stacking is an ensemble learning technique that uses predictions from multiple
models (for example a decision tree, k-NN or SVM) to build a new model. This new model is
used for making predictions on the test set. Below is a step-wise explanation for a
simple stacked ensemble:

1. The train set is split into 10 parts.

2. A base model (suppose a decision tree) is fitted on 9 parts and predictions are made
for the 10th part. This is done for each part of the train set.

3. The base model (in this case, decision tree) is then fitted on the whole train dataset.
4. Using this model, predictions are made on the test set.
5. Steps 2 to 4 are repeated for another base model (say k-NN), resulting in another set
of predictions for the train set and test set.

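6. The out-of-fold predictions on the train set (from step 2) are used as features to build a
new model, often called the meta-model.

7. This model is used to make the final predictions, taking the base models' predictions on
the test set (from step 4) as its input features.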
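The seven steps above can be sketched in plain Python for a one-dimensional regression problem. This is a minimal illustration under assumptions that are not in the notes: the two base models are a hypothetical 3-nearest-neighbour regressor (`fit_knn`) and a least-squares line (`fit_linear`), folds are assigned round-robin by index, and the meta-model is a single blending weight `w` fitted by least squares on the out-of-fold predictions. A real project would typically use a library implementation and a richer meta-model.

```python
from typing import Callable, List, Sequence

Predictor = Callable[[float], float]
Trainer = Callable[[Sequence[float], Sequence[float]], Predictor]


def fit_knn(xs: Sequence[float], ys: Sequence[float], k: int = 3) -> Predictor:
    """Hypothetical base model 1: k-nearest-neighbour regression."""
    def predict(x: float) -> float:
        nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x))[:k]
        return sum(ys[i] for i in nearest) / len(nearest)
    return predict


def fit_linear(xs: Sequence[float], ys: Sequence[float]) -> Predictor:
    """Hypothetical base model 2: least-squares line y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var if var else 0.0
    a = my - b * mx
    return lambda x: a + b * x


def out_of_fold(trainer: Trainer, xs: Sequence[float], ys: Sequence[float],
                n_folds: int) -> List[float]:
    """Step 2: fit on n_folds - 1 parts and predict the held-out part, for every part."""
    preds = [0.0] * len(xs)
    for fold in range(n_folds):
        train = [i for i in range(len(xs)) if i % n_folds != fold]
        model = trainer([xs[i] for i in train], [ys[i] for i in train])
        for i in range(len(xs)):
            if i % n_folds == fold:
                preds[i] = model(xs[i])
    return preds


def stack(trainers: Sequence[Trainer], xs: Sequence[float], ys: Sequence[float],
          test_xs: Sequence[float], n_folds: int = 10) -> List[float]:
    if len(trainers) != 2:
        raise ValueError("this sketch blends exactly two base models")
    train_feats, test_feats = [], []
    for trainer in trainers:                                       # step 5: repeat per model
        train_feats.append(out_of_fold(trainer, xs, ys, n_folds))  # step 2
        full_model = trainer(xs, ys)                               # step 3
        test_feats.append([full_model(x) for x in test_xs])        # step 4
    # Step 6: meta-model = w * p1 + (1 - w) * p2, with w fitted by least squares
    # on the out-of-fold predictions and clamped to [0, 1].
    p1, p2 = train_feats
    num = sum((y - b) * (a - b) for y, a, b in zip(ys, p1, p2))
    den = sum((a - b) ** 2 for a, b in zip(p1, p2))
    w = min(1.0, max(0.0, num / den)) if den else 0.5
    # Step 7: apply the meta-model to the base models' test-set predictions.
    return [w * a + (1 - w) * b for a, b in zip(*test_feats)]


xs = [float(x) for x in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
print(stack([fit_knn, fit_linear], xs, ys, test_xs=[20.0]))
```

Because these toy targets lie exactly on a line, the linear model's out-of-fold predictions are already (almost) perfect, so the fitted weight on the k-NN model collapses towards zero and the stacked prediction follows the line.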
What Are Random Forests?

As the name suggests, a forest is a collection of many trees. And that’s true for
random forests as well – they are a collection of many individual decision trees. In
machine learning, this is referred to as ensemble modelling. In general, ensemble
methods use multiple learning algorithms to obtain better predictive performance
than any of the constituent learning algorithms alone. So, in our case, the collection
of decision trees as a whole typically performs better than any stand-alone decision
tree. In short, random forests rely on combining the results of many different decision
trees.

The random forest algorithm is one of the few non-neural-network models that achieve
very high accuracy on both regression and classification tasks; it simply tends to give good
results. And while decision trees do provide great interpretability, when it comes
down to performance, they usually lose against random forests. In fact, unless transparency
of the model is a priority, most data scientists and analysts will choose random
forests over single decision trees.
How Do Random Forests Work?
Through a process called bootstrapping (drawing rows from the original dataset at random,
with replacement), we can create many different datasets that share the general statistical
structure of the original dataset. And what can we do with similar, though slightly different
datasets? Well, we can train ML algorithms with them and obtain similar, yet slightly
different models.
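Bootstrapping itself is just sampling row indices with replacement. A minimal sketch (the function name `bootstrap_sample` is hypothetical) that also returns the rows that were never drawn, which random forests call the out-of-bag rows. For large datasets the expected out-of-bag fraction is (1 - 1/n)^n, which approaches 1/e ≈ 0.368.

```python
import random
from typing import List, Tuple


def bootstrap_sample(n_rows: int, rng: random.Random) -> Tuple[List[int], List[int]]:
    """Draw n_rows row indices with replacement; also return the rows never drawn."""
    in_bag = [rng.randrange(n_rows) for _ in range(n_rows)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_rows) if i not in drawn]
    return in_bag, out_of_bag


rng = random.Random(42)
in_bag, oob = bootstrap_sample(10_000, rng)
print(f"distinct rows drawn: {len(set(in_bag))}, out-of-bag fraction: {len(oob) / 10_000:.3f}")
```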
Since we are talking about random forests, those models would be, of course,
decision trees. And so, we have, let’s say, a hundred decision trees trained on a
hundred bootstrap samples. What do we do with them?
Well, we aggregate them. In other words, we combine all the different outputs into
one, usually through a majority voting system. Therefore, the most common output
among the different decision trees is the one we consider the final output.
So, for example, if 30 models say that the input corresponds to a rocket, 26 say it is a
plane, and 44 determine that it corresponds to a car, the final output would be a car.
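The majority vote in this example takes one line with Python's `collections.Counter`. A minimal sketch (`majority_vote` is a hypothetical helper name) using the vote counts from the text:

```python
from collections import Counter
from typing import Hashable, Sequence


def majority_vote(predictions: Sequence[Hashable]) -> Hashable:
    """Most common output among the trees (ties go to the label seen first)."""
    return Counter(predictions).most_common(1)[0][0]


votes = ["rocket"] * 30 + ["plane"] * 26 + ["car"] * 44
print(majority_vote(votes))  # car
```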
That ensemble is called Bagged decision trees – “Bagged” stemming from Bootstrap
Aggregating. The advantages of having many different trees mainly come in the form
of reducing overfitting. Since we have different data for the different trees, it is harder
for them to model every individual datapoint, which is what overfitting is. However,
there’s something more this collection needs to become a random forest.
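Random forests employ one additional crucial detail: they don’t consider all the
features at once. Just as every decision tree is trained with a different dataset, it also
only considers a random subset of the input features. In the standard algorithm a new
random subset is drawn at every split; some variants instead give each tree one fixed
random subset.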
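A standard result (not stated in the original notes) shows why decorrelating the trees matters. If the predictions $T_b(x)$ of $B$ trees at a point $x$ each have variance $\sigma^2$ and every pair of trees has correlation $\rho$, the variance of their average is:

```latex
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} T_b(x)\right)
  = \frac{1}{B^2}\Bigl(B\,\sigma^2 + B(B-1)\,\rho\,\sigma^2\Bigr)
  = \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2
```

Adding more trees (larger $B$) only shrinks the second term; the first term, $\rho\,\sigma^2$, can only be reduced by making the trees less correlated, which is what the random feature subsets do.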
How it works
Random forest algorithms have three main hyperparameters, which need to be set
before training. These include node size, the number of trees, and the number of
features sampled. From there, the random forest classifier can be used to solve for
regression or classification problems.

The random forest algorithm is made up of a collection of decision trees, and each
tree in the ensemble is trained on a data sample drawn from a training set with
replacement, called the bootstrap sample. Because rows are drawn with replacement,
roughly one-third of the original training rows (about 37% on average) never appear in a
given bootstrap sample; these rows are set aside as test data, known as the out-of-bag
(oob) sample, which we’ll come back to later. Another instance of randomness is then
injected through feature bagging, adding more diversity to the dataset and reducing the
correlation among decision trees. Depending on the type of problem, the determination of
the prediction will vary. For a regression task, the predictions of the individual decision
trees are averaged, and for a classification task, a majority vote (i.e. the most frequent
predicted class) yields the predicted class. Finally, the oob sample can be used to
estimate the model's generalization error, much like cross-validation, without needing a
separate validation set.
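The whole procedure fits in a short pure-Python sketch. To keep it brief, each "tree" here is a one-level decision tree (a stump), which is an assumption: real random forests grow much deeper trees. For a stump, drawing the feature subset per tree is the same as drawing it per split. The function names are hypothetical; `n_trees`, `max_features` and `min_leaf` stand in for the three hyperparameters mentioned above (number of trees, number of features sampled, node size).

```python
import random
from collections import Counter
from statistics import mean


def fit_stump(rows, labels, features, classify, min_leaf):
    """One-level decision tree: the single best threshold split on one of `features`."""
    def leaf_value(ys):
        return Counter(ys).most_common(1)[0][0] if classify else mean(ys)

    def loss(ys):
        if classify:  # number of misclassified samples in the leaf
            return len(ys) - Counter(ys).most_common(1)[0][1]
        m = mean(ys)  # squared error around the leaf mean
        return sum((y - m) ** 2 for y in ys)

    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue  # enforce the minimum node size
            total = loss(left) + loss(right)
            if best is None or total < best[0]:
                best = (total, f, t, leaf_value(left), leaf_value(right))
    if best is None:  # no valid split: predict a constant
        value = leaf_value(labels)
        return lambda r: value
    _, f, t, left_value, right_value = best
    return lambda r: left_value if r[f] <= t else right_value


def fit_forest(rows, labels, n_trees=25, max_features=1, min_leaf=1,
               classify=True, seed=0):
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        feats = rng.sample(range(d), max_features)   # feature bagging
        trees.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx],
                               feats, classify, min_leaf))

    def predict(row):
        outputs = [tree(row) for tree in trees]
        # classification: majority vote; regression: average of the trees
        return Counter(outputs).most_common(1)[0][0] if classify else mean(outputs)

    return predict


rows = [(float(i),) for i in range(20)]
clf = fit_forest(rows, [int(i >= 10) for i in range(20)])
reg = fit_forest(rows, [10.0 * (i >= 10) for i in range(20)], classify=False)
print(clf((0.0,)), clf((19.0,)), reg((19.0,)))
```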

Random Forest Feature Importance

Another great quality of the random forest algorithm is that it is very easy to measure
the relative importance of each feature on the prediction.

In a decision tree, each internal node represents a ‘test’ on an attribute (e.g., whether a
coin flip comes up heads or tails), each branch represents the outcome of the test, and
each leaf node represents a class label (decision taken after computing all attributes).
A node that has no children is a leaf.
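This node structure maps directly onto a small recursive data type. A minimal sketch, assuming numeric features and "less than or equal to a threshold" tests (the class and field names are hypothetical):

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """Internal nodes hold a test (feature index + threshold); leaves hold a class label."""
    feature: Optional[int] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None    # branch taken when the test is true
    right: Optional["Node"] = None   # branch taken when the test is false
    label: Optional[str] = None      # set only on leaves

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None  # no children


def predict(node: Node, x: Sequence[float]) -> Optional[str]:
    # Follow the outcome of each test until a leaf is reached.
    while not node.is_leaf():
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label


tree = Node(feature=0, threshold=2.5, left=Node(label="A"),
            right=Node(feature=1, threshold=1.0,
                       left=Node(label="B"), right=Node(label="C")))
print(predict(tree, [3.0, 0.5]))  # B
```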

By looking at the feature importance you can decide which features to possibly drop
because they don’t contribute enough (or sometimes nothing at all) to the prediction
process. This is important because, as a general rule in machine learning, the more
features you have, the more likely your model is to suffer from overfitting.
Below is a table and visualization showing the importance of 13 features, which were
used during a supervised classification project with the famous Titanic dataset on
Kaggle.

Benefits and challenges of random forest
There are a number of key advantages and challenges that the random forest
algorithm presents when used for classification or regression problems. Some of them
include:

Key Benefits

• Reduced risk of overfitting: Decision trees run the risk of overfitting as they
tend to tightly fit all the samples within training data. However, when there’s a robust
number of decision trees in a random forest, the classifier is much less likely to overfit,
since averaging many weakly correlated trees lowers the overall variance and prediction
error.
• Provides flexibility: Since random forest can handle both regression and
classification tasks with a high degree of accuracy, it is a popular method among data
scientists. Feature bagging also makes the random forest classifier an effective tool
for estimating missing values as it maintains accuracy when a portion of the data is
missing.
• Easy to determine feature importance: Random forest makes it easy to
evaluate variable importance, or contribution, to the model. There are a few ways to
evaluate feature importance. Gini importance, also known as mean decrease in impurity
(MDI), measures how much the splits on a given variable reduce impurity, summed over
each tree and averaged across the forest. Permutation importance, also known as mean
decrease in accuracy (MDA), is another importance measure. MDA identifies the
average decrease in accuracy when the values of a feature are randomly permuted in the
oob samples.
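Permutation importance (MDA) can be sketched in a few lines. This is a minimal illustration, not a library API: it evaluates on whatever rows you pass in (a random forest would use its out-of-bag rows), and `model` below is a hypothetical already-fitted classifier that only looks at feature 0.

```python
import random
from typing import Callable, List, Sequence, Tuple

Row = Tuple[float, ...]


def accuracy(model: Callable[[Row], int], rows: Sequence[Row], labels: Sequence[int]) -> float:
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_importance(model, rows, labels, n_repeats=5, seed=0) -> List[float]:
    """Mean decrease in accuracy (MDA) when one feature's column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for f in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[f] for r in rows]
            rng.shuffle(column)  # break the link between feature f and the labels
            shuffled = [r[:f] + (v,) + r[f + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical fitted model that only ever looks at feature 0.
def model(r: Row) -> int:
    return int(r[0] > 0.5)


rows = [(i / 20, (i * 7 % 20) / 20) for i in range(20)]
labels = [model(r) for r in rows]
print(permutation_importance(model, rows, labels))  # feature 1 scores 0.0: the model ignores it
```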

Key Challenges

• Time-consuming process: Since random forest algorithms can handle large
data sets, they can provide more accurate predictions, but they can be slow to process
data because every individual decision tree has to be trained and evaluated.
• Requires more resources: Since random forests process larger data sets,
they’ll require more resources to store that data.
• More complex: The prediction of a single decision tree is easier to interpret
when compared to a forest of them.

Random forest applications

The random forest algorithm has been applied across a number of industries, allowing
them to make better business decisions. Some use cases include:

• Finance: It is a preferred algorithm over others as it reduces time spent on data
management and pre-processing tasks. It can be used to evaluate customers
with high credit risk, to detect fraud, and to solve option pricing problems.
• Healthcare: The random forest algorithm has applications within
computational biology, allowing doctors to tackle problems such as gene expression
classification, biomarker discovery, and sequence annotation. As a result, doctors can
make estimates around drug responses to specific medications.
• E-commerce: It can be used for recommendation engines for cross-sell
purposes.

What is deep learning?


Deep learning is a subset of machine learning that uses multi-layered neural
networks, called deep neural networks, to simulate the complex decision-making
power of the human brain. Some form of deep learning powers most of the artificial
intelligence (AI) in our lives today.
Deep learning is a type of machine learning and artificial intelligence (AI) that
imitates the way humans gain certain types of knowledge. Deep learning models can
be taught to perform classification tasks and recognize patterns in photos, text, audio
and other kinds of data. It is also used to automate tasks that would normally need
human intelligence, such as describing images or transcribing audio files.
Deep learning is an important element of data science, including statistics and
predictive modeling. It is extremely beneficial to data scientists who are tasked with
collecting, analyzing and interpreting large amounts of data; deep learning makes this
process faster and easier.
Deep learning enables a computer to learn by example. To understand deep learning,
imagine a toddler whose first word is dog. The toddler learns what a dog is -- and is
not -- by pointing to objects and saying the word dog. The parent says, "Yes, that is a
dog," or, "No, that is not a dog." As the toddler continues to point to objects, the toddler
becomes more aware of the features that all dogs possess. What the toddler is doing,
without knowing it, is clarifying a complex abstraction: the concept of dog. The toddler is
doing this by building a hierarchy in which each level of abstraction is created with
knowledge that was gained from the preceding layer of the hierarchy.

How Does Deep Learning Work?

Deep learning algorithms attempt to draw similar conclusions as humans would
by constantly analyzing data with a given logical structure. To achieve this, deep
learning uses a multi-layered structure of algorithms called neural networks.
The design of the neural network is loosely inspired by the structure of the human brain.
Just as we use our brains to identify patterns and classify different types of
information, we can teach neural networks to perform the same tasks on data.

The individual layers of neural networks can also be thought of as a sort of filter
that works from gross to subtle, which increases the likelihood of detecting and
outputting a correct result. The human brain works similarly. Whenever we
receive new information, the brain tries to compare it with known objects. The
same concept is also used by deep neural networks.

Neural networks enable us to perform many tasks, such as clustering,
classification or regression.

With neural networks, we can group or sort unlabeled data according to
similarities among samples in the data. Or, in the case of classification, we can
train the network on a labeled data set in order to classify the samples in the data
set into different categories.
In general, neural networks can perform the same tasks as classical machine
learning algorithms (but classical algorithms cannot always perform the same tasks as
neural networks). In other words, artificial neural networks have unique
capabilities that enable deep learning models to solve tasks that classical machine
learning models struggle with.

Many of the major advances in artificial intelligence in recent years are due to deep
learning. Deep learning underpins self-driving cars, chatbots and personal assistants
like Alexa and Siri. Google Translate improved dramatically when Google switched to
neural networks, and streaming services such as Netflix use machine learning, including
neural networks, to decide which movies to suggest. Neural networks are behind many of
these deep learning applications and technologies.

A new industrial revolution is taking place, driven by artificial neural networks
and deep learning. At the end of the day, deep learning is arguably the best and most
obvious approach to real machine intelligence we’ve ever had.

Deep learning programs have multiple layers of interconnected nodes, with each
layer building upon the last to refine and optimize predictions and classifications.
Deep learning applies nonlinear transformations to its input and uses what it learns
to create a statistical model as output. Iterations continue until the output has reached
an acceptable level of accuracy. The number of processing layers through which data
must pass is what inspired the label deep.

In traditional machine learning, the programmer must be extremely specific when telling
the computer what types of things it should be looking for to decide if an image contains
a dog or does not contain a dog. This is a laborious process called feature extraction, and
the computer's success rate depends entirely upon the programmer's ability to accurately
define a feature set for dog. The advantage of deep learning is that the program builds the
feature set by itself, without a programmer having to specify it.

Initially, the computer program might be provided with training data -- a set of
images for which a human has labeled each image dog or not dog with metatags. The
program uses the information it receives from the training data to create a feature set
for dog and build a predictive model. In this case, the model the computer first
creates might predict that anything in an image that has four legs and a tail should be
labeled dog. Of course, the program is not aware of the labels four legs or tail. It
simply looks for patterns of pixels in the digital data. With each iteration, the
predictive model becomes more complex and more accurate.
Unlike the toddler, who takes weeks or even months to understand the concept of
dog, a computer program that uses deep learning algorithms can be shown a training
set and sort through millions of images, accurately identifying which images have
dogs in them, within a few minutes.
To achieve an acceptable level of accuracy, deep learning programs require access to
immense amounts of training data and processing power, neither of which was easily
available to programmers until the era of big data and cloud computing. Because
deep learning programming can create complex statistical models directly from its
own iterative output, it is able to create accurate predictive models from large
quantities of unlabeled, unstructured data.

Why Is Deep Learning Popular?


Deep learning models are often more powerful than classical machine learning models, but
why?

NO FEATURE EXTRACTION

The first advantage of deep learning over classical machine learning is that the
so-called feature extraction step becomes unnecessary.

Long before we began using deep learning, we relied on traditional machine
learning methods including decision trees, SVM, naïve Bayes classifier and
logistic regression. These algorithms are also called flat algorithms. “Flat” here
refers to the fact that these algorithms cannot normally be applied directly to the raw
data (such as .csv, images, text, etc.). We need a preprocessing step called feature
extraction.

The result of feature extraction is a representation of the given raw data that
these classic machine learning algorithms can use to perform a task. For
example, we can now classify the data into several categories or classes. Feature
extraction is usually quite complex and requires detailed knowledge of the
problem domain. This preprocessing layer must be adapted, tested and refined
over several iterations for optimal results.

Deep learning’s artificial neural networks don’t need the feature extraction step.
The layers are able to learn an implicit representation of the raw data directly
and on their own.

Here’s how it works: An increasingly abstract and compressed representation
of the raw data is produced over several layers of an artificial neural net. We then
use this compressed representation of the input data to produce the result. The
result can be, for example, the classification of the input data into different
classes.

In other words, we can say that the feature extraction step is already part of the
process that takes place in an artificial neural network.

During the training process, this neural network optimizes this step to obtain the
best possible abstract representation of the input data. This means that deep
learning models require little to no manual effort to perform and optimize the
feature extraction process.

Let’s look at a concrete example. If you want to use a machine learning model to
determine if a particular image is showing a car or not, we humans first need to
identify the unique features of a car (shape, size, windows, wheels, etc.), then
extract these features and give them to the algorithm as input data. In this way, the
algorithm would perform a classification of the images. That is, in machine
learning, a programmer must intervene directly in the action for the model to
come to a conclusion.

In the case of a deep learning model, the feature extraction step is completely
unnecessary. The model would recognize these unique characteristics of a car
and make correct predictions without human intervention.

In fact, refraining from extracting the characteristics of data applies to every
other task you’ll ever do with neural networks. Simply give the raw data to the
neural network and the model will do the rest.

Deep learning methods


Various methods can be used to create strong deep learning models. These techniques
include learning rate decay, transfer learning, training from scratch and dropout.
• Learning rate decay
The learning rate is a hyperparameter -- a factor that defines the system or sets
conditions for its operation prior to the learning process -- that controls how much
change the model experiences in response to the estimated error every time the model
weights are altered. Learning rates that are too high may result in unstable training
processes or the learning of a suboptimal set of weights. Learning rates that are too small
may produce a lengthy training process that has the potential to get stuck.
The learning rate decay method -- also called learning rate annealing or adaptive
learning rate -- is the process of adapting the learning rate to increase performance and
reduce training time. The easiest and most common adaptations of learning rate during
training include techniques to reduce the learning rate over time.
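Two common ways to reduce the learning rate over time are exponential decay and step decay. A minimal sketch (the function names and the toy objective f(w) = w² are illustrative assumptions) that also runs gradient descent with a decaying rate:

```python
def exponential_decay(lr0: float, decay_rate: float, step: int, decay_steps: int) -> float:
    """Smooth decay: lr0 * decay_rate ** (step / decay_steps)."""
    return lr0 * decay_rate ** (step / decay_steps)


def step_decay(lr0: float, drop: float, step: int, steps_per_drop: int) -> float:
    """Piecewise-constant decay: multiply by `drop` every `steps_per_drop` steps."""
    return lr0 * drop ** (step // steps_per_drop)


def minimise_quadratic(w0: float, lr0: float, steps: int) -> float:
    """Gradient descent on f(w) = w**2 (gradient 2w) with an exponentially decaying rate."""
    w = w0
    for step in range(steps):
        lr = exponential_decay(lr0, decay_rate=0.5, step=step, decay_steps=100)
        w -= lr * 2 * w  # large steps early on, smaller steps later
    return w


print(step_decay(0.1, 0.5, step=250, steps_per_drop=100))  # 0.1 * 0.5**2
print(minimise_quadratic(5.0, lr0=0.1, steps=100))
```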
• Transfer learning
This process involves fine-tuning a previously trained model; it requires an interface to
the internals of a preexisting network. First, users feed the existing network new data
containing previously unknown classifications. Once adjustments are made to the
network, new tasks can be performed with more specific categorizing abilities. This
method has the advantage of requiring much less data than others, thus reducing
computation time to minutes or hours.
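One common form of transfer learning keeps the earlier layers of a pretrained network frozen and trains only a new output layer on the new, smaller data set. A pure-Python sketch of that idea; the "pretrained" weights `PRETRAINED_W` and the tiny data set are hypothetical stand-ins, and real projects would do this with a deep learning framework:

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical "pretrained" layer: in practice these weights would come from a
# network trained on a large earlier task. They stay frozen below.
PRETRAINED_W = [[1.0, 0.0],
                [0.0, 1.0],
                [1.0, 1.0]]


def extract_features(x: Sequence[float]) -> List[float]:
    """Frozen layer: ReLU(W x). Never updated during fine-tuning."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in PRETRAINED_W]


def train_new_head(data: Sequence[Tuple[Sequence[float], int]], epochs: int = 200,
                   lr: float = 0.5) -> Tuple[List[float], float]:
    """Train only a new logistic-regression output layer on the frozen features."""
    w, b = [0.0] * len(PRETRAINED_W), 0.0
    for _ in range(epochs):
        for x, y in data:
            f = extract_features(x)
            p = 1.0 / (1.0 + math.exp(-(sum(wi * fi for wi, fi in zip(w, f)) + b)))
            g = p - y  # gradient of the log loss with respect to the logit
            w = [wi - lr * g * fi for wi, fi in zip(w, f)]
            b -= lr * g
    return w, b


def classify(head: Tuple[List[float], float], x: Sequence[float]) -> int:
    w, b = head
    f = extract_features(x)
    return int(sum(wi * fi for wi, fi in zip(w, f)) + b > 0)


# A small, hypothetical labelled data set for the new task.
new_task = [([0.0, 0.0], 0), ([0.1, 0.2], 0), ([1.0, 1.0], 1), ([0.9, 1.2], 1)]
head = train_new_head(new_task)
print([classify(head, x) for x, _ in new_task])
```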
• Training from scratch
This method requires a developer to collect a large, labeled data set and configure a
network architecture that can learn the features and model. This technique is especially
useful for new applications, as well as applications with many output categories.
However, overall, it is a less common approach, as it requires inordinate amounts of
data, causing training to take days or weeks.
• Dropout
This method attempts to solve the problem of overfitting in networks with large amounts
of parameters by randomly dropping units and their connections from the neural network
during training. It has been shown that the dropout method can improve the performance
of neural networks on supervised learning tasks in areas such as speech recognition,
document classification and computational biology.
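Dropout applied to one layer's activations fits in a few lines. This sketch uses the common "inverted dropout" variant (an assumption; the original formulation instead rescales the weights at test time): during training each unit is zeroed with probability `p` and the survivors are scaled by 1/(1 - p), so the expected activation stays the same and nothing changes at inference time.

```python
import random
from typing import List, Sequence


def dropout(activations: Sequence[float], p: float, training: bool,
            rng: random.Random) -> List[float]:
    """Inverted dropout: zero each unit with probability p, scale survivors by 1/(1-p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training:
        return list(activations)  # at inference time every unit is kept
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)
out = dropout([1.0] * 10_000, p=0.5, training=True, rng=rng)
print(sum(out) / len(out))  # close to 1.0 on average
```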

Deep learning neural networks


A type of advanced machine learning algorithm, known as an artificial neural network
(ANN), underpins most deep learning models. As a result, deep learning may sometimes
be referred to as deep neural learning or deep neural network (DNN).
DNNs consist of input, hidden and output layers. The input nodes form a layer that receives
the input data. The number of output nodes required depends on the kind of output. For
example, yes or no outputs only need two nodes, while outputs with more categories require
more nodes. The hidden layers are multiple layers that process and pass data to other layers
in the neural network.
Neural networks come in several different forms, including the following:
• Recurrent neural networks.
• Convolutional neural networks.
• ANNs and feedforward neural networks.
Each type of neural network has benefits for specific use cases. However, they all
function in somewhat similar ways -- by feeding data in and letting the model figure out
for itself whether it has made the right interpretation or decision about a given data
element.
Neural networks involve a trial-and-error process, so they need massive amounts of data
on which to train. It's no coincidence neural networks became popular only after most
enterprises embraced big data analytics and accumulated large stores of data. Because
the model's first few iterations involve somewhat educated guesses on the contents of an
image or parts of speech, the data used during the training stage must be labeled so the
model can see if its guess was accurate. This means unstructured data that has not been
labeled is less helpful. Such data can be analyzed by a deep learning model once it has been
trained and reaches an acceptable level of accuracy, but supervised deep learning models
can't train on it until it has been labeled.
DEEP LEARNING ACCURACY CAN INCREASE BY USING BIG DATA

The second huge advantage of deep learning, and a key part of understanding
why it’s becoming so popular, is that it’s powered by massive amounts of data.
The era of big data will provide huge opportunities for new innovations in deep
learning. But don’t take my word for it: Andrew Ng, former chief scientist of China’s
major search engine Baidu, co-founder of Coursera and one of the leaders of the
Google Brain Project, puts it this way:

AI is akin to building a rocket ship. You need a huge engine and a lot of fuel. If
you have a large engine and a tiny amount of fuel, you won’t make it to orbit. If
you have a tiny engine and a ton of fuel, you can’t even lift off. To build a rocket
you need a huge engine and a lot of fuel.

The analogy to deep learning is that the rocket engine is the deep learning
models and the fuel is the huge amounts of data we can feed to these algorithms.

Deep learning models tend to increase their accuracy with the increasing amount of
training data, whereas traditional machine learning models such as SVM and
naive Bayes classifier stop improving after a saturation point.
How Do Deep Learning Neural Networks Work?
BIOLOGICAL NEURAL NETWORKS
Artificial neural networks are inspired by the biological neurons found in our
brains. In fact, artificial neural networks simulate some basic functionalities
of biological neural networks, but in a very simplified way. Let’s first look at
biological neural networks to derive parallels to artificial neural networks. In
short, a biological neural network consists of numerous neurons.

A typical neuron consists of a cell body, dendrites and an axon. Dendrites are
thin structures that emerge from the cell body. An axon is a cellular extension
that emerges from this cell body. Most neurons receive signals through the
dendrites and send out signals along the axon.

At the majority of synapses, signals cross from the axon of one neuron to the
dendrite of another. All neurons are electrically excitable due to the maintenance
of voltage gradients in their membranes. If the voltage changes by a large enough
amount over a short interval, the neuron generates an electrochemical pulse
called an action potential. This potential travels rapidly along the axon and
activates synaptic connections.
ARTIFICIAL NEURAL NETWORKS

Now that we have a basic understanding of how biological neural networks
function, let’s take a look at the architecture of the artificial neural network.

A neural network generally consists of a collection of connected units or nodes.
We call these nodes neurons. These artificial neurons loosely model the
biological neurons of our brain.

A neuron is simply a graphical representation of a numeric value (e.g. 1.2, 5.0,
42.0, 0.25, etc.). Any connection between two artificial neurons can be considered
an axon in a biological brain. The connections between the neurons are realized
by so-called weights, which are also nothing more than numerical values.

When an artificial neural network learns, the weights between neurons change,
and so does the strength of the connections. What does that mean? Given
training data and a particular task such as classification of numbers, we are
looking for a certain set of weights that allows the neural network to perform the
classification.

The set of weights is different for every task and every data set. We cannot
predict the values of these weights in advance, but the neural network has to
learn them. The process of learning is what we call training.
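The smallest possible illustration of "learning the weights" is a single artificial neuron trained with the classic perceptron rule. This is a sketch, not how modern deep networks are trained (they typically use gradient descent with backpropagation); the task here, the logical AND function, is an illustrative choice.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], int]


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """A single artificial neuron with a step activation."""
    return int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0)


def train_perceptron(data: Sequence[Example], epochs: int = 20,
                     lr: float = 1.0) -> Tuple[List[float], float]:
    """Perceptron rule: nudge each weight in proportion to the error on each example."""
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            error = y - predict(w, b, x)  # 0 when correct, +1 or -1 when wrong
            w = [wi + lr * error * xi for wi, xi in zip(w, x)]
            b += lr * error
    return w, b


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(AND)
print(w, b, [predict(w, b, x) for x, _ in AND])
```

Starting from all-zero weights, the rule only changes `w` and `b` when the neuron makes a mistake, so the loop stops changing anything once every example is classified correctly.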
Deep Learning Neural Network Architecture
The typical neural network architecture consists of several layers; we call the
first one the input layer.

The input layer receives input x, (i.e. data from which the neural network learns).
In our previous example of classifying handwritten numbers, these inputs x
would represent the images of these numbers (x is basically an entire vector
where each entry is a pixel).

The input layer has the same number of neurons as there are entries in the
vector x. In other words, each input neuron represents one element in the vector.

The last layer is called the output layer, which outputs a vector y representing the
neural network’s result. The entries in this vector represent the values of the
neurons in the output layer. In our classification, each neuron in the last layer
represents a different class.

In this case, the value of an output neuron gives the probability that the
handwritten digit given by the features x belongs to one of the possible classes
(one of the digits 0-9). As you can imagine, the number of output neurons must equal
the number of classes.
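The notes do not say how the raw output values become probabilities. A common choice (an assumption here) is the softmax function, which exponentiates each output and normalizes so the ten values sum to 1. The scores below are made-up outputs for the digits 0-9.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical output-layer values, one per digit class 0-9.
scores = [0.1, 0.3, 2.5, 0.0, -1.0, 0.2, 0.4, 0.0, 1.1, -0.5]
probs = softmax(scores)
predicted_digit = max(range(10), key=lambda d: probs[d])
print(predicted_digit, round(probs[predicted_digit], 3))
```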

In order to obtain a prediction vector y, the network must perform certain
mathematical operations, which it performs in the layers between the input and
output layers. We call these the hidden layers. Now let's discuss what the
connections between the layers look like.

Layer Connections in a Deep Learning Neural Network

Please consider a smaller neural network that consists of only two layers. The input layer
has two input neurons, while the output layer consists of three neurons.

As mentioned earlier, each connection between two neurons is represented by a numerical
value, which we call weight.

As you can see in the picture, each connection between two neurons is represented by a
different weight w. Each weight w has two indices. The first index stands for the number
of the neuron in the layer from which the connection originates, the second for the number
of the neuron in the layer to which the connection leads.

All weights between two neural network layers can be represented by a matrix called the
weight matrix.
A weight matrix has the same number of entries as there are connections between neurons.
The dimensions of a weight matrix result from the sizes of the two layers that are connected
by this weight matrix.

The number of rows corresponds to the number of neurons in the layer from which the
connections originate and the number of columns corresponds to the number of neurons in
the layer to which the connections lead.

In this particular example, the number of rows of the weight matrix corresponds to the size
of the input layer, which is two, and the number of columns to the size of the output layer,
which is three.
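With this convention (rows = neurons the connections leave, columns = neurons they reach), each output neuron's value is the weighted sum y_j = Σ_i x_i · w_ij. A sketch for the two-input, three-output example; the weight values are made up, and the bias term and nonlinear activation that real layers usually add are omitted because the notes' example does not include them.

```python
from typing import List, Sequence


def layer_forward(x: Sequence[float], W: Sequence[Sequence[float]]) -> List[float]:
    """y_j = sum_i x[i] * W[i][j]; rows of W = source neurons, columns = target neurons."""
    n_in, n_out = len(W), len(W[0])
    if len(x) != n_in:
        raise ValueError("input size must equal the number of rows of W")
    return [sum(x[i] * W[i][j] for i in range(n_in)) for j in range(n_out)]


W = [[0.1, 0.2, 0.3],   # w11, w12, w13: connections leaving input neuron 1
     [0.4, 0.5, 0.6]]   # w21, w22, w23: connections leaving input neuron 2
y = layer_forward([1.0, 2.0], W)
print(y)  # approximately [0.9, 1.2, 1.5]
```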

Deep learning applications

Real-world deep learning applications are a part of our daily lives, but in most cases, they
are so well-integrated into products and services that users are unaware of the complex data
processing that is taking place in the background. Some of these examples include the
following:

Law enforcement: Deep learning algorithms can analyze and learn from transactional data
to identify dangerous patterns that indicate possible fraudulent or criminal activity. Speech
recognition, computer vision, and other deep learning applications can improve the
efficiency and effectiveness of investigative analysis by extracting patterns and evidence
from sound and video recordings, images, and documents, which helps law enforcement
analyze large amounts of data more quickly and accurately.
Financial services: Financial institutions regularly use predictive analytics to drive
algorithmic trading of stocks, assess business risks for loan approvals, detect fraud, and help
manage credit and investment portfolios for clients.

Customer service: Many organizations incorporate deep learning technology into their
customer service processes. Chatbots—used in a variety of applications, services, and
customer service portals—are a straightforward form of AI. Traditional chatbots use natural
language and even visual recognition, commonly found in call center-like menus. However,
more sophisticated chatbot solutions attempt to determine, through learning, if there are
multiple responses to ambiguous questions. Based on the responses it receives, the chatbot
then tries to answer these questions directly or route the conversation to a human user.
Virtual assistants like Apple's Siri, Amazon Alexa, or Google Assistant extend the idea of a
chatbot by enabling speech recognition functionality. This creates a new method to engage
users in a personalized way.

Healthcare: The healthcare industry has benefited greatly from deep learning capabilities
ever since the digitization of hospital records and images. Image recognition applications
can support medical imaging specialists and radiologists, helping them analyze and assess
more images in less time.
