Transformer Neural Network In Deep Learning - Overview
Last Updated: 02 Oct, 2022
In this article, we are going to learn about Transformers. We'll start with an overview of Deep Learning and where it is used. Moving ahead, we shall see how sequential data can be processed using Deep Learning and how the models have improved over the years.
Deep Learning
So what exactly is Deep Learning? Before we answer that, let's quickly walk through how the fields relate to each other, starting with AI. Artificial intelligence (AI) is the broadest of the three. It is an area of computer science that emphasizes creating intelligence within machines so that they work and react like human beings. In short, we are trying to give machines the capability to imitate intelligent human behaviour. Then we have Machine Learning (ML), the science of getting computers to act by learning from previous data. Deep Learning is a subset of Machine Learning that makes use of neural networks. Neural networks are a set of algorithms and techniques loosely modelled on the human brain, and they are designed to solve complex and advanced machine learning problems.
So what exactly is Deep Learning? Deep Learning is part of a broad family of ML methods that are based on learning patterns from data, as opposed to classical Machine Learning, where we have algorithms designed for a specific task. Deep Learning algorithms can be supervised, semi-supervised, or unsupervised. As mentioned earlier, Deep Learning is inspired by the human brain and how it perceives information through the interaction of neurons.
Before we see what we can do with Deep Learning, why should we choose it for various tasks? The big advantage of Deep Learning is that it can extract more features, and when we have more features and can work with a huge amount of data at the same time, the model can perceive an object much like a human being does. For example, if you want to classify between a pen and a pencil, as a human being you already know the difference, because you have looked at pens and pencils countless times. You can classify them with ease because you know the features of a pen and the features of a pencil. Deep Learning works in a similar way: the more data you feed it, the more dimensions (features) it can analyze and learn from. One of the most popular applications of Deep Learning is image classification. It can be something as simple as classifying between two different animals, or something as complicated as hiding data or running automated cars using classification tasks.
The next type of application of Deep Learning works with sequential data. Sequential data refers to something like time-series data or natural language. We call it sequential because each word or feature depends on the words or features that came before it. Take the question 'What time is it?'. If I only say 'it is', the words carry very little meaning on their own, because they depend on the 'what time' that came earlier in the sentence. To understand any part of the sequence, you have to know what happened in the past. To handle this, we use Recurrent Neural Networks (RNNs), and there are various versions of RNNs.
The next application is GANs. GANs, which stands for generative adversarial networks, are an unsupervised Deep Learning technique. A common application that you can see in recent days is deepfakes, among many others. Finally, we come to performing classification and regression tasks using a multi-layer perceptron (MLP). If you are well versed with Machine Learning, you'll remember that for classification and regression we had algorithms like decision trees, random forests, or something as simple as linear regression or logistic regression. When we perform classification using an MLP, we can often get high accuracy, in many cases comparable to or better than SVMs and decision trees, particularly when plenty of training data is available.
Natural Language Processing (NLP) Using RNN
Now that we know what Deep Learning is and why we use it, let's narrow down to how we can process natural language data using RNNs. So what are RNNs? RNN stands for Recurrent Neural Network, and we usually use it to deal with sequential data. Sequential data can be something like time-series data or textual data of any format. So why should one use an RNN? Because it has a concept of internal memory. An RNN can remember important things about the input it has received, which allows it to be quite precise in predicting what comes next. This is the reason RNNs are preferred for sequential data. Examples of sequence data include time series, speech, text, financial data, audio, video, weather, and many more. Although RNNs were the state-of-the-art approach for dealing with sequential data, they come with their own drawbacks. First, because of the complexity of the algorithm, the network is pretty slow to train, and with a huge number of dimensions, training is very long and difficult. Second, and most decisive for the later improvements to RNNs, is the vanishing gradient problem. What is the vanishing gradient? As the sequence gets longer and the error is propagated back through more and more time steps, the gradient shrinks towards zero, so information from earlier inputs is effectively lost. Because of this, a plain RNN cannot work well on long sequences of data.
To overcome this, several upgrades to the basic recurrent neural network were developed, starting with the bidirectional recurrent neural network. A bidirectional recurrent neural network connects two hidden layers running in opposite directions to the same output. With this form of generative Deep Learning, the output layer can get information from past and future states simultaneously. So why do we need a bidirectional recurrent neural network? It duplicates the RNN processing chain so that the input is processed in both forward and reverse time order, which allows the network to look into future context as well. The next one is long short-term memory, also referred to as LSTM. It is an artificial recurrent neural network architecture used in the field of Deep Learning. Unlike a standard feedforward neural network, an LSTM has feedback connections. It can process not only single data points but also entire sequences of data. With LSTM, we can feed in longer sequences than we could with bidirectional RNNs or plain RNNs.
So why is LSTM better than RNN? When we move from RNN to LSTM, we introduce more control, through gates, over what the network keeps and forgets from the sequence we provide. Thus, LSTM gives us more controllability and, as a result, better results.
The next type of recurrent neural network is the Gated Recurrent Unit, also referred to as GRU. It is a type of recurrent neural network that in certain cases has advantages over long short-term memory. A GRU uses less memory and is also faster than an LSTM. However, LSTMs tend to be more accurate on longer sequences. So the trend here is that models should be capable of remembering and handling longer and longer input sequences.
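The internal memory and the vanishing gradient described above can be illustrated with a deliberately tiny example. This is a minimal sketch, not a real training setup: it assumes a single-unit RNN with a tanh activation, and the names `rnn_forward`, `gradient_wrt_first_state`, and the weight values are made up for illustration.

```python
import math

def rnn_forward(inputs, w_x, w_h, b):
    """Run a one-unit tanh RNN over a sequence and return every hidden state."""
    h = 0.0  # the hidden state is the network's "internal memory"
    states = []
    for x in inputs:
        # The new state depends on the current input AND the previous state.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states

def gradient_wrt_first_state(states, w_h):
    """Derivative of the last state in `states` with respect to the first one.

    By the chain rule this is a product of one factor w_h * (1 - h_t^2)
    for every step after the first.
    """
    grad = 1.0
    for h in states[1:]:
        grad *= w_h * (1.0 - h * h)
    return grad

# Hypothetical sequence and weights, chosen only for the demonstration.
inputs = [0.5] * 50
states = rnn_forward(inputs, w_x=1.0, w_h=0.5, b=0.0)

short_grad = gradient_wrt_first_state(states[:5], w_h=0.5)  # 5-step sequence
long_grad = gradient_wrt_first_state(states, w_h=0.5)       # 50-step sequence
print("gradient over 5 steps: ", short_grad)
print("gradient over 50 steps:", long_grad)
```

Every factor in the product is smaller than 1 here, so the gradient over 50 steps is many orders of magnitude smaller than the gradient over 5 steps. That is why the earliest inputs contribute almost nothing to learning on long sequences.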
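To make the idea of "more control" concrete, here is a minimal sketch of a single LSTM step in the commonly used formulation with forget, input, and output gates. It assumes a scalar input and a scalar state to keep the code short; the function name `lstm_step` and all parameter values are hypothetical and chosen only to show the gates at work.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM cell; `p` holds hypothetical weights and biases."""
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate value
    c = f * c_prev + i * g   # keep part of the old memory, add part of the new
    h = o * math.tanh(c)     # expose part of the memory as the hidden state
    return h, c

# Illustrative parameters: forget gate almost fully open (keep the memory),
# input gate almost fully closed (ignore new candidates).
params = {
    "wf": 0.0, "uf": 0.0, "bf": 10.0,
    "wi": 0.0, "ui": 0.0, "bi": -10.0,
    "wo": 1.0, "uo": 0.0, "bo": 0.0,
    "wg": 1.0, "ug": 1.0, "bg": 0.0,
}

h, c = 0.0, 1.0  # start with a value of 1.0 stored in the cell state
for _ in range(20):
    h, c = lstm_step(0.3, h, c, params)
print("cell state after 20 steps:", c)
```

With these gate settings the cell state stays very close to its starting value across all 20 steps. A plain RNN has no such mechanism: its state is overwritten at every step.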
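One way to see why a GRU is lighter than an LSTM is to count trainable parameters. The sketch below assumes the standard textbook formulations, in which an LSTM has four weight blocks (three gates plus the candidate) and a GRU has three (two gates plus the candidate), each with one bias vector. Real libraries may count biases slightly differently, and the helper name `recurrent_param_count` and the layer sizes are made up for illustration.

```python
def recurrent_param_count(input_size, hidden_size, num_blocks):
    """Parameters of one recurrent layer with `num_blocks` gate/candidate blocks.

    Each block has an input weight matrix, a recurrent weight matrix,
    and one bias vector (an assumption of this sketch).
    """
    per_block = hidden_size * (input_size + hidden_size) + hidden_size
    return num_blocks * per_block

# Hypothetical layer sizes.
input_size, hidden_size = 128, 256

lstm_params = recurrent_param_count(input_size, hidden_size, num_blocks=4)
gru_params = recurrent_param_count(input_size, hidden_size, num_blocks=3)

print("LSTM parameters:", lstm_params)
print("GRU parameters: ", gru_params)
print("GRU / LSTM ratio:", gru_params / lstm_params)
```

Under these assumptions a GRU layer has three quarters of the parameters of an LSTM layer of the same size, which is in line with the lower memory use and faster training mentioned above.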
Transformers
The game changer for sequential data came with something called Transformers, introduced in a paper titled 'Attention Is All You Need'. So let's take a look at it. The paper introduces an architecture called the Transformer. Like an LSTM, the Transformer is an architecture for transforming one sequence into another with the help of two parts, an encoder and a decoder. It differs from the previously described sequence-to-sequence models because it does not use any recurrent network such as a GRU or LSTM. Recurrent neural networks were, until then, one of the best ways to capture the temporal dependencies in a sequence. However, the team presenting 'Attention Is All You Need' showed that an architecture built only on attention mechanisms, without any RNN, can improve results on translation and other NLP tasks. An example of a model built on this architecture is Google's BERT.
So what exactly is this Transformer? Both the encoder and the decoder are composed of modules that can be stacked on top of each other multiple times. The inputs and outputs are first embedded into an n-dimensional space, since we cannot use raw text directly, so we have to encode whatever inputs we provide. One slight but important part of this model is the positional encoding of the different words. Since we have no recurrent neural network that can remember how the sequence was fed into the model, we need to give every word or part of a sequence a position, because a sequence depends on the order of its elements. These positions are added to the embedded representation of each word. So this was a brief introduction to Transformers.
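The attention mechanism at the heart of the paper is scaled dot-product attention: each query is compared with every key, the scores are divided by the square root of the key dimension and passed through a softmax, and the result is used to take a weighted average of the values. The following is a minimal sketch in plain Python. It leaves out the learned projection matrices and the multiple heads of the full model, and the toy vectors are made up for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of numbers."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """Return (outputs, weights) for lists of query, key, and value vectors."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Toy "sentence" of three 2-dimensional token vectors (hypothetical values).
tokens = [[2.0, 0.0], [0.0, 2.0], [-2.0, 0.0]]

# Self-attention: queries, keys, and values all come from the same tokens.
outputs, weights = scaled_dot_product_attention(tokens, tokens, tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` sums to 1 and shows how much one token attends to every token in the sequence, including itself. All positions are handled independently of one another, so unlike an RNN there is no step-by-step recurrence over time.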
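The positional information added to the embeddings can be sketched with the sinusoidal encoding used in 'Attention Is All You Need', where even dimensions use a sine and odd dimensions use a cosine of the position at different frequencies. This is a minimal illustration in plain Python; the tiny embedding values are placeholders, not real learned embeddings.

```python
import math

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding as a seq_len x d_model list of lists."""
    pe = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            # Dimensions 2k and 2k+1 share the same frequency.
            angle = pos / (10000 ** (2 * (i // 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe

# Placeholder embeddings for a 3-word sequence in a 4-dimensional space.
embeddings = [[0.1, 0.1, 0.1, 0.1] for _ in range(3)]

pe = positional_encoding(seq_len=3, d_model=4)

# The position vectors are simply added to the word embeddings.
encoded = [
    [e + p for e, p in zip(emb_row, pe_row)]
    for emb_row, pe_row in zip(embeddings, pe)
]
for row in encoded:
    print([round(v, 3) for v in row])
```

Although all three placeholder embeddings are identical, the encoded rows differ, so the model can tell the first word from the second and the third.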
Language Models
Let's move ahead and look at some popular language models that are available. We'll start with OpenAI's GPT-3. GPT-3 is the successor to GPT and GPT-2 and is one of the most controversial pre-trained models. This large-scale Transformer-based language model from OpenAI has 175 billion parameters, which is 10 times more than any previous non-sparse language model. The model has been trained to achieve strong performance on many NLP datasets, including translation, question answering, and several other tasks.
Then we have Google's BERT. It stands for Bidirectional Encoder Representations from Transformers. It is a pre-trained NLP model developed by Google in 2018. With it, anyone in the world can train their own question-answering model in about 30 minutes on a single Cloud TPU or in a few hours on a single GPU. The company showcased its performance on 11 NLP tasks, including the very competitive Stanford Question Answering Dataset (SQuAD). BERT has been pre-trained on 2,500 million words of Wikipedia and 800 million words of the BooksCorpus, and it has been used successfully as a pre-trained model in deep neural networks. According to the researchers, BERT achieved a score of about 93% on SQuAD, which surpassed previous language models.
Next, we have ELMo. ELMo, which stands for Embeddings from Language Models, is a deep contextualized word representation that models the syntax and semantics of words as well as their linguistic context. The model, developed by AllenNLP, has been pre-trained on a huge text corpus and learns its representations from a deep bidirectional language model (biLM). ELMo can easily be added to existing models, which can drastically improve results across a wide range of NLP problems, including question answering, textual entailment, and sentiment analysis.