
DLT - UNIT-II

Introducing Deep Learning: Biological and Machine Vision, Human and Machine
Language, Artificial Neural Networks, Training Deep Networks, Improving Deep Networks.

Biological and Machine Vision


Introduction
Computer vision and human vision are two distinct yet interconnected domains that shape our
understanding of visual perception. While computer vision relies on algorithms and artificial intelligence
to process and interpret visual data, human vision is a complex biological process that involves the eyes,
optic nerves, and the brain. Understanding the differences and similarities between these two realms is
crucial for advancing technology and enhancing our perception of the world.

What is Computer Vision?


Computer vision refers to the ability of machines to understand and interpret visual data. It is a field of
artificial intelligence that utilizes algorithms and computational models to analyze and make sense of
images and videos. By mimicking human visual processing, computer vision algorithms can detect patterns,
recognize objects, and extract meaningful information from visual input. This technology has a wide range
of applications, including robotics, autonomous vehicles, surveillance systems, medical imaging, and
augmented reality.
The Fundamentals of Human Vision
Human vision is a remarkable biological process that allows us to perceive and interpret the visual world.
It begins with the eyes, which capture light and send signals to the brain for processing. The cornea, pupil,
and lens work together to focus light onto the retina, where specialized cells called cones and rods detect
the light and convert it into electrical signals. These signals are then transmitted through the optic nerves
to the brain, which processes the information and forms our visual perception.


Key Differences Between Computer Vision and Human Vision


While computer vision and human vision share the goal of understanding visual information, there are
several key differences between the two:
1. Processing Mechanisms: Computer vision relies on algorithms and computational models to process
visual data, whereas human vision involves complex neural networks and biological processes.
2. Adaptability and Efficiency: Human vision is highly adaptable and efficient in recognizing patterns,
even in complex scenes or varied lighting conditions. Computer vision algorithms, although powerful, can
struggle in such situations.
3. Handling Complex Scenes and Varied Conditions: Human vision integrates multiple sources of visual information, such as color, contrast, and fine detail (acuity), to form a cohesive perception. Computer vision algorithms often focus on specific visual features and may struggle with complex scenes or changing conditions.

Advantages and Limitations of Computer Vision


Computer vision offers several advantages over human vision in certain tasks:
1. Processing Speed and Accuracy: Computer vision algorithms can process vast amounts of visual data
quickly and accurately, outperforming human capabilities in tasks such as object recognition and image
classification.
2. Object Recognition: Computer vision excels at identifying and categorizing objects within images and
videos. This capability has numerous practical applications, including surveillance systems, autonomous
vehicles, and medical imaging.
3. Applications in Various Fields: Computer vision finds applications in robotics, where machines can
perceive and interact with their environment, as well as in medical imaging, where it aids in diagnosis and
treatment. Augmented reality is another field where computer vision enhances our interaction with the
digital world.
However, computer vision also has its limitations:
1. Contextual Understanding: While computer vision algorithms can recognize objects and patterns, they
often struggle with understanding the context and meaning behind visual scenes, which comes naturally to
human vision.
2. Handling Ambiguity: Human vision can make sense of ambiguous or incomplete visual information,
leveraging past experiences and cognitive processes. Computer vision algorithms may struggle in such
situations.

Ethical Implications of Computer Vision Technology


The widespread adoption of computer vision technology raises important ethical considerations. Privacy
concerns, surveillance applications, and potential biases in algorithms are among the key issues that need
to be addressed. As computer vision continues to advance, it is crucial to ensure transparency, fairness, and
accountability in its implementation.

The Future of Computer Vision


As technology evolves, computer vision is expected to play an increasingly important role in our lives.
Advancements in machine learning, deep learning, and neural networks will enhance the capabilities of
computer vision algorithms. We can anticipate further integration of computer vision in robotics, autonomous systems, healthcare, and other fields. The challenges of bridging the gap between computer vision and human vision will continue to inspire research and innovation.

Human and Machine Language


Human language and machine language are two distinct forms of communication, each with its own
characteristics and purposes. Here's an overview of each:
Human Language:
1. Natural and Evolved: Human language is a natural form of communication that has evolved over
thousands of years. It is deeply ingrained in human culture and society.
2. Complex and Expressive: Human languages are highly complex, allowing for nuanced and expressive
communication. They encompass a wide range of sounds, words, grammar rules, and cultural nuances.
3. Contextual: Human language often relies on context, tone of voice, facial expressions, and body language
to convey meaning accurately. Context plays a crucial role in interpreting messages correctly.
4. Subjective and Emotional: Human language can convey emotions, opinions, and subjective experiences.
It is not purely objective and can be influenced by personal feelings and perspectives.
5. Learning and Creativity: Humans learn their native language(s) through exposure and practice. They can
also be creative with language, inventing new words and expressions as needed.
6. Ambiguity: Human languages can be ambiguous, with words or phrases having multiple meanings
depending on context. This ambiguity can sometimes lead to misunderstandings.
Machine Language:
1. Artificial and Designed: Machine language, here meaning programming languages, is artificial and designed by humans for specific tasks. Examples include Python, Java, and C++.
2. Formal and Precise: Machine languages are highly formal and precise. They rely on strict syntax and
semantics to provide clear and unambiguous instructions to computers.
3. Lack of Context: Machines do not inherently understand context, emotions, or the subtleties of human
communication. They follow instructions strictly based on the programming provided.
4. Objective and Logical: Machine languages are purely objective and logical, relying on algorithms and
rules to process data and perform tasks.
5. Learned by Instruction: Machines do not naturally learn languages like humans do. Instead, they are
programmed and can only perform tasks for which they have been explicitly instructed.
6. No Ambiguity: In machine languages, ambiguity is typically avoided. Programs must be unambiguous
to ensure predictable and reliable behavior.
In recent years, there has been significant progress in natural language processing (NLP) and machine
learning, enabling machines to understand and generate human language to some extent. This has led to
the development of chatbots, virtual assistants, and language translation tools that can bridge the gap
between human and machine language to facilitate communication between humans and computers.
However, these systems are still limited in their understanding of context, emotions, and the nuances of
human communication compared to humans themselves.


Artificial Neural Networks


The term "Artificial Neural Network" is derived from Biological neural networks that develop the structure
of a human brain. Similar to the human brain that has neurons interconnected to one another, artificial neural
networks also have neurons that are interconnected to one another in various layers of the networks. These
neurons are known as nodes.

(Figures: a typical biological neural network and a typical artificial neural network, shown side by side.)

Dendrites from Biological Neural Network represent inputs in Artificial Neural Networks, cell nucleus
represents Nodes, synapse represents Weights, and Axon represents Output.
Relationship between Biological neural network and artificial neural network:

Biological Neural Network      Artificial Neural Network
Dendrites                      Inputs
Cell nucleus                   Nodes
Synapse                        Weights
Axon                           Output


An Artificial Neural Network is a model in the field of Artificial Intelligence that attempts to mimic the network of neurons making up the human brain, so that computers have a way to understand things and make decisions in a human-like manner. The artificial neural network is designed by programming computers to behave simply like interconnected brain cells.
There are roughly 86 billion neurons in the human brain, and each neuron is connected to somewhere between 1,000 and 100,000 others. In the human brain, data is stored in a distributed manner, and we can extract more than one piece of this data, when necessary, from our memory in parallel. We can say that the human brain is an incredibly powerful parallel processor.
We can understand the artificial neural network by contrasting it with a digital logic gate, which takes inputs and gives an output. Consider an "OR" gate with two inputs: if one or both inputs are "On", the output is "On"; if both inputs are "Off", the output is "Off". Here the output is a fixed function of the input. Our brain does not work this way: the relationship between outputs and inputs keeps changing, because the neurons in our brain are "learning."
The architecture of an artificial neural network:
To understand the architecture of an artificial neural network, we first have to understand what a neural network consists of: a large number of artificial neurons, termed units, arranged in a sequence of layers. Let us look at the various types of layers available in an artificial neural network.
Artificial Neural Network primarily consists of three layers:

Input Layer: As the name suggests, it accepts inputs in several different formats provided by the
programmer.
Hidden Layer: The hidden layers sit between the input and output layers. They perform all the calculations needed to find hidden features and patterns.
Output Layer: The input goes through a series of transformations in the hidden layers, and the final result is conveyed through the output layer.
The artificial neural network takes the inputs, computes their weighted sum, and adds a bias. This computation is represented in the form of a transfer function. The weighted total is then passed as input to an activation function, which decides whether a node should fire or not. Only the nodes that fire pass their signal on towards the output layer. There are distinctive activation functions available that can be applied depending on the sort of task we are performing.
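As a minimal sketch of this computation (the input values, weights, and bias below are made up for illustration and are not from these notes), a single artificial neuron can be written in Python as:

```python
import numpy as np

def sigmoid(z):
    # Activation function: squashes the weighted total into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])   # inputs x1, x2, x3 (illustrative values)
w = np.array([0.4, 0.1, -0.6])   # weights, one per input (illustrative values)
b = 0.2                          # bias (illustrative value)

z = np.dot(w, x) + b             # weighted sum of the inputs plus the bias
output = sigmoid(z)              # the activation decides how strongly the node "fires"
print(output)
```

A value near 1 corresponds to the node firing strongly and a value near 0 to it staying quiet; other activation functions (ReLU, tanh, softmax) can be substituted depending on the task.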


Advantages of Artificial Neural Network (ANN)


Parallel processing capability: Because computation is distributed across many nodes, artificial neural networks can perform more than one task simultaneously.
Storing data on the entire network: Unlike traditional programming, where data is stored in a database, the information learned by an ANN is stored across the whole network. The loss of a few pieces of data in one place does not prevent the network from working.
Capability to work with incomplete knowledge: After training, an ANN may produce output even from incomplete input data. How much performance is lost depends on how important the missing data is.
Having a distributed memory: For an ANN to be able to adapt, suitable examples must be chosen and the network must be trained towards the desired output by showing it these examples. The success of the network is directly proportional to the chosen instances; if an event is not shown to the network in all its aspects, the network can produce false output.
Having fault tolerance: The corruption or failure of one or more cells of an ANN does not prevent it from generating output, and this feature makes the network fault-tolerant.

Disadvantages of Artificial Neural Network:


Assurance of proper network structure: There is no particular guideline for determining the structure of artificial neural networks. An appropriate network structure is arrived at through experience and trial and error.
Unexplained behavior of the network: This is the most significant issue with ANNs. When an ANN produces a solution, it gives no insight into why or how it arrived at it, which reduces trust in the network.
Hardware dependence: By their very structure, artificial neural networks need processors with parallel processing power, which makes their realization hardware-dependent.
Difficulty of presenting the problem to the network: ANNs can work only with numerical data, so problems must be converted into numerical values before being given to the network. The chosen representation directly affects the performance of the network and depends on the user's skill.
The duration of training is unknown: Training reduces the error to a specific value, but reaching that value does not guarantee optimal results.

Training Deep Networks


A deep neural network (DNN) is an ANN with multiple hidden layers between the input and output layers.
Similar to shallow ANNs, DNNs can model complex non-linear relationships.
The main purpose of a neural network is to receive a set of inputs, perform progressively complex calculations on them, and produce output to solve real-world problems like classification. Here we restrict ourselves to feed-forward neural networks.
We have an input, an output, and a flow of sequential data in a deep network.


Neural networks are widely used in supervised learning and reinforcement learning problems. These
networks are based on a set of layers connected to each other.
In deep learning, the number of hidden layers, mostly non-linear, can be large, say about 1,000 layers.
DL models produce much better results than normal ML networks.
We mostly use the gradient descent method for optimizing the network and minimising the loss function.
We can use ImageNet, a repository of millions of digital images, to classify a dataset into categories like cats and dogs. DL nets are increasingly used for dynamic images apart from static ones, and for time series and text analysis.
Training the data sets forms an important part of Deep Learning models. In addition, Backpropagation is
the main algorithm in training DL models.
DL deals with training large neural networks with complex input output transformations.
One example of DL is the mapping of a photo to the name of the person(s) in the photo, as done on social networks; describing a picture with a phrase is another recent application of DL.

Neural networks are functions with inputs like x1, x2, x3, … that are transformed into outputs like z1, z2, z3, and so on through two (shallow networks) or several (deep networks) intermediate operations, also called layers.


The weights and biases change from layer to layer. ‘w’ and ‘v’ are the weights or synapses of layers of the
neural networks.
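To make the layer-by-layer picture concrete, here is a small illustrative sketch (the sizes and numeric values are invented for this example): the inputs pass through a hidden layer with weights w and then an output layer with weights v.

```python
import numpy as np

def relu(z):
    # Non-linear activation applied to the hidden layer
    return np.maximum(0.0, z)

x = np.array([1.0, 2.0, 3.0])            # inputs x1, x2, x3 (illustrative values)

w = np.array([[0.2, -0.5, 0.1],          # hidden-layer weights 'w': 2 units x 3 inputs
              [0.7,  0.3, -0.2]])
b1 = np.array([0.1, -0.1])               # hidden-layer biases

v = np.array([[0.6, -0.4]])              # output-layer weights 'v': 1 unit x 2 hidden values
b2 = np.array([0.05])                    # output-layer bias

h = relu(w @ x + b1)                     # hidden-layer activations
z = v @ h + b2                           # network output z1
print(z)
```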
The best use case of deep learning is the supervised learning problem. Here, we have a large set of data inputs with a desired set of outputs.

Here we apply the backpropagation algorithm to get correct output predictions.


The most basic dataset of deep learning is MNIST, a dataset of handwritten digits.
We can train a deep Convolutional Neural Network with Keras to classify images of handwritten digits from this dataset.
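A minimal sketch of such a model in Keras might look like the following; the exact layer sizes, number of epochs, and batch size are arbitrary choices for illustration rather than values prescribed in these notes.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Load MNIST: 60,000 training and 10,000 test images of handwritten digits (28x28, grayscale)
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train[..., None].astype("float32") / 255.0   # add channel dimension, scale to [0, 1]
x_test = x_test[..., None].astype("float32") / 255.0

# A small CNN: convolution + pooling layers followed by a dense classifier
model = keras.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),   # one score per digit class 0-9
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=3, batch_size=128, validation_split=0.1)
print(model.evaluate(x_test, y_test))
```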
The firing or activation of a neural net classifier produces a score. For example, to classify patients as sick or healthy, we consider parameters such as height, weight, body temperature, blood pressure, and so on.
A high score means the patient is sick and a low score means the patient is healthy.
Each node in the output and hidden layers has its own classifier. The input layer takes the inputs and passes its scores on to the next hidden layer for further activation, and this continues until the output is reached.
This progression from input to output, from left to right in the forward direction, is called forward propagation.
Credit assignment path (CAP) in a neural network is the series of transformations starting from the input
to the output. CAPs elaborate probable causal connections between the input and the output.
The CAP depth for a given feed-forward neural network is the number of hidden layers plus one, as the output layer is included. For recurrent neural networks, where a signal may propagate through a layer several times, the CAP depth is potentially unlimited.

Deep Nets and Shallow Nets


There is no clear threshold of depth that divides shallow learning from deep learning, but it is mostly agreed that deep learning, which has multiple non-linear layers, requires a CAP greater than two.
The basic node in a neural net is a perceptron, mimicking a neuron in a biological neural network. Stacking perceptrons gives the multi-layer perceptron (MLP). Each set of inputs is modified by a set of weights and biases; each edge has a unique weight and each node has a unique bias.
The prediction accuracy of a neural net depends on its weights and biases.


The process of improving the accuracy of a neural network is called training. The output from a forward-propagation net is compared to the value that is known to be correct.
The cost function, or loss function, is the difference between the generated output and the actual output.
The point of training is to make this cost as small as possible across millions of training examples. To do this, the network tweaks the weights and biases until the prediction matches the correct output.
Once trained well, a neural net has the potential to make an accurate prediction every time.
When patterns become complex and you want your computer to recognise them, you have to go for neural networks. In such complex pattern scenarios, neural networks outperform all other competing algorithms.
There are now GPUs that can train them faster than ever before. Deep neural networks are already revolutionizing the field of AI.
Computers have proved to be good at performing repetitive calculations and following detailed instructions, but they have not been so good at recognising complex patterns.
For recognition of simple patterns, a support vector machine (SVM) or a logistic regression classifier can do the job well, but as the complexity of the pattern increases, there is no option but to go for deep neural networks.
Therefore, for complex patterns like a human face, shallow neural networks fail, and there is no alternative but to go for deep neural networks with more layers. Deep nets are able to do their job by breaking the complex patterns down into simpler ones. For a human face, for example, a deep net would use edges to detect parts like lips, nose, eyes, and ears, and then recombine these to form the face.
Prediction has become so accurate that, at a recent Google pattern recognition challenge, a deep net beat a human.
This idea of a web of layered perceptrons has been around for some time; in this respect, deep nets mimic the human brain. One downside, however, is that they take a long time to train, which is a hardware constraint. Recent high-performance GPUs have been able to train such deep nets in under a week, whereas fast CPUs could have taken weeks or perhaps months to do the same.

Choosing a Deep Net


How do we choose a deep net? We have to decide whether we are building a classifier or trying to find patterns in the data, and whether we are going to use unsupervised learning. To extract patterns from a set of unlabelled data, we use a Restricted Boltzmann Machine or an autoencoder.
Consider the following points while choosing a deep net −

• For text processing, sentiment analysis, parsing and named entity recognition, we use a recurrent net or a recursive neural tensor network (RNTN);
• For any language model that operates at the character level, we use a recurrent net.
• For image recognition, we use a deep belief network (DBN) or a convolutional network.
• For object recognition, we use an RNTN or a convolutional network.
• For speech recognition, we use a recurrent net.
In general, deep belief networks and multilayer perceptrons with rectified linear units (ReLU) are both good choices for classification.
For time series analysis, it is always recommended to use a recurrent net.


Neural nets have been around for more than 50 years, but only now have they risen to prominence. The reason is that they are hard to train; when we try to train them with a method called backpropagation, we run into a problem called vanishing or exploding gradients. When that happens, training takes longer and accuracy takes a back seat. When training on a dataset, we constantly calculate the cost function, which is the difference between the predicted output and the actual output from a set of labelled training data. The cost function is then minimized by adjusting the weight and bias values until the lowest value is obtained. The training process uses a gradient, which is the rate at which the cost changes with respect to changes in the weight or bias values.
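As a tiny illustrative sketch of this idea (a one-weight model on made-up data, not an example from these notes), gradient descent repeatedly nudges the weight in the direction that lowers the cost:

```python
import numpy as np

# Toy labelled data: the targets follow y = 2x, so the ideal weight is 2 (assumed example)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])

w = 0.0                # initial weight
learning_rate = 0.05

for step in range(100):
    y_pred = w * x                          # forward pass: predicted outputs
    cost = np.mean((y_pred - y) ** 2)       # mean squared error cost
    grad = np.mean(2 * (y_pred - y) * x)    # gradient of the cost with respect to w
    w -= learning_rate * grad               # move the weight against the gradient

print(w, cost)   # w approaches 2 as the cost shrinks
```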

Key Steps in Training Deep Networks


Data Preprocessing: Preprocessing the data is a crucial step before training. It involves normalizing,
standardizing, and augmenting the data to make it suitable for the model.
Choosing the Right Architecture: Selecting an appropriate deep network architecture for the task is
essential. Different architectures (e.g., CNNs, RNNs, Transformers) are suited for specific tasks, such as
image recognition, sequence modeling, and natural language processing.
Initialization: Properly initializing the model's weights can significantly impact training. Common
initialization techniques include Xavier/Glorot initialization and He initialization.
Loss Function: Choosing an appropriate loss function depends on the task at hand. For classification tasks,
cross-entropy loss is often used, while mean squared error is common for regression tasks.
Optimization Algorithm: Stochastic Gradient Descent (SGD) and its variants like Adam, RMSprop, and
AdaGrad are commonly used optimization algorithms. These algorithms update the model's parameters
during training to minimize the loss function.
Learning Rate Scheduling: Gradually reducing the learning rate during training can help the model
converge to a better solution and avoid overshooting.
Batch Size: The batch size determines how many samples are processed together before updating the
model's parameters. Adjusting the batch size can impact the training speed and the stability of learning.
Regularization Techniques: Techniques like L1/L2 regularization, dropout, and batch normalization can
help prevent overfitting and improve the generalization of the model.
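Several of these steps can be seen together in the following Keras sketch; the layer sizes, regularization strengths, learning-rate schedule, and batch size are assumed values chosen purely for illustration.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, regularizers

# A small classifier showing He initialization, L2 regularization, dropout,
# a cross-entropy loss, the Adam optimizer with a decaying learning rate,
# and an explicit batch size (all settings are illustrative).
model = keras.Sequential([
    layers.Dense(256, activation="relu",
                 kernel_initializer="he_normal",
                 kernel_regularizer=regularizers.l2(1e-4),
                 input_shape=(784,)),
    layers.Dropout(0.3),
    layers.Dense(10, activation="softmax"),
])

# Learning-rate schedule: start at 1e-3 and decay exponentially as training proceeds
lr_schedule = keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3, decay_steps=1000, decay_rate=0.9)

model.compile(optimizer=keras.optimizers.Adam(learning_rate=lr_schedule),
              loss="sparse_categorical_crossentropy",   # classification loss
              metrics=["accuracy"])

# model.fit(x_train, y_train, batch_size=64, epochs=10, validation_split=0.1)
```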

Improving Deep Networks


Transfer Learning: Transfer learning involves taking a model pre-trained on a large dataset and fine-tuning it for a specific task. It can save training time and improve performance, especially when the target task has limited data.
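As a hedged sketch of how this is often done in Keras (the backbone, input size, and number of target classes are assumptions for illustration), a pre-trained ImageNet model can be frozen and given a new classification head:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Pre-trained ImageNet backbone with its original classification head removed
base = keras.applications.MobileNetV2(input_shape=(160, 160, 3),
                                      include_top=False, weights="imagenet")
base.trainable = False   # freeze the pre-trained weights; only the new head is trained

model = keras.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(5, activation="softmax"),   # assumed: 5 classes in the target task
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(target_task_images, target_task_labels, epochs=5)
```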
Data Augmentation: Generating additional training data by applying random transformations (rotations,
flips, shifts, etc.) to existing samples can help the model generalize better.
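For example, in Keras this can be done with preprocessing layers that apply random transformations on the fly; the particular transformations and ranges below are illustrative choices.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Random transformations applied to each training image as it is fed to the model
augment = keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),          # rotate by up to +/- 10% of a full turn
    layers.RandomTranslation(0.1, 0.1),  # shift by up to 10% in height and width
])

# Typically placed at the front of a model or mapped over a tf.data pipeline:
# augmented = augment(images, training=True)
```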
Ensemble Methods: Combining predictions from multiple models (either the same architecture with
different initializations or different architectures) can often lead to better performance.
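A minimal sketch of one simple ensembling strategy, soft voting (the model names below are hypothetical placeholders): average the class probabilities predicted by each model and take the most probable class.

```python
import numpy as np

def ensemble_predict(models, x):
    # Average the class probabilities from each model, then pick the top class
    probs = [m.predict(x) for m in models]
    return np.mean(probs, axis=0).argmax(axis=1)

# Usage (assuming model_a, model_b and model_c were trained separately):
# labels = ensemble_predict([model_a, model_b, model_c], x_test)
```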
Hyperparameter Tuning: Optimize hyperparameters like learning rate, batch size, and the number of
layers/neurons in the model to find the best configuration for your specific problem.


Regularization Techniques: As mentioned earlier, L1/L2 regularization, dropout, and batch normalization
can improve model generalization and reduce overfitting.
Advanced Architectures: Staying updated with the latest advancements in neural network architectures
can lead to significant improvements. For example, using architectures like ResNet, Inception, or
Transformers has shown great success in various tasks.
Custom Loss Functions: Designing task-specific loss functions that capture the unique characteristics of
the problem can enhance the model's performance.
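As one hedged illustration (the weighting scheme is invented for this example, not a standard loss), a custom Keras loss can encode such task-specific behaviour, here penalizing under-prediction twice as heavily as over-prediction:

```python
import tensorflow as tf

def weighted_mse(y_true, y_pred):
    # Task-specific twist on mean squared error: under-predictions cost twice as much
    error = y_true - y_pred
    weights = tf.where(error > 0, 2.0, 1.0)
    return tf.reduce_mean(weights * tf.square(error))

# model.compile(optimizer="adam", loss=weighted_mse)
```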
Adversarial Training: Incorporating adversarial training to improve the model's robustness against
adversarial examples can be beneficial, especially in security-critical applications.
Remember that deep learning is an iterative process, and experimentation is essential to find the best
approach for a particular task. Monitoring the model's performance during training and adjusting techniques
accordingly is crucial for achieving the best results.

G MUNI NAGAMANI, Assistant Professor, CSE, ALIET.
