
DEEP LEARNING (21CS743)

Module wise QB (Module I and Module II)

1. Define deep learning and illustrate the relationships between different disciplines of AI through a Venn
diagram.
Deep learning is an approach to AI that lets computers learn from experience and understand the world in
terms of a hierarchy of concepts, with each concept defined in terms of its relation to simpler concepts. If we
draw a graph showing how these concepts are built on top of each other, the graph is deep, with many layers.
For this reason, we call this approach to AI deep learning.

Deep learning models can be assessed in terms of their depth in two ways:

1. Depth of the Computational Graph - Refers to the number of layers (or transformations) the input
must pass through to produce the output.
2. Probabilistic Modeling Depth - Refers to how layers of concepts are interrelated, as each layer
refines the understanding of simpler concepts based on more abstract concepts.

Deep learning is a subset of machine learning that uses multilayered neural networks, called deep neural
networks, to simulate the complex decision-making power of the human brain.

In terms of the AI ecosystem, the relationship between different disciplines can be represented in a Venn
diagram:

1. Artificial Intelligence (AI) is the broadest circle, representing any machine-based activity aimed at
emulating human intelligence.
2. Machine Learning (ML) sits within AI, focusing on data-driven algorithms that allow systems to
improve over time.
3. Representation Learning is within ML, describing models that automatically extract
representations from data without relying on manual feature engineering.
4. Deep Learning (DL), a subset of representation learning, uses hierarchical layers to progressively
extract features, moving from simple patterns to complex abstractions.
In the Venn diagram:

• AI (outer circle): Includes broader AI systems like knowledge bases.


• Machine Learning (ML): Includes classical methods like logistic regression.
• Representation Learning: Encompasses models that learn feature representations, like shallow
autoencoders.
• Deep Learning: Includes multilayer perceptrons (MLPs) and other deep architectures that leverage
layered learning to handle complex data.

2. Describe the primary motivations behind the development of DL techniques.

The development of deep learning (DL) techniques was motivated by several factors that limited traditional
machine learning's ability to handle complex data representations and generalizations.

The evolution of deep learning has occurred in three key waves, each contributing significantly to its
development and influence on modern AI. These stages are:

1. Cybernetics (1940s–1960s): The initial phase, where early deep learning efforts were closely linked
with cybernetics and biological models of learning. During this time, researchers attempted to
understand and replicate the brain's functions computationally, setting the foundation for artificial
neural networks (ANNs). This wave focused on proving that computational systems could
potentially exhibit intelligent behaviors by mimicking brain-like structures, even though early ANNs
were limited in complexity and far from biologically accurate.
2. Connectionism (1980s–1990s): This second phase emphasized connectionism. It saw the rise of
key developments such as backpropagation, which allowed neural networks to train on layered
structures, transforming ANNs into multi-layer models capable of solving more intricate tasks. The
inspiration remained partially biological, aiming to capture aspects of how neural connections might
function, although these models were more computational than biological in focus. During this
period, neural networks gained credibility as researchers moved closer to engineering systems that
could learn complex representations.
3. Modern Deep Learning (since 2006): The third and current wave, recognized as “deep learning,”
emerged with significant advances in hardware (e.g., GPUs) and access to large datasets. This
resurgence shifted focus toward multi-layered, hierarchical models capable of autonomous feature
learning and complex representations. Importantly, this wave of deep learning moved beyond purely
biologically inspired frameworks, embracing a broader, more generalized approach of learning
multiple levels of abstraction that suit various complex AI tasks, such as image and speech
recognition.

The cumulative effect of these waves has led deep learning to evolve from models loosely inspired by the
brain to highly functional, multi-layered architectures. This shift allowed deep learning to achieve
unprecedented levels of performance across diverse applications, paving the way for intelligent systems that
far exceed the capabilities envisioned in earlier decades.
3. Differentiate deep learning from machine learning models. List any six major applications
influenced by DL.

Traditional machine learning models typically rely on manually engineered features and work well on
moderate-sized, structured datasets. Deep learning models, by contrast, learn hierarchical feature
representations automatically from raw data, improve as data and compute grow, and handle complex,
high-dimensional inputs such as images, audio, and text. Six major applications influenced by DL:

• Computer Vision: Recognizing objects in images or videos.


• Natural Language Processing (NLP): Understanding and generating human language.
• Speech Recognition: Converting spoken language into text.
• Bioinformatics: Analyzing biological data such as DNA sequences.
• Robotics: Enabling robots to learn tasks autonomously.
• Gaming: Creating intelligent game-playing agents.

4. Discuss major historical developments (trends) and their impact on shaping the evolution of Deep
Learning.
Historical Trends in Deep Learning
• It is easiest to understand deep learning with some historical context. Rather than providing a detailed
history of deep learning, we identify a few key trends:
• Deep learning has had a long and rich history, but has gone by many names reflecting different
philosophical viewpoints, and has waxed and waned in popularity.
• Deep learning has become more useful as the amount of available training data has increased.
• Deep learning models have grown in size over time as computer infrastructure (both hardware and
software) for deep learning has improved.
• Deep learning has solved increasingly complicated applications with increasing accuracy over time.

• Cybernetics (1940s–1960s): The first phase focused on trying to mimic basic brain functions using
simple neural models, known as artificial neural networks (ANNs). These early models were inspired by the
brain but were quite basic in their structure.

• Connectionism (1980s–1990s): The second phase, called connectionism, introduced techniques like
backpropagation, which allowed networks to be trained across multiple layers. This made neural networks
better at handling more complex tasks, although they were still inspired by biology to some extent.

• Modern Deep Learning (since 2006): The current phase began around 2006, with improvements in
computing power and data availability. This phase focuses on deeper, more powerful networks that can
automatically learn from data. These models go beyond just mimicking the brain—they’re designed to
handle a wide range of AI tasks, from image recognition to language processing.

Early Beginnings (1940s-1960s)

• 1943: McCulloch-Pitts Neuron: The concept of artificial neurons was introduced, laying the
groundwork for neural networks.
• 1957: Perceptron: Frank Rosenblatt developed the first neural network model for binary
classification.
• 1960: Early Backpropagation Concept: Henry J. Kelley proposed the initial ideas for the
backpropagation algorithm.

Backpropagation and Revival (1980s-1990s)

• 1986: Backpropagation Algorithm: Popularized by Rumelhart, Hinton, and Williams, this allowed
training across multiple layers, sparking renewed interest in neural networks.
• 1989: LeNet: Yann LeCun created the first Convolutional Neural Network (CNN), LeNet, for
recognizing handwritten digits, marking a milestone in image recognition.

Advances in Deep Learning (1990s-2000s)

• 1997: LSTM Networks: Hochreiter and Schmidhuber introduced Long Short-Term Memory
(LSTM) networks, addressing issues with long-term dependencies in recurrent neural networks
(RNNs).
• 2006: Deep Belief Networks: Geoffrey Hinton and colleagues introduced unsupervised pre-training,
enabling more efficient deep network training, which set the stage for modern deep learning.

Each phase built upon the previous one, gradually advancing neural networks to handle increasingly
complex tasks.

5. Define Machine Learning and provide intuitive descriptions and examples of the different kinds of
tasks.

Machine Learning (ML) is a branch of artificial intelligence (AI) focused on developing algorithms that
can learn from and make predictions based on data. A machine learning algorithm can be defined as a
computer program that improves its performance on a specific task T through experience E over time,
with its success measured by a performance metric P. The idea is that instead of programming
instructions explicitly, ML enables systems to identify patterns and adapt to new data automatically.

“A computer program is said to learn from experience E with respect to some class of tasks T
and performance measure P, if its performance at tasks in T, as measured by P,
improves with experience E.” One can imagine a very wide variety of experiences E, tasks T,
and performance measures P.

Types of Machine Learning Tasks:

Ø Classification: Assign an input to one of k categories, e.g., labelling an image as cat or dog.
Ø Regression: Predict a numerical value from an input, e.g., predicting house prices.
Ø Transcription: Convert relatively unstructured data into discrete textual form, e.g., speech
recognition or optical character recognition.
Ø Machine translation: Convert a sequence of symbols in one language into a sequence in another,
e.g., English to French.
Ø Anomaly detection: Flag examples that are unusual or atypical, e.g., credit-card fraud detection.
Ø Synthesis and sampling: Generate new examples similar to those in the training data, e.g., speech
synthesis.
Ø Probability density or PMF estimation: Learn the probability density function (PDF, for continuous
data) or probability mass function (PMF, for discrete data) that generated the data. Such algorithms
are mainly used in anomaly detection and missing-value imputation.

6. Summarize the quantitative measures for evaluating ML algorithms abilities.

Performance Measures in Machine Learning

To evaluate the abilities of a machine learning algorithm, a quantitative performance measure P must be
designed, which is specific to the task T.

1. Performance Measures for Classification Tasks:

• Accuracy:
o Proportion of examples for which the model produces the correct output.
• Error Rate:
o Proportion of examples for which the model produces an incorrect output, referred to as the
expected 0-1 loss. The 0-1 loss on an example is:
o 0 if correctly classified.
o 1 if incorrectly classified.
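
A minimal NumPy sketch of these two measures (labels and predictions are illustrative), showing that
accuracy and error rate are complementary views of the 0-1 loss:

```python
import numpy as np

y_true = np.array([0, 1, 1, 0, 1])  # ground-truth labels
y_pred = np.array([0, 1, 0, 0, 1])  # model outputs

# 0-1 loss per example: 0 if correctly classified, 1 if not
zero_one_loss = (y_pred != y_true).astype(int)

error_rate = zero_one_loss.mean()   # expected 0-1 loss
accuracy = 1.0 - error_rate         # proportion of correct outputs
print(accuracy, error_rate)         # 0.8 0.2
```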

2. Performance Measures for Density Estimation:

• For tasks like density estimation, accuracy and error rate do not apply. Instead, a different metric is
used that assigns a continuous-valued score to each example.
o Average Log-Probability: Commonly reported score reflecting how well the model
performs on unseen data.
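
A hedged sketch of the average log-probability score, assuming the density model is a Gaussian fitted to
training data (any model exposing a log-density would do):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
train = rng.standard_normal(1000)      # data used to fit the model
test = rng.standard_normal(200)        # unseen, held-out data

mu, sigma = train.mean(), train.std()  # fit a Gaussian density model

# Average log-probability of the test set under the model;
# higher is better: it rewards assigning high density to unseen data.
avg_log_prob = norm.logpdf(test, loc=mu, scale=sigma).mean()
print(avg_log_prob)
```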

Evaluation of Performance Measures

To determine how well a machine learning algorithm performs on unseen data, it is essential to evaluate
these performance measures using a test set that is separate from the training data.

Challenges in Choosing Performance Measures

• The choice of performance measure may seem straightforward, but it can be challenging to select
one that corresponds well to the desired behavior of the system.

Learning Approaches

Unsupervised Learning:

• Algorithms experience a dataset with many features and learn useful properties of its structure.
• Examples include:
o Density Estimation: Learns the entire probability distribution of a dataset.
o Clustering: Divides the dataset into clusters of similar examples.

Supervised Learning:

• Algorithms work with datasets where each example is associated with a label or target.
• For instance, in the Iris dataset, the species of each iris plant is annotated, allowing the algorithm to
learn to classify plants into three different species based on their measurements.
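
A minimal scikit-learn sketch of this supervised setting on the Iris dataset (an illustrative baseline, not a
tuned model):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # measurements and species labels

# Hold out a test set so performance is measured on unseen data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(clf.score(X_te, y_te))       # accuracy on the held-out test set
```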
Conclusion

Evaluating performance measures using a separate test set is crucial for understanding how well a machine
learning algorithm will perform in real-world applications. Selecting appropriate performance metrics is
vital for aligning the model's behavior with the desired outcomes.

7. Explain capacity, overfitting and underfitting with relevant examples.

The central challenge in machine learning is generalization—performing well on new, unseen inputs rather
than just the training data.

Key Points:

• Data Split: Typically, 80% of the dataset is used for training, and 20% for testing.
• Training Error: The model's error measured on the training data during the training phase.
• Test Error: The model's error measured on unseen data during the test phase (also called
generalization error).

Performance Evaluation:

A machine learning model's performance is judged by its ability to:

1. Minimize Training Error: Fit the training data well.


2. Maximize Generalization: Maintain low test error on unseen data.

Balancing these objectives is crucial for creating effective models, and it maps directly onto three key
concepts:

• Capacity: A model's ability to fit a wide variety of functions. A linear model has low capacity; a
high-degree polynomial has high capacity.
• Underfitting: Occurs when capacity is too low and the model cannot achieve a sufficiently low
training error, e.g., fitting a straight line to clearly curved data.
• Overfitting: Occurs when capacity is too high for the amount of data, so the gap between training
error and test error becomes large; the model memorizes noise in the training set, e.g., a high-degree
polynomial passing through every noisy point.

A minimal sketch demonstrating these regimes follows below.
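
Here polynomial degree serves as the capacity knob (degrees and noise level are illustrative; NumPy may
warn about conditioning at the highest degree):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(20)  # noisy target

for degree in (1, 3, 9):                 # low, adequate, high capacity
    coeffs = np.polyfit(x, y, degree)    # fit a polynomial of this degree
    train_err = np.mean((np.polyval(coeffs, x) - y) ** 2)
    print(degree, train_err)
# Training error falls as degree rises: degree 1 underfits (high error),
# degree 9 chases the noise and would generalize poorly to new points.
```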


8. With a suitable example, explain the unsupervised learning approach (PCA + k-Means).
Unsupervised Learning: This type of learning deals with data that has no labels. The model explores
underlying patterns and structures within the data. Two common unsupervised techniques are PCA (Principal
Component Analysis) and k-Means clustering.

Principal Component Analysis (PCA): PCA is a dimensionality reduction technique that transforms data
into a set of orthogonal components ordered by variance. The first few principal components capture the
most variance, reducing data complexity while preserving its structure.

• Example: In image compression, PCA can reduce the number of values needed to represent an
image by keeping only the components that capture its most significant variation, reducing storage
needs without compromising much on quality.

K-Means Clustering: k-Means is a clustering technique that groups data points into a predefined number of
clusters (k). It begins by initializing k centroids, then iteratively adjusts them based on the average position
of points assigned to each centroid, until convergence.

• Example: Customer segmentation in e-commerce. PCA can reduce high-dimensional purchase data
into principal components, and k-Means then clusters customers into groups (e.g., budget buyers,
premium shoppers) based on these components, helping target marketing strategies.

The k-means clustering algorithm groups data points into k clusters, where each cluster contains items that
are similar to each other.

How it Works:

1. The algorithm starts by placing k centroids at random positions in the data space.
2. Each data point is assigned to the nearest centroid, creating clusters.
3. The centroids are updated to the average position of the points in their cluster.
4. Steps 2 and 3 repeat until the clusters stabilize (i.e., assignments no longer change).
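
A minimal scikit-learn sketch combining both techniques, as in the customer-segmentation example (the
data is synthetic and the parameters are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 10))    # e.g., 300 customers, 10 raw features

pca = PCA(n_components=2)             # keep the 2 highest-variance axes
X_reduced = pca.fit_transform(X)
print(pca.explained_variance_ratio_)  # variance captured per component

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_reduced)
print(kmeans.labels_[:10])            # cluster assignment per customer
print(kmeans.cluster_centers_)        # final centroid positions
```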
9. Illustrate supervised learning with suitable algorithms (probabilistic supervised learning, SVM).

• Supervised Learning: In supervised learning, the model is trained on labeled data, meaning each
input has a known output. The objective is to learn the mapping from inputs to outputs and generalize well
to unseen data. Examples include probabilistic models and Support Vector Machines (SVM).

1. Probabilistic Supervised Learning: Logistic Regression

Probabilistic supervised learning estimates the conditional probability p(y | x). For binary classification,
logistic regression passes the linear function wᵀx + b through the logistic sigmoid to obtain
p(y = 1 | x) = σ(wᵀx + b); the parameters are fit by maximizing the likelihood of the training labels
(equivalently, minimizing cross-entropy).

2. Support Vector Machines (SVM)

Concept: Support Vector Machines are supervised learning models for classification that find the
hyperplane best separating the classes in the input space. Unlike logistic regression, SVMs output a class
label rather than probabilities.

Key Features of SVM:


One of the most important methods for supervised learning is the support vector machine (SVM). Like
logistic regression, SVM uses a linear function, wᵀx + b, but instead of giving probabilities, it simply assigns
a class (positive or negative). If wᵀx + b is positive, it predicts the positive class; if it is negative, it predicts
the negative class.

A big idea in SVM is the kernel trick. This trick involves rewriting the linear function in terms of dot
products between the data points. By doing this, we can replace the inputs x with a different representation,
called ϕ(x), which can capture more complex patterns. The dot product is replaced with a kernel function that
calculates the similarity between the transformed inputs.

The most commonly used kernel is the Gaussian kernel

k(u, v) = N(u − v; 0, σ²I),

where N(x; µ, Σ) is the standard normal density.

The Gaussian kernel, also called the radial basis function (RBF) kernel, is one of the most popular kernels
in machine learning. It measures the similarity between two points u and v based on the normal distribution.
The closer u and v are in Euclidean distance, the higher the kernel value, meaning they are more similar.

We can think of the Gaussian kernel as a way of matching a new point to existing examples in the training set.
If a test point is close to a training point, the model will assign more importance to the label of that training
point when making a prediction.

This trick is useful because it lets us learn models that are nonlinear in x using convex optimization
methods that are guaranteed to converge efficiently. Also, the kernel function can often be computed much
more cheaply than explicitly constructing the transformed inputs ϕ(x), making the process more efficient.
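
A hedged sketch of the Gaussian (RBF) kernel and its use in an SVM via scikit-learn, where the gamma
parameter plays the role of 1/(2σ²) (all values illustrative):

```python
import numpy as np
from sklearn.svm import SVC

def gaussian_kernel(u, v, sigma=1.0):
    """k(u, v) decays with the Euclidean distance between u and v."""
    return np.exp(-np.sum((u - v) ** 2) / (2 * sigma ** 2))

a, b, c = np.array([0.0, 0.0]), np.array([0.0, 0.1]), np.array([5.0, 5.0])
print(gaussian_kernel(a, b))  # close points -> value near 1 (similar)
print(gaussian_kernel(a, c))  # distant points -> value near 0 (dissimilar)

# SVC with kernel="rbf" applies the kernel trick internally
X = np.array([[0, 0], [1, 1], [0, 1], [1, 0]], dtype=float)
y = np.array([0, 0, 1, 1])
clf = SVC(kernel="rbf", gamma=0.5).fit(X, y)
print(clf.predict([[0.9, 0.1]]))
```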

11. Describe the architecture of a deep NN. Design a fully functioning feedforward network for learning
the XOR function.

A deep feedforward network, or multilayer perceptron (MLP), is a neural network model structured to
approximate functions by mapping input data x to outputs y. Information flows forward through the
network, layer by layer, without feedback loops.
Key Components:

1. Layers:
o Input Layer: Takes in the input features x.
o Hidden Layers: Intermediate layers that learn complex patterns. Each hidden layer contains
neurons that apply an activation function to a linear combination of inputs.
o Output Layer: Produces the network's final output, which could be a probability distribution
for classification tasks or a numerical value for regression.
2. Activation Functions:
o Activation functions introduce non-linearity, enabling the network to learn complex
mappings. Common functions include ReLU, sigmoid, and tanh.
3. Learning Process:
o The network is trained using gradient-based optimization (e.g., stochastic gradient descent) to
minimize a loss function that measures the error between predicted and true outputs.

Designing a Feedforward Network for the XOR Function

The XOR function, which outputs 1 if exactly one of its two binary inputs is 1 and 0 otherwise, cannot be
solved by a simple linear model, as it is not linearly separable. To learn XOR, we design a feedforward
network with:

• One hidden layer with two neurons and ReLU activations.


• One output layer with a single neuron for binary output.

Step-by-Step Design

1. Input Layer: Two inputs, x1 and x2.


2. Hidden Layer:
o Two neurons with weights and biases, allowing the network to learn a non-linear
transformation.
o ReLU activation applied to each neuron.
3. Output Layer:
o One neuron with weights connecting it to the hidden layer, producing the final output y,
where y = 0 or 1 for XOR. A numerical sketch of this design follows below.
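
This design can be checked numerically. A NumPy sketch using one known exact solution (the specific
weights below are one of many valid choices):

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

# One exact solution: h = relu(x W + c), y = h w + b
W = np.array([[1.0, 1.0],
              [1.0, 1.0]])  # input-to-hidden weights
c = np.array([0.0, -1.0])   # hidden biases
w = np.array([1.0, -2.0])   # hidden-to-output weights
b = 0.0                     # output bias

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
h = relu(X @ W + c)         # non-linear hidden representation
y = h @ w + b
print(y)                    # [0. 1. 1. 0.] -- exactly XOR
```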

12. Demonstrate the design of a deep NN with gradient-based learning w.r.t. cost function, output units,
and hidden units.

To design a deep neural network (DNN) with gradient-based learning, we need to carefully consider the
architecture, cost function, output units, and hidden units. Here’s a detailed approach addressing each of these
design components:
Gradient-Based Learning with Respect to the Cost Function

Gradient-based learning is central to optimizing neural networks. Here’s how it works concerning the cost
function:

1. Cost Function: This measures the error between the network’s predictions and actual target values,
guiding parameter updates.
o Mean Squared Error (MSE): Used in regression tasks, MSE minimizes the squared
difference between predicted and actual values.
o Cross-Entropy Loss: Common in classification tasks, cross-entropy measures the difference
between predicted probabilities and true labels. For binary classification, binary cross-entropy
is used, while for multiclass classification, categorical cross-entropy is used.
2. Optimization via Gradient Descent:
o Gradient Descent: Computes the gradient of the cost function with respect to the model
parameters and updates these parameters in the direction that reduces the cost.
o Backpropagation: Efficiently calculates the gradient by propagating errors backward
through the network, layer by layer.
o Learning Rate: Controls the step size for each parameter update. Smaller learning rates
provide more precise convergence but slower training, while larger rates speed up training
but risk overshooting optimal values.
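
A minimal sketch of gradient descent on MSE for a single linear parameter, to make the update rule
concrete (data and learning rate are illustrative):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])   # true relationship: y = 2x

w = 0.0                          # initial parameter
lr = 0.1                         # learning rate (step size)

for _ in range(50):
    y_hat = w * x                        # forward pass
    grad = 2 * np.mean((y_hat - y) * x)  # d(MSE)/dw
    w -= lr * grad                       # step opposite the gradient
print(w)                                 # converges toward 2.0
```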

3. Design of Output Units

The choice of output units depends on the type of prediction task:

• Gaussian Output Units (Regression Tasks): For continuous outputs, assuming a Gaussian
distribution with mean µ and variance σ², the cost function is MSE. This is used for
tasks like predicting house prices or temperatures.
• Bernoulli Output Units (Binary Classification): For binary outputs (0 or 1), the Bernoulli
distribution models the probability of each outcome. Binary cross-entropy is the cost function here,
common in tasks like spam detection.
• Multinoulli (Categorical) Output Units (Multiclass Classification): For multiclass problems, the
Multinoulli (categorical) distribution models probabilities across classes. A softmax
activation is applied in the output layer to produce class probabilities, and categorical cross-entropy
is the cost function. This is ideal for problems like image classification.
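
A NumPy sketch of the multiclass case described above: softmax output units paired with categorical
cross-entropy (the logits and the true class are illustrative):

```python
import numpy as np

def softmax(z):
    z = z - z.max()             # shift logits for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])  # raw output-layer scores, 3 classes
probs = softmax(logits)             # a valid probability distribution
print(probs, probs.sum())           # probabilities sum to 1

true_class = 0
loss = -np.log(probs[true_class])   # categorical cross-entropy: small when
print(loss)                         # the true class gets high probability
```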

4. Design of Hidden Units

Hidden units in a neural network allow it to learn complex patterns by transforming the input data non-
linearly. Choosing the right activation function is crucial for effective learning:

• ReLU (Rectified Linear Unit): Outputs zero for negative inputs and the input value if positive,
promoting sparse and efficient representations. ReLU is widely used for its computational efficiency
and helps avoid vanishing gradients.
• Leaky ReLU and Parametric ReLU (PReLU): Variants of ReLU that allow a small gradient for
negative inputs to mitigate the “dying ReLU” problem, where neurons stop updating if stuck in
negative values.
• Sigmoid and Tanh Functions: While less common in deep networks, these functions are useful for
bounded output and probabilistic interpretations. However, they are prone to vanishing gradient
issues and can hinder training in deeper networks. Tanh, in particular, outputs values between -1 and
1, providing zero-centered outputs for more balanced updates.
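
The activations above as short NumPy definitions (the Leaky ReLU slope alpha=0.01 is a common default,
not a mandated value):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)             # zero for negative inputs

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)  # small slope avoids dead units

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))       # squashes to (0, 1)

def tanh(z):
    return np.tanh(z)                     # zero-centered, range (-1, 1)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for f in (relu, leaky_relu, sigmoid, tanh):
    print(f.__name__, f(z))
```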

Demonstration: Designing a DNN to Learn the XOR Function

To make the design concrete, let’s apply it to learn the XOR function, which is not linearly separable and
thus requires a non-linear neural network to solve.

1. Network Architecture:
o Input Layer: Takes two binary inputs, x1 and x2.
o Hidden Layer: Consists of two neurons with ReLU activation.
o Output Layer: Single neuron with a sigmoid activation function to output a binary
classification (0 or 1).
2. Cost Function: Use binary cross-entropy loss to measure how well the network’s predicted
probability matches the true output of the XOR function.
3. Gradient-Based Optimization:
o Apply backpropagation to compute the gradient of the cross-entropy loss with respect to the
weights and biases.
o Parameter Update: Use a gradient descent optimizer (e.g., SGD or Adam) with a carefully
selected learning rate to iteratively minimize the cost function.
4. Training Process: Initialize weights randomly, perform forward propagation to compute predictions,
and then backpropagate the errors to update parameters. Repeat until the model’s output closely
matches the XOR function across all inputs.
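
A compact NumPy training sketch tying these steps together: forward pass, binary cross-entropy, manual
backpropagation, and gradient descent. (Hyperparameters are illustrative; tanh is used in the hidden layer
here because two ReLU units can die under unlucky random initialization at this tiny scale.)

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0.0], [1.0], [1.0], [0.0]])   # XOR targets

# Parameters: 2 inputs -> 2 hidden units -> 1 output
W1 = rng.normal(size=(2, 2)); b1 = np.zeros(2)
W2 = rng.normal(size=(2, 1)); b2 = np.zeros(1)
lr = 0.5
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for step in range(5000):
    # Forward pass
    h = np.tanh(X @ W1 + b1)      # hidden representation
    p = sigmoid(h @ W2 + b2)      # predicted probability of class 1

    # Backward pass: gradients of binary cross-entropy
    d_out = (p - y) / len(X)              # dL/d(pre-sigmoid output)
    dW2 = h.T @ d_out; db2 = d_out.sum(0)
    d_h = (d_out @ W2.T) * (1 - h ** 2)   # backprop through tanh
    dW1 = X.T @ d_h;  db1 = d_h.sum(0)

    # Gradient descent update
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(p.round(2).ravel())  # with most initializations: ~[0, 1, 1, 0]
```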
