
Section – C

UNIT 1
1. Give a comparison between Deep Learning and Machine Learning. Give any two
applications of Deep Learning.

Comparison between Deep Learning and Machine Learning:


 Scope: ML focuses on structured data, while DL can handle unstructured data.
 Feature Engineering: ML requires manual feature engineering; DL automates this
process.
 Algorithms: ML uses algorithms like SVM, decision trees, etc.; DL relies on neural
networks.
 Data Requirements: ML performs well with small datasets; DL requires large
datasets.
 Performance: DL excels in complex tasks like image recognition but is resource-
intensive.

Applications of Deep Learning:


 Image recognition (e.g., facial recognition).
 Natural Language Processing (e.g., translation, chatbots).

2. Explain the types of data layers used when building a data pipeline in TensorFlow.


Types of Data Layers in TensorFlow Data Pipeline:
 Data Input Layer: Reads raw data (e.g., tf.data.Dataset).
 Data Preprocessing Layer: Normalization, augmentation, etc.
 Transformation Layer: Converts data into a format suitable for the model.
 Batching and Shuffling: Efficient data loading during training.
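A minimal sketch, assuming TensorFlow 2.x and a toy in-memory dataset, showing how these layers map onto the tf.data API (reading, preprocessing, shuffling, batching, and prefetching):

import tensorflow as tf

dataset = tf.data.Dataset.from_tensor_slices(list(range(10)))   # input layer: raw data
dataset = (dataset
           .map(lambda x: tf.cast(x, tf.float32) / 10.0)         # preprocessing: scaling
           .shuffle(buffer_size=10)                               # shuffling
           .batch(4)                                              # batching
           .prefetch(tf.data.AUTOTUNE))                           # asynchronous prefetching
for batch in dataset:
    print(batch)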

3. Draw a simple neural network which consists of an input layer (4 nodes), two hidden
layers (3 nodes each), and one output layer (2 nodes). Calculate the total number of
trainable parameters (weights and biases) in that neural network.
Simple Neural Network and Trainable Parameters:
 Structure: 4 input nodes, 2 hidden layers (3 nodes each), 2 output nodes.
 Weights:
Input to hidden1: 4×3=12
Hidden1 to hidden2: 3×3=9
Hidden2 to output: 3×2=6
Total weights: 12+9+6=27
 Biases: One bias per node except input: 3+3+2=8
Total Parameters: 27+8=35.
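These counts can be checked with a quick Keras sketch (assuming fully connected Dense layers and arbitrary activation choices):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Dense(3, activation='relu', input_shape=(4,)),  # input (4) -> hidden1 (3): 12 weights + 3 biases
    Dense(3, activation='relu'),                    # hidden1 (3) -> hidden2 (3): 9 weights + 3 biases
    Dense(2, activation='softmax')                  # hidden2 (3) -> output (2): 6 weights + 2 biases
])
model.summary()   # reports 35 trainable parameters in total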

4. Draw and explain shallow and deep artificial neural networks.


Shallow vs. Deep Artificial Neural Networks:
 Shallow: Single hidden layer, simpler tasks.
 Deep: Multiple hidden layers, solves complex tasks (e.g., image processing).
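A minimal sketch of both, assuming Keras Sequential models with hypothetical layer sizes:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

shallow = Sequential([Dense(8, activation='relu', input_shape=(10,)),   # single hidden layer
                      Dense(1, activation='sigmoid')])

deep = Sequential([Dense(64, activation='relu', input_shape=(10,)),     # several hidden layers
                   Dense(32, activation='relu'),
                   Dense(16, activation='relu'),
                   Dense(1, activation='sigmoid')])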

5. Explain the different types of activation functions used in a neural network.


Activation Functions:
 Sigmoid: outputs values between 0 and 1; used in binary classification.
 ReLU (Rectified Linear Unit): outputs max(0, x); helps mitigate the vanishing gradient problem.
 Tanh: outputs values between -1 and 1; works well with normalized, zero-centred data.
 Softmax: converts raw outputs into probabilities that sum to 1; used in multi-class classification.
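A small sketch, assuming TensorFlow 2.x, showing each activation applied to the same example values:

import tensorflow as tf

x = tf.constant([-2.0, 0.0, 2.0])
print(tf.nn.sigmoid(x).numpy())   # values squashed into (0, 1)
print(tf.nn.relu(x).numpy())      # negatives clipped to 0
print(tf.nn.tanh(x).numpy())      # values squashed into (-1, 1)
print(tf.nn.softmax(x).numpy())   # values converted to probabilities summing to 1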

6. What is Backward Propagation? How do you update the values of weights and biases in a
neural network?
Backward Propagation and Updating Weights/Bias:
 Concept: Backpropagation computes the gradient of the loss function with respect
to each parameter.
 Steps:
o Compute the forward pass.
o Calculate the error at the output.
o Backpropagate the error through the layers using the chain rule.
o Update each weight and bias against its gradient: w ← w − η·∂L/∂w, b ← b − η·∂L/∂b, where η is the learning rate.

7. Explain the concept of tensors in machine learning and deep learning along with the
code.
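A tensor is an n-dimensional array of numbers and is the basic data structure of deep learning frameworks. A minimal sketch, assuming TensorFlow 2.x, creating tensors of increasing rank:

import tensorflow as tf

scalar = tf.constant(5)                          # 0-D tensor (rank 0)
vector = tf.constant([1, 2, 3])                  # 1-D tensor (rank 1)
matrix = tf.constant([[1, 2], [3, 4]])           # 2-D tensor (rank 2)
cube   = tf.constant([[[1], [2]], [[3], [4]]])   # 3-D tensor (rank 3)
print(scalar.shape, vector.shape, matrix.shape, cube.shape)  # (), (3,), (2, 2), (2, 2, 1)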

8. How do you build a neural network using Keras? Explain it with an example along with the
necessary code.
Steps:
 Define the model (Sequential or Functional).
 Add layers (e.g., Dense for fully connected layers).
 Compile the model (specify optimizer, loss, and metrics).
 Train the model using fit.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Dense(32, activation='relu', input_shape=(10,)),  # hidden layer 1 (expects 10 input features)
    Dense(16, activation='relu'),                     # hidden layer 2
    Dense(1, activation='sigmoid')                    # output layer for binary classification
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10)  # x_train, y_train: training features and labels prepared beforehand

9. Explain TensorFlow on the basis of its computational graph.


TensorFlow and Computational Graphs:
 Concept: Computational graphs represent mathematical operations as nodes and
data as edges.
 Benefits:
o Parallel computation.
o Easy debugging of operations.
o Efficient deployment to multiple devices.
 Example:
Nodes represent operations (e.g., addition), edges carry data (tensors).

10. What is an activation function? Explain the types of activation functions used in a
neural network.
UNIT 2

1. Explain Graph and Session and their relation to neural networks.


 Graph and Session in TensorFlow (1.x):
 Graph: defines the computation, i.e., the operations (such as a network's layers) and the tensors flowing between them, without running anything.
 Session: allocates resources and executes the graph, running the actual training or inference.
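A minimal sketch, assuming the TF 1.x-style workflow run through TensorFlow 2.x's compatibility module (tf.compat.v1); the graph defines the computation and the session runs it:

import tensorflow as tf
tf.compat.v1.disable_eager_execution()   # use graph mode, as in TF 1.x

g = tf.Graph()
with g.as_default():                     # the graph defines the computation
    a = tf.constant(2.0)
    b = tf.constant(3.0)
    c = a * b
with tf.compat.v1.Session(graph=g) as sess:   # the session executes it
    print(sess.run(c))                   # 6.0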

2. What are Tensors? Explain their different dimensions with examples.


 Tensors and Dimensions:
 Scalar: 0-D tensor (e.g., 5).
 Vector: 1-D tensor (e.g., [1, 2, 3]).
 Matrix: 2-D tensor (e.g., [[1, 2], [3, 4]]).
 Higher dimensions: 3-D and beyond (e.g., a batch of RGB images forms a 4-D tensor).

3. What is an activation function? Explain the types of activation functions used in a
neural network.

4. Explain the concept of optimization in a neural network. Explain any three differences
between batch gradient descent and stochastic gradient descent.

 Optimization in Neural Networks:
 Goal: minimize the loss function by iteratively adjusting the weights and biases.
 Gradient Descent Approaches (see the sketch after this list):
o Batch: computes the gradient on the entire dataset per update; stable but slow and memory-hungry.
o Stochastic: computes the gradient on a single sample per update; fast but noisy.
o Mini-batch: computes the gradient on a small subset of data; balances speed and stability.
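A minimal sketch, assuming Keras and randomly generated dummy data, showing how the batch_size argument of model.fit selects between the three approaches:

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

x_train = np.random.rand(100, 10)             # hypothetical data: 100 samples, 10 features
y_train = np.random.randint(0, 2, (100, 1))   # hypothetical binary labels
model = Sequential([Dense(8, activation='relu', input_shape=(10,)),
                    Dense(1, activation='sigmoid')])
model.compile(optimizer='sgd', loss='binary_crossentropy')

model.fit(x_train, y_train, batch_size=len(x_train), epochs=5)  # batch gradient descent
model.fit(x_train, y_train, batch_size=1, epochs=5)             # stochastic gradient descent
model.fit(x_train, y_train, batch_size=32, epochs=5)            # mini-batch gradient descent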

5. What are a shallow feed-forward network and a deep feed-forward neural network?
6. Explain gradient descent. How do you train a neural network using forward propagation
and backward propagation?
 Training a Neural Network:
 Forward Propagation: compute predictions from the current weights.
 Backward Propagation: compute gradients of the loss and update the weights (see the sketch below).
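A minimal sketch of the idea, assuming a toy one-parameter loss L(w) = (w - 3)^2; the gradient plays the role of the backward pass and the update rule is plain gradient descent:

w = 0.0            # initial weight
lr = 0.1           # learning rate (eta)
for step in range(25):
    grad = 2 * (w - 3)   # dL/dw for this toy loss
    w = w - lr * grad    # parameter update: w <- w - eta * dL/dw
print(round(w, 3))       # approaches 3, the minimum of the loss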

7. Write down the code to create a sequential model using Keras. What are epochs in
neural network training?

from keras.models import Sequential
from keras.layers import Dense

# Create a sequential model
model = Sequential([
    Dense(32, input_dim=10, activation='relu'),  # input and first hidden layer
    Dense(16, activation='relu'),                # hidden layer
    Dense(1, activation='sigmoid')               # output layer
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()

 Epochs:
o Number of passes over the entire dataset during training.
o Example: Training with 50 epochs means the model iterates through the
dataset 50 times.

8. Explain TensorFlow on the basis of its computational graph.


TensorFlow and Computational Graph
 TensorFlow uses a computational graph to represent mathematical computations as
a directed graph:
o Nodes: Represent operations (e.g., addition, multiplication).
o Edges: Represent data flow (tensors).
 Advantages:
o Parallel computation.
o Efficient optimization through graph-based execution.
Example:
import tensorflow as tf
a = tf.constant(2)
b = tf.constant(3)
c = a + b         # an operation node; in TF 1.x this only builds the graph, in TF 2.x it runs eagerly
print(c.numpy())  # retrieve the result as a NumPy value
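In TensorFlow 2.x, operations run eagerly by default; a computational graph is built explicitly when a function is decorated with tf.function. A minimal sketch, assuming TF 2.x:

import tensorflow as tf

@tf.function
def affine(x, w, b):
    return x * w + b        # these ops become graph nodes; tensors flow along the edges

print(affine(tf.constant(2.0), tf.constant(3.0), tf.constant(1.0)).numpy())  # 7.0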

9. Explain the types of data layers used when building a data pipeline in TensorFlow.


Data Layers in TensorFlow Pipelines
 Types:
1. Input Layer: Ingests raw data (e.g., CSV, images).
2. Transformation Layer: Preprocesses data (e.g., normalization,
augmentation).
3. Batching Layer: Organizes data into mini-batches for training.
4. Prefetching Layer: Optimizes input pipeline speed by fetching data
asynchronously.
import tensorflow as tf
dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4, 5])
dataset = dataset.map(lambda x: x * 2).batch(2).prefetch(1)
for batch in dataset:
    print(batch)

10. What are constants, variables, and placeholders in TensorFlow 1.0? Explain in detail
with an example.

 Constants, Variables, Placeholders in TensorFlow 1.0:


 Constant: immutable value fixed when the graph is defined.
 Variable: mutable value that can change during training (e.g., weights and biases).
 Placeholder: an empty slot for input data that is fed in at run time.
import tensorflow as tf         # TensorFlow 1.x
a = tf.constant(5)              # constant: immutable value
b = tf.Variable(10)             # variable: updated during training
c = tf.placeholder(tf.float32)  # placeholder: fed at run time (TF 1.x only)
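A minimal sketch of how these are used together, assuming the TF 1.x API accessed through TensorFlow 2.x's compatibility module (tf.compat.v1):

import tensorflow as tf
tf.compat.v1.disable_eager_execution()

a = tf.constant(5.0)                        # constant: fixed value
b = tf.Variable(10.0)                       # variable: updated during training
c = tf.compat.v1.placeholder(tf.float32)    # placeholder: filled in at run time

out = a + b + c
with tf.compat.v1.Session() as sess:
    sess.run(tf.compat.v1.global_variables_initializer())
    print(sess.run(out, feed_dict={c: 2.0}))   # feed_dict supplies the placeholder value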

UNIT 3

1. What is Backward Propagation? Write down all the steps involved in updating the
values of weights and biases using Gradient Descent, along with the equations used in
backward propagation.
Backward Propagation
Backward propagation (backprop) adjusts the weights and biases of a neural
network to minimize the loss function.
 Steps Involved:
1. Forward Pass: Compute the predicted output.
2. Compute Loss: Measure the difference between the predicted and actual
values using a loss function.
3. Backward Pass: Calculate gradients of the loss with respect to weights and
biases using the chain rule.
4. Update Parameters: update each weight and bias by stepping in the direction
opposite to its gradient:
w ← w − η·(∂L/∂w),  b ← b − η·(∂L/∂b)
where η is the learning rate.
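A minimal numerical sketch, assuming a single sigmoid neuron, one training example, and a squared-error loss; the forward pass, the chain rule, and the update equations above appear as individual lines:

import numpy as np

x, y = 1.5, 0.0          # input and target (hypothetical values)
w, b = 0.8, 0.1          # initial weight and bias
eta = 0.5                # learning rate

for step in range(100):
    z = w * x + b                        # forward pass: pre-activation
    y_hat = 1 / (1 + np.exp(-z))         # forward pass: sigmoid activation
    loss = 0.5 * (y_hat - y) ** 2        # compute loss
    dz = (y_hat - y) * y_hat * (1 - y_hat)   # chain rule: dL/dz
    dw, db = dz * x, dz                  # gradients w.r.t. weight and bias
    w, b = w - eta * dw, b - eta * db    # update: w <- w - eta * dL/dw
print(loss)                              # loss shrinks toward 0 as w and b are updated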

2. Write a short note on Gradient Descent.


Gradient Descent
Gradient Descent is an optimization method for minimizing a loss function by
iteratively adjusting parameters.
 Types:
o Batch Gradient Descent: Processes the entire dataset in one iteration; slower
but stable.
o Stochastic Gradient Descent (SGD): Uses one sample at a time; faster but
noisy.
o Mini-batch Gradient Descent: Processes small batches; balances speed and
stability.

3. How do you declare a variable and a placeholder in TensorFlow 1.0? Write down the
necessary code.
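 Variable: created with tf.Variable(initial_value); its value can be updated during training (e.g., weights and biases).
 Placeholder: created with tf.placeholder(dtype, shape); it holds no data until values are fed in at run time through feed_dict.
A minimal sketch, assuming TensorFlow 1.x and hypothetical shapes:
import tensorflow as tf
W = tf.Variable(tf.zeros([3, 1]))           # variable: trainable weights, initialised to zeros
x = tf.placeholder(tf.float32, [None, 3])   # placeholder: input features supplied via feed_dict
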
4. Explain the concept of a saddle point and how to overcome it.

Saddle Point
A saddle point occurs where the gradient is zero but the point is neither a local
minimum nor a local maximum; on a 3D loss surface it looks like a mountain ridge,
curving up in one direction and down in another.
 Challenges: The network can get stuck at saddle points, especially in high-
dimensional spaces.
 Solutions:
o Use adaptive optimization algorithms (e.g., Adam or RMSProp).
o Add momentum to help escape flat regions.

5. Explain the concept of feature engineering. Why do we use dropout layers in a neural
network?

Feature Engineering & Dropout Layers
 Feature Engineering:
Transforming raw data into meaningful features to improve model accuracy.
Techniques include normalization, one-hot encoding, and feature selection.
 Dropout Layers:
A regularization technique in which randomly chosen neurons are deactivated during
training to prevent overfitting. This encourages the network to develop redundant
pathways, improving generalization (see the sketch below).
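A minimal sketch, assuming a Keras Sequential model with hypothetical layer sizes; Dropout(0.5) deactivates half of the previous layer's neurons at each training step and is switched off automatically at inference time:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

model = Sequential([
    Dense(64, activation='relu', input_shape=(20,)),
    Dropout(0.5),                     # regularization: deactivates 50% of neurons per step
    Dense(1, activation='sigmoid')
])
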
6. Explain gradient descent. How do you train a neural network using forward propagation
and backward propagation?

7. Explain Information Theory and its relevance in Artificial Intelligence?


8. Explain Loss functions in Deep Learning?

Loss functions in deep learning are mathematical functions used to measure the
difference between the predicted output of the model and the actual target value.
The goal is to minimize the loss function during training so that the model's
predictions become more accurate over time.
Common Types of Loss Functions:
1. Mean Squared Error (MSE): average of squared differences, (1/n) Σ (y_i − ŷ_i)²; used for regression.
2. Cross-Entropy Loss (Log Loss): −Σ y_i log(ŷ_i); used for classification.
3. Hinge Loss: max(0, 1 − y·ŷ); used for maximum-margin classifiers such as SVMs.
4. Huber Loss: quadratic for small errors and linear for large ones; robust to outliers in regression.
5. Kullback-Leibler Divergence (KL Divergence): Σ p(x) log(p(x)/q(x)); measures how one probability distribution differs from another.
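A small sketch, assuming TensorFlow/Keras built-in loss classes and two hypothetical prediction vectors, showing how a few of these losses are evaluated:

import tensorflow as tf

y_true = tf.constant([[1.0], [0.0]])
y_pred = tf.constant([[0.9], [0.2]])

print(tf.keras.losses.MeanSquaredError()(y_true, y_pred).numpy())   # MSE
print(tf.keras.losses.BinaryCrossentropy()(y_true, y_pred).numpy()) # cross-entropy (log loss)
print(tf.keras.losses.Huber()(y_true, y_pred).numpy())              # Huber loss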

9. Explain in detail the Gradient Descent algorithm. How is batch gradient descent
different from stochastic and mini-batch gradient descent?

Gradient Descent is an optimization algorithm used to minimize the loss function of
a machine learning model. It works by iteratively adjusting the model parameters
(weights) in the direction opposite to the gradient of the loss function with respect
to the parameters. This helps to reduce the loss function value and ultimately find
the optimal model parameters.
Types of Gradient Descent:
1. Batch Gradient Descent (BGD):
o BGD computes the gradient using the entire dataset for each iteration.
o It guarantees convergence to the global minimum for convex functions.
o Pros: Converges smoothly and is deterministic.
o Cons: Computationally expensive and slow for large datasets.
2. Stochastic Gradient Descent (SGD):
o SGD computes the gradient using only a single data point per iteration.
o Pros: Faster and less memory-intensive compared to BGD.
o Cons: The updates are noisy, which can make the convergence process more
erratic.
3. Mini-Batch Gradient Descent:
o This is a compromise between BGD and SGD, where the gradient is computed
on a small random subset (mini-batch) of the dataset.
o Pros: Faster than BGD and less noisy than SGD. It also benefits from parallel
computation.
o Cons: It can still be computationally demanding for very large datasets, but it
strikes a balance between efficiency and convergence.
In summary:
 BGD uses the full dataset, which is slow but precise.
 SGD uses one data point, making it fast but noisy.
 Mini-Batch uses a small subset of the dataset, balancing speed and stability.

UNIT 4

1. What is Momentum Gradient Descent? What are some algorithms that have been
developed using the idea of different learning rates for different weights?

 Momentum Gradient Descent: extends plain gradient descent with a velocity term, adding a fraction of the previous update to the current step; this speeds up convergence and damps oscillations.
Adaptive Learning Rate Algorithms (different learning rates for different weights; see the sketch after this list):


 Adagrad: Adapts the learning rate based on the historical sum of squared gradients.
It works well for sparse data but can decrease learning rates too much over time.
 Adadelta: A modification of Adagrad that aims to overcome its rapidly decreasing
learning rate by using a moving window of past gradients instead of the cumulative
sum.
 Adam (Adaptive Moment Estimation): Combines ideas from momentum and
Adagrad, using both first-order (mean) and second-order (variance) moments to
adapt learning rates for each parameter.
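A minimal sketch, assuming Keras optimizer classes; any of these objects can be passed to model.compile(optimizer=...):

from tensorflow.keras import optimizers

sgd_momentum = optimizers.SGD(learning_rate=0.01, momentum=0.9)  # momentum gradient descent
adagrad = optimizers.Adagrad(learning_rate=0.01)                 # per-parameter adaptive rate
adadelta = optimizers.Adadelta()                                 # moving window of past gradients
adam = optimizers.Adam(learning_rate=0.001)                      # momentum + adaptive rates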

2. Explain the complete layered structure of the Keras API.

Layered Structure of Keras API


 Keras is a high-level neural network API built on top of TensorFlow, designed for fast
experimentation.
 Layers: Keras has various layer types like Dense, Conv2D, LSTM, etc., each
representing a different building block of a neural network.
o Input Layer: The entry point where the data is fed into the network.
o Hidden Layers: Layers between input and output that perform computations.
o Output Layer: Produces the final predictions.
 Models: Keras supports two main models:
o Sequential Model: Layers are stacked on top of each other in a linear fashion.
o Functional API: Allows for more complex architectures, including multi-input,
multi-output, and shared layers.
 Optimizers: Keras provides various optimizers (e.g., SGD, Adam, RMSprop) to
minimize the loss function.
 Loss Functions & Metrics: Predefined functions to evaluate model performance
(e.g., categorical_crossentropy, mean_squared_error).
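A minimal sketch of the Functional API, assuming hypothetical layer sizes; layers are called like functions on tensors, which is what makes multi-input and multi-output architectures possible:

from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Dense

inputs = Input(shape=(10,))                   # input layer
h = Dense(32, activation='relu')(inputs)      # hidden layer
outputs = Dense(1, activation='sigmoid')(h)   # output layer
model = Model(inputs=inputs, outputs=outputs)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])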

3. Explain how backpropagation works in detail.


Backpropagation Process
 Backpropagation is the key algorithm for training neural networks. It adjusts weights
by propagating errors backward from the output layer to the input layer.
1. Forward Pass: Inputs are passed through the network to generate
predictions.
2. Compute Loss: The difference between the predicted output and the actual
target is computed using a loss function.
3. Backward Pass:
 Compute the gradient of the loss function with respect to each weight
using the chain rule.
 Gradients are propagated backward, from the output layer to the
input layer, updating the weights of each layer.
4. Update Weights: Weights are updated using an optimization algorithm (e.g.,
gradient descent) to minimize the loss.

4. Explain Gradient Descent and its challenges in detail?


Gradient Descent and Its Challenges
 Gradient Descent is an iterative optimization algorithm used to minimize the loss
function in machine learning models. The idea is to adjust the parameters (weights)
in the opposite direction of the gradient to find the optimal solution.
 Challenges:
o Local Minima/Plateaus: Gradient descent may converge to a local minimum
instead of the global minimum, especially in non-convex functions.
o Vanishing/Exploding Gradients: In deep networks, gradients can become too
small (vanishing) or too large (exploding), causing training to be slow or
unstable.
o Learning Rate Selection: The choice of learning rate is critical; too high a rate
can cause overshooting, while too low can slow down convergence.
5. What is Gradient Descent? Explain its three approaches.
Gradient Descent and Its Three Approaches
 Gradient Descent: The algorithm updates the model's parameters by moving them
in the opposite direction of the gradient of the loss function.
 Three Approaches:
1. Batch Gradient Descent: Uses the entire dataset to compute the gradient in
each iteration, leading to accurate but slow updates.
2. Stochastic Gradient Descent (SGD): Uses one data point per iteration, leading
to faster but noisier updates.
3. Mini-Batch Gradient Descent: Uses a small subset (mini-batch) of data
points, striking a balance between speed and accuracy.

6. What is Momentum Gradient Descent? What are some key differences between the
Adadelta, Adagrad, and Adam optimizers?
Momentum Gradient Descent and Key Differences Between Optimizers
 Momentum Gradient Descent: Improves the standard gradient descent by adding a
momentum term, helping the optimizer move faster in the right direction and
avoiding oscillations.
 Key Differences Between Adadelta, Adagrad, and Adam:
1. Adagrad: Adapts the learning rate for each parameter based on the sum of
past squared gradients. It can lead to rapid decay of learning rates.
2. Adadelta: Improves upon Adagrad by considering a moving average of
squared gradients instead of the sum, preventing learning rates from
shrinking too quickly.
3. Adam: Combines the benefits of both momentum and adaptive learning
rates, using both first and second moments of the gradient. It is widely used
for a variety of tasks due to its efficiency and fast convergence.
