Deep Learning

deep learning notes

Uploaded by

sindhuja


Assignment 1

1 point
Consider the following table, where $x_1$ and $x_2$ are features and $y$ is a label

Assume that the elements of $w$ are initialized to zero and the perceptron learning algorithm is used to
update the weights $w$. If the learning algorithm runs for enough iterations, then

The algorithm never converges

The algorithm converges (i.e., no further weight updates) after some iterations
The classification error remains greater than zero

The classification error becomes zero eventually

Yes, the answer is correct.
Score: 1
Accepted Answers:
The algorithm converges (i.e., no further weight updates) after some iterations
The classification error becomes zero eventually
1 point
In the perceptron model, the weight vector $w$ is perpendicular to the linear decision boundary at all times.

True

False
Yes, the answer is correct.
Score: 1
Accepted Answers:
True
1 point
What is the perceptron algorithm used for?

Clustering data points

Classifying data
Solving optimization problems

Finding the shortest path in a graph

Yes, the answer is correct.
Score: 1
Accepted Answers:
Classifying data
1 point
Choose the correct input-output pair for the given MP
Neuron. $f(x) = \begin{cases} 1, & \text{if } x_1 + x_2 + x_3 > 2 \\ 0, & \text{otherwise} \end{cases}$

$y = 1$ for $(x_1, x_2, x_3) = (0, 1, 1)$

$y = 0$ for $(x_1, x_2, x_3) = (0, 0, 1)$

$y = 1$ for $(x_1, x_2, x_3) = (0, 0, 0)$

$y = 1$ for $(x_1, x_2, x_3) = (1, 1, 1)$

$y = 0$ for $(x_1, x_2, x_3) = (1, 0, 1)$

Yes, the answer is correct.
Score: 1
Accepted Answers:
$y = 0$ for $(x_1, x_2, x_3) = (0, 0, 1)$
$y = 1$ for $(x_1, x_2, x_3) = (1, 1, 1)$
$y = 0$ for $(x_1, x_2, x_3) = (1, 0, 1)$
1 point
Which of the following Boolean functions can be implemented using a perceptron?

NOR
NAND
NOT

XOR
Yes, the answer is correct.
Score: 1
Accepted Answers:
NOR
NAND
NOT
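The accepted answer can be checked with a small sketch. The weights and biases below are hand-picked for illustration (they are not from the assignment, and many other choices work), and the unit is assumed to fire when $w^T x + b > 0$. The brute-force search over a coarse grid is only a sanity check that no single perceptron reproduces XOR; it is not a proof.

```python
from itertools import product

def perceptron(weights, bias, inputs):
    """Threshold unit: returns 1 if w.x + b > 0, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

# Hypothetical, hand-picked parameters for each gate.
GATES = {
    "NOR":  ([-1, -1], 0.5),
    "NAND": ([-1, -1], 1.5),
    "NOT":  ([-1], 0.5),
}

def truth_table(name):
    """Evaluate the named gate on every binary input combination."""
    weights, bias = GATES[name]
    return {x: perceptron(weights, bias, x)
            for x in product([0, 1], repeat=len(weights))}

def xor_is_representable(grid):
    """Search a grid of (w1, w2, b) values for a perceptron computing XOR."""
    target = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
    for w1, w2, b in product(grid, repeat=3):
        if all(perceptron([w1, w2], b, x) == y for x, y in target.items()):
            return True
    return False

if __name__ == "__main__":
    for gate in GATES:
        print(gate, truth_table(gate))
    print("XOR found on grid:", xor_is_representable([v / 2 for v in range(-6, 7)]))
```

XOR fails because its positive and negative points are not linearly separable, so no choice of weights and bias can satisfy all four constraints at once.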
1 point
Which of the following threshold values of MP neuron implements AND Boolean function? Assume that the
number of inputs to the neuron is 7 and the neuron does not have any inhibitory inputs.

8
Yes, the answer is correct.
Score: 1
Accepted Answers:
7
1 point
Suppose we have a boolean function that takes 4 inputs $x_1, x_2, x_3, x_4$. We have an MP
neuron with parameter $\theta = 3$. For how many inputs will this MP neuron give output $y = 1$?

16
Yes, the answer is correct.
Score: 1
Accepted Answers:
5
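The count can be verified by enumeration. This is a minimal sketch that assumes the usual MP neuron convention (no inhibitory inputs, output 1 when the sum of the binary inputs is at least $\theta$); the function names are illustrative.

```python
from itertools import product

def mp_neuron(inputs, theta):
    """McCulloch-Pitts neuron without inhibitory inputs: fires when sum(inputs) >= theta."""
    return 1 if sum(inputs) >= theta else 0

def count_firing_inputs(n_inputs, theta):
    """Number of binary input vectors for which the neuron outputs 1."""
    return sum(mp_neuron(x, theta) for x in product([0, 1], repeat=n_inputs))

if __name__ == "__main__":
    print(count_firing_inputs(4, 3))  # vectors with at least three ones
    print(count_firing_inputs(7, 7))  # AND over 7 inputs: only the all-ones vector
```

With 4 inputs and $\theta = 3$, the neuron fires for the $\binom{4}{3} + \binom{4}{4} = 5$ vectors that contain at least three ones.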
1 point
Consider points shown in the picture. The vector $w = [-1, -1]^T$. As per this weight vector, the
Perceptron algorithm will predict which classes for the data points $x_1$ and $x_2$.

NOTE:
$y = \begin{cases} 1, & \text{if } w^T x > 0 \\ -1, & \text{if } w^T x \le 0 \end{cases}$

$x_1 = -1$

$x_1 = 1$
$x_2 = -1$

$x_2 = 1$
Yes, the answer is correct.
Score: 1
Accepted Answers:
$x_1 = -1$
$x_2 = 1$
1 point
Consider the following table, where $x_1$ and $x_2$ are features (packed into a single
vector $x = [x_1, x_2]^T$) and $y$ is a label:

Suppose that the perceptron model is used to classify the data points. Suppose further that the
weights $w$ are initialized to $w = [1, 1]^T$. The following rule is used for classification,

$y = \begin{cases} 1, & \text{if } w^T x > 0 \\ 0, & \text{if } w^T x \le 0 \end{cases}$

The perceptron learning algorithm is used to update the weight vector $w$. Then, how many times will the
weight vector $w$ get updated during the entire training process?

2
1
0

Not possible to determine

Yes, the answer is correct.
Score: 1
Accepted Answers:
0
1 point
Which Boolean function with two inputs $x_1$ and $x_2$ is represented by the following decision
boundary? (Points on the boundary or to the right of the decision boundary are classified as 1.)
How many boolean functions can be designed for 3 inputs?

8
16
256

64
Yes, the answer is correct.
Score: 1
Accepted Answers:
256
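The answer follows from counting truth tables: $n$ binary inputs give $2^n$ rows, and each row can map to 0 or 1 independently, so there are $2^{2^n}$ functions. A small sketch (the function name is illustrative):

```python
def num_boolean_functions(n_inputs):
    """Count the distinct Boolean functions of n binary inputs."""
    rows = 2 ** n_inputs   # number of rows in the truth table
    return 2 ** rows       # each row independently maps to 0 or 1

if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print(n, num_boolean_functions(n))
```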
1 point
Which of the following statements is(are) true about the following function?
$\sigma(z) = \frac{1}{1 + e^{-z}}$

The function is bounded between 0 and 1

The function attains its maximum when $z \to \infty$

The function is continuously differentiable

The function is monotonic

Yes, the answer is correct.
Score: 1
Accepted Answers:
The function is bounded between 0 and 1
The function attains its maximum when $z \to \infty$
The function is continuously differentiable
The function is monotonic
1 point
You are training a model using the gradient descent algorithm and notice that the loss decreases and then
increases after each successive epoch (pass through the data). Which of the following techniques would
you employ to enhance the likelihood of the gradient descent algorithm converging? (Here, $\eta$ refers to the
step size.)

Decrease the value of $\eta$

Increase the value of $\eta$

Set $\eta = 1$

Set $\eta = 0$

Yes, the answer is correct.

Score: 1
Accepted Answers:
Decrease the value of $\eta$
1 point
Which of the following statements is true about the representation power of a multilayer network of
perceptrons?

A multilayer network of perceptrons can represent any function.

A multilayer network of perceptrons can represent any linear function.
A multilayer network of perceptrons can represent any boolean function.

A multilayer network of perceptrons can represent any continuous function.

No, the answer is incorrect.
Score: 0
Accepted Answers:
A multilayer network of perceptrons can represent any boolean function.
1 point
How many boolean functions can be designed for 4 inputs?

65,536
8
256

64
Yes, the answer is correct.
Score: 1
Accepted Answers:
65,536
1 point
We have a function that we want to approximate using 150 rectangles (towers). How many neurons are
required to construct the required network?

301
451
150

500
No, the answer is incorrect.
Score: 0
Accepted Answers:
301
1 point
What happens to the output of the sigmoid function as $|x|$ becomes very large for input $x$? Select all
relevant options

The output approaches 0.5

The output approaches 1.
The output oscillates between 0 and 1.

The output approaches 0.

Yes, the answer is correct.
Score: 1
Accepted Answers:
The output approaches 1.
The output approaches 0.
1 point
We have a classification problem with labels 0 and 1. We train a logistic model and find out
that $\omega_0$ learned by our model is -17. We are to predict the label of a new test point $x$ using this trained
model. If $\omega^T x = 1$, which of the following statements is True?

We cannot make any prediction as the value of $\omega^T x$ does not make sense
The label of the test point is 0.
The label of the test point is 1.

We cannot make any prediction as we do not know the value of $x$.

Yes, the answer is correct.
Score: 1
Accepted Answers:
The label of the test point is 0.
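One way to read the accepted answer is sketched below. It assumes the bias $\omega_0$ is added to $\omega^T x$ before the sigmoid and that the label is 1 when the predicted probability is at least 0.5; both conventions are assumptions, since the notes do not state them.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_label(w0, wtx, threshold=0.5):
    """Assumes the bias w0 is added to w^T x before applying the sigmoid."""
    p = sigmoid(w0 + wtx)
    return (1 if p >= threshold else 0), p

if __name__ == "__main__":
    label, prob = predict_label(-17, 1)
    print(label, prob)
```

With $\omega_0 = -17$ and $\omega^T x = 1$ the pre-activation is $-16$, so the predicted probability is far below 0.5 and the label is 0.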
1 point
Suppose we have a function $f(x_1, x_2) = x_1^2 + 3x_2 + 25$ which we want to minimize
using the gradient descent algorithm. We initialize $(x_1, x_2) = (0, 0)$. What
will be the value of $x_1$ after ten updates in the gradient descent process? (Let $\eta$ be 1)

0
-3
−4.5

−3
Yes, the answer is correct.
Score: 1
Accepted Answers:
0
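The result can be reproduced with a few lines. The sketch below uses the partial derivatives $\partial f / \partial x_1 = 2x_1$ and $\partial f / \partial x_2 = 3$ for the function in the question; the function names are illustrative.

```python
def gradient(x1, x2):
    """Gradient of f(x1, x2) = x1**2 + 3*x2 + 25."""
    return 2 * x1, 3

def gradient_descent(x1, x2, eta, steps):
    """Apply `steps` plain gradient descent updates with step size eta."""
    for _ in range(steps):
        g1, g2 = gradient(x1, x2)
        x1, x2 = x1 - eta * g1, x2 - eta * g2
    return x1, x2

if __name__ == "__main__":
    print(gradient_descent(0, 0, eta=1, steps=10))
```

Because the gradient with respect to $x_1$ is zero at $x_1 = 0$, that coordinate never moves, while $x_2$ decreases by 3 on every update.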
1 point
What is the purpose of the gradient descent algorithm in machine learning?

To minimize the loss function

To maximize the loss function
To minimize the output function

To maximize the output function

Yes, the answer is correct.
Score: 1
Accepted Answers:
To minimize the loss function

AND
OR
XOR

NAND
Yes, the answer is correct.
Score: 1
Accepted Answers:
OR
1 point
Choose the correct input-output pair for the given MP Neuron.

$y = \begin{cases} 1, & \text{if } x_1 + x_2 + x_3 \ge 2 \\ 0, & \text{otherwise} \end{cases}$

$y = 1$ for $(x_1, x_2, x_3) = (0, 1, 1)$

$y = 0$ for $(x_1, x_2, x_3) = (0, 0, 1)$

$y = 1$ for $(x_1, x_2, x_3) = (1, 1, 1)$

$y = 0$ for $(x_1, x_2, x_3) = (1, 0, 0)$

Yes, the answer is correct.
Score: 1
Accepted Answers:
$y = 1$ for $(x_1, x_2, x_3) = (0, 1, 1)$
$y = 0$ for $(x_1, x_2, x_3) = (0, 0, 1)$
$y = 1$ for $(x_1, x_2, x_3) = (1, 1, 1)$
$y = 0$ for $(x_1, x_2, x_3) = (1, 0, 0)$
1 point
Suppose we have a boolean function that takes 4 inputs $x_1, x_2, x_3, x_4$. We have an MP neuron with
parameter $\theta = 2$. For how many inputs will this MP neuron give output $y = 1$?

11
21
15

8
No, the answer is incorrect.
Score: 0
Accepted Answers:
11
1 point
We are given the following data:

Can you classify every label correctly by training a perceptron algorithm? (assume bias to be 0 while
training)

Yes
No
Yes, the answer is correct.
Score: 1
Accepted Answers:
No
1 point
We are given the following dataset with features $(x_1, x_2)$ and $y$ as the label ($-1$ or $1$). Suppose we apply the
perceptron algorithm on the following dataset with $w$ initialized as $(0, 0)$. What will be the value of $w$ when
the algorithm converges? (Start the algorithm from $(2, 2)$.)

(-2,2)
(2,1)
(2,-1)

None of These
Yes, the answer is correct.
Score: 1
Accepted Answers:
(2,-1)
1 point
Consider points shown in the picture. The vector w is (-1,0). As per this weight vector, the Perceptron
algorithm will predict which classes for the data points x1 and x2.

x1=1
x2=1
x1=-1
x2=-1
Yes, the answer is correct.
Score: 1
Accepted Answers:
x2=1
x1=-1
1 point
Given an MP neuron with the inputs $x_1, x_2, x_3, x_4, x_5$ and threshold $\theta = 3$, where $x_5$ is an inhibitory input.
For input $(1, 1, 1, 0, 1)$, what will be the value of $y$?

$y = 0$

$y = 1$ since $\theta \ge 3$

$y = 1/2$

Insufficient information
Yes, the answer is correct.
Score: 1
Accepted Answers:
$y = 0$
1 point
An MP neuron takes two inputs $x_1$ and $x_2$. Its threshold is $\theta = 0$. Select all the boolean functions this
MP neuron may represent.

AND
NOT
OR

NOR
Yes, the answer is correct.
Score: 1
Accepted Answers:
NOR
1 point
What is the output of a perceptron with weight vector $w = [2, -3, 1]^T$ and bias $b = -2$ when
the input is $x = [1, 0, -1]^T$?

0
1
-1

2
Yes, the answer is correct.
Score: 1
Accepted Answers:
-1
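The arithmetic behind the accepted answer, as a small sketch. It computes the weighted sum $w^T x + b$; the notes do not say whether an activation is applied afterwards, so the sign step below is an assumption (here both give the same value).

```python
def pre_activation(w, x, b):
    """Weighted sum w.x + b of a single perceptron."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def sign_output(z):
    """Assumed +1/-1 output convention: +1 if z > 0, else -1."""
    return 1 if z > 0 else -1

if __name__ == "__main__":
    z = pre_activation([2, -3, 1], [1, 0, -1], -2)
    print(z, sign_output(z))
```

Here $w^T x = 2 \cdot 1 + (-3) \cdot 0 + 1 \cdot (-1) = 1$, and adding the bias $-2$ gives $-1$.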
1 point
What is the "winter of AI" referring to in the history of artificial intelligence?
The period during winter when AI technologies are least effective due to cold temperatures
A phase marked by decreased funding and interest in AI research.
The season when AI algorithms perform at their peak efficiency.

A period characterized by rapid advancements and breakthroughs in AI technologies.

Yes, the answer is correct.
Score: 1
Accepted Answers:
A phase marked by decreased funding and interest in AI research.

Assignment 2

Assignment 4
A team has a data set that contains 1000 samples for training a feed-forward neural network. Suppose they
decided to use stochastic gradient descent algorithm to update the weights. How many times do the
weights get updated after training the network for 5 epochs?

1000
5000
100

5
Yes, the answer is correct.
Score: 1
Accepted Answers:
5000
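The count depends only on how many samples feed each update. A minimal sketch, assuming one weight update per batch (batch size 1 for stochastic gradient descent, the full dataset for batch gradient descent); the function name is illustrative.

```python
import math

def num_updates(n_samples, epochs, batch_size):
    """Number of weight updates, assuming one update per batch."""
    updates_per_epoch = math.ceil(n_samples / batch_size)
    return updates_per_epoch * epochs

if __name__ == "__main__":
    print(num_updates(1000, epochs=5, batch_size=1))      # stochastic gradient descent
    print(num_updates(1000, epochs=5, batch_size=1000))   # batch gradient descent
```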
1 point
What is the primary benefit of using Adagrad compared to other optimization algorithms?

It converges faster than other optimization algorithms.

It is more memory-efficient than other optimization algorithms.
It is less sensitive to the choice of hyperparameters(learning rate).

It is less likely to get stuck in local optima than other optimization algorithms.
Yes, the answer is correct.
Score: 1
Accepted Answers:
It is less sensitive to the choice of hyperparameters(learning rate).
1 point
What are the benefits of using stochastic gradient descent compared to vanilla gradient descent?

SGD converges more quickly than vanilla gradient descent.

SGD is computationally efficient for large datasets.
SGD theoretically guarantees that the descent direction is optimal.
SGD experiences less oscillation compared to vanilla gradient descent.
Yes, the answer is correct.
Score: 1
Accepted Answers:
SGD converges more quickly than vanilla gradient descent.
SGD is computationally efficient for large datasets.
1 point
A team has a data set that contains 100 samples for training a feed-forward neural network. Suppose they
decided to use the gradient descent algorithm to update the weights. Suppose further that they use a line
search algorithm for the learning rate as follows, $\eta = [0.01, 0.1, 1, 2, 10]$. How many
times do the weights get updated after training the network for 10 epochs? (Note, for each weight update
the loss has to decrease)

100
5
500
10

50
Yes, the answer is correct.
Score: 1
Accepted Answers:
10
1 point
Select the true statements about the factor $\beta$ used in the momentum based gradient descent algorithm.

Setting $\beta = 0.1$ allows the algorithm to move faster than the vanilla gradient descent algorithm

Setting $\beta = 0$ makes it equivalent to the vanilla gradient descent algorithm

Setting $\beta = 1$ makes it equivalent to the vanilla gradient descent algorithm

Oscillation around the minimum will be less if we set $\beta = 0.1$ than setting $\beta = 0.99$

Yes, the answer is correct.

Score: 1
Accepted Answers:
Setting $\beta = 0.1$ allows the algorithm to move faster than the vanilla gradient descent algorithm
Setting $\beta = 0$ makes it equivalent to the vanilla gradient descent algorithm
Oscillation around the minimum will be less if we set $\beta = 0.1$ than setting $\beta = 0.99$

1 point
What is the advantage of using mini-batch gradient descent over batch gradient descent?

Mini-batch gradient descent is more computationally efficient than batch gradient descent.
Mini-batch gradient descent leads to a more accurate estimate of the gradient than batch gradient
descent.
Mini batch gradient descent gives us a better solution.

Mini-batch gradient descent can converge faster than batch gradient descent.
Partially Correct.
Score: 0.5
Accepted Answers:
Mini-batch gradient descent is more computationally efficient than batch gradient descent.
Mini-batch gradient descent can converge faster than batch gradient descent.
1 point
We have the following functions: $x^3$, $\ln(x)$, $e^x$, $x$ and 4. Which of the following functions has the
steepest slope at x=1?

$x^3$

$\ln(x)$

$e^x$

4
No, the answer is incorrect.
Score: 0
Accepted Answers:
$x^3$
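The comparison can be checked by evaluating each derivative at $x = 1$. The sketch below hard-codes the analytic derivatives; the dictionary keys are only labels.

```python
import math

# Analytic derivatives of the functions listed in the question.
DERIVATIVES = {
    "x^3":   lambda x: 3 * x ** 2,
    "ln(x)": lambda x: 1 / x,
    "e^x":   lambda x: math.exp(x),
    "x":     lambda x: 1.0,
    "4":     lambda x: 0.0,
}

def steepest_at(x):
    """Return the label with the largest absolute slope at x, plus all slopes."""
    slopes = {name: d(x) for name, d in DERIVATIVES.items()}
    return max(slopes, key=lambda name: abs(slopes[name])), slopes

if __name__ == "__main__":
    name, slopes = steepest_at(1.0)
    print(name, slopes)
```

At $x = 1$ the slopes are 3 for $x^3$, 1 for $\ln(x)$, $e \approx 2.718$ for $e^x$, 1 for $x$ and 0 for the constant 4, so $x^3$ is steepest.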
1 point
Which of the following are among the disadvantages of Adagrad?

It doesn’t work well for the Sparse matrix.

It usually goes past the minima.
It gets stuck before reaching the minima.

Weight updates are very small at the initial stages of the algorithm.
Yes, the answer is correct.
Score: 1
Accepted Answers:
It gets stuck before reaching the minima.
1 point
Which of the following is a variant of gradient descent that uses an estimate of the next gradient to update
the current position of the parameters?

Momentum optimization
Stochastic gradient descent
Nesterov accelerated gradient descent

Adagrad
Yes, the answer is correct.
Score: 1
Accepted Answers:
Nesterov accelerated gradient descent
1 point
What is the role of activation functions in deep learning?

Activation functions transform the output of a neuron into a non-linear function, allowing the network to
learn complex patterns.
Activation functions make the network faster by reducing the number of iterations needed for training.
Activation functions are used to normalize the input data.
Activation functions are used to compute the loss function.
Yes, the answer is correct.
Score: 1
Accepted Answers:
Activation functions transform the output of a neuron into a non-linear function, allowing the network to
learn complex patterns.

Assignment 5
Which of the following is a measure of the amount of variance explained by a principal component in PCA?

Covariance
Correlation
Mean absolute deviation

Eigenvalue
Yes, the answer is correct.
Score: 1
Accepted Answers:
Eigenvalue
1 point
What is/are the limitations of PCA?

It is computationally less efficient than autoencoders

It can only reduce the dimensionality of a dataset by a fixed amount.
It can only identify linear relationships in the data.

It can be sensitive to outliers in the data.

Partially Correct.
Score: 0.5
Accepted Answers:
It can only identify linear relationships in the data.
It can be sensitive to outliers in the data.
1 point
Which of the following is a property of eigenvalues of a symmetric matrix?

Eigenvalues are always positive

Eigenvalues are always negative
Eigenvalues are always real

Eigenvalues can be complex numbers with imaginary parts non-zero

Yes, the answer is correct.
Score: 1
Accepted Answers:
Eigenvalues are always real
1 point
The eigenvalues of $A$ are 3, 4. Which of the following are the eigenvalues of $A^3$?

3, 4
9, 16
27, 64

$\sqrt{3}, \sqrt{4}$
Yes, the answer is correct.
Score: 1
Accepted Answers:
27, 64
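If $Av = \lambda v$, then $A^3 v = \lambda^3 v$, so the eigenvalues are cubed. The sketch below checks this on a hypothetical $2 \times 2$ matrix chosen to have eigenvalues 3 and 4 (the question does not give $A$ itself), using the trace and determinant formula for $2 \times 2$ eigenvalues.

```python
import math

def matmul2(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def eigenvalues2(m):
    """Eigenvalues of a 2x2 matrix with real eigenvalues, via trace and determinant."""
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = math.sqrt(trace * trace - 4 * det)
    return sorted([(trace - disc) / 2, (trace + disc) / 2])

# Hypothetical upper-triangular matrix with eigenvalues 3 and 4 on the diagonal.
A = [[3, 1], [0, 4]]
A3 = matmul2(matmul2(A, A), A)

if __name__ == "__main__":
    print(eigenvalues2(A), eigenvalues2(A3))
```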
1 point
If we have a $12 \times 12$ matrix having entries from $\mathbb{R}$, how many linearly independent eigenvectors
corresponding to real eigenvalues are possible for this matrix?

10
24
12

6
Partially Correct.
Score: 0.33
Accepted Answers:
10
12
6

Questions 6-9 are based on common data.

Consider the following data points $x_1, x_2, x_3$ to answer the following

questions: $x_1 = [2, 2]^T$, $x_2 = [1, 2]^T$, $x_3 = [2, 1]^T$
1 point
What is the mean of the given data points $x_1, x_2, x_3$?

$[5, 5]^T$

$[1.67, 1.67]^T$

$[2, 2]^T$

$[1.5, 1.5]^T$
Yes, the answer is correct.
Score: 1
Accepted Answers:
$[1.67, 1.67]^T$
1 point
The covariance matrix $C = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^T$ is given by: ($\bar{x}$ is the
mean of the data points)

$\begin{bmatrix} 0.22 & -0.11 \\ -0.11 & 0.22 \end{bmatrix}$

$\begin{bmatrix} 0.33 & -0.17 \\ -0.17 & 0.33 \end{bmatrix}$
$\begin{bmatrix} 0.22 & -0.22 \\ -0.22 & 0.22 \end{bmatrix}$

$\begin{bmatrix} 0.33 & -0.33 \\ -0.33 & 0.33 \end{bmatrix}$
Yes, the answer is correct.
Score: 1
Accepted Answers:
$\begin{bmatrix} 0.22 & -0.11 \\ -0.11 & 0.22 \end{bmatrix}$
1 point
The maximum eigenvalue of the covariance matrix $C$ is:

0.33

0.67

0.5
Yes, the answer is correct.
Score: 1
Accepted Answers:
0.33
1 point
The eigenvector corresponding to the maximum eigenvalue of the given matrix $C$ is:

$[0.71, 0.71]^T$

$[-0.71, 0.71]^T$

$[-1, 1]^T$

$[1, 1]^T$
No, the answer is incorrect.
Score: 0
Accepted Answers:
$[-0.71, 0.71]^T$
OR
$[-1, 1]^T$
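The answers to the common-data questions can be reproduced in plain Python. This is a sketch that follows the $1/n$ covariance definition stated in the question and uses the closed-form eigenvalues of a symmetric $2 \times 2$ matrix; the variable names are illustrative.

```python
import math

points = [(2, 2), (1, 2), (2, 1)]
n = len(points)

# Mean of the data points.
mean = tuple(sum(p[i] for p in points) / n for i in range(2))

# C = (1/n) * sum_i (x_i - mean)(x_i - mean)^T
cov = [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points) / n
        for j in range(2)] for i in range(2)]

# Largest eigenvalue of the symmetric 2x2 matrix via trace and determinant.
trace = cov[0][0] + cov[1][1]
det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
lam_max = (trace + math.sqrt(trace ** 2 - 4 * det)) / 2

# Solve (C - lam_max * I) v = 0 using the first row, then normalise.
v = (cov[0][1], lam_max - cov[0][0])
norm = math.hypot(v[0], v[1])
v_unit = (v[0] / norm, v[1] / norm)

if __name__ == "__main__":
    print(mean, cov, lam_max, v_unit)
```

An eigenvector is only defined up to scale and sign, which is why both $[-0.71, 0.71]^T$ and $[-1, 1]^T$ are accepted.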
What is the determinant of a $2 \times 2$ matrix that has eigenvalues of 4 and 5?

Yes, the answer is correct.

Score: 1
Accepted Answers:
(Type: Numeric) 20

Assignment 6
We are given an autoencoder A. The average activation value of neurons in this network is 0.01. The given
autoencoder is
Contractive autoencoder
Overcomplete neural network
Denoising autoencoder

Sparse autoencoder
No, the answer is incorrect.
Score: 0
Accepted Answers:
Sparse autoencoder
1 point
What is the primary reason for adding corruption to the input data in a denoising autoencoder?

To increase the complexity of the model.

To improve the model’s ability to generalize to unseen data.
To reduce the size of the training dataset.

To increase the training time.

Yes, the answer is correct.
Score: 1
Accepted Answers:
To improve the model’s ability to generalize to unseen data.
1 point
What is/are the primary advantages of Autoencoders over PCA?

Autoencoders are less prone to overfitting than PCA.

Autoencoders are faster and more efficient than PCA.
Autoencoders can capture nonlinear relationships in the input data.

Autoencoders require fewer input data than PCA.

Yes, the answer is correct.
Score: 1
Accepted Answers:
Autoencoders can capture nonlinear relationships in the input data.
1 point
What type of autoencoder is it when the hidden layer’s dimensionality is less than that of the input layer?

Under-complete autoencoder
Complete autoencoder
Overcomplete autoencoder

Sparse autoencoder
Yes, the answer is correct.
Score: 1
Accepted Answers:
Under-complete autoencoder
1 point
Which of the following statements about regularization in autoencoders is always true?

Regularisation reduces the search space of weights for the network.

Regularisation helps to reduce the overfitting in overcomplete autoencoders.
Regularisation shrinks the size of weight vectors learned.

All of these.
No, the answer is incorrect.
Score: 0
Accepted Answers:
Regularisation reduces the search space of weights for the network.
Regularisation helps to reduce the overfitting in overcomplete autoencoders.
1 point
What are the advantages of using a denoising autoencoder?

Robustness to noisy input data

Reduction of the risk of overfitting
Faster training time

It promotes sparsity in the hidden layer

Yes, the answer is correct.
Score: 1
Accepted Answers:
Robustness to noisy input data
Reduction of the risk of overfitting
1 point
We are given an autoencoder A. The average activation value of neurons in this network is 0.06. The given
autoencoder is:

Contractive autoencoder
Overcomplete neural network
Sparse autoencoder

Denoising autoencoder
Yes, the answer is correct.
Score: 1
Accepted Answers:
Sparse autoencoder
1 point
If the dimension of the input layer in an over-complete autoencoder is 5, what is the possible dimension of
the hidden layer?

4
2
8

0
Yes, the answer is correct.
Score: 1
Accepted Answers:
8
1 point
What is the primary objective of sparse autoencoders that distinguishes it from vanilla autoencoder?

They learn a low-dimensional representation of the input data

They minimize the reconstruction error between the input and the output
They capture only the important variations/features in the data

They maximize the mutual information between the input and the output
Yes, the answer is correct.
Score: 1
Accepted Answers:
They capture only the important variations/features in the data
1 point
Suppose for one data point we have
features $x_1, x_2, x_3, x_4, x_5$ as $-2, 12, 4.2, 7.6, 0$; then which of the
following functions should we use on the output layer (decoder)?

Logistic
Relu
Tanh

Linear
Yes, the answer is correct.
Score: 1
Accepted Answers:
Linear

Assignment 8
Which of the following activation functions is not zero-centered?

Sigmoid
Tanh
ReLU

Softmax
Partially Correct.
Score: 0.34
Accepted Answers:
Sigmoid
ReLU
Softmax
1 point
Which of the following are common issues caused by saturating neurons in deep networks?

Vanishing gradients
Slow convergence during training
Overfitting

Increased model complexity

No, the answer is incorrect.
Score: 0
Accepted Answers:
Vanishing gradients
Slow convergence during training
1 point
What are the challenges associated with using the Tanh(x) activation function?
It is not zero centered
Computationally expensive
Non-differentiable at 0

Saturation
Yes, the answer is correct.
Score: 1
Accepted Answers:
Computationally expensive
Saturation
1 point
Which of the following activation functions is preferred to avoid the vanishing gradient problem?

Sigmoid
Tanh
ReLU

None of these
Yes, the answer is correct.
Score: 1
Accepted Answers:
ReLU
1 point
How does pre-training prevent overfitting in deep networks?

It adds regularization
It initializes the weights near local minima
It constrains the weights to a certain region

It eliminates the need for fine-tuning

Partially Correct.
Score: 0.5
Accepted Answers:
It adds regularization
It constrains the weights to a certain region
1 point
We train a feed-forward neural network and notice that all the weights for a particular neuron are equal.
What could be the possible causes of this issue?

Weights were initialized randomly

Weights were initialized to high values
Weights were initialized to equal values

Weights were initialized to zero

Partially Correct.
Score: 0.5
Accepted Answers:
Weights were initialized to equal values
Weights were initialized to zero
1 point
Which of the following methods can help to avoid saturation in deep learning?
Using a different activation function.
Increasing the learning rate.
Increasing the model complexity

All of the above.

Yes, the answer is correct.
Score: 1
Accepted Answers:
Using a different activation function.
1 point
Which of the following is an advantage of unsupervised pre-training in deep learning?

It helps in reducing overfitting

Pre-trained models converge faster
It improves the accuracy of the model

It requires fewer computational resources

Partially Correct.
Score: 0.33
Accepted Answers:
It helps in reducing overfitting
Pre-trained models converge faster
It improves the accuracy of the model
1 point
How can you tell if your network is suffering from the Dead ReLU problem?

The loss function is not decreasing during training

The accuracy of the network is not improving
A large number of neurons have zero output

The network is overfitting to the training data

Yes, the answer is correct.
Score: 1
Accepted Answers:
A large number of neurons have zero output
1 point
In Batch Normalization, which parameter is learned during training?

Mean
Variance

$\gamma$

$\epsilon$
Yes, the answer is correct.
Score: 1
Accepted Answers:
γ

Assignment 9
Consider the following corpus: "human machine interface for computer applications. user opinion of
computer system response time. user interface management system. system engineering for improved
response time". What is the size of the vocabulary of the above corpus?

13
14
15

16
No, the answer is incorrect.
Score: 0
Accepted Answers:
15
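The vocabulary size can be checked by tokenising the corpus. The sketch below assumes a simple tokeniser that drops full stops and splits on whitespace, which is enough for this corpus.

```python
corpus = (
    "human machine interface for computer applications. "
    "user opinion of computer system response time. "
    "user interface management system. "
    "system engineering for improved response time"
)

# Assumed tokenisation: remove full stops, split on whitespace.
tokens = corpus.replace(".", " ").split()
vocabulary = sorted(set(tokens))

if __name__ == "__main__":
    print(len(tokens), len(vocabulary))
    print(vocabulary)
```

The corpus has 23 tokens, but repeated words such as "system", "user" and "computer" count once, leaving 15 distinct words.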
1 point
At the input layer of a continuous bag of words model, we multiply a one-hot vector $x \in \mathbb{R}^{|V|}$ with the
parameter matrix $W \in \mathbb{R}^{k \times |V|}$. What does each column of $W$ correspond to?

the representation of the $i$-th word in the vocabulary

the $i$-th eigenvector of the co-occurrence matrix

No, the answer is incorrect.

Score: 0
Accepted Answers:
the representation of the $i$-th word in the vocabulary

Suppose that we use the continuous bag of words (CBOW) model to find vector representations of words.
Suppose further that we use a context window of size 3 (that is, given the 3 context words, predict the
target word $P(w_t \mid (w_i, w_j, w_k))$). The size of word vectors (vector representation of words)
is chosen to be 100 and the vocabulary contains 10,000 words. The input to the network is the one-hot
encoding (also called 1-of-$V$ encoding) of word(s). How many parameters (weights), excluding bias, are
there in $W_{word}$? Enter the answer in thousands. For example, if your answer is 50,000, then just
enter 50.
No, the answer is incorrect.
Score: 0
Accepted Answers:
(Type: Numeric) 1000
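The count follows from the shape of the input matrix: with $k$-dimensional word vectors and a vocabulary of size $|V|$, the matrix has $k \times |V|$ entries. The sketch below also shows, on a tiny made-up matrix, that multiplying by a one-hot vector simply selects one column. All names and the toy values are illustrative.

```python
def w_word_params(embedding_dim, vocab_size):
    """Number of weights in a k x |V| input matrix (no bias)."""
    return embedding_dim * vocab_size

def one_hot(index, size):
    """One-hot vector of the given size with a 1 at `index`."""
    return [1 if i == index else 0 for i in range(size)]

def matvec(W, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

# Toy matrix with k = 2 and |V| = 3 (values are made up).
W_small = [[0.1, 0.2, 0.3],
           [0.4, 0.5, 0.6]]

if __name__ == "__main__":
    print(w_word_params(100, 10_000) // 1000)   # answer expressed in thousands
    print(matvec(W_small, one_hot(1, 3)))       # picks out the second column
```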
1 point
1 point
Let $\text{count}(w, c)$ be the number of times the words $w$ and $c$ appear together in the corpus
(i.e., occur within a window of few words around each other). Further,
let $\text{count}(w)$ and $\text{count}(c)$ be the total number of times the words $w$ and $c$ appear
in the corpus respectively and let $N$ be the total number of words in the corpus. The PMI
between $w$ and $c$ is then given by:

$\log \frac{\text{count}(w, c) \cdot \text{count}(w)}{N \cdot \text{count}(c)}$

$\log \frac{\text{count}(w, c) \cdot \text{count}(c)}{N \cdot \text{count}(w)}$

$\log \frac{\text{count}(w, c) \cdot N}{\text{count}(w) \cdot \text{count}(c)}$
No, the answer is incorrect.
Score: 0
Accepted Answers:
$\log \frac{\text{count}(w, c) \cdot N}{\text{count}(w) \cdot \text{count}(c)}$
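The accepted formula translates directly into code. This is a minimal sketch using the natural logarithm (the question does not fix the base), and the counts in the usage example are made up for illustration.

```python
import math

def pmi(count_wc, count_w, count_c, n_total):
    """PMI(w, c) = log( count(w, c) * N / (count(w) * count(c)) )."""
    return math.log((count_wc * n_total) / (count_w * count_c))

if __name__ == "__main__":
    # Hypothetical counts: the pair co-occurs 10 times in a 5000-word corpus.
    print(pmi(count_wc=10, count_w=100, count_c=50, n_total=5000))
```

A positive value means the two words co-occur more often than they would if they were independent; a pair with zero co-occurrences has no finite PMI, which this sketch does not handle.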
1 point
Suppose we are learning the representations of words using GloVe representations. If we observe that the
cosine similarity between two representations $v_i$ and $v_j$ for words $i$ and $j$ is very high, which of the
following statements is true? (parameters $b_i = 0.02$ and $b_j = 0.07$)

$X_{ij} = 0.02$

$X_{ij} = 0.2$

$X_{ij} = 0.88$

$X_{ij} = 0$
No, the answer is incorrect.
Score: 0
Accepted Answers:
$X_{ij} = 0.88$
1 point
Which of the following is a disadvantage of one hot encoding?

It requires a large amount of memory to store the vectors

It can result in a high-dimensional sparse representation
It cannot capture the semantic similarity between words

All of the above

No, the answer is incorrect.
Score: 0
Accepted Answers:
All of the above
1 point
Which of the following is true about the input representation in the CBOW model?

Each word is represented as a one-hot vector

Each word is represented as a continuous vector
Each word is represented as a sequence of one-hot vectors

Each word is represented as a sequence of continuous vectors

No, the answer is incorrect.
Score: 0
Accepted Answers:
Each word is represented as a one-hot vector
1 point
What is the role of the softmax function in the skip-gram method?

To calculate the dot product between the target word and the context words
To transform the dot product into a probability distribution
To calculate the distance between the target word and the context words

To adjust the weights of the neural network during training

No, the answer is incorrect.
Score: 0
Accepted Answers:
To transform the dot product into a probability distribution
1 point
What is the computational complexity of computing the softmax function in the output layer of a neural
network?

$O(n)$

$O(n^2)$

$O(n \log n)$

$O(\log n)$
No, the answer is incorrect.
Score: 0
Accepted Answers:
$O(n)$
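The linear cost is visible in a direct implementation: every step is a single pass over the $n$ output scores. The sketch below subtracts the maximum score before exponentiating, a common numerical-stability step that is not part of the question.

```python
import math

def softmax(scores):
    """Softmax over a list of n scores using a constant number of O(n) passes."""
    m = max(scores)                              # pass 1: find the maximum
    exps = [math.exp(s - m) for s in scores]     # pass 2: shifted exponentials
    total = sum(exps)                            # pass 3: normaliser
    return [e / total for e in exps]             # pass 4: divide

if __name__ == "__main__":
    print(softmax([1.0, 2.0, 3.0]))
```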
1 point
How does Hierarchical Softmax reduce the computational complexity of computing the softmax function?

It replaces the softmax function with a linear function

It uses a binary tree to approximate the softmax function
It uses a heuristic to compute the softmax function faster

It does not reduce the computational complexity of computing the softmax function
No, the answer is incorrect.
Score: 0
Accepted Answers:
It uses a binary tree to approximate the softmax function

Assignment 10
Consider an input image of size $1000 \times 1000 \times 10$ where 10 refers to the number of
channels (Such images do exist!). Suppose we want to apply a convolution operation on the entire image
by sliding a kernel of size $1 \times 1 \times d$. What should be the depth $d$ of the kernel?

Yes, the answer is correct.

Score: 1
Accepted Answers:
(Type: Numeric) 10
1 point
1 point
For the same input image in Q1, suppose that we apply the following kernels of differing sizes.

$K_1: 3 \times 3$
$K_2: 7 \times 7$
$K_3: 17 \times 17$
$K_4: 41 \times 41$

Assume that stride $s = 1$ and no zero padding. Among all these kernels which one shrinks the output
dimensions the most?
$K_1$

$K_2$

$K_3$

$K_4$
Yes, the answer is correct.
Score: 1
Accepted Answers:
K4
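The accepted answer can be checked with the usual output-size formula for a convolution along one spatial axis, floor((n + 2p - k) / s) + 1. The helper name `conv_output_size` below is only illustrative.

```python
def conv_output_size(n, k, padding=0, stride=1):
    # Number of valid kernel positions along one spatial axis.
    return (n + 2 * padding - k) // stride + 1

kernels = {"K1": 3, "K2": 7, "K3": 17, "K4": 41}
sizes = {name: conv_output_size(1000, k) for name, k in kernels.items()}
# sizes == {"K1": 998, "K2": 994, "K3": 984, "K4": 960}

smallest = min(sizes, key=sizes.get)  # kernel that shrinks the output the most
```

With stride 1 and no padding the output side is n - k + 1, so the largest kernel loses the most border pixels.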
1 point
Which of the following statements about CNN is (are) true?

CNN is a feed-forward network

Weight sharing helps CNN layers to reduce the number of parameters
CNN is suitable only for natural images

The shape of the input to the CNN network should be square

Yes, the answer is correct.
Score: 1
Accepted Answers:
CNN is a feed-forward network
Weight sharing helps CNN layers to reduce the number of parameters

Consider an input image of size 100×100×1. Suppose that we use a kernel of
size 3×3, zero padding P = 1 and stride S = 3. What will be the output dimension?

No, the answer is incorrect.

Score: 0
Accepted Answers:
(Type: Numeric) 34
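A quick way to confirm the value 34 is to compare the closed-form formula with a direct count of the positions where the kernel fits on the padded axis. Both helper names below are assumptions for this sketch.

```python
def conv_output_size(n, k, padding, stride):
    # Closed form: floor((n + 2p - k) / s) + 1
    return (n + 2 * padding - k) // stride + 1

def count_kernel_positions(n, k, padding, stride):
    # Direct count: every start index where the kernel still fits
    # inside the zero-padded axis of length n + 2p.
    padded = n + 2 * padding
    return len(range(0, padded - k + 1, stride))

out = conv_output_size(100, 3, padding=1, stride=3)           # 34
check = count_kernel_positions(100, 3, padding=1, stride=3)   # 34
```

The padded axis has length 102, the kernel can start at 0, 3, ..., 99, and that gives 34 positions per axis.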
0 points
1 point
Consider an input image of size 100×100×3. Suppose that we use 10 kernels (filters) each
of size 1×1, zero padding P = 1 and stride S = 2. How many parameters are there?
(assume no bias terms)

5
10
15

30
Yes, the answer is correct.
Score: 1
Accepted Answers:
30
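The count depends only on the kernel shape, the number of input channels, and the number of filters; padding and stride change the output size but not the number of weights. A small sketch, with an illustrative helper name:

```python
def conv_param_count(num_filters, kernel_h, kernel_w, in_channels, bias=False):
    # Each filter spans the full input depth: kernel_h * kernel_w * in_channels weights.
    per_filter = kernel_h * kernel_w * in_channels
    if bias:
        per_filter += 1  # one bias per filter
    return num_filters * per_filter

params = conv_param_count(num_filters=10, kernel_h=1, kernel_w=1, in_channels=3)  # 30
```

Each 1×1 filter implicitly has depth 3 to match the input channels, which gives 10 × (1 × 1 × 3) = 30.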
1 point
Which statement is true about the size of filters in CNNs?

The size of the filter does not affect the features it captures.
The size of the filter only affects the computation time.
Larger filters capture more global features.

Smaller filters capture more local features.

Yes, the answer is correct.
Score: 1
Accepted Answers:
Larger filters capture more global features.
Smaller filters capture more local features.
1 point
What is the motivation behind using multiple filters in one Convolution layer?

Reduced complexity of the network

Reduced size of the convolved image
Insufficient information

Each filter captures some feature of the image separately

Yes, the answer is correct.
Score: 1
Accepted Answers:
Each filter captures some feature of the image separately
1 point
Which of the following architectures has the highest number of layers?

AlexNet
GoogleNet
ResNet

VGG
Yes, the answer is correct.
Score: 1
Accepted Answers:
ResNet
1 point
What is the purpose of guided backpropagation in CNNs?

To train the CNN to improve its accuracy on a given task.

To reduce the size of the input images in order to speed up computation.
To visualize which pixels in an image are most important for a particular class prediction.

None of the above.

Yes, the answer is correct.
Score: 1
Accepted Answers:
To visualize which pixels in an image are most important for a particular class prediction.
1 point
Which of the following statements is true regarding the occlusion experiment in a CNN?

It is a technique used to prevent overfitting in deep learning models.

It is used to increase the number of filters in a convolutional layer.
It is used to determine the importance of each feature map in the output of the network.
It involves masking a portion of the input image with a patch of zeroes.
Partially Correct.
Score: 0.5
Accepted Answers:
It is used to determine the importance of each feature map in the output of the network.
It involves masking a portion of the input image with a patch of zeroes.

Assignment 11
Suppose that we need to develop an RNN model for sentiment classification. The input to the model is a
sentence composed of five words and the output is the sentiment (positive or negative). Assume that each
word is represented as a vector of size 70×1 and the output labels are one-hot encoded. Further,
the state vector s_t is of size 50×1 and is initialized with all zeros. How many parameters (including bias)
are there in the network?
Yes, the answer is correct.
Score: 1
Accepted Answers:
(Type: Numeric) 6152
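The figure 6152 follows from a standard single-layer RNN with an input-to-state matrix, a state-to-state matrix, a state-to-output matrix, and one bias vector for the state and one for the output. The matrix names U, W and V are assumed here; the question itself does not name them. The sentence length does not enter the count because the weights are shared across time steps.

```python
def rnn_param_count(input_dim, state_dim, output_dim, bias=True):
    u = state_dim * input_dim    # U: input -> state
    w = state_dim * state_dim    # W: state -> state
    v = output_dim * state_dim   # V: state -> output
    b = (state_dim + output_dim) if bias else 0
    return u + w + v + b

# 70-dimensional words, 50-dimensional state, 2 one-hot output classes.
total = rnn_param_count(input_dim=70, state_dim=50, output_dim=2)
# 3500 + 2500 + 100 + 52 = 6152
```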
1 point
1 point
Select the true statements about BPTT.

The gradients of Loss with respect to parameters are added across time steps
The gradients of Loss with respect to parameters are subtracted across time steps
The gradient may vanish or explode, in general, if timesteps are too large

The gradient may vanish or explode if timesteps are too small

Yes, the answer is correct.
Score: 1
Accepted Answers:
The gradients of Loss with respect to parameters are added across time steps
The gradient may vanish or explode, in general, if timesteps are too large
1 point
Select the correct statements about GRUs

GRUs have fewer parameters compared to LSTMs

GRUs use a single gate to control both input and forget mechanisms
GRUs are less effective than LSTMs in handling long-term dependencies

GRUs are a type of feedforward neural network

Yes, the answer is correct.
Score: 1
Accepted Answers:
GRUs have fewer parameters compared to LSTMs
GRUs use a single gate to control both input and forget mechanisms
1 point
The statement that LSTM and GRU solve both the vanishing and the exploding gradient problems in RNNs
is

True

False
No, the answer is incorrect.
Score: 0
Accepted Answers:
False
1 point
How does LSTM prevent the problem of vanishing gradients?

Different activation functions, such as ReLU, are used instead of sigmoid in LSTM
Gradients are normalized during backpropagation
The learning rate is increased in LSTM

Forget gates regulate the flow of gradients during backpropagation

Yes, the answer is correct.
Score: 1
Accepted Answers:
Forget gates regulate the flow of gradients during backpropagation
We construct an RNN for the sentiment classification of text, where a text can have positive sentiment or
negative sentiment. Suppose the one-hot encoded words have dimension 100×1 and the
state vector s_i has dimension 50×1. What is the total number of parameters in the network? (Do not
include biases.)
No, the answer is incorrect.
Score: 0
Accepted Answers:
(Type: Range) 7599.5,7601.5
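The accepted range brackets 7600, which matches a bias-free RNN with three weight matrices. The sketch below builds those matrices with the stated shapes, runs a forward pass over a made-up five-word sentence, and counts the entries. The names U, W and V, the tanh state update, and the word indices are assumptions for illustration.

```python
import numpy as np

input_dim, state_dim, n_classes = 100, 50, 2
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(state_dim, input_dim))  # input -> state
W = rng.normal(scale=0.1, size=(state_dim, state_dim))  # state -> state
V = rng.normal(scale=0.1, size=(n_classes, state_dim))  # state -> output

def forward(word_ids):
    s = np.zeros(state_dim)              # initial state is all zeros
    for idx in word_ids:
        x = np.zeros(input_dim)
        x[idx] = 1.0                     # one-hot encoding of the word
        s = np.tanh(U @ x + W @ s)       # same weights at every time step
    logits = V @ s
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()               # softmax over the two sentiments

total_params = U.size + W.size + V.size  # 5000 + 2500 + 100 = 7600
probs = forward([3, 17, 42, 8, 99])      # hypothetical word indices
```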
1 point
1 point
Arrange the following sequence in the order they are performed by LSTM at time step t.
[Selectively read, Selectively write, Selectively forget]

Selectively read, Selectively write, Selectively forget

Selectively write, Selectively read, Selectively forget
Selectively read, Selectively forget, Selectively write

Selectively forget, Selectively write, Selectively read

No, the answer is incorrect.
Score: 0
Accepted Answers:
Selectively read, Selectively forget, Selectively write
1 point
What is the objective (loss) function in the RNN?

Cross Entropy
Sum of cross-entropy
Squared error

Accuracy
No, the answer is incorrect.
Score: 0
Accepted Answers:
Sum of cross-entropy
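For an RNN that makes a prediction at every time step, the accepted answer corresponds to adding up the per-step cross-entropy terms. A minimal sketch, assuming the predictions are already probability distributions and the targets are integer class indices:

```python
import math
import numpy as np

def sequence_loss(probs, targets):
    # probs: (T, n_classes) predicted distributions, one row per time step
    # targets: (T,) index of the true class at each time step
    steps = np.arange(len(targets))
    per_step = -np.log(probs[steps, targets])  # cross-entropy at each step
    return per_step.sum()                      # summed over time steps

# Toy two-step example with made-up predictions.
probs = np.array([[0.5, 0.5],
                  [0.25, 0.75]])
targets = np.array([0, 1])
loss = sequence_loss(probs, targets)  # -ln(0.5) - ln(0.75)
```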
1 point
Which of the following is a limitation of traditional feedforward neural networks in handling sequential data?

They can only process fixed-length input sequences

They are highly optimizable using the gradient descent methods
They can’t model temporal dependencies between sequential data

All of These
Partially Correct.
Score: 0.5
Accepted Answers:
They can only process fixed-length input sequences
They can’t model temporal dependencies between sequential data
1 point
Which of the following is true about LSTM and GRU networks?

LSTM networks have more gates than GRU networks

GRU networks have more gates than LSTM networks
LSTM and GRU networks have the same number of gates

Both LSTM and GRU networks have no gates

Yes, the answer is correct.
Score: 1
Accepted Answers:
LSTM networks have more gates than GRU networks

Assignment 12
Which of the following are benefits of using attention mechanisms in neural networks?

Improved handling of long-range dependencies

Enhanced interpretability of model predictions
Reduction in model complexity

Ability to handle variable-length input sequences

Partially Correct.
Score: 0.67
Accepted Answers:
Improved handling of long-range dependencies
Enhanced interpretability of model predictions
Ability to handle variable-length input sequences
1 point
Which of the following is a disadvantage of using an encoder-decoder model for sequence-to-sequence
tasks?

The model requires a large amount of training data

The model is slow to train and requires a lot of computational resources
The generated output sequences may be limited by the capacity of the model

The model is prone to overfitting on the training data

Yes, the answer is correct.
Score: 1
Accepted Answers:
The generated output sequences may be limited by the capacity of the model
1 point
Which of the following attention mechanisms is most commonly used in the Transformer model
architecture?

Dot product attention

Additive attention
Multiplicative attention

All of the above

Yes, the answer is correct.
Score: 1
Accepted Answers:
Dot product attention
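In the Transformer the dot products are additionally divided by the square root of the key dimension before the softmax (scaled dot-product attention). The NumPy sketch below shows a single attention head without masking; the shapes and random inputs are assumptions for this example.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # query-key similarities
    scores = scores - scores.max(axis=-1, keepdims=True) # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax per query
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 queries of dimension 8
K = rng.normal(size=(6, 8))   # 6 keys of dimension 8
V = rng.normal(size=(6, 16))  # 6 values of dimension 16
output, weights = scaled_dot_product_attention(Q, K, V)
```

Each row of `weights` is a distribution over the six keys, and each output row is the corresponding weighted average of the value vectors.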
1 point
Which scenarios would most benefit from hierarchical attention mechanisms?

Summarizing long text documents

Classifying images in a dataset
Analyzing customer reviews or feedback data

Real-time processing of sensor data

Yes, the answer is correct.
Score: 1
Accepted Answers:
Summarizing long text documents
1 point
In a hierarchical attention network, what are the two primary levels of attention?

Character-level and word-level

Word-level and sentence-level
Sentence-level and document-level

Paragraph-level and document-level

Yes, the answer is correct.
Score: 1
Accepted Answers:
Word-level and sentence-level
1 point
Which of the following is NOT a component of the attention mechanism?

Decoder
Key
Value

Encoder
Partially Correct.
Score: 0.5
Accepted Answers:
Decoder
Encoder
1 point
Which of the following is a major advantage of using an attention mechanism in an encoder-decoder
model?

Reduced computational complexity

Improved generalization to new data
Reduced risk of overfitting

None of These
Yes, the answer is correct.
Score: 1
Accepted Answers:
Improved generalization to new data
1 point
Which of the following output functions is most commonly used in the decoder of an encoder-decoder
model for translation tasks?

Sigmoid
ReLU
Softmax

Tanh
Yes, the answer is correct.
Score: 1
Accepted Answers:
Softmax
1 point
In the encoder-decoder model, what is the role of the decoder?

To generate output based on the input representations.

To encode the input
To learn the attention mechanism

None of the above

Yes, the answer is correct.
Score: 1
Accepted Answers:
To generate output based on the input representations.
1 point
We are performing a task where we generate the summary for an image using the encoder-decoder model.
Choose the correct statements.

LSTM is used as the decoder.

CNN is used as the decoder.
LSTM is used as the encoder.

None of These
Yes, the answer is correct.
Score: 1
Accepted Answers:
LSTM is used as the decoder.
