WS 2021 Solutions


Chair of Visual Computing

Department of Informatics
Technical University of Munich

Note:
• During the attendance check a sticker containing a unique code will be put on this exam.
• This code contains a unique number that associates this exam with your registration number.
• This number is printed both next to the code and to the signature field in the attendance check list.

Introduction to Deep Learning

Exam: IN2346 / endterm
Date: Tuesday 8th February, 2022
Examiner: Prof. Dr. Matthias Nießner
Time: 15:00 – 11:30

• The blackened exam has the same layout as the non-blackened exam with the actual questions,
which is going to be released once the working time starts.

• Only submit your personalized blackened exam. DO NOT submit the non-blackened/non-
personalized exam (clearly indicated with “DO NOT SCAN/UPLOAD”).

• This final exam consists of 16 pages with a total of 7 problems.


Please make sure now that you received a complete copy of the exam.

• The total amount of achievable credits in this simulation is 90 credits.

• No additional resources are allowed.

Problem 1 Multiple Choice (18 credits)

Mark correct answers with a cross ×


To undo a cross, completely fill out the answer option 
To re-mark an option, use a human-readable marking ×
Please note:

• For all multiple choice questions any number of answers, i.e. either zero (!), one or multiple answers
can be correct.

• For each question, you’ll receive 2 points if all boxes are answered correctly (i.e. correct answers
are checked, wrong answers are not checked) and 0 otherwise.

1.1 You are training a network to classify images of handwritten digits in the range of [0,...,9] on the
MNIST dataset. Which of the following data augmentation techniques are suitable to use for this task?

× Add Gaussian noise to the images


Vertically flip the images

× Rotation of the images by 10 degrees


× Change the contrast of the images
1.2 What is true about Residual Blocks?
Reduce the number of computations in the forward pass

× Act as a highway for gradient flow


× Enable a more stable training of larger networks
Act as a regularizer

1.3 For a fully-convolutional 2D CNN, if we double the spatial dimensions of input images, ...
... the number of network parameters doubles

× ... the number of network parameters stays the same


... the receptive field of an arbitrary pixel in an intermediate activation map can decrease

... the dropout coefficient p must be corrected to p in test time

1.4 What is true about Generative Adversarial Networks?


× The Generator minimizes the probability that the Discriminator is correct
The Generator provides supervision for the Discriminator

× The Discriminator acts as a classifier


The Discriminator samples from a latent space

1.5 Given input x , which of the following statements are always true? Note: For dropout, assume the
same set of neurons are chosen.
BatchNorm(ReLU(x)) ≡ ReLU(BatchNorm(x))

× Dropout(ReLU(x)) ≡ ReLU(Dropout(x))
× MaxPool(ReLU(x)) ≡ ReLU(MaxPool(x))
ReLU(Sigmoid(x)) ≡ Sigmoid(ReLU(x))

1.6 When you are using a deep CNN to train a semantic segmentation model, which of the following can
be chosen to help with overfitting issues?
Decrease the weight decay parameter

× Increase the probability of switching off neurons in dropout


× Apply random Gaussian noise to the input images
Remove parts of the validation set

1.7 In terms of (full-batch) gradient descent (GD) and (mini-batch) stochastic gradient descent (SGD),
which of the following statements are true?
The computed gradient of the loss w.r.t model parameters in SGD is equal to the computed gradient
in GD
× The expected gradient of the loss w.r.t model parameters in SGD is equal to the expected gradient
in GD over the same images

× There exists some batch size, for which the gradient of the loss w.r.t model parameters in SGD is
equal to the gradient in GD
SGD and GD will converge to the same model parameters, but SGD requires less memory at the
expense of more iterations

1.8 What is true about batch normalization assuming your train and test set are sampled from the same
distribution?
Batch normalization cannot be used together with dropout

× Batch normalization makes the gradients more stable, so we can train deeper networks
At test time, Batch normalization uses a mean and variance computed on test set samples to
normalize the data
× Batch normalization has learnable parameters
1.9 What is true for common architectures like VGG-16 or LeNet? (check all that apply)

× The number of filters tends to increase as we go deeper into the network


The width and height of the activation maps tends to increase as we go deeper into the network

The input can be an image of any size as long as its width and height are equal

× They follow the paradigm: Conv → Pool ... → Conv → Pool → FC ... → FC
(Conv = Conv + activation)

Problem 2 Short Questions (18 credits)

2.1 In k-fold cross validation, choosing a larger value for k increases our confidence in the validation
score. What could be a practical disadvantage in doing so? Explain how it arises.

(1p) Increases training time or more computations. (1p) Making use of more folds will present the
model with more data to train on, but will require way more time as it has to train and validate k
separate times. (0p) Overfitting. (0p) High variance. (1p) Less data in validation set.

2.2 Consider the activation function f : R → R, f(x) = ln(1 + e^x).

Which one of the following activation functions is most closely approximated by f? Briefly justify your
answer (2 points). What is the benefit of f over the activation function it closely approximates (2 points)?

• Tanh
• ReLU
• Sigmoid

(1p) ReLU. (1p) ReLU is the only function which is unbounded above, or any other valid explanation why
ReLU, or a correct drawing of softplus showing its similarity to ReLU. (0p) Positive output, since this is also
valid for Sigmoid.
One of the two benefits is sufficient: (2p) Unlike ReLU, this function (softplus) is smooth everywhere in
R, so it is differentiable everywhere in R; ReLU is not differentiable at 0. (2p) Softplus does not have
a dead area for negative inputs, or any explanation related to dead ReLU.
(0p) If not ReLU.
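As an illustration (a small sketch, not part of the official solution), softplus tracks ReLU away from zero, and its derivative, the sigmoid, is smooth and nonzero everywhere:

```python
import numpy as np

def softplus(x):
    # f(x) = ln(1 + e^x), written in a numerically stable form
    return np.logaddexp(0.0, x)

def relu(x):
    return np.maximum(0.0, x)

x = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print(softplus(x))  # ~[0.007, 0.31, 0.69, 1.31, 5.007] -> close to ReLU away from 0
print(relu(x))      # [0, 0, 0, 1, 5]

# The softplus derivative is sigmoid(x): smooth and nonzero everywhere,
# so there is no "dead" region for negative inputs, unlike ReLU.
```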

2.3 Explain the difference between the validation set and the test set. In your answer, explain the role of
each subset and how they are used differently.

Validation set is used for testing generalization (0.5p) with different hyperparameters / hyperparameter
tuning (0.5p). Test set is only used at the end / not touched during training (0.5p) to test
generalization on unseen data once (0.5p). For each missing keyword -0.5p.

2.4 You notice vanishing/exploding gradients in a deep network using the tanh activation function.
Suggest two possible changes you can make to the network in order to diminish this issue, without
changing the number of trainable parameters. Explain how each of these changes helps.

(0.5p) For naming a correct change. (0.5p) For correct explanation. (1p) Use ReLU activation:
does not saturate, large consistent gradients. (1p) Add residual connections: highway for gradient
flow, can learn to skip layers. (1p) Xavier initialization: improved weight initialization targets the active
area of the activation function. (0p) Gradient clipping (does not resolve vanishing gradients). (0p)
BatchNorm. (0p) Regularization.
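A minimal sketch of a residual block, showing the identity shortcut that acts as the gradient highway mentioned above (layer sizes are arbitrary, not taken from the exam):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual block: output = activation(F(x) + x)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.act = nn.ReLU()

    def forward(self, x):
        out = self.conv2(self.act(self.conv1(x)))
        # The identity shortcut lets gradients flow past the conv layers,
        # which mitigates vanishing gradients in deep networks.
        return self.act(out + x)

x = torch.randn(1, 16, 8, 8)
print(ResidualBlock(16)(x).shape)  # torch.Size([1, 16, 8, 8])
```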

2.5 Can two consecutive dropout layers with probabilities q and p be replaced with one dropout
operation? Explain.

(1p) Yes. (0p) No. (1p) Correct explanation: neurons zero out independently, so one dropout layer with
probability p + q − pq. If dropout layer one already zeros out a neuron, it cannot be dropped again by the
second layer, which accounts for the overlap dropping probability of pq.
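A quick Monte Carlo check of the combined drop probability (a sketch; the inverted-dropout rescaling factor is ignored here):

```python
import numpy as np

rng = np.random.default_rng(0)
p, q, n = 0.3, 0.2, 1_000_000

# A unit survives both layers only if it is kept by each one independently.
keep = (rng.random(n) > p) & (rng.random(n) > q)
print(1 - keep.mean())   # empirical drop rate, ~0.44
print(p + q - p * q)     # 0.44 = combined drop probability of a single layer
```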

2.6 Can one encounter overfitting in an unsupervised learning setting? If your answer is no, provide a
mathematical reasoning. If your answer is yes, provide an example.

(1p) Yes. (0p) No. (2p) Valid example: clustering N datapoints with N clusters (k-means with k = N),
autoencoder with large bottleneck / overfitted on one image, PCA with many components. (0.5p or
1p) For mentioning an unsupervised algorithm with a vague explanation.



2.7 For each of the following functions, describe one common problem when choosing them as the
activation function for your deep neural network: (a) Sigmoid, (b) ReLU, (c) Identity

(1p) Sigmoid (not zero-centered or saturates), (1p) Identity (does not introduce non-linearity), (1p)
ReLU (dead ReLU or not zero-centered)

Problem 3 Autoencoder (11 credits)
Consider a given unlabeled image dataset consisting of 10 distinct classes of animals.

3.1 To train an Autoencoder on images, which types of losses would you use? Name two suitable losses.

(1p each) Image reconstruction losses: L1, L2, SSIM, PSNR, MSE, ... (-1p) For L2 together with
MSE. (0p) For CE, Hinge, BCE, KL-divergence. First two named losses count.

3.2 Explain the effect of choosing a bottleneck dimension which is too small, and the effect of a too large
bottleneck dimension in Autoencoders.

(1p) Bottleneck dimension too small leads to poor reconstruction/underfitting, or loss of important
information / too much compression. (1p) Bottleneck dimension too big leads to no compression /
overfitting / learning the identity.

3.3 Having trained an Autoencoder on this dataset, how would you use the trained Autoencoder (without
further training/fine-tuning) to partition the dataset into 10 subsets, where each subset consists only of
images of a distinct type of animal?

(1p) Use the trained encoder to get a latent embedding for each unlabeled image. (1p) Do clustering
(e.g. k-means with k = 10). Assign each image to its cluster centre. (0p) Adding FC layers. (0p)
Using the full autoencoder as feature extractor. (0p) Only mentioning clustering.
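A minimal sketch of this procedure, assuming a trained `encoder` module and an `images` tensor (both names are illustrative) and using scikit-learn's KMeans:

```python
import torch
from sklearn.cluster import KMeans

# Assumptions: `encoder` is the frozen encoder of the trained autoencoder and
# `images` is a tensor of shape (N, C, H, W); names are placeholders only.
encoder.eval()
with torch.no_grad():
    latents = encoder(images)          # latent embeddings for all images
    latents = latents.flatten(1).numpy()

labels = KMeans(n_clusters=10, n_init=10).fit_predict(latents)
# labels[i] is the cluster (ideally the animal type) assigned to image i.
```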

3.4 We want to use the same network architecture for de-noising and colorizing old, degraded, gray-scale
images of animals. Given the dataset you already have, explain the steps you would take to train your
model. In your answer, elaborate on your model's inputs, outputs, and losses.

(1p) Augment input images by adding noise. (1p) Transform input images by converting to grayscale.
(1p) Use the original image as target, use L1/L2 as loss. (0.5p) Name correct loss. (0.5p) Loss between
original RGB images and output of the network. (0p) Only loss name. (0p) Proposing another
architecture.
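A possible training step under these assumptions (a sketch; `model`, `optimizer`, and `noise_std` are illustrative names, and the grayscale image is repeated to 3 channels so the same 3-channel architecture can be reused):

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, rgb_batch, noise_std=0.1):
    # rgb_batch: (N, 3, H, W) clean color images from the existing dataset.
    # Build the degraded input: grayscale conversion + additive Gaussian noise.
    weights = torch.tensor([0.299, 0.587, 0.114], device=rgb_batch.device)
    gray = (rgb_batch * weights.view(1, 3, 1, 1)).sum(dim=1, keepdim=True)
    gray = gray.repeat(1, 3, 1, 1)          # keep 3 channels, same architecture
    noisy_gray = gray + noise_std * torch.randn_like(gray)

    pred = model(noisy_gray)                # network outputs a 3-channel image
    loss = F.l1_loss(pred, rgb_batch)       # L1 between prediction and clean RGB target
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```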

3.5 Explain the differences between Autoencoders and Variational Autoencoders. How do they differ
during training?

(2p) Variational autoencoders constrain the bottleneck to follow a probability distribution,
but autoencoders don't constrain the latent space. (1p) Having a constraint in the latent space. (1p)
Sampling from the latent space. (0.5p) KL-divergence. (-0.5p) Autoencoders generate images.
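A small sketch of the VAE-specific training pieces, assuming the encoder outputs `mu` and `logvar` of shape (N, d) (names are illustrative):

```python
import torch

def vae_latent_and_kl(mu, logvar):
    # Reparameterization trick: sample z ~ N(mu, sigma^2) in a differentiable way.
    std = torch.exp(0.5 * logvar)
    z = mu + std * torch.randn_like(std)
    # KL divergence of N(mu, sigma^2) from the standard normal prior N(0, I).
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1).mean()
    return z, kl

# A plain autoencoder would just use z = encoder(x) and a reconstruction loss;
# the VAE adds this sampling step and the KL term to the training objective.
```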

Problem 4 CNNs (10 credits)
You are given the following network that classifies RGB images into one of 4 classes.
All Conv2d layers use kernel = 3, padding = 1, stride = 1, bias = True and are defined as
Conv2d(<channels_in>, <channels_out>).
All MaxPool2d layers use stride = 2, padding = 0, and are defined as MaxPool2d(<kernel>).
The input dimension x of the Linear layer is unknown.
The network’s architecture is as follows:

• Conv2d(3, 8) → MaxPool2d(2) → BatchNorm2d() → ReLU() →

• Conv2d(8, 16) → MaxPool2d(2) → BatchNorm2d() → ReLU() →

• Conv2d(16, 32) → MaxPool2d(2) → BatchNorm2d() → ReLU() →

• Flatten() →

• Linear( x , 4) → Softmax()

4.1 In terms of x, what is the total number of trainable parameters of the last linear layer? Include a bias
term in your calculation.

4x + 4 = 4(x + 1).
(1p) Matrix is of shape 4 × x (1p) plus 4 bias terms.
(-1p) Weight or bias wrong/missing.

4.2 Given RGB input images of size 80 × 80 pixels, what should the value of x in the Linear layer be?
Explain your calculation.

(1p) Each Conv2d preserves spatial dimensions. Each MaxPool2d reduces spatial dimensions by 2.
Height and width take shape 80 / 2 / 2 / 2 = 10 at the linear layer. Depth is 32, as given by the final Conv2d.
(1p) x = 10 × 10 × 32 = 3200
(0.5p) Same convolution. (0.5p) Maxpool halves spatial dimension. (0.5p) Correct concept:
channels x input x output.
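A minimal PyTorch sketch of the architecture above (an assumption of the intended layer definitions, not the official reference code) confirms x = 3200 for 80 × 80 inputs, and also the parameter counts asked for in 4.4 and 4.5:

```python
import torch
import torch.nn as nn

def block(c_in, c_out):
    return [nn.Conv2d(c_in, c_out, kernel_size=3, padding=1, stride=1, bias=True),
            nn.MaxPool2d(2, stride=2, padding=0), nn.BatchNorm2d(c_out), nn.ReLU()]

net = nn.Sequential(*block(3, 8), *block(8, 16), *block(16, 32),
                    nn.Flatten(), nn.Linear(3200, 4), nn.Softmax(dim=1))

print(net(torch.randn(1, 3, 80, 80)).shape)          # torch.Size([1, 4]) -> x = 3200 fits
print(sum(p.numel() for p in net[0].parameters()))   # 224 parameters in Conv2d(3, 8)
bn_params = sum(p.numel() for m in net.modules()
                if isinstance(m, nn.BatchNorm2d) for p in m.parameters())
print(bn_params)                                     # 112 trainable BatchNorm weights
```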

4.3 Explain the main difference between the usage of a BatchNorm layer in a convolutional network in
comparison to a fully connected network.

(2p) Normalization acts on the channel dimension instead of per feature / different channels
normalization/statistics. (1p) Only CNN or only FC. (0p) Over batch/all samples. (0p) Normalize weights. (0p)
Normalize each pixel. (0p) Normalize input data.
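A quick way to see this in PyTorch (a sketch; the channel and feature sizes are arbitrary):

```python
import torch.nn as nn

bn2d = nn.BatchNorm2d(16)   # conv net: normalizes over (N, H, W) per channel
bn1d = nn.BatchNorm1d(128)  # fully connected net: normalizes over N per feature

print(sum(p.numel() for p in bn2d.parameters()))  # 32  -> 2 parameters per channel
print(sum(p.numel() for p in bn1d.parameters()))  # 256 -> 2 parameters per feature
```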

4.4 Compute the total number of trainable parameters of the first convolutional layer, Conv2d(3,8).

(2p) 3 × 3 × 3 × 8 + 8 = 216 + 8 = 224
(k × k × channels_in × N_filters + bias)
(0.5p) Weights wrong, bias correct. (-1p) Bias missing/wrong. (-0.5p) Correct answer, additionally
specified batchnorm.

4.5 Compute the total number of trainable parameters in all of the BatchNorm layers.

(2p) Each BatchNorm2d layer has two weights per channel. The number of channels it has is
given by the output of the preceding Conv2d layer. Therefore # trainable BatchNorm weights =
2 · 8 + 2 · 16 + 2 · 32 = 2 · 56 = 112 weights. (0.5p) Only 2 + 2 + 2 without channels. (1.5p) Correct
expression, final answer wrong.

Problem 5 Optimization and Gradients (16 credits)
You are training a large fully-connected neural network and select as an initial choice an SGD optimizer.
In order to overcome the limitations of SGD, your colleague suggests adding momentum.

5.1 Name two limitations of SGD that momentum can potentially solve. Explain how momentum solves
them.

1) Limitations: slow learning / small steps; can't escape local minima; saddle points; SGD is noisy;
SGD only has one lr for all dimensions (1p each, 2p max).
2) Explanation: speeds up learning if the gradient keeps pointing in the same direction; keeps the
direction of the gradient to get out of a local minimum; helps avoiding saddle points; adjusts lr down
if oscillating over a local minimum; exponentially weighted moving average reduces noise (0.5p each,
1p max).

5.2 One can apply momentum, as shown in the formula:

ν_(k+1) = β · ν_k − α · ∇_θ L(θ_k)

What do the hyperparameters α and β represent?

alpha = learning rate (1pt), beta = accumulation rate of velocity / friction (1pt), momentum (0.5pt),
only accumulation rate (0.5pt)
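A minimal sketch of this update rule (the quadratic example loss and hyperparameter values are illustrative only):

```python
def sgd_momentum_step(theta, velocity, grad, alpha=0.1, beta=0.9):
    # v_(k+1) = beta * v_k - alpha * grad_L(theta_k);  theta_(k+1) = theta_k + v_(k+1)
    velocity = beta * velocity - alpha * grad(theta)
    return theta + velocity, velocity

# Example on L(theta) = theta^2, whose gradient is 2 * theta:
theta, v = 5.0, 0.0
for _ in range(3):
    theta, v = sgd_momentum_step(theta, v, grad=lambda t: 2 * t)
print(theta)  # moves towards the minimum at 0, with accumulated velocity
```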

5.3 How does Nesterov Momentum differ from standard momentum? Explain.

Demonstrates understanding of Nesterov momentum but no further insights (only 0.5pt). A step in
direction of previous momentum/accumulated gradient (only gradient is not enough) (1pt). Gradient
term computed from position calculated with previous gradient, i.e. look-ahead step (1pt). Gradient
corrects potential overshooting of momentum already in the same step (1pt).
Common mistakes: formulas without explaining them, not mentioning that the “jump” is calculated
using accumulated gradients / previous momentum

5.4 Is RMSProp considered a first or second order method (1p)? What is the main difference between
RMSProp and SGD+Momentum?

First order (1pt). Explanation: RMSProp dampens oscillation / exponentially decaying average of
variance / uses second moment (1pt). SGD+Momentum accumulates gradient / uses first
moment (1pt).
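For comparison with the momentum sketch above, a sketch of the RMSProp update with its exponentially decaying average of squared gradients (hyperparameter values are illustrative):

```python
def rmsprop_step(theta, sq_avg, grad, lr=0.1, decay=0.9, eps=1e-8):
    g = grad(theta)
    # Exponentially decaying average of the squared gradient (second moment).
    sq_avg = decay * sq_avg + (1 - decay) * g * g
    # Per-parameter step: large recent gradients shrink the effective learning rate.
    theta = theta - lr * g / (sq_avg ** 0.5 + eps)
    return theta, sq_avg

theta, s = 5.0, 0.0
for _ in range(3):
    theta, s = rmsprop_step(theta, s, grad=lambda t: 2 * t)
print(theta)
```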

For the following questions, consider the convex optimization objective:

min_{x ∈ R} x^2

5.5 What is the optimal solution of this optimization problem?

x* = 0 (1p)

5.6 You are working with an initialization of x0 = 5 and a learning rate of lr = 1. How many iterations
would gradient descent (without momentum) need in order to converge to the optimal solution? Explain.

Won't converge / infinite iterations (0.5pt). Explanation as overshoot / oscillate (0.5pt)
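A quick numerical check of this behaviour (sketch): with lr = 1, the update x ← x − 1 · 2x = −x just flips the sign every step.

```python
x = 5.0
for k in range(6):
    x = x - 1.0 * 2 * x   # gradient descent step on f(x) = x^2 with lr = 1
    print(k, x)           # 5 -> -5 -> 5 -> ... : oscillates, never converges
```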

5.7 Assuming you instead start with a random initialization of x0, how could you speed up the convergence
of the gradient descent optimizer (without adding momentum) in this case?

Reduce lr / adaptive lr / dynamic lr / any form of lr decay or gradient clipping or line search to get lr
(1p). Only change lr / adjust lr / play with lr / suitable lr (0.5p)
Common mistakes: xavier initialization, second order method, adam

5.8 What is the main advantage of using a second order method such as Newton's Method? Why are
second order methods not used often in practice for training deep neural networks?

Advantages: fewer iterations (1pt); if only mentioned "converge faster" without specifying in terms of
iterations (0.5pt); only 1 step (0.5p); no need to choose a learning rate.
Drawbacks: Hessian costly to compute; second order methods don't work well with mini-batches
(1pt each; 1pt max)

5.9 How many iterations would Newton's method need to converge (using the same initialization x0 = 5,
lr = 1)? Explain.

Only takes 1 iteration (0.5pt). Jumps to minimum right away / convex problem / 2nd order Taylor
approximation exactly approximates the quadratic problem / calculation that it converges after one
step (0.5pt).
Common mistakes: uses second derivative, uses hessian instead of lr
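The corresponding Newton step on f(x) = x^2, x_(k+1) = x_k − f'(x_k)/f''(x_k), reaches the minimum immediately (sketch):

```python
x = 5.0
f_prime = 2 * x       # f'(x) = 2x
f_double = 2.0        # f''(x) = 2
x = x - f_prime / f_double
print(x)              # 0.0 -> a single Newton step reaches the minimum of x^2
```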

Problem 6 Derivatives (9 credits)
Consider the formula of the Sigmoid function σ(x) : R → R:

σ(x) = 1 / (1 + e^(-x))

6.1 Compute the derivative dσ(x)/dx in terms of x.

dσ/dx = (0 · (1 + e^(-x)) − 1 · (−e^(-x))) / (1 + e^(-x))^2 = e^(-x) / (1 + e^(-x))^2
correct intermediate step (0.5p) & correct final answer (1p)

6.2 A special property of this function is that its derivative can be expressed in terms of the Sigmoid
function itself. Denote y = σ(x), and show how the derivative you computed can be re-written in terms of
y, the output of the Sigmoid function. Hint: Your answer should only depend on y.

dy/dx = y(1 − y)

y = 1 / (1 + e^(-x))

1 − y = (1 + e^(-x)) / (1 + e^(-x)) − 1 / (1 + e^(-x)) = e^(-x) / (1 + e^(-x))

Hence dy/dx = e^(-x) / (1 + e^(-x))^2 = [1 / (1 + e^(-x))] · [e^(-x) / (1 + e^(-x))] = y(1 − y).
Final correct answer (1p). Wrong answer with some correct derivation (0.5p).

An affine layer is described by z = XW + b.
Consider the following affine layer, which has 2 input neurons and 1 output neuron:

W = [1, 2]^T ∈ R^(2×1)

b = 2 ∈ R^1

and input:

X = [[1, 1], [0, −1]] ∈ R^(2×2)

The forward pass of the network would be:

σ(z) = σ(XW + b) = σ([[1, 1], [0, −1]] · [1, 2]^T + 2) = σ([3, −2]^T + [2, 2]^T) = σ([5, 0]^T) = [1, 0.5]^T (rounded).

Let's compute the backward pass of the network. Assume L(z) = sum(z).

6.3 If y = σ(z) = [1, 0.5]^T, calculate the gradient of the output after the Sigmoid activation function
w.r.t. z, dy/dz:

dL/dz = dL/dy · dy/dz = y ◦ (1 − y) = [1, 0.5]^T ◦ (1 − [1, 0.5]^T) = [0, 0.25]^T
writing the derivative correctly (element-wise multiplication and NOT matrix multiplication) (1p),
correct intermediate calculation (dimensions are correct) (1p), correct final answer (1p)

6.4 We will use the computed gradient to perform back-propagation through the affine layer to the
network's parameters.
Let dout be the upstream derivative of the Sigmoid that you have calculated in question 6.3. Calculate the
derivatives dW and db.
Hint: Pay attention to the shapes of the results; they should be compatible for a gradient update.
Note: In case you skipped the previous question, you can get partial points by writing the correct formulas
using dout symbolically.

dW = X^T · dout = [[1, 0], [1, −1]] · [0, 0.25]^T = [0, −0.25]^T

db = sum(dout, axis = 0) = [1, 1] · [0, 0.25]^T = 0.25

dW: (2p), db: (2p). For each case: chain rule (0.5p), writing the matrices correctly (e.g. X^T · dout) (1p),
correct answer (0.5p). If missed the correct answer by 1/n (-0.5p).
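A NumPy sketch reproducing these numbers, using the rounded y = [1, 0.5]^T from the exam:

```python
import numpy as np

X = np.array([[1.0, 1.0],
              [0.0, -1.0]])
y = np.array([[1.0],        # sigmoid output, rounded as in the exam
              [0.5]])

dL_dy = np.ones_like(y)     # the loss sums the outputs, so the upstream gradient is 1
dout = dL_dy * y * (1 - y)  # element-wise: [[0], [0.25]]
dW = X.T @ dout             # [[0], [-0.25]], same shape as W
db = dout.sum(axis=0)       # [0.25], same shape as b
print(dout.ravel(), dW.ravel(), db)
```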

Problem 7 Model Evaluation (8 credits)
Two students, Erika and Max train a neural network for the task of image classification. They use a
dataset which is divided into train and validation sets. They each train their own network for 25 epochs.

7.1 Erika selects a model and obtains the following curves. Interpret the model's behaviour from the
curves. Then, suggest what Erika could do in order to improve its performance.

Figure 7.1: Training curves for Erika’s model.

(0.5p) Overfitting, (0.5p) regularization (Dropout, L1, L2 weight decay, data augmentation), early
stopping and reducing capacity.
Common mistakes: "Stop training early" with no mention of stopping training early based on validation
error.

7.2 Max selects a different model and obtains the following curves. Interpret the model's behaviour
from the curves. Then, suggest what change Max could make to his model in order to improve its
performance.

Figure 7.2: Training curves for Max’s model.

Underfitting (0.5p), Increase model capacity (1.5p). OR Optimization is not optimal (0.5p), Decrease
learning rate / learning rate decay / use optimizer that corrects a bad learning rate choice (e.g
Adam) / BN (1.5p)
Common mistakes: Just describing what the graphs do. Generalization gap, add regularization

7.3 Both Max and Erika are able to agree on a model architecture and obtain the following curves.
However, when deployed in the real world, their model seems to perform poorly. What is a possible reason
for such an observation and what should they do?

Figure 7.3: Training curves for the new model.

Possible reasoning: test and train/val data is sampled from different distributions / domain gap (1p).
Fix by trying to make test and train data more similar or from the same distribution / augmentation /
add another dataset to train (1p).
Common mistakes: change the test set. Shuffle train and val dataset to train again. If train and
val come from the same distribution (given in the question) shuffling will not help. Overfitting to “val”
data.

After adapting the new network architecture, Max and Erika are training their own model, using the same
architecture, with identical initial weights, using exactly the same hyperparameters. They also use the
same SGD optimizer (no momentum), batch size, and learning rates. The only difference is that Max
normalizes the loss by 1/N (where N is the number of training samples in the dataset) while Erika does
not.

7.4 How does this affect the optimal model weights that minimize this optimization objective? (1p) After
10 optimizer steps, will they arrive at the same model parameters? Explain. (2p)

It doesn't affect the optimal model weights (1p). The weights will be different (0.5p) after 10 steps
(0.5p). A good explanation why (lr is effectively scaled) (1p). Contradictory / unclear explanation (-0.5p).
Common mistakes: "1/N would make it independent of the size of the dataset" (irrelevant).
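A toy sketch of this effect (the data, model, and learning rate are made up): scaling the loss by 1/N scales every gradient by 1/N, which with the same learning rate changes the step sizes but not the minimizer.

```python
import torch

w_max = torch.tensor([1.0], requires_grad=True)    # Max: loss normalized by 1/N
w_erika = torch.tensor([1.0], requires_grad=True)  # Erika: unnormalized loss
data = torch.arange(1.0, 5.0)                      # toy "dataset", N = 4
lr, N = 0.01, data.numel()

for w, scale in [(w_max, 1.0 / N), (w_erika, 1.0)]:
    for _ in range(10):
        loss = scale * ((w * data - 2.0) ** 2).sum()
        loss.backward()
        with torch.no_grad():
            w -= lr * w.grad                       # same lr, but gradients differ by 1/N
        w.grad.zero_()

# Different parameters after 10 steps, yet both losses share the same minimizer.
print(w_max.item(), w_erika.item())
```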

Additional space for solutions – clearly mark the (sub)problem your answers are related to and
strike out invalid solutions.

