ClassTest1 DeepLearning
A) Multi-head attention
C) Positional encoding
D) Convolutional layers
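
The options above name standard transformer components. For reference, a minimal NumPy sketch of the sinusoidal positional encoding from "Attention Is All You Need" (function and variable names are illustrative):

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    # PE[pos, 2i]   = sin(pos / 10000**(2i / d_model))
    # PE[pos, 2i+1] = cos(pos / 10000**(2i / d_model))
    positions = np.arange(seq_len)[:, None]        # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]       # (1, d_model // 2)
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # even dimensions
    pe[:, 1::2] = np.cos(angles)  # odd dimensions
    return pe

pe = sinusoidal_positional_encoding(seq_len=128, d_model=64)  # (128, 64)
```
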
What is the primary advantage of the transformer architecture over RNNs?
C) Machine Translation
D) Both A and B
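
For context on this question: the textbook contrast is that an RNN must process tokens sequentially, while self-attention relates all positions in one batched operation, which parallelizes across the sequence and shortens paths between distant tokens. A minimal PyTorch sketch of the two (shapes are illustrative):

```python
import torch
import torch.nn as nn

x = torch.randn(8, 128, 64)  # (batch, seq_len, d_model)

# RNN: the hidden state at step t depends on step t - 1,
# so the sequence dimension cannot be parallelized.
rnn = nn.RNN(input_size=64, hidden_size=64, batch_first=True)
rnn_out, _ = rnn(x)

# Self-attention: every position attends to every other position
# in one batched matrix multiply, so all timesteps run in parallel.
attn = nn.MultiheadAttention(embed_dim=64, num_heads=8, batch_first=True)
attn_out, _ = attn(x, x, x)
```
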
What is the purpose of the [CLS] token in BERT?
A) Ignore them
D) Both A and B
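
For context on this question: BERT's tokenizer prepends a [CLS] token, and its final hidden state is conventionally used as an aggregate representation of the whole sequence for classification. A minimal sketch with the Hugging Face transformers library (the checkpoint name is the standard base model, chosen for illustration):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("BERT prepends a [CLS] token.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# [CLS] sits at position 0; its final hidden state is the usual
# sequence-level feature fed to a classification head.
cls_embedding = outputs.last_hidden_state[:, 0]  # shape (1, 768)
```
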
What is the primary purpose of self-attention in transformer models?
B) To speed up training
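
For context on this question: the primary purpose of self-attention is to weight each token's representation by its relevance to every other token in the sequence, not to speed up training per se. A minimal scaled dot-product self-attention sketch in PyTorch (the function name is illustrative):

```python
import math
import torch

def scaled_dot_product_attention(q, k, v):
    # scores[..., i, j]: how strongly token i attends to token j
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    weights = torch.softmax(scores, dim=-1)  # each row sums to 1
    return weights @ v

x = torch.randn(2, 10, 64)                   # (batch, seq_len, d_model)
out = scaled_dot_product_attention(x, x, x)  # self-attention: q = k = v = x
```
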