Instead of "# self-attention layers in ``nn.TransformerEncoder`` are only allowed to attend", the comment should read "# self-attention layers in ``nn.TransformerDecoder`` are only allowed to attend", i.e., Decoder rather than Encoder. cc @svekars @carljparker
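For context, the comment in question describes the causal mask applied to decoder self-attention: each position may attend only to itself and earlier positions. A minimal sketch of such a mask, assuming PyTorch (the size `3` is illustrative):

```python
import torch

# Causal (subsequent) mask for decoder self-attention: position i may
# attend only to positions <= i. 0.0 marks allowed pairs, -inf blocked.
sz = 3
mask = torch.triu(torch.full((sz, sz), float('-inf')), diagonal=1)
# PyTorch also ships an equivalent helper:
# torch.nn.Transformer.generate_square_subsequent_mask(sz)
print(mask)
```

Added to the attention scores before the softmax, the `-inf` entries zero out the attention weights for future positions, which is exactly the behavior the corrected comment describes.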