Instead of "# self-attention layers in ``nn.TransformerEncoder`` are only allowed to attend", the comment should read "# self-attention layers in ``nn.TransformerDecoder`` are only allowed to attend", i.e., Decoder rather than Encoder. cc @svekars @carljparker
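For context, the comment in question describes the causal mask applied to decoder self-attention: each position may attend only to itself and earlier positions. A minimal sketch of such a mask, assuming PyTorch (the size `3` is illustrative):

```python
import torch

# Causal (subsequent) mask for decoder self-attention: position i may
# attend only to positions <= i. 0.0 marks allowed pairs, -inf blocked.
sz = 3
mask = torch.triu(torch.full((sz, sz), float('-inf')), diagonal=1)
# PyTorch also ships an equivalent helper:
# torch.nn.Transformer.generate_square_subsequent_mask(sz)
print(mask)
```

Added to the attention scores before the softmax, the `-inf` entries zero out the attention weights for future positions, which is exactly the behavior the corrected comment describes.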