Description
The tutorial Language Modeling With nn.Transformer and Torchtext describes an attention mask that will prevent nn.TransformerEncoder
from attending to not-yet-seen tokens:
"Along with the input sequence, a square attention mask is required because the self-attention layers in nn.TransformerEncoder are only allowed to attend the earlier positions in the sequence."
I think the mask is actually upper triangular: a square mask alone would not prevent a token from attending to future tokens.
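For reference, here is a minimal sketch (assuming a recent PyTorch release and its nn.Transformer.generate_square_subsequent_mask helper) showing that the mask is a square matrix whose disallowed positions form the strict upper triangle:

```python
import torch

# The mask is square in shape, but the -inf entries sit strictly above the
# diagonal; that upper-triangular pattern is what blocks position i from
# attending to positions j > i.
sz = 5
mask = torch.nn.Transformer.generate_square_subsequent_mask(sz)
print(mask)
# tensor([[0., -inf, -inf, -inf, -inf],
#         [0.,   0., -inf, -inf, -inf],
#         [0.,   0.,   0., -inf, -inf],
#         [0.,   0.,   0.,   0., -inf],
#         [0.,   0.,   0.,   0.,   0.]])

# Equivalent construction, making the upper-triangular structure explicit.
equivalent = torch.triu(torch.full((sz, sz), float("-inf")), diagonal=1)
assert torch.equal(mask, equivalent)
```

So "square" describes only the shape; it is the (strictly) upper-triangular masking pattern that enforces the causal restriction the tutorial describes.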
I may have misunderstood the mask description; if not, I'm happy to write a PR that fixes this.
cc @pytorch/team-text-core @Nayef211