Transformer tutorial multiplying with sqrt(d_model) #2849

Closed
@RogerJL

return self.embedding(tokens.long()) * math.sqrt(self.emb_size)

src = self.embedding(src) * math.sqrt(self.d_model)

Shouldn't this be

src = self.embedding(src) / math.sqrt(self.d_model)

At least, that is the impression I got from reading the "Attention Is All You Need" paper.
Or is there some newer research finding that multiplying is better?
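
For reference, the paper appears to use both operations, but in different places: Section 3.4 says the embedding weights are multiplied by sqrt(d_model), while the division by sqrt(d_k) belongs to scaled dot-product attention (Section 3.2.1). A minimal plain-Python sketch of that distinction follows. The names `embedding`, `embed`, and `attention_logit` are hypothetical and not taken from the tutorial, and the toy table size and random initialisation are assumptions made only for illustration.

```python
import math
import random

d_model = 512  # model width; plays the role of self.d_model / self.emb_size
rng = random.Random(0)

# Hypothetical toy embedding table: 10 tokens, random weights (illustration only).
embedding = [[rng.gauss(0.0, 1.0) for _ in range(d_model)] for _ in range(10)]


def embed(token_id):
    """Look up a token and scale the vector UP by sqrt(d_model)."""
    scale = math.sqrt(d_model)
    return [w * scale for w in embedding[token_id]]


def attention_logit(q, k):
    """Dot product of a query and a key, scaled DOWN by sqrt(d_k)."""
    d_k = len(k)
    return sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k)
```

Under this reading, the multiplication in the two tutorial lines above matches the embedding step, and the division only shows up inside the attention computation.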

cc @sekyondaMeta @svekars @kit1980 @subramen @albanD
