BERT

BERT uses attention mechanisms in Transformers to learn contextual relationships between words in text bidirectionally. It takes inputs of word embeddings, positional embeddings to capture word order, and segment embeddings to differentiate sentences. The encoder reads the entire text at once while the decoder generates predictions.


BERT uses Transformers (an attention-based architecture) to learn contextual relations and meaning between the words in a text. The basic Transformer contains two separate mechanisms: an encoder that reads the text input and a decoder that produces the output (prediction). Because BERT's goal is to build a language representation, only the encoder part is needed.
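
As a rough illustration of the encoder in practice, the sketch below loads a pretrained BERT encoder and produces one contextual vector per input token. It assumes the HuggingFace transformers library and the bert-base-uncased checkpoint, neither of which is named in the text above:

# Obtain contextual word representations from BERT's encoder.
# Assumes: pip install transformers torch, and the bert-base-uncased checkpoint.
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")  # encoder only, no decoder stack

inputs = tokenizer("The bank raised interest rates.", return_tensors="pt")
outputs = model(**inputs)

# One contextual vector per token: (batch, sequence_length, hidden_size).
print(outputs.last_hidden_state.shape)  # e.g. torch.Size([1, 8, 768]) for bert-base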

Directional models read the text in a specific direction (left to right or right to left). The Transformer encoder reads all of the text at once, so we can say it is non-directional. This property allows the model to learn the context of a word from the surrounding words in both directions.
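
A simple way to see this non-directional behaviour is masked-word prediction, where the model fills a gap using words on both sides of it. The sketch below is an illustrative example (not from the text) using the HuggingFace fill-mask pipeline with bert-base-uncased:

# The masked word is predicted from context on BOTH sides of the gap.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# Here the right-hand context ("from the faucet") is what disambiguates the blank.
for prediction in fill_mask("She poured the [MASK] from the faucet into a glass."):
    print(prediction["token_str"], round(prediction["score"], 3))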

BERT's data input is a combination of three embeddings, depending on the task we are performing; a sketch of how they are combined follows the list:

Position Embeddings: BERT learns the position/location of each word in a sentence via positional embeddings. This embedding helps BERT capture the ‘order’ or ‘sequence’ information of a given sentence.

Segment Embeddings (optional): BERT takes sentence pairs as input for tasks such as question answering. BERT learns a unique embedding for the first and the second sentence to help the model differentiate between them.

Token Embeddings: Token embeddings carry the content of the input text. Each unique word-piece token is assigned an integer id, and that id is mapped to a learned embedding vector.
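
The combination mentioned above can be pictured as an element-wise sum of the three embedding lookups. The following simplified sketch mirrors that idea in PyTorch with bert-base sizes; it is not the library's actual code, and the real model additionally applies layer normalization and dropout after the sum:

# Token + position + segment embeddings are summed into one input representation.
import torch
import torch.nn as nn

vocab_size, hidden, max_pos, n_segments = 30522, 768, 512, 2  # bert-base sizes

token_emb = nn.Embedding(vocab_size, hidden)    # one learned vector per token id
position_emb = nn.Embedding(max_pos, hidden)    # one learned vector per position 0..511
segment_emb = nn.Embedding(n_segments, hidden)  # sentence A vs. sentence B

token_ids = torch.tensor([[101, 7592, 2088, 102]])           # e.g. [CLS] hello world [SEP]
segment_ids = torch.zeros_like(token_ids)                     # all tokens from sentence A
position_ids = torch.arange(token_ids.size(1)).unsqueeze(0)   # positions 0, 1, 2, 3

input_repr = token_emb(token_ids) + position_emb(position_ids) + segment_emb(segment_ids)
print(input_repr.shape)  # torch.Size([1, 4, 768])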
