Overview of Word Embedding using Embeddings from Language Models (ELMo)

Word embeddings enable models to interpret text by converting words into numerical vectors. Traditional methods like Word2Vec and GloVe generate fixed embeddings, assigning the same vector to a word regardless of its context.

ELMo (Embeddings from Language Models) addresses this limitation by producing contextualized embeddings that vary based on surrounding words. This approach allows models to better capture the meaning of words in different contexts, improving performance in tasks like sentiment analysis, named entity recognition and question answering.

ELMo

ELMo (Embeddings from Language Models) generates word vectors by considering the entire sentence. Unlike traditional methods, ELMo derives word meanings from the internal states of a deep bidirectional LSTM network trained as a language model. Its key characteristics are:

  • Context-aware: Word meaning changes with context.
  • Deep Representations: Uses multiple layers from the language model.
  • Pre-trained + Task-specific: ELMo embeddings are integrated into downstream models and fine-tuned accordingly.

Working of ELMo

1. Pre-training Phase

A bidirectional language model (biLM) is trained on a large text corpus. The model uses two separate LSTMs:

  • The forward LSTM reads the sentence from left to right and predicts the next word.
  • The backward LSTM reads from right to left and predicts the previous word.
Figure: biLM architecture
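
The biLM is trained to jointly maximize the log-likelihood of both directions. Following the notation of the original ELMo paper, for a sentence of N tokens (t_1, ..., t_N) the objective is:

\sum_{k=1}^{N} \Big( \log p(t_k \mid t_1, \ldots, t_{k-1}) + \log p(t_k \mid t_{k+1}, \ldots, t_N) \Big)

Here the first term is modeled by the forward LSTM and the second by the backward LSTM, with the token embedding and softmax parameters shared between the two directions.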

For each word, the model captures contextual information from both directions. At each layer, the hidden states from the forward and backward LSTMs are concatenated to form a contextualized representation, which varies depending on the word's role in the sentence. ELMo then combines the outputs of multiple LSTM layers, with the lower layers tending to capture syntactic patterns and the higher layers more semantic ones.
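
In the original paper, the per-token representation used downstream is a task-specific weighted sum over all biLM layers:

\text{ELMo}_k^{task} = \gamma^{task} \sum_{j=0}^{L} s_j^{task} \, h_{k,j}^{LM}

where h_{k,j}^{LM} is the representation of token k at layer j (layer 0 being the context-independent token embedding), s_j^{task} are softmax-normalized layer weights and \gamma^{task} is a scalar that rescales the whole vector for the downstream task.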

2. Task-specific Integration

Once trained, the biLM is used to generate embeddings for specific NLP tasks.

  • ELMo embeddings are added to the input of a downstream model, such as a classifier.
  • The biLM can either be frozen to preserve its general language knowledge or fine-tuned on the specific task to improve performance.
  • The downstream model learns to use these embeddings for improved predictions.

This phase allows ELMo to be applied to tasks like named entity recognition, sentiment analysis and text classification where understanding context is crucial.
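
As a rough sketch of this integration (using the same TensorFlow Hub module that is loaded in the implementation section below, with made-up sentiment labels purely to show the wiring), frozen ELMo embeddings can be mean-pooled into sentence vectors and fed to a small Keras classifier:

Python
import tensorflow as tf
import tensorflow_hub as hub

# Load the pre-trained biLM once and keep it frozen (no fine-tuning).
elmo = hub.load("https://tfhub.dev/google/elmo/3")

def sentence_features(sentences):
    # Contextualized token embeddings, shape (batch_size, max_seq_length, 1024).
    token_embeddings = elmo.signatures["default"](tf.constant(sentences))["elmo"]
    # Mean-pool over the token axis to get one 1024-d vector per sentence
    # (for simplicity this ignores padding in shorter sentences).
    return tf.reduce_mean(token_embeddings, axis=1)

# Minimal downstream model: logistic regression on top of the pooled ELMo features.
classifier = tf.keras.Sequential(
    [tf.keras.layers.Dense(1, activation="sigmoid", input_shape=(1024,))]
)
classifier.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Toy labelled sentences (hypothetical positive/negative labels).
train_sentences = ["The movie was wonderful.", "The movie was terrible."]
train_labels = tf.constant([1.0, 0.0])

classifier.fit(sentence_features(train_sentences), train_labels, epochs=5, verbose=0)

The same pattern extends to richer downstream heads, such as BiLSTM or CNN classifiers built on top of the per-token embeddings instead of the pooled sentence vectors.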

Real-World Examples

Consider the word "bank" in two different contexts:

  • "She deposited money in the bank." \rightarrow financial institution
  • "He sat by the bank of the river." \rightarrow river edge

Static embeddings would assign the same vector to both, failing to capture the difference. ELMo generates context-dependent vectors that correctly differentiate between these meanings, adapting to the sentence-level context to provide more accurate representations.

Implementation of ELMo Embeddings

We can implement ELMo embeddings using TensorFlow and TensorFlow Hub. Here is a step-by-step guide with explanations at each stage.

Step 1: Install Required Libraries

TensorFlow is used for building and running deep learning models, while tensorflow_hub lets us load pre-trained models such as ELMo. You can install both using:

pip install tensorflow tensorflow_hub

Step 2: Import Libraries and Load ELMo

  • We import TensorFlow and TensorFlow Hub to access the model.
  • We then load the ELMo model from TensorFlow Hub using its URL.
  • The model outputs 1024-dimensional embeddings for each token.
Python
import tensorflow as tf
import tensorflow_hub as hub

elmo = hub.load("https://tfhub.dev/google/elmo/3")

Step 3: Define an Embedding Function

We define a function that:

  • Takes a list of input sentences.
  • Passes them to the ELMo model.
  • Returns a tensor of contextualized word embeddings.
Python
def get_elmo_embedding(sentences):
    embeddings = elmo.signatures["default"](tf.constant(sentences))["elmo"]
    return embeddings

Step 4: Generate Embeddings from Sample Sentences

  • We create a list of sample sentences that include ambiguous words like "bank".
  • We call the get_elmo_embedding() function to generate the embeddings.
  • The result is a 3D tensor with shape (batch_size, max_seq_length, 1024).
Python
sentences = [
    "The bank will approve your loan.",
    "He sat by the bank of the river."
]

embeddings = get_elmo_embedding(sentences)
print(embeddings)

Output:

Output figure: the printed embeddings tensor of shape (batch_size, max_seq_length, 1024)

The printed tensor shows a 1024-dimensional contextual embedding for every token in each sentence, confirming the model works as expected.
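
To make the contextual difference concrete, we can compare the two vectors produced for the word "bank" (a small check sketched here, assuming the module splits sentences on whitespace, so "bank" is token index 1 in the first sentence and index 4 in the second):

Python
import tensorflow as tf

# "bank" in "The bank will approve your loan." -> index 1
# "bank" in "He sat by the bank of the river." -> index 4
bank_finance = embeddings[0, 1]
bank_river = embeddings[1, 4]

# Cosine similarity between the two contextualized "bank" vectors.
cosine = tf.reduce_sum(bank_finance * bank_river) / (
    tf.norm(bank_finance) * tf.norm(bank_river)
)
print("Cosine similarity between the two 'bank' vectors:", float(cosine))

A value noticeably below 1.0 confirms that ELMo assigns different vectors to the two occurrences of "bank", whereas a static embedding such as Word2Vec would give a cosine similarity of exactly 1.0.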

Limitations

  • Ambiguous Contexts: Some sentences may not provide enough information for accurate disambiguation. For example, "The bank was full of fish." could still confuse the model.
  • Computational Overhead: ELMo requires more memory and processing due to biLSTM layers, which can be a constraint in real-time applications.
  • Pretraining Dependency: Performance heavily depends on the quality and size of the pretraining corpus.

Applications of ELMo Embeddings

ELMo significantly improves performance across a variety of NLP tasks:

  • Sentiment Analysis: Detects emotions in text with context-aware understanding.
  • Named Entity Recognition (NER): Identifies names of people, places and organizations more accurately.
  • Question Answering: Helps locate contextually relevant answers in large documents.
  • Text Classification: Enhances accuracy in spam detection, topic classification and intent analysis.
  • Semantic Similarity: Measures context-specific similarity between phrases or documents.

Comparison with Other Models

| Feature | Word2Vec / GloVe | ELMo | BERT / RoBERTa |
|---|---|---|---|
| Contextual | Static word representation | Contextualized based on sentence context | Contextualized based on full input |
| Architecture | Shallow neural networks | Bidirectional LSTM language model | Transformer-based |
| Training Objective | Word co-occurrence prediction | Forward and backward language modeling | Masked language modeling |
| Model Complexity | Low | Moderate | High |
| Fine-tuning | Not designed for fine-tuning | Supports task-specific fine-tuning | Designed for fine-tuning |

ELMo introduced the idea of contextualized word representations and still influences modern NLP, although it has since been surpassed by transformer-based models such as BERT and RoBERTa.

