System Paradigms in NLP
4. *Chapter Goal:* Because of the complexity and data challenges, the chapter aims to
provide a *perspective* on the main *dimensions* (key aspects or viewpoints) used to
tackle the problem of *semantic interpretation* (understanding the meaning of
language).
5. *Chapter Scope:* It's *impossible to cover everything*. So, while mentioning some
older (historic) approaches, the chapter will *focus* on the *most common (prevalent),
effective (successful), and practical approaches* – those suitable for real-world
applications.
6. *Structure:* These approaches will be organized into *three main categories*, which
the rest of the section details.
Unpacking the key sentence from the text: "It is important to get a perspective on the various primary
dimensions on which the problem of semantic interpretation has been tackled."
"The problem of semantic interpretation": This just means the challenge of figuring out
the meaning of language (words, sentences, etc.). How do we make computers understand what
text means?
"has been tackled": This means how researchers and engineers have tried to solve this
problem.
"various primary dimensions": This refers to the main,
fundamental ways or categories used to approach the problem. Think of "dimensions" like
different angles or aspects to consider (like the System Architecture, Scope, and Coverage
mentioned in your notes). "Primary" just means the most important ones.
"It is important to get a perspective": This means it's valuable to understand these
different main categories or approaches. It gives you a good overview and helps you see the
bigger picture of how people try to make computers understand language meaning.
In simpler terms:
"To understand how people have tried to make computers understand language meaning, it's crucial to know the
main different types of approaches they have used."
1. Introduction
Context: When we build NLP systems, especially those dealing with understanding
meaning (semantic interpretation) across different languages or domains, we face
challenges. Often, perfect hand-labeled data isn't available for every situation we want
to test.
Goal: To understand the main ways (paradigms) researchers and engineers approach
building these semantic interpretation systems. We need a perspective on the primary
dimensions used to classify these approaches.
Focus: While many historical methods exist, we will focus on the more common and
successful approaches suitable for practical applications.
Key Concept: The provided text categorizes these system paradigms along three main
dimensions: System Architecture, Scope, and Coverage.
2. Dimension 1: System Architecture
This dimension describes the fundamental mechanism or learning strategy the system
uses to solve an NLP problem. How does the system acquire and use knowledge?
o (a) Knowledge-Based Systems:
Definition: These systems rely on a predefined set of rules or
a knowledge base (like an ontology or database of facts).
How they work: To solve a new problem instance, the system applies
these existing rules or queries its knowledge base.
Example (Inferred): A system that identifies company names in text
based on a predefined list of known companies and rules like "Capitalized
word sequence followed by 'Inc.' or 'Corp.'". An early grammar checker
using hardcoded grammatical rules.
Key Idea: Relies on explicit human-encoded knowledge.
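To make this concrete, here is a minimal Python sketch of the inferred example above. The company list and the suffix rule are invented placeholders for a real knowledge base; nothing is learned from data.

```python
import re

# Hand-built knowledge base and hand-written rule (both illustrative only).
KNOWN_COMPANIES = {"Acme Corp.", "Globex Inc."}
SUFFIX_RULE = re.compile(r"\b(?:[A-Z][a-z]+\s)+(?:Inc\.|Corp\.)")

def find_companies(text):
    """Return company mentions found via the list or the suffix rule."""
    hits = {m.group(0) for m in SUFFIX_RULE.finditer(text)}
    hits |= {name for name in KNOWN_COMPANIES if name in text}
    return sorted(hits)

print(find_companies("Acme Corp. sued Initech Inc. over a patent."))
# ['Acme Corp.', 'Initech Inc.']
```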
o (b) Unsupervised Systems:
Definition: These systems are designed to function with minimal human
intervention.
How they work: They leverage existing resources (like large amounts of
raw text) and often use techniques that can be bootstrapped – meaning
they start with something small or basic and build up from there
automatically. They discover patterns directly from data without explicit
labels.
Example (Inferred): Topic modeling algorithms (like LDA) that find
themes in a collection of news articles without being told what the themes
are beforehand. Clustering similar documents together based on word
usage.
Key Idea: Learning from unlabeled data.
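A minimal sketch of this idea using scikit-learn: clustering a handful of invented documents purely by word usage, with no labels supplied at any point.

```python
# Unsupervised learning: group documents without labels.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

docs = [
    "The striker scored twice in the final match.",
    "The goalkeeper saved a late penalty kick.",
    "Stocks rallied after the central bank cut rates.",
    "Bond yields fell as inflation data cooled.",
]

X = TfidfVectorizer().fit_transform(docs)  # raw text -> numeric vectors
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # e.g. [0 0 1 1]: sports vs. finance, discovered from the data alone
```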
o (c) Supervised Systems:
Definition: These systems require manual annotation (labeling) of a
specific phenomenon (e.g., sentiment, named entities) in a sufficient
quantity of data.
How they work:
1. Annotation: Humans label examples (e.g., marking emails as
"spam" or "not spam").
2. Feature Engineering: Researchers typically define feature
functions to convert each problem instance (e.g., an email) into a
numerical representation (space of features). Features might include
word counts, presence of specific keywords, etc.
3. Model Training: A Machine Learning (ML) model is trained
using these features and labels. The model learns to predict
labels based on the features.
4. Application: The trained model is then applied to unseen data (new
emails) to make predictions.
Example (Inferred): Training a sentiment classifier on movie reviews
labeled as "positive" or "negative". The features might be word counts, and
the model learns weights for words associated with each sentiment.
Key Idea: Learning from labeled examples.
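The four steps above map directly onto a few lines of scikit-learn. The toy reviews and labels below are invented for illustration.

```python
# Supervised pipeline: annotation -> features -> training -> application.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# 1. Annotation: human-labeled examples.
reviews = ["great film, loved it", "wonderful acting",
           "terrible plot", "boring and awful"]
labels = ["positive", "positive", "negative", "negative"]

# 2. Feature engineering: word counts as the feature space.
vec = CountVectorizer()
X = vec.fit_transform(reviews)

# 3. Model training: learn per-word weights for each sentiment.
clf = LogisticRegression().fit(X, labels)

# 4. Application: predict labels for unseen data.
print(clf.predict(vec.transform(["loved the acting", "awful film"])))
# ['positive' 'negative']
```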
o (d) Semi-Supervised Systems:
Definition: These systems address the main drawbacks of supervised
learning: manual annotation is expensive and often yields too little
data to fully capture a phenomenon.
How they work: These approaches automatically expand the training
dataset:
Method 1: Using machine-generated output directly (e.g., using a
model trained on a small labeled set to label a larger unlabeled set,
then adding these machine-labeled examples to the training data).
Method 2: Bootstrapping off an existing model by having
humans correct its output on new data (a form of active learning or
human-in-the-loop).
Application: Often used to quickly adapt a model trained on one
domain (e.g., news articles) to a new domain (e.g., social media
posts) using a small amount of labeled data from the new domain
combined with larger amounts of unlabeled data.
Example (Inferred): Having a small set of customer emails labeled for
urgency. Train a model. Use this model to predict urgency on 10,000
unlabeled emails. Take the model's most confident predictions, add them to
the training set, and retrain. Or, have the model flag emails it's unsure
about for human review and correction.
Key Idea: Combining a small amount of labeled data with a large amount
of unlabeled data.
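A minimal self-training sketch of Method 1, assuming scikit-learn. The emails, labels, and the 0.8 confidence threshold are all invented for illustration.

```python
# Self-training: pseudo-label confident predictions, then retrain.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

labeled = ["server down, fix urgently", "outage, need help now",
           "thanks for the update", "see you at lunch"]
y = ["urgent", "urgent", "normal", "normal"]
unlabeled = ["production outage, respond now", "lunch menu attached",
             "urgent: database down", "weekly newsletter"]

vec = CountVectorizer().fit(labeled + unlabeled)
clf = LogisticRegression().fit(vec.transform(labeled), y)

# Pseudo-label the unlabeled pool; keep only confident predictions.
probs = clf.predict_proba(vec.transform(unlabeled))
preds = clf.predict(vec.transform(unlabeled))
confident = probs.max(axis=1) >= 0.8  # illustrative threshold

# Retrain on the expanded set (human labels + machine labels).
new_texts = labeled + [t for t, c in zip(unlabeled, confident) if c]
new_y = y + [p for p, c in zip(preds, confident) if c]
clf = LogisticRegression().fit(vec.transform(new_texts), new_y)
```

Method 2 would instead route the low-confidence cases to a human for correction before retraining.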
3. Dimension 2: Scope
o (a) Domain-Dependent Systems:
Definition: These systems are specialized for a particular domain; their
techniques are tailored to it and typically do not transfer elsewhere.
Example (Inferred): A named-entity recognizer built specifically for
biomedical abstracts, relying on domain vocabulary.
Key Idea: Specialization leads to potentially higher performance within
the domain but poor performance outside it.
o (b) Domain-Independent Systems:
Definition: These systems are general enough that their techniques can be
applied to multiple domains with little or no change.
Characteristics: They use methods that don't rely heavily on
domain-specific knowledge.
Example (Inferred): A general part-of-speech tagger designed to work
reasonably well on news text, emails, and web pages. A generic text
summarization algorithm.
Key Idea: Generality allows broad applicability, potentially at the cost of
peak performance in any single specific domain compared to a specialized
system.
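A small sketch of domain independence: the same generic NLTK tagger applied unchanged to news-style and informal text. This assumes a recent NLTK release; the downloadable resource names vary by version.

```python
# One generic part-of-speech tagger, reused across domains without change.
import nltk
nltk.download("punkt_tab", quiet=True)                       # NLTK >= 3.9 names
nltk.download("averaged_perceptron_tagger_eng", quiet=True)  # older releases differ

for text in ["Stocks fell sharply on Monday after the earnings report.",  # news
             "hey, did u get the file i sent yesterday?"]:                # informal email
    print(nltk.pos_tag(nltk.word_tokenize(text)))
```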
4. Dimension 3: Coverage
o (a) Shallow Systems:
Definition: These systems usually produce an intermediate representation,
which further processing must convert into something a machine can act on.
Example (Inferred): A part-of-speech tagger or phrase chunker that labels
structure without producing a full meaning representation.
Key Idea: Produces partial or intermediate analysis.
o (b) Deep Systems:
Definition: These systems usually create a terminal representation.
Characteristics: The output representation is intended to be directly
consumed by a machine or application to perform an action or make a
decision. It captures a more complete or actionable meaning.
Example (Inferred): A system that converts the natural language query
"Show me flights from London to Paris tomorrow" into a formal SQL
query that can be directly executed on a database. A system producing a
full logical form or semantic graph representing the sentence's meaning for
a reasoning engine.
Key Idea: Produces a complete, actionable meaning representation.
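A deliberately tiny sketch of a terminal representation: one fixed utterance pattern mapped to an executable SQL string. The pattern, table, and column names are invented, and a real deep system would parse far more flexibly.

```python
import re

# Illustrative only: one hard-coded pattern -> one SQL template.
PATTERN = re.compile(r"[Ss]how me flights from (\w+) to (\w+) (\w+)")

def to_sql(utterance):
    """Map a matching utterance to a directly executable SQL query."""
    m = PATTERN.match(utterance)
    if not m:
        raise ValueError("unsupported query")
    origin, dest, day = m.groups()
    return ("SELECT * FROM flights "
            f"WHERE origin = '{origin}' AND destination = '{dest}' "
            f"AND day = '{day}'")

print(to_sql("Show me flights from London to Paris tomorrow"))
# SELECT * FROM flights WHERE origin = 'London' AND destination = 'Paris' AND day = 'tomorrow'
```

The point is the output type: unlike a shallow tagger's intermediate labels, this string can be handed straight to a database.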
5. Summary
We have explored three key dimensions for categorizing NLP system paradigms based
on the provided text:
o System Architecture: How the system learns/works (Knowledge-based,
Unsupervised, Supervised, Semi-supervised).
o Scope: How broadly applicable it is (Domain-dependent vs.
Domain-independent).
o Coverage: How deep or complete the output representation is (Shallow vs.
Deep).
Important Note for Students: In practice, real-world NLP systems often represent
a combination of these characteristics. For instance, you might have
a supervised, domain-dependent, shallow system (like a sentiment classifier for product
reviews). Understanding these dimensions helps us analyze, compare, and design NLP
systems effectively, especially when considering trade-offs related to data availability,
cost, desired performance, and application requirements.
(Self-Correction/Clarification): Be aware that the term "System Paradigms" can
sometimes be used differently in other NLP contexts (e.g., specifically
for multilingual system architectures like language-specific vs. language-independent
models). The classification presented here (Architecture, Scope, Coverage) is the
specific framework provided in this text source.