Approaches To Natural Language Processing
1. DISTRIBUTIONAL APPROACHES
Distributional approaches include the large-scale statistical tactics of machine
learning and deep learning. These methods typically turn content into word vectors
for mathematical analysis and perform quite well at tasks such as part-of-speech
tagging (is this a noun or a verb?), dependency parsing (does this part of a sentence
modify another part?), and semantic relatedness (are these different words used in
similar ways?). These NLP tasks don’t rely on understanding the meaning of words,
but rather on the relationship between words themselves.
Such systems are broad, flexible, and scalable. They can be applied widely to
different types of text without the need for hand-engineered features or expert-
encoded domain knowledge. The downside is that they lack true understanding of
real-world semantics and pragmatics. Comparing words to other words, or words to
sentences, or sentences to sentences can all result in different outcomes.
Semantic similarity, for example, does not mean synonymy. A nearest neighbor
calculation may even deem antonyms as related.
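To illustrate how antonyms can end up as nearest neighbors, here is a minimal sketch that builds co-occurrence word vectors and compares them with cosine similarity. The corpus, the window size, and the function names are all assumptions made up for this example. Real distributional systems learn dense vectors from far larger corpora, but the effect is the same in kind: "hot" and "cold" appear in the same contexts, so their vectors are nearly identical.

```python
from collections import Counter
from math import sqrt

# Toy corpus, invented for illustration (not from the article).
CORPUS = [
    "the soup is hot today",
    "the soup is cold today",
    "the weather is hot today",
    "the weather is cold today",
    "the bike is red",
]


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = {}
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            context = tokens[lo:i] + tokens[i + 1:hi]
            vectors.setdefault(word, Counter()).update(context)
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def nearest_neighbor(word, vectors):
    """Return the other word whose context vector is most similar to `word`."""
    others = (w for w in vectors if w != word)
    return max(others, key=lambda w: cosine(vectors[word], vectors[w]))


vectors = cooccurrence_vectors(CORPUS)
print(nearest_neighbor("hot", vectors))  # cold
```

The vectors only record which words surround "hot" and "cold", so nothing in them tells the two meanings apart.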
Advanced modern neural network models, such as the end-to-end attentional
memory networks pioneered by Facebook or the joint multi-task model invented by
Salesforce, can handle simple question-answering tasks, but are still in early pilot
stages for consumer and enterprise use cases. Thus far, Facebook has only publicly
shown that a neural network trained on an absurdly simplified version of The Lord of
The Rings can figure out where the elusive One Ring is located.
Although distributional methods achieve breadth, they cannot handle depth.
Complex and nuanced questions that rely on linguistic sophistication and contextual
world knowledge have yet to be answered satisfactorily.
2. FRAME-BASED APPROACH
“A frame is a data-structure for representing a stereotyped situation,” explains Marvin
Minsky in his seminal 1974 paper called “A Framework For Representing
Knowledge.” Think of frames as a canonical representation for which specifics can
be interchanged.
Liang provides the example of a commercial transaction as a frame. In such
situations, you typically have a seller, a buyer, goods being exchanged, and an
exchange price.
Sentences that are syntactically different but semantically identical – such as
“Cynthia sold Bob the bike for $200” and “Bob bought the bike for $200 from
Cynthia” – can be fit into the same frame. Parsing then entails first identifying the
frame being used, then populating the specific frame parameters (here: Cynthia,
Bob, the bike, and $200).
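The two-step parse described above (identify the frame, then fill its slots) can be sketched as follows. This is only an illustration under stated assumptions: the `CommercialTransaction` class, the `PATTERNS` list, and the `parse` function are hypothetical names, and the regular expressions are hand-written for these two sentence shapes rather than taken from any real frame-semantic parser.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CommercialTransaction:
    """Hypothetical frame with the four slots named in the text."""
    seller: str
    buyer: str
    goods: str
    price: str


# Hand-written surface patterns (an assumption for this sketch), one per verb.
# Each named group corresponds to one slot of the frame.
PATTERNS = [
    re.compile(
        r"^(?P<seller>\w+) sold (?P<buyer>\w+) the (?P<goods>\w+) for (?P<price>\$\d+)$"
    ),
    re.compile(
        r"^(?P<buyer>\w+) bought the (?P<goods>\w+) for (?P<price>\$\d+) from (?P<seller>\w+)$"
    ),
]


def parse(sentence: str) -> Optional[CommercialTransaction]:
    """Return the filled frame, or None if no pattern for this frame applies."""
    text = sentence.strip().rstrip(".")
    for pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            return CommercialTransaction(**match.groupdict())
    return None


a = parse("Cynthia sold Bob the bike for $200")
b = parse("Bob bought the bike for $200 from Cynthia")
print(a == b)  # True
```

Both sentences produce the same filled frame. A sentence the patterns do not cover, such as "Cynthia visited the bike shop yesterday", returns `None`.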
The obvious downside of frames is that they require supervision. In some domains,
an expert must create them, which limits the scope of frame-based approaches.
Frames are also necessarily incomplete. Sentences such as “Cynthia visited the bike
shop yesterday” and “Cynthia bought the cheapest bike” cannot be adequately
analyzed with the frame we defined above.
3. MODEL-THEORETICAL APPROACH
The third category of semantic analysis falls under the model-theoretical approach.
To understand this approach, we’ll introduce two important linguistic concepts:
“model theory” and “compositionality”.
Model theory refers to the idea that sentences refer to the world, as is the case with
grounded language (e.g. “the block is blue”). Compositionality is the idea that the
meanings of the parts of a sentence can be combined to derive the meaning of the whole.
Liang compares this approach to turning language into computer programs. To
determine the answer to the query “what is the largest city in Europe by population”,
you first have to identify the concepts of “city” and “Europe” and funnel down your
search space to cities contained in Europe. Then you would need to compare the
population of each city you’ve shortlisted so far and return the city with the
largest one.
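The decomposition of the "largest city in Europe" query can be sketched as a small program built from composable parts. Everything here is an assumption for illustration: the `City` type, the helper names, and above all the `WORLD` data, whose city names and population figures are invented rather than real statistics.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class City:
    name: str
    continent: str
    population: int


# Toy world model. The city names and populations are made up for this sketch.
WORLD = [
    City("Alphaville", "Europe", 900_000),
    City("Betaburg", "Europe", 2_500_000),
    City("Gammatown", "Asia", 7_000_000),
]


def cities(world: Iterable[City]) -> List[City]:
    """Denotation of the concept 'city': every city in the world model."""
    return list(world)


def located_in(continent: str) -> Callable[[List[City]], List[City]]:
    """Denotation of 'in <continent>': a filter over a set of cities."""
    return lambda entities: [e for e in entities if e.continent == continent]


def largest_by(key: Callable[[City], int]) -> Callable[[List[City]], City]:
    """Denotation of 'largest ... by <key>': pick the maximum under `key`."""
    return lambda entities: max(entities, key=key)


def answer_largest_city_in(continent: str, world: Iterable[City] = WORLD) -> str:
    # Compose the parts: restrict the search space, then take the maximum.
    candidates = located_in(continent)(cities(world))
    return largest_by(lambda c: c.population)(candidates).name


print(answer_largest_city_in("Europe"))  # Betaburg
```

Each helper stands for the meaning of one piece of the question, and the answer comes from running the composed program against the world model.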
Executing the sentence “Remind me to buy milk after my last meeting on Monday”
requires a similar breakdown and recombination.
Models vary from needing heavy-handed supervision by experts to light supervision
from average humans on Mechanical Turk. The advantages of model-based
methods include full-world representation, rich semantics, and end-to-end
processing, which enable such approaches to answer difficult and nuanced search
queries.
The major con is that the applications are heavily limited in scope due to the need for
hand-engineered features. Applications of model-theoretic approaches to NLU
generally start from the easiest, most contained use cases and advance from there.
The holy grail of NLU is both breadth and depth, but in practice you need to trade off
between them. Distributional methods have scale and breadth, but shallow
understanding. Model-theoretical methods are labor-intensive and narrow in scope.
Frame-based methods lie in between.
4. INTERACTIVE LEARNING
Paul Grice, a British philosopher of language, described language as a cooperative
game between speaker and listener. Liang is inclined to agree. He believes that a
viable approach to tackling both breadth and depth in language learning is to employ
dynamic, interactive environments where humans teach computers gradually. In
such approaches, the pragmatic needs of language inform the development.
In such settings, the human teachers who take the longest to train the computer
often employ inconsistent terminology or illogical steps.
Liang’s bet is that such approaches would enable computers to solve NLP and NLU
problems end-to-end without explicit models. “Language is intrinsically interactive,”
he adds. “How do we represent knowledge, context, memory? Maybe we shouldn’t
be focused on creating better models, but rather better environments for interactive
learning.”
CONCLUSION
Language is both logical and emotional. We use words to describe both math and
poetry. Accommodating the wide range of our expressions in NLP and NLU
applications may entail combining the approaches outlined above, ranging from the
distributional / breadth-focused methods to model-based systems to interactive
learning environments. We may also need to re-think our approaches entirely, using
interactive, cooperative learning between humans and computers rather than
researcher-driven models.