
Natural Language Processing

CS 6320
Lecture 1
Introduction to NLP

Instructor: Sanda Harabagiu

Definition
• NLP is concerned with the computational techniques used for processing human language. It creates and implements computer models for the purpose of performing various natural language tasks.
• These tasks include:
  • Mundane applications, e.g. word counting, spell checking, automatic hyphenation
  • Cutting-edge applications, e.g. automated question answering on the Web, building NL interfaces to databases, machine translation, and others.
• What distinguishes these applications from other data-processing applications is their use of knowledge of language.
• NLP is playing an increasing role in curbing the information explosion on the Internet and in corporate America.
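Even a "mundane" task like word counting takes only a few lines. Below is a minimal sketch; the regex tokenizer and the sample sentence are illustrative choices, not part of the lecture:

```python
from collections import Counter
import re

def word_counts(text):
    """Count word tokens after lowercasing; a toy regex tokenizer, not a full one."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

counts = word_counts("The cat sat on the mat. The mat was flat.")
print(counts.most_common(2))  # [('the', 3), ('mat', 2)]
```

Even here a little knowledge of language sneaks in: the tokenizer must decide what counts as a word (hyphens, apostrophes, case), which is exactly the "knowledge of language" the slide refers to.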
AI vs. NLP
• People refer to many AI techniques, like ChatGPT, which are in fact novel NLP methods using GPT-3 in an interactive mode.
• GPT-4 and GPT-3 are Large Language Models (LLMs)
• LLMs have recently revolutionized NLP
• BERT was another important NLP milestone
• NLP nowadays uses Deep Learning techniques
But understanding how language works, and what aspects of natural language processing we need to be aware of, still requires comprehension of classical NLP techniques.
Related areas
• NLP is a difficult and largely unsolved problem. One reason for this is its multidisciplinary nature:
  • Linguistics: how words, phrases, and sentences are formed.
  • Psycholinguistics: how people understand and communicate using human language.
  • Cognitive Modeling: deals with models and computational aspects of NL (e.g. algorithms).
Related areas
• Philosophy: relates to the semantics of language; the notion of meaning, how words identify objects. NLP requires considerable knowledge about the world.
• Computer science: model formulation and implementation using modern methods.
• Artificial intelligence: issues related to knowledge representation and reasoning.
• Statistics: many NLP problems are modeled using probabilistic models.
• Machine learning: automatic learning of rules and procedures based on lexical, syntactic and semantic features.
• NL Engineering: implementation of large, realistic systems. Modern software development methods play an important role.
Applications of NLP
• Text-based applications:
  • Finding documents on certain topics (document classification)
  • Information extraction: extract information related to events, relations, concepts
  • Complete understanding of texts: requires a deep structure analysis
  • Reading comprehension
  • Translation from one language to another
  • Summarization
  • Knowledge acquisition
  • Question answering
• Dialogue-based applications (involve human-machine communication):
  • Conversational agents
  • Tutoring systems
  • Problem solving
  • Speech processing
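Document classification, the first text-based application above, can be sketched with a toy keyword-overlap classifier. The topic keyword lists below are invented for illustration; real systems learn such associations from data rather than hard-coding them:

```python
from collections import Counter
import re

# Assumed toy topic keyword lists (illustrative only, not from the lecture).
TOPICS = {
    "sports": {"game", "team", "score", "win"},
    "finance": {"stock", "market", "price", "bank"},
}

def classify(text):
    """Pick the topic whose keyword set overlaps the document most (bag-of-words sketch)."""
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = {topic: sum(tokens[w] for w in kws) for topic, kws in TOPICS.items()}
    return max(scores, key=scores.get)

print(classify("The team played a great game and the final score was a win."))  # sports
```

The bag-of-words assumption (word identity matters, word order does not) is the simplest possible model; it is enough for topic routing but far from the "complete understanding" the slide distinguishes it from.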
Basic levels of language processing 1/2
1. Phonetic knowledge - how words are related to the sounds that realize them. Essential for speech processing.
2. Morphological knowledge - how words are constructed, e.g. friend, friendly, unfriendly, friendliness.
3. Syntactic knowledge - how words can be put together to form correct sentences, and the role each word plays in the sentence, e.g.:
   John ate the cake.
4. Semantic knowledge - word and sentence meaning:
   They saw a log.
   They saw a log yesterday.
   He saws a log.
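Morphological knowledge (item 2 above) can be made concrete with a toy affix stripper for the friend/friendly/unfriendliness family. The affix inventory and the y→i spelling rule below are hard-coded assumptions for this one example, not a general morphological analyzer:

```python
def analyze(word):
    """Split a word into a toy (stem, affixes) pair; illustrative affix list only."""
    affixes = []
    if word.startswith("un"):          # assumed prefix inventory: just "un-"
        affixes.append("un-")
        word = word[2:]
    stripped = True
    while stripped:                    # peel suffixes from the outside in
        stripped = False
        for suf in ("ness", "ly"):     # assumed suffix inventory
            if word.endswith(suf) and len(word) > len(suf) + 3:
                affixes.append("-" + suf)
                word = word[: -len(suf)]
                if word.endswith("i"): # undo the y -> i spelling change (friendli -> friendly)
                    word = word[:-1] + "y"
                stripped = True
                break
    return word, affixes

print(analyze("unfriendliness"))  # ('friend', ['un-', '-ness', '-ly'])
```

The spelling-change rule is the interesting part: real morphological analyzers need exactly this kind of orthographic machinery (typically as finite-state transducers), which is why morphology is listed as its own level of knowledge.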
Basic levels of language processing 2/2
5. Pragmatic knowledge - how sentences are used in different situations (or contexts).
   Mary grabbed her umbrella.
   a) It is a cloudy day.
   b) She was afraid of dogs.
6. Discourse knowledge - how the meaning of words and sentences is affected by the preceding sentences; pronoun resolution.
   John gave his bike to Bill.
   He didn't care much for it anyway.
7. World knowledge - the vast amount of knowledge necessary to understand texts. Used to identify beliefs, goals.
8. Language generation - having the machine generate coherent text or speech.
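The pronoun-resolution problem under discourse knowledge (item 6) can be sketched with a naive recency heuristic. The gender lexicon below is invented for illustration:

```python
# Assumed toy lexicons (illustrative only).
GENDER = {"john": "male", "bill": "male", "mary": "female"}
PRONOUNS = {"he": "male", "his": "male", "she": "female", "her": "female"}

def resolve_pronouns(tokens):
    """Link each pronoun to the most recent gender-matching name (recency heuristic)."""
    antecedents = []   # (token index, gender) of names seen so far
    links = {}         # pronoun index -> antecedent index
    for i, tok in enumerate(tokens):
        w = tok.lower()
        if w in PRONOUNS:
            for j, g in reversed(antecedents):
                if g == PRONOUNS[w]:
                    links[i] = j
                    break
        elif w in GENDER:
            antecedents.append((i, GENDER[w]))
    return links

tokens = "John gave his bike to Bill . He didn't care much for it anyway .".split()
links = resolve_pronouns(tokens)
print({tokens[p]: tokens[a] for p, a in links.items()})  # {'his': 'John', 'He': 'Bill'}
```

The heuristic links "He" to Bill purely because Bill is the most recent matching name; deciding whether that reading is actually right requires the world knowledge of item 7, which is the point of the slide.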
Examples of NLP difficulties
1. Syntactic ambiguity - when a word has more than one part of speech:
   Example: Rice flies like sand.
   Note that these syntactic ambiguities lead to different parse structures. Sometimes it is possible to use grammar rules (like subject-verb agreement) to disambiguate:
   Flying planes are dangerous.
   Flying planes is dangerous.
2. Semantic ambiguity - when a word has more than one possible meaning (or sense):
   John killed the wolf.
   John killed the project.
   John killed that bottle of wine.
   John killed Jane. (at tennis, or murdered her)
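The syntactic ambiguity in "Rice flies like sand" can be quantified by enumerating every tag sequence a lexicon licenses. The toy lexicon below is an assumption for illustration:

```python
from itertools import product

# Assumed toy lexicon: word -> possible parts of speech (illustrative only).
LEXICON = {
    "rice": ["NOUN"],
    "flies": ["NOUN", "VERB"],
    "like": ["VERB", "PREP"],
    "sand": ["NOUN", "VERB"],
}

def candidate_taggings(words):
    """Enumerate every POS assignment the lexicon allows for the sentence."""
    return list(product(*(LEXICON[w] for w in words)))

words = "rice flies like sand".split()
print(len(candidate_taggings(words)))  # 1 * 2 * 2 * 2 = 8 candidate tag sequences
```

The count grows multiplicatively with sentence length, which is why POS taggers use context (grammar rules like subject-verb agreement, or learned statistics) rather than enumerating candidates.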
More Examples of NLP difficulties
3. Structural ambiguity - when a sentence has more than one possible parse structure, e.g. prepositional attachment.
   Example:
   John saw the boy in the park with a telescope.
   [Figure: two alternative parse trees of this sentence]
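The number of structural readings can be enumerated mechanically: each PP attaches to some head on its left, subject to a no-crossing constraint on attachment arcs. The token positions and attachment sites below are hand-coded for this one sentence:

```python
from itertools import product

def crosses(arc1, arc2):
    """Two attachment arcs cross iff one starts strictly inside the other and ends outside."""
    (a, b), (c, d) = sorted([tuple(sorted(arc1)), tuple(sorted(arc2))])
    return a < c < b < d

# Token positions in "John saw the boy in the park with a telescope":
# John=0 saw=1 the=2 boy=3 in=4 the=5 park=6 with=7 a=8 telescope=9
pp1, pp2 = 4, 7                          # start positions of the two PPs
sites_pp1 = {"saw": 1, "boy": 3}         # heads "in the park" may attach to
sites_pp2 = {"saw": 1, "boy": 3, "park": 6}

parses = [
    (h1, h2)
    for (h1, p1), (h2, p2) in product(sites_pp1.items(), sites_pp2.items())
    if not crosses((p1, pp1), (p2, pp2))
]
print(len(parses), parses)  # 5 non-crossing attachment combinations
```

With two PPs there are already five readings, and the count grows with the Catalan numbers as PPs are added; this combinatorial explosion is why PP attachment is a classic hard problem for parsers.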
Additional NLP difficulties

Ambiguities of a sentence:
   Example:
   I made her duck.
Possible interpretations:
1. I cooked waterfowl for her.
2. I cooked waterfowl belonging to her.
3. I created the (plaster?) duck she owns.
4. I caused her to quickly lower her head or body.
5. I waved my magic wand and turned her into undifferentiated waterfowl.
State of the art in NLP Research 1/2

• NLP publications:
  • Association for Computational Linguistics (ACL):
    • Conferences: ACL, HLT-NAACL, EACL, EMNLP
    • Journals: Computational Linguistics, TACL
    • On the Web: http://aclweb.org
  • AAAI - proceedings every year.
  • IJCAI - proceedings every year.
  • The Web Conference
  • Natural Language Engineering (journal).


State of the art in NLP Research 2/2
• Machine-Readable Dictionaries (MRD): WordNet, LDOCE.
• Large corpora:
  • Penn Treebank - contains 2-3 months of Wall Street Journal articles (~0.5 million words of English, POS-tagged and parsed),
  • Brown corpus,
  • SemCor,
  • Google Gigaword.
• Neural Language Processing
Neural Language Learning
• Nowadays, it is the "de facto" way of doing NLP.
• BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  • https://www.aclweb.org/anthology/N19-1423.pdf
  • BERT is designed to pretrain deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.
Evaluation of NLP systems
• The General Language Understanding Evaluation (GLUE) benchmark (Wang et al., 2018a) is a collection of diverse natural language understanding tasks.
  • https://www.aclweb.org/anthology/W18-5446.pdf
  • TAKE-HOME LESSON: in order to build neural language processing systems, we rely on vast annotated datasets.
• It is IMPOSSIBLE to build NLP systems without looking at and deeply understanding the texts.
• ANNOTATION experience is KEY!!!
