Module 1 Lecture 1

NLP notes

Module #1

CSE 6002: Natural Language Processing Techniques
Recap from the Previous Lecture
• Course Handout Discussion
• Project Groups

What is NLP?
● Branch of Artificial Intelligence

● Combination of linguistics (understanding how languages work) and computer science (building systems to solve natural-language-related problems).

Stages of Processing
● Phonetics and phonology
● Morphology
● Lexical Analysis
● Syntactic Analysis
● Semantic Analysis
● Pragmatics and Discourse

Challenges Associated with Phonetics / Speech
● Homophones - Words that sound the same or similar.
○ After Mahatma Gandhi was killed by Godse, India was mourning. However,
that did not stop some kids playing in the evening in a park. Someone once
asked them “Why are you playing? It is mourning time now.” To which one
of the kids said “Sir. It is not morning time, but it is evening, and we have
just finished our homework!”
● Word boundary - Where to split the words in speech
○ I got a plate.
○ I got up late.
● Disfluency - Filler sounds such as "ah" and "um" that interrupt fluent speech.

Morphology
● Word formation from root words and morphemes
○ E.g., number (teacher + s = teachers), gender (lion + ess = lioness), tense (listen + ing = listening), etc.
● First step in NLP - extract the morphemes of the given word
● Morphologically rich languages - Dravidian languages (e.g., Kannada, Tamil, Telugu, etc.)
○ Example: Maadidhanu = Maadu (root verb) + past tense + masculine singular
● Morphologically poor languages - English
○ Example: Did = Do (root verb) + past tense
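The English examples above (teacher + s, lion + ess, listen + ing) can be illustrated with a minimal suffix-stripping sketch. The root list, the suffix rules, and the function name `analyse` are assumptions made up for this illustration; a real morphological analyser needs a far larger lexicon and must handle spelling changes and irregular forms such as "did".

```python
# Toy morphological analyser: splits a word into root + suffix.
# KNOWN_ROOTS and SUFFIX_RULES are hypothetical and cover only the slide's examples.
KNOWN_ROOTS = {"teacher", "lion", "listen"}

# Longer suffixes are listed first so that "ess" is tried before "s".
SUFFIX_RULES = [
    ("ess", "gender: feminine"),
    ("ing", "tense: progressive"),
    ("s", "number: plural"),
]


def analyse(word):
    """Return (root, suffix, feature) for a known root, or None if no rule applies."""
    word = word.lower()
    if word in KNOWN_ROOTS:
        return (word, "", "root")
    for suffix, feature in SUFFIX_RULES:
        if word.endswith(suffix) and word[: -len(suffix)] in KNOWN_ROOTS:
            return (word[: -len(suffix)], suffix, feature)
    return None


if __name__ == "__main__":
    for w in ["teachers", "lioness", "listening", "did"]:
        print(w, "->", analyse(w))
```

Note that "did" returns None here: irregular forms cannot be handled by stripping suffixes and need a lookup table or a richer model.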

Lexical Analysis
● A word can have several meanings.
● A meaning can be expressed by several words.
Example:
● Where there’s a will…
● …there are many relatives. (The pun shifts "will" from "determination" to "legal testament".)

Lexical Disambiguation
● Part of Speech disambiguation.
○ Love - is it a verb ("I love to eat sushi") or a noun ("God’s love is so wonderful")?

● Sense disambiguation.
○ Bank - "I went to the bank on the river to buy fish." vs. "I went to the bank on the river to withdraw Rs. 1000."
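One simple way to approach sense disambiguation is to count how many context words overlap with a list of clue words for each sense (a much-simplified variant of the Lesk idea). The sketch below is an illustration only: the sense labels and clue sets are hypothetical, hand-written for the two "bank" sentences above.

```python
import re

# Hypothetical clue words per sense of "bank"; a real system would derive
# these from dictionary glosses or a sense-annotated corpus.
SENSE_CLUES = {
    "bank/river": {"river", "shore", "water", "fish", "boat"},
    "bank/finance": {"money", "withdraw", "deposit", "account", "loan", "rs"},
}


def disambiguate_bank(sentence):
    """Pick the sense whose clue set overlaps most with the sentence's words."""
    context = set(re.findall(r"[a-z]+", sentence.lower()))
    scores = {sense: len(context & clues) for sense, clues in SENSE_CLUES.items()}
    # On a tie, max() returns the first sense listed in SENSE_CLUES.
    return max(scores, key=scores.get)


if __name__ == "__main__":
    print(disambiguate_bank("I went to the bank on the river to buy fish."))
    print(disambiguate_bank("I went to the bank on the river to withdraw Rs. 1000"))
```

Both sentences contain "river", so that word alone cannot decide the sense; the second sentence is resolved only because "withdraw" and "Rs" outweigh it.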

Syntactic Analysis
• Consider the sentence “I like mangoes”

          S
        /   \
      NP     VP
      |     /  \
      N    V    NP
      |    |    |
      I  like   N
                |
             mangoes
Syntactic Analysis
● S -> NP VP
● NP -> N
● VP -> V NP
● N -> Noun (mangoes) / Pronoun (I)
● V -> Verb (like)
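The rules above can be turned into a small top-down parser. The following is a minimal sketch, not a production parser: the function names and the tuple-based tree representation are choices made for this illustration, and the lexical rule for the verb is written as V -> like so that it fits VP -> V NP.

```python
# Phrase-structure rules from the slide (non-terminals only).
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["N"]],
    "VP": [["V", "NP"]],
}
# Lexical rules: which words each pre-terminal can produce.
LEXICON = {"N": {"I", "mangoes"}, "V": {"like"}}


def parse(symbol, tokens, start):
    """Yield (tree, next_position) for every way `symbol` can cover tokens from `start`."""
    if symbol in LEXICON:
        if start < len(tokens) and tokens[start] in LEXICON[symbol]:
            yield (symbol, tokens[start]), start + 1
        return
    for rhs in GRAMMAR[symbol]:
        yield from expand(symbol, rhs, tokens, start)


def expand(symbol, rhs, tokens, start):
    """Match the right-hand-side symbols one after another, left to right."""
    partials = [([], start)]
    for child in rhs:
        next_partials = []
        for children, pos in partials:
            for subtree, new_pos in parse(child, tokens, pos):
                next_partials.append((children + [subtree], new_pos))
        partials = next_partials
    for children, pos in partials:
        yield (symbol, *children), pos


def parse_sentence(sentence):
    """Return all parse trees rooted in S that cover the whole sentence."""
    tokens = sentence.split()
    return [tree for tree, end in parse("S", tokens, 0) if end == len(tokens)]


if __name__ == "__main__":
    print(parse_sentence("I like mangoes"))
```

A word order the grammar does not license, such as "like I mangoes", yields an empty list.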

Ambiguity in Parsing
● Natural Language Ambiguity:
I saw a boy with a telescope.
(Who has the telescope?)
● Design Ambiguity:
I saw a boy with a telescope which I dropped. vs. I saw a boy with a telescope which he dropped.
(Would a probabilistic parser generate the same parse tree for both sentences?)
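The telescope sentence is ambiguous because the phrase "with a telescope" can attach either to the verb ("saw ... with a telescope") or to the noun ("a boy with a telescope"). The sketch below counts the parse trees with a CKY-style chart. The grammar is an assumption written for this one sentence (it is not from the lecture) and is already in binary form.

```python
from collections import defaultdict

# Hypothetical binary grammar: (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # PP attaches to the verb phrase
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # PP attaches to the noun phrase
    ("PP", "P", "NP"),
]
LEXICON = {
    "I": ["NP"], "saw": ["V"], "a": ["Det"],
    "boy": ["N"], "telescope": ["N"], "with": ["P"],
}


def count_parses(sentence, root="S"):
    """Count the distinct parse trees for `sentence` under the toy grammar."""
    tokens = sentence.split()
    n = len(tokens)
    chart = defaultdict(int)  # (start, end, symbol) -> number of subtrees
    for i, token in enumerate(tokens):
        for category in LEXICON.get(token, []):
            chart[(i, i + 1, category)] += 1
    for width in range(2, n + 1):
        for start in range(0, n - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for parent, left, right in BINARY_RULES:
                    chart[(start, end, parent)] += (
                        chart[(start, mid, left)] * chart[(mid, end, right)]
                    )
    return chart[(0, n, root)]


if __name__ == "__main__":
    print(count_parses("I saw a boy with a telescope"))
```

Under this grammar the sentence has two parses, one per attachment site, while "I saw a boy" has only one. Choosing between the two requires extra information such as rule probabilities or context.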

Semantic Analysis
● Semantic Analysis involves assigning semantic roles to entities in the
text.
Example: John gave the book to Mary.
Agent: John, Recipient: Mary, Object / Theme: the book, etc.
● Semantic ambiguity:
Example: Visiting people involves a lot of work. (Is the work in going to visit people, or in dealing with people who visit?)

Pragmatics and Discourse
● Study of contexts in which language is used.
○ Example: Coreference Resolution.
● Very hard problem. It requires successful (or at least satisfactory) solutions to the problems of the earlier stages.
● Disambiguation clues need not be present within the same
sentence, but can be present anywhere in the text!

History of NLP

The Imitation Game

● An interrogator must identify which participant is the machine and which is the human.

ELIZA

● The Turing Test inspired the creation of ELIZA (1966), an early conversational program written by Joseph Weizenbaum.

Georgetown Experiment

● A joint demonstration of machine translation by IBM and Georgetown University in 1954.
● Involved translation of 60+ sentences from Russian to English.
● The system used 6 grammar rules and 250 lexical items (stems + endings).
● Initially, it prompted governments to invest heavily in MT and NLP research. However, real progress was much slower than expected!

ALPAC Report

● ALPAC (Automatic Language Processing Advisory Committee) was formed in 1964 to evaluate the progress in NLP in general and MT in particular.
● Its report, published in 1966, was sceptical of the research done in the previous decade and led to large funding cuts for computational linguistics.

Syntactic Structures and Conceptual Dependency Theory

● In 1957, Noam Chomsky published Syntactic Structures, which revolutionized linguistics and grammar.
○ Chomsky used phrase structure rules to generate new sentences.
○ Gave examples of grammatically correct sentences without any meaning.
Example: “Colourless green ideas sleep furiously”
○ Advocated a separation of syntax from semantics

● In 1969, Roger Schank introduced conceptual dependency theory, which is used for natural language understanding.

From Rules to Data

● Starting from the 1980s, NLP has moved from rule-based systems to statistical systems, driven by the availability of data.
● With data, we can use probability theory to build reasonably robust
systems for language modeling, machine translation, etc.
○ Example: Which one is correct in each pair and why?
■ I saw an elephant. Vs. I saw an equipment.
■ An European war is currently going on. Vs. A European war is currently going on.
■ Tell me something. Vs. Say me something.
● A model estimated from data assigns a higher probability to the correct sentence in each pair, so no hand-written rule is needed.
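To make the idea concrete, here is a minimal sketch of a bigram language model with add-one smoothing that prefers "Tell me something" over "Say me something". The five-sentence corpus is invented for this illustration (an assumption, not data from the lecture); a real model would be trained on millions of sentences.

```python
import math
from collections import Counter

# Hypothetical toy corpus, made up for this example.
TOY_CORPUS = [
    "tell me something",
    "tell me a story",
    "say something nice",
    "say it again",
    "tell me more",
]


def train_bigram_model(sentences):
    """Count bigrams and their left contexts; return the counts and vocabulary size."""
    context_counts = Counter()
    bigram_counts = Counter()
    vocab = set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts, len(vocab)


def log_prob(sentence, model):
    """Log-probability of a sentence under the add-one-smoothed bigram model."""
    context_counts, bigram_counts, vocab_size = model
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        numerator = bigram_counts[(prev, cur)] + 1
        denominator = context_counts[prev] + vocab_size
        total += math.log(numerator / denominator)
    return total


if __name__ == "__main__":
    model = train_bigram_model(TOY_CORPUS)
    for s in ["tell me something", "say me something"]:
        print(s, round(log_prob(s, model), 3))
```

"say me" never occurs in the corpus while "tell me" occurs three times, so the first sentence scores higher. Smoothing keeps the unseen bigram from receiving zero probability.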

Example of Machine Translation

● Earlier approach - Rule-based Machine Translation
● Linguists would write rules for both the source language and the target language.
● Dictionaries were used to map words to root words, morphemes, etc.
● Limited in scope. Could not account for many challenges in MT.

Example of Machine Translation

● More modern approach - Phrase-based Machine Translation


● Uses a parallel corpus, where sentences in the source language are
mapped to their equivalent in the target language.
● From this, phrases (or n-grams) are mapped from the source to the
target language.
● Example: India’s Prime Minister (EN) <-> Bhaarth ka Pradhaan Mantri (HI)
● Uses data to maximize the alignment probability, and language modeling to produce correct target-language sentences.
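The phrase-mapping step can be sketched as a phrase table whose translation probabilities are relative frequencies of aligned phrase pairs. This is a simplified illustration: the aligned pairs below are invented (including one deliberately wrong pair to mimic alignment noise), and the sketch leaves out both the learning of alignments from the parallel corpus and the language model that scores the target sentence.

```python
from collections import Counter, defaultdict

# Hypothetical phrase pairs, as if already extracted from a word-aligned
# EN-HI parallel corpus. The third pair is intentionally a noisy alignment.
ALIGNED_PHRASES = [
    ("prime minister", "pradhaan mantri"),
    ("prime minister", "pradhaan mantri"),
    ("prime minister", "mukhya mantri"),
    ("india's", "bhaarth ka"),
]


def build_phrase_table(aligned_pairs):
    """Estimate P(target | source) as count(source, target) / count(source)."""
    pair_counts = Counter(aligned_pairs)
    source_counts = Counter(src for src, _ in aligned_pairs)
    table = defaultdict(dict)
    for (src, tgt), count in pair_counts.items():
        table[src][tgt] = count / source_counts[src]
    return table


def best_translation(table, source_phrase):
    """Return the most probable target phrase, or None for an unseen source phrase."""
    candidates = table.get(source_phrase)
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    table = build_phrase_table(ALIGNED_PHRASES)
    print(dict(table))
    print(best_translation(table, "prime minister"))
```

Because the correct pair is seen more often than the noisy one, it receives the higher probability, which is how frequency in the data filters out alignment errors.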

Research Activities in NLP

Organizations for Research in NLP

● Association for Computational Linguistics (ACL)
● International Committee on Computational Linguistics (ICCL)
● European Language Resources Association (ELRA)
● NLP Association of India (NLPAI)

Publication Fora in NLP - Journals

● Computational Linguistics (CL)
● Transactions of the Association for Computational Linguistics (TACL)
● Natural Language Engineering (NLE)
● ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP)
● Language Resources and Evaluation Journal (LREJ)
● ……………

Publication Fora in NLP - Conferences

● CORE A* rated conference
○ ACL - Annual Meeting of the Association for Computational Linguistics
● CORE A rated conferences
○ EMNLP - SIGDAT Conference on Empirical Methods in Natural Language Processing
○ NAACL - Conference of the North American Chapter of the Association for Computational Linguistics
○ EACL - Conference of the European Chapter of the Association for Computational Linguistics
○ COLING - International Conference on Computational Linguistics (organized by ICCL)
○ CoNLL - Conference on Computational Natural Language Learning
○ AACL - Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics
● CORE B rated conferences
○ IJCNLP - International Joint Conference on Natural Language Processing
○ SIGDIAL - Annual Meeting of the Special Interest Group on Discourse and Dialogue
● CORE C rated conferences
○ LREC - Language Resources and Evaluation Conference
● Unrated conferences (but still good venues)
○ ICON - International Conference on Natural Language Processing

Publication Fora in NLP - Workshops

● Focused on a specific research topic or research area


● Examples:
○ WMT - Workshop on Machine Translation
○ BEA - Workshop on NLP for Building Educational Applications
○ SIGBioMed - SIGBioMed Workshop on Biomedical Language Processing
○ BUCC - Workshop on Building and Using Comparable Corpora
○ FEVER - Workshop on Fact Extraction and Verification
○ CMCL - Workshop on Cognitive Modeling and Computational Linguistics
○ ……………

Research Areas in NLP (from ACL 2022 CFP)
● Dialogue and Interactive Systems
● Discourse and Pragmatics
● Ethics and NLP
● Generation
● Information Retrieval and Text Mining
● Interpretability and Analysis of Models for NLP
● Linguistic Theories, Cognitive Modeling, and Psycholinguistics
● Machine Translation and Multilinguality
● Multimodality
● NLP Applications
● Phonology, Morphology, and Word Segmentation
● Question Answering
● Resources and Evaluation
● Semantics
● Sentiment Analysis, Stylistic Analysis, and Argument Mining
● Speech
● Summarization
● Tagging, Chunking and Parsing

Where to find research in NLP

1. ACL Anthology - https://aclanthology.org/
a. Best place to look for NLP research.
b. Indexes papers by author, collaborators, etc., specifically for NLP.
2. arXiv - https://arxiv.org/
a. Hosts many preprints (i.e., work that has not necessarily been peer-reviewed or formally published).
b. Anyone can upload research on arXiv
c. Great way to disseminate research
3. Google Scholar - https://scholar.google.com/
a. Searches across arXiv, the ACL Anthology, and other sources (IJCAI, AAAI, NeurIPS, ACM Digital Library, other journals, etc.)
