AnandKumar Course Intro IT356

Uploaded by

Simhadri Sevitha
Copyright: © All Rights Reserved

IT356 Natural Language Processing

Dr. Anand Kumar M
Department of Information Technology
National Institute of Technology Karnataka
COURSE OUTCOMES
• To understand the fundamentals of NLP and its research
issues, along with solution approaches.
• To automatically analyze patterns and features in text
using natural language processing (NLP) concepts,
including POS tagging and parsing.
• To implement and evaluate NLP applications
using machine learning techniques.
• To design an NLP product for real-time applications in
various domains using current NLP approaches.

Course Plan:
• Introductory concepts of Linguistic systems, Language Modeling
and Sequence tagging, Word stemming, tokenization,
normalization, Part of Speech tagging, Traditional models of
distributional semantics.
• Unstructured Text Management, Word and Sentence embeddings,
n-gram models, Maximum Entropy models, Hidden Markov
Models, Viterbi Algorithm, Neural Language Models.
• Information Extraction, Named Entity Recognition, Relation
Extraction; Understanding Semantics, word sense and word
similarity, Lesk Algorithm, Wordnets, Topic Modeling, Dialog
Systems.
• Emerging trends, Research issues, challenges, interesting
applications in various domains.
Texts and References:
∙ Daniel Jurafsky and James H. Martin. "Speech and Language Processing: An Introduction to
Natural Language Processing, Computational Linguistics and Speech Recognition". Second
Edition. Prentice Hall, 2008.
∙ Christopher D. Manning and Hinrich Schütze. "Foundations of Statistical Natural Language
Processing". MIT Press, 1999.
∙ Peter D. Turney and Patrick Pantel. "From Frequency to Meaning: Vector Space Models of
Semantics". Journal of Artificial Intelligence Research 37 (2010): 141-188.

∙ IMPORTANT NOTE:
1. Course Mini / Minor Project Proposal - Aug 16th, 2023
2. Mid Sem Project Progress Presentation - Sep 11th, 2023
3. Final Project Presentation and Demo - Oct 30th, 2023

Books etc.
• Main Text(s):
– Speech and Language Processing: Jurafsky and Martin
– Foundations of Statistical NLP: Manning and Schütze

• Journals
– Computational Linguistics, Natural Language Engineering, AI,
AI Magazine, IEEE SMC, TALIP, Computer Speech and
Language
• Conferences
– ACL, EACL, COLING, MT Summit, EMNLP, IJCNLP, HLT,
ICON, SIGIR, WWW, ICML, ECML, *ACL, FIRE, SPELLL
Assessment Type and COs

Assessment Type / Course Outcomes (COs): CO1, CO2, CO3, CO4
Mid Sem Theory Exam: X X
End Sem Theory Exam: X X
Lab Continuous Evaluations - Assignments: X X X
Mini / Minor Project: X X

Evaluation Plan
• Course Mini / Minor Project: 30%
• Continuous Evaluation & Assignments: 20%
• Mid Sem Exam: 20%
• End Sem Exam: 30%

• Conf/Journal Publication (Bonus Marks)
• Shared Task Competition/Hackathon, etc.
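As a rough illustration of how the four weights above combine, the following minimal sketch computes a weighted total. The component names and the example scores are hypothetical, and the bonus items are left out because the slides do not say how they are counted.

```python
# Weights from the Evaluation Plan slide (percent of the final grade).
WEIGHTS = {"project": 30, "continuous": 20, "mid_sem": 20, "end_sem": 30}


def weighted_total(scores):
    """Combine per-component scores (each on a 0-100 scale) into one percentage."""
    assert sum(WEIGHTS.values()) == 100, "weights should add up to 100%"
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical scores for one student, invented for illustration only.
example = {"project": 80, "continuous": 90, "mid_sem": 70, "end_sem": 60}
print(weighted_total(example))  # 74.0
```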

Course Minor Project (30%)
• Reputed IEEE/ACM journals as base papers
• Core Conferences / Shared Tasks
• Implementation
– Title/Topic/Team Proposal (5)
– Mid Sem Eval (10)
– End Sem Eval (15)
• Plagiarism-free report, not AI-generated
• Conf/Journal Publication (Bonus Marks)
Collaboration Works
• Winnipeg University, Canada
• University of Galway, Ireland
• Legal Summarization - NIT Trichy
• Telugu NLP - NIT-AP / IIIT-Hyd
• LLMs - Eduminster US
• Conversation System - ISRO
Some open Topics
• Finance NLP
• Medical Documents - Clinical NLP
• Education Documents - NLP
• Social Media Comments - Depression - Mental
Well-being
• Legal Documents - Ontology - Document
Retrieval
• LLMs - Llama 2 - ChatGPT
• Conversation System - QA - Chatbot
Some open Topics
• Sign Language Translation
• Financial Document Causality Detection
• Multimodal Argument Mining
• Violence-Inciting Text Detection
• Multilingual Multi-task Information
Retrieval
• Ontology-based SenticNet

NLP with AI and Deep learning
https://marutitech.com/use-cases-of-natural-language-processing-in-healthcare/
NLP in education
• Innovative Education Applications
• Educational Chatbots
• Automatic Essay/answer Grading – Quality
assessment
• Automatic Question/ Exercise generation
• Behavior analytics.
NLP for Finance and Agriculture
• Sentiment Analysis – Stock Prediction
• Chatbots for Financial/Investment suggestions
• Chatbots for Farmers (Regional Languages)
• Discovering crop disease trends using farmer
queries
• Terminology Extraction for Document
Matching and Open Data in the Agricultural
Domain
Sub domains
Bio-NLP
• Open Problems

https://towardsdatascience.com/summarising-the-latest-research-on-coronavirus-with-nlp-and-topic-modelling-28b867ad9860
The NLP Research Community

• Papers
– ACL Anthology has nearly everything, free!
• Over 60,000 papers!
• Free-text searchable
– Great way to learn about current research on a topic
– New search interfaces currently available in beta
» Find recent or highly cited work; follow citations
• Used as a dataset by various projects
– Analyzing the text of the papers (e.g., parsing it)
– Extracting a graph of papers, authors, and institutions
(Who wrote what? Who works where? What cites what?)
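The "graph of papers, authors, and institutions" idea can be sketched with plain dictionaries. This is only an illustration: the records below are invented, and real Anthology metadata is richer and uses a different format.

```python
from collections import defaultdict

# Hypothetical paper records, invented for illustration only.
papers = [
    {"id": "P1", "authors": ["A. Rao", "B. Lee"], "cites": []},
    {"id": "P2", "authors": ["B. Lee"], "cites": ["P1"]},
    {"id": "P3", "authors": ["C. Diaz"], "cites": ["P1", "P2"]},
]

wrote = defaultdict(set)     # author -> ids of the papers they wrote
cited_by = defaultdict(set)  # paper id -> ids of the papers that cite it

for paper in papers:
    for author in paper["authors"]:
        wrote[author].add(paper["id"])
    for target in paper["cites"]:
        cited_by[target].add(paper["id"])

print(sorted(wrote["B. Lee"]))  # who wrote what?  ['P1', 'P2']
print(sorted(cited_by["P1"]))   # what cites what? ['P2', 'P3']
```

Institutions could be added the same way, as a third mapping from author to affiliation.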
The NLP Research Community

• Conferences
– Most work in NLP is published as 9-page conference papers
with 3 double-blind reviewers.
– Main annual conferences: ACL, EMNLP, NAACL
• Also EACL, IJCNLP, COLING … and LREC!
• + various specialized conferences and workshops
– Big events, and growing fast! ACL 2020:
• > 2000 attendees
• 2244 full-length papers submitted (25% accepted)
• 1185 short papers submitted (18% accepted)
• 19 workshops on various topics
• “Best paper” awards – worth reading these papers
The NLP Research Community

• Datasets
– Raw text or speech corpora
• Or just their n-gram counts, for super-big corpora
• Various languages and genres
• Usually there’s some metadata (each document’s date, author, etc.)
• Sometimes there are licensing restrictions (proprietary or copyright data)
– Text or speech with manual or automatic annotations
• What kind of annotations? That’s the rest of this lecture …
• May include translations into other languages
– Words and their relationships
• Morphological, semantic, translational, evolutionary
– Grammars
– World Atlas of Language Structures
– Parameters of statistical models (e.g., grammar weights)
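The point about releasing "just their n-gram counts" can be made concrete with a minimal sketch. The helper name `ngram_counts` and the toy sentence are assumptions made for illustration; a real corpus release would be computed over far more text and usually with a proper tokenizer.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous window of n tokens in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Toy corpus, invented for illustration only; split on whitespace.
tokens = "the cat sat on the mat because the cat was tired".split()
bigrams = ngram_counts(tokens, 2)

print(bigrams[("the", "cat")])  # 2
print(bigrams.most_common(1))   # [(('the', 'cat'), 2)]
```

Counts like these keep the statistics a language model needs while leaving out the running text itself.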
The NLP Research Community

• Datasets
– Read papers to find out what datasets others are using
• Linguistic Data Consortium (searchable) hosts many large datasets
• Many projects and competitions post data on their websites
• But sometimes you have to email the author for a copy
– CORPORA mailing list is also a good place to ask around
– LREC Conference publishes papers about new datasets & metrics
– Amazon Mechanical Turk – pay humans (very cheaply) to annotate your
data or to correct automatic annotations
• Old task, new domain: Annotate parses etc. on your kind of data
• New task: Annotate something new that you want your system to find
• Auxiliary task: Annotate something new that your system may benefit from
finding (e.g., annotate subjunctive mood to improve translation)
– Can you make annotation so much fun or so worthwhile
that they’ll do it for free?
Thank You
