AnandKumar Course Intro IT356

IT356 Natural Language Processing

Dr. Anand Kumar M,

Department of Information Technology
National Institute of Technology Karnataka
COURSE OUTCOMES
• To understand the fundamentals of NLP and Research
issues with solution approaches.
• To analyze the patterns and features in the text
automatically using natural language processing
(NLP) concepts including POS tagging, and parsing.
• To implement and evaluate the NLP applications
using machine learning techniques
• To design an NLP product for real-time applications in
various domains using the current approaches of NLP.

Course Plan:
• Introductory concepts of Linguistic systems, Language Modeling
and Sequence tagging, Word stemming, tokenization,
normalization, Part of Speech tagging, Traditional models of
distributional semantics.
• Unstructured Text Management, Word and Sentence embeddings,
n-gram models, Maximum Entropy models, Hidden Markov
Models, Viterbi Algorithm, Neural Language Models.
• Information Extraction, Named Entity Recognition, Relation
Extraction; Understanding Semantics, word sense and word
similarity, Lesk Algorithm, Wordnets, Topic Modeling, Dialog
Systems.
• Emerging trends, Research issues, challenges, interesting
applications in various domains.
Texts and References:
∙ Daniel Jurafsky and James H. Martin. "Speech and Language Processing: An Introduction to
Natural Language Processing, Computational Linguistics and Speech Recognition". Second
Edition. Prentice Hall, 2008
∙ Christopher D. Manning and Hinrich Schütze, "Foundations of Statistical Natural Language
Processing" MIT Press, 1999
∙ Turney, Peter D., and Patrick Pantel. "From frequency to meaning: Vector space models of
semantics." Journal of artificial intelligence research 37 (2010): 141-188.

∙ IMPORTANT NOTE:
1. Course Mini / Minor Project Proposal - Aug 16th, 2023
2. Mid Sem Project Progress Presentation - Sep 11th, 2023
3. Final Project Presentation and Demo - Oct 30th, 2023

Books etc.
• Main Text(s):
– Speech and NLP: Jurafsky and Martin
– Foundations of Statistical NLP: Manning and Schütze

• Journals
– Computational Linguistics, Natural Language Engineering, AI,
AI Magazine, IEEE SMC, TALIP, Computer Speech and
Language
• Conferences
– ACL, EACL, COLING, MT Summit, EMNLP, IJCNLP, HLT,
ICON, SIGIR, WWW, ICML, ECML, *ACL, FIRE, SPELLL
Assessment Type and COs

Assessment Type Course Outcomes (COs)

CO1 CO2 CO3 CO4
Mid Sem Theory Exam X X
End Sem Theory Exam X X
Lab Continuous Evaluations - Assignments X X X
Mini / Minor Project X X

Evaluation Plan
• Course Mini / Minor Project: 30%
• Continuous Evaluation & Assignments: 20%
• Mid Sem Exam: 20%
• End Sem Exam: 30%

• Conf/Journal Publication (Bonus Marks)

• Shared Task Competition/Hackathon etc

Course Minor Project (30%)
• IEEE/ACM Reputed Journals as base papers
• Core Conferences/ Shared Tasks
• Implementation
– Title/Topic/Team Proposal (5)
– Mid Sem Eval (10)
– End Sem Eval (15)
• Plagiarism-free report (not AI-generated)
• Conf/Journal Publication (Bonus Marks)
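
As a rough illustration of how the evaluation weights combine, here is a minimal sketch. The weights (30/20/20/30) and the project split (5 + 10 + 15) come from the slides; the function name `final_score`, the dictionary keys, and the example student's marks are hypothetical, and bonus marks are left out because the slides do not say how they are added.

```python
# Weights taken from the Evaluation Plan slide (percent of the final grade).
WEIGHTS = {
    "project": 30,      # Course Mini / Minor Project
    "continuous": 20,   # Continuous Evaluation & Assignments
    "mid_sem": 20,      # Mid Sem Exam
    "end_sem": 30,      # End Sem Exam
}


def final_score(fractions):
    """Combine per-component scores (each a fraction in [0, 1]) into a mark out of 100."""
    missing = set(WEIGHTS) - set(fractions)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * fractions[name] for name in WEIGHTS)


# Hypothetical student. Project marks follow the 5 + 10 + 15 split:
# 4/5 for the proposal, 8/10 at mid sem, 12/15 at end sem.
example = {
    "project": (4 + 8 + 12) / 30,
    "continuous": 0.75,
    "mid_sem": 0.5,
    "end_sem": 0.5,
}
total = final_score(example)
print(f"Final score: {total:.1f} / 100")
```

The component weights sum to 100, so the result reads directly as a percentage.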
Collaboration Works
• Winnipeg University, Canada
• University of Galway, Ireland
• Legal Summarization - NIT Trichy
• Telugu NLP - NIT-AP / IIIT-Hyd
• LLMs - Eduminster US
• Conversation System - ISRO
Some open Topics
• Finance NLP
• Medical Documents - Clinical NLP
• Education Documents - NLP
• Social Media Comments - Depression - Mental Well-being
• Legal Documents - Ontology - Document Retrieval
• LLMs - Llama 2 - ChatGPT
• Conversation System - QA - Chatbot
Some open Topics
• Sign Language Translation
• Financial Document Causality Detection
• Multimodal Argument Mining
• Violence Inciting Text Detection
• Multi-lingual Multi-task Information
Retrieval
• Ontology based Senticnet

NLP with AI and Deep learning
https://marutitech.com/use-cases-of-natural-language-processing-in-healthcare/
NLP in education
• Innovative Education Applications
• Educational Chatbots
• Automatic Essay/answer Grading – Quality
assessment
• Automatic Question/ Exercise generation
• Behavior analytics.
NLP for Finance and Agriculture
• Sentiment Analysis – Stock Prediction
• Chatbots for Financial/Investment suggestions
• Chatbots for Farmers (Regional Languages)
• Discovering crop disease trends using farmer
queries
• Terminology Extraction for Document
Matching and Open Data in the Agricultural
Domain
Sub domains
Bio-NLP
• Open Problems

https://towardsdatascience.com/summarising-the-latest-research-on-coronavirus-with-nlp-and-topic-modelling-28b867ad9860
The NLP Research Community

• Papers
– ACL Anthology has nearly everything, free!
• Over 60,000 papers!
• Free-text searchable
– Great way to learn about current research on a topic
– New search interfaces currently available in beta
» Find recent or highly cited work; follow citations
• Used as a dataset by various projects
– Analyzing the text of the papers (e.g., parsing it)
– Extracting a graph of papers, authors, and institutions
(Who wrote what? Who works where? What cites what?)
The NLP Research Community

• Conferences
– Most work in NLP is published as 9-page conference papers
with 3 double-blind reviewers.
– Main annual conferences: ACL, EMNLP, NAACL
• Also EACL, IJCNLP, COLING … and LREC!
• + various specialized conferences and workshops
– Big events, and growing fast! ACL 2020:
• > 2000 attendees
• 2244 full-length papers submitted (25% accepted)
• 1185 short papers submitted (18% accepted)
• 19 workshops on various topics
• “Best paper” awards – worth reading these papers
The NLP Research Community

• Datasets
– Raw text or speech corpora
• Or just their n-gram counts, for super-big corpora
• Various languages and genres
• Usually there’s some metadata (each document’s date, author, etc.)
• Sometimes  licensing restrictions (proprietary or copyright data)
– Text or speech with manual or automatic annotations
• What kind of annotations? That’s the rest of this lecture …
• May include translations into other languages
– Words and their relationships
• Morphological, semantic, translational, evolutionary
– Grammars
– World Atlas of Language Structures (WALS)
– Parameters of statistical models (e.g., grammar weights)
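
The slide above mentions that very large corpora are sometimes distributed only as n-gram counts. A minimal sketch of what such counts look like, assuming plain whitespace tokenization; the helper name `ngram_counts` and the toy sentence are made up for illustration and are not from the course material:

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous run of n tokens in the token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Toy corpus; a real corpus would need a proper tokenizer and normalization.
text = "the cat sat on the mat and the cat slept"
tokens = text.split()

unigrams = ngram_counts(tokens, 1)
bigrams = ngram_counts(tokens, 2)

# Show the most frequent bigrams with their counts.
for gram, count in bigrams.most_common(3):
    print(" ".join(gram), count)
```

A table like `bigrams` is all that an n-gram language model needs from the corpus, which is why counts alone can be shared when the raw text cannot.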
The NLP Research Community

• Datasets
– Read papers to find out what datasets others are using
• Linguistic Data Consortium (searchable) hosts many large datasets
• Many projects and competitions post data on their websites
• But sometimes you have to email the author for a copy
– CORPORA mailing list is also a good place to ask around
– LREC Conference publishes papers about new datasets & metrics
– Amazon Mechanical Turk – pay humans (very cheaply) to annotate your
data or to correct automatic annotations
• Old task, new domain: Annotate parses etc. on your kind of data
• New task: Annotate something new that you want your system to find
• Auxiliary task: Annotate something new that your system may benefit from
finding (e.g., annotate subjunctive mood to improve translation)
– Can you make annotation so much fun or so worthwhile
that they’ll do it for free?
Thank You
