Lecture-1-Introduction To Natural Language Processing-2021


DE-III

Natural Language Processing


Dr. Yashodhara Haribhakta
Department of Computer Engg. & I.T.,
College of Engineering Pune
Email: [email protected]
Theory Assessment
T1/Quiz/Surprise test – 20 marks
T2/Quiz/Surprise test – 20 marks
End-Sem – 60 marks (Prorated for 30 marks)
Oral Examination – 30 marks (Internal + External Examiner)

Note:
1) A minimum of 40 marks is required to pass this subject.
2) The above End-Sem evaluation is subject to change as per guidelines from the Dean Academics.
NLP-Syllabus
Course Objectives
• Introduce the fundamental concepts and techniques of natural language processing (NLP) by studying phonological, morphological, syntactic and semantic processing.
• To gain an in-depth understanding of algorithms available for the processing of linguistic text information and the underlying computational properties of natural languages.
[Figure: NLP Trinity, with three axes. Problem: Morph Analysis, Part of Speech Tagging, Parsing, Semantics. Language: Hindi, Marathi, English, French. Algorithm: HMM, MEMM, CRF, RNN.]
Why do we need to study NLP?
Introduction

• What is NLP?
― Processing text data so that useful information can be inferred from it.
• What is the main goal of NLP?
1. Fundamental and scientific goal
– Deep understanding of natural language
2. Practical and engineering goal
– Design, implement and test systems that process natural language for practical applications
Course Outcomes
Students should be able to:
1. Demonstrate an understanding of basic text processing techniques in NLP
2. Analyze morphological analyzers and stemmers
3. Build language models and demonstrate WSD using the WordNet knowledge base for the English language
4. Design, implement and evaluate POS taggers and parsers
NLP Perspective in AI
Language Families
Multilinguality: Indian situation
• Language families
– Indo Aryan
– Dravidian
– Austro-Asiatic
– Tibeto-Burman
• Languages ranked within the top 20 in the world by number of speakers:
– Hindi: 3rd (~350 million)
– Bangla: 7th (~230 million)
– Marathi: 15th (~84 million)
Natural Language Processing: Background & Relevance in Indian Scenario
Background: Indian Context
• India is a multilingual country with great linguistic and cultural diversity
• 22 official languages are listed in the Indian constitution
• However, the Census of India in 2011 reported:
– 121 major languages
– 1,599 other regional languages
– 2,371 scripts
• 30 languages are spoken by more than one million native speakers
• 121 are spoken by more than 10,000 people
• 20% understand English
• 80% cannot understand English
https://en.wikipedia.org/wiki/Languages_of_India
Background
• Phenomenal growth in the number of internet users and in social media (Facebook, Twitter, etc.)
• Increasing tendency to use Indian-language content for exchanging information
• The digital divide cannot be tackled unless citizens are given the flexibility to communicate in their own languages

Natural Language Processing (NLP), which deals with developing theories and techniques for effective communication in human languages, plays an important role in creating this digital society
Motivation
TDIL: MeitY, Govt. of India
TDIL: Technology Development for Indian Languages Programme, initiated by the Ministry of Electronics & Information Technology, Govt. of India
Objective:
― developing information processing tools and techniques to facilitate human-machine interaction without language barriers;
― creating and accessing multilingual knowledge resources; and
― integrating them to develop innovative user products and services.
TDIL: Some major Machine Translation Projects
1. Development of English to Indian Language Machine Translation System (Anuvadaksh): translator from English to Hindi/ Marathi/ Bangla/ Oriya/ Tamil/ Urdu/ Gujarati/ Bodo
2. Development of English to Indian Language Machine Translation System with AnglaBharti technology: ANGLABHARTI is a machine-aided translation methodology designed specifically for translating English to Indian languages, such as English to Bangla/ Punjabi/ Malayalam/ Urdu/ Hindi/ Telugu
3. Development of Indian Language to Indian Language Machine Translation System (Sampark): 18 language pairs, namely Hindi to Bengali, Bengali to Hindi, Marathi to Hindi, Hindi to Marathi, Hindi to Punjabi, Punjabi to Hindi, Hindi to Tamil, Tamil to Hindi, Hindi to Kannada, Kannada to Hindi, Hindi to Telugu, Telugu to Hindi, Hindi to Urdu, Urdu to Hindi, Malayalam to Tamil, Tamil to Malayalam, Tamil to Telugu, Telugu to Tamil
TDIL: Some major initiatives
• Development of Cross-Lingual Information Access (CLIA)
― Assamese, Bengali, Hindi, Oriya, Punjabi, Tamil, Telugu, Marathi, Gujarati
• Development of Robust Document Analysis & Recognition System for Indian Languages (OCR): 14 languages
― Assamese, Bengali, Devanagari, Gujarati, Gurumukhi, Kannada, Malayalam, Manipuri, Marathi, Oriya, Tamil, Telugu, Tibetan, Urdu
• Development of Text to Speech System in Indian Languages
• Development of Automatic Speech Recognition System in Indian Languages
• Development of Sanskrit Machine Translation System
• Development of Hindi to English Machine Translation in Judicial Domain
Languages and the Institutes working on different languages
Language    Institute
Assamese    Guwahati University, Guwahati, Assam
Bengali     Indian Statistical Institute, Kolkata, West Bengal
Bodo        Guwahati University, Guwahati, Assam
Gujarati    Dharamsinh Desai University, Nadiad, Gujarat
Hindi       IIT Bombay, Mumbai, Maharashtra
Kannada     Mysore University, Mysore, Karnataka
Kashmiri    Kashmir University, Srinagar, Jammu and Kashmir
Konkani     Goa University, Taleigao, Goa
Malayalam   Amrita University, Coimbatore, Tamil Nadu
Marathi     IIT Bombay, Mumbai, Maharashtra
Meitei      Manipur University, Imphal, Manipur
Nepali      Assam University, Silchar, Assam
Oriya       Hyderabad Central University, Hyderabad, Andhra Pradesh
Punjabi     Thapar University and Punjabi University, Patiala, Punjab
Sanskrit    IIT Bombay, Mumbai, Maharashtra
Tamil       Tamil University, Thanjavur, Tamil Nadu
Telugu      Dravidian University, Kuppam, Andhra Pradesh
Urdu        Jawaharlal Nehru University, New Delhi
Applications
NLP: In Business
• Searching: autocorrect/autocomplete
• Translation from one language to another
• Survey analysis
• Sentiment analysis: analyzing public opinion
• Email filters: filtering out irrelevant emails
• Information extraction
Application
1. Search Autocorrect and Autocomplete
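As a rough illustration of what search autocomplete and autocorrect do, here is a minimal sketch using only the Python standard library. The vocabulary and the function names are hypothetical; a production search engine would rely on query logs and language models rather than a fixed word list.

```python
from difflib import get_close_matches

# Hypothetical toy vocabulary, for illustration only.
VOCABULARY = ["language", "learning", "lecture", "lexicon",
              "parser", "parsing", "processing"]


def autocomplete(prefix, vocabulary=VOCABULARY, limit=3):
    """Return up to `limit` vocabulary words that start with `prefix`."""
    prefix = prefix.lower()
    return [word for word in sorted(vocabulary) if word.startswith(prefix)][:limit]


def autocorrect(word, vocabulary=VOCABULARY):
    """Return the most similar vocabulary word, or the input if none is close."""
    matches = get_close_matches(word.lower(), vocabulary, n=1, cutoff=0.6)
    return matches[0] if matches else word


if __name__ == "__main__":
    print(autocomplete("le"))       # completions for a typed prefix
    print(autocorrect("langauge"))  # suggested correction for a misspelling
```

Autocomplete here is plain prefix matching, while autocorrect ranks candidates by string similarity; both are stand-ins for the statistical models used in practice.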
Application
2. Language Translator
Application
3. Sentiment Analysis
Application
4. Survey Analysis
Application
5. Targeted Advertising
Application
6. Hiring and Recruitment
Application
7. Grammar Checkers
Application
8. Email Filtering
Have you ever used Gmail?
Application
9. Question Answering/ Chatbots

• Chatbots are able to read a conversation, understand what a user said, and construct a new utterance to keep the conversation going and solve a task.
Application
10. Automatic Summarization

Some of the APIs: Aylien Text Analysis, MeaningCloud Summarization, ML Analyzer, Summarize Text, Text Summary.
Application
11. Information Extraction
– There is a lot of unstructured data; the task is to identify entities of interest and their relationships.
Example:
College of Engineering Pune selected Dr. Jibi Abraham as Dean Academics from June 2019, responsible for all academic activities. She was the institute's MIS-Incharge before this. She succeeds Dr. M.S. Sutaone, who is now Deputy Director of the institute.
Person Name        Institute/Company Name   Post              State   Year
Dr. Jibi Abraham   COEP                     Dean Academics    Start   2019
Dr. Jibi Abraham   COEP                     MIS-Incharge      End     2019
Dr. M.S. Sutaone   COEP                     Deputy Director   Start   2019
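The entity-finding step of the example above can be sketched with regular expressions and a small list of post titles. This is only an illustration under stated assumptions: the patterns, the `KNOWN_POSTS` list and the `extract_entities` function are hypothetical, and the sketch finds entities only. Linking each person to a post and a start/end state, as in the table, needs a real relation-extraction model.

```python
import re

TEXT = (
    "College of Engineering Pune, selected Dr. Jibi Abraham, as Dean Academics "
    "from June 2019, responsible for all academic based activities. She was "
    "institute MIS-Incharge before this. She succeeds Dr. M.S. Sutaone, who is "
    "now Deputy Director of the institute."
)

# Hypothetical gazetteer of post titles, listed by hand for this example.
KNOWN_POSTS = ["Dean Academics", "MIS-Incharge", "Deputy Director"]

# "Dr." followed by optional initials (e.g. "M.S.") and capitalized name parts.
PERSON_PATTERN = re.compile(r"Dr\.\s+(?:[A-Z]\.)*\s*[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*")
# Four-digit years starting with 19 or 20.
YEAR_PATTERN = re.compile(r"\b(?:19|20)\d{2}\b")


def extract_entities(text):
    """Return the persons, years and known posts mentioned in `text`."""
    return {
        "persons": PERSON_PATTERN.findall(text),
        "years": YEAR_PATTERN.findall(text),
        "posts": [post for post in KNOWN_POSTS if post in text],
    }


if __name__ == "__main__":
    for kind, values in extract_entities(TEXT).items():
        print(kind, values)
```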
Applications
12. Multi-Lingual NLP Systems
– Systems that can converse, understand or interact in multiple languages at the same time
– Enable conversation between a Marathi speaker and a Kannada speaker
– Made possible by the use of pre-trained language models
NLP: In Governance
• Uses of NLP in government websites
• Making e-governance information available in multiple languages
• Natural language generation in e-governance
• Chatbots
• E.g. a farmer who cannot read or write can, with multilingual support and natural language generation, communicate a query in any language and get it resolved
NLP: In Healthcare
• Providing a better, 24/7 electronic health record experience
• Predictive and prescriptive analytics
• Patients can interact in their own language
• Easier for a patient to understand their health status
• Identification of patients who require improved care coordination
• Examples include automated detection of cancer and detection of the root causes of a disorder
NLP: In Finance
• Credit scoring
– Estimating the risk of granting a loan from past histories
– E.g. Lenddo EFL (with 115 employees), a Singapore-based company, developed software called Lenddo Score, which uses machine learning and NLP to assess an individual's creditworthiness.
• Fraud detection in banking
• Stock market prediction based on sentiment
NLP: In Other Domains
• National security
– Sentiment in cross-border languages
– Hate speech, radicalization
• NLP in recruitment
– Searching for appropriate applications in the available data and selecting the best ones
• NLP in code-mixing
– Code-mixing refers to the mixing of two or more languages or language varieties in speech/text
– Example: Kolkata to Varanasi ka kya distance hai
– Token labels used in the example: Entity, English, Hindi
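The code-mixed example can be made concrete with a token-level labelling sketch. The three lexicons and the `tag_tokens` function below are hypothetical and hand-built for this one sentence, and the per-token labels are an assumption, since only the legend (Entity, English, Hindi) survives from the slide. Real systems learn such labels with sequence models instead of lookups.

```python
# Hypothetical toy lexicons, hand-built for the single example sentence.
ENTITIES = {"kolkata", "varanasi"}
HINDI = {"ka", "kya", "hai"}
ENGLISH = {"to", "distance"}  # "to" is treated as English here (an assumption)


def tag_tokens(sentence):
    """Label each whitespace-separated token as Entity, Hindi, English or Unknown."""
    tagged = []
    for token in sentence.split():
        key = token.lower()
        if key in ENTITIES:
            label = "Entity"
        elif key in HINDI:
            label = "Hindi"
        elif key in ENGLISH:
            label = "English"
        else:
            label = "Unknown"
        tagged.append((token, label))
    return tagged


if __name__ == "__main__":
    for token, label in tag_tokens("Kolkata to Varanasi ka kya distance hai"):
        print(f"{token}\t{label}")
```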


NLP: Projected Growth
• Growing exponentially
• Expected to reach a market size of more than $25 billion in 2022
• With a compound annual growth rate of 16%
• Reasons behind this growth
– Rise of chatbots
– The urge to discover customer insights
– Shift of messaging technology from manual to automated
– Translation of content
– Many other tasks that need to be automated and involve language/speech at some point
Major industry players: Amazon, Google, Microsoft, Facebook, IBM, etc.
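The growth figures above describe compound growth. A minimal sketch of that arithmetic follows, assuming the 16% rate stays constant from year to year; the function name is hypothetical, and any value for a year other than 2022 is an extrapolation for illustration, not a figure from the lecture.

```python
def project_market_size(base_size, annual_rate, years):
    """Compound growth: base_size * (1 + annual_rate) ** years.

    A negative `years` value works backwards from the base year.
    """
    return base_size * (1 + annual_rate) ** years


if __name__ == "__main__":
    base_2022 = 25.0  # market size in billions of dollars, from the slide
    rate = 0.16       # compound annual growth rate, from the slide
    for offset in (-1, 0, 1, 2):
        size = project_market_size(base_2022, rate, offset)
        print(f"{2022 + offset}: about ${size:.1f} billion")
```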
