Pert23 - NLP
Non-official Slides
Session 23
Outline
1. Natural Language Processing
2. Language Models
3. Text Classification
4. Information Retrieval
5. Information Extraction
6. Summary
Natural Language Processing
• An agent that wants to acquire information needs to understand
(at least partially) human language (natural language)
– Text Classification
– Information Retrieval
– Information Extraction
Language Models
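• One common factor in these information-seeking tasks is the use of
language models, which predict the probability distribution of
language expressions
– For example, in a formal language the rules say that the "meaning" of
"2 + 2" is 4, and the meaning of "1 / 0" is that an error is signaled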
Language Models
• N-gram character models (can be words, or other units)
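As a rough illustration of the bullet above, the sketch below counts overlapping character n-grams and turns the counts into relative frequencies. The function names (`char_ngrams`, `train_ngram_model`) and the sample string are assumptions made for this example, and plain relative frequency is only the simplest possible estimate.

```python
from collections import Counter

def char_ngrams(text, n):
    """Return all overlapping character n-grams in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def train_ngram_model(text, n):
    """Estimate P(n-gram) by relative frequency (maximum likelihood)."""
    counts = Counter(char_ngrams(text, n))
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()}

# Toy input, chosen only for illustration.
model = train_ngram_model("the theme", 3)
print(model["the"])  # "the" occurs in 2 of the 7 trigrams
```

The same functions work for word n-grams if `text` is replaced by a list of tokens and the slices are converted to tuples.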
Language Models
• N-gram models for language identification
• How?
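One possible answer to the question above: build a character trigram model per language, score the input text under each model, and pick the language with the highest score. The sketch below assumes a uniform prior over languages, add-one smoothing with a hypothetical `vocab_size`, and two tiny made-up corpora; a usable identifier would need far more training text.

```python
import math
from collections import Counter

def char_ngrams(text, n=3):
    """Return all overlapping character n-grams in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def train(text, n=3):
    """Count the character n-grams of one language's training text."""
    return Counter(char_ngrams(text.lower(), n))

def log_score(text, counts, n=3, vocab_size=10_000):
    """Sum of log probabilities with add-one smoothing.

    vocab_size is an assumed number of possible n-grams.
    """
    total = sum(counts.values())
    return sum(
        math.log((counts[gram] + 1) / (total + vocab_size))
        for gram in char_ngrams(text.lower(), n)
    )

def identify(text, models):
    """Return the language whose model gives the text the highest score."""
    return max(models, key=lambda lang: log_score(text, models[lang]))

# Made-up miniature corpora, for illustration only.
corpora = {
    "english": "the quick brown fox jumps over the lazy dog and the cat",
    "german": "der schnelle braune fuchs springt über den faulen hund und die katze",
}
models = {lang: train(text) for lang, text in corpora.items()}
print(identify("the dog", models))
print(identify("der hund", models))
```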
Language Models
• Smoothing n-gram models
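Smoothing gives unseen n-grams a small non-zero probability instead of zero. The slide does not say which method is meant, so the sketch below uses add-alpha (Laplace) smoothing as one common choice; the counts and `vocab_size` are hypothetical values picked to keep the arithmetic easy to follow.

```python
from collections import Counter

def laplace_prob(gram, counts, vocab_size, alpha=1.0):
    """Add-alpha estimate: (count + alpha) / (total + alpha * vocab_size)."""
    total = sum(counts.values())
    return (counts[gram] + alpha) / (total + alpha * vocab_size)

# Hypothetical bigram counts and an assumed vocabulary of 10 possible bigrams.
counts = Counter({"th": 3, "he": 2, "e ": 1})
seen = laplace_prob("th", counts, vocab_size=10)    # (3 + 1) / (6 + 10)
unseen = laplace_prob("zz", counts, vocab_size=10)  # (0 + 1) / (6 + 10)
print(seen, unseen)
```

Other options, such as backoff or linear interpolation between n-gram orders, trade this simplicity for better estimates.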
Language Models
• Model evaluation
– Evaluation metric:
Text Classification
• Given a text of some kind, decide which of a predefined set of
classes it belongs to (categorization)
• Training data
Text Classification
• Another way to think about classification is as a problem in data
compression. A lossless compression algorithm takes a sequence
of symbols, detects repeated patterns in it, and writes a
description of the sequence that is more compact than the
original.
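The compression view can be turned into a classifier: append the new message to each class's training text and see which class needs the fewest extra compressed bytes. The sketch below uses `zlib` from the Python standard library; the `classify` helper, the labels, and the two training strings are assumptions for illustration, and such tiny corpora only work because the test messages repeat phrases from them.

```python
import zlib

def compressed_size(text):
    """Number of bytes after lossless (DEFLATE) compression."""
    return len(zlib.compress(text.encode("utf-8"), 9))

def classify(message, training):
    """Pick the class whose training text 'explains' the message best,
    i.e. where appending the message adds the fewest compressed bytes."""
    def extra_bytes(label):
        base = training[label]
        return compressed_size(base + " " + message) - compressed_size(base)
    return min(training, key=extra_bytes)

# Made-up training text per class.
training = {
    "spam": "win free money now, click here to claim your prize. "
            "cheap pills, limited offer, act today.",
    "ham": "the meeting agenda is attached, please review the report "
           "before monday. lunch at noon works for me.",
}
print(classify("click here to claim your prize", training))
print(classify("please review the report before monday", training))
```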
Information Retrieval
• Information retrieval (Googling) is the task of finding
documents that are relevant to a user’s need for information
– A corpus of documents
– A result set
Information Retrieval
• IR scoring functions
Information Retrieval
• BM25 function
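The slide names BM25 without showing the formula, so the sketch below assumes the usual Okapi BM25 form as presented in the cited textbook: each query term contributes IDF times a length-normalized term frequency, with typical parameter values k = 2.0 and b = 0.75. The whitespace tokenization and the sample documents are simplifications made for this example.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k=2.0, b=0.75):
    """Score every document against the query with an assumed BM25 form."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(words) for words in tokenized) / n_docs
    scores = []
    for words in tokenized:
        tf = Counter(words)
        score = 0.0
        for term in query.lower().split():
            # Document frequency: how many documents contain the term.
            df = sum(1 for other in tokenized if term in other)
            idf = math.log((n_docs - df + 0.5) / (df + 0.5))
            # Longer-than-average documents are penalized through b.
            norm = tf[term] + k * (1 - b + b * len(words) / avg_len)
            score += idf * tf[term] * (k + 1) / norm
        scores.append(score)
    return scores

# Made-up corpus for illustration.
docs = [
    "farming in kansas",
    "weather in texas",
    "cooking pasta at home",
    "history of rome",
    "stock market news",
]
scores = bm25_scores("farming kansas", docs)
print(scores)
```

Note that with this IDF definition a term appearing in more than half of the documents gets a negative weight; some implementations clamp or shift it.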
Information Retrieval
• IR system evaluation
– $\text{Recall} = \frac{30}{30+20} = 0.60$
– $\text{Precision} = \frac{30}{30+10} = 0.75$
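The two figures above can be checked with a few lines of code. The sketch assumes the usual reading of the numbers: 30 relevant documents were returned, 20 relevant documents were missed, and 10 returned documents were not relevant. The function and parameter names are chosen for this example.

```python
def precision_recall(relevant_returned, irrelevant_returned, relevant_missed):
    """Precision: share of returned documents that are relevant.
    Recall: share of relevant documents that were returned."""
    precision = relevant_returned / (relevant_returned + irrelevant_returned)
    recall = relevant_returned / (relevant_returned + relevant_missed)
    return precision, recall

precision, recall = precision_recall(
    relevant_returned=30, irrelevant_returned=10, relevant_missed=20
)
print(f"precision={precision:.2f} recall={recall:.2f}")
```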
Information Retrieval
• PageRank algorithm
– It was one of the two original ideas that set Google’s search
apart from other Web search engines (1997)
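The slide does not show the formula, so the sketch below assumes the common iterative form: each page's rank is (1 - d)/N plus d times the sum, over pages linking to it, of their rank divided by their number of out-links. The damping factor d = 0.85, the iteration count, and the three-page link graph are assumptions for illustration, and the code requires every page to have at least one out-link.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iteratively compute PageRank for a dict {page: [pages it links to]}."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {}
        for page in pages:
            # Each in-linking page q passes on an equal share of its rank.
            incoming = sum(
                rank[q] / len(links[q]) for q in pages if page in links[q]
            )
            new_rank[page] = (1 - d) / n + d * incoming
        rank = new_rank
    return rank

# Hypothetical three-page web: A links to B and C, B to C, C to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
rank = pagerank(links)
print(sorted(rank, key=rank.get, reverse=True))
```

Page C ranks highest here because it receives links from both other pages.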
Information Retrieval
• Question answering
Information Extraction
• The simplest type of information extraction system is an
attribute-based extraction system
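An attribute-based extractor can be as simple as one template per attribute. The sketch below assumes the attribute is a price in dollars and uses a regular expression as the template; the pattern, the `extract_price` helper, and the sample listing text are illustrative choices, not something given on the slide.

```python
import re

# Template for a dollar price such as $249 or $399.00 (assumed format).
PRICE = re.compile(r"[$][0-9]+([.][0-9][0-9])?")

def extract_price(text):
    """Return the first price found in the text, or None if there is none."""
    match = PRICE.search(text)
    return match.group(0) if match else None

# Made-up listing text for illustration.
print(extract_price("IBM ThinkBook 970. Our price: $399.00"))
```

Real systems typically add prefix and postfix patterns (for example "price:" before the value) to choose among several matches.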
Information Extraction
• Machine reading
Summary
• Text classification can be done with naïve Bayes n-gram
models or with any of the classification algorithms
References
• Stuart Russell, Peter Norvig. 2010. Artificial Intelligence : A
Modern Approach. Pearson Education. New Jersey.
ISBN:9780132071482
• http://aima.cs.berkeley.edu