
Course: Artificial Intelligence (COMP6065)

Unofficial Slides

Natural Language Processing

Session 23

Revised by Williem, S. Kom., Ph.D.


1
Learning Outcomes
At the end of this session, students will be able to:

• LO 6: Apply techniques for processing natural language and other perceptual signs so that an agent can interact intelligently with the world

2
Outline
1. Natural Language Processing

2. Language Models

3. Text Classification

4. Information Retrieval

5. Information Extraction

6. Summary

3
Natural Language Processing
• An agent that wants to acquire information needs to understand (at least partially) human language (natural language)

– To communicate with humans

– To acquire information from written language

• There are three ways to acquire information from text:

– Text Classification

– Information Retrieval

– Information Extraction

4
Language Models
• One common component across these information-seeking tasks is the language model

• Formal languages also have rules that define the meaning, or semantics, of a program

– For example: the rules say that the "meaning" of "2 + 2" is 4, and the meaning of "1 / 0" is that an error is signaled

• Natural languages are ambiguous and difficult to deal with (they are large and constantly changing)

5
Language Models
• N-gram character models (the units can also be words or other symbols)

– We estimate the probability of each sequence of n characters

– E.g. in one Web collection, P("the") = 0.027 and P("zgq") = 0.000000002

– A model of the probability distribution of n-letter sequences

– Such a model is also a Markov chain of order n-1
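
• A minimal sketch of a character n-gram count model in Python; the sample string and the choice of n are placeholders, not from the slides:

    from collections import Counter

    def ngram_probs(text, n=3):
        # Count every n-character window and normalize to a probability distribution
        counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
        total = sum(counts.values())
        return {gram: c / total for gram, c in counts.items()}

    probs = ngram_probs("the quick brown fox jumps over the lazy dog", n=3)
    print(probs.get("the", 0.0))  # relative frequency of the trigram "the"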

6
Language Models
• N-gram models for language identification

– Given a text, determine which natural language it is written in

– E.g. "Hello, World" (English) vs. "Halo, dunia" (Indonesian)

• How?

– We build the trigram character model of each language

– We combine these with the prior probability of each language and pick the most likely language, as sketched below
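
• A hedged sketch of trigram-based language identification; the training snippets below are toy placeholders, and a real system would train on large corpora and include the language priors:

    import math
    from collections import Counter

    def train_trigram_model(text, k=1.0, alphabet_size=27):
        # Character trigram counts with add-k smoothing, so unseen trigrams get nonzero probability
        counts = Counter(text[i:i + 3] for i in range(len(text) - 2))
        total = sum(counts.values())
        return lambda gram: (counts[gram] + k) / (total + k * alphabet_size ** 3)

    def log_likelihood(model, text):
        return sum(math.log(model(text[i:i + 3])) for i in range(len(text) - 2))

    # Toy training data; equal priors are assumed for the two languages
    models = {
        "English": train_trigram_model("hello world this is a short english sample"),
        "Indonesian": train_trigram_model("halo dunia ini adalah contoh teks pendek"),
    }
    query = "hello there world"
    print(max(models, key=lambda lang: log_likelihood(models[lang], query)))  # expected: English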

7
Language Models
• Smoothing n-gram models

– An n-gram model only estimates the true probability distribution (common sequences get high probability, e.g. P(" th") = 1.5%)

– What about uncommon sequences, e.g. P(" ht")?

• No dictionary word starts with "ht"

• Yet text such as "The program issues an http request" does occur, so the probability should not be zero

– Smoothing is the process of adjusting the probability estimates for low-frequency (and zero) counts

8
Language Models
• Smoothing n-gram models

– Backoff model: we start by estimating n-gram counts, but for any particular sequence that has a low count, we back off to (n-1)-grams

– Linear interpolation smoothing is a backoff model that combines the trigram, bigram, and unigram estimates (a sketch follows below)
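
• A minimal sketch of linear interpolation smoothing; the weights are illustrative values (in practice they are tuned on a validation corpus), and the three input probabilities are assumed to come from separately trained models:

    def interpolated_prob(trigram_p, bigram_p, unigram_p, lambdas=(0.7, 0.2, 0.1)):
        # Weighted combination of trigram, bigram, and unigram estimates; weights sum to 1
        l3, l2, l1 = lambdas
        return l3 * trigram_p + l2 * bigram_p + l1 * unigram_p

    # A sequence with a zero trigram count still gets probability mass from the lower-order models
    print(interpolated_prob(0.0, 0.004, 0.02))  # 0.0028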

9
Language Models
• Model evaluation

– To choose which model to use

– Perform cross-validation

• Split the data into a training corpus and a validation corpus

– Evaluation metric:

• Perplexity: the reciprocal of the sequence probability, normalized by sequence length (lower is better); see the sketch below
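
• A short sketch of the metric, assuming the model supplies one probability per character of the sequence:

    import math

    def perplexity(char_probs):
        # Reciprocal of the sequence probability, normalized by sequence length
        log_p = sum(math.log(p) for p in char_probs)
        return math.exp(-log_p / len(char_probs))

    print(perplexity([0.2, 0.1, 0.05]))  # lower perplexity means a better model fit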

10
Text Classification
• Given a text of some kind, decide which of a predefined set of classes it belongs to (also called categorization)

– E.g. spam detection (classes: spam and ham)

• Training data: labeled examples of each class (a minimal sketch follows below)
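
• A minimal naive Bayes sketch with word unigrams; the two training messages are toy placeholders, not real training data:

    import math
    from collections import Counter, defaultdict

    def train_naive_bayes(labeled_docs, k=1.0):
        # labeled_docs: list of (label, text) pairs; word-unigram model with add-k smoothing
        word_counts, label_counts = defaultdict(Counter), Counter()
        for label, text in labeled_docs:
            label_counts[label] += 1
            word_counts[label].update(text.lower().split())
        vocab_size = len({w for c in word_counts.values() for w in c})

        def classify(text):
            def score(label):
                total = sum(word_counts[label].values())
                prior = math.log(label_counts[label] / sum(label_counts.values()))
                return prior + sum(
                    math.log((word_counts[label][w] + k) / (total + k * vocab_size))
                    for w in text.lower().split())
            return max(label_counts, key=score)

        return classify

    classify = train_naive_bayes([
        ("spam", "win money now claim your free prize"),
        ("ham", "meeting notes for the project review tomorrow"),
    ])
    print(classify("claim your free prize now"))  # expected: spam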

11
Text Classification
• Another way to think about classification is as a problem in data
compression. A lossless compression algorithm takes a sequence
of symbols, detects repeated patterns in it, and writes a
description of the sequence that is more compact than the
original.

• For example, the text "0.142857142857142857" might be compressed to "0.[142857]*3". Compression algorithms work by building dictionaries of subsequences of the text and then referring to entries in the dictionary.

• The connection to classification: a new text can be assigned to the class whose training texts compress it best, because it shares the most repeated patterns with them.

12
Information Retrieval
• Information retrieval (Googling) is the task of finding
documents that are relevant to a user’s need for information

• An information retrieval (IR) system can be characterized by

– A corpus of documents

– Queries posed in a query language

– A result set

– A presentation of the result set

• The earliest IR systems worked on a Boolean keyword model

13
Information Retrieval
• IR scoring functions

– Instead of the Boolean model, most IR systems use models based on statistics of word counts

• E.g. the BM25 scoring function

– A scoring function takes a document and a query and returns a numeric score (a relevancy score)

– In the BM25 function, the score is a linear weighted combination of scores for each of the query words

14
Information Retrieval
• BM25 function

– Three factors affect the weight of a query term

• The frequency with which the query term appears in the document (TF, term frequency)

• The inverse document frequency of the term (IDF)

– Based on the document frequency of the term (DF)

• The length of the document

15
Information Retrieval
• BM25 function

– The parameters are b and k (typical defaults are sketched below)

– L is the average document length in the corpus
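
• A hedged sketch of the BM25 score; the defaults k = 2.0 and b = 0.75 and the toy corpus statistics are illustrative assumptions:

    import math

    def bm25_score(query_terms, doc_terms, df, num_docs, avg_len, k=2.0, b=0.75):
        # Linear weighted combination of per-term scores for a single document
        score, doc_len = 0.0, len(doc_terms)
        for term in query_terms:
            tf = doc_terms.count(term)  # term frequency in the document
            idf = math.log((num_docs - df.get(term, 0) + 0.5) / (df.get(term, 0) + 0.5))
            score += idf * tf * (k + 1) / (tf + k * (1 - b + b * doc_len / avg_len))
        return score

    doc = "ibm announces a new ibm server line".split()
    print(bm25_score(["ibm", "server"], doc,
                     df={"ibm": 50, "server": 120}, num_docs=1000, avg_len=40.0))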

16
Information Retrieval
• IR system evaluation

– Two measures are used in the evaluation

• Recall: the proportion of all relevant documents in the collection that are in the result set

• Precision: the proportion of documents in the result set that are actually relevant

– Example: an IR system evaluated over a collection of 100 documents

                   In Result    Not In Result
    Relevant           30             20
    Not Relevant       10             40

17
Information Retrieval
• IR system evaluation
                   In Result    Not In Result
    Relevant           30             20
    Not Relevant       10             40

– Recall = 30 / (30 + 20) = 0.60

– Precision = 30 / (30 + 10) = 0.75

18
Information Retrieval
• PageRank algorithm

– It was one of the two original ideas that set Google’s search
apart from other Web search engines (1997)

– Given the query [IBM], how do we ensure that the IBM home page (ibm.com) comes first in the list of results, even if other pages mention the word "IBM" more frequently?

– The idea is that ibm.com has many in-links (links pointing to ibm.com), so it deserves to be ranked near the top of the results.
19
Information Retrieval
• PageRank algorithm

– PageRank is designed to weight links from high-quality sites more heavily

– It can be computed by an iterative procedure: start with all pages having PR(p) = 1, and iterate the algorithm until convergence (a sketch follows below)
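
• A minimal sketch of the iterative computation; the tiny link graph and the damping factor d = 0.85 are illustrative assumptions, not from the slides:

    def pagerank(links, d=0.85, iterations=50):
        # links: dict mapping each page to the list of pages it links to
        pages = list(links)
        n = len(pages)
        pr = {p: 1.0 for p in pages}  # start with PR(p) = 1 for every page
        for _ in range(iterations):
            pr = {p: (1 - d) / n + d * sum(pr[q] / len(links[q])
                                           for q in pages if p in links[q])
                  for p in pages}
        return pr

    graph = {"ibm.com": ["news.com"],
             "news.com": ["ibm.com", "blog.com"],
             "blog.com": ["ibm.com"]}
    print(pagerank(graph))  # ibm.com ends up with the highest rank: it has the most in-links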

20
Information Retrieval
• Question answering

– Question answering is a somewhat different task, in which the query really is a question, and the answer is not a ranked list of documents but rather a short response

– Based on the premise that the question may be answered on many web pages, question answering is treated as a problem of precision (accuracy), not recall (completeness)

• We only have to find the answer


21
Information Extraction
• Information extraction is the process of acquiring knowledge by skimming a text and looking for occurrences of a particular class of objects and for relationships among objects

– E.g. extracting instances of addresses from web pages

• In a limited domain, it can be done with high accuracy

• In a general domain, more complex linguistic models and learning techniques are necessary

22
Information Extraction
• The simplest type of information extraction system is an attribute-based extraction system

– Assumes that the entire text refers to a single object

– E.g. the problem of extracting from the text "IBM ThinkBook 970. Our price: $399.00" the attributes {Manufacturer=IBM, Model=ThinkBook970, Price=$399.00}

• We can address the problem by defining a template

– A template is defined by a finite state automaton or regex (regular expression)
23
Information Extraction
• The regex template for prices in dollars (the slide's figure is reconstructed as a sketch below):
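
• A sketch along the lines of the textbook's dollar-amount template; the exact pattern shown on the slide is not reproduced here, so this is a reconstruction:

    import re

    # A dollar sign, one or more digits, and an optional two-digit cents part
    price_pattern = re.compile(r"[$][0-9]+([.][0-9][0-9])?")

    text = "IBM ThinkBook 970. Our price: $399.00"
    match = price_pattern.search(text)
    print(match.group(0))  # -> $399.00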

• A step up from attribute-based extraction systems are relational extraction systems

– E.g. FASTUS, which handles news stories about corporate mergers and acquisitions

24
Information Extraction

• FASTUS consists of five stages:


– Tokenization
– Complex-word handling
– Basic-group handling
– Complex-phrase handling
– Structure merging
25
Information Extraction
• A different application of extraction technology is building a
large knowledge base of facts from a corpus

• This is different in three ways:


– First, it is open-ended: we want to acquire facts about all types of domains, not just one specific domain

– Second, with a large corpus, this task is dominated by precision, not recall

– Third, the results can be statistical aggregates gathered from multiple sources

26
Information Extraction
• Machine reading

– A machine that behaves more like a human reader, learning from the text itself

– A representative machine-reading system is TEXTRUNNER (Banko and Etzioni, 2008)

• E.g. from the parse of the sentence "Einstein received the Nobel Prize in 1921," TEXTRUNNER is able to extract the relation ("Einstein", "received", "Nobel Prize")
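
• A toy illustration of pulling out a (subject, verb, object) triple with a regex; this is not TEXTRUNNER's actual method, which relies on parsing and a learned extractor over a large Web corpus:

    import re

    sentence = "Einstein received the Nobel Prize in 1921"
    # Toy pattern: a one-word subject, the verb "received", an object, and an optional year
    m = re.match(r"(\w+) (received) (?:the )?(.+?)(?: in \d{4})?$", sentence)
    if m:
        print((m.group(1), m.group(2), m.group(3)))  # ('Einstein', 'received', 'Nobel Prize')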

27
Information Extraction

• Eight general templates cover about 95% of the relations expressed in English

• TEXTRUNNER achieves a precision of 88% and a recall of 45% (F1 of 60%) on a large Web corpus

28
Summary
• Text classification can be done with naïve Bayes n-gram models or with other standard classification algorithms

• Information retrieval systems use a very simple language model

• Information extraction systems use a more complex model that includes limited notions of syntax and semantics

29
References
• Stuart Russell and Peter Norvig. 2010. Artificial Intelligence: A Modern Approach. Pearson Education, New Jersey. ISBN: 9780132071482

• http://aima.cs.berkeley.edu

30
