18-IntroNLP II PDF

The document defines minimum edit distance and describes how it can be used for spell checking, machine translation, and computational biology applications. Minimum edit distance is the minimum number of edit operations (insertions, deletions, substitutions) needed to transform one string into another. It is computed using dynamic programming to find the optimal alignment between two strings or sequences, and weights can be added to account for different costs of edit operations, which matters in areas like biology where certain mutations are more common than others. The document then introduces N-gram language models, their estimation and evaluation by perplexity, smoothing, backoff, and interpolation, and closes with Naïve Bayes text classification and its evaluation.


Definition of Minimum Edit Distance

▪ Spell correction
  ▪ The user typed “graffe” — which is closest?
    ▪ graf
    ▪ graft
    ▪ grail
    ▪ giraffe

▪ Computational Biology
  ▪ Align two sequences of nucleotides

    AGGCTATCACCTGACCTCCAGGCCGATGCCC
    TAGCTATCACGACCGCGGTCGATTTGCCCGAC

  ▪ Resulting alignment:

    -AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
    TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC

• Also used for Machine Translation, Information Extraction, Speech Recognition
▪ The minimum edit distance between two strings
▪ Is the minimum number of editing operations
▪ Insertion
▪ Deletion
▪ Substitution

▪ Needed to transform one into the other


▪ Two strings (e.g., intention and execution, as in the table below) and their alignment:
  ▪ If each operation has a cost of 1
    ▪ Distance between these is 5
  ▪ If substitutions cost 2 (Levenshtein)
    ▪ Distance between them is 8
▪ Given a sequence of bases

AGGCTATCACCTGACCTCCAGGCCGATGCCC
TAGCTATCACGACCGCGGTCGATTTGCCCGAC
▪ An alignment:

-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC
▪ Given two sequences, align each letter to a letter or gap
▪ Evaluating Machine Translation and speech recognition (e.g., word error rate)

  R (reference):  Spokesman confirms        senior government adviser was shot
  H (hypothesis): Spokesman said      the   senior            adviser was shot dead
                            S         I                D                        I

  (S = substitution, I = insertion, D = deletion)
▪ Named Entity Extraction and Entity Coreference
▪ IBM Inc. announced today
▪ IBM profits
▪ Stanford President John Hennessy announced yesterday
▪ for Stanford University President John Hennessy
▪ Searching for a path (sequence of edits) from the start string to the final string:
▪ Initial state: the word we’re transforming
▪ Operators: insert, delete, substitute
▪ Goal state: the word we’re trying to get to
▪ Path cost: what we want to minimize: the number of edits

▪ But the space of all edit sequences is huge!
▪ We can’t afford to navigate naïvely
▪ Lots of distinct paths wind up at the same state.
▪ We don’t have to keep track of all of them
▪ Just the shortest path to each of those revisited states.
▪For two strings
▪ X of length n
▪ Y of length m
▪We define D(i,j)
▪ the edit distance between X[1..i] and Y[1..j]
▪ i.e., the first i characters of X and the first j characters of Y
▪ The edit distance between X and Y is thus D(n,m)
Definition of Minimum Edit Distance
Computing Minimum Edit Distance
▪ Dynamic programming: A tabular computation of D(n,m)
▪ Solving problems by combining solutions to subproblems.
▪ Bottom-up
▪ We compute D(i,j) for small i,j
▪ And compute larger D(i,j) based on previously computed smaller values
▪ i.e., compute D(i,j) for all i (0 ≤ i ≤ n) and j (0 ≤ j ≤ m)
▪ Initialization
D(i,0) = i
D(0,j) = j
▪ Recurrence Relation:
  For each i = 1…N   (N = length of X)
    For each j = 1…M   (M = length of Y)
      D(i,j) = min of:
        D(i-1,j) + 1
        D(i,j-1) + 1
        D(i-1,j-1) + 2   if X(i) ≠ Y(j)
        D(i-1,j-1) + 0   if X(i) = Y(j)
▪ Termination:
  D(N,M) is the distance
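A minimal Python sketch of this tabular computation, assuming the cost scheme above (insert/delete cost 1, substitution cost 2, match cost 0); the variable names D, N, M follow the slides:

```python
def min_edit_distance(x, y, sub_cost=2):
    """Minimum edit distance via dynamic programming (Levenshtein variant:
    insertions and deletions cost 1, substitutions cost `sub_cost`)."""
    n, m = len(x), len(y)
    # D[i][j] = edit distance between x[:i] and y[:j]
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):            # initialization: D(i,0) = i
        D[i][0] = i
    for j in range(1, m + 1):            # initialization: D(0,j) = j
        D[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if x[i - 1] == y[j - 1] else sub_cost
            D[i][j] = min(D[i - 1][j] + 1,        # deletion
                          D[i][j - 1] + 1,        # insertion
                          D[i - 1][j - 1] + sub)  # substitution / match
    return D[n][m]

print(min_edit_distance("intention", "execution"))              # 8 (substitutions cost 2)
print(min_edit_distance("intention", "execution", sub_cost=1))  # 5 (every operation costs 1)
```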
The Edit Distance Table (initialization, for INTENTION vs. EXECUTION):

  N  9
  O  8
  I  7
  T  6
  N  5
  E  4
  T  3
  N  2
  I  1
  #  0  1  2  3  4  5  6  7  8  9
     #  E  X  E  C  U  T  I  O  N
The Edit Distance Table (filled in):

  N  9  8  9 10 11 12 11 10  9  8
  O  8  7  8  9 10 11 10  9  8  9
  I  7  6  7  8  9 10  9  8  9 10
  T  6  5  6  7  8  9  8  9 10 11
  N  5  4  5  6  7  8  9 10 11 10
  E  4  3  4  5  6  7  8  9 10  9
  T  3  4  5  6  7  8  7  8  9  8
  N  2  3  4  5  6  7  8  7  8  7
  I  1  2  3  4  5  6  7  6  7  8
  #  0  1  2  3  4  5  6  7  8  9
     #  E  X  E  C  U  T  I  O  N
Computing Minimum Edit Distance
Backtrace for Computing Alignments
▪ Edit distance isn’t sufficient
▪ We often need to align each character of the two strings to each other

▪ We do this by keeping a “backtrace”


▪ Every time we enter a cell, remember where we came from
▪ When we reach the end,
▪ Trace back the path from the upper right corner to read off the alignment
(The same edit distance table for INTENTION vs. EXECUTION, now with a backtrace pointer stored in every cell.)
▪ Base conditions:  D(i,0) = i,  D(0,j) = j
▪ Termination:  D(N,M) is the distance
▪ Recurrence Relation:
  For each i = 1…N
    For each j = 1…M
      D(i,j) = min of:
        D(i-1,j) + 1                               (deletion)
        D(i,j-1) + 1                               (insertion)
        D(i-1,j-1) + 2 if X(i) ≠ Y(j), else + 0    (substitution / match)
      ptr(i,j) = LEFT  (insertion)
                 DOWN  (deletion)
                 DIAG  (substitution)
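A minimal sketch of the backtrace idea in Python; the LEFT/DOWN/DIAG pointer labels follow the slide, and the helper name `align` is ours:

```python
def align(x, y, sub_cost=2):
    """Minimum edit distance with a backtrace, returning one optimal alignment."""
    n, m = len(x), len(y)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    ptr = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0], ptr[i][0] = i, "DOWN"        # deletions down the first column
    for j in range(1, m + 1):
        D[0][j], ptr[0][j] = j, "LEFT"        # insertions along the first row
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if x[i - 1] == y[j - 1] else sub_cost
            choices = [(D[i - 1][j - 1] + sub, "DIAG"),   # substitution / match
                       (D[i - 1][j] + 1, "DOWN"),         # deletion
                       (D[i][j - 1] + 1, "LEFT")]         # insertion
            D[i][j], ptr[i][j] = min(choices)             # remember where we came from
    # Trace back from the final cell to read off the alignment.
    top, bottom, i, j = [], [], n, m
    while i > 0 or j > 0:
        if ptr[i][j] == "DIAG":
            top.append(x[i - 1]); bottom.append(y[j - 1]); i, j = i - 1, j - 1
        elif ptr[i][j] == "DOWN":
            top.append(x[i - 1]); bottom.append("-"); i -= 1
        else:  # LEFT
            top.append("-"); bottom.append(y[j - 1]); j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), D[n][m]

print(align("intention", "execution"))   # two aligned strings and distance 8
```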
▪ Every non-decreasing path from (0,0) to (M,N) in the alignment table corresponds to an alignment of the two sequences
▪ An optimal alignment is composed of optimal subalignments

Slide adapted from Serafim Batzoglou
▪ Performance:
  ▪ Time: O(nm)
  ▪ Space: O(nm)
  ▪ Backtrace: O(n+m)
Backtrace for Computing Alignments
Weighted Minimum Edit Distance
▪ Why would we add weights to the computation?
▪ Spell Correction: some letters are more likely to be mistyped than others
▪ Biology: certain kinds of deletions or insertions are more likely than others
▪ Initialization:
  D(0,0) = 0
  D(i,0) = D(i-1,0) + del[x(i)];  1 ≤ i ≤ N
  D(0,j) = D(0,j-1) + ins[y(j)];  1 ≤ j ≤ M

▪ Recurrence Relation:
  D(i,j) = min of:
    D(i-1,j) + del[x(i)]
    D(i,j-1) + ins[y(j)]
    D(i-1,j-1) + sub[x(i),y(j)]
▪ Termination:
  D(N,M) is the distance
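A sketch of how the dynamic program changes once per-character cost tables are used; the `del_cost`/`ins_cost`/`sub_cost` functions and the vowel-confusion costs below are made-up illustrations, not real confusion-matrix data:

```python
def weighted_edit_distance(x, y, del_cost, ins_cost, sub_cost):
    """Same DP as before, but costs are looked up per character."""
    n, m = len(x), len(y)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + del_cost(x[i - 1])
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + ins_cost(y[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(D[i - 1][j] + del_cost(x[i - 1]),
                          D[i][j - 1] + ins_cost(y[j - 1]),
                          D[i - 1][j - 1] + sub_cost(x[i - 1], y[j - 1]))
    return D[n][m]

# Toy cost scheme: vowel-for-vowel typos are cheap, other substitutions cost 2.
vowels = set("aeiou")
dist = weighted_edit_distance(
    "graffe", "giraffe",
    del_cost=lambda a: 1.0,
    ins_cost=lambda b: 1.0,
    sub_cost=lambda a, b: 0.0 if a == b else (0.5 if a in vowels and b in vowels else 2.0))
print(dist)
```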
…The 1950s were not good years for mathematical research. [the] Secretary of Defense
…had a pathological fear and hatred of the word, research…

I decided therefore to use the word, “programming”.

I wanted to get across the idea that this was dynamic, this was multistage… I thought, let’s
… take a word that has an absolutely precise meaning, namely dynamic… it’s impossible
to use the word, dynamic, in a pejorative sense. Try thinking of some combination that will
possibly give it a pejorative meaning. It’s impossible.

Thus, I thought dynamic programming was a good name. It was something not even a
Congressman could object to.”

Richard Bellman, “Eye of the Hurricane: an autobiography” 1984.


Weighted Minimum Edit Distance
Minimum Edit Distance in Computational Biology
AGGCTATCACCTGACCTCCAGGCCGATGCCC
TAGCTATCACGACCGCGGTCGATTTGCCCGAC

-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC
▪ Comparing genes or regions from different
species
▪ to find important regions
▪ determine function
▪ uncover evolutionary forces

▪ Assembling fragments to sequence DNA


▪ Compare individuals to look for mutations
▪In Natural Language Processing
▪We generally talk about distance
(minimized)
▪ And weights
▪In Computational Biology
▪We generally talk about similarity
(maximized)
▪ And scores
▪ Initialization:
  D(i,0) = -i × d
  D(0,j) = -j × d

▪ Recurrence Relation:
  D(i,j) = max of:
    D(i-1,j) - d
    D(i,j-1) - d
    D(i-1,j-1) + s[x(i),y(j)]

▪ Termination:
  D(N,M) is the optimal alignment score
(Alignment matrix for x1…xM against y1…yN; note that the origin is at the upper left.)

Slide adapted from Serafim Batzoglou


▪ Maybe it is OK to have an unlimited # of gaps in the
beginning and end:

----------CTATCACCTGACCTCCAGGCCGATGCCCCTTCCGGC
GCGAGTTCATCTATCAC--GACCGC--GGTCG--------------

• If so, we don’t want to penalize gaps at the ends

Slide from Serafim Batzoglou


Example:
2 overlapping “reads” from a sequencing project

Example:
Search for a mouse gene within a human chromosome

Slide from Serafim Batzoglou


Changes for this end-gap-free (overlap) variant:

1. Initialization
   For all i, j:  F(i, 0) = 0,  F(0, j) = 0

2. Termination
   F_OPT = max( max_i F(i, N), max_j F(M, j) )

Slide from Serafim Batzoglou

Given two strings x = x1……xM,
y = y1……yN
Find substrings x’, y’ whose similarity
(optimal global alignment value)
is maximum

x = aaaacccccggggtta
y = ttcccgggaaccaacc

Slide from Serafim Batzoglou


Idea: Ignore badly aligning regions

Modifications to Needleman-Wunsch:

Initialization:  F(0, j) = 0
                 F(i, 0) = 0

Iteration:  F(i, j) = max of:
              0
              F(i-1, j) - d
              F(i, j-1) - d
              F(i-1, j-1) + s(xi, yj)

Slide from Serafim Batzoglou
Termination:
1. If we want the best local alignment…

F_OPT = max over all i, j of F(i, j)

Find FOPT and trace back

2. If we want all local alignments scoring > t

?? For all i, j find F(i, j) > t, and trace back?

Complicated by overlapping local alignments

Slide from Serafim Batzoglou


X = ATCAT,  Y = ATTATC
Let  m = 1  (+1 point for a match)
     d = 1  (−1 point for del/ins/sub)

Initialization:

        A  T  T  A  T  C
     0  0  0  0  0  0  0
  A  0
  T  0
  C  0
  A  0
  T  0

Filled in:

        A  T  T  A  T  C
     0  0  0  0  0  0  0
  A  0  1  0  0  1  0  0
  T  0  0  2  1  0  2  0
  C  0  0  1  1  0  1  3
  A  0  1  0  0  2  1  2
  T  0  0  2  0  1  3  2
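A small Python sketch of this local-alignment recurrence (the Smith-Waterman idea), using the same X, Y, m, and d as the example above; it reproduces the filled-in table:

```python
def local_alignment_table(x, y, m=1, d=1):
    """Local alignment scores: matches score +m, mismatches and gaps score -d,
    and no cell is allowed to drop below 0."""
    F = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    best = 0
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            s = m if x[i - 1] == y[j - 1] else -d
            F[i][j] = max(0,
                          F[i - 1][j] - d,       # gap in y
                          F[i][j - 1] - d,       # gap in x
                          F[i - 1][j - 1] + s)   # match / mismatch
            best = max(best, F[i][j])
    return F, best

F, best = local_alignment_table("ATCAT", "ATTATC")
for row in F:
    print(row)
print("best local alignment score:", best)   # 3, as in the table above
```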
Minimum Edit Distance in Computational Biology
Introduction to N-grams
▪ Today’s goal: assign a probability to a sentence
▪ Machine Translation:
▪ P(high winds tonite) > P(large winds tonite)
▪ Spell Correction
  ▪ The office is about fifteen minuets from my house
  ▪ P(about fifteen minutes from) > P(about fifteen minuets from)
▪ Speech Recognition
▪ P(I saw a van) >> P(eyes awe of an)
▪ + Summarization, question-answering, etc., etc.!!
▪Goal: compute the probability of a sentence or sequence
of words:
P(W) = P(w1,w2,w3,w4,w5…wn)

▪Related task: probability of an upcoming word:


P(w5|w1,w2,w3,w4)

▪ A model that computes either of these:

    P(W)    or    P(wn | w1, w2, …, wn-1)

  is called a language model.

▪ A better term would be “the grammar”, but “language model” or LM is standard


▪ How to compute this joint probability:

▪P(its, water, is, so, transparent, that)

▪ Intuition: let’s rely on the Chain Rule of Probability


▪ Recall the definition of conditional probabilities:
    P(B|A) = P(A,B) / P(A)
  Rewriting:
    P(A,B) = P(A) P(B|A)

▪ More variables:
    P(A,B,C,D) = P(A) P(B|A) P(C|A,B) P(D|A,B,C)
▪ The Chain Rule in general:
    P(x1,x2,x3,…,xn) = P(x1) P(x2|x1) P(x3|x1,x2) … P(xn|x1,…,xn-1)

▪ Applied to a word sequence:
    P(w1 w2 … wn) = ∏i P(wi | w1 w2 … wi-1)

P(“its water is so transparent”) =
    P(its) × P(water|its) × P(is|its water)
    × P(so|its water is) × P(transparent|its water is so)
▪ Could we just count and divide?

    P(the | its water is so transparent that) =
        Count(its water is so transparent that the) / Count(its water is so transparent that)

▪ No! Too many possible sentences!
▪ We’ll never see enough data for estimating these
▪ Simplifying assumption (the Markov assumption, due to Andrei Markov):

    P(the | its water is so transparent that) ≈ P(the | that)

▪ Or maybe

    P(the | its water is so transparent that) ≈ P(the | transparent that)

▪ More generally, we approximate each component in the product:

    P(wi | w1 w2 … wi-1) ≈ P(wi | wi-k … wi-1)

    P(w1 w2 … wn) ≈ ∏i P(wi | wi-k … wi-1)

▪ Simplest case — the unigram model:

    P(w1 w2 … wn) ≈ ∏i P(wi)
Some automatically generated sentences from a unigram model

fifth, an, of, futures, the, an, incorporated, a,


a, the, inflation, most, dollars, quarter, in, is,
mass

thrift, did, eighty, said, hard, 'm, july, bullish

that, or, limited, the


Bigram model: condition on the previous word:

    P(wi | w1 w2 … wi-1) ≈ P(wi | wi-1)


texaco, rose, one, in, this, issue, is, pursuing, growth, in,
a, boiler, house, said, mr., gurria, mexico, 's, motion,
control, proposal, without, permission, from, five, hundred,
fifty, five, yen

outside, new, car, parking, lot, of, the, agreement, reached

this, would, be, a, record, november


▪We can extend to trigrams, 4-grams, 5-grams
▪In general this is an insufficient model of language
▪ because language has long-distance dependencies:

“The computer which I had just put into the machine


room on the fifth floor crashed.”

▪But we can often get away with N-gram models


Introduction to N-grams
Estimating N-gram Probabilities
▪ The Maximum Likelihood Estimate:

    P(wi | wi-1) = count(wi-1, wi) / count(wi-1)
                 = c(wi-1, wi) / c(wi-1)

▪ Example training corpus:
    <s> I am Sam </s>
    <s> Sam I am </s>
    <s> I do not like green eggs and ham </s>
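A small sketch of bigram MLE estimation over this toy corpus (the helper name `p_mle` is ours):

```python
from collections import Counter

corpus = ["<s> I am Sam </s>",
          "<s> Sam I am </s>",
          "<s> I do not like green eggs and ham </s>"]

unigrams, bigrams = Counter(), Counter()
for sent in corpus:
    tokens = sent.split()
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))   # count adjacent word pairs

def p_mle(w, prev):
    """P(w | prev) = c(prev, w) / c(prev)"""
    return bigrams[(prev, w)] / unigrams[prev]

print(p_mle("I", "<s>"))    # 2/3
print(p_mle("Sam", "am"))   # 1/2
print(p_mle("do", "I"))     # 1/3
```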
▪ can you tell me about any good cantonese restaurants close by
▪ mid priced thai food is what i’m looking for
▪ tell me about chez panisse
▪ can you give me a listing of the kinds of food that are available
▪ i’m looking for a good place to eat breakfast
▪ when is caffe venezia open during the day
▪ Out of 9222 sentences
▪ Raw bigram counts are normalized by the unigram counts to give bigram probabilities (tables omitted here)
▪ Result:
P(<s> I want english food </s>) =
P(I|<s>)
× P(want|I)
× P(english|want)
× P(food|english)
× P(</s>|food)
= .000031
▪ P(english|want) = .0011
▪ P(chinese|want) = .0065
▪ P(to|want) = .66
▪ P(eat | to) = .28
▪ P(food | to) = 0
▪ P(want | spend) = 0
▪ P (i | <s>) = .25
▪We do everything in log space
▪Avoid underflow
▪(also adding is faster than multiplying)

log(p1 × p2 × p3 × p4) = log p1 + log p2 + log p3 + log p4


▪ SRILM
  ▪ http://www.speech.sri.com/projects/srilm/

▪ serve as the incoming 92
▪ serve as the incubator 99
▪ serve as the independent 794
▪ serve as the index 223
▪ serve as the indication 72
▪ serve as the indicator 120
▪ serve as the indicators 45
▪ serve as the indispensable 111
▪ serve as the indispensible 40
▪ serve as the individual 234

http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html
▪ http://ngrams.googlelabs.com/
Estimating N-gram Probabilities
Evaluating Perplexity
▪ Does our language model prefer good sentences to bad ones?
▪ Assign higher probability to “real” or “frequently observed” sentences
▪ Than “ungrammatical” or “rarely observed” sentences?

▪ We train parameters of our model on a training set.


▪ We test the model’s performance on data we haven’t seen.
▪ A test set is an unseen dataset that is different from our training set, totally unused.
▪ An evaluation metric tells us how well our model does on the test set.
▪ Best evaluation for comparing models A and B
▪ Put each model in a task
▪ spelling corrector, speech recognizer, MT system
▪ Run the task, get an accuracy for A and for B
▪ How many misspelled words corrected properly
▪ How many words translated correctly
▪ Compare accuracy for A and B
▪Extrinsic evaluation
▪ Time-consuming; can take days or weeks
▪So
▪ Sometimes use intrinsic evaluation: perplexity
▪ Bad approximation
▪ unless the test data looks just like the training data
▪ So generally only useful in pilot experiments
▪ But is helpful to think about.
▪ The Shannon Game:
  ▪ How well can we predict the next word?

      I always order pizza with cheese and ____
      The 33rd President of the US was ____
      I saw a ____

    e.g., for the first blank:  mushrooms 0.1,  pepperoni 0.1,  anchovies 0.01,  …,
    fried rice 0.0001,  …,  and 1e-100

▪ Unigrams are terrible at this game. (Why?)
▪ A better model of a text
▪ is one which assigns a higher probability to the word that actually occurs
The best language model is one that best predicts an unseen test set
  • Gives the highest P(sentence)

Perplexity is the inverse probability of the test set, normalized by the number of words:

    PP(W) = P(w1 w2 … wN)^(-1/N)
          = ( 1 / P(w1 w2 … wN) )^(1/N)

By the chain rule:

    PP(W) = ( ∏i 1 / P(wi | w1 … wi-1) )^(1/N)

For bigrams:

    PP(W) = ( ∏i 1 / P(wi | wi-1) )^(1/N)

Minimizing perplexity is the same as maximizing probability
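A minimal sketch of bigram perplexity computed in log space; `bigram_prob` stands in for any smoothed estimator (with an unsmoothed MLE, a single unseen bigram would make the perplexity infinite):

```python
import math

def perplexity(tokens, bigram_prob):
    """Perplexity of a token sequence under a bigram model."""
    log_prob = sum(math.log(bigram_prob(w, prev))
                   for prev, w in zip(tokens, tokens[1:]))
    n = len(tokens) - 1                     # number of predicted words
    return math.exp(-log_prob / n)          # = P(w1..wN)^(-1/N)

# Sanity check from the slides: a model that assigns 1/10 to every digit
# gives perplexity 10 on a string of random digits.
uniform_digits = lambda w, prev: 1 / 10
print(perplexity(["<s>"] + list("35712"), uniform_digits))   # ≈ 10
```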


▪ From Josh Goodman
▪ How hard is the task of recognizing digits ‘0,1,2,3,4,5,6,7,8,9’
▪ Perplexity 10

▪ How hard is recognizing (30,000) names at Microsoft.


▪ Perplexity = 30,000

▪ If a system has to recognize


▪ Operator (1 in 4)
▪ Sales (1 in 4)
▪ Technical Support (1 in 4)
▪ 30,000 names (1 in 120,000 each)
▪ Perplexity is 53
▪ Perplexity is weighted equivalent branching factor
▪ Let’s suppose a sentence consisting of random digits
▪ What is the perplexity of this sentence according to a model that assign P=1/10 to each digit?
▪ Training: 38 million words, test: 1.5 million words, WSJ

    N-gram order:   Unigram   Bigram   Trigram
    Perplexity:     962       170      109
Evaluation and Perplexity
Generalization and Zeros
▪ Choose a random bigram (<s>, w) according to its probability
▪ Now choose a random bigram (w, x) according to its probability
▪ And so on until we choose </s>
▪ Then string the words together

    <s> I
        I want
          want to
               to eat
                  eat Chinese
                      Chinese food
                              food </s>

    I want to eat Chinese food
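A sketch of this generation procedure; the bigram distribution below is a made-up toy, only meant to show the sampling step:

```python
import random

def generate(bigram_probs, max_len=20):
    """bigram_probs: dict mapping a word to a dict {next_word: probability}."""
    word, out = "<s>", []
    while word != "</s>" and len(out) < max_len:
        nxt = bigram_probs[word]
        # Sample the next word from the distribution conditioned on the previous word.
        word = random.choices(list(nxt), weights=nxt.values())[0]
        if word != "</s>":
            out.append(word)
    return " ".join(out)

probs = {"<s>": {"I": 1.0},
         "I": {"want": 1.0},
         "want": {"to": 1.0},
         "to": {"eat": 1.0},
         "eat": {"Chinese": 1.0},
         "Chinese": {"food": 1.0},
         "food": {"</s>": 1.0}}
print(generate(probs))   # I want to eat Chinese food
```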
▪N=884,647 tokens, V=29,066
▪Shakespeare produced 300,000 bigram types out
of V2= 844 million possible bigrams.
▪So 99.96% of the possible bigrams were never seen
(have zero entries in the table)
▪Quadrigrams worse: What's coming out looks
like Shakespeare because it is Shakespeare
▪N-grams only work well for word prediction if the
test corpus looks like the training corpus
▪In real life, it often doesn’t
▪We need to train robust models that generalize!
▪One kind of generalization: Zeros!
▪Things that don’t ever occur in the training set
▪But occur in the test set
▪ Training set:                     ▪ Test set:
    … denied the allegations           … denied the offer
    … denied the reports               … denied the loan
    … denied the claims
    … denied the request

  P(“offer” | denied the) = 0
▪ Bigrams with zero probability
▪ mean that we will assign 0 probability to the test set!

▪ And hence we cannot compute perplexity (can’t divide by 0)!


Generalization and Zeros
Smoothing: Add-one (Laplace) smoothing
▪ When we have sparse statistics:

    P(w | denied the):
      3 allegations
      2 reports
      1 claims
      1 request
      (7 total)

▪ Steal probability mass to generalize better:

    P(w | denied the):
      2.5 allegations
      1.5 reports
      0.5 claims
      0.5 request
      2   other
      (7 total)
▪ Also called Laplace smoothing
▪ Pretend we saw each word one more time than we did
▪ Just add one to all the counts!

▪ MLE estimate:
    P_MLE(wi | wi-1) = c(wi-1, wi) / c(wi-1)

▪ Add-1 estimate:
    P_Add-1(wi | wi-1) = ( c(wi-1, wi) + 1 ) / ( c(wi-1) + V )
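A small sketch of the add-1 estimate on a toy corpus (for simplicity it also counts one bigram across the sentence boundary, which a real implementation would usually skip):

```python
from collections import Counter

def add1_bigram_prob(w, prev, bigrams, unigrams, vocab_size):
    """P_Add-1(w | prev) = (c(prev, w) + 1) / (c(prev) + V)"""
    return (bigrams[(prev, w)] + 1) / (unigrams[prev] + vocab_size)

tokens = "<s> I am Sam </s> <s> Sam I am </s>".split()
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))
V = len(set(tokens))                          # vocabulary size: {<s>, I, am, Sam, </s>} = 5

print(add1_bigram_prob("Sam", "am", bigrams, unigrams, V))   # (1 + 1) / (2 + 5) = 2/7
```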
▪ The maximum likelihood estimate
▪ of some parameter of a model M from a training set T
▪ maximizes the likelihood of the training set T given the model M

▪ Suppose the word “bagel” occurs 400 times in a corpus of a million words
▪ What is the probability that a random word from some other text will be
“bagel”?
▪ MLE estimate is 400/1,000,000 = .0004
▪ This may be a bad estimate for some other corpus
▪ But it is the estimate that makes it most likely that “bagel” will occur 400 times in a
million word corpus.
▪ So add-1 isn’t used for N-grams:
▪ We’ll see better methods

▪ But add-1 is used to smooth other NLP models


▪ For text classification
▪ In domains where the number of zeros isn’t so huge.
Smoothing: Add-one (Laplace) smoothing
Interpolation, Backoff, and Web-Scale LM’s
▪ Sometimes it helps to use less context
▪ Condition on less context for contexts you haven’t learned much about

▪ Backoff:
▪ use trigram if you have good evidence,
▪ otherwise bigram, otherwise unigram

▪ Interpolation:
▪ mix unigram, bigram, trigram

▪ Interpolation works better


▪ Simple interpolation:

    P̂(wn | wn-2 wn-1) = λ1 P(wn | wn-2 wn-1) + λ2 P(wn | wn-1) + λ3 P(wn),   with Σi λi = 1

▪ Lambdas conditional on context: each λi can itself depend on the preceding words, λi(wn-2, wn-1)

▪ Use a held-out corpus:

    Training Data  |  Held-Out Data  |  Test Data

▪ Choose λs to maximize the probability of held-out data:
  ▪ Fix the N-gram probabilities (on the training data)
  ▪ Then search for λs that give the largest probability to the held-out set:

    log P(w1 … wn | M(λ1 … λk)) = Σi log P_{M(λ1…λk)}(wi | wi-1)
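A minimal sketch of simple linear interpolation; the lambdas and the toy estimators below are made up, and in practice the λs would be tuned on held-out data as described above:

```python
def interpolated_prob(w, w1, w2, p_uni, p_bi, p_tri, lambdas=(0.1, 0.3, 0.6)):
    """P_hat(w | w1 w2) = l1*P(w) + l2*P(w | w2) + l3*P(w | w1 w2)."""
    l1, l2, l3 = lambdas
    return l1 * p_uni(w) + l2 * p_bi(w, w2) + l3 * p_tri(w, w1, w2)

# Toy estimators, just to exercise the interface.
p_uni = lambda w: 0.01
p_bi  = lambda w, w2: 0.05
p_tri = lambda w, w1, w2: 0.20
print(interpolated_prob("food", "want", "english", p_uni, p_bi, p_tri))
# 0.1*0.01 + 0.3*0.05 + 0.6*0.20 = 0.136
```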
▪ If we know all the words in advance
▪ Vocabulary V is fixed
▪ Closed vocabulary task

▪ Often we don’t know this


▪ Out Of Vocabulary = OOV words
▪ Open vocabulary task

▪ Instead: create an unknown word token <UNK>


▪ Training of <UNK> probabilities
▪ Create a fixed lexicon L of size V
▪ At text normalization phase, any training word not in L changed to <UNK>
▪ Now we train its probabilities like a normal word
▪ At decoding time
▪ If text input: Use UNK probabilities for any word not in training
▪ How to deal with, e.g., Google N-gram corpus
▪ Pruning
▪ Only store N-grams with count > threshold.
▪ Remove singletons of higher-order n-grams
▪ Entropy-based pruning

▪ Efficiency
▪ Efficient data structures like tries
▪ Bloom filters: approximate language models
▪ Store words as indexes, not strings
▪ Use Huffman coding to fit large numbers of words into two bytes
▪ Quantize probabilities (4-8 bits instead of 8-byte float)
▪ “Stupid backoff” (Brants et al. 2007)
▪ No discounting, just use relative frequencies:

    S(wi | w(i-k+1..i-1)) = count(w(i-k+1..i)) / count(w(i-k+1..i-1))   if count(w(i-k+1..i)) > 0
                          = 0.4 · S(wi | w(i-k+2..i-1))                 otherwise

    S(wi) = count(wi) / N
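A sketch of stupid backoff over a toy corpus; note that the scores S are not probabilities (they need not sum to 1):

```python
from collections import Counter

tokens = "<s> i want to eat chinese food </s>".split()
counts = Counter()                       # n-gram counts of every order up to 3
for n in (1, 2, 3):
    counts.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
N = len(tokens)

def stupid_backoff(word, context, alpha=0.4):
    """S(word | context); context is a tuple of preceding words."""
    if not context:
        return counts[(word,)] / N                      # unigram base case
    if counts[context + (word,)] > 0:
        return counts[context + (word,)] / counts[context]
    return alpha * stupid_backoff(word, context[1:], alpha)   # back off, weighted by 0.4

print(stupid_backoff("food", ("eat", "chinese")))   # seen trigram: 1/1 = 1.0
print(stupid_backoff("food", ("want", "chinese")))  # unseen trigram: 0.4 * c(chinese food)/c(chinese) = 0.4
```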
▪Add-1 smoothing:
▪ OK for text categorization, not for language
modeling
▪The most commonly used method:
▪ Extended Interpolated Kneser-Ney
▪For very large N-grams like the Web:
▪ Stupid backoff
▪ Discriminative models:
▪ choose n-gram weights to improve a task, not to fit the training set

▪ Parsing-based models
▪ Caching Models
▪ Recently used words are more likely to appear

▪ These perform very poorly for speech recognition (why?)

    P_CACHE(w | history) = λ P(wi | wi-2 wi-1) + (1-λ) · c(w ∈ history) / |history|
Interpolation, Backoff, and Web-Scale LM’s
<s> I am Sam </s>
<s> Sam I am </s>
<s> I am Sam </s>
<s> I do not like green eggs and Sam </s>
Using a bigram language model with add-one smoothing,
what is P(Sam | am)?
The Task of Text Classification
▪ 1787-8: anonymous essays try to convince New York to ratify the U.S. Constitution: Jay,
Madison, Hamilton.
▪ Authorship of 12 of the letters in dispute

▪ 1963: solved by Mosteller and Wallace using Bayesian methods

James Madison Alexander Hamilton


1. By 1925 present-day Vietnam was divided into three parts under French
colonial rule. The southern region embracing Saigon and the Mekong delta
was the colony of Cochin-China; the central area with its imperial capital at
Hue was the protectorate of Annam…
2. Clara never failed to be astonished by the extraordinary felicity of her own
name. She found it hard to trust herself to the mercy of fate, which had
managed over the years to convert her greatest shame into one of her greatest
assets…

S. Argamon, M. Koppel, J. Fine, A. R. Shimoni, 2003. “Gender, Genre, and Writing Style in Formal Written Texts,” Text, volume 23,
number 3, pp. 321–346
▪ unbelievably disappointing

▪ Full of zany characters and richly applied satire, and some great plot twists

▪ this is the greatest screwball comedy ever filmed

▪ It was pathetic. The worst part about it was the boxing scenes.

MEDLINE Article MeSH Subject Category Hierarchy
▪ Antogonists and Inhibitors
▪ Blood Supply
▪ Chemistry
▪ Drug Therapy
▪ Embryology
▪ Epidemiology
▪ …
▪Assigning subject categories, topics, or genres
▪Spam detection
▪Authorship identification
▪Age/gender identification
▪Language Identification
▪Sentiment analysis
▪…
▪Input:
▪ a document d
▪ a fixed set of classes C = {c1, c2,…, cJ}

▪Output: a predicted class c  C


▪ Rules based on combinations of words or other features
▪ spam: black-list-address OR (“dollars” AND “have been selected”)

▪ Accuracy can be high


▪ If rules carefully refined by expert

▪ But building and maintaining these rules is expensive


▪Input:
▪ a document d
▪ a fixed set of classes C = {c1, c2,…, cJ}
▪ A training set of m hand-labeled documents
(d1,c1),....,(dm,cm)
▪Output:
▪ a learned classifier γ:d → c

▪Any kind of classifier
▪ Naïve Bayes
▪ Logistic regression
▪ Support-vector machines
▪ k-Nearest Neighbors

▪…
The Task of Text Classification
Naïve Bayes
▪Simple (“naïve”) classification method based on Bayes
rule
▪Relies on very simple representation of document
▪Bag of words
▪ Example document (a movie review):

    “I love this movie! It's sweet, but with satirical humor. The dialogue is great
    and the adventure scenes are fun… It manages to be whimsical and romantic while
    laughing at the conventions of the fairy tale genre. I would recommend it to just
    about anyone. I've seen it several times, and I'm always happy to see it again
    whenever I have a friend who hasn't seen it yet.”

▪ The bag-of-words representation used by γ(d) = c keeps only the words and their counts, e.g.:

    great      2
    love       2
    recommend  1
    laugh      1
    happy      1
    …          …
▪ More generally, a test document of unknown class (“?”) is assigned to one of several
  classes such as Machine Learning, NLP, Garbage Collection, Planning, or GUI, each
  characterized by its typical words (e.g., learning, training, algorithm, shrinkage,
  network…; parser, tag, translation, language…; garbage, collection, memory,
  optimization, region…; planning, temporal, reasoning, plan, language…).
• For a document d and a class c:

    P(c | d) = P(d | c) P(c) / P(d)

    c_MAP = argmax_{c∈C} P(c | d)                      MAP is “maximum a posteriori” = most likely class
          = argmax_{c∈C} P(d | c) P(c) / P(d)          (Bayes rule)
          = argmax_{c∈C} P(d | c) P(c)                 (dropping the denominator)
          = argmax_{c∈C} P(x1, x2, …, xn | c) P(c)     (document d represented as features x1..xn)

• P(x1, x2, …, xn | c) has O(|X|^n · |C|) parameters; it could only be estimated if a very,
  very large number of training examples was available.
• P(c): how often does this class occur? We can just count the relative frequencies in a corpus.
• How do we estimate P(x1, x2, …, xn | c)?
▪ Bag of Words assumption: assume position doesn’t matter
▪ Conditional Independence: assume the feature probabilities P(xi|cj) are independent given the class c:

    P(x1, …, xn | c) = P(x1 | c) · P(x2 | c) · P(x3 | c) · … · P(xn | c)

    c_MAP = argmax_{c∈C} P(x1, x2, …, xn | c) P(c)

    c_NB = argmax_{c∈C} P(cj) ∏_{x∈X} P(x | c)

▪ positions ← all word positions in the test document

    c_NB = argmax_{cj∈C} P(cj) ∏_{i∈positions} P(xi | cj)
Sec.13.3

▪ First attempt: maximum likelihood estimates
  ▪ simply use the frequencies in the data

    P̂(cj) = doccount(C = cj) / N_doc

    P̂(wi | cj) = count(wi, cj) / Σ_{w∈V} count(w, cj)

  i.e., the fraction of times word wi appears among all words in documents of topic cj

▪ Create a mega-document for topic j by concatenating all the docs in this topic
▪ Use the frequency of w in the mega-document

Sec.13.3

▪ What if we have seen no training documents with the word fantastic and classified in the topic
positive (thumbs-up)?

    P̂("fantastic" | positive) = count("fantastic", positive) / Σ_{w∈V} count(w, positive) = 0

▪ Zero probabilities cannot be conditioned away, no matter the other evidence!

    c_MAP = argmax_c P̂(c) ∏_i P̂(xi | c)

▪ Laplace (add-1) smoothing for Naïve Bayes:

    P̂(wi | c) = ( count(wi, c) + 1 ) / Σ_{w∈V} ( count(w, c) + 1 )
              = ( count(wi, c) + 1 ) / ( (Σ_{w∈V} count(w, c)) + |V| )
• From the training corpus, extract the Vocabulary
• Calculate the P(cj) terms:
    For each cj in C do
      docsj ← all docs with class = cj
      P(cj) ← |docsj| / |total # documents|
• Calculate the P(wk | cj) terms:
    Textj ← single document containing all docsj
    For each word wk in Vocabulary:
      nk ← # of occurrences of wk in Textj
      P(wk | cj) ← (nk + α) / (n + α |Vocabulary|)
Add one extra word to the vocabulary, the “unknown word” wu:

    P̂(wu | c) = ( count(wu, c) + 1 ) / ( (Σ_{w∈V} count(w, c)) + |V| + 1 )
              = 1 / ( (Σ_{w∈V} count(w, c)) + |V| + 1 )
Naïve Bayes: Relationship to
Language Modeling
c = China

X1 = Shanghai   X2 = and   X3 = Shenzhen   X4 = issue   X5 = bonds
▪ Naïve bayes classifiers can use any sort of
feature
▪ URL, email address, dictionaries, network features
▪ But if, as in the previous slides
▪ We use only word features
▪ we use all of the words in the text (not a subset)
▪ Then
▪ Naïve bayes has an important similarity to language
modeling.

Sec.13.2.1

▪ Assigning each word: P(word | c)
▪ Assigning each sentence: P(s | c) = ∏ P(word | c)

  Class pos:
    0.1   I
    0.1   love
    0.01  this
    0.05  fun
    0.1   film

  I     love  this  fun   film
  0.1   0.1   0.01  0.05  0.1        P(s | pos) = 0.0000005
Sec.13.2.1

▪ Which class assigns the higher probability to s?

  Model pos:            Model neg:
    0.1   I               0.2    I
    0.1   love            0.001  love
    0.01  this            0.01   this
    0.05  fun             0.005  fun
    0.1   film            0.1    film

  I     love   this  fun    film
  0.1   0.1    0.01  0.05   0.1       (pos)
  0.2   0.001  0.01  0.005  0.1       (neg)

  P(s | pos) > P(s | neg)
▪ Worked example, with P̂(c) = Nc / N and P̂(w | c) = (count(w, c) + 1) / (count(c) + |V|):

              Doc  Words                                  Class
  Training    1    Chinese Beijing Chinese                c
              2    Chinese Chinese Shanghai               c
              3    Chinese Macao                          c
              4    Tokyo Japan Chinese                    j
  Test        5    Chinese Chinese Chinese Tokyo Japan    ?

  Priors:
    P(c) = 3/4
    P(j) = 1/4

  Conditional probabilities:
    P(Chinese|c) = (5+1) / (8+6) = 6/14 = 3/7
    P(Tokyo|c)   = (0+1) / (8+6) = 1/14
    P(Japan|c)   = (0+1) / (8+6) = 1/14
    P(Chinese|j) = (1+1) / (3+6) = 2/9
    P(Tokyo|j)   = (1+1) / (3+6) = 2/9
    P(Japan|j)   = (1+1) / (3+6) = 2/9

  Choosing a class:
    P(c|d5) ∝ 3/4 × (3/7)³ × 1/14 × 1/14 ≈ 0.0003
    P(j|d5) ∝ 1/4 × (2/9)³ × 2/9 × 2/9   ≈ 0.0001
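A small sketch of multinomial Naïve Bayes with add-1 smoothing that reproduces the worked example above, computed in log space to avoid underflow:

```python
import math
from collections import Counter, defaultdict

train = [("Chinese Beijing Chinese", "c"),
         ("Chinese Chinese Shanghai", "c"),
         ("Chinese Macao", "c"),
         ("Tokyo Japan Chinese", "j")]
test = "Chinese Chinese Chinese Tokyo Japan"

doc_counts = Counter(c for _, c in train)          # documents per class (for the prior)
word_counts = defaultdict(Counter)                 # word counts per class
for text, c in train:
    word_counts[c].update(text.split())
vocab = {w for text, _ in train for w in text.split()}

def log_posterior(doc, c):
    logp = math.log(doc_counts[c] / len(train))                      # log prior
    total = sum(word_counts[c].values())
    for w in doc.split():                                            # add-1 likelihoods
        logp += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
    return logp

for c in doc_counts:
    print(c, math.exp(log_posterior(test, c)))
# c ≈ 0.0003, j ≈ 0.0001  →  choose class c
```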
▪ SpamAssassin Features:
▪ Mentions Generic Viagra
▪ Online Pharmacy
▪ Mentions millions of (dollar) ((dollar) NN,NNN,NNN.NN)
▪ Phrase: impress ... girl
▪ From: starts with many numbers
▪ Subject is all capitals
▪ HTML has a low ratio of text to image area
▪ One hundred percent guaranteed
▪ Claims you can be removed from the list
▪ 'Prestigious Non-Accredited Universities'
▪ http://spamassassin.apache.org/tests_3_3_x.html
▪ Very Fast, low storage requirements
▪ Robust to Irrelevant Features
Irrelevant Features cancel each other without affecting results
▪ Very good in domains with many equally important features
Decision Trees suffer from fragmentation in such cases – especially if little data
▪ Optimal if the independence assumptions hold: if the assumed independence is correct, then it is the Bayes Optimal Classifier for the problem
▪ A good dependable baseline for text classification

▪ But we will see other classifiers that give better accuracy


The Task of Text Classification
Precision, Recall & F1 Score
                 correct   not correct
  selected       tp        fp
  not selected   fn        tn

▪ Precision: % of selected items that are correct = tp / (tp + fp)
▪ Recall: % of correct items that are selected = tp / (tp + fn)
▪ A combined measure that assesses the P/R tradeoff is the F measure (weighted harmonic mean):

    F = 1 / ( α·(1/P) + (1-α)·(1/R) ) = (β² + 1) P R / (β² P + R)

▪ The harmonic mean is a very conservative average; see IIR § 8.3
▪ People usually use the balanced F1 measure
  ▪ i.e., with β = 1 (that is, α = ½):  F = 2PR / (P + R)
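A tiny sketch of these metrics computed from raw counts:

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and balanced F1 from a contingency table."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)   # harmonic mean of P and R
    return precision, recall, f1

print(precision_recall_f1(tp=10, fp=10, fn=10))   # (0.5, 0.5, 0.5)
```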
Precision, Recall & F1 Score
Text Classification: Evaluation
Sec.14.5

▪ Dealing with any-of or multivalue classification


▪ A document can belong to 0, 1, or >1 classes.

▪ For each class c∈C


▪ Build a classifier γc to distinguish c from all other classes c’ ∈C

▪ Given test doc d,


▪ Evaluate it for membership in each class using each γc
▪ d belongs to any class for which γc returns true

Sec.14.5

▪ One-of or multinomial classification


▪ Classes are mutually exclusive: each document in exactly one class

▪ For each class c∈C


▪ Build a classifier γc to distinguish c from all other classes c’ ∈C

▪ Given test doc d,


▪ Evaluate it for membership in each class using each γc
▪ d belongs to the one class with maximum score

Sec. 15.2.4

▪ Most (over)used data set: 21,578 docs (each ~90 types, 200 tokens)
▪ 9603 training, 3299 test articles (ModApte/Lewis split)
▪ 118 categories
▪ An article can be in more than one category
▪ Learn 118 binary category distinctions

▪ Average document (with at least one category) has 1.24 classes


▪ Only about 10 out of 118 categories are large

  Common categories (#train, #test):
    • Earn (2877, 1087)          • Trade (369, 119)
    • Acquisitions (1650, 179)   • Interest (347, 131)
    • Money-fx (538, 179)        • Ship (197, 89)
    • Grain (433, 149)           • Wheat (212, 71)
    • Crude (389, 189)           • Corn (182, 56)
Sec. 15.2.4

<REUTERS TOPICS="YES" LEWISSPLIT="TRAIN" CGISPLIT="TRAINING-SET" OLDID="12981"


NEWID="798">
<DATE> 2-MAR-1987 16:51:43.42</DATE>
<TOPICS><D>livestock</D><D>hog</D></TOPICS>
<TITLE>AMERICAN PORK CONGRESS KICKS OFF TOMORROW</TITLE>
<DATELINE> CHICAGO, March 2 - </DATELINE><BODY>The American Pork Congress kicks off tomorrow,
March 3, in Indianapolis with 160 of the nations pork producers from 44 member states determining industry positions
on a number of issues, according to the National Pork Producers Council, NPPC.
Delegates to the three day Congress will be considering 26 resolutions concerning various issues, including the future
direction of farm policy and the tax law as it applies to the agriculture sector. The delegates will also debate whether to
endorse concepts of a national PRV (pseudorabies virus) control and eradication program, the NPPC said.
A large trade show, in conjunction with the congress, will feature the latest in technology in all areas of the industry,
the NPPC added. Reuter
</BODY></TEXT></REUTERS>
▪ For each pair of classes <c1,c2>, how many documents from c1 were incorrectly assigned to c2?
  ▪ e.g., c3,2: 90 wheat documents incorrectly assigned to poultry

  Docs in test set   Assigned  Assigned  Assigned  Assigned  Assigned  Assigned
                     UK        poultry   wheat     coffee    interest  trade
  True UK            95        1         13        0         1         0
  True poultry       0         1         0         0         0         0
  True wheat         10        90        0         1         0         0
  True coffee        0         0         0         34        3         7
  True interest      -         1         2         13        26        5
  True trade         0         0         2         14        5         10
Sec. 15.2.4

Recall for class i — fraction of docs in class i classified correctly:

    c_ii / Σ_j c_ij

Precision for class i — fraction of docs assigned class i that are actually about class i:

    c_ii / Σ_j c_ji

Accuracy (1 − error rate) — fraction of docs classified correctly:

    Σ_i c_ii / Σ_i Σ_j c_ij
Sec. 15.2.4

▪ If we have more than one class, how do we combine multiple performance measures into one quantity?
  ▪ Macroaveraging: compute performance for each class, then average.
  ▪ Microaveraging: collect decisions for all classes, compute one contingency table, evaluate.
Sec. 15.2.4

                   Class 1             Class 2             Micro Ave. Table
                   Truth:   Truth:     Truth:   Truth:     Truth:   Truth:
                   yes      no         yes      no         yes      no
  Classifier: yes  10       10         90       10         100      20
  Classifier: no   10       970        10       890        20       1860

• Macroaveraged precision: (0.5 + 0.9) / 2 = 0.7
• Microaveraged precision: 100 / 120 = .83
• Microaveraged score is dominated by the score on common classes
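A sketch of the two averaging schemes applied to the example above:

```python
# Per-class contingency tables from the example.
tables = [{"tp": 10, "fp": 10, "fn": 10, "tn": 970},    # class 1
          {"tp": 90, "fp": 10, "fn": 10, "tn": 890}]    # class 2

# Macroaveraging: compute precision per class, then average the scores.
macro = sum(t["tp"] / (t["tp"] + t["fp"]) for t in tables) / len(tables)

# Microaveraging: pool the raw counts into one table, then compute precision.
tp = sum(t["tp"] for t in tables)
fp = sum(t["fp"] for t in tables)
micro = tp / (tp + fp)

print(macro, micro)   # 0.7  0.8333...
```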
Split the data:  Training Set | Development Test Set | Test Set

▪ Metric: P/R/F1 or Accuracy
▪ Unseen test set
  ▪ avoid overfitting (“tuning to the test set”)
  ▪ more conservative estimate of performance
▪ Cross-validation over multiple splits (rotating which slice serves as the dev test set)
  ▪ Handle sampling errors from different datasets
  ▪ Pool results over each split
  ▪ Compute pooled dev set performance
Text Classification: Evaluation
Text Classification: Practical Issues
Sec. 15.3.1

▪ Gee, I’m building a text classifier for real, now!


▪ What should I do?

Sec. 15.3.1

If (wheat or grain) and not (whole or bread) then


Categorize as grain

▪Need careful crafting


▪ Human tuning on development data
▪ Time-consuming: 2 days per class

Sec. 15.3.1

▪Use Naïve Bayes


▪ Naïve Bayes is a “high-bias” algorithm (Ng and Jordan 2002 NIPS)
▪Get more labeled data
▪ Find clever ways to get humans to label data for you
▪Try semi-supervised training methods:
▪ Bootstrapping, EM over unlabeled documents, …

Sec. 15.3.1

▪Perfect for all the clever classifiers


▪ SVM
▪ Regularized Logistic Regression
▪You can even use user-interpretable decision
trees
▪ Users like to hack
▪ Management likes quick fixes

Sec. 15.3.1

▪Can achieve high accuracy!


▪At a cost:
▪ SVMs (train time) or kNN (test time) can be too slow
▪ Regularized logistic regression can be somewhat better
▪So Naïve Bayes can come back into its own again!

Sec. 15.3.1

▪With enough data


▪ Classifier may not matter

Brill and Banko on spelling correction
▪ Automatic classification
▪ Manual review of uncertain/difficult/“new” cases

▪ Multiplying lots of probabilities can result in floating-point underflow.
▪ Since log(xy) = log(x) + log(y)
▪ Better to sum logs of probabilities instead of multiplying probabilities.
▪ Class with highest un-normalized log probability score is still most
probable.
    c_NB = argmax_{cj∈C} [ log P(cj) + Σ_{i∈positions} log P(xi | cj) ]

▪ Model is now just max of sum of weights


Sec. 15.3.2

▪ Domain-specific features and weights: very important in real


performance
▪ Sometimes need to collapse terms:
▪ Part numbers, chemical formulas, …
▪ But stemming generally doesn’t help

▪ Upweighting: Counting a word as if it occurred twice:


▪ title words (Cohen & Singer 1996)
▪ first sentence of each paragraph (Murata, 1999)
▪ In sentences that contain title words (Ko et al, 2002)

Text Classification: Practical Issues
