SPELLING CORRECTION

SPELLING CORRECTION
HOW SIMILAR ARE TWO STRINGS?
• The idea is used in machine translation, information extraction and speech recognition.
EDIT DISTANCE
• How many operations are needed to transform one string into another? (a sketch follows below)
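Below is a minimal Python sketch of the dynamic-programming computation the following slides outline, assuming insertions and deletions cost 1 and substitutions cost 2 (one common convention; the function name and default costs are illustrative, not taken from the slides).

def min_edit_distance(source, target, ins_cost=1, del_cost=1, sub_cost=2):
    """Minimum edit distance via dynamic programming.

    Illustrative cost choices: insertions/deletions cost 1,
    substitutions cost 2 (a substitution counted as delete + insert).
    """
    n, m = len(source), len(target)
    # D[i][j] = cost of editing source[:i] into target[:j]
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + del_cost
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + ins_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if source[i - 1] == target[j - 1] else sub_cost
            D[i][j] = min(D[i - 1][j] + del_cost,   # delete source[i-1]
                          D[i][j - 1] + ins_cost,   # insert target[j-1]
                          D[i - 1][j - 1] + sub)    # substitute (or match)
    return D[n][m]

print(min_edit_distance("intention", "execution"))  # 8 with these costs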
MINIMUM EDIT DISTANCE
MINIMUM EDIT DISTANCE
ALIGNMENT IN COMPUTATIONAL BIOLOGY
OTHER USES OF EDIT DISTANCE IN NLP
HOW TO FIND THE MINIMUM EDIT DISTANCE?
MINIMUM EDIT AS SEARCH
DEFINING MINIMUM EDIT DISTANCE MATRIX
COMPUTING MINIMUM EDIT DISTANCE
DYNAMIC PROGRAMMING ALGORITHM
FUNCTION
EDIT DISTANCE TABLE
EDIT DISTANCE TABLE
MINIMUM EDIT WITH BACKTRACE
ADDING BACKTRACE TO MINIMUM EDIT
THE DISTANCE MATRIX
PERFORMANCE
WEIGHTED EDIT DISTANCE
CONFUSION MATRIX FOR EDIT DISTANCE
KEYBOARD DESIGN
WEIGHTED MINIMUM EDIT DISTANCE
HOW TO MODIFY THE ALGORITHM WITH TRANSPOSITION
HOW TO FIND DICTIONARY ENTRIES WITH SMALLEST EDIT DISTANCE
HOW TO FIND DICTIONARY ENTRIES WITH SMALLEST EDIT DISTANCE
HOW TO FIND DICTIONARY ENTRIES WITH SMALLEST EDIT DISTANCE
SPELLING CORRECTION
NON-WORD SPELLING ERRORS
NOISY CHANNEL
NOISY CHANNEL
NON-WORD SPELLING ERROR: “acress”
WORDS WITHIN EDIT DISTANCE 1 OF acress
CANDIDATE GENERATION
COMPUTING ERROR PROBABILITY: CONFUSION MATRIX
CHANNEL MODEL
CHANNEL MODEL FOR ACRESS
NOISY CHANNEL PROBABILITIES FOR ACRESS
USING A BIGRAM MODEL
REAL-WORD SPELLING ERROR
NOISY CHANNEL FOR REAL-WORD SPELLING ERRORS
NOISY CHANNEL FOR REAL-WORD SPELLING ERRORS
SIMPLIFICATION: ONE ERROR PER SENTENCE
GETTING THE PROBABILITY VALUES
PROBABILITY OF NO ERROR
COMPUTING P(W)
CONTEXT SENSITIVE SPELLING CORRECTION
PROBABILISTIC LANGUAGE MODEL: APPLICATIONS
COMPLETION PREDICTION
PROBABILISTIC LANGUAGE MODELING
COMPUTING P(W)
THE CHAIN RULE
PROBABILITY OF WORDS IN SENTENCE
ESTIMATING THESE PROBABILITY VALUES
MARKOV ASSUMPTION
MARKOV ASSUMPTION
N-GRAM MODELS
N-GRAM MODELS
ESTIMATING N-GRAM PROBABILITIES
AN EXAMPLE
BIGRAM COUNTS FROM 9222 RESTAURANT SENTENCES
COMPUTING BIGRAM PROBABILITIES
COMPUTING SENTENCE PROBABILITIES
WHAT LANGUAGE DOES THE N-GRAM MODEL REPRESENT?
PRACTICAL ISSUES
LANGUAGE MODELING TOOLKIT
GOOGLE N-GRAMS
EXAMPLE FROM THE 4-GRAM DATA
GOOGLE BOOKS N-GRAM DATA
EVALUATING LANGUAGE MODEL
EXTRINSIC EVALUATION OF LANGUAGE MODELS
INTRINSIC EVALUATION
PERPLEXITY
EXAMPLE: A SIMPLE SCENARIO
LOWER PERPLEXITY: BETTER MODEL
SHANNON VISUALIZATION METHOD
SHAKESPEARE AS CORPUS
APPROXIMATING SHAKESPEARE
PROBLEM WITH MLE ESTIMATES
LANGUAGE MODELING: SMOOTHING
LAPLACE SMOOTHING (ADD-ONE ESTIMATION)
RECONSTITUTED COUNTS AS EFFECT OF SMOOTHING
COMPARING WITH BIGRAMS: RESTAURANT CORPUS
MORE GENERAL FORMULATIONS: ADD-K
