Web Search Engine

This document proposes a method to measure semantic similarity between words using information from web search engines. Page counts and snippets are used as features to train an SVM classifier to detect synonymous and non-synonymous word pairs. Experiments on benchmark datasets show the method outperforms baselines and other web-based similarity measures.

Uploaded by

Ashok Oruganti


A WEB SEARCH ENGINE-BASED APPROACH TO MEASURE SEMANTIC SIMILARITY BETWEEN WORDS

AGENDA
1 INTRODUCTION 2 RELATED WORK 3 METHOD 4 EXPERIMENTS 5 CONCLUSION

INTRODUCTION
Accurately measuring the semantic similarity between words is an important problem in web mining, information retrieval, and natural language processing.
Semantically related words of a particular word are listed in manually created general-purpose lexical ontologies such as WordNet.

INTRODUCTION
We propose an automatic method to estimate the semantic similarity between words or entities using web search engines.

Page counts and snippets are two useful information sources provided by most web search engines.

INTRODUCTION
Page count of a query is an estimate of the number of pages that contain the query words.

A snippet is a brief window of text extracted by a search engine around the query term in a document.

RELATED WORK
Resnik [8] proposed a similarity measure using information content.
Li et al. [9] combined structural semantic information from a lexical taxonomy and information content from a corpus in a nonlinear model.
Cilibrasi and Vitanyi [12] proposed a distance metric between words using only page counts retrieved from a web search engine.
Sahami and Heilman [2] measured semantic similarity between two queries using snippets returned for those queries by a search engine.

RELATED WORK
Chen et al. [4] proposed a double-checking model using text snippets returned by a web search engine to compute semantic similarity between words.
In query expansion [18], a user query is modified using synonymous words to improve the relevance of the search.

METHOD
Given two words P and Q, we want a similarity score sim(P, Q).
If P and Q are highly similar, sim(P, Q) approaches 1.
If P and Q are not semantically similar, sim(P, Q) approaches 0.

METHOD
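Page Count-Based Co-Occurrence Measures
The page count for the query car AND automobile is 11,300,000.
The page count for the query car AND apple is 49,000,000, even though apple is less similar to car than automobile is, so raw page counts alone are not a reliable measure of similarity.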

METHOD
We compute four popular co-occurrence measures: Jaccard, Overlap (Simpson), Dice, and pointwise mutual information (PMI).
We use the notation H(P) to denote the page count for the query P in a search engine.
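A minimal Python sketch of the four measures, using the standard Jaccard, Overlap, Dice, and PMI definitions applied to page counts. Here `h_p`, `h_q`, and `h_pq` stand for H(P), H(Q), and the page count of the conjunctive query P AND Q. The cutoff `c`, the index size `n` used by PMI, and the counts in the usage note are illustrative assumptions, not values from the slides.

```python
import math


def web_jaccard(h_p, h_q, h_pq, c=0):
    """Jaccard coefficient on page counts; 0 if the joint count is at most c."""
    if h_pq <= c:
        return 0.0
    return h_pq / (h_p + h_q - h_pq)


def web_overlap(h_p, h_q, h_pq, c=0):
    """Overlap (Simpson) coefficient on page counts."""
    if h_pq <= c:
        return 0.0
    return h_pq / min(h_p, h_q)


def web_dice(h_p, h_q, h_pq, c=0):
    """Dice coefficient on page counts."""
    if h_pq <= c:
        return 0.0
    return 2 * h_pq / (h_p + h_q)


def web_pmi(h_p, h_q, h_pq, n, c=0):
    """Pointwise mutual information; n is an assumed number of indexed pages."""
    if h_pq <= c:
        return 0.0
    return math.log2((h_pq / n) / ((h_p / n) * (h_q / n)))
```

With hypothetical counts H(P) = 1000, H(Q) = 500, and a joint count of 250, `web_jaccard(1000, 500, 250)` gives 0.2 and `web_overlap(1000, 500, 250)` gives 0.5. A small positive `c` can be used to suppress scores that rest on a handful of accidental co-occurrences.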

METHOD
Lexical Pattern Extraction

METHOD
The parameters
L: the maximum length of a subsequence is L words.
g: do not skip more than g words consecutively.
G: the total number of words skipped in a subsequence should not exceed G.
T: we count the frequency of all generated subsequences and only use subsequences that occur more than T times as lexical patterns.
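The role of the four parameters can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: snippets are assumed to be whitespace-tokenized with the two query words already replaced by the placeholders X and Y, a pattern is assumed to need exactly one X and one Y, and each pattern is counted at most once per snippet. The function and argument names are hypothetical.

```python
from collections import Counter


def subsequences(tokens, max_len, max_gap, max_total_gap):
    """Return word subsequences that respect the L, g, and G limits."""
    results = []

    def extend(indices, skipped):
        results.append(tuple(tokens[i] for i in indices))
        if len(indices) == max_len:  # L: at most max_len words
            return
        last = indices[-1]
        # g: the next word may skip at most max_gap consecutive words
        for nxt in range(last + 1, min(len(tokens), last + max_gap + 2)):
            gap = nxt - last - 1
            # G: total skipped words must not exceed max_total_gap
            if skipped + gap <= max_total_gap:
                extend(indices + [nxt], skipped + gap)

    for start in range(len(tokens)):
        extend([start], 0)
    return results


def extract_patterns(snippets, max_len, max_gap, max_total_gap, min_count):
    """Count candidate patterns and keep those occurring more than T times."""
    counts = Counter()
    for snippet in snippets:
        tokens = snippet.split()
        seen = set()
        for seq in subsequences(tokens, max_len, max_gap, max_total_gap):
            # Assumption: a pattern contains exactly one X and one Y.
            if seq.count("X") == 1 and seq.count("Y") == 1:
                seen.add(" ".join(seq))
        counts.update(seen)  # each pattern counted once per snippet
    return {p: n for p, n in counts.items() if n > min_count}
```

For three copies of the snippet "X is a Y" with `max_len=4`, `max_gap=1`, `max_total_gap=2`, and `min_count=2`, this keeps "X is a Y", "X is Y", and "X a Y"; the bare "X Y" is rejected because it would skip two consecutive words.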

METHOD
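Lexical Pattern Clustering
Typically, a semantic relation can be expressed using more than one pattern.
For example, the patterns "X is a Y" and "X is a large Y" both express the is-a relation between X and Y.

We designate a vector a, the word-pair frequency vector of pattern a, whose entries record how often the pattern occurs with each word pair.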
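One way to group patterns that express the same relation is to compare their word-pair frequency vectors by cosine similarity and merge a pattern into the closest existing cluster when the similarity exceeds a threshold. The greedy, most-frequent-first procedure below is an assumption chosen for illustration; the slides do not spell out the clustering algorithm, and all names and counts here are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(k, 0) for k, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def cluster_patterns(vectors, threshold):
    """Greedy clustering: vectors maps pattern -> {word_pair: frequency}."""
    # Process the most frequent patterns first.
    order = sorted(vectors, key=lambda p: sum(vectors[p].values()), reverse=True)
    clusters = []  # each cluster keeps its members and the sum of their vectors
    for pattern in order:
        vec = vectors[pattern]
        best, best_sim = None, -1.0
        for cluster in clusters:
            sim = cosine(vec, cluster["sum"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim > threshold:
            best["members"].append(pattern)
            for k, x in vec.items():
                best["sum"][k] = best["sum"].get(k, 0) + x
        else:
            clusters.append({"members": [pattern], "sum": dict(vec)})
    return [c["members"] for c in clusters]


# Hypothetical frequencies for three patterns.
example = {
    "X is a Y": {("ostrich", "bird"): 4, ("car", "automobile"): 1},
    "X is a large Y": {("ostrich", "bird"): 2},
    "X and Y": {("car", "apple"): 3},
}
clusters = cluster_patterns(example, threshold=0.5)
```

In this toy input the two is-a patterns share the (ostrich, bird) pair and end up in one cluster, while "X and Y" forms its own.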

METHOD
Measuring Semantic Similarity
A pair of words (P, Q) is represented by an (N + 4)-dimensional feature vector f_PQ.
The first N features correspond to the N clusters of lexical patterns.
The (N + 1)st, (N + 2)nd, (N + 3)rd, and (N + 4)th features are set, respectively, to WebJaccard, WebOverlap, WebDice, and WebPMI.

METHOD
We assign a weight w_ij to a pattern a_i that is in a cluster c_j as follows:

μ(a) is the total frequency of a pattern a in all word pairs.

METHOD
Finally, we compute the value of the jth feature in the feature vector for a word pair (P , Q) as follows:
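A minimal sketch of how the cluster weights and features could be computed, assuming the weight of a pattern is its share of the total frequency of its cluster, and the jth feature is the weighted sum of the frequencies with which the patterns of cluster j occur with (P, Q). Both formulas are assumptions consistent with the definitions above; `mu`, `pair_freq`, and the function names are hypothetical.

```python
def pattern_weights(cluster, mu):
    """Weight of each pattern: its total frequency over the cluster's total."""
    total = sum(mu[a] for a in cluster)
    return {a: mu[a] / total for a in cluster}


def cluster_feature(cluster, mu, pair_freq):
    """Weighted sum of pattern frequencies for one word pair and one cluster.

    pair_freq maps pattern -> frequency of that pattern with the pair (P, Q).
    """
    weights = pattern_weights(cluster, mu)
    return sum(weights[a] * pair_freq.get(a, 0) for a in cluster)


def feature_vector(clusters, mu, pair_freq, web_measures):
    """N cluster features followed by the four page-count measures."""
    features = [cluster_feature(c, mu, pair_freq) for c in clusters]
    return features + list(web_measures)


# Hypothetical inputs: two clusters, so the vector has 2 + 4 entries.
mu = {"X is a Y": 30, "X is a large Y": 10, "X and Y": 5}
clusters = [["X is a Y", "X is a large Y"], ["X and Y"]]
pair_freq = {"X is a Y": 4}
f_pq = feature_vector(clusters, mu, pair_freq, (0.2, 0.5, 0.33, 8.9))
```

Normalizing by the cluster total means the weights within a cluster sum to 1, so frequent patterns dominate the feature value of their cluster.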

METHOD
We train a two-class SVM to detect synonymous and nonsynonymous word pairs, using a training set S = {(P_k, Q_k, y_k)} of word pairs (P_k, Q_k) with class labels y_k.

METHOD
Training
We randomly select 3,000 nouns from WordNet and extract a pair of synonymous words from a synset of each selected noun.
We extract nonsynonymous word pairs using a random shuffling technique:

(A,B)(C,D) => (A,C)(B,D)
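The shuffling step can be sketched in Python as follows. The sketch only filters out recombined pairs that already appear in the input list; a fuller version would presumably also check the candidates against WordNet, which is left out here. The function name, the `seed` argument, and the word pairs are hypothetical.

```python
import random


def shuffle_pairs(synonym_pairs, seed=None):
    """Recombine pairs (A,B),(C,D) into candidate negatives (A,C),(B,D)."""
    rng = random.Random(seed)
    pairs = list(synonym_pairs)
    rng.shuffle(pairs)
    known = {frozenset(p) for p in synonym_pairs}
    negatives = []
    # Walk through the shuffled list two pairs at a time.
    for (a, b), (c, d) in zip(pairs[0::2], pairs[1::2]):
        for candidate in ((a, c), (b, d)):
            # Skip degenerate pairs and pairs already known to be synonymous.
            if candidate[0] != candidate[1] and frozenset(candidate) not in known:
                negatives.append(candidate)
    return negatives


synonyms = [("car", "automobile"), ("gem", "jewel"), ("boy", "lad"), ("coast", "shore")]
negatives = shuffle_pairs(synonyms, seed=0)
```

Because every input pair contributes each of its words to exactly one new pair, the number of negative examples stays close to the number of positive ones.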

METHOD
We determine the clustering threshold θ as follows.
Let W denote the set of synonymous word pairs, and let f_W be the centroid vector of all feature vectors representing synonymous word pairs.

METHOD
Next, we compute the average Mahalanobis distance, D(θ), between f_W and the feature vectors that represent synonymous word pairs.

Mahalanobis distance
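For reference, the textbook form of the Mahalanobis distance and the resulting average can be written as below. The symbol C for the matrix used in the distance, and the choice of which matrix to estimate, are assumptions; some presentations also omit the square root and work with the squared distance.

```latex
% Mahalanobis distance between the feature vector of a pair (P, Q)
% and the centroid f_W, for an assumed covariance-type matrix C
\[
  d_M(\mathbf{f}_{PQ}, \mathbf{f}_W)
    = \sqrt{(\mathbf{f}_{PQ} - \mathbf{f}_W)^{\top}\, C^{-1}\,
            (\mathbf{f}_{PQ} - \mathbf{f}_W)}
\]
% Average over the set W of synonymous word pairs; the feature
% vectors depend on the clustering threshold \theta
\[
  D(\theta) = \frac{1}{|W|} \sum_{(P,Q) \in W}
              d_M(\mathbf{f}_{PQ}, \mathbf{f}_W)
\]
```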

METHOD
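Finally, we set the optimum value of the clustering threshold θ to the value that minimizes D(θ).
For example, among the values X1 = 3, X2 = 5, and X3 = 1 at indices 0, 1, and 2, the minimum is X3 = 1, at index 2.

D(θ) is treated as an average distance, and we minimize this quantity.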
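Selecting the threshold amounts to an argmin over candidate values. A minimal Python sketch, assuming D(θ) is available as a function and is evaluated on a small grid of candidates; the grid and the toy distances 3, 5, and 1 are illustrative only.

```python
def best_threshold(candidates, average_distance):
    """Return the candidate threshold with the smallest average distance."""
    return min(candidates, key=average_distance)


# Toy stand-in for D(theta): a lookup table with hypothetical values.
toy_distances = {0.0: 3, 0.5: 5, 1.0: 1}
theta_hat = best_threshold(toy_distances, toy_distances.get)
```

Here `theta_hat` is 1.0, the candidate whose distance (1) is the smallest of the three.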

EXPERIMENTS
Benchmark Data Sets
Miller-Charles (MC: 28 pairs, 38 annotators), evaluated with Pearson correlation
Rubenstein-Goodenough (RG: 65 pairs, 36 annotators), evaluated with Spearman correlation
WordSimilarity (WS: 353 pairs, 13 annotators), evaluated with Spearman correlation
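Both correlation coefficients can be computed with a few lines of plain Python. The sketch below uses the standard definitions: Pearson on the raw scores, and Spearman as Pearson on ranks with tied values given their average rank. The ratings in the usage note are made up for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

For hypothetical human ratings `[3.9, 3.5, 0.4, 1.2]` and system scores `[0.9, 0.8, 0.1, 0.3]`, the two lists order the word pairs identically, so `spearman` returns 1 (up to rounding) even though the scales differ.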

EXPERIMENTS
Semantic Similarity
[Tables and figure: similarity measures compared against human ratings]

EXPERIMENTS
Community Mining
We select 50 personal names from five communities in the Open Directory Project (DMOZ): tennis players, golfers, actors, politicians, and scientists.
Correlation, Corr(Γ)

EXPERIMENTS
We compute precision, recall, and F-score for each person p.
We denote the cluster that p belongs to by C(p), and use A(p) to denote the affiliation of person p,
e.g., A(Tiger Woods) = Golfer.

EXPERIMENTS
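The F-score of person p is defined as the harmonic mean of the precision and recall of p: F(p) = 2 × Precision(p) × Recall(p) / (Precision(p) + Recall(p)).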

Overall precision, recall, and F-score
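A minimal Python sketch of these scores, under the following assumptions: the precision of p is the fraction of people in C(p) who share the affiliation A(p), the recall of p is the fraction of all people with affiliation A(p) who are in C(p), and the overall scores are averages over all people. The names and the toy data are hypothetical.

```python
def person_scores(person, clusters, affiliation):
    """Precision, recall, and F-score of one person."""
    cluster = next(c for c in clusters if person in c)  # C(p)
    label = affiliation[person]  # A(p)
    same_in_cluster = sum(1 for q in cluster if affiliation[q] == label)
    same_overall = sum(1 for q in affiliation if affiliation[q] == label)
    precision = same_in_cluster / len(cluster)
    recall = same_in_cluster / same_overall
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


def overall_scores(clusters, affiliation):
    """Average precision, recall, and F-score over all people."""
    scores = [person_scores(p, clusters, affiliation) for p in affiliation]
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))


# Toy data: four people, two communities, two clusters.
affiliation = {"a": "golfer", "b": "golfer", "c": "tennis player", "d": "tennis player"}
clusters = [{"a", "b", "c"}, {"d"}]
overall = overall_scores(clusters, affiliation)
```

In the toy data, person "a" has precision 2/3 and recall 1, which gives an F-score of 0.8. The denominator of the F-score is never zero here because every person counts toward their own cluster and affiliation.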

CONCLUSION
We proposed a semantic similarity measure using both page counts and snippets retrieved from a web search engine for two words.
Both page count-based co-occurrence measures and lexical pattern clusters were used to define features for a word pair.
Experimental results on three benchmark data sets showed that the proposed method outperforms various baselines as well as previously proposed web-based semantic similarity measures.

Thank you for listening.
