COALS

The document discusses COALS, a method for mapping semantic similarity between words using corpus-based semantics and co-occurrence counts while addressing the limitations of previous models like HAL. It emphasizes the importance of removing high-frequency closed-class words to achieve more accurate word vector representations and introduces concepts of first-order and second-order associations. The COALS method also employs a normalization strategy to enhance the correlation measures between word pairs, ultimately creating a semantic space for analysis.


Introduction to COALS

Map the similarity or dissimilarity between two objects into a Cartesian coordinate space
A symmetrical matrix representation of pair-wise distances
The distance from an object to itself is 0
The stress values are computed to lie within [0, 1], where the distances are fitted to the dissimilarities by a monotonic function
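As an illustration of this mapping (not part of the original slides), the sketch below embeds a small symmetric dissimilarity matrix into 2-D Cartesian coordinates with scikit-learn's MDS; the word list and distance values are invented toy data.

```python
# Toy sketch: embed a symmetric pair-wise distance matrix into 2-D Cartesian space.
# Assumes scikit-learn is available; the words and distances are made up for illustration.
import numpy as np
from sklearn.manifold import MDS

words = ["cat", "dog", "car", "truck"]
D = np.array([            # symmetric dissimilarities, zero on the diagonal
    [0.0, 0.2, 0.9, 0.8],
    [0.2, 0.0, 0.8, 0.9],
    [0.9, 0.8, 0.0, 0.1],
    [0.8, 0.9, 0.1, 0.0],
])

# metric=False would fit a monotonic transform of the dissimilarities (non-metric MDS).
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(D)            # one (x, y) point per word
print(dict(zip(words, coords.round(2))))
print("stress:", mds.stress_)            # goodness of fit of the embedding
```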

Ramaseshan Ramachandran
HAL - Summary

Captures word meanings through the unsupervised analysis of text
Produces word vectors that are semantic (similar words) and associative in nature
Acquires word meanings as a function of keeping track of how words are used in context
Carries the history of the contextual experience by using a moving window and weighting co-occurring words by their distance
Exploits the regularities of language such that conceptual generalisations can be captured in a data matrix
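A rough sketch of this counting scheme (my own illustration, not the HAL authors' code): a window moves over the text and each co-occurring word is weighted by its distance from the focus word; a HAL word vector is then the concatenation of that word's row and column in the resulting matrix.

```python
# Illustrative HAL-style counting: a moving window in which closer neighbours get larger weights.
from collections import defaultdict

def hal_counts(tokens, window=4):
    """counts[word][context] with ramped weights: adjacent = window, farthest = 1."""
    counts = defaultdict(lambda: defaultdict(float))
    for i in range(len(tokens)):
        for d in range(1, window + 1):
            j = i + d
            if j >= len(tokens):
                break
            # one common convention: rows record the contexts preceding the row word
            counts[tokens[j]][tokens[i]] += window - d + 1
    return counts

tokens = "how much wood would a woodchuck chuck".split()
print(dict(hal_counts(tokens)["chuck"]))   # weighted left-context counts for 'chuck'
```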
HAL - Disadvantages

Uses raw frequency counts
Produces sparse vectors of length 2n, where n is the size of the vocabulary
High-frequency words (mostly closed-class words) adversely influence the accuracy of similarity
Context words such as the, an, in, of, etc. appearing next to open-class words (mostly nouns) produce inconsistent similarity in terms of semantics
Eliminate most closed-class words to remove the effects of high-frequency words
COALS - Goal
Create a DSM to find the meaning of a word using corpus-based semantics
Develop a DSM that mimics human judgments about the similarity of word pairs
Compute high-dimensional vectors to learn word meanings from the patterns of word co-occurrence
Remove the effect of high-frequency closed-class words over open-class words
Achieve consistent results in computing word vectors
Introduce the concept of a predictive relationship by computing the correlation between any two random words

(Closed-class words include pronouns, determiners, conjunctions and prepositions; open-class words include nouns, verbs, adverbs and adjectives.)
Assumptions for DSMs

Pairs of words in common contexts are semantically related

First-order association
If a word w1 occurs in several contexts along with w2, then w1 and w2 are related by a first-order association; w1 and w2 are called first-order associates
For any pair of words (w1, w2), the strength of similarity is greater if they have a large number of common first-order associates
Second-order association
If a word w1 occurs in several contexts along with w2, and a word w3 occurs with w2 in contexts where w1 is absent, then w1 and w3 are related by a second-order association; w1 and w3 are called second-order associates
The association relation is transitive and symmetric

The meaning of a word or morpheme (an indivisible basic unit of a language) is defined by the set of conditional probabilities of its occurrence in context with all other morphemes
Semantically similar words will share similar contextual distributions
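To make the two notions concrete, here is a small sketch (my own, with an invented toy count matrix): first-order associates co-occur directly, while second-order associates share first-order associates without co-occurring themselves.

```python
# Toy illustration of first- vs second-order association from a co-occurrence count matrix.
import numpy as np

words = ["doctor", "nurse", "hospital", "school"]
counts = np.array([        # counts[i, j] = co-occurrences of words[i] and words[j] (invented)
    [0, 0, 5, 0],          # doctor   co-occurs with hospital
    [0, 0, 4, 1],          # nurse    co-occurs with hospital and school
    [5, 4, 0, 2],          # hospital
    [0, 1, 2, 0],          # school
])

def first_order(i, j):
    return counts[i, j] > 0                               # the two words co-occur directly

def second_order(i, j):
    shared = (counts[i] > 0) & (counts[j] > 0)            # contexts they have in common
    return (not first_order(i, j)) and bool(shared.any())

print(first_order(0, 2))   # doctor-hospital: True  (first-order associates)
print(second_order(0, 1))  # doctor-nurse:    True  (second-order, via hospital)
```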
Assumptions for DSMs

Definition of context
All words within a window or, ideally, within a sentence
All content words within a window or sentence that fall in a certain frequency range
All content words that stand in closest proximity to the word in question in the grammatical schema of each window or sentence
CORRELATED OCCURRENCE ANALOGUE TO LEXICAL SEMANTICS MODEL - COALS [1]

Gather co-occurrence counts, typically ignoring closed-class neighbours and using a ramped, size-4 window
Discard all but the m (14,000, in this case) columns reflecting the most common open-class words
Convert counts to word-pair correlations - instead of the raw frequency score, a correlation score is used to analyse the relationship between a pair of words
Set negative values to 0, and take square roots of the positive ones
The semantic similarity between two words is given by the correlation of their vectors
The correlation coefficient values with this normalisation will be in the range [-1, 1]
The matrix constructed using these correlations is the semantic space
The COALS method employs a normalisation strategy that largely factors out lexical frequency (a small counting sketch follows the sample corpus below)
Columns representing low-frequency words are removed

[1] Rohde et al., "An Improved Model of Semantic Similarity Based on Lexical Co-Occurrence", CACM, 2006, 8, 627-633
COALS - Sample Corpus

How much wood would a woodchuck chuck, if a woodchuck could chuck wood? As much wood as a woodchuck would, if a woodchuck could chuck wood.
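A minimal sketch of Step 1 applied to this corpus (my own illustrative code, not the authors' implementation): a ramped window of size 4 on each side, with weights 4, 3, 2, 1 by distance; in the real method closed-class neighbours would be skipped via a stop list, which is left empty here.

```python
# Illustrative COALS Step 1: ramped co-occurrence counts (weights 4, 3, 2, 1 by distance, both sides).
from collections import defaultdict

def ramped_counts(tokens, window=4, stop_words=frozenset()):
    counts = defaultdict(lambda: defaultdict(float))
    for i, focus in enumerate(tokens):
        for d in range(1, window + 1):
            weight = window - d + 1                 # 4, 3, 2, 1
            for j in (i - d, i + d):                # look to the left and to the right
                if 0 <= j < len(tokens) and tokens[j] not in stop_words:
                    counts[focus][tokens[j]] += weight
    return counts

corpus = ("how much wood would a woodchuck chuck , if a woodchuck could chuck wood ? "
          "as much wood as a woodchuck would , if a woodchuck could chuck wood .").split()
print(dict(ramped_counts(corpus)["woodchuck"]))     # weighted co-occurrence counts around 'woodchuck'
```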
COALS - Step 1

The initial co-occurrence table, built with a ramped, 4-word window.
[Table of raw co-occurrence counts for the sample corpus; rows and columns are the 13 tokens a, as, chuck, could, how, if, much, wood, woodchuck, would, ",", ".", "?"]
COALS - Step 2

Raw counts are converted to word-pair correlations:

$$r_{a,b} \;=\; \frac{T\,w_{a,b} \;-\; \sum_j w_{a,j} \sum_i w_{b,i}}{\sqrt{\sum_j w_{a,j}\,\bigl(T - \sum_j w_{a,j}\bigr)\;\sum_i w_{b,i}\,\bigl(T - \sum_i w_{b,i}\bigr)}}, \qquad \text{where } T = \sum_i \sum_j w_{i,j}$$
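A direct NumPy transcription of this normalisation (a sketch, assuming W is the square matrix of raw counts from Step 1; for this symmetric word-by-word table the row and column sums coincide):

```python
# Step 2 sketch: convert a raw co-occurrence count matrix W into word-pair correlations.
import numpy as np

def coals_correlation(W):
    W = W.astype(float)
    T = W.sum()                          # T = sum over all i, j of w_ij
    row = W.sum(axis=1, keepdims=True)   # sum_j w_aj, one value per row word a
    col = W.sum(axis=0, keepdims=True)   # sum_i w_bi, one value per column word b
    num = T * W - row * col
    den = np.sqrt(row * (T - row) * col * (T - col))
    return num / den                     # assumes every word has at least one co-occurrence
```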
COALS - Step 3

Negative values are discarded and the positive values are square rooted (Table 7):

         a      as     chuck  could  how    if     much   wood   woodch. would  ,      .      ?
a        0      0      0.120  0.093  0      0.291  0      0      0.310   0.262  0.291  0      0
as       0      0.175  0      0      0      0      0.364  0.320  0       0      0      0      0.365
chuck    0.120  0      0      0.306  0      0.146  0      0.177  0.220   0      0      0.297  0.175
could    0.093  0      0.306  0      0      0.182  0      0.149  0.221   0      0      0.263  0.151
how      0      0      0      0      0      0      0.438  0.265  0       0.263  0      0      0
if       0.291  0      0.146  0.182  0      0      0      0      0.291   0.076  0.372  0      0
much     0      0.364  0      0      0.438  0      0      0.358  0       0.136  0      0      0.268
wood     0      0.320  0.177  0.149  0.265  0      0.358  0      0       0.034  0      0.333  0.317
woodch.  0.310  0      0.220  0.221  0      0.291  0      0      0       0.221  0.291  0      0
would    0.262  0      0      0      0.263  0.076  0.136  0.034  0.221   0      0.246  0      0
,        0.291  0      0      0      0      0.372  0      0      0.291   0.246  0      0      0
.        0      0      0.297  0.263  0      0      0      0.333  0       0      0      0      0
?        0      0.365  0.175  0.151  0      0      0.268  0.317  0       0      0      0      0
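Continuing the sketch (using coals_correlation from the Step 2 snippet), Step 3 and the final similarity computation might look like this; word_index below is a hypothetical word-to-row mapping.

```python
# Step 3 sketch: drop negative correlations, square-root the rest,
# then score word similarity as the correlation between the resulting row vectors.
import numpy as np

def coals_vectors(R):
    return np.sqrt(np.clip(R, 0.0, None))        # negatives -> 0, positives -> square root

def similarity(V, a, b):
    return np.corrcoef(V[a], V[b])[0, 1]         # Pearson correlation, in [-1, 1]

# Example usage (W from Step 1, word_index a hypothetical {word: row} mapping):
# V = coals_vectors(coals_correlation(W))
# similarity(V, word_index["wood"], word_index["woodchuck"])
```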
COALS - Algorithm

Excerpt from Rohde et al. [1]:
"... list that we use contains 157 words, including some punctuation and special symbols. Therefore, we actually make use of the top 14,000 open-class columns. In order to produce similarity ratings between pairs of word vectors, the HAL method uses Euclidean or sometimes city-block distance (see Table 3), but these measures do not translate well into similarities, even under a variety of non-linear transformations. LSA, on the other hand, uses vector cosines, which are naturally bounded in the range [-1, 1], with high values indicating similar vectors. For COALS, we have found the correlation measure to ..."
Multidimensional Scaling of 3 Nouns

[Figure: MDS plot for three groups of nouns.]

Excerpt from Rohde et al. [1]:
"... places, there is a distinction between the cities and the states, countries and continents (plus Moscow). Within the set of body parts there is little structure. Wrist and ankle are the closest pair, but the other body parts do not have much substructure, as indicated by the series of increasingly longer horizontal lines merging the words onto the main cluster one by one. Within the animals there is a cluster of domestic and farm animals. But these do not group with the other animals. Turtle and oyster are quite close, perhaps because they are foods. It is notable that the multidimensional scaling and clustering techniques do not entirely agree. Both involve a considerable reduction, and therefore possible loss, of information. Turtle is close to cow and lion in the MDS plot, but that is not apparent in the clustering. On the other hand, the clustering distinguishes the (non-capital) cities from the other places whereas the MDS plot places Hawaii close to Tokyo. Although France appeared to group with China and Russia in the MDS plot, it doesn't in the hierarchical clustering. MDS has the potential ..."
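The excerpt contrasts the MDS plot with hierarchical clustering of the same vectors; a minimal sketch of the clustering side (my own code, run on random stand-in vectors rather than real COALS vectors):

```python
# Illustrative hierarchical clustering of word vectors using correlation distance.
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

words = ["turtle", "cow", "lion", "oyster", "wrist", "ankle"]
rng = np.random.default_rng(0)
vectors = rng.random((len(words), 20))            # stand-ins for COALS word vectors

Z = linkage(vectors, method="average", metric="correlation")
tree = dendrogram(Z, labels=words, no_plot=True)  # set no_plot=False (with matplotlib) to draw it
print(tree["ivl"])                                # leaf order of the dendrogram
```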
Word Similarity for Nouns

Nearest neighbours and their percent correlation similarities for a set of nouns:

 1)  57.5 low         45.6 scared        53.7 blue      59.0 ...
 2)  51.9 higher      37.2 terrified     47.8 yellow    37.7 ...
 3)  43.4 lower       33.7 confused      45.1 purple    37.5 ...
 4)  43.2 highest     33.3 frustrated    44.9 green     36.3 ...
 5)  35.9 lowest      32.6 worried       43.2 white     34.1 ...
 6)  31.5 increases   32.4 embarrassed   42.8 black     32.9 ...
 7)  30.7 increase    32.3 angry         36.8 colored   30.7 ...
 8)  29.2 increasing  31.6 afraid        35.6 orange    30.6 ...
 9)  28.7 increased   30.4 upset         33.5 grey      30.6 ...
10)  28.3 lowering    30.3 annoyed       32.4 reddish   29.8 ...
Word Similarity for Verbs

[Table: nearest neighbours and their percent correlation similarities for a set of verbs.]
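A sketch of how neighbour lists like these could be produced (my own illustration; V is assumed to be the matrix of Step 3 row vectors and words the corresponding row labels):

```python
# Illustrative nearest-neighbour lookup: rank words by correlation with a target word's vector.
import numpy as np

def nearest_neighbours(V, words, target, k=10):
    idx = words.index(target)
    corr = np.corrcoef(V)[idx]                    # correlation of the target row with every row
    order = np.argsort(-corr)
    return [(words[j], round(100 * corr[j], 1))   # percent correlation similarities
            for j in order if j != idx][:k]
```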
Some Insights

The majority of the correlations are negative
Words with negative correlations do not contribute as well to finding similarity as the ones with positive correlations
Closed-class words (147) convey syntactic rather than semantic information and could be removed from the correlation table: punctuation marks, she, he, where, after, ...

[Cartoon credit: Gregory Piatetsky, KDnuggets]
