Coals
Map the similarity or dissimilarity between pairs of objects into a Cartesian coordinate space
A symmetrical matrix representation of pair-wise distances
The distance from an object to itself is 0

The values (stress) are computed to be within 
, where  is a monotonic function
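The mapping from a symmetric distance matrix to coordinates can be sketched with classical (metric) multidimensional scaling. This is a swapped-in, simpler technique than the stress-based variant with a monotonic function mentioned above; the function names and the four sample points are assumptions made for illustration.

```python
import numpy as np

def pairwise_distances(points):
    """Symmetric matrix of Euclidean distances between all rows of `points`."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def classical_mds(dist, dims=2):
    """Embed objects in `dims` Cartesian coordinates from a distance matrix."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n          # double-centering matrix
    gram = -0.5 * centering @ (dist ** 2) @ centering    # inner-product matrix
    eigvals, eigvecs = np.linalg.eigh(gram)              # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:dims]               # indices of the largest ones
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))    # guard against tiny negatives
    return eigvecs[:, top] * scale

# Hypothetical objects: the four corners of a 3 x 4 rectangle.
points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
dist = pairwise_distances(points)      # symmetric, zero on the diagonal
coords = classical_mds(dist, dims=2)   # coordinates up to rotation/reflection
```

The recovered coordinates generally differ from the originals by a rotation or reflection, but the pairwise distances between them match the input matrix.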
Ramaseshan Ramachandran
HAL - Summary
Definition of context
All words within a window or ideally within a sentence
All content words within a window or sentence that fall in a certain frequency range
All content words which stand in closest proximity to the word in question in the grammatical schema of each window or sentence
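The first two definitions of context can be sketched as a window lookup with an optional filter. The function name, the window size, and the tiny stop-word list are hypothetical; the frequency-range and grammatical-proximity criteria are not covered here.

```python
def window_context(tokens, index, size=4, keep=None):
    """Return the tokens within `size` positions of tokens[index].

    `keep` is an optional predicate, e.g. to retain only content words.
    """
    lo = max(0, index - size)
    hi = min(len(tokens), index + size + 1)
    neighbours = tokens[lo:index] + tokens[index + 1:hi]
    if keep is not None:
        neighbours = [t for t in neighbours if keep(t)]
    return neighbours

# Hypothetical closed-class list, used only for this illustration.
STOPWORDS = {"the", "on", "a", "of"}

sentence = "the cat sat on the mat".split()
all_words = window_context(sentence, 2, size=2)
content_words = window_context(sentence, 2, size=2,
                               keep=lambda t: t not in STOPWORDS)
```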
CORRELATED OCCURRENCE ANALOGUE TO LEXICAL SEMANTICS MODEL - COALS [1]
Gather co-occurrence counts, typically ignoring closed-class neighbors and using a ramped window of size 4.
Discard all but the m (14,000, in this case) columns reflecting the most common open-class words.
Convert counts to word-pair correlations: instead of the raw frequency score, a correlation score is used to analyze the relationship between a pair of words.
Set negative values to 0, and take square roots of positive ones.
The semantic similarity between two words is given by the correlation of their vectors.
The correlation coefficient values with this normalisation will be in the range [-1, 1].
The matrix constructed using this correlation is the semantic space.
The COALS method employs a normalisation strategy that largely factors out lexical frequency.
Columns representing low-frequency words are removed.
[1] Rohde et al., "An Improved Model of Semantic Similarity Based on Lexical Co-Occurrence", CACM, 2006, 8, 627-633
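The correlation, clipping, and square-root steps can be sketched as follows. The slides do not state the correlation formula, so the one used here is an assumption: the Pearson (phi) correlation of each cell against its row and column totals, the form usually associated with COALS. The function name and the toy count matrix are hypothetical, and the column-pruning step is omitted.

```python
import numpy as np

def coals_normalise(counts):
    """Convert raw co-occurrence counts to clipped, square-rooted correlations."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)   # row totals, shape (n, 1)
    col = counts.sum(axis=0, keepdims=True)   # column totals, shape (1, m)
    # Assumed formula: (T*w_ab - row_a*col_b) / sqrt(row_a*(T-row_a)*col_b*(T-col_b))
    numer = total * counts - row * col
    denom = np.sqrt(row * (total - row) * col * (total - col))
    corr = np.divide(numer, denom, out=np.zeros_like(numer), where=denom > 0)
    # Set negative correlations to 0 and take square roots of the positive ones.
    return np.sqrt(np.clip(corr, 0.0, None))

# Hypothetical 3-word count matrix, for illustration only.
counts = np.array([[0, 4, 1],
                   [4, 0, 2],
                   [1, 2, 0]])
weights = coals_normalise(counts)
```

Because each cell is compared against what its row and column totals would predict, a frequent word does not get large values merely for being frequent, which is the sense in which the normalisation factors out lexical frequency.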
COALS - Sample Corpus
COALS - Step 1
The initial co-occurrence table with a ramped, 4-word window.
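Counting with a ramped window can be sketched as below. Two things are assumptions: the sample corpus text, taken here to be the familiar woodchuck tongue-twister because its vocabulary matches the row labels of the table, and the ramp itself, where a neighbour at distance d receives weight 5 - d (4 for adjacent words down to 1 at distance 4). The function name is hypothetical.

```python
from collections import defaultdict

def ramped_counts(tokens, size=4):
    """Weighted co-occurrence counts; nearer neighbours get larger weights."""
    counts = defaultdict(float)
    for i, word in enumerate(tokens):
        for d in range(1, size + 1):
            weight = size + 1 - d              # 4, 3, 2, 1 for size=4
            for j in (i - d, i + d):           # look left and right
                if 0 <= j < len(tokens):
                    counts[(word, tokens[j])] += weight
    return counts

# Assumed sample corpus; punctuation marks are kept as separate tokens.
corpus = ("how much wood would a woodchuck chuck , if a woodchuck could chuck wood ? "
          "as much wood as a woodchuck would , if a woodchuck could chuck wood .")
tokens = corpus.split()
counts = ramped_counts(tokens)
```

In this sketch the window runs across sentence boundaries, and every pair is counted from both sides, so the resulting table is symmetric.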
Table 7. Step 3 of the COALS method: negative values discarded and the positive values square-rooted.
COALS - Step 2
Raw counts are converted to correlations

         a       as      chuck   could   how     if      much    wood    woodch. would   ,       .       ?
a        0       0       0.120   0.093   0       0.291   0       0       0.310   0.262   0.291   0       0
as       0       0.175   0       0       0       0       0.364   0.320   0       0       0       0       0.365
chuck    0.120   0       0       0.306   0       0.146   0       0.177   0.220   0       0       0.297   0.175
could    0.093   0       0.306   0       0       0.182   0       0.149   0.221   0       0       0.263   0.151
how      0       0       0       0       0       0       0.438   0.265   0       0.263   0       0       0
if       0.291   0       0.146   0.182   0       0       0       0       0.291   0.076   0.372   0       0
much     0       0.364   0       0       0.438   0       0       0.358   0       0.136   0       0       0.268
wood     0       0.320   0.177   0.149   0.265   0       0.358   0       0       0.034   0       0.333   0.317
woodch.  0.310   0       0.220   0.221   0       0.291   0       0       0       0.221   0.291   0       0
would    0.262   0       0       0       0.263   0.076   0.136   0.034   0.221   0       0.246   0       0
,        0.291   0       0       0       0       0.372   0       0       0.291   0.246   0       0       0
.        0       0       0.297   0.263   0       0       0       0.333   0       0       0       0       0
?        0       0.365   0.175   0.151   0       0       0.268   0.317   0       0       0       0       0
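Given a normalised matrix such as the one above, the similarity of two words is the correlation of their row vectors, and nearest neighbours are the words with the highest correlation. The sketch below uses the table values as data; the names `WORDS`, `TABLE`, and `nearest` are hypothetical, and scores are reported as percentages.

```python
import numpy as np

WORDS = ["a", "as", "chuck", "could", "how", "if", "much", "wood",
         "woodch.", "would", ",", ".", "?"]

# One row per word, in the same order as WORDS (values from the table).
TABLE = """
0 0 0.120 0.093 0 0.291 0 0 0.310 0.262 0.291 0 0
0 0.175 0 0 0 0 0.364 0.320 0 0 0 0 0.365
0.120 0 0 0.306 0 0.146 0 0.177 0.220 0 0 0.297 0.175
0.093 0 0.306 0 0 0.182 0 0.149 0.221 0 0 0.263 0.151
0 0 0 0 0 0 0.438 0.265 0 0.263 0 0 0
0.291 0 0.146 0.182 0 0 0 0 0.291 0.076 0.372 0 0
0 0.364 0 0 0.438 0 0 0.358 0 0.136 0 0 0.268
0 0.320 0.177 0.149 0.265 0 0.358 0 0 0.034 0 0.333 0.317
0.310 0 0.220 0.221 0 0.291 0 0 0 0.221 0.291 0 0
0.262 0 0 0 0.263 0.076 0.136 0.034 0.221 0 0.246 0 0
0.291 0 0 0 0 0.372 0 0 0.291 0.246 0 0 0
0 0 0.297 0.263 0 0 0 0.333 0 0 0 0 0
0 0.365 0.175 0.151 0 0 0.268 0.317 0 0 0 0 0
"""

vectors = np.array([[float(x) for x in line.split()]
                    for line in TABLE.strip().splitlines()])
similarity = np.corrcoef(vectors)   # pairwise correlation of the row vectors

def nearest(word, k=3):
    """Return the k most similar words with percent correlation scores."""
    i = WORDS.index(word)
    order = [j for j in np.argsort(-similarity[i]) if j != i][:k]
    return [(WORDS[j], round(100 * float(similarity[i, j]), 1)) for j in order]

neighbours = nearest("wood")
```

On a corpus this small the rankings carry little meaning; the point is only the mechanics of turning vectors into a ranked neighbour list.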
Word Similarity for Nouns
Nearest neighbours and their percent correlation similarities for a set of nouns

Rank  Score  Neighbour    Score  Neighbour    Score  Neighbour  Score
1)    57.5   low          45.6   scared       53.7   blue       59.0
2)    51.9   higher       37.2   terrified    47.8   yellow     37.7
3)    43.4   lower        33.7   confused     45.1   purple     37.5
4)    43.2   highest      33.3   frustrated   44.9   green      36.3
5)    35.9   lowest       32.6   worried      43.2   white      34.1
6)    31.5   increases    32.4   embarrassed  42.8   black      32.9
7)    30.7   increase     32.3   angry        36.8   colored    30.7
8)    29.2   increasing   31.6   afraid       35.6   orange     30.6
9)    28.7   increased    30.4   upset        33.5   grey       30.6
10)   28.3   lowering     30.3   annoyed      32.4   reddish    29.8
Word Similarity for Verbs
Nearest neighbours and their percent correlation similarities for a set of verbs

Rank  Score  Neighbour    Score  Neighbour    Score  Neighbour
1)    57.5   low          45.6   scared       53.7   blue
2)    51.9   higher       37.2   terrified    47.8   yellow
3)    43.4   lower        33.7   confused     45.1   purple
4)    43.2   highest      33.3   frustrated   44.9   green
5)    35.9   lowest       32.6   worried      43.2   white
6)    31.5   increases    32.4   embarrassed  42.8   black
7)    30.7   increase     32.3   angry        36.8   colored
8)    29.2   increasing   31.6   afraid       35.6   orange
9)    28.7   increased    30.4   upset        33.5   grey
10)   28.3   lowering     30.3   annoyed      32.4   reddish
Some Insights