CIKM2022 Submission 3961

This research investigates the optimization of singular value-based similarity measures for document comparisons, particularly in the context of patent retrieval. It introduces new metrics that utilize matrix similarity measures instead of traditional vector representations, aiming to retain more information and improve accuracy. The study demonstrates that these advanced similarity measures can outperform simpler methods, especially when tailored for specific tasks using learnable parameters.

Research: Optimizing singular value based similarity measures for document similarity comparisons

Anonymous Author(s)

ABSTRACT
The similarity of documents is typically computed using fairly simple similarity measures, most often by mean or maximum pooling of word representations followed by vector cosine similarity. This has the advantage of fast computation, but it naturally loses information compared to second-order or matrix-based similarity measures. In this work, we investigate the value of matrix similarity measures for document similarity comparison in full-length patent retrieval tasks. Furthermore, we introduce two new metrics motivated by the Schatten $p$-norm. The new similarity measures are based on singular values and involve learnable parameters to be optimized for a given evaluation task. We show that tuning the similarity measures for a specific task improves the similarity comparison accuracy.

CCS CONCEPTS
• Information systems → Similarity measures; Document structure.

KEYWORDS
matrix norms, document similarity, similarity measures, singular value, Schatten p-norm, patents

ACM Reference Format:
Anonymous Author(s). 2022. Research: Optimizing singular value based similarity measures for document similarity comparisons. In Proceedings of Make sure to enter the correct conference title from your rights confirmation email (Conference acronym ’XX). ACM, New York, NY, USA, 5 pages. https://doi.org/XXXXXXX.XXXXXXX

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Conference acronym ’XX, June 03–05, 2018, Woodstock, NY
© 2022 Association for Computing Machinery.
ACM ISBN 978-1-4503-XXXX-X/18/06. . . $15.00
https://doi.org/XXXXXXX.XXXXXXX

1 INTRODUCTION

1.1 Document representations and similarity
For natural language processing tasks, we typically represent words and documents as numerical vectors, since they allow mathematically simple comparisons and are space-efficient. While modern vector representations are extremely informative for individual words, compressing longer text passages such as sentences and especially full documents into a single vector can heavily reduce the amount of useful information, as pointed out by the famous quote by Ray Mooney:

    You can’t cram the meaning of a whole %&!$# sentence into a single $&!#* vector!

In reality, the situation is not that bad; rather, it seems that even long documents can be effectively represented as relatively low-dimensional vectors. Even simple mean pooling can work well in practice [7], and further developments such as smart weighting schemes [2, 10], directly learned document vector representations [6, 14], and especially contextual embeddings and transformer models [8, 19] have pushed the limits of what can be encoded into a vector. Irrespective of how the vector representing the document was formed, we can efficiently compute distances between documents using cosine similarity or other standard similarity measures. These developments have not come without a drawback: transformer models in particular often have very high computational cost [17], so simpler methods such as weighting schemes and better similarity measures based on static word representations still have their place in many applications.

In this work, we step outside such vector-shaped representations and work directly with a full matrix $A \in R^{n \times d}$ that stores $d$-dimensional representations for the $n$ words appearing in the document. Our main goal is to measure similarity between two documents $A \in R^{n \times d}$ and $B \in R^{m \times d}$ directly using these matrix representations, without collapsing them into vectors as above. Intuitively, it is clear that retaining the full information can lead to superior accuracy, but realizing this in practice is far from trivial: we now need to define a similarity measure that can reliably take advantage of this information.

1.2 Matrix metrics
Here we define a matrix similarity measure to be any function $f(A, B) \in R$ that assigns a similarity score for document matrices $A \in R^{n \times d}$ and $B \in R^{m \times d}$.

There exist multiple ways to define similarity measures between matrices. The most straightforward ones, such as Word mover’s distance [12] (also known as Bures-Wasserstein distance [5]) or pairwise comparison of all possible word pairs in a matrix, can be directly applied on matrices with an arbitrary number of rows, and hence for documents of arbitrary lengths. Some other similarity measures assume $A$ and $B$ to be the same shape. To apply those for document comparisons, we first need to preprocess the document matrices suitably; we here call this step pooling. The simplest pooling approach is padding the shorter document with suitably many rows of zeros, whereas a more general approach is to use covariance pooling, where we use $A^T A \in R^{d \times d}$ and $B^T B \in R^{d \times d}$ as the inputs for the similarity measure. Covariance pooling has been shown to have beneficial properties as a document representation [13, 18]. As $d$ is often large, we can reduce the size further with SVD pooling, where only the $k$ leading singular vectors of the covariance representation are used. This can have a regularizing effect in addition to lowering memory and computational costs [13].

The key contribution of this work is the introduction of new matrix similarity measures for document similarity. We explain
how submultiplicative norms can be converted into a metric resembling cosine similarity, providing a family of similarity measures building on the Schatten $p$-norm computed using singular values of covariance pooling. We then introduce new similarity measures that are based on the same singular values but map them to similarity scores in a more flexible manner. The new similarity measures have learnable parameters that are tuned for a specific end task and hence can learn to represent relevant information better.

1.3 Patent retrieval as context
We evaluate the measures in the context of patent applications, as an example domain with long but structured documents. Efficient tools for handling patent documents are in high demand due to the high labor cost of manual inspection. This is especially the case for the invalidity search stage, which aims to find relevant patents that could cause issues with e.g. patent infringement, or lead to delays or rejection of the patent application. Over the years, there has been a lot of research on how to automate different parts of the process [1, 3] and on end-to-end solutions [9] for specific tasks. In addition to trying to solve specific tasks, there have been efforts toward creating patent-text-specific language models [4, 15]. Still, the field of patent text processing is far from being solved. The patent domain is a good candidate for exploring richer representations and similarity measures, as patent documents can often be tens of pages long and can greatly benefit from richer information.

We explore the value of covariance pooling and singular value based similarity measures in patent similarity comparison tasks. We show that in the case of static embeddings, these similarity measures can provide better results in a full document comparison setting when compared to the mean vector representation, and that the newly proposed similarity measure using a neural network further improves the similarity comparison accuracy compared to the standard Schatten $p$-norm.

2 SIMILARITY MEASURES BASED ON SINGULAR VALUES
This section introduces our technical contributions. We first explain how submultiplicative matrix norms can be used for deriving a similarity measure between two matrices. We provide a family of measures building on the Schatten $p$-norm, computed using singular values of the covariance pooling of document matrices. We then proceed to create a family of potentially more expressive matrix similarity measures, building on the same basic distance measure but replacing the matrix norm with alternative functions of the singular values. In particular, we introduce similarity measures with learnable parameters that can be fine-tuned for a given task.

2.1 From matrix norm to similarity measure
Any matrix norm $\|A\|$ that has the submultiplicative property $\|AB\| \le \|A\| \|B\|$ can be used for constructing a normalized similarity measure between matrices $A$ and $B$ measured with a norm (or norm-like function) $S(\cdot)$. This can be expressed as a general formula

$$D(A, B, S(\cdot)) := \frac{S(A^T B)}{S(A^T A)^{1/2} \, S(B^T B)^{1/2}}. \tag{1}$$

This measure can be considered a natural extension of the standard cosine similarity between vectors. Due to submultiplicativity, it is always within the range $[-1, 1]$. Even though the measure will not in general be a proper metric, we will have higher similarity when $A$ and $B$ are similar in terms of the norm and can use it for similarity comparisons.

In this work we build on a particular family of submultiplicative norms called Schatten $p$-norms, defined as

$$S_p(A) := \Big( \sum_n s_n(A)^p \Big)^{1/p}, \tag{2}$$

where $p \in [1, \infty)$ and $s_n(A)$ is the $n$th singular value of the matrix $A$ in descending order. The normalized similarity measure can then be expressed as $D(A, B, S_p(\cdot))$ in the general notation of Eq. (1). This family generalizes several well-known norms: for $p = 2$ we get the Frobenius norm, for $p = 1$ it corresponds to the trace norm, and for $p = \infty$ we get the operator norm. Lagus et al. [13] presented the similarity measure of Eq. (1) in the specific context of the Frobenius norm, but here we consider the general formulation for arbitrary norms and norm-like functions.

For $p \in (0, 1)$ the Schatten $p$-norm becomes a quasinorm since it does not fulfill the triangle inequality, but we still retain the property that $D(A, B, S_p(\cdot)) \in [-1, 1]$ and hence get a normalized similarity measure. The Schatten $p$-quasinorm has recently gained traction in other matrix applications such as low-rank matrix recovery [21] and image denoising [20], and has been shown to have beneficial properties even though it lacks the convexity guarantees that the triangle inequality would give, which makes direct optimization harder.

2.2 Learnable similarity measures
The similarity measure (1) is general and depends on the norm. Instead of assuming a specific norm in advance, we propose using a slightly more flexible parametric family of norms. We can then optimize the parameters of the norm directly for a task where the distance measure is used. The Schatten $p$-norm (2) itself has the parameter $p$, which can be learned to maximize task performance, such as retrieval accuracy. Since we only have a single parameter, we can either evaluate the performance using a grid of alternative choices or directly optimize over $p$ using standard gradient-based optimization; we will later show that both approaches work.

For more flexibility, we next propose extensions of the Schatten $p$-norm that involve additional control parameters. These extensions are not necessarily interesting as matrix norms as such, since there would be no basis for determining the parameters in isolation, but in applications where the norm is used to construct a distance measure, we can determine the parameters to maximize the eventual task performance. We start from the observation that the Schatten $p$-norm is based on singular values, and explore how much richer measures we can construct using singular values as the inputs. We consider two alternatives to be used in place of the norm in (1), both of which result in bounded similarities.

The simplest extension

$$S_{w,p}(A) := \Big( \sum_n w_n \, s_n(A)^p \Big)^{1/p} \tag{3}$$
weights each singular value independently but otherwise retains the functional form of the Schatten $p$-norm. This generalization is still a norm, since for any matrix $A$, we can always find a matrix $A'$ where $s_i(A') = w_i s_i(A)$. One motivation for this norm is the observation of Arora et al. [2] that removing the direction of the largest singular vector helps in reducing the effect of the most common words that are not informative in document discrimination. For $p = 1$ (denoted as $S_{w,1}(\cdot)$ later on) we obtain simple weighting as a special case of the more general weighting. Alternatively, we can interpret the weights $w_n$ as a form of attention mechanism.

As a still more flexible alternative, we consider directly mapping the singular values of $A^T B$ to the similarity with a flexible model. We can then include the normalization within the measure itself, and hence directly get a replacement for Eq. (1). For this, we use a small neural network

$$D_{NN}(A, B) = \tanh(\mathrm{ReLU}(\mathrm{ReLU}(s(A^T B) W_1) W_2) W_3), \tag{4}$$

where $W_1 \in R^{d \times 500}$, $W_2 \in R^{500 \times 500}$, $W_3 \in R^{500 \times 1}$, and $\mathrm{ReLU}(\cdot)$ is the rectified linear unit activation function. Finally, the hyperbolic $\tanh(\cdot)$ activation at the end ensures the outcome is normalized to $[-1, 1]$. Each layer also has a bias term of suitable size, which is omitted here for conciseness. The network architecture could be further tuned by standard architecture search but is not particularly relevant for this work.

3 EXPERIMENTS
We evaluate the proposed similarity measures in the context of patent documents. When patent examiners evaluate the novelty of a patent application, there are different kinds of prior art to be considered. The X citations are prior work that can alone lead to a rejection, while the A citations describe the state of the art but are not immediate reasons for rejection. Differentiating between these categories of citations can be useful, for example, in retrieval tasks where we want to rank the patents by their relevance to the original document. If we know the relative ordering of each citation class, we can reorder the search results to highlight the most relevant documents. In the case of X and A citations, we should often rank X citations higher, as they give stronger grounds for rejecting a patent application. A good similarity measure between patents should satisfy this.

Patents themselves consist of two main parts, claims and description, where the claims part describes the actual claims that are being made and the description part is a more free-form description of the invention overall. For this reason, the claims part is usually much shorter and less noisy than the description part, while the description part is more thorough and thus contains more fine-grained information. We evaluate the similarity measures for both cases to provide two parallel sets of results.

3.1 Data and evaluation
Encoding. We encode the patent documents using English 300-dimensional fastText embeddings [11] and form the covariance matrix of dimensionality $300 \times 300$ of each document as the document representation.

Training. For the models that require learning the parameters, we use the PyTorch library to do gradient-based optimization using 2000 samples as the training set. We use triplet loss as the loss function, setting one of the models as the distance function and the margin (chosen using hyperparameter optimization) to 0.5. The loss for one instance for the measure in Eq. (1) is then

$$L(A, P, N, S(\cdot)) = \max(D(A, P, S(\cdot)) - D(A, N, S(\cdot)) + 0.5, \, 0),$$

and for the neural network model it is

$$L(A, P, N) = \max(D_{NN}(A, P) - D_{NN}(A, N) + 0.5, \, 0),$$

where $A$ is the encoded original document, $P$ is the encoded X citation (the positive sample), $N$ is the encoded A citation (the negative sample), and $S(\cdot)$ is any of the previously defined norm-like models. Optimization is terminated once the result on the validation set decreases for three consecutive evaluations.

Evaluation. Finally, we evaluate the trained model using a test set of 1000 triplets, measuring the distance from the anchor to both positive and negative samples and counting how often the positive sample is closer to the anchor than the negative sample, i.e. the X citation ranks higher than the A citation. As the baseline, we use the standard mean vector combined with cosine similarity.

3.2 Results
Results for all methods are reported in Table 1. We first inspect the accuracy of the similarity measure using the standard Schatten $p$-norm. The main observation is that small values of $p$ are the best, so that $p = 1$ is the best of the proper norms in both cases and the highest overall accuracy is obtained with quasinorms with $p < 1$. The best $p$ clearly outperforms the baseline of mean vector and cosine similarity (Mean); for Claims we improve from 0.566 to 0.601 with $p = 0.2$ and for Descriptions from 0.553 to 0.573 with $p = 0.5$. Large $p$ are clearly worse and all $p > 3$ are effectively equivalent to $p = \infty$.

Rather than evaluating the metric for a range of $p$, we can just as well optimize over $p$. For both cases, the solution, denoted by $S_{opt}$, slightly improves on the one chosen amongst the grid of alternatives as expected, and we get the optimal values of $p = 0.884$ for Claims and $p = 0.327$ for Descriptions. One technical aspect we note is that when $p \in (0, 1)$ the function is non-convex [16] and can have multiple local optima within this range, but we did not observe this to be a problem in practice.

The weighted extension of the Schatten $p$-norm of (3) is denoted here by $S_{w,p}$. Figure 1 (a) illustrates the learned weights (as a function of iteration) for fixed $p = 1$, demonstrating how the measure assigns more weight to the first 10 or so singular values. Figure 1 (b) illustrates the behavior of the weights and $p$ when optimized jointly, and reveals quite different phenomena: instead of small $p$ it is now better to use large $p$ and down-weight many of the early singular vectors. Even though this alternative way of measuring similarity is interesting, the empirical performance (on test data) is not ideal. Both weighted versions outperform the mean baseline for Claims, but for Descriptions $S_{w,1}$ falls below it (0.525 against 0.553). Neither version improves over $S_{opt}$, and for Claims both remain worse. One advantage of these measures is that, as seen here, the similarity measures only depend on a fairly small number of singular values; we here have $300 \times 300$ matrices but only need tens of singular values to represent the distance, and hence only need to compute a subset of the singular values.
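The Schatten-based measure of Eqs. (1) and (2) can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function names, the random toy matrices, and the choice to apply Eq. (1) to covariance-pooled $d \times d$ matrices (our reading of Section 3.1) are all assumptions made for the example.

```python
import numpy as np


def schatten(M, p):
    """Schatten p-norm (p >= 1) or quasinorm (0 < p < 1) of M, as in Eq. (2)."""
    s = np.linalg.svd(M, compute_uv=False)  # singular values of M
    if np.isinf(p):
        return float(s.max())  # p = infinity: operator norm
    return float(np.sum(s ** p) ** (1.0 / p))


def similarity(A, B, p):
    """Normalized similarity D(A, B, S_p) of Eq. (1)."""
    num = schatten(A.T @ B, p)
    den = np.sqrt(schatten(A.T @ A, p) * schatten(B.T @ B, p))
    return num / den


def covariance_pool(X):
    """Map an (n_words x d) document matrix to the d x d matrix X^T X."""
    return X.T @ X


rng = np.random.default_rng(0)       # toy data, not patent embeddings
doc_a = rng.normal(size=(12, 5))     # 12 words, d = 5
doc_b = rng.normal(size=(20, 5))     # 20 words, same d
Ca, Cb = covariance_pool(doc_a), covariance_pool(doc_b)

for p in (0.5, 1.0, 2.0, np.inf):    # a small grid over p
    print(p, round(similarity(Ca, Cb, p), 3))
```

Because the measure only needs singular values, `compute_uv=False` avoids forming the singular vectors. Comparing a matrix with itself gives a similarity of 1 for every `p`, which mirrors the behavior of cosine similarity for vectors.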

[Figure 1, panels (a) and (b): plots not reproduced]

Figure 1: a) Development of singular value weights as a function of iterations for the model $S_{w,1}$. b) Development of the weights and $p$ for the model $S_{w,p}$. Only the first 70 out of 300 weights are shown; the rest are effectively zero.
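The weights tracked in Figure 1 enter the measure through Eq. (3). The sketch below shows that weighted form with fixed, hand-picked weights; in the paper the weights are learned, so the weight vectors, function names, and toy matrices here are illustrative assumptions only. Weights are assumed non-negative so that the $1/p$ power stays well defined.

```python
import numpy as np


def weighted_schatten(M, w, p):
    """Weighted Schatten measure of Eq. (3): (sum_n w_n * s_n(M)^p)^(1/p)."""
    s = np.linalg.svd(M, compute_uv=False)  # s_1 >= s_2 >= ... >= 0
    return float(np.sum(w * s ** p) ** (1.0 / p))


def weighted_similarity(A, B, w, p):
    """Eq. (1) with the weighted measure used in place of the norm."""
    num = weighted_schatten(A.T @ B, w, p)
    den = np.sqrt(weighted_schatten(A.T @ A, w, p)
                  * weighted_schatten(B.T @ B, w, p))
    return num / den


rng = np.random.default_rng(1)              # toy inputs only
X = rng.normal(size=(6, 4))
Y = rng.normal(size=(9, 4))
Ca, Cb = X.T @ X, Y.T @ Y                   # 4 x 4 pooled matrices

uniform = np.ones(4)                        # recovers the plain Schatten p-norm
drop_first = np.array([0.0, 1.0, 1.0, 1.0])  # ignore the largest singular value
head_only = np.array([1.0, 1.0, 0.0, 0.0])   # keep only the leading two

for name, w in [("uniform", uniform), ("drop_first", drop_first),
                ("head_only", head_only)]:
    print(name, round(weighted_similarity(Ca, Cb, w, 1.0), 3))
```

With `head_only`-style weights only the leading singular values contribute, which is the situation in which a truncated SVD would suffice.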

Dataset      Mean   S_0.1  S_0.2  S_0.5  S_1.0  S_1.5  S_2.0  S_3.0  S_5.0  S_∞    S_opt  S_w,1  S_w,p  D_NN
Claims       0.566  0.593  0.601  0.580  0.594  0.577  0.558  0.545  0.545  0.545  0.603  0.588  0.589  0.642
Description  0.553  0.549  0.558  0.573  0.520  0.504  0.496  0.487  0.482  0.482  0.574  0.525  0.574  0.652

Table 1: Numerical results. Mean shows the baseline of mean vector with cosine similarity. The free-form neural network model $D_{NN}$ is clearly the best for both tasks.
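The accuracies in Table 1 follow the triplet setup of Section 3.1: a margin-0.5 triplet loss for training, and the fraction of test triplets in which the X citation lies closer to the anchor than the A citation for evaluation. The sketch below illustrates both on toy numbers. The function names and the scalar toy "documents" are hypothetical, and `dist` stands for any function where smaller means closer; how the paper turns the similarity $D$ into such a distance is not stated, so that step is left to the caller.

```python
def triplet_loss(d_pos, d_neg, margin=0.5):
    """Hinge-style triplet loss: max(d_pos - d_neg + margin, 0)."""
    return max(d_pos - d_neg + margin, 0.0)


def triplet_accuracy(triplets, dist):
    """Fraction of (anchor, positive, negative) triplets ranked correctly.

    A triplet counts as correct when the positive (X citation) is strictly
    closer to the anchor than the negative (A citation) under `dist`.
    """
    hits = sum(1 for a, p, n in triplets if dist(a, p) < dist(a, n))
    return hits / len(triplets)


def toy_dist(x, y):
    """Toy stand-in distance between scalar 'documents'."""
    return abs(x - y)


# (anchor, X citation, A citation) with scalars standing in for documents
toy_triplets = [(0.0, 1.0, 5.0), (0.0, 3.0, 2.0)]

print(triplet_loss(0.2, 0.9))                    # positive well inside the margin
print(triplet_accuracy(toy_triplets, toy_dist))  # one of two triplets is correct
```

The loss is zero once the positive is closer than the negative by at least the margin, so only triplets that are wrong or nearly wrong contribute gradient during training.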

The still more flexible neural network measure of Eq. (4), however, works very well. It has the highest accuracy for both Claims and Descriptions, with substantial improvement also over $S_{opt}$. This shows that singular values of $A^T B$ can be used as the basis for measuring similarity between documents more accurately than what the standard Schatten $p$-norm can reveal, and importantly the performance remains high also for the full-length documents (Descriptions) that are challenging for all other similarity measures.

4 CONCLUSIONS
We set out to investigate how similarity measures based on matrix norms work in document similarity comparisons in the context of patent retrieval. We focused on similarity measures based on singular values of the inner product of the two document matrices, motivated by the Schatten $p$-norm and similarity measures induced by that. Our main contribution was introducing new parametric similarity measures that build on the same singular values but are fine-tuned for the specific task at hand, and we showed how a direct neural network mapping the singular values to a distance outperforms both the standard mean representation and our attempts at more constrained, and hence more interpretable, similarity measures.

While the investigation was done in the context of static embeddings and patent data, the applicability is not limited to these choices. Likely any full-document comparison task can benefit from richer representations, and rich contextual embeddings, such as the ones produced by transformer models, should enhance the results further.

REFERENCES
[1] Leonidas Aristodemou and Frank Tietze. 2018. The state-of-the-art on Intellectual Property Analytics (IPA): A literature review on artificial intelligence, machine learning and deep learning methods for analysing intellectual property (IP) data. World Patent Information 55 (2018), 37–51.
[2] Sanjeev Arora, Yingyu Liang, and Tengyu Ma. 2017. A simple but tough-to-beat baseline for sentence embeddings. In International conference on learning representations.
[3] Benjamin Balsmeier, Mohamad Assaf, Tyler Chesebro, Gabe Fierro, Kevin Johnson, Scott Johnson, Guan-Cheng Li, Sonja Lück, Doug O’Reagan, Bill Yeh, et al. 2018. Machine learning and natural language processing on the patent corpus: Data, tools, and new measures. Journal of Economics & Management Strategy 27, 3 (2018), 535–553.
[4] Hamid Bekamiri, Daniel S Hain, and Roman Jurowetzki. 2021. PatentSBERTa: A Deep NLP based Hybrid Model for Patent Distance and Classification using Augmented SBERT. arXiv preprint arXiv:2103.11933 (2021).
[5] Rajendra Bhatia, Tanvi Jain, and Yongdo Lim. 2019. On the Bures–Wasserstein distance between positive definite matrices. Expositiones Mathematicae 37, 2 (2019), 165–191.
[6] Minmin Chen. 2017. Efficient vector representation for documents through corruption. arXiv preprint arXiv:1707.02377 (2017).
[7] Alexis Conneau, German Kruszewski, Guillaume Lample, Loïc Barrault, and Marco Baroni. 2018. What you can cram into a single vector: Probing sentence embeddings for linguistic properties. arXiv preprint arXiv:1805.01070 (2018).
[8] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).
[9] Xiaochen Gao, Zhaoyi Hou, Yifei Ning, Kewen Zhao, Beilei He, Jingbo Shang, and Vish Krishnan. 2022. Towards Comprehensive Patent Approval Predictions: Beyond Traditional Document Classification. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 349–372.
[10] Vivek Gupta, Ankit Saw, Pegah Nokhiz, Praneeth Netrapalli, Piyush Rai, and Partha Talukdar. 2020. P-sif: Document embeddings using partition averaging. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34. 7863–7870.
[11] Armand Joulin, Edouard Grave, Piotr Bojanowski, and Tomas Mikolov. 2016. Bag of Tricks for Efficient Text Classification. arXiv preprint arXiv:1607.01759 (2016).
[12] Matt Kusner, Yu Sun, Nicholas Kolkin, and Kilian Weinberger. 2015. From word embeddings to document distances. In International conference on machine learning. PMLR, 957–966.
[13] Jarkko Lagus, Janne Sinkkonen, Arto Klami, et al. 2019. Low-rank approximations of second-order document representations. In Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL). ACL.
[14] Quoc Le and Tomas Mikolov. 2014. Distributed representations of sentences and documents. In International conference on machine learning. PMLR, 1188–1196.
[15] Jieh-Sheng Lee and Jieh Hsiang. 2020. Patent classification by fine-tuning BERT language model. World Patent Information 61 (2020), 101965.
[16] Fanhua Shang, Yuanyuan Liu, Fanjie Shang, Hongying Liu, Lin Kong, and Licheng Jiao. 2020. A unified scalable equivalent formulation for schatten quasi-norms. Mathematics 8, 8 (2020), 1325.
[17] Or Sharir, Barak Peleg, and Yoav Shoham. 2020. The cost of training nlp models: A concise overview. arXiv preprint arXiv:2004.08900 (2020).
[18] Marwan Torki. 2018. A document descriptor using covariance of word vectors. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 527–532.
[19] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in neural information processing systems 30 (2017).
[20] Yuan Xie, Shuhang Gu, Yan Liu, Wangmeng Zuo, Wensheng Zhang, and Lei Zhang. 2016. Weighted Schatten $p$-norm minimization for image denoising and background subtraction. IEEE transactions on image processing 25, 10 (2016), 4842–4857.
[21] Hengmin Zhang, Jianjun Qian, Bob Zhang, Jian Yang, Chen Gong, and Yang Wei. 2019. Low-Rank Matrix Recovery via Modified Schatten-$p$ Norm Minimization With Convergence Guarantees. IEEE Transactions on Image Processing 29 (2019), 3132–3142.
