An algorithm for automatic assignment of reviewers to papers
Yordan Kalmukov
Department of Computer Systems and Technologies,
University of Ruse,
8 Studentska Str., 7017 Ruse, Bulgaria,
[email protected]
Abstract: The assignment of reviewers to papers is one of the most important and challenging tasks in organizing scientific conferences and in the peer review process in general. It is a typical example of an optimization task in which limited resources (reviewers) have to be assigned to a number of consumers (papers), so that every paper is evaluated by reviewers who are highly competent in its subject domain while the reviewers’ workload remains balanced.
This article suggests a heuristic algorithm for automatic assignment of reviewers to papers that achieves accuracy of about 98-99% in comparison to the maximum-weighted matching (the most accurate) algorithms, but has a better time complexity of Θ(n²). The algorithm provides a uniform distribution of papers to reviewers (i.e. all reviewers evaluate roughly the same number of papers); guarantees that if there is at least one reviewer competent to evaluate a paper, then the paper will have a reviewer assigned to it; and allows iterative and interactive execution that can further increase accuracy and enables subsequent reassignments. Both the accuracy and the time complexity are experimentally confirmed by a large number of experiments and proper statistical analyses.
Although initially designed to assign reviewers to papers, the algorithm is universal and can be successfully applied in other subject domains where assignment or matching is necessary, for example assigning resources to consumers or tasks to persons, matching men and women on dating web sites, grouping documents in digital libraries, and others.
1. INTRODUCTION
One of the most important and challenging tasks in organizing scientific conferences is the assignment of reviewers to papers. It directly affects the conference’s quality and its public image. As the quality of a conference depends mostly on the quality of the accepted papers, it is crucial that every paper is evaluated by the reviewers most competent in its subject domain. The assignment itself is a typical example of a classical optimization task where limited resources (reviewers) should be assigned to a number of papers in a way that:
- Every paper is evaluated by the most competent (in its subject domain) reviewers.
- Reviewers assigned to a paper do not fall into a conflict of interests with it.
- All reviewers evaluate roughly the same number of papers, i.e. reviewers are equally loaded.
In general, reviewers can be assigned to papers in two ways:
- Manual
- Automatic
Manual assignment is only applicable to “small” conferences having a small number of submitted papers and registered reviewers. It requires that Programme Committee (PC) chairs familiarize themselves with all papers and reviewers’ competences and then assign the most suitable reviewers to each paper, while maintaining a load balance so that all reviewers evaluate roughly the same number of papers. Doing that for a large number of papers and reviewers is not just hard and time consuming; due to the many constraints (expertise, load balancing, conflicts of interest, etc.) that should be taken into account, manual assignment also gets less and less accurate as the number of papers and reviewers grows. For that reason, all commercially available conference management systems (CMS) implement an automatic assignment process that tries to meet all three of the assignment requirements stated earlier.
The non-intersecting sets of papers and reviewers can be represented by a complete weighted bipartite graph G = (P ∪ R, E) (figure 1), where P is the set of all submitted papers, R – the set of all registered reviewers and E – the set of all edges. There is an edge from every paper to every reviewer and every edge should have a weight. In case of a zero weight, the corresponding edge may be omitted, turning the graph into a non-complete one. The weight of the edge between paper pi and reviewer rj tells us how competent (suitable) rj is to review pi. This measure of suitability is called a similarity factor. The weights are calculated or assigned in accordance with the chosen method of describing papers and reviewers’ competences [14].
Figure 1. The sets of papers (P) and reviewers (R) represented as a complete weighted bipartite graph. The edges in bold are the actual assignments suggested by an assignment algorithm. All edges have weights, but only those of the assignments are shown for clarity.
Once the edge weights are calculated, any assignment algorithm can be applied to assign reviewers to papers. Therefore, the accuracy of the automatic assignment depends mostly on the following two aspects:
- The method of describing papers and reviewers’ competences, and the similarity measure used to calculate the weights.
- The accuracy of the assignment algorithm itself.
This work mainly focuses on the assignment algorithms and proposes a new heuristic solution having lower time complexity at the cost of just about 1% loss of accuracy. I initially presented this algorithm in 2006 in [13], but that paper did not explain the algorithm well enough, did not perform a detailed experimental analysis and did not provide sufficient evidence for its accuracy and computational complexity. Thus, I decided to publish it again – this time accompanied by pseudo code, examples and, most importantly, a comprehensive experimental analysis proving its qualities.
The rest of the paper is organized as follows:
Section 2 reviews related and previous work by other researchers. Although the manuscript focuses on
assignment algorithms, methods of describing papers and reviewers’ competences, and other assignment
approaches are also discussed for completeness.
Section 3 proposes a novel heuristic assignment algorithm that achieves accuracy of about 98-99% in comparison to the maximum-weighted matching (the most accurate) algorithms, but has a better time complexity of Θ(n²). The algorithm provides a uniform distribution of papers to reviewers (i.e. workload balancing); guarantees that if there is at least one reviewer competent to review a paper, then the paper will have a reviewer assigned to it; and allows iterative and interactive execution that can further increase the accuracy and enables subsequent reassignments.
Section 4 is devoted to the experimental evaluation of the proposed algorithm in terms of assignment accuracy, determined as the ratio between the weight of the computed assignment and that of the best possible assignment. The Hungarian algorithm is used as a reference since it guarantees finding the maximum-weighted matching, i.e. the best possible solution. A number of experiments comprising thousands of tests are performed here, followed by proper statistical analyses such as hypothesis testing and ANOVA.
Section 5 employs regression analysis to experimentally confirm the algorithm’s time complexity of Θ(n²). Additionally, a direct comparison between the proposed and the Hungarian algorithm is presented in terms of running time, with the number of papers ranging from 100 to 750 and reviewers from 60 to 450.
Section 6 continues experimental evaluation of both accuracy and time complexity, but this time using
real datasets taken from a series of nine already conducted conferences – CompSysTech from 2010 to 2018.
Section 7 gives useful ideas of how iterative and interactive execution can further increase the assignment accuracy and provide higher flexibility.
Finally, the paper ends with conclusions in Section 8.
Appendix 1 presents an illustrated example of how the algorithm works.
Appendix 2 contains a detailed pseudo code that could be directly translated into any high-level imperative programming language.
2. RELATED WORK
2.1. Calculating paper-reviewer similarities
Calculating paper-reviewer similarity factors mostly depends on the chosen method of describing papers and reviewers’ competences. Generally, the methods can be divided into two main groups:
- Explicit methods. Authors and reviewers are required to provide additional information that explicitly describes their papers and competences.
- Implicit methods. No additional actions are required from users. Similarities are calculated based on content analysis of publications.
The most commonly implemented explicit methods are selection of keywords/topics from a predefined list, and bidding. Calculating similarities based on keyword/topic selection is also referred to as feature-based matching [29]. The idea is simple – during paper submission, authors select (usually via checkboxes) the topics that best describe their papers. Reviewers do the same while registering with the conference management system. Then the paper-reviewer similarities can be calculated in multiple ways. In case the list of topics is presented as an unordered set, Dice’s [7] or Jaccard’s [11] similarity measures are a good choice.
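As an illustration, here is a minimal sketch in PHP (the language of the implementations discussed later in this paper) of how Dice’s and Jaccard’s measures could be computed over two unordered keyword sets. The function and variable names are illustrative only and are not taken from any particular conference management system.

<?php
// Illustrative sketch: Dice and Jaccard similarity over unordered keyword sets.
// $a and $b are arrays of keyword/topic identifiers chosen for a paper and by a reviewer.
function jaccardSimilarity(array $a, array $b): float {
    $intersection = count(array_intersect($a, $b));
    $union = count(array_unique(array_merge($a, $b)));
    return $union > 0 ? $intersection / $union : 0.0;
}

function diceSimilarity(array $a, array $b): float {
    $intersection = count(array_intersect($a, $b));
    $total = count($a) + count($b);
    return $total > 0 ? (2 * $intersection) / $total : 0.0;
}

// Example: a paper described by 3 topics and a reviewer who selected 4 topics.
$paperTopics    = ['databases', 'information retrieval', 'recommender systems'];
$reviewerTopics = ['databases', 'recommender systems', 'machine learning', 'data mining'];
echo jaccardSimilarity($paperTopics, $reviewerTopics);  // 2 common / 5 distinct = 0.4
echo diceSimilarity($paperTopics, $reviewerTopics);     // 2*2 / (3+4) ≈ 0.57
?>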
In case of feature-based matching, it is better to organize the keywords/topics hierarchically in a taxonomy. In this way similarity measures, such as the one proposed in [14], can take into account not just the number of common keywords, but also how semantically close the non-matching ones are. So a non-zero similarity can be calculated even if the paper and the reviewer do not share any keyword.
Bidding allows reviewers to explicitly state their willingness/interest to review specific papers. However, if the number of submitted papers is high, reviewers are not likely to browse all papers and read their abstracts. Thus, the collected bids (or preferences) will be sparse and incomplete. This can be overcome in several ways. First, the conference management system should recommend to each reviewer a small subset of papers that will be most interesting to him/her. Second, the missing preferences can be “guessed” by applying collaborative filtering techniques, as suggested in [30, 5, 3]. The recommendation of papers to reviewers can be based on paper-reviewer similarities calculated by any method.
Implicit methods do not require any additional actions from authors and reviewers. Instead, similarities are calculated based on document content analysis. This is also known as profile-based matching [29]. As its name suggests, it relies on building papers’ and reviewers’ profiles. Papers are represented by their content, while in most cases reviewers’ expertise is obtained from their previous publications. Then similarities are calculated using IR techniques such as the Vector Space Model (VSM), Latent Dirichlet Allocation (LDA) [1] or others.
Andreas Pesenhofer et al. [28] suggest that paper-reviewer similarities be calculated as the Euclidean distance between the titles of the submitted papers and the titles of all reviewers’ publications. The latter are fetched from CiteSeer and Google Scholar. The authors evaluated their proposal with data from ECDL 2005. They noted that for 10 out of 87 PC members no publications had been found, so those members got their papers to review at random.
Stefano Ferilli et al. [10] use Latent Semantic Indexing (LSI) to automatically extract paper topics from the titles and the abstracts of the submitted papers and from the titles of reviewers’ publications obtained from DBLP. The proposal was evaluated by the organizers of the IEA/AIE 2005 conference. In their opinion the average accuracy was 79%; according to the reviewers, the accuracy was 65% [10].
Laurent Charlin and Richard S. Zemel [3, 4] propose a standalone paper assignment recommender system called “The Toronto Paper Matching System (TPMS)” that is also loosely coupled with Microsoft’s Conference Management Toolkit. TPMS builds reviewers’ profiles based on their previous publications obtained from Google Scholar or uploaded by the reviewers themselves; the latter actually allows reviewers to control their profiles. The paper-reviewer scoring model is similar to the vector space model, but takes a Bayesian approach [29]. By using Latent Dirichlet Allocation (LDA), TPMS can infer reviewers’ research topics from their publications. To enhance accuracy, the system also supports reviewers’ self-assessment of expertise with respect to the submitted papers.
Jennifer Nguyen et al. [27] propose an interesting approach that applies an Order Weighted Averaging (OWA) function over multiple data sources to calculate the paper-reviewer similarities. The reviewer’s profile consists of research interests (expertise), recency and quality. Quality refers to the number of published papers (and their citations), written books and book chapters, and supervised PhD students. Data are obtained from global sources like Aminer and ResearchGate, and from local ones such as TDX. Reviewers’ publications are not taken into account. Instead, reviewers’ research interests are directly taken from the mentioned websites and aligned (automatically or by an expert) with the conference topics. Papers’ profiles
consist of concepts, obtained by applying LDA (Latent Dirichlet Allocation) on the entire collection of
submitted papers. Then paper concepts are translated to conference topics in an interesting manner – the
combination of each concept and each conference topic is sent as a search query to Google Scholar. The
number of returned results is normalized and combined with the “concept-to-paper” proportions provided by
LDA to calculate how much the paper is related to each conference topic. Finally, the paper-reviewer
similarity is calculated as an OWA function of reviewer’s expertise according to the topics coverage need of
the paper, recency, quality, and the availability of the reviewer. The authors tested their proposal with real
data taken from the International Conference of the Catalan Association for Artificial Intelligence (CCIA
2014, 2015, and 2016). However, they report that among the original 96 reviewers only 51 had skills populated on their ResearchGate profiles, so only those were taken into consideration [27].
Xiang Liu et al. [19] propose a recommender system that calculates paper-reviewer similarities based on three aspects of the reviewer: expertise, authority, and diversity. Authority refers to the public recognition of the reviewer in the scientific community, while diversity reflects whether he/she has diverse research interests and background. Latent Dirichlet Allocation (LDA) is applied over the sets of submitted papers and reviewers’
publications to extract their topics. Then cosine similarity is used to calculate the relevance between the topic
vectors of each paper and each reviewer’s publication. Authority is determined by constructing a graph that
consists of the paper being processed and all of its candidate reviewers. Two reviewers are connected with an
edge if they have co-authored at least one paper. The weight of the edge depends on the number of papers
they co-authored. The intuition behind this is that if a reviewer is well connected, i.e. has many co-authors,
he or she would be considered as having higher authority [19]. A Random Walk with Restart (RWR) model
is employed on the graph to integrate expertise, authority and diversity. To test their approach, authors use
two datasets – one from NIPS 2006, accompanied by paper-reviewer relevance judgements provided by
NIPS experts. These judgements are used as a reference or ground truth as the authors call it. The other set is
derived from SIGIR 2007. Results show that their combined approach achieves higher precision than if no
authority and diversity were used. I found another interesting result in their data – in all experiments “Text similarity” achieves better results than “Topic similarity”. So pure VSM with a proper term-weighting model performs better than topic extraction by LDA followed by cosine similarity of the topic vectors.
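For illustration only (this is not the code of [19]), the standard cosine similarity between two topic-proportion vectors, such as those produced by LDA, could be computed as follows in PHP; the example vectors and the number of topics T = 4 are made up for the sketch.

<?php
// Illustrative sketch: cosine similarity between two topic-proportion vectors,
// both indexed 0..T-1. Returns a value in [0, 1] for non-negative vectors.
function cosineSimilarity(array $u, array $v): float {
    $dot = 0.0; $normU = 0.0; $normV = 0.0;
    foreach ($u as $t => $weight) {
        $dot   += $weight * ($v[$t] ?? 0.0);
        $normU += $weight * $weight;
    }
    foreach ($v as $weight) {
        $normV += $weight * $weight;
    }
    return ($normU > 0 && $normV > 0) ? $dot / (sqrt($normU) * sqrt($normV)) : 0.0;
}

// Topic vector of a paper and of one of the reviewer's publications (T = 4 topics).
$paperTopics       = [0.70, 0.20, 0.10, 0.00];
$publicationTopics = [0.60, 0.05, 0.30, 0.05];
echo cosineSimilarity($paperTopics, $publicationTopics);
?>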
Xinlian Li et al. [18] suggest that the paper-reviewer similarity be calculated as a weighted sum of the expertise degree of the reviewer and the relevance degree between the reviewer and the paper. The expertise
degree of the reviewer depends on quantity (total number of publications), quality (number of citations and
journal ranking) and freshness (time interval between the year of publication and the current year). The
relevance degree between a paper and a reviewer is calculated as the share of jointly-cited references
between the paper and the reviewer’s previous publications. The assumption is that the higher the number of
common references two papers have, the more similar research fields they are in [18]. The authors identify
three types of common referring: 1) direct referring: paper P quotes one of reviewer R’s publications directly; 2) same paper referring: both paper P and reviewer R refer to the same reference; and 3) same author referring: both paper P and reviewer R cite the same author’s publications. The authors report that they tested their approach with data obtained from DBLP, CiteSeerX and Journal Citation Reports.
Another approach that combines multiple data sources is given by Don Conry et al. [5]. They propose a recommender system that predicts missing reviewer bids/preferences by using: explicit bids; predefined
conference topics/categories; topics, inferred by latent models such as LSI or LDA; paper-paper similarities;
reviewer-reviewer similarities; and conflicts of interest. These are all combined in a single similarity
measure. Paper-paper similarities are calculated in a VSM manner by the cosine of their abstracts. Reviewer-
reviewer similarities are determined by the number of commonly co-authored papers (obtained from DBLP).
Predefined conference topics contribute as a weighted sum of products of matching between topics and
papers, and topics and reviewers. Automatically inferred topics of a paper and a reviewer influence the
similarity between them by the inner product of their vectors. The authors test their approach with data taken from ICDM’07. The results show that the assignment quality can be increased by providing more flexibility with additional ratings from which to choose [5].
Marko Rodriguez and Johan Bollen [31] also exploit the idea that a manuscript’s subject domain can
be identified by its references. Their approach builds a co-authorship network that initially consists of the
authors who appear in the reference section of the paper. Then their names are sent to DBLP to find their co-
authors, who are also added to the network; then the co-authors of the co-authors, and so on. Finally, a relative-rank particle-swarm algorithm is run to find the most appropriate experts to review the paper. Rodriguez and Bollen used a real data set taken from the JCDL 2005 conference to evaluate their method. Results show that 89% of the reviewers and only 83% of the papers had identifiable authors in DBLP.
Ngai Meng Kou et al. [15] state that a paper is well-reviewed only if the assigned reviewers have the expertise to cover every single topic of the paper [15], which makes a lot of sense. However, in most cases each paper-reviewer similarity factor is calculated individually. Reviewers are also assigned to each paper individually, regardless of the expertise of the other reviewers already assigned to it. To maximize the
coverage of the paper’s topics, the researchers propose that reviewers are not assigned to it individually, but
simultaneously as a group. So they redefine the classic reviewer assignment problem to Weighted-coverage
Group-based Reviewer Assignment Problem (WGRAP) [15]. The Author-Topic Model (ATM) proposed in [32] is first used to extract a set of T topics and the topic vectors of reviewers from the reviewers’ publications. Then the topic vectors of papers are estimated by Expectation-Maximization (EM) [35] based
on the same set T. Instead of considering the expertise of a single reviewer in respect to a specific paper, the
approach now considers the expertise of the entire group of candidate reviewers. The expertise of the
reviewer group is a vector, which for every topic t stores the maximum expertise of any reviewer (inside the
group) in t.
Profile-based matching approaches are also applied by Denis Zubarev et al. [36] for assigning experts/reviewers to project proposals.
Some popular conference management systems, including OpenConf [46] and EDAS [41], as well as the outdated and no longer supported CyberChair [49, 39], rely on their own greedy algorithms to assign reviewers to papers. According to Prof. Philippe Rigaux [30, 47], Microsoft CMT [44] also employs a greedy algorithm for that purpose.
OpenConf provides two assignment algorithms – “Topics Match” and “Weighted Topics Match”. Both are open source, implemented in PHP, and can be downloaded from OpenConf’s official web site [46]. The first one processes papers sequentially, starting from those having the highest number of common keywords with the reviewers. This may not be a very good idea, since the most problematic papers, those described by fewer keywords, will be processed last, and by that time there may be no free reviewers competent to evaluate them. When processing a paper, the algorithm assigns the necessary number of reviewers (2 or 3) to it, starting from those who are still free and have the highest number of common keywords with the paper. “Weighted Topics Match” computes a weight of each keyword that is proportional to the number of reviewers who have chosen it and inversely proportional to the number of papers it describes. Then the algorithm calculates a weight of each paper and each reviewer as the sum of the weights of their keywords. So papers described by fewer keywords, or papers described by keywords chosen by a small number of reviewers, will get a lower weight and will be processed first. In this way, the algorithm tries to minimize the above-mentioned disadvantage of greedy algorithms. The disadvantage of “Weighted Topics Match” is that when assigning reviewers to a paper it does not measure the level of their competence with respect to that specific paper. Instead, it sorts reviewers according to their global weight, which is not related to the paper at all. The only requirement directly related to the paper being processed is that its reviewers should have at least one keyword in common with it.
EDAS and CyberChair use a simpler but better greedy algorithm. It relies mostly on reviewers’ bids; the chosen keywords are taken into account only if no bids were specified. The algorithm processes papers sequentially, starting from those having the fewest bids, i.e. the most problematic papers. It assigns the necessary number of reviewers to each paper, starting from those PC members who are still free and who have stated the highest willingness to evaluate the paper.
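As a rough illustration of this style of greedy assignment (a sketch only, not the actual EDAS or CyberChair code), the following PHP fragment processes papers in ascending order of the number of received bids and gives each paper the still-free reviewers with the strongest preferences; the data layout and the two limit parameters are assumptions made for the example.

<?php
// Illustrative greedy sketch: papers with the fewest bids are processed first;
// each gets the free reviewers with the highest bid/preference strength.
// $bids[$paperId][$reviewerId] = preference strength (higher = more willing).
function greedyAssign(array $bids, int $reviewersPerPaper, int $maxPapersPerReviewer): array {
    $assignment = [];          // $assignment[$paperId] = array of reviewer ids
    $load = [];                // $load[$reviewerId] = number of papers assigned so far

    // Most problematic papers (fewest bids) first.
    uasort($bids, fn(array $a, array $b) => count($a) <=> count($b));

    foreach ($bids as $paperId => $paperBids) {
        arsort($paperBids);    // strongest preference first
        $assignment[$paperId] = [];
        foreach ($paperBids as $reviewerId => $strength) {
            if (count($assignment[$paperId]) >= $reviewersPerPaper) break;
            if (($load[$reviewerId] ?? 0) >= $maxPapersPerReviewer) continue; // reviewer is busy
            $assignment[$paperId][] = $reviewerId;
            $load[$reviewerId] = ($load[$reviewerId] ?? 0) + 1;
        }
    }
    return $assignment;
}
?>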
Jennifer Nguyen et al. [27] propose an iterative greedy algorithm that assigns one reviewer to every paper on each iteration. Papers are processed sequentially, starting with the ones with the highest topic coverage needs. That is a good decision, because at the beginning there are more free reviewers who can cover higher topic needs. The list of candidate reviewers is sorted by their similarity score with the paper (see Section 2.1 for how it is calculated) in descending order. The paper is assigned to the reviewer having the highest score. If two or more reviewers have equal scores at the top of the list, then topic exclusiveness is taken into account and the paper is assigned to the reviewer with the least exclusive topics (also a good decision). Exclusiveness of a topic refers to the number of reviewers who are experts in it: lower exclusiveness means a higher number of reviewers who are experts in the topic.
Xinlian Li et al. [18] propose an algorithm to solve the nonlinear assignment problem that achieves higher accuracy than greedy algorithms. The authors state that their algorithm succeeds in finding the maximum-weighted assignment in most cases, though it fails on small-scale matrices [18]. Their results show it is twice as fast as the Hungarian algorithm [16, 26], i.e. its time complexity is cubic, O(n³).
Solving the Weighted-coverage Group-based Reviewer Assignment Problem (WGRAP) is even harder
due to the increased number of constraints it should satisfy. Its basic idea is to assign all of the reviewers
who should evaluate a paper simultaneously (as a group), so that the reviewers’ group covers as much as
possible all of the topics associated with the paper. To solve WGRAP, Ngai Meng Kou et al propose a
polynomial-time approximation algorithm which they call “Stage Deepening Greedy Algorithm (SDGA)”
[15]. At each stage, a classic linear assignment algorithm (e.g., Hungarian algorithm [16,26]) could be
applied to compute the assignment in polynomial time. The time complexity of SDGA, using the Hungarian
algorithm, is O(δp·(max{P, R})³), where P is the number of papers, R the number of reviewers, and δp the reviewers’ group size, i.e. the number of reviewers assigned to a paper. The algorithm improves the approximation ratio of previous work from 1/3 to 1/2 and achieves an optimality ratio of about 97% (for a group size of 3) and higher if 4+ reviewers are assigned to papers. The approximation ratio is defined as the ratio between the computed assignment and the optimal assignment. However, computing the latter may take a very long time even for small instances, which is why for the experimental evaluation the authors use the optimality ratio. It is the ratio between the computed assignment and the ideal assignment; the latter is obtained by greedily assigning to each paper the best set (group) of reviewers, regardless of their workloads [15]. Additionally, the authors propose a stochastic refinement post-processing (SDGA-SRA) that can further improve the optimality ratio by 1.4%.
The group-based reviewer assignment problem can also be solved by the greedy algorithm proposed by Long et al. [21] in lower time complexity, but with a lower approximation ratio of 1/3 [15].
Greedy algorithms usually complete in O(n·m·log2(m)), where n is the number of papers and m the number of reviewers. The logarithmic part comes from the need to sort all reviewers according to their level of competence with respect to every single paper. However, an important remark should be made here: conference management systems are usually (almost always) implemented as web applications written in a scripting language such as PHP or Perl, or in a language that is compiled to an intermediate interpretable language such as C# or Java. All these languages provide a built-in (native) sort() function that is implemented as part of the language interpreter itself. Thus, sorting with the built-in function runs at the speed of the compiled machine code of the interpreter, not at the speed of the interpreted high-level language. According to professional programmers [22], the difference in running time between the native sort() function in PHP (which implements the quicksort algorithm) and an implementation of the same algorithm in PHP code is about 22-23 times. When sorting an array of fewer than 5 million elements, this speed difference of 22 times compensates for or even exceeds the value of the logarithmic component. Therefore, in the context of conference management systems we can skip it. Additionally, to keep reviewers’ load in a reasonable range, m should closely depend on n. Then we can conclude that in the case of conference management systems most greedy algorithms assign reviewers to papers with a time complexity of O(n²). All performed experiments support this conclusion.
Unfortunately, it is unknown how the most popular conference management system, EasyChair [40], handles the automatic assignment of reviewers to papers. As it is provided as a completely hosted cloud service, the only source of technical information about it is its official documentation. According to it, EasyChair uses a special-purpose randomized algorithm to assign papers [40]. However, this information is far from enough to make a fair analysis and comparison with the other algorithms.
If the calculated number of papers per reviewer is a fractional number, it is rounded up (ceiled) to the nearest integer.
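A minimal PHP sketch of that calculation, assuming the straightforward reading of the text above (total number of review slots divided by the number of reviewers, rounded up); the exact form of equation (1) is not reproduced here.

<?php
// Sketch of the papers-per-reviewer calculation described above:
// total review slots divided by the number of reviewers, rounded up.
function papersPerReviewer(int $numPapers, int $reviewersPerPaper, int $numReviewers): int {
    return (int) ceil($numPapers * $reviewersPerPaper / $numReviewers);
}

echo papersPerReviewer(90, 3, 60);  // 270 slots / 60 reviewers = 4.5 -> 5
?>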
The algorithm takes as an input the so-called similarity matrix (SM). It contains the similarity factors between all papers and all reviewers. Initially, rows represent papers and columns represent reviewers. For a graphical representation, please refer to the example in Appendix 1; it probably illustrates how the algorithm works better than the generalized pseudo code itself.
At the beginning, the algorithm sorts each row by similarity factor in descending order. As a result, the first column of the matrix contains the most competent reviewer for every paper. Let n denote the calculated number of papers per reviewer (1). If there are reviewers in the first column who are suggested for more than n papers, then the algorithm cannot directly assign them to the corresponding papers, as doing so would violate the load-balancing requirement. Therefore, the algorithm needs to decide which n of all suggested papers to assign to the respective reviewers. To do that, it copies the first column of the matrix to an auxiliary data structure (an ordinary array) and modifies all similarity factors in it by adding corrections C1 and C2. C1 depends on the number of reviewers competent to evaluate the paper being processed (pi), while C2 depends on the rate of decrease in the competence of the next-suggested reviewers for pi. The basic idea of these two corrections is to give priority to papers having a short list of reviewers competent to evaluate them.
s'i,1 = si,1 + C1 + C2  (2)
C1 = …  (3)
C2 = …  (4)
where:
si,1 – similarity factor between paper pi and its most competent reviewer, i.e. the similarity factor in the first column of the row corresponding to pi;
si,2 – similarity factor between paper pi and its second most competent reviewer, i.e. the similarity factor in the second column of the row corresponding to pi;
ci – number of reviewers having a non-zero similarity factor with pi;
qi – number of reviewers that should be assigned to pi.
To maintain load balancing, no reviewer should be assigned more papers than the calculated number of papers per reviewer, i.e. n. If a reviewer rj is suggested for more than n papers, then he/she is assigned just the first n of them which, after the modification, have the highest similarity factors with him/her. All of the rest will be given to other reviewers during the next pass of the algorithm. Rows corresponding to papers that will not be assigned to rj are shifted one position to the left, so that the next most competent reviewers are suggested for these papers. As rj already has the maximum allowed number of papers to review, no more papers should be assigned to him/her in the future. Thus, all similarity factors between rj and any paper, outside the first column, are deleted from the matrix. All these operations (including the modification of similarity factors) are repeated iteratively while there are still reviewers in the first column of the matrix who are suggested for more than n papers.
Eventually, after one or more passes, there will be no reviewers who appear more than n times in the first column of the similarity matrix. At that point, all suggested reviewers can be directly assigned to the corresponding papers. At the end of the algorithm, each paper will have one (or zero, if there are no competent PC members) new reviewer assigned to it. If papers have to be evaluated by more than one reviewer (as they usually are), the algorithm should be run as many times as needed. It is important, however, that each time the algorithm takes as an input not the initial similarity matrix, but the one produced as a result of the previous run (after deleting the entire first column). This guarantees that busy reviewers will not get any more papers and that each time the newly suggested reviewer will be different from those previously assigned.
Here are the major data structures used within the generalized pseudo code:
assignmentReady – a Boolean flag, indicating if the suggested reviewers could be directly assigned to the corresponding papers. That is only possible if each reviewer in the first column of the similarity matrix is suggested for no more than n papers. If the flag is true at the end of the current iteration through the do-while cycle (lines 8-40), then the algorithm ends.
papersOfReviewer[rj] – an associative array, whose keys correspond to the usernames of the
reviewers who appear in the first column of the similarity matrix; and values each holding an array of papers
(and similarity factors) suggested for assignment to the respective reviewer rj. For implementation details
please refer to Appendix 2.
numPapersToAssign[rj] – an associative array, whose keys correspond to the usernames of the
reviewers who appear in the first column of the similarity matrix; and values each holding the maximum
number of papers that could be assigned to the respective reviewer rj.
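As an illustration of how these two structures could be populated from the first column of the row-sorted similarity matrix, here is a minimal PHP sketch. It follows the variable names above, but it is not the Appendix 2 pseudo code itself, and the matrix layout is an assumption made for the example.

<?php
// Illustrative sketch: building papersOfReviewer[] and numPapersToAssign[] from the
// first column of a similarity matrix whose rows are already sorted in descending
// order of similarity.
// $similarityMatrix[$paperId] = array of ['reviewer' => ..., 'similarity' => ...],
// so element 0 of each row holds the most competent remaining reviewer for that paper.
function collectFirstColumn(array $similarityMatrix, int $maxPapersPerReviewer, array $currentLoad): array {
    $papersOfReviewer  = [];   // reviewer => suggested papers with their similarity factors
    $numPapersToAssign = [];   // reviewer => how many more papers he/she may still take

    foreach ($similarityMatrix as $paperId => $row) {
        if (empty($row)) continue;                    // no competent reviewers left for this paper
        $reviewer   = $row[0]['reviewer'];
        $similarity = $row[0]['similarity'];
        $papersOfReviewer[$reviewer][$paperId] = $similarity;
        $numPapersToAssign[$reviewer] = $maxPapersPerReviewer - ($currentLoad[$reviewer] ?? 0);
    }
    return [$papersOfReviewer, $numPapersToAssign];
}
?>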
To determine the algorithm’s time complexity, we need to find the most frequently repeated simple/scalar operation that has a constant complexity. In iterative algorithms, this is usually a simple operation within the innermost cycle. In our case, these are modifying similarity factors on row 14 and shifting rows one position to the left on row 28. Deleting similarity factors on row 31, on the other hand, is a complex operation that has a linear complexity with respect to the number of papers. At the end of the algorithm most reviewers should be marked as busy, which means the algorithm has performed up to m * (n - papersPerReviewer) deletions, where n is the number of submitted papers, m is the number of registered reviewers and papersPerReviewer is a conference-specific constant that usually ranges from 3 to 7. To keep reviewers’ workload within reasonable limits, m should be tightly correlated to n. So the deletion of a single similarity factor (which is a simple constant-complexity operation) on row 31 may overall occur up to n² times.
Row 14 will be repeated exactly n times within a single iteration through the do-while cycle (rows 8-
40). The number of unique reviewers occurring in the first column of the matrix does not matter. If there are
just a few reviewers, they will have long lists of suggested papers. If there are many reviewers, then they will
have much shorter lists. But the total number of similarity factors that should be modified will be always n.
Row 28 will be repeated less than n times within a single iteration through the do-while cycle. More
precisely n – (count(j) * papersPerReviewer) times, where count(j) is the number of unique reviewers in the
first column of the matrix. Furthermore, we may safely assume that ignoring the very first (zero) element of
an array and shifting all the others one position to the beginning is a constant-complexity operation because
it is implemented just by changing the array pointer.
It seems that sorting on rows 1 and 17 may be a concern, as it is done in O(n·log2(n)). However, as discussed earlier, if the algorithm is implemented in an interpretable language as part of a web application, the logarithmic multiplier can be omitted. Then the overall time complexity of row 1 is O(n²). Please note that the same assumption is made for all existing heuristic algorithms reviewed in the related work section, i.e. they are on an equal footing with the proposed one.
ALGORITHM 1
One of the fastest and most commonly implemented assignment algorithms is the one proposed by Kuhn and Munkres [16, 26]. Its optimized version finds the maximum-weighted matching in O(n³). That motivates its choice as the reference algorithm for this experimental analysis. Both algorithms are implemented in PHP – the proposed one by the author of this paper and the reference algorithm by Prof. Miki Hermann [45]. Prof. Hermann’s implementation is taken from The MyReview System [48] – a great open source conference management system. It should be noted that Prof. Hermann’s implementation is used just for the purpose of this experiment and not for any commercial use.
To achieve a fair and accurate comparison, both algorithms (proposed and reference) should share the same input data, i.e. the same similarity matrix. Then the accuracy of the proposed algorithm can easily be determined by comparing the weights of the matchings produced by the two algorithms.
Formally:
Accuracy = ( w(M) / w(Mref) ) × 100, %  (6)
where:
w(M) – the weight of matching produced by the proposed algorithm;
w(Mref) – the weight of matching produced by the reference algorithm (Kuhn and Munkres). This is the
maximum-weighted matching for the corresponding similarity matrix.
Running the experiment just once or twice cannot guarantee the reliability of the results and would not provide enough data for a subsequent statistical analysis. So a special-purpose software application has been built (in PHP) that automates testing and performs as many tests and replications as needed.
On each test, it:
- Generates a new similarity matrix (SM), independent from the previous tests, with a size specified by the experimenter. The similarity factors in it are fractional numbers within the range [0.00, 1.00], pseudorandomly generated by the Mersenne Twister algorithm [23] proposed by Makoto Matsumoto and Takuji Nishimura. The algorithm is implemented by the built-in PHP function mt_rand().
- Generates a number (specified as a percentage) of zero similarity factors and places them at random positions in the similarity matrix.
- Starts the proposed algorithm with the SM matrix as an input and measures the weight of the output matching.
- Starts the reference algorithm (of Kuhn and Munkres) with the same SM matrix as an input and measures the weight of the output matching.
- Calculates the accuracy of the proposed algorithm by using equation (6) and stores all data from the current test/iteration into a file.
Of course, the implemented data generator is not meant to replace tests with real data sets; such experiments are performed as well. However, the data generator allows the experimenter to set up and precisely control certain aspects of the assignment (for example the number of papers and reviewers; the percentage of zero (missing) similarity factors; the number of levels that similarity factors can have; the number of reviewers per paper; the number of papers per reviewer; etc.) and to evaluate their influence on the accuracy.
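A minimal sketch of such a generator in PHP, assuming the same ingredients described above (mt_rand() for the similarity factors and a requested share of zeros placed at random positions); it is an illustration, not the actual test application.

<?php
// Sketch of a test-data generator: a papers x reviewers similarity matrix with
// pseudorandom factors in [0.00, 1.00] and a given share of factors zeroed out.
function generateSimilarityMatrix(int $numPapers, int $numReviewers, float $zeroPercentage): array {
    $matrix = [];
    for ($p = 0; $p < $numPapers; $p++) {
        for ($r = 0; $r < $numReviewers; $r++) {
            $matrix[$p][$r] = mt_rand(0, 100) / 100;   // similarity factor in [0.00, 1.00]
        }
    }
    // Zero out the requested share of similarity factors at random positions
    // (positions may repeat, so the resulting share of zeros is approximate).
    $zeros = (int) round($numPapers * $numReviewers * $zeroPercentage / 100);
    for ($i = 0; $i < $zeros; $i++) {
        $p = mt_rand(0, $numPapers - 1);
        $r = mt_rand(0, $numReviewers - 1);
        $matrix[$p][$r] = 0.0;
    }
    return $matrix;
}

$sm = generateSimilarityMatrix(90, 60, 25.0);   // e.g. 90 papers, 60 reviewers, ~25% zeros
?>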
Experiment E1: To determine the accuracy of the proposed algorithm when assigning 90 papers to 60 reviewers while each paper is evaluated by 3 reviewers. That is a very common scenario.
1000 tests (observations) are made for the purpose of this experiment, and for each of them a new similarity matrix, independent from the previous tests, is generated as an input. That provides a more than sufficient sample size for a subsequent statistical analysis and lowers the standard error of the mean. The same similarity matrix is used as an input for both algorithms on each test. Then the algorithms are run one after another and the accuracy of the proposed one is calculated by using eq. (6). The result is shown in figure 2 in the form of a histogram.
Figure 2. Accuracy histogram of the proposed algorithm over 1000 tests (x-axis: accuracy, %, from 98 to 100.5; y-axis: number of tests/observations).
Experiment E2: To determine if and how the fractional part of the calculated number of papers per reviewer influences the assignment’s accuracy.
In most cases, the calculated number of papers per reviewer is not an integer but a float value, for example 4.2. As it is quite pointless for a reviewer to evaluate just part of a paper (20% in this case), the calculated number is rounded up to the next integer. In some cases, however, its fractional part can actually influence the accuracy of the assignment. It all depends on the working mode. The proposed algorithm, in contrast to the reference one for example, can provide two types of paper distribution (assignment):
- Uniform distribution – all reviewers evaluate exactly n or n-1 papers, where n is the ceiled number of papers per reviewer. If the calculated number of papers per reviewer is 4.2, then n=5 and most of the reviewers will evaluate 4 papers, while just a few reviewers will evaluate 5 papers.
- Threshold distribution – no reviewer evaluates more than n papers. This sets just an upper limit; there is no required minimum. Therefore, there may be reviewers evaluating 1 or 2 papers, or even reviewers without any assigned papers at all. Removing the constraint to assign at least n-1 papers to every reviewer allows the algorithm to give higher priority to reviewers’ competence and to follow the similarity factors better. This can increase the overall accuracy of the assignment. If the fractional part of the calculated number of papers per reviewer is low (but not x.0), the algorithm will have many more choices during the assignment, resulting in higher accuracy. If the fractional part is high (x.9 for example), the assignment will get closer to a uniform distribution, lowering the accuracy a little bit.
As the reference algorithm of Kuhn and Munkres does not support threshold distribution, this untypical working mode is not discussed much in this paper. All experiments are conducted under uniform distribution, so that both algorithms are on an equal footing. In the case of uniform distribution of papers to reviewers, the fractional part of the calculated number of papers per reviewer should not influence the accuracy of the assignment as it does in threshold distribution. The aim of this experiment is to check whether this assumption is true. If it is not, then all other experiments should be planned in a way that the number of papers per reviewer remains the same for all tests.
Four experiments are conducted with 1000 tests (observations) in each experiment. The same number of papers (90) is assigned in each experiment, while the number of reviewers varies a little bit, so that we can achieve a different fractional part (from x.1 to x.9) of the calculated number of papers per reviewer. Results are summarized in table 1 and figure 3.
Table 1. Results from the experiment that checks how the fractional part of the calculated
number of papers per reviewer influences the accuracy of the assignment
Experiment #                     1              2              3              4
Papers / Reviewers               90 x 66        90 x 60        90 x 55        88 x 66
Reviewers per paper              3              3              3              3
Papers per reviewer              4.09           4.5            4.91           4.0
Accuracy (H0 not rejected)       µ = 99.14      µ = 99.63      µ = 99.78      µ = 99.76
Standard deviation               0.2272         0.1452         0.1570         0.2083
Variance                         0.0516         0.0211         0.0246         0.0434
95% conf. interval (α = 0.05)    99.1252 –      99.6208 –      99.7739 –      99.7509 –
                                 99.1534        99.6388        99.7933        99.7767
Although the tight non-overlapping confidence intervals reliably suggest that the fractional part of the calculated number of papers per reviewer actually does influence the assignment accuracy, an ANOVA analysis is also performed to prove it statistically. It is done by using the Statistics Toolbox of MATLAB [43]. To avoid typing large data arrays (4 x 1000 observations) by hand, an application programming interface is built between the implemented special-purpose software for experimental analysis and MATLAB. The conducted analysis of variance confirms that the fractional part of the number of papers per reviewer does influence the accuracy of the assignment.
Figure 3. Accuracy histograms of the proposed algorithm for the four experiments in Table 1: a) 4.09 papers per reviewer, b) 4.5 papers per reviewer, c) 4.91 papers per reviewer, d) 4.0 papers per reviewer (x-axis: accuracy, %; y-axis: number of tests).
Experiment E3: To determine if and how the number of submitted papers and registered reviewers influences the accuracy of the assignment.
Five experiments are conducted with 1000 tests (observations) in each experiment. Three reviewers are assigned to each paper and the number of papers per reviewer is 4.5 every time. The only aspect that varies across the different experiments is the number of papers and reviewers. As in the previous experiment, for each single test a new similarity matrix, independent from the previous tests, is generated as an input. Results are summarized in table 2 and figure 4.
Table 2. Results from the experiment that checks how
the number of papers and reviewers influence the accuracy of the assignment
Experiment #                     1              2              3              4              5
Papers / Reviewers               50 x 33        75 x 50        90 x 60        120 x 80       150 x 100
Reviewers per paper              3              3              3              3              3
Papers per reviewer              4.5            4.5            4.5            4.5            4.5
Accuracy (H0 not rejected)       µ = 99.49      µ = 99.58      µ = 99.63      µ = 99.69      µ = 99.73
Standard deviation               0.2972         0.1780         0.1452         0.1045         0.0867
Variance                         0.0883         0.0317         0.0211         0.0109         0.0075
95% conf. interval (α = 0.05)    99.4727 –      99.5648 –      99.6208 –      99.6791 –      99.7239 –
                                 99.5095        99.5868        99.6388        99.6921        99.7347
The tight non-overlapping confidence intervals reliably suggest that the number of papers and reviewers does influence the assignment accuracy. However, an ANOVA analysis is also performed to prove it statistically. As seen in figure 4, increasing the number of papers and reviewers increases the accuracy a little bit and, more importantly, significantly reduces the dispersion around the mean. The most probable reason is that the higher number of reviewers provides more choices during the assignment.
Figure 4. Accuracy histograms of the proposed algorithm in respect to a different number of papers and reviewers (x-axis: accuracy, %; y-axis: number of tests).
An ANOVA analysis is not performed this time, because the result is quite clear as the confidence intervals do not overlap. Expectedly, the amount of zero-calculated similarity factors does influence the accuracy of the assignment. However, it affects mostly the dispersion around the mean rather than the mean itself. That is clearly illustrated by the box and whisker (quartile) diagram in figure 5.
Figure 5. Box and whisker diagram (quartiles diagram) showing how the amount of zero-calculated similarity factors influences the accuracy of the assignment (x-axis categories: no minimal value, 25% zeros, 50% zeros, 75% zeros; y-axis: accuracy, %, from 98 to 100).
ŷ = b0 + b1·x + b2·x²  (7)
where:
ŷ – execution time of the algorithm, predicted by the model.
x – number of submitted papers.
b0, b1 and b2 – regression model parameters.
The model parameters could be determined by the method of least squares. However, in the current experiment they are calculated using the regress() function from MATLAB’s Statistics Toolbox. They are as follows:
b0 = 0.115, b1 = -0.0016 and b2 = 0.000021.
So the regression model is:
ŷ = 0.115 - 0.0016·x + 0.000021·x²  (8)
The meaning of regression parameters will be discussed later, but before that it is mandatory to check
if the model itself is appropriate (fits measurements well).
As all tests are performed with replication, i.e. multiple independent tests are conducted for every
single value of x, the total error could be divided into two components – error due to the lack of model fit
and error due to data/tests replication (sometimes called “pure” error). The model is considered to be
appropriate if its error, represented by the “lack-of-fit variance”, does not exceed the data error, represented
by the “data replication variance” [24, 25]. Then the goodness of fit could be determined by comparing these
two variances by Fisher’s F-test [20]. The calculated value of F based on the data from table 4, for a significance level of α=0.05 and degrees of freedom d1=11 and d2=28, is F=0.60893. As it is lower than the critical value Fα=0.05;11;28=2.15, the second-order polynomial regression model (7) is valid. So the experiment confirms that the time complexity of the algorithm is Θ(n²).
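For illustration, the fitted model (8) can be evaluated directly; the following PHP sketch simply plugs a number of submitted papers into equation (8) and returns the predicted execution time in seconds. The prediction is of course meaningful only for the test machine described later in this section.

<?php
// Evaluating the fitted second-order model (8) for a given number of submitted papers.
// The coefficients are the ones reported above and depend on the hardware used.
function predictedRunTime(int $numPapers): float {
    return 0.115 - 0.0016 * $numPapers + 0.000021 * $numPapers * $numPapers;
}

echo predictedRunTime(500);   // predicted execution time in seconds for 500 papers
?>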
The goodness of fit can also be confirmed visually in figure 6. The green line represents the curve drawn by the regression model itself, while the circles represent the real time measurements. Note how well the real data points lie on the model curve; that visually confirms the model is valid.
Table 4. Results from the experimental evaluation of the algorithm’s time complexity
Figure 6. Dependence of the total execution time (including the initial sorting of the matrix) of the proposed algorithm on the number of submitted papers (x-axis: number of papers, from 100 to 800; y-axis: execution time). Visual evaluation of the model’s goodness of fit.
In general, the regression parameters bi strongly depend on the hardware performance. Although the current experiment is conducted on an old laptop computer, their low values mean a low amount of time is needed for processing a single paper. The coefficient b0 represents the total influence of the so-called uncontrollable/random factors (variables) and consists mostly of the execution times of the operations performed roughly the same number of times regardless of the size of the input data. b1 is mostly influenced by: forming the papersOfReviewer[] array; calculating the similarity factors’ corrections; shifting rows one position to the left; and assigning reviewers from the first column of the matrix to the corresponding papers. The initial sorting of the matrix and the deletion of similarity factors from it have the highest contribution to b2.
In all previous experiments, the proposed algorithm has been compared in terms of accuracy to the maximum-weighted matching algorithm of Kuhn and Munkres. As the latter is one of the fastest and most efficient maximum-weighted matching algorithms, the comparison continues, but now in terms of running time. Table 5 shows the execution times of both algorithms. As the number of papers and reviewers is the same as in table 4, the execution time of the proposed algorithm is taken from there (as an average over all 3 replications for every test). The implementation of the Kuhn and Munkres algorithm is run without any replication, because that would take an enormous amount of time.
Table 5. Comparison, in terms of execution time, between the proposed algorithm
and the maximum-weighted matching algorithm of Kuhn and Munkres
Execution time, seconds
All experiments are performed on an old laptop computer – CPU: Dual Core Intel Celeron T 3000 (1.8
GHz); RAM: 4 GB (DDR2); OS: Windows 7 Ultimate 32 bit; Web server: Apache 2.2.24 with PHP 5.4.13.
As is known, the most efficient implementation of the Kuhn and Munkres algorithm has a time complexity of O(n³). We can note from table 6 that the execution time is indeed a cubic function of the number of papers. Therefore, the implementation of the algorithm by Miki Hermann and Philippe Rigaux is good enough and does not increase the previously known time complexity of this algorithm.
Conference        Papers x Reviewers x Rev./paper    w(Mref)    Time (Kuhn-Munkres)    w(M)      Time (proposed)    Accuracy
                  (papers per reviewer)
CompSysTech’13    89 x 73 x 3 (3.66)                 227.18     10.2308 s              221.72    0.0713 s           97.6 %
CompSysTech’12    94 x 87 x 3 (3.24)                 244.40     12.9081 s              240.74    0.0761 s           98.5 %
CompSysTech’11    183 x 98 x 3 (5.60)                474.75     121.5543 s             470.20    0.2378 s           99 %
CompSysTech’10    134 x 77 x 3 (5.22)                296.17     52.0996 s              289.55    0.0965 s           97.8 %
Besides the accuracy being within the expected range, the significant difference in execution time between the proposed algorithm and the reference one is quite noticeable. For example, for CompSysTech’11 the algorithm of Kuhn and Munkres assigns 183 papers to 98 reviewers in more than 121 seconds, while the proposed algorithm does the same in just 0.24 seconds. Furthermore, this significant speed improvement comes at the expense of just about 1% loss of accuracy.
Again, all experiments are performed on the same laptop computer – CPU: Dual Core Intel Celeron T
3000 (1.8 GHz); RAM: 4 GB (DDR2); OS: Windows 7 Ultimate 32 bit; Web server: Apache 2.2.24 with
PHP 5.4.13.
8. CONCLUSION
The manual assignment of reviewers to papers is applicable only to conferences having a low number of submitted papers and registered reviewers. It assumes that the PC chairs get familiar with the papers’ content and the reviewers’ competences and assign them in a way that every paper is evaluated by reviewers who are highly competent in its subject domain, while maintaining a load balance among the PC members so that all reviewers evaluate roughly the same number of papers. In this sense, the assignment is a typical example of a classical optimization task where limited resources (reviewers) should be distributed among a number of consumers (papers), balancing between accuracy and uniformity of the workload distribution.
In the case of a large number of papers and reviewers, the manual assignment becomes extremely difficult and time consuming and usually leads to a dramatic decrease in the assignment accuracy. That is why every conference management system should offer an automatic assignment as well. As an optimization problem, it could be solved by any of the existing exhaustive search or heuristic algorithms. The relatively high time complexity of O(n³) of the exhaustive search algorithms makes them not quite suitable for implementation in web applications in the case of a large number of papers and reviewers. Existing heuristic algorithms, on the other hand, are all designed as greedy algorithms, assigning the most competent reviewers to the papers processed first and almost random reviewers to the papers processed last.
The heuristic assignment algorithm proposed in this article achieves accuracy of about 98-99% in comparison to the maximum-weighted matching (the most accurate / exhaustive search) algorithms, but has a better time complexity of Θ(n²). It provides a uniform distribution of papers to reviewers (i.e. all reviewers evaluate roughly the same number of papers); guarantees that if there is at least one reviewer competent to evaluate a paper, then the paper will have a reviewer assigned to it; and allows iterative and interactive execution that can further increase the accuracy and enables subsequent reassignments. Both the accuracy and the time complexity are experimentally confirmed by performing a large number of experiments, followed by proper statistical analyses.
Although the suggested algorithm is designed to be used in conference management systems, it is
universal and could be successfully implemented in other subject domains, where assignment or matching is
necessary. For example: assigning resources to consumers, tasks to persons, matching men and women on
dating web sites, grouping documents in digital libraries and others.
COMPLIANCE WITH ETHICAL STANDARDS
Conflict of Interest: The author declares that he has no conflict of interest.
Funding: There has been no financial support for this work that could have influenced its outcome.
REFERENCES
1. Blei, David M., Ng, Andrew Y., Jordan, Michael I., and Lafferty, John. Latent Dirichlet Allocation.
Journal of Machine Learning Research, 3:993-1022, 2003.
2. Cechlárová K, Fleiner T, Potpinková E. Assigning evaluators to research grant applications: the case
of Slovak Research and Development Agency. Scientometrics, Volume 99, Issue 2, pp. 495-506,
Springer Netherlands, May 2014, Print ISSN 0138-9130, Online ISSN 1588-2861
3. Charlin, Laurent, and Richard Zemel. "The Toronto paper matching system: an automated paper-
reviewer assignment system." (2013).
4. Charlin, L., Zemel, R. and Boutilier, C. A framework for optimizing paper matching. In Proceedings
of the 27th Annual Conference on Uncertainty in Artificial Intelligence (Corvallis, OR, 2011). AUAI
Press, 86–95.
5. Conry, Don, Yehuda Koren, and Naren Ramakrishnan. "Recommender systems for the conference
paper assignment problem." In Proceedings of the third ACM conference on Recommender systems,
pp. 357-360. 2009.
6. Cormen, T. H., Leiserson, C., Rivest, R., Stein, C. Introduction to Algorithms (2nd ed.). MIT Press
and McGraw-Hill. ISBN 0-262-03293-7, 2001.
7. Dice, Lee R. (1945). "Measures of the Amount of Ecologic Association Between Species". Ecology.
26 (3): 297–302. doi:10.2307/1932409. JSTOR 1932409.
8. Dinic, E. A. Algorithm for solution of a problem of maximum flow in a network with power
estimation. Soviet Math. Dokl., 11(5): 1277–1280, 1970.
9. Edmonds, Jack; Karp, Richard M. "Theoretical improvements in algorithmic efficiency for network
flow problems". Journal of the ACM (Association for Computing Machinery) 19(2): 248–264, 1972.
doi:10.1145/321694.321699
10. Ferilli S., N. Di Mauro, T.M.A. Basile, F. Esposito, M. Biba. Automatic Topics Identification for
Reviewer Assignment. 19th International Conference on Industrial, Engineering and Other
Applications of Applied Intelligent Systems, IEA/AIE 2006. Springer LNCS, 2006, pp. 721-730.
11. Jaccard, Paul (1912), "The Distribution of the flora in the alpine zone", New Phytologist, 11: 37–50,
doi:10.1111/j.1469-8137.1912.tb05611.x
12. Kalinov, K. Practical Statistics for Social Sciences, Archeologists and Anthropologists. New
Bulgarian University, Sofia, 2002 (in Bulgarian).
13. Kalmukov, Y. An algorithm for automatic assignment of reviewers to papers. Proceedings of the
International Conference on Computer Systems and Technologies CompSysTech’06, Ruse, 2006,
pp. V.5-1-V.5-7.
14. Kalmukov, Y. Describing Papers and Reviewers’ Competences by Taxonomy of Keywords.
Computer Science and Information Systems, vol. 9, no. 2, 2012, pp. 763-789, ISSN 1820-0214.
15. Kou, Ngai Meng, Leong Hou U, Nikos Mamoulis, and Zhiguo Gong. "Weighted coverage based
reviewer assignment." In Proceedings of the 2015 ACM SIGMOD international conference on
management of data, pp. 2031-2046. 2015.
16. Kuhn, Harold W. "The Hungarian Method for the assignment problem". Naval Research Logistics
Quarterly, 2 (1955), pp. 83–97.
17. Lawler, E. L. Combinatorial Optimization: Networks and Matroids. Holt, Rinehart and Winston,
New York, 1976.
18. Li, Xinlian, and Toyohide Watanabe. "Automatic paper-to-reviewer assignment, based on the
matching degree of the reviewers." Procedia Computer Science 22 (2013): 633-642.
19. Liu, Xiang, Torsten Suel, and Nasir Memon. "A robust model for paper reviewer assignment." In
Proceedings of the 8th ACM Conference on Recommender systems, pp. 25-32. 2014.
20. Lomax, Richard G. (2007). Statistical Concepts: A Second Course. p. 10. ISBN 0-8058-5850-4
21. Long, Cheng, Raymond Chi-Wing Wong, Yu Peng, and Liangliang Ye. "On good and fair paper-
reviewer assignment." In 2013 IEEE 13th International Conference on Data Mining, pp. 1145-1150.
IEEE, 2013.
22. Lowik, P. Comparative analysis between PHP’s native sort function and quicksort implementation in
PHP. http://stackoverflow.com/a/1282757, August 2009.
23. Matsumoto, M., Nishimura, T. "Mersenne twister: a 623-dimensionally equidistributed uniform
pseudo-random number generator." ACM Transactions on Modeling and Computer Simulation
(TOMACS), 8(1), 1998, pp. 3-30.
24. Mitkov, A. Theory of the Experiment. “Library for PhD Students” Series. Ruse, 2010, ISBN: 978-954-712-474-5 (in Bulgarian)
25. Mitkov, A., Minkov, D. Methods for Statistical Analysis and Optimization of Agriculture machinery
– 2-nd part. Zemizdat Publishing House, Sofia 1993, ISBN: 954-05-0253-5 (in Bulgarian).
26. Munkres, J. "Algorithms for the Assignment and Transportation Problems". Journal of the Society
for Industrial and Applied Mathematics. 5:1, 1957, pp. 32–38.
27. Nguyen, Jennifer, Germán Sánchez-Hernández, Núria Agell, Xari Rovira, and Cecilio Angulo. "A
decision support tool using Order Weighted Averaging for conference review assignment." Pattern
Recognition Letters 105 (2018): 114-120.
28. Pesenhofer A., R. Mayer, A. Rauber. Improving Scientific Conferences by enhancing Conference
Management System with information mining capabilities. Proceedings IEEE International
Conference on Digital Information Management (ICDIM 2006), ISBN: 1-4244-0682-X, pp. 359–366.
29. Price, Simon, and Peter A. Flach. "Computational support for academic peer review: A perspective
from artificial intelligence." Communications of the ACM 60, no. 3 (2017): 70-79.
30. Rigaux, Ph. An Iterative Rating Method: Application to Web-based Conference Management.
Proceedings of the 2004 ACM Symposium on Applied Computing (SAC'04), ACM Press N.Y., pp.
1682 – 1687, ISBN 1-58113-812-1.
31. Rodriguez M., J. Bollen. An Algorithm to Determine Peer-Reviewers. Conference on Information
and Knowledge Management (CIKM 2008), ACM Press, pp. 319-328.
32. Rosen-Zvi, Michal, Thomas Griffiths, Mark Steyvers, and Padhraic Smyth. "The author-topic model
for authors and documents." arXiv preprint arXiv:1207.4169 (2012).
33. Taylor, Camillo J. "On the optimal assignment of conference papers to reviewers." (2008).
34. W.G. Cochran, The distribution of the largest of a set of estimated variances as a fraction of their
total, Annals of Human Genetics (London) 11(1), 47–52 (January 1941)
35. Zhai, ChengXiang, Atulya Velivelli, and Bei Yu. "A cross-collection mixture model for comparative
text mining." In Proceedings of the tenth ACM SIGKDD international conference on Knowledge
discovery and data mining, pp. 743-748. 2004.
36. Zubarev, Denis, Dmitry Devyatkin, Ilia Sochenkov, Ilia Tikhomirov, and Oleg Grigoriev. "Expert
Assignment Method Based on Similar Document Retrieval." In Data Analytics and Management in
Data Intensive Domains: XXI International Conference DAMDID/RCDL'2019 (October 15–18,
2019, Kazan, Russia): Conference Proceedings. Edited by Alexander Elizarov, Boris Novikov,
Sergey Stupnikov. Kazan: Kazan Federal University, 2019, p. 339.
37. ADBIS 2007, International Conference on Advances in Databases and Information Systems,
http://www.adbis.org/
38. CompSysTech, International Conference on Computer Systems and Technologies,
http://www.compsystech.org/
39. CyberChair, A Web-based Paper Submission & Review System,
http://www.borbala.com/cyberchair/
40. EasyChair, conference management system, http://www.easychair.org/
41. EDAS: Editor’s Assistant, conference management system, http://edas.info/
42. Halevi, Shai. Web Submission and Review Software, http://people.csail.mit.edu/shaih/websubrev/
43. MathWorks. MATLAB – Statistic Toolbox. http://www.mathworks.com/products/statistics/
44. Microsoft Conference Management Toolkit, https://cmt3.research.microsoft.com/About
45. Miki Hermann, Professor in Algorithms and Complexity. http://www.lix.polytechnique.fr/~hermann/
46. OpenConf Conference Management System, http://www.openconf.com/
47. Philippe Rigaux, http://deptinfo.cnam.fr/~rigaux/
48. The MyReview System, a web-based conference management system,
http://myreview.sourceforge.net/ (Accessed January 2017. Unavailable now)
49. van de Stadt, R., CyberChair: A Web-Based Groupware Application to Facilitate the Paper Reviewing
Process. 2001, Available at www.cyberchair.org.
APPENDIX 1: AN EXAMPLE
Consider a hypothetical conference. Let us assume there are 5 submitted papers and 5 registered
reviewers. Each paper should be evaluated by 2 reviewers, so every reviewer should evaluate exactly (5*2)/5
= 2 papers.
Let the similarity matrix be as follows. Rows represent papers and columns represent reviewers.
        r1     r2     r3     r4     r5
p1     0.60   0.53   0.47   0.40   0.55
p2     0.89   0.25   0.50   0.65   0.80
p3     0.50   0.55   0.37   0.57   0.60
p4     0.53   0      0      0.40   0.50
p5     0.50   0      0      0.33   0.25
At the beginning, the algorithm sorts each row by the similarity factor in descending order. As a result,
the first column of the sorted matrix contains the most competent reviewer for every single paper (the reviewer
each similarity factor belongs to is shown in parentheses).
p1     0.60 (r1)   0.55 (r5)   0.53 (r2)   0.47 (r3)   0.40 (r4)
p2     0.89 (r1)   0.80 (r5)   0.65 (r4)   0.50 (r3)   0.25 (r2)
p3     0.60 (r5)   0.57 (r4)   0.55 (r2)   0.50 (r1)   0.37 (r3)
p4     0.53 (r1)   0.50 (r5)   0.40 (r4)   0           0
p5     0.50 (r1)   0.33 (r4)   0.25 (r5)   0           0
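For illustration, this sorting step can be sketched in a few lines of Python. This is not the author's
original implementation; the identifiers similarity and sorted_rows are introduced here only for this example.

# Illustrative sketch of the sorting step (not the author's original code).
# Rows are papers, columns are reviewers; each row is turned into a list of
# (similarity, reviewer) pairs sorted in descending order of similarity.

similarity = {
    "p1": {"r1": 0.60, "r2": 0.53, "r3": 0.47, "r4": 0.40, "r5": 0.55},
    "p2": {"r1": 0.89, "r2": 0.25, "r3": 0.50, "r4": 0.65, "r5": 0.80},
    "p3": {"r1": 0.50, "r2": 0.55, "r3": 0.37, "r4": 0.57, "r5": 0.60},
    "p4": {"r1": 0.53, "r2": 0.00, "r3": 0.00, "r4": 0.40, "r5": 0.50},
    "p5": {"r1": 0.50, "r2": 0.00, "r3": 0.00, "r4": 0.33, "r5": 0.25},
}

sorted_rows = {
    paper: sorted(((sf, rev) for rev, sf in row.items()), reverse=True)
    for paper, row in similarity.items()
}

# The first element of every sorted row is the most competent reviewer:
for paper, row in sorted_rows.items():
    sf, rev = row[0]
    print(paper, "->", rev, sf)    # p1 -> r1 0.6, p2 -> r1 0.89, p3 -> r5 0.6, ...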
The first column suggests that the reviewer r1 should be assigned to 4 papers (p1, p2, p4 and p5).
However, the maximum allowed number of papers per reviewer is 2, i.e. nobody should review more than two
papers. So the algorithm has to decide which 2 of these 4 papers to assign to r1. At first glance it seems logical
that r1 should be assigned to p1 and p2, as they have the highest similarity factors with him/her. On the other
hand, there are fewer reviewers competent to evaluate p4 and p5, so these papers should be processed with
higher priority. If r1 has to be assigned to just one of p4 and p5, which one is more suitable? One may say p4,
as it has a higher similarity factor. However, the second-suggested reviewer of p4 is almost as competent as r1,
while the second-suggested reviewer of p5 is much less competent than r1 is. In this case it is better to assign
r1 to p5 rather than to p4: if r1 were assigned to p4, then p5 would be evaluated by less competent reviewers
only, a situation that is highly undesirable. So when deciding which papers to assign to a specific reviewer, the
algorithm should take into account both the number of competent reviewers for each paper and the rate of
decrease in the competence of the next-suggested reviewers for those papers. To automate this, the algorithm
modifies the similarity factors from the first column by adding two corrections, C1 and C2, calculated by
formulas 3 and 4. C1 takes into account the number of non-zero similarity factors for pi (i.e. the number of
reviewers competent to evaluate pi), while C2 depends on the rate of decrease in the competence of the next-
suggested reviewers for pi.
The specific values of C1 and C2 are as follows:
C2(p1, r1) = 2 * 0.05 = 0.1        C1(p1, r1) = 0.0625
C2(p2, r1) = 2 * 0.09 = 0.18       C1(p2, r1) = 0.0625
C2(p3, r5) = 2 * 0.03 = 0.06       C1(p3, r5) = 0.0625
C2(p4, r1) = 2 * 0.03 = 0.06       C1(p4, r1) = 0.25
C2(p5, r1) = 2 * 0.17 = 0.34       C1(p5, r1) = 0.25
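The following Python sketch reproduces the correction values listed above. The helper functions c1()
and c2() are hypothetical reconstructions that merely match this example (C1 is assumed to be 1/(k-1)², where
k is the number of non-zero similarity factors of the paper, and C2 to be twice the gap between the paper's
first- and second-suggested reviewers); the authoritative definitions are formulas 3 and 4 in the main text.

# Hypothetical reconstruction, for illustration only; the authoritative
# definitions of C1 and C2 are formulas 3 and 4 in the main text.

def c1(row):
    """row = a paper's similarity factors sorted in descending order."""
    k = sum(1 for sf in row if sf > 0)        # number of competent reviewers
    return 1.0 / (k - 1) ** 2 if k > 1 else 1.0

def c2(row):
    """Rate of decrease in competence towards the next-suggested reviewer."""
    return 2.0 * (row[0] - row[1])

p5 = [0.50, 0.33, 0.25, 0.0, 0.0]             # sorted row of paper p5
modified = p5[0] + c1(p5) + c2(p5)            # corrected value used for the decision
print(round(c1(p5), 4), round(c2(p5), 2), round(modified, 2))   # 0.25 0.34 1.09

Adding the listed corrections to the first-column values gives modified factors of about 0.76 for p1, 1.13 for
p2, 0.84 for p4 and 1.09 for p5, so (keeping the two highest) r1 stays with p2 and p5, while p1 and p4 are
handed to their next-suggested reviewers.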
To preserve the real weights of matching, the similarity factors are modified in an auxiliary data
structure (an ordinary single-dimension array holding the first column) rather than in the matrix itself.
Now r1 is suggested for not 4 but just 2 papers. However, after the operations performed above, r5 is
suggested for 3 papers (p1, p3 and p4). To decide which 2 of these 3 papers to assign to r5, the algorithm again
modifies the similarity factors taken from the newly-formed first column of the matrix, as in the previous step,
by using formulas 3 and 4. After this modification, p3 and p4 keep r5 as their first-suggested reviewer, while
the row of p1 is shifted to its next-suggested reviewer.
Now no reviewer is suggested for more than 2 papers in the first column, so all reviewers from the first
column can be assigned directly to the papers they are suggested for. Comparing the final first column with the
initial matrix, 3 of the 5 papers (p2, p3 and p5) are assigned to their most competent reviewers. One (p4) gets
its second most competent reviewer and another (p1) its third most competent reviewer. However, the
competence levels of these reviewers with respect to p4 and p1 are very close to those of the most competent
reviewers for these papers: r5 is assigned to p4 with a similarity factor of 0.50, while the most competent
reviewer for p4 has a similarity factor of 0.53.
The following auxiliary data structures are used by the algorithm's implementation:
rowsToShift[] – an array containing the row ids (these are actually the paper ids) that should be
shifted one position to the left, so that the next-competent reviewer is suggested to this paper.
signifficantSF[paperId] – an array holding the number of significant, non-zero, similarity
factors for every paper, identified by its paperId.
reviewersToRemove[] – an array holding the identifiers (revUser) of the reviewers who are
already busy (i.e. have enough papers to review) and should be removed from the similarity matrix (except
its first row) on the next pass through the outermost do-while cycle, so that they are not assigned to any more
papers.
busyReviewers[] – an array holding the identifiers of the reviewers who are already busy (i.e.
have enough papers to review). It is similar to reviewersToRemove with one major difference –
busyReviewers keeps the identifiers for the entire run, while reviewersToRemove is cleared on each
pass after the respective reviewers' similarity factors have been deleted from the similarity matrix.
maxPapersToAssign[j] – an array holding the maximum number of papers that could be
assigned to every reviewer j, identified by its revUser.
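As a purely illustrative aid (an assumption about a possible re-implementation, not the author's original
code), these bookkeeping structures might be declared in Python as follows; the identifiers are taken from the
description above and the initial values refer to the example of this appendix.

# Hypothetical Python declarations of the bookkeeping structures described
# above; identifiers follow the original description, values follow Appendix 1.

rowsToShift = []        # paper ids whose rows must be shifted one position to the
                        # left, so the next-competent reviewer becomes the suggestion

signifficantSF = {      # paperId -> number of significant (non-zero) similarity
    "p1": 5, "p2": 5,   # factors, i.e. reviewers competent to evaluate that paper
    "p3": 5, "p4": 3, "p5": 3,
}

reviewersToRemove = []  # revUser ids of reviewers that became busy on the current
                        # pass; cleared after their similarity factors are removed
                        # from the matrix

busyReviewers = set()   # revUser ids of all busy reviewers, kept for the whole run

maxPapersToAssign = {   # revUser -> maximum number of papers that reviewer may get
    r: 2 for r in ("r1", "r2", "r3", "r4", "r5")   # (5 * 2) / 5 = 2 papers each
}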