
This is an author’s accepted manuscript of:

Kalmukov, Y. An algorithm for automatic assignment of reviewers to papers.


Scientometrics 124(3), pp. 1811-1850 (2020), https://doi.org/10.1007/s11192-020-03519-0

The final publication is available at:


https://link.springer.com/article/10.1007/s11192-020-03519-0

The author’s accepted manuscript is made public in accordance with the copyright agreement with
the publisher:
“You may further deposit the accepted manuscript version in any repository, provided it is only
made publicly available 12 months after official publication or later and provided acknowledgement
is given to the original source of publication and a link is inserted to the published article on
Springer's website.”
An algorithm for automatic assignment of reviewers to papers
Yordan Kalmukov
Department of Computer Systems and Technologies,
University of Ruse,
8 Studentska Str., 7017 Ruse, Bulgaria,
[email protected]

Abstract: The assignment of reviewers to papers is one of the most important and challenging tasks in
organizing scientific conferences and the peer review process in general. It is a typical example of an
optimization task where limited resources (reviewers) should be assigned to a number of consumers (papers),
so that every paper is evaluated by reviewers who are highly competent in its subject domain while the
reviewers’ workload remains balanced.
This article suggests a heuristic algorithm for automatic assignment of reviewers to papers that
achieves accuracy of about 98-99% in comparison to the maximum-weighted matching (the most accurate)
algorithms, but has a better time complexity of Θ(n²). The algorithm provides a uniform distribution of
papers to reviewers (i.e. all reviewers evaluate roughly the same number of papers); guarantees that if there
is at least one reviewer competent to evaluate a paper, then the paper will have a reviewer assigned to it; and
allows iterative and interactive execution that could further increase accuracy and enables subsequent
reassignments. Both accuracy and time complexity are experimentally confirmed by a large
number of experiments and proper statistical analyses.
Although it is initially designed to assign reviewers to papers, the algorithm is universal and could be
successfully applied in other subject domains where assignment or matching is necessary. For
example: assigning resources to consumers, tasks to persons, matching men and women on dating web sites,
grouping documents in digital libraries and others.

Keywords: heuristic assignment algorithm; bipartite graph matching; assignment of reviewers to
papers; conference management.
MSC codes: 68R10 (Graph theory), 05C70 (Graph Matching), 05C85 (Graph algorithms), 90B50
(Management decision making), 90B70 (Manpower planning), 68P20 (Information storage and retrieval),
68T20 (heuristics, search strategies, etc.), 68U35 (Information systems)
JEL codes: D83 Search, Learning, Information and Knowledge; L86 Information and Internet
Services, Computer Software; C88 Other Computer Software; C12 Hypothesis Testing; C25 Discrete
Regression and Qualitative Choice Models.

1. INTRODUCTION
One of the most important and challenging tasks in organizing scientific conferences is the assignment
of reviewers to papers. It directly affects the conference’s quality and its public image. As the quality of a
conference depends mostly on the quality of the accepted papers, it is crucial that every paper is evaluated by
the reviewers most competent in its subject domain. The assignment itself is a typical example of a classical
optimization task where limited resources (reviewers) should be assigned to a number of papers in a way
that:
• Every paper should be evaluated by the most competent (in its subject domain) reviewers.
• Reviewers assigned to a paper should not be in a conflict of interest with it.
• All reviewers should evaluate roughly the same number of papers, i.e. reviewers should be
equally loaded.
In general, reviewers can be assigned to papers in two ways:
• Manually
• Automatically
Manual assignment is only applicable to “small” conferences having a small number of submitted
papers and registered reviewers. It requires that Programme Committee (PC) chairs familiarize themselves
with all papers and reviewers’ competences and then assign the most suitable reviewers to each paper, while
maintaining load balancing so that all reviewers evaluate roughly the same number of papers. Doing that
for a large number of papers and reviewers is not just hard and time consuming; due to the many
constraints (expertise, load balancing, conflicts of interest, etc.) that should be taken into account, the
manual assignment also gets less and less accurate as the number of papers and reviewers increases. For that
reason, all commercially available conference management systems (CMS) implement an automatic
assignment process that tries to meet all three assignment requirements stated above.
The non-intersecting sets of papers and reviewers can be represented by a complete weighted bipartite
graph G = (P + R, E) (figure 1), where P is the set of all submitted papers, R – the set of all registered
reviewers and E – the set of all edges. There is an edge from every paper to every reviewer and every edge
should have a weight. In case of a zero weight, the corresponding edge may be omitted, turning the graph into a
non-complete one. The weight of the edge between paper pi and reviewer rj tells us how competent (suitable)
rj is to review pi. This measure of suitability is called a similarity factor. The weights are calculated or
assigned in accordance with the chosen method of describing papers and reviewers’ competences [14].

Figure 1. The sets of papers (P) and reviewers (R) represented as a complete weighted bipartite
graph. The edges in bold are the actual assignments suggested by an assignment algorithm. All edges have
weights, but only those of the assignments are shown for clarity.

Once the edge weights are calculated then any assignment algorithm could be implemented to assign
reviewers to papers. Therefore, the accuracy of the automatic assignment depends mostly on the following
two aspects:
• The method of describing papers and reviewers’ competences, and the similarity measure used to
calculate the weights.
• The accuracy of the assignment algorithm itself.
This work mainly focuses on the assignment algorithms and proposes a new heuristic solution having
lower time complexity at the cost of just 1% loss of accuracy. I initially presented this algorithm in 2006 in
[13] but that paper did not explain the algorithm well enough, did not perform a detailed experimental
analysis and did not provide sufficient evidence for its accuracy and computational complexity. Thus, I
decided to publish it again – this time accompanied by pseudo code, examples and most importantly
comprehensive experimental analysis proving its qualities.
The rest of the paper is organized as follows:
Section 2 reviews related and previous work by other researchers. Although the manuscript focuses on
assignment algorithms, methods of describing papers and reviewers’ competences, and other assignment
approaches are also discussed for completeness.
Section 3 proposes a novel heuristic assignment algorithm that achieves accuracy of about 98-99% in
comparison to the maximum-weighted matching (the most accurate) algorithms, but has a better time
complexity of Θ(n²). The algorithm provides a uniform distribution of papers to reviewers (i.e. workload
balancing); guarantees that if there is at least one reviewer competent to review a paper, then the paper will
have a reviewer assigned to it; and allows iterative and interactive execution that could further increase the
accuracy and enables subsequent reassignments.
Section 4 is devoted to experimental evaluation of the proposed algorithm in terms of assignment
accuracy, determined as the ratio between the weight of the computed assignment and the weight of the best possible
assignment. The Hungarian algorithm is used as a reference since it guarantees finding the maximum-
weighted matching, i.e. the best possible solution. A number of experiments comprising thousands of tests are
performed, followed by proper statistical analyses such as hypothesis testing and ANOVA.
Section 5 employs regression analysis to experimentally prove the algorithm’s time complexity of
Θ(n²). Additionally, a direct comparison between the proposed and the Hungarian algorithm is presented in
terms of running time, with the number of papers ranging from 100 to 750 and the number of reviewers from 60 to 450.
Section 6 continues experimental evaluation of both accuracy and time complexity, but this time using
real datasets taken from a series of nine already conducted conferences – CompSysTech from 2010 to 2018.
Section 7 gives useful ideas on how iterative and interactive execution could further increase the
assignment accuracy and provide higher flexibility.
Finally, the paper ends with conclusions in Section 8.
Appendix 1 presents an illustrated example of how the algorithm works.
Appendix 2 contains a detailed pseudo code that could be directly translated into any high-level
imperative programming language.

The main contributions of this work are:


1. The proposed heuristic assignment algorithm that achieves accuracy of about 99% in comparison
to the maximum-weighted matching algorithms, but with a lower time complexity of Θ(n²). All of its
advantages are presented at the beginning of section 3.
2. The evaluation methodology, given in sections 4 and 5, that could be used to evaluate any other
assignment algorithm as well.
3. The large number of experiments that prove the quality of the proposed algorithm.

2. RELATED WORK
2.1. Calculating paper-reviewer similarities
Calculating paper-reviewer similarity factors mostly depends on the chosen method of describing
papers and reviewers’ competences. Generally, methods could be divided into two main groups:
• Explicit methods. Authors and reviewers are required to provide additional information to
explicitly describe their papers and competences.
• Implicit methods. No additional actions are required from users. Similarities are calculated based on
content analysis of publications.
The most commonly implemented explicit methods are selection of keywords/topics from a predefined
list and bidding. Calculating similarities based on keyword/topic selection is also referred to as feature-based
matching [29]. The idea is simple – during paper submission, authors select (usually from checkboxes) the
topics that best describe their papers. Reviewers do the same while registering to the conference management
system. Then the paper-reviewer similarities could be calculated in multiple ways. In case the list of topics is
presented as an unordered set, Dice’s [7] or Jaccard’s [11] similarity measures are a good choice.
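For two keyword sets this amounts to a few lines of code. The following minimal sketch (with hypothetical topic sets) computes both measures:

    def jaccard(a: set, b: set) -> float:
        """Jaccard similarity between two keyword sets."""
        return len(a & b) / len(a | b) if a | b else 0.0

    def dice(a: set, b: set) -> float:
        """Dice similarity between two keyword sets."""
        return 2 * len(a & b) / (len(a) + len(b)) if a or b else 0.0

    paper_topics = {"graph algorithms", "optimization", "peer review"}
    reviewer_topics = {"optimization", "peer review", "machine learning"}
    print(jaccard(paper_topics, reviewer_topics))  # 2 common / 4 in union = 0.5
    print(dice(paper_topics, reviewer_topics))     # 2*2 / (3+3) = 0.667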
In case of feature-based matching, it is better to organize the keywords/topics hierarchically in a taxonomy.
In this way similarity measures, such as the one proposed in [14], can take into account not just the number of
common keywords, but also determine how semantically close the non-matching ones are. So, a non-zero
similarity could be calculated even if the paper and the reviewer do not share any keyword in common.
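The general idea can be illustrated with a Wu-Palmer-style score over a small, hypothetical topic tree (an illustration only, not the specific measure proposed in [14]):

    # Hypothetical mini-taxonomy: child topic -> parent topic.
    parent = {
        "machine learning": "artificial intelligence",
        "information retrieval": "artificial intelligence",
        "text mining": "information retrieval",
        "artificial intelligence": "computer science",
        "graph algorithms": "computer science",
    }

    def path_to_root(topic):
        path = [topic]
        while path[-1] in parent:
            path.append(parent[path[-1]])
        return path

    def topic_similarity(a, b):
        """Closeness of two topics in the taxonomy (1.0 = identical topic)."""
        pa, pb = path_to_root(a), path_to_root(b)
        common = set(pa) & set(pb)
        if not common:
            return 0.0
        lca = min(common, key=lambda t: pa.index(t))   # deepest shared ancestor
        depth = lambda t: len(path_to_root(t))
        return 2 * depth(lca) / (depth(a) + depth(b))

    # Non-zero despite no shared keyword, because both topics fall under AI.
    print(topic_similarity("text mining", "machine learning"))  # 4/7 ~ 0.57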
Bidding allows reviewers to explicitly state their willingness/interest to review specific papers. However, if
the number of submitted papers is high, reviewers are not likely to browse all papers and read their abstracts.
Thus, the collected bids (or preferences) will be sparse and incomplete. This could be overcome in several ways.
First, the conference management system should recommend to each reviewer a small subset of papers that
will be most interesting to him/her. And second, the missing preferences could be “guessed” by applying
collaborative filtering techniques, as suggested in [30, 5, 3]. The recommendation of papers to reviewers
could be based on paper-reviewer similarities calculated by any method.
Implicit methods do not require any additional actions from authors and reviewers. Instead,
similarities are calculated based on document content analysis. This is also known as profile-based matching
[29]. As its name suggests it relies on building papers’ and reviewers’ profiles. Papers are represented by
their content, while in most cases reviewers’ expertise is obtained from their previous publications. Then,
similarities are calculated based on IR techniques such as the Vector Space Model (VSM), Latent Dirichlet
Allocation (LDA) [1] or others.
Andreas Pesenhofer et al. [28] suggest that paper-reviewer similarities be calculated as the Euclidean
distance between the titles of the submitted papers and the titles of all reviewers’ publications. The latter are
fetched from CiteSeer and Google Scholar. The authors evaluated their proposal with data from ECDL 2005.
They noted that for 10 out of 87 PC members no publications had been found, so those members got their papers to
review at random.
Stefano Ferilli et al. [10] use Latent Semantic Indexing (LSI) to automatically extract paper topics
from the titles and the abstracts of the submitted papers and from the titles of reviewers’ publications
obtained from DBLP. The proposal was evaluated by the organizers of the IEA/AIE 2005 conference. In
their opinion the average accuracy was 79%. According to the reviewers, the accuracy was 65% [10].
Laurent Charlin and Richard S. Zemel [3, 4] propose a standalone paper assignment recommender
system called “The Toronto Paper Matching System (TPMS)” that is also loosely coupled with Microsoft’s
Conference Management Toolkit. TPMS builds reviewers’ profiles based on their previous publications
obtained from Google Scholar or uploaded by the reviewers themselves. The latter actually allows reviewers
to control their profiles. The paper-reviewer scoring model is similar to the vector space model, but takes a
Bayesian approach [29]. By using Latent Dirichlet Allocation (LDA), TPMS can guess reviewers’ research
topics from their publications. To enhance accuracy, the system also supports reviewer’s self-assessment of
expertise in respect to the submitted papers.
Jennifer Nguyen et al. [27] propose an interesting approach that applies an Ordered Weighted Averaging
(OWA) function over multiple data sources to calculate the paper-reviewer similarities. The reviewer’s
profile consists of research interests (expertise), recency and quality. The quality refers to the number of
published papers (and their citations), written books and book chapters, and supervised PhD students. Data
are obtained from global sources like Aminer and ResearchGate, and local ones such as TDX. Reviewers’
publications are not taken into account. Instead, reviewers’ research interests are taken directly from the
mentioned websites and aligned (automatically or by an expert) with the conference topics. Papers’ profiles
consist of concepts, obtained by applying LDA (Latent Dirichlet Allocation) on the entire collection of
submitted papers. Then paper concepts are translated to conference topics in an interesting manner – the
combination of each concept and each conference topic is sent as a search query to Google Scholar. The
number of returned results is normalized and combined with the “concept-to-paper” proportions provided by
LDA to calculate how much the paper is related to each conference topic. Finally, the paper-reviewer
similarity is calculated as an OWA function of the reviewer’s expertise with respect to the paper’s topic
coverage needs, recency, quality, and the availability of the reviewer. The authors tested their proposal with real
data taken from the International Conference of the Catalan Association for Artificial Intelligence (CCIA
2014, 2015, and 2016). However, they report that among the original 96 reviewers only 51 had skills
populated on their ResearchGate profiles, so only those were taken into consideration [27].
Xiang Liu et al. [19] propose a recommender system that calculates paper-reviewer similarities based
on three aspects of the reviewer: expertise, authority, and diversity. Authority refers to public recognition of
the reviewer in the scientific community, while diversity refers to whether he/she has diverse research interests and
background. Latent Dirichlet Allocation (LDA) is applied over the sets of submitted papers and reviewers’
publications to extract their topics. Then cosine similarity is used to calculate the relevance between the topic
vectors of each paper and each reviewer’s publication. Authority is determined by constructing a graph that
consists of the paper being processed and all of its candidate reviewers. Two reviewers are connected with an
edge if they have co-authored at least one paper. The weight of the edge depends on the number of papers
they co-authored. The intuition behind this is that if a reviewer is well connected, i.e. has many co-authors,
he or she would be considered as having higher authority [19]. A Random Walk with Restart (RWR) model
is employed on the graph to integrate expertise, authority and diversity. To test their approach, the authors use
two datasets – one from NIPS 2006, accompanied by paper-reviewer relevance judgements provided by
NIPS experts. These judgements are used as a reference, or ground truth as the authors call it. The other set is
derived from SIGIR 2007. Results show that their combined approach achieves higher precision than if no
authority and diversity were used. I found another interesting result in their data – in all experiments “Text
similarity” achieves better results than “Topic similarity”. So, a pure VSM with a proper term-weighting model
performs better than topic extraction by LDA followed by cosine similarity of the topic vectors.
Xinlian Li et al. [18] suggest that the paper-reviewer similarity be calculated as a weighted sum of the
expertise degree of the reviewer and the relevance degree between the reviewer and the paper. The expertise
degree of the reviewer depends on quantity (total number of publications), quality (number of citations and
journal ranking) and freshness (time interval between the year of publication and the current year). The
relevance degree between a paper and a reviewer is calculated as the share of jointly-cited references
between the paper and the reviewer’s previous publications. The assumption is that the higher the number of
common references two papers have, the more similar research fields they are in [18]. The authors identify
three types of common referring: 1) direct referring: paper P quotes one of reviewer R’s publications directly;
2) same paper referring: both paper P and reviewer R refer to the same reference; and 3) same author
referring: both paper P and reviewer R cite the same author’s publications. The authors report that they have tested
their approach with data obtained from DBLP, CiteSeerX and Journal Citation Reports.
Another approach that combines multiple data sources is given by Don Conry et al [5]. They propose a
recommender system that predicts missing reviewer bids/preferences by using: explicit bids; predefined
conference topics/categories; topics, inferred by latent models such as LSI or LDA; paper-paper similarities;
reviewer-reviewer similarities; and conflicts of interest. These are all combined in a single similarity
measure. Paper-paper similarities are calculated in a VSM manner by the cosine of their abstracts. Reviewer-
reviewer similarities are determined by the number of commonly co-authored papers (obtained from DBLP).
Predefined conference topics contribute as a weighted sum of products of matching between topics and
papers, and topics and reviewers. Automatically inferred topics of a paper and a reviewer influence the
similarity between them by the inner product of their vectors. The authors test their approach with data taken
from ICDM’07. The results show that the assignment quality can be increased by providing more flexibility
with additional ratings from which to choose [5].
Marko Rodriguez and Johan Bollen [31] also exploit the idea that a manuscript’s subject domain can
be identified by its references. Their approach builds a co-authorship network that initially consists of the
authors who appear in the reference section of the paper. Then their names are sent to DBLP to find their co-
authors, who are also added to the network; then the co-authors of the co-authors, and so on. Finally, a relative-
rank particle-swarm algorithm is run to find the most appropriate experts to review the paper. Rodriguez
and Bollen used a real data set taken from the JCDL 2005 conference to evaluate their method. Results show that
89% of the reviewers and only 83% of the papers had identifiable authors in DBLP.
Ngai Meng Kou et al. [15] state that a paper is well-reviewed only if the assigned reviewers have the
expertise to cover every single topic of the paper [15]. And that makes a lot of sense. However, in most cases
each paper-reviewer similarity factor is calculated individually. Reviewers are also assigned to each paper
individually, regardless of the expertise of the other reviewers already assigned to it. To maximize the
coverage of the paper’s topics, the researchers propose that reviewers are not assigned to it individually, but
simultaneously as a group. So they redefine the classic reviewer assignment problem to Weighted-coverage
Group-based Reviewer Assignment Problem (WGRAP) [15]. The Author-Topic Model (ATM) proposed in
[32] is first used to extract a set of T topics and the topic vectors of reviewers from the reviewers’
publications. Then the topic vectors of papers are estimated by Expectation-Maximization (EM) [35] based
on the same set T. Instead of considering the expertise of a single reviewer in respect to a specific paper, the
approach now considers the expertise of the entire group of candidate reviewers. The expertise of the
reviewer group is a vector, which for every topic t stores the maximum expertise of any reviewer (inside the
group) in t.
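For illustration, with hypothetical per-topic expertise values the group expertise described above is simply the element-wise maximum over the group members:

    import numpy as np

    # Hypothetical topic-expertise vectors of three candidate reviewers (one value per topic).
    group = np.array([
        [0.8, 0.1, 0.0, 0.3],
        [0.2, 0.7, 0.1, 0.0],
        [0.0, 0.2, 0.6, 0.1],
    ])
    group_expertise = group.max(axis=0)   # per-topic maximum over the group
    print(group_expertise)                # [0.8 0.7 0.6 0.3]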
Profile-based matching approaches are also applied by Denis Zubarev et al. [36] for assigning
experts/reviewers to project proposals.

2.2. Assignment algorithms


The assignment problem is not new; it is a well-studied optimization task. As such, it is usually
solved by:
• Exhaustive search (brute force) algorithms.
• Heuristic (mostly greedy) algorithms.
Brute force algorithms explore all possible solutions and then choose the best one among them. They
guarantee that the best possible solution will be found. However, that guarantee has its price – exploring
all solutions results in higher time complexity. In contrast, heuristic algorithms do not explore all possible
solutions; on each step, they rank all alternatives and follow just the most promising ones (could be one
or many) while ignoring all the others. Greedy algorithms in particular choose only the local optimum on
each step, hoping that the sequence of local optima will lead to a global optimum. Obviously, they do not
guarantee finding the best possible solution, but as they explore fewer alternatives, they run much faster than
brute force algorithms.
Before evaluating specific algorithms, we should first define what the best solution in the context of
reviewer assignment is. As already discussed, the weight of an edge indicates how competent a reviewer is to
evaluate a specific paper. The higher the weight the higher the competence. The overall assignment is a set
of edges. Common sense suggests that an accurate assignment is the one that includes edges with high
weights. Thus, the optimal assignment, or the best possible solution, is the one whose sum of individual edge
weights is maximal. The goal of the assignment algorithm is to find the maximum-weighted matching.
According to graph theory, a matching is a set of edges that do not share any common vertices. This strict
definition directly limits the number of reviewers evaluating a paper to just one. However, in reality papers
are evaluated by three or more reviewers. To achieve that, the maximum-weighted matching algorithm should
be run as many times as needed, so that it assigns an additional reviewer to every paper on each pass.
There are many existing assignment algorithms that guarantee finding the maximum-weighted
matching, i.e. the best solution. One of the most commonly used due to its efficiency is the algorithm of
Kuhn and Munkres, also known as the Hungarian algorithm [16, 26]. Its time complexity is O(n⁴) but could
be optimized to O(n³) [17], where n is the number of submitted papers or registered reviewers, whichever is
higher.
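For illustration, a maximum-weighted matching over a (hypothetical) similarity matrix can be obtained with an off-the-shelf implementation of this algorithm, e.g. scipy's linear_sum_assignment; it minimizes cost, so the similarities are negated:

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    # Hypothetical similarity matrix: rows = papers, columns = reviewers.
    sim = np.array([
        [0.9, 0.4, 0.1],
        [0.2, 0.8, 0.5],
        [0.6, 0.3, 0.7],
    ])

    # Negate the similarities so that minimizing cost maximizes the total weight.
    rows, cols = linear_sum_assignment(-sim)
    for p, r in zip(rows, cols):
        print(f"paper {p} -> reviewer {r} (sim = {sim[p, r]})")
    print("total weight of the matching:", sim[rows, cols].sum())

Repeating such a pass several times, while reducing each reviewer's remaining capacity, yields the required number of reviewers per paper, as described above.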
An alternative way of finding the maximum-weighted matching is to represent the bipartite graph as a
network and determine the maximum flow in it. This could be done by the algorithm of Edmonds and Karp
[9] in O(VE²), which in the context of reviewer assignment equals O(n⁵). Dinic’s algorithm [8] provides
a slightly better time complexity of O(V²E), which corresponds to O(n⁴). One of the most efficient maximum flow
algorithms is the push-relabel algorithm implemented with the relabel-to-front selection rule. It runs in O(n³)
[6]. An extensive study on how to apply network flow theory to the assignment problem is given in
[2].
All these algorithms guarantee to find the maximum-weighted matching. But their time complexity of
O(n³) or even higher makes them almost inapplicable for a large number of papers. Why? Because
conference management systems are usually implemented as web applications, and web applications cannot
run for an unlimited time. Due to the client-server architecture and the stateless communication protocol, both
the web server and the web browser apply their own time limits – execution timeout and connection timeout
respectively. Thus, execution times longer than 5 (preferably 3) minutes are considered dangerous and may
result in losing the client-server connection. In some cases, the web application can manage the execution
timeout on the server but cannot control the connection timeout of the browser. If computationally intense
time-consuming algorithms could not be avoided, they should be implemented as stand-alone server
processes capable of asynchronous communication with the client.
The Hungarian algorithm is implemented in one of the well-designed and commonly used conference
management systems – The MyReview System [48, 30]. The authors provide two implementations of the
algorithm – one in php, like all the other system functionalities, and another one in C. The official
documentation of the system from 2010 states that if the number of papers exceeds 300, the C
implementation of the algorithm should be used, as the php one may not complete the task before reaching
the timeout. However, distributing the assignment algorithm in C while the entire system is written in php
requires a well-qualified programmer to compile and integrate it.
The conference management system developed by Shai Halevi [42] from IBM Research employs the
Edmonds-Karp [9] / Dinic [8] maximum flow algorithm implemented in php. Unfortunately, no information
on how (reliably) it handles a large number of papers is given in its official documentation.
Camillo J. Taylor suggests an approach to solving a variant of the bipartite matching problem in the
context of finding an optimal paper-reviewer assignment [33]. The problem is formulated as a linear program
which can be readily solved using standard interior point methods [33].
Many conference management systems use their own greedy algorithms to assign reviewers to papers.
However, assigning with a greedy algorithm is a bit tricky and often results in lower accuracy. The reason is
simple: when processing papers one by one independently, it is certain that the first processed papers will
get the reviewers most competent for them, because at the beginning all reviewers are free. However, it is
quite possible that there will be no reviewers competent to evaluate the last processed papers, as these
reviewers have already been unreasonably assigned to previously processed papers. If that happens, papers
from the second half of the list may get randomly assigned reviewers, i.e. evaluators who are not experts in
the relevant subject domains. Here is a naïve example that illustrates the problem. Imagine there are two
reviewers – ri and rj, and two papers – pk and pk+20. Every paper should be evaluated by a single reviewer and
every reviewer should evaluate just one paper. The algorithm first processes pk and determines that both
reviewers are competent to review it, but ri is just a little bit more competent. Therefore, it assigns ri to pk.
Later the algorithm reaches pk+20 and finds out that ri is actually the only reviewer competent to evaluate it.
OK, but ri has already been assigned to pk (although rj is also competent to review it) and cannot evaluate any
more papers. Then rj is assigned to pk+20 even though he/she is not competent to review it.
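The pitfall is easy to reproduce in a few lines of code (hypothetical similarity factors):

    # Two papers, two reviewers; each reviewer may evaluate only one paper.
    sim = {
        "p1": {"r_i": 0.80, "r_j": 0.75},   # both reviewers are competent for p1
        "p2": {"r_i": 0.70, "r_j": 0.00},   # only r_i is competent for p2
    }

    busy = set()
    greedy = {}
    for paper, row in sim.items():          # papers processed one by one, independently
        best = max((r for r in row if r not in busy), key=lambda r: row[r])
        greedy[paper] = best
        busy.add(best)

    print(greedy)  # {'p1': 'r_i', 'p2': 'r_j'} -> total weight 0.80 + 0.00 = 0.80
    # The optimal matching is p1 -> r_j and p2 -> r_i, with total weight 0.75 + 0.70 = 1.45.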
Although there is no way to avoid this problem, its negative effect could be significantly reduced by:
• Introducing paper priorities. Papers having fewer reviewers’ bids, papers described by a small
number of keywords, or papers described by keywords chosen by just a few reviewers should be
processed first.
• Calculating similarity factors by using a symmetric similarity measure. It takes into account not
only the level of competence of a reviewer in respect to a specific paper, but also how
worthwhile it is to assign him/her to that exact paper [14].

Some popular conference management systems, including OpenConf [46] and EDAS [41], as well as
the outdated and no longer supported CyberChair [49, 39], rely on their own greedy algorithms to assign
reviewers to papers. According to Prof. Philippe Rigaux [30, 47], Microsoft CMT [44] also employs a
greedy algorithm for that purpose.
OpenConf provides two assignment algorithms – “Topics Match” and “Weighted Topics Match”.
Both are open source, implemented in php, and can be downloaded from OpenConf’s official web site
[46]. The first one processes papers sequentially, starting from those having the highest number of common
keywords with the reviewers. This may not be a very good idea, since the most problematic papers, those
described by fewer keywords, will be processed last, and by that time there may be no free reviewers
competent to evaluate them. When processing a paper, the algorithm assigns the necessary number of
reviewers (2 or 3) to it, starting from those who are still free and have the highest number of common
keywords with the paper. “Weighted Topics Match” computes a weight of each keyword that is proportional
to the number of reviewers who have chosen it and inversely proportional to the number of papers it describes.
Then the algorithm calculates a weight of each paper and each reviewer as the sum of the weights of their
keywords. So papers described by fewer keywords, or papers described by keywords chosen by a small
number of reviewers, will get a lower weight and will be processed first. In this way, the algorithm tries to
minimize the above-mentioned disadvantage of greedy algorithms. The disadvantage of “Weighted
Topics Match” is that when assigning reviewers to a paper it does not measure the level of their competence
in respect to that specific paper. Instead, it sorts reviewers according to their global weight, which is not related
to the paper at all. The only requirement directly related to the paper being processed is that its reviewers
should have at least one keyword in common with it.
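A rough sketch of the described weighting scheme (not OpenConf's actual php code; the keyword sets are hypothetical):

    from collections import Counter

    paper_keywords = {"p1": {"nlp", "ml"}, "p2": {"ml"}, "p3": {"databases"}}
    reviewer_keywords = {"r1": {"ml", "nlp"}, "r2": {"ml"}, "r3": {"databases", "nlp"}}

    papers_per_kw = Counter(k for kws in paper_keywords.values() for k in kws)
    reviewers_per_kw = Counter(k for kws in reviewer_keywords.values() for k in kws)

    # Keyword weight: proportional to the number of reviewers who chose it,
    # inversely proportional to the number of papers it describes.
    kw_weight = {k: reviewers_per_kw[k] / papers_per_kw[k] for k in papers_per_kw}

    # A paper's weight is the sum of its keywords' weights; papers with low weights
    # (few or "rare" keywords) are processed first.
    paper_weight = {p: sum(kw_weight[k] for k in kws) for p, kws in paper_keywords.items()}
    print(sorted(paper_weight, key=paper_weight.get))  # processing order, lowest weight first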
EDAS and CyberChair use a simpler but better greedy algorithm. It relies mostly on reviewers’ bids.
Chosen keywords are taken into account only if no bids were specified. The algorithm processes papers
sequentially, starting from those having fewer bids, i.e. the most problematic papers. It assigns the necessary
number of reviewers to each paper, starting from those PC members who are still free and who have stated
the highest willingness to evaluate the paper.
Jennifer Nguyen et al. [27] propose an iterative greedy algorithm that assigns one reviewer to every
paper on each iteration. Papers are processed sequentially, starting with the ones with the highest topic
coverage needs. That is a good decision because at the beginning there are more free reviewers who can
cover higher topic needs. The list of candidate reviewers is sorted by their similarity score (see section 2.1.
about how it is calculated) with the paper in descending order. The paper is assigned to the reviewer having
the highest score. If two or more reviewers have equal scores at the top of the list, then topic exclusiveness is
taken into account and the paper is assigned to the reviewer with the least exclusive topics (also a good
decision). The exclusiveness of a topic refers to the number of reviewers who are experts in it. Lower
exclusiveness means a higher number of reviewers who are experts in the topic.
Xinlian Li et al. [18] propose an algorithm to solve the nonlinear assignment problem that achieves
higher accuracy than greedy algorithms. The authors state that their algorithm succeeds in finding the maximum-
weighted assignment in most cases, though it fails on small-scale matrices [18]. Their results show it is twice
as fast as the Hungarian algorithm [16, 26], i.e. its time complexity is cubic, O(n³).
Solving the Weighted-coverage Group-based Reviewer Assignment Problem (WGRAP) is even harder
due to the increased number of constraints it should satisfy. Its basic idea is to assign all of the reviewers
who should evaluate a paper simultaneously (as a group), so that the reviewers’ group covers as much as
possible all of the topics associated with the paper. To solve WGRAP, Ngai Meng Kou et al propose a
polynomial-time approximation algorithm which they call “Stage Deepening Greedy Algorithm (SDGA)”
[15]. At each stage, a classic linear assignment algorithm (e.g., Hungarian algorithm [16,26]) could be
applied to compute the assignment in polynomial time. The time complexity of SDGA, using the Hungarian
algorithm, is O(δp(max{P, R})³), where P is the number of papers, R – the number of reviewers, and δp – the
reviewers’ group size, i.e. the number of reviewers assigned to a paper. The algorithm improves the
approximation ratio of previous work from 1/3 to 1/2 and achieves optimality ratio of about 97% (for group
size of 3) and higher, if 4+ reviewers are assigned to papers. The approximation ratio is defined as the ratio
between the computed assignment and the optimal assignment. However, computing the latter may take a very
long time even for small instances, which is why the authors use the optimality ratio for experimental evaluation. It
is the ratio between the computed assignment and the ideal assignment. The latter is obtained by greedily
assigning to each paper the best set (group) of reviewers, regardless of their workloads [15]. Additionally, the
authors propose a stochastic refinement post-processing (SDGA-SRA) that could further improve the
optimality ratio by 1.4%.
The group-based reviewer assignment problem could also be solved by the greedy algorithm proposed
by Long et al. [21] with lower time complexity, but with a lower approximation ratio of 1/3 [15].
Greedy algorithms usually complete in O(n·m·log₂(m)), where n is the number of papers and m – the
number of reviewers. The logarithmic part comes from the need to sort all reviewers according to their level
of competence in respect to every single paper. However, an important remark should be made here:
conference management systems are usually (almost always) implemented as web applications written in a
scripting language such as php or perl, or in a language that is compiled to an intermediate interpretable
language such as C# or Java. All these languages provide a built-in (native) sort() function that is implemented as
part of the language interpreter itself. Thus, sorting with the built-in function runs at the speed of the
compiled machine code of the interpreter, not at the speed of the interpreted high-level language.
According to professional programmers [22], the difference in running time between the native sort()
function in php (which implements the quicksort algorithm) and an implementation of the same algorithm in
php code is about 22-23 times. When sorting an array of fewer than 5 million elements, this speed difference of
22 times compensates for or even exceeds the cost of the logarithmic component. Therefore, in the context of
conference management systems we can skip it. Additionally, to keep the reviewers’ load in a reasonable range,
m should be proportional to n. Then we can conclude that, in the case of conference management systems,
most greedy algorithms assign reviewers to papers with a time complexity of O(n²). All performed
experiments support this conclusion.
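The effect is easy to measure in any interpreted language. The sketch below (Python rather than php, so the exact ratio will differ from the 22-23 times cited above) times the built-in sort against a hand-written quicksort on the same data:

    import random
    import timeit

    def quicksort(a):
        """Hand-written quicksort, executed by the interpreter line by line."""
        if len(a) <= 1:
            return a
        pivot = a[len(a) // 2]
        return (quicksort([x for x in a if x < pivot])
                + [x for x in a if x == pivot]
                + quicksort([x for x in a if x > pivot]))

    data = [random.random() for _ in range(100_000)]
    t_native = timeit.timeit(lambda: sorted(data), number=5)   # compiled built-in sort
    t_interp = timeit.timeit(lambda: quicksort(data), number=5)
    print(f"built-in sorted(): {t_native:.2f}s, interpreted quicksort: {t_interp:.2f}s")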
Unfortunately, it is unknown how the most popular conference management system EasyChair [40]
handles the automatic assignment of reviewers to papers. As it is provided as a completely hosted cloud
service, the only source of technical information about it is its official documentation. According to it,
EasyChair uses a special-purpose randomized algorithm to assign papers [40]. However, this information is
far from enough to make a fair analysis and comparison with the other algorithms.

3. AN ALGORITHM FOR AUTOMATIC ASSIGNMENT OF REVIEWERS TO PAPERS


This paper suggests a heuristic algorithm for automatic assignment of reviewers to papers that:
• Provides a uniform distribution of papers to reviewers (load balancing), i.e. all reviewers
evaluate almost the same number of papers.
• Provides better resource management than all known greedy algorithms. It does not process
papers one by one independently of each other, but assigns reviewers to all papers “in parallel”,
i.e. it does not assign a reviewer to any paper before it finds suitable reviewers for all papers.
• Guarantees, as much as possible, that if there is at least one reviewer competent to evaluate a
specific paper then the paper will have a reviewer assigned to it. However, if there are too many
papers (more than the calculated number of papers per reviewer) that can be evaluated by just a
single reviewer, some of them will remain unassigned. No algorithm can handle this situation, as
it is a typical lack of resources.
• Achieves accuracy of about 98-99%.
• Runs in O(n·m·log₂(m)) in general, or in Θ(n²) when implemented as a web application.
• Does not remove previously made assignments. Allows iterative and interactive usage, and
subsequent reassignments.
The algorithm is presented by a generalized pseudo code, a detailed pseudo code and an example. The
generalized pseudo code shows its basic idea and explains how it works on a conceptual level. All steps are well
illustrated in the example given in Appendix 1. However, an efficient implementation is very important in
order to achieve a quadratic time complexity. That is the purpose of the detailed pseudo code. It can be directly
translated into any high-level imperative programming language, as each operation in the pseudo code corresponds
to an operator or a built-in function in the chosen programming language.
Before starting the algorithm, the PC chair has to specify how many reviewers should evaluate each
one of the papers. All the other settings are determined automatically. To maintain a load balancing, the
algorithm calculates the maximum allowed number of papers per reviewer.
Number of papers per reviewer = (Number of papers × Number of reviews per paper) / Number of reviewers   (1)

If the calculated number of papers per reviewer is a fractional number, it should be ceiled to its closest
upper integer.
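As a quick illustration of formula (1), using the numbers of experiment E1 from section 4 (90 papers, 3 reviews per paper, 60 reviewers):

    import math

    num_papers, reviews_per_paper, num_reviewers = 90, 3, 60
    papers_per_reviewer = math.ceil(num_papers * reviews_per_paper / num_reviewers)
    print(papers_per_reviewer)  # ceil(270 / 60) = ceil(4.5) = 5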
The algorithm takes as an input the so-called similarity matrix (SM). It contains the similarity factors
between all papers and all reviewers. Initially, rows represent papers, columns – reviewers. For a graphical
representation, please refer to the example in Appendix 1. It probably illustrates how the algorithm
works better than the generalized pseudo code itself.
At the beginning, the algorithm sorts each row by similarity factor in descending order. As a result, the
first column of the matrix contains the most competent reviewer for every paper. Let us denote the calculated
number of papers per reviewer (1) by n. If there are reviewers in the first column who are suggested to more
than n papers, then the algorithm cannot directly assign them to the corresponding papers. Doing so would
violate the load-balancing requirement. Therefore, the algorithm needs to decide which n of all suggested
papers to assign to the respective reviewers. To do that, it copies the first column of the matrix to an auxiliary
data structure (an ordinary array) and modifies all similarity factors in it by adding corrections C1 and C2. C1
depends on the number of reviewers competent to evaluate the paper being processed (pi), while C2 depends
on the rate of decrease in the competence of the next-suggested reviewers for pi. The basic idea of these two
corrections is to give priority to papers having a short list of reviewers competent to evaluate them.
sim′(pi, rj) = sim(pi, rj) + C1 + C2                                      (2)

C1 = RRi / CRi                                                            (3)

C2 = 2 ∗ (sim(pi, r1st) − sim(pi, r2nd))                                  (4)

where:
sim(pi, r1st) – the similarity factor between paper pi and its most competent reviewer, i.e. the
similarity factor in the first column of the row corresponding to pi.
sim(pi, r2nd) – the similarity factor between paper pi and its second most competent reviewer, i.e. the
similarity factor in the second column of the row corresponding to pi.
CRi – the number of reviewers having a non-zero similarity factor with pi.
RRi – the number of reviewers that should be assigned to pi.
To maintain a load balancing, no reviewer should be assigned to more papers than the calculated
number of papers per reviewer, i.e. n. If a reviewer rj is suggested to more than n papers then he/she is
assigned just to the first n of them which, after the modification, have the highest similarity factors with
him/her. All of the rest will be given to other reviewers during the next pass of the algorithm. Rows
corresponding to papers that will not be assigned to rj are shifted one position to the left, so that the next-
competent reviewers are suggested to these papers. As rj already has the maximum allowed number of papers
to review, no more papers should be assigned to him/her in the future. Thus, all similarity factors between rj
and all other papers (outside the first column) are deleted from the matrix. All these operations (including the
modification of similarity factors) are iteratively repeated while there are still reviewers in the first column of
the matrix who are suggested to more than n papers.
Eventually, after one or more passes, there will be no reviewers who appear more than n times in the
first column of the similarity matrix. At that point, all suggested reviewers could be directly assigned to the
corresponding papers. At the end of the algorithm each paper will have 1 (or 0 if there are no competent PC
members) new reviewer(s) assigned to it. If papers have to be evaluated by more than 1 reviewer (as they
usually do), the algorithm should be run as many times as needed. It is important, however, that each time the
algorithm takes as input not the initial similarity matrix, but the one produced by the
previous run (after deleting the entire first column). This guarantees that busy reviewers will not get any more
papers and that each time the newly suggested reviewer will be different from those previously assigned.
Here are the major data structures used within the generalized pseudo code:
assignmentReady – a Boolean flag, indicating if the suggested reviewers could be directly
assigned to the corresponding papers. That is only possible if each reviewer in the first column of the
similarity matrix is suggested to not more than n papers. If the flag is true at the end of the current iteration
through the do-while cycle (lines 8-40) then the algorithm ends.
papersOfReviewer[rj] – an associative array, whose keys correspond to the usernames of the
reviewers who appear in the first column of the similarity matrix; and values each holding an array of papers
(and similarity factors) suggested for assignment to the respective reviewer rj. For implementation details
please refer to Appendix 2.
numPapersToAssign[rj] – an associative array, whose keys correspond to the usernames of the
reviewers who appear in the first column of the similarity matrix; and values each holding the maximum
number of papers that could be assigned to the respective reviewer rj.

It is guaranteed that the algorithm will complete in a finite number of iterations because:

• At the beginning of every iteration through the do-while cycle (lines 8-40) the
assignmentReady flag is initialized to true. It can later be changed to false only if
there is at least one reviewer who is suggested to more than n papers.
• If at the beginning of an iteration a reviewer rj is suggested to more than n papers, then at the end
of that iteration just n of them will remain. The similarity factors between rj and all the other
papers will be deleted from the similarity matrix.
• As a result of the previous two points, after a finite number of iterations (not more than the number of
PC members), there will be no reviewer suggested to more than n papers, so the
assignmentReady flag will remain true and the algorithm will end.

To determine the algorithm’s time complexity we need to find the most frequently repeated
simple/scalar operation that has a constant complexity. In iterative algorithms, it is usually any simple
operation within the innermost cycle. In our case, these are modifying similarity factors on row 14 and shifting
rows one position to the left on row 28. Deleting similarity factors on row 31, on the other hand, is a
complex operation that has a linear complexity with respect to the number of papers. By the end of the
algorithm most reviewers will have been marked as busy, which means the algorithm has performed up to m * (n –
papersPerReviewer) deletions, where n is the number of submitted papers, m – the number of registered
reviewers and papersPerReviewer is a conference-specific constant that usually ranges from 3 to 7. To keep
the reviewers’ workload within reasonable limits, m should be tightly correlated with n. So deletion of a similarity
factor (which is a simple constant-complexity operation) on row 31 may overall occur up to n² times.
Row 14 will be repeated exactly n times within a single iteration through the do-while cycle (rows 8-
40). The number of unique reviewers occurring in the first column of the matrix does not matter. If there are
just a few reviewers, they will have long lists of suggested papers. If there are many reviewers, then they will
have much shorter lists. But the total number of similarity factors that should be modified will be always n.
Row 28 will be repeated fewer than n times within a single iteration through the do-while cycle. More
precisely, n – (count(j) * papersPerReviewer) times, where count(j) is the number of unique reviewers in the
first column of the matrix. Furthermore, we may safely assume that ignoring the very first (zero) element of
an array and shifting all the others one position towards the beginning is a constant-complexity operation, because
it is implemented just by changing the array pointer.
It seems that sorting on rows 1 and 17 may be a concern, as it is done in O(n·log₂(n)). However, as
discussed earlier, if the algorithm is implemented in an interpreted language as part of a web application,
the logarithmic multiplier can be omitted. Then the overall time complexity of row 1 is O(n²).
Please note that the same assumption is made for all existing heuristic algorithms reviewed in the “related
work” section, i.e. they are on an equal footing with the proposed one.
ALGORITHM 1

1 Sort all rows of the similarity matrix (SM)
2   by similarity factor (sim) in descending order;
3
4 // the maximum number of papers a reviewer should evaluate
5 papersPerReviewer = ceil(
6 (number of papers * reviewers per paper) / number of reviewers);
7
8 do {
9 assignmentReady = true;
10 for every unique reviewer rj who appears in the first column of SM {
11 papersOfReviewer[rj] = Create a list of papers suggested to rj
12 (in the first column of SM);
13 for every paper pi in papersOfReviewer[rj] {
14 sim(pi, rj) = sim(pi, rj) + C1 + C2;
15 // C1 and C2 are calculated by using formulas 3 and 4
16 }
17 papersOfReviewer[rj] = Sort papersOfReviewer[rj]
18 by similarity factor sim(pi, rj) in descending order;
19 numPapersToAssign[rj] = papersPerReviewer – number of papers,
20 already assigned to rj;
21 if (count(papersOfReviewer[rj]) > numPapersToAssign[rj]) {
22 // If the number of papers suggested to rj is higher than
23 // the number of papers allowed to be assigned to rj, then just the
24 // first numPapersToAssign[rj] number of papers will be assigned to
25 // him/her and all the rest will be given to other reviewers.
26 for every paper pm, positioned after the first
27 numPapersToAssign[rj] papers in papersOfReviewer[rj] {
28 Shift the row corresponding to pm in SM one position left;
29 // that will suggest the next-competent reviewer (after rj) to pm
30 }
31 Delete from SM (but not from its first column) all similarity
32 factors associated with reviewer rj;
33 // no more papers should be assigned to rj as (s)he already has enough
34 assignmentReady = false;
35 // it is not possible to assign the reviewers from the first column
36 // of SM directly to the relevant papers, because there are reviewers
37 // suggested to more papers than the maximum allowed number.
38 }
39 } // end for every unique reviewer in the first column of SM
40 } while (!assignmentReady);
41
42 Assign to each paper the reviewer situated in the first column of the
43 corresponding row in SM;
44 // every reviewer rj appears not more than numPapersToAssign[rj] times
45 // in the first column of SM. After the assignment, the reviewer rj should
46 // be removed from the first column of SM.
Excluding row 31, there is no operation that will be executed more than n times during a single
iteration through the do-while cycle (rows 8-40). The number of iterations through this outermost cycle does
not actually depend on the number of papers and reviewers, but on their distribution over keywords and
conference topics. The worst-case (unrealistic) scenario occurs when a single reviewer is suggested (as most
competent) for all unassigned papers on each iteration. If so, the algorithm should perform m iterations
through the outermost cycle. So we can conclude that the worst-case time complexity of rows 8-40
(excluding 31) is O(n²). The deletion on row 31 will be executed in total (over all iterations) about n² times.
Assigning the suggested reviewers to all papers on row 42 requires a linear time complexity of Θ(n). Then the
overall time complexity of the algorithm is determined as Θ(n²). It is confirmed experimentally as well (see
section 5).
A detailed pseudo code is presented in Appendix 2. The most significant difference with respect to the
generalized pseudo code is that deleting similarity factors and shifting rows left is done not at the end of the
current iteration, but at the beginning of the next one, while forming the papersOfReviewer[] array. This
optimization does not influence the time complexity, but speeds the algorithm up by reducing the number of
cycles within the outermost do-while.
Although the algorithm is designed to assign reviewers to papers, it is universal and could be used to
solve the assignment problem in any other subject domain. If implemented in an interpreted language as a
web application, its time complexity is Θ(n²). In all other cases it is O(n·m·log₂(m)), where n is the number of
objects/tasks that should be assigned and m – the number of participants to whom the objects should be assigned.
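To make the idea above more concrete, here is a minimal, self-contained Python sketch of a single pass of the algorithm (one new reviewer per paper). It is an illustrative re-implementation of the generalized pseudo code, not the author's php code: the similarity matrix is assumed to be a dict of dicts holding only non-zero factors, and the corrections applied to the first-column similarity factors are simplified stand-ins for formulas (2)-(4).

    import math
    from collections import defaultdict

    def assign_one_reviewer_per_paper(sim, reviews_per_paper=3):
        """One pass of the heuristic: suggests at most one new reviewer per paper.

        sim: dict paper_id -> dict reviewer_id -> similarity factor (non-zero only).
        Returns: dict paper_id -> reviewer_id (None if no competent reviewer is left).
        """
        papers = list(sim)
        reviewers = {r for row in sim.values() for r in row}
        # Formula (1): maximum allowed number of papers per reviewer, rounded up.
        cap = math.ceil(len(papers) * reviews_per_paper / max(len(reviewers), 1))

        # Each paper keeps its candidate reviewers sorted by similarity, best first.
        cand = {p: sorted(sim[p].items(), key=lambda kv: -kv[1]) for p in papers}

        def priority(p):
            # Modified similarity of the first-column reviewer: the corrections favour
            # papers with few competent reviewers (C1) and papers whose second-best
            # reviewer is much weaker than the best one (C2). Simplified stand-in.
            row = cand[p]
            c1 = reviews_per_paper / len(row)
            c2 = 2 * (row[0][1] - (row[1][1] if len(row) > 1 else 0.0))
            return row[0][1] + c1 + c2

        while True:
            # Group papers by the reviewer currently suggested to them (first column).
            suggested = defaultdict(list)
            for p, row in cand.items():
                if row:
                    suggested[row[0][0]].append(p)

            overloaded = {r: ps for r, ps in suggested.items() if len(ps) > cap}
            if not overloaded:
                break  # every suggested reviewer is within the allowed load

            for r, ps in overloaded.items():
                keep = set(sorted(ps, key=priority, reverse=True)[:cap])
                for p in papers:
                    if p in keep:
                        continue  # r stays in the first column for these papers
                    # Remove r from all other papers: for papers in ps this is the
                    # "shift the row one position to the left" step, for the rest it
                    # is the deletion of r's remaining similarity factors.
                    cand[p] = [(cr, s) for cr, s in cand[p] if cr != r]

        return {p: (cand[p][0][0] if cand[p] else None) for p in papers}

To assign three reviewers to each paper, such a pass would be repeated on the reduced similarity matrix, as described in the text above.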

4. EXPERIMENTAL EVALUATION: ACCURACY


To evaluate the accuracy of the proposed algorithm, it should be compared to a reference algorithm that is
known to provide the best possible solution. As discussed in the “related work” section, this is any algorithm
that guarantees to find the maximum-weighted matching. Fortunately, the weight of a matching can be
precisely measured and calculated by (5), so different assignment algorithms can be objectively compared.
w(M) = Σe∈M w(e)                                                          (5)
where:
w(M) – weight of matching M;
w(e) – weight of the edge e (i.e. a similarity factor between a paper and a reviewer) that belongs to M.

One of the fastest and most commonly implemented assignment algorithms is that proposed by Kuhn and
Munkres [16, 26]. Its optimized version finds the maximum-weighted matching in O(n³). That motivates its
choice as the reference algorithm for this experimental analysis. Both algorithms are implemented in php
– the proposed one by the author of this paper and the reference algorithm by Prof. Miki Hermann
[45]. Prof. Hermann’s implementation is taken from The MyReview System [48] – a great open-
source conference management system. It should be noted that Prof. Hermann’s implementation is used only
for the purpose of this experiment and not for any commercial use.
To achieve a fair and accurate comparison, both algorithms (proposed and reference) should share the
same input data, i.e. the same similarity matrix. Then the accuracy of the proposed algorithm can be easily
determined by comparing the weights of the matchings produced by the two algorithms.

Formally:
Accuracy = (w(M) / w(Mref)) × 100, %                                      (6)

where:
w(M) – the weight of matching produced by the proposed algorithm;
w(Mref) – the weight of matching produced by the reference algorithm (Kuhn and Munkres). This is the
maximum-weighted matching for the corresponding similarity matrix.

Running the experiment just once or twice cannot guarantee the reliability of the results and will not
provide enough data for a subsequent statistical analysis. So a special-purpose software application has been
built (in php) that automates testing and performs as many tests and replications as needed.
On each test, it:
• Generates a new similarity matrix (SM), independent from the previous tests, with a size
specified by the experimenter. Similarity factors in it are fractional numbers within the range
[0.00, 1.00], pseudorandomly generated by the Mersenne Twister algorithm [23]
proposed by Makoto Matsumoto and Takuji Nishimura. The algorithm is implemented by the
built-in php function mt_rand().
• Generates a number (specified as a percentage) of zero similarity factors and places them at
random positions in the similarity matrix.
• Starts the proposed algorithm with the SM matrix as an input and measures the weight of the
output matching.
• Starts the reference algorithm (of Kuhn and Munkres) with the same SM matrix as an input and
measures the weight of the output matching.
• Calculates the accuracy of the proposed algorithm by using equation (6) and stores all data from the
current test/iteration into a file.

Of course, the implemented data generator is not meant to replace tests with real data sets. Such
experiments will be performed as well. However, the data generator allows the experimenter to set up and
precisely control certain aspects of the assignment (for example, the number of papers and reviewers; the percentage
of zero (missing) similarity factors; the number of levels that similarity factors can have; the number of reviewers
per paper; the number of papers per reviewer; etc.) and to evaluate their influence on the accuracy.
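As an illustration only (the author's test harness is written in php), a comparable generator could look as follows in Python; the function name and parameters are hypothetical, and zeros are placed by an independent per-cell draw, which approximates the specified percentage. Python's random module happens to use the same Mersenne Twister generator as php's mt_rand().

    import random

    def generate_sim_matrix(num_papers, num_reviewers, zero_share, levels=None, seed=None):
        """Random similarity matrix: values in [0.00, 1.00], with a given share of zeros."""
        rng = random.Random(seed)
        sm = []
        for _ in range(num_papers):
            row = []
            for _ in range(num_reviewers):
                if rng.random() < zero_share:
                    row.append(0.0)                      # missing / zero similarity
                elif levels:                             # quantized similarity factors
                    row.append(rng.randrange(1, levels + 1) / levels)
                else:
                    row.append(round(rng.random(), 2))   # continuous in [0.00, 1.00]
            sm.append(row)
        return sm

    sm = generate_sim_matrix(num_papers=90, num_reviewers=60, zero_share=0.3, seed=42)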

Experiment E1: To determine the accuracy of the proposed algorithm when assigning 90 papers to 60
reviewers, while each paper is evaluated by 3 reviewers. That is a very common scenario.
One thousand tests (observations) are made for the purpose of this experiment, and for each of them a new
similarity matrix, independent from the previous tests, is generated as an input. That provides a more than
sufficient sample size for a subsequent statistical analysis and lowers the standard error of the mean. The same
similarity matrix is used as an input for both algorithms on each test. Then the algorithms are run one after
another and the accuracy of the proposed one is calculated by using eq. (6). The result is shown in figure 2
in the form of a histogram.
[Figure 2 – histogram of the accuracy over the 1000 tests; x-axis: Accuracy, % (98–100.5); y-axis: Number of tests (observations)]
Figure 2. Accuracy of the proposed algorithm when assigning 90 papers to 60 reviewers.

Two assumptions could be made by looking at the figure:

• The proposed algorithm assigns 60 reviewers to 90 papers with an accuracy of about 99.6% in
comparison to the maximum-weighted matching algorithms, which are guaranteed to find the best
solution.
• The accuracy is normally distributed around the value of 99.6%.
Both assumptions could be checked with a single hypothesis test. Let us state, as a null hypothesis H0,
that the mean accuracy is 99.6.
H0: µ = 99.6
Hα: µ ≠ 99.6
According to the Student’s t-test the null hypothesis cannot be rejected, thus we can accept that the
proposed algorithm indeed achieves about 99.6% of the accuracy of the maximum-weighted matching
algorithms. Moreover, the t-test assumes that the sample mean follows a Student’s t-distribution. As specified
in [12], if the sample size is 151 or more, the t-distribution becomes almost identical to the normal
distribution, which supports the second assumption.
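
For reference, the decision above relies on the standard one-sample t-test statistic (only summary quantities are involved; the individual observations are not reproduced here):

t = (x̄ − µ0) / (s / √n),  with ν = n − 1 degrees of freedom,

where x̄ is the sample mean accuracy, s is the sample standard deviation, n = 1000 is the number of tests and µ0 = 99.6 is the hypothesized mean; H0 is rejected at significance level α only if |t| exceeds the critical value tα/2;ν.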

Experiment E2: To determine if and how the fractional part of the calculated number of papers per
reviewer influences the assignment’s accuracy.
In most cases, the calculated number of papers per reviewer is not an integer but a float value – for
example 4.2. As it is pointless for a reviewer to evaluate just part of a paper (20% in this case), the
calculated number is rounded up (ceiled) to the next integer. In some cases, however, its fractional part could
actually influence the accuracy of the assignment. It all depends on the working mode. The proposed algorithm, in
contrast to the reference one, can provide two types of paper distribution (assignment):
• Uniform distribution – All reviewers evaluate exactly n or n-1 papers, where n is the ceiled
number of papers per reviewer (computed as shown right after this list). If the calculated number of papers
per reviewer is 4.2, then n=5 and most of the reviewers will evaluate 4 papers while just a few reviewers
will evaluate 5 papers.
• Threshold distribution – No reviewer evaluates more than n papers. This sets just an upper limit;
there is no required minimum. Therefore, there may be reviewers evaluating 1 or 2 papers, or even
reviewers without any assigned papers at all. Removing the constraint to assign at least n-1 papers to every
reviewer allows the algorithm to give higher priority to reviewers’ competence and to follow the similarity
factors better. This could increase the overall accuracy of the assignment. If the fractional part of the
calculated number of papers per reviewer is low (but not x.0), the algorithm has many more choices during
the assignment, resulting in higher accuracy. If the fractional part is high (x.9 for example), the assignment
gets closer to a uniform distribution, lowering the accuracy a little bit.
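
For clarity, the ceiled number of papers per reviewer used in both modes is obtained from the number of papers P, the number of reviewers per paper m and the number of reviewers R as:

n = ceil( (P · m) / R )

For example, for 90 papers, 60 reviewers and m = 3 reviewers per paper, P·m/R = 4.5 and therefore n = 5.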
As the reference algorithm of Kuhn and Munkres does not support threshold distribution, this
untypical working mode is not discussed much in this paper. All experiments are conducted under uniform
distribution, so that both algorithms are on an equal footing. In case of a uniform distribution of papers to
reviewers, the fractional part of the calculated number of papers per reviewer should not influence the
accuracy of the assignment as it does in threshold distribution. The aim of this experiment is to check whether this
assumption is true. If it is not, then all other experiments should be planned in a way that the number of
papers per reviewer remains the same for all tests.
Four experiments are conducted, with 1000 tests (observations) in each. The same
number of papers (90) is assigned in each experiment while the number of reviewers varies a little, so that
a different fractional part (from x.1 to x.9) of the calculated number of papers per reviewer is achieved.
The results are summarized in table 1 and figure 3.

Table 1. Results from the experiment that checks how the fractional part of the calculated
number of papers per reviewer influences the accuracy of the assignment

Experiment #                      1                    2                    3                    4
Papers / Reviewers                90 x 66              90 x 60              90 x 55              88 x 66
Reviewers per paper               3                    3                    3                    3
Papers per reviewer               4.09                 4.5                  4.91                 4.0
Accuracy (H0 not rejected)        µ = 99.14            µ = 99.63            µ = 99.78            µ = 99.76
Standard deviation                0.2272               0.1452               0.1570               0.2083
Variance                          0.0516               0.0211               0.0246               0.0434
95% conf. interval (α = 0.05)     [99.1252, 99.1534]   [99.6208, 99.6388]   [99.7739, 99.7933]   [99.7509, 99.7767]

Although the tight non-overlapping confidence intervals reliably suggest that the fractional part of the
calculated number of papers per reviewer does influence the assignment accuracy, an ANOVA
analysis is also performed to prove it statistically. It is done by using the Statistics Toolbox of MATLAB [43].
To avoid typing large data arrays (4 x 1000 observations) by hand, an application programming interface
is built between the implemented special-purpose software for experimental analysis and MATLAB. As
expected, the conducted analysis of variances confirms that the fractional part of the number of papers per
reviewer does influence the accuracy of the assignment.
[Figure 3 – four accuracy histograms; x-axis: Accuracy, % (98–100.5); y-axis: number of tests:
a) 4.09 papers per reviewer; b) 4.5 papers per reviewer; c) 4.91 papers per reviewer; d) 4.0 papers per reviewer]
Figure 3. Accuracy histograms of the proposed algorithm
in respect to a different number of papers and reviewers
Both table 1 and figure 3 clearly show that the fractional part of the calculated number of papers per
reviewer does influence the accuracy of the assignment – not by much (about 1 pp), but it still affects it.
That contradicts the assumption made prior to the experiment. Further analysis shows that the reason for this
influence is the implementation of the algorithm and the way it handles the assignment of multiple reviewers per
paper. To achieve a uniform workload distribution, the algorithm first assigns with a limit of n-1 papers per
reviewer while keeping the number of reviews per paper (m) unchanged. In this case the
algorithm works under an explicit lack of resources, because the number of reviews that should be written is higher
than the number of reviews the reviewers are allowed to make. So it is certain that some papers will have not
m, but m-1 reviewers assigned to them. Then the algorithm increases the number of papers per reviewer to n
and performs an additional pass to assign one more reviewer to the papers left with m-1 reviewers during
the previous pass. Due to the increased number of papers per reviewer, all PC members are considered to be
free and available during the second pass, which gives the algorithm many more choices when assigning the m-
th reviewer. However, papers processed during the previous pass may get less competent m-th reviewers due
to the lack of resources at that time. If the fractional part of the calculated number of papers per reviewer is
low, then most of the papers get their m-th reviewer during the first pass, when the algorithm experiences a lack
of resources. And the opposite – if the fractional part is higher, just a few papers are processed during the first
pass while most of the papers get their m-th reviewer during the second pass, when the algorithm has many
more assignment choices. That explains why the accuracy gets higher when the fractional part is higher. A
sketch of this multi-pass scheme is given below. However, this is not a flaw of the assignment algorithm itself
but of its implementation when assigning multiple reviewers to a single paper on multiple passes. It needs to be
further improved.
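
The sketch below illustrates, in simplified form, the multi-pass scheme just described. The helper functions assignPass(), papersWithLessThan() and rowsFor() are hypothetical names introduced only for this illustration; they are not part of the actual implementation.

<?php
// assignPass($sm, $limit, $needed) is assumed to run the proposed algorithm once,
// assigning at most $limit papers to each reviewer and at most $needed reviewers
// to each paper.
function assignUniformly(array $sm, $m, $n)
{
    // First pass: strict limit of n-1 papers per reviewer while still requesting
    // m reviewers per paper. The capacity is insufficient, so some papers end up
    // with only m-1 reviewers.
    assignPass($sm, $n - 1, $m);

    // Second pass: raise the limit to n and assign exactly one more reviewer to
    // every paper that is still short of its m-th review. All PC members are
    // treated as available again, which gives the algorithm many more choices.
    $shortPapers = papersWithLessThan($m);          // papers having only m-1 reviewers
    assignPass(rowsFor($sm, $shortPapers), $n, 1);  // only the rows of those papers
}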

Experiment E3: To determine if and how the number of submitted papers and registered reviewers
influences the accuracy of the assignment.
Five experiments are conducted, with 1000 tests (observations) in each. Three reviewers
are assigned to each paper and the number of papers per reviewer is 4.5 every time. The only aspect that
varies across the experiments is the number of papers and reviewers. As in the previous experiment,
for each single test a new similarity matrix, independent from the previous tests, is generated as an input.
The results are summarized in table 2 and figure 4.
Table 2. Results from the experiment that checks how
the number of papers and reviewers influences the accuracy of the assignment

Experiment #                      1                    2                    3                    4                    5
Papers / Reviewers                50 x 33              75 x 50              90 x 60              120 x 80             150 x 100
Reviewers per paper               3                    3                    3                    3                    3
Papers per reviewer               4.5                  4.5                  4.5                  4.5                  4.5
Accuracy (H0 not rejected)        µ = 99.49            µ = 99.58            µ = 99.63            µ = 99.69            µ = 99.73
Standard deviation                0.2972               0.1780               0.1452               0.1045               0.0867
Variance                          0.0883               0.0317               0.0211               0.0109               0.0075
95% conf. interval (α = 0.05)     [99.4727, 99.5095]   [99.5648, 99.5868]   [99.6208, 99.6388]   [99.6791, 99.6921]   [99.7239, 99.7347]

The tight non-overlapping confidence intervals reliably suggest that the number of papers and
reviewers does influence the assignment accuracy. However, an ANOVA analysis is also performed to
prove it statistically. As seen in figure 4, increasing the number of papers and reviewers increases the
accuracy a little and, more importantly, significantly reduces the dispersion around the mean. The most
probable reason is that the higher number of reviewers provides more choices during the assignment.
[Figure 4 – five accuracy histograms; x-axis: Accuracy, % (98–100.5); y-axis: number of tests:
a) 50 papers x 33 reviewers; b) 75 papers x 50 reviewers; c) 90 papers x 60 reviewers;
d) 120 papers x 80 reviewers; e) 150 papers x 100 reviewers]

Figure 4. Accuracy histograms of the proposed algorithm in respect to a different number of papers and reviewers


Experiment E4: To determine if and how the number of zero-calculated similarity factors in the
similarity matrix influences the accuracy of the assignment.
The automatic assignment of reviewers to papers is, in general, independent of the way the similarity
factors are calculated. The way they are calculated (and the similarity measures used for that) is most
commonly determined by the chosen method of describing papers and reviewers’ competences. An analysis
of the existing methods could be found in [14]. If, for example, papers and reviewers’ competences are
described by a taxonomy of keywords, the number of zero-calculated similarity factors will be much lower than if
papers and reviewers are described by a predefined unordered set of keywords or by explicit preferences. Thus,
depending on the chosen method of describing papers and reviewers’ competences, the number of zero-
calculated similarity factors within the similarity matrix can vary significantly. Common sense suggests that
increasing the number of zero-calculated similarity factors may decrease the overall accuracy. The aim of
this experiment is to verify this assumption and to find the extent of its influence.
Four experiments are performed, each having a thousand tests (observations). A new similarity matrix,
independent from the previous tests, is generated as an input for every single test. 90 papers are
assigned to 60 reviewers every time and the number of reviewers per paper is always 3. The only aspect that
varies between the experiments is the amount of zero-calculated similarity factors. They are
produced by an additional data (zero) generator that, for every single test, modifies the similarity matrix by
placing the specified amount of zeros at random positions. Randomization is again provided by the
Mersenne Twister algorithm [23]. Furthermore, to guarantee that the specified minimal amount of zeros is
really achieved, placing a zero at the same position in the matrix more than once is not allowed. The results
are summarized in table 3 and figure 5.
Table 3. Results from the experiment that checks how the number
of zero-calculated similarity factors influences the accuracy of the assignment

Experiment #                          1                    2                    3                    4
Papers / Reviewers                    90 x 60 x 3          90 x 60 x 3          90 x 60 x 3          90 x 60 x 3
Minimal amount of zeros in the SM     no minimal value     25%                  50%                  75%
Accuracy (H0 not rejected)            µ = 99.63            µ = 99.53            µ = 99.36            µ = 99.23
Standard deviation                    0.1452               0.1745               0.2524               0.3691
Variance                              0.0211               0.0305               0.0637               0.1362
95% conf. interval (α = 0.05)         [99.6208, 99.6388]   [99.5208, 99.5424]   [99.3414, 99.3728]   [99.2072, 99.2530]

An ANOVA analysis is not performed this time, because its result is quite clear as the confidence
intervals do not overlap. As expected, the amount of zero-calculated similarity factors does influence the
accuracy of the assignment. However, it affects mostly the dispersion around the mean rather than the mean
itself. That is clearly illustrated by the box-and-whisker (quartile) diagram in figure 5.
[Figure 5 – box-and-whisker diagram; y-axis: Accuracy, % (98–100); categories: no minimal value, 25% zeros, 50% zeros, 75% zeros]
Figure 5. Box and whisker diagram (quartiles diagram) showing how the amount
of zero-calculated similarity factors influences the accuracy of the assignment.

5. EXPERIMENTAL EVALUATION: TIME COMPLEXITY


This experiment has three main objectives:
• To verify the theoretically formulated time complexity of Θ(n²). Note that it is valid only when the
algorithm is implemented in an interpreted language and sorting is done by the built-in sort
function. Otherwise (in general) the time complexity may get a bit higher – O(n·m·log₂(m)) – which is still
much better than that of the exhaustive search (maximum-weighted matching) algorithms.
• To implicitly prove that the assumption stated in both the “related work” and “algorithm
description” parts is correct. It says that when sorting an array of fewer than five million elements in
PHP with the built-in sort() function, while all the rest of the code is written in PHP, the
logarithmic component in the sort’s time complexity O(n·log₂(n)) could be omitted. The built-in
sort() function is implemented as a part of the interpreter itself and runs at the speed of its
compiled machine code, not at the speed of the interpreted high-level language. This speed
difference compensates or even exceeds the value of the logarithmic component when the size of
the array is less than five million. Note that this assumption is applied to all reviewed algorithms,
not just to the proposed one, so they all stand on an equal footing.
• To provide a direct comparison between the execution times of the proposed algorithm and the
maximum-weighted matching algorithm of Kuhn and Munkres. Note that the most efficient implementation
of the Kuhn and Munkres algorithm has a time complexity of O(n³). As observed in all
experiments, the implementation by Prof. Miki Hermann [45] is well done and fully complies with
this complexity.
In general, the execution time depends on both the number of submitted papers and the number of
registered reviewers. However, to keep the reviewers’ workload in a reasonable range, the number of reviewers
should be tightly correlated to the number of papers. So a simple (single independent variable) regression
analysis could be used to determine how exactly the execution time relates to the number of papers.
14 tests (assignments) are performed for the purpose of this experiment. A new similarity matrix,
independent from the previous tests, is generated for every single test. Three reviewers are assigned to each
paper every time and the number of papers per reviewer is five. The only aspects that vary between the
tests are the numbers of papers and reviewers. Furthermore, the number of reviewers is
calculated automatically from the number of papers so that the constraints that have
just been mentioned are satisfied. The number of papers varies from 100 to 750 with a step of 50 (table 4). To reduce the
influence of the so-called random or uncontrollable factors (variables), each test is replicated 3 times and a
new similarity matrix is generated for each replication. In the context of paper assignment, there are two
major uncontrollable factors:
• The system’s current load. It is obvious – a higher load (for example, number of users) results in a higher
response time.
• The uniformity of distribution of papers and reviewers over conference topics and keywords. If every
reviewer in the first column of the matrix is suggested about “papers per reviewer” unique papers,
the algorithm will work faster. And the opposite – if there are many papers
that could be best evaluated by just a few reviewers, the algorithm will work a bit slower
due to the increased number of iterations through the outermost do-while cycle.
To perform a valid and correct regression analysis, all experimental variances should be homogeneous
(similar). The homogeneity check is done by the Cochran’s C-test [34]. It confirms that all variances in table
4 are homogeneous, so the subsequent regression analysis is valid.
Preliminary theoretical considerations suggest that the most suitable regression model should be a second-order
polynomial:

ŷ = b0 + b1·x + b2·x²     (7)
where:
ŷ – execution time of the algorithm, predicted by the model.
x – number of submitted papers.
b0, b1 and b2 – regression model parameters.
The model parameters could be determined by the method of least squares. In the current
experiment, however, they are calculated by using the regress() function from MATLAB’s Statistics Toolbox. They
are as follows:
b0 = 0.115, b1 = -0.0016 and b2 = 0.000021.
So the regression model is:

ŷ = 0.115 − 0.0016·x + 0.000021·x²     (8)
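
For completeness, the least-squares estimate mentioned above can be written in its usual matrix form (mathematically equivalent to what regress() returns for a full-rank design matrix):

b = (XᵀX)⁻¹ Xᵀ y,   where each row of X is [1  xj  xj²]

Here y collects the 42 measured execution times (14 tests × 3 replications) and xj is the corresponding number of papers.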
The meaning of the regression parameters will be discussed later, but before that it is mandatory to check
whether the model itself is appropriate (fits the measurements well).
As all tests are performed with replication, i.e. multiple independent tests are conducted for every
single value of x, the total error could be divided into two components – error due to the lack of model fit
and error due to data/test replication (sometimes called “pure” error). The model is considered
appropriate if its error, represented by the “lack-of-fit variance”, does not exceed the data error, represented
by the “data replication variance” [24, 25]. The goodness of fit could then be determined by comparing these
two variances with Fisher’s F-test [20]. The value of F calculated from the data in table 4, for a
significance level α=0.05 and degrees of freedom d1=11 and d2=28, is F=0.60893. As it is lower than the
critical value Fα=0.05;11;28=2.15, the second-order polynomial regression model (7) is valid. So the
experiment confirms that the time complexity of the algorithm is Θ(n²).
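
In compact form, the test statistic used here is the lack-of-fit F-ratio:

F = s²lof / s²pe = 0.02535 / 0.04163 ≈ 0.609

where s²lof is the lack-of-fit variance (d1 = 11 degrees of freedom) and s²pe is the data replication (“pure error”) variance (d2 = 28 degrees of freedom) from table 4; since F is below the critical value F0.05;11;28 = 2.15, the quadratic model cannot be rejected.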
The goodness of fit could also be confirmed visually in figure 6. The green line represents the curve
drawn by the regression model itself, while the circles represent the real time measurements. Note how well
the real data points lie on the model curve. That visually confirms the model is valid.
Table 4. Results from the experimental evaluation of the algorithm’s time complexity
(y1, y2, y3 – execution times of the three replications, in seconds; the remaining columns are explained below the table)

Exp.#  Papers  Reviewers  Reviewers/paper  Papers/reviewer  y1  y2  y3  ȳj  sj2  ŷ  εj2
1 100 60 3 5 0.1205 0.1551 0.1267 0.1341 0.0003 0.165 0.00095
2 150 90 3 5 0.3271 0.3896 0.3571 0.3579 0.001 0.3475 0.00011
3 200 120 3 5 0.6329 0.6077 0.6922 0.6443 0.0019 0.635 0.00009
4 250 150 3 5 1.0695 1.0288 1.1224 1.0736 0.0022 1.0275 0.00213
5 300 180 3 5 1.4955 1.6685 1.55 1.5713 0.0078 1.525 0.00214
6 350 210 3 5 2.2382 2.0456 2.147 2.1436 0.0093 2.1275 0.00026
7 400 240 3 5 2.9837 2.9017 2.7625 2.8826 0.0125 2.835 0.00227
8 450 270 3 5 3.5253 3.8046 3.6939 3.6746 0.0198 3.6475 0.00073
9 500 300 3 5 4.375 4.7027 4.5876 4.5551 0.0276 4.565 0.00010
10 550 330 3 5 5.6495 5.8733 5.5252 5.6827 0.0311 5.5875 0.00906
11 600 360 3 5 7.0701 6.7043 6.845 6.8731 0.034 6.715 0.02500
12 650 390 3 5 7.9953 8.2588 7.7514 8.0018 0.0644 7.9475 0.00295
13 700 420 3 5 9.6941 8.8369 9.3406 9.2905 0.1856 9.285 0.00003
14 750 450 3 5 11.3181 10.4739 11.0418 10.9446 0.1853 10.7275 0.04713

Cochran’s C = 0.3185;  Cα=0.05;14;2 = 0.3924
Lack-of-fit variance = 0.02535 (11 degrees of freedom)
Data replication variance = 0.04163 (28 degrees of freedom)
F = 0.60893;  F0.05;11;28 = 2.15

yi (i = 1..3) – execution time of the i-th replication of the j-th experiment
ȳj – average execution time of the j-th experiment
sj2 – variance of the j-th experiment
εj2 – squared residual (difference between the real average and the value estimated by the model) of the j-th experiment
[Figure 6 – scatter plot; x-axis: Number of papers (100–800); y-axis: Execution time, seconds (0–12); circles: real time measurements; line: regression model]
Figure 6. Dependence of the total execution time (including the initial sorting of the matrix)
of the proposed algorithm on the number of submitted papers.
Visual evaluation of the model’s goodness of fit.

In general, the regression parameters bi are strongly dependent on the hardware performance. Although
the current experiment is conducted on an old laptop computer, their low values mean a low amount of time
needed for processing a single paper. The coefficient b0 represents the total influence of the so-called
uncontrollable/random factors (variables) and consists mostly of the execution times of the operations
performed roughly the same number of times regardless of the size of the input data. b1 is mostly influenced
by: forming the papersOfReviewer[] array; calculating the similarity factors’ corrections; shifting rows
one position to the left; and assigning reviewers from the first column of the matrix to the corresponding
papers. The initial sorting of the matrix and deleting similarity factors from it have the highest contribution to b2.
In all previous experiments, the proposed algorithm has been compared in terms of accuracy to the
maximum-weighted matching algorithm of Kuhn and Munkres. As the latter is one of the fastest and most
efficient maximum-weighted matching algorithms, the comparison continues, but now in terms of running
time. Table 5 shows the execution times of both algorithms. As the number of papers and reviewers is the
same as in table 4, the execution time of the proposed algorithm is taken from there (as an average of all 3
replications for every test). The implementation of the Kuhn and Munkres algorithm is run without any
replication, because that would take an enormous amount of time.
Table 5. Comparison, in terms of execution time, between the proposed algorithm
and the maximum-weighted matching algorithm of Kuhn and Munkres

Number of papers | Number of reviewers | Reviewers per paper | Papers per reviewer | Execution time of the suggested algorithm, s (from table 4) | Execution time of the Kuhn and Munkres algorithm, s (implemented by Miki Hermann)
100 60 3 5 0.1341 13.3039
150 90 3 5 0.3579 50.365
200 120 3 5 0.6443 114.2994
250 150 3 5 1.0736 221.3307
300 180 3 5 1.5713 433.6833
350 210 3 5 2.1436 774.7328
400 240 3 5 2.8826 1021.1613
450 270 3 5 3.6746 1506.1925
500 300 3 5 4.5551 2247.9327
550 330 3 5 5.6827 3200.2958
600 360 3 5 6.8731 3808.8643
650 390 3 5 8.0018 5127.3992
700 420 3 5 9.2905 6297.5176
750 450 3 5 10.9446 8362.3662

All experiments are performed on an old laptop computer – CPU: Dual Core Intel Celeron T 3000 (1.8
GHz); RAM: 4 GB (DDR2); OS: Windows 7 Ultimate 32 bit; Web server: Apache 2.2.24 with PHP 5.4.13.
As known, the most efficient implementation of the Kuhn and Munkres algorithm has a time
complexity of O(n³). It can be noted from table 5 that the execution time is indeed a cubic function of the
number of papers. Therefore, the implementation of the algorithm by Miki Hermann and Philippe Rigaux is
good enough and does not increase the previously known time complexity of this algorithm.

6. APPLYING THE PROPOSED ALGORITHM ON REAL DATA SETS


The developed data generator proved to be quite useful for the experimental study of the assignment
algorithm and of the individual aspects (factors) that influence its accuracy. However, testing the algorithm in a
real work environment is also important to prove its ability to work in "off-the-lab" conditions.
The proposed algorithm has been used to assign papers to reviewers in all CompSysTech [38]
conferences from 2007 to 2018, and in ADBIS 2007 [37] as well. Table 6 shows the results of its work for
CompSysTech 2010-2018. Data from previous editions are missing because in 2010 the architecture of the
CompSysTech conference management system was significantly changed and the implementation of
the Kuhn and Munkres algorithm is integrated into the newer version only.
The third value (_x_x 3) in the “Papers/Reviewers” column means that 3 reviewers are assigned to
each paper. The value in brackets is the calculated number of papers per reviewer.
Table 6. Evaluation of the proposed algorithm with real data sets

Conference name    Papers / Reviewers      Kuhn and Munkres algorithm    Suggested algorithm       Accuracy of the
                                           weight      time              weight      time          suggested algorithm
CompSysTech’18     75 x 73 x 3 (3.08)      177.25      5.8727 s          174.71      0.0448 s      98.6 %
CompSysTech’17     107 x 77 x 3 (4.17)     261.32      22.1504 s         256.51      0.0677 s      98.2 %
CompSysTech’16     117 x 75 x 3 (4.68)     291.61      27.4821 s         290.84      0.0902 s      99.7 %
CompSysTech’15     103 x 75 x 3 (4.12)     256.87      19.5148 s         249.58      0.0791 s      97.1 %
CompSysTech’14     107 x 66 x 3 (4.86)     254.35      23.2076 s         253.37      0.0812 s      99.6 %
CompSysTech’13     89 x 73 x 3 (3.66)      227.18      10.2308 s         221.72      0.0713 s      97.6 %
CompSysTech’12     94 x 87 x 3 (3.24)      244.40      12.9081 s         240.74      0.0761 s      98.5 %
CompSysTech’11     183 x 98 x 3 (5.60)     474.75      121.5543 s        470.20      0.2378 s      99 %
CompSysTech’10     134 x 77 x 3 (5.22)     296.17      52.0996 s         289.55      0.0965 s      97.8 %

Besides the fact that the accuracy is within the expected range, the significant difference in the execution time
of the proposed algorithm with respect to the reference one is quite noticeable. For example, for
CompSysTech’11 the algorithm of Kuhn and Munkres assigns 183 papers to 98 reviewers in more than 121
seconds, while the proposed algorithm does the same in just 0.24 seconds. Furthermore, this significant
speed improvement comes at the expense of just 1% loss of accuracy.
Again, all experiments are performed on the same laptop computer – CPU: Dual Core Intel Celeron T
3000 (1.8 GHz); RAM: 4 GB (DDR2); OS: Windows 7 Ultimate 32 bit; Web server: Apache 2.2.24 with
PHP 5.4.13.

7. ACHIEVING HIGHER ACCURACY. ITERATIVE AND INTERACTIVE ASSIGNMENT.


The automatic assignment of reviewers to papers could be implemented as:
• Fully automatic. The PC chair just sets the number of reviewers per paper, while all other
settings are calculated automatically. The conference management system suggests the specified number of
reviewers for all of the submitted papers at once.
• Semi-automatic (interactive). Again, the conference management system suggests the
reviewers to be assigned to each paper, but the entire process runs interactively in multiple
steps and the PC chair has greater control over it.
Semi-automatic assignment could be quite useful when papers and reviewers’ competences are
described in multiple ways rather than by a single method. For example, they may be described by a
taxonomy of keywords, but reviewers could also be allowed to state their willingness to review (or not)
specific papers. Combining different methods of describing papers and competences usually increases the
accuracy of the calculated similarity factors, but it could also cause some “fairness” issues. For example,
there may be papers evaluated only by reviewers who have intentionally chosen them (while there is no way
to guarantee that the choice is based on professional interests rather than personal ones). Or there could be
reviewers who evaluate only papers they have chosen themselves. To avoid that, the semi-automatic
assignment process provides a good way to balance between the similarity factors calculated on the basis of
keyword matching and the preferences stated explicitly by the reviewers. In this case, the assignment could be
implemented in multiple steps. Initially the “reviewers per paper” and “papers per reviewer” settings could
be set to 1/2 to 2/3 of their target values and the assignment performed based only on the reviewers’
preferences stating an expert or high level of competence. Then, in a second step, the assignment algorithm
could take into account all types of preferences together with the similarity factors calculated on the basis of
topic matching. During the second step, the “reviewers per paper” (m) and “papers per reviewer” (n) settings
should be set to 100% of their target values. A sketch of this scenario is given below.
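
The fragment below sketches this two-step scenario. The helper functions assignIteratively(), preferenceOnlySimilarities() and combinedSimilarities(), as well as the variables $m and $n (the target “reviewers per paper” and “papers per reviewer” values), are hypothetical names used only for illustration; thanks to the iterative mode, the second call keeps the assignments made by the first one.

// Step 1: about half of the target settings, using only similarity factors
// derived from explicit "expert"/"high" willingness statements.
assignIteratively(preferenceOnlySimilarities(), (int)ceil($m / 2), (int)ceil($n / 2));

// Step 2: full target values, using the combined similarity factors
// (topic/keyword match together with all preference levels).
assignIteratively(combinedSimilarities(), $m, $n);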
To allow such an interaction, however, the algorithm should support iterative (incremental)
assignment – i.e. it should be able to run multiple times without deleting the assignments made during the previous
run. The iterative assignment is also very helpful when PC chairs need to reassign some papers to other
reviewers. As describing papers and competences involves subjective human activities, like self-
classification and self-assessment, there is always a chance that authors select improper keywords and a
paper goes to a reviewer who is not an expert in the respective subject domain. If that happens, the reviewer
may decline to evaluate the paper, so it should be reassigned to another reviewer. Usually these
reassignments are done manually. But if their number increases, it is easier and more accurate to do them
automatically (if the algorithm supports it). Greedy and heuristic algorithms in general, including the one
proposed here, support iterative assignment, while maximum-weighted matching algorithms do not.
As discussed in experiment E2, the suggested algorithm supports two working modes: uniform
distribution – every reviewer evaluates exactly n or n-1 papers; and threshold distribution – no reviewer is
assigned more than n papers. As the reference algorithm supports uniform distribution only, all
experiments are conducted in this mode. However, removing the constraint on the minimal number of assigned
papers per reviewer allows the algorithm to give higher priority to accuracy rather than to load balancing. As a
result, the overall assignment accuracy increases, and in threshold distribution mode it can get even higher
than the accuracy of the reference algorithm (which does not support this mode).

8. CONCLUSION
The manual assignment of reviewers to papers is applicable only to conferences with a low number
of submitted papers and registered reviewers. It assumes that the PC chairs get familiar with the papers’ content
and the reviewers’ competences and assign them in a way that every paper is evaluated by reviewers highly competent in
its subject domain, while maintaining a load balancing of the PC members so that all reviewers
evaluate roughly the same number of papers. In this sense, the assignment is a typical example of a classical
optimization task where limited resources (reviewers) should be distributed among a number of consumers
(papers), balancing between accuracy and uniformity of the workload distribution.
In case of a large number of papers and reviewers, the manual assignment becomes extremely difficult
and time-consuming and usually leads to a dramatic decrease in the assignment accuracy. That is why every
conference management system should offer automatic assignment as well. As an optimization problem,
it could be solved by any of the existing exhaustive search or heuristic algorithms. The relatively high time
complexity of O(n³) of the exhaustive search algorithms makes them not quite suitable for implementation in
web applications in case of a large number of papers and reviewers. The existing heuristic algorithms, on the
other hand, are all designed as greedy algorithms, assigning the most competent reviewers to the papers
processed first and almost random reviewers to the papers processed last.
The heuristic assignment algorithm proposed in this article achieves an accuracy of about 98-99% in
comparison to the maximum-weighted matching (the most accurate / exhaustive search) algorithms, but has a
better time complexity of Θ(n²). It provides a uniform distribution of papers to reviewers (i.e. all reviewers
evaluate roughly the same number of papers); guarantees that if there is at least one reviewer competent to evaluate a
paper, then the paper will have a reviewer assigned to it; and allows iterative and interactive execution that
could further increase the accuracy and enables subsequent reassignments. Both the accuracy and the time
complexity are experimentally confirmed by performing a large number of experiments, followed by proper
statistical analyses.
Although the suggested algorithm is designed to be used in conference management systems, it is
universal and could be successfully applied in other subject domains where assignment or matching is
necessary – for example, assigning resources to consumers or tasks to persons, matching men and women on
dating web sites, grouping documents in digital libraries and others.
COMPLIANCE WITH ETHICAL STANDARDS
Conflict of Interest: The author declares that he has no conflict of interest.
Funding: There has been no financial support for this work that could have influenced its outcome.

REFERENCES
1. Blei, David M., Ng, Andrew Y., Jordan, Michael I., and Lafferty, John. Latent dirichlet allocation.
Journal of Machine Learning Research, 3:2003, 2003.
2. Cechlárová K, Fleiner T, Potpinková E. Assigning evaluators to research grant applications: the case
of Slovak Research and Development Agency. Scientometrics, Volume 99, Issue 2, pp. 495-506,
Springer Netherlands, May 2014, Print ISSN 0138-9130, Online ISSN 1588-2861
3. Charlin, Laurent, and Richard Zemel. "The Toronto paper matching system: an automated paper-
reviewer assignment system." (2013).
4. Charlin, L., Zemel, R. and Boutilier, C. A framework for optimizing paper matching. In Proceedings
of the 27th Annual Conference on Uncertainty in Artificial Intelligence (Corvallis, OR, 2011). AUAI
Press, 86–95.
5. Conry, Don, Yehuda Koren, and Naren Ramakrishnan. "Recommender systems for the conference
paper assignment problem." In Proceedings of the third ACM conference on Recommender systems,
pp. 357-360. 2009.
6. Cormen, T. H., Leiserson, C., Rivest, R., Stein, C. Introduction to Algorithms (2nd ed.). MIT Press
and McGraw-Hill. ISBN 0-262-03293-7, 2001.
7. Dice, Lee R. (1945). "Measures of the Amount of Ecologic Association Between Species". Ecology.
26 (3): 297–302. doi:10.2307/1932409. JSTOR 1932409.
8. Dinic, E. A. Algorithm for solution of a problem of maximum flow in a network with power
estimation. Soviet Math. Dokl., 11(5): 1277–1280, 1970.
9. Edmonds, Jack; Karp, Richard M. "Theoretical improvements in algorithmic efficiency for network
flow problems". Journal of the ACM (Association for Computing Machinery) 19(2): 248–264, 1972.
doi:10.1145/321694.321699
10. Ferilli S., N. Di Mauro, T.M.A. Basile, F. Esposito, M. Biba. Automatic Topics Identification for
Reviewer Assignment. 19th International Conference on Industrial, Engineering and Other
Applications of Applied Intelligent Systems, IEA/AIE 2006. Springer LNCS, 2006, pp. 721-730.
11. Jaccard, Paul (1912), "The Distribution of the flora in the alpine zone", New Phytologist, 11: 37–50,
doi:10.1111/j.1469-8137.1912.tb05611.x
12. Kalinov, K. Practical Statistics for Social Sciences, Archeologists and Anthropologists. New
Bulgarian University, Sofia, 2002 (in Bulgarian).
13. Kalmukov, Y. An algorithm for automatic assignment of reviewers to papers. Proceedings of the
International Conference on Computer Systems and Technologies CompSysTech’06, Ruse, 2006,
pp. V.5-1-V.5-7.
14. Kalmukov, Y. Describing Papers and Reviewers’ Competences by Taxonomy of Keywords.
Computer Science and Information Systems, vol. 9, no. 2, 2012, pp. 763-789, ISSN 1820-0214.
15. Kou, Ngai Meng, Leong Hou U, Nikos Mamoulis, and Zhiguo Gong. "Weighted coverage based
reviewer assignment." In Proceedings of the 2015 ACM SIGMOD international conference on
management of data, pp. 2031-2046. 2015.
16. Kuhn, Harold W. "The Hungarian Method for the assignment problem". Naval Research Logistics
Quarterly. 1955:(2), pp. 83–97.
17. Lawler, E. L. Combinatorial Optimization: Networks and Matroids. Holt, Rinehart, Winston,
Newyork, 1976
18. Li, Xinlian, and Toyohide Watanabe. "Automatic paper-to-reviewer assignment, based on the
matching degree of the reviewers." Procedia Computer Science 22 (2013): 633-642.
19. Liu, Xiang, Torsten Suel, and Nasir Memon. "A robust model for paper reviewer assignment." In
Proceedings of the 8th ACM Conference on Recommender systems, pp. 25-32. 2014.
20. Lomax, Richard G. (2007). Statistical Concepts: A Second Course. p. 10. ISBN 0-8058-5850-4
21. Long, Cheng, Raymond Chi-Wing Wong, Yu Peng, and Liangliang Ye. "On good and fair paper-
reviewer assignment." In 2013 IEEE 13th International Conference on Data Mining, pp. 1145-1150.
IEEE, 2013.
22. Lowik, P. Comparative analysis between PHP’s native sort function and quicksort implementation in
PHP. http://stackoverflow.com/a/1282757, August 2009.
23. Matsumoto, M., Nishimura, T. "Mersenne twister: a 623-dimensionally equidistributed uniform
pseudo-random number generator." ACM Transactions on Modeling and Computer Simulation
(TOMACS), 8(1), 1998, pp. 3-30.
24. Mitkov., A. Theory of the Experiment. “Library for PhD Students” Series. Ruse, 2010, ISBN: 978-
954-712-474-5 (in Bulgarian)
25. Mitkov, A., Minkov, D. Methods for Statistical Analysis and Optimization of Agriculture machinery
– 2-nd part. Zemizdat Publishing House, Sofia 1993, ISBN: 954-05-0253-5 (in Bulgarian).
26. Munkres, J. "Algorithms for the Assignment and Transportation Problems". Journal of the Society
for Industrial and Applied Mathematics. 5:1, 1957, pp. 32–38.
27. Nguyen, Jennifer, Germán Sánchez-Hernández, Núria Agell, Xari Rovira, and Cecilio Angulo. "A
decision support tool using Order Weighted Averaging for conference review assignment." Pattern
Recognition Letters 105 (2018): 114-120.
28. Pesenhofer A., R. Mayer, A. Rauber. Improving Scientific Conferences by enhancing Conference
Management System with information mining capabilities. Proceedings IEEE International
Conference on Digital Information Management (ICDIM 2006), ISBN: 1-4244-0682-x; S. 359 - 366.
29. Price, Simon, and Peter A. Flach. "Computational support for academic peer review: A perspective
from artificial intelligence." Communications of the ACM 60, no. 3 (2017): 70-79.
30. Rigaux, Ph. An Iterative Rating Method: Application to Web-based Conference Management.
Proceedings of the 2004 ACM Symposium on Applied Computing (SAC'04), ACM Press N.Y., pp.
1682 – 1687, ISBN 1-58113-812-1.
31. Rodriguez M., J. Bollen. An Algorithm to Determine Peer-Reviewers. Conference on Information
and Knowledge Management (CIKM 2008), ACM Press, pp. 319-328.
32. Rosen-Zvi, Michal, Thomas Griffiths, Mark Steyvers, and Padhraic Smyth. "The author-topic model
for authors and documents." arXiv preprint arXiv:1207.4169 (2012).
33. Taylor, Camillo J. "On the optimal assignment of conference papers to reviewers." (2008).
34. W.G. Cochran, The distribution of the largest of a set of estimated variances as a fraction of their
total, Annals of Human Genetics (London) 11(1), 47–52 (January 1941)
35. Zhai, ChengXiang, Atulya Velivelli, and Bei Yu. "A cross-collection mixture model for comparative
text mining." In Proceedings of the tenth ACM SIGKDD international conference on Knowledge
discovery and data mining, pp. 743-748. 2004.
36. Zubarev, Denis, Dmitry Devyatkin, Ilia Sochenkov, Ilia Tikhomirov, and Oleg Grigoriev. "Expert
Assignment Method Based on Similar Document Retrieval." In Data Analytics and Management in
Data Intensive Domains: XXI International Conference DAMDID/RCDL'2019 (October 15–18,
2019, Kazan, Russia): Conference Proceedings. Edited by Alexander Elizarov, Boris Novikov,
Sergey Stupnikov. Kazan: Kazan Federal University, 2019. 496 p., p. 339.
37. ADBIS 2007, International Conference on Advances in Databases and Information Systems,
http://www.adbis.org/
38. CompSysTech, International Conference on Computer Systems and Technologies,
http://www.compsystech.org/
39. CyberChair, A Web-based Paper Submission & Review System,
http://www.borbala.com/cyberchair/
40. EasyChair, conference management system, http://www.easychair.org/
41. EDAS: Editor’s Assistant, conference management system, http://edas.info/
42. Halevi, Shai. Web Submission and Review Software, http://people.csail.mit.edu/shaih/websubrev/
43. MathWorks. MATLAB – Statistics Toolbox. http://www.mathworks.com/products/statistics/
44. Microsoft Conference Management Toolkit, https://cmt3.research.microsoft.com/About
45. Miki Hermann, Professor in Algorithms and Complexity. http://www.lix.polytechnique.fr/~hermann/
46. OpenConf Conference Management System, http://www.openconf.com/
47. Philippe Rigaux, http://deptinfo.cnam.fr/~rigaux/
48. The MyReview System, a web-based conference management system,
http://myreview.sourceforge.net/ (Accessed January 2017. Unavailable now)
49. van de Stadt, R., CyberChair: A Web-Based Groupware Application to Facilitate the Paper Reviewing
Process. 2001, Available at www.cyberchair.org.
APPENDIX 1: AN EXAMPLE
Think of a hypothetical conference. Let us assume there are 5 submitted papers and 5 registered
reviewers. Each paper should be evaluated by 2 reviewers, so every reviewer should evaluate exactly (5*2)/5
= 2.0 papers.
Let the similarity matrix be as follows. Rows represent papers and columns represent reviewers.
0.60 0.53 0.47 0.40 0.55
0.89 0.25 0.50 0.65 0.80
0.50 0.55 0.37 0.57 0.60
0.53 0 0 0.40 0.50
0.50 0 0 0.33 0.25
At the beginning, the algorithm sorts each row by the similarity factor in descending order. As a result,
the first column suggests the most competent reviewer to every single paper.
0.60 0.55 0.53 0.47 0.40
0.89 0.80 0.65 0.50 0.25
0.60 0.57 0.55 0.50 0.37
0.53 0.50 0.40 0 0
0.50 0.33 0.25 0 0

The first column suggests that reviewer r1 should be assigned to 4 papers (p1, p2, p4 and p5).
However, the maximum allowed number of papers per reviewer is 2, i.e. nobody should review more than two
papers. So the algorithm has to decide which 2 of these 4 papers to assign to r1. At a glance it seems logical
that r1 should be assigned to p1 and p2, as they have the highest similarity factors with him/her. On the other
hand, there are fewer reviewers competent to evaluate p4 and p5, so they should be processed with priority. If r1
has to be assigned to just one of p4 and p5, which one is more suitable? One may say p4, as it has a higher similarity
factor. However, the second-suggested reviewer of p4 is almost as competent as r1, while the second-suggested
reviewer of p5 is much less competent in it than r1 is. In this case it is better to assign r1 to p5 rather than to p4.
If r1 is assigned to p4, then p5 will be evaluated by less competent reviewers only – a situation that is highly
undesirable. So when deciding which papers to assign to a specific reviewer, the algorithm should take into
account both the number of competent reviewers for each paper and the rate of decrease in the
competence of the next-suggested reviewers for those papers. To automate the process, the algorithm modifies
the similarity factors from the first column by adding two corrections – C1 and C2, calculated by
formulas 3 and 4. C1 takes into account the number of non-zero similarity factors of pi (i.e. the number of
reviewers competent to evaluate pi), while C2 depends on the rate of decrease in the competence of the next-
suggested reviewers for pi.
The specific values of C1 and C2 are as follows:
C1(p1, r1) = 0.0625     C2(p1, r1) = 2 * 0.05 = 0.10
C1(p2, r1) = 0.0625     C2(p2, r1) = 2 * 0.09 = 0.18
C1(p3, r5) = 0.0625     C2(p3, r5) = 2 * 0.03 = 0.06
C1(p4, r1) = 0.25       C2(p4, r1) = 2 * 0.03 = 0.06
C1(p5, r1) = 0.25       C2(p5, r1) = 2 * 0.17 = 0.34

To preserve the real weight of the matching, the similarity factors are modified in an auxiliary data
structure (an ordinary array) rather than in the matrix itself. Here is the first column stored in a single-dimension
array:

0.60 0.89 0.60 0.53 0.50

After adding C1 and C2, the first column of the matrix will look like:

0.76 1.13 0.72 0.84 1.09


As the number of papers per reviewer is 2, r1 is assigned to the 2 papers which have the highest
similarity factors with him/her after modification. These are p2 and p5. A small code sketch of this step is given below.
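
The following PHP fragment reproduces this step of the example. The correction values are taken directly from the list above (in the algorithm itself they are produced by formulas 3 and 4 from the main text); the fragment is only an illustration of the selection logic.

<?php
$firstColumn = array('p1' => 0.60, 'p2' => 0.89, 'p3' => 0.60, 'p4' => 0.53, 'p5' => 0.50);
$c1 = array('p1' => 0.0625, 'p2' => 0.0625, 'p3' => 0.0625, 'p4' => 0.25, 'p5' => 0.25);
$c2 = array('p1' => 0.10,   'p2' => 0.18,   'p3' => 0.06,   'p4' => 0.06, 'p5' => 0.34);

// Modified first column: sim + C1 + C2 (≈ 0.76, 1.13, 0.72, 0.84, 1.09 as shown above).
$modified = array();
foreach ($firstColumn as $paper => $sim) {
    $modified[$paper] = $sim + $c1[$paper] + $c2[$paper];
}

// r1 is suggested to p1, p2, p4 and p5; it keeps the 2 papers with the highest
// modified factors among them.
$scores = array();
foreach (array('p1', 'p2', 'p4', 'p5') as $paper) {
    $scores[$paper] = $modified[$paper];
}
arsort($scores);                                           // sort by value, descending
$assignedToR1 = array_slice(array_keys($scores), 0, 2);    // array('p2', 'p5')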
The rows corresponding to p1 and p4 in the similarity matrix are shifted one position to the left, so that the
next most competent reviewers are suggested for these papers. As reviewer r1 has already got the maximum allowed
number of papers to review, he/she is considered busy and no more papers should be assigned to him/her
in the future. Thus all similarity factors between r1 and any paper, outside the first column, are deleted from the
matrix. The deletion guarantees that he/she will not be assigned to any more papers.

r1 →  0.60   0.55   0.53   0.47   0.40
      0.89   0.80   0.65   0.50   0.25
      0.60   0.57   0.55   0.50   0.37
r1 →  0.53   0.50   0.40   0      0
      0.50   0.33   0.25   0      0

(the rows marked with r1 are those currently headed by r1 and scheduled for shifting)
After shifting p1 and p4 one position left and deleting all occurrences of r1 outside the first column of the
matrix, it will look like:
0.55 0.53 0.47 0.40 0
0.89 0.80 0.65 0.50 0.25
0.60 0.57 0.55 0.37 0
0.50 0.40 0 0 0
0.50 0.33 0.25 0 0

Now r1 is suggested not for 4 but for just 2 papers. However, after all the operations performed above, r5 is
now suggested for 3 papers. To decide which 2 of these 3 papers to assign to r5, the algorithm again modifies
the similarity factors taken from the newly-formed first column of the matrix. As in the previous step, this is
done by using formulas 3 and 4. After modification the first column will look like:

0.70 1.13 0.77 1.70 1.09

r5 should be assigned to the two papers which have the highest modified similarity factors with him/her. These
are p3 and p4. The row corresponding to p1 is shifted one position to the left again, so that the next most competent
reviewer is suggested for that paper. As r5 already has 2 papers to review, all similarity factors outside the first
column that are associated with him/her are deleted from the matrix.

r5 →  0.55   0.53   0.47   0.40   0
      0.89   0.80   0.65   0.50   0.25
      0.60   0.57   0.55   0.37   0
      0.50   0.40   0      0      0
      0.50   0.33   0.25   0      0

(the row marked with r5 is the one currently headed by r5 and scheduled for shifting)

Therefore, the matrix now looks like:

0.53   0.47   0.40   0      0
0.89   0.65   0.50   0.25   0
0.60   0.57   0.55   0.37   0
0.50   0.40   0      0      0
0.50   0.33   0      0      0

As seen in the first column, no reviewer is now suggested to evaluate more than 2 papers. So it is now
possible to assign all reviewers from the first column directly to the papers they are suggested for. If the
last matrix is compared to the initial one, it can be seen that 3 of the 5 papers (p2, p3 and p5) are assigned to
their most competent reviewers. One (p4) has got its second most competent reviewer and another one (p1) its third
most competent reviewer. However, the levels of competence of these reviewers with respect to p4 and p1 are very
close to the levels of the most competent reviewers for these papers. For example, r5 is assigned to p4 with a
similarity factor of 0.50, while the most competent reviewer for p4 has a similarity factor of 0.53.

APPENDIX 2: DETAILED PSEUDO CODE


The detailed pseudo code here could be directly translated into any high-level imperative programming
language, as each operation in the pseudo code corresponds to an operator or a built-in function in the chosen
programming language. Here is the meaning of the complex data structures (mostly arrays) used within the
code:
SM[i,j] – similarity matrix – a bi-dimensional array of arrays, where rows (i) represent papers and
columns (j) represent reviewers. Each element contains an associative array of two elements – revUser (the
user id of the reviewer who is suggested to evaluate paper i) and weight (the similarity factor between paper i
and reviewer revUser). An illustrative fragment is shown after this list of structures.
papersOfReviewer[] – an associative array whose keys correspond to the usernames (revUser) of
the reviewers who appear in the first column of the similarity matrix, and whose values contain arrays of two
elements – the id of the paper being suggested to this reviewer and the similarity factor between the paper and
the reviewer.
For example:
papersOfReviewer['jsmith'] = array(
array('paperId'=>15, 'sim'=>0.87),
array('paperId'=>7, 'sim'=>0.63),
... );

rowsToShift[] – an array containing the row ids (these are actually the paper ids) that should be
shifted one position to the left, so that the next most competent reviewer is suggested for the paper.
signifficantSF[paperId] – an array holding the number of significant (non-zero) similarity
factors for every paper, identified by its paperId.
reviewersToRemove[] – an array holding the identifiers (revUser) of the reviewers who are
already busy (i.e. have enough papers to review) and should be removed from the similarity matrix (except
its first column) on the next pass through the outermost do-while cycle, so that they are not assigned to any more
papers.
busyReviewers[] – an array holding the identifiers of the reviewers who are already busy (i.e.
have enough papers to review). This is similar to reviewersToRemove with one major difference –
busyReviewers keeps the identifiers all the time, while reviewersToRemove is cleared on each pass after
deleting the respective reviewers (similarity factors) from the similarity matrix.
maxPapersToAssign[j] – an array holding the maximum number of papers that could be
assigned to every reviewer j, identified by his/her revUser.
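
As an illustration, here is a small fragment of the SM structure filled with the first (already sorted) row of the similarity matrix from Appendix 1; the reviewer identifiers r1–r5 are used in place of real usernames.

$SM = array(
    1 => array(                                     // paper id 1
        array('revUser' => 'r1', 'weight' => 0.60),
        array('revUser' => 'r5', 'weight' => 0.55),
        array('revUser' => 'r2', 'weight' => 0.53),
        array('revUser' => 'r3', 'weight' => 0.47),
        array('revUser' => 'r4', 'weight' => 0.40),
    ),
    // ... one such row for every submitted paper
);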

1 for every paper (row) i in the similarity matrix SM {


2 SM[i] = Sort SM[i] by similarity factor sim in descending order;
3 }
4 // the maximum number of papers a reviewer should evaluate
5 papersPerReviewer = ceil(
6 (number of papers * number of reviews per paper) / number of reviewers);
7
8 do {
9 assignmentReady = true;
10 papersOfReviewer = []; // empty array
11 for every paper (row) i in the similarity matrix SM {
12 if (row is empty, i.e. no competent reviewers for paper i) {
13 Continue to the next paper;
14 }
15 if (paper i has enough reviewers assigned to it) {
16 Continue to the next paper;
17 }
18 // shifting SM[i] one position to the left (if necessary),
19 // so that the second-competent reviewer is suggested to paper i.
20 if (i belongs to the rowsToShift array) {
21 shift(SM[i]);
22 signifficantSF[i]--;
23 }
24 // deleting the busy reviewers from the i-th row of the matrix SM
25 for every reviewer j in reviewersToRemove array {
26 if (j is not the first element of SM[i]) {
27 if (sim(i, j) > 0) {
28 signifficantSF[i]--;
29 }
30 delete(SM[i, j]);
31 }
32 }
33 // getting the most competent reviewer for paper i.
34 revUser = Get the username from the first element
35 (column) of SM[i], i.e. SM[i,0]['revUser'];
36 weight = Get the similarity factor from the first element
37 of SM[i], i.e. SM[i,0]['weight'];
38 // If revUser is marked as busy, he/she is assigned as a reviewer to this
39 // paper on previous iteration and should not be processed now.
40 if (revUser does not belong to the busyReviewers array) {
41 element['paperId'] = i;
42 element['sim'] = weight + calculateCorrection(i);
43 Add the "element" array to papersOfReviewer[revUser];
44 }
45 } // end for every paper i in the similarity matrix SM
46
47 // The modified first column of SM is already copied to papersOfReviewer[].
48 reviewersToRemove = []; // clearing the reviewersToRemove
49 rowsToShift = []; // and rowsToShift arrays.
50 for every reviewer j in papersOfReviewer array {
51 papersOfReviewer[j] = Sort papersOfReviewer[j] by similarity
52 factor sim in descending order;
53 maxPapersToAssign[j] = papersPerReviewer -
54 getNumberOfAlreadyAssignedPapers(j);
55 if (count(papersOfReviewer[j]) > maxPapersToAssign[j]) {
56 for every element k positioned after
57 the first maxPapersToAssign[j] in papersOfReviewer[j] {
58 Add k['paperId'] in rowsToShift[] array;
59 }
60 Add j in reviewersToRemove[];
61 Add j in busyReviewers[];
62 assignmentReady = false;
63 }
64 } // end for every reviewer j
65 } while (!assignmentReady);
66
67 // At this point(for sure) no reviewer is suggested to more papers than his/her
68 // corresponding value in maxPapersToAssign[]. I.e. no one will review more
69 // than the maximum allowed number of papers. So all reviewers in the first
70 // column of the matrix could be directly assigned to their respective papers.
71 for every paper (row) i in the similarity matrix SM {
72 if (row is empty, i.e. no competent reviewers for paper i) {
73 Continue to the next paper;
74 }
75 if (paper i has enough reviewers assigned) {
76 Continue to the next paper;
77 }
78 Assign the reviewer SM[i,0]['revUser'] to evaluate paper i
79 // that is the reviewer suggested to evaluate paper i
80 // i.e. the reviewer in the first column of the i-th row of the matrix.
81 // After the assignment, delete the reviewer SM[i,0]['revUser']
82 // from the first column of SM.
83 }
