Supplementary Materials For: Improving Refugee Integration Through Data-Driven Algorithmic Assignment
org/content/359/6373/325/suppl/DC1
Kirk Bansak, Jeremy Ferwerda, Jens Hainmueller,* Andrea Dillon, Dominik Hangartner,
Duncan Lawrence, Jeremy Weinstein
Materials and Methods
Background Information
The Refugee Assignment Process in the United States
Refugees in the United States are resettled in a collaborative process between the Department of
State and nine voluntary resettlement agencies. Individual cases are allocated to one of the nine
agencies at a weekly draft, with a randomized order, held in Washington DC. While agencies
have the capacity to select cases according to preferences, they must fulfill specific quotas with
respect to potential ‘hardship cases’ (namely medical cases). Upon receiving a case in the draft,
individual agencies are responsible for placing cases across their office network.
Several constraints apply to this internal placement process. First, resettlement agencies
are mandated to settle individuals with “US Ties,” that is, a family member or friend acting as
a sponsor, at the most proximate resettlement location. Second, resettlement agencies do not
control the locations of refugees who have been granted Special Immigrant Visas (SIVs). Finally,
resettlement agencies are mandated to provide refugees with sufficient medical care
appropriate to their condition.
The remaining set of cases – approximately 30% of the caseload in a current year – are
assigned weekly on a case-by-case basis to affiliate sites. When making this assignment,
placement officers view the nationality, case structure, and any medical characteristics of the
case. They then consult local affiliate constraints with respect to medical conditions and
language (some sites lack the interpreters necessary to cater to particular nationalities). If such
constraints do not apply, refugee cases are assigned to offices with the smallest proportion of
their yearly capacity currently filled. These capacity constraints are determined prior to the
fiscal year, in cooperation with the Department of State.
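The default placement rule described above (send the case to the eligible office with the smallest share of its yearly capacity filled) can be sketched in a few lines of Python. The site names, capacities, and the `eligible` argument are hypothetical and serve only to illustrate the rule; this is not the agency's actual tooling.

```python
def least_filled_site(capacity, assigned, eligible=None):
    """Return the eligible site with the smallest share of yearly capacity filled."""
    sites = eligible if eligible is not None else list(capacity)
    return min(sites, key=lambda s: assigned[s] / capacity[s])

# Hypothetical yearly capacities and cases placed so far.
capacity = {"site_a": 100, "site_b": 50, "site_c": 80}
assigned = {"site_a": 40, "site_b": 10, "site_c": 40}

# No medical or language constraint applies: every site is eligible.
choice = least_filled_site(capacity, assigned)

# A constraint (e.g. no interpreter at site_b) narrows the eligible set.
restricted_choice = least_filled_site(capacity, assigned, eligible=["site_a", "site_c"])
```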
Local affiliates provide caseworker assistance and material support for the 90 days mandated
by the US Resettlement Program, or in the case of individuals assigned to the matching grant
program, 180 days. At the conclusion of this service period, agencies report outcomes to the
Department of State. Individual offices are monitored to ensure that all required services are
delivered and that 90-day employment outcomes remain above a reasonable threshold.
In cases where families are applying for asylum together, it is possible for one or several
members to arrive prior to the rest of the family, in which case the first arrival(s) will be assigned
to a canton and trailing arrivals will automatically be reunited later with their family members
in the same canton. This situation applies to only 5% of the total number of cases in our data. In
our backtests of the algorithm in the Swiss context, we omit trailing arrivals from the prediction
data (though they are included in the model data).
Refugees that obtain the status of subsidiary protection are protected from deportation and
are granted temporary legal residency. Even while waiting for their asylum decision, a lengthy
process that takes on average almost two years (4), asylum seekers are allowed to work within
3 to 6 months of arrival. However, asylum seekers waiting for their decision and refugees
that are granted subsidiary protection are required to live and work within their assigned canton.
In the Swiss permit classification, this subsidiary protection status is known as an F permit.
There also exists a B permit for refugees that have a well-founded fear of being persecuted
and are accepted under the 1951 Refugee Convention. Over our study period, the majority of
refugees were granted an F permit. Since employment information is only reported for these
refugees, they are the focus of our study.
Algorithm
This section provides details on the design of the algorithm and its implementation in the
backtests presented in the main text. Figure S3 displays a schematic detailing the algorithm’s
key features, which are discussed in depth below.
Data Designation
The first step in the algorithm is to designate the set of data that will be used for statistical
model training (L matrix in Figure S3). This will be historical resettlement data in which the
unit of observation is a single refugee. In these model training data, each refugee’s assigned
location and integration outcome of interest, in addition to their full set of covariates, must be
observed. The assigned locations are necessary because, as will be explained further below, a
separate statistical model will be fit for each location, which requires subsetting the refugee data
by assigned location. The integration outcome is necessary, of course, because the statistical
models will be formulated to predict that outcome.
The out-of-sample prediction data must also be designated (R matrix). The prediction data
correspond to new refugee arrivals and must include the same set of covariates as in the model
training data. In contrast to the model training data, however, the prediction data need not
include refugees’ assigned locations or integration outcomes. In fact, in a real-world
prospective implementation of the algorithm, refugees belonging to the prediction data will not
have yet been assigned to a resettlement location, and thus, their assigned locations and
integration outcomes will not yet have been realized. (For the purposes of assessing and
validating the algorithm via backtests, however, the prediction data may correspond to newer
refugee arrivals who do already have location assignments and integration outcomes, as long
as those data were not part of the model training data.)
An important note is that both the model training and prediction data should be subsetted
to the population of refugees for whom the integration outcome of interest is relevant. In the
U.S. and Swiss applications presented in the main text, the integration outcome of interest was
employment, and thus the population of interest was working-age refugees. Another important
note specifically about the prediction data is that it should be subsetted only to those refugees
who are free to be assigned to different resettlement locations—in contrast to refugees with
predetermined geographic destinations due to family ties and other special circumstances—as
this is the subset for whom the algorithm is designed to help with the assignment process. In
contrast, the model training data need not be restricted to only these “free cases.” Free-case
and non-free-case refugees might be sufficiently dissimilar that forecasting free-case refugees’
integration outcomes with models built using non-free-case data may seem problematic. This
issue is addressed, however, by including case type as a predictor variable in the model building
process, as described further below.
As noted above, new refugee arrivals who have not yet been assigned to locations would be
designated as the out-of-sample prediction data in the real-world prospective implementation
of the algorithm, with the model training data including the most recent data on refugees for
whom location assignment and integration outcomes are available. As a way to validate the
modeling process described below and assess the algorithm more generally, the main text has
presented backtests of the algorithm using small subsets of the historical data as the out-of-
sample prediction data.
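A minimal sketch of this designation step for a backtest, assuming each refugee is a Python dict. The field names (`arrival`, `age`, `free_case`, `location`, `employed`) and the toy records are hypothetical; the 18 to 64 age window follows the U.S. working-age definition given in the Data section.

```python
def is_working_age(r, lo=18, hi=64):
    return lo <= r["age"] <= hi

def designate(records, test_period):
    """Split records into model training data (L) and prediction data (R)."""
    pool = [r for r in records if is_working_age(r)]
    # Training data: earlier arrivals with observed location and outcome, all case types.
    L = [r for r in pool
         if r["arrival"] < test_period
         and r["location"] is not None and r["employed"] is not None]
    # Prediction data: the test-period cohort, restricted to free cases.
    R = [r for r in pool if r["arrival"] == test_period and r["free_case"]]
    return L, R

# Arrival is a (year, quarter) tuple, so tuples compare chronologically.
records = [
    {"id": 1, "arrival": (2016, 1), "age": 30, "free_case": True,  "location": "A", "employed": 1},
    {"id": 2, "arrival": (2016, 2), "age": 45, "free_case": False, "location": "B", "employed": 0},
    {"id": 3, "arrival": (2016, 3), "age": 25, "free_case": True,  "location": "A", "employed": 1},
    {"id": 4, "arrival": (2016, 3), "age": 40, "free_case": False, "location": "B", "employed": 0},
    {"id": 5, "arrival": (2016, 3), "age": 70, "free_case": True,  "location": "A", "employed": 0},
    {"id": 6, "arrival": (2016, 1), "age": 12, "free_case": True,  "location": "B", "employed": 0},
]
L_train, R_pred = designate(records, test_period=(2016, 3))
```

Note that the non-free case (id 2) stays in the training data, while the non-free case in the test period (id 4) is dropped from the prediction data.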
Modeling
In the modeling stage of the algorithm, the model training data are used to build a bundle of
statistical models that predict refugees’ probability of the integration outcome (or value of the
outcome, if continuous outcomes were to be used), and those models are then applied to the
out-of-sample prediction data to generate their predicted probabilities.
The goal of the algorithm is to discover synergies between refugee characteristics and
resettlement locations such that refugees can be matched to the resettlement locations that best
fit their own personal profiles. Thus, it is not sufficient to simply achieve a model that is highly
predictive of the integration outcome in general; instead, the modeling stage must also
prioritize interactions between location assignment and refugee characteristics if it is to
provide useful information upon which an optimized assignment can then be performed.
Accordingly, the modeling stage is implemented on a location-by-location basis.
Specifically, for each resettlement location, the model training data are first subsetted to those
refugees who were assigned to that location, and a statistical model is then fit that uses those
refugees’ characteristics to predict their integration outcome. That fitted model is then applied
to the out-of-sample prediction data to predict the probability of the integration outcome for
the new refugee arrivals should they be sent to the location in question. This process is
performed separately and independently for each individual location, which yields for each
individual refugee in the prediction data a vector of predicted probabilities, one for each
location. Collectively for all refugees in the prediction data, the final result is then a matrix of
predicted probabilities (M matrix) with rows representing individual refugees and columns
representing resettlement locations.
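The location-by-location procedure described above can be sketched as follows. The estimator here is a deliberately simple stand-in (a Laplace-smoothed employment rate conditional on one binary feature), not the boosted trees used in the backtests; the field names and toy data are hypothetical.

```python
def fit_location_model(rows):
    """Stand-in for one location-specific model: smoothed employment rate by feature value."""
    counts = {}  # feature value -> [number employed, total]
    for r in rows:
        c = counts.setdefault(r["speaks_english"], [0, 0])
        c[0] += r["employed"]
        c[1] += 1

    def predict(x):
        emp, tot = counts.get(x["speaks_english"], (0, 0))
        return (emp + 1) / (tot + 2)  # Laplace smoothing keeps estimates off 0 and 1

    return predict

def build_M(L, R, locations):
    """Fit one model per location on its own subset of L, then score every row of R."""
    models = {j: fit_location_model([r for r in L if r["location"] == j])
              for j in locations}
    # Rows are new arrivals, columns are locations.
    return [[models[j](x) for j in locations] for x in R]

L = [
    {"location": "A", "speaks_english": 1, "employed": 1},
    {"location": "A", "speaks_english": 1, "employed": 1},
    {"location": "A", "speaks_english": 0, "employed": 0},
    {"location": "B", "speaks_english": 1, "employed": 0},
    {"location": "B", "speaks_english": 0, "employed": 1},
]
R = [{"speaks_english": 1}, {"speaks_english": 0}]
M = build_M(L, R, locations=["A", "B"])
```

In this toy example the English speaker scores higher at location A and the non-speaker at location B, which is the kind of refugee-location synergy the separate models are meant to surface.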
The following more formally delineates the modeling process. For each refugee i = 1, ..., n,
let the outcome of interest (e.g. employment) be denoted by yi ∈ {0, 1} and the location
assignment denoted by wi ∈ {1, ..., k}, for a total of k possible resettlement locations. Let ~xi
denote a p-dimensional feature vector comprised of the characteristics of refugee i, and xim
denote the m-th feature in ~xi , where m = 1, ..., p. The goal of the modeling stage is to, for any
i and j, predict P (yi = 1|~xi , wi = j). Denote the function µj (~xi ) = P (yi = 1|~xi , wi = j). The
following describes the steps in the modeling stage.
1. Designate the historical model training data and denote it by the matrix L:
2. Train a set of k models, S = {µ̂1 (~xi ), ..., µ̂j (~xi ), ..., µ̂k (~xi )} as follows.
For j = 1, ..., k:
(a) Subset L to refugees for whom wi = j (i.e. refugees assigned to j-th location), and
call this Lj :
Modeling and estimation of µ̂j (~xi ) should ideally be undertaken using a flexible
supervised machine learning technique that automatically performs feature
selection and can identify complex non-linearities and interactions in the feature
space, with cross-validation employed in the model training in order to achieve
flexibility without over-fitting. Possible machine learning techniques for modeling
and estimating µ̂j (~xi ) include boosted trees (20, 21), random forests (22),
elastic-net logistic regression (23), and kernel-based regularized least squares (24).
3. Designate the data on new refugee arrivals and denote them by the matrix R:
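\[
R =
\begin{bmatrix}
\dot{x}_{11} & \cdots & \dot{x}_{1m} & \cdots & \dot{x}_{1p} \\
\vdots & & \vdots & & \vdots \\
\dot{x}_{i1} & \cdots & \dot{x}_{im} & \cdots & \dot{x}_{ip} \\
\vdots & & \vdots & & \vdots \\
\dot{x}_{n_R 1} & \cdots & \dot{x}_{n_R m} & \cdots & \dot{x}_{n_R p}
\end{bmatrix}
=
\begin{bmatrix}
\vec{\dot{x}}_1 \\ \vdots \\ \vec{\dot{x}}_i \\ \vdots \\ \vec{\dot{x}}_{n_R}
\end{bmatrix}
\]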
4. For all refugees in R and all resettlement locations, estimate P (ẏi = 1|~ẋi , ẇi = j) as
follows.
For i = 1, ..., nR :
For j = 1, ..., k:
Estimate P (ẏi = 1|~ẋi , ẇi = j) by applying j-th model in S to ~ẋi :
\[ \hat{P}(\dot{y}_i = 1 \mid \vec{\dot{x}}_i, \dot{w}_i = j) = \hat{\mu}_j(\vec{\dot{x}}_i) = \alpha_{ij} \]
Arrange the αij into a vector, $\vec{\alpha}_i = [\alpha_{i1}, ..., \alpha_{ik}]$.
5. Produce a matrix of predicted probabilities, with rows corresponding to new refugees and
columns corresponding to resettlement locations, as follows.
Arrange vectors $\vec{\alpha}_i$ into rows of the matrix M:
As noted above, modeling and estimation of µ̂j (~xi ) in step 2(b) is undertaken using
supervised machine learning techniques. Two primary criteria are used to inform which
technique(s) are considered and then actually employed in our implementation of the modeling
stage. The first, and most important, is predictive performance. For testing purposes, we can
use historical data for both the model training data (L) and the test data (R), thereby allowing
us to empirically assess out-of-sample model fit for each machine learning technique, as
described below.
The second criterion is the ability to handle many predictors (both continuous and
categorical) and perform implicit variable selection. The refugee data contain many features
that can be considered as possible predictors, yet the degree of predictiveness offered by each
feature will be unknown in advance and may vary across resettlement locations in ways that
are difficult to anticipate from a theory-driven perspective. This issue is exacerbated by the fact
that feature non-linearities and interactions of unknown order may also provide significant
predictiveness. Furthermore, there are also certain key interactions that must be considered,
namely interactions between case type and all other variables, since the prediction data will
only pertain to free cases while the model training data will include all case types. Thus, the
machine learning technique must be capable of efficiently exploring a high-dimensional
feature space and automatically identifying predictive features and feature interactions while
ignoring or remaining robust to irrelevant features. This is particularly important for
resettlement locations with sparser historical data in order to guard against overfitting to too
many and possibly irrelevant features.
As is general practice in supervised machine learning, a number of validation techniques
and metrics should be used in both model building and model assessment. In particular, for
any technique involving tuning parameters, cross-validation can be used to select the parameter
values. In addition, as mentioned above, backtests that use separate subsets of historical data
for the model training and test data can be performed, which allows for the assessment of
out-of-sample model fit for each machine learning technique. A number of model fit metrics
can be used, including mean squared error and percent reduction in error given a particular
classification threshold.
The modeling stage also includes the option of calibrating the predicted probabilities.
Given binary integration outcome variables of interest, classification machine learning models
should be used. While model fit is often most easily assessed via some metric of classification
accuracy, the ultimate outputs of interest of the fitted models for our algorithm given a binary
outcome variable are not classification predictions, but rather class probabilities (i.e. predicted
probabilities of a positive integration outcome). As has been shown by previous studies in the
literature on supervised learning, classification models can sometimes output unreliable or
biased predicted class probabilities even when they offer respectable classification
accuracy (25–28). While some studies have provided evidence on which classification
techniques, by virtue of their mathematical and algorithmic features, are likely to yield
unreliable predicted probabilities and in what ways any bias is likely to manifest, the empirics
and theory establishing general results are limited in this regard, and the problem is likely to be
context- and data-dependent. Nonetheless, calibration plots can be used to assess the
possibility of unreliable predicted probabilities outputted from any supervised learning model,
and various calibration methods have been introduced to correct for any observed bias. Two
common calibration methods, which are considered for use in our algorithm, are Platt scaling
and isotonic regression (25, 29).
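For illustration, isotonic regression is commonly computed with the pool-adjacent-violators algorithm, which fits a non-decreasing step function from raw model scores to observed outcomes. The sketch below is a plain-Python version on hypothetical data; it is not the code used in the backtests, and the function name is an assumption.

```python
def isotonic_fit(scores, labels):
    """Pool-adjacent-violators: non-decreasing fit of labels, ordered by score."""
    pairs = sorted(zip(scores, labels))
    blocks = []  # each block is [sum of labels, count]
    for _, y in pairs:
        blocks.append([float(y), 1])
        # Merge backwards while the previous block's mean exceeds the last block's mean.
        while len(blocks) > 1 and \
                blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    sorted_scores = [p[0] for p in pairs]
    return sorted_scores, fitted

# Hypothetical raw scores and observed binary outcomes.
xs, calibrated = isotonic_fit([0.1, 0.2, 0.3, 0.4], [0, 1, 0, 1])
```

The two middle observations violate monotonicity (1 followed by 0), so they are pooled into a single block with calibrated probability 0.5.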
For the modeling stage of the application backtests presented in the main text, we assessed
various machine learning techniques that can efficiently handle mixed data and remain robust
in the presence of many and possibly irrelevant predictors, including gradient boosted
trees (21), random forests (22), elastic-net logistic regression (23), and kernel-based
regularized least squares (24). We found gradient boosted trees to exhibit the best performance
on the basis of out-of-sample classification accuracy (percent reduction in error) and predicted
probability reliability (calibration plots). Specifically, we used stochastic gradient boosted
trees (bag fraction of 0.5) with a binomial deviance loss function (20, 30), which we
implemented in R using the gbm package (31). Gradient boosted trees have a number of practical
characteristics that make them well-suited for use in our algorithm, namely their robustness to
irrelevant predictors, automatic variable selection, and ability to discover complex interactions
without requiring researcher specification. In addition to the well-known predictive
performance of boosted trees, they also have attractive theoretical characteristics. In particular,
it has been shown that given i.i.d. samples of a binary outcome and a feature vector that
predicts the outcome, boosted tree ensembles achieve consistency and converge to the minimal
error rate (Bayes risk) in the large sample limit given reasonable regularity conditions and
appropriate regularization techniques, such as early stopping (32–34).
We used cross-validation within the training data to select tuning parameter values,
including the interaction depth, learning rate, and number of boosting iterations (the early
stopping point) in our implementation of gradient boosted trees. Parameters were tuned
independently for each location-specific model. In addition, on the basis of our model fit
metrics, we also found model performance to be best without the use of probability calibration
methods.
Mapping
The next step in the algorithm involves mapping the refugee-level predicted probabilities from
the modeling stage to a case-level metric. As described above, the final output of the modeling
stage is a matrix of refugee-location predicted probabilities (M matrix), with rows representing
the individual refugees in the prediction data and columns representing the resettlement
locations under consideration. However, the information in this form is not immediately usable
for the matching stage of the algorithm because refugees are typically not assigned to locations
on an individual basis but rather on a case-level basis, where cases are most often family units.
Thus, a case-level metric must be constructed. In other words, refugee-location predicted
probabilities must be mapped to a case-location metric, where the metric is some mapping
function of the predicted probabilities. Specifically, for each case-location pair, the mapping
function is applied to the refugee-location predicted probabilities for all refugees belonging to
that case, yielding a single value for that case-location pair. This results in a new matrix (M∗
matrix) with the same number of columns (locations) as previously but now as many rows as
cases rather than refugees.
The mapping stage is described formally here. Begin with the modeling stage output matrix
M:
individual refugees needing to be assigned (i.e. the number of refugees in the matrix R). Further,
let g = 1, ..., h denote the case to which each refugee belongs, with a total of h cases, where
h ≤ nR . The mapping stage process then proceeds as described below.
For j = 1, ..., k:
Let α̃gj = {αij ∀ i ∈ g}. (That is, α̃gj is the set of all αij for the j-th location
and refugees belonging to the g-th case.)
Compute γgj = φ(α̃gj ) where φ is a predetermined mapping function.
Arrange the γgj into a vector, ~γg = [γg1 , ..., γgk ].
2. Produce a matrix containing the case-level metric for all case-location pairs, with rows
corresponding to cases and columns corresponding to resettlement locations, as follows.
Arrange vectors ~γg produced in step 1 into rows of the matrix M∗ :
In step 1, the function φ must be specified. In our default implementation presented in the
main text, φ is the following:
\[ \phi(\tilde{\alpha}_{gj}) = 1 - \prod_{i \in g} (1 - \alpha_{ij}) \]
That is, the default φ employed in the main text backtests maps individual probabilities to a
metric that measures the predicted probability that at least one refugee in the case has a positive
employment outcome at the location in question.
This formula employs a simplifying assumption that the probabilities of employment for
refugees within a case are independent. While unlikely to be true, we confirm that this
assumption is a reasonable implementation decision by estimating the intraclass (within case)
correlation coefficient (ICC) for free-case working-age refugees’ employment outcomes in our
data. For the most recent year of data in the United States (2016), we find the ICC to be 0.15,
and for the pooled data from 2011-2016, the ICC is 0.22. We opted to employ this particular φ
given that it best reflects the underlying goal of the refugee resettlement program to generate
self-sufficient refugee families.
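A minimal Python sketch of the mapping stage under the default φ, assuming the refugee-level matrix M is a list of rows and `case_ids` gives each row's case. Function names and toy values are hypothetical.

```python
from math import prod

def phi_at_least_one(alphas):
    """Probability that at least one case member is employed, assuming independence."""
    return 1.0 - prod(1.0 - a for a in alphas)

def map_to_cases(M, case_ids, phi=phi_at_least_one):
    """Collapse refugee-level rows of M into case-level rows of M*."""
    order, groups = [], {}
    for row, g in zip(M, case_ids):
        if g not in groups:
            groups[g] = []
            order.append(g)
        groups[g].append(row)
    k = len(M[0])
    # For each case and location, apply phi to the members' probabilities.
    M_star = [[phi([row[j] for row in groups[g]]) for j in range(k)] for g in order]
    return order, M_star

# Three refugees, two locations; the first two refugees form one case.
M = [[0.5, 0.2],
     [0.5, 0.4],
     [0.3, 0.6]]
cases, M_star = map_to_cases(M, case_ids=["c1", "c1", "c2"])
```

Passing a different function as `phi` changes the case-level metric without touching the rest of the pipeline.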
In addition, backtests are also reported in the SM employing the following alternative
functions for φ, which do not require independence or other additional assumptions:
\[ \text{mean}: \quad \phi(\tilde{\alpha}_{gj}) = \frac{1}{n_g} \sum_{i \in g} \alpha_{ij} \]
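As a worked comparison, consider a hypothetical two-person case with predicted probabilities 0.5 and 0.4 at some location (values chosen only for illustration). The default at-least-one metric and the mean alternative give, respectively:

```latex
\[
1 - (1 - 0.5)(1 - 0.4) = 1 - 0.30 = 0.70,
\qquad
\frac{1}{2}\,(0.5 + 0.4) = 0.45 .
\]
```

The default metric grows as members are added to a case, whereas the mean does not depend on case size in that way.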
Matching
The final stage of the algorithm is the optimal assignment of cases to specific locations. The
case-location metric is used as the assessment metric, and assignment is performed to satisfy a
chosen optimality criterion subject to specified constraints. The optimality criterion employed
in application backtests of the algorithm presented in the main text is a global maximum on the
case-level metric. In other words, the assignment process searches over the entire case-location
matrix and determines which case should be assigned to which location in order to maximize
the average value of the metric for all chosen case-location pairs.
Prior to actually performing the assignment according to the chosen optimality criterion,
however, constraints must be built into the process, where those constraints represent
real-world restrictions on how many cases/refugees can be sent to different locations, what
types of cases/refugees different locations can accommodate (e.g. medical condition
restrictions), etc. In other words, the constraints primarily involve limiting the number of
assignments to each location and prohibiting certain case-location pairs. From the
implementation standpoint, such constraints can be easily built into the process by
pre-specifying the number of available assignment slots for each location, and by setting the
case-location metric to an arbitrary low value for the prohibited case-location pairs.
Application of optimal matching will then find the globally optimal assignment while
implicitly respecting the constraints.
To perform the matching process, the task can be reformulated as a linear sum assignment
problem (LSAP) with a predetermined number of slots denoting the number of cases that can
be assigned to each resettlement location. We employ a standard LSAP formulation for this
purpose (35).
Begin with the mapping stage output matrix M∗ :
Recall that j = 1, ..., k denotes the resettlement locations, and g = 1, ..., h denotes the refugee
cases, where h is the total number of refugee cases needing to be assigned (i.e. the number of
cases in the matrix R). The matching stage proceeds as follows:
1. Determine the number of case slots for each of the k resettlement locations based on
capacity constraints as follows.
For j = 1, ..., k:
That is, the columns of Q are simply duplicates of the columns of M∗ , where the number
of duplicates for the j-th column of M∗ is determined by tj .
That is, C is an h x h cost matrix whose entries are 1 − γgj for all possible refugee case-
location pairs, with columns duplicated to allow for multiple cases (tj ) to be sent to each
of the j locations subject to the constraint that $\sum_{j=1}^{k} t_j = h$.
4. Match each refugee case to a location to achieve the maximum sum of the case-level
metric (i.e. minimum cost), subject to the location-specific constraints imposed by the tj .
This can be achieved via the standard LSAP approach, as follows.
Determine case-location matches (i.e. unique matches of rows to columns of X) by
solving for the matrix X that achieves the following:
\[ \arg\min_{X} \sum_{a=1}^{h} \sum_{b=1}^{h} c_{ab} x_{ab} \]
subject to
\[ \sum_{a=1}^{h} x_{ab} = 1 \quad \text{for } b = 1, 2, ..., h \]
Various algorithms have been developed for solving the LSAP formulation above,
beginning with the introduction of the Hungarian algorithm in the 1950s (36, 37). We employ
the RELAX-IV cost flow solver developed by Bertsekas and Tseng (17) and implemented in R
by the optmatch package (16).
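To make the slot construction concrete, the sketch below duplicates columns of M∗ according to the slot counts, forms the cost matrix with entries 1 − γ, and solves the assignment by brute-force enumeration. The enumeration is an assumption made for readability and is only feasible for a handful of cases; it stands in for, and is not, the RELAX-IV solver. All names and values are hypothetical.

```python
from itertools import permutations

def expand_slots(M_star, slots):
    """Duplicate column j of M* slots[j] times and record each column's location."""
    col_location = [j for j, t in enumerate(slots) for _ in range(t)]
    Q = [[row[j] for j in col_location] for row in M_star]
    return Q, col_location

def solve_lsap_bruteforce(C):
    """Return the row-to-column permutation with minimum total cost (tiny h only)."""
    h = len(C)
    best_cost, best_perm = None, None
    for perm in permutations(range(h)):
        cost = sum(C[a][perm[a]] for a in range(h))
        if best_cost is None or cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_perm, best_cost

def assign_cases(M_star, slots):
    """Return the location index assigned to each case, respecting slot counts."""
    assert sum(slots) == len(M_star), "slots must sum to the number of cases"
    Q, col_location = expand_slots(M_star, slots)
    C = [[1.0 - q for q in row] for row in Q]  # cost = 1 - case-level metric
    perm, _ = solve_lsap_bruteforce(C)
    return [col_location[b] for b in perm]

# Three cases, two locations; location 0 has one slot, location 1 has two.
M_star = [[0.9, 0.2],
          [0.8, 0.7],
          [0.3, 0.4]]
assignment = assign_cases(M_star, slots=[1, 2])
```

Here the single slot at location 0 goes to the first case, because that choice yields the largest total metric (0.9 + 0.7 + 0.4) across all three cases.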
In addition to imposing capacity constraints on the number of cases each location can
accept, as described above, restrictions can also be imposed on specific case-location pairs.
For instance, it may be that certain locations cannot accommodate cases with severe medical
conditions. To incorporate such restrictions in the optimal matching of cases to locations, the
LSAP process proceeds in the same manner as described above with one difference: for the
entries of C corresponding to restricted case-location matches, the cost value cab is changed to
an arbitrarily high value, effectively ensuring that such a match will not be realized.
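A short sketch of this restriction step, again using brute-force enumeration as a stand-in solver on a toy 2 x 2 cost matrix. The helper names and the value of `big` are hypothetical; in a cost matrix with duplicated columns, every column belonging to the restricted location would need to be masked for the case in question.

```python
from itertools import permutations

def apply_restrictions(C, restricted_pairs, big=1e6):
    """Return a copy of C with restricted (row, column) entries set to a large cost."""
    out = [row[:] for row in C]
    for a, b in restricted_pairs:
        out[a][b] = big
    return out

def best_assignment(C):
    """Brute-force minimum-cost matching of rows to columns (tiny matrices only)."""
    h = len(C)
    return min(permutations(range(h)),
               key=lambda perm: sum(C[a][perm[a]] for a in range(h)))

C = [[0.1, 0.8],
     [0.3, 0.6]]
unrestricted = best_assignment(C)
# Suppose case 0 cannot be accommodated at the location behind column 0.
restricted = best_assignment(apply_restrictions(C, [(0, 0)]))
```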
Data
United States
For the United States we draw on registry data from one of the largest resettlement agencies.
The data include all refugees that were resettled by this agency arriving from quarter 1, 2011 to
quarter 3, 2016. We restrict the data to refugees of working age, defined as those between 18 and
64 years of age at the time of arrival. We remove a small number of duplicates and locations
that have had fewer than 200 refugees assigned to them over the entire period. In the final data
there are 33,782 refugees from 22,144 cases. Of those, 9,506 refugees are from free cases.
Table S1 shows the descriptive statistics for the sample of refugees from the United States.
Below is a list of variables and measures used in the backtests for the United States.
• Male: Binary variable coded as 1 for males and 0 for females.
• Speaks English: Binary variable coded as 1 for refugees who speak English at the time of
arrival and 0 otherwise.
• Employed: Binary variable coded as 1 for refugees who are employed at 90 days after
arrival, and 0 otherwise.
• Free case: Binary variable coded as 1 for refugees who are free cases with no U.S. ties,
and 0 otherwise.
Switzerland
For Switzerland we draw on registry data from the ZEMIS database, which the SEM uses to
process asylum claims and record employment information for refugees with subsidiary
protection. The sample includes all persons who applied for asylum between 1999 and 2013,
were between 18 and 65 years of age upon arrival, and subsequently were granted an F permit
and subsidiary protection within five years of arrival. We remove one canton that had fewer than
100 refugees assigned over the entire period. We also remove observations with missing
outcome and/or arrival time data. In the final data used in the primary backtest reported in the
main text, there are 22,159 refugees.
Table S2 shows the descriptive statistics for the sample of refugees from Switzerland. Below
is a list of variables and measures used in the backtests for Switzerland.
• Employed year Y: Binary variable coded as 1 for refugees who are employed at the end
of year Y after arrival, and 0 otherwise.
• Free case: Binary variable coded as 1 for refugees who are free cases, and 0 otherwise.
• Christian: Binary variable coded as 1 for refugees who are Christian, and 0 otherwise.
• Muslim: Binary variable coded as 1 for refugees who are Muslim, and 0 otherwise.
Supplementary Text
Additional Results and Diagnostics for the United States
Model Fit
To train the algorithm we employ the following predictors: Free case, Speaks English, Age
at arrival, Male, Education (ordered variable differentiating between no/unknown education,
less than secondary, secondary, technical/professional, and university), Country of origin (one
binary variable for each of the largest origin groups including Burma, Iraq, Bhutan, Somalia,
Afghanistan, Democratic Republic of Congo, Iran, Eritrea, Ukraine, Syria, Sudan, Ethiopia,
and Moldova), Year of arrival, and Month of arrival.
To evaluate the model fit of our backtest, we assess classification accuracy and probability
calibration. To assess classification accuracy, we compare the actual observed employment
outcomes of the 2016 quarter 3 (Q3) test set cohort to their predicted employment probabilities
at their actual assigned locations, and we compute the fitted models’ reduction in error
compared to the standard null model. The standard null model for binary classification is the
model that predicts the majority outcome for all test set observations. In our backtest, this
means that the null model predicts non-employment for all 2016 Q3 refugees; given a 34%
employment rate among those refugees, this results in a 34% classification error. We then
evaluate the algorithmic predicted probabilities of employment for the 2016 Q3 refugees at
their actual assigned locations. For all probabilities greater than 0.5, we consider this to be a
predicted classification of employed, and for all probabilities less than or equal to 0.5, not
employed. We then compute the percent of incorrect classifications, which is 26%. Moving
from 34% to 26% error is a relative reduction in error of roughly 23%, since (34 − 26)/34 ≈ 0.235.
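The error comparison above can be sketched in Python as follows. The function names and the four-observation toy data are hypothetical; the 0.34 and 0.26 inputs are the error rates reported in the text.

```python
def classification_error(y_true, p_hat, threshold=0.5):
    """Share of misclassified observations; probabilities above the threshold count as employed."""
    preds = [1 if p > threshold else 0 for p in p_hat]
    return sum(int(p != y) for p, y in zip(preds, y_true)) / len(y_true)

def null_error(y_true):
    """Error of the null model that predicts the majority outcome for everyone."""
    rate = sum(y_true) / len(y_true)
    return min(rate, 1 - rate)

def percent_reduction_in_error(null_err, model_err):
    return 100.0 * (null_err - model_err) / null_err

# Reported error rates for the 2016 Q3 test set.
pre = percent_reduction_in_error(0.34, 0.26)

# Toy data: a probability of exactly 0.5 is classified as not employed.
toy_error = classification_error([1, 0, 0, 0], [0.7, 0.2, 0.6, 0.5])
toy_null = null_error([1, 0, 0, 0])
```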
To assess probability calibration, we perform a similar comparison using calibration plots
(also known as reliability curves). Figure S4 shows a calibration plot for the predicted
probabilities of the U.S. backtest on the 2016 Q3 test set, along with a histogram of the
predicted probabilities. In the calibration plot, each point represents a bin of refugees who
arrived in 2016 Q3, with binning in intervals of 0.1 along the x-axis. For instance, the first
point on the left of the plot corresponds to 2016 Q3 refugees with a predicted probability of
employment at their actual assigned location between 0 and 0.1. The x-axis measures the mean
predicted probability of employment at assigned locations for each bin, and the y-axis
measures the fraction of refugees in each bin who were actually employed at their assigned
locations. The closeness of the curve to the identity line indicates the reliability of the
predicted probabilities.
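The points on such a calibration plot can be computed with a short routine like the one below. It is a sketch on hypothetical data, not the code behind Figure S4; values falling exactly on a bin edge may land in either neighboring bin because of floating-point rounding.

```python
def calibration_bins(y_true, p_hat, n_bins=10):
    """Return (mean predicted probability, observed fraction employed, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, p_hat):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes into the last bin
        bins[idx].append((p, y))
    points = []
    for b in bins:
        if b:
            mean_p = sum(p for p, _ in b) / len(b)
            frac = sum(y for _, y in b) / len(b)
            points.append((mean_p, frac, len(b)))
    return points

# Hypothetical predicted probabilities and observed outcomes.
p_hat = [0.05, 0.08, 0.35, 0.32, 0.95]
y_true = [0, 0, 1, 0, 1]
points = calibration_bins(y_true, p_hat)
```

Each tuple gives one point's x-coordinate, its y-coordinate, and the bin size; a well-calibrated model has points close to the identity line.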
quarter 3 that are reported in the main text. We find that the employment gains are considerable
across these various time periods. For each test quarter the algorithmic assignments achieve
first order stochastic dominance over the actual assignments in terms of the distributions of
predicted probabilities of employment. Compared to the actual assignments, the algorithmic
assignments increase the average employment rate by 41% for refugees who arrived in 2016
quarter 3, 26% for refugees who arrived in 2016 quarter 2, 29% for refugees who arrived in
2016 quarter 1, and 66% for refugees who arrived in 2015 quarter 4.
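First order stochastic dominance between two samples of predicted probabilities can be checked by comparing their empirical distribution functions at every observed value. The sketch below tests the weak form of dominance on hypothetical values; the function names are assumptions.

```python
from bisect import bisect_right

def ecdf(sample):
    """Empirical CDF: share of the sample less than or equal to t."""
    s = sorted(sample)
    n = len(s)
    return lambda t: bisect_right(s, t) / n

def first_order_dominates(a, b):
    """True if sample a weakly dominates sample b: F_a(t) <= F_b(t) at every t."""
    Fa, Fb = ecdf(a), ecdf(b)
    # Both CDFs are step functions that change only at observed values,
    # so checking those points is sufficient.
    grid = sorted(set(a) | set(b))
    return all(Fa(t) <= Fb(t) for t in grid)

algorithmic = [0.4, 0.5, 0.6]  # hypothetical predicted probabilities
actual = [0.2, 0.3, 0.6]
dominates = first_order_dominates(algorithmic, actual)
reverse = first_order_dominates(actual, algorithmic)
```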
the results from the specification reported in the main text where we allow only as many cases
as were received in actuality (Algorithmic 0%). We find that relaxing the constraints by
allowing locations to receive more cases than they received in actuality allows the algorithm to
generate additional gains in the predicted probabilities of employment over the assignment
where the number of cases for each location are fixed based on the actual assignment.
Interestingly, these additional gains appear to disproportionately benefit those refugees at the
bottom of the distribution with low employment probabilities.
As reported in the main text, our constrained algorithmic assignment (Algorithmic 0%)
results in a predicted average employment rate that is 41% above the baseline employment rate
actually observed in 2016 quarter 3. Relaxing the assignment constraints by 10%, 20%, 30%,
40%, and 50% results in average gains above the baseline of 47%, 50%, 54%, 56%, and 59%,
respectively.
(.15 to .25) for refugees who arrived in 2012, and 73% (.15 to .26) for refugees who arrived in
2013, respectively.
Figures
Fig. S1: Variation in refugee employment in the United States, free cases only. Figure
replicates Fig. 1 in the main text using only free case refugees (n = 9506), showing
how refugee employment at 90 days after arrival varies as a function of refugees’ (a)
assigned resettlement location, (b) personal characteristics, and (c) synergies between their
characteristics and locations. Panel (a) shows average employment rates by resettlement
location. Panel (b) displays the estimated change in the probability of employment for various
refugee characteristics, pooling across refugees assigned to all resettlement locations. Dots
with horizontal lines indicate point estimates with heteroskedasticity-robust 95% confidence
intervals from linear least squares regression. The unfilled dots on the zero line denote reference
categories. Panel (c) displays similar results for two specific resettlement locations. The data
for all three panels include free-case working-age refugees resettled by one of the largest U.S.
resettlement agencies during the 2011-2016 period.
[Figure S1 image not reproduced. Panels: (a) Geographic context, by Resettlement Location; (b) Personal characteristics; (c) Synergies, shown for Location 14 and Location 31. Rows in panel (c): Male; Age (18-29, 30-39, 40-49, 50+); Speaks English; Education (None/Unknown, Less than Secondary, Secondary, Advanced, University); Nationality (Other, Burma, Iraq, Bhutan, Somalia, Afghanistan). x-axis: Change in Pr(Employment), from -0.3 to 0.3.]
Fig. S2: Variation in predictor effects across cantons in Switzerland. Figure displays
the estimated changes in the probability of third-year employment for various refugee
characteristics, comparing results from Vaud (the largest primarily French-speaking canton)
and Zurich (the largest primarily German-speaking canton). Dots with horizontal lines indicate
point estimates with heteroskedasticity-robust 95% confidence intervals from linear least
squares regression. The unfilled dots on the zero line denote reference categories. The data
are from the Swiss State Secretariat for Migration (SEM) and include working-age refugees
who received subsidiary protection status in the 1999-2013 period (Vaud: n = 2398, Zurich:
n = 4137).
[Figure S2 image not reproduced. Panels: Vaud and Zurich. Rows: Male; Age (18-29, 30-39, 40-49, 50+); Speaks French; Nationality (Other, Somalia, Serbia, Iraq, Afghanistan, Sri Lanka). x-axis: Change in Pr(Employment), from -0.4 to 0.4.]
Fig. S3: Data-driven algorithm for refugee assignment. Figure shows a schematic of the
main stages of the algorithm.
Inputs
L (Model Data): Historical data used for statistical model fitting. Dimensions: N0 x (P+2).
R (Prediction Data): New data whose outcomes will be predicted. Dimensions: N1 x P.

MODELING
1. Fit and validate a bundle of K statistical models (one for each of the K locations) on the L data using covariates to predict outcome.
2. Use the fitted models to construct K (one for each location) predicted outcome values/probabilities for each of the N1 observations in R.
Calibration Options and Metrics: Platt scaling, isotonic regression, calibration curves.
Outputs:
M (Predicted Outcomes): Predicted values/probabilities for the N1 observations in R at each of the K possible locations. Dimensions: N1 x K.
O (Other Covariate data from R). Dimensions: N1 x P.

MAPPING
Map the individual-level predicted values/probabilities in M to a case-level outcome metric, such that each case has K metric values (one for each location). This will be the metric upon which assignment will be optimized.
Case-level Metric: E.g. probability of positive outcome for at least 1 individual in the case,
$\gamma_{ij} = 1 - \prod_{g \in i} (1 - \alpha_{gj})$,
where i denotes case, j denotes location, g denotes individual, and α denotes the individual-level outcome probability.
Output:
M* (Case Metric): Case-level predicted outcome metric for the S1 cases (of the N1 observations) at the K locations. Dimensions: S1 x K.

MATCHING
Use the case-level metric to assign each case to a single location such that an optimal assignment is achieved, according to specified optimality criterion and subject to specified constraints.
Optimality Criterion: Global maximum sum on case-level metric.
Constraints: Restrictions on number of assignments for each location, assignment of particular cases to specific locations, etc.
Optimization Technique: Implementation of optimal matching (e.g. RELAX-IV minimum cost flow solver).
Outputs:
A* (Case Assignments): Location assignments for each of the S1 cases. Dimensions: S1 x 1.
A (Refugee Assignments): Location assignments for each of the N1 individual refugees from R. Dimensions: N1 x 1.
O (Other Covariate data from R). Dimensions: N1 x P.
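To make the matching stage concrete, the following is a minimal sketch that finds the assignment maximizing the summed case-level metric subject to per-location capacity limits. It uses exhaustive search as a stand-in for a minimum cost flow solver such as RELAX-IV, so it is only practical for very small inputs. The function name, the matrix values, and the capacities are hypothetical.

```python
from itertools import product


def optimal_assignment(case_metric, capacity):
    """Exhaustively search for the best capacity-feasible assignment.

    case_metric[i][j] is the case-level metric for case i at location j
    (an S1 x K matrix). capacity[j] is the maximum number of cases that
    location j may receive. Returns (assignment, total), where
    assignment[i] is the location index chosen for case i.
    This is a brute-force stand-in for a min-cost flow solver.
    """
    n_cases, n_locations = len(case_metric), len(capacity)
    best_total, best_assignment = float("-inf"), None
    for assignment in product(range(n_locations), repeat=n_cases):
        counts = [assignment.count(j) for j in range(n_locations)]
        if any(c > cap for c, cap in zip(counts, capacity)):
            continue  # violates a location capacity constraint
        total = sum(case_metric[i][j] for i, j in enumerate(assignment))
        if total > best_total:
            best_total, best_assignment = total, assignment
    return best_assignment, best_total


# Hypothetical 3 x 2 case-level metric matrix (illustrative values only).
case_metric = [
    [0.60, 0.30],
    [0.70, 0.20],
    [0.50, 0.45],
]
capacity = [2, 1]  # location 0 takes at most 2 cases, location 1 at most 1
assignment, total = optimal_assignment(case_metric, capacity)
```

Here the third case is sent to the second location because it loses the least by moving, which is the kind of trade-off the global optimality criterion resolves across all cases at once.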
Fig. S4: Probability calibration plot (and histogram) for 2016 Q3 U.S. backtest. The top
panel (calibration plot) illustrates the reliability of the predicted probabilities of employment
for the 2016 Q3 test set (n = 919). Each point represents a bin of refugees who arrived in
2016 Q3, and the bins cover equal-sized intervals. The x-axis measures the mean predicted
probability of employment at assigned locations for each bin, and the y-axis measures the
fraction of refugees in each bin who were actually employed at their assigned locations. The
bottom panel (histogram) shows the distribution of predicted probabilities of employment at the
assigned locations.
[Figure S4 image not reproduced. Panels: (a) Calibration plot, y-axis: Fraction of Positives; (b) Histogram, y-axis: Count. Shared x-axis: Predicted Probability, from 0.00 to 1.00.]
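The points of a calibration plot of this kind can be computed as in the following minimal sketch, which groups predictions into equal-width bins and reports the mean predicted probability and the observed fraction employed in each non-empty bin. The function name, the default of 10 bins, and the sample data are assumptions for illustration and are not drawn from the study's code or data.

```python
def calibration_points(predicted, employed, n_bins=10):
    """Compute calibration-plot points using equal-width bins on [0, 1].

    predicted: predicted probabilities of employment.
    employed:  observed outcomes (1 = employed, 0 = not employed).
    Returns one tuple per non-empty bin:
    (mean predicted probability, fraction actually employed, count).
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(predicted, employed):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in last bin
        bins[index].append((p, y))
    points = []
    for members in bins:
        if not members:
            continue
        mean_pred = sum(p for p, _ in members) / len(members)
        frac_pos = sum(y for _, y in members) / len(members)
        points.append((mean_pred, frac_pos, len(members)))
    return points


# Hypothetical predictions and outcomes (illustrative only).
predicted = [0.05, 0.08, 0.32, 0.38, 0.35, 0.91]
employed = [0, 0, 0, 1, 0, 1]
points = calibration_points(predicted, employed)
```

Well-calibrated predictions yield points close to the identity line, where the mean predicted probability in a bin matches the observed fraction employed. The bin counts correspond to the histogram panel.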
Fig. S5: Probability calibration plot (and histogram) for 2013 Switzerland backtest. The
top panel (calibration plot) illustrates the reliability of the predicted probabilities of third-year
employment for the 2013 test set (n = 888). Each point represents a bin of refugees who arrived
in 2013, and the bins cover decile intervals of the predicted probabilities (deciles are used given
the extreme skew of the distribution). The x-axis measures the mean predicted probability of
employment at assigned cantons for each bin, and the y-axis measures the fraction of refugees
in each bin who were actually employed at the end of their third year at their assigned cantons.
The bottom panel (histogram) shows the distribution of predicted probabilities of employment
at the assigned cantons.
[Figure S5 image not reproduced. Panels: (a) Calibration plot, y-axis: Fraction of Positives; (b) Histogram, y-axis: Count. Shared x-axis: Predicted Probability, from 0.00 to 1.00.]
Fig. S6: Employment gains from data-driven refugee assignment in the United States for
various time periods. Figure displays the results of backtesting the algorithm using refugees
who arrived in the United States in different quarters. Panels show the empirical cumulative
distribution functions (ECDFs) of the refugees’ predicted 90-day employment probabilities
under their actual and algorithmic assignments for refugees who arrived in 2016 quarter 3 (panel
a, n = 919), 2016 quarter 2 (panel b, n = 506), 2016 quarter 1 (panel c, n = 367), and 2015
quarter 4 (panel d, n = 366), respectively. The data include working-age refugees resettled by
one of the largest U.S. resettlement agencies.
[Figure S6 image not reproduced. Panels: (a) Employment gains for 2016, quarter 3 arrivals; (b) Employment gains for 2016, quarter 2 arrivals; (c) Employment gains for 2016, quarter 1 arrivals; (d) Employment gains for 2015, quarter 4 arrivals. y-axis: Fraction of Refugees with Equal or Lower Predicted Probability of Employment.]
Fig. S7: Employment gains from data-driven refugee assignment in Switzerland for
various time periods. Figure displays the results of backtesting the algorithm using refugees
who arrived in Switzerland in different years. Panels show the empirical cumulative
distribution functions (ECDFs) of the refugees’ predicted third-year employment probabilities
under their actual and algorithmic assignments for refugees who arrived in 2010 (panel a,
n = 731), 2011 (panel b, n = 1043), 2012 (panel c, n = 1110), and 2013 (panel d, n = 888),
respectively. The data are from the Swiss State Secretariat for Migration (SEM) and include
working-age refugees who received subsidiary protection status.
[Figure S7 image not reproduced. Panels: (a) Employment gains for 2010 arrivals; (b) Employment gains for 2011 arrivals; (c) Employment gains for 2012 arrivals; (d) Employment gains for 2013 arrivals. Legend: Assignment (Actual, Algorithmic). x-axis: Predicted Probability of Employment. y-axis: Fraction of Refugees with Equal or Lower Predicted Probability of Employment.]
Fig. S8: Employment gains from data-driven refugee assignment in the United States
using various case-level metrics. Figure displays the results of backtesting the algorithm
using refugees who arrived in the United States in 2016 quarter 3 (n = 919). Panels
show the empirical cumulative distribution functions (ECDFs) of the refugees’ predicted 90-
day employment probabilities under their actual and algorithmic assignments for refugees
who arrived in 2016 quarter 3 when the following mapping functions are used to transform
individual refugee-level predicted probabilities to a case-level metric: probability that at least
one refugee within case is employed (panel a), mean probability of employment within case
(panel b), maximum probability of employment within case (panel c), minimum probability of
employment within case (panel d). The data include working-age refugees resettled by one of
the largest U.S. resettlement agencies.
[Figure S8 image not reproduced. Panels: (a) Default mapping function; (b) Alternative mapping: Mean probability within case; (c) Alternative mapping: Max probability within case; (d) Alternative mapping: Min probability within case. y-axis: Fraction of Refugees with Equal or Lower Predicted Probability of Employment.]
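The four mapping functions named in the caption can be sketched as follows. This is a minimal illustration, assuming that the individual-level predicted probabilities for one case are stored as a list of rows (one per case member) with one column per location. The function names and the example values are hypothetical.

```python
from math import prod


def at_least_one(probs):
    """Probability that at least one case member is employed.

    Assumes employment outcomes within a case are independent.
    """
    return 1 - prod(1 - p for p in probs)


def mean_prob(probs):
    """Mean predicted probability of employment within the case."""
    return sum(probs) / len(probs)


# Mapping functions from individual-level to case-level metric.
MAPPINGS = {
    "at_least_one": at_least_one,  # default mapping
    "mean": mean_prob,
    "max": max,
    "min": min,
}


def case_metric(individual_probs, mapping="at_least_one"):
    """Map one case's individual probabilities to K case-level values.

    individual_probs[g][j] is member g's predicted probability of
    employment at location j. Returns a list with one value per location.
    """
    fn = MAPPINGS[mapping]
    n_locations = len(individual_probs[0])
    return [fn([member[j] for member in individual_probs])
            for j in range(n_locations)]


# Hypothetical two-person case with two candidate locations.
case = [[0.5, 0.2],
        [0.4, 0.1]]
default_metric = case_metric(case)           # approximately [0.7, 0.28]
mean_metric = case_metric(case, "mean")      # approximately [0.45, 0.15]
```

Only the default mapping relies on the independence assumption. The mean, maximum, and minimum mappings do not.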
Fig. S9: Employment gains from data-driven refugee assignment in Switzerland for
shorter- and longer-term outcomes. Figure displays the results of backtesting the algorithm
when optimizing on shorter- and longer-term employment outcomes. The test set consists of refugees
who arrived in 2012. Panels show the empirical cumulative distribution functions (ECDFs)
of the refugees’ predicted employment probabilities at the end of their second year (panel a,
n = 1114), third year (panel b, n = 1110), and fourth year in Switzerland (panel c, n = 1067)
under their actual and algorithmic assignments. The data are from the Swiss State Secretariat for
Migration (SEM) and include working-age refugees who received subsidiary protection status.
[Figure S9 image not reproduced. Panels: (a) second-year employment (per the caption; panel title not recovered); (b) Gains for third-year employment; (c) Gains for fourth-year employment. Legend: Assignment (Actual, Algorithmic). y-axis: Fraction of Refugees with Equal or Lower Predicted Probability of Employment.]
Fig. S10: Employment gains from data-driven refugee assignment in the United States in
2016 Q3 for various training period lengths. Figure displays the results of backtesting the
algorithm using refugees who arrived in the United States in 2016 Q3 (n = 919), using different
training period lengths for the modeling stage of the algorithm. Panels show the empirical
cumulative distribution functions (ECDFs) of the refugees’ predicted 90-day employment
probabilities under their actual and algorithmic assignments given training using a period of
5.5 years prior to 2016 Q3 (panel a), a training period of 4.5 years (panel b), a training period of
3.5 years (panel c), and a training period of 2.5 years (panel d), respectively. The data include
working-age refugees resettled by one of the largest U.S. resettlement agencies.
[Figure S10 image not reproduced. Panels: (a) Employment gains given 5.5 years of training data; (b) Employment gains given 4.5 years of training data; (c) Employment gains given 3.5 years of training data; (d) Employment gains given 2.5 years of training data. Legend: Assignment (Actual, Algorithmic). y-axis: Fraction of Refugees with Equal or Lower Predicted Probability of Employment.]
Fig. S11: Employment gains from data-driven refugee assignment in the United States
for various location assignment constraints. Figure displays the results of backtesting the
algorithm with different location assignment constraints for refugees who arrived in the United
States in quarter 3, 2016 (n = 919). The plot shows the empirical cumulative distribution
functions (ECDFs) of the refugees’ predicted 90-day employment probabilities under their
actual and various algorithmic assignments. Algorithmic 0% imposes the constraint that each
location can only receive as many cases under the algorithmic assignment as were received
in actuality. Algorithmic 10%, 20%, 30%, 40%, and 50% allow for assignments with up to
10%, 20%, 30%, 40%, and 50% more cases being assigned to a location than were received
in actuality, respectively. The projected employment rate under the algorithmic assignment
for each scenario (beginning with the 0% scenario) is 48%, 50%, 51%, 52%, 53%, and
54%, respectively. Compared to the observed baseline employment rate of 34%, these gains
correspond to 41%, 47%, 50%, 54%, 56%, and 59% increases above the baseline, respectively.
The data include working-age refugees resettled by one of the largest U.S. resettlement
agencies.
[Figure S11 image not reproduced. Single panel. Legend: Assignment (Actual, Algorithmic 0%, Algorithmic 10%, Algorithmic 20%, Algorithmic 30%, Algorithmic 40%, Algorithmic 50%). y-axis: Fraction of Refugees with Equal or Lower Predicted Probability of Employment.]
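The relationship between the projected employment rates and the percentage gains in the caption can be reproduced with the following minimal sketch. It uses the rounded rates printed in the caption, so a result may differ by a percentage point from the reported figure. For example, the 30% scenario gives 53% here versus the reported 54%, presumably because the reported gains were computed from unrounded rates. The variable and function names are hypothetical.

```python
def relative_gain(projected_rate, baseline_rate):
    """Relative increase of the projected rate over the baseline rate."""
    return projected_rate / baseline_rate - 1


# Rounded values as printed in the caption of Fig. S11.
baseline = 0.34  # observed baseline employment rate
projected = {0: 0.48, 10: 0.50, 20: 0.51, 30: 0.52, 40: 0.53, 50: 0.54}

# Gain above baseline, in whole percent, keyed by the allowed slack (%).
gains_pct = {
    slack: round(100 * relative_gain(rate, baseline))
    for slack, rate in projected.items()
}
```

Note that these gains are relative increases over the baseline rate, not percentage-point differences: moving from 34% to 48% is a 14 percentage point change but a gain of about 41%.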
Tables
Table S1: Descriptive Statistics for U.S. Refugee Sample
Variable                   Mean    SD
Male                       0.53    0.50
Speaks English             0.42    0.49
Age:
  18-29                    0.44    0.50
  30-39                    0.28    0.45
  40-49                    0.16    0.37
  50+                      0.11    0.31
Education:
  None/Unknown             0.18    0.39
  Less than Secondary      0.39    0.49
  Secondary                0.21    0.41
  Advanced                 0.10    0.30
  University               0.12    0.33
Origin:
  Burma                    0.23    0.42
  Iraq                     0.20    0.40
  Bhutan                   0.13    0.34
  Somalia                  0.11    0.31
  Afghanistan              0.07    0.25
  Other                    0.26    0.44
Employed                   0.23    0.42
Sample consists of refugees of working age that were resettled
by one of the largest resettlement agencies and arrived in the
period from quarter 1, 2011 to quarter 3, 2016. N = 33,782.
Table S2: Descriptive Statistics for Swiss Refugee Sample
Variable                   Mean    SD
Male                       0.59    0.49
Speaks French              0.07    0.25
Age:
  18-29                    0.59    0.49
  30-39                    0.26    0.44
  40-49                    0.09    0.28
  50+                      0.06    0.23
Country of Origin:
  Serbia                   0.14    0.35
  Somalia                  0.15    0.36
  Afghanistan              0.12    0.33
  Sri Lanka                0.09    0.28
  Iraq                     0.13    0.34
  Other                    0.37    0.48
Employed Year 1            0.04    0.19
Employed Year 2            0.15    0.36
Employed Year 3            0.22    0.42
Employed Year 4            0.28    0.45
Employed Year 5            0.33    0.47
Sample consists of subsidiary protection refugees
of working age that were assigned by the Swiss
State Secretariat for Migration in the 1999-2013
period. N = 22,159.
Table S3: Cantons of Switzerland
Canton                     Abbreviation
Aargau                     AG
Appenzell Ausserrhoden     AR
Appenzell Innerrhoden      AI
Basel-Landschaft           BL
Basel-Stadt                BS
Berne                      BE
Fribourg                   FR
Geneva                     GE
Glarus                     GL
Grisons                    GR
Jura                       JU
Lucerne                    LU
Neuchâtel                  NE
Nidwalden                  NW
Obwalden                   OW
Schaffhausen               SH
Schwyz                     SZ
Solothurn                  SO
St. Gallen                 SG
Thurgau                    TG
Ticino                     TI
Uri                        UR
Valais                     VS
Vaud                       VD
Zug                        ZG
Zürich                     ZH
References and Notes
1. J. Smith, L. Daynes, Borders and migration: An issue of global health importance. Lancet
Glob. Health 4, e85–e86 (2016). doi:10.1016/S2214-109X(15)00243-0
2. Médecins Sans Frontières, The Illness of Migration: Ten Years of Medical Humanitarian
Assistance to Migrants in Europe and in Transit Countries (2013);
www.aerzte-ohne-grenzen.de/sites/germany/files/attachments/msf-the-illness-of-migration-2013.pdf.
3. P. Connor, Explaining the refugee gap: Economic outcomes of refugees versus other
immigrants. J. Refug. Stud. 23, 377–397 (2010). doi:10.1093/jrs/feq025
4. J. Hainmueller, D. Hangartner, D. Lawrence, When lives are put on hold: Lengthy asylum
processes decrease employment among refugees. Sci. Adv. 2, e1600432 (2016).
doi:10.1126/sciadv.1600432
5. M. Marbach, J. Hainmueller, D. Hangartner, The Long-Term Impact of Employment Bans on
the Economic Integration of Refugees (Stanford-Zurich Immigration Policy Lab
Working Paper 17-03, 2017); https://ssrn.com/abstract=3078172.
6. Although our focus is on refugee resettlement, the same issues also apply to the assignment of
asylum seekers. Resettled refugees are persons who have officially been granted refugee
status in advance of their arrival into the host country. In contrast, an asylum seeker is a
person who has fled his or her home country and submitted a formal request for asylum
in a host country. Authorities in that country then process the request and decide whether
to grant official refugee status to the asylum seeker. This determination can take several
years to conclude, and in the interim, the asylum seeker is typically placed within a
specific location in the host country.
7. P.-A. Edin, P. Fredriksson, O. Åslund, Ethnic enclaves and the economic success of
immigrants—Evidence from a natural experiment. Q. J. Econ. 118, 329–357 (2003).
doi:10.1162/00335530360535225
8. L. A. Beaman, Social networks and the dynamics of labour market outcomes: Evidence from
refugees resettled in the US. Rev. Econ. Stud. 79, 128–161 (2011).
doi:10.1093/restud/rdr017
9. A. P. Damm, Ethnic enclaves and immigrant labor market outcomes: Quasi-experimental
evidence. J. Labor Econ. 27, 281–314 (2009). doi:10.1086/599336
10. J. Fernández-Huertas Moraga, H. Rapoport, Tradable immigration quotas. J. Public Econ.
115, 94–108 (2014). doi:10.1016/j.jpubeco.2014.04.002
11. J. Fernández-Huertas Moraga, H. Rapoport, Tradable refugee-admission quotas and EU
asylum policy. CESifo Econ. Stud. 61, 638–672 (2015). doi:10.1093/cesifo/ifu037
12. T. Andersson, L. Ehlers, Assigning Refugees to Landlords in Sweden: Stable Maximum
Matchings (Lund University, 2016);
http://project.nek.lu.se/publications/workpap/papers/wp16_18.pdf.
13. D. Delacrétaz, S. D. Kominers, A. Teytelboym, Refugee Resettlement (University of
Melbourne, 2016); www.t8el.com/jmp.pdf.
14. Fernández-Huertas Moraga and Rapoport (10, 11) couple an auction mechanism for
tradeable refugee quotas with a preference-matching algorithm that optimizes over
refugees’ preferences for resettlement countries and countries’ preferences over refugee
types. Andersson and Ehlers (12) focus on within-country matching and develop an
algorithm to find a stable maximum matching of refugees to landlords given induced
preferences for landlords and refugee families. Delacrétaz et al. (13) provide algorithms
that optimize match efficiency subject to multidimensional capacity constraints and
incorporate refugee preferences and location priorities.
15. The formula we use is $y_{gj} = 1 - \prod_{i \in g} (1 - \alpha_{ij})$, where $\alpha_{ij}$ corresponds to the predicted
probability of a positive employment outcome for refugee i at location j, and g denotes a
particular case. Note that this formula uses a simplifying assumption that the
probabilities of employment for refugees within a case are independent. See the
supplement for additional details and alternative mapping functions—the mean,
maximum, and minimum predicted probability of employment within each case—that do
not require this assumption.
16. B. B. Hansen, S. O. Klopfer, Optimal full matching and related designs via network flows. J.
Comput. Graph. Stat. 15, 609–627 (2006). doi:10.1198/106186006X137047
17. D. P. Bertsekas, P. Tseng, RELAX-IV: A Faster Version of the RELAX Code for Solving
Minimum Cost Flow Problems (MIT Laboratory for Information and Decision Systems,
1994).
18. Each location can only accommodate a limited number of cases each year. Other less
common restrictions include the inability of certain locations to accept cases with severe
medical conditions or particular languages. The assignment algorithm is designed to
incorporate such constraints.
19. In the Swiss data, employment at the end of the first calendar year after arrival is associated
with a 62–percentage point (P < 0.0001) increase in the probability of employment at the
end of the second year and a 43–percentage point (P < 0.0001) increase at the end of the
third calendar year; the P values are from a linear regression of actual second- or third-
year employment on first-year employment, respectively.
20. J. H. Friedman, T. Hastie, R. Tibshirani, The Elements of Statistical Learning (Springer, ed.
2, 2009).
21. J. H. Friedman, Greedy function approximation: A gradient boosting machine. Ann. Stat. 29,
1189–1232 (2001). doi:10.1214/aos/1013203451
22. L. Breiman, Random forests. Mach. Learn. 45, 5–32 (2001). doi:10.1023/A:1010933404324
23. H. Zou, T. Hastie, Regularization and variable selection via the elastic net. J. R. Stat. Soc. B
67, 301–320 (2005). doi:10.1111/j.1467-9868.2005.00503.x
24. J. Hainmueller, C. Hazlett, Kernel regularized least squares: Reducing misspecification bias
with a flexible and interpretable machine learning approach. Polit. Anal. 22, 143–168
(2014). doi:10.1093/pan/mpt019
25. B. Zadrozny, C. Elkan, Obtaining calibrated probability estimates from decision trees and
naive Bayesian classifiers. In Proceedings of the 18th International Conference on
Machine Learning (2001), pp. 609–616.
26. A. Niculescu-Mizil, R. Caruana, Predicting good probabilities with supervised learning. In
Proceedings of the 22nd International Conference on Machine Learning (2005), pp.
625–632.
27. A. Niculescu-Mizil, R. Caruana, Obtaining calibrated probabilities from boosting. In
Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence (2005), pp.
413–420.
28. B. Zadrozny, C. Elkan, Transforming classifier scores into accurate multiclass probability
estimates. In Proceedings of the 8th ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining (2002), pp. 694–699.
29. J. C. Platt, in Advances in Large Margin Classifiers, A. J. Smola, P. Bartlett, B. Schölkopf,
D. Schuurmans, Eds. (MIT Press, 2000), chap. 5.
30. J. H. Friedman, Stochastic gradient boosting. Comput. Stat. Data Anal. 38, 367–378 (2002).
doi:10.1016/S0167-9473(01)00065-2
31. G. Ridgeway, Package ‘gbm’, Tech. rep., CRAN (2017).
32. L. Breiman, Population theory for boosting ensembles. Ann. Stat. 32, 1–11 (2004).
doi:10.1214/aos/1079120126
33. W. Jiang, Process consistency for adaboost. Ann. Stat. 32, 13–29 (2004).
doi:10.1214/aos/1079120128
34. T. Zhang, B. Yu, Boosting with early stopping: Convergence and consistency. Ann. Stat. 33,
1538–1579 (2005). doi:10.1214/009053605000000255
35. R. Burkard, M. Dell’Amico, S. Martello, Assignment Problems (Society for Industrial and
Applied Mathematics, revised reprint, 2012).
36. H. W. Kuhn, The Hungarian method for the assignment problem. Nav. Res. Logist. 2, 83–97
(1955). doi:10.1002/nav.3800020109
37. J. Munkres, Algorithms for the assignment and transportation problems. J. Soc. Ind. Appl.
Math. 5, 32–38 (1957). doi:10.1137/0105003