NCSS Statistical Software NCSS.com
© NCSS, LLC. All Rights Reserved.

Chapter 123

Data Matching – Optimal and Greedy

Introduction
This procedure is used to create treatment-control matches based on propensity scores and/or observed covariate
variables. Both optimal and greedy matching algorithms are available (as two separate procedures), along with
several options that allow the user to customize each algorithm for their specific needs. The user is able to choose
the number of controls to match with each treatment (e.g., 1:1 matching, 1:k matching, and variable (full)
matching), the distance calculation method (e.g., Mahalanobis distance, propensity score difference, sum of rank
differences, etc.), and whether or not to use calipers for matching. The user is also able to specify variables whose
values must match exactly for both treatment and controls in order to assign a match. NCSS outputs a list of
matches by match number along with several informative reports and optionally saves the match numbers directly
to the database for further analysis.

Matching Overview

Observational Studies
In observational studies, investigators do not control the assignment of treatments to subjects. Consequently, a
difference in covariates may exist between treatment and control groups, possibly resulting in undesired biases.
Matching is often used to balance the distributions of observed (and possibly confounding) covariates.
Furthermore, many observational studies have relatively few treatment group subjects compared to control group
subjects, and the cost of obtaining outcome or response data is often high for both groups. Matching is used in
this scenario to reduce the number of control subjects
included in the study. Common matching methods include Mahalanobis metric matching, propensity score
matching, and average rank sum matching. Each of these will be discussed later in this chapter. For a thorough
treatment of data matching for observational studies, the reader is referred to chapter 1.2 of D'Agostino, Jr.
(2004).

The Propensity Score

Ideally, one would match each treatment subject with a control subject (or subjects) that was an exact match on
each of the observed covariates. As the number of covariates increases or the ratio of the number of control
subjects to treatment subjects decreases, it becomes less and less likely that an exact match will be found for each
treatment subject. Propensity scores can be used in this situation to simultaneously control for the presence of
several covariate factors. The propensity score was introduced by Rosenbaum and Rubin (1983). The propensity
score for subject i (i = 1, …, N) is defined as the conditional probability of assignment to a treatment (Zi = 1)
versus the control (Zi = 0), given a set (or vector) of observed covariates, xi. Mathematically, the propensity score
for subject i can be expressed as

$$e(x_i) = \operatorname{pr}(Z_i = 1 \mid X_i = x_i).$$
It is assumed that the Zi’s are independent, given the X’s. The observed covariates, xi, are not necessarily the same
covariates used in the matching algorithm, yi, although they could be. Rosenbaum and Rubin (1985a) suggest
using the logit of the estimated propensity score for matching because the distribution of transformed scores is
often approximately normal. The logit of the propensity score is defined as

$$q(x) = \log\left(\frac{1 - e(x)}{e(x)}\right).$$

Matching on the observed propensity score (or logit propensity score) can balance the overall distribution of
observed covariates between the treatment and control groups. The propensity score is often calculated using
logistic regression or discriminant analysis with the treatment variable as the dependent (group) variable and the
background covariates as the independent variables. Research suggests that care must be taken when creating the
propensity score model (see Austin et al. (2007)). For more information about logistic regression or discriminant
analysis, see the corresponding chapters in the NCSS manuals.
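
As a small illustration of the transformation above, the following Python sketch applies the chapter's definition q(x) = log((1 − e(x)) / e(x)) to a few propensity scores. The function name `logit_propensity` and the sample scores are made up for this example; they are not part of NCSS.

```python
import math

def logit_propensity(e: float) -> float:
    """Return q(x) = log((1 - e) / e) for a propensity score e in (0, 1)."""
    if not 0.0 < e < 1.0:
        raise ValueError("propensity score must lie strictly between 0 and 1")
    return math.log((1.0 - e) / e)

# Hypothetical propensity scores, for illustration only.
for score in (0.25, 0.5, 0.75):
    print(score, logit_propensity(score))
```

With this sign convention a larger propensity score gives a smaller q(x). Matching only uses absolute differences of q(x), so the sign does not change any distance.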

Optimal vs. Greedy Matching

Two separate procedures are documented in this chapter, Optimal Data Matching and Greedy Data Matching.
The goal of both algorithms is to produce a matched sample that balances the distribution of observed covariates
between the treatment and matched-control groups. Both algorithms allow for the creation of 1:1 or 1:k matched
pairings. Gu and Rosenbaum (1993) compared the greedy and optimal algorithms and found that “optimal
matching is sometimes noticeably better than greedy matching in the sense of producing closely matched pairs,
sometimes only marginally better, but it is no better than greedy matching in the sense of producing balanced
matched samples.” The choice of the algorithm depends on the research objectives, the desired analysis, and cost
considerations. We recommend using the optimal matching algorithm where possible.
The optimal and greedy algorithms differ in three fundamental ways:
1. Treatment of Previously-Matched Subjects
2. Complete vs. Incomplete Matched-Pair Samples
3. Variable (Full) Matching

Treatment of Previously-Matched Subjects

Optimal matching refers to the use of an optimization method based on the Relax-IV algorithm written by Dimitri
P. Bertsekas (see Bertsekas (1991)), which minimizes the overall sum of pair-wise distances between treatment
subjects and matched control subjects. The Relax-IV algorithm is based on network flow theory, and matching is
just one of its many uses. Optimal matching is not a linear matching algorithm in the sense that as the algorithm
proceeds, matches are created, broken, and rearranged in order to minimize the overall sum of match distances.
Greedy matching, on the other hand, is a linear matching algorithm: when a match between a treatment and
control is created, the control subject is removed from any further consideration for matching. When the number
of matches per treatment is greater than one (i.e., 1:k matching), the greedy algorithm finds the best match (if
possible) for each treatment before returning and creating the second match, third match, etc. Once a treatment
subject has been matched with the user-specified number of control subjects, the treatment subject is also
removed from further consideration. A familiar example of a greedy algorithm is forward selection used in
multiple regression model creation.
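
The difference between the two approaches can be seen on a toy problem. The sketch below is not the NCSS implementation: the greedy routine processes treatments in row order, and the "optimal" routine simply tries every assignment, which is only feasible for very small samples (NCSS uses the Relax-IV network flow algorithm instead). The distance is the absolute propensity score difference, and all names and numbers are hypothetical.

```python
from itertools import permutations

def greedy_pairs(treated, controls):
    """Greedy 1:1 matching in row order: each treatment takes the nearest unused control."""
    available = set(range(len(controls)))
    pairs = []
    for i, t in enumerate(treated):
        if not available:
            break  # no controls left: remaining treatments stay unmatched
        j = min(available, key=lambda k: (abs(t - controls[k]), k))
        available.remove(j)  # a matched control is never reconsidered
        pairs.append((i, j))
    return pairs

def optimal_pairs(treated, controls):
    """Exhaustive search for the 1:1 assignment with the smallest total distance.

    Assumes there are at least as many controls as treatments.
    """
    best = min(
        permutations(range(len(controls)), len(treated)),
        key=lambda perm: sum(abs(t - controls[j]) for t, j in zip(treated, perm)),
    )
    return list(enumerate(best))

def total_distance(pairs, treated, controls):
    return sum(abs(treated[i] - controls[j]) for i, j in pairs)

treated = [0.50, 0.40]   # hypothetical treatment propensity scores
controls = [0.45, 0.60]  # hypothetical control propensity scores

g = greedy_pairs(treated, controls)
o = optimal_pairs(treated, controls)
print("greedy :", g, round(total_distance(g, treated, controls), 2))
print("optimal:", o, round(total_distance(o, treated, controls), 2))
```

In this example the greedy pass gives the first treatment its closest control and leaves the second treatment with a poor match (total distance about 0.25), while the exhaustive search rearranges the pairs to reach a total of about 0.15.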

Complete vs. Incomplete Matched-Pair Samples

Optimal matching only allows for complete matched-pair samples, while greedy matching also allows for
incomplete matched-pair samples. A complete matched-pair sample is a sample for which every treatment is
matched with at least one control. An incomplete matched-pair sample is a sample for which the number of
treatment subjects matched is less than the total number of treatment subjects in the reservoir. Rosenbaum and
Rubin (1985b) present strong reasons for avoiding incomplete matched-pair samples.

Variable (Full) Matching

Variable (or “Full”) matching is only available using the optimal matching algorithm. In variable matching, a
different number of controls may be matched with each treatment. Each control is used only once, and each
treatment receives at least one control. All eligible controls (e.g. all controls for which at least one treatment-
control distance is non-infinite) are matched. Results from Gu and Rosenbaum (1993) suggest that in terms of
bias reduction, full matching performs much better than 1:k matching. If we require that every treatment have the
same number of controls, and the covariate distributions of the two groups are not the same, then some
treatments will be paired with controls that are not good matches. Variable matching, on the other hand, is more
flexible in allowing control subjects to pair with the closest treatment subject in every case.
The gains in bias reduction for variable matching over 1:k matching, however, must be weighed against other
considerations such as simplicity and aesthetics. The analysis after 1:k matching would arguably be simpler;
a more complex analysis method (e.g. stratified analysis) would be employed after variable matching than would
be after 1:k matching.

The Distance Calculation Method

Several different distance calculation methods are available in the matching procedures in NCSS. The different
methods are really variations of three common distance measures:
1. Mahalanobis Distance
2. Propensity Score Difference
3. Sum of Rank Differences
The variations arise when using calipers for matching or when using forced match variables. A caliper is defined
in this context as a restricted subset of controls whose propensity scores are within a specified amount (c) of the
treatment subject’s propensity score. A forced match variable contains values which must match exactly in the
treatment and control for the subjects to be considered for matching. If the values for the forced match variables
do not agree, then the distance between the two subjects is set equal to ∞ (infinity), and a match between the two
is not allowed.
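
The effect of calipers and forced match variables on a distance can be sketched in a few lines of Python. The example below uses the propensity score difference as the base distance; the function name, argument layout, and sample values are assumptions made for illustration and do not come from NCSS.

```python
import math

def psd_within_caliper(q_i, q_j, fm_i, fm_j, c):
    """Propensity score difference, or infinity when the pair is not eligible.

    q_i, q_j   : propensity (or logit propensity) scores of treatment i and control j
    fm_i, fm_j : tuples of forced match values (empty tuples if none are used)
    c          : caliper radius
    """
    if fm_i != fm_j:
        return math.inf  # forced match variables disagree: match not allowed
    diff = abs(q_i - q_j)
    return diff if diff <= c else math.inf  # outside the caliper: not allowed

# Hypothetical subjects
same = ("Hispanic", "Male")
other = ("Black", "Male")
print(psd_within_caliper(0.74, 0.43, same, same, c=0.5))   # inside caliper
print(psd_within_caliper(0.74, 0.43, same, other, c=0.5))  # forced match fails
print(psd_within_caliper(0.74, 0.01, same, same, c=0.5))   # outside caliper
```

Returning infinity rather than raising an error lets a matching routine treat ineligible pairs uniformly: any pair with an infinite distance is simply never selected.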

Distance Measures
The complete list of possible distance measures available in NCSS is as follows:
1. Mahalanobis Distance within Propensity Score Calipers (no matches outside calipers)
2. Mahalanobis Distance within Propensity Score Calipers (matches allowed outside calipers)
3. Mahalanobis Distance including the Propensity Score (if specified)
4. Propensity Score Difference within Propensity Score Calipers (no matches outside calipers)
5. Propensity Score Difference
6. Sum of Rank Differences within Propensity Score Calipers (no matches outside calipers)
7. Sum of Rank Differences within Propensity Score Calipers (matches allowed outside calipers)
8. Sum of Rank Differences including the Propensity Score (if specified)
Distance measures #2 and #7, where matches are allowed outside calipers in caliper matching, are only available
with greedy matching. All others can be used with both the greedy and optimal matching algorithms.

For distance measures that involve propensity score calipers, the caliper size is determined by the user-specified
radius, c. For any treatment subject i, the jth control subject is included in the ith treatment caliper if

$$|q(x_i) - q(x_j)| \le c$$

where $q(x_i) = e(x_i)$ is the propensity score based on the covariates $x_i$. If the logit transformation is used in the
analysis, then $q(x) = \log((1 - e(x)) / e(x))$. The width of each caliper is equal to 2c.

Which Distance Measure to Use?

The best distance measure depends on the number of covariate variables, the variability within the covariate
variables, and possibly other factors. Gu and Rosenbaum (1993) compared the imbalance of Mahalanobis distance
metrics versus the propensity score difference in optimal 1:1 matching for numbers of covariates (P) between 2
and 20 and control/treatment subject ratios between 2 and 6. Mahalanobis distance within propensity score
calipers was always best or second best. When there are many covariates (P = 20), the article suggests that
matching on the propensity score difference is best. The use of Mahalanobis distance (with or without calipers) is
best when there are few covariates on which to match (P = 2). In all cases considered by Gu and Rosenbaum
(1993), the Mahalanobis distance within propensity score calipers was never the worst method of the three.
Rosenbaum and Rubin (1985a) conducted a study of the performance of three different matching methods
(Mahalanobis distance, Mahalanobis distance within propensity score calipers, and propensity score difference) in
a greedy algorithm with matches allowed outside calipers and concluded that the Mahalanobis distance within
propensity score calipers is the best technique among the three. Finally, Rosenbaum (1989) reports parenthetically
that he has had “unpleasant experiences using standard deviations to scale covariates in multivariate matching,
and [he] is inclined to think that either ranks or some more resistant measure of spread should routinely be used
instead.”
Based on these results and suggestions, we recommend using the Mahalanobis Distance within Propensity Score
Calipers as the distance calculation method where possible. The caliper radius to use is based on the amount of
bias that you want removed.
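
For readers who want to see what the recommended measure computes, here is a minimal pure-Python sketch of the Mahalanobis distance within propensity score calipers. It follows the chapter's definitions: the quadratic form (u_i − u_j)ᵀ C⁻¹ (u_i − u_j) is used without a square root, and C is the sample covariance matrix estimated from the control subjects. The helper names (`covariance_matrix`, `solve`, `mahalanobis_within_caliper`) are made up for this example, and NCSS's own numerical routines are not documented here.

```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix (n - 1 denominator) of equal-length rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(p)]
    return [
        [sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
         for b in range(p)]
        for a in range(p)
    ]

def solve(matrix, rhs):
    """Solve matrix @ z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[k]] for k, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][k] * z[k] for k in range(r + 1, n))
        z[r] = (aug[r][n] - tail) / aug[r][r]
    return z

def mahalanobis_within_caliper(u_i, u_j, q_i, q_j, cov, c, fm_i=(), fm_j=()):
    """Quadratic form (u_i - u_j)^T C^-1 (u_i - u_j) inside the caliper, else infinity."""
    if fm_i != fm_j or abs(q_i - q_j) > c:
        return math.inf
    diff = [a - b for a, b in zip(u_i, u_j)]
    z = solve(cov, diff)  # z = C^-1 (u_i - u_j) without forming the inverse
    return sum(d * w for d, w in zip(diff, z))

# Hypothetical control rows: (covariate, propensity score)
control_rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 9.0]]
C = covariance_matrix(control_rows)
print(C)
print(mahalanobis_within_caliper([2.0, 5.0], [1.0, 2.0], 5.0, 2.0, C, c=4.0))
```

Solving the linear system instead of inverting C keeps the sketch short; for real data a numerical library would normally be used.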

What Caliper Radius to Use?

The performance of distance metrics involving calipers depends to some extent on the caliper radius used. For
instances in the literature where we found reports, comparisons, or studies based on caliper matching, Cochran
and Rubin (1973) was nearly always mentioned as the literature used in determining the caliper radius (or “caliper
width” as they call it) for the study. The following table (Table 2.3.1 from Cochran and Rubin (1973)) can be used
to determine the appropriate coefficient and/or caliper radius to use:

Table 2.3.1 from Cochran and Rubin (1973). Percent reduction in bias of x (values are proportions) for caliper
matching to within $\pm a \sqrt{(\sigma_1^2 + \sigma_2^2)/2}$

a      σ₁²/σ₂² = 1/2    σ₁²/σ₂² = 1    σ₁²/σ₂² = 2
0.2    0.99             0.99           0.98
0.4    0.96             0.95           0.93
0.6    0.91             0.89           0.86
0.8    0.86             0.82           0.77
1.0    0.79             0.74           0.69

The caliper radius to use depends on the desired bias reduction (table body), the coefficient a, and the ratio of the
treatment group sample variance of q(x), $\sigma_1^2$, to the control group sample variance of q(x), $\sigma_2^2$. “Loose
Matching” corresponds to a ≥ 1.0, while “Tight Matching” corresponds to a ≤ 0.2. The caliper radius is calculated
as

$$c = a \sqrt{(\sigma_1^2 + \sigma_2^2)/2} = a \times \text{SIGMA}$$
NCSS allows you to choose the caliper radius using the syntax “a*SIGMA”, where you specify the value for a
(e.g. “0.2*SIGMA”) or by entering the actual value directly for c (e.g. “0.5”). In the case of the former, the
program calculates the variances of the treatment and control group propensity scores for you and determines the
pooled standard deviation, sigma. You may want to run descriptive statistics on the treatment and control group
propensity scores to determine the variance ratio of your data in order to find the appropriate value of a (from the
table above) for your research objectives.
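
The "a*SIGMA" calculation can be reproduced by hand as a check. The sketch below assumes that SIGMA is the square root of the average of the two group sample variances (the pooled standard deviation described above) and that sample variances use the n − 1 denominator. The function names and data are hypothetical.

```python
import math
from statistics import variance

def pooled_sigma(treatment_q, control_q):
    """Square root of the average of the two group sample variances of q(x)."""
    return math.sqrt((variance(treatment_q) + variance(control_q)) / 2.0)

def caliper_radius(a, treatment_q, control_q):
    """Caliper radius c = a * SIGMA."""
    return a * pooled_sigma(treatment_q, control_q)

# Hypothetical (logit) propensity scores for each group
treatment_q = [1.0, 2.0, 3.0]   # sample variance 1
control_q = [2.0, 4.0, 6.0]     # sample variance 4

print("variance ratio:", variance(treatment_q) / variance(control_q))
print("SIGMA:", pooled_sigma(treatment_q, control_q))
print("c for a = 0.2:", caliper_radius(0.2, treatment_q, control_q))
```

The variance ratio printed first is the quantity used to pick a column of the Cochran and Rubin table when choosing a.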

Data Structure
The propensity scores and covariate variables must each be entered in individual columns in the database. Only
numeric values are allowed in propensity score and covariate variables. Blank cells or non-numeric (text) entries
are treated as missing values. If the logit transformation is used, values in the propensity score variable that are
not between zero and one are also treated as missing. A grouping variable containing two (and only two) unique
groups must be present. A data label variable is optional. The following is a subset of the Propensity dataset,
which illustrates the data format required for the greedy and optimal data matching procedures.

Propensity dataset (subset)

ID  Exposure     X1  …  Age  Race       Gender  Propensity
A   Exposed      50  …  45   Hispanic   Male    0.7418116515
B   Not Exposed  4   …  71   Hispanic   Male    0.01078557025
C   Not Exposed  81  …  70   Caucasian  Male    0.0008716385678
D   Exposed      31  …  33   Hispanic   Female  0.5861360724
E   Not Exposed  65  …  38   Black      Male    0.1174339761
F   Exposed      22  …  29   Black      Female  0.07538899371
G   Not Exposed  36  …  57   Black      Female  0.008287371892
H   Not Exposed  31  …  52   Caucasian  Male    0.4250166047
I   Not Exposed  46  …  39   Hispanic   Female  0.2630767334
J   Exposed      3   …  58   Hispanic   Male    0.4858799526
K   Not Exposed  84  …  24   Black      Female  0.1251753736

Procedure Options
This section describes the options available in both the optimal and greedy matching procedures.

Variables Tab
Specify the variables to be used in matching and storage, along with the matching options.

Data Variables

Grouping Variable
Specify the variable that contains the group assignment for each subject. The values in this variable may be text or
numeric, but only two unique values are allowed. One value should designate the treatment group and the other
should designate the control group. The value assigned to the treatment group should be entered under “Treatment
Group” to the right.

Treatment Group
Specify the value in the Grouping Variable that is used to designate the treatment group. This value can be text or
numeric.

Propensity Score Variable

Specify the variable containing the propensity scores to be used for matching. This variable is optional if the
Distance Calculation Method is not specifically based on the propensity score. If no covariate variables are
specified, then you must specify a propensity score variable. If caliper matching is used, this variable must be
specified. Only numeric values are allowed. Text values are treated as missing values in the reports. If the logit
transformation is used, all values in this variable must be between zero and one; otherwise they are treated as
missing. Propensity scores are often obtained using logistic regression or discriminant analysis.

Use Logit
This option specifies whether or not to use the logit transformation on the propensity score. If selected, all
calculations and reports will be based on the logit propensity score (where applicable).
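
The missing-value rules for the propensity score variable can be summarized in code. This is only a sketch of the rules as described in this chapter, not NCSS code: blank or text cells are treated as missing, and when the logit is used, values outside the interval from zero to one are also treated as missing. The function name `parse_propensity` is hypothetical, and "between zero and one" is assumed here to exclude the endpoints, since the logit is undefined there.

```python
import math

def parse_propensity(cell, use_logit):
    """Return the score used for matching, or None if the cell is treated as missing."""
    try:
        value = float(cell)
    except (TypeError, ValueError):
        return None  # blank or non-numeric (text) entry
    if not math.isfinite(value):
        return None
    if use_logit:
        if not 0.0 < value < 1.0:
            return None  # logit undefined: treated as missing
        return math.log((1.0 - value) / value)
    return value

for cell in ("0.5", "", "Exposed", "1.2"):
    print(repr(cell), parse_propensity(cell, use_logit=True))
```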

Forced Match Variable(s)

Specify variables for which the treatment and control values must match exactly in order to create a match. More
than one variable may be specified. This variable is optional. Variables such as gender and race are commonly
used as forced match variables.
The use of forced match variables may greatly restrict the number of possible matches for each treatment. If you
are using greedy matching, the use of this variable may result in unmatched treatment subjects. If you are using
optimal matching, the use of forced match variable(s) may result in an infeasible (unsolvable) problem for the
matching algorithm. If the optimal matching algorithm is unable to find a solution, try eliminating one or more
forced match variable(s).

Covariate Variable(s)
Specify variables to be used in distance calculations between treatment and control subjects. Only numeric values
are allowed. Text values are treated as missing. Covariate variables are optional; however, if no propensity score
variable is specified you must specify at least one covariate variable. If the distance calculation method involves
only the propensity score (e.g. propensity score difference) and one or more covariate variables are specified, then
the covariate variables are only used in group comparison reports (they are not used in matching nor to determine
whether or not a row contains missing values during matching).

Data Label Variable

The values in this variable contain text (or numbers) and are used to identify each row. This variable is optional.

Storage Variable

Store Match Numbers In

Specify a variable to store the match number assignments for each row. This variable is optional. If no storage
variable is specified, the match numbers will not be stored in the database, but matching reports can still be
generated.

Optimization Algorithm Options

Maximum Iterations
Specify the number of optimization iterations to perform before exiting. You may choose a value from the list, or
enter your own. This option is available in order to avoid an infinite loop. We have found that as the number of
Matches per Treatment increases, it takes more and more iterations in order to arrive at a solution.

Matching Options

Distance Calculation Method

Specify the method to be used in calculating distances between treatment and control subjects. If the distance
method involves propensity score calipers, then a Propensity Score Variable must also be specified. Eight
different distance measures are available in NCSS. For the formulas that follow, we will adopt the following
notation:
1. The subscript i refers to the ith treatment subject.
2. The subscript j refers to the jth control subject.
3. $d(i, j)$ is the estimated distance between subjects i and j.
4. $x$ is the vector of observed covariates used to estimate the propensity score.
5. $q(x) = e(x)$ is the propensity score based on the covariates x. If the logit transformation is used in the
analysis, then $q(x) = \log((1 - e(x)) / e(x))$.
6. $y$ is the vector of observed covariates used in the distance calculation. y is not necessarily equivalent to x,
although it could be.
7. $u = (y, q(x))$ is the vector of observed covariates and the propensity score (or logit propensity score).
8. $C$ is the sample covariance matrix of the matching variables (including the propensity score) from the full
set of control subjects.
9. $c$ is the caliper radius. The width of each caliper is 2c.
10. $FM_{i,l}$ and $FM_{j,l}$ are the values of the lth forced match variable for subjects i and j, respectively. If no
forced match variables are specified, then $FM_{i,l} = FM_{j,l}$ for all l.
11. $R_{i,p}$ and $R_{j,p}$ are the ranks of the pth covariate values or propensity score for subjects i and j, respectively.
Average ranks are used in the case of ties.
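
The rank notation in the last item can be made concrete with a short sketch. It computes average ranks (ties share the mean of the ranks they occupy) and then the sum of absolute rank differences between two subjects. The chapter does not state which set of subjects the ranks are computed over; this example assumes all subjects are ranked together. The function names and data are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2.0 + 1.0  # mean of the 1-based ranks pos+1 .. end+1
        for k in order[pos:end + 1]:
            ranks[k] = avg
        pos = end + 1
    return ranks

def rank_sum_distance(ranks_by_variable, i, j):
    """Sum over variables p of |R[i][p] - R[j][p]| for subjects i and j."""
    return sum(abs(r[i] - r[j]) for r in ranks_by_variable)

# Two hypothetical matching variables measured on four subjects
age = [10, 20, 20, 30]
score = [0.9, 0.6, 0.4, 0.1]
ranks = [average_ranks(age), average_ranks(score)]
print(ranks)
print(rank_sum_distance(ranks, 0, 3))
```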
The options are:

• Mahalanobis Distance within Propensity Score Calipers (no matches outside calipers)

$$d(i,j) = \begin{cases} (u_i - u_j)^T C^{-1} (u_i - u_j) & \text{if } |q(x_i) - q(x_j)| \le c \text{ and } FM_{i,l} = FM_{j,l} \text{ for all } l \\ \infty & \text{otherwise} \end{cases}$$

• Mahalanobis Distance within Propensity Score Calipers (matches allowed outside calipers)

$$d(i,j) = \begin{cases} (u_i - u_j)^T C^{-1} (u_i - u_j) & \text{if } |q(x_i) - q(x_j)| \le c \text{ and } FM_{i,l} = FM_{j,l} \text{ for all } l \\ |q(x_i) - q(x_j)| & \text{if } |q(x_i) - q(x_j)| > c \text{ for all unmatched } j \text{ and } FM_{i,l} = FM_{j,l} \text{ for all } l \\ \infty & \text{otherwise} \end{cases}$$

The absolute difference, $|q(x_i) - q(x_j)|$, is only used in assigning matches if there are no available controls
for which $|q(x_i) - q(x_j)| \le c$.

• Mahalanobis Distance including the Propensity Score (if specified)

$$d(i,j) = \begin{cases} (u_i - u_j)^T C^{-1} (u_i - u_j) & \text{if } FM_{i,l} = FM_{j,l} \text{ for all } l \\ \infty & \text{otherwise} \end{cases}$$

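• Propensity Score Difference within Propensity Score Calipers (no matches outside calipers)

$$d(i,j) = \begin{cases} |q(x_i) - q(x_j)| & \text{if } |q(x_i) - q(x_j)| \le c \text{ and } FM_{i,l} = FM_{j,l} \text{ for all } l \\ \infty & \text{otherwise} \end{cases}$$

• Propensity Score Difference

$$d(i,j) = \begin{cases} |q(x_i) - q(x_j)| & \text{if } FM_{i,l} = FM_{j,l} \text{ for all } l \\ \infty & \text{otherwise} \end{cases}$$

• Sum of Rank Differences within Propensity Score Calipers (no matches outside calipers)

$$d(i,j) = \begin{cases} \sum_p |R_{i,p} - R_{j,p}| & \text{if } |q(x_i) - q(x_j)| \le c \text{ and } FM_{i,l} = FM_{j,l} \text{ for all } l \\ \infty & \text{otherwise} \end{cases}$$

• Sum of Rank Differences within Propensity Score Calipers (matches allowed outside calipers)

$$d(i,j) = \begin{cases} \sum_p |R_{i,p} - R_{j,p}| & \text{if } |q(x_i) - q(x_j)| \le c \text{ and } FM_{i,l} = FM_{j,l} \text{ for all } l \\ |q(x_i) - q(x_j)| & \text{if } |q(x_i) - q(x_j)| > c \text{ for all unmatched } j \text{ and } FM_{i,l} = FM_{j,l} \text{ for all } l \\ \infty & \text{otherwise} \end{cases}$$

The absolute difference, $|q(x_i) - q(x_j)|$, is only used in assigning matches if there are no available controls
for which $|q(x_i) - q(x_j)| \le c$.

• Sum of Rank Differences including the Propensity Score (if specified)

$$d(i,j) = \begin{cases} \sum_p |R_{i,p} - R_{j,p}| & \text{if } FM_{i,l} = FM_{j,l} \text{ for all } l \\ \infty & \text{otherwise} \end{cases}$$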

In the Greedy Data Matching procedure, two distance calculation methods are available that are not in the
Optimal Data Matching procedure (option #2 and option #7). Both involve caliper matching with matches
allowed outside calipers. When matches are allowed outside calipers, the algorithm always tries to find matches
inside the calipers first, and only assigns matches outside calipers if a match was not found inside. Matches
outside calipers are created based solely on the propensity score, i.e., if matches outside calipers are allowed and
no available control subject exists that is within c propensity score units of a treatment subject, then the control
subject with the nearest propensity score is matched with the treatment. This type of matching algorithm is
described in Rosenbaum and Rubin (1985a).
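
The "inside first, then nearest propensity score" rule can be sketched for a single treatment subject as follows. This is an illustration of the rule as described above, not NCSS code. The function name, the dictionary layout, and the numbers are assumptions, and forced match variables are left out to keep the example short.

```python
def choose_control(q_i, candidates, inside_distance, c):
    """Pick one control for a treatment, allowing a match outside the caliper.

    q_i             : propensity (or logit) score of the treatment subject
    candidates      : {control id: score} for the still-unmatched controls
    inside_distance : {control id: distance used inside the caliper},
                      for example a Mahalanobis or rank-sum distance
    c               : caliper radius
    """
    inside = [j for j, q_j in candidates.items() if abs(q_i - q_j) <= c]
    if inside:
        # At least one control lies inside the caliper: use the main distance.
        return min(inside, key=lambda j: inside_distance[j])
    if candidates:
        # Nothing inside the caliper: fall back to the nearest propensity score.
        return min(candidates, key=lambda j: abs(q_i - candidates[j]))
    return None  # no controls left at all

# Hypothetical controls and distances
distances = {"B": 3.2, "C": 1.1, "E": 0.4, "G": 2.0}
print(choose_control(0.50, {"B": 0.45, "C": 0.58, "E": 0.90}, distances, c=0.1))
print(choose_control(0.50, {"E": 0.90, "G": 0.20}, distances, c=0.1))
```

In the first call, control E has the smallest main distance but lies outside the caliper, so C is chosen. In the second call no control is inside the caliper, so the control with the nearest propensity score (G) is chosen.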

Matches per Treatment

Choose the number of controls to match with each treatment. You may choose one of the values from the list or
enter an integer value of your own. For greedy matching, the value you enter can be no larger than
controls/treatments rounded up to the next highest integer. When the number of matches per treatment is greater
than one, the greedy algorithm finds the best match (if possible) for each treatment before returning and creating
the second match, third match, etc. For optimal matching, the value can be no larger than controls/treatments
rounded down to the next lowest integer. The options are:

• Variable (Full Matching) (Optimal Data Matching Only)

This option causes the optimal matching algorithm to match a variable number of controls to each treatment.
Each control is used only once, and each treatment is matched with at least one control. All eligible controls
(e.g. all controls where at least one treatment-control distance is non-infinite) are matched.

• Maximum Possible
This option causes the program to assign the maximum number (k) of matches that can be made between
treatments and controls. If greedy matching is used and controls/treatments is not an integer, then using this
option will result in incomplete pair-matching.

• Integer Values
If an integer value is entered or selected, then the program attempts to create the specified number of control
matches for each treatment.
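
The upper limits on Matches per Treatment described in this section amount to a ceiling and a floor of the control/treatment ratio. A minimal sketch, with a hypothetical function name:

```python
import math

def max_matches_per_treatment(n_controls, n_treatments, algorithm):
    """Largest allowed number of controls per treatment for each algorithm."""
    ratio = n_controls / n_treatments
    if algorithm == "greedy":
        return math.ceil(ratio)   # rounded up to the next highest integer
    if algorithm == "optimal":
        return math.floor(ratio)  # rounded down to the next lowest integer
    raise ValueError("algorithm must be 'greedy' or 'optimal'")

# For instance, 22 controls and 8 treatments (ratio 2.75)
print(max_matches_per_treatment(22, 8, "greedy"))   # 3
print(max_matches_per_treatment(22, 8, "optimal"))  # 2
```

With 22 controls and 8 treatments, greedy matching at k = 3 would need 24 controls, so at least one treatment ends up with fewer than three matches. This is the incomplete pair-matching mentioned under Maximum Possible.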

Order for Matching

This option specifies the order in which subjects are entered into the matching algorithm. In the case of tied
distance values, the matches created depend on the order in which the treatment and control subjects are
considered. The options are:

• Random
Both treatment and control subjects are randomly ordered before entering into the matching algorithm. When
the number of matches per treatment is greater than one, the greedy algorithm finds the best match (if
possible) for each treatment before returning and creating the second match, third match, etc. It is likely that
match assignments will change from run-to-run when using random ordering.

• Sorted by Distance (Greedy Data Matching Only)

This option causes the program to sort the matrix of all pair-wise treatment-control distances, and assign
matches starting with the smallest distance and working toward the largest until all treatments have been
matched with the specified number of controls.

• Sorted by Row Number

Both treatment and control subjects are entered into the matching algorithms according to their location in the
database. When the number of matches per treatment is greater than one, the greedy algorithm finds the best
match (if possible) for each treatment before returning and creating the second match, third match, etc.
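
The Sorted by Distance option can be sketched as a single pass over all treatment-control pairs in order of increasing distance. The code below is an interpretation of the description above, not NCSS's implementation. In particular, it fills each treatment up to k controls as pairs are encountered, which is an assumption about how 1:k matching interacts with this ordering. Names and distances are hypothetical.

```python
import math

def greedy_sorted_by_distance(dist, k):
    """Greedy matching from the smallest pairwise distance to the largest.

    dist[i][j] is the distance between treatment i and control j;
    math.inf marks a pair that may not be matched.
    Returns {treatment index: [matched control indices]}.
    """
    pairs = sorted(
        (d, i, j)
        for i, row in enumerate(dist)
        for j, d in enumerate(row)
        if d != math.inf
    )
    matches = {i: [] for i in range(len(dist))}
    used = set()
    for d, i, j in pairs:
        if j not in used and len(matches[i]) < k:
            matches[i].append(j)
            used.add(j)  # each control is used at most once
    return matches

# Two treatments, three controls
dist = [
    [0.1, 0.4, 0.3],
    [0.2, 0.5, math.inf],
]
print(greedy_sorted_by_distance(dist, k=1))
print(greedy_sorted_by_distance(dist, k=2))
```

With k = 2 the second treatment receives only one control because the remaining pair has infinite distance, which is an example of an incomplete match.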

Caliper Radius
This option specifies the caliper radius, c, to be used in caliper matching. The caliper radius is calculated as

$$c = a \sqrt{(\sigma_1^2 + \sigma_2^2)/2} = a \times \text{SIGMA}$$

where a is a user-specified coefficient, $\sigma_1^2$ is the sample variance of q(x) for the treatment group, and $\sigma_2^2$ is the
sample variance of q(x) for the control group. NCSS allows you to enter the caliper radius using the syntax
“a*SIGMA”, where you specify the value for a (e.g. “0.2*SIGMA”) or by entering the actual value directly for c
(e.g. “0.5”). In the case of the former, the program calculates the variances of the treatment and control group
propensity scores for you. You may want to run descriptive statistics on the treatment and control group
propensity scores to determine the variance ratio of your data in order to find the appropriate value of a (from the
table above) for your research objectives.

Reports Tab
The following options control the format of the reports that are displayed.

Select Reports

Data Summary Report ... Matching Detail Report

Indicate whether to display the indicated reports.

Incomplete Matching Report (Greedy Data Matching Only)

Indicate whether to display the incomplete matching report that lists the treatments that were not paired with the
specified number of controls.

Report Options

Variable Names
This option lets you select whether to display variable names, variable labels, or both.

Report Options – Decimals

Propensity Scores/Covariates … Standardized Differences

Specify the number of digits after the decimal point to be displayed for output values of the type indicated.

Example 1 – Optimal (1:1) Matching using the Mahalanobis Distance within Propensity Score Calipers
This tutorial describes how to create 1:1 treatment-control matches using the Mahalanobis Distance within
Propensity Score Calipers distance metric. The data used in this example are contained in the PROPENSITY
database. The propensity scores were created using logistic regression with Exposure as the dependent variable,
X1 – Age as numeric independent variables, and Race and Gender as categorical independent variables. The
propensity score represents the probability of being exposed given the observed covariate values. The optimal
matching algorithm will always produce a complete matched-pair sample.
You may follow along here by making the appropriate entries or load the completed template Example 1 by
clicking on Open Example Template from the File menu of the Data Matching – Optimal window.

1 Open the Propensity dataset.

• From the File menu of the NCSS Data window, select Open Example Data.
• Click on the file Propensity.NCSS.
• Click Open.

2 Open the Data Matching - Optimal window.

• Using the Data or Tools menu or the Procedure Navigator, find and select the Data Matching - Optimal
procedure.
• On the menus, select File, then New Template. This will fill the procedure with the default template.

3 Specify the variables.

• On the Data Matching - Optimal window, select the Variables tab.
• Enter Exposure in the Grouping Variable box.
• Enter “Exposed” (no quotes) in the Treatment Group box.
• Enter Propensity in the Propensity Score Variable box.
• Make sure Use Logit is checked.
• Enter X1-Age in the Covariate Variable(s) box.
• Enter ID in the Data Label Variable box.
• Enter C11 in the Store Match Numbers In box.
• Enter 1.5*Sigma in the Caliper Radius box.
• Leave all other options at their default values.

4 Specify the reports.

• On the Data Matching - Optimal window, select the Reports tab.
• Put a check mark next to Matching Detail Report. Leave all other options at their default values.

5 Run the procedure.

• From the Run menu, select Run Procedure. Alternatively, just click the green Run button.

The following reports will be generated for both optimal and greedy matching with slight variations depending on
the algorithm selected.

Data Summary Report

Rows Read                       30
Rows with Missing Data          0
Treatment Rows                  8
Control Rows                    22

----- Data Variables -----

Grouping Variable               Exposure
- Treatment Group               "Exposed"
- Control Group                 "Not Exposed"
Data Label Variable             ID

----- Variables Used in Distance Calculations -----

Propensity Score Variable       Logit(Propensity)
Covariate Variable 1            X1
Covariate Variable 2            X2
Covariate Variable 3            X3
Covariate Variable 4            X4
Covariate Variable 5            X5
Covariate Variable 6            X6
Covariate Variable 7            Age

----- Storage Variable -----

Match Number Storage Variable   C11

This report gives a summary of the data and variables used for matching.

Matching Summary Report

Distance Calculation Method           Mahalanobis Distance within Propensity Score Calipers
                                      (no matches outside calipers)
Caliper Radius                        2.63288
Order for Matching                    Random
Controls Matched per Treatment        1
Sum of Match Mahalanobis Distances    53.94887
Average Match Mahalanobis Distance    6.74361

                              Percent               Percent
Group         N     Matched   Matched   Unmatched   Unmatched
Exposed       8     8         100.00%   0           0.00%
Not Exposed   22    8         36.36%    14          63.64%

This report gives a summary of the matches created, as well as a summary of the matching parameters used by the
matching algorithm.


Distance Calculation Method

This is the method used to calculate distances between treatment and control subjects.

Caliper Radius
This is the caliper radius entered or calculated by the program. This line is only displayed if caliper matching
based on propensity scores was used.
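
As a quick check of what the caliper means in this example, the sketch below takes the logit propensity scores of the eight matched pairs listed in this example's Matching Detail Report and confirms that each pair differs by no more than the reported radius of 2.63288. The variable and function names are illustrative only and are not part of NCSS.

```python
# Caliper radius reported in the Matching Summary Report (1.5*Sigma).
CALIPER_RADIUS = 2.63288

# (treatment logit propensity, matched control logit propensity),
# copied from the Matching Detail Report of Example 1.
pairs = [
    (-1.05541, 0.30221),
    (-0.34801, -1.28232),
    (2.50671, 3.28652),
    (0.05650, 1.73357),
    (-1.11718, -0.07642),
    (-1.31100, 0.85319),
    (1.16584, 0.72590),
    (-1.36499, 1.03004),
]


def within_caliper(treatment_logit, control_logit, radius):
    """Return True when the two scores differ by at most the caliper radius."""
    return abs(treatment_logit - control_logit) <= radius


gaps = [abs(t - c) for t, c in pairs]
all_within = all(within_caliper(t, c, CALIPER_RADIUS) for t, c in pairs)
print(all_within, round(max(gaps), 5))
```

The largest gap among the reported pairs is about 2.395, which is inside the caliper, so no reported match violates it.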

Order for Matching

This is the order used in matching as selected on the procedure window.

Controls Matched per Treatment

This is the target number of controls to match with each treatment. This value is specified on the procedure
window.

Sum of Match Mahalanobis Distances (Sum of Match Propensity Score Differences or Sum of Match Rank Differences)
This is the sum of Mahalanobis distances, propensity score differences, or rank differences (depending on the
distance calculation method selected) for all matched pairs.

Average Match Mahalanobis Distance (Average Match Propensity Score Difference or Average Match Rank Difference)
This is the average Mahalanobis distance, propensity score difference, or rank difference (depending on the
distance calculation method selected) for all matched pairs. It is calculated as [Sum of Match Distances (or
Differences)] / [Number of Matches Formed].

Group (e.g. Exposure)

This specifies either the treatment or the control group. The title of this column is the Grouping Variable name (or
label).

N
This is the number of candidates for matching in each group, i.e. the number of subjects with non-missing values
for all matching variables in each group.

Matched (Unmatched)
This is the number of subjects that were matched (unmatched) from each group.

Percent Matched (Percent Unmatched)

This is the percent of subjects that were matched (unmatched) from each group.

Group Comparison Reports

Group Comparison Report for Variable = Logit(Propensity)
                                                        Mean         Standardized
Group Type        Exposure      N    Mean       SD      Difference   Difference (%)
Before Matching   Exposed       8    -0.18344   1.39
                  Not Exposed   22   2.63066    2.06    -2.81410     -160.32%
After Matching    Exposed       8    -0.18344   1.39
                  Not Exposed   8    0.82159    1.33    -1.00503     -73.88%


Group Comparison Report for Variable = X1

                                                        Mean         Standardized
Group Type        Exposure      N    Mean       SD      Difference   Difference (%)
Before Matching   Exposed       8    39.50000   20.96
                  Not Exposed   22   45.90909   26.11   -6.40909     -27.07%
After Matching    Exposed       8    39.50000   20.96
                  Not Exposed   8    26.50000   13.60   13.00000     73.58%
.
.
.

(output reports continue for each covariate variable specified)

This report provides summary statistics by group for the data in the propensity score variable and each covariate
variable both before and after matching. Notice that the matching improved the balance of the propensity scores
between the treatment and control groups (Standardized Difference moved from -160.32% to -73.88%), but
worsened the balance for the covariate X1 (Standardized Difference changed from -27.07% to 73.58%).

Group Type
This specifies whether the summary statistics refer to groups before or after matching.

Group (e.g. Exposure)

This specifies either the treatment or the control group. The title of this column is the grouping variable name (or
label).

N
This is the number of non-missing values in each variable by group. If there are missing values in covariates that
were not used for matching, then these numbers may be different from the total number of subjects in each group.

Mean
This is the average value for each variable by group.

SD
This is the standard deviation for each variable by group.

Mean Difference
This is the difference between the mean of the treatment group and the mean of the control group.

Standardized Difference (%)

The standardized difference can be used to measure the balance between the treatment and control groups before
and after matching. If a variable is balanced, then the standardized difference should be close to zero. The
standardized difference is the mean difference expressed as a percentage of the average standard deviation:

$$\text{Standardized Difference (\%)} = \frac{100\,(\bar{x}_{t,p} - \bar{x}_{c,p})}{\sqrt{(s_{t,p}^2 + s_{c,p}^2)/2}}$$

where $\bar{x}_{t,p}$ and $\bar{x}_{c,p}$ are the treatment and control group means for the pth covariate variable, respectively, and
$s_{t,p}^2$ and $s_{c,p}^2$ are the treatment and control group sample variances for the pth covariate variable, respectively.
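
The standardized difference can be reproduced from the summary statistics in the Group Comparison Reports. The following is a minimal sketch, assuming the denominator is the square root of the average of the two group variances; the function name is illustrative and not part of NCSS. It uses the X1 values after matching from Example 1.

```python
import math


def standardized_difference(mean_t, mean_c, sd_t, sd_c):
    """Mean difference as a percentage of the average standard deviation."""
    average_sd = math.sqrt((sd_t ** 2 + sd_c ** 2) / 2)
    return 100 * (mean_t - mean_c) / average_sd


# X1 after matching: Exposed mean 39.5 (SD 20.96), Not Exposed mean 26.5 (SD 13.60).
x1_after = standardized_difference(39.5, 26.5, 20.96, 13.60)
print(f"{x1_after:.2f}%")  # prints 73.58%
```

Because the reports show standard deviations rounded to two decimals, values recomputed this way may differ slightly in the last digits from the percentages NCSS prints.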


Matching Detail Report

Matching Detail Report
Treatment = "Exposed", Control = "Not Exposed"

                      ------- Treatment ------ -- Matched Control --
Match    Mahalanobis        Logit                    Logit
Number   Distance     Row   Propensity   ID    Row   Propensity   ID
1        4.32807      1     -1.05541     A     8     0.30221      H
2        5.05385      4     -0.34801     D     22    -1.28232     V
3        9.07686      6     2.50671      F     16    3.28652      P
4        3.99318      10    0.05650      J     24    1.73357      X
5        13.85904     14    -1.11718     N     28    -0.07642     BB
6        9.25961      19    -1.31100     S     27    0.85319      AA
7        5.06011      26    1.16584      Z     29    0.72590      CC
8        3.31815      30    -1.36499     DD    9     1.03004      I

This report provides a list of all matches created and important information about each match.
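
The totals in the Matching Summary Report can be recovered from this detail report. As a small worked check (plain arithmetic, not NCSS code), summing the eight Mahalanobis distances listed above and dividing by the number of matches gives the reported sum and average.

```python
# Mahalanobis distances of the eight matches in the Matching Detail Report.
distances = [
    4.32807, 5.05385, 9.07686, 3.99318,
    13.85904, 9.25961, 5.06011, 3.31815,
]

total = sum(distances)             # Sum of Match Mahalanobis Distances
average = total / len(distances)   # Sum / Number of Matches Formed
print(f"{total:.5f} {average:.5f}")  # prints 53.94887 6.74361
```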

Match
This is the match number assigned by the program to each match and stored to the database (if a storage variable
was specified).

Mahalanobis Distance (Propensity Score |Difference| or Sum of Rank |Differences|)

This is the estimated distance between the treatment and matched control. The column title depends on the
distance calculation method selected.

Row
This is the row of the treatment or control subject in the database.

Propensity Score (or first covariate variable)

This is the value of the propensity score (or logit propensity score if ‘Use Logit’ was selected). If no propensity
score variable was used in distance calculations, then this is the value of first covariate variable specified. The title
of this column is based on the propensity score variable name (or label) or the first covariate variable name (or
label).
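
For readers unfamiliar with the logit scale, the sketch below shows the standard logit transformation, logit(p) = ln(p / (1 - p)), and its inverse. It is an illustration only; the function names are assumptions and do not refer to any NCSS routine.

```python
import math


def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1 - p))


def inverse_logit(x):
    """Map a logit value back to a probability."""
    return 1 / (1 + math.exp(-x))


# The first treatment row in Example 1 has a logit propensity of -1.05541,
# which corresponds to a propensity score of roughly 0.26.
p = inverse_logit(-1.05541)
print(round(p, 2))
```

A negative logit corresponds to a propensity score below 0.5, and a logit of zero corresponds to a propensity score of exactly 0.5.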

Data Label (e.g. ID)

This is the identification label of the row in the database. The title of this column is the data label variable name
(or label).


Example 2 – Greedy (1:2) Matching using the Propensity Score Difference with Forced Match Variables

Continuing with Example 1, we will now use the greedy matching algorithm to create matches while using race
and gender as forced match variables. This will force the algorithm to find control matches for treatments where
the gender and race match exactly, i.e., a male can only be matched with a male, and a female can only be
matched with a female, etc. Please note that the optimal matching algorithm can also be used with forced match
variables, but we use the greedy matching algorithm here to display the incomplete matched-pair sample that
results.
You may follow along here by making the appropriate entries or load the completed template Example 2 by
clicking on Open Example Template from the File menu of the Data Matching – Greedy window.

1 Open the Propensity dataset.

• From the File menu of the NCSS Data window, select Open Example Data.
• Click on the file Propensity.NCSS.
• Click Open.

2 Open the Data Matching - Greedy window.

• Using the Data or Tools menu or the Procedure Navigator, find and select the Data Matching - Greedy
procedure.
• On the menus, select File, then New Template. This will fill the procedure with the default template.

3 Specify the variables.

• On the Data Matching - Greedy window, select the Variables tab.
• Enter Exposure in the Grouping Variable box.
• Enter “Exposed” (no quotes) in the Treatment Group box.
• Enter Propensity in the Propensity Score Variable box.
• Make sure Use Logit is checked.
• Enter Race-Gender in the Forced Match Variable(s) box.
• Enter X1-Age in the Covariate Variable(s) box.
• Enter ID in the Data Label Variable box.
• Enter C11 in the Store Match Numbers In box.
• Choose Propensity Score Difference in the Distance Calculation Method box.
• Enter 2 in the Matches per Treatment box.
• Leave all other options at their default values.

4 Specify the reports.

• On the Data Matching - Greedy window, select the Reports tab.
• Put a check mark next to Matching Detail Report and Incomplete Matching Report. Leave all other
options at their default values.

5 Run the procedure.

• From the Run menu, select Run Procedure. Alternatively, just click the green Run button.


Greedy Data Matching Output

Data Summary Report

Rows Read 30
Rows with Missing Data 0
Treatment Rows 8
Control Rows 22

----- Data Variables -----

Grouping Variable Exposure
- Treatment Group "Exposed"
- Control Group "Not Exposed"
Data Label Variable ID

----- Variables Used in Distance Calculations -----

Propensity Score Variable Logit(Propensity)
Forced Match Variable 1 Race
Forced Match Variable 2 Gender

----- Storage Variable -----

Match Number Storage Variable C11

Matching Summary Report

Distance Calculation Method Propensity Score Difference

Order for Matching Sorted by Distance
Controls Matched per Treatment 2
Sum of Match Propensity Score Differences 14.63954
Average Match Propensity Score Difference 1.46395

                              Percent               Percent
Exposure      N     Matched   Matched   Unmatched   Unmatched
Exposed       8     6         75.00%    2           25.00%
Not Exposed   22    10        45.45%    12          54.55%

Group Comparison Report for Variable = Logit(Propensity)

                                                        Mean         Standardized
Group Type        Exposure      N    Mean       SD      Difference   Difference (%)
Before Matching   Exposed       8    -0.18344   1.39
                  Not Exposed   22   2.63066    2.06    -2.81410     -160.32%
After Matching    Exposed       6    0.11751    1.50
                  Not Exposed   10   1.48046    1.55    -1.36296     -89.21%

Group Comparison Report for Variable = X1

                                                        Mean         Standardized
Group Type        Exposure      N    Mean       SD      Difference   Difference (%)
Before Matching   Exposed       8    39.50000   20.96
                  Not Exposed   22   45.90909   26.11   -6.40909     -27.07%
After Matching    Exposed       6    33.66667   20.79
                  Not Exposed   10   44.90000   30.57   -11.23333    -42.97%

.
.
.

(output reports continue for each covariate variable specified)

Matching Detail Report

Treatment = "Exposed", Control = "Not Exposed"

         Logit        ------- Treatment ------ -- Matched Control --
Match    Propensity         Logit                    Logit
Number   |Difference| Row   Propensity   ID    Row   Propensity   ID
1        1.20120      4     -0.34801     D     27    0.85319      AA
1        1.37805      4     -0.34801     D     9     1.03004      I
2        0.15966      6     2.50671      F     15    2.34705      O
2        0.56240      6     2.50671      F     11    1.94431      K
3        0.66941      10    0.05650      J     29    0.72590      CC
3        4.46221      10    0.05650      J     2     4.51870      B
4        1.61321      19    -1.31100     S     8     0.30221      H
4        3.92267      19    -1.31100     S     12    2.61167      L
5        0.58806      26    1.16584      Z     23    1.75390      W
6        0.08266      30    -1.36499     DD    22    -1.28232     V

Notice that only the propensity score variable was used in distance calculations, but group comparison reports
were generated for each covariate variable specified. In the Matching Detail Report, you can see that not all
treatments were matched (incomplete matching). Finally, notice that race and gender were both used as Forced
Match variables.
If you go back to the spreadsheet and sort the data on C11 (click on Data > Sort from the NCSS Home window),
you will notice that matches were only created where the race and gender were identical for both the treatment
and control.
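
To make the interaction between greedy matching and forced match variables concrete, here is a minimal sketch of one common greedy scheme: list every treatment-control pair whose forced match values agree, sort the pairs by distance, and accept a pair whenever the control is still unused and the treatment has fewer than k matches. This is an assumption about how such an algorithm can be written, not the NCSS implementation, and the data, names, and the single forced variable `gender` are hypothetical.

```python
def greedy_match(treated, controls, k=1, forced=()):
    """Greedy 1:k matching on |score difference| with exact forced-variable matching.

    treated, controls: dict mapping an ID to a dict holding 'score' plus the
    forced match variables. Returns a dict of treatment ID -> list of control IDs.
    """
    # Only pairs that agree exactly on every forced match variable are candidates.
    candidates = [
        (abs(t["score"] - c["score"]), t_id, c_id)
        for t_id, t in treated.items()
        for c_id, c in controls.items()
        if all(t[v] == c[v] for v in forced)
    ]
    candidates.sort()  # smallest distance first

    matches = {t_id: [] for t_id in treated}
    used_controls = set()
    for _, t_id, c_id in candidates:
        if c_id not in used_controls and len(matches[t_id]) < k:
            matches[t_id].append(c_id)
            used_controls.add(c_id)
    return matches


# Hypothetical toy data (not taken from the Propensity dataset).
treated = {
    "T1": {"score": 0.20, "gender": "M"},
    "T2": {"score": 0.50, "gender": "F"},
    "T3": {"score": 0.90, "gender": "M"},
}
controls = {
    "C1": {"score": 0.25, "gender": "M"},
    "C2": {"score": 0.60, "gender": "M"},
    "C3": {"score": 0.55, "gender": "F"},
    "C4": {"score": 0.10, "gender": "M"},
}

matches = greedy_match(treated, controls, k=2, forced=("gender",))
# Treatments that did not reach the target of 2 controls.
incomplete = {t_id: len(m) for t_id, m in matches.items() if len(m) < 2}
print(matches)
print(incomplete)
```

In this toy run T1 receives two controls while T2 and T3 receive only one each, because the forced variable limits the eligible controls and earlier, closer pairs use up the shared pool. This is the same kind of incomplete matching that appears in the reports for this example.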

Incomplete Matching Report

Incomplete Matching Report
Exposure = "Exposed"

Treatment   Matches        Logit
Row         (Target = 2)   Propensity   ID
1           0              -1.05541     A
14          0              -1.11718     N
26          1              1.16584      Z
30          1              -1.36499     DD

This report lists the treatments that were not paired with the target number of controls (2 in this case). Rows 1 and
14 were not paired with any controls. Rows 26 and 30 were each paired with only 1 control. All other treatment rows
were paired with 2 controls. Incomplete matching is usually due to the use of forced match variables, the use of
caliper matching, or setting Matches per Treatment to ‘Maximum Possible’.

Treatment Row
This is the row in the database containing the treatment subject that was not fully matched.

Matches (Target = k)
This is the number of matches that were found for each treatment. The target represents the number of Matches
per Treatment specified on the input window.

Propensity Score (or first covariate variable)

This is the value of the propensity score (or logit propensity score if ‘Use Logit’ was selected) for the
incompletely-matched treatment. If no propensity score variable was used in distance calculations, then this is the
value of first covariate variable specified. The title of this column is based on the propensity score variable name
(or label) or the first covariate variable name (or label).

Data Label (e.g. ID)

This is the identification label of the incompletely-matched row in the database. The title of this column is the
data label variable name (or label).

Example 3 – Matching on Forced Match Variables Only

Continuing with Example 2, suppose we wanted to form matches based solely on forced match variables, i.e., we
want the matches to have exactly the same values for each covariate. We could enter all of the covariates in as
forced match variables, but with a database as small as we are using, we are unlikely to find any matches. We will
use the greedy data matching procedure to illustrate how you can assign matches based on the gender and race
forced match variables only. Random ordering is used to ensure that the treatments are randomly paired with
controls (where the forced match variable values match).
In order to complete this task, you must first create a new column in the database filled with 1’s. You can do this
by clicking on the first cell in an empty column and selecting Edit > Fill from the NCSS Home window (for Fill
Value(s) enter 1, for Increment enter 0, and click OK). A column of ones has already been created for you in the
Propensity dataset. This column of ones is necessary because the matching procedure requires either a propensity
score variable or a covariate variable to run.
You may follow along here by making the appropriate entries or load the completed template Example 3 by
clicking on Open Example Template from the File menu of the Data Matching – Greedy window.

1 Open the Propensity dataset.

• From the File menu of the NCSS Data window, select Open Example Data.
• Click on the file Propensity.NCSS.
• Click Open.

2 Open the Data Matching - Greedy window.

3 Specify the variables.

• On the Data Matching - Greedy window, select the Variables tab.
• Enter Exposure in the Grouping Variable box.
• Enter “Exposed” (no quotes) in the Treatment Group box.
• Enter Ones (or the name of your variable containing all 1’s) in the Propensity Score Variable box.
• Make sure Use Logit is unchecked.
• Enter Race-Gender in the Forced Match Variable(s) box.
• Enter ID in the Data Label Variable box.
• Enter C11 in the Store Match Numbers In box.
• Choose Propensity Score Difference in the Distance Calculation Method box.
• Enter 2 in the Matches per Treatment box.
• Choose Random in the Order for Matching box.
• Leave all other options at their default values.

4 Specify the reports.

• On the Data Matching - Greedy window, select the Reports tab.
• Put a check mark next to Matching Detail Report and Incomplete Matching Report. Leave all other
options at their default values.

5 Run the procedure.

• From the Run menu, select Run Procedure. Alternatively, just click the green Run button.

Matching Reports
Matching Detail Report
Treatment = "Exposed", Control = "Not Exposed"

         Logit        ------- Treatment ------ -- Matched Control --
Match    Propensity         Logit                    Logit
Number   |Difference| Row   Propensity   ID    Row   Propensity   ID
1        0.00000      1     1.00000      A     29    1.00000      CC
2        0.00000      4     1.00000      D     13    1.00000      M
2        0.00000      4     1.00000      D     20    1.00000      T
3        0.00000      6     1.00000      F     15    1.00000      O
3        0.00000      6     1.00000      F     11    1.00000      K
4        0.00000      10    1.00000      J     2     1.00000      B
5        0.00000      19    1.00000      S     12    1.00000      L
5        0.00000      19    1.00000      S     8     1.00000      H
6        0.00000      26    1.00000      Z     23    1.00000      W
7        0.00000      30    1.00000      DD    22    1.00000      V

Incomplete Matching Report

Exposure = "Exposed"

Treatment   Matches        Logit
Row         (Target = 2)   Propensity   ID
1           1              1.00000      A
10          1              1.00000      J
14          0              1.00000      N
26          1              1.00000      Z
30          1              1.00000      DD

The matching detail report is not very informative because all of the propensity scores are equal to 1. If you run
the procedure several times, you will notice that the controls are randomly pairing with the treatments when the
race and gender are the same. Your report may be slightly different from this report because random ordering was
used. If you sort on C11, you will see that all matched pairs have the same value for race and gender.
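
The idea of random pairing within forced match groups can be sketched as follows. This is one simple way to do it, written under the assumption that controls are pooled by their forced match values, shuffled, and handed out to treatments visited in random order; it is not the NCSS algorithm, and the data and names are hypothetical.

```python
import random


def random_forced_match(treated, controls, forced, k=1, seed=None):
    """Randomly pair treatments with up to k controls sharing the forced values."""
    rng = random.Random(seed)

    # Pool the controls by their combination of forced match values.
    pools = {}
    for c_id, c in controls.items():
        pools.setdefault(tuple(c[v] for v in forced), []).append(c_id)
    for pool in pools.values():
        rng.shuffle(pool)

    # Visit the treatments in random order and draw controls from the pool.
    order = list(treated)
    rng.shuffle(order)
    matches = {t_id: [] for t_id in treated}
    for t_id in order:
        pool = pools.get(tuple(treated[t_id][v] for v in forced), [])
        while pool and len(matches[t_id]) < k:
            matches[t_id].append(pool.pop())
    return matches


# Hypothetical toy data (not taken from the Propensity dataset).
treated = {
    "T1": {"race": "A", "gender": "M"},
    "T2": {"race": "A", "gender": "M"},
    "T3": {"race": "B", "gender": "F"},
}
controls = {
    "C1": {"race": "A", "gender": "M"},
    "C2": {"race": "A", "gender": "M"},
    "C3": {"race": "A", "gender": "M"},
    "C4": {"race": "B", "gender": "M"},
}

matches = random_forced_match(treated, controls, ("race", "gender"), k=2, seed=1)
print(matches)
```

Running the sketch with different seeds changes which controls are paired with T1 and T2, but every pair always agrees on race and gender, and T3 stays unmatched because no control shares its values.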

Example 4 – Validation of the Optimal Data Matching Algorithm using Rosenbaum (1989)

Rosenbaum (1989) provides an example of both optimal and greedy matching using a well-known dataset from
Cox and Snell (1981), which involves 26 U.S. light water nuclear power plants (six “partial turnkey” plants are
excluded in the analysis). Seven of the plants were constructed on sites where a light water reactor had existed
previously; these are the treatments. The 19 remaining plants serve as the controls. The sum of rank differences
was used to calculate distances between treatment and control plants. Two covariate variables were used in the
analysis: the date the construction permit was issued (Date), and the capacity of the plant (Capacity). Site was
used as the grouping variable with “Existing” as the treatment group. Rosenbaum (1989) reports the following
optimal pairings by plant number (treatment, control):
(3,2), (3,21), (5,4), (5,7), (9,10), (9,16), (18,8), (18,13), (20,14), (20,15), (22,17), (22,26), (24,23), (24,25)
The data used in this example are contained in the CoxSnell dataset.
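
A sum-of-rank-differences distance can be sketched as follows: rank each covariate across all subjects, then add up the absolute rank differences between two subjects. The sketch assumes that tied values receive the average of their ranks, which would account for half-integer distances such as 18.5. The toy values and function names are hypothetical and are not taken from the CoxSnell dataset or from NCSS.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = tied_rank
        i = j + 1
    return ranks


def rank_distance(rank_columns, a, b):
    """Sum over covariates of |rank of subject a - rank of subject b|."""
    return sum(abs(ranks[a] - ranks[b]) for ranks in rank_columns)


# Hypothetical covariates for five subjects.
date = [2.33, 3.00, 3.42, 3.42, 2.33]
capacity = [800, 1000, 900, 1100, 800]
rank_columns = [average_ranks(date), average_ranks(capacity)]

print(rank_columns[0])                   # prints [1.5, 3.0, 4.5, 4.5, 1.5]
print(rank_distance(rank_columns, 0, 4))  # prints 0.0
print(rank_distance(rank_columns, 0, 1))  # prints 4.0
```

Subjects 0 and 4 have identical covariates, so their distance is zero, just as several treatment-control pairs in the report below have a rank distance of 0.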
You may follow along here by making the appropriate entries or load the completed template Example 4 by
clicking on Open Example Template from the File menu of the Data Matching – Optimal window.

1 Open the CoxSnell dataset.

• From the File menu of the NCSS Data window, select Open Example Data.
• Click on the file CoxSnell.NCSS.
• Click Open.

2 Open the Data Matching - Optimal window.

3 Specify the variables.

• On the Data Matching - Optimal window, select the Variables tab.
• Enter Site in the Grouping Variable box.
• Enter “Existing” (no quotes) in the Treatment Group box.
• Make sure nothing is entered in the Propensity Score Variable box.
• Enter Date-Capacity in the Covariate Variable(s) box.
• Enter Plant in the Data Label Variable box.
• Choose Sum of Rank Differences including the Propensity Score (if specified) in the Distance
Calculation Method box.
• Enter 2 in the Matches per Treatment box.
• Leave all other options at their default values.

4 Specify the reports.

• On the Data Matching - Optimal window, select the Reports tab.
• Put a check mark next to Matching Detail Report. Leave all other options at their default values.

5 Run the procedure.

• From the Run menu, select Run Procedure. Alternatively, just click the green Run button.

Matching Reports
Matching Summary Report

Distance Calculation Method Sum of Rank Differences including the Propensity Score
Order for Matching Random
Controls Matched per Treatment 2
Sum of Match Rank Differences 74.00000
Average Match Rank Difference 5.28571

                              Percent               Percent
Site          N     Matched   Matched   Unmatched   Unmatched
Existing      7     7         100.00%   0           0.00%
New           19    14        73.68%    5           26.32%

Matching Detail Report

Treatment = "Existing", Control = "New"

                        ------ Treatment ------ -- Matched Control --
Match    Sum of Rank
Number   |Differences|  Row   Date      Plant   Row   Date      Plant
1        18.50000       1     2.33000   3       23    3.75000   21
1        0.00000        1     2.33000   3       9     2.33000   2
2        0.00000        2     3.00000   5       10    3.00000   4
2        10.50000       2     3.00000   5       12    3.17000   7
3        5.50000        3     3.42000   9       20    3.42000   16
3        5.50000        3     3.42000   9       14    3.33000   10
4        0.00000        4     3.42000   18      17    3.42000   13
4        2.50000        4     3.42000   18      13    3.42000   8
5        0.00000        5     3.92000   20      18    3.92000   14
5        2.50000        5     3.92000   20      19    3.92000   15
6        12.00000       6     5.92000   22      26    6.08000   26
6        5.00000        6     5.92000   22      21    4.50000   17
7        4.00000        7     5.08000   24      24    4.67000   23
7        8.00000        7     5.08000   24      25    5.42000   25

The optimal match-pairings found by NCSS match those in Rosenbaum (1989) exactly. Notice, however, that the
distances (Sum of Rank |Differences|) are slightly different in some instances from those given in Table 1 of the
article. This is due to the fact that Rosenbaum (1989) rounds all non-integer distances in their reports. This
rounding also affects the overall sum of match rank differences; NCSS calculates the overall sum as 74, while
Rosenbaum (1989) calculates the overall sum as 71, with the difference due to rounding.

Example 5 – Validation of the Greedy Data Matching Algorithm using Rosenbaum (1989)

Continuing with Example 4, Rosenbaum (1989) also reports the results from the greedy matching algorithm,
where the order for matching is sorted by distance. The article reports the following greedy pairings by plant
number (treatment, control):
(3,2), (3,19), (5,4), (5,21), (9,10), (9,7), (18,8), (18,13), (20,14), (20,15), (22,17), (22,26), (24,23), (24,25)
You may follow along here by making the appropriate entries or load the completed template Example 5 by
clicking on Open Example Template from the File menu of the Data Matching – Greedy window.

1 Open the CoxSnell dataset.

• From the File menu of the NCSS Data window, select Open Example Data.
• Click on the file CoxSnell.NCSS.
• Click Open.

2 Open the Data Matching - Greedy window.

3 Specify the variables.

• On the Data Matching - Greedy window, select the Variables tab.
• Enter Site in the Grouping Variable box.
• Enter “Existing” (no quotes) in the Treatment Group box.
• Make sure nothing is entered in the Propensity Score Variable box.
• Enter Date-Capacity in the Covariate Variable(s) box.
• Enter Plant in the Data Label Variable box.
• Choose Sum of Rank Differences including the Propensity Score (if specified) in the Distance
Calculation Method box.
• Enter 2 in the Matches per Treatment box.
• Leave all other options at their default values.

4 Specify the reports.

• On the Data Matching - Greedy window, select the Reports tab.
• Put a check mark next to Matching Detail Report. Leave all other options at their default values.

5 Run the procedure.

• From the Run menu, select Run Procedure. Alternatively, just click the green Run button.

Output
Matching Summary Report

Distance Calculation Method Sum of Rank Differences including the Propensity Score
Order for Matching Sorted by Distance
Controls Matched per Treatment 2
Sum of Match Rank Differences 80.00000
Average Match Rank Difference 5.71429

                              Percent               Percent
Site          N     Matched   Matched   Unmatched   Unmatched
Existing      7     7         100.00%   0           0.00%
New           19    14        73.68%    5           26.32%

Matching Detail Report

Treatment = "Existing", Control = "New"

                        ------ Treatment ------ -- Matched Control --
Match    Sum of Rank
Number   |Differences|  Row   Date      Plant   Row   Date      Plant
1        0.00000        1     2.33000   3       9     2.33000   2
1        21.00000       1     2.33000   3       22    4.17000   19
2        0.00000        2     3.00000   5       10    3.00000   4
2        15.50000       2     3.00000   5       23    3.75000   21
3        4.00000        3     3.42000   9       12    3.17000   7
3        5.50000        3     3.42000   9       14    3.33000   10
4        0.00000        4     3.42000   18      17    3.42000   13
4        2.50000        4     3.42000   18      13    3.42000   8
5        0.00000        5     3.92000   20      18    3.92000   14
5        2.50000        5     3.92000   20      19    3.92000   15
6        5.00000        6     5.92000   22      21    4.50000   17
6        12.00000       6     5.92000   22      26    6.08000   26
7        4.00000        7     5.08000   24      24    4.67000   23
7        8.00000        7     5.08000   24      25    5.42000   25

The greedy match-pairings found by NCSS match those in Rosenbaum (1989) exactly. Again, some of the
distances are different from those in Table 1 of the article because of rounding. NCSS calculates the overall sum
of rank differences as 80, while Rosenbaum (1989) calculates the overall sum as 79 with the difference due to
rounding.
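
NCSS reports an overall sum of 74 for the optimal matches in Example 4 and 80 for the greedy matches in Example 5. The toy sketch below illustrates why a greedy algorithm can end up with a larger total: taking the single closest pair first may force a poor pairing later. It uses a hypothetical 2 x 2 distance matrix and 1:1 matching, and finds the optimum by brute force, which is only practical for tiny problems and is not how NCSS or Rosenbaum (1989) solve the problem.

```python
from itertools import permutations

# Hypothetical distances: rows are treatments, columns are controls.
dist = [
    [1.0, 2.0],
    [2.0, 10.0],
]


def greedy_total(dist):
    """Repeatedly accept the closest remaining treatment-control pair."""
    pairs = sorted(
        (d, t, c) for t, row in enumerate(dist) for c, d in enumerate(row)
    )
    used_t, used_c, total = set(), set(), 0.0
    for d, t, c in pairs:
        if t not in used_t and c not in used_c:
            used_t.add(t)
            used_c.add(c)
            total += d
    return total


def optimal_total(dist):
    """Try every assignment of distinct controls to treatments (brute force)."""
    n_controls = len(dist[0])
    return min(
        sum(dist[t][c] for t, c in enumerate(assignment))
        for assignment in permutations(range(n_controls), len(dist))
    )


print(greedy_total(dist), optimal_total(dist))  # prints 11.0 4.0
```

Greedy matching takes the pair with distance 1 first and is then left with the pair at distance 10, while the optimal assignment accepts two pairs at distance 2 each.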
