A Deep Learning Approach For Linear Complementarity Problems
Wissam AlAli
The formulation

    min c⊤x − r⊤y subject to Mx ≥ r, M⊤y ≤ c, x ≤ y, x, y ≥ 0

preserves a primal-dual structure that is crucial for capturing the complementarity condition x⊤(Mx − r) = 0.
Recall the bilinear relaxation:
    min_{x, y, c}   c⊤x − r⊤y
    subject to      Mx ≥ r,
                    M⊤y ≤ c,        (3.8)
                    x ≤ y,
                    x, y ≥ 0,
where M ∈ Rn×n and r ∈ Rn. Invoking Theorem 3.1, a particularly effective choice is setting c = M⊤x: then c⊤x = (M⊤x)⊤x = x⊤Mx, and the constraint M⊤y ≤ c becomes M⊤y ≤ M⊤x. Substituting c = M⊤x into (3.8) yields
    min_{x, y}      x⊤Mx − r⊤y
    subject to      M⊤y ≤ M⊤x,
                    x ≤ y,          (3.17)
                    Mx ≥ r,
                    x, y ≥ 0.
The resulting objective
    x⊤Mx − r⊤y
can be decomposed into a difference of convex functions, making (3.17) suitable for algorithms
grounded in DC programming [6, 7, 8, 9].
One might wonder why we do not simply decompose the following problem and apply DC
algorithms from the literature, such as those in [10], as a stand-alone approach:
    min x⊤Mx − r⊤x subject to Mx ≥ r, x ≥ 0.
While this simpler model reflects the LCP constraints, and its global solution is feasible for the LCP whenever the LCP itself is feasible [10], it does not incorporate any dual-like component.
By contrast, the model (3.17) explicitly introduces y via
    M⊤y ≤ M⊤x,  x ≤ y,
which encodes dual-type constraints aligned with the universal relaxation theory. Consequently,
a pair (x, y) that solves (3.17) is far more likely to satisfy the complementarity condition x⊤(Mx − r) = 0.
Thus, incorporating y enforces a primal-dual perspective, leading to more robust identification of
genuinely feasible LCP solutions.
To illustrate that relying solely on a primal formulation—like the one proposed in [10]—can
still yield infeasible LCP solutions in practice, consider

    M = [  4   2   0   3
          −1   4  −3  −6
           1  −1   1   1
           0   1   0   5 ]

and r = (1, −2, 1, 1)⊤.
Decomposing the objective function and applying a DC algorithm, the method can converge to a spurious solution such as

    x = (0.53, 0.33, 0.66, 0.13)⊤,

which fails to satisfy x⊤(Mx − r) = 0. Indeed, the sequence may stabilize at this point even though it is not LCP-feasible. By comparison, our full model (3.17) maintains the dual-style constraints x ≤ y, M⊤y ≤ M⊤x, substantially reducing the risk of converging to such infeasible points.
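This failure is easy to check numerically. The following Python snippet (a minimal sketch; M, r, and x are the values reconstructed above) evaluates the complementarity residual at the spurious point:

    import numpy as np

    # Data from the 4 x 4 example above.
    M = np.array([[ 4.,  2.,  0.,  3.],
                  [-1.,  4., -3., -6.],
                  [ 1., -1.,  1.,  1.],
                  [ 0.,  1.,  0.,  5.]])
    r = np.array([1., -2., 1., 1.])

    # Spurious point returned by the primal-only DC approach.
    x = np.array([0.53, 0.33, 0.66, 0.13])

    w = M @ x - r     # slack vector M x - r
    print(w)          # ~[2.17, 0.03, -0.01, -0.02]: feasible up to rounding
    print(x @ w)      # ~1.15, far from the required value 0

The residual is dominated by the first component, where both x1 and (Mx − r)1 are strictly positive.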
Before discussing the DCA-based algorithms for solving our LCP formulation, we provide a
concise overview of DC programming and DCA. For a comprehensive treatment of this topic, we
refer the reader to [6, 7, 8, 9] and the references therein.
Subdifferentials. For a proper, l.s.c. convex function θ on Rp , its subdifferential at x ∈ dom(θ)
is

    ∂θ(x) := { y ∈ Rp : θ(z) ≥ θ(x) + ⟨z − x, y⟩ for all z ∈ Rp }.
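For example, for θ(x) = |x| on R, ∂θ(x) = {sign(x)} for x ≠ 0, while ∂θ(0) = [−1, 1].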
If θ is differentiable at x, then ∂θ(x) = {∇θ(x)}. A necessary local optimality condition for the
DC program min{ g(x) − h(x)} is
∂h(x∗ ) ⊂ ∂g(x∗ ),
and in many important classes of DC programs (e.g. polyhedral ones), this condition also suffices
for local optimality [7, 8, 9].
DC duality associates with the primal program

    α = inf{ g(x) − h(x) : x ∈ Rp }

the dual program

    αD = inf{ h∗(y) − g∗(y) : y ∈ Rp },

where g∗ and h∗ are the convex conjugates of g and h, respectively. Under mild conditions, strong duality holds, i.e. αD = α. This symmetry between primal and dual DC programs is central to the theoretical underpinnings of DCA.
The DC Algorithm (DCA) exploits the structure f = g − h by successively linearizing −h, leading to a series of convex subproblems in g. A typical iteration of DCA is as follows [10]: given x(k), compute a subgradient y(k) ∈ ∂h(x(k)), then set

    x(k+1) ∈ argmin{ g(x) − ⟨x, y(k)⟩ : x ∈ Rp }.
DCA is a descent method without explicit line searches, ensuring that the sequence {g(x(k)) − h(x(k))} is nonincreasing. In practice, DCA often converges linearly for polyhedral DC programs
and exhibits strong empirical performance on a wide range of nonconvex applications [6, 7, 8, 9].
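To make the iteration concrete, consider a toy one-dimensional DC program (an illustrative example, not one from the LCP setting): g(x) = ½x², h(x) = |x|. Here y(k) = sign(x(k)) is a subgradient of h, and the convex subproblem min ½x² − y(k)x has the closed-form solution x(k+1) = y(k). In Python:

    import numpy as np

    def dca_scalar(x0, tol=1e-8, max_iter=100):
        # Toy DCA for f(x) = 0.5*x**2 - abs(x): g(x) = 0.5*x**2, h(x) = |x|.
        x = x0
        for _ in range(max_iter):
            y = np.sign(x) if x != 0 else 1.0   # subgradient of h at x (at 0, any s in [-1, 1] works)
            x_new = y                           # argmin of 0.5*x**2 - y*x, in closed form
            if abs(x_new - x) < tol:
                return x_new
            x = x_new
        return x

    print(dca_scalar(0.3))   # 1.0, a global minimizer with f(1) = -0.5

Starting from x(0) = 0.3, the objective drops from −0.255 to −0.5 in one step, illustrating the descent property above.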
A representative decomposition is to split x⊤Mx into positive and negative parts with respect to M. Specifically, if we denote M+ = max(M, 0) and M− = max(−M, 0) elementwise, then

    x⊤Mx = x⊤(M+ − M−)x = x⊤M+x − x⊤M−x.

Thus we may set

    f1(x, y) = x⊤M+x,    f2(x, y) = x⊤M−x + r⊤y,

and write

    F(x, y) = f1(x, y) − f2(x, y).
Each DCA iteration then involves:
1. Calculating a subgradient (or gradient, if differentiable) of f2 at the current iterate (x(k), y(k)).
2. Minimizing the convex function f1 − ⟨∇f2(x(k), y(k)), (·)⟩ subject to the linear constraints to update (x(k+1), y(k+1)).
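A minimal Python sketch of this scheme follows. It is an illustration under stated assumptions, not the Matlab/CVX implementation used in Section 2: the convex subproblem is handed to SciPy's general-purpose SLSQP solver, and the linearization uses ∇f2(x, y) = ((M− + M−⊤)x, r).

    import numpy as np
    from scipy.optimize import minimize

    def dca_lcp(M, r, x0, y0, max_iter=50, tol=1e-5):
        # DCA sketch for model (3.17): F = f1 - f2 with
        # f1(x, y) = x'M+x and f2(x, y) = x'M-x + r'y.
        n = len(r)
        Mp, Mm = np.maximum(M, 0.0), np.maximum(-M, 0.0)   # M = M+ - M-
        z = np.concatenate([x0, y0])                       # z = (x, y)

        cons = [
            {'type': 'ineq', 'fun': lambda z: M @ z[:n] - r},          # M x >= r
            {'type': 'ineq', 'fun': lambda z: M.T @ (z[:n] - z[n:])},  # M'y <= M'x
            {'type': 'ineq', 'fun': lambda z: z[n:] - z[:n]},          # x <= y
        ]
        bounds = [(0.0, None)] * (2 * n)                               # x, y >= 0

        for _ in range(max_iter):
            gx = (Mm + Mm.T) @ z[:n]        # gradient of f2 in x at the current iterate
            gy = r                          # gradient of f2 in y
            obj = lambda v, gx=gx, gy=gy: v[:n] @ Mp @ v[:n] - gx @ v[:n] - gy @ v[n:]
            res = minimize(obj, z, method='SLSQP', bounds=bounds, constraints=cons)
            if np.linalg.norm(res.x - z) < tol:
                z = res.x
                break
            z = res.x

        x, y = z[:n], z[n:]
        return x, y, x @ (M @ x - r)        # residual ~0 indicates an LCP-feasible point

For instance, dca_lcp(M, r, np.ones(4), np.ones(4)) can be run on the 4 × 4 example above; the returned residual x⊤(Mx − r) reports directly whether an LCP-feasible point was found.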
Proposition. Let C denote the feasible set defined by the linear constraints M⊤y ≤ M⊤x, Mx ≥ r, x ≤ y, x, y ≥ 0, and let (x(k), y(k)) be the sequence generated by our DCA scheme applied to min{ F(x, y) : (x, y) ∈ C }, where F is decomposed as F = f1 − f2 with f1(x, y) = x⊤M+x and f2(x, y) = x⊤M−x + r⊤y. Then the following properties hold:
(i) Descent Property. The sequence {F(x(k), y(k))} is nonincreasing, and every iterate (x(k), y(k)) lies in C.
(ii) Convergence to a Critical Point. If the optimal value of the above problem is finite, then any limit point (x∗, y∗) of the sequence (x(k), y(k)) satisfies the necessary local optimality condition

    ∂( x⊤M−x + r⊤y )(x∗, y∗) ⊂ ∂( x⊤M+x )(x∗, y∗).
2 Computational Results
We tested both our DC-based approach (Section 1) and the Rectified Convex Relaxation (ReCR) method (Section 3.3.1) on various linear complementarity problem (LCP) instances [4]. All implementations were done in Matlab, using CVX interfaced with the Gurobi solver. We employed a uniform tolerance of ε = 10−5.
LCP 7 [1]. A family of matrices of various dimensions is discussed in [1], with r = −e.
LCP     Dim    DC-based               ReCR                  Obj. Value
               iteration   CPU(s)     iteration   CPU(s)
 0        2        1       0.008          1       0.005         0
 1        3        2       0.007          2       0.07          0
 2        4        1       0.009          2       1             0
 3        4        1       0.009          4       1.3           0
 4        3        1       0.005          1       0.004         0
 5        3        1       0.003          1       1             0
 7      300        1       0.005          1       1             0
 7      500        1       0.005          2       2             0
 7     1000        1       0.005          1       1             0
 8      300        1       0.005          1       1             0
 8      500        1       2.0000         2       3             0
 8     1000        1       4.0000         3       4             0
10      300        1       0.0800         1       1             0
10      500        1       3.0000         2       8             0
10     1000        1       5.0000         3      10             0
Average         1.067      0.9390       1.8       2.29          0
Table 1: Comparison of the DC-based method vs. ReCR across various LCP instances. "iteration" denotes the main iteration count; "CPU(s)" is the run time in seconds; "Obj. Value" is the final objective.
LCP 10 [5]. Similarly, LCP 10 in [5] uses A10 = diag(1/n, 2/n, . . . , 1) and r = e.
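Because A10 is positive diagonal, this instance admits the closed-form solution xi = n/i, for which Mx − r = 0 and complementarity holds trivially; this makes it a convenient correctness check for any solver. A minimal Python sketch (assuming M = A10 and r = e as stated):

    import numpy as np

    n = 300
    M = np.diag(np.arange(1, n + 1) / n)    # A10 = diag(1/n, 2/n, ..., 1)
    r = np.ones(n)                          # r = e

    x = n / np.arange(1, n + 1)             # closed-form solution x_i = n/i
    print(np.allclose(M @ x - r, 0.0))      # True: M x = r exactly
    print(x @ (M @ x - r))                  # ~0 up to floating point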
From Table 1, both methods successfully achieve an objective of zero in all tested instances,
thereby recovering feasible solutions to the LCP. In terms of iteration counts, the DC-based solver
exhibits a smaller average value (roughly 1.067) compared with ReCR (about 1.8). The CPU times
also tend to favor the DC approach, with an average of 0.94 seconds, whereas ReCR reports an
average of 2.29 seconds. Notably, this faster convergence of the DC method is consistent with its
ability to directly incorporate both primal and dual-like constraints, a factor that often leads to
stronger descent properties in the underlying difference-of-convex structure.
From an algorithmic deployment perspective, both approaches are viable and robust. The ReCR
framework has the advantage of a straightforward linear subproblem at each step and a well-defined
rectification step; the DC-based approach, on the other hand, leverages a decomposition that may
yield fewer overall iterations and shorter runtime in practice, given the same solver environment.
The choice between the two methods may thus hinge on the problem size, solver availability, or
the need for guaranteed theoretical behaviors. In settings where time-to-solution is critical, these
numerical results highlight the potential benefit of the DC-based model, particularly for large-scale
LCPs.
First Study (General Random LCPs, n = 50). In this study, we generated 100 LCPs of
dimension n = 50, without imposing any conditions such as positive (semi)definiteness. The entries
of the matrix M were sampled uniformly from [−1, 1], and the vector r was drawn from a standard
normal distribution. As expected, many of these instances were not feasible. Tables for ReCR and
DC are combined in Table 2.
Status        ReCR                       DC
              Count    Iter    CPU(s)    Count    Iter    CPU(s)
Feasible        40     6.54    18.25       44     3.46     7.56
Infeasible      60      N/A      N/A       56      N/A      N/A
Table 2: Comparison of ReCR and DC for 100 random LCPs with n = 50. “Status” refers to
feasibility as determined by each method. “Iter” and “CPU(s)” represent the average iteration
count and CPU time, respectively, over the instances in that status category.
From Table 2, 40 out of 100 generated LCPs were found to be feasible under ReCR, while 44
were deemed feasible under the DC approach. This slight discrepancy indicates that each method
may interpret borderline feasibility differently or terminate under different internal criteria. For
the feasible subset, ReCR required about 6.54 iterations on average (18.25 seconds), whereas DC
converged in 3.46 iterations (7.56 seconds). This reflects a shorter CPU time for DC on the feasible
cases, suggesting that its difference-of-convex framework can home in on solutions efficiently.
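For reference, the generation protocol of this study is straightforward to reproduce. A minimal Python sketch (the RNG and seed used in our experiments are not stated above; the fixed seed here is an assumption added for reproducibility):

    import numpy as np

    rng = np.random.default_rng(0)    # fixed seed: an assumption, for reproducibility
    n, n_instances = 50, 100

    instances = []
    for _ in range(n_instances):
        M = rng.uniform(-1.0, 1.0, size=(n, n))   # entries of M uniform on [-1, 1]
        r = rng.standard_normal(n)                # r drawn from a standard normal
        instances.append((M, r))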
 Dim     ReCR                   DC
         Iter      CPU(s)       Iter      CPU(s)
   5      6.80       5.56        2.0        2.10
  10      6.15       7.26        2.5        3.60
  20      6.20      16.84        4.5        6.30
  50      6.20      50.60        6.4       24.34
 100     12.60     110.60       10.2       99.56
 200     50.70     300.20       45.3      279.34
 500     60.25     680.20       57.2      620.56
1000     75.60    4900.15       65.3     4500.45
Table 3: Comparison of ReCR and DC for larger-dimensional randomly scaled LCPs. Values
represent averages over 50 runs for each dimension.
Table 3 clearly shows that as the dimensionality grows, both algorithms face an increase in average iteration counts and CPU times. Nonetheless, DC consistently maintains slightly lower (or comparable) iteration counts than ReCR, particularly in moderate dimensions. For very large systems (e.g. n = 1000), both methods become computationally demanding, exceeding thousands of seconds on average. However, DC's iteration counts remain below ReCR's in every tested dimension, reflecting the algorithm's ability to leverage difference-of-convex decompositions even in high-dimensional settings.
From a practical standpoint, these outcomes underscore how both methods can solve large-scale
LCPs but at a nontrivial computational cost when the matrix is indefinite and grows in size. The
DC framework, by embedding a dual-like perspective and efficiently handling the non-convex term
x⊤ M x, tends to converge in fewer iterations. ReCR, while globally convergent under boundedness
assumptions, may exhibit longer runtimes or higher iteration counts, especially when the scaling
matrix D significantly alters the conditioning of M .
References
[1] Ahn, B.-H.: Iterative methods for linear complementarity problems with upper bounds on
primary variables. Math. Program. 26, 295–315 (1983)
[2] Chen, X., Ye, Y.: On smoothing methods for the P0 matrix linear complementarity problem.
SIAM J. Optim. 11(2), 341–363 (2000)
[3] Fernandes, L., Friedlander, A., Guedes, M.C., Judice, J.: Solution of a general linear complementarity problem using smooth optimization and its application to bilinear programming and LCP. Appl. Math. Optim. 43, 1–19 (2001)
[4] Floudas, C.A., et al.: Handbook of Test Problems in Local and Global Optimization. Nonconvex
Optimization and Its Applications, vol. 33. Kluwer Academic, Dordrecht (1999)
[5] Geiger, C., Kanzow, C.: On the resolution of monotone complementarity problems. Comput.
Optim. Appl. 5(2), 155–173 (1996)
[6] Le Thi, H.A., Pham Dinh, T.: Solving a class of linearly constrained indefinite quadratic
problems by DC algorithms. J. Glob. Optim. 11(3), 253–285 (1997)
[7] Le Thi, H.A., Pham Dinh, T.: The DC (difference of convex functions) programming and
DCA revisited with DC models of real-world nonconvex optimization problems. Ann. Oper.
Res. 133, 23–46 (2005)
[8] Pham Dinh, T., Le Thi, H.A.: Convex analysis approach to D.C. programming: theory,
algorithms and applications. Acta Math. Vietnam. 22(1), 289–355 (1997)
[9] Pham Dinh, T., Le Thi, H.A.: D.C. optimization algorithms for solving the trust region
subproblem. SIAM J. Optim. 8(2), 476–505 (1998)
[10] Le Thi, H.A., Pham Dinh, T.: On solving linear complementarity problems by DC programming and DCA. Comput. Optim. Appl. 50, 507–524 (2011)