STAT 603 – ACTIVE CONSTRAINT METHODS II
1. The set-up
We continue our saga by considering the least-squares problem with linear inequality constraints
(LCLS)    minimize  $\tfrac12\,\| A x - b \|^2$  subject to  $C x \le d$  (componentwise),

where $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, $C \in \mathbb{R}^{\ell \times n}$ and $d \in \mathbb{R}^{\ell}$ are given, and $x \in \mathbb{R}^n$.
We shall assume that the columns of A are linearly independent, and
likewise for the rows of C.
The active constraint method discussed here is an "independent" version: Lawson and Hanson (1974) explain how (LCLS) can instead be transformed into (NNLS), the nonnegative least-squares problem.
The algorithm is iterative in nature, but at the same time terminates
in a finite number of steps. The starting point is an initial guess, $x^1$, in
the algebraic interior of the constraints, i.e.,

(1)    $C x^1 < d$,

as well as an initial guess as to which constraints are active, based on
the current guess for the solution. Thus, $P_1 = \{\,\}$ is the (empty) set of
indices $j$ for which $[\, C x^1 \,]_j = d_j$. (Finding an $x^1$ such that (1) holds, possibly
with $\le$ instead of strict inequalities, is called a feasibility problem.)
Both the guess for the solution and the guess for the active constraints
will change in the course of the computation.
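For concreteness, strict feasibility of a candidate $x^1$ is a one-line check in code. The following sketch (in Python, on a made-up instance; the numbers carry no significance) verifies condition (1):

```python
import numpy as np

# A made-up instance: two variables, three inequality constraints.
C = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
d = np.array([1.0, 1.0, 1.5])

x1 = np.array([0.2, 0.3])
assert np.all(C @ x1 < d)   # condition (1): x1 lies in the algebraic interior
```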
Now, we take our first steps. Compute $z^2$ as the solution to

(2)    minimize  $\tfrac12\,\| A x - b \|^2$  subject to  $[\, C x \,]_j = d_j$  for all $j \in P_1$.

(In other words, without any constraints.)
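Since $P_1$ is empty, (2) is an ordinary least-squares problem, and any standard solver applies. A minimal sketch, with hypothetical data:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 3))   # columns linearly independent (generically)
b = rng.standard_normal(8)

# Problem (2) with P1 empty: plain least squares.
z2, *_ = np.linalg.lstsq(A, b, rcond=None)
```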
Now, there are two possibilities: either $z^2$ satisfies $C z^2 \le d$ or it does
not. If it does, then $x = z^2$ in fact solves (LCLS). Since the gradient
vanishes, we may take the Lagrange multipliers for (LCLS) to be equal
to zero.
If z 2 does not satisfy C z 2 6 d, then determine t ∈ ( 0 , 1 ) such that
def
x2 = x1 + t ( z 2 − x1 )
(3)
satisfies C x2 6 d, and for at least one index j, we have [ C x2 ]j = dj .
Thus, $x^2$ is obtained by moving from $x^1$ toward $z^2$ until one hits the boundary
of the constraint set. Note that it follows from general convexity
considerations that

(4)    $\tfrac12\,\| A x^2 - b \|^2 < \tfrac12\,\| A x^1 - b \|^2$,

so that the objective function in (LCLS) is strictly decreasing. (Actually,
the very first time this may fail, namely if $x^1$ already solves (LCLS). This seems
unlikely.)
(5) Computational details. Fix j and consider
(6)    $[\, C x^2 \,]_j = [\, C x^1 \,]_j + t \, \bigl( [\, C z^2 \,]_j - [\, C x^1 \,]_j \bigr)$.

If $[\, C z^2 \,]_j \le [\, C x^1 \,]_j$, then, surely, $[\, C x^2 \,]_j \le [\, C x^1 \,]_j \le d_j$ for all
$t \ge 0$, so this produces no constraint on $t$. If, on the other hand,
$[\, C z^2 \,]_j > [\, C x^1 \,]_j$, then solving the equation $[\, C x^2 \,]_j = d_j$ for $t$ yields

(7)    $t_j = \dfrac{ d_j - [\, C x^1 \,]_j }{ [\, C z^2 \,]_j - [\, C x^1 \,]_j }$.
Note that $t_j > 0$. Doing this for all components, we find that

(8)    $t^* = \min\,\bigl\{\, t_j : [\, C z^2 \,]_j > [\, C x^1 \,]_j \,\bigr\}$

is the value of $t$ we are after. Since $[\, C z^2 \,]_j > d_j$ for at least one index
$j$, we have $t^* < 1$. Of course, since the relevant $t_j$ are strictly positive,
so is $t^*$. We then have $x^2 = x^1 + t^* \, ( z^2 - x^1 )$.
Finally, update the active constraint set,

(9)    $P_2 = \bigl\{\, j : [\, C x^2 \,]_j = d_j \,\bigr\}$.
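The computations (6)–(9) translate almost verbatim into code. A minimal sketch (the function name and the tolerance are my own choices; a tolerance is needed because "$[\, C x^2 \,]_j = d_j$" only holds up to rounding):

```python
import numpy as np

def step_to_boundary(x1, z2, C, d, tol=1e-9):
    """Move from x1 toward z2 until the boundary: computes (7), (8), (9)."""
    s = C @ (z2 - x1)                            # [C z2]_j - [C x1]_j, cf. (6)
    mask = s > tol                               # only these rows constrain t
    tj = (d[mask] - (C @ x1)[mask]) / s[mask]    # the t_j of (7)
    t_star = min(tj.min(), 1.0)                  # t* of (8)
    x2 = x1 + t_star * (z2 - x1)
    P2 = np.where(d - C @ x2 <= tol)[0]          # active set (9), up to rounding
    return x2, P2
```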
2. The general step of the algorithm
We now describe the general step of the algorithm. Thus, we have a
guess $x^k$ satisfying $C x^k \le d$, and the guess for the active constraints

(10)    $P_k = \bigl\{\, j : [\, C x^k \,]_j = d_j \,\bigr\}$.
It is notationally useful to associate with the index set $P_k$ a matrix $P_k$
such that $P_k C x^k = P_k d$ incorporates all of the active constraints, i.e., $P_k$
consists of the rows of the $\ell \times \ell$ identity matrix with row numbers belonging
to $P_k$. By way of example, if $P_k = \{\, 1, 3 \,\}$, then

$$ P_k = \begin{pmatrix} 1 & 0 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 1 & 0 & \cdots & 0 \end{pmatrix} \in \mathbb{R}^{2 \times \ell}. $$
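In code, the selection matrix is just a row subset of the identity. A sketch (note the 0-based indexing of most languages versus the 1-based indexing of the text):

```python
import numpy as np

def selection_matrix(P, ell):
    """Rows of the ell-by-ell identity matrix with row numbers in P (0-based)."""
    return np.eye(ell)[sorted(P)]

# The example from the text, P_k = {1, 3}, in 0-based indices:
Pk = selection_matrix({0, 2}, ell=6)   # shape (2, 6)
```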
Now, compute $z^{k+1}$, the solution to

(11)    minimize  $\tfrac12\,\| A x - b \|^2$  subject to  $P_k C x = P_k d$.
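One standard way to solve (11) is via the linear system formed by its Lagrange multiplier conditions, which appear as (17) below. A minimal sketch, assuming $A$ has full column rank and $P_k C$ has full row rank, so that the system is nonsingular:

```python
import numpy as np

def eq_constrained_ls(A, b, Cp, dp):
    """Solve min 0.5*||Ax - b||^2 subject to Cp x = dp via the system
           [ A^T A   Cp^T ] [ x  ]   [ A^T b ]
           [ Cp       0   ] [ mu ] = [ dp    ],
       returning the minimizer and the multipliers mu (cf. (17))."""
    n, k = A.shape[1], Cp.shape[0]
    K = np.block([[A.T @ A, Cp.T],
                  [Cp, np.zeros((k, k))]])
    rhs = np.concatenate([A.T @ b, dp])
    sol = np.linalg.solve(K, rhs)
    return sol[:n], sol[n:]
```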
Next, we must determine whether $z^{k+1}$ in fact solves (LCLS). First, is
$z^{k+1}$ feasible? If $z^{k+1}$ does not satisfy $C z^{k+1} \le d$, then we compute
$t^* \in (0, 1)$ such that

(12)    $x^{k+1} \stackrel{\text{def}}{=} x^k + t^* \, ( z^{k+1} - x^k )$

satisfies $C x^{k+1} \le d$, and $[\, C x^{k+1} \,]_j = d_j$ for at least one index $j$.
The details were discussed in (5). We also need to update the active
constraints,

(13)    $P_{k+1} = \bigl\{\, j : [\, C x^{k+1} \,]_j = d_j \,\bigr\}$,

and continue by considering (11) again with $P_{k+1}$.
If $z^{k+1}$ satisfies the constraints, we set $x^{k+1} = z^{k+1}$. It is important to
note that now $x^{k+1}$ solves the problem (11) with the constraints $C x \le d$
added. (This duplicates the equality constraints, but never mind.) So,
$x = x^{k+1}$ solves

(14)    minimize  $\tfrac12\,\| A x - b \|^2$  subject to  $P_k C x = P_k d$, $\ C x \le d$.
In particular, then also

(15)    $\tfrac12\,\| A x^{k+1} - b \|^2 < \tfrac12\,\| A x^k - b \|^2$.
Now, does $x^{k+1}$ satisfy the (other) Lagrange multiplier conditions for
a solution of (LCLS)? That is, does $x = x^{k+1}$ satisfy

(16)    $A^{\mathsf T} A \, x + C^{\mathsf T} \lambda = A^{\mathsf T} b$, $\quad C x \le d$, $\quad \lambda \ge 0$, $\quad \langle\, \lambda \,,\, C x - d \,\rangle = 0$,

for suitable multipliers $\lambda$? Note that the Lagrange multiplier conditions
for (11) are

(17)    $A^{\mathsf T} A \, x + C^{\mathsf T} P_k^{\mathsf T} \mu = A^{\mathsf T} b$, $\quad P_k C x = P_k d$,

without sign restrictions on the Lagrange multipliers $\mu$. So, the question
is whether the pair $( x, \mu )$ that solves (17) leads to a pair $( x, \lambda )$ that
satisfies (16). The "natural" matching is to set the Lagrange
multipliers corresponding to the inactive constraints equal to zero, and
the ones corresponding to the active constraints equal to $\mu$. (Algorithmically,
this means the coefficients of $\mu$ need to be inserted into the vector
$\lambda$ in the right places, i.e., $\lambda = P_k^{\mathsf T} \mu$.) Then,

(18)    $\langle\, \lambda \,,\, C x - d \,\rangle = \langle\, \mu \,,\, P_k \, ( C x - d ) \,\rangle = 0$.

Thus, in (16), the only concern is whether $\lambda$ satisfies $\lambda \ge 0$, or, what is
the same thing, whether $\mu \ge 0$.
This is a question of inspection. If $\mu \ge 0$, then $x^{k+1}$ is the solution
of (LCLS). If not, then some constraints were mistakenly identified as
active: remove the constraints with $\mu_j < 0$ from the active list,

(19)    $P_{k+1} = P_k \setminus \bigl\{\, j : \mu_j < 0 \,\bigr\}$.

Actually, it is a toss-up whether we want to remove all constraints that
we guess should not be active, or only the worst offender, the one corresponding
to the smallest $\mu_j$. And now, we proceed again to (11).
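Putting the pieces together, here is a minimal sketch of the full iteration. It is a straight transcription of the steps above, solving (11) through its KKT system as in the sketch after (11); the tolerances and the iteration cap are my own choices, and a serious implementation would solve (11) more carefully, e.g., by the orthogonal-transformation techniques of Lawson and Hanson (1974). This version removes only the worst offender in (19).

```python
import numpy as np

def lcls_active_set(A, b, C, d, x, tol=1e-9, max_iter=500):
    """Active-set sketch for: minimize 0.5*||Ax - b||^2 subject to Cx <= d.
       The starting point x must be feasible, cf. (1)."""
    n = A.shape[1]
    P = set(np.where(d - C @ x <= tol)[0])          # active set, cf. (10)
    for _ in range(max_iter):
        idx = sorted(P)
        Cp, dp, k = C[idx], d[idx], len(idx)
        # Equality-constrained subproblem (11), via its KKT system (17).
        K = np.block([[A.T @ A, Cp.T], [Cp, np.zeros((k, k))]])
        sol = np.linalg.solve(K, np.concatenate([A.T @ b, dp]))
        z, mu = sol[:n], sol[n:]
        if np.all(C @ z <= d + tol):                # z feasible: accept it
            x = z
            if np.all(mu >= -tol):                  # multipliers >= 0: optimal, cf. (16)
                return x
            P.discard(idx[int(np.argmin(mu))])      # drop worst offender, cf. (19)
        else:                                       # move toward z up to the boundary
            s = C @ (z - x)
            mask = s > tol
            tj = (d[mask] - (C @ x)[mask]) / s[mask]
            t = min(tj.min(), 1.0) if tj.size else 1.0   # (7), (8)
            x = x + t * (z - x)                     # (12)
            P = set(np.where(d - C @ x <= tol)[0])  # (13)
    raise RuntimeError("active-set iteration did not terminate")
```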
3. The algorithm terminates
Although the above algorithm is iterative, in fact it terminates after a
finite number of steps. Note that the algorithm constructs a sequence
$\{\, x^k \,\}_k$ for which $\| A x^k - b \|^2$ is strictly decreasing. Moreover, after
at most $\ell$ steps we arrive at an actual solution of a problem of the form (14),
and then we can never revisit the same active constraint
configuration encoded in $P_k$. Since there are only finitely many such $P_k$,
the algorithm must terminate after a finite number of steps.
Potentially, the number of steps could be very large. However, since
$\| A x^k - b \|^2$ is decreasing, we can never visit problems (14) for which
the minimum is larger than our current value of the objective function.
Thus, a lot of implicit pruning of all the configurations $P_k$ is going on
behind the scenes. Typically (?), the number of steps is about $2\,\ell$.
4. Plain vanilla monotone regression
An application is provided by nonparametric monotone regression. Suppose
we have the model

(20)    $Y_i = f_o(X_i) + \varepsilon_i$, $\quad i = 1, 2, \cdots, n$,

where

(21)    $\varepsilon = ( \varepsilon_1, \varepsilon_2, \cdots, \varepsilon_n )^{\mathsf T} \sim N( 0, \sigma^2 I )$,

for some unknown $\sigma^2$. We assume that $X_1 < X_2 < \cdots < X_n$.
The basic assumption is that the function $f_o$ is monotone increasing.
We might know this if $f_o$ represents a growth curve; or, if we are not
quite sure that $f_o$ is increasing, we might wish to compare the monotone
estimator to a spline estimator.
The estimation problem is

minimize  $\displaystyle\sum_{i=1}^{n} \, | \, f(X_i) - Y_i \, |^2$  subject to  $f$ increasing.

To get a finite dimensional model, let

$b_i = f(X_i)$, $\quad i = 1, 2, \cdots, n$.
Then $f$ increasing means $b_1 \le b_2 \le \cdots \le b_n$, which translates into the
constraints $D\, b \le 0$, where

(22)    $$ D = \begin{pmatrix} 1 & -1 & & & \\ & 1 & -1 & & \\ & & \ddots & \ddots & \\ & & & 1 & -1 \end{pmatrix} \in \mathbb{R}^{(n-1) \times n}. $$
Thus, the problem is

(23)    minimize  $\| \, b - y \, \|^2$  subject to  $D\, b \le 0$.

Try it, with such examples as $f_o(x) = 1 - e^{-c\,x}$, $0 \le x \le 1$, with $c = 1$
or $c = 5$ or some such.
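A sketch of such an experiment, reusing the lcls_active_set function from the sketch in Section 2 (here $A = I$, $b = y$, $C = D$, $d = 0$; the constant vector $\bar y$ is a feasible start, since $D$ applied to a constant vector vanishes):

```python
import numpy as np

rng = np.random.default_rng(7)
n, c, sigma = 40, 5.0, 0.1
X = np.sort(rng.uniform(0.0, 1.0, n))
y = 1.0 - np.exp(-c * X) + sigma * rng.standard_normal(n)

# D b <= 0 encodes b_1 <= b_2 <= ... <= b_n, cf. (22).
D = np.eye(n - 1, n) - np.eye(n - 1, n, k=1)

# Problem (23) is (LCLS) with A = I, b = y, C = D, d = 0.
b_hat = lcls_active_set(np.eye(n), y, D, np.zeros(n - 1),
                        x=np.full(n, y.mean()))
```

The answer can be checked against pool-adjacent-violators, the classical direct solver for this special case (e.g., sklearn.isotonic.IsotonicRegression).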
References
Lawson, C. L. and Hanson, R. J. (1974), Solving Least Squares Problems, Prentice-Hall,
Englewood Cliffs, New Jersey. (Reprinted: SIAM, Philadelphia, 1995.)