An Algorithm for the Matrix Lambert W

. An algorithm is proposed for computing primary matrix Lambert W functions of a square matrix A , which are solutions of the matrix equation We W = A . The algorithm employs the Schur decomposition and blocks the triangular form in such a way that Newton’s method can be used on each diagonal block, with a starting matrix depending on the block. A natural simpliﬁcation of Newton’s method for the Lambert W function is shown to be numerically unstable. By reorganizing the iteration a new Newton variant is constructed that is proved to be numerically stable. Numerical experiments demonstrate that the algorithm is able to compute the branches of the matrix Lambert W function in a numerically reliable way.


Introduction.
The Lambert W function of the complex number a is defined implicitly through the scalar equation (1.1) we w = a, which has a countably infinite set of solutions for a = 0 and just one solution for a = 0. Any solution w = −1 of (1.1) can be extended to an analytic function w, such that we w = a, defined in all of the complex plane except a suitable curve, the branch cut.Any such function is said to be an analytic branch of the Lambert W function and can be further extended on the branch cut (excluding the singular points) in order to be continuous and differentiable (for w = −1) on one side of the branch cut.
For background on the Lambert W function and details of applications we refer the reader to [11], [13].We follow the numbering of the branches, the notation, and the branch cuts of [11] (see Figure 1).The branches are denoted by W k (a) for k ∈ Z, where W 0 (z) is the principal branch whose branch cut is (−∞, −1/e], while for k = 0, the branch cut is (−∞, 0].The extension to the branch cuts is made using the "counter-clockwise continuity" rule [26], that is, considering the one-sided limit from above the branch cut. For each a ∈ C \ {0} the set {W k (a) : k ∈ Z} is the set of solutions of (1.1), while for a = 0 the unique solution is W 0 (0) = 0.The latter implies that W k (0) is undefined for k = 0, namely, 0 is a singular point.In the matrix case, for A ∈ C n×n any solution of the equation W e W = A can be called a matrix Lambert W function [10].The solutions of this matrix equation break into two types.
• Primary matrix Lambert W functions, defined using the theory of primary matrix functions [20, sect. 1.2].Each such function can be written as a polynomial in A. • Nonprimary matrix Lambert W functions, none of which can be written as a polynomial in A. It can be shown that the eigenvalues of any matrix Lambert W function of A are Lambert W functions of the eigenvalues of A. A matrix Lambert W function is nonprimary if and only if its spectrum contains two different branches of the same eigenvalue of A. This may happen only when A has an eigenvalue appearing in two different Jordan blocks.Our focus here is on primary matrix Lambert W functions.
Primary matrix Lambert W functions can be described by specifying a branch for each distinct eigenvalue of A. For an eigenvalue λ, the branch W k can be freely chosen with the following exceptions: • if λ = 0, then we must choose k = 0, since elsewhere the scalar function is not defined; • if λ = −1/e appears in a nontrivial Jordan block then we must choose k ∈ {0, −1}; for k ∈ {0, −1} the scalar function is defined at −1/e (in fact, W 0 (−1/e) = W −1 (−1/e) = −1) but it cannot be extended to a differentiable function.Let λ 1 , . . ., λ t be the distinct eigenvalues of A in a specific order.With the constraints given above, we define W k1,k2,...,kt (A) as the primary Lambert W function whose eigenvalues are W k1 (λ 1 ), . . ., W kt (λ t ).
If we want the same branch W k for every eigenvalue it is convenient to use the notation W k (A) instead of W k,...,k (A).In this case we say that W k (A) is an unmixed branch of the matrix Lambert W function.
The matrix Lambert W function arises in the numerical solution and stability analysis of delay differential (systems of) equations [5], [9], [25], [32], [33], where the principal Lambert W function of a matrix, W 0 (A), is used to deduce properties of the stability of the system.Cepeda-Gomez and Michiels [9] show with examples that the principal branch is not sufficient to determine the stability of all systems, but rather W −1 (A) is needed as well.Thus an algorithm that can compute any branch is required.The matrix Lambert W function has been recently considered in a problem of quantum computing [31], where the matrix argument A is normal (A * A = AA * ).
The purpose of this work is to introduce an algorithm for computing the unmixed branches of the Lambert W function (provided they are well defined) of a general matrix A ∈ C n×n .The algorithm is based on the Newton iteration applied to the matrix equation W e W − A = 0.
We follow the usual idea of considering the scalar function f (w) = we w − a, applying Newton's method to obtain the iteration (1.2) and then translating from scalars to matrices.The resulting matrix iteration coincides with Newton's method applied to W e W − A = 0 when the initial value is a polynomial of A and both are well defined.However, this iteration suffers from two problems.First, it is numerically unstable in finite precision arithmetic.Second, it is not easy to find a starting matrix such that the Newton iteration converges to the desired matrix Lambert W function.We develop a heuristic strategy for the initial value based on two different series expansions of the scalar Lambert W function, at ∞ (and 0) and around −1/e, respectively.For any scalar a few terms of one of these two expansions always yields a good initial value.In the matrix case we would like to use one of the two expansions applied to A, but if A has two eigenvalues for which two different series expansions need to be used as starting points neither of these expansions can be used.We are able to resolve both these problems.We identify a new iteration that is equivalent to Newton's method but numerically stable.Moreover, we consider an ordered Schur form T of A such that the leading eigenvalues are the ones for which the approximation at ∞ (and 0) yields a suitable initial value for Newton's method while the trailing eigenvalues are the ones for which the approximation at −1/e gives a good initial value.Blocking T as a 2 × 2 block matrix and using the Schur-Parlett algorithm we obtain the correct evaluation of the Lambert W function on the whole matrix A. The resulting algorithm for computing the matrix Lambert W function requires O(n 3 ) arithmetic operations and is much more numerically reliable than an algorithm based on diagonalization used in [5].
The paper is organized as follows.In section 2 we give necessary background on Fréchet derivatives and the branch cuts of the Lambert W function.Section 3 treats the choice of the initial value for Newton's method in the scalar case.In section 4 we derive a numerically stable form of a simplified version of Newton's method for the matrix Lambert W function and combine it with a suitably blocked Schur decomposition to construct a complete algorithm for the Lambert W function of a matrix.Section 5 is devoted to numerical experiments.

The Fréchet derivative.
We first give some notation and properties of the Fréchet derivative of a matrix function that will be needed in the following sections.The Fréchet derivative of f at A ∈ C n×n , when it exists, is the unique linear mapping Df (A)[E] satisfying for all E ∈ C n×n .Since Df (A)[E] is linear in E, the vec operator, which converts a matrix to a vector by stacking the columns on top of each other, yields vec(Df (A)[E]) = K(A) vec(E) for an n 2 × n 2 matrix K(A) called the Kronecker matrix.The vec operator interacts nicely with the Kronecker product; in particular, vec(AXB) = (B T ⊗ A) vec(X).
The Fréchet derivative of a matrix function at a matrix A in the direction H is usually much more complicated than its scalar counterpart, but when H commutes with A there is a simple expression, as shown in the following result [20,Prob. 3.8].
Lemma 2.1.Let f : Ω → C be analytic, let U n be the subset of C n×n comprising matrices whose spectrum belongs to Ω, and let f n , f n : U n → C n×n be the matrix functions induced by f and f , respectively.Then for any H commuting with For instance, if A commutes with H, then D exp(A)[H] = exp(A)H, which mimics the scalar case.A formula for the derivative of the exponential function in a general direction is not so simple.We will use the following expression of the Kronecker matrix associated with the derivative of the exponential [20,Thm. 10.13]: (2.1) where f 1 (z) = (e z − 1)/z for z = 0 and f 1 (0) = 1.

Branch cuts of the Lambert W function.
As noted in section 1, the branch cut for the principal branch W 0 is (−∞, −1/e], while for W k , with k = 0, it is (−∞, 0].This choice of branch cuts is convenient because it allows simple asymptotic expansions for W k (z) as z → ∞ based on the branches of the complex logarithm [11].On the other hand, it makes it complicated to understand the behavior of the branches W 1 (z) and W −1 (z) in a neighborhood of the point z = −1/e.
In fact, the image of a small circle around −1/e under W −1 (z) comprises two disjoint curves: the half circle below the real axis is mapped into a curve adjacent to the branch −2 region, while the half circle above the real axis is mapped into a curve adjacent to the branch 0 and 1 regions.Thus, on the one hand, for ε sufficiently small, the function defined as W −1 (z) in the half circle {|z + 1/e| ε : Im z > 0} cannot be extended analytically in a neighborhood of −1/e.On the other hand, the function defined as W −1 (z) in the half circle {|z + 1/e| ε : Im z < 0} can be extended analytically in a neighborhood of −1/e.This explains the lack of symmetry in treating the branch cut.
The situation for the branch W 1 is similar (see Figure 2): on one hand, for ε sufficiently small, the function defined as W 1 (z) in the half circle {|z + 1/e| ε : Im z > 0} can be extended analytically in a neighborhood of −1/e.On the other hand, the function defined as W 1 (z) in the half circle {|z + 1/e| ε : Im z < 0} cannot be extended analytically in a neighborhood of −1/e. .The image through W 1 (z) (right) of a region of the complex plane with a highlighted circle around the branch point −1/e (left).The image of the branch cut (−∞, 0) when we approach it from above is the regular curve separating the range of W 1 (z) from the range of W 2 (z); while the image of the same half line when we approach it from below splits in two curves meeting at the point −1: the left one separates the range of W 1 (z) from that of W −1 (z); the right one separates the same range from that of W 0 (z).This illustrates why we treat in a different way the branch W 1 (z) when we approach the negative real axis clockwise or counter-clockwise; we can loosely say that there is one branch cut from above and two from below.The colors have been used to show what the image of any (part of a) branch cut is.
branch 1 (while it is a branch point for the branch −1); nevertheless, in a neighborhood of −1/e the function W 1 behaves as if it had a branch point there.
Finally, the point 0 turns out to be singular for each branch different from W 0 and, in particular, lim z→0 |W k (z)| = ∞ for |k| > 0.

Computing the scalar Lambert W function.
The Lambert W function has a qualitative behavior comparable to the logarithm, but it cannot be computed with the same techniques because of the lack of suitable identities.The customary methods for computing the scalar Lambert W function are based on a suitable rootfinding method applied to the equation we w = a, such as the Newton or the Halley methods, which guarantee local superlinear convergence for w = −1 and linear convergence for w = −1.These methods are based on iterations of the form z i+1 = ϕ(z i ) and require determination of z 0 such that z i converges to the value W k (a) for the k of interest.For this purpose some heuristic techniques have been proposed and implemented.They are based on two asymptotic expansions of the function W k (z), which we briefly recall.
It is known that [11, eq. (4.18)] for each k one has, as z → ∞, Stirling cycle number of the first kind [28, p. 631].Here, for a nonzero z ∈ C, ln z denotes the principal logarithm, which is the solution of the equation e x = z having imaginary part in the strip {z ∈ C : −π < Im(z) π}.It has been proved that the same expansion holds for z → 0 and k = 0 (see [11] and the references therein).The expansion implies (3.2) which holds either as z → 0 and k = 0 or as z → ∞.
Another expansion is known for z → −1/e, which converges to W 0 (z) in a neighborhood of −1/e, to W −1 when Im(z) 0 and z approaches −1/e, and to W 1 when Im(z) < 0 and z approaches −1/e (compare the discussion in section 2.2 about the choice of the branches).The way to construct the series can be found in [11, p. 350]; the first few terms are A widely used strategy [11], [30] is to choose as initial guess for the selected rootfinding algorithm one of the two expansions (up to a certain term) that is hopefully sufficiently close to W k (z) to guarantee convergence.
We consider a discretization Q of the subset Q = {z = a + ib : |a|, |b| 3} of the complex plane and run Newton's method for each z ∈ Q with z 0 = ϕ k (z) (or with z 0 = ψ k (z)).Let N be a fixed positive integer.If the value W k (z) is approximated within a specific tolerance in m < N steps then we set s k (z) = m, and otherwise s k (z) is set to N .In this way, we have a function s k : Q → {0, 1, . . ., N}, and assigning a color to each element of {0, . . ., N} we obtain an illustration of the convergence behavior of the Newton method.
In Figure 3 we show the result of the experiment for k = 0, together with the circle |z − 1/2| = 3/2.As one can see, choosing the approximation z 0 = ϕ k (z) outside the circle gives convergence in no more than 7 steps (blue and green area of left plot), while choosing the approximation z 0 = ψ k (z) for z inside the circle gives convergence in no more than 8 steps (light green area of the right plot).
Notice that the chosen areas for the initial values are somewhat arbitrary.For instance, for k = 0, we have put the center at z = 1/2 and the radius R = 3/2 for simplicity, but any 1.35 R 1.60 would give the same convergence properties.A similar argument can be used for |k| = 1, yielding a range of useful radii which includes the interval 0.25 R 0.40.This fact gives some flexibility when designing the algorithm for computing the matrix Lambert W function.
for an initial guess X 0 .Let W * be a solution of F (W ) = 0. From the standard theory of Newton's method we know that if DF (W * ) is nonsingular then the sequence (4.1) converges quadratically to W * for any X 0 sufficiently close to W * .The computation of the Newton step using (4.1) is not recommended since it would require dealing with n 2 × n 2 matrices.Hence the Newton method cannot be used in this form.
However, assuming that the starting matrix is a polynomial in A it is possible to simplify Newton's method to a more computationally useful formula, which directly generalizes Newton's method in the scalar case: . ., for an initial guess Y 0 ∈ P(A), where P(A) is the set of polynomials in the matrix A. We call this iteration simplified Newton method.A condition for the equivalence of the regular and simplified Newton methods is proved in the next result.
Theorem 4.1.Let A ∈ C n×n and X 0 = Y 0 ∈ P(A).If both sequences X k of (4.1) and Y k of (4.2) are well defined then X k = Y k ∈ P(A) for all k 0.
Proof.We use an induction argument.Let X k = Y k ∈ P (A).We know that The equivalence of the two methods on the set P(A) does not imply that they are equivalent in a neighborhood of a solution of F (W ) = 0.This is problematic because in finite precision arithmetic we cannot guarantee that for formula (4.2) the computed iterates Y k stay in P(A).The components of Y k in the subspace complementary to P(A), which are created by rounding errors, will be small initially but they can potentially grow greatly.
The theory in [20, sect.4.9.4] says that for stability of an iteration with iteration function g at the fixed point W we need Dg(W ) to have bounded powers.The latter condition holds if ρ(Dg(W )) < 1 (where ρ denotes the spectral radius), or if Dg(W ) is idempotent.In this case errors will not be amplified to first order when iterates are in the neighborhood of W .To test the stability of the iteration G we find the Kronecker matrix form of the Fréchet derivative.
Lemma 4.2.Let W be a solution of F (W ) = 0, and assume that −1 is not an eigenvalue of W . Then the Kronecker matrix representation of DG(W ) is where f 1 (z) = (e z − 1)/z for z = 0 and f 1 (0) = 1.
Proof.First we compute derivative of the function G, using the product and chain rules: At a fixed point W we have (W 2 + Ae −W )(W + I) −1 = W and thus

Using (2.1) we obtain
which implies the result.
Using Lemma 4.2 we can express the stability condition in terms of the spectrum Λ(W ) of W .A sufficient condition for stability is that It is not difficult to find examples where the condition is violated, so we conclude that Newton's method can be unstable in a rather large number of cases.
In our previous work we have on several occasions been able to manipulate an unstable iteration into an equivalent stable iteration [18], [21], [22], [23], [24].Inspired by these results, we were able to find a stable variant of the simplified Newton method, namely, (4.4)The equivalence of (4.2) and (4.4) and the stability of (4.4) are proved in the following result.When the sequences are converging we have Z k → W and H k → 0, and thus we are interested in fixed points of the iteration of the type [W T 0] T .Theorem 4.3.Let A ∈ C n×n and Y 0 = Z 0 ∈ P(A).The sequence Y k of (4.2) is well defined if and only if the sequence Z k of (4.4) is well defined and we have and thus Dϕ ([ W 0 ]) is idempotent.Proof.The equivalence between the two sequences is proved by induction.First, observe that, when −1 is not in the spectrum of Y 0 , we have Y 1 − Y 0 = H 0 and thus Z 1 = Y 1 .Then, assuming that k 1 steps have been performed and that Y h = Z h for h k, we can construct Y k+1 if and only if we can construct Z k+1 , that is when

and this follows from
where we have used the equality which follows from (4.2), the fact that e U+V = e U e V when U and V commute, and the inductive hypothesis.
The statement about Dϕ can be proved by a straightforward but tedious computation that we will omit.
The subtlety of the stabilization process can be seen from the fact that the variant of (4.4) is unstable.Notice that (4.5) differs from the stable variant (4.4)only in that the equation for H k+1 contains the product H k (Z k + I) instead of (Z k + I)H k .This small change in the iteration results in the derivative not always having a spectral radius less than or equal to 1 at a fixed point of the type [W T 0] T .

Choice of the initial value.
To design an algorithm for computing a primary matrix Lambert W function we need to devise a strategy to find a starting matrix for (4.4).
Let A be a square matrix whose distinct eigenvalues are λ 1 , . . ., λ t .Consider a matrix iteration of the type where R(x, a) is a rational function of both of its arguments.It is known [20,Thm. 4.15] (see also [24]) that with X 0 ∈ P(A) the iteration converges if the scalar sequences x k , λ j ), with x (j) 0 = P (λ j ), converge for j = 1, . . ., t.It is not difficult to show that the theorem is true also for R(x, a) = (x 2 + ae −x )/(x + 1).In particular, if x (j) k converges to W kj (λ j ), for any j, then the matrix sequence X k converges to the solution of the matrix equation W e W = A whose eigenvalues are W k1 (λ 1 ), . . ., W kt (λ t ), which is the primary matrix function W k1,...,kt (A).
In order to compute the matrix Lambert W function W k1,...,kt (A) through Newton's method it is therefore sufficient to choose a matrix in P(A) whose eigenvalues yield sequences converging to W k1 (λ 1 ), . . ., W kt (λ t ).In general this is very complicated, but it can be simplified if one needs an unmixed branch W k (A).
We have seen in section 3 a heuristic strategy for the choice of the initial values such that the Newton method should converge to any branch of W k (z).The initial value is chosen as one of the two approximations ϕ k (z) and ψ k (z) defined in (3.4).In particular, for |k| > 1 and for any z = 0 the initial value x 0 = ϕ k (z) gives a sequence x k converging to W k (z).Thus, in order to obtain a sequence converging to W k (A) it is enough to choose X 0 = ϕ k (A).
For |k| 1 the situation is more complicated, since we have two different strategies for the initial value depending on where z lies.In particular we have determined a region where we will choose x 0 = ϕ k (z) and a region where we will choose x 0 = ψ k (z).In the matrix case, we have a stable method for computing W k (A) for A whose eigenvalues all belong to U k , where we choose X 0 = ϕ k (A), and for A whose eigenvalues all belong to V k , where we choose X 0 = ψ k (A).Nevertheless, it may happen that A has eigenvalues in both regions and then neither X 0 = ϕ k (A) nor X 0 = ψ k (A) gives the desired branch of the matrix Lambert W function.
To overcome this problem we use a strategy borrowed from the Schur-Parlett algorithm [14], [29].We consider a Schur form T of the matrix A, ordered such that T is a 2 × 2 block matrix whose (1, 1) block contains the eigenvalues in U k and whose (2, 2) block contains the eigenvalues in V k .Hence we write the matrix and its Lambert W function as where the size of T 11 and X 11 equals the number of eigenvalues λ of A belonging to U k , say m.Notice that m can be n or 0, in which case T = T 11 and T = T 22 , respectively.Since all the eigenvalues of T 11 and T 22 belong to the region of convergence of Newton's method with initial values X 0 = ϕ(T ) and X 0 = ψ(T ), respectively, we can use Newton's method to calculate X 11 = W k (T 11 ) and X 22 = W k (T 22 ), and since W (T ) is a primary function of T we have the commutativity condition We can determine X 12 by equating (1, 2) blocks to obtain the Sylvester equation which has a unique solution since the matrices T 11 and T 22 have no eigenvalues in common [19,Chap. 16].
This algorithm has one weak point: if an eigenvalue of T 11 is near an eigenvalue of T 22 then the Sylvester equation may be ill-conditioned [19, sect. 16.3] even if the Lambert W function is well-conditioned.The problem can be addressed by observing that the circle which separates the two regions is not a sharp division of the two regions of convergence (see the discussion at the end of section 3), so a slightly smaller or larger circle yields the same convergence results.If two eigenvalues of A are very near the circle but in different regions it is possible to reduce or increase the radius of the circle in order to let the two eigenvalues belong to the same region.Nevertheless, there is no a priori lower bound independent of A on the gap between the eigenvalues in T 11 and those in T 22 , while a lower bound dependent on the size , where R M and R m are the largest and the smallest radii, respectively, that may be chosen for the splitting.In our experiments, even for matrices of medium size, say n = 1000, in the worst case we have not noticed a significant lack of accuracy due to ill conditioning of the Sylvester equation.

The algorithm.
We now state our algorithm for computing the matrix Lambert W function using the stabilized Newton method.The algorithm is for the unmixed case.It can be adapted for the mixed case by a more sophisticated blocking strategy, but we will not consider that here.
Algorithm 1. Input: an integer k, 3. Reorder the Schur form of A in order to have T as in (4.6), such that the eigenvalues of T 11 (possibly an empty matrix) belong to U and the eigenvalues of T 22 (possibly an empty matrix) belong to V (this task can be accomplished by applying a unitary similarity as explained in [6]).4a.If T 11 is nonempty then compute X 11 = W k (T 11 ) as the limit of (4.4) with For better accuracy, H 0 should be evaluated as 4b.If T 22 is nonempty (which implies |k| 1) then compute X 22 = W k (T 22 ) as the limit of (4.4) with Z 0 = (−1) k (2eT 22 + 2I) 1/2 − I. 5. Solve the Sylvester equation (4.8) for X 12 (this task can be accomplished using the Bartels and Stewart algorithm [7]).6. Form W k (T ) = X11 X12 0 X22 and recover W k (A) by reversing the similarities of steps 3 and 1.The matrix exponentials and logarithms in step 4 are evaluated using the algorithms of Al-Mohy and Higham [1], [2], which exploit triangularity.The matrix square roots in step 4b are evaluated using the Björck-Hammarling recurrence [8], with the efficient blocked implementation of [16].
It is worth pointing out that, in spite of the similarities between the Lambert W function and the logarithm, the computation of the former is much more expensive than that of the latter.In particular, two matrix logarithms can be required to obtain the initial value for the Newton iteration (see line 4a in Algorithm 1), while each step of the Newton iteration requires the computation of one matrix exponential.The total cost of the algorithm is still O(n 3 ) operations, but with a constant much larger than that for the logarithm using the algorithm of [2].We also note that Newton iterations for matrix functions are invariably started with a scalar multiple of I or A [20]; our choice of nontrivial functions of A as starting matrices is novel.
When A is real and has no real eigenvalues on (−∞, −1/e), it can be proved that W 0 (A) is real (for instance, using [21,Thm. 3.2]).In this case, a real Schur decomposition can be employed in the algorithm, since the sets U and V are symmetric about the real axis, so a nonreal eigenvalue and its complex conjugate belong to the same convergence region and hence can go in the same block.(In contrast, for the Schur-Parlett algorithm usually only the complex Schur form can be used [4], [14]).The logarithm evaluations are now done using the algorithm of Al-Mohy, Higham, and Relton [3], which is designed for real matrices and works entirely in real arithmetic.

Numerical experiments.
We present some numerical experiments to illustrate the behavior of our algorithm in finite precision arithmetic.The tests are performed using MATLAB R2011b, for which the unit roundoff is u = 2 −53 ≈ 1.1×10 −16 .
The algorithm is compared with a diagonalization approach that computes This approach has been used in earlier work [5] but is obviously applicable only when A has a complete set of eigenvectors.
The Schur-Parlett algorithm (implemented in the MATLAB function funm) is of limited use, since it is designed for entire functions.For multivalued functions it could group eigenvalues across a branch cut, leading to a wrong branch.The possibility of constructing a suitable modification of the Schur-Parlett algorithm for multivalued function is still under investigation.
As a measure of the accuracy of a computed approximation W to W k (A) we consider the Frobenius norm relative error W − W F / W F , where W is a reference solution computed by using the diagonalization approach at high precision (with the variable precision arithmetic of the Symbolic Math Toolbox) and then rounding the result to double precision.
Another measure of quality is the relative residual of W , (5.1) As pointed out by Deadman and Higham [15], the difficulty with such residuals is knowing how small we can reasonably expect them to be.Rather than carry out a perturbation analysis as in [15], here we will take advantage of the availability of W and we will simply compare the residual of W with that of W . Experiment 1.To compare our Algorithm 1 with the diagonalization approach, we consider the matrix whose eigenvector matrix becomes increasingly ill conditioned as ε → 0. In Figure 4 we plot the relative error of the two methods for computing W −1 (A), for ε = 10  with t varying from 0 to 16.As expected the diagonalization approach loses accuracy as ε tends to zero while our method has accuracy independent of the eigenvector conditioning.
A similar figure is obtained considering any other branch of the Lambert W function.
Experiment 2. We pick 47 matrices of size 10 × 10 from the MATLAB gallery function.We compare the error in computing W 0 (A) for Algorithm 1 and the diagonalization approach.In Figures 5 and 6 we show the relative residuals and relative errors, respectively.Figure 6 also shows an estimate of cond(W 0 , A)u, where cond(W 0 , A) is a 1-norm condition number for W 0 defined in the usual way for matrix functions [20, sect.3.1] and estimated using the code funm_condest1 from the Matrix Function Toolbox [17], [20,Alg. 3.20].
As one can see in Figure 5 the residual for Algorithm 1 is never more than a couple of orders of magnitude larger than that of the reference solution, whereas diagonalization gives much larger residuals than Algorithm 1 on a number of problems.Consistent with this behavior, Figure 6 shows that Algorithm 1 has error less than cond(W 0 , A)u in almost every case, whereas the error for diagonalization greatly exceeds cond(W 0 , A)u in several cases.For all the matrices tested the Newton iteration required no more than 9 steps to converge for each of the (at most two) diagonal blocks.
Experiment 3. We repeat the previous test but now computing W −1 (A), in order to show that our method is able to compute also the nonprincipal branches of the Lambert W function.We have removed singular matrices from the test, since W −1 (z) has a singularity for z = 0.The residuals are plotted in Figure 7, which shows a similar behavior to the case of W 0 (A).[12] call the Lambert W function "the first nontrivial example of a multivalued function", since the logarithm (its close relative) and the inverse trigonometric functions have such a regular branch structure that a new notation is not needed to specify the branch.We have presented the first numerically reliable algorithm for computing an arbitrary branch of the matrix Lambert W function.Corless and Jeffrey also comment that "the multivalued nature of W 'stress tests' naming conventions, numerics on branches, computer-aided analysis, and the results of series computations".To that list we can add the evaluation of a nonentire function of a triangular matrix.To derive our algorithm we had to construct a numerically stable Newton iteration, formulate starting matrices that are nontrivial functions of the matrix of interest (in contrast to other Newton iterations for matrix functions), and use a Schur form with a novel blocking.Our algorithm builds on previous work on matrix functions in that it employs algorithms for the matrix exponential, logarithm, and square root.Empirically, the algorithm produces residuals within a couple of orders of magnitude of those of a reference solution.

Concluding remarks. Corless and Jeffrey
Given the wide range of situations in which the scalar Lambert W function arises [11], we can expect to see more of the matrix version, both in practical applications, such as those cited in section 1, and in more theoretical studies, such as that in [27] concerning chain rules for matrix functions.

Fig. 1 .
Fig. 1.The ranges of the branches of the Lambert W function and the values of W k (1) (+), W k (10 + 10i) (×), and W k (−0.1) (o).The colors of the curves that separate two adjacent regions denote the branches to which the curves belong.The corresponding plot for the logarithm consists of horizontal strips of height 2π with boundaries at odd multiples of π.

1 Fig. 2
Fig.2.The image through W 1 (z) (right) of a region of the complex plane with a highlighted circle around the branch point −1/e (left).The image of the branch cut (−∞, 0) when we approach it from above is the regular curve separating the range of W 1 (z) from the range of W 2 (z); while the image of the same half line when we approach it from below splits in two curves meeting at the point −1: the left one separates the range of W 1 (z) from that of W −1 (z); the right one separates the same range from that of W 0 (z).This illustrates why we treat in a different way the branch W 1 (z) when we approach the negative real axis clockwise or counter-clockwise; we can loosely say that there is one branch cut from above and two from below.The colors have been used to show what the image of any (part of a) branch cut is.
c 2015 SIAM.Published by SIAM under the terms of the Creative Commons 4.0 license Downloaded 07/12/15 to 86.5.35.10.Redistribution subject to CCBY license and the proof is completed.

Fig. 5 .
Fig. 5. Residuals in computing W 0 (A), where A ∈ C n×n ranges over a set of 47 matrices chosen from the MATLAB gallery function.The results are ordered by nonincreasing residual of the reference solution.

Fig. 6 .
Fig. 6.Relative errors in computing W 0 (A), where A ∈ C n×n ranges over 47 matrices chosen from the MATLAB gallery function.The results are ordered by nonincreasing value of cond(W 0 , A)u.

Fig. 7 .
Fig. 7. Residuals in computing W −1 (A), where A ∈ C n×n varies over 45 matrices chosen from the MATLAB gallery function.The results are ordered by nonincreasing residual of the reference solution.