Estimating the Condition Number of the Fréchet Derivative of a Matrix Function

The Fréchet derivative $L_f$ of a matrix function $f : \mathbb{C}^{n\times n} \to \mathbb{C}^{n\times n}$ is used in a variety of applications and several algorithms are available for computing it. We define a condition number for the Fréchet derivative and derive upper and lower bounds for it that differ by at most a factor $2$. For a wide class of functions we derive an algorithm for estimating the 1-norm condition number that requires $O(n^3)$ flops given $O(n^3)$ flops algorithms for evaluating $f$ and $L_f$; in practice it produces estimates correct to within a factor $6n$. Numerical experiments show the new algorithm to be much more reliable than a previous heuristic estimate of conditioning.


Introduction.
Condition numbers are widely used in numerical analysis to measure the sensitivity of the solution of a problem to small perturbations in the input data, such as rounding or measurement errors [14]. For matrix functions the condition number is intimately related to the Fréchet derivative. The Fréchet derivative of $f : \mathbb{C}^{n\times n} \to \mathbb{C}^{n\times n}$ at $A$ is a mapping $L_f(A, \cdot) : \mathbb{C}^{n\times n} \to \mathbb{C}^{n\times n}$ that is linear in its second argument and for any $E \in \mathbb{C}^{n\times n}$ satisfies
$$f(A+E) - f(A) - L_f(A,E) = o(\|E\|). \qquad (1.1)$$

The Fréchet derivative of a matrix function also arises as an object of interest in its own right in a variety of applications, some recent examples of which are the computation of correlated choice probabilities [1], computing linearized backward errors for matrix functions [6], analysis of complex networks [7], [8], Markov models of cancer [9], computing matrix geometric means [21], nonlinear optimization for model reduction [25], [26], and tensor-based morphometry [30]. Software for computing Fréchet derivatives of matrix functions is available in a variety of languages [16].

The aims of this work are to define the condition number of the Fréchet derivative, obtain bounds for it, and construct an efficient algorithm to estimate it.

We first recall the definition of the condition number of a matrix function $f$. The absolute condition number of $f$ is defined by
$$\mathrm{cond}_{\mathrm{abs}}(f, A) := \lim_{\varepsilon\to 0}\, \sup_{\|E\|\le\varepsilon} \frac{\|f(A+E) - f(A)\|}{\varepsilon}$$
and is characterized as the norm of the Fréchet derivative [15, Thm. 3.1], [27]:
$$\mathrm{cond}_{\mathrm{abs}}(f, A) = \|L_f(A)\| := \max_{\|E\|=1} \|L_f(A,E)\|. \qquad (1.2)$$
In practice the relative condition number
$$\mathrm{cond}_{\mathrm{rel}}(f, A) := \lim_{\varepsilon\to 0}\, \sup_{\|E\|\le\varepsilon\|A\|} \frac{\|f(A+E) - f(A)\|}{\varepsilon\,\|f(A)\|} \qquad (1.3)$$
is more relevant. The two condition numbers are closely related, since
$$\mathrm{cond}_{\mathrm{rel}}(f, A) = \mathrm{cond}_{\mathrm{abs}}(f, A)\,\frac{\|A\|}{\|f(A)\|}. \qquad (1.4)$$
Associated with the Fréchet derivative is its Kronecker matrix form [15, eq. (3.17)], which is the unique matrix $K_f(A) \in \mathbb{C}^{n^2\times n^2}$ such that for any $E \in \mathbb{C}^{n\times n}$,
$$\mathrm{vec}(L_f(A,E)) = K_f(A)\,\mathrm{vec}(E). \qquad (1.5)$$
The principal use of the Kronecker form is that its norm, estimated via the 1-norm or 2-norm power methods, for example, gives an estimate of $\mathrm{cond}_{\mathrm{abs}}(f,A)$ [15, sect. 3.4], [22].

To investigate the condition number of the Fréchet derivative we will need higher order Fréchet derivatives of matrix functions, which were recently investigated by Higham and Relton [19]. We now summarize the key results that we will need from that work.
The second Fréchet derivative $L^{(2)}_f(A, E_1, E_2) \in \mathbb{C}^{n\times n}$ is linear in both $E_1$ and $E_2$ and satisfies
$$L_f(A + E_2, E_1) - L_f(A, E_1) - L^{(2)}_f(A, E_1, E_2) = o(\|E_2\|). \qquad (1.6)$$
Higham and Relton show that a sufficient condition for the second Fréchet derivative $L^{(2)}_f(A, \cdot, \cdot)$ to exist is that $f$ is $4p-1$ times continuously differentiable on an open set containing the eigenvalues of $A$ [19, Thm. 3.5], where $p$ is the size of the largest Jordan block of $A$. This condition is certainly satisfied if $f$ is $4n-1$ times continuously differentiable on a suitable open set. In this work we assume without further comment that this latter condition holds: it clearly does for common matrix functions such as the exponential, the logarithm, and the matrix powers $A^t$ with $t \in \mathbb{R}$, and indeed for any analytic function. Under this condition it follows that $L^{(2)}_f(A, E_1, E_2)$ is independent of the order of $E_1$ and $E_2$, which is analogous to the equality of mixed second order partial derivatives for scalar functions [19, sect. 2]. We will also need the norm of the second Fréchet derivative,
$$\|L^{(2)}_f(A)\| := \max_{\|E_1\| = \|E_2\| = 1} \|L^{(2)}_f(A, E_1, E_2)\|. \qquad (1.7)$$
One method for computing Fréchet derivatives is to apply $f$ to a $2n \times 2n$ matrix and read off the top right-hand block [15, eq. (3.16)], [24]:
$$f\left( \begin{bmatrix} A & E \\ 0 & A \end{bmatrix} \right) = \begin{bmatrix} f(A) & L_f(A,E) \\ 0 & f(A) \end{bmatrix}. \qquad (1.8)$$
Higham and Relton [19, Thm. 3.5] show that the second Fréchet derivative can be calculated in a similar way, as the top right-hand $n\times n$ block of $f$ applied to a $4n \times 4n$ matrix, obtained by applying (1.8) twice:
$$L^{(2)}_f(A, E_1, E_2) = \text{top right-hand } n\times n \text{ block of } f\left( \begin{bmatrix} A & E_1 & E_2 & 0 \\ 0 & A & 0 & E_2 \\ 0 & 0 & A & E_1 \\ 0 & 0 & 0 & A \end{bmatrix} \right). \qquad (1.9)$$
There is also a second order Kronecker matrix form [19, sect. 4], analogous to (1.5) and denoted by $K^{(2)}_f(A) \in \mathbb{C}^{n^4 \times n^2}$, such that for any $E_1$ and $E_2$,
$$\mathrm{vec}\big(L^{(2)}_f(A, E_1, E_2)\big) = \big(\mathrm{vec}(E_2)^T \otimes I_{n^2}\big)\, K^{(2)}_f(A)\, \mathrm{vec}(E_1). \qquad (1.10)$$

This paper is organized as follows. In section 2 we define the absolute and relative condition numbers of a Fréchet derivative, relate the two, and bound them above and below in terms of the second Fréchet derivative and the condition number of $f$. The upper and lower bounds that we obtain differ by at most a factor 2. Section 3 relates the bounds to the Kronecker matrix $K^{(2)}_f(A)$ at the cost, for the 1-norm, of a further factor $n$ of uncertainty, and this leads to the $O(n^3)$ flops algorithm given in section 4 for estimating the 1-norm condition number of the Fréchet derivative. We test the accuracy and robustness of our algorithm via numerical experiments in section 5. Concluding remarks are given in section 6.

The condition number of the Fréchet derivative.
We begin by proposing a natural definition of the absolute and relative condition numbers of a Fréchet derivative and showing that the two are closely related. We define the absolute condition number of the Fréchet derivative $L_f(A,E)$ by
$$\mathrm{cond}_{\mathrm{abs}}(L_f, A, E) := \lim_{\varepsilon\to 0}\, \sup_{\substack{\|\Delta A\|\le\varepsilon \\ \|\Delta E\|\le\varepsilon}} \frac{\|L_f(A+\Delta A,\, E+\Delta E) - L_f(A,E)\|}{\varepsilon}, \qquad (2.1)$$
which measures the maximal effect that small perturbations in the data $A$ and $E$ can have on the Fréchet derivative. Similarly, we define the relative condition number by
$$\mathrm{cond}_{\mathrm{rel}}(L_f, A, E) := \lim_{\varepsilon\to 0}\, \sup_{\substack{\|\Delta A\|\le\varepsilon\|A\| \\ \|\Delta E\|\le\varepsilon\|E\|}} \frac{\|L_f(A+\Delta A,\, E+\Delta E) - L_f(A,E)\|}{\varepsilon\,\|L_f(A,E)\|}, \qquad (2.2)$$
where the changes are now measured in a relative sense. By taking $\Delta A$ and $\Delta E$ sufficiently small we can rearrange (2.2) to obtain the approximate upper bound
$$\frac{\|L_f(A+\Delta A,\, E+\Delta E) - L_f(A,E)\|}{\|L_f(A,E)\|} \lesssim \mathrm{cond}_{\mathrm{rel}}(L_f, A, E)\,\varepsilon, \qquad \|\Delta A\|\le\varepsilon\|A\|, \quad \|\Delta E\|\le\varepsilon\|E\|. \qquad (2.3)$$
A useful property of the relative condition number is its lack of dependence on the norm of $E$: since $L_f(A,E)$ is linear in $E$, replacing $E$ by $\theta E$ for any nonzero scalar $\theta$ leaves (2.2) unchanged. Furthermore we can obtain a relationship analogous to (1.4) between the absolute and relative condition numbers. This is useful since it allows us to state results and algorithms for the absolute condition number before reinterpreting them in terms of the relative condition number.
Lemma 2.1. The absolute and relative condition numbers of the Fréchet derivative $L_f$ are related by
$$\mathrm{cond}_{\mathrm{rel}}(L_f, A, E) = \mathrm{cond}_{\mathrm{abs}}(L_f, A, sE)\,\frac{\|A\|}{\|L_f(A, sE)\|}, \qquad s = \frac{\|A\|}{\|E\|}. \qquad (2.4)$$

Proof. Since the relative condition number does not depend on the norm of $E$ we may replace $E$ by $sE$, so that $\|sE\| = \|A\|$. Setting $\delta = \varepsilon\|A\|$ in (2.2), the constraints $\|\Delta A\| \le \varepsilon\|A\|$ and $\|\Delta E\| \le \varepsilon\|sE\|$ both become bounds by $\delta$, and hence
$$\mathrm{cond}_{\mathrm{rel}}(L_f, A, E) = \lim_{\delta\to 0}\, \sup_{\substack{\|\Delta A\|\le\delta \\ \|\Delta E\|\le\delta}} \frac{\|L_f(A+\Delta A,\, sE+\Delta E) - L_f(A,sE)\|}{\delta}\,\frac{\|A\|}{\|L_f(A,sE)\|} = \mathrm{cond}_{\mathrm{abs}}(L_f, A, sE)\,\frac{\|A\|}{\|L_f(A,sE)\|}.$$

In order to bound the relative condition number we will derive computable bounds on the absolute condition number and use the relationship in Lemma 2.1 to translate them into bounds on the relative condition number. We first obtain lower bounds.
Lemma 2.2. The absolute condition number of the Fréchet derivative satisfies both of the lower bounds
$$\mathrm{cond}_{\mathrm{abs}}(L_f, A, E) \ge \mathrm{cond}_{\mathrm{abs}}(f, A), \qquad \mathrm{cond}_{\mathrm{abs}}(L_f, A, E) \ge \max_{\|\Delta A\|=1} \|L^{(2)}_f(A, E, \Delta A)\|.$$

Proof. For the first bound we set $\Delta A = 0$ in (2.1) and use the linearity of the derivative in its second argument:
$$\mathrm{cond}_{\mathrm{abs}}(L_f, A, E) \ge \lim_{\varepsilon\to 0}\, \sup_{\|\Delta E\|\le\varepsilon} \frac{\|L_f(A, E+\Delta E) - L_f(A,E)\|}{\varepsilon} = \lim_{\varepsilon\to 0}\, \sup_{\|\Delta E\|\le\varepsilon} \frac{\|L_f(A, \Delta E)\|}{\varepsilon} = \mathrm{cond}_{\mathrm{abs}}(f, A),$$
by (1.2). Similarly, for the second bound we set $\Delta E = 0$ and obtain, using (1.6),
$$\mathrm{cond}_{\mathrm{abs}}(L_f, A, E) \ge \lim_{\varepsilon\to 0}\, \sup_{\|\Delta A\|\le\varepsilon} \frac{\|L_f(A+\Delta A, E) - L_f(A,E)\|}{\varepsilon} = \max_{\|\Delta A\|=1} \|L^{(2)}_f(A, E, \Delta A)\|. \qquad (2.5)$$

Next, we derive an upper bound.

Lemma 2.3. The absolute condition number of the Fréchet derivative satisfies
$$\mathrm{cond}_{\mathrm{abs}}(L_f, A, E) \le \max_{\|\Delta A\|=1} \|L^{(2)}_f(A, E, \Delta A)\| + \mathrm{cond}_{\mathrm{abs}}(f, A).$$

Proof. Notice that by linearity of the second argument of $L_f$,
$$L_f(A+\Delta A,\, E+\Delta E) - L_f(A,E) = \big(L_f(A+\Delta A, E) - L_f(A,E)\big) + L_f(A+\Delta A, \Delta E). \qquad (2.6)$$
The contribution of the first term on the right-hand side of (2.6) to (2.1) is equal to $\max_{\|\Delta A\|=1} \|L^{(2)}_f(A, E, \Delta A)\|$ by (2.5). For the second term we have, using (1.2) and the fact that $L_f$, and hence $\mathrm{cond}_{\mathrm{abs}}(f,\cdot)$, varies continuously with its first argument under our differentiability assumptions,
$$\|L_f(A+\Delta A, \Delta E)\| \le \mathrm{cond}_{\mathrm{abs}}(f, A+\Delta A)\,\|\Delta E\| = \big(\mathrm{cond}_{\mathrm{abs}}(f, A) + o(1)\big)\,\varepsilon.$$
Dividing by $\varepsilon$, taking the limit as $\varepsilon\to 0$, and combining the two halves of the bound gives the result.
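For small matrices the quantities appearing in Lemmas 2.2 and 2.3 can be computed explicitly by building the relevant Kronecker matrices column by column, and the upper bound of Lemma 2.3 can then be checked against sampled perturbations of $L_f$. The following Python sketch does this for $f = \exp$ in the Frobenius norm; the helper names are ours.

```python
import numpy as np
from scipy.linalg import expm, expm_frechet

def second_frechet_expm(A, E1, E2):
    # L^(2)_exp(A, E1, E2): top right n x n block of expm of a 4n x 4n matrix, cf. (1.9)
    n = A.shape[0]
    Z = np.zeros((n, n))
    X = np.block([[A, E1, E2, Z],
                  [Z, A,  Z, E2],
                  [Z, Z,  A, E1],
                  [Z, Z,  Z, A]])
    return expm(X)[:n, 3*n:]

def cond_abs_exp(A):
    # Frobenius norm: cond_abs(exp, A) = ||K_f(A)||_2, with K_f built column by
    # column from (1.5): column j is vec(L_f(A, unvec(e_j)))
    n = A.shape[0]
    K = np.column_stack([expm_frechet(A, ej.reshape(n, n))[1].ravel()
                         for ej in np.eye(n * n)])
    return np.linalg.norm(K, 2)

def max_second_derivative(A, E):
    # M = max over ||dA||_F = 1 of ||L^(2)(A, E, dA)||_F, computed as the 2-norm
    # of the matrix B satisfying B vec(dA) = vec(L^(2)(A, E, dA))
    n = A.shape[0]
    B = np.column_stack([second_frechet_expm(A, ej.reshape(n, n), E).ravel()
                         for ej in np.eye(n * n)])
    return np.linalg.norm(B, 2)
```

Lemma 2.3 then predicts that a perturbation of size $\varepsilon$ in both $A$ and $E$ changes $L_f(A,E)$ by at most about $(M + \mathrm{cond}_{\mathrm{abs}}(\exp, A))\,\varepsilon$, which sampled random perturbations should respect.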
We now give the corresponding bounds for the relative condition number.
Theorem 2.4. Let $s = \|A\|/\|E\|$ and $M = \max_{\|\Delta A\|=1} \|L^{(2)}_f(A, E, \Delta A)\|$. The relative condition number of the Fréchet derivative satisfies $\mathrm{cond}_{\mathrm{rel}}(L_f, A, E) \ge 1$ and
$$\max\big( \mathrm{cond}_{\mathrm{abs}}(f, A),\, sM \big)\, \frac{\|E\|}{\|L_f(A,E)\|} \;\le\; \mathrm{cond}_{\mathrm{rel}}(L_f, A, E) \;\le\; \big( \mathrm{cond}_{\mathrm{abs}}(f, A) + sM \big)\, \frac{\|E\|}{\|L_f(A,E)\|}.$$

Proof. To show $\mathrm{cond}_{\mathrm{rel}}(L_f, A, E) \ge 1$ we can use Lemmas 2.1 and 2.2, along with the linearity of $L_f(A,E)$ in $E$, as follows:
$$\mathrm{cond}_{\mathrm{rel}}(L_f, A, E) = \mathrm{cond}_{\mathrm{abs}}(L_f, A, sE)\,\frac{\|A\|}{\|L_f(A,sE)\|} \ge \mathrm{cond}_{\mathrm{abs}}(f, A)\,\frac{\|E\|}{\|L_f(A,E)\|} \ge 1,$$
since $\|L_f(A,E)\| \le \mathrm{cond}_{\mathrm{abs}}(f,A)\,\|E\|$ by (1.2). For the other inequalities apply Lemma 2.1 to Lemmas 2.2 and 2.3 similarly, using the linearity of $L^{(2)}_f$ in its second argument to write $\max_{\|\Delta A\|=1} \|L^{(2)}_f(A, sE, \Delta A)\| = sM$.

Theorem 2.4 gives upper and lower bounds for $\mathrm{cond}_{\mathrm{rel}}(L_f, A, E)$ that differ by at most a factor 2, since $\max(a,b) \ge (a+b)/2$ for nonnegative $a$ and $b$. During our numerical experiments in section 5 we found that typically $\mathrm{cond}_{\mathrm{abs}}(f,A)$ and $sM$ were of comparable size, though on occasion they differed by many orders of magnitude. Finding sufficient conditions for these two quantities to differ significantly remains an open question, the answer to which will depend on the complex interaction between $f$, $A$, and $E$.
There are already efficient algorithms for estimating $\mathrm{cond}_{\mathrm{abs}}(f, A)$ based on matrix norm estimation [15, sect. 3.4] in conjunction with methods for evaluating the Fréchet derivative [2], [3], [4], [17]. The key question is therefore how to estimate the maximum of $\|L^{(2)}_f(A, E, \Delta A)\|$ over all $\Delta A$ with $\|\Delta A\| = 1$. This is the subject of the next section.

Maximizing the second Fréchet derivative.
Our techniques for estimating the required maximum norm of the second Fréchet derivative are analogous to those for estimating $\mathrm{cond}_{\mathrm{abs}}(f, A)$, so we first recall the latter.
We begin by considering the Frobenius norm, because the condition number of a matrix function can then be computed exactly as [15, eq. (3.20)]
$$\mathrm{cond}_{\mathrm{abs}}(f, A) = \|K_f(A)\|_2. \qquad (3.1)$$
In practice we usually estimate $\mathrm{cond}_{\mathrm{abs}}(f, A)$ in the 1-norm, for which [15, sect. 3.4]
$$\frac{\|K_f(A)\|_1}{n} \le \mathrm{cond}_{\mathrm{abs}}(f, A) \le n\,\|K_f(A)\|_1. \qquad (3.2)$$
To estimate $\|K_f(A)\|_1$ the block 1-norm power method of Higham and Tisseur [20] is used [15, Alg. 3.22]. This approach requires around $4t$ matrix-vector products in total (using both $K_f(A)$ and $K_f(A)^*$) and produces estimates rarely more than a factor 3 from the true norm. The parameter $t$ is usually set to 2, but can be increased for greater accuracy at the cost of extra flops.

Using (1.10) we obtain a result similar to (3.1) for maximizing the norm of the second Fréchet derivative:
$$\max_{\|\Delta A\|_F = 1} \|L^{(2)}_f(A, E, \Delta A)\|_F = \big\|\big(\mathrm{vec}(E)^T \otimes I_{n^2}\big)\, K^{(2)}_f(A)\big\|_2. \qquad (3.3)$$
The next result shows that using the 1-norm instead gives the same accuracy guarantees as (3.2).

Theorem 3.1. For any $A, E \in \mathbb{C}^{n\times n}$, writing $B = \big(\mathrm{vec}(E)^T \otimes I_{n^2}\big)\, K^{(2)}_f(A) \in \mathbb{C}^{n^2\times n^2}$,
$$\frac{\|B\|_1}{n} \le \max_{\|\Delta A\|_1 = 1} \|L^{(2)}_f(A, E, \Delta A)\|_1 \le n\,\|B\|_1.$$
Proof. Making use of (1.10) and the symmetry of $L^{(2)}_f$ in its last two arguments, for the lower bound we obtain
$$\|B\|_1 = \max_{\|\mathrm{vec}(\Delta A)\|_1 = 1} \big\|\mathrm{vec}\big(L^{(2)}_f(A, E, \Delta A)\big)\big\|_1 \le n \max_{\|\Delta A\|_1 = 1} \|L^{(2)}_f(A, E, \Delta A)\|_1,$$
since $\|\mathrm{vec}(X)\|_1 \le n\|X\|_1$ and $\|\Delta A\|_1 \le \|\mathrm{vec}(\Delta A)\|_1$ for any $X, \Delta A \in \mathbb{C}^{n\times n}$. For the upper bound, using (1.10) again,
$$\max_{\|\Delta A\|_1 = 1} \|L^{(2)}_f(A, E, \Delta A)\|_1 \le \max_{\|\Delta A\|_1 = 1} \big\|B\,\mathrm{vec}(\Delta A)\big\|_1 \le \max_{\|\mathrm{vec}(\Delta A)\|_1 \le n} \|B\,\mathrm{vec}(\Delta A)\|_1 = n\,\|B\|_1,$$
since $\|X\|_1 \le \|\mathrm{vec}(X)\|_1$ and $\|\mathrm{vec}(\Delta A)\|_1 \le n\|\Delta A\|_1$.

Explicitly computing matrix-vector products with $B = (\mathrm{vec}(E)^T \otimes I_{n^2})\, K^{(2)}_f(A)$ and its conjugate transpose is not feasible, as computing $K^{(2)}_f(A)$ explicitly is prohibitively expensive. Fortunately we can compute the matrix-vector products implicitly since, by (1.10),
$$B\,\mathrm{vec}(V) = \mathrm{vec}\big(L^{(2)}_f(A, V, E)\big),$$
where the evaluation of the right-hand side costs only $O(n^3)$ flops using (1.9). This is analogous to the relation (1.5), which allows matrix-vector products with $K_f(A)$ to be computed as Fréchet derivatives. For products with $B^*$ we need the following two results.

Theorem 3.2. Let $f$ satisfy $f(A)^* = f(A^*)$ whenever $f(A)$ is defined, and be sufficiently smooth for the $k$th Fréchet derivative to exist. Then
$$L^{(k)}_f(A, E_1, \ldots, E_k)^* = L^{(k)}_f(A^*, E_1^*, \ldots, E_k^*).$$

Proof. Our proof is by induction on $k$, where the base case $k = 1$ is established by Higham and Lin [17, Lem. 6.2]. Assume that the result holds for the $k$th Fréchet derivative, which exists under the given assumptions. Then, since the Fréchet derivative is equal to the Gâteaux derivative [19],
$$L^{(k+1)}_f(A, E_1, \ldots, E_{k+1})^* = \lim_{t\to 0} \frac{\big(L^{(k)}_f(A + tE_{k+1}, E_1, \ldots, E_k) - L^{(k)}_f(A, E_1, \ldots, E_k)\big)^*}{t}.$$
Using the inductive hypothesis the right-hand side becomes
$$\lim_{t\to 0} \frac{L^{(k)}_f(A^* + tE_{k+1}^*, E_1^*, \ldots, E_k^*) - L^{(k)}_f(A^*, E_1^*, \ldots, E_k^*)}{t} = L^{(k+1)}_f(A^*, E_1^*, \ldots, E_{k+1}^*).$$

The conditions of Theorem 3.2 are not very restrictive; they are satisfied by the exponential, the logarithm, real powers $A^t$ ($t \in \mathbb{R}$), the matrix sign function, and trigonometric functions, for example. The condition $f(A)^* = f(A^*)$ holds, in particular, for any function with a power series expansion having real coefficients.

Theorem 3.3. Let $f$ satisfy the conditions of Theorem 3.2. Then for any $A, E \in \mathbb{C}^{n\times n}$,
$$\big[\big(\mathrm{vec}(E)^T \otimes I_{n^2}\big)\, K^{(2)}_f(A)\big]^* = \big(\mathrm{vec}(E^*)^T \otimes I_{n^2}\big)\, K^{(2)}_f(A^*).$$

Proof. We will need to use the Kronecker product property
$$(A \otimes B)(C \otimes D) = AC \otimes BD. \qquad (3.7)$$
We also need the commutation (or vec-permutation) matrix $C_n \in \mathbb{C}^{n^2 \times n^2}$, which is a permutation matrix defined by the property that, for $A \in \mathbb{C}^{n\times n}$, $\mathrm{vec}(A^T) = C_n \mathrm{vec}(A)$. It is symmetric and satisfies, for $A, B \in \mathbb{C}^{n\times n}$ and $x, y \in \mathbb{C}^n$ [10], [23, Thm. 3.1],
$$C_n (A \otimes B) C_n = B \otimes A, \qquad C_n (x \otimes y) = y \otimes x. \qquad (3.8)$$
We will prove that the two matrices in the theorem statement are equal by showing that they take the same value when multiplied by the arbitrary vector $v = \mathrm{vec}(V)$, where $V \in \mathbb{C}^{n\times n}$. Manipulating the left-hand side using (3.7) and (3.8), the claim reduces to an identity relating $K^{(2)}_f(A)^*$ and $K^{(2)}_f(A^*)$, stated as (A.1), the proof of which can be found in the appendix.

Theorem 3.3 shows that we can compute matrix-vector products with the conjugate transpose as
$$\big[\big(\mathrm{vec}(E)^T \otimes I_{n^2}\big) K^{(2)}_f(A)\big]^* \mathrm{vec}(V) = \big(\mathrm{vec}(E^*)^T \otimes I_{n^2}\big) K^{(2)}_f(A^*)\, \mathrm{vec}(V) = \mathrm{vec}\big(L^{(2)}_f(A^*, V, E^*)\big) = \mathrm{vec}\big(L^{(2)}_f(A, V^*, E)^*\big),$$
where the final equality is from Theorem 3.2, so that each such product again costs only $O(n^3)$ flops via (1.9).
Therefore the block 1-norm estimator can be used to estimate $\big\|\big(\mathrm{vec}(E)^T \otimes I_{n^2}\big) K^{(2)}_f(A)\big\|_1$ in Theorem 3.1 efficiently.
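This scheme can be mimicked in Python for $f = \exp$ and real data using `scipy.sparse.linalg.onenormest` (an implementation of the block 1-norm estimator of [20]), supplying products with $B$ and $B^T$ implicitly via (1.10), (1.9), and Theorem 3.3 specialized to real matrices. The function names are ours.

```python
import numpy as np
from scipy.linalg import expm
from scipy.sparse.linalg import LinearOperator, onenormest

def second_frechet_expm(A, E1, E2):
    # L^(2)_exp(A, E1, E2) via the 4n x 4n block formula (1.9)
    n = A.shape[0]
    Z = np.zeros((n, n))
    X = np.block([[A, E1, E2, Z],
                  [Z, A,  Z, E2],
                  [Z, Z,  A, E1],
                  [Z, Z,  Z, A]])
    return expm(X)[:n, 3*n:]

def kronecker_form_operator(A, E):
    # Implicit n^2 x n^2 operator B = (vec(E)^T kron I) K^(2)_f(A):
    #   B   vec(V) = vec(L^(2)_f(A, V, E))        by (1.10)
    #   B^T vec(V) = vec(L^(2)_f(A^T, V, E^T))    by Theorem 3.3 (real data)
    n = A.shape[0]
    return LinearOperator(
        (n * n, n * n), dtype=float,
        matvec=lambda v: second_frechet_expm(A, np.asarray(v).reshape(n, n), E).ravel(),
        rmatvec=lambda v: second_frechet_expm(A.T, np.asarray(v).reshape(n, n), E.T).ravel())
```

A call such as `onenormest(kronecker_form_operator(A, E), t=2)` then estimates $\|B\|_1$ with only $O(1)$ evaluations of (1.9) and without ever forming $K^{(2)}_f(A)$.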

An algorithm for estimating the relative condition number.
We are now ready to state our complete algorithm for estimating the relative condition number of a Fréchet derivative in the 1-norm.
In the following algorithm we use the unvec operator, which for a vector $v \in \mathbb{C}^{n^2}$ returns the unique matrix $\mathrm{unvec}(v) \in \mathbb{C}^{n\times n}$ such that $\mathrm{vec}(\mathrm{unvec}(v)) = v$.

Algorithm 4.1. Given $A, E \in \mathbb{C}^{n\times n}$ and $f$ satisfying the conditions of Theorem 3.2, this algorithm produces an estimate $\gamma$ of the relative condition number $\mathrm{cond}_{\mathrm{rel}}(L_f, A, E)$. It uses the block 1-norm estimation algorithm of [20] with $t = 2$, which we denote by normest (an implementation is [12, funm condest1]).

1. Compute $f(A)$ and $L_f(A,E)$ via specialized algorithms such as those in [2], [4], or [17] if possible. Alternatively, compute $L_f(A,E)$ by finite differences, the complex step method [3], or (1.8).
2. Use normest to form an estimate $\mu$ of $\|K_f(A)\|_1 \approx \mathrm{cond}_{\mathrm{abs}}(f,A)$, computing each requested matrix-vector product $K_f(A)v$ as $\mathrm{vec}(L_f(A, \mathrm{unvec}(v)))$ by (1.5), and each product with $K_f(A)^*$ analogously.
3. Use normest to form an estimate $\nu$ of $\|(\mathrm{vec}(E)^T \otimes I_{n^2})\, K^{(2)}_f(A)\|_1$: when the estimator requests a matrix-vector product for a vector $v$, set $V = \mathrm{unvec}(v)$, calculate $W = L^{(2)}_f(A, E, V)$ using (1.9) for example, and return $\mathrm{vec}(W)$ to the norm estimator. To compute a product with the conjugate transpose, return $\mathrm{vec}(L^{(2)}_f(A, V^*, E)^*)$ instead, by Theorem 3.3.
4. Form the estimate $\gamma = (\mu + s\nu)\,\|E\|_1/\|L_f(A,E)\|_1$, where $s = \|A\|_1/\|E\|_1$, from the upper bound of Theorem 2.4.

The quality of the estimate returned by Algorithm 4.1 depends on the quality of the underlying bounds and of the computed norm estimates. The estimate has a factor 2 uncertainty from Theorem 2.4 and a further factor $n$ uncertainty from (3.2) and Theorem 3.1. The norm estimates are usually correct to within a factor 3, so overall we can expect the estimate from Algorithm 4.1 to be within a factor $6n$ of $\mathrm{cond}_{\mathrm{rel}}(L_f, A, E)$.
Even though the second Fréchet derivative $L^{(2)}_f(A, E_1, E_2)$ is linear in $E_1$ and $E_2$, the scaling of $E_1$ and $E_2$ may affect the accuracy of the computation in floating point arithmetic. Heuristically we might expect that scaling $E_1$ and $E_2$ so that $\|A\|_1 \approx \|E_1\|_1 \approx \|E_2\|_1$ would give good accuracy. When implementing Algorithm 4.1 we scale $E_1$ and $E_2$ in this way before evaluating the derivatives and rescale the result afterwards.
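Putting the pieces together, here is a sketch of Algorithm 4.1 for $f = \exp$ and real data, with SciPy's `onenormest` standing in for funm condest1, the direction $E$ scaled to the norm of $A$ as just described, and the estimate formed from the upper bound of Theorem 2.4. The function names and explicit block matrices are ours and are practical only for modest $n$.

```python
import numpy as np
from scipy.linalg import expm
from scipy.sparse.linalg import LinearOperator, onenormest

def frechet_expm(A, E):
    n = A.shape[0]
    Z = np.zeros((n, n))
    return expm(np.block([[A, E], [Z, A]]))[:n, n:]            # (1.8)

def second_frechet_expm(A, E1, E2):
    n = A.shape[0]
    Z = np.zeros((n, n))
    X = np.block([[A, E1, E2, Z],
                  [Z, A,  Z, E2],
                  [Z, Z,  A, E1],
                  [Z, Z,  Z, A]])
    return expm(X)[:n, 3*n:]                                   # (1.9)

def cond_frechet_expm(A, E, t=2):
    """Estimate cond_rel(L_exp, A, E) in the 1-norm (real A and E)."""
    n = A.shape[0]
    s = np.linalg.norm(A, 1) / np.linalg.norm(E, 1)
    Es = s * E                                # scale so ||Es||_1 = ||A||_1
    L = frechet_expm(A, Es)
    # mu ~ ||K_f(A)||_1: products with K_f(A) are Frechet derivatives, cf. (1.5)
    Kf = LinearOperator((n*n, n*n), dtype=float,
        matvec=lambda v: frechet_expm(A, np.asarray(v).reshape(n, n)).ravel(),
        rmatvec=lambda v: frechet_expm(A.T, np.asarray(v).reshape(n, n)).ravel())
    mu = onenormest(Kf, t=t)
    # nu ~ ||(vec(Es)^T kron I) K2_f(A)||_1, cf. Theorem 3.1 and Theorem 3.3
    B = LinearOperator((n*n, n*n), dtype=float,
        matvec=lambda v: second_frechet_expm(A, np.asarray(v).reshape(n, n), Es).ravel(),
        rmatvec=lambda v: second_frechet_expm(A.T, np.asarray(v).reshape(n, n), Es.T).ravel())
    nu = onenormest(B, t=t)
    # upper bound of Theorem 2.4, using ||Es||_1 = ||A||_1
    return (mu + nu) * np.linalg.norm(A, 1) / np.linalg.norm(L, 1)
```

Each norm estimate costs only a handful of evaluations of (1.8) or (1.9), so the whole estimate is $O(n^3)$ flops.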

Numerical experiments.
Our experiments are all performed in MATLAB R2013a. We examine the performance of Algorithm 4.1 for the matrix logarithm and for matrix powers $A^t$ with $t \in \mathbb{R}$, using the Fréchet derivative evaluation algorithms from [4] and [17], respectively. Throughout this section $u = 2^{-53}$ denotes the unit roundoff. The Fréchet derivative algorithms in question have been shown to perform in a forward stable manner in [4] and [17] (assessed therein using the Kronecker condition number estimator, which we will show tends to underestimate the true condition number), so we expect their relative errors to be bounded by the condition number times the unit roundoff.
We will compare Algorithm 4.1, denoted in this section by condest FD, with three other methods in terms of the accuracy and reliability of using the estimated value of $\mathrm{cond}_{\mathrm{rel}}(L_f, A, E)u$ as a bound on the relative error
$$\frac{\|L_f(A,E) - \widehat{L}_f(A,E)\|_1}{\|L_f(A,E)\|_1},$$
where $\widehat{L}_f(A,E)$ is the computed Fréchet derivative. Unfortunately, we cannot directly assess the quality of our condition number estimates, as we have no way to compute the exact condition number $\mathrm{cond}_{\mathrm{rel}}(L_f, A, E)$.
For our tests we need to choose the matrices $A$ and $E$ at which to evaluate the Fréchet derivative and its condition number. For $A$ we use the same test matrices as in [4] and [17]. These (mostly $10\times 10$) matrices are from the Matrix Computation Toolbox [11], the MATLAB gallery function, and the literature. Ideally we would choose $E$ to be a direction that maximizes the relative error above; however, it is unclear how to do so without resorting to expensive optimization procedures. Instead we choose $E$ to be a matrix with normal (0,1) distributed elements, but we give a specific example of a worst case direction for the matrix logarithm in section 5.3.
To compute an accurate value of $L_f(A,E)$, used solely to calculate the relative errors mentioned above, we evaluate (1.8) in 250 digit precision: we diagonalize $X = \left[\begin{smallmatrix} A & E \\ 0 & A \end{smallmatrix}\right]$, apply $f$ to the diagonal matrix of eigenvalues, and return the (1,2) block of the result. If $X$ is not diagonalizable we add a random perturbation of norm $10^{-125}$ to make the eigenvalues distinct. This idea was introduced by Davis [5] and has been used in [4] and [17]. These high precision calculations are performed in the Symbolic Math Toolbox.
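Purely to illustrate the structure of this procedure, here is a double precision Python version for $f = \exp$; the paper performs the same steps in 250 digit arithmetic with a perturbation of norm $10^{-125}$, whereas the perturbation size $10^{-8}$ used below is our choice, suited to double precision. The function name is ours.

```python
import numpy as np
from scipy.linalg import expm_frechet

def frechet_by_diagonalization(fscalar, A, E, delta=1e-8, seed=0):
    """L_f(A, E) via diagonalization of [[A, E], [0, A]], cf. (1.8).
    A random perturbation of norm delta splits the (doubled) eigenvalues so
    that the block matrix becomes diagonalizable (Davis's idea [5])."""
    n = A.shape[0]
    Z = np.zeros((n, n))
    X = np.block([[A, E], [Z, A]])
    rng = np.random.default_rng(seed)
    P = rng.standard_normal(X.shape)
    Xp = X + delta * P / np.linalg.norm(P, 2)
    w, V = np.linalg.eig(Xp)                       # Xp = V diag(w) V^{-1}
    F = V @ np.diag(fscalar(w)) @ np.linalg.inv(V)  # f(Xp)
    return F[:n, n:].real                           # (1,2) block ~ L_f(A, E)
```

In double precision the accuracy of this approach is limited by the conditioning of the eigenvector matrix of the perturbed block matrix, which is why the paper's reference values are computed in very high precision instead.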
We compare our algorithm against three alternative approximations. The first is
$$\gamma_{\mathrm{rand}} = \frac{\|L_f(A+\Delta A,\, E+\Delta E) - L_f(A,E)\|_1}{\varepsilon\,\|L_f(A,E)\|_1},$$
where $\Delta A$ and $\Delta E$ are chosen to have normal (0,1) distributed elements and are then scaled so that $\|\Delta A\|_1 = \varepsilon\|A\|_1$ and $\|\Delta E\|_1 = \varepsilon\|E\|_1$, with $\varepsilon = 10^{-8}$ (cf. (2.2)). We would expect this method generally to underestimate the condition number, since $\Delta A$ and $\Delta E$ are unlikely to point in the directions of greatest sensitivity. This estimate will be referred to as the random method throughout this section. Since this method requires only two Fréchet derivative evaluations (as opposed to around 17 for Algorithm 4.1), one possible extension would be to run it $k$ times and take the mean as an estimate of the condition number. Further experiments, not reported here, took $k = 5$, 10, and 20 without seeing any significant change in the results.

Our next alternative approximation is
$$\gamma_{\mathrm{kron}} = \frac{\|K_f(A+\Delta A) - K_f(A)\|_1}{\varepsilon\,\|K_f(A)\|_1},$$
where $K_f(A)$ is the Kronecker form of the Fréchet derivative in (1.5), and $\Delta A$ is generated with normal (0,1) distributed elements and then scaled so that $\|\Delta A\|_1/\|A\|_1 = \varepsilon = 10^{-8}$. This heuristic approximation has been used in [2], [4], but has two drawbacks. First, the dependence on $E$ is ignored, which (see Lemmas 2.2 and 2.3) essentially corresponds to neglecting an additive $\mathrm{cond}_{\mathrm{abs}}(f,A)$ term and so can lead to underestimation of the condition number. Second, the random direction $\Delta A$ will generally not point in the direction in which $K_f(A)$ is most sensitive, again leading to underestimation. We refer to this as the Kronecker method in our experiments. This method costs $O(n^5)$ flops and is therefore the most expensive. We might also try running it $k$ times and taking the mean of the results, in an attempt to better estimate the condition number. Further experiments averaging $k = 5$, 10, and 20 runs of this method made no significant difference to the results.
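A minimal sketch of the random method, shown here for $f = \exp$ (the function name is ours):

```python
import numpy as np
from scipy.linalg import expm_frechet

def cond_frechet_random(A, E, eps=1e-8, seed=0):
    """The 'random' heuristic: a single relative difference quotient in a
    random direction, cf. (2.2). It tends to underestimate cond_rel(L_f, A, E)
    because a random direction rarely aligns with the most sensitive one."""
    rng = np.random.default_rng(seed)
    dA = rng.standard_normal(A.shape)
    dE = rng.standard_normal(E.shape)
    dA *= eps * np.linalg.norm(A, 1) / np.linalg.norm(dA, 1)
    dE *= eps * np.linalg.norm(E, 1) / np.linalg.norm(dE, 1)
    L0 = expm_frechet(A, E)[1]
    L1 = expm_frechet(A + dA, E + dE)[1]
    return np.linalg.norm(L1 - L0, 1) / (eps * np.linalg.norm(L0, 1))
```

Only two Fréchet derivative evaluations are needed, which is precisely why the method is cheap but unreliable.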
The final approximation method for comparison is a modification of Algorithm 4.1 that estimates the second Fréchet derivative by the finite difference approximation
$$L^{(2)}_f(A, E, V) \approx \frac{L_f(A + tV, E) - L_f(A, E)}{t},$$
implemented by applying the norm estimator to the function $g(A) = L_f(A,E)$ with the option to use finite differences selected, with the default value $t = 10^{-8}$. We will refer to this method as fin diff in our experiments. It has essentially the same cost as Algorithm 4.1, the only difference being the computation of the second Fréchet derivatives.
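The finite difference approximation that fin diff substitutes for (1.9) can be sketched as follows for $f = \exp$, with the exact $4n \times 4n$ formula retained for comparison (names ours):

```python
import numpy as np
from scipy.linalg import expm, expm_frechet

def second_frechet_expm(A, E1, E2):
    # reference: L^(2)_exp via the 4n x 4n block formula (1.9)
    n = A.shape[0]
    Z = np.zeros((n, n))
    X = np.block([[A, E1, E2, Z],
                  [Z, A,  Z, E2],
                  [Z, Z,  A, E1],
                  [Z, Z,  Z, A]])
    return expm(X)[:n, 3*n:]

def second_frechet_fd(A, E, V, t=1e-8):
    """Finite difference approximation used by fin diff:
    L^(2)_f(A, E, V) ~ (L_f(A + tV, E) - L_f(A, E)) / t."""
    return (expm_frechet(A + t * V, E)[1] - expm_frechet(A, E)[1]) / t
```

The drawback is that the perturbed argument $A + tV$ must lie in the domain of $f$ and its Fréchet derivative, which is exactly what fails for the matrix logarithm in section 5.1.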

Condition number of Fréchet derivative of matrix logarithm.
In our first experiment we compute the Fréchet derivative of the logarithm of 66 test matrices using the algorithm of Al-Mohy, Higham, and Relton [4]. Figure 1 shows the normwise relative errors and the estimates of $\mathrm{cond}_{\mathrm{rel}}(L_{\log}, A, E)u$.
We see that fin diff and condest FD give similar output in most cases, as do Kronecker and random, though neither of the latter two seems able to yield values higher than $10^{-8}$ (the size of the finite difference step used). All four methods agree on which problems are well conditioned. On the right-hand side of the figure we see that some relative errors lie slightly above the estimates. However, all are within a factor 2.7 of the estimate from condest FD, which is much less than the factor $6n$ we can expect in the worst case, as explained at the end of section 4.
For the ill conditioned problems both Kronecker and fin diff fail to return condition number estimates for some of the test matrices, as indicated by the broken lines at the left end of Figure 1. This is because a perturbed matrix $A + tV$ arising in the finite difference evaluation of the Fréchet derivatives has eigenvalues on the closed negative real axis, for which the principal matrix logarithm and its Fréchet derivative are not defined, so an error is raised. In principle the same problem could occur with the random method. Since condest FD computes bounds on the second Fréchet derivative without perturbing $A$, it does not encounter this problem. In section 5.3 we analyze the second test matrix in more detail and find that, despite the error bound being pessimistic, the condition number truly is as large as estimated by fin diff and condest FD.

Condition number of Fréchet derivative of matrix power.
Our second experiment compares the methods on the function $A^t$ with $t = 1/15$ over 60 test matrices from the previous set, with the Fréchet derivative computed using the algorithm of Higham and Lin [17]. Figure 2 shows the normwise relative errors and the estimates of $\mathrm{cond}_{\mathrm{rel}}(L_{x^t}, A, E)u$, sorted by decreasing condest FD estimate. Again we see that the condition number estimates from Kronecker and random are bounded above by about $10^{-8}$, though the actual relative errors are sometimes much higher.
The methods return similar condition number estimates for the well conditioned problems but give very different results on the ill conditioned problems in the first 10 test cases. In particular fin diff, Kronecker, and random do not provide reliable error bounds for the badly conditioned cases, their bounds being several orders of magnitude lower than the observed relative errors for test matrices 6 and 9. There is also some significant overestimation by fin diff on test matrix 8. In contrast, condest FD provides reliable error bounds for all the ill conditioned problems.
Similar experiments with the matrix exponential, not reported here, show analogous results: both condest FD and fin diff give good bounds on the relative errors, while Kronecker and random generally underestimate them. The only difference is that fin diff also gives good bounds for the ill conditioned problems, instead of failing or giving spurious results as above.

An ill conditioned problem.
We now return to the second test matrix from the experiment of section 5.1. This example is particularly interesting because the condition number estimated by Algorithm 4.1 is large, $\mathrm{cond}_{\mathrm{rel}}(L_{\log}, A, E) \approx 1.5\times 10^{20}$, yet we observed a relative error of only around $10^{-10}$ when computing the Fréchet derivative in our experiments. We will show that a tiny perturbation to $A$ that greatly changes $L_{\log}(A,E)$ nevertheless exists.

What we need to do is find a matrix $V$ with $\|V\|_1 = 1$ such that $\|L^{(2)}_{\log}(A, E, V)\|_1$ is large, since by Theorem 2.4 this will imply that $\mathrm{cond}_{\mathrm{rel}}(L_{\log}, A, E)$ is large. Such a $V$ can be obtained as output from the 1-norm estimator. However, we will obtain it from first principles by applying direct search optimization [13], using the code mdsmax from [11], which implements the algorithm from [28], [29]. Direct search yields a $V$ for which $\|L^{(2)}_{\log}(A, E, V)\|_1 = 1.4\times 10^{44}$. Calculating the Fréchet derivatives $L_{\log}(A, E)$ and $L_{\log}(A + uV, E)$ in 250 digit arithmetic, using the procedure outlined at the beginning of this section, leads to a relative difference of order 1, showing that the Fréchet derivative evaluation is extremely sensitive to perturbations in the direction $V$. We were fortunate not to experience this sensitivity during the evaluation of $L_{\log}(A,E)$. This computation confirms that, as condest FD suggests, a relative perturbation of order $u$ to $A$ can produce a change of order 1 in the Fréchet derivative. But, as we saw in the experiments, this ill conditioning is not identified consistently by the approximations fin diff, Kronecker, and random.

Conclusion.
We have defined, for the first time, the condition number of the Fréchet derivative of a matrix function and derived an algorithm for estimating it (Algorithm 4.1) that applies to a wide class of functions, including the exponential, the logarithm, and real matrix powers. In practice, the algorithm produces estimates within a factor $6n$ of the true 1-norm condition number at a cost of $O(n^3)$ flops, given $O(n^3)$ flops algorithms for computing the function and its Fréchet derivative. The norms being estimated involve $n^4 \times n^2$ matrices, so a great deal of structure is being exploited to achieve this cost. An interesting open question is whether the highly structured nature of the second Fréchet derivative and its Kronecker form can be exploited to gain further theoretical insight into the conditioning of the Fréchet derivative.
The new algorithm is particularly useful for testing the forward stability of algorithms for computing Fréchet derivatives, and for this purpose our experiments show it to be much more reliable than a heuristic estimate used previously.
Appendix.
We will begin by showing that $\mathrm{vec}(K^{(2)}_f(A)^*) = \mathrm{vec}(K^{(2)}_f(A^*)\, C_n)$, which (after some manipulation) reduces the problem of proving (A.1) to an elementwise comparison of $K^{(2)}_f(A)$ and $K^{(2)}_f(A^*)$.
Before proceeding we recall that $C_n$ is a permutation matrix corresponding to some permutation $\sigma$ of the integers from 1 to $n^2$. Writing $\mathrm{vec}(E_i) = e_i$ for the $i$th standard basis vector, this permutation can be defined by the property that
$$E_{\sigma(i)} = E_i^T,$$
which follows from the observation that $C_n \mathrm{vec}(E_i) = C_n e_i = e_{\sigma(i)} = \mathrm{vec}(E_{\sigma(i)})$ along with $C_n \mathrm{vec}(E_i) = \mathrm{vec}(E_i^T)$.
To do so, we will expand the rows and columns and show that they are equal elementwise. Since the second Fréchet derivative is a Gâteaux derivative [19],
$$L^{(2)}_f(A, E_k, E_j) = \lim_{t\to 0} \frac{L_f(A + tE_k, E_j) - L_f(A, E_j)}{t}.$$
Suppressing the dependence on $t$, we need to prove that
$$e_j^T\, \mathrm{vec}(L_f(A, E_i)) = e_{\sigma(i)}^T\, \mathrm{vec}(L_f(A, E_{\sigma(j)})),$$
since these are the $(i,j)$ elements of the left- and right-hand sides of (A.3), respectively (with the complex conjugation removed from both sides). Beginning from the right-hand side and applying the properties of $\sigma$ established above, the two sides are seen to agree. To complete the result we need to prove (A.1). To make the notation slightly easier we write $X = A^*$ from now on. By [23, Thm. 3.1(i)] we can express $C_{n^2}$ in terms of the standard basis vectors $e_k \in \mathbb{C}^{n^2}$, and so the left-hand side of (A.1) becomes $(C_{n^2} \otimes I_{n^2})\, \mathrm{vec}(K_f(X, E_1) \dots$