Forward-over-reverse mode with reverse mode source differentiation and Taylor objects
ADiMat provides a source code transformation for reverse mode differentiation, which can evaluate gradients and vector-Jacobian products efficiently. Combined with the class tseries, which implements truncated Taylor series by operator overloading, this becomes a tool for computing derivatives of truncated Taylor series. A very common special case is obtained by setting the truncation order of the Taylor objects to one. In scalar mode, this produces a Hessian-vector evaluator: the Taylor objects compute a Jacobian-vector product, and the reverse mode then differentiates that w.r.t. all independent variables, which yields the corresponding Hessian-vector product.
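The order-one special case can be illustrated independently of ADiMat: a Hessian-vector product is the directional derivative of the gradient. A minimal NumPy sketch, using the closed-form gradient of a product function as a stand-in (all names here are hypothetical, not ADiMat API):

```python
import numpy as np

# For f(x) = prod(x), the gradient is grad_i = prod(x)/x_i (nonzero x).
# A Hessian-vector product H*v is the directional derivative of the
# gradient along v, approximated here by a central finite difference.
def grad_prod(x):
    return np.prod(x) / x

x = np.arange(1.0, 7.0)          # [1 2 3 4 5 6]
v = np.zeros(6); v[2] = 1.0      # third canonical unit vector
eps = 1e-6
Hv = (grad_prod(x + eps*v) - grad_prod(x - eps*v)) / (2*eps)
```

For this example Hv approximates the third Hessian column, 240 120 0 60 48 40, which the forward-over-reverse combination below computes exactly.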
Differentiating in reverse mode
Differentiate the function in question in reverse mode. Let's consider the function fprod.m:
type fprod
fun = @fprod
admTransform(@fprod, admOptions('m', 'r'))
function z = fprod(a)
  ind = repmat({':'}, [length(size(a)) 1]);
  dim = 2;
  ind{dim} = 1;
  z = a(ind{:});
  for i=2:size(a, dim)
    ind{dim} = i;
    z = z .* a(ind{:});
  end

fun = @fprod
ans = 1
This produces a function a_fprod which can be used to compute the gradient of the function. It has two inputs, x and a_z; the latter must be an array of the same shape as z, filled with a reverse derivative direction y. The reverse mode code computes y * J, where J is the Jacobian. Here the function result z is scalar, so J is a gradient and we simply set a_z to one. If z were not scalar, y * J would be a general vector-Jacobian product.
adimat_derivclass scalar_directderivs
adimat_adjoint default
x = 1:6;
a_z = 1;
[a_x z] = a_fprod(x, a_z)
a_x =
   720   360   240   180   144   120
z =
   720
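For this product function the reverse-mode result can be cross-checked by hand, since each gradient entry is the product of all the other entries. A NumPy sketch (illustrative, not ADiMat):

```python
import numpy as np

# Gradient of z = prod(x) seeded with adjoint a_z: a_x_i = a_z * prod(x)/x_i
# (valid for nonzero entries of x).
x = np.arange(1.0, 7.0)
a_z = 1.0                        # reverse derivative direction y
a_x = a_z * np.prod(x) / x      # entries: 720 360 240 180 144 120
```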
Setting up the Taylor class
The Taylor variant of the adjoint runtime environment is enabled by the adimat_adjoint command.
adimat_adjoint taylor2
Creating Taylor Objects for the inputs
The differentiated function has two input parameters, x and a_z. Both now become Taylor objects. We set the truncation order to one.
set(tseries2(), 'maxorder', 1);
tx = tseries2(x);
ta_z = tseries2(a_z);
A Taylor object contains the coefficients of the Taylor series up to the truncation order. The coefficients can be accessed by cell-style indexing with braces {}. The k-th entry holds the coefficients of order k-1. For example, tx{1} holds the zeroth order coefficient, which is the value, and tx{2} holds the first derivative, which is zero by default.
tx{1}
tx{2}
ans =
     1     2     3     4     5     6
ans =
     0     0     0     0     0     0
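The arithmetic behind such objects at truncation order one is that of dual numbers. A minimal Python sketch of a first-order Taylor class (a hypothetical stand-in for tseries2, with only multiplication overloaded):

```python
class Taylor1:
    """First-order truncated Taylor object: c[0] is the value, c[1] the derivative."""
    def __init__(self, c0, c1=0.0):
        self.c = [c0, c1]
    def __mul__(self, other):
        # (a0 + a1*t)(b0 + b1*t) = a0*b0 + (a0*b1 + a1*b0)*t, t^2 truncated
        return Taylor1(self.c[0]*other.c[0],
                       self.c[0]*other.c[1] + self.c[1]*other.c[0])

p = Taylor1(3.0, 1.0) * Taylor1(4.0, 0.0)
print(p.c)    # [12.0, 4.0]
```

The first coefficient is the ordinary product, the second propagates the derivative direction, just as the tseries2 objects do.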
Seeding: Initializing coefficients of independent variables
To perform this derivative computation, we must initialize some of the derivatives, namely those of the input x. We fill them with a derivative direction vector v; here we choose the third canonical unit vector.
v = zeros(numel(x), 1);
v(3) = 1;
tx{2}(:) = v;
ta_z can also be considered a derivative. It holds the adjoint vector y in its zero-order (value) coefficient, just as a_z does. The first order coefficient of ta_z is zero.
Running the function
Now we run the function with the Taylor objects.
[ta_x tz] = a_fprod(tx, ta_z)
ta_x = tseries2 object: 1-by-6
tz = tseries2 object: 1-by-1
Both function results are now also Taylor objects.
Interpreting the results
What have we obtained now? First, the easy parts: in the zero-order coefficients of the outputs ta_x and tz we find the same values as in a_x and z. That is, tz{1} holds the function result and ta_x{1} holds the gradient, i.e. y * J.
assert(isequal(tz{1}, z))
assert(isequal(ta_x{1}, a_x))
Also, we should find the Jacobian-vector product J * v in the first order coefficient of tz. This is the result of the plain forward mode through the Taylor objects; here it is the derivative of z w.r.t. x(3).
tz{2}
ans = 240
This leaves us with ta_x{2}, the first order coefficients of ta_x. They hold the reverse mode derivative of J * v, which is in tz{2}, along direction y. Here, we get the third column (or, by symmetry, row) of the Hessian.
ta_x{2}
ans = 240 120 0 60 48 40
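This column can be verified analytically: for z = prod(x) the mixed second derivatives are H(i,j) = prod(x)/(x(i)*x(j)) for i ≠ j, and the diagonal entries vanish because z is linear in each single variable. A NumPy check (illustrative, not ADiMat):

```python
import numpy as np

# Third Hessian column of z = prod(x): prod(x)/(x_i * x_3) off the
# diagonal, zero at i = 3.
x = np.arange(1.0, 7.0)
H3 = np.prod(x) / (x * x[2])
H3[2] = 0.0                      # entries: 240 120 0 60 48 40
```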
Alternatively, ta_x{2} is the first order derivative of the gradient y * J, which is in ta_x{1}, along direction v. This is the derivative of the gradient w.r.t. x(3).
% FM-over-RM AD
%
%  f(x) --------------- diff. in FM --->  df(x)/dx * v
%   |                                       |
%  diff. in RM                            diff. in RM
%   |                                       |
%   v                                       v
%  d y^T*f(x)/dx ----- diff. in FM --->  d y^T*(df(x)/dx * v) /dx
%                                        = d d y^T*f(x)/dx /dx * v
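The scheme above can be mimicked end to end in plain Python: push dual numbers (first-order Taylor objects) through an adjoint of the product function. Everything here is a hypothetical stand-in for the ADiMat machinery, and the closed-form gradient prod(x)/x_i replaces the generated reverse-mode code:

```python
class Dual:
    """Value plus first derivative along a fixed direction v."""
    def __init__(self, v, d=0.0):
        self.v, self.d = v, d
    def __mul__(self, o):
        return Dual(self.v*o.v, self.v*o.d + self.d*o.v)
    def __truediv__(self, o):
        return Dual(self.v/o.v, (self.d*o.v - self.v*o.d) / (o.v*o.v))

def a_prod(xs, a_z):
    """Adjoint of z = prod(xs) seeded with a_z (closed form, nonzero xs)."""
    p = xs[0]
    for xi in xs[1:]:
        p = p * xi
    return [a_z * p / xi for xi in xs]

# Seed: values 1..6, first-order coefficient 1 at the third entry (v = e_3).
xs = [Dual(float(k), 1.0 if k == 3 else 0.0) for k in range(1, 7)]
a_x = a_prod(xs, Dual(1.0))
values = [g.v for g in a_x]   # zero order: the gradient y * J
firsts = [g.d for g in a_x]   # first order: the Hessian-vector product H * v
```

The zero-order parts reproduce the gradient 720 360 240 180 144 120, and the first-order parts the third Hessian column 240 120 0 60 48 40, matching ta_x{1} and ta_x{2}.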
Verifying the results
We compute the Hessian and gradient with the plain ADiMat driver for Hessians with finite differences.
adopts = admOptions();
adopts.functionResults = {z};
[H grad] = admHessFD(fun, 1, x, adopts);
assert(isequal(H(:,3), ta_x{2}(:)))
assert(isequal(H(3,:), ta_x{2}))
assert(isequal(grad, ta_x{1}))
When passing both derivative direction vectors into admHessian it will perform exactly the same computation as we did here manually.
[Hv grad] = admHessian(fun, {a_z(:).', 1, v}, x, adopts);
assert(isequal(Hv, ta_x{2}(:)))