Forward-over-reverse mode with reverse mode source differentiation and Taylor objects

ADiMat provides a source code transformation for reverse mode differentiation, which can evaluate gradients and vector-Jacobian products efficiently. Combined with the class tseries, which implements truncated Taylor series by operator overloading, this becomes a tool for computing derivatives of truncated Taylor series. A very common special case is obtained by setting the truncation order of the Taylor objects to one. In scalar mode, this produces a Hessian-vector evaluator: the Taylor objects compute a Jacobian-vector product, and the reverse mode then gives the derivative of that w.r.t. all independent variables, which is the corresponding Hessian-vector product.

Differentiating in reverse mode

Differentiate the function in question in reverse mode. Let's consider the function fprod.m:

type fprod
fun = @fprod
admTransform(@fprod, admOptions('m', 'r'))
function z = fprod(a)
  ind = repmat({':'}, [length(size(a)) 1]);
  dim = 2;
  ind{dim} = 1;
  z = a(ind{:});
  for i=2:size(a, dim)
    ind{dim} = i;
    z = z .* a(ind{:});
  end

fun = 
    @fprod
ans =
     1

This produces a function a_fprod which can be used to compute the gradient of the function. It has two inputs, x and a_z; the latter must be an array of the same shape as z and must be filled with a reverse derivative direction w. The reverse mode code will compute w * J, where J is the Jacobian. Here the function result z is scalar and J is a gradient, so we just set a_z to one. If z were not scalar, we would compute a general vector-Jacobian product w * J.

adimat_derivclass scalar_directderivs
adimat_adjoint default
x = 1:6;
a_z = 1;
[a_x z] = a_fprod(x, a_z)
a_x =
   720   360   240   180   144   120
z =
   720
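The reverse sweep that a_fprod embodies can be sketched in Python/NumPy. This is a hand-written stand-in (the name rev_prod and its structure are hypothetical, not ADiMat's generated code): a forward sweep records the running products, then the adjoint a_z is propagated backwards through the loop.

```python
import numpy as np

def rev_prod(x, a_z):
    """Reverse-mode sketch for z = prod(x), mirroring fprod's loop."""
    zs = np.cumprod(x)               # forward sweep, saving the running products
    a_x = np.empty_like(x)
    a_t = a_z                        # adjoint of the running product z
    for i in range(len(x) - 1, 0, -1):
        a_x[i] = a_t * zs[i - 1]     # dz/dx(i): adjoint times the partner factor
        a_t = a_t * x[i]             # propagate the adjoint through z = z .* x(i)
    a_x[0] = a_t
    return a_x, zs[-1]

a_x, z = rev_prod(np.arange(1.0, 7.0), 1.0)
print(a_x)                           # [720 360 240 180 144 120]
print(z)                             # 720.0
```

The gradient and value match the a_x and z computed by a_fprod above.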

Setting up the Taylor class

The combination with the Taylor class is enabled by selecting the taylor2 adjoint environment with the adimat_adjoint command.

adimat_adjoint taylor2

Creating Taylor Objects for the inputs

The differentiated function has two input parameters, x and a_z. Both now become Taylor objects. We set the truncation order to one.

set(tseries2(), 'maxorder', 1);

tx = tseries2(x);
ta_z = tseries2(a_z);

A Taylor object contains the coefficients of the Taylor series up to the truncation order. The coefficients can be accessed by cell-style indexing with braces {}. The k-th entry holds the coefficients of order k-1. For example, tx{1} holds the zeroth-order coefficient, which is the value, and tx{2} holds the first-order coefficient, which is zero by default.

tx{1}
tx{2}
ans =
     1     2     3     4     5     6
ans =
     0     0     0     0     0     0
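The arithmetic on such objects follows the truncated product rule. A minimal first-order sketch in Python/NumPy (the class T1 is hypothetical, not ADiMat's tseries2; coefficient k-1 lives in c[k-1], like tx{k}):

```python
import numpy as np

class T1:
    """Value c[0] and first-order coefficient c[1], like tx{1} and tx{2}."""
    def __init__(self, c0, c1=None):
        c0 = np.asarray(c0, dtype=float)
        self.c = [c0,
                  np.zeros_like(c0) if c1 is None else np.asarray(c1, dtype=float)]
    def __mul__(self, other):
        # truncated product rule: coefficients of order >= 2 are dropped
        return T1(self.c[0] * other.c[0],
                  self.c[0] * other.c[1] + self.c[1] * other.c[0])

tx = T1([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
print(tx.c[0])                       # the value, [1 2 3 4 5 6]
print(tx.c[1])                       # first-order coefficient, zero by default
```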

Seeding: Initializing coefficients of independent variables

To perform this derivative computation, we have to initialize some of the derivatives, namely those of the input x. We fill these with a derivative direction vector v. Here, let's choose the third canonical unit vector.

v = zeros(numel(x), 1);
v(3) = 1;
tx{2}(:) = v;

ta_z could also be considered a derivative. It holds the adjoint vector w in the zeroth-order (value) coefficient, just as a_z does. The first-order coefficient of ta_z is zero.

Running the function

Now we run the function with the Taylor objects.

[ta_x tz] = a_fprod(tx, ta_z)
ta_x = 
	tseries2 object: 1-by-6
tz = 
	tseries2 object: 1-by-1

Both function results are now also Taylor objects.

Interpreting the results

What have we obtained now? The easy parts first: we know that in the zeroth-order coefficients of the outputs ta_x and tz we will find the same values as in a_x and z. So, tz{1} holds the function result, and ta_x{1} holds the gradient, that is, w * J.

assert(isequal(tz{1}, z))
assert(isequal(ta_x{1}, a_x))

Also, we should find the Jacobian-vector product J * v in the first-order coefficient of tz. This is the result of the plain forward mode through the Taylor objects. Here, this is the derivative of z w.r.t. x(3).

tz{2}
ans =
   240
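For z = prod(x), the partial derivative w.r.t. x(3) is prod(x) / x(3), so the value 240 can be confirmed with a quick Python/NumPy check (an illustration outside ADiMat):

```python
import numpy as np

# Sanity check: for z = prod(x), dz/dx(3) = prod(x) / x(3)
x = np.arange(1.0, 7.0)
print(np.prod(x) / x[2])             # 240.0
```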

This leaves us with ta_x{2}, the first-order coefficients of ta_x. This is the reverse mode derivative of J * v, which is in tz{2}, along direction w. Here, we get the third column (or row) of the Hessian.

ta_x{2}
ans =
   240   120     0    60    48    40

Alternatively, ta_x{2} is the first order derivative of the gradient w * J, which is in ta_x{1}, along direction v. This is the derivative of the gradient w.r.t. x(3).
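Since each x(i) enters the product linearly, the Hessian of f(x) = prod(x) has the closed form H(i,j) = prod(x) / (x(i) * x(j)) off the diagonal and zeros on it, which lets us check the row above in Python/NumPy (an illustration outside ADiMat):

```python
import numpy as np

# Closed-form Hessian of f(x) = prod(x) for nonzero, all-distinct entries:
# H[i,j] = prod(x) / (x[i] * x[j]) for i != j, and 0 on the diagonal,
# because each x[i] appears only linearly in the product.
x = np.arange(1.0, 7.0)
H = np.prod(x) / np.outer(x, x)
np.fill_diagonal(H, 0.0)
print(H[2])                          # third row: [240 120 0 60 48 40]
```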

%                       FM-over-RM AD
%
% f(x) ------------- diff. in FM ---> df(x)/dx * v
%   |                                      |
% diff. in                             diff. in
%   RM                                    RM
%   |                                      |
%   v                                      v
% d w^T*f(x)/dx  --- diff. in FM ---> d w^T * (df(x)/dx * v) /dx
%                                     = d d w^T*f(x)/dx /dx * v
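The FM-over-RM scheme in the diagram can be sketched numerically in Python/NumPy. All names here are hypothetical illustrations: the reverse-mode result w * J (with w = 1) is written analytically for f(x) = prod(x), and a central finite difference stands in for the forward (Taylor) differentiation of the gradient.

```python
import numpy as np

def grad_f(x):
    """Reverse-mode result w * J with w = 1, for f(x) = prod(x)."""
    return np.prod(x) / x

def hvp(x, v, eps=1e-6):
    """Forward directional derivative of the gradient along v: H(x) @ v."""
    return (grad_f(x + eps * v) - grad_f(x - eps * v)) / (2 * eps)

x = np.arange(1.0, 7.0)
v = np.zeros(6)
v[2] = 1.0                           # third canonical unit vector
print(hvp(x, v))                     # ~ [240 120 0 60 48 40]
```

This reproduces (up to finite-difference error) the Hessian-vector product ta_x{2} computed above with the Taylor objects.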

Verifying the results

We verify the Hessian and gradient against the plain ADiMat driver admHessFD, which computes Hessians with finite differences.

adopts = admOptions();
adopts.functionResults = {z};
[H grad] = admHessFD(fun, 1, x, adopts);
assert(isequal(H(:,3), ta_x{2}(:)))
assert(isequal(H(3,:), ta_x{2}))
assert(isequal(grad, ta_x{1}))

When both derivative direction vectors are passed to admHessian, it performs exactly the same computation as we did here manually.

[Hv grad] = admHessian(fun, {a_z(:).', 1, v}, x, adopts);
assert(isequal(Hv, ta_x{2}(:)))