Forward-over-reverse mode with reverse mode source differentiation and Taylor objects
ADiMat provides a source code transformation for reverse mode differentiation, which can evaluate gradients and vector-Jacobian products efficiently. Combined with the class tseries, which implements truncated Taylor series by operator overloading, this becomes a tool for computing derivatives of truncated Taylor series. A very common special case is obtained by setting the truncation order of the Taylor objects to one. In scalar mode, this produces a Hessian-vector evaluator: the Taylor objects compute a Jacobian-vector product, and the reverse mode then differentiates that w.r.t. all independent variables, which yields the corresponding Hessian-vector product.
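The order-one special case can be illustrated independently of ADiMat: a Hessian-vector product is the directional derivative of the gradient. A minimal NumPy sketch, using the closed-form gradient of a product function as a stand-in (all names here are hypothetical, not ADiMat API):

```python
import numpy as np

# For f(x) = prod(x), the gradient is grad_i = prod(x)/x_i (nonzero x).
# A Hessian-vector product H*v is the directional derivative of the
# gradient along v, approximated here by a central finite difference.
def grad_prod(x):
    return np.prod(x) / x

x = np.arange(1.0, 7.0)          # [1 2 3 4 5 6]
v = np.zeros(6); v[2] = 1.0      # third canonical unit vector
eps = 1e-6
Hv = (grad_prod(x + eps*v) - grad_prod(x - eps*v)) / (2*eps)
```

For this example Hv approximates the third Hessian column, 240 120 0 60 48 40, which the forward-over-reverse combination below computes exactly.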
Differentiating in reverse mode
Differentiate the function in question in reverse mode. Let's consider the function fprod.m:
type fprod
fun = @fprod
admTransform(@fprod, admOptions('m', 'r'))
function z = fprod(a)
  ind = repmat({':'}, [length(size(a)) 1]);
  dim = 2;
  ind{dim} = 1;
  z = a(ind{:});
  for i=2:size(a, dim)
    ind{dim} = i;
    z = z .* a(ind{:});
  end

fun = @fprod
ans = 1
This produces a function a_fprod which can be used to compute the gradient of the function. It has two inputs, x and a_z; the latter must be an array of the same shape as z, filled with a reverse derivative direction y. The reverse mode code computes y * J, where J is the Jacobian. Here the function result z is scalar, so J is a gradient and we simply set a_z to one. If z were not scalar, y * J would be a general vector-Jacobian product.
adimat_derivclass scalar_directderivs
adimat_adjoint default
x = 1:6;
a_z = 1;
[a_x z] = a_fprod(x, a_z)
a_x =
   720   360   240   180   144   120
z =
   720
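For this product function the reverse-mode result can be cross-checked by hand, since each gradient entry is the product of all the other entries. A NumPy sketch (illustrative, not ADiMat):

```python
import numpy as np

# Gradient of z = prod(x) seeded with adjoint a_z: a_x_i = a_z * prod(x)/x_i
# (valid for nonzero entries of x).
x = np.arange(1.0, 7.0)
a_z = 1.0                        # reverse derivative direction y
a_x = a_z * np.prod(x) / x      # entries: 720 360 240 180 144 120
```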
Setting up the Taylor class
The Taylor variant of the adjoint runtime environment is enabled by the adimat_adjoint command.
adimat_adjoint taylor2
Creating Taylor Objects for the inputs
The differentiated function has two input parameters, x and a_z. Both now become Taylor objects. We set the truncation order to one.
set(tseries2(), 'maxorder', 1);
tx = tseries2(x);
ta_z = tseries2(a_z);
A Taylor object contains the coefficients of the Taylor series up to the truncation order. The coefficients can be accessed by cell-style indexing with braces {}. The k-th entry holds the coefficients of order k-1. For example, tx{1} holds the zeroth order coefficient, which is the value, and tx{2} holds the first derivative, which is zero by default.
tx{1}
tx{2}
ans =
     1     2     3     4     5     6
ans =
     0     0     0     0     0     0
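The arithmetic behind such objects at truncation order one is that of dual numbers. A minimal Python sketch of a first-order Taylor class (a hypothetical stand-in for tseries2, with only multiplication overloaded):

```python
class Taylor1:
    """First-order truncated Taylor object: c[0] is the value, c[1] the derivative."""
    def __init__(self, c0, c1=0.0):
        self.c = [c0, c1]
    def __mul__(self, other):
        # (a0 + a1*t)(b0 + b1*t) = a0*b0 + (a0*b1 + a1*b0)*t, t^2 truncated
        return Taylor1(self.c[0]*other.c[0],
                       self.c[0]*other.c[1] + self.c[1]*other.c[0])

p = Taylor1(3.0, 1.0) * Taylor1(4.0, 0.0)
print(p.c)    # [12.0, 4.0]
```

The first coefficient is the ordinary product, the second propagates the derivative direction, just as the tseries2 objects do.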
Seeding: Initializing coefficients of independent variables
To perform this derivative computation, we must initialize some of the derivatives, namely those of the input x. We fill them with a derivative direction vector v; here we choose the third canonical unit vector.
v = zeros(numel(x), 1);
v(3) = 1;
tx{2}(:) = v;
ta_z can also be considered a derivative. It holds the adjoint vector y in its zero-order (value) coefficient, just as a_z does. The first order coefficient of ta_z is zero.
Running the function
Now we run the function with the Taylor objects.
[ta_x tz] = a_fprod(tx, ta_z)
ta_x = tseries2 object: 1-by-6
tz = tseries2 object: 1-by-1
Both function results are now also Taylor objects.
Interpreting the results
What have we obtained now? First, the easy parts: in the zero-order coefficients of the outputs ta_x and tz we find the same values as in a_x and z. That is, tz{1} holds the function result and ta_x{1} holds the gradient, i.e. y * J.
assert(isequal(tz{1}, z))
assert(isequal(ta_x{1}, a_x))
Also, we should find the Jacobian-vector product J * v in the first order coefficient of tz. This is the result of the plain forward mode through the Taylor objects; here it is the derivative of z w.r.t. x(3).
tz{2}
ans = 240
This leaves us with ta_x{2}, the first order coefficients of ta_x. They hold the reverse mode derivative of J * v, which is in tz{2}, along direction y. Here, we get the third column (or, by symmetry, row) of the Hessian.
ta_x{2}
ans = 240 120 0 60 48 40
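This column can be verified analytically: for z = prod(x) the mixed second derivatives are H(i,j) = prod(x)/(x(i)*x(j)) for i ≠ j, and the diagonal entries vanish because z is linear in each single variable. A NumPy check (illustrative, not ADiMat):

```python
import numpy as np

# Third Hessian column of z = prod(x): prod(x)/(x_i * x_3) off the
# diagonal, zero at i = 3.
x = np.arange(1.0, 7.0)
H3 = np.prod(x) / (x * x[2])
H3[2] = 0.0                      # entries: 240 120 0 60 48 40
```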
Alternatively, ta_x{2} is the first order derivative of the gradient y * J, which is in ta_x{1}, along direction v. This is the derivative of the gradient w.r.t. x(3).
% FM-over-RM AD
%
%  f(x) --------------- diff. in FM --->  df(x)/dx * v
%   |                                       |
%  diff. in RM                            diff. in RM
%   |                                       |
%   v                                       v
%  d y^T*f(x)/dx ----- diff. in FM --->  d y^T*(df(x)/dx * v) /dx
%                                        = d d y^T*f(x)/dx /dx * v
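The scheme above can be mimicked end to end in plain Python: push dual numbers (first-order Taylor objects) through an adjoint of the product function. Everything here is a hypothetical stand-in for the ADiMat machinery, and the closed-form gradient prod(x)/x_i replaces the generated reverse-mode code:

```python
class Dual:
    """Value plus first derivative along a fixed direction v."""
    def __init__(self, v, d=0.0):
        self.v, self.d = v, d
    def __mul__(self, o):
        return Dual(self.v*o.v, self.v*o.d + self.d*o.v)
    def __truediv__(self, o):
        return Dual(self.v/o.v, (self.d*o.v - self.v*o.d) / (o.v*o.v))

def a_prod(xs, a_z):
    """Adjoint of z = prod(xs) seeded with a_z (closed form, nonzero xs)."""
    p = xs[0]
    for xi in xs[1:]:
        p = p * xi
    return [a_z * p / xi for xi in xs]

# Seed: values 1..6, first-order coefficient 1 at the third entry (v = e_3).
xs = [Dual(float(k), 1.0 if k == 3 else 0.0) for k in range(1, 7)]
a_x = a_prod(xs, Dual(1.0))
values = [g.v for g in a_x]   # zero order: the gradient y * J
firsts = [g.d for g in a_x]   # first order: the Hessian-vector product H * v
```

The zero-order parts reproduce the gradient 720 360 240 180 144 120, and the first-order parts the third Hessian column 240 120 0 60 48 40, matching ta_x{1} and ta_x{2}.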
Verifying the results
We compute the Hessian and gradient with the plain ADiMat driver for Hessians with finite differences.
adopts = admOptions();
adopts.functionResults = {z};
[H grad] = admHessFD(fun, 1, x, adopts);
assert(isequal(H(:,3), ta_x{2}(:)))
assert(isequal(H(3,:), ta_x{2}))
assert(isequal(grad, ta_x{1}))
When passing both derivative direction vectors into admHessian it will perform exactly the same computation as we did here manually.
[Hv grad] = admHessian(fun, {a_z(:).', 1, v}, x, adopts);
assert(isequal(Hv, ta_x{2}(:)))