The ADiMat Handbook: Using ADiMat

4. Using ADiMat

It is assumed, that the reader is familiar with the general technique of automatic differentiation (AD). That is, the keywords independent and dependent variable and top-level function are familiar to him. Some recommended literature on the subject is the book Evaluating Derivative or the papers The History of AD or ADiFor user manual.

4.1 General description

ADiMat applies the source transformation approach of automatic differentiation to Matlab programs. A Matlab program consists of one or more m-files defining functions. The file that contains the starting point of the computation (the top level function) is called the master file. The computation of derivatives with ADiMat consists of two steps:

Transformation of the source code to produce differentiated code.
Evaluation of the transformed function to compute the actual derivative numbers.

For most users the best option should be to use the functions admDiffFor, admDiffVFor, and admDiffRev, described in the next section, which perform both steps automatically. However, both steps can be still done individually, which may be necessary when you want to use special features of ADiMat. If you are interested in the details, read the sections Source code transformation and Running the transformed code later in this chapter.

4.2 High level user interface

The Jacobian matrix of the top-level function, evaluated at certain arguments, is computed by the functions admDiffFor, admDiffVFor, and admDiffRev. These functions all have the same interface, and the expect the name of or a handle to the top level function, the seed matrix and the function arguments in that order. Thus you can easily switch between forward and reverse mode and experiment with different seed matrices.

In forward mode (FM) of AD, you can compute the product J*S of the Jacobian matrix J of a function f at arguments arg1, ..., argN and a seed matrix S. In theory, the computational expense (time and memory) depends linearly on the number of columns of S.

The following functions implement the forward mode of AD in ADiMat:

admDiffFor(@f, S, arg1, ..., argN, opts)

Classic FM implementation in ADiMat.

admDiffVFor(@f, S, arg1, ..., argN, opts)

New FM implementation in ADiMat.
In reverse mode (RM) of AD, you can compute the product S*J of a seed matrix S and the Jacobian matrix J of a function f at arguments arg1, ..., argN. In theory, the computational expense in time depends linearly on the number of rows of S. The memory required is in the order of the runtime of f, as all variable values that are overwritten have to be recorded on one of the stacks provided by ADiMat. You can use the directive recompute to trade off memory for runtime.

admDiffRev(@f, S, arg1, ..., argN, opts)

RM implementation in ADiMat.

Options can be given to the admDiff* routines using a structure constructed with the function admOptions. It must be given to them as the last argument. The function admOptions takes a sequence of name and value pairs, as shown in the following example:


opts = admOptions('independents', [1]);

Passing this options struct to admDiff* tells ADiMat to differentiate the function only with respect to the first parameter. Some names can also be abbreviated, e.g. 'i' can be used as short for 'independendents', and 'd' can be used for 'dependendents':


opts = admOptions('i', [1]);

For the full list of abbreviations refer to the online help of admOptions. You can also set items of the option struct by assigning to it:


opts.independents = [1];

Derivatives can also be computed with ADiMat using one of two numerical differentiation methods. Thus you can easily check AD results by comparing them to the results of these two non-AD methods. Both methods descibed below are conceptually comparable to the forward mode of AD, i.e. they return a product J*S of the Jacobian of f and the given seed matrix S.

The finite differences (FD) method, sometimes also called the divided differences method, is almost always applicable, except near discontinuities, but it produces only quite inaccurate derivatives, cf. Wikipedia, Numerical differentiation. You should use the option fdStep to set the step length h to the adequate value for your problem. The central FD method is used by default, you can also specify the forward or backward FD method via the option fdMode.

admDiffFD(@f, S, arg1, ..., argN, opts)

Implementation of the finite differences method in ADiMat.
The complex variable method was first divised by Lyness and Moler, and provides accurate derivatives, but it is only applicable then the function f is real analytic, cf. Wikipedia, Complex variable methods.

admDiffComplex(@f, S, arg1, ..., argN, opts)

Implementation of the complex variable method in ADiMat.

With these two functions, you may also differentiate anonymous or lambda functions, as the function source code is not required. For example:


J = admDiffFD(@(x,y) atan2(x, y), 1, 1, 2)

4.3 Higher order derivatives

ADiMat can, to a limited extend, compute higher order derivatives as well. Some ADiMat interface compute univariate Taylor coefficients, and there is also a driver to compute Hessian matrices. In this section we consider the function polynom:


function r = polynom(x, c)

Taylor-Coefficients

The function admTaylorFor can compute exact Taylor coefficients. By setting option derivOrder, you can specify which Taylor coefficients to compute. Note that all Taylor coefficients up to the maximum derivOrder will be computed, but only those requested will be returned. admTaylorFor uses an operator overloading (OO) approach, so there is no source transformation involved.


Taylor1_10 = admTaylorFor(@polynom, 1, 2, [1 1 1 1 1], admOptions('o', 1:10))

The Taylor objects that admTaylorFor uses can also be used manually, for more information see the example Using Taylor objects. Another version of the function is admTaylorVFor, which uses a similar AD approach as admDiffVFor:


Taylor1_10 = admTaylorVFor(@polynom, 1, 2, [1 1 1 1 1], admOptions('o', 1:10))

admDiffFD and admDiffFor can also compute Taylor coefficients. admDiffFD can compute up to fourth order Taylor coefficients. Note that admDiffFD will only compute the derivative orders requested. admDiffFor can compute second order Taylor coefficients. Note that this is implemented by computing the full Hessian matrix and then returning only the diagonal.

Hessians

Hessians using admHessian

admDiffHessian is the general driver for computing a Hessian matrix. You can use it as follows:


adopts = admOptions;
H = admHessian(@polynom, 1, x, c, adopts)
H = admHessian(@polynom, W, x, c, adopts)
H = admHessian(@polynom, {Y, W}, x, c, adopts)
H = admHessian(@polynom, {'s', W}, x, c, adopts)
H = admHessian(@polynom, {Y, V, W}, x, c, adopts)
H = admHessian(@polynom, {'s', V, W}, x, c, adopts)

It can use one of two basic strategies, which are selected by the option hessianStrategy in the options structure that may be given as the last argument:

't1rev': Forward over reverse mode
't2for': Interpolation of second order Taylor coefficients
'for2': Second order forward mode

The first is the default. It uses the same OO approach as admTaylorFor, but it runs the function differentiated in RM, not the original function. This approach can compute the full Hessian with an overhead of O(n).

The second strategy, 't2for' relies on one of functions that can compute second order Taylor-coefficients. The option admDiffFunction controls which underlying function is used to compute the required univariate Taylor coefficients. Possible options are admTaylorFor, admTaylorVFor, and admDiffFD.

The third strategy, 'for2' invokes the function admDiffFor2 which computes second order derivatives by running a code produced by applying the FM ST twice. This mode can be either used with the 'scalar_directderivs' RE, which results in strip-mining in scalar mode, or you can use one of the derivative classes 'opt_derivclass' or 'opt_sp_derivclass', which will use vector mode.

The second argument specifies the seed matrices. It may either be a single matrix W, or a cell array of two matrices {Y, W}, or a cell array of three matrices {Y, V, W}. As usual, a scalar 1 is a shortcut for a conforming identity matrix. Matrices that are not specified are also treated as if a scalar 1 was given.

If we first consider a scalar function, and let Y = 1, then admHessian will return the matrix product V^T * H * W. That is, V must be a (n x p) matrix and W must be a (n x q) matrix. With strategy t1rev, such a Hessian-matrix product has an overhead of O(q), and with strategy t2for or for2 it has an overhead of O(p*q). Strategy t2for has an upper limit of costs as O(n*(n+1)/2), as these are the costs for computing a full Hessian.

When the function is vector valued, the result will be the product V^T * H * W of the component Hessians H of the m output components arranged in a (p x q x m) tensor array. With strategy t1rev, the overhead is then O(q*m), and with strategy t2for or for2 the overhead is O(p*q).

The second argument may also be a cell array of three matrices {Y, V, W}. In this case, V and W are as above and Y must be a (r x m) matrix, the so called adjoint weight matrix. The result will then be a tensor of dimensions (p x q x r) with the r weighted sums of the component Hessians. In particular, if the adjoint weight matrix is a row vector of length m, then admHessian will compute a weighted sum of the Hessians. With strategy t1rev, the overhead is then O(q*r), and with strategy t2for or for2 the overhead is O(p*q).

A special value for Y is the string 's', or 'sum', which expands to a row vector of ones, that is, admHessian will then compute the sum of Hessians. An alternative way to set a adjoint weight matrix Y is to use the option field seedRev.

Setting a adjoint weight matrix is particularly interesting when the sum of component Hessians weighed by Lagrangian multipliers is desired, such as it is required by the fmincon solver. In this case, it is not necessary to construct a function for multiplying the contraints by the Lagrangian multiplier. Just set the adjoint weight matrix to the Lagrangian weights and differentiate the original constraints function. This has the advantage that you obtain both the Jacobian of the original function and the Hessian of the Lagrangian in one go and you can exploint the efficiency of the reverse mode.

Shortcut Hessian drivers

While admHessian defaults to the FM-over-RM mode, you have to set some options to get the other modes described in the previous section. Therefor we provide some convenience drivers, for the case of hessianStrategy set to t2for, as follows:

admHessFor: admDiffFunction set to admTaylorFor
admHessVFor: admDiffFunction set to admTaylorVFor
admHessFD: admDiffFunction set to admDiffFD

For hessianStrategy set to for2, you can also invoke admDiffFor2 directly.

Finally admHessRev is the driver for Hessians in forward-oer-reverse mode, which is the default strategy of admHessian.

Hessians by recursive application of admDiff

You can use the admDiff* functions recursively. Recall the first order invocation, evaluating first order derivatives of the polynom with coefficients c = [1 1 1 1 1] at x = 2:


Jacobian = admDiffFD(@polynom, 1, 2, [1 1 1 1 1])

You have to tell the outer invocation to differentiate the inner call only with the respect the function arguments. Thus, the recursive application for second order derivatives, using admDiffComplex as the outer differentiation function would be as follows:


Hessian = admDiffComplex(@admDiffFD, 1, @polynom, 1, 2, [1 1 1 1 1], admOptions('i', [3 4]))

If you also want to have the Jacobian and the function results, tell the outer admDiff* function how many outputs to expect from the inner, but also to only consider the first for differentiation:


[Hessian, Jacobian, r] = admDiffComplex(@admDiffFD, 1, @polynom, 1, 2, [1 1 1 1 1],
                               admOptions('i', [3 4], 'nargout', 3, 'd', 1))

The following combinations of admDiff* functions are possible:

FD over FD, Complex, For, VFor, or Rev
Complex over FD, For, VFor, or Rev

You cannot use one of the AD-functions on top of each other, as ADiMat can usually not differentiate the code it produces. Complex over Complex does not work as the function to differentiate has to be real analytic.

When using FD over FD, you usually have to increase the step length, to approximately h = sqrt(sqrt(eps)), when the function arguments are about one:


admDiffFD(@admDiffFD, 1, @polynom, 1, 2, [1 1 1 1 1], admOptions('i', [3 4], 'fdStep', sqrt(sqrt(eps))))

You can compute products of the Hessian times a matrix or vector. You can pass the latter to either invocation:


Hv = admDiffFD(@admDiffFD, v, @polynom, 1, 2, [1 1 1 1 1], admOptions('i', [3 4], 'fdStep', 0.001))
Hv = admDiffFD(@admDiffFD, 1, @polynom, v, 2, [1 1 1 1 1], admOptions('i', [3 4], 'fdStep', 0.001))

If you use AD as the inner differentiation function, use the first option, as then the AD process will be run just once.

Taylor-Coefficients in Reverse Mode

The function admTaylorRev can propagate Taylor coefficients across the function differentiated in RM. This is the same as computing a Hessian with hessianStrategy set to t1rev, only that option derivOrder is set to something larger than 1. admTaylorRev uses an operator overloading (OO) approach for the Taylor coefficients, just like admTaylorFor.


dTaylor1_10 = admTaylorRev(@polynom, 1, 2, [1 1 1 1 1], admOptions('o', 1:10))

The same kind of derivatives can be obtained with ADOL-C or CppAD by performing FM passes up to order M = max(derivOrder) followed by a single RM pass over the tape.

The second argument can be a single seed matrix S, or a cell array of the two seed matrices Y and S. S represents the first order derivatives of the inputs and Y the first order adjoints of the outputs. When S is an (n x q) matrix and Y a (p x m) matrix, then the ouput will be a (n x q x p x M) tensor. As usual, a scalar one stands for a conforming identity matrix. In particular, setting the second argument to 1 or to {1, 1}, the result will be a (n x q x m x M) tensor.

With derivOrder = 1 admTaylorRev returns the same results as admHessian. In particular, setting derivOrder = 1, and Y = ones(1, m) will yield the sum of the Hessian matrices of the outputs.

4.4 Source code transformation

The source transformation step is usually performed implicitly by one of the high level interface functions. They will check whether the required differentiated function exists and is up-to-date. Otherwise they use the function admTransform to produce the differentiated code.

The following sections explains how to perform this step manually.

Source transformation from within Matlab

From within Matlab, differentiated code can be produced by the function admTransform. For forward mode transformation is done using the command:


admTransform(@lighthouse)

Otherwise, you have to specify the desired transformation mode using the options structure. For the alternative forward mode transformation use:


admTransform(@lighthouse, admOptions('m', 'f'))

For the reverse mode transformation use:


admTransform(@lighthouse, admOptions('m', 'r'))

Source transformation from the command line

The transformation can also be done from the command line, using the command adimat-client (or admproc or admproc-bin, if present). This requires that the environment variable ADIMAT_HOME is set, see the installation section. The filename of the master file is given as the command line argument:


adimat-client lighthouse.m

This produces code differentiated in forward mode, which is written to the file g_lighthouse.m. The adimat command currently produces many errors and warning messages which can for the most part be ignored. If the result file is generated and the command exits with success everything should be fine.

An alternative forward mode transformation can be done using the -f switch, which produces the file d_lighthouse.m..


adimat-client -f lighthouse.m

Reverse mode differentiation of code can be done using the -r switch, which produces the file a_lighthouse.m.


adimat-client -r lighthouse.m

4.5 Running the transformed code

As a second step, the differentiated code must be run to compute the derivative for some input arguments. This should also preferably be done using the high level interface functions. In the following section describe the steps to perform this process manually. You can also read this section to gain a better understanding into the inner workings of the high level interface functions.

In this section we consider the function f:


function [y z] = f(a, b)

Selecting a runtime environment

In order to run differentiated code, you must first select a suitable runtime environment. This is done using the function adimat_derivclass, for example like this:


adimat_derivclass('opt_derivclass')

The following combinations of differentiated functions and runtime environments are possible:

Functions differentiated in forward mode (those with prefix g_): arrderivclass, arrderivclassvxdd, opt_derivclass, opt_sp_derivclass, mat_derivclass, or scalar_directderivs.
Functions differentiated in alternative forward mode (those with prefix d_): vector_directderivs.
Functions differentiated in reverse mode (those with prefix a_): arrderivclass, arrderivclassvxdd, opt_derivclass, opt_sp_derivclass, mat_derivclass, or scalar_directderivs.

Using any of the runtime environments, except scalar_directderivs, enables you to use vector mode, i.e. the differentiated function is run just once, computing derivatives along all derivative directions, which are given by the columns of the seed matrix, at once.

Using the scalar_directderivs runtime environment will use scalar mode, i.e. only a single directional derivative can be computed with each invocation of the differentiated function. The high level interface functions will automatically revert to strip mining, i.e. run the differentiated function once for each derivative direction specified. If you do the derivative evaluation step manually you have to take care of this yourself.

Using one of the runtime environments opt_derivclass, opt_sp_derivclass, or mat_derivclass, the derivative variables will be objects of the derivative class contained in the respective runtime environment. Using these classes tends to be rather slow because of the overloaded operator resolution at runtime. However, amortization starts when a large number of directional derivatives are compted.

Using the runtime environments scalar_directderivs or vector_directderivs, the derivative variables will be of native arrays of type double.

In Octave since version 3.6 classes are supported, and hence also the ADiMat derivative classes can be used. In older versions only the runtime environments scalar_directderivs and vector_directderivs are available.

Running forward mode functions

To run one of the functions differentiated in forward mode, you have to perform the following steps:

Create derivative input arguments from the function arguments
Run the differentiated function
Extract derivatives from the derivative output arguments

For the first step, you should use one of the functions createFullGradients or createSeededGradientsFor. If you want to compute the full Jacobian, i.e. use the conforming identity matrix as the seed matrix, you can use createFullGradients:


[g_a g_b] = createFullGradients(a, b);

If you want to specify a seed matrix S, use:


[g_a g_b] = createSeededGradientsFor(S, a, b);

For the second step, running the differentiated function, place the derivative arguments created in the previous step before the respective arguments in the input argument list, and do likewise for derivative outputs:


[g_y y g_z z] = g_f(g_a, a, g_b, b);

If in doubt, consult the function signature in the file g_f.m or d_f.m.

Finally you can extract the computed derivatives from the derivative outputs. It is recommended to do this using the function admJacFor:


Jacobian = admJacFor(g_y, g_z);

You can also manually extract the derivative values from the derivative outputs. If you are using one of the derivative classes, use curly brace indexing. To extract the i-th directional derivative of y from g_y, use the following:


dirder = g_y{i};

You can also extract all derivative values at once using the : wildcard index. This will concatenate all directional derivatives into one large array. However, the ordering of the derivative values may not be as you expected and it may differ depending on the derivative class used.


dvals = [g_y{:}];

When you used the alternative forward mode and the vector_directderivs runtime environment, the i-th directional derivative is contained in the i-th slice along the first dimension of the array. The expression d_y(i, :) will always return all those values in d_y in the form of a row vector, regardless of the number of dimensions of d_y. You can then use reshape to make the directional derivative have the same shape as y:


dirder = reshape(d_y(i, :), size(y));

Running reverse mode functions

To run a function differentiated in reverse mode, you have to perform the following steps:

Create derivative input arguments from the function results
Run the differentiated function
Extract derivatives from the derivative output arguments

In the first step you create adjoint variables of the function's results. These will be given as input arguments to the adjoint (reverse mode differentiated) function.

Thus, for the first step, you need the results that the function produces at the desired input arguments. Actually only the size and shape of the results is required, but for simplicity just run the original function once:


[y z] = f(a, b);

Using one of the high level interface functions, you should pass the function results via the option field functionResults, otherwise admDiffRev will automatically run the function to obtain this information.

Now you can use either createFullGradients or createSeededGradientsRev to construct the adjoint arguments. If you want to compute the full Jacobian, i.e. use the conforming identity matrix as the seed matrix, you can use createFullGradients:


[a_y a_z] = createFullGradients(y, z);

If you want to specify a seed matrix S, use:


[a_y a_z] = createSeededGradientsRev(S, y, z);

For the second step, running the differentiated function, place the derivative arguments created in the previous step at the end of the orignal function's argument list. The adjoint outputs come first in the adjoint function's output list, followed by the function results. Thus, run the adjoint function like this:


[a_a a_b y z] = a_f(a, b, a_y, a_z);

If in doubt, consult the function signature in the file a_f.m.

The derivate output arguments a_a and a_b are the adjoints of the function's input parameters, and thus contain the desired derivative values. You can extract the computed derivatives from them using the function admJacRev:


Jacobian = admJacRev(a_a, a_b);

If you want to extract the derivatives manually, refer to the description in the previous secion: here.

4.6 Applying AD to an example

The above steps are applied to an example code now, the lighthouse example which is introduced in the book by A. Griwank: A. Griewank: Evaluating derivatives. The function describing the coordinate where the beam of a lighthouse hits a quay wall is given by:


function y=lighthouse(nu, gamma, omega, t)
% The lighthouse example from the book:
% A. Griewank "Evaluating Derivatives" SIAM 2000 p. 16
y(1) = (nu * tan(omega*t))/(gamma-tan(omega*t));
y(2) = gamma*(nu * tan(omega*t))/(gamma-tan(omega*t));

You could create function input parameters and run the function as follows:


n = 10; % (m)
g = 0.375* pi; % (bogenmass)
o = 0.0001* pi; % (bogenmass)
t = 2; % (s)
y = lighthouse(n, g, o, t);

Look into the book for a more detailed description.

Example using the high-level interface

Using the high level interface you can obtain the derivative in forward mode by entering one of the following two commands:


[J y] = admDiffFor(@lighthouse, 1, n, g, o, t);
[J y] = admDiffVFor(@lighthouse, 1, n, g, o, t);

In forward mode, four derivative directions are needed, as there are a total of four components in the four scalar inputs. The resulting Jacobian matrix in J will have two rows and four columns.

If you want to specify a seed matrix, pass it as the second argument. The seed matrix must have four columns. The same result as in the previous example can be obtained by manually passing the four-by-four identity matrix:


S = eye(4);
[J y] = admDiffFor(@lighthouse, S, n, g, o, t);

If you wanted to compute just the first column of the Jacobian, you could pass a single column vector of length four, setting its first argument to one:


S = [1 0 0 0]';
[J y] = admDiffFor(@lighthouse, S, n, g, o, t);

However, in this case it is better to specify that only the first parameter shall be independent, as the derivative code will by shorter:


opts = admOptions('i', 1);
[J y] = admDiffFor(@lighthouse, 1, n, g, o, t, opts);

In both cases the the resulting J will be a two-by-one matrix, the first column of the full Jacobian.

For the reverse mode using the high level interface, type in the following command:


[J y] = admDiffRev(@lighthouse, 1, n, g, o, t);

In reverse mode, only two derivative directions are needed, as there are a total of two components in the single output variable y. The resulting Jacobian matrix in J will of course again have two rows and four columns.

Again, if you want to specify a seed matrix, pass it as the second argument. The seed matrix must have two rows. The same result as in the previous example can be obtained by manually passing the two-by-two identity matrix:


S = eye(2);
[J y] = admDiffRev(@lighthouse, S, n, g, o, t);

If you wanted to compute just the first row of the Jacobian, you could pass a single row vector of length two, setting its first argument to one:


S = [1 0];
[J y] = admDiffRev(@lighthouse, S, n, g, o, t);

In this case you cannot use the dependent option to do this, as there is only a single output parameter.

Example doing the differentiation steps manually

You can also compute the full Jacobian in forward mode by running the following commands:


admTransform(@lighthouse); % produces g_lighthouse.m
adimat_derivclass('opt_derivclass'); % select runtime environment
% adimat_derivclass('arrderivclass'); % possible alternative
S = eye(4); % create seed matrix
[g_n g_g g_o g_t] = createSeededGradientsFor(S, n, g, o, t); % create derivative inputs
[g_y y] = g_lighthouse(g_n, n, g_g, g, g_o, o, g_t, t); % run the differentiated function
J = admJacFor(g_y); % extract derivative values

Using the alternative forward mode implementation, this becomes:


admTransform(@lighthouse, admOptions('m', 'f')); % produces d_lighthouse.m
adimat_derivclass('vector_directderivs'); % select runtime environment
S = eye(4); % create seed matrix
[d_n d_g d_o d_t] = createSeededGradientsFor(S, n, g, o, t); % create derivative inputs
[d_y y] = d_lighthouse(d_n, n, d_g, g, d_o, o, d_t, t); % run the differentiated function
J = admJacFor(d_y); % extract derivative values

Using the reverse mode, you need to run the following commands:


admTransform(@lighthouse, admOptions('m', 'r')); % produces a_lighthouse.m
adimat_derivclass('opt_derivclass'); % select runtime environment
% adimat_derivclass('arrderivclass'); % possible alternative
y = lighthouse(n, g, o, t); % run the original function (get y)
S = eye(2); % create seed matrix
[a_y] = createSeededGradientsRev(S, y); % create derivative inputs (adjoint of y)
[a_n a_g a_o a_t y] = a_lighthouse(n, g, o, t, a_y); % run the differentiated function
J = admJacRev(a_n, a_g, a_o, a_t); % extract derivative values

Example using the scalar mode manually

In this example we will do as many steps as possible without using the runtime environment functions. You can compute a certain directional derivative along the direction four-vector [a b c d] in scalar mode by running the following commands:


admTransform(@lighthouse); % produces g_lighthouse.m
adimat_derivclass('scalar_directderivs'); % select runtime environment for scalar mode
g_n = zeros(size(n)); % create zero derivative inputs
g_g = zeros(size(g)); % create zero derivative inputs
g_o = zeros(size(o)); % create zero derivative inputs
g_t = zeros(size(t)); % create zero derivative inputs
g_n(1) = a; % set derivative direction
g_g(1) = b; % set derivative direction
g_o(1) = c; % set derivative direction
g_t(1) = d; % set derivative direction
[g_y y] = g_lighthouse(g_n, n, g_g, g, g_o, o, g_t, t); % run the differentiated function
dirder = g_y(:); % extract derivative values

Using the reverse mode, you can compute a certain reverse directional derivative along the direction two-vector [a b] in scalar mode by running the following commands:


admTransform(@lighthouse, admOptions('m', 'r')); % produces a_lighthouse.m
adimat_derivclass('scalar_directderivs'); % select runtime environment for scalar mode
y = lighthouse(n, g, o, t); % run the original function (get y)
a_y = zeros(size(y)); % create zero derivative inputs
a_y(1) = a; % set derivative direction
a_y(2) = b; % set derivative direction
[a_n a_g a_o a_t y] = a_lighthouse(n, g, o, t, a_y); % run the differentiated function
rdirder = [a_n a_g a_o a_t]; % extract derivative values

If you want to compute the gradient of a scalar function f(a, b, c), simply stick a one into the adjoint input parameter:


admTransform(@f, admOptions('m', 'r')); % produces a_f.m
adimat_derivclass('scalar_directderivs'); % select runtime environment for scalar mode
[a_a a_b a_c z] = a_f(a, b, c, 1); % run the differentiated function
gradient = [a_a(:).' a_b(:).' a_c(:).'];                        % concatenate derivative values into gradient

4.7 Using sparsity exploitation or compression

In first order FM or RM, when the Jacobian matrix is sparse, i.e. it has a lot of zero entries, then the number of directional derivatives can often be reduced drastically, which is called sparsity exploitation. This works by compressing the Jacobian matrix. The non-zero pattern of the Jacobian is required for this.

You can obtain the non-zero pattern P approximately by first computing the full Jacobian J and then setting P = J ~= 0. This is however slightly dangerous, because some entries in J may just happen to be zero at the particular point where you evaluated the derivative, but be non-zero at other points. You can be conservative in choosing P, i.e. mark more components of J as non-zeros than necessary.

Using sparsity exploitation with the high-level interface

Using the high-level interface, you have to set the option field JPattern to the non-zero pattern P. Then you can run admDiffFor as usual, compression will automatically be used:


opts.JPattern = P;
admDiffFor(@f, 1, arg1, ..., argN, opts)

The same thing works with admDiffVFor, admDiffRev, admDiffFD, and admDiffComplex. The example above will implicitly use the conforming identity matrix as the seed matrix, and compress it using the coloring heuristic function cpr. This results in a compressed seed matrix, which often has significantly fewer columns (fewer rows in RM), thereby allowing for a faster derivative computation. The results is a compressed Jacobian, which will automatically be unpacked to a sparse matrix, identical to J.

The coloring heuristic cpr comes with ADiMat and implements the Curtis-Powell-Reed heuristic. You can replace this heuristic by another function with a similar signature.

You can also use sparsity exploitation on top of a custom seed matrix S. Also, you can set a custom coloring function using the option coloringFunction.


opts.JPattern = P;
opts.coloringFunction = 'mycoloring';
admDiffFor(@f, S, arg1, ..., argN, opts)

Using sparsity exploitation manually

If you compute derivatives repeatedly, but with a fixed non-zero pattern, you may want to factor out the coloring and seed matrix compression process from the differentiation process. This can be done as follows:


[cS coloring] = admColorSeed(P, opts); % construct compressed seed matrix
cJac = admDiffFor(@f, cS, arg1, ..., argN, opts); % run differentiation
Jac = admUncompressJac(cJac, P, coloring); % unpack the compressed Jacobian

The same thing works with admDiffVFor, admDiffFD, and admDiffComplex. You can also run the transformed code manually. You could even use an entirely different FM AD tool to compute the compressed Jacobian in the second line instead of ADiMat.

In RM, however, you have to do something slightly different. Instead of compressing columns of the seed matrix, you have to compress rows. This can be achieved as follows, using transpose operations:


[cS coloring] = admColorSeed(P.');                       % construct compressed seed matrix
cJac = admDiffRev(@f, cS.', arg1, ..., argN, opts);      % run differentiation
Jac = admUncompressJac(cJac.', P.', coloring); % unpack the compressed Jacobian
Jac = Jac.';                                             % final transpose

Next Previous Contents