30 May, 2005

- The general approach
- The classes and functions
- ParameterSet
- VWD
- VWD2
- TAPE
- RVWD
- Numerical Integration
- Compiling
- Testing
- Files
- History
- To online documentation page

**Upgrading from a previous version? - see History
for changes.**

We want to calculate a value
*f*(*t*_{1},...,*t _{n}*) that depends on
parameters

The aim of the automatic differentiation package is to allow us to calculate
the derivates using a program that is just a minor modification of the program
that calculates *f*. This greatly reduces the work involved and greatly
increases the chances of getting the correct answer.

The present version of the automatic differentiation library
is a concept testing version. It really works, but is neither complete nor
very efficient. |

The major reference about * Automatic Differentiation* is

Griewank, Andreas (2000) *Evaluating derivatives: principles and
techniques of algorithmic differentiation.* SIAM, Philadelphia.

The VWD and VWD2 classes in my library represent what Griewank describes as
*forward* mode. I think RVWD represents *reverse* mode although the
organisation is slightly different.

At the beginning of the program you need to define the parameter set.
Suppose you have three parameters: *alpha*, *beta* and
*gamma*.

Then you define a parameter set object, PS, as follows.

ParameterSet PS; PS.AddParameter("alpha"); PS.AddParameter("beta"); PS.AddParameter("gamma");

The parameter set holds the names of the parameters and is used for simplifying moving the values in and out of arrays, printing results and making sure the user doesn't confuse two different parameter sets.

Do not try to define the same parameter set twice as the program will not
recognise these as the same parameter set. But you can copy parameter sets
either with `=` or as a parameter in a function.

Now consider the calculation loops of the program. Suppose, following the
convention of newmat10, you have *Real* variables *alpha*,
*beta* and *gamma*. Then, you need to define corresponding
*VWD* objects (assuming you want just first derivatives)

Real alpha, beta, gamma; ... put values in alpha, beta and gamma VWD Alpha(PS, alpha, "alpha"); VWD Beta(PS, beta, "beta"); VWD Gamma(PS, gamma, "gamma");

If you also want second derivatives use *VWD2* objects.

If you want to use *reverse* mode, use *RVWD* objects - only first
derivatives at present.

Then write your program at though you were just calculating *f*, but
use *Alpha*, *Beta* and *Gamma* in place of
*alpha*, *beta* and *gamma* and declare any intermediate
variables to be of type *VWD*. But be careful not to use *alpha*
when you mean to use *Alpha*.

Then both the value and the derivatives can be extracted from the result of your calculation.

At present you can use the following operators and functions with
*VWD*, *VWD2* and * RVWD* objects:

You must not add parameters to the parameter set after you
have defined VWD, VWD2 or RVWD objects based on it. |

ParameterSet();

the main constructor.

int AddParameter(const String& name);

add another parameter to the set.

int LocateParameter(const String& name) const;

find the index number of a parameter (the first one has value 1).

String& operator()(int i) const;

find the

namecorresponding to an index.

int Size() const;

the number of parameters.

friend bool operator==(const ParameterSet& ps1, const ParameterSet& ps2);

check that two parameter sets are copies of the same set.

friend bool operator!=(const ParameterSet& ps1, const ParameterSet& ps2);

check that two parameter sets are not copies of the same set.

friend void AssertEqual(const ParameterSet& ps1, const ParameterSet& ps2);

throw an exception if two parameter sets are not copies of the same set.

The *ParameterSet* object is really just a pointer to a
*ParameterSetClass* object that contains the real information. The
*ParameterSetClass* object contains a reference counter to see whether
anything is referencing it and will destroy itself when nothing references it.
Users should never refer directly to the *ParameterSetClass* object.

VWD();

default constructor. You can't use a VWD object created with the default constructor since the

ParameterSetis not defined. It must first be set equal to some other VWD object. You should be wary of using the default constructor in a global statement. This is intended to allow one to build arrays or lists of VWDs.

VWD(Real v);

construct a VWD object with value

v, derivatives = 0. This version creates an object that initially can be used only with+=. The parameter set value is set on the first use of+=.

VWD(const ParameterSet& ps, Real v);

construct a VWD object with parameter set

psand valuev, derivatives = 0.

VWD(const ParameterSet& ps, Real v, const RowVector& d);

construct a VWD object with parameter set

psand valuev, derivativesd.

VWD(const ParameterSet& ps, Real v, int k);

construct a VWD object with value

v, derivative withk-th index = 1, other derivatives = 0.

VWD(const ParameterSet& ps, Real v, const String& name);

construct a VWD object with value

v, derivative with respect to parametername= 1, other derivatives = 0.

Real GetValue() const;

return the value.

ReturnMatrix GetDerivatives() const;

return the derivatives (as RowVector).

Real GetDerivative(int i) const;

return the

i-th derivative.

ParameterSet GetParameterSet();

return the parameter set.

See the corresponding functions under VWD.

VWD2(); VWD2(Real v); VWD2(const ParameterSet& ps, Real v); VWD2(const ParameterSet& ps, Real v, const RowVector& d, const Matrix& d2); VWD2(const ParameterSet& ps, Real v, int k); VWD2(const ParameterSet& ps, Real v, const String& name); Real GetValue() const; ReturnMatrix GetDerivatives() const; // as RowVector ReturnMatrix GetSecondDerivatives() const; // as Matrix Real GetDerivative(int i) const; Real GetSecondDerivative(int i, int j) const; ParameterSet GetParameterSet();

In order to use the * reverse* mode one first sets up a *TAPE*
object.

TAPE(const ParameterSet& ps, int n, int m);

The parameter *n* defines the length of the tape and must be large
enough to contain details of each calculation (`+`, `-`,
`*`, `/`, `sin`, `cos`, etc) carried out in reverse
mode, but not so large that there is insufficient memory to store the tape. The
parameter *m* should be set equal to or larger than the number of
*VWD*s to be turned into *RVWD*s or *RVWD*s built directly - see
next section. (Ideally, the *TAPE* object should be
replaced by something like the STL *vector* class that can increase its
length during the course of a calculation.)

Note that you can add additional parameters to the parameter set after
defining the *TAPE* up until the first *RVWD* is defined.

The *TAPE* object is really just a pointer to a *TAPE_Class*
object that contains the real information. The *TAPE_Class* object
contains a reference counter to see whether anything is referencing it and will
destroy itself when nothing references it. Users should never refer directly to
the *TAPE_Class* object.

You use reverse mode by doing your calculations with *RVWD* objects
rather than *VWD* objects and then converting to a *VWD* object at
the end of the calculation. You start by setting up the initial *VWD*
objects as before and then building *RVWD* objects from them.

You can define a *RVWD* based on a *VWD* with the constructor:

RVWD(const TAPE& t, const VWD& vwd);

or you can build it directly:

RVWD(const TAPE& t, Real v, const String& name);

RVWD(const TAPE& t, Real v, int k);

You can also initialise a * RVWD* to a constant

RVWD(const TAPE& t, Real v);

There is also the default constructor, which like the default constructor for
*VWD* cannot be used until set equal to a valid value. There is also a
constructor `RVWD(Real)` which like `VWD(Real)` must initially be used with
`+=`.

Then *RVWD*s can be manipulated in the same way as *Real*s, just
as *VWD*s can be manipulated in the same way as *Real*s. Because the
information needed to recover the derivatives is stored on the *TAPE*
rather than being manipulated at each operation there is the potential for a much
faster operation at the expense of the *TAPE* requiring a large amount of
storage.

You can convert a *RVWD* back to a *VWD* with the constructor (or
automatic conversion)

VWD(const RVWD& rvwd);

Use this to get the final result of a calculation. This conversion is slow;
comparable in speed to the original calculation since it requires a sweep
through the *TAPE*. Hence the use of reverse mode is most effective when
you want only one or two numbers (and their derivatives) from your calculation.
Be careful not to cause accidental conversions from *RVWD* to *VWD*
through including *VWD* and *RVWD* objects in the same formula.

Alternatively you can recover the value and derivatives from a RVWD with

Real GetValue() const; ReturnMatrix GetDerivatives() const; // as RowVector

*GetDerivatives* uses the sweep through the TAPE so is slow in the same
way that converting to a *VWD* is slow.

When a calculation is complete you can delete all the information on the
*TAPE* object with the `TAPE::reset()` member function. Before this
function is called all *RVWD*s referring to the *TAPE* object must
either be
destroyed or assigned to a constant.

Alternatively use the `TAPE::purge()` member function to delete
obsolete material following a calculation. Use at the beginning of a new C++
block rather than at the end of an old one so that all unwanted *RVWD*s
will have been destroyed. Alternatively you can deactivate an unwanted
*RVWD* with a command like `rvwd = RVWD(Tape, 0)`.

If you are running more than one calculation in a program it is important to
run *reset* or *purge* or start a new tape between calculations.
Otherwise the reverse sweep operation will waste time by repeating the previous
calculations as well as the current ones.

Use `TAPE::purge(RVWD& rvwd)` when you have just
calculated a value for *rvwd* and a number of *RVWD*s required
to build *rvwd* have been destroyed. For example, *rvwd* has just
been returned from a function. This will recover the space on the tape used by
the destroyed *RVWD*s. To recover space you need to have tape-elements
corresponding to destroyed *RVWD*s followed just by the tape-elements
currently corresponding to *rvwd*.

The essence of effective use of reverse mode seems to be to avoid requiring
too much *TAPE* space and currently *purge* is the main tool in this
library for doing this.

You can see roughly how reverse mode - as I have implemented it -
works by looking at the corresponding code for `*`, `+`,
`sin` etc for the * VWD* and * RVWD* classes. In the
*VWD* class the derivatives are calculated immediately and involve linear
operations on the row vectors of derivatives; in the *RVWD* class the
multipliers of these derivatives are simply loaded onto the *TAPE*. When
the time comes to extract the derivative the
`TAPE_Class::reverse_sweep()` function is able to amalgamate the results
with a minimum of vector calculation.

Suppose the calculation of *f* involves a one dimensional numerical
integration. This package enables you simultaneously to calculate the integrals
of the derivatives. For example, we might want to find

2 Integral exp(ax + b) dx x=1

together with its derivatives with respect to *a* and *b*. (Of
course, in this case, we can do it all analytically).

At present I use 32 bit Gaussian numerical integration. This works remarkably well in many cases. However, it is up to you to tell whether it is appropriate for your problem (and you need to consider both the values and the derivatives).

Derive the function you need to integrate from the class *VWDOfReal*,
*VWD2OfReal* or *RVWDOfReal*. You will probably need a new
constructor and you will need to override *operator()()*.

Then the numerical integration is carried out by

VWD GaussianIntegration32(VWDOfReal& function, Real Lower, Real Upper);

or

VWD2 GaussianIntegration32(VWD2OfReal& function, Real Lower, Real Upper);

or

RVWD GaussianIntegration32(RVWDOfReal& function, Real Lower, Real Upper);

The program requires newmat and my string library.

You will need to #include *vwd.h* or *rvwd.h*.

The program has been tested under Borland C++ version 5.0, 5.5, 5.6 Microsoft C++ version 6,7, Gnu G++ 3.3, 3.4, Intel compilers, 8.1, for Windows and Linux, and Sun C++ version 6.

I have included make files for compiling the test program with a variety of compilers. You can use the genmake program for generating make files for a number of other compilers.

I have not implemented the clean-up functions required by my simulated exceptions package so if you are using my simulated exceptions there may be memory leaks if you throw an exception.

I have not set up the namespace options used in newmat.

A test program, *test_vwd.cpp*, is included. This uses templates so won't work with compilers
that don't allow simple templates. With templates, we can use the same source
code with Real, VWD, VWD2 and RVWD. The test program calculates the same expression
using two rather different functions and we can check that the answers are the
same. Output from Borland 5.6 is in *test_vwd.txt*. Note that some
compilers will give different printouts of the contents of the *tape*,
presumably because of a different order in which expressions are evaluated.

- 25 January, 1998: initial version of program
- 1 August, 1999: fix error in second derivative of
`Real-x` - 28 January, 2001: reverse automatic differentiation class,
*ValueWithDerivative*and*ValueWithDerivative2*changed to*VWD*and*VWD2*. Disable constructors`VWD(const ParameterSet& ps)`,`VWD2(const ParameterSet& ps)`; use something like`VWD(ps, 0.0)`instead. - 18 July, 2002: make compatible with my genmake automatic make file generator, fix warning messages from G++.
- 21 September, 2002: include default destructors for VWD, VWD2, RVWD.
- 30 May, 2005: VWD(Real) constructor

vwd.h | c++ definition file |

vwd.cpp | c++ bodies |

rvwd.h | c++ definition file - reverse AD |

rvwd.cpp | c++ bodies - reverse AD |

test_vwd.cpp | test program |

test_vwd.txt | output from test program |

autodiff.htm | this file |

rbd.css | style sheet used by autodiff.htm |

vwd_b55.mak | make file for use with Borland C++ 5.5 |

vwd_b56.mak | make file for use with Borland C++ 5.6 |

vwd_gnu.mak | make file for use with GCC |

vwd_cc.mak | make file for use with CC |

vwd_m6.mak | make file for use with Microsoft VC++ 6 and 7 |

vwd_i8.mak | make file for use with Intel C++ 8.1 for Windows |

vwd_il8.mak | make file for use with Intel C++ 8.1 for Linux |

vwd_ow.mak | make file for use with Open Watcom |