Non-linearities: MIGRAD versus HESSE versus MINOS
In the theory of statistics, one can show that in the asymptotic
limit, any of several methods of determining parameter errors are
equivalent and will give the same result. Let us for the moment call
these methods MIGRAD, HESSE, and
MINOS (SIMPLEX is a special case). It
turns out that the conditions under which these methods yield exactly
the same errors are either of the following:
1. The model to be fitted (y or f) is exactly a linear function of the
fit parameters a, or
2. The amount of observed data is infinite.
It may happen that (1) is satisfied, in which case you don't really
need Minuit; a smaller, simpler, and faster program would do, since
a linear problem can be solved directly without iterations (see [5],
p. 163-165), for example with CERN library program LSQQR.
Nevertheless, it may be convenient to use Minuit since non-linear
terms can then be added later if desired, without major changes to
the method. Condition (2) is of course never satisfied, although in
practice it often happens that there is enough data to make the
problem ``almost linear''; that is, there is so much data that the
range of parameters allowed by the data becomes very small, and
any physical function behaves linearly over a small enough region.
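To see why no iterations are needed in the linear case, here is a minimal sketch of a direct least-squares fit of a straight line y = a0 + a1*x by solving the normal equations in closed form. The data values are made up for illustration; this is the general textbook method, not the LSQQR routine itself.

```python
# Direct solution of a linear least-squares problem: fitting
# y = a0 + a1*x requires no iterative minimizer.
# (Illustrative sketch; the data below are hypothetical.)

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]   # roughly y = 1 + 2x

n = len(xs)
sx = sum(xs)
sxx = sum(x * x for x in xs)
sy = sum(ys)
sxy = sum(x * y for x, y in zip(xs, ys))

# Solve the 2x2 normal equations
#   [ n   sx  ] [a0]   [ sy  ]
#   [ sx  sxx ] [a1] = [ sxy ]
det = n * sxx - sx * sx
a0 = (sxx * sy - sx * sxy) / det
a1 = (n * sxy - sx * sy) / det

print(a0, a1)   # closed-form solution: a0 = 1.1, a1 = 1.96
```

Adding a non-linear term to the model destroys this closed form, which is why one then falls back on an iterative minimizer such as Minuit.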
The following sections explain the differences between the various
parameter errors given by Minuit.
The errors printed by Minuit at any given stage represent the best
symmetric error estimates available at that stage, which may not
be very good. For example, at the first entry to FCN, the user's step
sizes are given, and these may bear no resemblance at all to proper
parameter errors, although they are supposed to be order-of-magnitude estimates.
After crude minimizers like SEEK or SIMPLEX,
a revised error estimate may be given, but this too is only meant to
be an order-of-magnitude estimate, and must certainly not be
taken seriously as a physical result. Such numbers are mainly for
the internal use of Minuit, which must after all assume a step size
for future minimizations and derivative calculations, and uses these
``errors'' as a first guess to be modified on the basis of experience.
The minimizing technique currently implemented in MIGRAD is a
stable variation (the ``switching'' method) of the
Davidon-Fletcher-Powell variable-metric algorithm.
This algorithm converges to the correct error matrix as it converges
to the function minimum.
This algorithm requires at each step a ``working approximation'' of the
error matrix, and a rather good approximation to the gradient vector at
the current best point.
The starting approximation to the error matrix may be obtained in different
ways, depending on the status of the error matrix before MIGRAD is called
as well as the value of STRATEGY. Usually it is found to be advantageous
to evaluate the error matrix rather carefully at the start point in order
to avoid premature convergence, but in principle even the unit matrix
can be used as a starting approximation.
Usually the Minuit default is to start by calculating the full error matrix,
evaluating all the second derivatives and inverting the matrix.
If the user wants to make sure this is done, he can call HESSE before MIGRAD.
If a unit matrix is taken to start,
then the first step will be in a steepest descent direction, which is
not bad, but the estimate of EDM, needed to judge convergence, will be poor.
At each successive step, the information gathered from the change of gradient
is used to improve the approximation to the error matrix, without the need
to calculate any second derivatives or invert any matrices.
The algorithm used for this updating is supposed to be the best known,
but if there are a lot of highly correlated parameters, it may take many steps
before the off-diagonal elements of the error matrix approach the correct values.
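The variable-metric updating described above can be sketched with the textbook Davidon-Fletcher-Powell rank-two formula, which improves the inverse-Hessian approximation H from only the step dx and the gradient change dg. This is an illustrative sketch of the idea, not Minuit's actual ``switching'' implementation; the test function and points are made up.

```python
# One DFP rank-two update of the inverse-Hessian approximation H,
# using only the step dx and the gradient change dg -- no second
# derivatives, no matrix inversion.

def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def dfp_update(H, dx, dg):
    """H' = H + dx dx^T / (dx.dg) - (H dg)(H dg)^T / (dg.H dg)."""
    Hdg = mat_vec(H, dg)
    c1 = dot(dx, dg)
    c2 = dot(dg, Hdg)
    n = len(dx)
    return [[H[i][j] + dx[i] * dx[j] / c1 - Hdg[i] * Hdg[j] / c2
             for j in range(n)] for i in range(n)]

# Quadratic test function f(x) = 0.5 x^T A x, with gradient g(x) = A x
A = [[4.0, 1.0], [1.0, 3.0]]
grad = lambda x: mat_vec(A, x)

H = [[1.0, 0.0], [0.0, 1.0]]       # start from the unit matrix
x0, x1 = [1.0, 0.0], [0.2, -0.1]   # two successive points (hypothetical)
dx = [b - a for a, b in zip(x0, x1)]
dg = [b - a for a, b in zip(grad(x0), grad(x1))]
H = dfp_update(H, dx, dg)

# The updated H satisfies the secant condition  H dg = dx
print(mat_vec(H, dg), dx)
```

Each update enforces the secant condition exactly, but only along the direction just traversed, which is why many steps may be needed before all off-diagonal elements settle to their correct values.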
In practice, MIGRAD usually yields good estimates of the
error matrix, but it is not absolutely reliable for two reasons:
1. Convergence to the minimum may occur ``too fast'' for MIGRAD to
have a good estimate of the error matrix. In the most flagrant of
such cases, MIGRAD realizes this and automatically introduces an
additional call to HESSE (described below), informing the user that
the covariance matrix is being recalculated. Since, for n variable
parameters, there are n(n + 1)/2 elements in the error matrix, the
number of FCN calls from MIGRAD must be large compared with
n^2 in order for the MIGRAD error matrix calculation to be
reliable.
2. MIGRAD gathers information about the error matrix as it
proceeds, based on function values calculated away from the
minimum and assuming that the error matrix is nearly constant as a
function of the parameters, as it would be if the problem were
nearly linear. If the problem is highly non-linear, the error matrix
will depend strongly on the parameters, MIGRAD will converge more
slowly, and the resulting error matrix will at best represent some
average over the last part of the trajectory in parameter-space
traversed by MIGRAD.
If MIGRAD errors are wrong because of (1),
HESSE should be commanded after MIGRAD
and will give the correct errors. If MIGRAD
errors are wrong because of (2), HESSE will help, but only in an
academic sense, since in this case the error matrix is not the whole
story and for proper error calculation MINOS must be used.
As a general rule, anyone seriously interested in the parameter
errors should always put at least a HESSE command after each
MIGRAD (or MINIMIZE) command.
HESSE simply calculates the full second-derivative matrix by finite
differences and inverts it. It therefore calculates the error matrix
at the point where it happens to be when it is called. If the error
matrix is not positive-definite, diagnostics are printed, and an
attempt is made to form a positive-definite approximation. The
error matrix must be positive-definite at the solution (minimum)
for any real physical problem. It may well not be positive-definite away
from the minimum, but most algorithms, including the MIGRAD algorithm,
require a positive-definite ``working matrix''.
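What HESSE does can be sketched as follows: build the full second-derivative matrix of FCN by central finite differences at the current point and invert it. The function, minimum, and step size below are hypothetical; the factor 2 assumes F is a chi-square, for which the error matrix is 2 H^-1 and UP=1 gives one-standard-deviation errors.

```python
# Sketch of the HESSE idea: finite-difference Hessian, then inversion.
# F is a toy chi-square with built-in sigmas 0.5 and 2.0 (hypothetical).

def F(p):
    a, b = p
    return ((a - 1.0) / 0.5) ** 2 + ((b - 2.0) / 2.0) ** 2

def hessian(f, p, h=1e-4):
    """Central finite-difference second-derivative matrix of f at p."""
    n = len(p)
    H = [[0.0] * n for _ in range(n)]
    f0 = f(p)
    for i in range(n):
        for j in range(n):
            if i == j:
                q = list(p)
                q[i] = p[i] + h; fp = f(q)
                q[i] = p[i] - h; fm = f(q)
                H[i][i] = (fp - 2.0 * f0 + fm) / (h * h)
            else:
                vals = []
                for si, sj in ((1, 1), (1, -1), (-1, 1), (-1, -1)):
                    q = list(p)
                    q[i] = p[i] + si * h
                    q[j] = p[j] + sj * h
                    vals.append(f(q))
                H[i][j] = (vals[0] - vals[1] - vals[2] + vals[3]) / (4 * h * h)
    return H

def invert_2x2(M):
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[ M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det,  M[0][0] / det]]

H = hessian(F, [1.0, 2.0])                             # evaluate at the minimum
V = [[2.0 * e for e in row] for row in invert_2x2(H)]  # error matrix (chi-square, UP=1)
errors = [V[0][0] ** 0.5, V[1][1] ** 0.5]
print(errors)   # ~ [0.5, 2.0], the sigmas built into F
```

Note that this costs of order n^2 function calls for n parameters, which is why HESSE can be expensive but is more trustworthy at the solution than the matrix accumulated during minimization.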
The error matrix produced by HESSE is used to calculate what Minuit
prints as the parameter errors, which therefore contain the effects
due to parameter correlations. The extent of the two-by-two
correlations can be seen from the correlation coefficients printed by
Minuit, and the global correlations (see [5], p. 23) are also
printed. All of these correlation coefficients must be less than one
in absolute value. If any of them are very close to one or minus one,
this indicates an ill-posed problem with more free parameters than
can be determined by the model and the data.
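The two-by-two correlation coefficients mentioned above follow directly from the error matrix: rho_ij = V_ij / sqrt(V_ii V_jj). A minimal sketch, with a made-up 2x2 error matrix:

```python
# Correlation coefficient from a 2x2 error (covariance) matrix.
# The matrix values are hypothetical.

V = [[0.25, 0.30],
     [0.30, 4.00]]

rho = V[0][1] / (V[0][0] * V[1][1]) ** 0.5
print(rho)   # 0.30 / sqrt(0.25 * 4.00) = 0.3, comfortably below 1
```

A value of rho very close to +1 or -1 would signal that the two parameters are nearly redundant, i.e. the ill-posed situation described above.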
MINOS is designed to calculate the correct errors in all cases,
especially when there are non-linearities as described above. The
theory behind the method is described in [5], pp. 204-205
(where ``non-parabolic likelihood'' should of course read
``non-parabolic log-likelihood'',
which is equivalent to ``nonparabolic chi-square'').
MINOS actually follows the function out from the minimum to find
where it crosses the function value (minimum + UP), instead of using
the curvature at the minimum and assuming a parabolic shape. This
method not only yields errors which may be different from those of
HESSE, but in general also different positive and negative errors
(asymmetric error interval). Indeed the most frequent result for
most physical problems is that the (symmetric) HESSE error lies
between the positive and negative errors of MINOS. The difference
between these three numbers is one measure of the non-linearity of
the problem (or rather of its formulation).
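The MINOS idea of following the function out from the minimum can be sketched on a one-parameter toy problem. The function F(x) = (ln x)^2 below is deliberately non-parabolic (minimum at x = 1 with F = 0); the bisection search and bracketing intervals are made up, not Minuit's actual algorithm.

```python
import math

def F(x):
    return math.log(x) ** 2      # minimum F = 0 at x = 1; non-parabolic

xmin, fmin, UP = 1.0, 0.0, 1.0   # one-standard-deviation errors (chi-square)

def crossing(lo, hi):
    """Bisect for the point where F(x) = fmin + UP inside [lo, hi]."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if (F(lo) - fmin - UP) * (F(mid) - fmin - UP) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

pos = crossing(xmin, 10.0) - xmin    # positive MINOS-style error
neg = xmin - crossing(0.01, xmin)    # negative MINOS-style error
print(pos, neg)   # ~ e-1 = 1.718 and 1-1/e = 0.632: clearly asymmetric
```

For this F the parabolic error from the curvature at the minimum (the HESSE-style estimate sqrt(2*UP/F''), with F'' = 2 at x = 1) is exactly 1, which indeed lies between the negative and positive errors, as the text describes for typical physical problems.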
In practice, MINOS errors usually turn out to be close to, or
somewhat larger than, the errors derived from the error matrix, although
in cases of very bad behaviour (very little data or ill-posed model)
anything can happen. In particular, it is often not
true in MINOS that two-standard-deviation errors
(UP=4) and three-standard-deviation errors (UP=9)
are respectively two and three times as big as one-standard-deviation errors,
as is true by definition for errors derived from the
error matrix (MIGRAD or HESSE).
MG
(last mod. 1998-08-19)