The program 'paramin' implements three minimization techniques
and can only be used with PVM.
The techniques are Simulated Annealing, the Hooke & Jeeves algorithm
and the quasi-Newton algorithm BFGS. Simulated Annealing is a
global minimization technique; the other two are local. The
quasi-Newton method uses derivatives.
There are five input files for 'paramin':
mainconstants: Constants that involve the program itself.
bfgsconstants: Constants that involve the BFGS technique.
simconstants: Constants that involve Simulated Annealing.
hjconstants: Constants that involve Hooke & Jeeves technique.
initvals: Contains four columns of numbers, the initial point,
the lower bounds, the upper bounds and whether each
parameter shall be optimized or not - 1 for
optimizing and 0 for staying unchanged.
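For example, a hypothetical initvals for three parameters could look
like this (the numbers are made up; the columns are, as described
above, initial value, lower bound, upper bound and the optimize flag):

```
1.0    0.0   10.0   1
0.5   -5.0    5.0   1
2.0    0.0  100.0   0
```

Here the third parameter is held fixed while the first two are
optimized.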
To minimize the objective function, write:
paramin 0 FunctionName
(or 'paramin 0 gadget -s -n' for running gadget)
If the program uses BFGS and the minimization converges, the
program gives the constant bfgsfail the value 0. It writes out
the minimum to the file 'finalvals', the gradient of the objective
function to the file 'gradient' and the approximated inverse
Hessian matrix of the objective function to the file 'hessian'.
If the program does not use BFGS, the best point received is
printed to the screen along with its function value.
To choose which minimization algorithms to use, give the parameters
SimulatedAnnealing, HookeAndJeeves and BFGS in mainconstants the
values 0 or 1; 0 for not in use, 1 for in use.
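For example, to run Hooke & Jeeves and BFGS but skip Simulated
Annealing, the relevant entries would be set like this (the exact
layout of mainconstants is not documented here, so this fragment is
only illustrative):

```
SimulatedAnnealing   0
HookeAndJeeves       1
BFGS                 1
```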
About the minimization algorithms:
1) Hooke and Jeeves:
Nonlinear Optimization using the algorithm of Hooke and Jeeves
12 February 1994
author: Mark G. Johnson
August 1999
changes for parallel use: Kristjana Yr Jonsdottir and
Thordis Linda Thorarinsdottir
Find a point X where the nonlinear function f(X) has a local
minimum. X is an n-vector and f(X) is a scalar. In mathematical
notation f: R^n -> R^1. The objective function f() is not
required to be continuous. Nor does f() need to be differentiable.
The program does not use or require derivatives of f().
The software user supplies three things: a subroutine that computes
f(X), an initial "starting guess" of the minimum point X, and values
for the algorithm convergence parameters. Then the program searches
for a local minimum, beginning from the starting guess, using the
Direct Search algorithm of Hooke and Jeeves.
The original C program is adapted from the Algol pseudocode found in
"Algorithm 178: Direct Search" by Arthur F. Kaupe Jr.,
Communications of the ACM, Vol. 6, p. 313 (June 1963). It includes
the improvements suggested by Bell and Pike (CACM v.9, p. 684, Sept
1966) and those of Tomlin and Smith, "Remark on Algorithm 178" (CACM
v.12). The original paper, which I don't recommend as highly as the
one by A. Kaupe, is: R. Hooke and T. A. Jeeves, "Direct Search
Solution of Numerical and Statistical Problems", Journal of the ACM,
Vol. 8, April 1961, pp. 212-229.
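Stripped of the PVM machinery, the serial core of the direct search
can be sketched in C roughly as follows. This is a minimal sketch in
the spirit of Kaupe's best_nearby, not paramin's code: the sample
objective f, the fixed dimension N = 2 and the starting stepsize of
0.5 are all illustrative.

```c
#define N 2

/* illustrative objective with its minimum at (3, -2) */
static double f(const double x[N]) {
    return (x[0] - 3.0) * (x[0] - 3.0) + (x[1] + 2.0) * (x[1] + 2.0);
}

/* Exploratory move: try +/-step in each coordinate, keeping any
   change that improves on minf. */
static double explore(double x[N], double step, double minf) {
    for (int i = 0; i < N; i++) {
        double old = x[i];
        x[i] = old + step;
        double ft = f(x);
        if (ft < minf) { minf = ft; continue; }
        x[i] = old - step;
        ft = f(x);
        if (ft < minf) { minf = ft; continue; }
        x[i] = old;                      /* neither direction helped */
    }
    return minf;
}

/* Direct search: explore, make pattern moves while they pay off,
   then reduce the stepsize by rho until it falls below epsilon. */
static double hooke(double x[N], double rho, double epsilon) {
    double step = 0.5;
    double fbest = f(x);
    while (step > epsilon) {
        double base[N];
        for (int i = 0; i < N; i++) base[i] = x[i];
        double fnew = explore(x, step, fbest);
        while (fnew < fbest) {
            for (int i = 0; i < N; i++) {
                double tmp = base[i];
                base[i] = x[i];          /* remember the accepted point */
                x[i] += x[i] - tmp;      /* pattern move along the same direction */
            }
            fbest = fnew;
            fnew = explore(x, step, fbest);
        }
        for (int i = 0; i < N; i++) x[i] = base[i];  /* retreat to the best point */
        step *= rho;
    }
    return fbest;
}
```

Calling hooke(x, 0.5, 1e-6) from x = (0, 0) moves x to the minimum
of the sample objective near (3, -2).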
Calling sequence:
int hooke.DoSearch( netI )
netI {netInterface} netI manages all netcommunication.
netInterface uses the class netCommunication to
send/receive data. The class processManager is used
to dispatch processes to be used by netInterface.
It uses the class netDataControl to store data and
keep track of information about status of data
concerning netcommunication. It can use the class
dataConverter to prepare data before sending and the
class dataScaler to scale/unscale data after/before
sending.
As mentioned above, values for the algorithm convergence parameters
are kept in the input files mainconstants and hjconstants. These
parameters are:
rho {a double} This is a user-supplied convergence
parameter (more detail below), which should be
set to a value between 0.0 and 1.0. Larger values
of rho give greater probability of convergence on
highly nonlinear functions, at a cost of more
function evaluations. Smaller values of rho reduce
the number of evaluations (and the program running
time), but increase the risk of nonconvergence.
See below.
epsilon {a double} This is the criterion for halting the
search for a minimum. When the algorithm begins
to make less and less progress on each iteration,
it checks the halting criterion: if the stepsize
is below epsilon, terminate the iteration and
return the current best estimate of the minimum.
Larger values of epsilon (such as 1.0e-4) give
quicker running time, but a less accurate estimate
of the minimum. Smaller values of epsilon (such as
1.0e-7) give longer running time, but a more
accurate estimate of the minimum.
MAXITER {an integer} A second halting criterion. If the
algorithm uses more function evaluations than
MAXITER, terminate the iteration and return the
current best estimate of the minimum.
HOOKE_ITER {an integer} Total function evaluations in all runs
of the algorithm. If this value is greater than
MAXITER, repeated runs of the algorithm are done.
Each run uses at most MAXITER function evaluations
and the starting point each time is the current
best point.
The user-supplied objective function f(x) has to support PVM.
Its argument is x, an array of doubles giving the point at which
f(x) should be evaluated, and it returns a double. All
information about the objective function is part of netI.
The following functions from netI are used:
startNewDataGroup(s) Creates a new data group of size s.
This group keeps all points sent to
the objective function as arguments,
together with their function values
and an ID for each.
stopUsingDataGroup() Destructor for data group.
getNumOfVarsInDataGroup() Returns the number of coordinates in
the argument for the objective
function.
getTotalNumProc() Tells how many processors are
available.
getNumFreeProcesses() Returns number of available free processes.
sendToAllIdleHosts() Sends jobs to all processors
available. That is, sends points to
all available processors, for
evaluating the function values.
getNumNotAns() Number of points that have been
sent but whose function value has
not yet been received.
sendOne() Sends one point to the objective
function.
send_receiveAllData() Sends all uncalculated points in
the data group to processors and
receives their function values as
well.
receiveOne() Receives one function value.
receiveAll() Receives all function values not
yet received.
getRecievedID() Gets the ID of the point whose
function value was received last.
getY( returnID ) Returns the function value of the
point with ID returnID.
getX( returnID ) Returns the point with ID returnID.
getInitialX() Returns the starting point, that is,
the best point known before the
algorithm starts.
getUpperScaleConstant() For getting the upper bounds.
getLowerScaleConstant() For getting the lower bounds.
setX(x) Sets x as the next point in the data
group.
setBestX(x) Sets x as the point with the best
value found, so far.
rho, the algorithm convergence control:
The algorithm works by taking "steps" from one estimate of a
minimum, to another (hopefully better) estimate. Taking big steps
gets to the minimum more quickly, at the risk of "stepping right
over" an excellent point. The stepsize is controlled by a user
supplied parameter called rho. At each iteration, the stepsize is
multiplied by rho (0 < rho < 1), so the stepsize is successively
reduced.
Small values of rho correspond to big stepsize changes,
which make the algorithm run more quickly. However, there is a
chance (especially with highly nonlinear functions) that these big
changes will accidentally overlook a promising search vector,
leading to nonconvergence.
Large values of rho correspond to small stepsize changes,
which force the algorithm to carefully examine nearby points instead
of optimistically forging ahead. This improves the probability of
convergence.
The stepsize is reduced until it is equal to (or smaller
than) epsilon. So the number of iterations performed by Hooke-Jeeves
is determined by rho and epsilon:
rho**(number_of_iterations) = epsilon
In general it is a good idea to set rho to an aggressively
small value like 0.5 (hoping for fast convergence). Then, if the
user suspects that the reported minimum is incorrect (or perhaps
not accurate enough), the program can be run again with a larger
value of rho such as 0.85, using the result of the first
minimization as the starting guess to begin the second minimization.
The parameters of the point are randomly reordered now and then, to
prevent their order from influencing which changes are
accepted and which are not. As this means that no two runs are
exactly the same (even though all parameters and constants are
unchanged), running the program at least twice on every point might
be a good idea.
Normal use: (1) Code your function f() in the C language and connect
it to PVM.
(2) Install your starting guess {or read it in}.
(3) Run the program.
(4) {for the skeptical}: Use the computed minimum
as the starting point for another run (set the value
of HOOKE_ITER equal 2*MAXITER).
Data Fitting:
Code your function f() to be the sum of the squares of the errors
(differences) between the computed values and the measured values.
Then minimize f() using Hooke-Jeeves.
EXAMPLE: you have 20 datapoints (ti, yi) and you want to
find A,B,C such that (A*t*t) + (B*exp(t)) + (C*tan(t))
fits the data as closely as possible. Then f() is just
f(x) = SUM (measured_y[i] - ((A*t[i]*t[i]) + (B*exp(t[i]))
+ (C*tan(t[i]))))^2
where x[] is a 3-vector consisting of {A, B, C}.
The author of this software is M.G. Johnson.
Permission to use, copy, modify, and distribute this software for
any purpose without fee is hereby granted, provided that this entire
notice is included in all copies of any software which is or
includes a copy or modification of this software and in all copies
of the supporting documentation for such software. THIS SOFTWARE IS
BEING PROVIDED "AS IS", WITHOUT ANY EXPRESS OR IMPLIED WARRANTY. IN
PARTICULAR, NEITHER THE AUTHOR NOR AT&T MAKE ANY REPRESENTATION OR
WARRANTY OF ANY KIND CONCERNING THE MERCHANTABILITY OF THIS SOFTWARE
OR ITS FITNESS FOR ANY PARTICULAR PURPOSE.
2) Simulated Annealing
ABSTRACT:
Simulated annealing is a global optimization method that
distinguishes different local optima. Starting from an initial
point, the algorithm takes a step and the function is evaluated.
When minimizing a function, any downhill step is accepted and the
process repeats from this new point. An uphill step may be accepted
(thus, the algorithm can escape from local optima). This decision is
made by the Metropolis criterion. As the optimization process
proceeds, the length of the steps declines and the algorithm closes
in on the global optimum. Since the algorithm makes very few
assumptions regarding the function to be optimized, it is quite
robust with respect to non-quadratic surfaces. The degree of
robustness can be adjusted by the user. In fact, simulated annealing
can be used as a local optimizer for difficult functions.
Author
h2zr1001@vm.cis.smu.edu
Changes for parallel use
Kristjana Yr Jonsdottir
Thordis Linda Thorarinsdottir
This code is a pvm version of a translation of a fortran code,
which is an example of the Corana et al. simulated annealing
algorithm for multimodal and robust optimization as implemented and
modified by Goffe et al.
Use the sample function from Judge with the following suggestions
to get a feel for how SA works. When you've done this, you should be
ready to use it on most any function with a fair amount of expertise.
1. Run the program as is to make sure it runs okay. Add some print
functions and see how it optimizes as temperature (T) falls.
Notice how the optimal point is reached and how falling T reduces
VM.
2. Look through the documentation to SA so the following makes a
bit of sense. In line with the paper, it shouldn't be that hard
to figure out. The core of the algorithm is described on pp. 4-6
and on p. 28. Also see Corana et al. pp. 264-9.
3. To see the importance of different temperatures, try starting
with a very low one (say T = 10E-5). You'll see (i) it never
escapes from the local optima (in annealing terminology, it
quenches) & (ii) the step length (VM) will be quite small. This
is a key part of the algorithm: as temperature (T) falls, step
length does too. In a minor point here, note how VM is quickly
reset from its initial value. Thus, the input VM is not very
important. This is all the more reason to examine VM once the
algorithm is underway.
4. To see the effect of different parameters on the speed of the
algorithm, try RT = .95 & RT = .1. Notice the
vastly different speed for optimization. Also try NT = 20. Note
that this sample function is quite easy to optimize, so it will
tolerate big changes in these parameters. RT and NT are the
parameters one should adjust to modify the runtime of the
algorithm and its robustness.
5. Try constraining the algorithm with either LB or UB.
Synopsis:
This routine implements the continuous simulated annealing global
optimization algorithm described in Corana et al.'s article
"Minimizing Multimodal Functions of Continuous Variables with the
"Simulated Annealing" Algorithm" in the September 1987 (vol. 13,
no. 3, pp. 262-280) issue of the ACM Transactions on Mathematical
Software.
A very quick (perhaps too quick) overview of SA:
SA tries to find the global optimum of an N dimensional function.
It moves both up and downhill and as the optimization process
proceeds, it focuses on the most promising area.
To start, it randomly changes one parameter at a time within the
step length VM (a vector of length N) of the user selected starting
point. The function is evaluated at these trial points and its
value is compared to its value at the initial point.
In a maximization problem, all uphill moves are accepted and the
algorithm continues from that trial point. Downhill moves may be
accepted; the decision is made by the Metropolis criterion. It uses
T (temperature) and the size of the downhill move in a probabilistic
manner. The larger T is and the smaller the downhill move, the more
likely that move will be accepted. If the trial is
accepted, the algorithm moves on from that point. If it is rejected,
another point is chosen instead for a trial evaluation.
Each element of VM is periodically adjusted so that half of all
function evaluations in that direction are accepted.
A fall in T is imposed upon the system with the RT variable by
T(i+1) = RT*T(i) where i is the ith iteration. Thus, as T declines,
downhill moves are less likely to be accepted and the percentage of
rejections rises. Given the scheme for the selection of VM, VM falls.
Thus, as T declines, VM falls and SA focuses upon the most promising
area for optimization.
The parameters of the point are randomly reordered now and then, to
prevent their order from influencing which changes are
accepted and which are not. As this means that no two runs are
exactly the same (even though all parameters and constants are
unchanged), running the program at least twice on every point might
be a good idea.
The importance of the parameter T:
The parameter T is crucial in using SA successfully. It influences
VM, the step length over which the algorithm searches for optima. For
a small initial T, the step length may be too small; thus not enough
of the function might be evaluated to find the global optimum. The
user should carefully examine VM to make sure that it is appropriate.
The relationship between the initial temperature and the resulting
step length is function dependent.
To determine the starting temperature that is consistent with
optimizing a function, it is worthwhile to run a trial run first.
Set RT = 1.5 and T = 1.0. With RT > 1.0, the temperature increases
and VM rises as well. Then select the T that produces a large enough
VM.
For modifications to the algorithm and many details on its use,
(particularly for econometric applications) see Goffe, Ferrier and
Rogers, "Global Optimization of Statistical Functions with
Simulated Annealing," Journal of Econometrics (forthcoming). For a
pre-publication copy, contact
Bill Goffe
Department of Economics
Southern Methodist University
Dallas, TX 75275
h2zr1001 @ smuvm1 (Bitnet)
h2zr1001 @ vm.cis.smu.edu (Internet)
As far as possible, the parameters here have the same name as in the
description of the algorithm on pp. 266-8 of Corana et al.
Input Parameters:
Note: The suggested values generally come from Corana et al. To
drastically reduce runtime, see Goffe et al., pp. 17-8 for
suggestions on choosing the appropriate RT and NT.
Input Parameters for class Simann:
netInt (netInterface) netInt manages all netcommunication.
netInterface uses the class netCommunication to
send/receive data. The class processManager is used
to dispatch processes to be used by netInterface.
It uses the class netDataControl to store data and
keep track of information about status of data
concerning netcommunication. It can use the class
dataConverter to prepare data before sending and the
class dataScaler to scale/unscale data after/before
sending.
MAXIM Denotes whether the function should be maximized or
minimized. A true value denotes maximization while a
false value denotes minimization.
SIM_ITER The maximum number of function evaluations. (INT)
TEMPERATURE The initial temperature. See Goffe et al. for
advice. (DP)
c Vector that controls the step length adjustment. The
suggested value for all elements is 2.0. (DP(N))
vm The step length vector. On input it should encompass
the region of interest given the starting value X.
For point X(I), the next trial point is selected
from X(I) - VM(I) to X(I) + VM(I). Since VM is
adjusted so that about half of all points are
accepted, the input value is not very important
(i.e. if the value is off, SA adjusts VM to the
correct value). (DP(N))
Other input parameters from mainconstants:
C_COMPONENT Size of each component of the step length
adjustment, that is the value of the components of c.
VM_COMPONENT The initial value of the components of vm (as the
initial value is not considered very important, all
components have the same initial value).
Input parameters from simconstants:
RT The temperature reduction factor. The value suggested by
Corana et al. is .85. See Goffe et al. for more advice. (DP)
EPS Error tolerance for termination. If the final function
values from the last neps temperatures differ from the
corresponding value at the current temperature by less than
EPS and the final function value at the current temperature
differs from the current optimal function value by less than
EPS, execution terminates and IER = 0 is returned. (DP)
NS Number of cycles. After NS*N function evaluations, each
element of VM is adjusted so that approximately half of
all function evaluations are accepted. The suggested value
is 20. (INT)
NT Number of iterations before temperature reduction. After
NT*NS*N function evaluations, temperature (T) is changed
by the factor RT. Value suggested by Corana et al. is
MAX(100, 5*N). See Goffe et al. for further advice. (INT)
NEPS Number of final function values used to decide upon termi-
nation. See EPS. Suggested value is 4. (INT)
The upper and lower bounds are part of netInt. If the algorithm
chooses X(I) < LB(I) or X(I) > UB(I), I = 1, N, a point from
inside is randomly selected. This focuses the algorithm on the
region inside UB and LB. Unless the user wishes to concentrate the
search to a particular region, UB and LB should be set to very large
positive and negative values, respectively. Note that the starting
vector X should be inside this region. Also note that LB and UB are
fixed in position, while VM is centered on the last accepted trial
set of variables that optimizes the function.
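The reselection described above amounts to the following
per-coordinate rule (the uniform random draw is illustrative):

```c
#include <stdlib.h>

/* Out-of-bounds handling: a trial coordinate outside [lb, ub] is
   replaced by a uniform draw from inside the bounds, keeping the
   search focused on the region between LB and UB. */
static double keep_in_bounds(double xi, double lb, double ub) {
    if (xi < lb || xi > ub) {
        double u = (double) rand() / ((double) RAND_MAX + 1.0); /* u in [0,1) */
        return lb + u * (ub - lb);
    }
    return xi;
}
```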
Parameters within the algorithm, that can be used to follow how the
algorithm is working:
x The variables that optimize the function at each time.
(DP(N))
fopt The optimal value of the function at each time. (DP)
nacc The number of accepted function evaluations. (INT)
nnew The number of times the algorithm finds a new optimum. (INT)
nup The number of function values that are better than the last
one. (INT)
ndown The number of accepted function values that are worse than
the last one. (INT)
nrej The number of not accepted function values. (INT)
nfcnev The total number of function evaluations. In a minor
point, note that the first evaluation is not used in the
core of the algorithm; it simply initializes the
algorithm. (INT).
nobds The total number of trial function evaluations that
would have been out of bounds of LB and UB. Note that
a trial point is randomly selected between LB and UB.(INT)
The user-supplied objective function f(x) has to support PVM.
Its argument is x, an array of doubles giving the point at which
f(x) should be evaluated, and it returns a double. All
information about the objective function is part of netInt.
The following functions from netInt are used:
startNewDataGroup(s) Creates a new data group of size s.
This group keeps all points sent to
the objective function as arguments,
together with their function values
and an ID for each.
stopUsingDataGroup() Destructor for data group.
getNumOfVarsInDataGroup() Returns the number of coordinates in
the argument for the objective
function.
getTotalNumProc() Tells how many processors are
available.
sendToAllIdleHosts() Sends jobs to all processors
available. That is, sends points to
all available processors, for
evaluating the function values.
getNumNotAns() Number of points that have been
sent but whose function value has
not yet been received.
send_receiveAllData() Sends all uncalculated points in
the data group to processors and
receives their function values as
well.
receiveOne() Receives one function value.
receiveAll() Receives all function values not
yet received.
getRecievedID() Gets the ID of the point whose
function value was received last.
getY( returnID ) Returns the function value of the
point with ID returnID.
getX( returnID ) Returns the point with ID returnID.
getInitialX() Returns the starting point, that is,
the best point known before the
algorithm starts.
getUpperScaleConstant() For getting the upper bounds.
getLowerScaleConstant() For getting the lower bounds.
setX(x) Sets x as the next point in the data
group.
setBestX(x) Sets x as the point with the best
value found, so far.
unscaleX(x) If working with scaled points, this
returns the point x unscaled;
otherwise no change.
3) BFGS
ABSTRACT:
The BFGS-method is an iterative gradient method that works as follows:
We start at some point and successively generate new points, so that
the function value decreases in each iteration. This is done by
choosing a direction that makes an angle greater than 90 degrees
with the gradient of the function.
Authors
Gunnar Stefansson
Kristjana Yr Jonsdottir
Thordis Linda Thorarinsdottir
The BFGS-method is one of the Quasi-Newton methods. Quasi-Newton
methods are iterative methods which approximate Newton's method
without calculating second derivatives. They are of the form
x^(k+1) = x^k + a^k * d^k,
d^k = -D^k \nabla f( x^k ),
where D^k is a positive definite matrix, updated using the BFGS
updating formula. To find the value of a, an inexact linesearch
called the Armijo rule is used. That method finds an a such that the
Armijo condition is satisfied, which is based on getting a
function value that is some amount better than the best known
function value. This is done by first trying an initial
stepsize and then reducing the stepsize until a value that
satisfies the Armijo condition is found. For details, see D.P.
Bertsekas, "Nonlinear Programming", Athena Scientific, Belmont,
1999. As the objective function is not necessarily differentiable,
the gradient is calculated numerically.
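The Armijo rule can be sketched in serial C as follows. The function
names are made up for this sketch; sigma and beta correspond to the
SIGMA and BETA constants read from bfgsconstants.

```c
/* Armijo rule: starting from stepsize s = 1, multiply s by beta until
   f(x + s*d) is below f(x) by at least sigma * s * |g.d|, where g is
   the gradient and d a descent direction (so g.d < 0).
   Returns the accepted stepsize, or 0 if none was found in maxtries. */
static double armijo(double (*f)(const double *, int),
                     const double *x, const double *d, const double *g,
                     int n, double sigma, double beta, int maxtries) {
    double fx = f(x, n);
    double slope = 0.0;                 /* directional derivative g.d */
    for (int i = 0; i < n; i++) slope += g[i] * d[i];
    double s = 1.0;
    double trial[32];                   /* the sketch assumes n <= 32 */
    for (int k = 0; k < maxtries; k++) {
        for (int i = 0; i < n; i++) trial[i] = x[i] + s * d[i];
        if (f(trial, n) <= fx + sigma * s * slope)
            return s;                   /* sufficient decrease achieved */
        s *= beta;                      /* reduce the stepsize and retry */
    }
    return 0.0;
}

/* small example objective for the usage below: f(x) = sum of squares */
static double quad(const double *x, int n) {
    double s = 0.0;
    for (int i = 0; i < n; i++) s += x[i] * x[i];
    return s;
}
```

With sigma = beta = 0.5 on f(x) = x^2 from x = 1 along the
steepest-descent direction d = -2, the initial stepsize 1 is rejected
and s = 0.5 is accepted.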
Input parameters, from file bfgsconstants:
BFGSPAR 1 for using the BFGS method and 0 for using the steepest
descent method. It is sometimes good to use the latter
for a few iterations if the point is stuck.
MAXBFGSITER Maximum number of iterations for each round of the
minimization.
MAXROUNDS Maximum number of rounds made.
ERRORTOL Error tolerance for the termination criteria.
XTOL Error tolerance for the point.
SHANNONSCALING 1 if the direction vector shall be scaled, otherwise 0. If 1,
then Shannon scaling is used to scale the direction vector.
DIFFICULTGRAD denotes how good the approximation of the gradient should be.
If = 0, a first degree polynomial is used. If = 1, the
approximation is parabolic and if >= 2, a polynomial of degree
four is used. If the line search gives no results, the
difficultgrad is increased.
BETA beta constant for Armijo condition in linesearch. See D.P.
Bertsekas, "Nonlinear Programming", Athena Scientific,
Belmont, 1999, page 29.
SIGMA sigma constant for Armijo condition in linesearch. See D.P.
Bertsekas, "Nonlinear Programming", Athena Scientific,
Belmont, 1999, page 29.
PRINTING 1 for printing on, and 0 for printing off. If printing is on,
the last gradient calculated is printed out to the file
"gradient", the last Hessian matrix is printed out to the file
"hessian" and the best point is printed out to the file
"finalvals".
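The numerically calculated gradient mentioned under DIFFICULTGRAD can
be illustrated with a central-difference sketch; this corresponds to
the parabolic case (DIFFICULTGRAD = 1), and the names are made up:

```c
/* Central-difference gradient, g[i] ~ (f(x+h*e_i) - f(x-h*e_i))/(2h);
   equivalent to fitting a parabola through the two perturbed points. */
static void num_gradient(double (*f)(const double *, int),
                         double *x, int n, double h, double *g) {
    for (int i = 0; i < n; i++) {
        double xi = x[i];
        x[i] = xi + h;
        double fplus = f(x, n);
        x[i] = xi - h;
        double fminus = f(x, n);
        x[i] = xi;                      /* restore the coordinate */
        g[i] = (fplus - fminus) / (2.0 * h);
    }
}

/* small example objective for the usage below: f(x) = sum of squares */
static double sumsq(const double *x, int n) {
    double s = 0.0;
    for (int i = 0; i < n; i++) s += x[i] * x[i];
    return s;
}
```

For f(x) = sum of squares at (1.5, -2.0), the sketch reproduces the
exact gradient (3, -4) up to rounding.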
The user-supplied objective function f(x) has to support PVM.
Its argument is x, an array of doubles giving the point at which
f(x) should be evaluated, and it returns a double. All
information about the objective function is part of netInt.
The following functions from netInt are used:
startNewDataGroup(s) Creates a new data group of size s.
This group keeps all points sent to
the objective function as arguments,
which function value they have and
an ID for them.
stopUsingDataGroup() Destructor for data group.
getNumOfVarsInDataGroup() Returns the number of coordinates in
the argument for the objective
function.
getNumDataItemSet() Number of data items set in data group.
getNumDataItemsAnswered() Number of data items set which have
also received an answer.
dataGroupFull() Returns 1 if not able to add more data to
current datagroup, else returns 0.
getTotalNumProc() Tells how many processors are
available.
getNumFreeProcesses() Returns number of available free processes.
getNumNotAns() Number of points that have been
sent but whose function value has
not yet been received.
send_receiveAllData() Sends all uncalculated points in
the data group to processors and
receives their function values as
well.
send_receive_setData(c) Sends, receives and sets new data until
condition c is met. Returns ERROR if error,
or 1 if condition was met.
send_receiveTillCondition(c) Sends and receives data until condition c is
met, or there is no more available data in
the data group. Returns ERROR if error, 1 if
condition was met and 0 if condition not met
and all data belonging to datagroup has been
sent and received.
getRecievedID() Gets the ID of the point whose
function value was received last.
getY( returnID ) Returns the function value of the
point with ID returnID.
getNextAnswerY() Returns the function value in the last data
pair found while looking for a datapair with
an answer.
getX( returnID ) Returns the point with ID returnID.
getNextAnswerX() Returns the next x found in datagroup which
has an answer since last checked for answer.
(x0, fx0), (x1, ), (x2, fx2), ...., (xn,fxn).
If the datapair (x0, fx0) was last checked
for an answer then x will equal x2 and the
next datapair that will be looked at for
answer will be (x3, fx3) unless the search is
reset to start at the beginning.
getInitialX() Returns the starting point, that is,
the best point known before the
algorithm starts.
getUpperScaleConstant() For getting the upper bounds.
getLowerScaleConstant() For getting the lower bounds.
setDataPair(x, f) Adds the datapair (x, f) to datagroup. x will
not be sent for processing through the net as
f is stored as its answer.
setX(x) Sets x as the next point in the data
group.
setXFirstToSend(x) Adds x to datagroup and puts it first in the
queue for sending.
setFirstAnsweredData() Initiates a datapointer to the data item which
was first set in data group.
setBestX(x) Sets x as the point with the best
value found, so far.
unscaleX(x) If working with scaled points, this
returns the point x unscaled;
otherwise no change.
stopNetComm() Halts all netcommunication; netInt cannot
send/receive anymore.