**this really really needs updating to match the recent changes**

This program 'paramin' consists of three minimization techniques
and can only be used with PVM.

The techniques are Simulated Annealing, the Hooke & Jeeves algorithm
and the quasi-Newton algorithm BFGS. Simulated Annealing is a
global minimization technique and the other two are local. The
quasi-Newton algorithm uses derivatives.

There are five input files for 'paramin'.

mainconstants:  Constants that involve the program itself.

bfgsconstants:  Constants that involve the BFGS technique.
simconstants:   Constants that involve Simulated Annealing.
hjconstants:    Constants that involve the Hooke & Jeeves technique.

initvals:       Contains four columns of numbers: the initial point,
                the lower bounds, the upper bounds and whether each
                parameter shall be optimized or not - 1 for
                optimizing and 0 for staying unchanged.
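
As an illustration only (the numbers below are made up, not taken
from any real run), an initvals file for three parameters, of which
the third is held fixed, could look like this:

    100.0     1.0     10000.0     1
      0.5     0.01        1.0     1
      2.5     0.1        10.0     0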

To minimize the objective function, write:

    paramin 0 FunctionName

(or 'paramin 0 gadget -s -n' for running gadget)

If the program uses BFGS and the minimization is convergent, the
program gives the constant bfgsfail the value 0. It writes out
the minimum to the file 'finalvals', the gradient of the objective
function to the file 'gradient' and the approximated inverse
Hessian matrix of the objective function to the file 'hessian'.

If the program does not use BFGS, the best point received will be
printed out on the screen with its function value.

To choose which minimization algorithms to use, give the parameters
SimulatedAnnealing, HookeAndJeeves and BFGS in mainconstants the
values 0 or 1; 0 for not in use, 1 for in use.
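
For example, to skip Simulated Annealing but run both local searches,
the relevant entries in mainconstants would be set along these lines
(the exact layout of the file may differ from this sketch):

    SimulatedAnnealing   0
    HookeAndJeeves       1
    BFGS                 1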

About the minimization algorithms:


1) Hooke and Jeeves:

Nonlinear Optimization using the algorithm of Hooke and Jeeves
12 February 1994
author: Mark G. Johnson
August 1999
changes for parallel use: Kristjana Yr Jonsdottir and
                          Thordis Linda Thorarinsdottir

Find a point X where the nonlinear function f(X) has a local
minimum. X is an n-vector and f(X) is a scalar. In mathematical
notation f: R^n -> R^1. The objective function f() is not
required to be continuous. Nor does f() need to be differentiable.
The program does not use or require derivatives of f().

The software user supplies three things: a subroutine that computes
f(X), an initial "starting guess" of the minimum point X, and values
for the algorithm convergence parameters. Then the program searches
for a local minimum, beginning from the starting guess, using the
Direct Search algorithm of Hooke and Jeeves.

The original C program is adapted from the Algol pseudocode found in
"Algorithm 178: Direct Search" by Arthur F. Kaupe Jr.,
Communications of the ACM, Vol. 6, p. 313 (June 1963). It includes
the improvements suggested by Bell and Pike (CACM v.9, p. 684, Sept
1966) and those of Tomlin and Smith, "Remark on Algorithm 178" (CACM
v.12). The original paper, which I don't recommend as highly as the
one by A. Kaupe, is: R. Hooke and T. A. Jeeves, "Direct Search
Solution of Numerical and Statistical Problems", Journal of the ACM,
Vol. 8, April 1961, pp. 212-229.

Calling sequence:
    int hooke.DoSearch( netI )

    netI    {netInterface} netI manages all netcommunication.
            netInterface uses the class netCommunication to
            send/receive data. The class processManager is used
            to dispatch processes to be used by netInterface.
            It uses the class netDataControl to store data and
            keep track of information about status of data
            concerning netcommunication. It can use the class
            dataConverter to prepare data before sending and the
            class dataScaler to scale/unscale data after/before
            sending.

As mentioned above, values for the algorithm convergence parameters
are kept in the input files mainconstants and hjconstants. These
parameters are:

    rho         {a double} This is a user-supplied convergence
                parameter (more detail below), which should be
                set to a value between 0.0 and 1.0. Larger values
                of rho give greater probability of convergence on
                highly nonlinear functions, at a cost of more
                function evaluations. Smaller values of rho reduce
                the number of evaluations (and the program running
                time), but increase the risk of nonconvergence.
                See below.
    epsilon     {a double} This is the criterion for halting the
                search for a minimum. When the algorithm begins
                to make less and less progress on each iteration,
                it checks the halting criterion: if the stepsize
                is below epsilon, terminate the iteration and
                return the current best estimate of the minimum.
                Larger values of epsilon (such as 1.0e-4) give
                quicker running time, but a less accurate estimate
                of the minimum. Smaller values of epsilon (such as
                1.0e-7) give longer running time, but a more
                accurate estimate of the minimum.

    MAXITER     {an integer} A second halting criterion. If the
                algorithm uses more function evaluations than
                MAXITER, terminate the iteration and return the
                current best estimate of the minimum.
    HOOKE_ITER  {an integer} Total function evaluations in all runs
                of the algorithm. If this value is greater than
                MAXITER, repeated runs of the algorithm are done.
                Each run uses at most MAXITER function evaluations
                and the starting point each time is the current
                best point.
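
(As a purely illustrative example: with MAXITER = 1000 and
HOOKE_ITER = 4000 the search is restarted up to four times, each
restart using at most 1000 function evaluations and starting from
the best point found so far.)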

The user supplied objective function f(x) has to support pvm.
Its argument is x, an array of doubles giving the point at which
f(x) should be calculated, and it returns a double. All
information about the objective function is a part of netI.
The following functions from netI are used:
    startNewDataGroup(s)        Creates a new data group of size s.
                                This group keeps all points sent to
                                the objective function as arguments,
                                which function value they have and
                                an ID for them.
    stopUsingDataGroup()        Destructor for data group.
    getNumOfVarsInDataGroup()   Returns the number of coordinates in
                                the argument for the objective
                                function.
    getTotalNumProc()           Tells how many processors are
                                available.
    getNumFreeProcesses()       Returns number of available free processes.
    sendToAllIdleHosts()        Sends jobs to all processors
                                available. That is, sends points to
                                all available processors, for
                                evaluating the function values.
    getNumNotAns()              Number of all points that have been
                                sent whose function value has
                                not yet been received.
    sendOne()                   Sends one point to the objective
                                function.
    send_receiveAllData()       Sends all uncalculated points in
                                the data group to processors and
                                receives their function values as
                                well.
    receiveOne()                Receives one function value.
    receiveAll()                Receives all function values not yet
                                received.
    getRecievedID()             Gets the ID of the point whose
                                function value was received last.
    getY( returnID )            Returns the function value of the
                                point with ID returnID.
    getX( returnID )            Returns the point with ID returnID.
    getInitialX()               Returns the starting point, that is
                                the best point known before the
                                algorithm starts.
    getUpperScaleConstant()     For getting the upper bounds.
    getLowerScaleConstant()     For getting the lower bounds.
    setX(x)                     Sets x as the next point in the data
                                group.
    setBestX(x)                 Sets x as the point with the best
                                value found so far.

rho, the algorithm convergence control:
The algorithm works by taking "steps" from one estimate of a
minimum to another (hopefully better) estimate. Taking big steps
gets to the minimum more quickly, at the risk of "stepping right
over" an excellent point. The stepsize is controlled by a user
supplied parameter called rho. At each iteration, the stepsize is
multiplied by rho (0 < rho < 1), so the stepsize is successively
reduced.
    Small values of rho correspond to big stepsize changes,
which make the algorithm run more quickly. However, there is a
chance (especially with highly nonlinear functions) that these big
changes will accidentally overlook a promising search vector,
leading to nonconvergence.
    Large values of rho correspond to small stepsize changes,
which force the algorithm to carefully examine nearby points instead
of optimistically forging ahead. This improves the probability of
convergence.
    The stepsize is reduced until it is equal to (or smaller
than) epsilon. So the number of iterations performed by Hooke-Jeeves
is determined by rho and epsilon:

    rho**(number_of_iterations) = epsilon

In general it is a good idea to set rho to an aggressively
small value like 0.5 (hoping for fast convergence). Then, if the
user suspects that the reported minimum is incorrect (or perhaps
not accurate enough), the program can be run again with a larger
value of rho such as 0.85, using the result of the first
minimization as the starting guess to begin the second minimization.
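
To put rough numbers on the relation above (values chosen only for
illustration): with rho = 0.5 and epsilon = 1.0e-6 the stepsize is
reduced about 20 times, since 0.5^20 is approximately 1.0e-6, while
reaching the same epsilon with rho = 0.85 takes roughly 85 stepsize
reductions.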

The parameters of the point are randomly ordered now and then, to
avoid the order of them having an influence on which changes are
accepted and which are not. As this means that no two runs are
exactly the same (even though all parameters and constants are
unchanged), running the program at least twice on every point might
be a good idea.

Normal use: (1) Code your function f() in the C language and connect
                it to pvm.
            (2) Install your starting guess {or read it in}.
            (3) Run the program.
            (4) {for the skeptical}: Use the computed minimum
                as the starting point for another run (set the value
                of HOOKE_ITER equal to 2*MAXITER).

Data Fitting:
    Code your function f() to be the sum of the squares of the errors
    (differences) between the computed values and the measured values.
    Then minimize f() using Hooke-Jeeves.
    EXAMPLE: you have 20 datapoints (ti, yi) and you want to
    find A,B,C such that (A*t*t) + (B*exp(t)) + (C*tan(t))
    fits the data as closely as possible. Then f() is just
        f(x) = SUM (measured_y[i] - ((A*t[i]*t[i]) + (B*exp(t[i]))
                    + (C*tan(t[i]))))^2
    where x[] is a 3-vector consisting of {A, B, C}.
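
A minimal sketch of such an objective function, written as a plain
C function (the data arrays t[] and measured_y[] and the hook-up to
pvm/netInterface are assumed to exist elsewhere and are not shown):

    #include <math.h>

    #define NDATA 20

    extern double t[NDATA];            /* measurement times, assumed given */
    extern double measured_y[NDATA];   /* measured values, assumed given   */

    /* x[0] = A, x[1] = B, x[2] = C */
    double f(double* x)
    {
        double sum = 0.0;
        int i;
        for (i = 0; i < NDATA; i++) {
            double model = x[0] * t[i] * t[i]
                         + x[1] * exp(t[i])
                         + x[2] * tan(t[i]);
            double err = measured_y[i] - model;
            sum += err * err;          /* sum of squared errors */
        }
        return sum;
    }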

The author of this software is M.G. Johnson.
Permission to use, copy, modify, and distribute this software for
any purpose without fee is hereby granted, provided that this entire
notice is included in all copies of any software which is or
includes a copy or modification of this software and in all copies
of the supporting documentation for such software. THIS SOFTWARE IS
BEING PROVIDED "AS IS", WITHOUT ANY EXPRESS OR IMPLIED WARRANTY. IN
PARTICULAR, NEITHER THE AUTHOR NOR AT&T MAKE ANY REPRESENTATION OR
WARRANTY OF ANY KIND CONCERNING THE MERCHANTABILITY OF THIS SOFTWARE
OR ITS FITNESS FOR ANY PARTICULAR PURPOSE.


2) Simulated Annealing

ABSTRACT:

Simulated annealing is a global optimization method that
distinguishes between different local optima. Starting from an initial
point, the algorithm takes a step and the function is evaluated.
When minimizing a function, any downhill step is accepted and the
process repeats from this new point. An uphill step may be accepted
(thus, it can escape from local optima). This uphill decision is
made by the Metropolis criterion. As the optimization process
proceeds, the length of the steps declines and the algorithm closes
in on the global optimum. Since the algorithm makes very few
assumptions regarding the function to be optimized, it is quite
robust with respect to non-quadratic surfaces. The degree of
robustness can be adjusted by the user. In fact, simulated annealing
can be used as a local optimizer for difficult functions.

Author
    h2zr1001@vm.cis.smu.edu

Changes for parallel use
    Kristjana Yr Jonsdottir
    Thordis Linda Thorarinsdottir


This code is a pvm version of a translation of a fortran code,
which is an example of the Corana et al. simulated annealing
algorithm for multimodal and robust optimization as implemented and
modified by Goffe et al.

Use the sample function from Judge with the following suggestions
to get a feel for how SA works. When you've done this, you should be
ready to use it on most any function with a fair amount of expertise.
1. Run the program as is to make sure it runs okay. Add some print
   functions and see how it optimizes as temperature (T) falls.
   Notice how the optimal point is reached and how falling T reduces
   VM.
2. Look through the documentation to SA so the following makes a
   bit of sense. In line with the paper, it shouldn't be that hard
   to figure out. The core of the algorithm is described on pp. 4-6
   and on pp. 28. Also see Corana et al. pp. 264-9.
3. To see the importance of different temperatures, try starting
   with a very low one (say T = 10E-5). You'll see (i) it never
   escapes from the local optima (in annealing terminology, it
   quenches) & (ii) the step length (VM) will be quite small. This
   is a key part of the algorithm: as temperature (T) falls, step
   length does too. In a minor point here, note how VM is quickly
   reset from its initial value. Thus, the input VM is not very
   important. This is all the more reason to examine VM once the
   algorithm is underway.
4. To see the effect of different parameters and their effect on
   the speed of the algorithm, try RT = .95 & RT = .1. Notice the
   vastly different speed for optimization. Also try NT = 20. Note
   that this sample function is quite easy to optimize, so it will
   tolerate big changes in these parameters. RT and NT are the
   parameters one should adjust to modify the runtime of the
   algorithm and its robustness.
5. Try constraining the algorithm with either LB or UB.

Synopsis:
This routine implements the continuous simulated annealing global
optimization algorithm described in Corana et al.'s article
"Minimizing Multimodal Functions of Continuous Variables with the
"Simulated Annealing" Algorithm" in the September 1987 (vol. 13,
no. 3, pp. 262-280) issue of the ACM Transactions on Mathematical
Software.

A very quick (perhaps too quick) overview of SA:

SA tries to find the global optimum of an N dimensional function.
It moves both up and downhill and as the optimization process
proceeds, it focuses on the most promising area.

To start, it randomly changes one parameter at a time within the
step length VM (a vector of length N) of the user selected starting
point. The function is evaluated at these trial points and its
value is compared to its value at the initial point.

In a maximization problem, all uphill moves are accepted and the
algorithm continues from that trial point. Downhill moves may be
accepted; the decision is made by the Metropolis criterion. It uses
T (temperature) and the size of the downhill move in a probabilistic
manner. The bigger T and the smaller the size of the downhill move are,
the more likely that move will be accepted. If the trial is
accepted, the algorithm moves on from that point. If it is rejected,
another point is chosen instead for a trial evaluation.
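
A minimal sketch of this Metropolis test, written for the
maximization form used above (illustrative only, not the code used
inside paramin; for minimization the sign of the difference is
flipped):

    #include <math.h>
    #include <stdlib.h>

    /* Accept an uphill move always; accept a downhill move with
       probability exp((f_trial - f_current)/T), which shrinks as the
       move gets worse or as the temperature T falls. */
    int metropolis_accept(double f_current, double f_trial, double T)
    {
        double p, u;
        if (f_trial >= f_current)
            return 1;                       /* uphill: always accept     */
        p = exp((f_trial - f_current) / T); /* downhill: p is in (0, 1)  */
        u = (double)rand() / RAND_MAX;      /* uniform draw in [0, 1]    */
        return u < p;                       /* accept with probability p */
    }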

Each element of VM is periodically adjusted so that half of all
function evaluations in that direction are accepted.

A fall in T is imposed upon the system with the RT variable by
T(i+1) = RT*T(i) where i is the ith iteration. Thus, as T declines,
downhill moves are less likely to be accepted and the percentage of
rejections rises. Given the scheme for the selection of VM, VM falls.
Thus, as T declines, VM falls and SA focuses upon the most promising
area for optimization.
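
(As a purely numerical illustration of this schedule: with T = 5.0
and RT = 0.85, ten temperature reductions bring T down to about
5.0 * 0.85^10, i.e. roughly 0.98, and after fifty reductions T is
below 0.002.)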

The parameters of the point are randomly ordered now and then, to
avoid the order of them having an influence on which changes are
accepted and which are not. As this means that no two runs are
exactly the same (even though all parameters and constants are
unchanged), running the program at least twice on every point might
be a good idea.

The importance of the parameter T:

The parameter T is crucial in using SA successfully. It influences
VM, the step length over which the algorithm searches for optima. For
a small initial T, the step length may be too small; thus not enough
of the function might be evaluated to find the global optima. The
user should carefully examine VM to make sure that it is appropriate.
The relationship between the initial temperature and the resulting
step length is function dependent.

To determine the starting temperature that is consistent with
optimizing a function, it is worthwhile to run a trial run first.
Set RT = 1.5 and T = 1.0. With RT > 1.0, the temperature increases
and VM rises as well. Then select the T that produces a large enough
VM.

For modifications to the algorithm and many details on its use
(particularly for econometric applications) see Goffe, Ferrier and
Rogers, "Global Optimization of Statistical Functions with
Simulated Annealing," Journal of Econometrics (forthcoming). For a
pre-publication copy, contact
    Bill Goffe
    Department of Economics
    Southern Methodist University
    Dallas, TX 75275
    h2zr1001 @ smuvm1 (Bitnet)
    h2zr1001 @ vm.cis.smu.edu (Internet)

As far as possible, the parameters here have the same name as in the
description of the algorithm on pp. 266-8 of Corana et al.

Input Parameters:

Note: The suggested values generally come from Corana et al. To
drastically reduce runtime, see Goffe et al., pp. 17-8 for
suggestions on choosing the appropriate RT and NT.


Input Parameters for class Simann:

    netInt      (netInterface) netInt manages all netcommunication.
                netInterface uses the class netCommunication to
                send/receive data. The class processManager is used
                to dispatch processes to be used by netInterface.
                It uses the class netDataControl to store data and
                keep track of information about status of data
                concerning netcommunication. It can use the class
                dataConverter to prepare data before sending and the
                class dataScaler to scale/unscale data after/before
                sending.

    MAXIM       Denotes whether the function should be maximized or
                minimized. A true value denotes maximization while a
                false value denotes minimization.

    SIM_ITER    The maximum number of function evaluations. (INT)

    TEMPERATURE The initial temperature. See Goffe et al. for
                advice. (DP)

    c           Vector that controls the step length adjustment. The
                suggested value for all elements is 2.0. (DP(N))

    vm          The step length vector. On input it should encompass
                the region of interest given the starting value X.
                For point X(I), the next trial point is selected
                from X(I) - VM(I) to X(I) + VM(I). Since VM is
                adjusted so that about half of all points are
                accepted, the input value is not very important
                (i.e., if the value is off, SA adjusts VM to the
                correct value). (DP(N))

Other input parameters from mainconstants:

    C_COMPONENT  Size of each component of the step length
                 adjustment, that is the value of the components of c.

    VM_COMPONENT The initial value of the components of vm (as the
                 initial value is not considered very important, all
                 components have the same initial value).

Input parameters from simconstants:

    RT      The temperature reduction factor. The value suggested by
            Corana et al. is .85. See Goffe et al. for more advice. (DP)

    EPS     Error tolerance for termination. If the final function
            values from the last neps temperatures differ from the
            corresponding value at the current temperature by less than
            EPS and the final function value at the current temperature
            differs from the current optimal function value by less than
            EPS, execution terminates and IER = 0 is returned. (DP)

    NS      Number of cycles. After NS*N function evaluations, each
            element of VM is adjusted so that approximately half of
            all function evaluations are accepted. The suggested value
            is 20. (INT)

    NT      Number of iterations before temperature reduction. After
            NT*NS*N function evaluations, temperature (T) is changed
            by the factor RT. Value suggested by Corana et al. is
            MAX(100, 5*N). See Goffe et al. for further advice. (INT)

    NEPS    Number of final function values used to decide upon
            termination. See EPS. Suggested value is 4. (INT)

The upper and lower bounds are part of netInt. If the algorithm
chooses X(I) .LT. LB(I) or X(I) .GT. UB(I), I = 1, N, a point from
inside is randomly selected. This focuses the algorithm on the
region inside UB and LB. Unless the user wishes to concentrate the
search on a particular region, UB and LB should be set to very large
positive and negative values, respectively. Note that the starting
vector X should be inside this region. Also note that LB and UB are
fixed in position, while VM is centered on the last accepted trial
set of variables that optimizes the function.

Parameters within the algorithm that can be used to follow how the
algorithm is working:

    x       The variables that optimize the function at each time.
            (DP(N))

    fopt    The optimal value of the function at each time. (DP)

    nacc    The number of accepted function evaluations. (INT)

    nnew    The number of times the algorithm finds a new optimum. (INT)

    nup     The number of function values that are better than the last
            one. (INT)

    ndown   The number of accepted function values that are worse than
            the last one. (INT)

    nrej    The number of not accepted function values. (INT)

    nfcnev  The total number of function evaluations. In a minor
            point, note that the first evaluation is not used in the
            core of the algorithm; it simply initializes the
            algorithm. (INT)

    nobds   The total number of trial function evaluations that
            would have been out of bounds of LB and UB. Note that
            a trial point is randomly selected between LB and UB. (INT)

The user supplied objective function f(x) has to support pvm.
Its argument is x, an array of doubles giving the point at which
f(x) should be calculated, and it returns a double. All
information about the objective function is a part of netInt.
The following functions from netInt are used:
    startNewDataGroup(s)        Creates a new data group of size s.
                                This group keeps all points sent to
                                the objective function as arguments,
                                which function value they have and
                                an ID for them.
    stopUsingDataGroup()        Destructor for data group.
    getNumOfVarsInDataGroup()   Returns the number of coordinates in
                                the argument for the objective
                                function.
    getTotalNumProc()           Tells how many processors are
                                available.
    sendToAllIdleHosts()        Sends jobs to all processors
                                available. That is, sends points to
                                all available processors, for
                                evaluating the function values.
    getNumNotAns()              Number of all points that have been
                                sent whose function value has
                                not yet been received.
    send_receiveAllData()       Sends all uncalculated points in
                                the data group to processors and
                                receives their function values as
                                well.
    receiveOne()                Receives one function value.
    receiveAll()                Receives all function values not yet
                                received.
    getRecievedID()             Gets the ID of the point whose
                                function value was received last.
    getY( returnID )            Returns the function value of the
                                point with ID returnID.
    getX( returnID )            Returns the point with ID returnID.
    getInitialX()               Returns the starting point, that is
                                the best point known before the
                                algorithm starts.
    getUpperScaleConstant()     For getting the upper bounds.
    getLowerScaleConstant()     For getting the lower bounds.
    setX(x)                     Sets x as the next point in the data
                                group.
    setBestX(x)                 Sets x as the point with the best
                                value found so far.
    unscaleX(x)                 If working with scaled points, this
                                returns the point x unscaled,
                                otherwise no change.


3) BFGS

ABSTRACT:
The BFGS method is a gradient iterative method that works as follows:
We start at some point and successively generate new points, so that
the function value decreases in each iteration. This is done by
choosing a direction that makes an angle greater than 90 degrees
with the gradient of the function.

Authors
    Gunnar Stefansson
    Kristjana Yr Jonsdottir
    Thordis Linda Thorarinsdottir

The BFGS method is one of the Quasi-Newton methods. Quasi-Newton
methods are iterative methods which approximate Newton's method
without calculating second derivatives. They are of the form
    x^(k+1) = x^k + a^k * d^k,
    d^k     = -D^k \nabla f( x^k ),
where D^k is a positive definite matrix, updated using the BFGS
updating formula. To find the value of a, an inexact linesearch
called the Armijo rule is used. That method finds an a such that the
Armijo condition is satisfied, which is based on getting a
function value that is some amount better than the best known
function value. This is done by trying at first an initial
stepsize and then reducing the stepsize until a value that
satisfies the Armijo condition is found. For details, see D.P.
Bertsekas, "Nonlinear Programming", Athena Scientific, Belmont,
1999. As the objective function is not necessarily differentiable,
the gradient is calculated numerically.
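
A minimal sketch of the Armijo backtracking rule (illustrative only;
the actual linesearch in paramin is more involved and runs through
netInt). Starting from the initial stepsize s, the step is reduced by
the factor beta until f(x) - f(x + a*d) >= -sigma * a * (grad f(x) . d)
holds, d being a descent direction:

    /* f    - objective function, called as f(point, dimension)
       x    - current point, d - descent direction, grad - gradient at x
       n    - number of variables
       s    - initial stepsize, beta, sigma - Armijo constants
       xnew - work/output array of length n for the trial point */
    double armijo_step(double (*f)(const double*, int),
                       const double* x, const double* d, const double* grad,
                       int n, double s, double beta, double sigma,
                       double* xnew)
    {
        double fx = f(x, n);
        double slope = 0.0;     /* grad f(x) . d, negative for descent */
        double a = s;
        int i, k;

        for (i = 0; i < n; i++)
            slope += grad[i] * d[i];

        for (k = 0; k < 50; k++) {              /* cap the number of trials  */
            for (i = 0; i < n; i++)
                xnew[i] = x[i] + a * d[i];
            if (fx - f(xnew, n) >= -sigma * a * slope)
                return a;                       /* Armijo condition holds    */
            a *= beta;                          /* otherwise shrink the step */
        }
        return a;                               /* give up after 50 trials   */
    }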

Input parameters, from file bfgsconstants:

    BFGSPAR         1 for using the BFGS method and 0 for using the steepest
                    descent method. It is sometimes good to use the latter one
                    for a few iterations if the point is stuck.

    MAXBFGSITER     Maximum number of iterations for each round of the
                    minimization.

    MAXROUNDS       Maximum number of rounds made.

    ERRORTOL        Error tolerance for the termination criteria.

    XTOL            Error tolerance for the point.

    SHANNONSCALING  1 if the direction vector shall be scaled, otherwise 0. If 1,
                    then Shannon Scaling is used to scale the direction vector.

    DIFFICULTGRAD   Denotes how good the approximation of the gradient should be.
                    If = 0, a first degree polynomial is used. If = 1, the
                    approximation is parabolic and if >= 2, a polynomial of degree
                    four is used. If the line search gives no results, the
                    difficultgrad is increased.

    BETA            beta constant for the Armijo condition in the linesearch. See
                    D.P. Bertsekas, "Nonlinear Programming", Athena Scientific,
                    Belmont, 1999, page 29.

    SIGMA           sigma constant for the Armijo condition in the linesearch. See
                    D.P. Bertsekas, "Nonlinear Programming", Athena Scientific,
                    Belmont, 1999, page 29.

    PRINTING        1 for printing on, and 0 for printing off. If printing is on,
                    the last gradient calculated is printed out to the file
                    "gradient", the last Hessian matrix is printed out to the file
                    "hessian" and the best point is printed out to the file
                    "finalvals".
The user supplied objective function f(x) has to support pvm.
Its argument is x, an array of doubles giving the point at which
f(x) should be calculated, and it returns a double. All
information about the objective function is a part of netInt.
The following functions from netInt are used:
    startNewDataGroup(s)          Creates a new data group of size s.
                                  This group keeps all points sent to
                                  the objective function as arguments,
                                  which function value they have and
                                  an ID for them.
    stopUsingDataGroup()          Destructor for data group.
    getNumOfVarsInDataGroup()     Returns the number of coordinates in
                                  the argument for the objective
                                  function.
    getNumDataItemSet()           Number of data items set in data group.
    getNumDataItemsAnswered()     Number of data items set which have
                                  also received an answer.
    dataGroupFull()               Returns 1 if not able to add more data to
                                  the current datagroup, else returns 0.
    getTotalNumProc()             Tells how many processors are
                                  available.
    getNumFreeProcesses()         Returns number of available free processes.
    getNumNotAns()                Number of all points that have been
                                  sent whose function value has
                                  not yet been received.
    send_receiveAllData()         Sends all uncalculated points in
                                  the data group to processors and
                                  receives their function values as
                                  well.
    send_receive_setData(c)       Sends, receives and sets new data until
                                  condition c is met. Returns ERROR if error,
                                  or 1 if condition was met.
    send_receiveTillCondition(c)  Sends and receives data until condition c is
                                  met, or there is no more available data in
                                  the data group. Returns ERROR if error, 1 if
                                  condition was met and 0 if condition not met
                                  and all data belonging to the datagroup has
                                  been sent and received.
    getRecievedID()               Gets the ID of the point whose
                                  function value was received last.
    getY( returnID )              Returns the function value of the
                                  point with ID returnID.
    getNextAnswerY()              Returns the function value in the last data
                                  pair found while looking for a datapair with
                                  an answer.
    getX( returnID )              Returns the point with ID returnID.
    getNextAnswerX()              Returns the next x found in the datagroup which
                                  has an answer, since the last check for an
                                  answer. Given (x0, fx0), (x1, ), (x2, fx2),
                                  ...., (xn, fxn): if the datapair (x0, fx0) was
                                  last checked for an answer then x will equal x2
                                  and the next datapair that will be looked at for
                                  an answer will be (x3, fx3), unless the search
                                  is reset to start at the beginning.
    getInitialX()                 Returns the starting point, that is
                                  the best point known before the
                                  algorithm starts.
    getUpperScaleConstant()       For getting the upper bounds.
    getLowerScaleConstant()       For getting the lower bounds.
    setDataPair(x, f)             Adds the datapair (x, f) to the datagroup. x
                                  will not be sent for processing through the net
                                  as f is stored as its answer.
    setX(x)                       Sets x as the next point in the data
                                  group.
    setXFirstToSend(x)            Adds x to the datagroup and puts it first in the
                                  queue for sending.
    setFirstAnsweredData()        Initiates a datapointer to the data item which
                                  was first set in the data group.
    setBestX(x)                   Sets x as the point with the best
                                  value found so far.
    unscaleX(x)                   If working with scaled points, this
                                  returns the point x unscaled,
                                  otherwise no change.
    stopNetComm()                 Halts all netcommunication; netInt cannot
                                  send/receive any more.