/trunk/paramin-beta/readme.paramin (revision 1)
**this really really needs updating to match the recent changes**

This program 'paramin' consists of three minimization techniques
and can only be used with PVM.

The techniques are Simulated Annealing, the Hooke & Jeeves algorithm
and the quasi-Newton algorithm BFGS. Simulated Annealing is a
global minimization technique and the other two are local. The
quasi-Newton method uses derivatives.

There are five input files for 'paramin'.

mainconstants: Constants that involve the program itself.

bfgsconstants: Constants that involve the BFGS technique.
simconstants:  Constants that involve Simulated Annealing.
hjconstants:   Constants that involve the Hooke & Jeeves technique.

initvals:      Contains four columns of numbers: the initial point,
               the lower bounds, the upper bounds and whether each
               parameter shall be optimized or not - 1 for
               optimizing and 0 for staying unchanged.
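For illustration, an initvals file for three parameters might look like the sketch below (the four columns follow the description above: initial value, lower bound, upper bound, optimize flag; the whitespace-separated layout and the numbers themselves are assumptions, not taken from an actual file):

```
0.5     0.0     1.0    1
2.0     0.1    10.0    1
1.0     1.0     1.0    0
```

Here the first two parameters would be optimized within their bounds, while the third stays fixed at 1.0.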

To minimize the objective function, write:

    paramin 0 FunctionName

(or 'paramin 0 gadget -s -n' for running gadget)

If the program uses BFGS and the minimization converges, the
program gives the constant bfgsfail the value 0. It writes out
the minimum to the file 'finalvals', the gradient of the objective
function to the file 'gradient' and the approximated inverse
Hessian matrix of the objective function to the file 'hessian'.

If the program does not use BFGS, the best point received will be
printed to the screen along with its function value.

To choose which minimization algorithms to use, give the parameters
SimulatedAnnealing, HookeAndJeeves and BFGS in mainconstants the
value 0 or 1; 0 for not in use, 1 for in use.

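A hypothetical mainconstants fragment selecting Simulated Annealing followed by Hooke & Jeeves, with BFGS disabled, might read as follows (the name-value layout is an assumption; only the parameter names come from the text above):

```
SimulatedAnnealing  1
HookeAndJeeves      1
BFGS                0
```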
About the minimization algorithms:


1) Hooke and Jeeves:

Nonlinear optimization using the algorithm of Hooke and Jeeves
12 February 1994
author: Mark G. Johnson
August 1999
changes for parallel use: Kristjana Yr Jonsdottir and
Thordis Linda Thorarinsdottir

Find a point X where the nonlinear function f(X) has a local
minimum. X is an n-vector and f(X) is a scalar. In mathematical
notation f: R^n -> R^1. The objective function f() is not
required to be continuous, nor does f() need to be differentiable.
The program does not use or require derivatives of f().

The software user supplies three things: a subroutine that computes
f(X), an initial "starting guess" of the minimum point X, and values
for the algorithm convergence parameters. Then the program searches
for a local minimum, beginning from the starting guess, using the
Direct Search algorithm of Hooke and Jeeves.

The original C program is adapted from the Algol pseudocode found in
"Algorithm 178: Direct Search" by Arthur F. Kaupe Jr.,
Communications of the ACM, Vol. 6, p. 313 (June 1963). It includes
the improvements suggested by Bell and Pike (CACM v.9, p. 684, Sept.
1966) and those of Tomlin and Smith, "Remark on Algorithm 178" (CACM
v.12). The original paper, which I don't recommend as highly as the
one by A. Kaupe, is: R. Hooke and T. A. Jeeves, "Direct Search
Solution of Numerical and Statistical Problems", Journal of the ACM,
Vol. 8, April 1961, pp. 212-229.

Calling sequence:
    int hooke.DoSearch( netI )

netI {netInterface}  netI manages all net communication.
                     netInterface uses the class netCommunication to
                     send/receive data. The class processManager is used
                     to dispatch processes to be used by netInterface.
                     It uses the class netDataControl to store data and
                     keep track of status information about the data
                     involved in net communication. It can use the class
                     dataConverter to prepare data before sending and the
                     class dataScaler to scale/unscale data after/before
                     sending.

As mentioned above, values for the algorithm convergence parameters
are kept in the input files mainconstants and hjconstants. These
parameters are:

rho        {a double}   This is a user-supplied convergence
                        parameter (more detail below), which should be
                        set to a value between 0.0 and 1.0. Larger values
                        of rho give a greater probability of convergence on
                        highly nonlinear functions, at a cost of more
                        function evaluations. Smaller values of rho reduce
                        the number of evaluations (and the program running
                        time), but increase the risk of nonconvergence.
                        See below.
epsilon    {a double}   This is the criterion for halting the
                        search for a minimum. When the algorithm begins
                        to make less and less progress on each iteration,
                        it checks the halting criterion: if the stepsize
                        is below epsilon, terminate the iteration and
                        return the current best estimate of the minimum.
                        Larger values of epsilon (such as 1.0e-4) give
                        quicker running time, but a less accurate estimate
                        of the minimum. Smaller values of epsilon (such as
                        1.0e-7) give longer running time, but a more
                        accurate estimate of the minimum.

MAXITER    {an integer} A second halting criterion. If the
                        algorithm uses more function evaluations than
                        MAXITER, terminate the iteration and return the
                        current best estimate of the minimum.
HOOKE_ITER {an integer} Total function evaluations in all runs
                        of the algorithm. If this value is greater than
                        MAXITER, repeated runs of the algorithm are done.
                        Each run uses at most MAXITER function evaluations
                        and the starting point each time is the current
                        best point.

The user-supplied objective function f(x) has to support PVM.
Its argument is x - an array of doubles, the point at which
f(x) should be calculated - and it returns a double. All
information about the objective function is part of netI.
The following functions from netI are used:
    startNewDataGroup(s)        Creates a new data group of size s.
                                This group keeps all points sent to
                                the objective function as arguments,
                                the function value they have and
                                an ID for them.
    stopUsingDataGroup()        Destructor for the data group.
    getNumOfVarsInDataGroup()   Returns the number of coordinates in
                                the argument for the objective
                                function.
    getTotalNumProc()           Tells how many processors are
                                available.
    getNumFreeProcesses()       Returns the number of available free
                                processes.
    sendToAllIdleHosts()        Sends jobs to all processors
                                available. That is, sends points to
                                all available processors, for
                                evaluating the function values.
    getNumNotAns()              Number of all points that have been
                                sent whose function value has not
                                yet been received.
    sendOne()                   Sends one point to the objective
                                function.
    send_receiveAllData()       Sends all uncalculated points in
                                the data group to processors and
                                receives their function values as
                                well.
    receiveOne()                Receives one function value.
    receiveAll()                Receives all function values not yet
                                received.
    getRecievedID()             Gets the ID of the point whose
                                function value was received last.
    getY( returnID )            Returns the function value of the
                                point with ID returnID.
    getX( returnID )            Returns the point with ID returnID.
    getInitialX()               Returns the starting point, that is,
                                the best point known before the
                                algorithm starts.
    getUpperScaleConstant()     For getting the upper bounds.
    getLowerScaleConstant()     For getting the lower bounds.
    setX(x)                     Sets x as the next point in the data
                                group.
    setBestX(x)                 Sets x as the point with the best
                                value found so far.

rho, the algorithm convergence control:
The algorithm works by taking "steps" from one estimate of a
minimum to another (hopefully better) estimate. Taking big steps
gets to the minimum more quickly, at the risk of "stepping right
over" an excellent point. The stepsize is controlled by a
user-supplied parameter called rho. At each iteration, the stepsize is
multiplied by rho (0 < rho < 1), so the stepsize is successively
reduced.
    Small values of rho correspond to big stepsize changes,
which make the algorithm run more quickly. However, there is a
chance (especially with highly nonlinear functions) that these big
changes will accidentally overlook a promising search vector,
leading to nonconvergence.
    Large values of rho correspond to small stepsize changes,
which force the algorithm to carefully examine nearby points instead
of optimistically forging ahead. This improves the probability of
convergence.
    The stepsize is reduced until it is equal to (or smaller
than) epsilon. So the number of iterations performed by Hooke-Jeeves
is determined by rho and epsilon:
    rho**(number_of_iterations) = epsilon
    In general it is a good idea to set rho to an aggressively
small value like 0.5 (hoping for fast convergence). Then, if the
user suspects that the reported minimum is incorrect (or perhaps
not accurate enough), the program can be run again with a larger
value of rho such as 0.85, using the result of the first
minimization as the starting guess to begin the second minimization.

The parameters of the point are randomly reordered now and then, to
avoid their order having an influence on which changes are
accepted and which are not. As this means that no two runs are
exactly the same (even though all parameters and constants are
unchanged), running the program at least twice on every point might
be a good idea.

Normal use: (1) Code your function f() in the C language and connect
                it to PVM.
            (2) Install your starting guess {or read it in}.
            (3) Run the program.
            (4) {for the skeptical}: Use the computed minimum
                as the starting point for another run (set the value
                of HOOKE_ITER equal to 2*MAXITER).

Data fitting:
Code your function f() to be the sum of the squares of the errors
(differences) between the computed values and the measured values.
Then minimize f() using Hooke-Jeeves.
    EXAMPLE: you have 20 datapoints (ti, yi) and you want to
    find A, B, C such that (A*t*t) + (B*exp(t)) + (C*tan(t))
    fits the data as closely as possible. Then f() is just
        f(x) = SUM (measured_y[i] - ((A*t[i]*t[i]) + (B*exp(t[i]))
                    + (C*tan(t[i]))))^2
    where x[] is a 3-vector consisting of {A, B, C}.

The author of this software is M.G. Johnson.
Permission to use, copy, modify, and distribute this software for
any purpose without fee is hereby granted, provided that this entire
notice is included in all copies of any software which is or
includes a copy or modification of this software and in all copies
of the supporting documentation for such software. THIS SOFTWARE IS
BEING PROVIDED "AS IS", WITHOUT ANY EXPRESS OR IMPLIED WARRANTY. IN
PARTICULAR, NEITHER THE AUTHOR NOR AT&T MAKE ANY REPRESENTATION OR
WARRANTY OF ANY KIND CONCERNING THE MERCHANTABILITY OF THIS SOFTWARE
OR ITS FITNESS FOR ANY PARTICULAR PURPOSE.


2) Simulated Annealing

ABSTRACT:

Simulated annealing is a global optimization method that
distinguishes between different local optima. Starting from an initial
point, the algorithm takes a step and the function is evaluated.
When minimizing a function, any downhill step is accepted and the
process repeats from this new point. An uphill step may be accepted
(thus, it can escape from local optima). This uphill decision is
made by the Metropolis criterion. As the optimization process
proceeds, the length of the steps declines and the algorithm closes
in on the global optimum. Since the algorithm makes very few
assumptions about the function to be optimized, it is quite
robust with respect to non-quadratic surfaces. The degree of
robustness can be adjusted by the user. In fact, simulated annealing
can be used as a local optimizer for difficult functions.

Author
    h2zr1001@vm.cis.smu.edu

Changes for parallel use
    Kristjana Yr Jonsdottir
    Thordis Linda Thorarinsdottir


This code is a PVM version of a translation of a Fortran code,
which is an example of the Corana et al. simulated annealing
algorithm for multimodal and robust optimization as implemented and
modified by Goffe et al.

Use the sample function from Judge with the following suggestions
to get a feel for how SA works. When you've done this, you should be
ready to use it on most any function with a fair amount of expertise.
 1. Run the program as is to make sure it runs okay. Add some print
    statements and see how it optimizes as the temperature (T) falls.
    Notice how the optimal point is reached and how falling T reduces
    VM.
 2. Look through the documentation of SA so the following makes a
    bit of sense. In line with the paper, it shouldn't be that hard
    to figure out. The core of the algorithm is described on pp. 4-6
    and on p. 28. Also see Corana et al. pp. 264-9.
 3. To see the importance of different temperatures, try starting
    with a very low one (say T = 10E-5). You'll see (i) it never
    escapes from the local optima (in annealing terminology, it
    quenches) & (ii) the step length (VM) will be quite small. This
    is a key part of the algorithm: as the temperature (T) falls, the
    step length does too. As a minor point here, note how VM is quickly
    reset from its initial value. Thus, the input VM is not very
    important. This is all the more reason to examine VM once the
    algorithm is underway.
 4. To see the effect of different parameters and their effect on
    the speed of the algorithm, try RT = .95 & RT = .1. Notice the
    vastly different speed of optimization. Also try NT = 20. Note
    that this sample function is quite easy to optimize, so it will
    tolerate big changes in these parameters. RT and NT are the
    parameters one should adjust to modify the runtime of the
    algorithm and its robustness.
 5. Try constraining the algorithm with either LB or UB.

Synopsis:
This routine implements the continuous simulated annealing global
optimization algorithm described in Corana et al.'s article
"Minimizing Multimodal Functions of Continuous Variables with the
"Simulated Annealing" Algorithm" in the September 1987 (vol. 13,
no. 3, pp. 262-280) issue of the ACM Transactions on Mathematical
Software.

A very quick (perhaps too quick) overview of SA:

SA tries to find the global optimum of an N-dimensional function.
It moves both uphill and downhill and, as the optimization process
proceeds, it focuses on the most promising area.

To start, it randomly changes one parameter at a time within the
step length VM (a vector of length N) of the user-selected starting
point. The function is evaluated at these trial points and its
value is compared to its value at the initial point.

In a maximization problem, all uphill moves are accepted and the
algorithm continues from that trial point. Downhill moves may be
accepted; the decision is made by the Metropolis criterion. It uses
T (temperature) and the size of the downhill move in a probabilistic
manner. The bigger T and the smaller the size of the downhill move
are, the more likely that move will be accepted. If the trial is
accepted, the algorithm moves on from that point. If it is rejected,
another point is chosen instead for a trial evaluation.

Each element of VM is periodically adjusted so that half of all
function evaluations in that direction are accepted.

A fall in T is imposed upon the system with the RT variable by
T(i+1) = RT*T(i), where i is the ith iteration. Thus, as T declines,
downhill moves are less likely to be accepted and the percentage of
rejections rises. Given the scheme for the selection of VM, VM falls.
Thus, as T declines, VM falls and SA focuses on the most promising
area for optimization.

The parameters of the point are randomly reordered now and then, to
avoid their order having an influence on which changes are
accepted and which are not. As this means that no two runs are
exactly the same (even though all parameters and constants are
unchanged), running the program at least twice on every point might
be a good idea.

The importance of the parameter T:

The parameter T is crucial in using SA successfully. It influences
VM, the step length over which the algorithm searches for optima. For
a small initial T, the step length may be too small; thus not enough
of the function might be evaluated to find the global optima. The
user should carefully examine VM to make sure that it is appropriate.
The relationship between the initial temperature and the resulting
step length is function dependent.

To determine a starting temperature that is consistent with
optimizing a function, it is worthwhile to do a trial run first.
Set RT = 1.5 and T = 1.0. With RT > 1.0, the temperature increases
and VM rises as well. Then select the T that produces a large enough
VM.

For modifications to the algorithm and many details on its use
(particularly for econometric applications) see Goffe, Ferrier and
Rogers, "Global Optimization of Statistical Functions with
Simulated Annealing," Journal of Econometrics (forthcoming). For a
pre-publication copy, contact
    Bill Goffe
    Department of Economics
    Southern Methodist University
    Dallas, TX 75275
    h2zr1001 @ smuvm1 (Bitnet)
    h2zr1001 @ vm.cis.smu.edu (Internet)

As far as possible, the parameters here have the same names as in the
description of the algorithm on pp. 266-8 of Corana et al.

Input Parameters:

Note: The suggested values generally come from Corana et al. To
drastically reduce runtime, see Goffe et al., pp. 17-8 for
suggestions on choosing the appropriate RT and NT.


Input parameters for class Simann:

netInt (netInterface)  netInt manages all net communication.
                       netInterface uses the class netCommunication to
                       send/receive data. The class processManager is used
                       to dispatch processes to be used by netInterface.
                       It uses the class netDataControl to store data and
                       keep track of status information about the data
                       involved in net communication. It can use the class
                       dataConverter to prepare data before sending and the
                       class dataScaler to scale/unscale data after/before
                       sending.

MAXIM         Denotes whether the function should be maximized or
              minimized. A true value denotes maximization while a
              false value denotes minimization.

SIM_ITER      The maximum number of function evaluations. (INT)

TEMPERATURE   The initial temperature. See Goffe et al. for
              advice. (DP)

c             Vector that controls the step length adjustment. The
              suggested value for all elements is 2.0. (DP(N))

vm            The step length vector. On input it should encompass
              the region of interest given the starting value X.
              For point X(I), the next trial point is selected
              from X(I) - VM(I) to X(I) + VM(I). Since VM is
              adjusted so that about half of all points are
              accepted, the input value is not very important
              (i.e. if the value is off, SA adjusts VM to the
              correct value). (DP(N))

Other input parameters from mainconstants:

C_COMPONENT   Size of each component of the step length
              adjustment, that is, the value of the components of c.

VM_COMPONENT  The initial value of the components of vm (as the
              initial value is not considered very important, all
              components have the same initial value).

Input parameters from simconstants:

RT    The temperature reduction factor. The value suggested by
      Corana et al. is .85. See Goffe et al. for more advice. (DP)

EPS   Error tolerance for termination. If the final function
      values from the last NEPS temperatures differ from the
      corresponding value at the current temperature by less than
      EPS and the final function value at the current temperature
      differs from the current optimal function value by less than
      EPS, execution terminates and IER = 0 is returned. (DP)

NS    Number of cycles. After NS*N function evaluations, each
      element of VM is adjusted so that approximately half of
      all function evaluations are accepted. The suggested value
      is 20. (INT)

NT    Number of iterations before temperature reduction. After
      NT*NS*N function evaluations, the temperature (T) is changed
      by the factor RT. The value suggested by Corana et al. is
      MAX(100, 5*N). See Goffe et al. for further advice. (INT)

NEPS  Number of final function values used to decide upon
      termination. See EPS. Suggested value is 4. (INT)

The upper and lower bounds are part of netInt. If the algorithm
chooses X(I) .LT. LB(I) or X(I) .GT. UB(I), I = 1, N, a point from
inside the bounds is randomly selected. This focuses the algorithm on
the region inside UB and LB. Unless the user wishes to concentrate the
search on a particular region, UB and LB should be set to very large
positive and negative values, respectively. Note that the starting
vector X should be inside this region. Also note that LB and UB are
fixed in position, while VM is centered on the last accepted trial
set of variables that optimizes the function.

Parameters within the algorithm that can be used to follow how the
algorithm is working:

x       The variables that optimize the function at each time.
        (DP(N))

fopt    The optimal value of the function at each time. (DP)

nacc    The number of accepted function evaluations. (INT)

nnew    The number of times the algorithm finds a new optimum. (INT)

nup     The number of function values that are better than the last
        one. (INT)

ndown   The number of accepted function values that are worse than
        the last one. (INT)

nrej    The number of function values not accepted. (INT)

nfcnev  The total number of function evaluations. As a minor
        point, note that the first evaluation is not used in the
        core of the algorithm; it simply initializes the
        algorithm. (INT)

nobds   The total number of trial function evaluations that
        would have been out of the bounds of LB and UB. Note that
        a trial point is then randomly selected between LB and UB. (INT)

The user-supplied objective function f(x) has to support PVM.
Its argument is x - an array of doubles, the point at which
f(x) should be calculated - and it returns a double. All
information about the objective function is part of netInt.
The following functions from netInt are used:
    startNewDataGroup(s)        Creates a new data group of size s.
                                This group keeps all points sent to
                                the objective function as arguments,
                                the function value they have and
                                an ID for them.
    stopUsingDataGroup()        Destructor for the data group.
    getNumOfVarsInDataGroup()   Returns the number of coordinates in
                                the argument for the objective
                                function.
    getTotalNumProc()           Tells how many processors are
                                available.
    sendToAllIdleHosts()        Sends jobs to all processors
                                available. That is, sends points to
                                all available processors, for
                                evaluating the function values.
    getNumNotAns()              Number of all points that have been
                                sent whose function value has not
                                yet been received.
    send_receiveAllData()       Sends all uncalculated points in
                                the data group to processors and
                                receives their function values as
                                well.
    receiveOne()                Receives one function value.
    receiveAll()                Receives all function values not yet
                                received.
    getRecievedID()             Gets the ID of the point whose
                                function value was received last.
    getY( returnID )            Returns the function value of the
                                point with ID returnID.
    getX( returnID )            Returns the point with ID returnID.
    getInitialX()               Returns the starting point, that is,
                                the best point known before the
                                algorithm starts.
    getUpperScaleConstant()     For getting the upper bounds.
    getLowerScaleConstant()     For getting the lower bounds.
    setX(x)                     Sets x as the next point in the data
                                group.
    setBestX(x)                 Sets x as the point with the best
                                value found so far.
    unscaleX(x)                 If working with scaled points, this
                                returns the point x unscaled,
                                otherwise no change.


3) BFGS

ABSTRACT:
The BFGS method is a gradient iterative method that works as follows:
we start at some point and successively generate new points, so that
the function value decreases in each iteration. This is done by
choosing a direction that makes an angle greater than 90 degrees
with the gradient of the function.

Authors
    Gunnar Stefansson
    Kristjana Yr Jonsdottir
    Thordis Linda Thorarinsdottir

The BFGS method is one of the quasi-Newton methods. Quasi-Newton
methods are iterative methods which approximate Newton's method
without calculating second derivatives. They are of the form
    x^(k+1) = x^k + a^k * d^k,
    d^k = -D^k \nabla f( x^k ),
where D^k is a positive definite matrix, updated using the BFGS
updating formula. To find the value of a, an inexact line search
called the Armijo rule is used. That method finds an a such that the
Armijo condition is satisfied, which is based on getting a
function value that is some amount better than the best known
function value. This is done by first trying an initial
stepsize and then reducing the stepsize until a value that
satisfies the Armijo condition is found. For details, see D.P.
Bertsekas, "Nonlinear Programming", Athena Scientific, Belmont,
1999. As the objective function is not necessarily differentiable,
the gradient is calculated numerically.

Input parameters, from file bfgsconstants:

BFGSPAR        1 for using the BFGS method and 0 for using the steepest
               descent method. It is sometimes good to use the latter
               for a few iterations if the point is stuck.

MAXBFGSITER    Maximum number of iterations for each round of the
               minimization.

MAXROUNDS      Maximum number of rounds made.

ERRORTOL       Error tolerance for the termination criteria.

XTOL           Error tolerance for the point.

SHANNONSCALING 1 if the direction vector shall be scaled, otherwise 0.
               If 1, then Shannon scaling is used to scale the direction
               vector.

DIFFICULTGRAD  Denotes how good the approximation of the gradient should
               be. If = 0, a first-degree polynomial is used. If = 1, the
               approximation is parabolic and if >= 2, a polynomial of
               degree four is used. If the line search gives no results,
               difficultgrad is increased.

BETA           The beta constant for the Armijo condition in the line
               search. See D.P. Bertsekas, "Nonlinear Programming",
               Athena Scientific, Belmont, 1999, page 29.

SIGMA          The sigma constant for the Armijo condition in the line
               search. See D.P. Bertsekas, "Nonlinear Programming",
               Athena Scientific, Belmont, 1999, page 29.

PRINTING       1 for printing on, and 0 for printing off. If printing is
               on, the last gradient calculated is printed out to the file
               "gradient", the last Hessian matrix is printed out to the
               file "hessian" and the best point is printed out to the
               file "finalvals".

The user-supplied objective function f(x) has to support PVM.
Its argument is x - an array of doubles, the point at which
f(x) should be calculated - and it returns a double. All
information about the objective function is part of netInt.
The following functions from netInt are used:
    startNewDataGroup(s)        Creates a new data group of size s.
                                This group keeps all points sent to
                                the objective function as arguments,
                                the function value they have and
                                an ID for them.
    stopUsingDataGroup()        Destructor for the data group.
    getNumOfVarsInDataGroup()   Returns the number of coordinates in
                                the argument for the objective
                                function.
    getNumDataItemSet()         Number of data items set in the data
                                group.
    getNumDataItemsAnswered()   Number of data items set which have
                                also received an answer.
    dataGroupFull()             Returns 1 if not able to add more data
                                to the current data group, else
                                returns 0.
    getTotalNumProc()           Tells how many processors are
                                available.
    getNumFreeProcesses()       Returns the number of available free
                                processes.
    getNumNotAns()              Number of all points that have been
                                sent whose function value has not
                                yet been received.
    send_receiveAllData()       Sends all uncalculated points in
                                the data group to processors and
                                receives their function values as
                                well.
    send_receive_setData(c)     Sends, receives and sets new data until
                                condition c is met. Returns ERROR if
                                error, or 1 if the condition was met.
    send_receiveTillCondition(c) Sends and receives data until condition
                                c is met, or there is no more available
                                data in the data group. Returns ERROR if
                                error, 1 if the condition was met and 0
                                if the condition was not met and all
                                data belonging to the data group has
                                been sent and received.
    getRecievedID()             Gets the ID of the point whose
                                function value was received last.
    getY( returnID )            Returns the function value of the
                                point with ID returnID.
    getNextAnswerY()            Returns the function value in the last
                                data pair found while looking for a
                                data pair with an answer.
    getX( returnID )            Returns the point with ID returnID.
    getNextAnswerX()            Returns the next x found in the data
                                group which has an answer since last
                                checked for an answer:
                                (x0, fx0), (x1, ), (x2, fx2), ..., (xn, fxn).
                                If the data pair (x0, fx0) was last
                                checked for an answer then x will equal
                                x2 and the next data pair that will be
                                looked at for an answer will be
                                (x3, fx3), unless the search is reset
                                to start at the beginning.
    getInitialX()               Returns the starting point, that is,
                                the best point known before the
                                algorithm starts.
    getUpperScaleConstant()     For getting the upper bounds.
    getLowerScaleConstant()     For getting the lower bounds.
    setDataPair(x, f)           Adds the data pair (x, f) to the data
                                group. x will not be sent for processing
                                through the net as f is stored as its
                                answer.
    setX(x)                     Sets x as the next point in the data
                                group.
    setXFirstToSend(x)          Adds x to the data group and puts it
                                first in the queue for sending.
    setFirstAnsweredData()      Initiates a data pointer to the data
                                item which was first set in the data
                                group.
    setBestX(x)                 Sets x as the point with the best
                                value found so far.
    unscaleX(x)                 If working with scaled points, this
                                returns the point x unscaled,
                                otherwise no change.
    stopNetComm()               Halts all net communication; netInt
                                cannot send/receive anymore.
