**this really really needs updating to match the recent changes**

This program 'paramin' consists of three minimization techniques
and can only be used with PVM.

The techniques are Simulated Annealing, the Hooke & Jeeves algorithm
and the quasi-Newton algorithm BFGS. Simulated Annealing is a
global minimization technique and the other two are local. The
quasi-Newton algorithm uses derivatives.

There are five input files for 'paramin'.

mainconstants:  Constants that involve the program itself.

bfgsconstants:  Constants that involve the BFGS technique.
simconstants:   Constants that involve Simulated Annealing.
hjconstants:    Constants that involve the Hooke & Jeeves technique.

initvals:       Contains four columns of numbers: the initial point,
                the lower bounds, the upper bounds and whether each
                parameter shall be optimized or not - 1 for
                optimizing and 0 for staying unchanged.
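
As an illustration only (the numbers below are made up, not taken
from any real run), an initvals file for three parameters, of which
the third is held fixed, could look like this:

    100.0     1.0     10000.0     1
      0.5     0.01        1.0     1
      2.5     0.1        10.0     0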

To minimize the objective function, write:

    paramin 0 FunctionName

(or 'paramin 0 gadget -s -n' for running gadget)

If the program uses BFGS and the minimization is convergent, the
program gives the constant bfgsfail the value 0. It writes out
the minimum to the file 'finalvals', the gradient of the objective
function to the file 'gradient' and the approximated inverse
Hessian matrix of the objective function to the file 'hessian'.

If the program does not use BFGS, the best point received will be
printed out on the screen with its function value.

To choose which minimization algorithms to use, give the parameters
SimulatedAnnealing, HookeAndJeeves and BFGS in mainconstants the
values 0 or 1; 0 for not in use, 1 for in use.
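
For example, to skip Simulated Annealing but run both local searches,
the relevant entries in mainconstants would be set along these lines
(the exact layout of the file may differ from this sketch):

    SimulatedAnnealing   0
    HookeAndJeeves       1
    BFGS                 1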

About the minimization algorithms:


1) Hooke and Jeeves:

Nonlinear Optimization using the algorithm of Hooke and Jeeves
12 February 1994
author: Mark G. Johnson
August 1999
changes for parallel use: Kristjana Yr Jonsdottir and
                          Thordis Linda Thorarinsdottir

Find a point X where the nonlinear function f(X) has a local
minimum. X is an n-vector and f(X) is a scalar. In mathematical
notation f: R^n -> R^1. The objective function f() is not
required to be continuous. Nor does f() need to be differentiable.
The program does not use or require derivatives of f().

The software user supplies three things: a subroutine that computes
f(X), an initial "starting guess" of the minimum point X, and values
for the algorithm convergence parameters. Then the program searches
for a local minimum, beginning from the starting guess, using the
Direct Search algorithm of Hooke and Jeeves.

The original C program is adapted from the Algol pseudocode found in
"Algorithm 178: Direct Search" by Arthur F. Kaupe Jr.,
Communications of the ACM, Vol. 6, p. 313 (June 1963). It includes
the improvements suggested by Bell and Pike (CACM v.9, p. 684, Sept
1966) and those of Tomlin and Smith, "Remark on Algorithm 178" (CACM
v.12). The original paper, which I don't recommend as highly as the
one by A. Kaupe, is: R. Hooke and T. A. Jeeves, "Direct Search
Solution of Numerical and Statistical Problems", Journal of the ACM,
Vol. 8, April 1961, pp. 212-229.

Calling sequence:
    int hooke.DoSearch( netI )

    netI    {netInterface} netI manages all netcommunication.
            netInterface uses the class netCommunication to
            send/receive data. The class processManager is used
            to dispatch processes to be used by netInterface.
            It uses the class netDataControl to store data and
            keep track of information about status of data
            concerning netcommunication. It can use the class
            dataConverter to prepare data before sending and the
            class dataScaler to scale/unscale data after/before
            sending.

As mentioned above, values for the algorithm convergence parameters
are kept in the input files mainconstants and hjconstants. These
parameters are:

    rho         {a double} This is a user-supplied convergence
                parameter (more detail below), which should be
                set to a value between 0.0 and 1.0. Larger values
                of rho give greater probability of convergence on
                highly nonlinear functions, at a cost of more
                function evaluations. Smaller values of rho reduce
                the number of evaluations (and the program running
                time), but increase the risk of nonconvergence.
                See below.
    epsilon     {a double} This is the criterion for halting the
                search for a minimum. When the algorithm begins
                to make less and less progress on each iteration,
                it checks the halting criterion: if the stepsize
                is below epsilon, terminate the iteration and
                return the current best estimate of the minimum.
                Larger values of epsilon (such as 1.0e-4) give
                quicker running time, but a less accurate estimate
                of the minimum. Smaller values of epsilon (such as
                1.0e-7) give longer running time, but a more
                accurate estimate of the minimum.

    MAXITER     {an integer} A second halting criterion. If the
                algorithm uses more function evaluations than
                MAXITER, terminate the iteration and return the
                current best estimate of the minimum.
    HOOKE_ITER  {an integer} Total function evaluations in all runs
                of the algorithm. If this value is greater than
                MAXITER, repeated runs of the algorithm are done.
                Each run uses at most MAXITER function evaluations
                and the starting point each time is the current
                best point.
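
(As a purely illustrative example: with MAXITER = 1000 and
HOOKE_ITER = 4000 the search is restarted up to four times, each
restart using at most 1000 function evaluations and starting from
the best point found so far.)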

The user supplied objective function f(x) has to support pvm.
Its argument is x, an array of doubles giving the point at which
f(x) should be calculated, and it returns a double. All
information about the objective function is a part of netI.
The following functions from netI are used:
    startNewDataGroup(s)        Creates a new data group of size s.
                                This group keeps all points sent to
                                the objective function as arguments,
                                which function value they have and
                                an ID for them.
    stopUsingDataGroup()        Destructor for data group.
    getNumOfVarsInDataGroup()   Returns the number of coordinates in
                                the argument for the objective
                                function.
    getTotalNumProc()           Tells how many processors are
                                available.
    getNumFreeProcesses()       Returns number of available free processes.
    sendToAllIdleHosts()        Sends jobs to all processors
                                available. That is, sends points to
                                all available processors, for
                                evaluating the function values.
    getNumNotAns()              Number of all points that have been
                                sent whose function value has
                                not yet been received.
    sendOne()                   Sends one point to the objective
                                function.
    send_receiveAllData()       Sends all uncalculated points in
                                the data group to processors and
                                receives their function values as
                                well.
    receiveOne()                Receives one function value.
    receiveAll()                Receives all function values not yet
                                received.
    getRecievedID()             Gets the ID of the point whose
                                function value was received last.
    getY( returnID )            Returns the function value of the
                                point with ID returnID.
    getX( returnID )            Returns the point with ID returnID.
    getInitialX()               Returns the starting point, that is
                                the best point known before the
                                algorithm starts.
    getUpperScaleConstant()     For getting the upper bounds.
    getLowerScaleConstant()     For getting the lower bounds.
    setX(x)                     Sets x as the next point in the data
                                group.
    setBestX(x)                 Sets x as the point with the best
                                value found so far.

rho, the algorithm convergence control:
The algorithm works by taking "steps" from one estimate of a
minimum to another (hopefully better) estimate. Taking big steps
gets to the minimum more quickly, at the risk of "stepping right
over" an excellent point. The stepsize is controlled by a user
supplied parameter called rho. At each iteration, the stepsize is
multiplied by rho (0 < rho < 1), so the stepsize is successively
reduced.
    Small values of rho correspond to big stepsize changes,
which make the algorithm run more quickly. However, there is a
chance (especially with highly nonlinear functions) that these big
changes will accidentally overlook a promising search vector,
leading to nonconvergence.
    Large values of rho correspond to small stepsize changes,
which force the algorithm to carefully examine nearby points instead
of optimistically forging ahead. This improves the probability of
convergence.
    The stepsize is reduced until it is equal to (or smaller
than) epsilon. So the number of iterations performed by Hooke-Jeeves
is determined by rho and epsilon:

    rho**(number_of_iterations) = epsilon

In general it is a good idea to set rho to an aggressively
small value like 0.5 (hoping for fast convergence). Then, if the
user suspects that the reported minimum is incorrect (or perhaps
not accurate enough), the program can be run again with a larger
value of rho such as 0.85, using the result of the first
minimization as the starting guess to begin the second minimization.
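
To put rough numbers on the relation above (values chosen only for
illustration): with rho = 0.5 and epsilon = 1.0e-6 the stepsize is
reduced about 20 times, since 0.5^20 is approximately 1.0e-6, while
reaching the same epsilon with rho = 0.85 takes roughly 85 stepsize
reductions.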

The parameters of the point are randomly ordered now and then, to
avoid the order of them having an influence on which changes are
accepted and which are not. As this means that no two runs are
exactly the same (even though all parameters and constants are
unchanged), running the program at least twice on every point might
be a good idea.

Normal use: (1) Code your function f() in the C language and connect
                it to pvm.
            (2) Install your starting guess {or read it in}.
            (3) Run the program.
            (4) {for the skeptical}: Use the computed minimum
                as the starting point for another run (set the value
                of HOOKE_ITER equal to 2*MAXITER).

Data Fitting:
    Code your function f() to be the sum of the squares of the errors
    (differences) between the computed values and the measured values.
    Then minimize f() using Hooke-Jeeves.
    EXAMPLE: you have 20 datapoints (ti, yi) and you want to
    find A,B,C such that (A*t*t) + (B*exp(t)) + (C*tan(t))
    fits the data as closely as possible. Then f() is just
        f(x) = SUM (measured_y[i] - ((A*t[i]*t[i]) + (B*exp(t[i]))
                    + (C*tan(t[i]))))^2
    where x[] is a 3-vector consisting of {A, B, C}.
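
A minimal sketch of such an objective function, written as a plain
C function (the data arrays t[] and measured_y[] and the hook-up to
pvm/netInterface are assumed to exist elsewhere and are not shown):

    #include <math.h>

    #define NDATA 20

    extern double t[NDATA];            /* measurement times, assumed given */
    extern double measured_y[NDATA];   /* measured values, assumed given   */

    /* x[0] = A, x[1] = B, x[2] = C */
    double f(double* x)
    {
        double sum = 0.0;
        int i;
        for (i = 0; i < NDATA; i++) {
            double model = x[0] * t[i] * t[i]
                         + x[1] * exp(t[i])
                         + x[2] * tan(t[i]);
            double err = measured_y[i] - model;
            sum += err * err;          /* sum of squared errors */
        }
        return sum;
    }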

The author of this software is M.G. Johnson.
Permission to use, copy, modify, and distribute this software for
any purpose without fee is hereby granted, provided that this entire
notice is included in all copies of any software which is or
includes a copy or modification of this software and in all copies
of the supporting documentation for such software. THIS SOFTWARE IS
BEING PROVIDED "AS IS", WITHOUT ANY EXPRESS OR IMPLIED WARRANTY. IN
PARTICULAR, NEITHER THE AUTHOR NOR AT&T MAKE ANY REPRESENTATION OR
WARRANTY OF ANY KIND CONCERNING THE MERCHANTABILITY OF THIS SOFTWARE
OR ITS FITNESS FOR ANY PARTICULAR PURPOSE.


2) Simulated Annealing

ABSTRACT:

Simulated annealing is a global optimization method that
distinguishes between different local optima. Starting from an initial
point, the algorithm takes a step and the function is evaluated.
When minimizing a function, any downhill step is accepted and the
process repeats from this new point. An uphill step may be accepted
(thus, it can escape from local optima). This uphill decision is
made by the Metropolis criterion. As the optimization process
proceeds, the length of the steps declines and the algorithm closes
in on the global optimum. Since the algorithm makes very few
assumptions regarding the function to be optimized, it is quite
robust with respect to non-quadratic surfaces. The degree of
robustness can be adjusted by the user. In fact, simulated annealing
can be used as a local optimizer for difficult functions.

Author
    h2zr1001@vm.cis.smu.edu

Changes for parallel use
    Kristjana Yr Jonsdottir
    Thordis Linda Thorarinsdottir


This code is a pvm version of a translation of a fortran code,
which is an example of the Corana et al. simulated annealing
algorithm for multimodal and robust optimization as implemented and
modified by Goffe et al.

Use the sample function from Judge with the following suggestions
to get a feel for how SA works. When you've done this, you should be
ready to use it on most any function with a fair amount of expertise.
1. Run the program as is to make sure it runs okay. Add some print
   functions and see how it optimizes as temperature (T) falls.
   Notice how the optimal point is reached and how falling T reduces
   VM.
2. Look through the documentation to SA so the following makes a
   bit of sense. In line with the paper, it shouldn't be that hard
   to figure out. The core of the algorithm is described on pp. 4-6
   and on pp. 28. Also see Corana et al. pp. 264-9.
3. To see the importance of different temperatures, try starting
   with a very low one (say T = 10E-5). You'll see (i) it never
   escapes from the local optima (in annealing terminology, it
   quenches) & (ii) the step length (VM) will be quite small. This
   is a key part of the algorithm: as temperature (T) falls, step
   length does too. In a minor point here, note how VM is quickly
   reset from its initial value. Thus, the input VM is not very
   important. This is all the more reason to examine VM once the
   algorithm is underway.
4. To see the effect of different parameters and their effect on
   the speed of the algorithm, try RT = .95 & RT = .1. Notice the
   vastly different speed for optimization. Also try NT = 20. Note
   that this sample function is quite easy to optimize, so it will
   tolerate big changes in these parameters. RT and NT are the
   parameters one should adjust to modify the runtime of the
   algorithm and its robustness.
5. Try constraining the algorithm with either LB or UB.

Synopsis:
This routine implements the continuous simulated annealing global
optimization algorithm described in Corana et al.'s article
"Minimizing Multimodal Functions of Continuous Variables with the
"Simulated Annealing" Algorithm" in the September 1987 (vol. 13,
no. 3, pp. 262-280) issue of the ACM Transactions on Mathematical
Software.

A very quick (perhaps too quick) overview of SA:

SA tries to find the global optimum of an N dimensional function.
It moves both up and downhill and as the optimization process
proceeds, it focuses on the most promising area.

To start, it randomly changes one parameter at a time within the
step length VM (a vector of length N) of the user selected starting
point. The function is evaluated at these trial points and its
value is compared to its value at the initial point.

In a maximization problem, all uphill moves are accepted and the
algorithm continues from that trial point. Downhill moves may be
accepted; the decision is made by the Metropolis criterion. It uses
T (temperature) and the size of the downhill move in a probabilistic
manner. The bigger T and the smaller the size of the downhill move are,
the more likely that move will be accepted. If the trial is
accepted, the algorithm moves on from that point. If it is rejected,
another point is chosen instead for a trial evaluation.
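
A minimal sketch of this Metropolis test, written for the
maximization form used above (illustrative only, not the code used
inside paramin; for minimization the sign of the difference is
flipped):

    #include <math.h>
    #include <stdlib.h>

    /* Accept an uphill move always; accept a downhill move with
       probability exp((f_trial - f_current)/T), which shrinks as the
       move gets worse or as the temperature T falls. */
    int metropolis_accept(double f_current, double f_trial, double T)
    {
        double p, u;
        if (f_trial >= f_current)
            return 1;                       /* uphill: always accept     */
        p = exp((f_trial - f_current) / T); /* downhill: p is in (0, 1)  */
        u = (double)rand() / RAND_MAX;      /* uniform draw in [0, 1]    */
        return u < p;                       /* accept with probability p */
    }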

Each element of VM is periodically adjusted so that half of all
function evaluations in that direction are accepted.

A fall in T is imposed upon the system with the RT variable by
T(i+1) = RT*T(i) where i is the ith iteration. Thus, as T declines,
downhill moves are less likely to be accepted and the percentage of
rejections rises. Given the scheme for the selection of VM, VM falls.
Thus, as T declines, VM falls and SA focuses upon the most promising
area for optimization.
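
(As a purely numerical illustration of this schedule: with T = 5.0
and RT = 0.85, ten temperature reductions bring T down to about
5.0 * 0.85^10, i.e. roughly 0.98, and after fifty reductions T is
below 0.002.)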

The parameters of the point are randomly ordered now and then, to
avoid the order of them having an influence on which changes are
accepted and which are not. As this means that no two runs are
exactly the same (even though all parameters and constants are
unchanged), running the program at least twice on every point might
be a good idea.

The importance of the parameter T:

The parameter T is crucial in using SA successfully. It influences
VM, the step length over which the algorithm searches for optima. For
a small initial T, the step length may be too small; thus not enough
of the function might be evaluated to find the global optima. The
user should carefully examine VM to make sure that it is appropriate.
The relationship between the initial temperature and the resulting
step length is function dependent.

To determine the starting temperature that is consistent with
optimizing a function, it is worthwhile to run a trial run first.
Set RT = 1.5 and T = 1.0. With RT > 1.0, the temperature increases
and VM rises as well. Then select the T that produces a large enough
VM.

For modifications to the algorithm and many details on its use
(particularly for econometric applications) see Goffe, Ferrier and
Rogers, "Global Optimization of Statistical Functions with
Simulated Annealing," Journal of Econometrics (forthcoming). For a
pre-publication copy, contact
    Bill Goffe
    Department of Economics
    Southern Methodist University
    Dallas, TX 75275
    h2zr1001 @ smuvm1 (Bitnet)
    h2zr1001 @ vm.cis.smu.edu (Internet)

As far as possible, the parameters here have the same name as in the
description of the algorithm on pp. 266-8 of Corana et al.

Input Parameters:

Note: The suggested values generally come from Corana et al. To
drastically reduce runtime, see Goffe et al., pp. 17-8 for
suggestions on choosing the appropriate RT and NT.


Input Parameters for class Simann:

    netInt      (netInterface) netInt manages all netcommunication.
                netInterface uses the class netCommunication to
                send/receive data. The class processManager is used
                to dispatch processes to be used by netInterface.
                It uses the class netDataControl to store data and
                keep track of information about status of data
                concerning netcommunication. It can use the class
                dataConverter to prepare data before sending and the
                class dataScaler to scale/unscale data after/before
                sending.

    MAXIM       Denotes whether the function should be maximized or
                minimized. A true value denotes maximization while a
                false value denotes minimization.

    SIM_ITER    The maximum number of function evaluations. (INT)

    TEMPERATURE The initial temperature. See Goffe et al. for
                advice. (DP)

    c           Vector that controls the step length adjustment. The
                suggested value for all elements is 2.0. (DP(N))

    vm          The step length vector. On input it should encompass
                the region of interest given the starting value X.
                For point X(I), the next trial point is selected
                from X(I) - VM(I) to X(I) + VM(I). Since VM is
                adjusted so that about half of all points are
                accepted, the input value is not very important
                (i.e., if the value is off, SA adjusts VM to the
                correct value). (DP(N))

Other input parameters from mainconstants:

    C_COMPONENT  Size of each component of the step length
                 adjustment, that is the value of the components of c.

    VM_COMPONENT The initial value of the components of vm (as the
                 initial value is not considered very important, all
                 components have the same initial value).

Input parameters from simconstants:

    RT      The temperature reduction factor. The value suggested by
            Corana et al. is .85. See Goffe et al. for more advice. (DP)

    EPS     Error tolerance for termination. If the final function
            values from the last neps temperatures differ from the
            corresponding value at the current temperature by less than
            EPS and the final function value at the current temperature
            differs from the current optimal function value by less than
            EPS, execution terminates and IER = 0 is returned. (DP)

    NS      Number of cycles. After NS*N function evaluations, each
            element of VM is adjusted so that approximately half of
            all function evaluations are accepted. The suggested value
            is 20. (INT)

    NT      Number of iterations before temperature reduction. After
            NT*NS*N function evaluations, temperature (T) is changed
            by the factor RT. Value suggested by Corana et al. is
            MAX(100, 5*N). See Goffe et al. for further advice. (INT)

    NEPS    Number of final function values used to decide upon
            termination. See EPS. Suggested value is 4. (INT)

The upper and lower bounds are part of netInt. If the algorithm
chooses X(I) .LT. LB(I) or X(I) .GT. UB(I), I = 1, N, a point from
inside is randomly selected. This focuses the algorithm on the
region inside UB and LB. Unless the user wishes to concentrate the
search on a particular region, UB and LB should be set to very large
positive and negative values, respectively. Note that the starting
vector X should be inside this region. Also note that LB and UB are
fixed in position, while VM is centered on the last accepted trial
set of variables that optimizes the function.

Parameters within the algorithm that can be used to follow how the
algorithm is working:

    x       The variables that optimize the function at each time.
            (DP(N))

    fopt    The optimal value of the function at each time. (DP)

    nacc    The number of accepted function evaluations. (INT)

    nnew    The number of times the algorithm finds a new optimum. (INT)

    nup     The number of function values that are better than the last
            one. (INT)

    ndown   The number of accepted function values that are worse than
            the last one. (INT)

    nrej    The number of not accepted function values. (INT)

    nfcnev  The total number of function evaluations. In a minor
            point, note that the first evaluation is not used in the
            core of the algorithm; it simply initializes the
            algorithm. (INT)

    nobds   The total number of trial function evaluations that
            would have been out of bounds of LB and UB. Note that
            a trial point is randomly selected between LB and UB. (INT)

The user supplied objective function f(x) has to support pvm.
Its argument is x, an array of doubles giving the point at which
f(x) should be calculated, and it returns a double. All
information about the objective function is a part of netInt.
The following functions from netInt are used:
    startNewDataGroup(s)        Creates a new data group of size s.
                                This group keeps all points sent to
                                the objective function as arguments,
                                which function value they have and
                                an ID for them.
    stopUsingDataGroup()        Destructor for data group.
    getNumOfVarsInDataGroup()   Returns the number of coordinates in
                                the argument for the objective
                                function.
    getTotalNumProc()           Tells how many processors are
                                available.
    sendToAllIdleHosts()        Sends jobs to all processors
                                available. That is, sends points to
                                all available processors, for
                                evaluating the function values.
    getNumNotAns()              Number of all points that have been
                                sent whose function value has
                                not yet been received.
    send_receiveAllData()       Sends all uncalculated points in
                                the data group to processors and
                                receives their function values as
                                well.
    receiveOne()                Receives one function value.
    receiveAll()                Receives all function values not yet
                                received.
    getRecievedID()             Gets the ID of the point whose
                                function value was received last.
    getY( returnID )            Returns the function value of the
                                point with ID returnID.
    getX( returnID )            Returns the point with ID returnID.
    getInitialX()               Returns the starting point, that is
                                the best point known before the
                                algorithm starts.
    getUpperScaleConstant()     For getting the upper bounds.
    getLowerScaleConstant()     For getting the lower bounds.
    setX(x)                     Sets x as the next point in the data
                                group.
    setBestX(x)                 Sets x as the point with the best
                                value found so far.
    unscaleX(x)                 If working with scaled points, this
                                returns the point x unscaled,
                                otherwise no change.


3) BFGS

ABSTRACT:
The BFGS method is a gradient iterative method that works as follows:
We start at some point and successively generate new points, so that
the function value decreases in each iteration. This is done by
choosing a direction that makes an angle greater than 90 degrees
with the gradient of the function.

Authors
    Gunnar Stefansson
    Kristjana Yr Jonsdottir
    Thordis Linda Thorarinsdottir

The BFGS method is one of the Quasi-Newton methods. Quasi-Newton
methods are iterative methods which approximate Newton's method
without calculating second derivatives. They are of the form
    x^(k+1) = x^k + a^k * d^k,
    d^k     = -D^k \nabla f( x^k ),
where D^k is a positive definite matrix, updated using the BFGS
updating formula. To find the value of a, an inexact linesearch
called the Armijo rule is used. That method finds an a such that the
Armijo condition is satisfied, which is based on getting a
function value that is some amount better than the best known
function value. This is done by trying at first an initial
stepsize and then reducing the stepsize until a value that
satisfies the Armijo condition is found. For details, see D.P.
Bertsekas, "Nonlinear Programming", Athena Scientific, Belmont,
1999. As the objective function is not necessarily differentiable,
the gradient is calculated numerically.
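
A minimal sketch of the Armijo backtracking rule (illustrative only;
the actual linesearch in paramin is more involved and runs through
netInt). Starting from the initial stepsize s, the step is reduced by
the factor beta until f(x) - f(x + a*d) >= -sigma * a * (grad f(x) . d)
holds, d being a descent direction:

    /* f    - objective function, called as f(point, dimension)
       x    - current point, d - descent direction, grad - gradient at x
       n    - number of variables
       s    - initial stepsize, beta, sigma - Armijo constants
       xnew - work/output array of length n for the trial point */
    double armijo_step(double (*f)(const double*, int),
                       const double* x, const double* d, const double* grad,
                       int n, double s, double beta, double sigma,
                       double* xnew)
    {
        double fx = f(x, n);
        double slope = 0.0;     /* grad f(x) . d, negative for descent */
        double a = s;
        int i, k;

        for (i = 0; i < n; i++)
            slope += grad[i] * d[i];

        for (k = 0; k < 50; k++) {              /* cap the number of trials  */
            for (i = 0; i < n; i++)
                xnew[i] = x[i] + a * d[i];
            if (fx - f(xnew, n) >= -sigma * a * slope)
                return a;                       /* Armijo condition holds    */
            a *= beta;                          /* otherwise shrink the step */
        }
        return a;                               /* give up after 50 trials   */
    }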

Input parameters, from file bfgsconstants:

    BFGSPAR         1 for using the BFGS method and 0 for using the steepest
                    descent method. It is sometimes good to use the latter one
                    for a few iterations if the point is stuck.

    MAXBFGSITER     Maximum number of iterations for each round of the
                    minimization.

    MAXROUNDS       Maximum number of rounds made.

    ERRORTOL        Error tolerance for the termination criteria.

    XTOL            Error tolerance for the point.

    SHANNONSCALING  1 if the direction vector shall be scaled, otherwise 0. If 1,
                    then Shannon Scaling is used to scale the direction vector.

    DIFFICULTGRAD   Denotes how good the approximation of the gradient should be.
                    If = 0, a first degree polynomial is used. If = 1, the
                    approximation is parabolic and if >= 2, a polynomial of degree
                    four is used. If the line search gives no results, the
                    difficultgrad is increased.

    BETA            beta constant for the Armijo condition in the linesearch. See
                    D.P. Bertsekas, "Nonlinear Programming", Athena Scientific,
                    Belmont, 1999, page 29.

    SIGMA           sigma constant for the Armijo condition in the linesearch. See
                    D.P. Bertsekas, "Nonlinear Programming", Athena Scientific,
                    Belmont, 1999, page 29.

    PRINTING        1 for printing on, and 0 for printing off. If printing is on,
                    the last gradient calculated is printed out to the file
                    "gradient", the last Hessian matrix is printed out to the file
                    "hessian" and the best point is printed out to the file
                    "finalvals".
The user supplied objective function f(x) has to support pvm.
Its argument is x, an array of doubles giving the point at which
f(x) should be calculated, and it returns a double. All
information about the objective function is a part of netInt.
The following functions from netInt are used:
    startNewDataGroup(s)          Creates a new data group of size s.
                                  This group keeps all points sent to
                                  the objective function as arguments,
                                  which function value they have and
                                  an ID for them.
    stopUsingDataGroup()          Destructor for data group.
    getNumOfVarsInDataGroup()     Returns the number of coordinates in
                                  the argument for the objective
                                  function.
    getNumDataItemSet()           Number of data items set in data group.
    getNumDataItemsAnswered()     Number of data items set which have
                                  also received an answer.
    dataGroupFull()               Returns 1 if not able to add more data to
                                  the current datagroup, else returns 0.
    getTotalNumProc()             Tells how many processors are
                                  available.
    getNumFreeProcesses()         Returns number of available free processes.
    getNumNotAns()                Number of all points that have been
                                  sent whose function value has
                                  not yet been received.
    send_receiveAllData()         Sends all uncalculated points in
                                  the data group to processors and
                                  receives their function values as
                                  well.
    send_receive_setData(c)       Sends, receives and sets new data until
                                  condition c is met. Returns ERROR if error,
                                  or 1 if condition was met.
    send_receiveTillCondition(c)  Sends and receives data until condition c is
                                  met, or there is no more available data in
                                  the data group. Returns ERROR if error, 1 if
                                  condition was met and 0 if condition not met
                                  and all data belonging to the datagroup has
                                  been sent and received.
    getRecievedID()               Gets the ID of the point whose
                                  function value was received last.
    getY( returnID )              Returns the function value of the
                                  point with ID returnID.
    getNextAnswerY()              Returns the function value in the last data
                                  pair found while looking for a datapair with
                                  an answer.
    getX( returnID )              Returns the point with ID returnID.
    getNextAnswerX()              Returns the next x found in the datagroup which
                                  has an answer, since the last check for an
                                  answer. Given (x0, fx0), (x1, ), (x2, fx2),
                                  ...., (xn, fxn): if the datapair (x0, fx0) was
                                  last checked for an answer then x will equal x2
                                  and the next datapair that will be looked at for
                                  an answer will be (x3, fx3), unless the search
                                  is reset to start at the beginning.
    getInitialX()                 Returns the starting point, that is
                                  the best point known before the
                                  algorithm starts.
    getUpperScaleConstant()       For getting the upper bounds.
    getLowerScaleConstant()       For getting the lower bounds.
    setDataPair(x, f)             Adds the datapair (x, f) to the datagroup. x
                                  will not be sent for processing through the net
                                  as f is stored as its answer.
    setX(x)                       Sets x as the next point in the data
                                  group.
    setXFirstToSend(x)            Adds x to the datagroup and puts it first in the
                                  queue for sending.
    setFirstAnsweredData()        Initiates a datapointer to the data item which
                                  was first set in the data group.
    setBestX(x)                   Sets x as the point with the best
                                  value found so far.
    unscaleX(x)                   If working with scaled points, this
                                  returns the point x unscaled,
                                  otherwise no change.
    stopNetComm()                 Halts all netcommunication; netInt cannot
                                  send/receive any more.