**this really really needs updating to match the recent changes** This program ´paramin´ consists of three minimization techniques and can only be used with PVM. The techniques are Simulated Annealing, Hooke & Jeeves algorithm and the quasi-Newton algorithm BFGS. Simulated annealing is a global minimization technique and the other two are local. The quasi-Newton uses derivatives. There are five intput files for ´paramin´. mainconstants: Constants that involve the program itself. bfgsconstants: Constants that involve the BFGS technique. simconstants: Constants that involve Simulated Annealing. hjconstants: Constants that involve Hooke & Jeeves technique. initvals: Contains four columns of numbers, the initial point, the lower bounds, the upper bounds and whether each parameter shall be optimized or not - 1 for optimizing and 0 for staying unchanged. To minimize the objective function, write: paramin 0 FunctionName (or ´paramin 0 gadget -s -n´ for running gadget) If the program uses BFGS and the minimization is convergent the program gives the constant bfgsfail the value 0. It writes out the minimum to the file ´finalvals´, the gradient of the objective function to the file ´gradient´ and the approximated inverse Hessian matrix of the objective function to the file ´hessian´. If the program does not use BFGS the best point recieved will be printed out on the screen with it´s function value. To choose which minimization algorithms to use, give the parameters SimulatedAnnealing, HookeAndJeeves and BFGS in mainconstants the values 0 or 1; 0 for not in use, 1 for in use. About the minimization algorithms: 1) Hooke and Jeeves: Nonlinear Optimization using the algorithm of Hooke and Jeeves 12 February 1994 author: Mark G. Johnson August 1999 changes for parallel use: Kristjana Yr Jonsdottir and Thordis Linda Thorarinsdottir Find a point X where the nonlinear function f(X) has a local minimum. X is an n-vector and f(X) is a scalar. In mathematical notation f: R^n -> R^1. The objective function f() is not required to be continuous. Nor does f() need to be differentiable. The program does not use or require derivatives of f(). The software user supplies three things: a subroutine that computes f(X), an initial "starting guess" of the minimum point X, and values for the algorithm convergence parameters. Then the program searches for a local minimum, beginning from the starting guess, using the Direct Search algorithm of Hooke and Jeeves. The original C program is adapted from the Algol pseudocode found in "Algorithm 178: Direct Search" by Arthur F . Kaupe Jr., Communications of the ACM, Vol 6. p.313 (June 1963). It includes the improvements suggested by Bell and Pike (CACM v.9, p. 684, Sept 1966) and those of Tomlin and Smith, "Remark on Algorithm 178" (CACM v.12). The original paper, which I don't recommend as highly as the one by A. Kaupe, is: R. Hooke and T. A. Jeeves, "Direct Search Solution of Numerical and Statistical Problems", Journal of the ACM, Vol. 8, April 1961, pp. 212-229. Calling sequence: int hooke.DoSearch( netI ) netI {netInterface} netI manages all netcommunication. netInterface uses the class netCommunication to send/receive data. The class processManager is used to dispatch processes to be used by netInterface. It uses the class netDataControl to store data and keep track of information about status of data concerning netcommunication. It can use the class dataConverter to prepare data before sending and the class dataScaler to scale/unscale data after/before sending. As mentioned above, values for the algorithm convergence parameters are kept in the input files mainconstants and hjconstants. These parameters are: rho {a double} This is a user-supplied convergence parameter (more detail below), which should be set to a value between 0.0 and 1.0. Larger values of rho give greater probability of convergence on highly nonlinear functions, at a cost of more function evaluations. Smaller values of rho reduces the number of evaluations (and the program running time), but increases the risk of nonconvergence. See below. epsilon {a double} This is the criterion for halting the search for a minimum. When the algorithm begins to make less and less progress on eachiteration, it checks the halting criterion: if the stepsize is below epsilon, terminate the iteration and return the current best estimate of the minimum. Larger values of epsilon (such as 1.0e-4) give quicker running time, but a less accurate estimate of the minimum. Smaller values of epsilon (such as 1.0e-7) give longer running time, but a more accurate estimate of the minimum. MAXITER {an integer} A second halting criterion. If the algorithm uses more function evaluations than MAXITER, terminate the iteration and return the current best estimate of the minimum. HOOKE_ITER {an integer} Total function evaluations in all runs of the algorithm. If this value is greater than MAXITER, repeated runs of the algorithm are done. Each run uses at most MAXITER function evaluations and the starting point each time is the current best point. The user supplied objective function f(x) has to support pvm. Its argument is x - an array of doubles, the point at which f(x) should be calculated and it returns a double. All information about the objective function are a part of netI. The following functions from netI are used: startNewDataGroup(s) Creates a new data group of size s. This group keeps all points sent to the objective function as arguments, which function value they have and an ID for them. stopUsingDataGroup() Destructor for data group. getNumOfVarsInDataGroup() Returns the number of coordinates in the argument for the objective function. getTotalNumProc() Tells, how many processors are available. getNumFreeProcesses() Returns number of available free processes. sendToAllIdleHosts() Sends jobs to all processors available. That is, sends points to all available processors, for evaluating the function values. getNumNotAns() Number of all points that have been sent, where the function value has not yet received. sendOne() Sends one point to the objective function. send_receiveAllData() Sends all uncalculated points in the data group to processors and recieves them function values as well. receiveOne() Recieves one function value. receiveAll() Recieves all function value, not yet received. getRecievedID() Gets the ID of the point, which function value was received last. getY( returnID ) Returns the function value of the point with ID returnID. getX( returnID ) Returns the point with ID returnID. getInitialX() Returns the starting point, that is the best point know, before the algorithm starts. getUpperScaleConstant() For getting the upper bounds. getLowerScaleConstant() For getting the lower bounds. setX(x) Sets x as the next point in the data group. setBestX(x) Sets x as the point with the best value found, so far. rho, the algorithm convergence control: The algorithm works by taking "steps" from one estimate of a minimum, to another (hopefully better) estimate. Taking big steps gets to the minimum more quickly, at the risk of "stepping right over" an excellent point. The stepsize is controlled by a user supplied parameter called rho. At each iteration, the stepsize is multiplied by rho (0 < rho < 1), so the stepsize is successively reduced. Small values of rho correspond to big stepsize changes, which make the algorithm run more quickly. However, there is a chance (especially with highly nonlinear functions) that these big changes will accidentally overlook a promising search vector, leading to nonconvergence. Large values of rho correspond to small stepsize changes, which force the algorithm to carefully examine nearby points instead of optimistically forging ahead. This improves the probability of convergence. The stepsize is reduced until it is equal to (or smaller than) epsilon. So the number of iterations performed by Hooke-Jeeves is determined by rho and epsilon: rho**(number_of_iterations) = epsilon In general it is a good idea to set rho to an aggressively small value like 0.5 (hoping for fast convergence). Then, if the user suspects that the reported minimum is incorrect (or perhaps not accurate enough), the program can be run again with a larger value of rho such as 0.85, using the result of the first minimization as the starting guess to begin the second minimization. The parameters of the point are randomly ordered now and then, to avoid the order of them having an influence on which changes are accepted and which are not. As this means that no two runs are exactly the same (even though all parameters and constants are unchanged), running the program at least twice on every point might be a good idea. Normal use: (1) Code your function f() in the C language and connect it to pvm. (2) Install your starting guess {or read it in}. (3) Run the program. (4) {for the skeptical}: Use the computed minimum as the starting point for another run (set the value of HOOKE_ITER equal 2*MAXITER). Data Fitting: Code your function f() to be the sum of the squares of the errors (differences) between the computed values and the measured values. Then minimize f() using Hooke-Jeeves. EXAMPLE: you have 20 datapoints (ti, yi) and you want to find A,B,C such that (A*t*t) + (B*exp(t)) + (C*tan(t)) fits the data as closely as possible. Then f() is just f(x) = SUM (measured_y[i] - ((A*t[i]*t[i]) + (B*exp(t[i])) + (C*tan(t[i]))))^2 where x[] is a 3-vector consisting of {A, B, C}. The author of this software is M.G. Johnson. Permission to use, copy, modify, and distribute this software for any purpose without fee is hereby granted, provided that this entire notice is included in all copies of any software which is or includes a copy or modification of this software and in all copies of the supporting documentation for such software. THIS SOFTWARE IS BEING PROVIDED "AS IS", WITHOUT ANY EXPRESS OR IMPLIED WARRANTY. IN PARTICULAR, NEITHER THE AUTHOR NOR AT&T MAKE ANY REPRESENTATION OR WARRANTY OF ANY KIND CONCERNING THE MERCHANTABILITY OF THIS SOFTWARE OR ITS FITNESS FOR ANY PARTICULAR PURPOSE. 2) Simulated Annealing ABSTRACT: Simulated annealing is a global optimization method that distinguishes different local optima. Starting from an initial point, the algorithm takes a step and the function is evaluated. When minimizing a function, any downhill step is accepted and the process repeats from this new point. An uphill step may be accepted (thus, it can escape from local optima). This uphill decision is made by the Metropolis criteria. As the optimization process proceeds, the length of the steps decline and the algorithm closes in on the global optimum. Since the algorithm makes very few assumptions regarding the function to be optimized, it is quite robust with respect to non-quadratic surfaces. The degree of robustness can be adjusted by the user. In fact, simulated annealing can be used as a local optimizer for difficult functions. Author h2zr1001@vm.cis.smu.edu Changes for parallel use Kristjana Yr Jonsdottir Thordis Linda Thorarinsdottir This code is a pvm version of a translation of a fortran code, which is an example of the Corana et al. simulated annealing algorithm for multimodal and robust optimization as implemented and modified by by Goffe et al. Use the sample function from Judge with the following suggestions to get a feel for how SA works. When you've done this, you should be ready to use it on most any function with a fair amount of expertise. 1. Run the program as is to make sure it runs okay. Add some print functions and see how it optimizes as temperature (T) falls. Notice how the optimal point is reached and how falling T reduces VM. 2. Look through the documentation to SA so the following makes a bit of sense. In line with the paper, it shouldn't be that hard to figure out. The core of the algorithm is described on pp. 4-6 and on pp. 28. Also see Corana et al. pp. 264-9. 3. To see the importance of different temperatures, try starting with a very low one (say T = 10E-5). You'll see (i) it never escapes from the local optima (in annealing terminology, it quenches) & (ii) the step length (VM) will be quite small. This is a key part of the algorithm: as temperature (T) falls, step length does too. In a minor point here, note how VM is quickly reset from its initial value. Thus, the input VM is not very important. This is all the more reason to examine VM once the algorithm is underway. 4. To see the effect of different parameters and their effect on the speed of the algorithm, try RT = .95 & RT = .1. Notice the vastly different speed for optimization. Also try NT = 20. Note that this sample function is quite easy to optimize, so it will tolerate big changes in these parameters. RT and NT are the parameters one should adjust to modify the runtime of the algorithm and its robustness. 5. Try constraining the algorithm with either LB or UB. Synopsis: This routine implements the continuous simulated annealing global optimization algorithm described in Corana et al.'s article "Minimizing Multimodal Functions of Continuous Variables with the "Simulated Annealing" Algorithm" in the September 1987 (vol. 13, no. 3, pp. 262-280) issue of the ACM Transactions on Mathematical Software. A very quick (perhaps too quick) overview of SA: SA tries to find the global optimum of an N dimensional function. It moves both up and downhill and as the optimization process proceeds, it focuses on the most promising area. To start, it randomly changes one parameter at a time within the step length VM (a vector of length N) of the user selected starting point. The function is evaluated at these trial points and its value is compared to its value at the initial point. In a maximization problem, all uphill moves are accepted and the algorithm continues from that trial point. Downhill moves may be accepted; the decision is made by the Metropolis criteria. It uses T (temperature) and the size of the downhill move in a probabilistic manner. The bigger T and the smaller size of the downhill move are, the more likely that move will be accepted. If the trial is accepted, the algorithm moves on from that point. If it is rejected, another point is chosen instead for a trial evaluation. Each element of VM is periodically adjusted so that half of all function evaluations in that direction are accepted. A fall in T is imposed upon the system with the RT variable by T(i+1) = RT*T(i) where i is the ith iteration. Thus, as T declines, downhill moves are less likely to be accepted and the percentage of rejections rise. Given the scheme for the selection for VM, VM falls. Thus, as T declines, VM falls and SA focuses upon the most promising area for optimization. The parameters of the point are randomly ordered now and then, to avoid the order of them having an influence on which changes are accepted and which are not. As this means that no two runs are exactly the same (even though all parameters and constants are unchanged), running the program at least twice on every point might be a good idea. The importance of the parameter T: The parameter T is crucial in using SA successfully. It influences VM, the step length over which the algorithm searches for optima. For a small intial T, the step length may be too small; thus not enough of the function might be evaluated to find the global optima. The user should carefully examine VM to make sure that it is appropriate. The relationship between the initial temperature and the resulting step length is function dependent. To determine the starting temperature that is consistent with optimizing a function, it is worthwhile to run a trial run first. Set RT = 1.5 and T = 1.0. With RT > 1.0, the temperature increases and VM rises as well. Then select the T that produces a large enough VM. For modifications to the algorithm and many details on its use, (particularly for econometric applications) see Goffe, Ferrier and Rogers, "Global Optimization of Statistical Functions with the Simulated Annealing," Journal of Econometrics (forthcomming). For a pre-publication copy, contact Bill Goffe Department of Economics Southern Methodist University Dallas, TX 75275 h2zr1001 @ smuvm1 (Bitnet) h2zr1001 @ vm.cis.smu.edu (Internet) As far as possible, the parameters here have the same name as in the description of the algorithm on pp. 266-8 of Corana et al. Input Parameters: Note: The suggested values generally come from Corana et al. To drastically reduce runtime, see Goffe et al., pp. 17-8 for suggestions on choosing the appropriate RT and NT. Input Parameters for class Simann: netInt (netInterface) netInt manages all netcommunication. netInterface uses the class netCommunication to send/receive data. The class processManager is used to dispatch processes to be used by netInterface. It uses the class netDataControl to store data and keep track of information about status of data concerning netcommunication. It can use the class dataConverter to prepare data before sending and the class dataScaler to scale/unscale data after/before sending. MAXIM Denotes whether the function should be maximized or minimized. A true value denotes maximization while a false value denotes minimization. SIM_ITER The maximum number of function evaluations. (INT) TEMPERATURE The initial temperature. See Goffe et al. for advice. (DP) c Vector that controls the step length adjustment. The suggested value for all elements is 2.0. (DP(N)) vm The step length vector. On input it should encompass the region of interest given the starting value X. For point X(I), the next trial point is selected is from X(I) - VM(I) to X(I) + VM(I). Since VM is adjusted so that about half of all points are accepted, the input value is not very important (i.e. is the value is off, SA adjusts VM to the correct value). (DP(N)) Other input parameters from mainconstants: C_COMPONENT Size of each component of the step length adjustment, that is the value of the components of c. VM_COMPONENT The initial value of the components of vm (as the initial value is not considered very important, all components have the same initial value). Input parameters from simconstants: RT The temperature reduction factor. The value suggested by Corana et al. is .85. See Goffe et al. for more advice. (DP) EPS Error tolerance for termination. If the final function values from the last neps temperatures differ from the corresponding value at the current temperature by less than EPS and the final function value at the current temperature differs from the current optimal function value by less than EPS, execution terminates and IER = 0 is returned. (EP) NS Number of cycles. After NS*N function evaluations, each element of VM is adjusted so that approximately half of all function evaluations are accepted. The suggested value is 20. (INT) NT Number of iterations before temperature reduction. After NT*NS*N function evaluations, temperature (T) is changed by the factor RT. Value suggested by Corana et al. is MAX(100, 5*N). See Goffe et al. for further advice. (INT) NEPS Number of final function values used to decide upon termi- nation. See EPS. Suggested value is 4. (INT) The upper and lower bounds are part of netInt. If the algorithm chooses X(I) .LT. LB(I) or X(I) .GT. UB(I), I = 1, N, a point from inside is randomly selected. This focuses the algorithm on the region inside UB and LB. Unless the user wishes to concentrate the search to a particular region, UB and LB should be set to very large positive and negative values, respectively. Note that the starting vector X should be inside this region. Also note that LB and UB are fixed in position, while VM is centered on the last accepted trial set of variables that optimizes the function. Parameters within the algorithm, that can be used to follow how the algorithm is working: x The variables that optimize the function at each time. (DP(N)) fopt The optimal value of the function at each time. (DP) nacc The number of accepted function evaluations. (INT) nnew The number of times the algorithm finds a new optimum. (INT) nup The number of function values that are better than the last one. (INT) ndown The number of accepted function values that are worse than the last one. (INT) nrej The number of not accepted function values. (INT) nfcnev The total number of function evaluations. In a minor point, note that the first evaluation is not used in the core of the algorithm; it simply initializes the algorithm. (INT). nobds The total number of trial function evaluations that would have been out of bounds of LB and UB. Note that a trial point is randomly selected between LB and UB.(INT) The user supplied objective function f(x) has to support pvm. Its argument is x - an array of doubles, the point at which f(x) should be calculated and it returns a double. All informations about the objective function are a part of netInt. The following functions from netInt are used: startNewDataGroup(s) Creates a new data group of size s. This group keeps all points sent to the objective function as arguments, which function value they have and an ID for them. stopUsingDataGroup() Destructor for data group. getNumOfVarsInDataGroup() Returns the number of coordinates in the argument for the objective function. getTotalNumProc() Tells, how many processors are available. sendToAllIdleHosts() Sends jobs to all processors available. That is, sends points to all available processors, for evaluating the function values. getNumNotAns() Number of all points that have been sent, where the function value has not yet received. send_receiveAllData() Sends all uncalculated points in the data group to processors and recieves them function values as well. receiveOne() Recieves one function value. receiveAll() Recieves all function value, not yet received. getRecievedID() Gets the ID of the point, which function value was received last. getY( returnID ) Returns the function value of the point with ID returnID. getX( returnID ) Returns the point with ID returnID. getInitialX() Returns the starting point, that is the best point know, before the algorithm starts. getUpperScaleConstant() For getting the upper bounds. getLowerScaleConstant() For getting the lower bounds. setX(x) Sets x as the next point in the data group. setBestX(x) Sets x as the point with the best value found, so far. unscaleX(x) If working with scaled points, this returns the point x unscaled, other- wize no change. 3) BFGS ABSTRACT: The BFGS-method is a gradient iterative method that work as follows: We start at some point and successively generate new points, so that the function value decreases in each iteration. This is done by choosing a direction that makes an angle greater than 90 degrees with the gradient of the function. Authors Gunnar Stefansson Kristjana Yr Jonsdottir Thordis Linda Thorarinsdottir The BFGS-method is one of the Quasi-Newton methods. Quasi-Newton methods are iterative methods which approximate Newton's method without calculating second derivatives. They are of the form x^k+1 = x^k + a^k * d^k, d^k = -D^k \nabla f( x^k ), where D^k is a positive definite matrix, updated using the BFGS updating formula. To find the value of a, an inexact linesearch called the Armijo rule is used. That method finds an a such that Armijo condition is satiefied, which is based on getting a function value that is some amount better than the best know function value. This is done by trying at first an initial stepsize an than reducing the stepsize until a value that satisfies the Armijo condition is found. For details, see D.P. Bertsekas, "Nonlinear Programming", Athena Scientific, Belmont, 1999. As the objective function is not necessarily differentiable, the gradient is calculted numerically. Input parameters, from file bfgsconstants: BFGSPAR 1 for using the BFGS method and 0 for using the steepest decent method. It is sometimes good to use the later one for a few iterations if the point is stuck. MAXBFGSITER Maximum number of iterations for each round of the minimization. MAXROUNDS Maximum number of rounds made. ERRORTOL Error tolerance for the termination criteria. XTOL Error tolerance for the point. SHANNONSCALING 1 if the direction vector shall be scaled, otherwize 0. If 1, than Shannon Scaling is used to scale the direction vector. DIFFICULTGRAD denotes how good the approximation of the gradient should be. If = 0, a first degree polynomial is used. If = 1, the approximation is parabolic and if >= 2, a polynomial of degree four is used. If the line search gives no results, the difficultgrad is increased. BETA beta constant for Armijo condition in linesearch. See D.P. Bertsekas, "Nonlinear Programming", Athena Scientific, Belmont, 1999, page 29. SIGMA sigma constant for Armijo condition in linesearch. See D.P. Bertsekas, "Nonlinear Programming", Athena Scientific, Belmont, 1999, page 29. PRINTING 1 for printing on, and 0 for printing off. If printing is on, the last gradient calulated is printed out to the file "gradient", the last hessian matrix is printed out to the file "hessian" and the best point is printed out to the file "finalvals". The user supplied objective function f(x) has to support pvm. Its argument is x - an array of doubles, the point at which f(x) should be calculated and it returns a double. All informations about the objective function are a part of netInt. The following functions from netInt are used: startNewDataGroup(s) Creates a new data group of size s. This group keeps all points sent to the objective function as arguments, which function value they have and an ID for them. stopUsingDataGroup() Destructor for data group. getNumOfVarsInDataGroup() Returns the number of coordinates in the argument for the objective function. getNumDataItemSet() Number of data items set in data group. getNumDataItemsAnswered() Number of data items set, which have received answer too. dataGroupFull() Returns 1 if not able to add more data to current datagroup, else returns 0. getTotalNumProc() Tells, how many processors are available. getNumFreeProcesses() Returns number of available free processes. getNumNotAns() Number of all points that have been sent, where the function value has not yet received. send_receiveAllData() Sends all uncalculated points in the data group to processors and recieves them function values as well. send_receive_setData(c) Sends, receives and sets new data until condition c is met. Returns ERROR if error, or 1 if condition was met. send_receiveTillCondition(c) Sends and receives data until condition c is met, or there is no more available data in the data group. Returns ERROR if error, 1 if condition was met and 0 if condition not met and all data belonging to datagroup has been sent and received. getRecievedID() Gets the ID of the point, which function value was received last. getY( returnID ) Returns the function value of the point with ID returnID. getNextAnswerY() Returns the function value in the last data pair found while looking for a datapair with an answer. getX( returnID ) Returns the point with ID returnID. getNextAnswerX() Returns the next x found in datagroup which has an answer since last checked for answer. (x0, fx0), (x1, ), (x2, fx2), ...., (xn,fxn). If the datapair (x0, fx0) was last checked for an answer then x will equal x2 and the next datapair that will be looked at for answer will be (x3, fx3) unless the search is reset to start at the beginning. getInitialX() Returns the starting point, that is the best point know, before the algorithm starts. getUpperScaleConstant() For getting the upper bounds. getLowerScaleConstant() For getting the lower bounds. setDataPair(x, f) Adds the datapair (x, f) to datagroup. x will not be sent for processing through the net as f is stored as its answer. setX(x) Sets x as the next point in the data group. setXFirstToSend(x) Adds x to datagroup and puts it first in the queue for sending. setFirstAnsweredData() Initiates a datapointer to the data item which was first set in data group. setBestX(x) Sets x as the point with the best value found, so far. unscaleX(x) If working with scaled points, this returns the point x unscaled, other- wize no change. stopNetComm() Halts all netcommunication, netInt can not send/receive anymore.