# HumpDay: A Package to Help You Choose a Python Global Optimizer

Published on February 13, 2021

This post continues my quixotic quest, begun in Comparing Global Optimization Packages, to locate the best, practical derivative-free global optimizer packages that can be easily used from Python. I wrote a small package called HumpDay to make this less painful. It provides a simple, unified calling syntax into some - but by no means all - functionality from a dozen of the most popular Python global optimization packages:

    best_val, best_x = strategy(objective, n_trials, n_dim)

The sister repository optimizer-elo-ratings uses this to assess performance. That isn't an easy task but HumpDay (camel case, of course) contains a growing list of challenges that I hope become more representative over time. For now, it uses the classic functions you are familiar with such as griewank, rastrigin, and their ilk. See objectives/classic.py for the full list.

In this post, I update recommendations and explain a new Elo rating methodology for optimizers. Perhaps more importantly, you can use the humpday/comparison utilities (or just open this notebook in colab) and the simplicity of the common syntax, to quickly do your own bespoke evaluations of optimizers from different packages and, thereby, arrive at one or more likely to serve your purposes, not mine.

There is an important limitation at present: these comparisons are probably only good for continuous parameter spaces. Of course you can map anything to the hypercube, but the optimizers won't know that and you'll likely waste function evaluations. A more subtle drawback is that if your domain is continuous but long and skinny, then there may be some distortion when you transform the domain to the cube.

### Popular Python Optimization Packages

One has quite a choice these days, as the table below illustrates. No doubt it is incomplete. And not all popular optimization packages are included in HumpDay at present, or have Elo ratings assigned.

| Package name (PyPI) | Downloads | Elo rating? | Notes |
|---|---|---|---|
| scipy | 20,516,412* | Elo | |
| scikit-optimize | 562,604 | Elo | |
| hyperopt | 525,297 | Elo | reviewed |
| cmaes | 312,860 | Elo | reviewed; under the optuna hood |
| ray | 246,120* | | ray.tune |
| gpyopt | 240,550 | | |
| ortools | 173,678* | | |
| optuna | 166,445 | Elo | reviewed |
| bayesian-optimization | 78,512 | Elo | notebook |
| dlib | 77,472* | Elo | notebook, blog, paper, post |
| cma | 52,000 | | |
| ax-platform | 22,144* | Elo | |
| nlopt | 20,359 | Elo | notebook, docs (be wary of uncaught C++ exceptions :-) |
| swarmpackagepy | 16,316 | | |
| hpbandster | 13,174 | | Implements bohb |
| rbfopt | 12,634 | | |
| gekko | 11,680 | | mixed int |
| zoopt | 11,433 | | paper |
| rbfopt | 11,096 | | IBM |
| simanneal | 10,549 | | |
| optlang | 8,564 | | |
| nevergrad | 8,486 | Elo | |
| spotpy | 7,477 | | |
| pyswarms | 7,440 | | |
| py-bobyqa | 5,328 | | paper, NAG (also dfols for LS) |
| hebo | 5,300 | | |
| shgo | n/a | Elo | reviewed; part of scipy now |
| pymoo | 3,241 | Elo | reviewed; multi |
| pySOT | 3,094 | Elo | reviewed |
| platypus-opt | 2,823 | Elo | reviewed |
| bolib | 2,666 | | |
| swarmlib | 2,130 | Elo | toy |
| auptimizer | 2,225 | | orchestration |
| bayesianoptimization | 1,921 | | |
| gaft | 1,725 | | genetic |
| ipopt | 1,706 | | COIN-OR |
| mealpy | 1,615 | | meta |
| mystic | 1,533 | | docs |
| niapy | 1,507 | | |
| proxmin | 1,203 | | paper; coordinate-wise cvx |
| bayeso | 986 | | |
| hebo | 819 | | bbo 1st place; original in github |
| yabox | 623 | | "Yet another..." |
| dfogn | 611 | | NAG, paper |
| pypesto | 523 | | param est |
| mipego | 521 | | mixed integer |
| opentuner | 482 | | site |
| pyriad | 396 | | pytorch |
| pygpgo | 390 | | |
| oasis-optimization | 314 | | docs |
| dfoalgos | 295 | | rank-based |
| heuristic-optimization | 283 | | swarm |
| pydogs | 244 | | |
| ultraopt | 222 | Elo | notebook |
| pysmac | 221 | | |
| arm-mango | 195 | | slides |
| pyopus | | | |
| ssb-optimize | 100 | | |
| pdfo | 92 | | Powell |
| tgo | 88 | | Supplanted by shgo, which is reviewed |
| solidpy | 50 | | |

The following are not on PyPI:

• bayesianevolution
• rapids-ai-BBO-2nd-place-solution (NVIDIA)
• pagmo (algo list)
• spearmint (murky license; only on github)
• simple (simplex domain only)
• turbo_bbo_neurips_2020

As with the other lists here, from which this was copied, a download number carries an asterisk if it is part of a larger general purpose package. Download stats are dubious and easily manipulated. Please file an issue if you would like more included.

#### A Painful Choice

The pain of choosing an optimizer centers on the fact that between scipy.optimize, dlib, ax-platform, hyperopt, nevergrad, optuna, bayesopt, platypus, pymoo, pySOT and skopt (and more by the time you read this) there is barely a convention in common. I was about to say that skopt is the exception because it adopts the scipy conventions - now there's a novel idea - but actually it doesn't quite. Aside from minor naming differences, it can be fooled into optimizing on the corners instead of a continuum, if you are not careful, when you send it precisely the same bounds that work perfectly in scipy.optimize.

That's a trite example but if you read into the HumpDay package, you'll be stunned at how many ways there are to set up an optimization problem in code. Compare optuna's style here to, say, pymoo and you'll see what I mean - they are both quite marked departures from the functional scipy style, and yet depart in quite different ways. That isn't a bad thing, as innovation in the design of optimizer interfaces is important - Optuna's is rather elegant I think - but it does make performance comparisons awkward. I strongly suspect that most people don't compare optimizers, but just pick a package based on hearsay and stick with it.

Unfortunately, if you read my previous post Comparing Python Global Optimizers, you'll know that a Google search can be a terrible guide. Hyperopt is downloaded more than twice as often as the nearest competitor, but I'm not sure it even makes my top ten. On the flip side, the pySOT surrogate optimization package is a quiet achiever, ranking a lowly fourteenth by download popularity (and too humble to assign itself a logo). I suspect that's because of the ceremony involved and run time, but if your objective is truly expensive, maybe try "dycors" or "srbf".

The HumpDay package also makes amends for some notable omissions in my initial look, Comparing Python Global Optimizers (post). I somehow left out two optimizers provided by Facebook - ax-platform and nevergrad. Sorry folks! You'll also find consistent signatures for scikit-optimize (a.k.a. skopt), ultraopt, dlib and bayesian-optimization that didn't appear previously.

#### Canonical Signatures

Obviously, I got fed up and decided to cut the Gordian knot, or at least one part of the knot: continuous domains. So in HumpDay you will find more than fifty different optimization strategies spanning the usual suspects inspired by nature, Bayesian surrogate methods, homology, Parzen and hybrid approaches (not to mention, as they say on the radio, all the hits from the sixties, seventies and eighties - pattern search ain't broken).

Implementations are drawn from reasonably popular Python optimization packages, and within each, I've tried to span a reasonable set of choices. Each is rendered as a simple function, even simpler than the scipy signature:

    best_val, best_x = strategy(objective, n_trials, n_dim)

I was minimalist here, so objective is a function taking the hyper-cube as its domain. At least for me there isn't a huge loss of generality, since the domain can usually be mapped to a hypercube in some fashion. This exercise has only one real hazard: the potential to trigger long-buried memories of Complex Analysis exams - though obviously the mapping from one domain to another need not preserve angles! Indeed, the case has been made that for Lipschitz/Hölder functions one doesn't even need to preserve dimension. See Lera and Sergeyev pdf for one example - though this is a whole sub-field.

I hope that as far as your views on optimizers go, the standardization provided by the HumpDay package makes it operationally trivial for you to form a weak prior at least, as to what might be useful. Suppose you have an objective function that captures the essence of many similar tasks you face, or a collection of the same. To run one optimizer:

    from humpday.optimizers.nevergrad import nevergrad_ngopt_cube
    from humpday.objectives.classic import salomon_on_cube

    best_val, best_x, feval_count = nevergrad_ngopt_cube(salomon_on_cube, n_trials=100, n_dim=3, with_count=True)

You can then switch seamlessly from nevergrad to ax-platform to dlib or whatever you can find in humpday/optimizers. Or, to test all the strategies from one package against some pre-canned objective functions, you can write something like:

    from humpday.optimizers.pysotcube import PYSOT_OPTIMIZERS
    from humpday import CLASSIC_OBJECTIVES

    for objective in CLASSIC_OBJECTIVES:
        for optimizer in PYSOT_OPTIMIZERS:
            print((optimizer.__name__, optimizer(objective, n_trials=100, n_dim=3, with_count=True)))

Similarly one can, given enough time, test all strategies from all optimization packages:

    from humpday import OPTIMIZERS
    from humpday import CLASSIC_OBJECTIVES

    for objective in CLASSIC_OBJECTIVES:
        for optimizer in OPTIMIZERS:
            print((optimizer.__name__, optimizer(objective, n_trials=100, n_dim=3, with_count=True)))


Obviously, you could replace CLASSIC_OBJECTIVES with your own list of objective functions. To emphasize, an objective is a function taking just one argument - a single list or vector describing a point in the hypercube [0,1]^n. I dare say functools.partial may be your friend, and there are a few trivial transforms provided to squish domains as you need to.
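As a sketch of the squishing idea, here is one way functools.partial can wrap an objective defined on an arbitrary box so that it accepts points on the unit cube instead. The names on_cube and my_objective are hypothetical, not HumpDay's own transforms:

```python
import functools
import math

def on_cube(objective, lows, highs):
    """Wrap an objective defined on the box [lows, highs] so that it
    accepts points u in the unit hypercube [0, 1]^n instead."""
    def wrapped(u, _objective, _lows, _highs):
        # Affine map from the cube back to the original box
        x = [lo + ui * (hi - lo) for ui, lo, hi in zip(u, _lows, _highs)]
        return _objective(x)
    return functools.partial(wrapped, _objective=objective, _lows=lows, _highs=highs)

# Example: an objective whose natural domain is [-5, 5] x [0, 10]
def my_objective(x):
    return x[0] ** 2 + math.sin(x[1])

cube_objective = on_cube(my_objective, lows=[-5, 0], highs=[5, 10])
print(cube_objective([0.5, 0.0]))  # evaluates my_objective at (0, 0)
```

Any strategy in humpday/optimizers can then be handed cube_objective directly, since it lives on [0,1]^2 as required.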

### Top Performers

Though your bespoke task is what matters, it is probably useful to glean some overall impressions about which optimizers are performing well on standardized tasks. The Elo ratings, presented with no expense spared at optimizer-elo-ratings, are an attempt to get around the assessment problem caused by "naughty" optimizers - those who insist on performing more objective function evaluations than they are instructed to use. There are lots of caveats and many Elo ratings are yet to settle as I write, but that said, the following packages and strategies seem to be performing well.

For readers of my previous post, which tried to get at optimizer performance in a different way, there are some familiar characters here and no huge surprises - though I repeat the caveat that some techniques (e.g. surrogate) might be favored by the somewhat stylized objective functions I am using here. The flexibility of other techniques (e.g. parallelism, discrete optimization, nested domains, et cetera) isn't allowed to shine. My intent isn't to downplay the cleverness of design features of these packages.

As more real-world objective functions are introduced, it will be interesting to see how the Elo ratings change. I refer you to the current leaderboard, as I won't be attempting to keep recommendations here up to date manually. The new Facebook inclusions are doing pretty well - which is a relief given that I've not been entirely helpful promoting their signature time series package (see prophet post - just trying to help here). Facebook organized an open competition last year to improve nevergrad and IOHprofiler (link) and I'm sure the optimizers will get even better over time. They also cater to a wide range of dimensions and domains.

However, at the risk of being booted from the nevergrad Facebook group my new go-to is dlib. It uses the approach developed by Malherbe and Vayatis (paper) as noted above, which is annoyingly elegant. There is a nice video on the blog and a very clear explanation. Not only is the performance extremely good, but the run time is likely to be negligible for any real world, expensive objective functions. The library won't let you run problems with more than 34 dimensions, but that seems to be the only limitation. The only drawback I can think of (for my particular domain) is constantly wishing I had come up with it myself.

Elo ratings are flawed in the same way that Value at Risk is flawed. I don't suggest that "most likely to get the lowest value for a fixed number of function evaluations" is everyone's cost function. There are some other things to be aware of too, such as the fact that some optimizers, including SHGO, really struggle to limit themselves to a fixed number of evaluations and that can hurt them in head-to-head matches when playing Black. Whether that penalty is considered fair depends on your needs. I'll go into a little bit more detail on the methodology in a moment, to illuminate these points.

#### Recommendations

First let me mention that you can poke around in humpday/comparison for utilities that will generate recommended optimizers. These are only as good as the category Elo ratings behind them, which need more time to equilibrate, but say you want some quick advice on which optimizers are the best performing at 8 dimensional problems when given 130 function evaluations to play with. Perhaps you want to limit total computation time of the optimizer (as distinct from the objective function) to 5 minutes. Use:

    suggest(n_dim=8, n_trials=130, n_seconds=300)

The category Elo ratings are in the process of being computed by dimension and function evaluation limit. I've tried to warm them up a little, but they are really intended to be a free byproduct of testing. Go up one directory in the ratings repository to find more. For some other time-saving utilities I'd refer you to the HumpDay README.md, as that is likely to be better maintained than this blog article. For instance:

    import math
    import time
    from humpday import recommend

    def my_objective(u):
        time.sleep(0.01)
        return u[0] * math.sin(u[1])

    recommendations = recommend(my_objective, n_dim=21, n_trials=130)

As this particular function is very fast, some optimizers will be culled from the list. There's no point spending a long time updating a surrogate function if you can sample the objective many times instead.
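The culling idea can be sketched in a few lines. This is not how HumpDay's recommend actually decides, and cull_by_objective_cost and the per-trial overhead figures are made up for illustration:

```python
import time

def cull_by_objective_cost(objective, u0, overhead_per_trial):
    """Hypothetical sketch: time one objective evaluation, then drop any
    strategy whose assumed per-trial internal overhead dwarfs it.
    `overhead_per_trial` maps a strategy name to a rough seconds-per-trial
    figure (invented here, not measured by HumpDay)."""
    start = time.perf_counter()
    objective(u0)
    eval_cost = time.perf_counter() - start
    # Keep a strategy only if its bookkeeping costs no more than ~10
    # further evaluations of the objective itself.
    return [name for name, overhead in sorted(overhead_per_trial.items())
            if overhead <= 10 * eval_cost]

# A fast objective: a surrogate-heavy strategy is not worth its overhead
def fast_objective(u):
    time.sleep(0.001)
    return sum(ui ** 2 for ui in u)

kept = cull_by_objective_cost(fast_objective, [0.5, 0.5],
                              {'gp_surrogate': 5.0, 'random_cube': 1e-6})
```

With an expensive objective the inequality flips, and the surrogate earns its keep.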

Hopefully this speeds your search or encourages you to try some packages you would otherwise overlook. I suppose, in addition, you could continue to use HumpDay after you find your favorite optimizer, though that comes with some pip dependency baggage and serious limitations so I don't really recommend it. The intent is modest here: to provide you with a quick shortlisting of Python optimization packages that might be suitable to your bespoke task, and some templates to get you over the hump (groan) of moving from one syntax to another. At minimum it might provide you with some assurance that in continuing with the one you know and like, you're not missing much.

### How Matches are Officiated

If you are familiar with Elo ratings for chess, and I'm guessing that many of you are, then the only question you will ask is, "How are matches arranged between optimizers?". That is laid out in the code, and you can even watch games occurring (I'm kidding ... I think) or inspect the history. But here's the English version.

1. We choose an objective function at random from the list. See the previous article Comparing Python Global Optimizers for an explanation of some of these.
2. We select a random dimension n_dim, from the Fibonacci numbers (because ... no reason) and a maximum number of function evaluations, referred to as n_trials in the code.
3. We choose two optimization strategies at random. One to play White, the other Black. To be clear, an optimization strategy comprises a choice of optimization package, such as optuna, and also the fixing of any other choices or parameters that are required (in this case a choice of sampler - such as Tree-structured Parzen Estimator or TPE).
4. The White player is instructed to minimize the objective function using at most n_trials function evaluations. If they go over, we try to fool the optimizer by lowering the parameter that (supposedly) controls the number of function evaluations (but evidently, doesn't quite do so). We repeat until the actual number of function evaluations, call it white_feval_count, is not more than n_trials. Each time we fail, we reduce the parameter by 30%.
5. Assuming success, it is now Black's turn. We perform a similar procedure where n_trials is swapped out for white_feval_count, should they be different. And this time, we allow Black to exceed white_feval_count by ten percent. We also give Black finer increments, and more attempts.
6. Hopefully, both optimizers obey our command one way or another and in doing so, minimize a function using roughly the same number of function evaluations. A winner is determined according to the lowest function minimum that is found. However, if the proportional difference is very small, a draw is declared.

Sometimes a match will be declared void if step 4 or step 5 fails, and one optimizer fails to constrain itself appropriately. This occurs if the White player exceeds n_trials repeatedly or if the Black player exceeds the white_feval_count by more than 10%, despite strong encouragement. A match is also void if either algorithm crashes. There is no other penalty, at present.
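Steps 4 through 6 can be sketched as follows. The helper names and the run(limit) interface are stand-ins of mine; the real logic lives in the optimizer-elo-ratings repository:

```python
def coax_feval_count(run, budget, shrink=0.7, max_attempts=5, slack=1.0):
    """Steps 4-5, sketched: repeatedly lower the parameter that (supposedly)
    caps evaluations until the actual count fits within slack * budget.
    `run(limit)` is a hypothetical stand-in returning (best_val, feval_count)."""
    limit = budget
    for _ in range(max_attempts):
        best_val, feval_count = run(limit)
        if feval_count <= slack * budget:
            return best_val, feval_count
        limit = int(limit * shrink)  # e.g. reduce the parameter by 30% and retry
    return None  # never complied: the match is void


def officiate(white_run, black_run, n_trials, draw_tol=1e-6):
    """Step 6: White plays first with the full budget; Black must then match
    White's actual count, with 10% slack, finer increments and more attempts.
    Lowest minimum wins; near-ties are draws."""
    white = coax_feval_count(white_run, n_trials)
    if white is None:
        return 'void'
    white_val, white_feval_count = white
    black = coax_feval_count(black_run, white_feval_count,
                             shrink=0.9, max_attempts=8, slack=1.1)
    if black is None:
        return 'void'
    black_val, _ = black
    if abs(white_val - black_val) <= draw_tol * max(1.0, abs(white_val), abs(black_val)):
        return 'draw'
    return 'white' if white_val < black_val else 'black'
```

An obedient optimizer passes through coax_feval_count once; a "naughty" one gets squeezed until it fits, or voids the match.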

The rest is just standard Elo updating, but I'll save you from re-watching The Social Network if you are rusty on that topic (is it mildly ironic that we are judging Facebook algorithms with the algorithm that catalyzed The Facebook?). The Elo system assumes that if Black's prior rating exceeds White's rating by a difference of $$d$$ then White can expect to score $$\bar{p} = \frac{1}{1 + 10^{d/F}}$$ points, where a win counts as $$1.0$$, a draw $$0.5$$ and a loss $$0.0$$, and $$F$$ is a constant, called the F-factor. White actually scores, say, $$p$$ points, leading to a differential of $$\delta = p - \bar{p}$$ for White and $$-\delta$$ for Black. Both players' ratings are updated by an amount equal to this performance differential multiplied by the so-called K-factor. At time of writing, the following F-factors and K-factors are used.

| | Optimizers | c.f. Chess |
|---|---|---|
| K-factor | 60 or 30 | 25, 16 or 10 |
| F-factor | 1000 | 400 |

We take the liberty of using a larger F-factor than in chess. The rationale is that a single optimization is somewhere between an entire game of chess and a hand of poker.
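The update rule above amounts to only a few lines. A minimal sketch, using the F-factor and K-factor from the table (the function names are mine, not the repository's):

```python
def expected_points(d, F=1000):
    """Expected score for White when Black's prior rating exceeds
    White's by d, i.e. 1 / (1 + 10^(d/F)). Chess uses F = 400."""
    return 1.0 / (1.0 + 10 ** (d / F))


def update_ratings(white_elo, black_elo, score, K=30, F=1000):
    """score is White's actual result: 1.0 win, 0.5 draw, 0.0 loss.
    Both ratings move by K times the performance differential."""
    delta = score - expected_points(black_elo - white_elo, F=F)
    return white_elo + K * delta, black_elo - K * delta
```

For instance, evenly matched players who draw gain nothing, while a win against an equal opponent moves both ratings by K/2.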

### Interpreting Absolute Optimizer Elo Ratings

Unlike chess Elo ratings, the optimizer Elo scale is anchored. The random optimizer's rating is reset to 1600 after every game. For the avoidance of doubt, the random optimizer is the optimizer that chooses uniformly on the hypercube n_trials times, returning the lowest value found.
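For concreteness, the reference player is essentially the following (a minimal sketch conforming to the common signature, with a seed added for reproducibility):

```python
import random

def random_optimizer(objective, n_trials, n_dim, seed=None):
    """The anchor player: sample the hypercube uniformly n_trials times
    and return the lowest value found and where it was found."""
    rng = random.Random(seed)
    best_val, best_x = float('inf'), None
    for _ in range(n_trials):
        u = [rng.random() for _ in range(n_dim)]
        val = objective(u)
        if val < best_val:
            best_val, best_x = val, u
    return best_val, best_x
```

Any strategy rated above 1600 is, in the sense made precise below, doing better than this.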

This presumably limits ratings drift over time, and also makes it relatively easy to interpret an optimizer rating in an absolute sense. For example, using an F-factor of 1000 we can read off the expected points gained in a game when an optimizer is compared to random search. If draws are considered rare, this is essentially equivalent to the probability that an optimizer will help, rather than harm, when you employ it.

As also noted in the previous post, that particular measure may not be the best statistic. For example, you might be interested in the average time taken by an optimizer to perform reasonably well, as we were in that article. Or you might be interested in some tail weighted metric that penalizes cases where it never finds the minimum. Such metrics might lead you more towards optimizers that try to find all minima, whereas Elo ratings might not always correlate.

But with those caveats and more, here's an interpretation. The probability that an optimizer performs better than random search, assuming no draws, is as follows:

| Elo | Beat Random | Elo | Beat Random | Elo | Beat Random |
|---|---|---|---|---|---|
| 1300 | 33 % | 1600 | 50 % | 1900 | 67 % |
| 1400 | 38 % | 1700 | 56 % | 2000 | 72 % |
| 1500 | 44 % | 1800 | 61 % | 2100 | 76 % |

Now as computed, the Elo rating is more a function of matches against other optimizers than it is against the random optimizer, so take with a grain of salt. But notice that the very best optimization strategies are better than random search three times more often than not. That might be significant, depending on the economic cost of your search.

Random search isn't dreadful, and it is well appreciated that randomly choosing points from the space to be evaluated is superior to grid search, something we see all too often (see the paper by Bergstra and Bengio for a theoretical treatment). On the other hand, random search is certainly easy to implement, so one might reasonably be concerned if an optimizer is assigned a rating below 1600 - as seems to be the case for the "portfolio" algorithm in the ax-platform at present, and for several marsupials in the platypus package.

But again, there is a lot of noise in the ratings (it is an exercise for the reader to estimate how much) and, as new objective functions are introduced that are better suited to evolutionary algorithms, relative assessments will probably change. There is another reason why some optimizers might lose out to the random search on the Elo scale that the reader may have noticed. If they are not good at controlling their function evaluation budget, then the head-to-head procedure, as outlined above, may result in them using fewer function evaluations than their opponent - at least when they play Black.

### Interpreting Relative Optimizer Elo Ratings

There are theoretical flaws with the Elo system and inaccuracies in the underpinning assumptions (logarithmic odds ratios). Yet the chess world carries on and so shall we. I hope that the relative differences in Elo ratings of optimizers can provide some rough guidance, allowing you to weigh possible performance differences against a myriad of other considerations, ranging from design, ease of use, responsiveness to Github issues, generality of the search domain, ease of modification and so forth.

Here's an oddly familiar table relating differences in optimizer Elo ratings to the likelihood that one optimizer will find a lower minimum than the other.

| Elo Diff | Probability | Elo Diff | Probability | Elo Diff | Probability |
|---|---|---|---|---|---|
| -300 | 33 % | 0 | 50 % | 300 | 67 % |
| -200 | 38 % | 100 | 56 % | 400 | 72 % |
| -100 | 44 % | 200 | 61 % | 500 | 76 % |

I would not dismiss relatively small differences as economically insignificant if each evaluation corresponds to a lengthy simulation (or worse, some physical operation). However, I think this might also grant some of you the liberty to choose your package based on non-performance criteria in many instances and sleep well - say if the rating differential amounts to what might be won or lost in a single game (+/- 30 points initially, then +/- 15 later on) or two.

This exercise has convinced me further that download statistics and other measures of popularity are dreadful regressors for performance. Hopefully, more optimizers will be assigned Elo ratings in the future and again, if you have a suggestion for these lists, please file an issue or, better yet, a pull request.

### Contributing

It should be easy to make helpful PRs:

1. Just grab your favorite optimization package from those that I've missed thus far.
2. Shape it as I have the others in the optimizers directory, and make a pull request.
3. Or ... find bugs. Special thanks to Robert Kern for finding some horrendous ones already.
4. Or ... suggest better choices of method or hyper-parameters within a package.

Interesting objective functions on the hypercube are also in demand, and you can shove them here.  Incidentally, there are a couple of related projects that might pique your interest too:

• The embarrassingly library discussed here that is my somewhat experimental hack for robustness. It modifies the objective function only, so is uncoupled in an engineering sense from the optimizer itself.
• An ongoing effort to boost Elo ratings for time-series algorithms judged out of sample on live data was part of the motivation for HumpDay. HumpDay grew out of a sub-module there, but I carved it out in the interest of maintainability.

Anyway, I hope HumpDay saves you some time, simple as it is. To ensure articles like this are in your thread consider following microprediction on LinkedIn. We're trying to make powerful bespoke AI free to the world, and you are welcome to use it or contribute in large or small ways.