Some time ago, while browsing the state of the art, I stumbled upon this idea and I can't for the life of me remember which algo introduced it, in which paper it was published or which packages implement it. I could have sworn it was BOHB and SMAC3, but it turns out I was wrong.
The main idea was to treat the budget parameter similarly to how SMAC3 treats instance features, i.e., to train the surrogate model on it, as well as on instance features and hyperparameters. When maximizing the acquisition function, we'd only care about the predictions at `max_budget`.
As such, a slice along the budget dimension of the cost surface modeled by the RF would effectively represent an estimate of a configuration's learning curve. This way, the underlying surrogate model would also provide learning-curve prediction (extrapolation), and costs measured at lower budgets would improve estimates of which configs will maximize the acquisition function at `max_budget`.
In this case, it would also help to provide more data points to constrain the surrogate model, so it would make sense to report cost(s) after every unit increment of the budget (i.e., after every epoch), rather than just at the budgets at which the multi-fidelity intensifier judges whether to keep running or cut short.
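To make this concrete, here is a rough sketch of what I mean, using scikit-learn's RandomForestRegressor as a stand-in for the actual surrogate (the toy data, the two hyperparameters, and the candidate set are all illustrative, not SMAC3's real API):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Observations: each row is (hyperparameters..., budget) -> observed cost.
# Costs reported after every epoch all become training points.
X = np.array([
    # lr,   batch_size, budget (epochs)
    [0.01,  32,          1],
    [0.01,  32,          2],
    [0.10,  64,          1],
    [0.10,  64,          2],
    [0.10,  64,          3],
])
y = np.array([0.90, 0.70, 0.85, 0.60, 0.45])  # e.g. validation error

# One surrogate over the joint (config, budget) space.
model = RandomForestRegressor(n_estimators=100).fit(X, y)

# When maximizing the acquisition function, fix budget = max_budget:
max_budget = 50
candidates = np.array([[0.05, 32], [0.20, 64]])  # sampled configs
X_query = np.hstack([candidates, np.full((len(candidates), 1), max_budget)])
predicted_cost_at_max_budget = model.predict(X_query)

# A slice along the budget axis for one fixed config is an estimated
# learning curve (extrapolated beyond the observed budgets):
config = np.array([0.05, 32])
budgets = np.arange(1, max_budget + 1)
curve = model.predict(
    np.hstack([np.tile(config, (len(budgets), 1)), budgets[:, None]])
)
```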
Hi Bogdan,
yes, you are right, but I also can't quickly recall which paper introduced this. We agree that it would be nice to include more multi-fidelity approaches.
Wait, it was, in fact, BOHB. Check out the paper (abs, pdf); in section 3.2, Hyperband, it describes the budget as follows:
While the objective function is typically expensive to evaluate (since it requires training a machine learning model with the specified hyperparameters), in most applications it is possible to define cheap-to-evaluate approximate versions that are parameterized by a so-called budget.
And later on, in section 4.1, Algorithm description, it sounds as if the KDE is supposed to be trained on the budget as an input as well.
Also, in Appendix I, Surrogates, under section I.1, Constructing the Surrogates, it says:
To build a surrogate, we sampled 10 000 random configurations for each dataset, trained them for 50 epochs, and recorded their classification error after each epoch, along with their total training time. We fitted two independent random forests that predict these two quantities as a function of the hyperparameter configuration used. This enabled us to predict the classification error as a function of time with sufficient accuracy.
I can see that even in the original HpBandSter implementation, a different model is trained for every budget, and the budget is not treated as just another input to the surrogate model.
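Schematically, the two variants differ like this (KernelDensity here is just a stand-in for BOHB's TPE-style KDEs, and the toy data is made up; this is not HpBandSter's actual API):

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
# Toy data: 2-d configs evaluated at three budgets (e.g. epochs).
observations = {budget: rng.random((20, 2)) for budget in (1, 3, 9)}

# HpBandSter-style: one independent model per budget. Observations
# made at one budget never inform the model for another budget.
per_budget_models = {
    budget: KernelDensity(bandwidth=0.2).fit(X_b)
    for budget, X_b in observations.items()
}

# Paper-style (as I read it): a single model over (config, budget),
# so low-budget observations also constrain predictions at max_budget.
X_joint = np.vstack([
    np.hstack([X_b, np.full((len(X_b), 1), budget)])
    for budget, X_b in observations.items()
])
joint_model = KernelDensity(bandwidth=0.2).fit(X_joint)
```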
How come there's this difference between the paper and its implementations? Or am I misunderstanding the paper?