
[Feature Request] Use budget as an input for surrogate model training #1183

Open
bbudescu opened this issue Dec 19, 2024 · 2 comments

@bbudescu

bbudescu commented Dec 19, 2024

Some time ago, while browsing the state of the art, I stumbled upon this idea, and I can't for the life of me remember which algorithm introduced it, which paper published it, or which packages implement it. I could have sworn it was BOHB and SMAC3, but it turns out I was wrong.

The main idea was to treat the budget parameter similarly to how SMAC3 treats instance features, i.e., to train the surrogate model on it, as well as on instance features and hyperparameters. When maximizing the acquisition function, we'd only care about the predictions at max_budget.

As such, a slice along the budget dimension in the cost surface modeled by the RF would effectively represent an estimate for configurations' learning curve. This way, the underlying surrogate model would also provide learning curve prediction (extrapolation), and costs measured at lower budgets would improve estimations for which configs will maximize the acquisition function at max_budget.

In this case, it would also help to provide more data points to constrain the surrogate model, so it would make sense to report cost(s) after every unit increment of the budget (i.e., after every epoch), rather than just at the budgets at which the multi-fidelity intensifier judges whether to keep running or to cut a trial short.
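To make the proposal concrete, here is a minimal, self-contained sketch of the scheme described above (this is not SMAC3's API; the hyperparameters, the toy cost function, and all variable names are made up for illustration): a single random forest is trained on (hyperparameters, budget) tuples, with most observations at low budgets, and fresh candidates are then ranked by the model's prediction at `max_budget` only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
max_budget = 50  # e.g. epochs

# Toy observations: (learning_rate, width, budget) -> validation loss.
# The loss decreases with budget at a rate that depends on the hyperparameters,
# mimicking a learning curve. All of this is synthetic.
n = 200
lr = rng.uniform(1e-4, 1e-1, n)
width = rng.integers(16, 256, n).astype(float)
budget = rng.integers(1, max_budget + 1, n).astype(float)
loss = 1.0 / (1.0 + budget * lr * np.log(width)) + rng.normal(0.0, 0.01, n)

# Budget is just another input column of the surrogate model.
X = np.column_stack([lr, width, budget])
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, loss)

# Rank fresh candidates by the predicted loss at max_budget, even though most
# training points were observed at lower budgets.
cands = np.column_stack([
    rng.uniform(1e-4, 1e-1, 1000),
    rng.integers(16, 256, 1000).astype(float),
    np.full(1000, float(max_budget)),
])
pred = model.predict(cands)
best = cands[int(np.argmin(pred))]
```

A slice of `model.predict` along the budget column for a fixed configuration is then exactly the estimated learning curve mentioned above.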

@benjamc
Collaborator

benjamc commented Jan 8, 2025

Hi Bogdan,
Yes, you are right, but I also cannot quickly remember which paper introduced this. We agree that it would be nice to include more multi-fidelity approaches.

@bbudescu

Wait, it was in fact BOHB. Check out the paper (abs, pdf); under section 3.2, Hyperband, it describes the surrogate model as follows:

While the objective function is typically expensive to evaluate (since it requires training a machine learning model with the specified hyperparameters), in most applications it is possible to define cheap-to-evaluate approximate versions that are parameterized by a so-called budget.

And later on, in section 4.1, Algorithm description, it sounds as if the KDE is supposed to be trained on the budget input as well.

Also, in Appendix I, Surrogates, under section I.1, Constructing the Surrogates, it says:

To build a surrogate, we sampled 10 000 random configurations for each dataset, trained them for 50 epochs, and recorded their classification error after each epoch, along with their total training time. We fitted two independent random forests that predict these two quantities as a function of the hyperparameter configuration used. This enabled us to predict the classification error as a function of time with sufficient accuracy.

I can see that even in the original HpBandSter implementation, a separate model is trained for every budget; the budget is not treated as just another input for the surrogate model to be trained on.
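For clarity, here is a toy contrast between the two designs (again, not the HpBandSter API; the data, budgets, and names are illustrative): one independent surrogate per observed budget level versus a single surrogate that takes budget as a feature. Only the latter can produce predictions at budgets that were never run as their own rung.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
budgets = [9, 27, 81]                 # e.g. Hyperband rungs
X = rng.random((90, 3))               # 3 toy hyperparameters
b = rng.choice(budgets, 90).astype(float)
y = X.sum(axis=1) / b                 # synthetic cost that shrinks with budget

# (a) Per-budget surrogates, as in the HpBandSter implementation's spirit:
# each model only ever sees observations from its own rung.
per_budget = {
    bud: RandomForestRegressor(n_estimators=50, random_state=0)
         .fit(X[b == bud], y[b == bud])
    for bud in budgets
}

# (b) Budget-as-input: one shared surrogate over (hyperparameters, budget).
shared = RandomForestRegressor(n_estimators=50, random_state=0).fit(
    np.column_stack([X, b]), y
)

# (a) can only predict at rungs it has a model for; (b) can also be queried
# at an intermediate budget such as 50, which no rung ever used.
x_new = rng.random((1, 3))
pred_shared_50 = shared.predict(np.column_stack([x_new, [[50.0]]]))
```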

How come there's this difference between the paper and its implementations? Or am I misunderstanding the paper?
