Endless exploitation cycle #468

Open
AdrianSosic opened this issue Jan 23, 2025 · 3 comments
Labels: enhancement (Expand / change existing functionality)

Comments

@AdrianSosic (Collaborator)

Observation

Already a while ago, we sometimes noticed "strange" behavior (especially in discrete domains) where a campaign would keep recommending the same point over and over again. This was one of the motivations for setting the defaults of allow_recommending_already_measured and allow_repeated_recommendations to False, both of which are an effective way to avoid getting stuck in the endless cycle and to force the algorithm to explore further.
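For reference, a minimal sketch of what this looks like on the recommender side (flag names taken from the example further below; I'm assuming both are accepted by BotorchRecommender):

recommender = BotorchRecommender(
    allow_repeated_recommendations=False,  # do not re-suggest points already recommended in this campaign
    allow_recommending_already_measured=False,  # do not suggest points that already have measurements
)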

Yesterday, @Hrovatin stumbled over the same effect again, noticing that two otherwise identical campaign settings can perform drastically differently depending on whether one i) adds new measurements to the existing campaign after each iteration or ii) reinitializes a new campaign with the extended dataset (see figure below). The reason is in fact the one described above: in the latter case, the flags cannot take effect because the campaign metadata is lost.

[Figure: optimization performance of setups i) and ii)]
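To make the two setups concrete, here is a schematic sketch of the two iteration patterns (hypothetical variable names; searchspace, objective, recommender, blackbox, and look_up_targets as in the minimal example further below):

# i) one persistent campaign that accumulates measurements (and metadata)
campaign = Campaign(searchspace=searchspace, objective=objective, recommender=recommender)
for _ in range(N_DOE_ITERATIONS):
    rec = campaign.recommend(BATCH_SIZE)
    look_up_targets(rec, campaign.targets, blackbox, "error")
    campaign.add_measurements(rec)  # metadata is kept, so the allow_* flags can take effect

# ii) a fresh campaign in every iteration, fed only with the extended dataset
data = pd.DataFrame()
for _ in range(N_DOE_ITERATIONS):
    campaign = Campaign(searchspace=searchspace, objective=objective, recommender=recommender)
    if not data.empty:
        campaign.add_measurements(data)  # measurements are transferred, metadata is not
    rec = campaign.recommend(BATCH_SIZE)
    look_up_targets(rec, campaign.targets, blackbox, "error")
    data = pd.concat([data, rec], ignore_index=True)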

Investigation

A deeper investigation led me to the hypothesis that this is ultimately caused by the inherent mechanics of expected improvement (and other acquisition functions), which can simply happen to under-explore. I've quickly drafted a minimal example that consistently reproduces the effect (see the post below).

In this example, the iteration loop quickly reaches a dead end where the recommendations keep cycling between two equal minima. This steady state is shown below (black dots = data, green line = current minimum, gray = posterior, red = acquisition values). According to EI, there is not enough benefit in exploring unseen points; at the same time, re-observing the minima does not change the situation either, since their posterior variance is already effectively zero.

[Figure: posterior, measurements, and acquisition values at the steady state]
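To spell out the mechanism for the analytical (q = 1, noiseless) EI in the minimization setting, with f^* the current best observed value and \mu(x), \sigma(x) the posterior mean and standard deviation:

EI(x) = \mathbb{E}[\max(f^* - f(x), 0)] = (f^* - \mu(x))\,\Phi(z) + \sigma(x)\,\phi(z), \qquad z = \frac{f^* - \mu(x)}{\sigma(x)}

At the already-measured minima, \sigma(x) \approx 0 and \mu(x) \approx f^*, so EI(x) \approx 0 and re-observing them yields no information. At unexplored points, \mu(x) > f^* and the (apparently underestimated) \sigma(x) is too small for the second term to compensate, so their EI is equally negligible and the argmax keeps landing on the same candidates. (The example uses qLogExpectedImprovement, but the qualitative picture is the same.)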

Interpretation and Way Forward

Overall, it seems the issue could be that there is too much trust in the current model or, in other words, that the model uncertainty is not adequately estimated. In fact, there is evidence that this can cause BO with expected improvement to get stuck, as described in "A hierarchical expected improvement method for Bayesian optimization" [Chen et al.].

[Image: excerpt from Chen et al.]

If this is the case, a potential avenue is to replace the point-estimate-based GP hyperparameter fitting with an approach that properly takes the posterior hyperparameter distribution into account, e.g. Bayesian model averaging or deterministic approximation schemes. A more flexible fitting approach is on the roadmap anyway...
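As an illustration of that direction (not something currently exposed through the baybe API), here is a rough sketch in plain botorch: a fully Bayesian SAAS GP whose hyperparameters are sampled with NUTS, so that acquisition values are averaged over the hyperparameter posterior instead of relying on a single point estimate. All data and settings below are toy values.

import torch
from botorch.acquisition.logei import qLogExpectedImprovement
from botorch.fit import fit_fully_bayesian_model_nll
from botorch.models.fully_bayesian import SaasFullyBayesianSingleTaskGP

# Toy training data (botorch maximizes by convention)
train_X = torch.rand(10, 1, dtype=torch.double)
train_Y = torch.sin(6 * train_X)

# Hyperparameters are sampled via NUTS instead of being point-estimated
model = SaasFullyBayesianSingleTaskGP(train_X, train_Y)
fit_fully_bayesian_model_nll(model, warmup_steps=128, num_samples=128, thinning=16)

# Acquisition values are averaged over the hyperparameter samples, which
# typically yields larger posterior uncertainty away from the data
acqf = qLogExpectedImprovement(model=model, best_f=train_Y.max())
candidates = torch.rand(5, 1, 1, dtype=torch.double)  # 5 candidate points, q = 1
print(acqf(candidates))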

Until then, it's probably best to keep the allow_* flags active.

@AdrianSosic added the enhancement label on Jan 23, 2025
@AdrianSosic (Collaborator, Author) commented Jan 23, 2025

Minimal Example

import numpy as np
import pandas as pd
import torch
from botorch.test_functions import Rastrigin
from matplotlib import pyplot as plt

from baybe import Campaign
from baybe.acquisition.acqfs import qLogExpectedImprovement
from baybe.objectives import SingleTargetObjective
from baybe.parameters import NumericalDiscreteParameter
from baybe.recommenders.meta.sequential import TwoPhaseMetaRecommender
from baybe.recommenders.pure.bayesian.botorch import BotorchRecommender
from baybe.searchspace import SearchSpace
from baybe.simulation.lookup import look_up_targets
from baybe.targets import NumericalTarget
from baybe.utils.random import set_random_seed

N_DOE_ITERATIONS = 15
BATCH_SIZE = 1
POINTS_PER_DIM = 10
DIMENSION = 1

TEST_FUNCTION = Rastrigin(DIMENSION)
BOUNDS = TEST_FUNCTION.bounds


def blackbox(df: pd.DataFrame, /) -> pd.DataFrame:
    """A callable whose internal logic is unknown to the algorithm."""
    df["Target"] = TEST_FUNCTION(torch.tensor(df.values))
    return df



parameters = [
    NumericalDiscreteParameter(
        name=f"x_{k+1}",
        values=list(np.linspace(BOUNDS[0, k], BOUNDS[1, k], POINTS_PER_DIM)),
        tolerance=0.01,
    )
    for k in range(DIMENSION)
]
searchspace = SearchSpace.from_product(parameters=parameters)
objective = SingleTargetObjective(target=NumericalTarget(name="Target", mode="MIN"))

AC = qLogExpectedImprovement
campaign = Campaign(
    searchspace=searchspace,
    objective=objective,
    recommender=TwoPhaseMetaRecommender(
        recommender=BotorchRecommender(
            allow_repeated_recommendations=True, acquisition_function=AC()
        )
    ),
)


def get_acqf_values(campaign: Campaign) -> torch.Tensor:
    """Evaluate the acquisition function on all discrete candidate points."""
    surrogate = campaign.get_surrogate()
    acqf = AC().to_botorch(
        surrogate, searchspace, objective, campaign.measurements
    )
    return acqf(
        torch.tensor(
            searchspace.transform(searchspace.discrete.exp_rep).values
        ).unsqueeze(-2)  # add the q-dimension expected by the acquisition function
    )


set_random_seed(0)
x = parameters[0].values
for i in range(N_DOE_ITERATIONS):
    measured = campaign.recommend(BATCH_SIZE)

    if i >= 1:
        # Plot the current posterior, measurements, and acquisition values
        p = campaign.posterior(searchspace.discrete.exp_rep)
        acqf = get_acqf_values(campaign).detach().numpy()
        m = campaign.measurements
        mean = p.mean.detach().squeeze().numpy()
        std = p.stddev.detach().numpy().squeeze()
        fig, ax1 = plt.subplots()
        ax2 = ax1.twinx()
        ax1.errorbar(x, mean, std, fmt="none", ecolor="gray")  # posterior
        ax1.plot(m["x_1"], m["Target"], "ok")  # data
        ax1.hlines(m["Target"].min(), min(x), max(x), color="g")  # current minimum
        ax2.plot(x, acqf, "r")  # acquisition values
        plt.show()

    look_up_targets(measured, campaign.targets, blackbox, "error")
    campaign.add_measurements(measured)
    print(measured)

@Scienfitz (Collaborator) commented Jan 23, 2025

@AdrianSosic we had a bug report more than a year ago about a similar observation; the conclusion then was numerical artifacts that are exponentially less likely for batch sizes > 1 (see slide 8). That report is in the BugReports folder in the Teams folder.

@AdrianSosic (Collaborator, Author)

> @AdrianSosic we had a bug report more than a year ago about a similar observation

Yes, that's what I meant by "Already a while ago, we ..." at the beginning of the description, but I forgot that folder existed, so thanks for the reminder 👌🏼

> the conclusion then was numerical artifacts that are exponentially less likely for batch sizes > 1

I'm not so sure this has anything to do with numerics. In fact, it makes sort of sense to me that larger batch sizes mitigate the effect, since they result in more exploration (due to batch diversification). So I still think there is a deeper issue that needs to be investigated.
