Recommended approach for utilising gradients #12

Open
TorkelE opened this issue Apr 13, 2024 · 11 comments

@TorkelE

TorkelE commented Apr 13, 2024

I have created an optimisation problem using PEtab.jl. This gives me a gradient, which I have tried to adapt to LikelihoodProfiler's format using

function loss_grad(p)
    grad = zeros(9)
    opt_prob2_3.compute_gradient!(grad, p)
    return grad
end

My impression is that, by default, this gradient is not utilised. What combination of (profiling) method and local algorithm do you recommend for utilising gradients properly?

What kind of advantage can I expect to get if I have a gradient?

@ivborissov
Collaborator

Gradient-based optimizers should in general be more efficient than derivative-free ones. You can pass this gradient to get_interval(...; loss_grad) together with one of the gradient-based local optimizers via local_alg. I don't have much experience with the relevant optimizers available in NLopt, but I would consider using :LD_SLSQP or :LD_CCSAQ. You can also try :LD_MMA.

@TorkelE
Author

TorkelE commented Apr 17, 2024

Thanks, this works. I also checked with another person who suggested :LD_LBFGS.

I have tried using gradients; however, the result seems wrong. With gradients, the points found have parameter values identical to those of the initial point (except for the parameter I am computing the profile for). Without gradients, all parameter values differ at the profile end points.

This does not seem to be directly tied to using a gradient-dependent local algorithm. LN_NELDERMEAD is fine; however, LD_CCSAQ, LD_LBFGS, LD_MMA, and LD_SLSQP all exhibit this problem (but I am still looking into it, so I am not fully sure yet).

This seems weird, right? I am investigating more closely, but figured I'd also ask whether it is something you recognise.
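
One way to rule out a problem in the supplied gradient itself is to compare it against a finite-difference approximation of the loss. The following is only a minimal sketch: `fd_gradient`, `gradient_mismatch`, `toy_loss` and `toy_grad` are hypothetical names introduced here, and the toy quadratic stands in for the actual PEtab loss and `loss_grad`, which are assumed to take and return plain vectors.

```julia
using LinearAlgebra: norm

# Central finite-difference approximation of the gradient of `f` at `p`.
function fd_gradient(f, p; h = 1e-6)
    g = zeros(length(p))
    for i in eachindex(p)
        step = zeros(length(p))
        step[i] = h
        g[i] = (f(p .+ step) - f(p .- step)) / (2h)
    end
    return g
end

# Relative difference between a supplied gradient and the finite-difference one.
function gradient_mismatch(f, grad, p; h = 1e-6)
    g_fd = fd_gradient(f, p; h = h)
    g = grad(p)
    return norm(g .- g_fd) / max(norm(g_fd), eps())
end

# Toy stand-ins (hypothetical, not the model from this thread).
toy_loss(p) = sum(abs2, p .- 1.0)
toy_grad(p) = 2 .* (p .- 1.0)

p0 = [0.5, 2.0, -1.0]
println("relative mismatch: ", gradient_mismatch(toy_loss, toy_grad, p0))
```

A mismatch close to zero suggests the gradient agrees with the loss at that point; a value near one or larger would point at the gradient (or its parameter scale) rather than at the optimizer settings.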

@ivborissov
Collaborator

Hm, it seems the gradient-based algorithms (those starting with LD) fail to move from the initial point. Is it the same model that you sent me?

@TorkelE
Author

TorkelE commented Apr 18, 2024

Yes, it is the same model (although the data points are different; I can try to give you an updated file with the new data points).

@TorkelE
Author

TorkelE commented Apr 18, 2024

I have sent an updated project folder

@ivborissov
Collaborator

It's a bit weird, but it seems the gradient-based methods need a much lower tolerance to work properly. In your example I get the same result with LD_MMA (as well as LD_SLSQP and LD_LBFGS) as with the default derivative-free LN_NELDERMEAD when I set scan_tol=1e-6:

function loss_grad(p)
  grad = zeros(9)
  petab_problem.compute_gradient!(grad, p)
  return grad
end

conf_int_1 = get_interval(start_p, p_idx, f, :CICO_ONE_PASS; local_alg = :LN_NELDERMEAD, loss_crit, theta_bounds, scan_bounds) 
conf_int_2 = get_interval(start_p, p_idx, f, :CICO_ONE_PASS; local_alg = :LD_LBFGS, loss_grad, scan_tol=1e-6, loss_crit, theta_bounds, scan_bounds) 


@TorkelE
Author

TorkelE commented Apr 22, 2024

Thanks, that did work.

However, the runtimes suffer quite badly. E.g. for LD_MMA, computing the example interval takes 12 seconds (versus only about 200 ms for LN_NELDERMEAD). LD_LBFGS is not as bad, but still takes about 700 ms (more than 3x slower than LN_NELDERMEAD). Shouldn't I expect a speed-up from providing a gradient?

@TorkelE
Author

TorkelE commented Apr 22, 2024

I should note that even for scan_tol = 1e-6, LN_NELDERMEAD finishes in ~400 ms (so still faster than the gradient-based methods).

@TorkelE
Author

TorkelE commented Apr 23, 2024

A final note (sorry for all the comments). When I run using a gradient-based method, also supplying a gradient, I get lots of

┌ Warning: autodiff gradient is not available, switching to finite difference mode
└ @ LikelihoodProfiler ~/.julia/packages/LikelihoodProfiler/Qi97K/src/cico_one_pass.jl:67

messages. What exactly does this mean? I am supplying a gradient, so autodiff should not be relevant?

@ivborissov
Collaborator

In theory the gradient-based methods should be faster. With your model I see that the number of likelihood function calls ("right/left CP counter") with LD_LBFGS is lower than with LN_NELDERMEAD (at least for the parameters I have tested), which means LD_LBFGS needs fewer likelihood function calls to get to the endpoint. However, the derivative-free LN_NELDERMEAD appears to be faster... This may be due to the simplicity of the model (for more complicated models, the timing comparison may be different) or to the way gradients are used and computed in LikelihoodProfiler/PEtab/NLopt. For LikelihoodProfiler I can say that we haven't had much experience with gradient-based methods, and there are plenty of things to optimize in the package's code. We are planning to do that soon.
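
Since fewer likelihood calls do not automatically mean a shorter runtime (each gradient evaluation has its own cost), it can help to count loss and gradient calls separately alongside the wall time. The snippet below is a rough sketch of that bookkeeping only: `counted`, `descend`, `toy_loss` and `toy_grad` are hypothetical names, and the plain gradient-descent loop is a stand-in for a profiling run, not LikelihoodProfiler's or NLopt's algorithm.

```julia
# Wrap `f` in a closure that counts how often it is called.
# Returns the wrapped function and a reference to its call counter.
function counted(f)
    calls = Ref(0)
    wrapped(args...) = (calls[] += 1; f(args...))
    return wrapped, calls
end

# Toy stand-ins (hypothetical, not the model from this thread).
toy_loss(p) = sum(abs2, p .- 1.0)
toy_grad(p) = 2 .* (p .- 1.0)

loss, loss_calls = counted(toy_loss)
grad, grad_calls = counted(toy_grad)

# Stand-in for one optimizer run: a few steps of plain gradient descent.
function descend(loss, grad, p; steps = 20, rate = 0.1)
    for _ in 1:steps
        p = p .- rate .* grad(p)
    end
    return p, loss(p)
end

t0 = time()
p_end, final_loss = descend(loss, grad, [0.5, 2.0, -1.0])
elapsed = time() - t0

println("loss calls: ", loss_calls[],
        ", gradient calls: ", grad_calls[],
        ", seconds: ", elapsed)
```

Comparing these three numbers between a derivative-free and a gradient-based run shows whether the time goes into the number of evaluations or into the cost of each gradient call.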

The autodiff warning is really surprising if you provide the gradient function. Is it the same model? Can you share the script/function you run?

@TorkelE
Author

TorkelE commented Apr 29, 2024

The warnings are from a large scan of auto-generated data sets on an HPC cluster, so it is non-trivial to create a reproducing MWE. I will have a go, though, and report back to you when/if I get one.
