Use DifferentiationInterface for AD in Implicit Solvers #2567
base: master
In order for this to be completely done, we'll need a DI equivalent for the SparseDiffTools … Is a good way to do this to make an extension in SciMLOperators for DifferentiationInterface that will have something like a …? @ChrisRackauckas @oscardssmith @gdalle Any thoughts?
Yes
@avik-pal might already have one?
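For readers unfamiliar with the kind of operator under discussion: SparseDiffTools provides matrix-free Jacobian-vector product operators such as `JacVec`, which apply the Jacobian to a vector without ever forming it. The sketch below illustrates only that idea, assuming an out-of-place function and a forward finite difference in place of AD; `FDJacVec` is a hypothetical name, not an existing SciMLOperators, SparseDiffTools or DI type. A DI-based version would compute the product with a (prepared) pushforward instead.

```julia
using LinearAlgebra

# Hypothetical matrix-free Jacobian operator (illustration only): it stores
# f, the linearization point u and f(u), but never forms the Jacobian.
struct FDJacVec{F, T <: AbstractFloat}
    f::F          # out-of-place function u -> f(u)
    u::Vector{T}  # point where the Jacobian is evaluated
    fu::Vector{T} # cached f(u)
    h::T          # finite-difference step
end

FDJacVec(f, u::Vector{T}; h = sqrt(eps(T))) where {T <: AbstractFloat} =
    FDJacVec(f, u, f(u), T(h))

# J(u) * v ≈ (f(u + h*v) - f(u)) / h, a forward difference along v.
function LinearAlgebra.mul!(out::AbstractVector, J::FDJacVec, v::AbstractVector)
    out .= (J.f(J.u .+ J.h .* v) .- J.fu) ./ J.h
    return out
end

Base.:*(J::FDJacVec, v::AbstractVector) = mul!(similar(J.fu), J, v)
```

The in-place `mul!` method is the piece a Krylov-based linear solver would call repeatedly, so keeping it allocation-free (apart from the perturbed input here) is the main design concern.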
Awesome work @jClugstor, thanks! Ping me when this is ready for a first round of DI-specific review.
Just to be clear, this wasn't possible before? So is this the first time that Enzyme can be used out-of-the-box to solve ODEs?
Another option, which requires a bit more work (and is probably not worth it), would be to make SparseDiffTools compatible with the sparsity API of ADTypes v1. I think it might allow a more seamless upgrade. See e.g. JuliaDiff/SparseDiffTools.jl#298 for the detection aspect; there should be a similar issue for the coloring aspect. Speaking of SparseDiffTools, it still has an edge over DI when combined with FiniteDiff. The PR JuliaDiff/FiniteDiff.jl#191 could fix that; maybe @oscardssmith would be willing to take another look?
Agreed, preparation is a one-time cost, so I don't think we should worry too much about it (at least in the prototype stage).
What do you mean by unexpected sparse things?
We may also want to involve @oschulz and his AutoDiffOperators package to avoid duplicating efforts? As a side note, DifferentiationInterface only has two dependencies: ADTypes and LinearAlgebra. For packages that use it extensively, I think it's reasonable to make it a full dep instead of a weakdep.
@@ -7,6 +7,8 @@ version = "1.3.0"
ADTypes = "47edcb42-4c32-4615-8424-f2b9edc5f35b"
ArrayInterface = "4fba245c-0d91-5ea0-9b3e-6abc04ee57a9"
DiffEqBase = "2b5f629d-d688-5b77-993f-72d75c75574e"
DifferentiationInterface = "a0c0ee7d-e4b9-4e03-894e-1c5f64a51d63"
Enzyme = "7da242da-08ed-463a-9acd-ee780be4f1d9"
Does Enzyme need to become a dependency? This adds significant install overhead, but if AutoEnzyme is to be the new default AD, then it makes sense.
Yeah, probably doesn't need to be a dependency unless we're committing to having it be the default.
@@ -25,6 +29,7 @@ ADTypes = "1.11"
ArrayInterface = "7"
DiffEqBase = "6"
DiffEqDevTools = "2.44.4"
DifferentiationInterface = "0.6.23"
- DifferentiationInterface = "0.6.23"
+ DifferentiationInterface = "0.6.28"
The other deps are also missing compat bounds?
- DifferentiationInterface = "0.6.23"
+ DifferentiationInterface = "0.6.31"
alg, autodiff = AutoForwardDiff(chunksize = cs))
function prepare_ADType(alg::AutoFiniteDiff, prob, u0, p, standardtag)
    # If the autodiff alg is AutoFiniteDiff, prob.f.f isa FunctionWrappersWrapper,
    # and fdtype is complex, fdtype needs to change to something not complex
Note that DI does not explicitly support complex numbers yet. What I mean by that is that we forward things to the backend as much as possible, so if the backend does support complex numbers then it will probably work, but there are no tests or hard API guarantees on that. See JuliaDiff/DifferentiationInterface.jl#646 for the discussion.
Also note that some differentiation operators are not defined unambiguously for complex numbers (e.g. the derivative for complex input).
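To illustrate why the derivative for complex input can be ambiguous: a complex derivative of `f` at `z` exists only if the difference quotients agree along every direction, in particular along the real and the imaginary axis. The sketch below checks that necessary condition numerically at a single point, using central differences with an arbitrary step `h`; the function names are hypothetical and nothing here is DI or backend API.

```julia
# Central-difference approximation of the directional derivative of f at z
# along the complex direction dz (step h is an arbitrary choice).
dirderiv(f, z, dz; h = 1e-6) = (f(z + h * dz) - f(z - h * dz)) / (2h)

# Two candidates for "the derivative" of f at z: along the real axis
# (≈ ∂f/∂x) and along the imaginary axis divided by im (≈ -i ∂f/∂y).
# A complex derivative exists only if all such candidates agree.
derivative_candidates(f, z) = (dirderiv(f, z, 1), dirderiv(f, z, im) / im)

# Numerical check of that necessary condition at a single point.
agree_at(f, z; atol = 1e-6) = isapprox(derivative_candidates(f, z)...; atol = atol)
```

For `z -> z^2` both candidates are approximately `2z`, while for `conj` they come out as `1` and `-1`, so there is no single scalar that deserves to be called the derivative.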
Enzyme has explicit mode variants for complex numbers that it would probably be wise to wrap here as well (otherwise, by default, it errors with a warning about ambiguity if a function returns a complex number); see https://enzyme.mit.edu/julia/stable/api/#EnzymeCore.ReverseHolomorphic. @gdalle I'm not sure DI supports this yet? If not, you may need to call Enzyme.jacobian / autodiff directly in that case.
@jClugstor can you maybe specify where we will encounter complex numbers by filling the following table?
| | derivative | jacobian |
|---|---|---|
| complex inputs possible | yes / no | yes / no |
| complex outputs possible | yes / no | yes / no |
When there are both complex inputs and complex outputs, that's where we run into trouble because we cannot represent derivatives as a single scalar. In that case, the differentiation operators are not clearly defined (the Jacobian matrix is basically twice as big as it should be) so we would need to figure out what convention the ODE solvers need (see https://discourse.julialang.org/t/taking-complex-autodiff-seriously-in-chainrules/39317).
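The size argument can be made precise by viewing a map $f:\mathbb{C}^n \to \mathbb{C}^m$ as a real map. Writing $z = x + iy$ and $f = u + iv$ (notation chosen here for illustration), the real Jacobian has $4mn$ real entries, whereas a complex $m \times n$ matrix holds only $2mn$. Only when $f$ is holomorphic do the Cauchy-Riemann equations force the block structure below, with $A = \partial u/\partial x$ and $B = \partial v/\partial x$, so that the complex matrix $J_{\mathbb{C}} = A + iB$ carries all the information:

```math
\begin{gathered}
J_{\mathbb{R}} = \begin{pmatrix} \partial u/\partial x & \partial u/\partial y \\ \partial v/\partial x & \partial v/\partial y \end{pmatrix} \in \mathbb{R}^{2m \times 2n}, \\
\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y},\;\; \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}
\;\Longrightarrow\;
J_{\mathbb{R}} = \begin{pmatrix} A & -B \\ B & A \end{pmatrix},\;\; J_{\mathbb{C}} = A + iB.
\end{gathered}
```

Without holomorphy, no such compression of $J_{\mathbb{R}}$ into one complex matrix exists, which is why some convention has to be chosen.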
@wsmoses I understand your concern, but I find it encouraging that DI actually allowed Enzyme to be used here for the first time (or at least so I've been told). This makes me think that the right approach is to handle complex numbers properly in DI instead of introducing a special case for Enzyme?
Sure, adding proper complex number support to DI would be great, but a three-line change here to use in-spec Complex support, when there are already overloads for other ADTypes, feels reasonable?
E.g., something like:
using ADTypes: AutoEnzyme
using Enzyme: Enzyme, ReverseHolomorphic
# `WhatevertheTypeIs` is a placeholder for the relevant solver cache/integrator type
function jacobian(f, x::AbstractArray{<:Complex}, integrator::WhatevertheTypeIs{<:AutoEnzyme})
    Enzyme.jacobian(ReverseHolomorphic, f, x)
end
From the discussion in JuliaDiff/DifferentiationInterface.jl#646, I think DI complex support is a much thornier issue. In particular, various tools have different conventions (e.g. JAX and PyTorch pick different conjugates of what is propagated). So either DI needs to make a choice and shim/force all tools to use it (definitely doable), and then user code must be converted to that convention (e.g. a separate shim on the user side). For example, suppose DI picked a different conjugate from ForwardDiff.jl. DI could write its shim once in ForwardDiff to convert, which is reasonable. But suppose one was defining a custom rule within ForwardDiff and the code called DI somewhere: now that user code needs to conditionally apply a different shim to conjugate, which feels kind of nasty to put everywhere (in contrast to a self-consistent assumption). I suppose the other alternative is for DI not to pick a convention, but that again prevents users from relying on it, since it's not possible to know whether they get the correct value for them; worse, they won't know when they need to do a conversion or not.
Thus, if complex support is desired, a three-line patch where things are explicitly supported seems okay (at least until the DI story is figured out).
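The conjugation ambiguity already shows up for a real-valued function $L(z)$ of a single complex variable $z = x + iy$. In terms of the Wirtinger derivatives, the two natural candidates for "the gradient" are complex conjugates of each other; which tool reports which one is not claimed here:

```math
\begin{gathered}
\frac{\partial L}{\partial z} = \tfrac12\left(\frac{\partial L}{\partial x} - i\,\frac{\partial L}{\partial y}\right), \qquad
\frac{\partial L}{\partial \bar z} = \tfrac12\left(\frac{\partial L}{\partial x} + i\,\frac{\partial L}{\partial y}\right), \\
L \in \mathbb{R} \;\Longrightarrow\; 2\,\frac{\partial L}{\partial \bar z} = \overline{2\,\frac{\partial L}{\partial z}}, \qquad
\text{e.g. } L = |z|^2 = x^2 + y^2:\;\; 2\,\frac{\partial L}{\partial \bar z} = 2z,\;\; 2\,\frac{\partial L}{\partial z} = 2\bar z.
\end{gathered}
```

Only $2\,\partial L/\partial \bar z$ corresponds to the steepest-ascent direction $(\partial L/\partial x, \partial L/\partial y)$ in $\mathbb{R}^2$; for real-valued $L$, a shim between the two conventions amounts to a complex conjugation.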
I agree that for now, this change seems to do the job (although it raises the question of consistency with the other backends that are handled via DI). But what will happen if the function in question is not holomorphic? That's the thorniest part of the problem, and that's why I wanted to inquire a bit more as to what kind of functions we can expect. Perhaps @jClugstor or @ChrisRackauckas can tell us more?
In any case, I have started a discussion on Discourse to figure out the right conventions: https://discourse.julialang.org/t/choosing-a-convention-for-complex-numbers-in-differentiationinterface/124433
Also note that the Enzyme-specific fix only handles dense Jacobians, not sparse Jacobians (which are one of the main reasons to use DI in the first place).
Sorry, I can't really tell you much about the complex number support, other than that previously only ForwardDiff or FiniteDiff were used, so when someone used an implicit solver on a complex problem, I guess their conventions were used. I also just wanted to note that the code this comment is on only makes sure that the FiniteDiff fdtype isn't complex if the function is a function wrapper; it doesn't have to do with complex numbers flowing through the solver in general.
The latest release of DI inches closer to support for complex numbers. I read a little about conventions for non-holomorphic differentiation and it was a mess, so as a starting point DI assumes that the function is holomorphic. If you want e.g. a Jacobian, it is pretty much the only convention that makes sense anyway, otherwise you end up with a
Add a dispatch to https://github.com/SciML/NonlinearSolve.jl/blob/master/lib/SciMLJacobianOperators/src/SciMLJacobianOperators.jl#L115
Yes, as far as I know this is the first time Enzyme has been used for the implicit solvers.
@avik-pal I noticed that the constructors for your
The prepare_jvp and prepare_vjp functions assume a 2-arg function for oop and a 3-arg function for iip, respectively; that won't hold for OrdinaryDiffEq.
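One possible bridge, sketched under the assumption that the time `t` is the only extra argument: close over `t` so that an ODE right-hand side `f(u, p, t)` / `f!(du, u, p, t)` takes the 2-arg oop / 3-arg iip shape. The time is kept in a `Ref`, so the same closure (and anything prepared for it) can be reused as the integrator advances. `oop_adapter` and `iip_adapter` are hypothetical names, not SciMLJacobianOperators or OrdinaryDiffEq API.

```julia
# Hypothetical adapters: hide the time argument of an ODE right-hand side
# behind a Ref, giving 2-arg out-of-place / 3-arg in-place signatures.
# Updating tref[] changes the time seen by the closure without rebuilding it.
oop_adapter(f, tref::Base.RefValue) = (u, p) -> f(u, p, tref[])
iip_adapter(f!, tref::Base.RefValue) = (du, u, p) -> f!(du, u, p, tref[])
```

DI also supports passing extra non-differentiated arguments as contexts, which may be an alternative to closures; that route is not shown here.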
Hey @jClugstor, just a friendly ping: do you need a hand with this one?
Not particularly, I've just been otherwise occupied the past few days. I'm going through all the tests and making sure that they pass one by one. Most of the difficulties have been related to making sure that the sparse AD works, for example making sure that when using sparse AD, certain caches are constructed with sparse LU operators when needed, etc. The next step is making sure the stats tests pass. Previously, all of the function calls in the Jacobian/JacVec calculations were being counted, but obviously that's not true when using DI, so we'll need a better way to count how many times the user-provided function has been called. I have some ideas already.
I did something similar in DITest; you may want to take a look: https://github.com/JuliaDiff/DifferentiationInterface.jl/blob/main/DifferentiationInterfaceTest/src/tests/benchmark.jl#L1-L22
Also note that counting function executions only makes sense for AD backends like FiniteDiff and ForwardDiff. When you use Enzyme, it doesn't actually call the function while differentiating it (that's the difference between source transformation and operator overloading). So I'm not sure how you plan to do the counting in that case?
We count the number of times Enzyme is called.
I got pretty close to getting the stats working, but I was off by one for some of them. I think the reason is that when the DI prep object is created during the building of the … Wrapping the user function so that a counter is incremented every time it's called would work for FiniteDiff and ForwardDiff, but it sounds like it won't work for Enzyme, Zygote, etc. Maybe one way is to give those counter wrappers custom AD rules, so that when they're differentiated, they just differentiate the inner function and increment the counter?
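A minimal sketch of the wrapper idea, assuming a finite-difference or operator-overloading backend that actually calls the wrapped function: a callable struct increments a counter on every call. A hand-written forward-difference Jacobian stands in for FiniteDiff here; it uses one evaluation at `u` plus one per column, so a function of 3 inputs is called 4 times. `CountedFunction` and `fd_jacobian` are hypothetical names.

```julia
# Hypothetical call-counting wrapper: every call increments `ncalls`.
mutable struct CountedFunction{F}
    f::F
    ncalls::Int
end
CountedFunction(f) = CountedFunction(f, 0)

function (c::CountedFunction)(args...)
    c.ncalls += 1
    return c.f(args...)
end

# Forward-difference Jacobian (stand-in for a finite-difference backend):
# one evaluation at u plus one per column of the Jacobian.
function fd_jacobian(f, u::Vector{Float64}; h = sqrt(eps()))
    fu = f(u)
    J = zeros(length(fu), length(u))
    for j in eachindex(u)
        e = copy(u)
        e[j] += h
        J[:, j] = (f(e) .- fu) ./ h
    end
    return J
end
```

As noted in the thread, this only counts anything when the backend really calls the wrapper; a source-transformation backend such as Enzyme differentiates the code without executing the wrapper's call path in the same way.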
Indeed that happens, but there might also be a call for reasons like getting the size and eltype of … During execution too, the number of calls might be off by one, depending on the backend and how it handles things like
I think for now it's reasonable to disregard the function calls from the Jacobian preparation. Ideally, though, we would like to make sure we count every function call; I think that can wait until later. @oscardssmith does that sound ok?
Sounds good.
So what's next?
Once I get the stats tests working again, it looks like this will be really close. Many of the failing CI tests are also failing on main. But:
For this part specifically, I put a hidden preparation update function called
The relevant functions are here:
What does it even mean to resize the Jacobian config for a sparse Jacobian? How do you anticipate the sparsity pattern for a bigger input variable 🤔?
Think about a PDE semi-discretization in 1D: it's just a banded matrix where the bands are all the same; only the number of times the pattern is repeated changes. This is very common, for example, in adaptive meshing PDE solvers.
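To make the 1D example concrete: for a three-point stencil on `n` grid points the sparsity pattern is tridiagonal for every `n`, and a valid column coloring can simply be extended periodically, because columns `j` and `j + 3` never share a row. A minimal sketch with the SparseArrays standard library; the function names are hypothetical.

```julia
using SparseArrays

# Hypothetical helpers for a 1D three-point stencil on n grid points.
# The pattern is the same tridiagonal band for every n; resizing only
# repeats it more times.
tridiagonal_pattern(n::Integer) =
    spdiagm(-1 => trues(n - 1), 0 => trues(n), 1 => trues(n - 1))

# Columns j and j + 3 never share a row of a tridiagonal matrix, so this
# periodic 3-coloring is valid for every n and is extended, not recomputed.
tridiagonal_colors(n::Integer) = [mod1(j, 3) for j in 1:n]
```

The number of colors (and hence of compressed Jacobian evaluations) stays at 3 no matter how large `n` grows; this shortcut only applies when the structure of the problem is known to repeat like this.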
Here's a stopgap solution:
I think if auto-sparse is fast enough, (3) might go away naturally. I know the Trixi case only really uses the SSP methods, so they would avoid this. Some of the finite element cases may be investigating (3), but we don't have full support for it yet, so this basically all folds into the next breaking release, OrdinaryDiffEq v7, and we can just look forward, not backwards, on this. But I believe most finite element cases would appreciate (2) anyway? The other thing that could be done is that the case of (3) would require that the new sparsity pattern is given with a special form of
Checklist
contributor guidelines, in particular the SciML Style Guide and COLPRAC.
Additional context
This is at a point where we can do stuff like this:
and it actually uses sparsity detection and greedy Jacobian coloring plus Enzyme to compute the Jacobians.
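As a rough illustration of what sparsity detection plus greedy Jacobian coloring buys, here is a small sketch, assuming the sparsity pattern is already known and using forward finite differences instead of Enzyme. It greedily colors columns so that no two columns of the same color share a row, then recovers the whole Jacobian with one evaluation at `u` plus one per color. This is not the coloring code used by DI or its dependencies; all names are hypothetical.

```julia
using SparseArrays

# Greedy column coloring (illustration): two columns conflict if they have a
# nonzero in a common row. Columns are visited in natural order and each gets
# the smallest color not already used by a conflicting column.
function greedy_column_coloring(S::SparseMatrixCSC)
    m, n = size(S)
    rows = rowvals(S)
    cols_in_row = [Int[] for _ in 1:m]   # columns with a nonzero in each row
    for j in 1:n, k in nzrange(S, j)
        push!(cols_in_row[rows[k]], j)
    end
    colors = zeros(Int, n)               # 0 means "not colored yet"
    for j in 1:n
        forbidden = Set{Int}()
        for k in nzrange(S, j), j2 in cols_in_row[rows[k]]
            colors[j2] > 0 && push!(forbidden, colors[j2])
        end
        c = 1
        while c in forbidden
            c += 1
        end
        colors[j] = c
    end
    return colors
end

# Compressed forward differences: perturb all columns of one color together,
# then read each stored entry (i, j) from row i of the combined difference,
# which no other column of that color touches.
function colored_fd_jacobian(f, u::Vector{Float64}, S::SparseMatrixCSC, colors;
                             h = sqrt(eps()))
    fu = f(u)
    J = SparseMatrixCSC(size(S, 1), size(S, 2), copy(S.colptr), copy(rowvals(S)),
                        zeros(nnz(S)))
    rows, vals = rowvals(J), nonzeros(J)
    for c in 1:maximum(colors)
        cols = findall(==(c), colors)
        up = copy(u)
        up[cols] .+= h
        df = (f(up) .- fu) ./ h
        for j in cols, k in nzrange(J, j)
            vals[k] = df[rows[k]]
        end
    end
    return J
end
```

For a tridiagonal pattern the greedy pass finds 3 colors regardless of size; with an AD backend, each compressed column would be one JVP instead of a finite difference.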
Some things I'm unsure about:
The current behavior is to use Jacobian coloring and SparseDiffTools by default. To keep that behavior, we have to wrap any ADType given in an AutoSparse unless it's already an AutoSparse. This changes the ADType that the user entered by wrapping it in an AutoSparse, which feels weird to me. Maybe there should be an option to use the entered ADType directly, while by default we wrap it in an AutoSparse? I'm not sure.
The biggest issue is that the sparsity detectors work with DI through operator overloading (both TracerSparsityDetector and SymbolicsSparsityDetector do), which is a problem when using AutoSpecialization because of the FunctionWrappers. The solution I found was to unwrap the function in the preparation process. I'm not sure what performance implications this will have, but I don't think it should matter much, since the preparation should run just once.
There are still pieces in here that use raw SparseDiffTools (build_J_W) that I haven't looked into converting to DI yet.
I may need to fix some of the versions.
There are some places that are getting sparse things where they're not expected.
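The first two workarounds above can be sketched as follows. `ensure_sparse` wraps any dense ADTypes backend in `AutoSparse` unless it already is one, forwarding keyword arguments such as `sparsity_detector` and `coloring_algorithm`. `Float64Wrapper` is a hypothetical stand-in for a FunctionWrappers-based wrapper that only accepts `Vector{Float64}`, and `unwrap_for_tracing` hands the original generic function to an operator-overloading tracer. All three helper names are assumptions, not OrdinaryDiffEq or DI API.

```julia
using ADTypes

# Hypothetical helper: keep a user-supplied AutoSparse as-is; otherwise wrap
# the dense backend, forwarding sparsity_detector / coloring_algorithm kwargs.
ensure_sparse(ad::AutoSparse; kwargs...) = ad
ensure_sparse(ad::AbstractADType; kwargs...) = AutoSparse(ad; kwargs...)

# Hypothetical stand-in for a FunctionWrappers-style wrapper compiled for a
# fixed signature: it only accepts Vector{Float64}.
struct Float64Wrapper{F}
    f::F
end
(w::Float64Wrapper)(u::Vector{Float64}) = w.f(u)

# Before sparsity detection, hand the original generic function to the
# tracer, since tracer number types are not Float64; other functions pass through.
unwrap_for_tracing(w::Float64Wrapper) = w.f
unwrap_for_tracing(f) = f
```

In practice a real detector and coloring algorithm would be passed, e.g. `ensure_sparse(ad; sparsity_detector = TracerSparsityDetector(), coloring_algorithm = GreedyColoringAlgorithm())` with SparseConnectivityTracer.jl and SparseMatrixColorings.jl loaded.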