
proposal for serialisation of models #2787

Closed
martinjrobins opened this issue Mar 20, 2023 · 7 comments · Fixed by #3397
Labels
difficulty: hard (Will take several weeks) · feature · in-progress (Assigned in the core dev monthly meeting) · priority: medium (To be resolved if time allows)


@martinjrobins
Contributor

Description

  1. Develop a text serialisation format for pybamm models
  2. Write an export tool to export pybamm models into this format
  3. Write an import tool to read in models in this format

Motivation

One of the original goals of PyBaMM was to facilitate the sharing of physics-based battery models. PyBaMM has been very successful in this, allowing users to share PyBaMM models written in Python. However, to make PyBaMM models more widely shareable, I propose that we need a text serialisation format that PyBaMM models can be converted to and created from. This would enable easier interoperability between PyBaMM and other solvers or tools. For example, someone developing a battery model in MATLAB could write it out in this format for later import into PyBaMM. Or someone developing a battery parameterisation tool (in any language) could support importing PyBaMM models by writing a reader for our serialisation format.

Possible Implementation

I propose that we focus on serialising PyBaMM models that are already discretised and ready to be solved, since sharing a continuum model still leaves open many questions about how exactly that model should be discretised.

My proposal for a serialisation format is a text-based, human-readable language built on tensors (mainly inspired by the TACO tensor algebra compiler), an example of which is below. This is based on another project I'm working on and I'm happy to iterate on it; I just wanted to put something down to start the conversation!

```
in_i {                    // "input" tensor, describes input parameters to the model
    r -> [0, inf],
    k -> [0, inf],
}
sm_ij {                     // rank 2 tensor (indexed by i and j)
    (0..2, 0..2): 1,       // diagonal entries denoted by .. range format
}
I_ij {
    (0:2, 0:2): sm_ij,    // block entries denoted by : range format
    (2, 2): 1,                // sparse tensors, any indices not here are implicitly zero
    (3, 3): 1,
}
u_i {
    y -> R**2 = 1,      // "u" tensor is the state, here y is a vector of dimension 2, initialised to 1 at t=0
    z -> R**2,
}
rhs_i {
    (r * y_i) * (1 - (y_i / k)),    // expressions use tensor index notation
    (2 * y_i) - z_i,
}
F_i {                                  // model equations expressed by "F" and "G" tensors
    dot(y_i),                       // such that the equations are $ F(u, \dot{u}, t) = G(u, t) $
    0,
    0,
}
G_i {
    sum(j, I_ij * rhs_j),          // this is a matrix multiply using a sum over index "j"
}
out_i {                             // "out" tensor describes the output of the model
    y_i,
    t,
    z_i,
}
```
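Read literally, I_ij in this example evaluates to the 4×4 identity (the 2×2 block sm_ij plus the ones at (2, 2) and (3, 3)), so reading the sum over j as a matrix-vector product, the example encodes dy/dt = r y (1 - y/k) together with the algebraic constraint 0 = 2y - z. A minimal NumPy sketch of the residual F - G follows; the state ordering [y_0, y_1, z_0, z_1] and the function name `residual` are assumptions, since the format itself does not say how a reader should lay out u.

```python
import numpy as np

def residual(t, u, udot, r, k):
    """Residual F(u, udot, t) - G(u, t) for the example system.

    Assumes (hypothetically) that u = [y_0, y_1, z_0, z_1], i.e. y and z
    are each in R**2 and stored one after the other.
    """
    y, z = u[:2], u[2:]
    ydot = udot[:2]
    # F_i: dot(y_i) for the two differential rows, 0 for the two algebraic rows
    F = np.concatenate([ydot, np.zeros(2)])
    # rhs_i: logistic growth for y, then the constraint 2*y - z
    rhs = np.concatenate([r * y * (1 - y / k), 2 * y - z])
    # I_ij: 2x2 identity block plus ones at (2, 2) and (3, 3) -> 4x4 identity
    I = np.eye(4)
    G = I @ rhs  # sum over j of I_ij * rhs_j
    return F - G
```

With r = 1 and k = 2, the state y = 1, z = 2 with dy/dt = 0.5 satisfies every equation, so the residual is zero; a DAE solver would drive this residual to zero at each time step.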

Additional context

There have been a few proposals for serialisation formats for model parameters (e.g. BPX), but I would argue that their usefulness is severely hampered by the lack of a model serialisation format. A set of parameters means nothing unless you have a description of the model that uses them. E.g. $y = \exp(-at)$ with $a=1$ is very different from $y = \exp(-at/10)$ with $a=1$, even though the two models are very similar (you would describe both as "exponential decay"; only the details differ).

@valentinsulzer
Member

Sounds good. Anytree also has a JSON exporter, which would be easier to implement but may not be as generalisable. What about FMU?

@martinjrobins
Contributor Author

What's nice about FMU is that it's an existing standard; I like that. The bit I don't like is the XML description: it's machine-readable but not human-readable. There is a reason we program in languages and not in XML...

Mind you, the "existing standard" bit is very convincing, so I'm happy to be persuaded!

@martinjrobins
Contributor Author

Can FMU represent sparse vectors or linear algebra? I can't find anything on this.

@martinjrobins
Contributor Author

I'm still playing around with a human-readable serialisation format similar to the above, but I would suggest that for now we go with @tinosulzer's suggestion of a JSON exporter: you basically write out every node in the expression tree as one large JSON-format tree. It's not really readable, but it will be much easier to write the exporter/importer.
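A minimal sketch of the "every node in one large JSON tree" idea, using only the standard library. The `Node` class and its `kind`/`value`/`children` fields are hypothetical stand-ins, not PyBaMM's expression-tree classes or anytree's exporter.

```python
import json
from dataclasses import dataclass, field

@dataclass
class Node:
    """Hypothetical stand-in for an expression-tree node (not PyBaMM's class)."""
    kind: str                      # e.g. "Addition", "Scalar", "StateVector"
    value: object = None           # payload for leaves (a number, a slice, ...)
    children: list = field(default_factory=list)

def to_dict(node):
    # Write every node, recursively, as a nested JSON-compatible dict
    return {
        "kind": node.kind,
        "value": node.value,
        "children": [to_dict(c) for c in node.children],
    }

def from_dict(d):
    # Rebuild the tree from the nested dicts, children first
    return Node(d["kind"], d["value"], [from_dict(c) for c in d["children"]])

# 2 * y + t, with the state-vector slice stored as [start, stop]
expr = Node("Addition", children=[
    Node("Multiplication", children=[Node("Scalar", 2.0), Node("StateVector", [0, 1])]),
    Node("Time"),
])
text = json.dumps(to_dict(expr))
```

`from_dict(json.loads(text))` gives back a tree equal to `expr`. This handles a strict tree only; if the same node object appears in several places, each occurrence is written out as a separate copy.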

I think we should put a version number in the output and make sure that, if the format ever needs to change (e.g. a node in the expression tree gets a new field), we increment the version number and continue to support reading in all prior versions.
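One way to keep reading all prior versions is to store the version alongside the tree and apply one migration per version bump on load. A sketch under stated assumptions: `FORMAT_VERSION`, the `"domain"` field added in a hypothetical v2, and the `MIGRATIONS` table are all illustrative, not part of any existing format.

```python
import json

FORMAT_VERSION = 2  # hypothetical current version of the format

def upgrade_v1_to_v2(node):
    # Hypothetical change: v2 added a "domain" field to every node
    return {
        **node,
        "domain": node.get("domain", []),
        "children": [upgrade_v1_to_v2(c) for c in node["children"]],
    }

# One migration per version bump: MIGRATIONS[v] upgrades a v tree to v + 1
MIGRATIONS = {1: upgrade_v1_to_v2}

def dumps(tree):
    return json.dumps({"version": FORMAT_VERSION, "tree": tree})

def loads(text):
    doc = json.loads(text)
    version, tree = doc["version"], doc["tree"]
    if version > FORMAT_VERSION:
        raise ValueError(f"file version {version} is newer than supported {FORMAT_VERSION}")
    # Apply migrations in order until the tree matches the current format
    while version < FORMAT_VERSION:
        tree = MIGRATIONS[version](tree)
        version += 1
    return tree
```

Chaining single-step migrations means each format change only needs one new function, while files from any older version still load.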

@martinjrobins
Contributor Author

martinjrobins commented Apr 17, 2023

An alternative to JSON is FlatBuffers (https://flatbuffers.dev/). This could simplify transferring PyBaMM models to other languages (e.g. Julia). That said, there are plenty of JSON libraries as well, so perhaps it wouldn't simplify things so much as make the actual data transfer a lot faster!

UPDATE: Supported languages are:

C
C++ - snapcraft.io
C# - nuget.org
Dart - pub.dev
Go - go.dev
Java - Maven
JavaScript - NPM
Kotlin
Lobster
Lua
PHP
Python - PyPI
Rust - crates.io
Swift - swiftpackageindex
TypeScript - NPM
Nim
Julia - https://docs.juliahub.com/FlatBuffers/rNtRK/0.6.1/
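To illustrate the speed argument with the standard library only (this is not FlatBuffers itself, whose API works through code generated from a schema), the sketch below encodes the same floats as JSON text and as fixed-width binary doubles.

```python
import json
import struct

values = [0.1, 1.0 / 3.0, 2.5e-7, 123456.789]

# Text encoding: each float is written out as a variable number of decimal characters
as_json = json.dumps(values).encode("utf-8")

# Binary encoding: each float is exactly 8 bytes (IEEE 754 double, little-endian)
as_binary = struct.pack(f"<{len(values)}d", *values)

# Decoding the binary form is a fixed-layout copy, with no text parsing
decoded = list(struct.unpack(f"<{len(values)}d", as_binary))
```

Both forms round-trip exactly here; the difference is that the binary form has a predictable size and needs no parsing. FlatBuffers goes further by letting a reader access fields in place without an unpacking step.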

@martinjrobins
Contributor Author

There is also Protocol Buffers (protobuf): https://protobuf.dev/

@valentinsulzer valentinsulzer added priority: medium To be resolved if time allows difficulty: hard Will take several weeks labels May 15, 2023
@valentinsulzer valentinsulzer added the in-progress Assigned in the core dev monthly meeting label Jun 12, 2023
@pipliggins
Contributor

pipliggins commented Jul 7, 2023

Just wanted to post a quick progress update on this issue:

I first looked at Pydantic, a library that uses type annotations to generate a JSON schema for serialising Python objects. To integrate with Pydantic, PyBaMM would have to be type-hinted throughout and its classes would have to inherit from Pydantic's BaseModel class. In fact, we'd have to make use of this patch, which fixes a Pydantic issue related to property getters/setters (a pattern used frequently in PyBaMM). However, Pydantic's serialisation support doesn't work out of the box for PyBaMM, since most PyBaMM objects are not natively JSON serialisable. That is, Pydantic does not seem able to infer a serialisation from the base types alone, so we'd still have to manually extend the JSONEncoder for each PyBaMM object we wish to serialise.
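For context, "manually extending the JSONEncoder" with the standard library looks roughly like the sketch below. `Scalar` here is a hypothetical stand-in rather than PyBaMM's class, and the `"type"` tag is an illustrative convention; the point is that both the encoder and the decode hook need a hand-written branch per class.

```python
import json

class Scalar:
    """Hypothetical stand-in for a PyBaMM Scalar (not the real class)."""
    def __init__(self, value, name=None):
        self.value = value
        self.name = name or str(value)

class ExpressionEncoder(json.JSONEncoder):
    # json calls default() only for objects it cannot serialise natively,
    # so every custom class needs its own branch here
    def default(self, obj):
        if isinstance(obj, Scalar):
            return {"type": "Scalar", "value": obj.value, "name": obj.name}
        return super().default(obj)  # raises TypeError for unknown types

def decode_hook(d):
    # The reverse direction also has to be written by hand, per class
    if d.get("type") == "Scalar":
        return Scalar(d["value"], d["name"])
    return d

text = json.dumps({"a": Scalar(2.0)}, cls=ExpressionEncoder)
restored = json.loads(text, object_hook=decode_hook)
```

With dozens of node types in an expression tree, this per-class boilerplate is what makes a more automated approach attractive.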

While experimenting with Pydantic, I added type hints to most expression tree files. I’ll include these in a separate pull request even if they end up not being required for serialization.

Before continuing with Pydantic, I looked for more automated alternatives. JSONpickle is an obvious candidate: the library reads/writes JSON files for pickleable Python objects. The authors demonstrate cross-language support with a deserialization module that can reconstruct Python objects in JavaScript, but similar code could be written in any language. JSONpickle also supports complex Python objects: “py/id” tags are used to handle multiple references made to the same Python object.
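Shared references matter for expression trees because the same symbol (e.g. the state vector y) can appear in several places. The sketch below shows the general idea behind back-reference tags, similar in spirit to jsonpickle's "py/id" but not its actual format: the first visit writes the object in full, later visits write only an index. The `Sym` class and the `"ref"` key are hypothetical.

```python
import json

class Sym:
    """Hypothetical expression node; children may be shared between parents."""
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

def encode(root):
    seen = {}  # id(obj) -> index, assigned in order of first visit

    def visit(node):
        if id(node) in seen:
            # Already written: emit a back-reference instead of a copy
            return {"ref": seen[id(node)]}
        seen[id(node)] = len(seen)
        return {"name": node.name, "children": [visit(c) for c in node.children]}

    return json.dumps(visit(root))

def decode(text):
    table = []  # objects in the same first-visit order as encode()

    def build(d):
        if "ref" in d:
            return table[d["ref"]]
        node = Sym(d["name"])
        table.append(node)  # register before children, mirroring encode()
        node.children = [build(c) for c in d["children"]]
        return node

    return build(json.loads(text))

y = Sym("y")
expr = Sym("*", [y, Sym("-", [Sym("1"), y])])  # y * (1 - y), with y shared
restored = decode(encode(expr))
```

After decoding, both occurrences of y in `restored` are the same object, so the sharing in the original graph is preserved rather than duplicated.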

I ran a few tests of JSONpickle. First, I serialized an expression tree (inspired by the PyBaMM expression tree example):

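```python
import pybamm
import jsonpickle

# Build a small expression tree: 2y(1 - y) + t
y = pybamm.StateVector(slice(0,1))
t = pybamm.t

equation = 2*y * (1 - y) + t

# Round-trip through JSON; keys=True encodes non-string dict keys instead of coercing them to strings
eq_json = jsonpickle.dumps(equation, keys=True)
eq_loaded = jsonpickle.loads(eq_json, keys=True)
```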

This works great! Next, I tried to serialize a model object which is part of a PyBaMM simulation (inspired by this example):

```python
import pybamm
import jsonpickle
import jsonpickle.ext.numpy as jsonpickle_numpy

# Register handlers so numpy arrays inside the model can be encoded
jsonpickle_numpy.register_handlers()

model = pybamm.lithium_ion.DFN()
sim = pybamm.Simulation(model)
state = sim.__getstate__()

model_json = jsonpickle.dumps(state['model'], keys=True)
model_r = jsonpickle.loads(model_json, keys=True)
```

Unfortunately, this code produces errors. I started debugging by writing a script that recurses through the object and tries dumping & loading each property. This revealed errors in multiple properties of the object structure. I’ve attached a stack dump I generated: 2023-07-06_13-06-43.txt
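A generic version of that recursive check might look like the sketch below. It is not the script used here: `find_unserialisable` is a hypothetical helper, and `pickle` stands in for the serialiser (jsonpickle's `dumps`/`loads` could be passed in instead). It only descends into an object when the object as a whole fails, and reports the deepest failing paths.

```python
import pickle

def find_unserialisable(obj, dumps=pickle.dumps, loads=pickle.loads, path="obj", seen=None):
    """Return (path, error type) pairs for the deepest parts of obj that fail to round-trip."""
    seen = set() if seen is None else seen
    if id(obj) in seen:          # avoid infinite recursion on cyclic references
        return []
    seen.add(id(obj))
    try:
        loads(dumps(obj))
        return []                # obj and everything below it round-trips
    except Exception as exc:
        error = type(exc).__name__
    # Work out which children to descend into
    if isinstance(obj, dict):
        items = [(f"{path}[{k!r}]", v) for k, v in obj.items()]
    elif isinstance(obj, (list, tuple)):
        items = [(f"{path}[{i}]", v) for i, v in enumerate(obj)]
    elif hasattr(obj, "__dict__"):
        items = [(f"{path}.{k}", v) for k, v in vars(obj).items()]
    else:
        items = []
    failures = []
    for child_path, child in items:
        failures += find_unserialisable(child, dumps, loads, child_path, seen)
    # If no child fails on its own, the problem lies at this level
    return failures or [(path, error)]
```

For example, an object holding a lambda in one attribute and a dict containing another lambda would be reported as the two lambda paths, rather than as a single failure at the top.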

I’m going to continue debugging issues here while having a look at Google Protobuf as an alternative cross-platform serialisation method.


3 participants