What are the recommended approaches for querying completed results? #1581

Andrew-S-Rosen · 2023-04-19T05:37:19Z

Andrew-S-Rosen
Apr 19, 2023

I'm just getting started with covalent, and I found it super intuitive to set up and run workflows (lattices). That said, it's not yet clear to me how to best query the results when dealing with high-throughput calculations. Do you have any suggestions? Are there mechanisms for querying the data that is generated aside from using the GUI?

For instance, here are some scenarios:

1. I run 1000 lattices. For this toy example, the lattice takes in a string as input and returns "hello " prepended to it. Is there a way to query the dispatch list to find all jobs that completed successfully and had, say, an input of "world"?
2. I run 1000 lattices across multiple different projects where the lattices aren't necessarily related to one another. All the results are getting deposited in the same local log directory and dispatch list, so I now have a pile of results, but they're spread across my various projects and are a bit of a mess. How can I query the inputs/outputs for all the lattices associated with a given project (e.g. by querying some sort of metadata associated with the lattices)?

I know I can query individual lattices by the dispatch ID with covalent.get_result(), but that requires you to know in advance which IDs you need and also is not really meant to return multiple results. In general though, I'm looking for advice on what one does after you've set everything up and generated a lot of results.

For context, I am coming from this as a user of Fireworks, which stores results in a Mongo database that can then be queried by the user.

santoshkumarradha · 2023-04-19T18:43:57Z

santoshkumarradha
Apr 19, 2023
Maintainer

Hello @arosen93! 👋 Thank you for your interest in Covalent!

> I'm still not sure about the best way to query results for high-throughput calculations. Do you have any suggestions? Are there any other methods for querying the generated data besides using the GUI?

I understand your concern. Covalent's design approach differs slightly from other HPC workflow tools, such as Fireworks. For high-throughput work involving, say, thousands of computations, it's preferable to create thousands of electrons within a single lattice rather than dispatching thousands of lattices. This allows post-processing steps and other operations to run right after those electrons complete. For reference, we typically run stress tests with 5-10k electrons in a single lattice, which is well supported by our system. However, if you host the UI on a local laptop, you may experience some UI stuttering. We are working on optimizing this as well!

In this setup, "querying" a result from one of those 1000 calculations is simply accessing a node's result, rather than dealing with 1000s of lattices. The electrons' inputs/outputs can be programmatically accessed via the result object as well.
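As a rough sketch of the pattern described above, the toy "hello" task from the question could be run as one lattice with one electron per input, and the outputs then looked up by input. This assumes Covalent is installed and a server is running; `add_hello`, `pair_inputs_with_outputs`, and `run_batch` are hypothetical names chosen for illustration, and the status check via `str(result.status)` reflects my understanding of the result object rather than a documented recommendation.

```python
from typing import Dict, Iterable, List


def add_hello(name: str) -> str:
    """Toy task from the question: prepend "hello " to the input."""
    return "hello " + name


def pair_inputs_with_outputs(
    inputs: Iterable[str], outputs: Iterable[str]
) -> Dict[str, str]:
    """Map each input to its output so a result can be looked up by input."""
    return dict(zip(inputs, outputs))


def run_batch(names: List[str]) -> Dict[str, str]:
    """Dispatch ONE lattice that contains one electron per name.

    Requires Covalent to be installed and a server to be running.
    """
    import covalent as ct  # imported lazily so the helpers above work without it

    hello = ct.electron(add_hello)

    @ct.lattice
    def workflow(items):
        # One electron (graph node) per input, all inside a single lattice.
        return [hello(item) for item in items]

    dispatch_id = ct.dispatch(workflow)(names)
    result = ct.get_result(dispatch_id, wait=True)

    # Assumption: the status object stringifies to "COMPLETED" on success.
    if str(result.status) != "COMPLETED":
        raise RuntimeError(f"dispatch {dispatch_id} ended with status {result.status}")

    # result.result holds the lattice's return value (here, the list of outputs).
    return pair_inputs_with_outputs(names, result.result)
```

With this layout, finding the output for the input "world" is a dictionary lookup on the returned mapping, and only a single dispatch ID needs to be remembered.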

Consider a lattice as a single "experiment" and an electron as a unit of computation. For example, if you're generating 1000 different configurations of a material, calculating the total energy of each, and then selecting a few configurations for further analysis, you'd create a single lattice with 1000 individual electrons. The selected results can then be passed to a sublattice for subsequent, higher-quality calculations.
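The screening example could look roughly like the sketch below. It is only an illustration under stated assumptions: `toy_energy` is a hypothetical stand-in for a real energy calculation, configurations are represented as plain floats, and the sublattice is built with `ct.electron(ct.lattice(...))`. The selection step runs inside an electron because, while the outer lattice is being built, the energies are not yet concrete values; the sublattice is built later, once the selected configurations are known, so it can loop over them.

```python
from typing import List


def toy_energy(config: float) -> float:
    """Hypothetical stand-in for a total-energy calculation."""
    return (config - 3.0) ** 2


def select_lowest(configs: List[float], energies: List[float], k: int) -> List[float]:
    """Return the k configurations with the lowest energies."""
    ranked = sorted(zip(energies, configs))
    return [config for _, config in ranked[:k]]


def build_screening_workflow():
    """Build a lattice: cheap screening, selection, then refinement in a sublattice.

    Requires Covalent to be installed; dispatching also needs a running server.
    """
    import covalent as ct  # lazy import keeps the helpers usable on their own

    cheap = ct.electron(toy_energy)
    pick = ct.electron(select_lowest)
    refine = ct.electron(toy_energy)  # placeholder for a higher-quality method

    def refine_selected(selected):
        # Built when the sublattice runs, so `selected` is a concrete list here.
        return [refine(config) for config in selected]

    refine_sublattice = ct.electron(ct.lattice(refine_selected))

    @ct.lattice
    def screening(configs, k):
        energies = [cheap(config) for config in configs]  # one electron each
        selected = pick(configs, energies, k)
        return refine_sublattice(selected)

    return screening
```

Dispatching would then be something like `ct.dispatch(build_screening_workflow())(configs, 5)`, after which the refined energies are available from the result object of that single dispatch.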

In future releases, we plan to introduce features like adding or swapping electrons in existing lattices, re-dispatching lattices to continue from where they stopped or failed, and more. These enhancements are designed to improve the high-throughput calculation experience for use cases like the one you mentioned. Additionally, we will be implementing project management features in the UI, such as tagging, for better experiment organization!

Regarding your question on managing results from multiple projects:

> I run 1000 lattices across multiple different projects where the lattices aren't necessarily related to one another. All the results are getting deposited in the same local log directory and dispatch list, so I now have a pile of results, but they're across my various projects and could be better.

Covalent maintains a central database for all your dispatches, regardless of the lattice or where it was dispatched from. Running covalent config in a shell displays the db_path setting, which typically points to ~/.local/share/covalent/dispatcher_db.sqlite. This database contains information about all runs and their corresponding asset folders. While interfacing with the database directly is a bit low-level, you are more than welcome to explore it if you feel adventurous!
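For anyone who does want to explore it, a cautious first step is to list the tables and columns without assuming anything about the schema. The sketch below uses only Python's standard `sqlite3` module and opens the file read-only so the dispatcher's data cannot be modified by accident; `describe_database` is a hypothetical helper name, and the path should be whatever `covalent config` reports as `db_path` on your machine.

```python
import sqlite3
from pathlib import Path
from typing import Dict, List


def describe_database(db_path: str) -> Dict[str, List[str]]:
    """Return {table_name: [column names]} for a SQLite file, opened read-only."""
    uri = Path(db_path).expanduser().resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        schema: Dict[str, List[str]] = {}
        for table in tables:
            # PRAGMA table_info yields one row per column; index 1 is the name.
            quoted = '"' + table.replace('"', '""') + '"'
            columns = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
            schema[table] = [column[1] for column in columns]
        return schema
    finally:
        conn.close()
```

Once the table and column names are known, ordinary `SELECT` queries can filter dispatches, but the schema is internal to Covalent and may change between releases, so treat any such query as version-specific.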

Please feel free to let us know if you have any other questions or feature requests (or if you want to contribute!) ⭐

Andrew-S-Rosen Apr 19, 2023
Author

Thank you so much! This is a huge help. I hadn't thought of organizing electrons within lattices in this way, and this is super helpful for me in terms of figuring out how to mentally transition between workflow systems.

I'm going to take some time to digest this and see if I can put it all to practice! And thanks for sharing some forthcoming plans too!
