Expose additional beaker caching backends #15349
I just looked at why my new test, test_resolution_cache_db.py, failed in my fork's workflow. Apparently, the latest Beaker package, 1.12.0, was installed there, whereas I did my local development with 1.11.0. It works fine in 1.11.0 but fails when I upgrade to 1.12.0. I need to work out why.
I created a simple test case that demonstrates the issue (failing in Beaker 1.12.0, working in 1.11.0) and opened an issue with it on the Beaker GitHub page. In the meantime, can we revert the main code base to Beaker 1.11.0? Who knows how long it will take them to fix this.
It looks like the database connection isn't released on app shutdown. Is there a hook in Beaker to do that? I think that's why the integration tests are eventually failing.
There is no explicit hook. I looked through the Beaker source code, and it looks like Beaker does release connections, because it uses them inside a with block.
Also, engines seem to be held per process: a class-level dictionary maintains a separate SQLAlchemy engine for each unique database URL/cache table name combination.
I'm guessing this is an artefact of how the integration tests are implemented.
Each process would therefore have 3 engines in that dictionary, one for each of Galaxy's 3 caches: mulled_resolution, citations, and biotools_service.
Each engine has a connection pool of 5 connections by default, so that could be up to 15 open connections per process.
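The arithmetic above can be checked with a small stdlib-only model of the behaviour described: a class-level dictionary that creates one engine per distinct (database URL, table name) pair. This is a sketch and does not reproduce Beaker's actual code. `EngineRegistry`, `FakeEngine`, the URL and the table names are all hypothetical. `pool_size=5` stands in for SQLAlchemy's default pool size. SQLAlchemy's default `max_overflow` of 10 lets a real pool open extra connections temporarily, so treat these numbers as steady-state pool sizes rather than an absolute ceiling.

```python
from dataclasses import dataclass


@dataclass
class FakeEngine:
    """Hypothetical stand-in for a SQLAlchemy engine; only tracks pool size."""
    url: str
    pool_size: int = 5  # mirrors SQLAlchemy's default pool_size


class EngineRegistry:
    """Hypothetical model (not Beaker's implementation) of a class-level
    engine cache keyed by (url, table_name)."""

    # Shared by everything in the process, like a class-level dict.
    _engines: dict = {}

    @classmethod
    def get_engine(cls, url: str, table_name: str) -> FakeEngine:
        key = (url, table_name)
        if key not in cls._engines:
            cls._engines[key] = FakeEngine(url)
        return cls._engines[key]

    @classmethod
    def max_connections(cls) -> int:
        # Sum of pool sizes: pooled connections this process may hold.
        return sum(engine.pool_size for engine in cls._engines.values())

    @classmethod
    def reset(cls) -> None:
        cls._engines.clear()


URL = "postgresql://galaxy@localhost/galaxy"  # hypothetical URL

# Three caches with three distinct (illustrative) table names -> three engines.
for table in ("mulled_resolution", "citations", "biotools_service"):
    EngineRegistry.get_engine(URL, table)
print(EngineRegistry.max_connections())  # 15

# The same three caches sharing one table -> a single engine.
EngineRegistry.reset()
for _cache in ("mulled_resolution", "citations", "biotools_service"):
    EngineRegistry.get_engine(URL, "beaker_cache")
print(EngineRegistry.max_connections())  # 5
```

Because the key includes the table name, the number of engines, and therefore of pools, grows with the number of distinct tables rather than with the number of caches.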
How exactly are the integration tests implemented?
Are there multiple threads of execution within each process, each running concurrent tests? If so, all 5 connections in each pool could be open at the same time, which would mean up to 15 connections per process.
Are there multiple concurrent processes running tests? If so, there could be up to 15 open connections per process, multiplied by the number of processes.
From the error, I'm assuming the integration tests run against PostgreSQL rather than SQLite. What is the maximum number of connections set to for PostgreSQL?
I could try setting the connection pool size to a very small number just for the integration tests to limit the number of connections; this would require a new config parameter for the beaker caches. Alternatively, we could raise the maximum number of allowed PostgreSQL connections for the integration tests.
(Replying to Marius van den Beek's comment of Tuesday, January 31, 2023.)
A simple thing to try is to use the same table for all 3 caches; this can be specified in galaxy.yml. It would cut the maximum number of connections by two thirds. Which galaxy.yml is used by the integration tests?
(Follow-up to Claudio Fratarcangeli's comment of Wednesday, February 1, 2023.)
I don't think that will work; the integration tests call the code at galaxy/lib/galaxy_test/driver/driver_util.py, line 702 in c49abd3.
I changed config_schema.yml so that the default cache table name for all 3 beaker caches (mulled resolution, citations, biotools) is the same value, beaker_cache, rather than a separate table for each. Because Beaker creates a separate SQLAlchemy engine and associated connection pool for each distinct table, this now creates a single connection pool for all 3 caches instead of 3. That in turn cuts the total number of potentially active connections by about two thirds. As a consequence, the integration tests no longer fail.
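To see why a shared table was enough, here is a back-of-the-envelope budget check against PostgreSQL's max_connections (100 by default, which is also the limit cited in this PR). It assumes every beaker pool fills to `pool_size` and ignores overflow connections. `max_processes` and its `reserved` allowance (for Galaxy's own database engine and any other clients) are hypothetical and only illustrate the arithmetic.

```python
def max_processes(max_connections: int, pools_per_process: int,
                  pool_size: int = 5, reserved: int = 0) -> int:
    """Largest number of processes whose beaker pools fit under max_connections.

    Hypothetical helper: assumes each pool holds exactly pool_size
    connections and that `reserved` connections are used by other clients.
    """
    per_process = pools_per_process * pool_size
    return max(0, (max_connections - reserved) // per_process)


# 3 distinct tables: 3 pools x 5 = 15 connections per process.
print(max_processes(100, pools_per_process=3))  # 6
# 1 shared table: 1 pool x 5 = 5 connections per process.
print(max_processes(100, pools_per_process=1))  # 20
```

Going from 15 to 5 pooled connections per process is the "about two thirds" reduction, and it lets roughly three times as many test processes share the same server.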
lib/galaxy/managers/citations.py
"cache.type": getattr(config, "citation_cache_type", "ext:database"), | ||
"cache.data_dir": getattr(config, "citation_cache_data_dir", None), | ||
"cache.lock_dir": getattr(config, "citation_cache_lock_dir", None), | ||
"cache.url": getattr(config, "citation_cache_url", None), | ||
"cache.table_name": getattr(config, "citation_cache_table_name", None), | ||
"cache.schema_name": getattr(config, "citation_cache_schema_name", None), |
I think this should still default to file, but I also think the getattr isn't necessary at all and predates a refactoring of the config instance. All config attributes are always set now, so
"cache.type": getattr(config, "citation_cache_type", "ext:database"), | |
"cache.data_dir": getattr(config, "citation_cache_data_dir", None), | |
"cache.lock_dir": getattr(config, "citation_cache_lock_dir", None), | |
"cache.url": getattr(config, "citation_cache_url", None), | |
"cache.table_name": getattr(config, "citation_cache_table_name", None), | |
"cache.schema_name": getattr(config, "citation_cache_schema_name", None), | |
"cache.type": config.citation_cache_type, | |
"cache.data_dir": config.citation_cache_data_dir, | |
"cache.lock_dir": config.citation_cache_lock_dir, | |
"cache.url": config.citation_cache_url, | |
"cache.table_name": config.citation_cache_table_name, | |
"cache.schema_name": config.citation_cache_schema_name, |
should work too
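The reviewer's point can be sketched as follows, assuming a config object on which every option is always set. It is modelled here with a hypothetical `SimpleNamespace` and made-up values; Galaxy's real config class and defaults differ. With direct access, a misspelled option name raises `AttributeError` instead of silently falling back to a `getattr` default.

```python
from types import SimpleNamespace


def citation_cache_opts(config) -> dict:
    """Build beaker cache options from a config on which all attributes exist."""
    return {
        "cache.type": config.citation_cache_type,
        "cache.data_dir": config.citation_cache_data_dir,
        "cache.lock_dir": config.citation_cache_lock_dir,
        "cache.url": config.citation_cache_url,
        "cache.table_name": config.citation_cache_table_name,
        "cache.schema_name": config.citation_cache_schema_name,
    }


# Hypothetical config values; in Galaxy the defaults come from config_schema.yml.
config = SimpleNamespace(
    citation_cache_type="file",
    citation_cache_data_dir="citations/data",
    citation_cache_lock_dir="citations/locks",
    citation_cache_url=None,
    citation_cache_table_name="beaker_cache",
    citation_cache_schema_name=None,
)
opts = citation_cache_opts(config)
print(opts["cache.type"])  # file

# getattr(config, "citation_cache_typ", "ext:database") would hide this typo;
# direct attribute access surfaces it immediately.
try:
    config.citation_cache_typ
except AttributeError as exc:
    print("caught:", exc)
```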
lib/galaxy/tools/biotools.py
"cache.type": getattr(config, "biotools_service_cache_type", "ext:database"), | ||
"cache.data_dir": getattr(config, "biotools_service_cache_data_dir", None), | ||
"cache.lock_dir": getattr(config, "biotools_service_cache_lock_dir", None), | ||
"cache.url": getattr(config, "biotools_service_cache_url", config.database_connection), | ||
"cache.table_name": getattr(config, "biotools_service_cache_table_name", None), | ||
"cache.schema_name": getattr(config, "biotools_service_cache_schema_name", None), |
"cache.type": getattr(config, "biotools_service_cache_type", "ext:database"), | |
"cache.data_dir": getattr(config, "biotools_service_cache_data_dir", None), | |
"cache.lock_dir": getattr(config, "biotools_service_cache_lock_dir", None), | |
"cache.url": getattr(config, "biotools_service_cache_url", config.database_connection), | |
"cache.table_name": getattr(config, "biotools_service_cache_table_name", None), | |
"cache.schema_name": getattr(config, "biotools_service_cache_schema_name", None), | |
"cache.type": config.biotools_service_cache_type, | |
"cache.data_dir": config.biotools_service_cache_data_dir, | |
"cache.lock_dir": config.biotools_service_cache_lock_dir, | |
"cache.url": config.biotools_service_cache_url, | |
"cache.table_name": config.biotools_service_cache_table_name, | |
"cache.schema_name": config.biotools_service_cache_schema_name, |
beaker==1.11.0 ; python_version >= "3.7" and python_version < "3.12"
sqlalchemy==1.4.46 ; python_version >= "3.7" and python_version < "3.12"
The package requirements are ideally unpinned. It seems that 1.12.1 fixes your issue, so I think we should remove the pin here (ideally for sqlalchemy too, or, if 2.0 fails, pin it to sqlalchemy<2).
As far as pinning package versions in requirements.txt is concerned this has always been a best practice in my experience. In fact, the main galaxy requirements.txt specifies specific versions or ranges of versions for all referenced packages. If you do not pin a referenced package to a specific version or a version range there is no way to guarantee that what gets deployed is the same thing that was tested. This is especially true for third party packages as we have seen with the bug introduced in Beaker v1.12.0. I would think that test-requirements.txt should be consistent with the main requirements.txt file.
There's typically a distinction made between libraries and applications that are shipped to users/deployers. Galaxy as an application pins all dependencies hard (i.e., to specific, tested versions; see https://github.com/galaxyproject/galaxy/blob/dev/lib/galaxy/dependencies/pinned-requirements.txt), while libraries should be pinned lightly, so that as the author of an application you have a good chance of being able to pick a set of compatible requirements.
And these packages here are published individually to pypi for re-use in other libraries or applications.
In addition to Marius' comment, the central list of compatible "light" pinnings for Galaxy Python dependencies is in pyproject.toml, and the various *requirements.txt files inside packages/ should mirror them; see my other comment.
lib/galaxy/app.py
@@ -331,6 +331,8 @@ def _configure_toolbox(self):
mulled_resolution_cache = CacheManager(**parse_cache_config_options(cache_opts)).get_cache(
"mulled_resolution"
)
# If using database cache clear cache table contents
It should persist across processes and restarts. What was the reason for changing this?
I guess I don't know the requirements for these caches. How long are entries expected to stay in the caches? I made this change based upon your comment in the meeting that you didn't want the default cache type to be database because you were concerned that data would accumulate in the cache table and not be cleaned out. I also assumed that in a production environment app restarts would be very infrequent. Is that not the case? So I figured this would be a good place to clean out the table to address your concern. I presume data accumulation would also be a problem with a file based cache. So why is there a concern specifically with the database cache?
Restarts can be frequent (for config updates, or because of resource limits), and they can be simultaneous or staggered, so I don't think this is safe.
It's not specific to the database cache, but it's safe to assume people know how to delete a file, while dropping the appropriate table requires a bit more knowledge. The lifetime is typically unlimited, but we may have to drop data occasionally, that for instance was necessary when admins upgraded python versions with incompatible pickle protocols.
That's a long answer to say that I would default to a file-based cache for simplicity, as you're doing now; database is a good option for production deployments on Kubernetes, where you can run into (very low) file lock limits. If the cache needs to be cleared, admins have to do this manually, and that's easier with files.
I had originally mentioned that the Beaker docs say the lock_dir is always required, implying that using a database cache would not fix the galaxyproject/galaxy-helm#399 issue, in which there were problems with the file system lock directory.
That's great news, thanks for checking!
Rather than storing cache data in the file system, store it in the database. Partially addresses issue 15216. For now, this was only changed for the mulled resolution cache. Introduced new config parameters for the database URL and cache table name. Introduced logic to default the value of one parameter to the value of another, e.g. default the cache database URL to database_connection. The metadata for this was placed in code; it should probably eventually be placed in the config file, which would require a change to config_schema.yml. Introduced a new unit test, test_mulled_resolution_cache_db.py.
ext:database. Rather than storing cache data in the file system, store it in the database. These are the last 2 beaker caches that need to be converted to the database as part of issue 15216. Introduce new config parameters for the database URL and cache table name. Introduce logic to default the value of one parameter to the value of another, e.g. default the cache database URL to database_connection. The metadata for this was placed in code; it should probably eventually be placed in the config file, which would require a change to config_schema.yml. Introduced a new unit test, test_citations_db.py. Did not introduce a new test for the biotools_service beaker cache because I could not find a place in the code where it was used.
based beaker cache. Change test-requirements.txt to require Beaker v1.11.0, because the latest version of Beaker apparently introduced a bug that broke the ability to use the database as a cache. Also fixed lint errors.
…eaker cache. mypy kept complaining that module 'galaxy' has no attribute 'config' on a 'from galaxy import config' statement, so I changed it to 'import galaxy.config'.
…ased beaker cache.
All 3 beaker caches (mulled_resolution, citations, biotools service) now use the same default cache table, beaker_cache. This reduces the number of open DB connections, because Beaker creates a separate SQLAlchemy engine and associated connection pool for each distinct table, and each connection pool opens a certain number of connections. This was causing the integration tests to fail because the maximum number of PostgreSQL connections (100) was being exceeded.
Clear the caches in the event the cache type is database. This will delete any rows in the cache table that could be left over from a prior run of the app preventing an accumulation of stale data.
The test/unit/app directory tree has a test-requirements.txt file that brings in dependencies required by database utility modules imported by the unit test.
Co-authored-by: Nicola Soranzo <[email protected]>
Thanks again @claudiofr, this is great work!
Thanks for this fix @claudiofr. Does this mean we can remove this setting now? https://github.com/galaxyproject/galaxy-helm/pull/402/files#diff-86e6e8118f9d5ad6d181dd2e12c268062e9a66f5ef98bd0cc44b93661d08e9b2R412
You should be able to remove the setting if you set the mulled_resolution_cache_type parameter to ext:database, because in that case the database rather than the file system is used for caching and locking.
See the other settings below that apply when you set the cache type to ext:database.
Also keep in mind if you use ext:database that a new table with a default name of "beaker_cache" will be created and populated and it should probably be monitored and purged of old data periodically.
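Periodic purging could look something like the sketch below. It uses Python's built-in sqlite3 so that it runs anywhere. The table layout (a namespace key plus an accessed timestamp column) is an assumption about Beaker's beaker_cache table, so check the real schema of your deployment before running anything similar against PostgreSQL. `purge_stale_entries`, the namespace values and the 30-day cutoff are hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta


def purge_stale_entries(conn: sqlite3.Connection, table: str,
                        max_age: timedelta, now: datetime) -> int:
    """Delete cache rows not accessed within max_age; return rows deleted.

    Hypothetical helper. `table` is interpolated into the SQL, so it must be
    a trusted name (e.g. the configured cache table), never user input.
    """
    cutoff = (now - max_age).isoformat(sep=" ")
    cur = conn.execute(f"DELETE FROM {table} WHERE accessed < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


# Assumed, simplified layout; timestamps stored as ISO-8601 strings.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE beaker_cache (namespace TEXT PRIMARY KEY, accessed TEXT, data BLOB)"
)
now = datetime(2023, 3, 1, 12, 0, 0)
rows = [  # illustrative namespace values
    ("mulled_resolution:old", now - timedelta(days=60)),
    ("citations:recent", now - timedelta(days=2)),
]
conn.executemany(
    "INSERT INTO beaker_cache (namespace, accessed, data) VALUES (?, ?, ?)",
    [(ns, ts.isoformat(sep=" "), b"") for ns, ts in rows],
)
deleted = purge_stale_entries(conn, "beaker_cache", timedelta(days=30), now)
print(deleted)  # 1
```

Deleting by last access rather than truncating the table keeps entries that are still in use, which matches the concern raised earlier in this thread about clearing the cache on every restart.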
# Mulled resolution caching. Mulled resolution uses external APIs of
# quay.io; these requests are cached using this and the following
# parameters
#mulled_resolution_cache_type: file
# Data directory used by beaker for caching mulled resolution
# requests.
# The value of this option will be resolved with respect to
# <cache_dir>.
#mulled_resolution_cache_data_dir: mulled/data
# Lock directory used by beaker for caching mulled resolution
# requests.
# The value of this option will be resolved with respect to
# <cache_dir>.
#mulled_resolution_cache_lock_dir: mulled/locks
# Seconds until the beaker cache is considered old and a new value is
# created.
#mulled_resolution_cache_expire: 3600
# When mulled_resolution_cache_type = ext:database, this is the url of
# the database used by beaker for caching mulled resolution requests.
# The application config code will set it to the value of
# database_connection if this is not set.
#mulled_resolution_cache_url: null
# When mulled_resolution_cache_type = ext:database, this is the
# database table name used by beaker for caching mulled resolution
# requests.
#mulled_resolution_cache_table_name: beaker_cache
# When mulled_resolution_cache_type = ext:database, this is the
# database schema name of the table used by beaker for caching mulled
# resolution requests.
#mulled_resolution_cache_schema_name: null
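The phrase "resolved with respect to <cache_dir>" in the options above means that a relative value is joined onto cache_dir. Below is a minimal sketch of that rule. It assumes absolute values are kept unchanged, which is how path joining behaves. The helper name and example paths are hypothetical, and this is not Galaxy's implementation. posixpath is used so the output does not depend on the host OS.

```python
import posixpath


def resolve_against(base_dir: str, value: str) -> str:
    """Hypothetical helper: resolve a config path relative to base_dir.

    posixpath.join discards base_dir when value is already absolute.
    """
    return posixpath.normpath(posixpath.join(base_dir, value))


cache_dir = "/srv/galaxy/database/cache"  # hypothetical <cache_dir>
print(resolve_against(cache_dir, "mulled/data"))
# /srv/galaxy/database/cache/mulled/data
print(resolve_against(cache_dir, "/var/lock/galaxy/mulled"))
# /var/lock/galaxy/mulled
```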
Thanks for the clarification. Will discuss this with the Systems SIG and see how we can handle chart defaults going forward.
Closes #15216
Convert beaker mulled resolution, citations and biotools service caches to use the database rather than the file system.
Introduce new config parameters for the beaker cache database url, schema name and cache table name.
Introduce logic to default the value of one parameter to the value of another, e.g. default the cache database URL to database_connection. The metadata for this was placed in code; it should probably eventually be placed in the config file, which would require a change to config_schema.yml.
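Defaulting one parameter to another can be sketched as a table of fallbacks applied after the config is loaded, so that an unset *_cache_url inherits database_connection. The `FALLBACKS` mapping and `apply_fallbacks` helper below are hypothetical; the PR only says that this metadata currently lives in code.

```python
# Hypothetical fallback table: option -> option whose value it inherits when unset.
FALLBACKS = {
    "mulled_resolution_cache_url": "database_connection",
    "citation_cache_url": "database_connection",
    "biotools_service_cache_url": "database_connection",
}


def apply_fallbacks(options: dict, fallbacks: dict = FALLBACKS) -> dict:
    """Return a copy of options with unset (None or missing) values filled
    from their fallback option."""
    resolved = dict(options)
    for key, source in fallbacks.items():
        if resolved.get(key) is None:
            resolved[key] = resolved.get(source)
    return resolved


options = {
    "database_connection": "postgresql://galaxy@db/galaxy",  # hypothetical URL
    "mulled_resolution_cache_url": None,  # unset -> inherits database_connection
    "citation_cache_url": "sqlite:///citations.sqlite",  # explicit value is kept
}
resolved = apply_fallbacks(options)
print(resolved["mulled_resolution_cache_url"])  # postgresql://galaxy@db/galaxy
print(resolved["citation_cache_url"])  # sqlite:///citations.sqlite
```

Keeping the mapping as data rather than scattered if-statements is what would make it straightforward to move into config_schema.yml later, as the description suggests.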
Introduced new unit tests, test_mulled_resolution_cache_db.py and test_citations_db.py.
Did not introduce a new test for the biotools_service beaker cache because I could not find a place in the code where it was used.