Hide BiopythonDeprecationWarnings when reading certain sequence files #1731

Open
wants to merge 3 commits into
base: victorlin/centralize-sequence-read

Member

@victorlin victorlin commented Jan 17, 2025

Description of proposed changes

Biopython 1.85 shows a deprecation warning when using format='fasta' with files that start with anything but '>'.

The warning as-is should not be exposed to Augur users. It is not triggered when reading files with format='fasta-pearson', so switching to that format is the easiest way to continue accepting such files with Biopython >=1.85.

This way, Augur users get consistent, backwards-compatible behavior no matter which Biopython version they use.
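
As a rough illustration of the version switch described above, the format name could be chosen once from the Biopython version. This is only a sketch: `release_tuple` and `pick_fasta_format` are hypothetical helpers, and the small regex parser is a simplified stand-in for a proper version comparison (it ignores suffixes such as `.dev0`).

```python
import re


def release_tuple(version_string):
    """Return (major, minor) from a version string such as '1.85'.

    Simplified stand-in for a real version parser: anything after the
    minor number (for example a '.dev0' suffix) is ignored.
    """
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognized version: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def pick_fasta_format(biopython_version):
    """Choose the Biopython format name for FASTA input (hypothetical helper)."""
    if release_tuple(biopython_version) >= (1, 85):
        # Per the description above, this format does not trigger the warning.
        return "fasta-pearson"
    # Older releases: keep the long-standing format name.
    return "fasta"


# In real code the version string would come from the installed package,
# e.g. importlib.metadata.version("biopython"). A literal is used here so
# the sketch runs without Biopython installed.
BIOPYTHON_FASTA_FORMAT = pick_fasta_format("1.85")
```

With this shape, callers pass `BIOPYTHON_FASTA_FORMAT` wherever they previously passed `"fasta"`.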

Related issue(s)

Fixes #1727

Checklist

@victorlin victorlin self-assigned this Jan 17, 2025
@victorlin victorlin marked this pull request as ready for review January 17, 2025 23:26
@corneliusroemer corneliusroemer linked an issue Jan 20, 2025 that may be closed by this pull request
@corneliusroemer
Member

The suggestion for this elegant solution with a version switch came from @tsibley: #1727 (comment)

@@ -19,7 +24,7 @@ def read_sequence(
 
 def read_sequences(
     *paths: Iterable[str],
-    format: str = "fasta",
+    format: str = BIOPYTHON_FASTA_FORMAT,
Member

I don't quite understand why we don't do the BIOPYTHON_FASTA_FORMAT switch on the Biopython version inside this function.

Rather than requiring users (even internal ones) to pass the format argument, we should just rewrite fasta -> fasta-pearson inside the function here and note this in the docstring and our changelog.

Member Author

Done:

# Allow comments in FASTA format using fasta-pearson in later biopython versions
if Version(version("biopython")) >= Version("1.85") and biopython_format == "fasta":
    biopython_format = "fasta-pearson"
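
To make the effect of moving the switch inside the reader concrete, here is a minimal sketch of the rewrite as a standalone function. `resolve_format` is a hypothetical name, and the Biopython release is passed in as a `(major, minor)` tuple purely so the example runs without Biopython installed. The snippet above reads the installed version instead.

```python
def resolve_format(requested_format, biopython_release):
    """Return the format name to hand to Biopython (hypothetical helper).

    `biopython_release` is a (major, minor) tuple supplied by the caller.
    """
    if biopython_release >= (1, 85) and requested_format == "fasta":
        return "fasta-pearson"
    # Non-FASTA formats, and FASTA on older releases, pass through unchanged.
    return requested_format
```

Because only the exact name "fasta" is rewritten, call sites can keep passing "fasta" or "genbank" and need no knowledge of the Biopython version.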

@@ -399,7 +399,7 @@ def run(args):
     aln = args.alignment
     ref = None
     if args.root_sequence:
-        for fmt in ['fasta', 'genbank']:
+        for fmt in [BIOPYTHON_FASTA_FORMAT, 'genbank']:
Member

We could abstract away this change here by doing the version switch inside of read_sequence(s) helpers instead.

Member Author

Done:

# Allow comments in FASTA format using fasta-pearson in later biopython versions
if Version(version("biopython")) >= Version("1.85") and biopython_format == "fasta":
    biopython_format = "fasta-pearson"

Member

@corneliusroemer corneliusroemer left a comment

A unit test checking that a variety of FASTA files are supported would be nice. We already know that our integration tests catch this, but the unit tests do not.
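
One possible shape for such a unit test, as a sketch only: the inputs below are made-up examples of the same two records with and without text before the first '>', and which variants Augur should accept is an assumption. `check_fasta_variants` is a hypothetical helper that takes the reader under test as a callable, so it does not depend on any particular reader signature.

```python
import io

# Hypothetical inputs: the same two records, with and without text before
# the first '>' line.
FASTA_VARIANTS = {
    "plain": ">seq1\nACGT\n>seq2\nTTGA\n",
    "leading_comment": "# a comment line\n>seq1\nACGT\n>seq2\nTTGA\n",
    "leading_blank_line": "\n>seq1\nACGT\n>seq2\nTTGA\n",
}


def check_fasta_variants(read_ids):
    """Assert that `read_ids` returns the same record IDs for every variant.

    `read_ids` is any callable that takes a text handle and returns a list
    of record IDs. Wiring it to the real reader is left to the test author.
    """
    for name, text in FASTA_VARIANTS.items():
        ids = read_ids(io.StringIO(text))
        assert ids == ["seq1", "seq2"], f"{name}: got {ids}"
```

With Biopython >=1.85 installed, `read_ids` could be something like `lambda handle: [record.id for record in SeqIO.parse(handle, "fasta-pearson")]`, and running the check under `warnings.simplefilter("error")` would turn any remaining deprecation warning into a test failure.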

@victorlin victorlin force-pushed the victorlin/centralize-sequence-read branch from 63cdf89 to cc365cb Compare January 21, 2025 21:11
@victorlin victorlin force-pushed the victorlin/fix-ci branch 2 times, most recently from 945011a to b031ce6 Compare January 21, 2025 23:59
@victorlin victorlin force-pushed the victorlin/centralize-sequence-read branch from cc365cb to 8fa862a Compare January 21, 2025 23:59
@victorlin victorlin force-pushed the victorlin/centralize-sequence-read branch from 8fa862a to cf5c8be Compare January 22, 2025 00:33
Instead of relying on Biopython's which are subject to change across
versions.
Biopython 1.85 shows a deprecation warning when using format='fasta'
with files that start with anything but '>'.

The warning as-is should not be exposed to Augur users. It is not
triggered when reading files with format='fasta-pearson', so switching
to that format is the easiest way to continue accepting such files with
Biopython >=1.85.

This way, Augur users get consistent, backwards compatible behavior no
matter the Biopython version they use.
This reverts "Temporarily disable failing test" (f5323be) which should no longer fail after previous changes.
@victorlin victorlin force-pushed the victorlin/centralize-sequence-read branch from cf5c8be to fbb56fe Compare January 22, 2025 01:35

codecov bot commented Jan 22, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 73.01%. Comparing base (fbb56fe) to head (5c53256).

Additional details and impacted files
@@                          Coverage Diff                           @@
##           victorlin/centralize-sequence-read    #1731      +/-   ##
======================================================================
+ Coverage                               72.94%   73.01%   +0.06%     
======================================================================
  Files                                      79       79              
  Lines                                    8316     8326      +10     
  Branches                                 1696     1698       +2     
======================================================================
+ Hits                                     6066     6079      +13     
+ Misses                                   1961     1959       -2     
+ Partials                                  289      288       -1     


Successfully merging this pull request may close these issues.

CI failing with new BioPython deprecation warning