
Releases: jlumpe/gambit

v1.1.0

02 Dec 03:23
  • Command line interface:
    • Better error reporting when database file(s) are not found
    • Add more details to output of gambit signatures info command
  • Major overhaul of internal Python API and tests (see below)
    • Many fixes to API documentation
  • Increase minimum Python version to 3.9
  • Make compatible with SQLAlchemy 2.0

Internal

  • Changed project structure to src/ layout
  • Build system:
    • Fix issues with package data not being properly included (only affected building distribution archives/wheels)
    • Build with Cython 3
    • Remove build-time dependency on Numpy
  • Removed some unnecessary or outdated modules/code:
    • gambit.util.typing
    • gambit.util.dev
    • gambit.sigs.convert (sparse_to_dense and dense_to_sparse functions moved to gambit.sigs.calc)
    • gambit.db.migrate
  • Moved modules containing test-oriented code out of gambit package and into tests/:
    • gambit.test
    • gambit.sigs.test
    • gambit.results.test
    • gambit.cli.test
  • Remove use of deprecated Python/library features:
    • typing aliases for collection types (e.g. typing.List, typing.Dict)
  • Other internal API changes:
    • Convert gambit.results subpackage to module
    • Move functions for comparing query result objects to tests/results.py
    • Significant restructuring of classes used to represent query results
    • Removed SequenceFile class, replaced most occurrences with Path / "path-like" (compressed FASTA files are not detected and handled automatically).
  • Tests:
    • Structure as package (add __init__.py files)
    • Simplify test file names (no longer need to be unique due to previous item)
    • Added type annotations throughout
    • Rewrote several test files to remove overly complicated Pytest fixtures
  • Other:
    • Added (hidden) --pretty flag to query subcommand to prettify JSON output
    • Improvements to GitHub CI pipeline
    • HDF5 file format version no
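
For context on the sparse_to_dense / dense_to_sparse functions mentioned above: assuming a sparse signature is a sorted list of k-mer indices and a dense signature is a presence/absence vector over all 4^k possible k-mers, the conversion is roughly as follows. This is a minimal pure-Python sketch under that assumption, not the gambit.sigs.calc code; the names to_dense and to_sparse are hypothetical.

```python
def to_dense(indices: list[int], nkmers: int) -> list[bool]:
    """Expand sorted k-mer indices into a presence/absence vector of length nkmers.

    Hypothetical illustration; not gambit.sigs.calc.sparse_to_dense.
    """
    dense = [False] * nkmers
    for i in indices:
        dense[i] = True
    return dense


def to_sparse(dense: list[bool]) -> list[int]:
    """Collect the indices of k-mers marked present (already in sorted order)."""
    return [i for i, present in enumerate(dense) if present]


# With k = 2 there are 4**2 = 16 possible k-mers.
k = 2
sig = [1, 5, 14]
assert to_sparse(to_dense(sig, 4 ** k)) == sig
```

The sparse form is far smaller for realistic k, since only a small fraction of the 4^k possible k-mers is present in any one genome.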

v1.1.0pre1

01 Dec 11:55
Pre-release
Fix CI setup

v1.0.1

13 Mar 06:01
  • Significant documentation updates.
  • Better error reporting:
    • When database files cannot be found (in CLI and API).
    • On attempting to open an invalid signatures file.
  • Misc:
    • Run tests on Python 3.11 and 3.12.
    • Minor changes to output of gambit signatures info.

v1.0.0

07 Oct 03:39

New features

  • tree command for generating hierarchical clustering trees from distance matrices.

General

  • Preferred extensions for genome database files and signatures files have been changed from .db and .h5 to .gdb and .gs.

Performance improvements

  • Use process-based parallelism by default for parsing multiple sequence files (much faster).
  • Speed up gambit dist when the -s option is used.

CLI

  • Strip directory and extension from input file IDs. This applies to CSV output for querying and distance calculation, and to IDs in generated signature files.
  • -k and --prefix parameters now default to the values used by the RefSeq database.
  • Add option to specify number of cores to use.
  • Add option to disable progress bar printing.

v0.5.1

22 Aug 02:44

Minor edits to project README and metadata.

v0.5.0

18 May 19:50

New features

  • gambit dist command for calculating distance matrices.

CLI

  • Sequence file input
    • Explicitly restrict input to FASTA format only.
    • Files may be gzipped.
    • Read input file lists from text files.
  • Minor changes to options of subcommands in signatures group.

API

  • gambit.db subpackage:
    • Database-loading funcs moved to class methods of ReferenceDatabase.
    • Additional taxonomy tree methods.
    • Some additional internal reorganization/refactoring.

v0.4.0

19 Feb 23:34

Changes from v0.3.0:

New features

  • Result reporting
    • Results include list of closest reference genomes. This is only reported in JSON-based
      output formats.
    • New "next_taxon" attribute, indicating the next most specific taxon for which the
      threshold was not met.

CLI

  • signatures info subcommand uses current reference DB by default.

Documentation

  • Some improvements to API docs.

API and internals

  • calc_signature() function can take multiple sequences as input.
  • Remove calc_signature_parse() function.
  • Refactoring
    • Rename GAMBITDatabase -> ReferenceDatabase, gambit.db.gambitdb -> gambit.db.refdb
    • Rename gambit.signatures -> gambit.sigs.
    • Merge gambit.sigs.array, gambit.sigs.meta -> gambit.sigs.base
    • Rename gambit.io.export -> gambit.results
    • Move generic sequence code from gambit.kmers to gambit.seq.
    • Merge gambit.io.seq -> gambit.seq.
    • Rename load_database* funcs -> load_db*.
    • Move gambit.io.json -> gambit.util.json, gambit.io.util -> gambit.util.io,
      remove gambit.io.
    • Moved some other stuff between modules.
  • Improvements to gambit.sigs.hdf5.HDF5Signatures
    • Improvements to .create() method.
    • Support compression.
  • Format-independent functions for reading/writing signature data.
  • jaccarddist_pairwise() function.
  • Add more tree-based methods to Taxon.
  • gambit.metric changes
    • jaccarddist_array and jaccarddist_matrix functions now accept any sequence type (e.g.
      list) for the refs argument, but with diminished performance.
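
The jaccarddist_* functions compute the Jaccard distance 1 - |A ∩ B| / |A ∪ B| between k-mer signatures. As a rough illustration only, assuming signatures are sorted lists of unique k-mer indices, the distance can be computed in a single merge pass, and a pairwise matrix built on top of it. The names jaccard_distance and pairwise_distances are hypothetical and do not reflect the API or implementation of gambit.metric.

```python
def jaccard_distance(a: list[int], b: list[int]) -> float:
    """Jaccard distance between two sorted lists of unique k-mer indices."""
    i = j = shared = 0
    # Merge-style walk over both sorted lists, counting common indices.
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            shared += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    union = len(a) + len(b) - shared
    # Convention assumed here: two empty signatures are at distance 0.
    return 0.0 if union == 0 else 1.0 - shared / union


def pairwise_distances(sigs: list[list[int]]) -> list[list[float]]:
    """Symmetric matrix of Jaccard distances between all pairs of signatures."""
    n = len(sigs)
    dmat = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(r + 1, n):
            dmat[r][c] = dmat[c][r] = jaccard_distance(sigs[r], sigs[c])
    return dmat


print(jaccard_distance([1, 2, 3], [2, 3, 4]))  # 0.5
```

Because both inputs are sorted, the intersection size is found in time linear in the signature lengths, without building intermediate sets.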

0.4.0b1

10 Jan 02:58
Pre-release
v0.4.0b1

Update version to 0.4.0b1

v0.3.0

24 Sep 05:51

Changes from v0.2.2:

  • CLI updates
    • gambit query now accepts query signatures from a signature file.
    • New command group gambit signatures with info and create subcommands.
    • New debug command group (hidden).
  • Performance enhancements
    • Signature calculation for multiple sequence files can be run in parallel.
    • Signature calculation with large k is much faster.
    • Benchmarks for signature calculation.
  • Documentation
    • Installation instructions
    • More complete CLI docs
  • API and internals
    • Major refactor to gambit.kmers and gambit.signatures
      • find_kmers() renamed to calc_signature() and moved to gambit.signatures.calc, related
        functions also renamed and moved.
      • Refactored k-mer search into new find_kmers() function, which finds locations of prefix
        matches in sequence.
      • Several other classes and functions moved from gambit.kmers to gambit.signatures submodules.
      • Rearrangement of stuff within gambit.signatures.
      • Added required kmerspec attribute to AbstractKmerArray.
      • Renamed some KmerSpec attributes
      • Rename gambit.kmers.reverse_complement() -> revcomp()
    • Refactor of Jaccard functions
      • Removed _sparse from function names
      • Array and matrix functions now calculate distance only, renamed from jaccard_* to jaccarddist_*
    • New features
      • Most functions which take DNA sequences now accept str, bytes, or Bio.Seq.Seq.
      • Convert signatures between compatible KmerSpecs.
      • HDF5Signatures close() method and context manager.
    • Other
      • Updated Cython kmers code.
      • Many updates/improvements to tests.
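
The prefix-based k-mer search described above can be sketched as follows: each occurrence of a fixed prefix marks a k-mer made of the k nucleotides immediately after it, and the signature is the sorted set of those k-mers' indices. This is a plain-Python illustration, not gambit's Cython code. The function names, the base-4 index encoding (A=0, C=1, G=2, T=3), and searching both strands via the reverse complement are assumptions of the sketch; the prefix is expected in uppercase.

```python
def complement_strand(seq: str) -> str:
    """Reverse complement; characters other than A/C/G/T become N."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp.get(base, "N") for base in reversed(seq.upper()))


def find_prefix_kmers(seq: str, prefix: str, k: int) -> list[str]:
    """Return the k nucleotides following each (possibly overlapping) prefix match."""
    seq = seq.upper()
    kmers = []
    pos = seq.find(prefix)
    while pos != -1:
        start = pos + len(prefix)
        kmer = seq[start:start + k]
        # Skip matches too close to the end or containing ambiguous bases.
        if len(kmer) == k and set(kmer) <= set("ACGT"):
            kmers.append(kmer)
        pos = seq.find(prefix, pos + 1)
    return kmers


def kmer_index(kmer: str) -> int:
    """Encode a k-mer as a base-4 integer (A=0, C=1, G=2, T=3)."""
    idx = 0
    for base in kmer:
        idx = idx * 4 + "ACGT".index(base)
    return idx


def prefix_signature(seq: str, prefix: str, k: int) -> list[int]:
    """Sorted, de-duplicated k-mer indices found on both strands."""
    found = find_prefix_kmers(seq, prefix, k) + find_prefix_kmers(complement_strand(seq), prefix, k)
    return sorted({kmer_index(kmer) for kmer in found})


print(find_prefix_kmers("ATGACGTATGACT", "ATGAC", 3))  # ['GTA']
```

Here the second prefix match is skipped because only one nucleotide follows it, fewer than k = 3.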

v0.2.2

25 Aug 02:02

Changes from v0.2.1:

  • Replace testdb_210126 with testdb_210818, which is small enough that all of its files, including reference signatures and query sequences, can be kept in version control.
  • Store pre-calculated query results for tests.
  • Some other minor test improvements and bug fixes.