Releases: jlumpe/gambit
v1.1.0
- Command line interface:
  - Better error reporting when database file(s) not found
  - Add more details to output of gambit signatures info command
- Major overhaul of internal Python API and tests (see below)
- Many fixes to API documentation
- Increase minimum Python version to 3.9
- Make compatible with SQLAlchemy 2.0
Internal
- Changed project structure to src/ layout
- Build system:
  - Fix issues with package data not being properly included (only affected building distribution archives/wheels)
  - Build with Cython 3
  - Remove build-time dependency on NumPy
- Removed some unnecessary or outdated modules/code:
  - gambit.util.typing
  - gambit.util.dev
  - gambit.sigs.convert (sparse_to_dense and dense_to_sparse functions moved to gambit.sigs.calc)
  - gambit.db.migrate
- Moved modules containing test-oriented code out of gambit package and into tests/:
  - gambit.test
  - gambit.sigs.test
  - gambit.results.test
  - gambit.cli.test
- Remove use of deprecated Python/library features:
  - typing aliases to collection types
- Other internal API changes:
  - Convert gambit.results subpackage to module
  - Move functions for comparing query result objects to tests/results.py
  - Significant restructuring of classes used to represent query results
  - Removed SequenceFile class, replaced most occurrences with Path / "path-like" (compressed FASTA files are now detected and handled automatically).
- Tests:
  - Structure as package (add __init__.py files)
  - Simplify test file names (no longer need to be unique due to previous item)
  - Added type annotations throughout
  - Rewrote several test files to remove overly complicated Pytest fixtures
- Other:
  - Added (hidden) --pretty flag to query subcommand to prettify JSON output
  - Improvements to GitHub CI pipeline
  - HDF5 file format version no
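
The "typing aliases to collection types" item above most likely refers to aliases such as typing.List and typing.Dict, which have been deprecated since Python 3.9 in favor of subscripting the built-in types directly. A minimal sketch of the newer style follows; the function name group_by_length is hypothetical and is not part of the gambit API.

```python
from collections.abc import Sequence


def group_by_length(kmers: Sequence[str]) -> dict[int, list[str]]:
    """Group strings by length, annotated with built-in generics.

    Before Python 3.9 the return type would have been written as
    typing.Dict[int, typing.List[str]].
    """
    groups: dict[int, list[str]] = {}
    for kmer in kmers:
        groups.setdefault(len(kmer), []).append(kmer)
    return groups
```

This style works without any imports from typing, which is one reason it pairs naturally with the raised minimum Python version.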
v1.1.0pre1
Fix CI setup
v1.0.1
v1.0.0
New features
- tree command for generating hierarchical clustering trees from distance matrices.
General
- Preferred extensions for genome database files and signatures files have been changed from .db and .h5 to .gdb and .gs.
Performance improvements
- Use process-based parallelism by default for parsing multiple sequence files (much faster).
- Speed up gambit dist with -s option applied.
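
As an illustration of the process-based parallelism mentioned above, here is a minimal sketch using the standard library's ProcessPoolExecutor. It is not gambit's implementation: count_residues and process_files are hypothetical names, and the per-file task is a simple stand-in for real sequence parsing.

```python
from concurrent.futures import ProcessPoolExecutor


def count_residues(path):
    """Hypothetical per-file task: count sequence characters in a FASTA file."""
    total = 0
    with open(path) as f:
        for line in f:
            # Header lines start with '>' and carry no sequence data.
            if not line.startswith('>'):
                total += len(line.strip())
    return total


def process_files(paths, max_workers=None):
    """Run the task on each file in a separate worker process.

    Results are returned in the same order as the input paths.
    max_workers=None lets the executor choose based on the CPU count.
    """
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(count_residues, paths))
```

Worker processes avoid contention on the interpreter lock for CPU-bound parsing, at the cost of pickling arguments and results. On platforms that start workers with "spawn", process_files should be called from code guarded by `if __name__ == '__main__':`.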
CLI
- Strip directory and extension from input file IDs. This applies to CSV output for querying and distance calculation, and to IDs in generated signature files.
- The -k and --prefix parameters now default to the values used by the RefSeq database.
- Add option to specify number of cores to use.
- Add option to disable progress bar printing.
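
The ID derivation described in the first item of this list can be sketched with pathlib. The helper name file_id is hypothetical, and stripping a trailing .gz before the final extension is an assumption made for this example rather than documented gambit behavior.

```python
from pathlib import PurePath


def file_id(path, strip_gz=True):
    """Derive an ID from a file path by dropping the directory and extension."""
    name = PurePath(path).name  # drop the directory part
    if strip_gz and name.endswith('.gz'):
        # Assumption: treat ".gz" as a wrapper around the real extension.
        name = name[:-len('.gz')]
    # Drop only the final extension, so version suffixes such as ".2" survive.
    return PurePath(name).stem
```

For example, file_id("genomes/GCF_000005845.2.fasta") gives "GCF_000005845.2".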
v0.5.1
v0.5.0
New features
- gambit dist command for calculating distance matrices.
CLI
- Sequence file input
  - Explicitly restrict input to FASTA format only.
  - Files may be gzipped.
  - Read input file lists from text files.
- Minor changes to options of subcommands in signatures group.
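
One common way to accept both plain and gzipped FASTA input, as the list above allows, is to check for the two-byte gzip magic number before opening the file. The sketch below is only an illustration of that technique; open_maybe_gzipped is a hypothetical name, and the release notes do not say how gambit itself handles compression.

```python
import gzip

# Every gzip stream begins with these two bytes.
GZIP_MAGIC = b'\x1f\x8b'


def open_maybe_gzipped(path):
    """Open a text file for reading, decompressing it if it is gzipped."""
    with open(path, 'rb') as f:
        magic = f.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, 'rt')
    return open(path, 'r')
```

Checking file content rather than the file name means a compressed file is still handled correctly when it lacks a .gz extension.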
API
- gambit.db subpackage:
  - Database-loading funcs moved to class methods of ReferenceDatabase.
  - Additional taxonomy tree methods.
  - Some additional internal reorganization/refactoring.
v0.4.0
Changes from 0.3.0:
New features
- Result reporting
  - Results include list of closest reference genomes. This is only reported in JSON-based output formats.
  - New "next_taxon" attribute, indicating the next most specific taxon for which the threshold was not met.
CLI
- signatures info subcommand uses current reference DB by default.
Documentation
- Some improvements to API docs.
API and internals
- calc_signature() function can take multiple sequences as input.
- Remove calc_signature_parse() function.
- Refactoring
  - Rename GAMBITDatabase -> ReferenceDatabase, gambit.db.gambitdb -> .refdb
  - Rename gambit.signatures -> gambit.sigs.
  - Merge gambit.sigs.array, gambit.sigs.meta -> gambit.sigs.base
  - Rename gambit.io.export -> gambit.results
  - Move generic sequence code from gambit.kmers to gambit.seq.
  - Merge gambit.io.seq -> gambit.seq.
  - Rename load_database* funcs -> load_db*.
  - Move gambit.io.json -> gambit.util.json, gambit.io.util -> gambit.util.io, remove gambit.io.
  - Moved some other stuff between modules.
- Improvements to gambit.sigs.hdf5.HDF5Signatures
  - Improvements to .create() method.
  - Support compression.
- Format-independent functions for reading/writing signature data.
- jaccarddist_pairwise() function.
- Add more tree-based methods to Taxon.
- gambit.metric changes
  - jaccarddist_array and jaccarddist_matrix functions now accept any sequence type (e.g. list) for the refs argument, but with diminished performance.
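
For readers unfamiliar with the metric behind the jaccarddist_* functions named above, the Jaccard distance between two sets A and B is 1 - |A ∩ B| / |A ∪ B|. The following is a minimal set-based sketch, not gambit's implementation. The names jaccard_distance and jaccard_distance_matrix are hypothetical, and returning 0.0 for two empty inputs is a choice made for this example.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two collections of hashable items."""
    set_a, set_b = set(a), set(b)
    union = len(set_a | set_b)
    if union == 0:
        # Assumption for this sketch: two empty sets are treated as identical.
        return 0.0
    return 1.0 - len(set_a & set_b) / union


def jaccard_distance_matrix(queries, refs):
    """Distances from each query to each reference, as a list of rows."""
    return [[jaccard_distance(q, r) for r in refs] for q in queries]
```

For example, jaccard_distance([1, 2, 3], [2, 3, 4]) is 0.5, since the sets share two of four distinct items.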
0.4.0b1
v0.3.0
Changes from v0.2.2:
- CLI updates
  - gambit query now accepts query signatures from a signature file.
  - New command group gambit signatures with info and create subcommands.
  - New debug command group (hidden).
- Performance enhancements
  - Signature calculation for multiple sequence files can be run in parallel.
  - Signature calculation with large k is much faster.
  - Benchmarks for signature calculation.
- Documentation
  - Installation instructions
  - More complete CLI docs
- API and internals
  - Major refactor to gambit.kmers and gambit.signatures
    - find_kmers() renamed to calc_signature() and moved to gambit.signatures.calc, related functions also renamed and moved.
    - Refactored k-mer search into new find_kmers() function, which finds locations of prefix matches in sequence.
    - Several other classes and functions moved from gambit.kmers to gambit.signatures submodules.
    - Rearrangement of stuff within gambit.signatures.
    - Added required kmerspec attribute to AbstractKmerArray.
    - Renamed some KmerSpec attributes
    - Rename gambit.kmers.reverse_complement() -> revcomp()
  - Refactor of Jaccard functions
    - Removed _sparse from function names
    - Array and matrix functions now calculate distance only, renamed from jaccard_* to jaccarddist_*
  - New features
    - Most functions which take DNA sequences now accept str, bytes, or Bio.Seq.Seq.
    - Convert signatures between compatible KmerSpecs.
    - HDF5Signatures close() method and context manager.
  - Other
    - Updated Cython kmers code.
    - Many updates/improvements to tests.
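
The prefix-match search and reverse complement mentioned in the API notes above can be illustrated with a small pure-Python sketch. This is not the library's code: revcomp_str and find_prefix_matches are hypothetical names, only the forward strand is scanned, and only unambiguous A/C/G/T bases are handled.

```python
# Translation table mapping each nucleotide to its complement.
_COMPLEMENT = str.maketrans('ACGTacgt', 'TGCAtgca')


def revcomp_str(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_prefix_matches(seq, prefix, k):
    """Yield (position, kmer) for each occurrence of prefix in seq.

    position is the index where the prefix starts, and kmer is the k
    bases immediately following it. Occurrences too close to the end of
    the sequence to be followed by k bases are skipped.
    """
    start = seq.find(prefix)
    while start != -1:
        kmer_start = start + len(prefix)
        kmer = seq[kmer_start:kmer_start + k]
        if len(kmer) == k:
            yield start, kmer
        # Continue one base later so overlapping occurrences are found.
        start = seq.find(prefix, start + 1)
```

To cover both strands under these assumptions, the same search can be run a second time on revcomp_str(seq).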
v0.2.2
Changes from v0.2.1:
- Replace testdb_210126 with testdb_210818. Small enough to include all files, including reference signatures and query sequences, in version control.
- Store pre-calculated query results for tests.
- Some other minor test improvements and bug fixes.