Releases: biotite-dev/biotite

Biotite 1.1.0

24 Dec 12:44
e76eb24

Changelog

Additions

  • Support for Python 3.13 (#661).
  • Support for structural alphabets that encode geometric information from structure.AtomArray objects in sequence.Sequence objects.
    • structure.alphabet.to_3di() converts a structure into a 3Di sequence from Foldseek. (#665)
    • structure.alphabet.to_protein_blocks() converts a structure into a Protein Blocks sequence. (#676)
    • Added color schemes and a default sequence.align.SubstitutionMatrix for these structural alphabets. (#682)
  • Support for positional substitution matrices. (#655)
    • sequence.PositionalSequence acts as a placeholder sequence for a sequence profile in alignment functions from sequence.align.
    • sequence.align.SubstitutionMatrix.as_positional() expands a substitution matrix into a positional substitution matrix.
  • New functionalities for structure.io.pdbx:
    • Custom annotations can be written with set_structure() with extra_fields=True. (#669)
    • Missing columns in atom_site category are now handled by using sensible default annotations. (#670)
    • Added compress() function that automatically finds optimal encodings for BinaryCIFFile, decreasing the file size by a factor of approximately 8. (#674)
    • get_assembly() now adds a sym_id annotation to better distinguish copies of the asymmetric unit. (#700)
  • Added PDBFile.get_space_group() and PDBFile.set_space_group() to read and write the space group information. (#707)
  • Added structure.concatenate(), which supports concatenating more than two AtomArray or AtomArrayStack objects. (#712)
  • Performance improvements:
    • Doubled speed of parsing CIF files with structure.io.pdbx.CIFFile. (#722)
    • Tripled speed of pickling sequence.align.KmerTable objects. (#664)
      • Speed of transferring sequence.align.KmerTable between processes improves likewise.
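
The idea behind a positional substitution matrix, as introduced above, can be illustrated in plain Python. This is only a sketch and not Biotite's implementation: the function name expand_to_positional and the toy score table are hypothetical. It expands a symbol-based score table into one indexed by sequence positions, which is what lets a profile-like placeholder take the role of a plain sequence in an alignment.

```python
def expand_to_positional(scores, seq1, seq2):
    """Return table[i][j], the score for aligning seq1[i] with seq2[j]."""
    return [[scores[(a, b)] for b in seq2] for a in seq1]


# Toy symmetric score table over a two-letter alphabet (illustrative values)
scores = {("A", "A"): 2, ("A", "C"): -1, ("C", "A"): -1, ("C", "C"): 3}

# Rows correspond to positions in "AC", columns to positions in "CCA"
positional = expand_to_positional(scores, "AC", "CCA")
```

Once scores are indexed by position instead of by symbol, individual rows could be replaced with position-specific values, for example from a sequence profile.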

Changes

  • structure.io.pdbx.set_structure() does not write 'canonical' bonds to struct_conn category anymore. (#678)
  • The internal Chemical Component Dictionary is not version controlled anymore. (#687, #716)
    • For installations from local repository clones, it must be built with python -m biotite.setup_ccd.

Fixes

  • structure.io.pdbx can now handle atom_site categories with quoted values containing whitespaces. (#673)
  • nan values in structure.AtomArray now count as equal when comparing AtomArray objects. (#714)
  • Fixed wrong band calculation in sequence.align.align_banded() that occurred when the given band was outside the sequence bounds. (#723)
    • This led to premature traceback termination in rare cases.
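
The NaN comparison fix listed above (#714) addresses a general floating point pitfall: NaN never compares equal to itself. The following plain-Python sketch shows the difference between a naive and a NaN-aware comparison. The helper equal_with_nan is hypothetical and is not the code used in Biotite.

```python
import math


def equal_with_nan(a, b):
    """Compare two float sequences, treating NaN at the same index as equal."""
    if len(a) != len(b):
        return False
    return all(
        x == y or (math.isnan(x) and math.isnan(y))
        for x, y in zip(a, b)
    )


coord_a = [1.0, float("nan"), 3.0]
coord_b = [1.0, float("nan"), 3.0]

# Element-wise '==' fails, because NaN != NaN
plain_equal = all(x == y for x, y in zip(coord_a, coord_b))
# The NaN-aware comparison treats both sequences as equal
nan_aware_equal = equal_with_nan(coord_a, coord_b)
```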

Biotite 1.0.1

02 Sep 08:32
0f02222

Changelog

Fixes

  • Fixed structure.AtomArray.chain_id having the chain ID restricted to 4 characters. (#643)
  • Fixed corrupted category parsing in structure.io.pdbx.CIFFile, when a multiline value contains a quote character. (#651)
  • Fixed duplicate bonds written to chem_comp_bond category, when include_bonds=True is set in structure.io.pdbx.set_structure(). (#653)
  • Fixed non-deterministic altloc atom selection by occupancy, if two altlocs have the same occupancy. (#649)
  • Fixed the version switcher in the documentation showing the latest version twice. (#646)
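
The altloc fix above (#649) concerns ties between equal occupancies. One way to make such a selection deterministic is sketched below. The tie-breaking rule (first altloc wins) is an assumption made for illustration, since the changelog does not state which rule Biotite applies, and pick_altloc is a hypothetical helper.

```python
def pick_altloc(altlocs):
    """Pick the altloc ID with the highest occupancy.

    Ties are resolved in favor of the altloc that appears first,
    so the result does not depend on sorting internals.
    """
    best_id, best_occupancy = None, float("-inf")
    for altloc_id, occupancy in altlocs:
        # Strict '>' keeps the first of several equal occupancies
        if occupancy > best_occupancy:
            best_id, best_occupancy = altloc_id, occupancy
    return best_id


tie_choice = pick_altloc([("A", 0.5), ("B", 0.5)])
clear_choice = pick_altloc([("A", 0.3), ("B", 0.7)])
```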

Biotite 1.0.0

27 Aug 11:08
e77f8c4

Changelog

Additions

  • Support for Numpy 2.0 (#529)
    • 1.x versions are still compatible.
  • Trajectory file interfaces in structure.io do not require mdtraj as extra dependency anymore (#627).
    • Instead the much smaller biotraj package is now a mandatory dependency of Biotite.
  • New documentation website (#552).
  • Improved performance of multiple auxiliary methods in sequence.Alphabet.

Changes

  • sequence.graphics uses flower color scheme as default instead of rainbow. (#617)
    • It represents similarity of amino acids better.
  • structure.io.pdbx.get_sequence() now returns a dict mapping chain IDs to sequences (#611).
  • Previously deprecated functionality was removed. (#624)
    • read() instance method of File classes: Use read() class method instead.
    • temp_file() and temp_dir(): Use corresponding functionality from tempfile instead.
    • application.viennarna.RNAfoldApp.get_mfe(): Use application.viennarna.RNAfoldApp.get_free_energy() instead.
    • atom_mask parameter of structure.connect_via_distances() and structure.connect_via_residue_names(): Filter the atoms before instead.
    • Support for Alignment objects as input to sequence.graphics.plot_sequence_logo(): Input a Profile instead.
    • sequence.io.fastq.FastqFile.get_sequence(): Use sequence.io.fastq.FastqFile.get_seq_string() or sequence.io.fastq.get_sequence() instead.
    • structure.filter_backbone(): Use structure.filter_peptide_backbone() instead.
    • structure.check_id_continuity(): Use structure.check_res_id_continuity() instead.
    • structure.check_bond_continuity(): Use structure.check_backbone_continuity() instead.
    • structure.renumber_atom_ids(): Set the atom_id annotation with numpy.arange() instead.
    • structure.renumber_res_ids(): Use structure.create_continuous_res_ids() instead.
    • chain_id parameter of structure.annotate_sse(): Filter the AtomArray before instead.
    • structure.superimpose_apply(): Use structure.AffineTransformation.apply() instead.
    • structure.io.read_structure_from_ctab() and structure.io.write_structure_to_ctab(): Use corresponding functions from structure.io.mol.
    • structure.io.mol.MolFile.get_header() and structure.io.mol.MolFile.set_header(): Use the header attribute instead.
    • structure.io.npz: Internal .npz format is not used anymore.
    • structure.io.pdbx.PDBxFile: Use structure.io.pdbx.CIFFile instead.
    • structure.io.mmtf: .mmtf was superseded by .bcif accessible with structure.io.pdbx.BinaryCIFFile.
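
For code that relied on the removed temp_file() and temp_dir() helpers, the standard library's tempfile module covers the same use case. The snippet below is a minimal sketch of such a migration; the file name and content are placeholders.

```python
import os
import tempfile

# Stand-in for the removed temp_dir(): a directory that is deleted on exit
with tempfile.TemporaryDirectory() as tmp_dir:
    # Stand-in for the removed temp_file(): build a path inside that directory
    path = os.path.join(tmp_dir, "structure.cif")  # hypothetical file name
    with open(path, "w") as file:
        file.write("data_example\n")
    existed_inside = os.path.exists(path)

# The directory and its contents are removed when the 'with' block exits
exists_after = os.path.exists(path)
```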

Fixes

  • Fixed compilation warnings about deprecated NumPy API. (#626)
  • Fixed structure.BondList sometimes discarding bonds after merging two bond lists (#618).
  • Fixed incorrect handling of quotes when reading and writing a structure.io.pdbx.CIFFile (#619).

Biotite 0.41.2

28 Jun 08:53
270f2d6

Changelog

Fixes

  • Updated platform and tooling versions in CI. The previous configuration caused wheels to not be available for MacOS-ARM. (#603)
  • Fix Atom __repr__() (#602)
  • Fix artifact name for source distribution (#608)
  • Fix indexing with inverse slices (#610)
  • Fix mdtraj 1.10 incompatibility (#612)

Biotite 0.41.1

17 Jun 13:42
47e0a6b

Changelog

Fixes

  • NumPy version is now properly restricted (#601)

Biotite 0.41.0

10 Jun 08:01
c29422e

Changelog

Additions

  • Improved MOL/SDF file support in biotite.structure.io.mol
    • CTAB V3000 blocks can be read and written in addition to V2000 in MolFile (#575)
    • M CHG lines in CTAB V2000 blocks can be read and written in MolFile (#589)
    • Added biotite.structure.io.mol.SDFile for full support of SD files (#589)
      • SD files with multiple records (i.e. multiple molecules) can be read and written
      • Metadata in SD files can be read and written
  • Intra-residue bonds can now be read/written to CIF/BinaryCIF files in biotite.structure.io.pdbx (#567)
    The bonds are written to the chem_comp_bond category, if include_bonds=True in set_structure()
    • Previously, intra-residue bonds were obtained from the Chemical Component Dictionary, which only works for residues in the PDB
  • Added repair functions for AtomArray objects with missing or irregular annotations
    • structure.create_continuous_res_ids() renumbers residue IDs to make them continuous for each chain (#576)
    • structure.infer_elements() guesses chemical elements from atom names in case the element annotation is missing (#576)
    • structure.create_atom_names() names atoms based on their element in case the atom_name annotation is missing (#581)
  • Canonical amino acids/nucleotides can be found for arbitrary residues
    • structure.info.one_letter_code() obtains the most appropriate one-letter code (if existing) for a residue name, based on information from the Chemical Component Dictionary (#572)
    • structure.to_sequence() converts an AtomArray into a Sequence based on codes obtained via structure.info.one_letter_code() (#587)
  • Added new superimposition functions (#587)
    • structure.superimpose_without_outliers() allows superimposition with iterative conformational outlier removal to decrease the RMSD of the remaining atoms
    • structure.superimpose_homologs() finds corresponding atoms via sequence alignment and optional outlier removal
      • This function is quite robust for simply superimposing homologous proteins/nucleic acids without the need of atom filtering
    • structure.AffineTransformation can be converted into a 4x4 transformation matrix containing both translation and rotation (#576)
  • sequence.align.write_alignment_to_cigar() now includes terminal gaps in the segment sequence (usually the shorter sequence) in the CIGAR string, if include_terminal_gaps is set to True (#563)
    • The default behavior is unchanged
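
The 4x4 transformation matrix mentioned above combines rotation and translation through homogeneous coordinates. The plain-Python sketch below shows the general construction, assuming a single rotation followed by a single translation. The helper names are hypothetical, and the way Biotite decomposes its own transformation may differ.

```python
def to_homogeneous(rotation, translation):
    """Combine a 3x3 rotation and a 3-vector translation into a 4x4 matrix."""
    matrix = [list(row) + [t] for row, t in zip(rotation, translation)]
    matrix.append([0.0, 0.0, 0.0, 1.0])
    return matrix


def apply_matrix(matrix, point):
    """Apply a 4x4 matrix to a 3D point via homogeneous coordinates."""
    extended = list(point) + [1.0]
    return [sum(m * v for m, v in zip(row, extended)) for row in matrix[:3]]


# 90 degree rotation about the z-axis, followed by a shift along x
rotation = [
    [0.0, -1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
]
translation = [5.0, 0.0, 0.0]

matrix = to_homogeneous(rotation, translation)
moved = apply_matrix(matrix, [1.0, 0.0, 0.0])
```

A single matrix of this form is convenient for passing a superimposition result to other software that expects one combined transformation.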

Changes

  • Deprecated get_header() and set_header() in biotite.structure.io.mol.MolFile
    • The MolFile.header attribute should be used instead (#589)
  • Deprecated structure.renumber_atom_ids() and structure.renumber_res_ids()
    • renumber_res_ids() can be substituted with create_continuous_res_ids()
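
What continuous residue numbering means can be sketched without Biotite. The hypothetical function below works on per-atom chain IDs and residue IDs, and assumes that numbering restarts at 1 for each chain and that a residue boundary is signalled by a change of residue ID alone. The actual create_continuous_res_ids() may use additional annotations to detect residue starts.

```python
def continuous_res_ids(chain_ids, res_ids):
    """Renumber per-atom residue IDs so they increase by one within a chain."""
    new_ids = []
    previous = None
    current = 0
    for chain, res in zip(chain_ids, res_ids):
        if previous is None or chain != previous[0]:
            # New chain: restart the numbering
            current = 1
        elif res != previous[1]:
            # Same chain, new residue: increment by exactly one
            current += 1
        new_ids.append(current)
        previous = (chain, res)
    return new_ids


# Six atoms in two chains, with gaps in the original residue numbering
chain_ids = ["A", "A", "A", "A", "B", "B"]
res_ids = [5, 5, 9, 9, 2, 7]

renumbered = continuous_res_ids(chain_ids, res_ids)
```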

Fixes

  • Fixed parsing multi-line values in PDBx NextGen files in structure.io.pdbx.CIFFile (#555)
  • Fixed parsing of some operation expressions in structure.io.pdbx.get_assembly() (#555)
  • structure.io.pdb.PDBFile.set_structure() checks if input annotations exceed the fixed number of columns, preventing writing malformed PDB files (#588)
  • Trying to write malformed CIF files with categories containing no rows now raises an exception (#586)
  • Added more descriptive error message, if structure.residue() cannot find the requested residue (#580)
  • sequence.io.fasta.get_sequence() converts pyrrolysine (O) into lysine (K) when creating the Sequence object (#587)
  • Fixed indexing with an Annotation in sequence.AnnotatedSequence if annotation is on the minus strand (#577)
  • Fixed exception in biotite.structure.base_pairs() and biotite.structure.dot_bracket_from_structure() if no base pairs were found (#573)

Biotite 0.40.0

02 Apr 18:18
4b0355a

Changelog

Additions

  • Refactored struc.superimpose() (#526)
    • Multiple fixed models are allowed
    • Increased performance for multiple models
  • Support for BinaryCIF file format (#531)
    • Added 'bcif' format to database.rcsb.fetch()
    • Added structure.io.pdbx.BinaryCIFFile to parse BinaryCIF files
    • Added structure.io.pdbx.CIFFile to parse CIF files with analogous API to BinaryCIFFile
    • High-level PDBx API (get_structure(), get_assembly(), etc.) supports these new file classes
    • Added include_bonds parameter to structure.io.pdbx.get_structure() and structure.io.pdbx.get_assembly() to parse bond information from file
  • Refactored structure.info subpackage (#540)
    • Decreased initial loading time when package is imported
    • The component dataset is now stored as compressed BinaryCIF decreasing the Biotite package size
    • The component dataset is updated to the current version, i.e. the latest chemical components from the wwPDB are included
    • The project now contains the setup_ccd.py script, enabling the user to get an up-to-date version of the component dataset

Changes

  • Removed structure.info.bond_order() and structure.info.bond_dataset (#540)
  • struc.superimpose() now returns an AffineTransformation object instead of a transformation tuple (#526)
    • superimpose_apply() is deprecated in favor of AffineTransformation.apply()
  • structure.io.pdbx.PDBxFile is deprecated and superseded by CIFFile (#531)
  • structure.io.mmtf is deprecated and superseded by BinaryCIFFile (#531)

Fixes

  • Handle invalid CRYST1 records in PDB files correctly (#523)
  • Ensure that NumPy 1.x is used (#537)
    • Support for 2.x will be added in the future

Biotite 0.39.0

05 Jan 17:04
123e533

Changelog

Additions

  • Add build for Python 3.12 (#513)
  • Added modern fast k-mer subsetting methods to sequence.align (#510)
    • These include:
      • MinimizerSelector
      • SyncmerSelector
      • CachedSyncmerSelector
      • MincodeSelector
    • The following k-mer ordering methods are available:
      • RandomPermutation
      • FrequencyPermutation
    • Added BucketKmerTable to support indexing of long k-mers with reasonable memory consumption
  • Support conversion of biotite.sequence.align.Alignment from/to CIGAR strings (#516)
    • read_alignment_from_cigar()
    • write_alignment_to_cigar()
  • Added sequence.graphics.plot_alignment_array() (#485)
  • Support new 5-character residue names in structures from PDB (#512)
  • Support NCBI API keys in database.entrez to increase download limits (#514)
  • Increased performance of application.sra (#504)
    • prefetch is called before fasterq_dump, as suggested here
    • FastaDumpApp is added, which decreases computation time by writing a FASTA file instead of a FASTQ file, thereby omitting the scores
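
The k-mer subsetting methods listed above reduce the number of k-mers that need to be indexed. As an illustration of the minimizer idea only, the sketch below selects the smallest k-mer from each window of consecutive k-mers. It uses plain strings and lexicographic order, the function name select_minimizers is hypothetical, and it is not the algorithm or API of MinimizerSelector.

```python
def select_minimizers(sequence, k, window):
    """Return sorted (position, k-mer) pairs, one minimizer per window.

    A window spans `window` consecutive k-mers. Its minimizer is the
    lexicographically smallest k-mer, with the leftmost one chosen on ties.
    """
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    selected = set()
    for start in range(len(kmers) - window + 1):
        candidates = range(start, start + window)
        best = min(candidates, key=lambda i: (kmers[i], i))
        selected.add((best, kmers[best]))
    return sorted(selected)


# Six 3-mers are reduced to three minimizers
minimizers = select_minimizers("ACGTACGA", k=3, window=3)
```

Because overlapping windows often share the same minimizer, two similar sequences tend to select the same k-mers, which keeps matching possible while shrinking the index.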

Changes

  • application.sra.FastaDumpApp.get_sequences() now only returns sequence strings and not scores anymore (#504)
    • Use get_sequences_and_scores() instead

Fixes

  • Fixed memory leak in sequence.align.KmerTable.from_tables() (#510)
  • Fixed problems of plotting functionalities with recent Matplotlib versions (#518)

Biotite 0.38.0

09 Sep 17:27

Changelog

Additions

  • Faster k-mer decomposition in sequence.align.KmerAlphabet.create_kmers() (#475)
  • Sequence type can be set when reading sequences and alignments using sequence.io.fasta (#478)

Fixes

  • Fixed error that appeared when indexing a sequence.AnnotatedSequence with a slice (#479)
  • Fixed reading MOL/SDF files with more than 100 bonds (#480)
  • Fixed compilation of Biotite with Cython 3.x (#493)
  • Fixed usage of box parameter in structure.rdf() (#494)

Biotite 0.37.0

13 May 13:17
7fc6a2e

Changelog

Additions

  • Added PubChem database interface with database.pubchem (#472)
    • Analogous to the other database subpackages, it supports search() and fetch()
    • fetch_property() can be used to quickly obtain a wide range of properties for a given list of compound IDs
    • Automatic throttle control ensures that the PubChem usage control is obeyed
  • Extended functionality for database.rcsb.search() and database.rcsb.count() (#466):
    • Added support for computational structures (e.g. from Alphafold DB) via the content_types parameter
    • Added support for grouping via the new group_by and return_groups parameters
      • The type of grouping is selected via Grouping subclasses
    • Added support for ascending sorting with the Sorting class
  • database.entrez.search() now also accepts the common database name in addition to the E-utility database name (#471)
    • This is now consistent with the behavior in database.entrez.fetch()
  • Added structure.io.pdb.PDBFile.get_b_factor() analogous to structure.io.pdb.PDBFile.get_coord() (#469)
  • Added structure.io.pdbx.get_component() and set_component() (#468)
    • Allows getting/setting chemical components from/to PDBx files via their chem_comp group of categories instead of atom_site

Changes

  • Deprecate atom_mask parameter in structure.connect_via_residue_names() and structure.connect_via_distances() (#474)
    • It has no effect anymore
  • In structure.BondList.merge() the BondList given as parameter takes precedence, if both BondLists contain the same bond with different BondType (#473)
    • Previously it was the other way round
  • The BondList returned by structure.io.pdb.PDBFile.get_structure() (if include_bonds is True) gives appropriate BondTypes, if they can be determined using the CCD (#473)
    • Otherwise the BondType is BondType.ANY
    • Previously it was BondType.ANY for all bonds
  • Refactored structure.remove_pbc() (#460)
    • PBC removal is conducted for each molecule separately
    • Not the first atom but the centroid of a molecule is placed within the box
    • The selection can only be a boolean matrix
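
The new precedence rule for structure.BondList.merge() described above can be pictured with a dictionary merge. This is a simplified sketch and not the BondList implementation: bonds are modelled as a mapping from atom index pairs to hypothetical bond type labels, and merge_bonds is an illustrative helper.

```python
def merge_bonds(own, other):
    """Merge two bond tables; `other` wins if both contain the same bond."""
    merged = dict(own)
    merged.update(other)
    return merged


# Keys are sorted atom index pairs, values are hypothetical bond type labels
own = {(0, 1): "SINGLE", (1, 2): "ANY"}
other = {(1, 2): "DOUBLE", (2, 3): "SINGLE"}

# The bond (1, 2) takes its type from `other`, the table passed as parameter
merged = merge_bonds(own, other)
```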

Fixes

  • Fixed a bug in structure.connect_via_distances() and structure.connect_via_residue_names() that allowed unexpected bonds between polymer and non-polymer residues (#473)