Releases: biotite-dev/biotite
Biotite 1.1.0
Changelog
Additions
- Support for Python 3.13 (#661).
- Support for structural alphabets that encode geometric information from structure.AtomArray objects in sequence.Sequence objects.
- Support for positional substitution matrices. (#655)
  - sequence.PositionalSequence acts as a placeholder sequence for a sequence profile in alignment functions from sequence.align.
  - sequence.align.SubstitutionMatrix.as_positional() expands a substitution matrix into a positional substitution matrix.
- New functionalities for structure.io.pdbx:
  - Custom annotations can be written with set_structure() with extra_fields=True. (#669)
  - Missing columns in the atom_site category are now handled by using sensible default annotations. (#670)
  - Added a compress() function that automatically finds optimal encodings for BinaryCIFFile, decreasing the file size by a factor of approximately 8. (#674)
  - get_assembly() now adds an asym_id annotation to better distinguish copies of the asymmetric unit. (#700)
- Added PDBFile.get_space_group() and PDBFile.set_space_group() to read and write the space group information. (#707)
- Added structure.concatenate(), which supports concatenating more than AtomArrayStack objects. (#712)
- Performance improvements:
Changes
- structure.io.pdbx.set_structure() does not write 'canonical' bonds to the struct_conn category anymore. (#678)
- The internal Chemical Component Dictionary is not version controlled anymore. (#687, #716)
  - For installations from local repository clones, the CCD must be built with python -m biotite.setup_ccd.
Fixes
- structure.io.pdbx can now handle atom_site categories with quoted values containing whitespace. (#673)
- nan values in structure.AtomArray now count as equal when comparing AtomArray objects. (#714)
- Fixed wrong band calculation in sequence.align.align_banded() that occurred when the given band was outside the sequence bounds. (#723)
  - This led to premature traceback termination in rare cases.
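The nan fix (#714) matters because, under IEEE 754 floating point semantics, nan is never equal to itself, so a naive element-wise comparison reports two identical structures as different. A minimal plain-Python sketch of nan-aware comparison; the helper annotations_equal is hypothetical and not Biotite's implementation.

```python
import math


def annotations_equal(a, b):
    """Compare two annotation sequences element-wise, treating nan == nan.

    Hypothetical helper for illustration, not part of Biotite.
    """
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        # nan != nan under IEEE 754, so handle the "both nan" case explicitly
        if (
            isinstance(x, float)
            and isinstance(y, float)
            and math.isnan(x)
            and math.isnan(y)
        ):
            continue
        if x != y:
            return False
    return True


print(annotations_equal([10.5, float("nan")], [10.5, float("nan")]))
```

For numeric NumPy arrays, numpy.array_equal(a, b, equal_nan=True) gives comparable semantics.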
Biotite 1.0.1
Changelog
Fixes
- Fixed structure.AtomArray.chain_id having the chain ID restricted to 4 characters. (#643)
- Fixed corrupted category parsing in structure.io.pdbx.CIFFile when a multiline value contains a quote character. (#651)
- Fixed duplicate bonds written to the chem_comp_bond category when include_bonds=True is set in structure.io.pdbx.set_structure(). (#653)
- Fixed non-deterministic altloc atom selection by occupancy if two altlocs have the same occupancy. (#649)
- Fixed the version switcher in the documentation showing the latest version twice. (#646)
Biotite 1.0.0
Changelog
Additions
- Support for Numpy 2.0 (#529)
  - Numpy 1.x versions are still compatible.
- Trajectory file interfaces in structure.io do not require mdtraj as an extra dependency anymore (#627).
  - Instead, the much smaller biotraj package is now a mandatory dependency of Biotite.
- New documentation website (#552).
- Improved performance of multiple auxiliary methods in sequence.Alphabet.
Changes
- sequence.graphics uses the flower color scheme as default instead of rainbow. (#617)
  - It represents the similarity of amino acids better.
- structure.io.pdbx.get_sequence() returns a dict mapping chain IDs to sequences. (#611)
- Previously deprecated functionality was removed. (#624)
  - read() instance method of File classes: Use the read() class method instead.
  - temp_file() and temp_dir(): Use the corresponding functionality from tempfile instead.
  - application.viennarna.RNAfoldApp.get_mfe(): Use application.viennarna.RNAfoldApp.get_free_energy() instead.
  - atom_mask parameter of structure.connect_via_distances() and structure.connect_via_residue_names(): Filter the atoms beforehand instead.
  - Support for Alignment objects as input to sequence.graphics.plot_sequence_logo(): Input a Profile instead.
  - sequence.io.fastq.FastqFile.get_sequence(): Use sequence.io.fastq.FastqFile.get_seq_string() or sequence.io.fastq.get_sequence() instead.
  - structure.filter_backbone(): Use structure.filter_peptide_backbone() instead.
  - structure.check_id_continuity(): Use structure.check_res_id_continuity() instead.
  - structure.check_bond_continuity(): Use structure.check_backbone_continuity() instead.
  - structure.renumber_atom_ids(): Set the atom_id annotation with numpy.arange() instead.
  - structure.renumber_res_ids(): Use structure.create_continuous_res_ids() instead.
  - chain_id parameter of structure.annotate_sse(): Filter the AtomArray beforehand instead.
  - structure.superimpose_apply(): Use structure.AffineTransformation.apply() instead.
  - structure.io.read_structure_from_ctab() and structure.io.write_structure_to_ctab(): Use the corresponding functions from structure.io.mol.
  - structure.io.mol.MolFile.get_header() and structure.io.mol.MolFile.set_header(): Use the header attribute instead.
  - structure.io.npz: The internal .npz format is not used anymore.
  - structure.io.pdbx.PDBxFile: Use structure.io.pdbx.CIFFile instead.
  - structure.io.mmtf: .mmtf was superseded by .bcif, accessible with structure.io.pdbx.BinaryCIFFile.
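With temp_file() and temp_dir() removed, the standard-library tempfile module covers the same use cases. A minimal sketch, assuming the goal is writing a structure file to a temporary location; the helper write_to_temp_file and the ".pdb" suffix are illustrative assumptions, not Biotite API.

```python
import os
import tempfile


def write_to_temp_file(content, suffix=".pdb"):
    """Write text to a named temporary file and return its path.

    Hypothetical helper; the caller is responsible for deleting the file.
    """
    # delete=False keeps the file on disk after the handle is closed,
    # so its path can be handed to other tools
    with tempfile.NamedTemporaryFile("w", suffix=suffix, delete=False) as handle:
        handle.write(content)
        return handle.name


path = write_to_temp_file("HEADER    EXAMPLE\n")
with open(path) as f:
    print(f.read(), end="")
os.remove(path)

# A temporary directory that is deleted automatically when the block exits
with tempfile.TemporaryDirectory() as temp_dir:
    print(os.path.isdir(temp_dir))
```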
Fixes
Biotite 0.41.2
Biotite 0.41.1
Biotite 0.41.0
Changelog
Additions
- Improved MOL/SDF file support in biotite.structure.mol
  - CTAB V3000 blocks can be read and written in addition to V2000 in MOLFile (#575)
  - M CHG lines in CTAB V2000 blocks can be read and written in MOLFile (#589)
  - Added biotite.structure.SDFile for full support of SD files (#589)
    - SD files with multiple records (i.e. multiple molecules) can be read and written
    - Metadata in SD files can be read and written
- Intra-residue bonds can now be read/written to CIF/BinaryCIF files in biotite.structure.io.pdbx (#567)
  - The bonds are written to the chem_comp_bond category if include_bonds=True is set in set_structure()
  - Previously, intra-residue bonds were obtained from the Chemical Component Dictionary, which only works for residues in the PDB
- Added repair functions for AtomArray objects with missing or irregular annotations
  - structure.create_continuous_res_ids() renumbers residue IDs to make them continuous for each chain (#576)
  - structure.infer_elements() guesses chemical elements from atom names in case the element annotation is missing (#576)
  - structure.create_atom_names() names atoms based on their element in case the atom_name annotation is missing (#581)
- Canonical amino acids/nucleotides can be found for arbitrary residues
  - structure.info.one_letter_code() obtains the most appropriate one-letter code (if existing) for a residue name, based on information from the Chemical Component Dictionary (#572)
  - structure.to_sequence() converts an AtomArray into a Sequence based on codes obtained via structure.info.one_letter_code() (#587)
- Added new superimposition functions (#587)
  - structure.superimpose_without_outliers() allows superimposition with iterative conformational outlier removal to decrease the RMSD of the remaining atoms
  - structure.superimpose_homologs() finds corresponding atoms via sequence alignment and optional outlier removal
    - This function is quite robust for simply superimposing homologous proteins/nucleic acids without the need for atom filtering
- structure.AffineTransformation can be converted into a 4x4 transformation matrix containing both translation and rotation (#576)
- sequence.align.write_alignment_to_cigar() now includes terminal gaps in the segment sequence (usually the shorter sequence) in the CIGAR string if include_terminal_gaps is set to True (#563)
  - The default behavior is unchanged
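The idea behind structure.create_continuous_res_ids() can be illustrated in plain Python: detect where a new residue starts and number residues consecutively, restarting at 1 for each chain. This sketch assumes a residue start is any change of chain ID, residue ID or insertion code; the function continuous_res_ids is hypothetical and does not reproduce Biotite's exact residue-start rules.

```python
def continuous_res_ids(chain_ids, res_ids, ins_codes, restart_each_chain=True):
    """Return per-atom residue IDs that increase by one for each new residue.

    Hypothetical illustration, not Biotite's implementation.
    """
    new_ids = []
    current = 0
    previous_key = None
    previous_chain = None
    for chain, res, ins in zip(chain_ids, res_ids, ins_codes):
        key = (chain, res, ins)
        if key != previous_key:
            # A new residue starts at this atom
            if restart_each_chain and chain != previous_chain:
                current = 0
            current += 1
            previous_key = key
            previous_chain = chain
        new_ids.append(current)
    return new_ids


chain_ids = ["A", "A", "A", "A", "B", "B"]
res_ids = [5, 5, 7, 7, 100, 101]
ins_codes = ["", "", "", "A", "", ""]
print(continuous_res_ids(chain_ids, res_ids, ins_codes))
# [1, 1, 2, 3, 1, 2]
```

Treating the insertion code as part of the residue key means residue 7 and residue 7A receive distinct consecutive IDs.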
Changes
- Deprecated get_header() and set_header() in biotite.structure.io.mol.MOLFile
  - The MOLFile.header attribute should be used instead (#589)
- Deprecated structure.renumber_atom_ids() and structure.renumber_res_ids()
  - renumber_res_ids() can be substituted with create_continuous_res_ids()
Fixes
- Fixed parsing multi-line values in PDBx NextGen files in structure.io.pdbx.CIFFile (#555)
- Fixed parsing of some operation expressions in structure.io.pdbx.get_assembly() (#555)
- structure.io.pdb.PDBFile.set_structure() checks if input annotations exceed the fixed number of columns, preventing writing malformed PDB files (#588)
- Trying to write malformed CIF files with categories containing no rows now raises an exception (#586)
- Added a more descriptive error message if structure.residue() cannot find the requested residue (#580)
- sequence.io.fasta.get_sequence() converts pyrrolysine (O) into lysine (K) when creating the Sequence object (#587)
- Fixed indexing with an Annotation in sequence.AnnotatedSequence if the annotation is on the minus strand (#577)
- Fixed exception in biotite.structure.base_pairs() and biotite.structure.dot_bracket_from_structure() if no base pairs were found (#573)
Biotite 0.40.0
Changelog
Additions
- Refactored struc.superimpose() (#526)
  - Multiple fixed models are allowed
  - Increased performance for multiple models
- Support for BinaryCIF file format (#531)
  - Added 'bcif' format to database.rcsb.fetch()
  - Added structure.io.pdbx.BinaryCIFFile to parse BinaryCIF files
  - Added structure.io.pdbx.CIFFile to parse CIF files with an API analogous to BinaryCIFFile
  - High-level PDBx API (get_structure(), get_assembly(), etc.) supports these new file classes
  - Added include_bonds parameter to structure.io.pdbx.get_structure() and structure.io.pdbx.get_assembly() to parse bond information from file
- Refactored structure.info subpackage (#540)
  - Decreased initial loading time when the package is imported
  - The component dataset is now stored as compressed BinaryCIF, decreasing the Biotite package size
  - The component dataset is updated to the current version, i.e. the latest chemical components from the wwPDB are included
  - The project now contains the setup_ccd.py script, enabling the user to get an up-to-date version of the component dataset
Changes
- Removed structure.info.bond_order() and structure.info.bond_dataset (#540)
- struc.superimpose() now returns an AffineTransformation object instead of a transformation tuple (#526)
  - superimpose_apply() is deprecated in favor of AffineTransformation.apply()
- structure.io.pdbx.PDBxFile is deprecated and superseded by CIFFile (#531)
- structure.io.mmtf is deprecated and superseded by BinaryCIFFile (#531)
  - This reflects the RCSB announcement to deprecate MMTF
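Returning a transformation object instead of a tuple makes it possible to express a superimposition as a single 4x4 homogeneous matrix. A minimal plain-Python sketch, assuming the transformation is decomposed as "shift by a center translation, rotate, shift by a target translation", so a point p maps to R(p + c) + t = Rp + (Rc + t). The functions to_homogeneous_matrix and apply_matrix are hypothetical and not Biotite's AffineTransformation code.

```python
def to_homogeneous_matrix(rotation, center_translation, target_translation):
    """Combine 'shift, rotate, shift' into one 4x4 matrix (hypothetical helper).

    A point p is mapped to rotation @ (p + center_translation) + target_translation.
    """
    # Effective translation column: rotation @ center_translation + target_translation
    offset = [
        sum(rotation[i][k] * center_translation[k] for k in range(3))
        + target_translation[i]
        for i in range(3)
    ]
    matrix = [list(rotation[i]) + [offset[i]] for i in range(3)]
    matrix.append([0.0, 0.0, 0.0, 1.0])
    return matrix


def apply_matrix(matrix, point):
    """Apply a 4x4 homogeneous matrix to a 3D point."""
    homogeneous = list(point) + [1.0]
    return [sum(matrix[i][k] * homogeneous[k] for k in range(4)) for i in range(3)]


# 90 degree rotation around the z-axis
rotation = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
matrix = to_homogeneous_matrix(rotation, [-1.0, 0.0, 0.0], [0.0, 0.0, 5.0])
print(apply_matrix(matrix, [2.0, 0.0, 0.0]))
# [0.0, 1.0, 5.0]
```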
Fixes
Biotite 0.39.0
Changelog
Additions
- Add build for Python 3.12 (#513)
- Added modern fast k-mer subsetting methods to sequence.align (#510)
  - These include:
    - MinimizerSelector
    - SyncmerSelector
    - CachedSyncmerSelector
    - MincodeSelector
  - The following k-mer ordering methods are available:
    - RandomPermutation
    - FrequencyPermutation
  - Added BucketKmerTable to support indexing of long k-mers with reasonable memory consumption
- Support conversion of biotite.sequence.align.Alignment from/to CIGAR strings (#516)
  - read_alignment_from_cigar()
  - write_alignment_to_cigar()
- Added sequence.graphics.plot_alignment_array() (#485)
- Support new 5-character residue names in structures from PDB (#512)
- Support NCBI API keys in database.entrez to increase download limits (#514)
- Increased performance of application.sra (#504)
  - prefetch is called before fasterq_dump, as suggested here
  - FastaDumpApp is added, which decreases computation time by writing a FASTA instead of a FASTQ file, omitting the scores
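The minimizer idea behind k-mer subsetting methods such as MinimizerSelector: within every window of consecutive k-mers, only the smallest k-mer under some ordering is kept, so overlapping windows often share a selection and far fewer k-mers need to be stored. This plain-Python sketch operates on strings, uses lexicographic order in place of an ordering such as RandomPermutation, breaks ties by the leftmost position, and does not reflect Biotite's API.

```python
def minimizers(sequence, k, window):
    """Return sorted (position, k-mer) pairs selected as window minimizers.

    Plain-Python illustration using lexicographic k-mer order; ties are
    broken by choosing the leftmost k-mer in the window.
    """
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    selected = set()
    for start in range(len(kmers) - window + 1):
        # Position of the smallest k-mer within the current window
        best = min(range(start, start + window), key=lambda i: (kmers[i], i))
        selected.add((best, kmers[best]))
    return sorted(selected)


print(minimizers("GATTACAGATTACA", k=3, window=4))
# [(1, 'ATT'), (4, 'ACA'), (6, 'AGA'), (8, 'ATT'), (11, 'ACA')]
```

Here only 5 of the 12 k-mers are kept, while every window of 4 consecutive k-mers still contains at least one selected k-mer.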
Changes
- application.sra.FastaDumpApp.get_sequences() now only returns sequence strings and not scores anymore (#504)
  - Use get_sequences_and_scores() instead
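The CIGAR conversion added in this release can be pictured as run-length encoding of an alignment trace. This simplified sketch assumes a trace of (reference_index, segment_index) pairs with -1 marking a gap and no entry gapped in both sequences. It uses only the M, I and D operations and ignores soft clipping and terminal-gap handling. The function trace_to_cigar is hypothetical and is not Biotite's write_alignment_to_cigar().

```python
def trace_to_cigar(trace):
    """Run-length encode an alignment trace into a CIGAR string.

    Each trace entry is (reference_index, segment_index); -1 marks a gap.
    Hypothetical, simplified illustration (no soft clipping).
    """
    operations = []
    for ref_i, seg_i in trace:
        if ref_i != -1 and seg_i != -1:
            op = "M"  # aligned symbols (match or mismatch)
        elif ref_i == -1:
            op = "I"  # symbol only in the segment: insertion
        else:
            op = "D"  # symbol only in the reference: deletion
        if operations and operations[-1][1] == op:
            operations[-1][0] += 1
        else:
            operations.append([1, op])
    return "".join(f"{count}{op}" for count, op in operations)


trace = [(0, 0), (1, 1), (-1, 2), (2, 3), (3, -1), (4, -1), (5, 4)]
print(trace_to_cigar(trace))
# 2M1I1M2D1M
```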
Fixes
Biotite 0.38.0
Biotite 0.37.0
Changelog
Additions
- Added PubChem database interface with database.pubchem (#472)
  - Analogous to the other database subpackages, it supports search() and fetch()
  - fetch_property() can be used to quickly obtain a wide range of properties for a given list of compound IDs
  - Automatic throttle control ensures that the PubChem usage control is obeyed
- Extended functionality for database.rcsb.search() and database.rcsb.count() (#466):
  - Added support for computational structures (e.g. from AlphaFold DB) via the content_types parameter
  - Added support for grouping via the new group_by and return_groups parameters
    - The type of grouping is selected via Grouping subclasses
  - Added support for ascending sorting with the Sorting class
- database.entrez.search() now also accepts the common database name in addition to the E-utility database name (#471)
  - This is now consistent with the behavior in database.entrez.fetch()
- Added structure.io.pdb.PDBFile.get_b_factor() analogous to structure.io.pdb.PDBFile.get_coord() (#469)
- Added structure.io.pdbx.get_component() and set_component() (#468)
  - Allows getting/setting chemical components from/to PDBx files via their chem_comp group of categories instead of atom_site
Changes
- Deprecated the atom_mask parameter in structure.connect_via_residue_names() and structure.connect_via_distances() (#474)
  - It has no effect anymore
- In structure.BondList.merge(), the BondList given as parameter takes precedence if both BondLists contain the same bond with a different BondType (#473)
  - Previously it was the other way round
- The BondList returned by structure.io.pdb.PDBFile.get_structure() (if include_bonds is True) gives appropriate BondTypes if they can be determined using the CCD (#473)
  - Otherwise the BondType is BondType.ANY
  - Previously it was BondType.ANY for all bonds
- Refactored structure.remove_pbc() (#460)
  - PBC removal is conducted for each molecule separately
  - The centroid of a molecule, rather than its first atom, is placed within the box
  - The selection can only be a boolean matrix
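The centroid-based placement in the refactored structure.remove_pbc() can be sketched for the simplest case. This minimal plain-Python version assumes an orthorhombic box given by its edge lengths and a molecule that is already whole (not split across the boundary). The function wrap_centroid_into_box is hypothetical and not Biotite's implementation.

```python
import math


def wrap_centroid_into_box(coords, box_lengths):
    """Shift a whole molecule by box multiples so its centroid lies in the box.

    coords: list of (x, y, z) tuples; box_lengths: (a, b, c) of an
    orthorhombic box spanning [0, a) x [0, b) x [0, c).
    Hypothetical illustration, not Biotite's implementation.
    """
    n = len(coords)
    centroid = [sum(c[d] for c in coords) / n for d in range(3)]
    # Shift by the number of whole box lengths the centroid lies outside the box
    shift = [-math.floor(centroid[d] / box_lengths[d]) * box_lengths[d] for d in range(3)]
    # All atoms receive the same shift, so the molecule stays intact
    return [tuple(c[d] + shift[d] for d in range(3)) for c in coords]


molecule = [(11.0, 1.0, 1.0), (13.0, 1.0, 1.0)]
print(wrap_centroid_into_box(molecule, (10.0, 10.0, 10.0)))
# [(1.0, 1.0, 1.0), (3.0, 1.0, 1.0)]
```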
Fixes
- Fixed a bug in structure.connect_via_distances() and structure.connect_via_residue_names() that allowed unexpected bonds between polymer and non-polymer residues (#473)