Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
Unreleased
v3.6.3-post1 - 2025-03-04
Fixed
Extra key in pyproject.toml causing build issues with version 0.11.0 of scikit-build-core.
v3.6.3 - 2024-11-04
Fixed
Dynamic dispatch to NEON connection scorer on Aarch64 MacOS.
v3.6.2 - 2024-11-03
Fixed
pyrodigal console script not being installed.
v3.6.1 - 2024-11-03
Added
Compilation of the connection scoring code for AVX-512.
Fixed
Import issue on platforms without AVX2 runtime support.
Missing metadata in pyproject.toml.
v3.6.0 - 2024-11-02
Added
Support for Python 3.13.
Changed
Reorganize project to build with CMake and scikit-build-core.
Build separate Python modules for various SIMD implementations to avoid potential linking issues.
Fixed
Pointer dereference issue when calling TrainingInfo.load in PyPy or with objects missing a readinto method.
Removed
Support for Python 3.6.
v3.5.2 - 2024-09-04
Added
Warning in CLI when given sequences with empty identifiers.
Fixed
FASTA parser used in CLI crashing on empty header lines (#61).
v3.5.1 - 2024-07-17
Fixed
Outdated code in pyrodigal.cli breaking the CLI.
v3.5.0 - 2024-07-17 - YANKED
Added
Changed
Migrate documentation to pydata-sphinx-theme.
Fixed
Cython warnings with unused except * statements in MetagenomicBins.
Signatures of __init__ methods missing from all Cython types after the v3.0 update.
Small typos in documentation.
v3.4.1 - 2024-05-23
Changed
Refactor SIMD code to reduce number of required registers, and improve SSE2 performance.
Refactor Prodigal initialization functions into sparse initializer code to reduce library size.
v3.4.0 - 2024-05-19
Added
strict argument to Gene.translate to control translation of ambiguous codons with unambiguous translation (#54).
strict_translation argument to Genes.write_genbank and Genes.write_translation.
Support for translation tables 26 to 33 in Gene.translate.
Support for translation tables 26, 29, 30, 32 and 33 in GeneFinder.train.
Genes.score property to count the total score of all extracted genes.
full_id parameter to Genes.write_gff, Genes.write_translation and Genes.write_genes to control the ID field written for each gene (#53).
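The strict argument above concerns codons that contain an unknown nucleotide but still code for a single amino acid. GCN is an example: every expansion of it gives alanine. Below is a minimal pure-Python sketch of that idea, not Pyrodigal's implementation. The translate_codon helper is hypothetical. The sketch treats only N as unknown, uses translation table 1, and assumes that strict mode maps every ambiguous codon to X.

```python
from itertools import product

# NCBI translation table 1 (standard code), codons ordered TCAG x TCAG x TCAG.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}


def translate_codon(codon: str, strict: bool = True, unknown_residue: str = "X") -> str:
    """Translate one codon, treating N as an unknown nucleotide (hypothetical helper)."""
    codon = codon.upper()
    if "N" not in codon:
        return CODON_TABLE[codon]
    if strict:
        # Assumed strict behaviour: any ambiguous codon becomes the unknown residue.
        return unknown_residue
    # Non-strict: expand every N to all four bases and keep the amino acid
    # only if every possible codon translates to the same residue.
    choices = [_BASES if base == "N" else base for base in codon]
    residues = {CODON_TABLE["".join(p)] for p in product(*choices)}
    return residues.pop() if len(residues) == 1 else unknown_residue
```

Under these assumptions, translate_codon("GCN", strict=False) returns "A". By contrast, translate_codon("ATN", strict=False) falls back to "X", because ATT, ATC and ATA give isoleucine but ATG gives methionine.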
Changed
Gene.translate now raises a warning when called with a translation table incompatible with the training info.
Fixed
Bug in code for masking trailing nucleotides (#55).
v3.3.0 - 2024-01-24
Added
Changed
Scorer internal API to separate connection scoring and overlap disentangling.
Fixed
Bug with computation of minimum node in connection scoring loop (hyattpd/Prodigal#108).
Out-of-bounds sequence access in _shine_dalgarno_exact and _shine_dalgarno_mm methods of Sequence.
Memory leak in Nodes.__setstate__ caused by incorrect reallocation.
v3.2.2 - 2024-01-21
Fixed
Always mark SSE2 support on x86-64 CPUs independently of archspec-detected features (#49).
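The SSE2 fix above relies on the fact that SSE2 belongs to the x86-64 baseline instruction set. A backend can therefore assume SSE2 even when a feature detector such as archspec does not list it. The sketch below shows one way to express that rule in a backend selector. The select_backend function, the feature names and the backend priority order are all hypothetical; they do not reflect Pyrodigal's internal dispatch code.

```python
from typing import Iterable

# Hypothetical backends, fastest first, with the CPU features each one needs.
BACKEND_PRIORITY = [
    ("avx512", {"avx512f", "avx512bw"}),
    ("avx2", {"avx2"}),
    ("sse2", {"sse2"}),
    ("neon", {"neon"}),
]

X86_64_NAMES = {"x86_64", "amd64"}


def select_backend(machine: str, detected: Iterable[str]) -> str:
    """Pick the first backend whose required features are all available."""
    features = set(detected)
    # SSE2 is mandatory on x86-64, so do not rely on detection for it.
    if machine.lower() in X86_64_NAMES:
        features.add("sse2")
    for name, required in BACKEND_PRIORITY:
        if required <= features:
            return name
    # Portable scalar fallback.
    return "generic"
```

For example, calling select_backend on an x86-64 machine with an empty feature set still returns "sse2" rather than the generic fallback.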
v3.2.1 - 2023-11-27
Added
Option to change argument parser in pyrodigal.cli.main.
v3.2.0 - 2023-11-27
Added
AVX-512 implementation of the SIMD pre-filter.
Additional support for reading lz4, xz and zstd-compressed input in the CLI.
Option to change gene finder type in pyrodigal.cli.main.
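Reading compressed FASTA input usually starts by sniffing the magic number at the beginning of the file. The following sketch is not the Pyrodigal CLI's code. Both detect_compression and decompress_input are hypothetical helpers. Only gzip, bz2 and xz are decoded with the Python standard library; lz4 and zstd need third-party packages, so the sketch only detects them.

```python
import bz2
import gzip
import lzma

# Magic numbers found at the start of each compressed stream.
MAGIC_NUMBERS = {
    "gzip": b"\x1f\x8b",
    "bz2": b"BZh",
    "xz": b"\xfd7zXZ\x00",
    "lz4": b"\x04\x22\x4d\x18",  # LZ4 frame format
    "zstd": b"\x28\xb5\x2f\xfd",
}

# Decompressors available in the Python standard library.
STDLIB_DECOMPRESSORS = {
    "gzip": gzip.decompress,
    "bz2": bz2.decompress,
    "xz": lzma.decompress,
}


def detect_compression(data: bytes) -> str:
    """Return the compression format of data, or "none" if unrecognised."""
    for name, magic in MAGIC_NUMBERS.items():
        if data.startswith(magic):
            return name
    return "none"


def decompress_input(data: bytes) -> bytes:
    """Return the uncompressed bytes, decompressing first if needed."""
    kind = detect_compression(data)
    if kind == "none":
        return data
    if kind in STDLIB_DECOMPRESSORS:
        return STDLIB_DECOMPRESSORS[kind](data)
    raise NotImplementedError(f"{kind} input requires a third-party package")
```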
v3.1.1 - 2023-11-06
Fixed
Incorrect unpickling of GeneFinder causing crashes with multiprocessing (#46).
v3.1.0 - 2023-10-22
Added
Support for Python 3.12.
min_mask argument to GeneFinder to control the minimum length of masked regions when mask=True.
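Masking works on runs of unknown nucleotides, and min_mask sets the shortest run that gets masked. Here is a minimal sketch. It assumes that only N counts as unknown and that masks use the start-inclusive, end-exclusive coordinates adopted for Mask in v2.0.0. The find_masks helper is hypothetical, and Pyrodigal's own code and default threshold may differ.

```python
from typing import List, Optional, Tuple


def find_masks(sequence: str, min_mask: int) -> List[Tuple[int, int]]:
    """Return half-open (begin, end) intervals over runs of N at least min_mask long."""
    masks: List[Tuple[int, int]] = []
    run_start: Optional[int] = None
    for i, base in enumerate(sequence.upper()):
        if base == "N":
            if run_start is None:
                run_start = i  # a new run of unknown nucleotides begins
        elif run_start is not None:
            if i - run_start >= min_mask:
                masks.append((run_start, i))  # end is exclusive
            run_start = None
    # A run may extend to the very end of the sequence.
    if run_start is not None and len(sequence) - run_start >= min_mask:
        masks.append((run_start, len(sequence)))
    return masks
```

Because the coordinates are half-open, sequence[begin:end] is exactly the masked run, and end - begin is its length.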
v3.0.1 - 2023-09-27
Fixed
Genes.write_scores and Genes.write_gff crashing on empty Genes (#44).
v3.0.0 - 2023-09-17
Added
MetagenomicBins collection to store a dense array of MetagenomicBin objects.
metagenomic_bins keyword argument to GeneFinder to control which models are used when running gene finding in meta mode (#24).
metagenomic_bin attribute to Genes referencing the metagenomic model with which the genes were predicted, if in meta mode.
Additional TrainingInfo properties (missing_motif_weight, coding_statistics).
Setters for all remaining TrainingInfo properties.
Proper TrainingInfo constructor with configuration options for all attributes.
TrainingInfo.to_dict method to extract all parameters from a TrainingInfo.
Genes.write_genbank method to write a GenBank record with all predicted genes from a sequence.
include_stop flag to Gene.translate and Genes.write_translations to allow excluding the stop codon from the translated sequence.
include_translation_table flag to Genes.write_gff to include the translation table in the GFF attributes of each gene.
gbk output format to the Pyrodigal CLI.
Sequence.unknown property exposing the number of unknown nucleotides in the sequence.
Sequence.start_probability and Sequence.stop_probability to estimate the probability of encountering a start and a stop codon based on the GC%.
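One simple way to estimate start and stop codon probabilities from GC% is to assume independent nucleotides. Under that model, G and C each occur at frequency GC/2, and A and T each at (1 - GC)/2. The sketch below applies the model to the start codons ATG, GTG and TTG and the stop codons TAA, TAG and TGA, as in translation table 11. It illustrates the idea only; it is not the formula behind Sequence.start_probability or Sequence.stop_probability.

```python
from typing import Iterable

START_CODONS = ("ATG", "GTG", "TTG")
STOP_CODONS = ("TAA", "TAG", "TGA")


def codon_probability(codon: str, gc: float) -> float:
    """Probability of a codon under an independent-nucleotide model."""
    frequency = {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}
    probability = 1.0
    for base in codon:
        probability *= frequency[base]
    return probability


def codon_set_probability(codons: Iterable[str], gc: float) -> float:
    """Probability that a given codon position holds any of the codons."""
    return sum(codon_probability(codon, gc) for codon in codons)


def start_probability(gc: float) -> float:
    return codon_set_probability(START_CODONS, gc)


def stop_probability(gc: float) -> float:
    return codon_set_probability(STOP_CODONS, gc)
```

At 50% GC every codon has probability 1/64, so both functions return 3/64. Because stop codons are AT-rich, stop_probability(0.7) is smaller than stop_probability(0.3).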
Fixed
Genes.write_gff not properly reporting the number of bytes written.
Merge several nogil sections in Sequence constructor.
Several Cython functions missing a noexcept qualifier.
Changed
BREAKING: Rename OrfFinder to GeneFinder for consistency.
BREAKING: Use memoryview to expose all TrainingInfo attributes instead of manually building lists or tuples.
Reorganize memory management of the built-in metagenomic models.
Make the internal Cython module public (pyrodigal.lib) to allow importing the underlying classes in other Cython projects.
Use typing.Literal for allowed translation table values in pyrodigal.lib annotations.
Cache intermediate log-odds in Nodes._raw_coding_score to reduce calls to pow and log functions.
Inline connection scoring functions to reduce function call overhead.
Reorganize struct _node fields to reduce size in memory.
Make GeneFinder.find_genes and GeneFinder.train reserve memory for the Nodes based on the GC% of the input sequence.
Avoid storing temporary results in the generic implementation of ConnectionScorer.compute_skippable.
Use Cython freelist for allocating Node, Gene, MetagenomicBin and Mask.
Increase minimum allocation for Genes and Nodes to reduce early reallocations.
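Exposing TrainingInfo attributes as memoryview objects means callers see the underlying buffer directly instead of a freshly built list or tuple. The sketch below illustrates that behaviour with the standard array module. BufferBackedParams is a hypothetical stand-in, not Pyrodigal's TrainingInfo class.

```python
from array import array
from typing import Iterable


class BufferBackedParams:
    """Hypothetical container storing parameters in a contiguous buffer."""

    def __init__(self, values: Iterable[float]) -> None:
        self._values = array("d", values)

    @property
    def values(self) -> memoryview:
        # No copy is made: the view shares memory with the stored array,
        # so writes through the view change the stored parameters.
        return memoryview(self._values)
```

Code that needs a detached copy, like the lists or tuples returned before this change, can call tolist() on the view.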
Removed
BREAKING: metagenomic_bin attribute of TrainingInfo.
v2.3.0 - 2023-07-20
Changed
Bump Cython to v3.0.0.
v2.2.0 - 2023-06-19
Changed
Release GIL while masking sequence regions in Sequence.__init__.
Use archspec instead of cpu_features for runtime feature detection.
Added
Support for reading gzip and bz2-compressed input in the CLI.
CLI flag to run ORF detection in parallel when input contains several contigs.
Removed
Support for Python 3.5.
v2.1.0 - 2023-02-20
Changed
Update Prodigal to v2.6.3+c1e2d36 to fix a bug with Shine-Dalgarno detection on reverse contig edge (hyattpd/Prodigal#100).
Added
Fixed
ArchLinux User Repository package generation in CI.
v2.0.4 - 2023-01-09
Fixed
GC% computation and RBS scoring for reverse strand nodes close to the contig edge (#27).
v2.0.3 - 2022-12-20
Fixed
OrfFinder(mask=True) ignoring the minimum mask size when masking regions (#26).
Changed
Use cibuildwheel for building wheel distributions.
Added
Wheels for MacOS Aarch64 platforms.
v2.0.2 - 2022-11-01
Fixed
Syntax issue in Cython files failing build on Bioconda runner.
v2.0.1 - 2022-11-01
Fixed
Syntax issue in Cython files failing build on some environments.
v2.0.0 - 2022-11-01
Added
MMX implementation of the SIMD prefilter.
Proper GFF headers and metadata section to GFF output.
Sequence.gc_frame_plot method to compute the max GC frame profile from Python.
metagenomic_bin property to TrainingInfo to support recovering the object corresponding to a pre-trained model.
meta attribute to Genes to store whether genes were predicted in single or in meta mode.
pyrodigal.PRODIGAL_VERSION constant storing the wrapped Prodigal version.
pyrodigal.MIN_SINGLE_GENOME and pyrodigal.IDEAL_SINGLE_GENOME constants storing the minimum and recommended sequence sizes for training.
Changed
Make all write methods of Genes objects require a sequence_id argument instead of using the internal sequence number.
Rewrite SIMD prefilter using a generic template with C macros.
Make Mask record coordinates in start-inclusive end-exclusive mode to follow Python conventions.
Make connection scoring tests only score some randomly selected node pairs for faster runs.
Rewrite tests to use importlib.resources for managing test data.
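In GFF3, the first column (seqid) names the sequence a feature lies on. Each line therefore needs the sequence identifier, not a gene identifier or an internal sequence number. The sketch below formats one CDS line. The format_gff_line helper is hypothetical, and its source label and ID attribute scheme are assumptions rather than Pyrodigal's actual output.

```python
def format_gff_line(
    sequence_id: str,
    gene_number: int,
    begin: int,
    end: int,
    strand: int,
    score: float,
    source: str = "pyrodigal",
) -> str:
    """Format a single GFF3 CDS line (begin and end are 1-based, inclusive)."""
    columns = [
        sequence_id,                        # seqid: the sequence, not the gene
        source,                             # source of the prediction
        "CDS",                              # feature type
        str(begin),
        str(end),
        f"{score:.1f}",
        "+" if strand == 1 else "-",
        "0",                                # phase 0: CDS starts on a codon boundary
        f"ID={sequence_id}_{gene_number}",  # assumed ID scheme
    ]
    return "\t".join(columns)
```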
Removed
from_bytes and from_string constructors of Sequence objects.
Fixed
Duplicate extraction of start codons located on contig edges inside Nodes._extract (#21).
Pickling and unpickling of TrainingInfo objects corresponding to pre-trained models.
Implementation of calc_most_gc_frame being inconsistent with the Prodigal implementation.
Implementation of the maximum search in score_connection_forward_start not following the (weird?) behaviour from Prodigal (#21).
Gene identifier being used instead of the sequence identifier in the GFF output (#18).
Out-of-bounds access to sequence data in Sequence._shine_dalgarno_mm and Sequence._shine_dalgarno_exact.
v1.1.2 - 2022-08-31
Changed
Use the vbicq Arm intrinsic in the NEON implementation to combine vandq and vmvnq.
Fixed
Prevent direct instantiation of Node and Gene objects from Python code.
Configuration of platform-specific NEON flags in setup.py not being applied to the linker.
v1.1.1 - 2022-07-08
Fixed
Some cpu_features source files not being included in source distribution.
v1.1.0 - 2022-06-09
Changed
OrfFinder.train can now be given more than one sequence argument to train on contigs from an unclosed genome.
Updated cpu_features to v0.7.0 and added hardware detection of NEON features on Linux Aarch64 platforms.
v1.0.2 - 2022-05-13
Fixed
Detection of Arm64 platform in setup.py (#16).
v1.0.1 - 2022-04-28
Changed
pyrodigal.cli now concatenates training sequences the same way as Prodigal does.
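When several training sequences are given, Prodigal (as we understand its training input code) joins them with the 12 bp linker TTAATTAATTAA. That linker contains a stop codon in all six reading frames, so no ORF can span two input contigs. Treat the linker and the helper names below as assumptions: this is a sketch, not the pyrodigal.cli code.

```python
from typing import Iterable

# Assumed linker; it is its own reverse complement.
LINKER = "TTAATTAATTAA"
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence: str) -> str:
    return sequence.translate(_COMPLEMENT)[::-1]


def concatenate_for_training(sequences: Iterable[str]) -> str:
    """Join training sequences with the linker between consecutive ones."""
    return LINKER.join(sequence.upper() for sequence in sequences)


def has_stop_in_all_frames(sequence: str) -> bool:
    """Check that every reading frame on both strands contains a stop codon."""
    for strand in (sequence, reverse_complement(sequence)):
        for frame in range(3):
            codons = {strand[i:i + 3] for i in range(frame, len(strand) - 2, 3)}
            if not codons & STOP_CODONS:
                return False
    return True
```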
v1.0.0 - 2022-04-20
Stable version, to be published in the Journal of Open-Source Software.
Added
pickle protocol implementation for Nodes, TrainingInfo, OrfFinder, Sequence, Masks and Genes objects.
Buffer protocol implementation for Sequence, allowing access to raw digits.
__eq__ and __repr__ magic methods to Mask objects.
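For objects that own raw buffers, implementing the pickle protocol mostly comes down to __getstate__ and __setstate__. Unpickling does not call __init__, so __setstate__ must rebuild every internal field itself. The sketch below shows the pattern in pure Python. RawArrayContainer is hypothetical, and its array stands in for a C buffer.

```python
import pickle
from array import array
from typing import Iterable, List


class RawArrayContainer:
    """Hypothetical container whose data lives in a flat integer buffer."""

    def __init__(self, values: Iterable[int] = ()) -> None:
        self._buffer = array("i", values)

    def __len__(self) -> int:
        return len(self._buffer)

    def tolist(self) -> List[int]:
        return self._buffer.tolist()

    def __getstate__(self) -> dict:
        # Store the raw contents as bytes so they can be restored verbatim.
        return {"typecode": self._buffer.typecode, "data": self._buffer.tobytes()}

    def __setstate__(self, state: dict) -> None:
        # __init__ is not called during unpickling, so allocate a fresh buffer here.
        self._buffer = array(state["typecode"])
        self._buffer.frombytes(state["data"])


original = RawArrayContainer([3, 1, 4])
restored = pickle.loads(pickle.dumps(original))
```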
Changed
Optimized code used for region masking to avoid searching for the same mask repeatedly.
TRANSLATION_TABLES and METAGENOMIC_BINS are now exposed as constants in the top-level pyrodigal module.
Refactored connection scoring into different functions based on the type (start/stop) and strand (direct/reverse) of the node being scored.
Changed the growth factor for dynamic arrays to be the same as the one used in CPython list buffers.
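CPython lists over-allocate so that repeated appends run in amortised constant time. Since CPython 3.9, listobject.c computes the new capacity roughly as (newsize + (newsize >> 3) + 6) & ~3. The sketch below applies that formula to a hypothetical growable array. Whether Pyrodigal mirrors exactly this version of the formula is an assumption.

```python
from typing import Any, List


def grow_capacity(newsize: int) -> int:
    """Over-allocate like CPython 3.9+ list_resize (ignoring its edge cases)."""
    return (newsize + (newsize >> 3) + 6) & ~3


class GrowableArray:
    """Hypothetical dynamic array; the Python list stands in for a C buffer."""

    def __init__(self) -> None:
        self._slots: List[Any] = []
        self.length = 0
        self.capacity = 0

    def append(self, item: Any) -> None:
        if self.length == self.capacity:
            # Buffer is full: reserve more slots than strictly needed.
            self.capacity = grow_capacity(self.length + 1)
            self._slots.extend([None] * (self.capacity - len(self._slots)))
        self._slots[self.length] = item
        self.length += 1
```

Appending one item at a time reallocates at capacities 4, 8, 16, 24, 32, 40, 52 and so on, matching the growth pattern of a CPython list.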
v0.7.3 - 2022-04-06
Added
Gene.score property to get the gene score as reported in the score data string.
Fixed
OrfFinder.find_genes not producing consistent results across runs in meta mode (#13).
OrfFinder.find_genes returning Nodes with incomplete score information.
v0.7.2 - 2022-03-15
Changed
Improve performance of mer_ndx and score_connection using dedicated implementations with better branch prediction.
Mark arguments as const in C code where possible.
Fixed
Signatures of Cython classes not displaying properly because of the embedsignature directive.
_sequence.h functions not being inlined as expected.
v0.7.1 - 2022-03-14
Changed
Rewrite internal Sequence code using inlined functions to increase performance when the strand is known.
Fixed
Nodes.copy potentially failing on empty collections after trying to allocate 0 bytes.
TestGenes.test_write_scores failing on some machines because of float rounding issues.
Gene.translate ignoring the unknown_residue argument value and always using "X".
Memory leak in Pyrodigal.train caused by memory not being freed after building the GC frame plot.
v0.7.0 - 2022-03-12
Added
Support for setting a custom minimum gene length in pyrodigal.OrfFinder.
Genes.write_scores method to write the node scores to a file.
Gene.__repr__ and Node.__repr__ methods to display some useful attributes.
Sequence.__str__ method to get back a nucleotide string from a Sequence object.
Changed
Use a more compact data structure to store Gene data.
Fixed
Nodes._calc_orf_gc reading nucleotides after the sequence end when computing GC content for edge nodes.
Removed
pyrodigal.Pyrodigal class (use pyrodigal.OrfFinder instead).
pyrodigal.Predictions class (functionality merged into pyrodigal.Genes).
v0.6.4 - 2021-12-23
Added
load and dump methods to TrainingInfo for storing and loading a raw training info structure.
Support for creating an OrfFinder pre-configured with a training info.
-t and -n flags to the CLI.
v0.6.3 - 2021-12-23
Added
pyrodigal command line script exposing a CLI mimicking the original prodigal binary.
write_gff, write_genes and write_translations methods to pyrodigal.Predictions to write the prediction results to a file in different formats.
Implementation for masking regions of unknown nucleotides in input sequences.
Changed
Renamed pyrodigal.Pyrodigal class to pyrodigal.OrfFinder.
Fixed
setup.py building different SIMD implementations with the same set of feature flags, causing compilers to re-optimize the SIMD implementations.
v0.6.2 - 2021-09-25
Added
Sphinx documentation with small install guide and API reference.
Fixed
setup.py not detecting SSE2 and AVX2 build support because of a linker error.
Changed
Build OSX extension without AVX2 support since runtime detection of AVX2 to avoid the Illegal Instruction: 4 bug on older CPUs.
v0.6.1 - 2021-09-24
Fixed
Source distribution lacking C files necessary for building cpu_features.
v0.6.0 - 2021-09-23
Added
SIMD code to build an index of which connections can be skipped when scoring node connections in the dynamic programming routine (#6).
v0.5.4 - 2021-09-18
Added
Prediction.confidence method to compute the confidence for a prediction as reported in Prodigal’s GFF output.
Prediction.sequence method to get the nucleotide sequence of a predicted gene (#4).
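Prodigal reports a per-gene confidence in its GFF output. As we recall its calculate_confidence routine, the value is a logistic transform of the gene's total score (coding plus start score) divided by the training info's start weight. The result is capped at 99.99 and floored at 50. The sketch below follows that recollection. Treat the exact thresholds as assumptions and check them against the Prodigal sources before relying on them.

```python
import math


def calculate_confidence(score: float, start_weight: float) -> float:
    """Confidence (in %) of a gene call, modelled on our reading of Prodigal."""
    ratio = score / start_weight
    if ratio < 41:
        odds = math.exp(ratio)
        confidence = odds / (odds + 1) * 100.0
    else:
        # Very large ratios are capped at 99.99.
        confidence = 99.99
    # Confidence never drops below 50%.
    return max(confidence, 50.0)
```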
Changed
Replaced internal storage of input sequences to use a byte array instead of a bitmap.
Fixed
Extract Prediction.gc_cont number directly from the start node instead of the text representation to get full accuracy.
Prodigal bug causing nodes on the reverse strand to always receive a penalty instead of penalizing only small ORFs (hyattpd/Prodigal#88).
v0.5.3 - 2021-09-12
Fixed
Prediction.translate not translating the last unknown codon properly for genes on the direct strand.
v0.5.2 - 2021-09-11
Changed
Make Pyrodigal.train return a reference to the newly created TrainingInfo for inspection if needed.
Reimplement add_nodes and add_genes to use a growable array instead of counting and pre-allocating the C arrays.
Fixed
Inconsistent handling of unknown nucleotides in input sequences and gene translations.
v0.5.1 - 2021-09-04
Added
Additional Gene properties to access the score.
Changed
Use more efficient PyUnicode macros when reading or creating a string containing a nucleotide or a protein sequence.
Release the GIL when creating a bitmap for a str given as input to Pyrodigal.find_genes.
Release the GIL when creating the protein sequence returned by Gene.translate.
Fixed
Pyrodigal.find_genes and Gene.translate not behaving like Prodigal when handling sequences with unknown nucleotides.
v0.5.0 - 2021-06-15
Added
pyrodigal.TrainingInfo class exposing variables obtained during training as an attribute to Pyrodigal, Gene and Genes instances.
Support for passing objects implementing the buffer protocol to Pyrodigal.find_genes and Pyrodigal.train instead of requiring str sequences.
Fixed
Potential data race on training info when a Gene.translate call with a non-default translation table ran at the same time as a Pyrodigal.find_genes call.
Spurious handling of Unicode strings causing potential issues on platforms using a different base encoding.
v0.4.7 - 2021-04-09
Fixed
Pyrodigal.find_genes segfaulting on some sequences when called in single mode (#2).
MemoryError potentially not being properly raised on allocation issues for sequence bitmaps.
v0.4.6 - 2021-03-05
Changed
Tests are now in the pyrodigal.tests module and can be run after a site install.
Fixed
Pyrodigal.find_genes stalling on sequences shorter than 3 nucleotides.
v0.4.5 - 2021-03-03
Fixed
Compilation of OSX and Windows wheels.
v0.4.4 - 2021-03-03
Fixed
Mark package as OS-independent.
Added
Support for Python 3.5.
Compilation of PyPy wheels on OSX.
v0.4.3 - 2021-03-01
Fixed
Buffer overflow when running in meta mode on a sequence too small to have any dynamic programming nodes.
v0.4.2 - 2021-02-07
Fixed
Buffer overflow coming from the node array, caused by an incorrect estimation of the node count from the sequence length.
v0.4.1 - 2021-01-07
Removed
Python 3.5 from the project metadata (the code was only compatible with Python 3.6+ already because of f-strings).
Fixed
Broken linking of static libprodigal against the _pyrodigal extension on some OSX environments (bioconda/bioconda-recipes#25568).
v0.4.0 - 2021-01-06
Changed
trans_table keyword argument to Pyrodigal.train has been renamed to translation_table.
Added
Option to change the translation table to any allowed number in Gene.translate (#1).
v0.3.2 - 2020-11-27
Fixed
Broken compilation of PyPy wheels in Travis-CI.
v0.3.1 - 2020-11-27
Added
Link to Zenodo record in README.md.
Typing :: Typed classifier to the PyPI metadata.
Explicit support for Python 3.9.
Changed
Streamlined compilation process when building from source distribution.
v0.3.0 - 2020-09-07
Added
Thread-safety for all Pyrodigal methods.
Fixed
Reduced total amount of memory used to allocate dynamic programming nodes for a given sequence.
v0.2.4 - 2020-09-04
Added
Precompiled wheels for Windows x86-64 platform.
Changed
Compilation of large Prodigal/training.c file is now done in chunks and uses static const to reduce build time.
v0.2.3 - 2020-08-09
Fixed
Buffer overflow issue with Pyrodigal in closed=False mode.
v0.2.2 - 2020-07-14
Added
Access to the translation table of a Gene object.
v0.2.1 - 2020-05-29
Fixed
Memory issues causing PyPy to crash when using Pyrodigal in single mode.
v0.2.0 - 2020-05-28
Added
Support for Prodigal’s single mode.
v0.1.1 - 2020-04-30
Added
Distribution of CPython wheels for ManyLinux2010 and OSX platforms.
v0.1.0 - 2020-04-27
Initial release.