Changelog#
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
Unreleased#
v3.6.2 - 2024-11-03#
Fixed#
pyrodigal console script not being installed.
v3.6.1 - 2024-11-03#
Added#
Compilation of the connection scoring code for AVX-512.
Fixed#
Import issue on platforms without AVX2 runtime support.
Missing metadata in pyproject.toml.
v3.6.0 - 2024-11-02#
Added#
Support for Python 3.13.
Changed#
Reorganize project to build with CMake and scikit-build-core.
Build separate Python modules for various SIMD implementations to avoid potential linking issues.
Fixed#
Pointer dereference issue when calling TrainingInfo.load in PyPy or with objects missing a readinto method.
Removed#
Support for Python 3.6.
v3.5.2 - 2024-09-04#
Added#
Warning in CLI when given sequences with empty identifiers.
Fixed#
FASTA parser used in CLI crashing on empty header lines (#61).
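The two v3.5.2 entries above can be illustrated with a minimal sketch of a FASTA reader that tolerates empty header lines and warns on empty identifiers. This is not the parser shipped in pyrodigal.cli: the function name parse_fasta and the warning message are hypothetical and only show the general idea.

```python
import io
import warnings


def parse_fasta(handle):
    """Yield (identifier, sequence) pairs from a text handle in FASTA format.

    Hypothetical helper, not the parser used by the Pyrodigal CLI.
    """
    identifier, chunks = None, []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if identifier is not None:
                yield identifier, "".join(chunks)
            # A bare ">" has no fields: split() returns an empty list, so
            # indexing it without a check would raise IndexError.
            fields = line[1:].split()
            identifier = fields[0] if fields else ""
            if not identifier:
                warnings.warn("sequence with empty identifier", stacklevel=2)
            chunks = []
        elif identifier is not None:
            chunks.append(line.strip())
    if identifier is not None:
        yield identifier, "".join(chunks)


# Capture warnings so the example can inspect them instead of printing them.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    records = list(parse_fasta(io.StringIO(">seq1 desc\nATGC\nGGTA\n>\nTTAA\n")))
```

The record with the empty header is still yielded, with an empty string as its identifier, and one warning is recorded for it.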
v3.5.1 - 2024-07-17#
Fixed#
Outdated code in pyrodigal.cli breaking the CLI.
v3.5.0 - 2024-07-17 - YANKED#
Added#
Changed#
Migrate documentation to pydata-sphinx-theme.
Fixed#
Cython warnings with unused except * statements in MetagenomicBins.
Signatures of __init__ methods missing from all Cython types after the v3.0 update.
Small typos in documentation.
v3.4.1 - 2024-05-23#
Changed#
Refactor SIMD code to reduce number of required registers, and improve SSE2 performance.
Refactor Prodigal initialization functions into sparse initializer code to reduce library size.
v3.4.0 - 2024-05-19#
Added#
strict argument to Gene.translate to control translation of ambiguous codons with unambiguous translation (#54).
strict_translation argument to Genes.write_genbank and Genes.write_translation.
Support for translation tables 26 to 33 in Gene.translate.
Support for translation tables 26, 29, 30, 32 and 33 in GeneFinder.train.
Genes.score property to count the total score of all extracted genes.
full_id parameter to Genes.write_gff, Genes.write_translation and Genes.write_genes to control the ID field written for each gene (#53).
Changed#
Gene.translate now raises a warning when called with a translation table incompatible with the training info.
Fixed#
Bug in code for masking trailing nucleotides (#55).
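As an illustration of what "ambiguous codons with unambiguous translation" means in the v3.4.0 entries above, here is a small sketch using the standard genetic code. It assumes that strict mode always reports an ambiguous codon as unknown, while non-strict mode resolves it when every expansion gives the same residue. That reading, the translate_codon helper, and its defaults are assumptions for illustration and not the Gene.translate implementation.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (translation table 1), codons ordered TTT, TTC, TTA, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): residue
    for codon, residue in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_codon(codon, strict=True, unknown="X"):
    """Translate one codon where 'N' stands for any nucleotide.

    Hypothetical helper: only 'N' is treated as an ambiguity symbol.
    """
    if codon in CODON_TABLE:
        return CODON_TABLE[codon]
    if strict:
        return unknown
    # Expand every 'N' to the four bases and collect the possible residues.
    options = [BASES if base == "N" else base for base in codon]
    residues = {
        CODON_TABLE.get("".join(expanded), unknown)
        for expanded in product(*options)
    }
    return residues.pop() if len(residues) == 1 else unknown
```

With this sketch, GGN resolves to glycine in non-strict mode because all four GGx codons encode it, whereas ATN stays unknown because ATG encodes methionine and the other three encode isoleucine.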
v3.3.0 - 2024-01-24#
Added#
Changed#
Scorer internal API to separate connection scoring and overlap disentangling.
Fixed#
Bug with computation of minimum node in connection scoring loop (hyattpd/Prodigal#108).
Out-of-bounds sequence access in _shine_dalgarno_exact and _shine_dalgarno_mm methods of Sequence.
Memory leak in Nodes.__setstate__ caused by incorrect reallocation.
v3.2.2 - 2024-01-21#
Fixed#
Always mark SSE2 support on x86-64 CPUs independently of archspec-detected features (#49).
v3.2.1 - 2023-11-27#
Added#
Option to change argument parser in pyrodigal.cli.main.
v3.2.0 - 2023-11-27#
Added#
AVX-512 implementation of the SIMD pre-filter.
Additional support for reading lz4, xz and zstd-compressed input in the CLI.
Option to change gene finder type in pyrodigal.cli.main.
v3.1.1 - 2023-11-06#
Fixed#
Incorrect unpickling of GeneFinder causing crashes with multiprocessing (#46).
v3.1.0 - 2023-10-22#
Added#
Support for Python 3.12.
min_mask argument to GeneFinder to control the minimum length of masked regions when mask=True.
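The effect of a minimum mask length can be sketched in plain Python: only runs of unknown nucleotides at least min_mask long are reported as masked regions. The find_masks helper and its default value are hypothetical and do not reproduce the Cython implementation.

```python
import re


def find_masks(sequence, min_mask=50):
    """Return half-open (begin, end) intervals of 'N' runs of at least min_mask.

    Hypothetical helper; the default of 50 is an arbitrary choice for this sketch.
    """
    pattern = re.compile(r"[Nn]{%d,}" % min_mask)
    return [match.span() for match in pattern.finditer(sequence)]


# The run of five N is kept, the run of two N is too short to be masked.
masks = find_masks("ATGNNNNNCCGNNTTA", min_mask=3)
```

Lowering min_mask to 2 in this example would also report the shorter run.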
v3.0.1 - 2023-09-27#
Fixed#
Genes.write_scores and Genes.write_gff crashing on empty Genes (#44).
v3.0.0 - 2023-09-17#
Added#
MetagenomicBins collection to store a dense array of MetagenomicBin objects.
metagenomic_bins keyword argument to GeneFinder to control which models are used when running gene finding in meta mode (#24).
metagenomic_bin attribute to Genes referencing the metagenomic model with which the genes were predicted, if in meta mode.
Additional TrainingInfo properties (missing_motif_weight, coding_statistics).
Setters for all remaining TrainingInfo properties.
Proper TrainingInfo constructor with configuration option for all attributes.
TrainingInfo.to_dict method to extract all parameters from a TrainingInfo.
Genes.write_genbank method to write a GenBank record with all predicted genes from a sequence.
include_stop flag to Gene.translate and Genes.write_translations to allow excluding the stop codon from the translated sequence.
include_translation_table flag to Genes.write_gff to include the translation table in the GFF attributes of each gene.
gbk output format to the Pyrodigal CLI.
Sequence.unknown property exposing the number of unknown nucleotides in the sequence.
Sequence.start_probability and Sequence.stop_probability to estimate the probability of encountering a start and a stop codon based on the GC%.
Fixed#
Genes.write_gff not properly reporting the number of bytes written.
Merge several nogil sections in Sequence constructor.
Several Cython functions missing a noexcept qualifier.
Changed#
BREAKING: Rename OrfFinder to GeneFinder for consistency.
BREAKING: Use memoryview to expose all TrainingInfo attributes instead of manually building lists or tuples.
Reorganize memory management of the built-in metagenomic models.
Make the internal Cython module public (pyrodigal.lib) to allow importing the underlying classes in other Cython projects.
Use typing.Literal for allowed translation table values in pyrodigal.lib annotations.
Cache intermediate log-odds in Nodes._raw_coding_score to reduce calls to pow and log functions.
Inline connection scoring functions to reduce function call overhead.
Reorganize struct _node fields to reduce size in memory.
Make GeneFinder.find_genes and GeneFinder.train reserve memory for the Nodes based on the GC% of the input sequence.
Avoid storing temporary results in the generic implementation of ConnectionScorer.compute_skippable.
Use Cython freelist for allocating Node, Gene, MetagenomicBin and Mask.
Increase minimum allocation for Genes and Nodes to reduce early reallocations.
Removed#
BREAKING: metagenomic_bin attribute of TrainingInfo.
v2.3.0 - 2023-07-20#
Changed#
Bump Cython to v3.0.0.
v2.2.0 - 2023-06-19#
Changed#
Release GIL while masking sequence regions in Sequence.__init__.
Use archspec instead of cpu_features for runtime feature detection.
Added#
Support for reading gzip and bz2-compressed input in the CLI.
CLI flag to run ORF detection in parallel when input contains several contigs.
Removed#
Support for Python 3.5.
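One common way to accept gzip or bz2-compressed input transparently is to peek at the leading magic bytes of the stream. The sketch below shows that technique with the standard library only; it is an assumption about how such support can be built, not a description of the Pyrodigal CLI, and open_maybe_compressed is a hypothetical name.

```python
import bz2
import gzip
import io

# Leading "magic" bytes of each format, mapped to the matching opener.
MAGIC = {
    b"\x1f\x8b": gzip.open,
    b"BZh": bz2.open,
}


def open_maybe_compressed(raw):
    """Return a text handle over a binary handle, decompressing if needed."""
    buffered = io.BufferedReader(raw)
    head = buffered.peek(4)  # look ahead without consuming the stream
    for magic, opener in MAGIC.items():
        if head.startswith(magic):
            return opener(buffered, mode="rt")
    return io.TextIOWrapper(buffered)


payload = b">seq1\nATGC\n"
from_gzip = open_maybe_compressed(io.BytesIO(gzip.compress(payload))).read()
from_bz2 = open_maybe_compressed(io.BytesIO(bz2.compress(payload))).read()
from_plain = open_maybe_compressed(io.BytesIO(payload)).read()
```

All three handles yield the same text, so downstream parsing code does not need to know how the input was stored.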
v2.1.0 - 2023-02-20#
Changed#
Update Prodigal to v2.6.3+c1e2d36 to fix a bug with Shine-Dalgarno detection on reverse contig edge (hyattpd/Prodigal#100).
Added#
Fixed#
ArchLinux User Repository package generation in CI.
v2.0.4 - 2023-01-09#
Fixed#
GC% computation and RBS scoring for reverse strand nodes close to the contig edge (#27).
v2.0.3 - 2022-12-20#
Fixed#
OrfFinder(mask=True) ignoring the minimum mask size when masking regions (#26).
Changed#
Use cibuildwheel for building wheel distributions.
Added#
Wheels for MacOS Aarch64 platforms.
v2.0.2 - 2022-11-01#
Fixed#
Syntax issue in Cython files failing build on Bioconda runner.
v2.0.1 - 2022-11-01#
Fixed#
Syntax issue in Cython files failing build on some environments.
v2.0.0 - 2022-11-01#
Added#
MMX implementation of the SIMD prefilter.
Proper GFF headers and metadata section to GFF output.
Sequence.gc_frame_plot method to compute the max GC frame profile from Python.
metagenomic_bin property to TrainingInfo to support recovering the object corresponding to a pre-trained model.
meta attribute to Genes to store whether genes were predicted in single or in meta mode.
pyrodigal.PRODIGAL_VERSION constant storing the wrapped Prodigal version.
pyrodigal.MIN_SINGLE_GENOME and pyrodigal.IDEAL_SINGLE_GENOME constants storing the minimum and recommended sequence sizes for training.
Changed#
Make all write methods of Genes objects require a sequence_id argument instead of using the internal sequence number.
Rewrite SIMD prefilter using a generic template with C macros.
Make Mask record coordinates in start-inclusive end-exclusive mode to follow Python conventions.
Make connection scoring tests only score some randomly selected node pairs for faster runs.
Rewrite tests to use importlib.resources for managing test data.
Removed#
from_bytes and from_string constructors of Sequence objects.
Fixed#
Duplicate extraction of start codons located on contig edges inside Nodes._extract (#21).
Pickling and unpickling of TrainingInfo objects corresponding to pre-trained models.
Implementation of calc_most_gc_frame being inconsistent with the Prodigal implementation.
Implementation of the maximum search in score_connection_forward_start not following the (weird?) behaviour from Prodigal (#21).
Gene identifier being used instead of the sequence identifier in the GFF output (#18).
Out of bound access to sequence data in Sequence._shine_dalgarno_mm and Sequence._shine_dalgarno_exact.
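The start-inclusive, end-exclusive convention adopted for Mask coordinates in v2.0.0 is the one used by Python slicing. A short plain-Python illustration, using made-up coordinates rather than actual Mask objects:

```python
sequence = "ATGNNNNCCG"

# Start-inclusive, end-exclusive: the run of N occupies indices 3, 4, 5 and 6.
begin, end = 3, 7

masked = sequence[begin:end]  # slicing follows the same convention
length = end - begin          # no off-by-one correction is needed

# Adjacent half-open intervals tile the sequence without overlap or gap.
rebuilt = sequence[:begin] + sequence[begin:end] + sequence[end:]
```

With these coordinates the masked slice contains exactly the four unknown nucleotides, and the three pieces concatenate back to the original sequence.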
v1.1.2 - 2022-08-31#
Changed#
Use the vbicq Arm intrinsic in the NEON implementation to combine vandq and vmvnq.
Fixed#
Prevent direct instantiation of Node and Gene objects from Python code.
Configuration of platform-specific NEON flags in setup.py not being applied to the linker.
v1.1.1 - 2022-07-08#
Fixed#
Some cpu_features source files not being included in source distribution.
v1.1.0 - 2022-06-09#
Changed#
OrfFinder.train can now be given more than one sequence argument to train on contigs from an unclosed genome.
Updated cpu_features to v0.7.0 and added hardware detection of NEON features on Linux Aarch64 platforms.
v1.0.2 - 2022-05-13#
Fixed#
Detection of Arm64 platform in setup.py (#16).
v1.0.1 - 2022-04-28#
Changed#
pyrodigal.cli now concatenates training sequences the same way as Prodigal does.
v1.0.0 - 2022-04-20#
Stable version, to be published in the Journal of Open-Source Software.
Added#
pickle protocol implementation for Nodes, TrainingInfo, OrfFinder, Sequence, Masks and Genes objects.
Buffer protocol implementation for Sequence, allowing access to raw digits.
__eq__ and __repr__ magic methods to Mask objects.
Changed#
Optimized code used for region masking to avoid searching for the same mask repeatedly.
TRANSLATION_TABLES and METAGENOMIC_BINS are now exposed as constants in the top pyrodigal module.
Refactored connection scoring into different functions based on the type (start/stop) and strand (direct/reverse) of the node being scored.
Changed the growth factor for dynamic arrays to be the same as the one used in CPython list buffers.
v0.7.3 - 2022-04-06#
Added#
Gene.score property to get the gene score as reported in the score data string.
Fixed#
OrfFinder.find_genes not producing consistent results across runs in meta mode (#13).
OrfFinder.find_genes returning Nodes with incomplete score information.
v0.7.2 - 2022-03-15#
Changed#
Improve performance of mer_ndx and score_connection using dedicated implementations with better branch prediction.
Mark arguments as const in C code where possible.
Fixed#
Signatures of Cython classes not displaying properly because of the embedsignature directive.
_sequence.h functions not being inlined as expected.
v0.7.1 - 2022-03-14#
Changed#
Rewrite internal Sequence code using inlined functions to increase performance when the strand is known.
Fixed#
Nodes.copy potentially failing on empty collections after trying to allocate 0 bytes.
TestGenes.test_write_scores failing on some machines because of float rounding issues.
Gene.translate ignoring the unknown_residue argument value and always using "X".
Memory leak in Pyrodigal.train caused by memory not being freed after building the GC frame plot.
v0.7.0 - 2022-03-12#
Added#
Support for setting a custom minimum gene length in pyrodigal.OrfFinder.
Genes.write_scores method to write the node scores to a file.
Gene.__repr__ and Node.__repr__ methods to display some useful attributes.
Sequence.__str__ method to get back a nucleotide string from a Sequence object.
Changed#
Use a more compact data structure to store Gene data.
Fixed#
Nodes._calc_orf_gc reading nucleotides after the sequence end when computing GC content for edge nodes.
Removed#
pyrodigal.Pyrodigal class (use pyrodigal.OrfFinder instead).
pyrodigal.Predictions class (functionality merged into pyrodigal.Genes).
v0.6.4 - 2021-12-23#
Added#
load and dump methods to TrainingInfo for storing and loading a raw training info structure.
Support for creating an OrfFinder pre-configured with a training info.
-t and -n flags to the CLI.
v0.6.3 - 2021-12-23#
Added#
pyrodigal command line script exposing a CLI mimicking the original prodigal binary.
write_gff, write_genes and write_translations methods to pyrodigal.Predictions to write the predictions results to a file in different formats.
Implementation for masking regions of unknown nucleotides in input sequences.
Changed#
Renamed pyrodigal.Pyrodigal class to pyrodigal.OrfFinder.
Fixed#
setup.py building the different SIMD implementations with the same set of feature flags, causing compilers to re-optimize the SIMD implementations.
v0.6.2 - 2021-09-25#
Added#
Sphinx documentation with small install guide and API reference.
Fixed#
setup.py not detecting SSE2 and AVX2 build support because of a linker error.
Changed#
Build OSX extension without AVX2 support since runtime detection of AVX2 to avoid the Illegal Instruction: 4 bug on older CPUs.
v0.6.1 - 2021-09-24#
Fixed#
Source distribution lacking C files necessary for building cpu_features.
v0.6.0 - 2021-09-23#
Added#
SIMD code to build an index of which connections can be skipped when scoring node connections in the dynamic programming routine (#6).
v0.5.4 - 2021-09-18#
Added#
Prediction.confidence method to compute the confidence for a prediction as reported in Prodigal’s GFF output.
Prediction.sequence method to get the nucleotide sequence of a predicted gene (#4).
Changed#
Replaced internal storage of input sequences to use a byte array instead of a bitmap.
Fixed#
Extract Prediction.gc_cont number directly from the start node instead of the text representation to get full accuracy.
Prodigal bug causing nodes on the reverse strand to always receive a penalty instead of penalizing only small ORFs (hyattpd/Prodigal#88).
v0.5.3 - 2021-09-12#
Fixed#
Prediction.translate not translating the last unknown codon properly for genes on the direct strand.
v0.5.2 - 2021-09-11#
Changed#
Make Pyrodigal.train return a reference to the newly created TrainingInfo for inspection if needed.
Reimplement add_nodes and add_genes to use a growable array instead of counting and pre-allocating the C arrays.
Fixed#
Inconsistent handling of unknown nucleotides in input sequences and gene translations.
v0.5.1 - 2021-09-04#
Added#
Additional Gene properties to access the score
Changed#
Use more efficient PyUnicode macros when reading or creating a string containing a nucleotide or a protein sequence.
Release the GIL when creating a bitmap for a str given as input to Pyrodigal.find_genes.
Release the GIL when creating the protein sequence returned by Gene.translate.
Fixed#
Pyrodigal.find_genes and Gene.translate not behaving like Prodigal when handling sequences with unknown nucleotides.
v0.5.0 - 2021-06-15#
Added#
pyrodigal.TrainingInfo class exposing variables obtained during training as an attribute to Pyrodigal, Gene and Genes instances.
Support for passing objects implementing the buffer protocol to Pyrodigal.find_genes and Pyrodigal.train instead of requiring str sequences.
Fixed#
Potential data race on training info when Gene.translate was called with a non-default translation table at the same time as a Pyrodigal.find_genes call.
Spurious handling of Unicode strings causing potential issues on platforms using a different base encoding.
v0.4.7 - 2021-04-09#
Fixed#
Pyrodigal.find_genes segfaulting on some sequences when called in single mode (#2).
MemoryError potentially not being properly raised on allocation issues for sequence bitmaps.
v0.4.6 - 2021-03-05#
Changed#
Tests are now in the pyrodigal.tests module and can be run after a site install.
Fixed#
Pyrodigal.find_genes stalling on sequences shorter than 3 nucleotides.
v0.4.5 - 2021-03-03#
Fixed#
Compilation of OSX and Windows wheels.
v0.4.4 - 2021-03-03#
Fixed#
Mark package as OS-independent.
Added#
Support for Python 3.5.
Compilation of PyPy wheels on OSX.
v0.4.3 - 2021-03-01#
Fixed#
Buffer overflow when running in meta mode on a sequence too small to have any dynamic programming nodes.
v0.4.2 - 2021-02-07#
Fixed#
Buffer overflow coming from the node array, caused by an incorrect estimation of the node count from the sequence length.
v0.4.1 - 2021-01-07#
Removed#
Python 3.5 from the project metadata (the code was only compatible with Python 3.6+ already because of f-strings).
Fixed#
Broken linking of static libprodigal against the _pyrodigal extension on some OSX environments (bioconda/bioconda-recipes#25568).
v0.4.0 - 2021-01-06#
Changed#
trans_table keyword argument to Pyrodigal.train has been renamed to translation_table.
Added#
Option to change the translation table to any allowed number in Gene.translate (#1).
v0.3.2 - 2020-11-27#
Fixed#
Broken compilation of PyPy wheels in Travis-CI.
v0.3.1 - 2020-11-27#
Added#
Link to Zenodo record in README.md.
Typing :: Typed classifier to the PyPI metadata.
Explicit support for Python 3.9.
Changed#
Streamlined compilation process when building from source distribution.
v0.3.0 - 2020-09-07#
Added#
Thread-safety for all Pyrodigal methods.
Fixed#
Reduced total amount of memory used to allocate dynamic programming nodes for a given sequence.
v0.2.4 - 2020-09-04#
Added#
Precompiled wheels for Windows x86-64 platform.
Changed#
Compilation of large Prodigal/training.c file is now done in chunks and uses static const to reduce build time.
v0.2.3 - 2020-08-09#
Fixed#
Buffer overflow issue with Pyrodigal in closed=False mode.
v0.2.2 - 2020-07-14#
Added#
Access to the translation table of a Gene object.
v0.2.1 - 2020-05-29#
Fixed#
Memory issues causing PyPy to crash when using Pyrodigal in single mode.
v0.2.0 - 2020-05-28#
Added#
Support for Prodigal’s single mode.
v0.1.1 - 2020-04-30#
Added#
Distribution of CPython wheels for ManyLinux2010 and OSX platforms.
v0.1.0 - 2020-04-27#
Initial release.