Changelog¶
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
Unreleased¶
v3.4.0 - 2024-05-19¶
Added¶
strict argument to Gene.translate to control translation of ambiguous codons with unambiguous translation (#54).
strict_translation argument to Genes.write_genbank and Genes.write_translation.
Support for translation tables 26 to 33 in Gene.translate.
Support for translation tables 26, 29, 30, 32 and 33 in GeneFinder.train.
Genes.score property to count the total score of all extracted genes.
full_id parameter to Genes.write_gff, Genes.write_translation and Genes.write_genes to control the ID field written for each gene (#53).
Changed¶
Gene.translate now raises a warning when called with a translation table incompatible with the training info.
Fixed¶
Bug in code for masking trailing nucleotides (#55).
v3.3.0 - 2024-01-24¶
Added¶
Changed¶
Scorer internal API to separate connection scoring and overlap disentangling.
Fixed¶
Bug with computation of minimum node in connection scoring loop (hyattpd/Prodigal#108).
Out-of-bounds sequence access in _shine_dalgarno_exact and _shine_dalgarno_mm methods of Sequence.
Memory leak in Nodes.__setstate__ caused by incorrect reallocation.
v3.2.2 - 2024-01-21¶
Fixed¶
Always mark SSE2 support on x86-64 CPUs independently of archspec-detected features (#49).
v3.2.1 - 2023-11-27¶
Added¶
Option to change argument parser in pyrodigal.cli.main.
v3.2.0 - 2023-11-27¶
Added¶
AVX-512 implementation of the SIMD pre-filter.
Additional support for reading lz4, xz and zstd-compressed input in the CLI.
Option to change gene finder type in pyrodigal.cli.main.
v3.1.1 - 2023-11-06¶
Fixed¶
Incorrect unpickling of GeneFinder causing crashes with multiprocessing (#46).
v3.1.0 - 2023-10-22¶
Added¶
Support for Python 3.12.
min_mask argument to GeneFinder to control the minimum length of masked regions when mask=True.
v3.0.1 - 2023-09-27¶
Fixed¶
Genes.write_scores and Genes.write_gff crashing on empty Genes (#44).
v3.0.0 - 2023-09-17¶
Added¶
MetagenomicBins collection to store a dense array of MetagenomicBin objects.
metagenomic_bins keyword argument to GeneFinder to control which models are used when running gene finding in meta mode (#24).
metagenomic_bin attribute to Genes referencing the metagenomic model with which the genes were predicted, if in meta mode.
Additional TrainingInfo properties (missing_motif_weight, coding_statistics).
Setters for all remaining TrainingInfo properties.
Proper TrainingInfo constructor with configuration option for all attributes.
TrainingInfo.to_dict method to extract all parameters from a TrainingInfo.
Genes.write_genbank method to write a GenBank record with all predicted genes from a sequence.
include_stop flag to Gene.translate and Genes.write_translations to allow excluding the stop codon from the translated sequence.
include_translation_table flag to Genes.write_gff to include the translation table in the GFF attributes of each gene.
gbk output format to the Pyrodigal CLI.
Sequence.unknown property exposing the number of unknown nucleotides in the sequence.
Sequence.start_probability and Sequence.stop_probability to estimate the probability of encountering a start and a stop codon based on the GC%.
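A GC%-based estimate like the one offered by Sequence.start_probability and Sequence.stop_probability can be sketched as follows. This is an illustration only, not the library's code: it assumes independent bases, an even G/C and A/T split, and the start (ATG, GTG, TTG) and stop (TAA, TAG, TGA) codons of the standard bacterial table. The function names are hypothetical.

```python
def codon_probability(codon, gc):
    """Probability of a codon when bases are drawn independently.

    Assumption: G and C each have probability gc / 2, A and T each
    have probability (1 - gc) / 2.
    """
    base_prob = {
        "G": gc / 2, "C": gc / 2,
        "A": (1 - gc) / 2, "T": (1 - gc) / 2,
    }
    prob = 1.0
    for base in codon:
        prob *= base_prob[base]
    return prob


def start_probability(gc, starts=("ATG", "GTG", "TTG")):
    """Chance that a random codon is one of the given start codons."""
    return sum(codon_probability(codon, gc) for codon in starts)


def stop_probability(gc, stops=("TAA", "TAG", "TGA")):
    """Chance that a random codon is one of the given stop codons."""
    return sum(codon_probability(codon, gc) for codon in stops)
```

Under these assumptions, stop codons become rarer as GC% rises because all three are AT-rich.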
Fixed¶
Genes.write_gff not properly reporting the number of bytes written.
Merge several nogil sections in Sequence constructor.
Several Cython functions missing a noexcept qualifier.
Changed¶
BREAKING: Rename OrfFinder to GeneFinder for consistency.
BREAKING: Use memoryview to expose all TrainingInfo attributes instead of manually building lists or tuples.
Reorganize memory management of the built-in metagenomic models.
Make the internal Cython module public (pyrodigal.lib) to allow importing the underlying classes in other Cython projects.
Use typing.Literal for allowed translation table values in pyrodigal.lib annotations.
Cache intermediate log-odds in Nodes._raw_coding_score to reduce calls to pow and log functions.
Inline connection scoring functions to reduce function call overhead.
Reorganize struct _node fields to reduce size in memory.
Make GeneFinder.find_genes and GeneFinder.train reserve memory for the Nodes based on the GC% of the input sequence.
Avoid storing temporary results in the generic implementation of ConnectionScorer.compute_skippable.
Use Cython freelist for allocating Node, Gene, MetagenomicBin and Mask.
Increase minimum allocation for Genes and Nodes to reduce early reallocations.
Removed¶
BREAKING: metagenomic_bin attribute of TrainingInfo.
v2.3.0 - 2023-07-20¶
Changed¶
Bump Cython to v3.0.0.
v2.2.0 - 2023-06-19¶
Changed¶
Release GIL while masking sequence regions in Sequence.__init__.
Use archspec instead of cpu_features for runtime feature detection.
Added¶
Support for reading gzip and bz2-compressed input in the CLI.
CLI flag to run ORF detection in parallel when input contains several contigs.
Removed¶
Support for Python 3.5.
v2.1.0 - 2023-02-20¶
Changed¶
Update Prodigal to v2.6.3+c1e2d36 to fix a bug with Shine-Dalgarno detection on reverse contig edge (hyattpd/Prodigal#100).
Added¶
Fixed¶
ArchLinux User Repository package generation in CI.
v2.0.4 - 2023-01-09¶
Fixed¶
GC% computation and RBS scoring for reverse strand nodes close to the contig edge (#27).
v2.0.3 - 2022-12-20¶
Fixed¶
OrfFinder(mask=True) ignoring the minimum mask size when masking regions (#26).
Changed¶
Use cibuildwheel for building wheel distributions.
Added¶
Wheels for MacOS Aarch64 platforms.
v2.0.2 - 2022-11-01¶
Fixed¶
Syntax issue in Cython files failing build on Bioconda runner.
v2.0.1 - 2022-11-01¶
Fixed¶
Syntax issue in Cython files failing build on some environments.
v2.0.0 - 2022-11-01¶
Added¶
MMX implementation of the SIMD prefilter.
Proper GFF headers and metadata section to GFF output.
Sequence.gc_frame_plot method to compute the max GC frame profile from Python.
metagenomic_bin property to TrainingInfo to support recovering the object corresponding to a pre-trained model.
meta attribute to Genes to store whether genes were predicted in single or in meta mode.
pyrodigal.PRODIGAL_VERSION constant storing the wrapped Prodigal version.
pyrodigal.MIN_SINGLE_GENOME and pyrodigal.IDEAL_SINGLE_GENOME constants storing the minimum and recommended sequence sizes for training.
Changed¶
Make all write methods of Genes objects require a sequence_id argument instead of using the internal sequence number.
Rewrite SIMD prefilter using a generic template with C macros.
Make Mask record coordinates in start-inclusive end-exclusive mode to follow Python conventions.
Make connection scoring tests only score some randomly selected node pairs for faster runs.
Rewrite tests to use importlib.resources for managing test data.
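The start-inclusive, end-exclusive convention adopted for Mask coordinates matches Python slicing. A minimal sketch, assuming masked regions are runs of N in a plain string (the find_masks helper and its min_mask parameter are hypothetical, not part of the library):

```python
import re


def find_masks(sequence, min_mask=1):
    """Return (begin, end) pairs for runs of at least `min_mask` unknown bases.

    Coordinates are start-inclusive and end-exclusive, so that
    sequence[begin:end] is exactly the masked region.
    """
    pattern = re.compile(r"N{%d,}" % min_mask, re.IGNORECASE)
    return [(m.start(), m.end()) for m in pattern.finditer(sequence)]


sequence = "ACGNNNNTTN"
masks = find_masks(sequence, min_mask=2)
# Each pair can be used directly as slice bounds.
regions = [sequence[begin:end] for begin, end in masks]
```

With half-open intervals the length of a region is simply end - begin, with no off-by-one adjustment.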
Removed¶
from_bytes and from_string constructors of Sequence objects.
Fixed¶
Duplicate extraction of start codons located on contig edges inside Nodes._extract (#21).
Pickling and unpickling of TrainingInfo objects corresponding to pre-trained models.
Implementation of calc_most_gc_frame being inconsistent with the Prodigal implementation.
Implementation of the maximum search in score_connection_forward_start not following the (weird?) behaviour from Prodigal (#21).
Gene identifier being used instead of the sequence identifier in the GFF output (#18).
Out of bound access to sequence data in Sequence._shine_dalgarno_mm and Sequence._shine_dalgarno_exact.
v1.1.2 - 2022-08-31¶
Changed¶
Use the vbicq Arm intrinsic in the NEON implementation to combine vandq and vmvnq.
Fixed¶
Prevent direct instantiation of Node and Gene objects from Python code.
Configuration of platform-specific NEON flags in setup.py not being applied to the linker.
v1.1.1 - 2022-07-08¶
Fixed¶
Some cpu_features source files not being included in source distribution.
v1.1.0 - 2022-06-09¶
Changed¶
OrfFinder.train can now be given more than one sequence argument to train on contigs from an unclosed genome.
Updated cpu_features to v0.7.0 and added hardware detection of NEON features on Linux Aarch64 platforms.
v1.0.2 - 2022-05-13¶
Fixed¶
Detection of Arm64 platform in setup.py (#16).
v1.0.1 - 2022-04-28¶
Changed¶
pyrodigal.cli now concatenates training sequences the same way as Prodigal does.
v1.0.0 - 2022-04-20¶
Stable version, to be published in the Journal of Open-Source Software.
Added¶
pickle protocol implementation for Nodes, TrainingInfo, OrfFinder, Sequence, Masks and Genes objects.
Buffer protocol implementation for Sequence, allowing access to raw digits.
__eq__ and __repr__ magic methods to Mask objects.
Changed¶
Optimized code used for region masking to avoid searching for the same mask repeatedly.
TRANSLATION_TABLES and METAGENOMIC_BINS are now exposed as constants in the top pyrodigal module.
Refactored connection scoring into different functions based on the type (start/stop) and strand (direct/reverse) of the node being scored.
Changed the growth factor for dynamic arrays to be the same as the one used in CPython list buffers.
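For reference, the over-allocation rule applied by list_resize in recent CPython releases can be sketched as below. Whether Pyrodigal uses these exact constants is an assumption; the function names are illustrative only.

```python
def grown_capacity(newsize):
    """Capacity reserved for an array that must hold `newsize` items.

    Mirrors the rule in recent CPython `list_resize`: grow by about
    one eighth, add a small constant, round down to a multiple of 4.
    """
    return (newsize + (newsize >> 3) + 6) & ~3


def simulate_appends(count):
    """Capacities reached while appending `count` items one at a time."""
    capacity = 0
    history = []
    for size in range(1, count + 1):
        if size > capacity:
            # Reallocate only when the current buffer is full.
            capacity = grown_capacity(size)
            history.append(capacity)
    return history
```

The mild growth (roughly 12.5% per step) keeps appends amortized constant-time while wasting less memory than doubling.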
v0.7.3 - 2022-04-06¶
Added¶
Gene.score property to get the gene score as reported in the score data string.
Fixed¶
OrfFinder.find_genes not producing consistent results across runs in meta mode (#13).
OrfFinder.find_genes returning Nodes with incomplete score information.
v0.7.2 - 2022-03-15¶
Changed¶
Improve performance of mer_ndx and score_connection using dedicated implementations with better branch prediction.
Mark arguments as const in C code where possible.
Fixed¶
Signatures of Cython classes not displaying properly because of the embedsignature directive.
_sequence.h functions not being inlined as expected.
v0.7.1 - 2022-03-14¶
Changed¶
Rewrite internal Sequence code using inlined functions to increase performance when the strand is known.
Fixed¶
Nodes.copy potentially failing on empty collections after trying to allocate 0 bytes.
TestGenes.test_write_scores failing on some machines because of float rounding issues.
Gene.translate ignoring the unknown_residue argument value and always using "X".
Memory leak in Pyrodigal.train caused by memory not being freed after building the GC frame plot.
v0.7.0 - 2022-03-12¶
Added¶
Support for setting a custom minimum gene length in pyrodigal.OrfFinder.
Genes.write_scores method to write the node scores to a file.
Gene.__repr__ and Node.__repr__ methods to display some useful attributes.
Sequence.__str__ method to get back a nucleotide string from a Sequence object.
Changed¶
Use a more compact data structure to store Gene data.
Fixed¶
Nodes._calc_orf_gc reading nucleotides after the sequence end when computing GC content for edge nodes.
Removed¶
pyrodigal.Pyrodigal class (use pyrodigal.OrfFinder instead).
pyrodigal.Predictions class (functionality merged into pyrodigal.Genes).
v0.6.4 - 2021-12-23¶
Added¶
load and dump methods to TrainingInfo for storing and loading a raw training info structure.
Support for creating an OrfFinder pre-configured with a training info.
-t and -n flags to the CLI.
v0.6.3 - 2021-12-23¶
Added¶
pyrodigal command line script exposing a CLI mimicking the original prodigal binary.
write_gff, write_genes and write_translations methods to pyrodigal.Predictions to write the predictions results to a file in different formats.
Implementation for masking regions of unknown nucleotides in input sequences.
Changed¶
Renamed pyrodigal.Pyrodigal class to pyrodigal.OrfFinder.
Fixed¶
setup.py building different SIMD implementations with the same set of feature flags, causing compilers to re-optimize the SIMD implementations.
v0.6.2 - 2021-09-25¶
Added¶
Sphinx documentation with small install guide and API reference.
Fixed¶
setup.py not detecting SSE2 and AVX2 build support because of a linker error.
Changed¶
Build OSX extension without AVX2 support since runtime detection of AVX2 to avoid the Illegal Instruction: 4 bug on older CPUs.
v0.6.1 - 2021-09-24¶
Fixed¶
Source distribution lacking C files necessary for building cpu_features.
v0.6.0 - 2021-09-23¶
Added¶
SIMD code to build an index of which connections can be skipped when scoring node connections in the dynamic programming routine (#6).
v0.5.4 - 2021-09-18¶
Added¶
Prediction.confidence method to compute the confidence for a prediction as reported in Prodigal’s GFF output.
Prediction.sequence method to get the nucleotide sequence of a predicted gene (#4).
Changed¶
Replaced internal storage of input sequences to use a byte array instead of a bitmap.
Fixed¶
Extract Prediction.gc_cont number directly from the start node instead of the text representation to get full accuracy.
Prodigal bug causing nodes on the reverse strand to always receive a penalty instead of penalizing only small ORFs (hyattpd/Prodigal#88).
v0.5.3 - 2021-09-12¶
Fixed¶
Prediction.translate not translating the last unknown codon properly for genes on the direct strand.
v0.5.2 - 2021-09-11¶
Changed¶
Make Pyrodigal.train return a reference to the newly created TrainingInfo for inspection if needed.
Reimplement add_nodes and add_genes to use a growable array instead of counting and pre-allocating the C arrays.
Fixed¶
Inconsistent handling of unknown nucleotides in input sequences and gene translations.
v0.5.1 - 2021-09-04¶
Added¶
Additional Gene properties to access the score.
Changed¶
Use more efficient PyUnicode macros when reading or creating a string containing a nucleotide or a protein sequence.
Release the GIL when creating a bitmap for an str given as input to Pyrodigal.find_genes.
Release the GIL when creating the protein sequence returned by Gene.translate.
Fixed¶
Pyrodigal.find_genes and Gene.translate not behaving like Prodigal when handling sequences with unknown nucleotides.
v0.5.0 - 2021-06-15¶
Added¶
pyrodigal.TrainingInfo class exposing variables obtained during training as an attribute to Pyrodigal, Gene and Genes instances.
Support for passing objects implementing the buffer protocol to Pyrodigal.find_genes and Pyrodigal.train instead of requiring str sequences.
Fixed¶
Potential data race on training info when Gene.translate was called with a non-default translation table at the same time as a Pyrodigal.find_genes call.
Spurious handling of Unicode strings causing potential issues on platforms using a different base encoding.
v0.4.7 - 2021-04-09¶
Fixed¶
Pyrodigal.find_genes segfaulting on some sequences when called in single mode (#2).
MemoryError potentially not being properly raised on allocation issues for sequence bitmaps.
v0.4.6 - 2021-03-05¶
Changed¶
Tests are now in the pyrodigal.tests module and can be run after a site install.
Fixed¶
Pyrodigal.find_genes stalling on sequences shorter than 3 nucleotides.
v0.4.5 - 2021-03-03¶
Fixed¶
Compilation of OSX and Windows wheels.
v0.4.4 - 2021-03-03¶
Fixed¶
Mark package as OS-independent.
Added¶
Support for Python 3.5.
Compilation of PyPy wheels on OSX.
v0.4.3 - 2021-03-01¶
Fixed¶
Buffer overflow when running in meta mode on a sequence too small to have any dynamic programming nodes.
v0.4.2 - 2021-02-07¶
Fixed¶
Buffer overflow coming from the node array, caused by an incorrect estimation of the node count from the sequence length.
v0.4.1 - 2021-01-07¶
Removed¶
Python 3.5 from the project metadata (the code was only compatible with Python 3.6+ already because of f-strings).
Fixed¶
Broken linking of static libprodigal against the _pyrodigal extension on some OSX environments (bioconda/bioconda-recipes#25568).
v0.4.0 - 2021-01-06¶
Changed¶
trans_table keyword argument to Pyrodigal.train has been renamed to translation_table.
Added¶
Option to change the translation table to any allowed number in Gene.translate (#1).
v0.3.2 - 2020-11-27¶
Fixed¶
Broken compilation of PyPy wheels in Travis-CI.
v0.3.1 - 2020-11-27¶
Added¶
Link to Zenodo record in README.md.
Typing :: Typed classifier to the PyPI metadata.
Explicit support for Python 3.9.
Changed¶
Streamlined compilation process when building from source distribution.
v0.3.0 - 2020-09-07¶
Added¶
Thread-safety for all Pyrodigal methods.
Fixed¶
Reduced total amount of memory used to allocate dynamic programming nodes for a given sequence.
v0.2.4 - 2020-09-04¶
Added¶
Precompiled wheels for Windows x86-64 platform.
Changed¶
Compilation of large Prodigal/training.c file is now done in chunks and uses static const to reduce build time.
v0.2.3 - 2020-08-09¶
Fixed¶
Buffer overflow issue with Pyrodigal in closed=False mode.
v0.2.2 - 2020-07-14¶
Added¶
Access to the translation table of a Gene object.
v0.2.1 - 2020-05-29¶
Fixed¶
Memory issues causing PyPy to crash when using Pyrodigal in single mode.
v0.2.0 - 2020-05-28¶
Added¶
Support for Prodigal’s single mode.
v0.1.1 - 2020-04-30¶
Added¶
Distribution of CPython wheels for ManyLinux2010 and OSX platforms.
v0.1.0 - 2020-04-27¶
Initial release.