Pyrodigal |Stars|
=================
.. |Stars| image:: https://img.shields.io/github/stars/althonos/pyrodigal.svg?style=social&maxAge=3600&label=Star
:target: https://github.com/althonos/pyrodigal/stargazers
*Cython bindings and Python interface to* `Prodigal `_,
*an ORF finder for genomes and metagenomes*. **Now with SIMD!**
|Actions| |Coverage| |PyPI| |Bioconda| |AUR| |Wheel| |Versions| |Implementations| |License| |Source| |Mirror| |Issues| |Docs| |Changelog| |Downloads| |Paper|
.. |Actions| image:: https://img.shields.io/github/actions/workflow/status/althonos/pyrodigal/test.yml?branch=main&logo=github&style=flat-square&maxAge=300
:target: https://github.com/althonos/pyrodigal/actions
.. |GitLabCI| image:: https://img.shields.io/gitlab/pipeline/larralde/pyrodigal/main?gitlab_url=https%3A%2F%2Fgit.embl.de&logo=gitlab&style=flat-square&maxAge=600
:target: https://git.embl.de/larralde/pyrodigal/-/pipelines
.. |Coverage| image:: https://img.shields.io/codecov/c/gh/althonos/pyrodigal?style=flat-square&maxAge=600
:target: https://codecov.io/gh/althonos/pyrodigal/
.. |PyPI| image:: https://img.shields.io/pypi/v/pyrodigal.svg?style=flat-square&maxAge=3600
:target: https://pypi.python.org/pypi/pyrodigal
.. |Bioconda| image:: https://img.shields.io/conda/vn/bioconda/pyrodigal?style=flat-square&maxAge=3600
:target: https://anaconda.org/bioconda/pyrodigal
.. |AUR| image:: https://img.shields.io/aur/version/python-pyrodigal?logo=archlinux&style=flat-square&maxAge=3600
:target: https://aur.archlinux.org/packages/python-pyrodigal
.. |Wheel| image:: https://img.shields.io/pypi/wheel/pyrodigal?style=flat-square&maxAge=3600
:target: https://pypi.org/project/pyrodigal/#files
.. |Versions| image:: https://img.shields.io/pypi/pyversions/pyrodigal.svg?style=flat-square&maxAge=3600
:target: https://pypi.org/project/pyrodigal/#files
.. |Implementations| image:: https://img.shields.io/pypi/implementation/pyrodigal.svg?style=flat-square&maxAge=3600&label=impl
:target: https://pypi.org/project/pyrodigal/#files
.. |License| image:: https://img.shields.io/badge/license-GPLv3-blue.svg?style=flat-square&maxAge=3600
:target: https://choosealicense.com/licenses/gpl-3.0/
.. |Source| image:: https://img.shields.io/badge/source-GitHub-303030.svg?maxAge=2678400&style=flat-square
:target: https://github.com/althonos/pyrodigal/
.. |Mirror| image:: https://img.shields.io/badge/mirror-EMBL-009f4d?style=flat-square&maxAge=2678400
:target: https://git.embl.de/larralde/pyrodigal/
.. |Issues| image:: https://img.shields.io/github/issues/althonos/pyrodigal.svg?style=flat-square&maxAge=600
:target: https://github.com/althonos/pyrodigal/issues
.. |Docs| image:: https://img.shields.io/readthedocs/pyrodigal?style=flat-square&maxAge=3600
:target: http://pyrodigal.readthedocs.io/en/stable/?badge=stable
.. |Changelog| image:: https://img.shields.io/badge/keep%20a-changelog-8A0707.svg?maxAge=2678400&style=flat-square
:target: https://github.com/althonos/pyrodigal/blob/main/CHANGELOG.md
.. |Downloads| image:: https://img.shields.io/pypi/dm/pyrodigal?style=flat-square&color=303f9f&maxAge=86400&label=downloads
:target: https://pepy.tech/project/pyrodigal
.. |Paper| image:: https://img.shields.io/badge/paper-JOSS-9400ff?style=flat-square&maxAge=86400
:target: https://doi.org/10.21105/joss.04296
Overview
--------
Pyrodigal is a Python module that provides bindings to Prodigal using
`Cython `_. It directly interacts with the Prodigal
internals, which has the following advantages:
- **single dependency**: Pyrodigal is distributed as a Python package, so you
can add it as a dependency to your project, and stop worrying about the
Prodigal binary being present on the end-user machine.
- **no intermediate files**: Everything happens in memory, in a Python object
you fully control, so you don't have to invoke the Prodigal CLI using a
sub-process and temporary files. Sequences can be passed directly as
strings or bytes, which avoids the overhead of formatting your input to
FASTA for Prodigal.
- **lower memory usage**: Pyrodigal is slightly more conservative when it comes
to using memory, which can help process very large sequences. It also lets
you save some more memory when running several *meta*-mode analyses.
- **better performance**: Pyrodigal uses *SIMD* instructions to compute which
dynamic programming nodes can be ignored when scoring connections. This can
save from a third to half the runtime depending on the sequence. The
`Benchmarks `_
page of the documentation contains comprehensive comparisons. See the
`JOSS paper `_ for details about how
this is achieved.
- **same results**: Pyrodigal is tested to make sure it produces
exactly the same results as Prodigal ``v2.6.3+31b300a``. This was verified
extensively by `Julian Hahnfeld `_ and can be
checked with his `comparison repository `_.
Features
--------
The library now features everything from the original Prodigal CLI:
- **run mode selection**: Choose between *single* mode, using a training
sequence to count nucleotide hexamers, or *metagenomic* mode, using
pre-trained data from different organisms (``prodigal -p``).
- **region masking**: Prevent genes from being predicted across regions
containing unknown nucleotides (``prodigal -m``).
- **closed ends**: Genes will be identified as running over edges if they
are larger than a certain size, but this can be disabled (``prodigal -c``).
- **training configuration**: During the training process, a custom
translation table can be given (``prodigal -g``), and the Shine-Dalgarno motif
search can be forcefully bypassed (``prodigal -n``).
- **output files**: Output files can be written in a format mostly
compatible with the Prodigal binary, including the protein translations
in FASTA format (``prodigal -a``), the gene sequences in FASTA format
(``prodigal -d``), or the potential gene scores in tabular format
(``prodigal -s``).
- **training data persistence**: Getting training data from a sequence and
using it for other sequences is supported; in addition, a training data
file can be saved and loaded transparently (``prodigal -t``).
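To make the region masking option listed above more concrete, here is a minimal pure-Python sketch of how runs of unknown nucleotides could be located, and how a candidate gene could be checked against them. This is an illustration only, not Pyrodigal's implementation: the names ``find_masked_regions`` and ``overlaps_mask`` and the ``min_run`` threshold are hypothetical.

```python
import re


def find_masked_regions(sequence, min_run=50):
    """Return (start, end) half-open spans covering runs of unknown nucleotides.

    `min_run` is a hypothetical threshold: shorter runs of N are left unmasked.
    """
    pattern = re.compile(r"[Nn]{%d,}" % min_run)
    return [match.span() for match in pattern.finditer(sequence)]


def overlaps_mask(begin, end, masks):
    """Tell whether the half-open interval [begin, end) touches any masked span."""
    return any(begin < mask_end and mask_begin < end for mask_begin, mask_end in masks)


# A toy sequence with a run of 60 unknown nucleotides in the middle.
sequence = "ATGAAA" + "N" * 60 + "CCCTAA"
masks = find_masked_regions(sequence)
print(masks)
print(overlaps_mask(0, 72, masks))   # a gene spanning the whole sequence crosses the mask
print(overlaps_mask(66, 72, masks))  # a gene located after the run does not
```

A gene finder with masking enabled would discard, or never build, candidates for which such an overlap check succeeds.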
In addition, the following **new** features are available:
- **custom gene size threshold**: While Prodigal uses a minimum gene size
of 90 nucleotides (60 if on edge), Pyrodigal lets you customize this
threshold, allowing for smaller ORFs to be identified if needed.
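The effect of a minimum gene size can be illustrated with a deliberately naive ORF scanner. The sketch below is not the algorithm used by Prodigal or Pyrodigal (which score candidates with dynamic programming); it only looks for ``ATG`` to stop-codon stretches on the forward strand, and the function name ``find_orfs`` and its ``min_gene`` parameter are hypothetical names chosen for the example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence, min_gene=90):
    """Yield (begin, end) half-open spans of forward-strand ATG..stop ORFs.

    Only ORFs spanning at least `min_gene` nucleotides (stop codon included)
    are reported; shorter ones are dropped.
    """
    sequence = sequence.upper()
    for frame in range(3):
        start = None
        for pos in range(frame, len(sequence) - 2, 3):
            codon = sequence[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # remember the first start codon in this frame
            elif start is not None and codon in STOP_CODONS:
                if pos + 3 - start >= min_gene:
                    yield (start, pos + 3)
                start = None  # look for the next ORF in the same frame


short_orf = "ATG" + "GCT" * 10 + "TAA"  # 36 nucleotides in total
print(list(find_orfs(short_orf)))               # dropped with a 90-nucleotide threshold
print(list(find_orfs(short_orf, min_gene=30)))  # reported once the threshold is lowered
```

Lowering the threshold lets short ORFs through, at the cost of more spurious candidates.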
Several changes were made regarding **memory management**:
- **digitized sequences**: Sequences are stored as raw bytes instead of compressed
bitmaps. This means that the sequence itself takes 3/8th more space, but since
the memory used for storing the sequence is often negligible compared to the
memory used to store dynamic programming nodes, this is an acceptable
trade-off for better performance when extracting said nodes.
- **node buffer growth**: Node arrays are dynamically allocated and grow
exponentially instead of being pre-allocated with a large size. On small
sequences, this leads to Pyrodigal using about 30% less memory.
- **lightweight genes**: Genes are stored in a more compact data structure than in
Prodigal (which reserves a buffer to store string data), saving around 1KiB
per gene.
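The node buffer growth strategy described above can be sketched in a few lines. The simulation below counts how many allocations an exponentially growing buffer needs as items are appended one at a time. The initial capacity and growth factor are assumptions made for the illustration, not the values Pyrodigal uses, and ``allocations_needed`` is a hypothetical helper.

```python
def allocations_needed(n_items, initial_capacity=8, growth_factor=2):
    """Simulate appending `n_items` one by one to a growable buffer.

    Returns the number of (re)allocations performed and the final capacity.
    """
    capacity = initial_capacity
    allocations = 1  # the initial allocation
    for count in range(1, n_items + 1):
        if count > capacity:
            # The buffer is full: grow it geometrically instead of by one slot.
            capacity *= growth_factor
            allocations += 1
    return allocations, capacity


print(allocations_needed(1_000))
print(allocations_needed(1_000_000))
```

The number of reallocations grows only logarithmically with the number of nodes, and a short sequence never pays for a buffer sized for a large one.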
Setup
-----
Run ``pip install pyrodigal`` in a shell to download the latest release and all
its dependencies from PyPI, or have a look at the
:doc:`Installation page ` to find other ways to install ``pyrodigal``.
Citation
--------
Pyrodigal is scientific software, with a
`published paper `_
in the `Journal of Open Source Software `_. Check the
:doc:`Publications page ` to see how to cite Pyrodigal properly.
Library
-------
.. toctree::
:maxdepth: 2
Installation
Output Formats
Contributing
Publications
Benchmarks
API Reference
Changelog
License
-------
This library is provided under the `GNU General Public License v3.0 `_.
The Prodigal code was written by `Doug Hyatt `_ and is distributed under the
terms of the GPLv3 as well.
*This project is in no way affiliated with, sponsored, or otherwise endorsed by
the original* `Prodigal`_ *authors. It was developed by* `Martin Larralde `_ *during his
PhD project at the* `European Molecular Biology Laboratory `_
*in the* `Zeller team `_.