OrfFinder
- class pyrodigal.OrfFinder
A configurable ORF finder for genomes and metagenomes.
- meta
Whether this object is configured to find genes using the metagenomic bins (True) or a manually created training info (False).
- Type: bool
- training_info
The object storing the training information, or None if the object is in metagenomic mode or has not been trained yet.
- Type: TrainingInfo or None
- max_overlap
The maximum number of nucleotides that can overlap between two genes on the same strand.
- Type: int
- __init__(training_info=None, *, meta=False, closed=False, mask=False, min_gene=90, min_edge_gene=60, max_overlap=60)
Instantiate and configure a new ORF finder.
- Parameters
training_info (TrainingInfo, optional) – A training info instance to use in single mode without having to train first.
- Keyword Arguments
meta (bool) – Set to True to run in metagenomic mode, using pre-trained profiles for better results with metagenomic or progenomic inputs. Defaults to False.
closed (bool) – Set to True to consider sequence ends closed, which prevents proteins from running off edges. Defaults to False.
mask (bool) – Set to True to prevent genes from running across regions containing unknown nucleotides. Defaults to False.
min_gene (int) – The minimum gene length. Defaults to the value used in Prodigal.
min_edge_gene (int) – The minimum edge gene length. Defaults to the value used in Prodigal.
max_overlap (int) – The maximum number of nucleotides that can overlap between two genes on the same strand. This must be lower than or equal to the minimum gene length. Defaults to 60.
Changed in version 0.6.4: Added the training_info argument.
Changed in version 0.7.0: Added min_gene, min_edge_gene and max_overlap.
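The documented constraint between max_overlap and min_gene can be written as a small pre-flight check. This is a minimal sketch using a hypothetical check_overlap function that is not part of pyrodigal; raising ValueError is an assumption made for illustration, not a claim about how pyrodigal itself validates its arguments.

```python
# Hypothetical helper, NOT part of pyrodigal: it only restates the documented
# rule that max_overlap must be lower than or equal to min_gene.
def check_overlap(min_gene: int = 90, max_overlap: int = 60) -> None:
    """Raise ValueError when max_overlap exceeds min_gene."""
    if max_overlap > min_gene:
        raise ValueError(
            f"max_overlap ({max_overlap}) must be <= min_gene ({min_gene})"
        )


# The defaults from the __init__ signature pass (60 <= 90).
check_overlap()

# An overlap longer than the shortest permitted gene is rejected.
try:
    check_overlap(min_gene=50, max_overlap=60)
except ValueError as err:
    print(err)  # max_overlap (60) must be <= min_gene (50)
```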
- find_genes(sequence)
Find all the genes in the input DNA sequence.
- Parameters
sequence (str or buffer) – The nucleotide sequence to use, either as a string of nucleotides or as an object implementing the buffer protocol. Letters that do not correspond to a usual nucleotide (any letter other than “ATGC”) will be ignored.
- Returns
Genes – A list of all the genes found in the input.
- Raises
MemoryError – When allocation of an internal buffer fails.
RuntimeError – When calling this method in single mode without having called train first.
TypeError – When sequence is neither a string nor an object implementing the buffer protocol.
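The "str or buffer" input rule can be illustrated in plain Python. The sketch below uses a hypothetical as_sequence_bytes function that accepts a str or any object implementing the buffer protocol and raises TypeError otherwise. It only mirrors the documented contract and is not pyrodigal's actual conversion code; in particular, encoding strings as ASCII is an assumption of this sketch.

```python
# Hypothetical helper, NOT part of pyrodigal: a sketch of the "str or buffer"
# input rule documented for find_genes and train.
def as_sequence_bytes(sequence: object) -> bytes:
    """Return the raw nucleotide letters of a str or buffer-protocol object."""
    if isinstance(sequence, str):
        # Assumption: nucleotide strings are plain ASCII; in this sketch a
        # non-ASCII character raises UnicodeEncodeError.
        return sequence.encode("ascii")
    try:
        # memoryview() succeeds only for objects implementing the buffer
        # protocol (bytes, bytearray, array.array, mmap, ...).
        view = memoryview(sequence)
    except TypeError:
        raise TypeError(
            f"expected str or buffer, found {type(sequence).__name__}"
        ) from None
    return view.tobytes()


print(as_sequence_bytes("ATGC"))              # b'ATGC'
print(as_sequence_bytes(bytearray(b"ATGC")))  # b'ATGC'
```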
- train(sequence, *sequences, force_nonsd=False, start_weight=4.35, translation_table=11)
Search parameters for the ORF finder using a training sequence.
If more than one sequence is provided, they are assumed to be different contigs from the same genome. As in the original Prodigal implementation, they will be merged into a single sequence, joined by TTAATTAATTAA linkers.
- Parameters
sequence (str or buffer) – The nucleotide sequence to use, either as a string of nucleotides or as an object implementing the buffer protocol.
- Keyword Arguments
force_nonsd (bool, optional) – Set to True to bypass the heuristic that tries to determine whether the organism the training sequence belongs to uses a Shine-Dalgarno motif.
start_weight (float, optional) – The start score weight to use. The default value was manually selected by the Prodigal authors as appropriate for 99% of genomes.
translation_table (int, optional) – The translation table to use. Check the Wikipedia page listing all genetic codes for the available values.
- Returns
TrainingInfo – The resulting training info, which can be saved to disk and used later on to create a new OrfFinder instance.
- Raises
MemoryError – When allocation of an internal buffer fails.
RuntimeError – When calling this method while in metagenomic mode.
TypeError – When sequence is neither a string nor an object implementing the buffer protocol.
ValueError – When translation_table is not a valid genetic code number, or when sequence is too short to train.
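The contig merging described for train can be sketched in a few lines of Python. merge_contigs and reverse_complement are hypothetical helpers written for illustration, and the sketch assumes linkers are inserted only between contigs, never at the ends; it is not Prodigal's or pyrodigal's actual code.

```python
# Linker documented for train(): placed between consecutive contigs.
LINKER = "TTAATTAATTAA"


def merge_contigs(sequence: str, *sequences: str) -> str:
    """Hypothetical helper: join contigs into one training sequence with linkers."""
    # Assumption: the linker goes only between contigs, never at the ends.
    return LINKER.join((sequence, *sequences))


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


merged = merge_contigs("ATGAAA", "ATGCCC")
print(merged)  # ATGAAATTAATTAATTAAATGCCC

# The linker is its own reverse complement and contains a TAA stop codon in
# each of the three reading frames, so it interrupts ORFs on both strands.
assert reverse_complement(LINKER) == LINKER
assert all(
    "TAA" in [LINKER[i:i + 3] for i in range(f, len(LINKER) - 2, 3)]
    for f in range(3)
)
```

The two checks at the end suggest a plausible reason for this particular linker: it ends every open reading frame on either strand, so training should not see genes spanning two unrelated contigs.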