Genes¶
- class pyrodigal.Genes¶
A list of raw genes found by Prodigal in a single sequence.
- sequence¶
The compressed input sequence for which the gene predictions were made.
- Type:
- training_info¶
A reference to the training info these predictions were obtained with.
- Type:
- nodes¶
A collection of raw nodes found in the input sequence.
- Type:
- meta¶
Whether these genes have been found after a run in metagenomic mode, or in single mode.
- Type:
- metagenomic_bin¶
The metagenomic model with which these genes have been found.
- Type:
Added in version 0.5.4.
Added in version 2.0.0: The meta attribute.
Added in version 3.0.0: The metagenomic_bin attribute.
- write_genbank(file, sequence_id, division='BCT', date=None, translation_table=None, strict_translation=True)¶
Write predicted genes and sequence to file in GenBank format.
- Parameters:
file (io.TextIOBase) – A file open in text mode where to write the GenBank record.
sequence_id (str) – The identifier of the sequence these genes were extracted from.
division (str) – The GenBank division to write in the GenBank header. Should usually be BCT (for bacterial sequences) given the scope of Prodigal.
date (datetime.date, optional) – The date to write in the GenBank header, or None to use the current date.
translation_table (int or None) – A translation table to pass to Gene.translate, or None to use the translation table from the TrainingInfo these genes were obtained with.
strict_translation (bool) – Whether to handle ambiguous codons in strict mode when translating. See the strict parameter of Gene.translate for more information.
- Returns:
int – The number of bytes written to the file.
Note
The original Prodigal outputs incomplete GenBank files containing only the coordinates of the predicted genes inside CDS features, without including the translation or the original sequence. Since this is not the most useful output, and often requires additional post-processing, Pyrodigal outputs a complete GenBank record instead.
Added in version 3.0.0.
Added in version 3.4.0: The translation_table and strict_translation parameters.
- write_genes(file, sequence_id, width=70, full_id=False)¶
Write nucleotide sequences of genes to file in FASTA format.
- Parameters:
file (io.TextIOBase) – A file open in text mode where to write the nucleotide sequences.
sequence_id (str) – The identifier of the sequence these genes were extracted from.
width (int) – The width to use to wrap sequence lines. Prodigal uses 70 for nucleotide sequences.
full_id (bool) – Pass True to use the full sequence identifier in the header of each record, or False to use the sequence numbering such as the one used in Prodigal.
- Returns:
int – The number of bytes written to the file.
Changed in version 2.0.0: Replaced optional prefix argument with sequence_id.
Added in version 3.4.0: The full_id parameter.
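As a rough illustration of what the width parameter controls, the sketch below wraps a sequence into fixed-width lines using plain Python. It is not Pyrodigal's own code: the function name wrap_fasta and the header text are made up for the example, and the header layout that Pyrodigal actually writes is not reproduced here.

```python
def wrap_fasta(header, sequence, width=70):
    """Format one FASTA record, wrapping sequence lines at `width` characters."""
    # Slice the sequence into consecutive chunks of at most `width` characters.
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([f">{header}", *lines]) + "\n"


# Hypothetical record: a 150-nucleotide sequence wrapped at 70 columns.
record = wrap_fasta("example_1", "ATG" * 50, width=70)
print(record)
```

With width=70 the 150-character sequence ends up on three lines of 70, 70 and 10 characters; passing width=60 would mimic the value Prodigal uses for protein sequences.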
- write_gff(file, sequence_id, header=True, include_translation_table=False, full_id=True)¶
Write the genes to file in General Feature Format.
- Parameters:
file (io.TextIOBase) – A file open in text mode where to write the features.
sequence_id (str) – The identifier of the sequence these genes were extracted from. Used in the first column of the GFF-formatted output.
header (bool) – True to write a GFF header line, False otherwise.
include_translation_table (bool) – True to write the translation table used to predict the genes in the GFF attributes, False otherwise. Useful for genes that were predicted in meta mode, since the different metagenomic models have different translation tables.
full_id (bool) – Pass True to use the full sequence identifier in the header of each record, or False to use the sequence numbering such as the one used in Prodigal.
- Returns:
int – The number of bytes written to the file.
Changed in version 2.0.0: Replaced optional prefix argument with sequence_id.
Added in version 3.0.0: The include_translation_table argument.
Added in version 3.4.0: The full_id parameter.
- write_scores(file, sequence_id, header=True)¶
Write the start scores to file in tabular format.
- Parameters:
file (io.TextIOBase) – A file open in text mode where to write the features.
sequence_id (str) – The identifier of the sequence these genes were extracted from.
header (bool) – True to write a header line, False otherwise.
- Returns:
int – The number of bytes written to the file.
Added in version 0.7.0.
Added in version 2.0.0: The sequence_id argument.
- write_translations(file, sequence_id, width=60, translation_table=None, include_stop=True, strict_translation=True, full_id=False)¶
Write protein sequences of genes to file in FASTA format.
- Parameters:
file (io.TextIOBase) – A file open in text mode where to write the protein sequences.
sequence_id (str) – The identifier of the sequence these genes were extracted from.
width (int) – The width to use to wrap sequence lines. Prodigal uses 60 for protein sequences.
translation_table (int, optional) – A different translation table to use to translate the genes. If None is given, use the one from the training info.
include_stop (bool) – Pass False to disable translating the STOP codon into a star character (*) for complete genes. True keeps the default behaviour of Prodigal, although this often does not play well with other programs or libraries that use the FASTA file for downstream processing.
strict_translation (bool) – Whether to handle ambiguous codons in strict mode when translating. See the strict parameter of Gene.translate for more information.
full_id (bool) – Pass True to use the full sequence identifier in the header of each record, or False to use the sequence numbering such as the one used in Prodigal.
- Returns:
int – The number of bytes written to the file.
Changed in version 2.0.0: Replaced optional prefix argument with sequence_id.
Added in version 3.0.0: The include_stop argument.
Added in version 3.4.0: The strict_translation and full_id parameters.
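Since every write_* method takes a file open in text mode, an io.StringIO buffer can stand in for a file on disk when the output is needed as a string. The sketch below assumes only the method names and parameters documented above. The render_outputs helper and the FakeGenes class are hypothetical: the stand-in writes placeholder lines so that the example runs without Pyrodigal installed, and a real Genes object would be passed in its place.

```python
import io


def render_outputs(genes, sequence_id):
    """Return the GFF, gene and protein outputs of `genes` as strings."""
    writers = {
        "gff": lambda f: genes.write_gff(f, sequence_id, header=True),
        "genes": lambda f: genes.write_genes(f, sequence_id, width=70),
        "translations": lambda f: genes.write_translations(
            f, sequence_id, width=60, include_stop=False
        ),
    }
    outputs = {}
    for name, write in writers.items():
        buffer = io.StringIO()  # any file-like object open in text mode works
        write(buffer)
        outputs[name] = buffer.getvalue()
    return outputs


class FakeGenes:
    """Hypothetical stand-in; it writes placeholder lines, not real records."""

    def write_gff(self, file, sequence_id, header=True):
        return file.write(f"{sequence_id}\tplaceholder GFF line\n")

    def write_genes(self, file, sequence_id, width=70):
        return file.write(f">{sequence_id} placeholder gene\n")

    def write_translations(self, file, sequence_id, width=60, include_stop=True):
        return file.write(f">{sequence_id} placeholder protein\n")


outputs = render_outputs(FakeGenes(), "contig_1")
print(sorted(outputs))
```

Passing include_stop=False here reflects the remark above that the trailing star character can trip up downstream tools; whether that matters depends on the consumer of the FASTA output.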
- score¶
The total score of the gene path in the sequence.
This value can be used to compare the genes obtained on the same sequence with different TrainingInfo parameters, and find the best set of parameters for a sequence.
Added in version 3.4.0.
- Type:
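The comparison described for the score attribute could look like the sketch below. It assumes only that each candidate object exposes a numeric score attribute. The pick_best helper, the labels and the score values are invented for illustration, and SimpleNamespace objects stand in for real Genes instances.

```python
from types import SimpleNamespace


def pick_best(predictions):
    """Return the label whose gene set has the highest total score."""
    return max(predictions, key=lambda label: predictions[label].score)


# Stand-ins for Genes objects obtained on the same sequence with
# different training parameters; the scores are invented.
predictions = {
    "training_a": SimpleNamespace(score=1520.4),
    "training_b": SimpleNamespace(score=1687.9),
}
best = pick_best(predictions)
print(best)
```

Scores are only comparable between predictions made on the same input sequence, as stated above.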
- class pyrodigal.Gene¶
A single raw gene found by Prodigal within a DNA sequence.
Caution
The gene coordinates follow the conventions from Prodigal, not Python, so coordinates are 1-based, end-inclusive. To index the original sequence with a gene object, remember to switch back to zero-based coordinates: sequence[gene.begin-1:gene.end].
Added in version 0.5.4.
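The coordinate conversion can be checked on a toy example. The sketch below uses a namedtuple as a stand-in for a gene. The begin and end names come from the caution above, while the strand field (1 for forward, -1 for reverse) and the gene_slice helper are assumptions made for the example.

```python
from collections import namedtuple

# Hypothetical stand-in for a gene: 1-based, end-inclusive coordinates.
GeneCoords = namedtuple("GeneCoords", ["begin", "end", "strand"])

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def gene_slice(sequence, gene):
    """Extract the gene from `sequence`, reverse-complementing if needed."""
    # Shift only the start: the inclusive end equals Python's exclusive end.
    sub = sequence[gene.begin - 1:gene.end]
    if gene.strand == -1:
        sub = sub.translate(_COMPLEMENT)[::-1]
    return sub


sequence = "AAATGCCCTAAGG"
forward = gene_slice(sequence, GeneCoords(begin=3, end=11, strand=1))
reverse = gene_slice(sequence, GeneCoords(begin=3, end=11, strand=-1))
print(forward)  # ATGCCCTAA
print(reverse)  # TTAGGGCAT
```

The slice has end - begin + 1 characters, which is a quick sanity check that the conversion was applied once and only to the start coordinate.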
- confidence()¶
Estimate the confidence of the prediction.
- Returns:
float – A confidence percentage (between 0 and 100).
- sequence()¶
Build the nucleotide sequence of this predicted gene.
This function takes care of reverse-complementing the sequence if it is on the reverse strand.
Note
Since Pyrodigal uses a generic symbol for unknown nucleotides, any unknown characters in the original sequence will be rendered with an N.
Added in version 0.5.4.
- translate(translation_table=None, unknown_residue=88, include_stop=True, strict=True)¶
Translate the predicted gene into a protein sequence.
- Parameters:
translation_table (int, optional) – An alternative translation table to use to translate the gene. Use None (the default) to translate using the translation table this gene was found with.
unknown_residue (str) – A single character to use for residues translated from codons with unknown nucleotides. The default shown in the signature, 88, is the ASCII code of X.
include_stop (bool) – Pass False to disable translating the STOP codon into a star character (*) for complete genes. True keeps the default behaviour of Prodigal, although this often does not play well with other programs or libraries that use the protein sequence for downstream processing.
strict (bool) – If True (the default), translate codons containing any unknown nucleotide as unknown_residue. If False, attempt to translate some incomplete codons when there is no ambiguity, taking into account the translation table (e.g. CCN, which always translates to Pro).
- Returns:
str – The protein sequence as a string, using the right translation table and the standard single-letter alphabet for proteins.
- Raises:
ValueError – when translation_table is not a valid genetic code number.
Added in version 3.0.0: The include_stop keyword argument.
Added in version 3.4.0: The strict keyword argument.
Changed in version 3.4.0: Added support for additional translation tables 26 to 33.
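The difference between strict and non-strict handling of ambiguous codons can be illustrated independently of Pyrodigal. The sketch below is not the library's implementation: it builds the standard genetic code (NCBI translation table 1) and resolves a codon containing N only when every possible completion yields the same amino acid. The name translate_codon and its defaults are chosen for the example.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered TCAG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_TABLE = {
    "".join(codon): residue
    for codon, residue in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_codon(codon, strict=True, unknown_residue="X"):
    """Translate one codon, optionally resolving unambiguous N positions."""
    if codon in STANDARD_TABLE:
        return STANDARD_TABLE[codon]
    if strict:
        return unknown_residue
    # Expand every N to the four bases and collect the possible residues.
    options = [BASES if base == "N" else base for base in codon]
    residues = {STANDARD_TABLE.get("".join(c)) for c in product(*options)}
    if len(residues) == 1 and None not in residues:
        return residues.pop()
    return unknown_residue


print(translate_codon("CCN", strict=True))   # X
print(translate_codon("CCN", strict=False))  # P
print(translate_codon("ATN", strict=False))  # X (Ile or Met, so ambiguous)
```

The outcome depends on the translation table: a codon that is unambiguous under one genetic code may be ambiguous under another, which is why the parameter description mentions taking the table into account.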
- cscore¶
The coding score for the start node, based on 6-mer usage.
Added in version 0.5.1.
- Type:
- rbs_motif¶
The motif of the Ribosome Binding Site.
Possible non-None values are GGA/GAG/AGG, 3Base/5BMM, 4Base/6BMM, AGxAG, GGxGG, AGGAG(G)/GGAGG, AGGA, AGGA/GGAG/GAGG, GGAG/GAGG, AGGAG/GGAGG, AGGAG, GGAGG or AGGAGG.
- Type:
str, optional
- rbs_spacer¶
The number of bases between the RBS and the CDS.
Possible non-None values are 3-4bp, 5-10bp, 11-12bp or 13-15bp.
- Type:
str, optional
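Because rbs_spacer is a string rather than a number, downstream code that needs numeric bounds has to parse it. The sketch below is one way to do so for the four values listed above; the parse_rbs_spacer helper is hypothetical and not part of Pyrodigal.

```python
import re

_SPACER = re.compile(r"(\d+)-(\d+)bp")


def parse_rbs_spacer(spacer):
    """Convert a spacer string such as '5-10bp' into a (min, max) tuple."""
    if spacer is None:
        return None  # no RBS spacer reported for this gene
    match = _SPACER.fullmatch(spacer)
    if match is None:
        raise ValueError(f"unexpected RBS spacer value: {spacer!r}")
    return int(match.group(1)), int(match.group(2))


for value in ("3-4bp", "5-10bp", "11-12bp", "13-15bp", None):
    print(value, parse_rbs_spacer(value))
```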
- score¶
The gene score, sum of the coding and start codon scores.
Added in version 0.7.3.
- Type:
- start_type¶
The start codon of this gene.
Can be one of ATG, GTG or TTG, or Edge if the GeneFinder has been initialized in open ends mode and the gene starts right at the beginning of the input sequence.
- Type: