Genes#
- class pyrodigal.Genes#
A list of raw genes found by Prodigal in a single sequence.
- sequence#
The compressed input sequence for which the gene predictions were made.
- Type:
- training_info#
A reference to the training info these predictions were obtained with.
- Type:
- nodes#
A collection of raw nodes found in the input sequence.
- Type:
- meta#
Whether these genes have been found after a run in metagenomic mode, or in single mode.
- Type:
- metagenomic_bin#
The metagenomic model with which these genes have been found.
- Type:
Added in version 0.5.4.
Added in version 2.0.0: The meta attribute.
Added in version 3.0.0: The metagenomic_bin attribute.
- __len__()#
Return len(self).
- write_genbank(file, sequence_id, division='BCT', date=None, translation_table=None, strict_translation=True)#
Write predicted genes and sequence to file in GenBank format.
- Parameters:
file (io.TextIOBase) – A file open in text mode where to write the GenBank record.
sequence_id (str) – The identifier of the sequence these genes were extracted from.
division (str) – The GenBank division to write in the GenBank header. Should often be BCT (for bacterial sequences) given the scope of Prodigal.
date (datetime.date, optional) – The date to write in the GenBank header, or None to use now.
translation_table (int or None) – A translation table to pass to Gene.translate, or None to use the translation table from the TrainingInfo these genes were obtained with.
strict_translation (bool) – Whether to handle ambiguous codons in strict mode when translating. See the strict parameter of Gene.translate for more information.
- Returns:
int – The number of bytes written to the file.
Note
The original Prodigal outputs incomplete GenBank files containing only the coordinates of the predicted genes inside CDS features, without including the translation or the original sequence. Since this is not the most useful output, and often requires additional post-processing, Pyrodigal outputs a complete GenBank record instead.
Added in version 3.0.0.
Added in version 3.4.0: The translation_table and strict_translation parameters.
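Since date defaults to the current date, two runs on the same input can produce different GenBank headers. The sketch below shows one way to get reproducible output by always passing an explicit date and writing to an in-memory buffer. The helper name genbank_text and the constant REFERENCE_DATE are hypothetical and not part of Pyrodigal; only the write_genbank call follows the signature documented above.

```python
import datetime
import io

# Hypothetical fixed date, chosen only so repeated runs give identical headers.
REFERENCE_DATE = datetime.date(2000, 1, 1)


def genbank_text(genes, sequence_id, date=REFERENCE_DATE):
    """Return the GenBank record for `genes` as a string.

    `genes` is expected to behave like a pyrodigal.Genes object, i.e. to
    expose write_genbank(file, sequence_id, division=..., date=...).
    """
    buffer = io.StringIO()
    # Pass the date explicitly instead of relying on the date=None default.
    genes.write_genbank(buffer, sequence_id, division="BCT", date=date)
    return buffer.getvalue()
```

Writing to an io.StringIO keeps the example free of filesystem access; any file opened in text mode can be passed instead.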
- write_genes(file, sequence_id, width=70, full_id=False)#
Write nucleotide sequences of genes to file in FASTA format.
- Parameters:
file (io.TextIOBase) – A file open in text mode where to write the nucleotide sequences.
sequence_id (str) – The identifier of the sequence these genes were extracted from.
width (int) – The width to use to wrap sequence lines. Prodigal uses 70 for nucleotide sequences.
full_id (bool) – Pass True to use the full sequence identifier in the header of each record, or False to use the sequence numbering such as the one used in Prodigal.
- Returns:
int – The number of bytes written to the file.
Changed in version 2.0.0: Replaced optional prefix argument with sequence_id.
Added in version 3.4.0: The full_id parameter.
- write_gff(file, sequence_id, header=True, include_translation_table=False, full_id=True)#
Write the genes to file in General Feature Format.
- Parameters:
file (io.TextIOBase) – A file open in text mode where to write the features.
sequence_id (str) – The identifier of the sequence these genes were extracted from. Used in the first column of the GFF-formatted output.
header (bool) – True to write a GFF header line, False otherwise.
include_translation_table (bool) – True to write the translation table used to predict the genes in the GFF attributes, False otherwise. Useful for genes that were predicted in meta mode, since the different metagenomic models have different translation tables.
full_id (bool) – Pass True to use the full sequence identifier in the header of each record, or False to use the sequence numbering such as the one used in Prodigal.
- Returns:
int – The number of bytes written to the file.
Changed in version 2.0.0: Replaced optional prefix argument with sequence_id.
Added in version 3.0.0: The include_translation_table argument.
Added in version 3.4.0: The full_id parameter.
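When predictions for several sequences go into one GFF file, the header parameter can be used to emit the header line only once. The following is a minimal sketch of that pattern, assuming a single header per file is what the downstream tool expects. The helper write_combined_gff and the shape of its predictions argument are illustrative choices, not part of Pyrodigal.

```python
import io
from typing import Any, Iterable, Tuple


def write_combined_gff(predictions: Iterable[Tuple[str, Any]], file: io.TextIOBase) -> int:
    """Write the genes of several sequences to one GFF file.

    `predictions` yields (sequence_id, genes) pairs, where each `genes`
    behaves like a pyrodigal.Genes object. Returns the sum of the values
    returned by the individual write_gff calls.
    """
    total = 0
    for index, (sequence_id, genes) in enumerate(predictions):
        total += genes.write_gff(
            file,
            sequence_id,
            header=(index == 0),             # header line for the first record only
            include_translation_table=True,  # useful when genes come from meta mode
        )
    return total
```

Setting include_translation_table=True here follows the remark above that different metagenomic models may use different translation tables.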
- write_scores(file, sequence_id, header=True)#
Write the start scores to file in tabular format.
- Parameters:
file (io.TextIOBase) – A file open in text mode where to write the scores.
sequence_id (str) – The identifier of the sequence these genes were extracted from.
header (bool) – True to write a header line, False otherwise.
- Returns:
int – The number of bytes written to the file.
Added in version 0.7.0.
Added in version 2.0.0: The sequence_id argument.
- write_translations(file, sequence_id, width=60, translation_table=None, include_stop=True, strict_translation=True, full_id=False)#
Write protein sequences of genes to file in FASTA format.
- Parameters:
file (io.TextIOBase) – A file open in text mode where to write the protein sequences.
sequence_id (str) – The identifier of the sequence these genes were extracted from.
width (int) – The width to use to wrap sequence lines. Prodigal uses 60 for protein sequences.
translation_table (int, optional) – A different translation table to use to translate the genes. If None is given, use the one from the training info.
include_stop (bool) – Pass False to disable translating the STOP codon into a star character (*) for complete genes. True keeps the default behaviour of Prodigal, but the star character is often poorly handled by other programs or libraries that use the FASTA file for downstream processing.
strict_translation (bool) – Whether to handle ambiguous codons in strict mode when translating. See the strict parameter of Gene.translate for more information.
full_id (bool) – Pass True to use the full sequence identifier in the header of each record, or False to use the sequence numbering such as the one used in Prodigal.
- Returns:
int – The number of bytes written to the file.
Changed in version 2.0.0: Replaced optional prefix argument with sequence_id.
Added in version 3.0.0: The include_stop argument.
Added in version 3.4.0: The strict_translation and full_id parameters.
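As noted for include_stop, the trailing star can be a problem for downstream tools. A small sketch of producing a protein FASTA without it, kept in memory, could look like this. The helper name proteins_as_fasta is hypothetical; it relies only on the write_translations signature documented above.

```python
import io


def proteins_as_fasta(genes, sequence_id, width=60):
    """Return the translated genes as FASTA text without terminal stars.

    `genes` is expected to behave like a pyrodigal.Genes object.
    """
    buffer = io.StringIO()
    # include_stop=False drops the '*' that Prodigal appends to complete genes.
    genes.write_translations(buffer, sequence_id, width=width, include_stop=False)
    return buffer.getvalue()
```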
- score#
The total score of the gene path in the sequence.
This value can be used to compare the genes obtained on the same sequence with different TrainingInfo parameters, and find the best set of parameters for a sequence.
Added in version 3.4.0.
- Type:
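The comparison described for score can be sketched as follows. This assumes each candidate is an object with a find_genes(sequence) method returning a Genes object, as pyrodigal.GeneFinder provides elsewhere in the API; that method is not documented in this section, and the helper best_prediction is purely illustrative.

```python
def best_prediction(finders, sequence):
    """Run every gene finder on `sequence` and keep the highest-scoring result.

    `finders` is a non-empty iterable of objects exposing find_genes(sequence);
    each result must expose a numeric `score` attribute.
    """
    results = [finder.find_genes(sequence) for finder in finders]
    # Scores are only comparable because all runs use the same sequence.
    return max(results, key=lambda genes: genes.score)
```

The comparison is meaningful only for predictions made on the same input sequence, as the attribute description states.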
- class pyrodigal.Gene#
A single raw gene found by Prodigal within a DNA sequence.
Caution
The gene coordinates follow the conventions from Prodigal, not Python, so coordinates are 1-based and end-inclusive. To index the original sequence with a gene object, remember to switch back to zero-based coordinates: sequence[gene.begin-1:gene.end].
Added in version 0.5.4.
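The coordinate conversion can be checked on a toy sequence. The snippet below uses plain integers in place of gene.begin and gene.end, so it runs without Pyrodigal; the sequence and coordinates are made up for illustration. The slice always returns forward-strand text, so a gene on the reverse strand still needs reverse-complementing, which the sequence() method does for you.

```python
# Complement table for unambiguous nucleotides.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gene_slice(sequence, begin, end):
    """Slice `sequence` using 1-based, end-inclusive coordinates."""
    return sequence[begin - 1:end]


def reverse_complement(dna):
    """Return the reverse complement of an unambiguous DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


sequence = "CCATGAAATAGCC"
begin, end = 3, 11  # 1-based, end-inclusive, as reported by Prodigal

forward = gene_slice(sequence, begin, end)
print(forward)                      # ATGAAATAG
print(len(forward))                 # 9, i.e. end - begin + 1
print(reverse_complement(forward))  # CTATTTCAT
```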
- confidence()#
Estimate the confidence of the prediction.
- Returns:
float – A confidence percentage (between 0 and 100).
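Because confidence() returns a percentage, it lends itself to simple filtering. A minimal sketch, assuming the Genes object can be iterated to yield Gene objects; the helper confident_genes and its default threshold of 90 are arbitrary illustrative choices.

```python
def confident_genes(genes, threshold=90.0):
    """Return the genes whose confidence is at least `threshold` percent."""
    if not 0.0 <= threshold <= 100.0:
        raise ValueError("threshold must be between 0 and 100")
    return [gene for gene in genes if gene.confidence() >= threshold]
```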
- sequence()#
Build the nucleotide sequence of this predicted gene.
This function takes care of reverse-complementing the gene sequence if the gene is located on the reverse strand.
- Returns:
str – The nucleotide sequence of the predicted gene.
Note
Since Pyrodigal uses a generic symbol for unknown nucleotides, any unknown characters in the original sequence will be rendered with an N.
Added in version 0.5.4.
- translate(translation_table=None, unknown_residue=88, include_stop=True, strict=True)#
Translate the predicted gene into a protein sequence.
- Parameters:
translation_table (int, optional) – An alternative translation table to use to translate the gene. Use None (the default) to translate using the translation table this gene was found with.
unknown_residue (str) – A single character to use for residues translated from codons with unknown nucleotides.
include_stop (bool) – Pass False to disable translating the STOP codon into a star character (*) for complete genes. True keeps the default behaviour of Prodigal, but the star character is often poorly handled by other programs or libraries that use the protein sequence for downstream processing.
strict (bool) – If True (the default), translate codons containing any unknown nucleotide as unknown_residue. If False, attempt to translate some incomplete codons when there is no ambiguity, taking into account the translation table (e.g. CCN, which always translates to Pro).
- Returns:
str – The protein sequence as a string, using the right translation table and the standard single-letter alphabet for proteins.
- Raises:
ValueError – when translation_table is not a valid genetic code number.
Added in version 3.0.0: The include_stop keyword argument.
Added in version 3.4.0: The strict keyword argument.
Changed in version 3.4.0: Added support for additional translation tables 26 to 33.
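The difference between strict and non-strict translation can be illustrated in pure Python. This is a sketch of the idea only, not Pyrodigal's implementation: it handles a single codon, treats N as the only unknown symbol, and uses the amino acid assignments of the standard genetic code (which translation table 11 shares). The function translate_codon is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Amino acids of the standard genetic code, in TCAG order (TTT, TTC, TTA, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): residue
    for codon, residue in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_codon(codon, strict=True, unknown_residue="X"):
    """Translate one codon, optionally resolving unambiguous N positions."""
    codon = codon.upper()
    if codon in CODON_TABLE:
        return CODON_TABLE[codon]
    if strict:
        return unknown_residue
    # Expand every N to all four bases and collect the possible residues.
    options = [BASES if base == "N" else base for base in codon]
    residues = {CODON_TABLE.get("".join(c)) for c in product(*options)}
    if len(residues) == 1 and None not in residues:
        return residues.pop()
    return unknown_residue


print(translate_codon("CCN"))                # X
print(translate_codon("CCN", strict=False))  # P
print(translate_codon("ATN", strict=False))  # X
```

CCN resolves to proline because all four CCx codons encode it, while ATN stays unknown because ATG encodes methionine and the other three encode isoleucine.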
- cscore#
The coding score for the start node, based on 6-mer usage.
Added in version 0.5.1.
- Type:
- rbs_motif#
The motif of the Ribosome Binding Site.
Possible non-None values are GGA/GAG/AGG, 3Base/5BMM, 4Base/6BMM, AGxAG, GGxGG, AGGAG(G)/GGAGG, AGGA, AGGA/GGAG/GAGG, GGAG/GAGG, AGGAG/GGAGG, AGGAG, GGAGG or AGGAGG.
- Type:
str, optional
- rbs_spacer#
The number of bases between the RBS and the CDS.
Possible non-None values are 3-4bp, 5-10bp, 11-12bp or 13-15bp.
- Type:
str, optional
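Since rbs_spacer is a string such as 5-10bp rather than a number, numeric filtering needs a small parsing step. The helper below is a hypothetical sketch that turns the documented values into a (minimum, maximum) tuple and passes None through unchanged.

```python
import re
from typing import Optional, Tuple

# Matches the documented spacer values, e.g. "3-4bp" or "13-15bp".
_SPACER_PATTERN = re.compile(r"(\d+)-(\d+)bp")


def parse_rbs_spacer(spacer: Optional[str]) -> Optional[Tuple[int, int]]:
    """Convert an rbs_spacer value into an inclusive (min, max) range in bases."""
    if spacer is None:
        return None
    match = _SPACER_PATTERN.fullmatch(spacer)
    if match is None:
        raise ValueError(f"unexpected RBS spacer value: {spacer!r}")
    return int(match.group(1)), int(match.group(2))
```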
- score#
The gene score, sum of the coding and start codon scores.
Added in version 0.7.3.
- Type:
- start_type#
The start codon of this gene.
Can be one of ATG, GTG or TTG, or Edge if the GeneFinder has been initialized in open ends mode and the gene starts right at the beginning of the input sequence.
- Type:
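A quick way to summarize start codon usage is to tally start_type across all genes of a sequence. A minimal sketch, assuming the Genes object can be iterated to yield Gene objects; the helper name start_type_counts is illustrative.

```python
from collections import Counter


def start_type_counts(genes):
    """Count how many genes use each start type (ATG, GTG, TTG or Edge)."""
    return Counter(gene.start_type for gene in genes)
```

A high proportion of Edge entries would suggest that many genes run off the start of the input, which can happen with fragmented assemblies processed in open ends mode.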