Genes
- class pyrodigal.Genes
A list of raw genes found by Prodigal in a single sequence.
- sequence
The compressed input sequence for which the gene predictions were made.
- Type: Sequence
- training_info
A reference to the training info these predictions were obtained with.
- Type: TrainingInfo
- nodes
A collection of raw nodes found in the input sequence.
- Type: Nodes
- meta
Whether these genes were found by a run in metagenomic mode (True) or in single mode (False).
- Type: bool
- metagenomic_bin
The metagenomic model with which these genes were found.
- Type: MetagenomicBin, optional
New in version 0.5.4.
New in version 2.0.0: The meta attribute.
New in version 3.0.0: The metagenomic_bin attribute.
- write_genbank(file, sequence_id, division='BCT', date=None)
Write the predicted genes and the sequence to file in GenBank format.
- Parameters:
file (io.TextIOBase) – A file opened in text mode to which the GenBank record is written.
sequence_id (str) – The identifier of the sequence these genes were extracted from.
division (str) – The GenBank division to write in the GenBank header. Given the scope of Prodigal, this should usually be BCT (for bacterial sequences).
date (datetime.date, optional) – The date to write in the GenBank header, or None to use the current date.
- Returns:
int
– The number of bytes written to the file.
Note
Prodigal itself writes incomplete GenBank files: the predicted genes appear only as coordinates inside CDS features, without their translations or the original sequence. Since such output is of limited use and often needs additional post-processing, Pyrodigal writes a complete GenBank record instead.
New in version 3.0.0.
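Since io.StringIO is an io.TextIOBase, a GenBank record can be rendered in memory instead of to a file on disk. The helper below is a minimal sketch, not part of Pyrodigal: genbank_record is a hypothetical name, genes is assumed to be a pyrodigal.Genes (for instance the result of GeneFinder.find_genes), and the fixed default date is an arbitrary example chosen so that repeated runs produce identical records rather than stamping the current date.

```python
import datetime
import io


def genbank_record(genes, sequence_id, date=datetime.date(2024, 1, 1)):
    """Render `genes` as a GenBank record held in memory (hypothetical helper).

    `genes` is assumed to behave like pyrodigal.Genes; only the documented
    write_genbank(file, sequence_id, division='BCT', date=None) method is used.
    The fixed default date is an arbitrary example that keeps the output
    reproducible, whereas date=None would stamp the current date.
    """
    buffer = io.StringIO()  # io.StringIO is a subclass of io.TextIOBase
    genes.write_genbank(buffer, sequence_id, division="BCT", date=date)
    return buffer.getvalue()
```

Passing an explicit date is mainly useful for tests or pipelines that compare outputs between runs.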
- write_genes(file, sequence_id, width=70)
Write the nucleotide sequences of the genes to file in FASTA format.
- Parameters:
file (io.TextIOBase) – A file opened in text mode to which the nucleotide sequences are written.
sequence_id (str) – The identifier of the sequence these genes were extracted from.
width (int) – The line width at which sequences are wrapped. Prodigal uses 70 for nucleotide sequences.
- Returns:
int
– The number of bytes written to the file.
Changed in version 2.0.0: Replaced the optional prefix argument with sequence_id.
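The width parameter only controls how sequence lines are wrapped. The sketch below illustrates that wrapping in plain Python; wrap_sequence and fasta_record are hypothetical helpers, and the bare >record_id header is a simplification, since the header lines actually written by write_genes are not reproduced here.

```python
def wrap_sequence(sequence, width=70):
    """Split `sequence` into lines of at most `width` characters."""
    if width <= 0:
        raise ValueError("width must be positive")
    return [sequence[i:i + width] for i in range(0, len(sequence), width)]


def fasta_record(record_id, sequence, width=70):
    """Format one FASTA record, wrapping the sequence at `width` columns.

    The header is simplified to the bare identifier (an assumption made
    for illustration only).
    """
    lines = [f">{record_id}"]
    lines.extend(wrap_sequence(sequence, width))
    return "\n".join(lines) + "\n"
```

A 150 bp gene wrapped at the default width of 70 gives two full lines followed by a line of 10 bases.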
- write_gff(file, sequence_id, header=True, include_translation_table=False)
Write the genes to file in General Feature Format (GFF).
- Parameters:
file (io.TextIOBase) – A file opened in text mode to which the features are written.
sequence_id (str) – The identifier of the sequence these genes were extracted from. Used in the first column of the GFF-formatted output.
header (bool) – True to write a GFF header line, False otherwise.
include_translation_table (bool) – True to write the translation table used to predict the genes in the GFF attributes, False otherwise. Useful for genes predicted in metagenomic mode, since the different metagenomic models use different translation tables.
- Returns:
int
– The number of bytes written to the file.
Changed in version 2.0.0: Replaced the optional prefix argument with sequence_id.
New in version 3.0.0: The include_translation_table argument.
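When several sequences are processed, their features can be collected in a single GFF file by writing the header line only for the first record. This is a minimal sketch assuming each genes object is a pyrodigal.Genes; write_gff_multi is a hypothetical helper, and tying include_translation_table to the meta attribute is one possible design choice, motivated by the different translation tables of the metagenomic models.

```python
def write_gff_multi(records, file):
    """Write several gene sets to one GFF file (hypothetical helper).

    `records` is an iterable of (sequence_id, genes) pairs, where each `genes`
    is assumed to behave like pyrodigal.Genes: it provides the documented
    write_gff(file, sequence_id, header=True, include_translation_table=False)
    method and a boolean `meta` attribute. Returns the sum of the counts
    reported by the individual write_gff calls.
    """
    total = 0
    for index, (sequence_id, genes) in enumerate(records):
        total += genes.write_gff(
            file,
            sequence_id,
            # write the GFF header line only once, before the first record
            header=(index == 0),
            # metagenomic models differ in translation table, so record the
            # table in the attributes when the genes come from meta mode
            include_translation_table=genes.meta,
        )
    return total
```

In practice file would be something like open("predictions.gff", "w"), and records a list of (contig name, genes) pairs.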
- write_scores(file, sequence_id, header=True)
Write the start scores to file in tabular format.
- Parameters:
file (io.TextIOBase) – A file opened in text mode to which the scores are written.
sequence_id (str) – The identifier of the sequence these genes were extracted from.
header (bool) – True to write a header line, False otherwise.
- Returns:
int
– The number of bytes written to the file.
New in version 0.7.0.
New in version 2.0.0: The sequence_id argument.
- write_translations(file, sequence_id, width=60, translation_table=None, include_stop=True)
Write the protein sequences of the genes to file in FASTA format.
- Parameters:
file (io.TextIOBase) – A file opened in text mode to which the protein sequences are written.
sequence_id (str) – The identifier of the sequence these genes were extracted from.
width (int) – The line width at which sequences are wrapped. Prodigal uses 60 for protein sequences.
translation_table (int, optional) – A different translation table to use to translate the genes. If None is given, the translation table from the training info is used.
include_stop (bool) – Pass False to disable translating the STOP codon into a star character (*) for complete genes. True keeps the default behaviour of Prodigal; however, the star often does not play nicely with other programs or libraries that use the FASTA file for downstream processing.
- Returns:
int
– The number of bytes written to the file.
Changed in version 2.0.0: Replaced the optional prefix argument with sequence_id.
New in version 3.0.0: The include_stop argument.
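For downstream tools that may not accept the trailing star, include_stop=False can be combined with an in-memory buffer. This is a sketch assuming genes is a pyrodigal.Genes (for example the result of GeneFinder.find_genes); translations_fasta is a hypothetical name, and only the documented write_translations parameters are used.

```python
import io


def translations_fasta(genes, sequence_id, include_stop=False):
    """Return the protein FASTA written by genes.write_translations.

    Hypothetical helper: `genes` is assumed to behave like pyrodigal.Genes.
    STOP codons are left out by default (include_stop=False) so the text
    can be passed to tools that do not expect '*' in protein sequences.
    """
    buffer = io.StringIO()  # any io.TextIOBase works as the target file
    genes.write_translations(buffer, sequence_id, include_stop=include_stop)
    return buffer.getvalue()
```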
- class pyrodigal.Gene
A single raw gene found by Prodigal within a DNA sequence.
Caution
The gene coordinates follow the conventions of Prodigal, not Python: they are 1-based and end-inclusive. To index the original sequence with a gene object, remember to switch back to zero-based coordinates: sequence[gene.begin-1:gene.end].
New in version 0.5.4.
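The coordinate conversion from the caution can be wrapped in a small helper that also checks bounds. gene_slice is a hypothetical function taking plain integers (such as gene.begin and gene.end); it returns the region on the forward strand, so for a reverse-strand gene the result would still need to be reverse-complemented, which Gene.sequence() takes care of.

```python
def gene_slice(sequence, begin, end):
    """Return the region of `sequence` covered by a gene (hypothetical helper).

    `begin` and `end` follow Prodigal's convention (1-based, end-inclusive),
    so the Python slice starts at begin - 1 and stops at end.
    """
    if not 1 <= begin <= end <= len(sequence):
        raise ValueError("coordinates out of range")
    return sequence[begin - 1:end]
```

For example, a gene reported at begin=3, end=11 covers 11 - 3 + 1 = 9 bases, and gene_slice("CCATGAAATAGCC", 3, 11) returns "ATGAAATAG".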
- confidence()
Estimate the confidence of the prediction.
- Returns:
float
– A confidence percentage (between 0 and 100).
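Since confidence() returns a percentage, it can be used to keep only the most reliable predictions. A minimal sketch: confident_genes is a hypothetical helper that accepts any iterable of objects with a confidence() method (such as a pyrodigal.Genes), and the threshold of 90 is an arbitrary example value rather than a recommended cut-off.

```python
def confident_genes(genes, threshold=90.0):
    """Keep the genes whose confidence() is at least `threshold`.

    Hypothetical helper; the default threshold is an arbitrary example.
    """
    if not 0.0 <= threshold <= 100.0:
        raise ValueError("threshold must be a percentage between 0 and 100")
    return [gene for gene in genes if gene.confidence() >= threshold]
```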
- sequence()
Build the nucleotide sequence of this predicted gene.
This method takes care of reverse-complementing the sequence if the gene is on the reverse strand.
Note
Since Pyrodigal uses a generic symbol for unknown nucleotides, any unknown characters in the original sequence are rendered as N.
New in version 0.5.4.
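The behaviour described for sequence(), reverse-complementing reverse-strand genes and rendering unknown nucleotides as N, can be imitated on a plain string. This sketch does not reflect how Pyrodigal works internally (it stores a compressed sequence); reverse_complement is a hypothetical helper that treats any symbol other than A, C, G or T as unknown.

```python
# Complement of each known nucleotide; any other symbol counts as unknown.
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(sequence):
    """Reverse-complement `sequence`, rendering unknown symbols as 'N'."""
    return "".join(_COMPLEMENT.get(base, "N") for base in reversed(sequence.upper()))
```

With this helper, reverse_complement("ATGC") gives "GCAT", and an ambiguity code such as Y comes back as N.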
- translate(translation_table=None, unknown_residue='X', include_stop=True)
Translate the predicted gene into a protein sequence.
- Parameters:
translation_table (int, optional) – An alternative translation table to use to translate the gene. Use None (the default) to translate using the translation table this gene was found with.
unknown_residue (str) – A single character to use for residues translated from codons with unknown nucleotides.
include_stop (bool) – Pass False to disable translating the STOP codon into a star character (*) for complete genes. True keeps the default behaviour of Prodigal; however, the star often does not play nicely with other programs or libraries that use the protein sequence for downstream processing.
- Returns:
str
– The protein sequence as a string, using the right translation table and the standard one-letter amino acid alphabet.
- Raises:
ValueError – When translation_table is not a valid genetic code number.
New in version 3.0.0: The include_stop keyword argument.
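To make the unknown_residue and include_stop semantics concrete, the sketch below translates a coding sequence with the standard genetic code (NCBI translation table 1) only. It is an illustration, not Pyrodigal's translator: translate_cds is a hypothetical name, other translation tables are not supported, and start codons are not special-cased, which Prodigal may handle differently.

```python
# NCBI translation table 1 (standard code), codons ordered by bases T, C, A, G.
_BASES = "TCAG"
_TABLE_1 = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate_cds(sequence, unknown_residue="X", include_stop=True):
    """Translate a coding sequence with the standard genetic code.

    Codons containing anything other than A, C, G or T become
    `unknown_residue`; a trailing '*' is dropped when include_stop is False.
    Start codons are not special-cased (GTG gives V, TTG gives L).
    """
    if len(unknown_residue) != 1:
        raise ValueError("unknown_residue must be a single character")
    sequence = sequence.upper()
    protein = []
    # step through complete codons only; a trailing partial codon is ignored
    for i in range(0, len(sequence) - 2, 3):
        codon = sequence[i:i + 3]
        if all(base in _BASES for base in codon):
            a, b, c = (_BASES.index(base) for base in codon)
            protein.append(_TABLE_1[16 * a + 4 * b + c])
        else:
            protein.append(unknown_residue)
    if not include_stop and protein and protein[-1] == "*":
        protein.pop()
    return "".join(protein)
```

For instance, "ATGNAATAG" translates to "MX*" with the default unknown residue, and to "MX" with include_stop=False.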
- cscore
The coding score for the start node, based on 6-mer usage.
New in version 0.5.1.
- Type: float
- rbs_motif
The motif of the Ribosome Binding Site.
Possible non-None values are GGA/GAG/AGG, 3Base/5BMM, 4Base/6BMM, AGxAG, GGxGG, AGGAG(G)/GGAGG, AGGA, AGGA/GGAG/GAGG, GGAG/GAGG, AGGAG/GGAGG, AGGAG, GGAGG or AGGAGG.
- Type: str, optional
- rbs_spacer
The number of bases between the RBS and the start codon, reported as a range.
Possible non-None values are 3-4bp, 5-10bp, 11-12bp or 13-15bp.
- Type: str, optional
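Because rbs_spacer is reported as a string, numeric comparisons need a small parsing step. parse_rbs_spacer is a hypothetical helper that assumes the <low>-<high>bp format of the values listed above and passes None through unchanged.

```python
import re


def parse_rbs_spacer(spacer):
    """Convert an rbs_spacer string such as '5-10bp' into (low, high).

    Hypothetical helper; returns None when no spacer was reported.
    """
    if spacer is None:
        return None
    match = re.fullmatch(r"(\d+)-(\d+)bp", spacer)
    if match is None:
        raise ValueError(f"unexpected spacer format: {spacer!r}")
    return int(match.group(1)), int(match.group(2))
```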
- start_type
The start codon of this gene.
Can be one of ATG, GTG or TTG, or Edge if the GeneFinder was initialized in open ends mode and the gene starts right at the beginning of the input sequence.
- Type: