Genes

class pyrodigal.Genes

A list of raw genes found by Prodigal in a single sequence.

New in version 0.5.4.

sequence

The compressed input sequence for which the gene predictions were made.

Type

pyrodigal.Sequence

training_info

A reference to the training info these predictions were obtained with.

Type

pyrodigal.TrainingInfo

nodes

A collection of raw nodes found in the input sequence.

Type

pyrodigal.Nodes

write_genes(file, prefix='gene_', width=70)

Write nucleotide sequences of genes to file in FASTA format.

Parameters
  • file (io.TextIOBase) – A file opened in text mode to which the nucleotide sequences are written.

  • prefix (str) – The prefix to use to make identifiers for each predicted gene.

  • width (int) – The width to use to wrap sequence lines. Prodigal uses 70 for nucleotide sequences.

Returns

int – The number of bytes written to the file.
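As a minimal sketch of how write_genes can be used with an in-memory file, the helper below captures the FASTA text in an io.StringIO buffer. The names genes_to_fasta and _StubGenes are hypothetical and not part of pyrodigal; the stub exists only so the example runs without the library, and its header format is deliberately simplified.

```python
import io


def genes_to_fasta(genes, prefix="gene_", width=70):
    """Return (fasta_text, count) from any object exposing write_genes()."""
    buffer = io.StringIO()  # an in-memory file open in text mode
    written = genes.write_genes(buffer, prefix=prefix, width=width)
    return buffer.getvalue(), written


class _StubGenes:
    """Hypothetical stand-in for pyrodigal.Genes (simplified headers)."""

    def __init__(self, sequences):
        self._sequences = sequences

    def write_genes(self, file, prefix="gene_", width=70):
        total = 0
        for index, seq in enumerate(self._sequences, start=1):
            total += file.write(f">{prefix}{index}\n")
            # Wrap the sequence so that no line exceeds `width` characters.
            for start in range(0, len(seq), width):
                total += file.write(seq[start:start + width] + "\n")
        return total


fasta, written = genes_to_fasta(_StubGenes(["ATGAAATAA"]), prefix="demo_", width=6)
print(fasta, end="")
```

With a real pyrodigal.Genes object, the same helper should work unchanged, since it relies only on the write_genes signature documented above.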

write_gff(file, prefix='gene_', width=60)

Write the genes to file in General Feature Format.

Parameters
  • file (io.TextIOBase) – A file opened in text mode to which the features are written.

  • prefix (str) – The prefix to use to make identifiers for each predicted gene.

Returns

int – The number of bytes written to the file.

write_scores(file, header=True)

Write the start scores to file in tabular format.

Parameters
  • file (io.TextIOBase) – A file opened in text mode to which the scores are written.

  • header (bool) – True to write a header line, False otherwise.

Returns

int – The number of bytes written to the file.

New in version 0.7.0.

write_translations(file, prefix='gene_', width=60, translation_table=None)

Write protein sequences of genes to file in FASTA format.

Parameters
  • file (io.TextIOBase) – A file opened in text mode to which the protein sequences are written.

  • prefix (str) – The prefix to use to make identifiers for each predicted gene.

  • width (int) – The width to use to wrap sequence lines. Prodigal uses 60 for protein sequences.

  • translation_table (int, optional) – A different translation table to use to translate the genes. If None is given, use the one from the training info.

Returns

int – The number of bytes written to the file.
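The writer methods above can be combined to save every output for one sequence. The following is a sketch under stated assumptions: write_outputs, the file names, and _StubGenes are hypothetical, and the stub only emits placeholder lines so that the example runs without pyrodigal installed.

```python
import pathlib
import tempfile


def write_outputs(genes, directory, prefix="gene_"):
    """Write nucleotide, protein and GFF outputs; return the reported sizes."""
    directory = pathlib.Path(directory)
    sizes = {}
    with open(directory / "genes.fna", "w") as handle:
        sizes["genes.fna"] = genes.write_genes(handle, prefix=prefix)
    with open(directory / "proteins.faa", "w") as handle:
        sizes["proteins.faa"] = genes.write_translations(handle, prefix=prefix)
    with open(directory / "genes.gff", "w") as handle:
        sizes["genes.gff"] = genes.write_gff(handle, prefix=prefix)
    return sizes


class _StubGenes:
    """Hypothetical stand-in for pyrodigal.Genes with placeholder output."""

    def write_genes(self, file, prefix="gene_", width=70):
        return file.write(f">{prefix}1\nATGAAATAA\n")

    def write_translations(self, file, prefix="gene_", width=60,
                           translation_table=None):
        return file.write(f">{prefix}1\nMK\n")

    def write_gff(self, file, prefix="gene_"):
        return file.write("##gff-version 3\n")


with tempfile.TemporaryDirectory() as tmp:
    sizes = write_outputs(_StubGenes(), tmp)
    names = sorted(path.name for path in pathlib.Path(tmp).iterdir())
```

Each method returns the count it wrote, so the returned dictionary can serve as a quick check that no output is empty.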

class pyrodigal.Gene

A single raw gene found by Prodigal within a DNA sequence.

New in version 0.5.4.

__init__(*args, **kwargs)

confidence()

Estimate the confidence of the prediction.

Returns

float – A confidence percentage (between 0 and 100).
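Because confidence() returns a percentage, it lends itself to simple filtering. The sketch below keeps genes at or above a threshold; confident_genes and _StubGene are hypothetical names, and the 90.0 cutoff is an arbitrary illustration rather than a recommended value.

```python
def confident_genes(genes, min_confidence=90.0):
    """Keep the genes whose confidence() is at least min_confidence (0-100)."""
    return [gene for gene in genes if gene.confidence() >= min_confidence]


class _StubGene:
    """Hypothetical stand-in for pyrodigal.Gene exposing confidence() only."""

    def __init__(self, name, confidence):
        self.name = name
        self._confidence = confidence

    def confidence(self):
        return self._confidence


genes = [_StubGene("a", 99.5), _StubGene("b", 42.0), _StubGene("c", 90.0)]
kept_names = [gene.name for gene in confident_genes(genes, min_confidence=90.0)]
print(kept_names)
```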

sequence()

Build the nucleotide sequence of this predicted gene.

New in version 0.5.4.

translate(translation_table=None, unknown_residue='X')

Translate the predicted gene into a protein sequence.

Parameters
  • translation_table (int, optional) – An alternative translation table to use to translate the gene. Use None (the default) to translate using the translation table this gene was found with.

  • unknown_residue (str) – A single character to use for residues translated from codons with unknown nucleotides.

Returns

str – The protein sequence as a string using the right translation table and the standard single letter alphabet for proteins.

Raises

ValueError – when translation_table is not a valid genetic code number.

begin

The coordinate at which the gene begins.

Type

int

cscore

The coding score for the start node, based on 6-mer usage.

New in version 0.5.1.

Type

float

end

The coordinate at which the gene ends.

Type

int

gc_cont

The GC content of the gene (between 0 and 1).

Type

float
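To illustrate what a GC content between 0 and 1 means, the sketch below computes the fraction of G and C bases in a plain string. gc_fraction is a hypothetical helper, not pyrodigal's implementation; in particular, how the library treats ambiguous bases is not stated here, and this version simply counts them in the denominator.

```python
def gc_fraction(sequence):
    """Fraction of G and C bases in a nucleotide string, between 0 and 1."""
    if not sequence:
        return 0.0  # avoid dividing by zero on an empty sequence
    upper = sequence.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


print(gc_fraction("ATGC"))
```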

partial_begin

Whether the gene overlaps with the start of the sequence.

Type

bool

partial_end

Whether the gene overlaps with the end of the sequence.

Type

bool

rbs_motif

The motif of the Ribosome Binding Site.

Possible non-None values are GGA/GAG/AGG, 3Base/5BMM, 4Base/6BMM, AGxAG, GGxGG, AGGAG(G)/GGAGG, AGGA, AGGA/GGAG/GAGG, GGAG/GAGG, AGGAG/GGAGG, AGGAG, GGAGG or AGGAGG.

Type

str, optional

rbs_spacer

The number of bases between the RBS and the CDS.

Possible non-None values are 3-4bp, 5-10bp, 11-12bp or 13-15bp.

Type

str, optional
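The spacer is reported as a string range rather than a number, so numeric comparisons need a small parsing step. The sketch below converts the values listed above into integer bounds; parse_rbs_spacer is a hypothetical helper, not part of pyrodigal.

```python
import re

# Matches strings such as "5-10bp" and captures both bounds.
_SPACER = re.compile(r"^(\d+)-(\d+)bp$")


def parse_rbs_spacer(spacer):
    """Convert a spacer string like '5-10bp' to (5, 10); pass None through."""
    if spacer is None:
        return None
    match = _SPACER.match(spacer)
    if match is None:
        raise ValueError(f"unexpected spacer value: {spacer!r}")
    return int(match.group(1)), int(match.group(2))


print(parse_rbs_spacer("5-10bp"))
```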

rscore

The score for the RBS motif.

New in version 0.5.1.

Type

float

score

The gene score, computed as the sum of the coding score and the start codon score.

New in version 0.7.3.

Type

float

sscore

The score for the strength of the start codon.

New in version 0.5.1.

Type

float

start_node

The start node at the beginning of this gene.

Type

Node

start_type

The start codon of this gene.

Can be one of ATG, GTG or TTG, or Edge if the OrfFinder has been initialized in open ends mode and the gene starts right at the beginning of the input sequence.

Type

str

stop_node

The stop node at the end of this gene.

Type

Node

strand

-1 if the gene is on the reverse strand, +1 otherwise.

Type

int
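The begin, end and strand attributes together locate a gene on the input sequence. The sketch below shows one way to recover the gene span from a plain string, assuming 1-based inclusive coordinates with begin no greater than end on both strands; that convention is an assumption and is not stated in this reference. gene_span is a hypothetical helper, and Gene.sequence() already provides the nucleotide sequence directly.

```python
# Translation table mapping each base to its complement.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def gene_span(dna, begin, end, strand):
    """Extract dna[begin..end] (assumed 1-based, inclusive) on the given strand."""
    segment = dna[begin - 1:end]
    if strand == -1:
        # Reverse strand: complement each base, then reverse the result.
        return segment.translate(_COMPLEMENT)[::-1]
    return segment


print(gene_span("AAATGCCC", 3, 5, 1))
print(gene_span("AAATGCCC", 3, 5, -1))
```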

translation_table

The translation table used to find the gene.

Type

int

tscore

The score for the codon kind (ATG/GTG/TTG).

New in version 0.5.1.

Type

float

uscore

The score for the upstream regions.

New in version 0.5.1.

Type

float