Sequence

class pyrodigal.Sequence

A digitized input sequence.

gc

The GC content of the sequence, as a fraction (between 0 and 1). As in Prodigal, the denominator of the GC% is the total length of the sequence, so unknown bases are counted in the length rather than excluded from it.

Type:

float

gc_known

The GC content of the sequence, taking only known nucleotides into account.

Type:

float
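The difference between gc and gc_known comes down to the denominator. The sketch below is a plain-Python illustration of the two definitions, not the code used by the library. The helper name gc_fractions is hypothetical, and it assumes that every character other than A, C, G or T counts as unknown.

```python
def gc_fractions(sequence: str) -> tuple[float, float]:
    """Return (gc, gc_known) for a nucleotide string.

    Hypothetical helper for illustration only; it assumes any character
    other than A, C, G or T is an unknown base.
    """
    seq = sequence.upper()
    gc_count = sum(1 for base in seq if base in "GC")
    known = sum(1 for base in seq if base in "ACGT")
    # gc: denominator is the full sequence length, unknown bases included.
    gc = gc_count / len(seq) if seq else 0.0
    # gc_known: denominator is the number of known nucleotides only.
    gc_known = gc_count / known if known else 0.0
    return gc, gc_known


# 10 bases in total, 4 of them unknown, 4 of them G or C.
gc, gc_known = gc_fractions("ATGCNNNNGC")
```

For a sequence without unknown bases the two values are identical; they diverge as the share of unknown bases grows.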

masks

A list of masked regions within the sequence. It will be empty if the sequence was created with mask=False.

Type:

Masks

unknown

The number of unknown bases (encoded as an N) in the sequence.

Type:

int

Changed in version 2.0.0: Removed the from_string and from_bytes constructors.

__init__()

Create a new Sequence object from a nucleotide sequence.

Parameters:
  • sequence (str, bytes or Sequence) – The sequence to read from. bytes and other bytes-like buffers are treated as ASCII-encoded strings.

  • mask (bool) – Enable region-masking for spans of unknown characters, preventing genes from being built across them.

  • mask_size (int) – The minimum number of contiguous unknown nucleotides required to build a mask.
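To make the interaction between mask and mask_size concrete, here is a minimal plain-Python sketch of how runs of unknown nucleotides could be turned into masked regions. It is not the library's implementation: the function find_masks is hypothetical, unknown nucleotides are assumed to be written as N, and the half-open (begin, end) coordinates are an assumption that may not match the convention used by Masks.

```python
import re


def find_masks(sequence: str, mask_size: int) -> list[tuple[int, int]]:
    """Return half-open (begin, end) spans of at least `mask_size` N's.

    Hypothetical helper for illustration; the coordinate convention is
    an assumption, not taken from the library.
    """
    if mask_size < 1:
        raise ValueError("mask_size must be at least 1")
    # Match runs of N (either case) that are mask_size or longer.
    pattern = re.compile(r"[Nn]{%d,}" % mask_size)
    return [(m.start(), m.end()) for m in pattern.finditer(sequence)]


# The run of six N's is long enough to be masked; the run of two is not.
masks = find_masks("ATGNNNNNNCGTNNAT", mask_size=5)
```

Lowering mask_size in this sketch causes shorter runs of N to be masked as well, which is the trade-off the parameter controls.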

__len__()

Return the number of nucleotides in the sequence.