pydna.utils

Miscellaneous functions.

pydna.utils.three_frame_orfs(dna: str, limit: int = 100, startcodons: tuple = ('ATG',), stopcodons: tuple = ('TAG', 'TAA', 'TGA'))[source]: Overlapping orfs in three frames.

pydna.utils.shift_location(original_location, shift, lim)[source]: docstring.

pydna.utils.shift_feature(feature, shift, lim)[source]: Return a new feature with shifted location.

pydna.utils.smallest_rotation(s)[source]

Smallest rotation of a string.

Algorithm described in Duval, Jean-Pierre. 1983. Factorizing Words over an Ordered Alphabet. Journal of Algorithms 4 (4) (December 1): 363–381, and in Algorithms on strings and sequences based on Lyndon words, David Eppstein 2011. https://gist.github.com/dvberkel/1950267

Examples

>>> from pydna.utils import smallest_rotation
>>> smallest_rotation("taaa")
'aaat'
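
As an illustration of the idea, the smallest rotation can be found with a Duval-style scan over the doubled string. This is a minimal standalone sketch and not necessarily how pydna implements it; the name least_rotation is hypothetical.

```python
def least_rotation(s: str) -> str:
    """Return the lexicographically smallest rotation of s.

    Hypothetical helper for illustration only; not part of pydna.
    """
    n = len(s)
    doubled = s + s  # every rotation of s is a length-n window of s + s
    i = 0
    best = 0
    while i < n:
        best = i
        j, k = i + 1, i
        # Extend the scan while the text does not fall below the candidate.
        while j < 2 * n and doubled[k] <= doubled[j]:
            k = i if doubled[k] < doubled[j] else k + 1
            j += 1
        # Jump past the repeated copies of the factor that was just found.
        while i <= k:
            i += j - k
    return doubled[best:best + n]
```

A brute-force check such as min(s[i:] + s[:i] for i in range(len(s))) gives the same answer and is handy for testing, at quadratic cost.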

pydna.utils.cai(seq: str, organism: str = 'sce', weights: dict = _weights)[source]: docstring.

pydna.utils.rarecodons(seq: str, organism='sce')[source]: docstring.

pydna.utils.express(seq: str, organism='sce')[source]

docstring.

NOT IMPLEMENTED YET

pydna.utils.open_folder(pth)[source]: docstring.

pydna.utils.rc(sequence: StrOrBytes) → StrOrBytes[source]

Reverse complement.

Accepts mixed DNA/RNA.

pydna.utils.complement(sequence: str)[source]

Complement.

Accepts mixed DNA/RNA.
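
The two operations can be sketched for plain strings with a translation table. The mapping below is an assumption made for illustration (A pairs with T, and U is treated like T); the table pydna actually uses, including any ambiguity codes, may differ. The names complement_sketch and rc_sketch are hypothetical.

```python
# Assumed mapping for illustration: A -> T, and U is complemented like T.
_COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def complement_sketch(sequence: str) -> str:
    """Complement each base, keeping order and letter case."""
    return sequence.translate(_COMPLEMENT)


def rc_sketch(sequence: str) -> str:
    """Complement each base, then reverse the result."""
    return complement_sketch(sequence)[::-1]
```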

pydna.utils.identifier_from_string(s: str) → str[source]

Return a valid Python identifier based on the argument s, or an empty string.

pydna.utils.flatten(*args) → List[source]

Flatten an iterable of iterables down to str, bytes, bytearray or any of the pydna or Biopython seq objects.
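
A rough sketch of this kind of flattening, restricted to built-in types, is shown below. It treats str, bytes and bytearray as leaves; the real function also stops at pydna and Biopython sequence objects, which this hypothetical flatten_sketch does not know about.

```python
from collections.abc import Iterable

# Types that are iterable but should be kept whole (assumption: built-ins only).
_LEAF_TYPES = (str, bytes, bytearray)


def flatten_sketch(*args) -> list:
    """Recursively flatten nested iterables into a flat list of leaves."""
    out = []
    for item in args:
        if isinstance(item, _LEAF_TYPES) or not isinstance(item, Iterable):
            out.append(item)  # a leaf: keep it as it is
        else:
            for sub in item:  # a container: flatten each member in turn
                out.extend(flatten_sketch(sub))
    return out
```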

pydna.utils.seq31(seq)[source]

Turn a three letter code protein sequence into one with one letter code.

The single input argument ‘seq’ should be a protein sequence using three letter codes, as a Python string.

This function returns the amino acid sequence as a string using the one letter amino acid codes. Output follows the IUPAC standard (including ambiguous characters B for “Asx”, J for “Xle” and X for “Xaa”, and also U for “Sel” and O for “Pyl”) plus an asterisk for the terminator “Ter”.

Any unknown three letter code (including possible gap characters) is changed into ‘X’.

Examples

>>> from Bio.SeqUtils import seq3
>>> seq3("MAIVMGRWKGAR*")
'MetAlaIleValMetGlyArgTrpLysGlyAlaArgTer'
>>> from pydna.utils import seq31
>>> seq31('MetAlaIleValMetGlyArgTrpLysGlyAlaArgTer')
'M  A  I  V  M  G  R  W  K  G  A  R  *'
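
The conversion can be sketched as a lookup over consecutive three-letter chunks. This is a hypothetical standalone version, not pydna's code: the table, the function name three_to_one and the sep parameter are assumptions, with the two-space separator chosen to match the output shown above.

```python
# Three letter to one letter codes. "Sel" follows the text above; "Sec" is the
# abbreviation more commonly used for selenocysteine (both included by assumption).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Asx": "B", "Glx": "Z", "Xle": "J", "Xaa": "X",
    "Sec": "U", "Sel": "U", "Pyl": "O", "Ter": "*",
}


def three_to_one(seq: str, sep: str = "  ") -> str:
    """Map each three letter code to one letter; unknown codes become 'X'."""
    usable = len(seq) - len(seq) % 3  # ignore a trailing incomplete chunk
    chunks = [seq[i:i + 3].title() for i in range(0, usable, 3)]
    return sep.join(THREE_TO_ONE.get(chunk, "X") for chunk in chunks)
```

Passing sep="" gives a compact one letter string instead of the spaced form.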

pydna.utils.randomRNA(length, maxlength=None)[source]: docstring.

pydna.utils.randomDNA(length, maxlength=None)[source]: docstring.

pydna.utils.randomORF(length, maxlength=None)[source]: docstring.

pydna.utils.randomprot(length, maxlength=None)[source]: docstring.

pydna.utils.eq(*args, **kwargs)[source]

Compare two or more DNA sequences for equality.

Compares two or more DNA sequences for equality i.e. if they represent the same double stranded DNA molecule.

Parameters:

args (iterable) – Sequences to compare. Each can be a string, a Biopython Seq or SeqRecord, a Dseqrecord or a dsDNA object.
circular (bool, optional) – If True, consider all molecules circular; if False, consider all molecules linear.
linear (bool, optional) – If True, consider all molecules linear; if False, consider all molecules circular.

Returns:

eq – Returns True or False

Return type:

bool

Notes

Two linear sequences are considered equal if either:

They have the same sequence (case insensitive)
One sequence is the reverse complement of the other

Two circular sequences are considered equal if they are circular permutations, meaning that they have the same length and either:

One sequence can be found in the concatenation of the other sequence with itself.
The reverse complement of one sequence can be found in the concatenation of the other sequence with itself.

The topology for the comparison can be set by passing one of the keywords linear or circular as True or False.

If circular or linear is not set, it will be deduced from the topology of each sequence for sequences that have a linear or circular attribute (like Dseq and Dseqrecord).
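
The two rules above can be sketched for a pair of plain strings as follows. This is only an illustration of the comparison logic under the assumption of unambiguous A, C, G and T; it does not handle the sequence objects or topology deduction that eq supports, and the helper names are hypothetical.

```python
_COMP = str.maketrans("ACGT", "TGCA")


def _rc(s: str) -> str:
    """Reverse complement of an upper-case A/C/G/T string."""
    return s.translate(_COMP)[::-1]


def linear_equal(a: str, b: str) -> bool:
    """Same sequence, or one is the reverse complement of the other."""
    a, b = a.upper(), b.upper()
    return a == b or a == _rc(b)


def circular_equal(a: str, b: str) -> bool:
    """Same length, and a (or its reverse complement) is a rotation of b."""
    a, b = a.upper(), b.upper()
    if len(a) != len(b):
        return False
    doubled = b + b  # contains every rotation of b as a substring
    return a in doubled or _rc(a) in doubled
```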

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> from pydna.utils import eq
>>> eq("aaa","AAA")
True
>>> eq("aaa","AAA","TTT")
True
>>> eq("aaa","AAA","TTT","tTt")
True
>>> eq("aaa","AAA","TTT","tTt", linear=True)
True
>>> eq("Taaa","aTaa", linear = True)
False
>>> eq("Taaa","aTaa", circular = True)
True
>>> a=Dseqrecord("Taaa")
>>> b=Dseqrecord("aTaa")
>>> eq(a,b)
False
>>> eq(a,b,circular=True)
True
>>> a=a.looped()
>>> b=b.looped()
>>> eq(a,b)
True
>>> eq(a,b,circular=False)
False
>>> eq(a,b,linear=True)
False
>>> eq(a,b,linear=False)
True
>>> eq("ggatcc","GGATCC")
True
>>> eq("ggatcca","GGATCCa")
True
>>> eq("ggatcca","tGGATCC")
True

pydna.utils.cuts_overlap(left_cut, right_cut, seq_len)[source]

pydna.utils.location_boundaries(loc: SimpleLocation | CompoundLocation)[source]

pydna.utils.locations_overlap(loc1: SimpleLocation | CompoundLocation, loc2: SimpleLocation | CompoundLocation, seq_len)[source]

pydna.utils.sum_is_sticky(three_prime_end: tuple[str, str], five_prime_end: tuple[str, str], partial: bool = False) → int[source]: Return the overlap length if the 3’ end of seq1 and the 5’ end of seq2 are sticky and compatible for ligation. Return 0 if they are not compatible.

pydna.utils.limit_iterator(iterator, limit)[source]: Call the function with an iterator to raise an error if the number of items is greater than the limit.
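
One way to get this behaviour is a generator that counts items as it passes them through. The sketch below is an assumption about the general approach, not pydna's code; in particular the exception type (ValueError) and the name limited are hypothetical.

```python
def limited(iterator, limit):
    """Yield items from iterator, raising once more than limit items appear."""
    for count, item in enumerate(iterator, start=1):
        if count > limit:
            # Assumed error type; the real function may raise something else.
            raise ValueError(f"More than {limit} items in iterator")
        yield item
```

Because the check is lazy, the error only surfaces when the consumer actually reaches item number limit + 1.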

pydna.utils.create_location(start: int, end: int, lim: int, strand: int | None = None) → Location[source]

Create a location object from a start and end position. If the end position is less than the start position, the location spans the origin of a circular sequence. Negative positions are also accepted.

Parameters:

start (int) – The start position of the location.
end (int) – The end position of the location.
lim (int) – The length of the sequence.
strand (int, optional) – The strand of the location. None, 1 or -1.

Returns:

location – The location object. Can be a SimpleLocation or a CompoundLocation if the feature spans the origin of a circular sequence.

Return type:

Location

Examples

>>> from pydna.utils import create_location
>>> str(create_location(0, 5, 10,-1))
'[0:5](-)'
>>> str(create_location(0, 5, 10,+1))
'[0:5](+)'
>>> str(create_location(0, 5, 10))
'[0:5]'
>>> str(create_location(8, 2, 10))
'join{[8:10], [0:2]}'
>>> str(create_location(8, 2, 10,-1))
'join{[0:2](-), [8:10](-)}'
>>> str(create_location(-2, 2, 10))
'join{[8:10], [0:2]}'

Note this special case, 0 is the same as len(seq):

>>> str(create_location(5, 0, 10))
'[5:10]'

Note the special case where if start and end are the same, the location spans the entire sequence (it’s not empty):

>>> str(create_location(5, 5, 10))
'join{[5:10], [0:5]}'
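
The wrapping behaviour seen in these examples can be mimicked with plain tuples. The sketch below is inferred from the documented outputs only and returns (start, end) pairs rather than Biopython location objects; the function name wrapped_parts is hypothetical.

```python
def wrapped_parts(start: int, end: int, lim: int, strand=None) -> list:
    """Return the (start, end) parts of a location on a sequence of length lim."""
    start %= lim  # maps negative positions onto the sequence
    end %= lim
    if start < end:
        parts = [(start, end)]
    else:
        # The span crosses the origin: split it at the end of the sequence.
        parts = [(start, lim), (0, end)]
        # Drop an empty second part, e.g. when end is 0 (same as lim).
        parts = [p for p in parts if p[0] != p[1]]
    if strand == -1:
        parts.reverse()  # the examples list minus-strand parts in reverse order
    return parts
```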