pydna.utils
Miscellaneous functions.
- pydna.utils.three_frame_orfs(dna: str, limit: int = 100, startcodons: tuple = ('ATG',), stopcodons: tuple = ('TAG', 'TAA', 'TGA'))
Find overlapping open reading frames (ORFs) in three reading frames.
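A minimal sketch of a three-frame ORF scan, assuming that only the given strand is scanned, that each start codon is paired with the next in-frame stop codon (so nested ORFs sharing a stop codon overlap), and that limit is a minimum length in nucleotides. The name orfs_in_three_frames and the (frame, start, end) return format are hypothetical; this is not pydna's implementation.

```python
from typing import List, Sequence, Tuple


def orfs_in_three_frames(
    dna: str,
    limit: int = 100,
    startcodons: Sequence[str] = ("ATG",),
    stopcodons: Sequence[str] = ("TAG", "TAA", "TGA"),
) -> List[Tuple[int, int, int]]:
    """Hypothetical sketch: return (frame, start, end) for each ORF >= limit nt."""
    dna = dna.upper()
    found: List[Tuple[int, int, int]] = []
    for frame in (0, 1, 2):
        # Split this frame into complete codons only.
        codons = [dna[i:i + 3] for i in range(frame, len(dna) - 2, 3)]
        for s, codon in enumerate(codons):
            if codon not in startcodons:
                continue
            # Every start codon opens its own ORF, so ORFs may overlap.
            for e in range(s + 1, len(codons)):
                if codons[e] in stopcodons:
                    begin, end = frame + 3 * s, frame + 3 * (e + 1)
                    if end - begin >= limit:
                        found.append((frame, begin, end))
                    break  # stop at the first in-frame stop codon
    return found
```

For example, orfs_in_three_frames("ATGATGTAA", limit=6) reports two overlapping ORFs in frame 0, one from each ATG, both ending at the same TAA.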
- pydna.utils.smallest_rotation(s)
Smallest rotation of a string.
Algorithm described in Jean-Pierre Duval. 1983. Factorizing Words over an Ordered Alphabet. Journal of Algorithms 4 (4) (December 1): 363–381, and in Algorithms on strings and sequences based on Lyndon words, David Eppstein 2011. https://gist.github.com/dvberkel/1950267
Examples
>>> from pydna.utils import smallest_rotation
>>> smallest_rotation("taaa")
'aaat'
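The idea behind the cited algorithm can be sketched by running Duval's Lyndon factorization over s + s: the smallest rotation starts at the last factor boundary found before position len(s). This is a standalone sketch with the hypothetical name min_rotation, not pydna's code.

```python
def min_rotation(s: str) -> str:
    """Hypothetical sketch: lexicographically smallest rotation of s."""
    n = len(s)
    ss = s + s          # every rotation of s is a length-n window of ss
    i = 0               # start of the current Lyndon factor
    ans = 0             # best rotation start seen so far
    while i < n:
        ans = i
        j, k = i + 1, i
        # Extend the current (pre-)Lyndon word as far as possible.
        while j < 2 * n and ss[k] <= ss[j]:
            k = i if ss[k] < ss[j] else k + 1
            j += 1
        # Skip over the repeated Lyndon factors just found.
        while i <= k:
            i += j - k
    return ss[ans:ans + n]
```

The scan runs in linear time and, unlike comparing all n rotations, needs no sorting.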
- pydna.utils.identifier_from_string(s: str) → str
Return a valid Python identifier based on the argument s, or an empty string.
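One plausible way to derive an identifier from arbitrary text is sketched below. The name to_identifier and the exact rules (whitespace runs become underscores, other non-alphanumeric characters are dropped, an underscore is prefixed when the result starts with a digit or is a keyword) are assumptions for illustration, not pydna's documented behavior.

```python
import keyword
import re


def to_identifier(s: str) -> str:
    """Hypothetical sketch: turn s into a valid identifier or ''."""
    s = re.sub(r"\s+", "_", s.strip())    # whitespace runs -> "_"
    s = re.sub(r"[^0-9a-zA-Z_]", "", s)   # drop all other characters
    if s and (not s[0].isidentifier() or keyword.iskeyword(s)):
        s = "_" + s                        # fix leading digit or keyword
    return s
```

An empty result is returned when nothing usable remains, which mirrors the "or an empty string" case described above.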
- pydna.utils.flatten(*args) → List
Flatten an iterable of iterables down to str, bytes, bytearray, or any of the pydna or Biopython sequence objects, which are not flattened further.
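A recursive sketch of this kind of flattening is shown below. The name flatten_sketch is hypothetical, and recognizing sequence objects by the presence of a reverse_complement method is an assumption made so the sketch needs neither pydna nor Biopython.

```python
from collections.abc import Iterable
from typing import Any, List

# Types that are iterable but must be kept whole.
ATOMS = (str, bytes, bytearray)


def flatten_sketch(*args: Any) -> List[Any]:
    """Hypothetical sketch: flatten nested iterables, keeping atoms intact."""
    out: List[Any] = []
    for item in args:
        is_seq_object = hasattr(item, "reverse_complement")  # assumption
        if isinstance(item, Iterable) and not isinstance(item, ATOMS) and not is_seq_object:
            out.extend(flatten_sketch(*item))  # descend into the container
        else:
            out.append(item)                   # keep as a single element
    return out
```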
- pydna.utils.seq31(seq)
Turn a three letter code protein sequence into one with one letter code.
The single input argument ‘seq’ should be a protein sequence using three letter codes, as a Python string.
This function returns the amino acid sequence as a string using the one letter amino acid codes. The codes follow the IUPAC standard (including the ambiguous characters B for “Asx”, J for “Xle” and X for “Xaa”, and also U for “Sel” and O for “Pyl”), plus “Ter” for a terminator, which is returned as an asterisk.
Any unrecognized three letter code (including possible gap characters) is changed into ‘X’, the one letter code for ‘Xaa’.
Examples
>>> from Bio.SeqUtils import seq3
>>> seq3("MAIVMGRWKGAR*")
'MetAlaIleValMetGlyArgTrpLysGlyAlaArgTer'
>>> from pydna.utils import seq31
>>> seq31('MetAlaIleValMetGlyArgTrpLysGlyAlaArgTer')
'M A I V M G R W K G A R *'
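The conversion amounts to reading the input three characters at a time and looking each chunk up in a table. The sketch below uses the codes listed above; the names THREE_TO_ONE and three_to_one, the optional sep argument, and silently ignoring an incomplete trailing chunk are assumptions, not pydna's implementation.

```python
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    # Ambiguous and non-standard codes listed in the text above.
    "Asx": "B", "Xle": "J", "Xaa": "X", "Sel": "U", "Pyl": "O",
    "Ter": "*",  # terminator
}


def three_to_one(seq: str, sep: str = "") -> str:
    """Hypothetical sketch: three letter protein code -> one letter code."""
    usable = len(seq) - len(seq) % 3          # ignore a trailing partial code
    codes = [seq[i:i + 3] for i in range(0, usable, 3)]
    # Unknown codes (including gaps) become "X".
    return sep.join(THREE_TO_ONE.get(code, "X") for code in codes)
```

Biopython's Bio.SeqUtils.seq1 offers a comparable three-to-one conversion if Biopython is already a dependency.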
- pydna.utils.eq(*args, **kwargs)
Compare two or more DNA sequences for equality, i.e. whether they represent the same double-stranded DNA molecule.
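- Parameters:
*args – two or more DNA sequences, either strings or sequence objects such as Dseq and Dseqrecord
**kwargs – optional topology keyword, linear or circular (see Notes)
- Returns:
eq – True if all sequences represent the same DNA molecule, otherwise False
- Return type:
bool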
Notes
Two linear sequences are considered equal if either:
  - they have the same sequence (case insensitive), or
  - one sequence is the reverse complement of the other.
Two circular sequences are considered equal if they are circular permutations of each other, meaning that they have the same length and either:
  - one sequence can be found in the concatenation of the other sequence with itself, or
  - the reverse complement of one sequence can be found in the concatenation of the other sequence with itself.
The topology for the comparison can be set by setting one of the keywords linear or circular to True or False.
If neither circular nor linear is set, the topology is deduced from each sequence that has a linear or circular attribute (like Dseq and Dseqrecord).
Examples
>>> from pydna.dseqrecord import Dseqrecord
>>> from pydna.utils import eq
>>> eq("aaa","AAA")
True
>>> eq("aaa","AAA","TTT")
True
>>> eq("aaa","AAA","TTT","tTt")
True
>>> eq("aaa","AAA","TTT","tTt", linear=True)
True
>>> eq("Taaa","aTaa", linear = True)
False
>>> eq("Taaa","aTaa", circular = True)
True
>>> a=Dseqrecord("Taaa")
>>> b=Dseqrecord("aTaa")
>>> eq(a,b)
False
>>> eq(a,b,circular=True)
True
>>> a=a.looped()
>>> b=b.looped()
>>> eq(a,b)
True
>>> eq(a,b,circular=False)
False
>>> eq(a,b,linear=True)
False
>>> eq(a,b,linear=False)
True
>>> eq("ggatcc","GGATCC")
True
>>> eq("ggatcca","GGATCCa")
True
>>> eq("ggatcca","tGGATCC")
True
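The comparison rules in the Notes can be sketched for plain strings as follows. The names dna_eq and revcomp are hypothetical, only unambiguous A/C/G/T bases are complemented (an assumption), and the topology is passed explicitly rather than deduced from sequence objects as pydna does.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(s: str) -> str:
    """Reverse complement (A/C/G/T only, by assumption)."""
    return s.translate(COMPLEMENT)[::-1]


def dna_eq(*seqs: str, circular: bool = False) -> bool:
    """Hypothetical sketch of the equality rules for two or more strings."""
    first, *rest = (s.upper() for s in seqs)   # case-insensitive comparison
    for other in rest:
        if circular:
            # Circular permutation: same length and found in first + first,
            # either directly or as the reverse complement.
            doubled = first + first
            if len(other) != len(first):
                return False
            if other not in doubled and revcomp(other) not in doubled:
                return False
        elif other != first and other != revcomp(first):
            # Linear: identical or reverse complement.
            return False
    return True
```

Doubling the reference sequence turns "is a rotation of" into a substring test, which keeps the circular check to a single containment query per strand.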
- pydna.utils.locations_overlap(loc1: SimpleLocation | CompoundLocation, loc2: SimpleLocation | CompoundLocation, seq_len)
- pydna.utils.sum_is_sticky(three_prime_end: tuple[str, str], five_prime_end: tuple[str, str], partial: bool = False) → int
Return the overlap length if the 3’ end of seq1 (three_prime_end) and the 5’ end of seq2 (five_prime_end) are sticky and compatible for ligation. Return 0 if they are not compatible.
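The full-overlap case can be sketched under an assumed representation: each end is a (kind, overhang) pair where kind is "5'", "3'" or "blunt" and the overhang is written 5’ to 3’ on its own strand. Two such ends anneal when they are the same non-blunt kind and one overhang is the reverse complement of the other. The names sticky_overlap and revcomp are hypothetical, and the partial option is not modeled.

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement (A/C/G/T only, by assumption)."""
    return seq.translate(_COMPLEMENT)[::-1]


def sticky_overlap(three_prime_end: Tuple[str, str],
                   five_prime_end: Tuple[str, str]) -> int:
    """Hypothetical sketch: overlap length of compatible sticky ends, else 0."""
    kind1, overhang1 = three_prime_end
    kind2, overhang2 = five_prime_end
    # Same overhang type, not blunt, and complementary overhang sequences.
    if kind1 == kind2 != "blunt" and overhang2.upper() == revcomp(overhang1).upper():
        return len(overhang1)
    return 0
```

For example, two EcoRI ends (5’ overhang AATT, a palindrome) give an overlap of 4, while a 5’ overhang never pairs with a 3’ overhang.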
- pydna.utils.limit_iterator(iterator, limit)
Pass an iterator through this function to raise an error if it produces more items than limit.
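A lazy generator is one natural way to get this behavior: items are passed through unchanged, and the error fires only when item number limit + 1 is requested. The name limited is hypothetical, and raising ValueError is an assumption, since the text above only says "an error".

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def limited(iterable: Iterable[T], limit: int) -> Iterator[T]:
    """Hypothetical sketch: yield items, fail once more than limit appear."""
    for count, item in enumerate(iterable, start=1):
        if count > limit:
            raise ValueError(f"more than {limit} items")  # assumed error type
        yield item
```

Because the check is lazy, an unbounded iterator such as itertools.count() fails after limit items instead of being consumed forever.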
- pydna.utils.create_location(start: int, end: int, lim: int, strand: int | None = None) → Location
Create a location object from a start and an end position. If the end position is less than the start position, the location spans the origin of a circular sequence. Negative positions are counted back from lim, so with lim = 10 a start of -2 is position 8.
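- Parameters:
start – start position
end – end position
lim – length of the sequence
strand – strand of the location (+1, -1 or None)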
- Returns:
location – The location object. Can be a SimpleLocation or a CompoundLocation if the feature spans the origin of a circular sequence.
- Return type:
Location
Examples
>>> from pydna.utils import create_location
>>> str(create_location(0, 5, 10,-1))
'[0:5](-)'
>>> str(create_location(0, 5, 10,+1))
'[0:5](+)'
>>> str(create_location(0, 5, 10))
'[0:5]'
>>> str(create_location(8, 2, 10))
'join{[8:10], [0:2]}'
>>> str(create_location(8, 2, 10,-1))
'join{[0:2](-), [8:10](-)}'
>>> str(create_location(-2, 2, 10))
'join{[8:10], [0:2]}'
Note this special case: an end of 0 is the same as len(seq).
>>> str(create_location(5, 0, 10))
'[5:10]'
Note the special case where start and end are the same: the location spans the entire sequence (it is not empty).
>>> str(create_location(5, 5, 10))
'join{[5:10], [0:5]}'
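The position arithmetic behind these examples can be sketched without Biopython by returning (start, end) pairs instead of SimpleLocation or CompoundLocation objects. The name location_parts is hypothetical, and wrapping every position with % lim (which also folds positions at or beyond lim) is an assumption that happens to agree with all examples above.

```python
from typing import List, Optional, Tuple


def location_parts(start: int, end: int, lim: int,
                   strand: Optional[int] = None) -> List[Tuple[int, int]]:
    """Hypothetical sketch: intervals covered by start..end on a circle of lim."""
    start, end = start % lim, end % lim   # negative positions wrap around
    if end > start:
        return [(start, end)]             # ordinary, non-wrapping interval
    if end == 0:
        return [(start, lim)]             # end 0 means "up to len(seq)"
    # end <= start: the interval crosses the origin (start == end is the
    # whole sequence, not an empty interval).
    parts = [(start, lim), (0, end)]
    # Minus-strand parts are listed in reverse order, as in the examples.
    return parts[::-1] if strand == -1 else parts
```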