pydna.assembly2

Improved implementation of the assembly module. To see a list of issues with the previous implementation, see [issues tagged with fixed-with-new-assembly-model](https://github.com/pydna-group/pydna/issues?q=is%3Aissue%20state%3Aopen%20label%3Afixed-with-new-assembly-model)

pydna.assembly2.gather_overlapping_locations(locs: list[Location], fragment_length: int) list[tuple[Location, ...]][source]

Turn a list of locations into a list of tuples of those locations, where each tuple contains locations that overlap. For example, if locs = [loc1, loc2, loc3], and loc1 and loc2 overlap, the output will be [(loc1, loc2), (loc3,)].

pydna.assembly2.ends_from_cutsite(cutsite: Tuple[Tuple[int, int], _AbstractCut | None], seq: Dseq) tuple[tuple[str, str], tuple[str, str]][source]

Get the sticky or blunt ends created by a restriction enzyme cut.

Parameters:
  • cutsite (CutSiteType) – A tuple ((cut_watson, ovhg), enzyme) describing where the cut occurs

  • seq (_Dseq) – The DNA sequence being cut

Raises:

ValueError – If cutsite is None

Returns:

A tuple of two tuples, each containing the type of end (‘5’’, ‘3’’, or ‘blunt’) and the sequence of the overhang. The first tuple is for the left end, second for the right end.

Return type:

tuple[tuple[str, str], tuple[str, str]]

>>> from Bio.Restriction import NotI
>>> x = _Dseq("ctcgGCGGCCGCcagcggccg")
>>> x.get_cutsites(NotI)
[((6, -4), NotI)]
>>> ends_from_cutsite(x.get_cutsites(NotI)[0], x)
(("5'", 'ggcc'), ("5'", 'ggcc'))
pydna.assembly2.restriction_ligation_overlap(seqx: Dseqrecord, seqy: Dseqrecord, enzymes=RestrictionBatch, partial=False, allow_blunt=False) list[Tuple[int, int, int]][source]

Assembly algorithm to find overlaps that would result from restriction and ligation.

Like in sticky and gibson, the order matters (see example below of partial overlap)

Parameters:
  • seqx (_Dseqrecord) – The first sequence

  • seqy (_Dseqrecord) – The second sequence

  • enzymes (RestrictionBatch) – The enzymes to use

  • partial (bool) – Whether to allow partial overlaps

  • allow_blunt (bool) – Whether to allow blunt ends

Returns:

A list of overlaps between the two sequences

Return type:

list[SequenceOverlap]

>>> from pydna.dseqrecord import Dseqrecord
>>> from pydna.assembly2 import restriction_ligation_overlap
>>> from Bio.Restriction import EcoRI, RgaI, DrdI, EcoRV
>>> x = Dseqrecord("ccGAATTCaa")
>>> y = Dseqrecord("aaaaGAATTCgg")
>>> restriction_ligation_overlap(x, y, [EcoRI])
[(3, 5, 4)]
>>> restriction_ligation_overlap(y, x, [EcoRI])
[(5, 3, 4)]

Partial overlap, note how it is not symmetric

>>> x = Dseqrecord("GACTAAAGGGTC")
>>> y = Dseqrecord("AAGCGATCGCAAGCGATCGCAA")
>>> restriction_ligation_overlap(x, y, [RgaI, DrdI], partial=True)
[(6, 5, 1), (6, 15, 1)]
>>> restriction_ligation_overlap(y, x, [RgaI, DrdI], partial=True)
[]

Blunt overlap, returns length of the overlap 0

>>> x = Dseqrecord("aaGATATCcc")
>>> y = Dseqrecord("ttttGATATCaa")
>>> restriction_ligation_overlap(x, y, [EcoRV], allow_blunt=True)
[(5, 7, 0)]
>>> restriction_ligation_overlap(y, x, [EcoRV], allow_blunt=True)
[(7, 5, 0)]
pydna.assembly2.combine_algorithms(*algorithms: Callable[[Dseqrecord, Dseqrecord, int], list[Tuple[int, int, int]]]) Callable[[Dseqrecord, Dseqrecord, int], list[Tuple[int, int, int]]][source]

Combine assembly algorithms, if any of them returns a match, the match is returned.

This can be used for example in a ligation where you want to allow both sticky and blunt end ligation.

pydna.assembly2.blunt_overlap(seqx: Dseqrecord, seqy: Dseqrecord, limit=None) list[Tuple[int, int, int]][source]

Assembly algorithm to find blunt overlaps. Used for blunt ligation.

It basically returns [(len(seqx), 0, 0)] if the right end of seqx is blunt and the left end of seqy is blunt (compatible with blunt ligation). Otherwise, it returns an empty list.

Parameters:
  • seqx (_Dseqrecord) – The first sequence

  • seqy (_Dseqrecord) – The second sequence

  • limit (int) – There for compatibility, but it is ignored

Returns:

A list of overlaps between the two sequences

Return type:

list[SequenceOverlap]

>>> from pydna.assembly2 import blunt_overlap
>>> from pydna.dseqrecord import Dseqrecord
>>> x = Dseqrecord("AAAAAA")
>>> y = Dseqrecord("TTTTTT")
>>> blunt_overlap(x, y)
[(6, 0, 0)]
pydna.assembly2.common_sub_strings(seqx: Dseqrecord, seqy: Dseqrecord, limit=25) list[Tuple[int, int, int]][source]

Assembly algorithm to find common substrings of length == limit. see the docs of the function common_sub_strings_str for more details. It is case insensitive.

>>> from pydna.dseqrecord import Dseqrecord
>>> x = Dseqrecord("TAAAAAAT")
>>> y = Dseqrecord("CCaAaAaACC")
>>> common_sub_strings(x, y, limit=5)
[(1, 2, 6), (1, 3, 5), (2, 2, 5)]
pydna.assembly2.gibson_overlap(seqx: Dseqrecord, seqy: Dseqrecord, limit=25)[source]

Assembly algorithm to find terminal overlaps (e.g. for Gibson assembly). The order matters, we want alignments like:

``` seqx: oooo——xxxx seqy: xxxx——oooo Product: oooo——xxxx——oooo

Not like:

seqx: oooo——xxxx seqy: xxxx——oooo Product (unwanted): oooo ```

Parameters:
  • seqx (_Dseqrecord) – The first sequence

  • seqy (_Dseqrecord) – The second sequence

  • limit (int) – Minimum length of the overlap

Returns:

A list of overlaps between the two sequences

Return type:

list[SequenceOverlap]

>>> from pydna.dseqrecord import Dseqrecord
>>> from pydna.assembly2 import gibson_overlap
>>> x = Dseqrecord("ttactaAAAAAA")
>>> y = Dseqrecord("AAAAAAcgcacg")
>>> gibson_overlap(x, y, limit=5)
[(6, 0, 6), (7, 0, 5)]
>>> gibson_overlap(y, x, limit=5)
[]
pydna.assembly2.sticky_end_sub_strings(seqx: Dseqrecord, seqy: Dseqrecord, limit: bool = False)[source]

Assembly algorithm for ligation of sticky ends.

For now, if limit 0 / False (default) only full overlaps are considered. Otherwise, partial overlaps are also returned.

Parameters:
  • seqx (_Dseqrecord) – The first sequence

  • seqy (_Dseqrecord) – The second sequence

  • limit (bool) – Whether to allow partial overlaps

Returns:

A list of overlaps between the two sequences

Return type:

list[SequenceOverlap]

Ligation of fully overlapping sticky ends, note how the order matters

>>> from pydna.dseq import Dseq
>>> from pydna.dseqrecord import Dseqrecord
>>> from pydna.assembly2 import sticky_end_sub_strings
>>> x = Dseqrecord(Dseq.from_full_sequence_and_overhangs("AAAAAA", 0, 3))
>>> y = Dseqrecord(Dseq.from_full_sequence_and_overhangs("AAAAAA", 3, 0))
>>> sticky_end_sub_strings(x, y, limit=False)
[(3, 0, 3)]
>>> sticky_end_sub_strings(y, x, limit=False)
[]

Ligation of partially overlapping sticky ends, specified with limit=True

>>> x = Dseqrecord(Dseq.from_full_sequence_and_overhangs("AAAAAA", 0, 2))
>>> y = Dseqrecord(Dseq.from_full_sequence_and_overhangs("AAAAAA", 3, 0))
>>> sticky_end_sub_strings(x, y, limit=False)
[]
>>> sticky_end_sub_strings(x, y, limit=True)
[(4, 0, 2)]
pydna.assembly2.zip_match_leftwards(seqx: SeqRecord, seqy: SeqRecord, match: Tuple[int, int, int]) Tuple[int, int, int][source]

Starting from the rightmost edge of the match, return a new match encompassing the max number of bases. This can be used to return a longer match if a primer aligns for longer than the limit or a shorter match if there are mismatches. This is convenient to maintain as many features as possible. It is used in PCR assembly.

>>> seq = _Dseqrecord('AAAAACGTCCCGT')
>>> primer = _Dseqrecord('ACGTCCCGT')
>>> match = (13, 9, 0) # an empty match at the end of each
>>> zip_match_leftwards(seq, primer, match)
(4, 0, 9)

Works in circular molecules if the match spans the origin: >>> seq = _Dseqrecord(‘TCCCGTAAAAACG’, circular=True) >>> primer = _Dseqrecord(‘ACGTCCCGT’) >>> match = (6, 9, 0) >>> zip_match_leftwards(seq, primer, match) (10, 0, 9)

pydna.assembly2.zip_match_rightwards(seqx: Dseqrecord, seqy: Dseqrecord, match: Tuple[int, int, int]) Tuple[int, int, int][source]

Same as zip_match_leftwards, but towards the right.

pydna.assembly2.seqrecord2_uppercase_DNA_string(seqr: SeqRecord) str[source]

Transform a Dseqrecord to a sequence string where U is replaced by T, everything is upper case and circular sequences are repeated twice. This is used for PCR, to support primers with U’s (e.g. for USER cloning).

pydna.assembly2.primer_template_overlap(seqx: Dseqrecord | Primer, seqy: Dseqrecord | Primer, limit=25, mismatches=0) list[Tuple[int, int, int]][source]

Assembly algorithm to find overlaps between a primer and a template. It accepts mismatches. When there are mismatches, it only returns the common part between the primer and the template.

If seqx is a primer and seqy is a template, it represents the binding of a forward primer. If seqx is a template and seqy is a primer, it represents the binding of a reverse primer, where the primer has been passed as its reverse complement (see examples).

Parameters:
  • seqx (_Dseqrecord | _Primer) – The primer

  • seqy (_Dseqrecord | _Primer) – The template

  • limit (int) – Minimum length of the overlap

  • mismatches (int) – Maximum number of mismatches (only substitutions, no deletion or insertion)

Returns:

A list of overlaps between the primer and the template

Return type:

list[SequenceOverlap]

>>> from pydna.dseqrecord import Dseqrecord
>>> from pydna.primer import Primer
>>> from pydna.assembly2 import primer_template_overlap
>>> template = Dseqrecord("AATTAGCAGCGATCGAGT", circular=True)
>>> primer = Primer("TTAGCAGC")
>>> primer_template_overlap(primer, template, limit=8, mismatches=0)
[(0, 2, 8)]

This actually represents the binding of the primer GCTGCTAA (reverse complement) >>> primer_template_overlap(template, primer, limit=8, mismatches=0) [(2, 0, 8)] >>> primer_template_overlap(primer, template.reverse_complement(), limit=8, mismatches=0) [] >>> primer_template_overlap(primer.reverse_complement(), template, limit=8, mismatches=0) []

pydna.assembly2.fill_left(seq: Dseq) Dseq[source]

Fill the left overhang of a sequence with the complementary sequence.

pydna.assembly2.fill_right(seq: Dseq) Dseq[source]

Fill the right overhang of a sequence with the complementary sequence.

pydna.assembly2.fill_dseq(seq: Dseq) Dseq[source]

Fill the overhangs of a sequence with the complementary sequence.

pydna.assembly2.reverse_complement_assembly(assembly: list[Tuple[int, int, _Location | None, _Location | None]], fragments: list[Dseqrecord]) list[Tuple[int, int, _Location | None, _Location | None]][source]

Complement an assembly, i.e. reverse the order of the fragments and the orientation of the overlaps.

pydna.assembly2.filter_linear_subassemblies(linear_assemblies: list[list[Tuple[int, int, _Location | None, _Location | None]]], circular_assemblies: list[list[Tuple[int, int, _Location | None, _Location | None]]], fragments: list[Dseqrecord]) list[list[Tuple[int, int, _Location | None, _Location | None]]][source]

Remove linear assemblies which are sub-assemblies of circular assemblies

pydna.assembly2.remove_subassemblies(assemblies: list[list[Tuple[int, int, _Location | None, _Location | None]]]) list[list[Tuple[int, int, _Location | None, _Location | None]]][source]

Filter out subassemblies, i.e. assemblies that are contained within another assembly.

For example:

[(1, 2, ‘1[8:14]:2[1:7]’), (2, 3, ‘2[10:17]:3[1:8]’)] [(1, 2, ‘1[8:14]:2[1:7]’)]

The second one is a subassembly of the first one.

pydna.assembly2.assembly2str(assembly: list[Tuple[int, int, _Location | None, _Location | None]]) str[source]

Convert an assembly to a string representation, for example: ((1, 2, [8:14], [1:7]),(2, 3, [10:17], [1:8])) becomes: (‘1[8:14]:2[1:7]’, ‘2[10:17]:3[1:8]’)

The reason for this is that by default, a feature ‘[8:14]’ when present in a tuple is printed to the console as SimpleLocation(ExactPosition(8), ExactPosition(14), strand=1) (very long).

pydna.assembly2.assembly2str_tuple(assembly: list[Tuple[int, int, _Location | None, _Location | None]]) str[source]

Convert an assembly to a string representation, like ((1, 2, [8:14], [1:7]),(2, 3, [10:17], [1:8]))

pydna.assembly2.assembly_has_mismatches(fragments: list[Dseqrecord], assembly: list[Tuple[int, int, _Location | None, _Location | None]]) bool[source]

Check if an assembly has mismatches. This should never happen and if so it returns an error.

pydna.assembly2.assembly_is_circular(assembly: list[Tuple[int, int, _Location | None, _Location | None]], fragments: list[Dseqrecord]) bool[source]

Based on the topology of the locations of an assembly, determine if it is circular. This does not work for insertion assemblies, that’s why assemble takes the optional argument is_insertion.

pydna.assembly2.assemble(fragments: list[Dseqrecord], assembly: list[Tuple[int, int, _Location | None, _Location | None]], is_insertion: bool = False) Dseqrecord[source]

Generate a Dseqrecord from an assembly and a list of fragments.

pydna.assembly2.annotate_primer_binding_sites(input_dseqr: Dseqrecord, fragments: list[Dseqrecord]) Dseqrecord[source]

Annotate the primer binding sites in a Dseqrecord.

pydna.assembly2.edge_representation2subfragment_representation(assembly: list[Tuple[int, int, _Location | None, _Location | None]], is_circular: bool) list[Tuple[int, _Location | None, _Location | None]][source]

Turn this kind of edge representation fragment 1, fragment 2, right edge on 1, left edge on 2 a = [(1, 2, ‘loc1a’, ‘loc2a’), (2, 3, ‘loc2b’, ‘loc3b’), (3, 1, ‘loc3c’, ‘loc1c’)] Into this: fragment 1, left edge on 1, right edge on 1 b = [(1, ‘loc1c’, ‘loc1a’), (2, ‘loc2a’, ‘loc2b’), (3, ‘loc3b’, ‘loc3c’)]

pydna.assembly2.subfragment_representation2edge_representation(assembly: list[Tuple[int, _Location | None, _Location | None]], is_circular: bool) list[Tuple[int, int, _Location | None, _Location | None]][source]

Turn this kind of subfragment representation fragment 1, left edge on 1, right edge on 1 a = [(1, ‘loc1c’, ‘loc1a’), (2, ‘loc2a’, ‘loc2b’), (3, ‘loc3b’, ‘loc3c’)] Into this: fragment 1, fragment 2, right edge on 1, left edge on 2 b = [(1, 2, ‘loc1a’, ‘loc2a’), (2, 3, ‘loc2b’ ‘loc3b’), (3, 1, ‘loc3c’, ‘loc1c’)]

pydna.assembly2.get_assembly_subfragments(fragments: list[Dseqrecord], subfragment_representation: list[Tuple[int, _Location | None, _Location | None]]) list[Dseqrecord][source]

From the fragment representation returned by edge_representation2subfragment_representation, get the subfragments that are joined together.

Subfragments are the slices of the fragments that are joined together

For example: ```

–A–

TACGTAAT

–B–

TCGTAACGA

Gives: TACGTAA / CGTAACGA ` To reproduce: ` a = Dseqrecord(‘TACGTAAT’) b = Dseqrecord(‘TCGTAACGA’) f = Assembly([a, b], limit=5) a0 = f.get_linear_assemblies()[0] print(assembly2str(a0)) a0_subfragment_rep =edge_representation2subfragment_representation(a0, False) for f in get_assembly_subfragments([a, b], a0_subfragment_rep):

print(f.seq)

# prints TACGTAA and CGTAACGA ```

Subfragments: cccccgtatcgtgt, atcgtgtactgtcatattc

pydna.assembly2.extract_subfragment(seq: Dseqrecord, start_location: Location, end_location: Location) Dseqrecord[source]

Extract a subfragment from a sequence for an assembly, given the start and end locations of the subfragment.

pydna.assembly2.is_sublist(sublist: list, my_list: list, my_list_is_cyclic: bool = False) bool[source]

Returns True if argument sublist is a sublist of argument my_list (can be treated as cyclic), False otherwise.

Examples

>>> is_sublist([1, 2], [1, 2, 3], False)
True
>>> is_sublist([1, 2], [1, 3, 2], False)
False

# See the case here for cyclic lists >>> is_sublist([3, 1], [1, 2, 3], False) False >>> is_sublist([3, 1], [1, 2, 3], True) True

pydna.assembly2.circular_permutation_min_abs(lst: list) list[source]

Returns the circular permutation of lst with the smallest absolute value first.

Examples

>>> circular_permutation_min_abs([1, 2, 3])
[1, 2, 3]
>>> circular_permutation_min_abs([3, 1, 2])
[1, 2, 3]
class pydna.assembly2.Assembly(frags: list[Dseqrecord], limit: int = 25, algorithm: Callable[[Dseqrecord, Dseqrecord, int], list[Tuple[int, int, int]]] = common_sub_strings, use_fragment_order: bool = True, use_all_fragments: bool = False)[source]

Bases: object

Assembly of a list of DNA fragments into linear or circular constructs. Accepts a list of Dseqrecords (source fragments) to initiate an Assembly object. Several methods are available for analysis of overlapping sequences, graph construction and assembly.

The assembly contains a directed graph, where nodes represent fragments and edges represent overlaps between fragments. : - The node keys are integers, representing the index of the fragment in the input list of fragments. The sign of the node key represents the orientation of the fragment, positive for forward orientation, negative for reverse orientation. - The edges contain the locations of the overlaps in the fragments. For an edge (u, v, key):

  • u and v are the nodes connected by the edge.

  • key is a string that represents the location of the overlap. In the format:

‘u[start:end](strand):v[start:end](strand)’. - Edges have a ‘locations’ attribute, which is a list of two FeatureLocation objects, representing the location of the overlap in the u and v fragment, respectively. - You can think of an edge as a representation of the join of two fragments.

If fragment 1 and 2 share a subsequence of 6bp, [8:14] in fragment 1 and [1:7] in fragment 2, there will be 4 edges representing that overlap in the graph, for all possible orientations of the fragments (see add_edges_from_match for details): - (1, 2, ‘1[8:14]:2[1:7]’) - (2, 1, ‘2[1:7]:1[8:14]’) - (-1, -2, ‘-1[0:6]:-2[10:16]’) - (-2, -1, ‘-2[10:16]:-1[0:6]’)

An assembly can be thought of as a tuple of graph edges, but instead of representing them with node indexes and keys, we represent them as u, v, locu, locv, where u and v are the nodes connected by the edge, and locu and locv are the locations of the overlap in the first and second fragment. Assemblies are then represented as: - Linear: ((1, 2, [8:14], [1:7]), (2, 3, [10:17], [1:8])) - Circular: ((1, 2, [8:14], [1:7]), (2, 3, [10:17], [1:8]), (3, 1, [12:17], [1:6])) Note that the first and last fragment are the same in a circular assembly.

The following constrains are applied to remove duplicate assemblies: - Circular assemblies: the first subfragment is not reversed, and has the smallest index in the input fragment list.

use_fragment_order is ignored.

  • Linear assemblies:
    • Using uid (see add_edges_from_match) to identify unique edges.

Parameters:
  • frags (list) – A list of Dseqrecord objects.

  • limit (int, optional) – The shortest shared homology to be considered, this is passed as the third argument to the algorithm function. For certain algorithms, this might be ignored.

  • algorithm (function, optional) – The algorithm used to determine the shared sequences. It’s a function that takes two Dseqrecord objects as inputs, and will get passed the third argument (limit), that may or may not be used. It must return a list of overlaps (see common_sub_strings for an example).

  • use_fragment_order (bool, optional) – It’s set to True by default to reproduce legacy pydna behaviour: only assemblies that start with the first fragment and end with the last are considered. You should set it to False.

  • use_all_fragments (bool, optional) – Constrain the assembly to use all fragments.

Examples

from assembly2 import Assembly, assembly2str from pydna.dseqrecord import Dseqrecord

example_fragments = (

Dseqrecord(‘AacgatCAtgctcc’, name=’a’), Dseqrecord(‘TtgctccTAAattctgc’, name=’b’), Dseqrecord(‘CattctgcGAGGacgatG’, name=’c’),

)

asm = Assembly(example_fragments, limit=5, use_fragment_order=False) print(‘Linear ===============’) for assembly in asm.get_linear_assemblies():

print(’ ‘, assembly2str(assembly))

print(‘Circular =============’) for assembly in asm.get_circular_assemblies():

print(’ ‘, assembly2str(assembly))

# Prints Linear ===============

(‘1[8:14]:2[1:7]’, ‘2[10:17]:3[1:8]’) (‘2[10:17]:3[1:8]’, ‘3[12:17]:1[1:6]’) (‘3[12:17]:1[1:6]’, ‘1[8:14]:2[1:7]’) (‘1[1:6]:3[12:17]’,) (‘2[1:7]:1[8:14]’,) (‘3[1:8]:2[10:17]’,)

Circular =============

(‘1[8:14]:2[1:7]’, ‘2[10:17]:3[1:8]’, ‘3[12:17]:1[1:6]’)

classmethod assembly_is_valid(fragments: list[Dseqrecord | Primer], assembly: list[Tuple[int, int, _Location | None, _Location | None]], is_circular: bool, use_all_fragments: bool, is_insertion: bool = False) bool[source]

Returns True if the assembly is valid, False otherwise. See function comments for conditions tested.

add_edges_from_match(match: Tuple[int, int, int], u: int, v: int, first: Dseqrecord, secnd: Dseqrecord)[source]

Add edges to the graph from a match returned by the algorithm function (see pydna.common_substrings). For format of edges (see documentation of the Assembly class).

Matches are directional, because not all algorithm functions return the same match for (u,v) and (v,u). For example, homologous recombination does but sticky end ligation does not. The function returns two edges: - Fragments in the orientation they were passed, with locations of the match (u, v, loc_u, loc_v) - Reverse complement of the fragments with inverted order, with flipped locations (-v, -u, flip(loc_v), flip(loc_u))/

format_assembly_edge(graph_edge: tuple[int, int, str]) Tuple[int, int, _Location | None, _Location | None][source]

Go from the (u, v, key) to the (u, v, locu, locv) format.

get_linear_assemblies(only_adjacent_edges: bool = False, max_assemblies: int = 50) list[list[Tuple[int, int, _Location | None, _Location | None]]][source]

Get linear assemblies, applying the constrains described in __init__, ensuring that paths represent real assemblies (see assembly_is_valid). Subassemblies are removed (see remove_subassemblies).

node_path2assembly_list(cycle: list[int], circular: bool) list[list[Tuple[int, int, _Location | None, _Location | None]]][source]
Convert a node path in the format [1, 2, 3] (as returned by _nx.cycles.simple_cycles) to a list of all

possible assemblies.

There may be multiple assemblies for a given node path, if there are several edges connecting two nodes, for example two overlaps between 1 and 2, and single overlap between 2 and 3 should return 3 assemblies.

get_unique_linear_paths(G_with_begin_end: MultiDiGraph, max_paths=10000) list[list[int]][source]

Get unique linear paths from the graph, removing those that contain the same node twice.

get_possible_assembly_number(paths: list[list[int]]) int[source]

Get the number of possible assemblies from a list of node paths. Basically, for each path passed as a list of integers / nodes, we calculate the number of paths possible connecting the nodes in that order, given the graph (all the edges connecting them).

get_circular_assemblies(only_adjacent_edges: bool = False, max_assemblies: int = 50) list[list[Tuple[int, int, _Location | None, _Location | None]]][source]

Get circular assemblies, applying the constrains described in __init__, ensuring that paths represent real assemblies (see assembly_is_valid).

format_insertion_assembly(assembly: list[Tuple[int, int, _Location | None, _Location | None]]) list[Tuple[int, int, _Location | None, _Location | None]] | None[source]

Sorts the fragment representing a cycle so that they represent an insertion assembly if possible, else returns None.

Here we check if one of the joins between fragments represents the edges of an insertion assembly The fragment must be linear, and the join must be as indicated below

format_insertion_assembly_edge_case(assembly: list[Tuple[int, int, _Location | None, _Location | None]]) list[Tuple[int, int, _Location | None, _Location | None]][source]

Edge case from https://github.com/manulera/OpenCloning_backend/issues/329

get_insertion_assemblies(only_adjacent_edges: bool = False, max_assemblies: int = 50) list[list[Tuple[int, int, _Location | None, _Location | None]]][source]

Assemblies that represent the insertion of a fragment or series of fragment inside a linear construct. For instance, digesting CCCCGAATTCCCCGAATTC with EcoRI and inserting the fragment with two overhangs into the EcoRI site of AAAGAATTCAAA. This is not so much meant for the use-case of linear fragments that represent actual linear fragments, but for linear fragments that represent a genome region. This can then be used to simulate homologous recombination.

assemble_linear(only_adjacent_edges: bool = False, max_assemblies: int = 50) list[Dseqrecord][source]

Assemble linear constructs, from assemblies returned by self.get_linear_assemblies.

assemble_circular(only_adjacent_edges: bool = False, max_assemblies: int = 50) list[Dseqrecord][source]

Assemble circular constructs, from assemblies returned by self.get_circular_assemblies.

assemble_insertion(only_adjacent_edges: bool = False) list[Dseqrecord][source]

Assemble insertion constructs, from assemblies returned by self.get_insertion_assemblies.

get_locations_on_fragments() dict[int, dict[str, list[Location]]][source]

Get a dictionary where the keys are the nodes in the graph, and the values are dictionaries with keys left, right, containing (for each fragment) the locations where the fragment is joined to another fragment on its left and right side. The values in left and right are often the same, except in restriction-ligation with partial overlap enabled, where we can end up with a situation like this:

GGTCTCCCCAATT and aGGTCTCCAACCAA as fragments

# Partial overlap in assembly 1[9:11]:2[8:10] GGTCTCCxxAACCAA CCAGAGGGGTTxxTT

# Partial overlap in 2[10:12]:1[7:9] aGGTCTCCxxCCAATT tCCAGAGGTTGGxxAA

Would return {

1: {‘left’: [7:9], ‘right’: [9:11]}, 2: {‘left’: [8:10], ‘right’: [10:12]}, -1: {‘left’: [2:4], ‘right’: [4:6]}, -2: {‘left’: [2:4], ‘right’: [4:6]}

}

assembly_uses_only_adjacent_edges(assembly, is_circular: bool) bool[source]

Check whether only adjacent edges within each fragment are used in the assembly. This is useful to check if a cut and ligate assembly is valid, and prevent including partially digested fragments. For example, imagine the following fragment being an input for a digestion and ligation assembly, where the enzyme cuts at the sites indicated by the vertical lines:

```

x y z

——-|-------|——-|———

```

We would only want assemblies that contain subfragments start-x, x-y, y-z, z-end, and not start-x, y-end, for instance. The latter would indicate that the fragment was partially digested.

class pydna.assembly2.PCRAssembly(frags: list[Dseqrecord | Primer], limit=25, mismatches=0)[source]

Bases: Assembly

An assembly that represents a PCR, where fragments is a list of primer, template, primer (in that order). It always uses the primer_template_overlap algorithm and accepts the mismatches argument to indicate the number of mismatches allowed in the overlap. Only supports substitution mismatches, not indels.

get_linear_assemblies(only_adjacent_edges: bool = False, max_assemblies: int = 50) list[list[Tuple[int, int, _Location | None, _Location | None]]][source]

Get linear assemblies, applying the constrains described in __init__, ensuring that paths represent real assemblies (see assembly_is_valid). Subassemblies are removed (see remove_subassemblies).

get_circular_assemblies(only_adjacent_edges: bool = False)[source]

Get circular assemblies, applying the constrains described in __init__, ensuring that paths represent real assemblies (see assembly_is_valid).

get_insertion_assemblies(only_adjacent_edges: bool = False)[source]

Assemblies that represent the insertion of a fragment or series of fragment inside a linear construct. For instance, digesting CCCCGAATTCCCCGAATTC with EcoRI and inserting the fragment with two overhangs into the EcoRI site of AAAGAATTCAAA. This is not so much meant for the use-case of linear fragments that represent actual linear fragments, but for linear fragments that represent a genome region. This can then be used to simulate homologous recombination.

class pydna.assembly2.SingleFragmentAssembly(frags: [<class 'pydna.dseqrecord.Dseqrecord'>], limit=25, algorithm=common_sub_strings)[source]

Bases: Assembly

An assembly that represents the circularisation or splicing of a single fragment.

get_circular_assemblies(only_adjacent_edges: bool = False, max_assemblies: int = 50) list[list[Tuple[int, int, _Location | None, _Location | None]]][source]

Get circular assemblies, applying the constrains described in __init__, ensuring that paths represent real assemblies (see assembly_is_valid).

get_insertion_assemblies(only_adjacent_edges: bool = False, max_assemblies: int = 50) list[list[Tuple[int, int, _Location | None, _Location | None]]][source]

This could be renamed splicing assembly, but the essence is similar

get_linear_assemblies()[source]

Get linear assemblies, applying the constrains described in __init__, ensuring that paths represent real assemblies (see assembly_is_valid). Subassemblies are removed (see remove_subassemblies).

pydna.assembly2.common_function_assembly_products(frags: list[Dseqrecord], limit: int | None, algorithm: Callable, circular_only: bool, filter_results_function: Callable | None = None) list[Dseqrecord][source]

Common function to avoid code duplication. Could be simplified further once SingleFragmentAssembly and Assembly are merged.

Parameters:
  • frags (list[_Dseqrecord]) – List of DNA fragments to assemble

  • limit (int or None) – Minimum overlap length required, or None if not applicable

  • algorithm (Callable) – Function that determines valid overlaps between fragments

  • circular_only (bool) – If True, only return circular assemblies

Returns:

List of assembled DNA molecules

Return type:

list[_Dseqrecord]

pydna.assembly2.gibson_assembly(frags: list[Dseqrecord], limit: int = 25, circular_only: bool = False) list[Dseqrecord][source]

Returns the products for Gibson assembly.

Parameters:
  • frags (list[_Dseqrecord]) – List of DNA fragments to assemble

  • limit (int, optional) – Minimum overlap length required, by default 25

  • circular_only (bool, optional) – If True, only return circular assemblies, by default False

Returns:

List of assembled DNA molecules

Return type:

list[_Dseqrecord]

pydna.assembly2.in_fusion_assembly(frags: list[Dseqrecord], limit: int = 25, circular_only: bool = False) list[Dseqrecord][source]

Returns the products for in-fusion assembly. This is the same as Gibson assembly, but with a different name.

Parameters:
  • frags (list[_Dseqrecord]) – List of DNA fragments to assemble

  • limit (int, optional) – Minimum overlap length required, by default 25

  • circular_only (bool, optional) – If True, only return circular assemblies, by default False

Returns:

List of assembled DNA molecules

Return type:

list[_Dseqrecord]

pydna.assembly2.fusion_pcr_assembly(frags: list[Dseqrecord], limit: int = 25, circular_only: bool = False) list[Dseqrecord][source]

Returns the products for fusion PCR assembly. This is the same as Gibson assembly, but with a different name.

Parameters:
  • frags (list[_Dseqrecord]) – List of DNA fragments to assemble

  • limit (int, optional) – Minimum overlap length required, by default 25

  • circular_only (bool, optional) – If True, only return circular assemblies, by default False

Returns:

List of assembled DNA molecules

Return type:

list[_Dseqrecord]

pydna.assembly2.in_vivo_assembly(frags: list[Dseqrecord], limit: int = 25, circular_only: bool = False) list[Dseqrecord][source]

Returns the products for in vivo assembly (IVA), which relies on homologous recombination between the fragments.

Parameters:
  • frags (list[_Dseqrecord]) – List of DNA fragments to assemble

  • limit (int, optional) – Minimum overlap length required, by default 25

  • circular_only (bool, optional) – If True, only return circular assemblies, by default False

Returns:

List of assembled DNA molecules

Return type:

list[_Dseqrecord]

pydna.assembly2.restriction_ligation_assembly(frags: list[Dseqrecord], enzymes: list[_AbstractCut], allow_blunt: bool = True, circular_only: bool = False) list[Dseqrecord][source]

Returns the products for restriction ligation assembly: * Finds cutsites in the fragments * Finds all products that could be assembled by ligating the fragments based on those cutsites * Will NOT return products that combine an existing end with an end generated by the same enzyme (see example below)

Parameters:
  • frags (list[_Dseqrecord]) – List of DNA fragments to assemble

  • enzymes (list[_AbstractCut]) – List of restriction enzymes to use

  • allow_blunt (bool, optional) – If True, allow blunt end ligations, by default True

  • circular_only (bool, optional) – If True, only return circular assemblies, by default False

Returns:

List of assembled DNA molecules

Return type:

list[_Dseqrecord]

Examples

In the example below, we plan to assemble a plasmid from a backbone and an insert, using the EcoRI and SalI enzymes. Note how 2 circular products are returned, one contains the insert (acgt) and the desired part of the backbone (cccccc), the other contains the reversed insert (tgga) and the cut-out part of the backbone (aaa).

>>> from pydna.assembly2 import restriction_ligation_assembly
>>> from pydna.dseqrecord import Dseqrecord
>>> from Bio.Restriction import EcoRI, SalI
>>> backbone = Dseqrecord("cccGAATTCaaaGTCGACccc", circular=True)
>>> insert = Dseqrecord("ggGAATTCaggtGTCGACgg")
>>> products = restriction_ligation_assembly([backbone, insert], [EcoRI, SalI], circular_only=True)
>>> products[0].seq
Dseq(o22)
TCGACccccccGAATTCaggtG
AGCTGggggggCTTAAGtccaC
>>> products[1].seq
Dseq(o19)
AATTCaaaGTCGACacctG
TTAAGtttCAGCTGtggaC

Note that passing a pre-cut fragment will not work.

>>> restriction_products = insert.cut([EcoRI, SalI])
>>> cut_insert = restriction_products[1]
>>> restriction_ligation_assembly([backbone, cut_insert], [EcoRI, SalI], circular_only=True)
[]

It also works with a single fragment, for circularization:

>>> seq = Dseqrecord("GAATTCaaaGAATTC")
>>> products =restriction_ligation_assembly([seq], [EcoRI])
>>> products[0].seq
Dseq(o9)
AATTCaaaG
TTAAGtttC
pydna.assembly2.golden_gate_assembly(frags: list[Dseqrecord], enzymes: list[_AbstractCut], allow_blunt: bool = True, circular_only: bool = False) list[Dseqrecord][source]

Returns the products for Golden Gate assembly. This is the same as restriction ligation assembly, but with a different name. Check the documentation for restriction_ligation_assembly for more details.

Parameters:
  • frags (list[_Dseqrecord]) – List of DNA fragments to assemble

  • enzymes (list[_AbstractCut]) – List of restriction enzymes to use

  • allow_blunt (bool, optional) – If True, allow blunt end ligations, by default True

  • circular_only (bool, optional) – If True, only return circular assemblies, by default False

Returns:

List of assembled DNA molecules

Return type:

list[_Dseqrecord]

Examples

See the example for restriction_ligation_assembly.

pydna.assembly2.ligation_assembly(frags: list[Dseqrecord], allow_blunt: bool = False, allow_partial_overlap: bool = False, circular_only: bool = False) list[Dseqrecord][source]

Returns the products for ligation assembly, as inputs pass the fragments (digested if needed) that will be ligated.

For most cases, you probably should use restriction_ligation_assembly instead.

Parameters:
  • frags (list[_Dseqrecord]) – List of DNA fragments to assemble

  • allow_blunt (bool, optional) – If True, allow blunt end ligations, by default False

  • allow_partial_overlap (bool, optional) – If True, allow partial overlaps between sticky ends, by default False

  • circular_only (bool, optional) – If True, only return circular assemblies, by default False

Returns:

List of assembled DNA molecules

Return type:

list[_Dseqrecord]

Examples

In the example below, we plan to assemble a plasmid from a backbone and an insert, using the EcoRI enzyme. The insert and insertion site in the backbone are flanked by EcoRI sites, so there are two possible products depending on the orientation of the insert.

>>> from pydna.assembly2 import ligation_assembly
>>> from pydna.dseqrecord import Dseqrecord
>>> from Bio.Restriction import EcoRI
>>> backbone = Dseqrecord("cccGAATTCaaaGAATTCccc", circular=True)
>>> backbone_cut = backbone.cut(EcoRI)[1]
>>> insert = Dseqrecord("ggGAATTCaggtGAATTCgg")
>>> insert_cut = insert.cut(EcoRI)[1]
>>> products = ligation_assembly([backbone_cut, insert_cut])
>>> products[0].seq
Dseq(o22)
AATTCccccccGAATTCaggtG
TTAAGggggggCTTAAGtccaC
>>> products[1].seq
Dseq(o22)
AATTCccccccGAATTCacctG
TTAAGggggggCTTAAGtggaC
pydna.assembly2.assembly_is_multi_site(asm: list[list[Tuple[int, int, _Location | None, _Location | None]]]) bool[source]

Returns True if the assembly is a multi-site assembly, False otherwise.

pydna.assembly2.gateway_assembly(frags: list[Dseqrecord], reaction_type: str, greedy: bool = False, circular_only: bool = False, multi_site_only: bool = False) list[Dseqrecord][source]

Returns the products for Gateway assembly / Gateway cloning.

Parameters:
  • frags (list[_Dseqrecord]) – List of DNA fragments to assemble

  • reaction_type (str) – Type of Gateway reaction, either ‘BP’ or ‘LR’

  • greedy (bool, optional) – If True, use greedy gateway consensus sites, by default False

  • circular_only (bool, optional) – If True, only return circular assemblies, by default False

  • multi_site_only (bool, optional) – If True, only return products that where 2 sites recombined. Even if input sequences contain multiple att sites (typically 2), a product could be generated where only one site recombines. That’s typically not what you want, so you can set this to True to only return products where both att sites recombined.

Returns:

List of assembled DNA molecules

Return type:

list[_Dseqrecord]

Examples

Below an example with dummy Gateway sequences, composed with minimal sequences and the consensus att sites.

>>> from pydna.assembly2 import gateway_assembly
>>> from pydna.dseqrecord import Dseqrecord
>>> attB1 = "ACAACTTTGTACAAAAAAGCAGAAG"
>>> attP1 = "AAAATAATGATTTTATTTGACTGATAGTGACCTGTTCGTTGCAACAAATTGATGAGCAATGCTTTTTTATAATGCCAACTTTGTACAAAAAAGCTGAACGAGAAGCGTAAAATGATATAAATATCAATATATTAAATTAGATTTTGCATAAAAAACAGACTACATAATACTGTAAAACACAACATATCCAGTCACTATGAATCAACTACTTAGATGGTATTAGTGACCTGTA"
>>> attR1 = "ACAACTTTGTACAAAAAAGCTGAACGAGAAACGTAAAATGATATAAATATCAATATATTAAATTAGATTTTGCATAAAAAACAGACTACATAATACTGTAAAACACAACATATGCAGTCACTATG"
>>> attL1 = "CAAATAATGATTTTATTTTGACTGATAGTGACCTGTTCGTTGCAACAAATTGATAAGCAATGCTTTCTTATAATGCCAACTTTGTACAAAAAAGCAGGCT"
>>> seq1 = Dseqrecord("aaa" + attB1 + "ccc")
>>> seq2 = Dseqrecord("aaa" + attP1 + "ccc")
>>> seq3 = Dseqrecord("aaa" + attR1 + "ccc")
>>> seq4 = Dseqrecord("aaa" + attL1 + "ccc")
>>> products_BP = gateway_assembly([seq1, seq2], "BP")
>>> products_LR = gateway_assembly([seq3, seq4], "LR")
>>> len(products_BP)
2
>>> len(products_LR)
2

Now let’s understand the multi_site_only parameter. Let’s consider a case where we are swapping fragments between two plasmids using an LR reaction. Experimentally, we expect to obtain two plasmids, resulting from the swapping between the two att sites. That’s what we get if we set multi_site_only to True.

>>> attL2 = 'aaataatgattttattttgactgatagtgacctgttcgttgcaacaaattgataagcaatgctttcttataatgccaactttgtacaagaaagctg'
>>> attR2 = 'accactttgtacaagaaagctgaacgagaaacgtaaaatgatataaatatcaatatattaaattagattttgcataaaaaacagactacataatactgtaaaacacaacatatccagtcactatg'
>>> insert = Dseqrecord("cccccc" + attL1 + "ccc" + attL2 + "cccccc", circular=True)
>>> backbone = Dseqrecord("ttttt" + attR1 + "aaa" + attR2, circular=True)
>>> products = gateway_assembly([insert, backbone], "LR", multi_site_only=True)
>>> len(products)
2

However, if we set multi_site_only to False, we get 4 products, which also include the intermediate products where the two plasmids are combined into a single one through recombination of a single att site. This is an intermediate of the reaction, and typically we don’t want it:

>>> products = gateway_assembly([insert, backbone], "LR", multi_site_only=False)
>>> print([len(p) for p in products])
[469, 237, 232, 469]
pydna.assembly2.common_function_integration_products(frags: list[Dseqrecord], limit: int | None, algorithm: Callable) list[Dseqrecord][source]

Common function to avoid code duplication for integration products.

Parameters:
  • frags (list[_Dseqrecord]) – List of DNA fragments to integrate

  • limit (int or None) – Minimum overlap length required, or None if not applicable

  • algorithm (Callable) – Function that determines valid overlaps between fragments

Returns:

List of integrated DNA molecules

Return type:

list[_Dseqrecord]

pydna.assembly2.common_handle_insertion_fragments(genome: Dseqrecord, inserts: list[Dseqrecord]) list[Dseqrecord][source]

Common function to handle / validate insertion fragments.

Parameters:
  • genome (_Dseqrecord) – Target genome sequence

  • inserts (list[_Dseqrecord] or _Dseqrecord) – DNA fragment(s) to insert

Returns:

List containing genome and insert fragments

Return type:

list[_Dseqrecord]

pydna.assembly2.common_function_excision_products(genome: Dseqrecord, limit: int | None, algorithm: Callable) list[Dseqrecord][source]

Common function to avoid code duplication for excision products.

Parameters:
  • genome (_Dseqrecord) – Target genome sequence

  • limit (int or None) – Minimum overlap length required, or None if not applicable

  • algorithm (Callable) – Function that determines valid overlaps between fragments

Returns:

List of excised DNA molecules

Return type:

list[_Dseqrecord]

pydna.assembly2.homologous_recombination_integration(genome: Dseqrecord, inserts: list[Dseqrecord], limit: int = 40) list[Dseqrecord][source]

Returns the products resulting from the integration of an insert (or inserts joined through in vivo recombination) into the genome through homologous recombination.

Parameters:
  • genome (_Dseqrecord) – Target genome sequence

  • inserts (list[_Dseqrecord]) – DNA fragment(s) to insert

  • limit (int, optional) – Minimum homology length required, by default 40

Returns:

List of integrated DNA molecules

Return type:

list[_Dseqrecord]

Examples

Below an example with a single insert.

>>> from pydna.assembly2 import homologous_recombination_integration
>>> from pydna.dseqrecord import Dseqrecord
>>> homology = "AAGTCCGTTCGTTTTACCTG"
>>> genome = Dseqrecord(f"aaaaaa{homology}ccccc{homology}aaaaaa")
>>> insert = Dseqrecord(f"{homology}gggg{homology}")
>>> products = homologous_recombination_integration(genome, [insert], 20)
>>> str(products[0].seq)
'aaaaaaAAGTCCGTTCGTTTTACCTGggggAAGTCCGTTCGTTTTACCTGaaaaaa'

Below an example with two inserts joined through homology.

>>> homology2 = "ATTACAGCATGGGAAGAAAGA"
>>> insert_1 = Dseqrecord(f"{homology}gggg{homology2}")
>>> insert_2 = Dseqrecord(f"{homology2}cccc{homology}")
>>> products = homologous_recombination_integration(genome, [insert_1, insert_2], 20)
>>> str(products[0].seq)
'aaaaaaAAGTCCGTTCGTTTTACCTGggggATTACAGCATGGGAAGAAAGAccccAAGTCCGTTCGTTTTACCTGaaaaaa'
pydna.assembly2.homologous_recombination_excision(genome: Dseqrecord, limit: int = 40) list[Dseqrecord][source]

Returns the products resulting from the excision of a fragment from the genome through homologous recombination.

Parameters:
  • genome (_Dseqrecord) – Target genome sequence

  • limit (int, optional) – Minimum homology length required, by default 40

Returns:

List containing excised plasmid and remaining genome sequence

Return type:

list[_Dseqrecord]

Examples

Example of a homologous recombination event, where a plasmid is excised from the genome (circular sequence of 25 bp), and that part is removed from the genome, leaving a shorter linear sequence (32 bp).

>>> from pydna.assembly2 import homologous_recombination_excision
>>> from pydna.dseqrecord import Dseqrecord
>>> homology = "AAGTCCGTTCGTTTTACCTG"
>>> genome = Dseqrecord(f"aaaaaa{homology}ccccc{homology}aaaaaa")
>>> products = homologous_recombination_excision(genome, 20)
>>> products
[Dseqrecord(o25), Dseqrecord(-32)]
pydna.assembly2.cre_lox_integration(genome: Dseqrecord, inserts: list[Dseqrecord]) list[Dseqrecord][source]

Returns the products resulting from the integration of an insert (or inserts joined through cre-lox recombination among them) into the genome through cre-lox integration.

Also works with lox66 and lox71 (see pydna.cre_lox for more details).

Parameters:
  • genome (_Dseqrecord) – Target genome sequence

  • inserts (list[_Dseqrecord] or _Dseqrecord) – DNA fragment(s) to insert

Returns:

List of integrated DNA molecules

Return type:

list[_Dseqrecord]

Examples

Below an example of reversible integration and excision.

>>> from pydna.dseqrecord import Dseqrecord
>>> from pydna.assembly2 import cre_lox_integration, cre_lox_excision
>>> from pydna.cre_lox import LOXP_SEQUENCE
>>> a = Dseqrecord(f"cccccc{LOXP_SEQUENCE}aaaaa")
>>> b = Dseqrecord(f"{LOXP_SEQUENCE}bbbbb", circular=True)
>>> [a, b]
[Dseqrecord(-45), Dseqrecord(o39)]
>>> res = cre_lox_integration(a, [b])
>>> res
[Dseqrecord(-84)]
>>> res2 = cre_lox_excision(res[0])
>>> res2
[Dseqrecord(o39), Dseqrecord(-45)]

Below an example with lox66 and lox71 (irreversible integration). Here, the result of excision is still returned because there is a low probability of it happening, but it’s considered a rare event.

>>> lox66 = 'ATAACTTCGTATAGCATACATTATACGAACGGTA'
>>> lox71 = 'TACCGTTCGTATAGCATACATTATACGAAGTTAT'
>>> a = Dseqrecord(f"cccccc{lox66}aaaaa")
>>> b = Dseqrecord(f"{lox71}bbbbb", circular=True)
>>> res = cre_lox_integration(a, [b])
>>> res
[Dseqrecord(-84)]
>>> res2 = cre_lox_excision(res[0])
>>> res2
[Dseqrecord(o39), Dseqrecord(-45)]
pydna.assembly2.cre_lox_excision(genome: Dseqrecord) list[Dseqrecord][source]

Returns the products for CRE-lox excision.

Parameters:

genome (_Dseqrecord) – Target genome sequence

Returns:

List containing excised plasmid and remaining genome sequence

Return type:

list[_Dseqrecord]

Examples

Below an example of reversible integration and excision.

>>> from pydna.dseqrecord import Dseqrecord
>>> from pydna.assembly2 import cre_lox_integration, cre_lox_excision
>>> from pydna.cre_lox import LOXP_SEQUENCE
>>> a = Dseqrecord(f"cccccc{LOXP_SEQUENCE}aaaaa")
>>> b = Dseqrecord(f"{LOXP_SEQUENCE}bbbbb", circular=True)
>>> [a, b]
[Dseqrecord(-45), Dseqrecord(o39)]
>>> res = cre_lox_integration(a, [b])
>>> res
[Dseqrecord(-84)]
>>> res2 = cre_lox_excision(res[0])
>>> res2
[Dseqrecord(o39), Dseqrecord(-45)]

Below an example with lox66 and lox71 (irreversible integration). Here, the result of excision is still returned because there is a low probability of it happening, but it’s considered a rare event.

>>> lox66 = 'ATAACTTCGTATAGCATACATTATACGAACGGTA'
>>> lox71 = 'TACCGTTCGTATAGCATACATTATACGAAGTTAT'
>>> a = Dseqrecord(f"cccccc{lox66}aaaaa")
>>> b = Dseqrecord(f"{lox71}bbbbb", circular=True)
>>> res = cre_lox_integration(a, [b])
>>> res
[Dseqrecord(-84)]
>>> res2 = cre_lox_excision(res[0])
>>> res2
[Dseqrecord(o39), Dseqrecord(-45)]