# Designing Primers for a Kozak sequence library

```python
%%capture
# Install pydna (only when running on Colab)
import sys
if 'google.colab' in sys.modules:
    %pip install pydna[clipboard,download,express,gel] teemi
```

> ⚠️ This notebook uses the extra dependency `teemi`. Run it in Google Colab, or in an environment where both `teemi` and `pydna` are installed.

In this notebook we explore the combinatorial space of the most abundant Kozak sequences and design repair primers for the experiments. We use pydna to simulate the CRISPR experiment and the homology-directed repair with the oligos we design.

### About Kozak sequences

Kozak sequences are short hexamer motifs (often GCC(A/G)CC) located immediately 5′ of the start codon in eukaryotic mRNAs, where they enhance translation initiation. They are interesting because they help determine how efficiently a gene is translated into protein: strong Kozak motifs boost translation, while weak ones can limit it. This makes them important for gene regulation in biotechnology. In this case, imagine we are using K. phaffii or your favourite protein production host.
Happy bioengineering!

### Combinatorial space

To limit the search space, we restrict the combinatorial space to the nucleotides that were most abundant at each position in the PWM analysis:

```python
position1 = ["C", "T", "A"]
position2 = ["C", "A", "G"]
position3 = ["A", "G"]
position4 = ["C", "T", "A"]
position5 = ["C", "G"]
position6 = ["C"]

nucleotide_list = [position1, position2, position3, position4, position5, position6]
```

```python
from teemi.design.combinatorial_design import get_combinatorial_list
```

```python
# Make all combinations
kozak = get_combinatorial_list(nucleotide_list)
print(f'{len(kozak)} combinations generated')
kozak[:5]
```

    108 combinations generated

    [('C', 'C', 'A', 'C', 'C', 'C'),
     ('C', 'C', 'A', 'C', 'G', 'C'),
     ('C', 'C', 'A', 'T', 'C', 'C'),
     ('C', 'C', 'A', 'T', 'G', 'C'),
     ('C', 'C', 'A', 'A', 'C', 'C')]

```python
# Make them into strings
def make_to_string(list_of_list):
    """Join each inner sequence of characters into a single string."""
    return [''.join(sp) for sp in list_of_list]

all_combinations_as_str = make_to_string(kozak)
all_combinations_as_str[:5]
```

    ['CCACCC', 'CCACGC', 'CCATCC', 'CCATGC', 'CCAACC']

# Making primers for homology-directed repair with oligos

This is a dummy example in which we want to test Kozak sequences for a GFP gene that has been integrated into K. phaffii.

```python
from pydna.dseqrecord import Dseqrecord
from pydna.crispr import cas9, protospacer
from pydna.genbank import Genbank

# Initialize your favourite gene
gb = Genbank("myself@email.com")  # Tell Genbank who you are!
gene = gb.nucleotide("LN515608.1")  # Synthetic construct for Aequorea victoria partial gfp gene for GFP
target_dseq = Dseqrecord(gene)
print(target_dseq)
```

    Dseqrecord
    circular: False
    size: 735
    ID: LN515608.1
    Name: LN515608
    Description: Synthetic construct for Aequorea victoria partial gfp gene for GFP
    Number of features: 4
    /molecule_type=DNA
    /topology=linear
    /data_file_division=SYN
    /date=03-MAR-2015
    /accessions=['LN515608']
    /sequence_version=1
    /keywords=['']
    /source=synthetic construct
    /organism=synthetic construct
    /taxonomy=['other sequences', 'artificial sequences']
    /references=[Reference(title='XerC-mediated DNA inversion at the inverted repeats of the UU172-phase-variable element of Ureaplasma parvum serovar 3', ...), Reference(title='Direct Submission', ...)]
    Dseq(-735)
    AGTA..CTAG
    TCAT..GATC

```python
promoter_region = Dseqrecord('GACGCACCAATCTAGCACAGGCACAGTGTTAACTAGATCTCAACCCTTACCCAAGTCAGAGCCGCAGAGATTGGCAACAAACTCTAGAAACCCGGGGCACGAGGACAATATGAGCTGTGCAGGCTGGTCGAGACTCGTCTAGTTGGTATTACGGTACTAGACGTCGTTGTATCCTTAGGGGACTAGAGTCAGGTAGGTAATAGGGGGTTCCCCTATCTATTATATTTAACTAGTGATACCTTCTCGAACTGTGTGAGCTGCTGCCTCAGCGAATTTCGTTCTGGACCggTACGTGTGT')
full_seq = promoter_region + target_dseq
full_seq
```

    Dseqrecord(-1033)

We know from previous experiments that this sgRNA works very well:

```python
sgRNA = 'AGCGAATTTCGTTCTGGAC'
```

Let's simulate how it cuts our construct.

```python
# Choose guides
guide = ["AGCGAATTTCGTTCTGGAC"]

# Create an enzyme object with the protospacer
enzyme = cas9(guide[0])

# Simulate the cut with the enzyme
print('cutting with guide:', full_seq.cut(enzyme))
```

    cutting with guide: (Dseqrecord(-284), Dseqrecord(-749))

Let's design the 5′ and 3′ ends of the repair oligos. We keep the oligos at roughly 60 nt so that they remain affordable to synthesize.
```python
# 30 bases of the promoter, ending 20 bases upstream of the gene
five_prime = promoter_region[-50:-20]
# The first 30 bases of the gene
three_prime = target_dseq[:30]

print(f'Five prime end of the repair oligos is {len(five_prime)} bases')
print(f'Three prime end of the repair oligos is {len(three_prime)} bases')
```

    Five prime end of the repair oligos is 30 bases
    Three prime end of the repair oligos is 30 bases

```python
import pandas as pd

# Repeat each homology arm once per Kozak variant
five_prime_list = [str(five_prime.seq)] * len(all_combinations_as_str)
three_prime_list = [str(three_prime.seq)] * len(all_combinations_as_str)

# Collect everything in a dataframe and concatenate the parts into the oligo
my_dict = {'five': five_prime_list, 'kozak': all_combinations_as_str, 'three': three_prime_list}
kozak_df = pd.DataFrame(my_dict)
kozak_df['primer'] = kozak_df['five'] + kozak_df['kozak'] + kozak_df['three']
kozak_df
```

| | five | kozak | three | primer |
|---|---|---|---|---|
| 0 | CTGTGTGAGCTGCTGCCTCAGCGAATTTCG | CCACCC | AGTAAAGGAGAAGAACTTTTCACTGGAGTT | CTGTGTGAGCTGCTGCCTCAGCGAATTTCGCCACCCAGTAAAGGAG... |
| 1 | CTGTGTGAGCTGCTGCCTCAGCGAATTTCG | CCACGC | AGTAAAGGAGAAGAACTTTTCACTGGAGTT | CTGTGTGAGCTGCTGCCTCAGCGAATTTCGCCACGCAGTAAAGGAG... |
| 2 | CTGTGTGAGCTGCTGCCTCAGCGAATTTCG | CCATCC | AGTAAAGGAGAAGAACTTTTCACTGGAGTT | CTGTGTGAGCTGCTGCCTCAGCGAATTTCGCCATCCAGTAAAGGAG... |
| 3 | CTGTGTGAGCTGCTGCCTCAGCGAATTTCG | CCATGC | AGTAAAGGAGAAGAACTTTTCACTGGAGTT | CTGTGTGAGCTGCTGCCTCAGCGAATTTCGCCATGCAGTAAAGGAG... |
| 4 | CTGTGTGAGCTGCTGCCTCAGCGAATTTCG | CCAACC | AGTAAAGGAGAAGAACTTTTCACTGGAGTT | CTGTGTGAGCTGCTGCCTCAGCGAATTTCGCCAACCAGTAAAGGAG... |
| ... | ... | ... | ... | ... |
| 103 | CTGTGTGAGCTGCTGCCTCAGCGAATTTCG | AGGCGC | AGTAAAGGAGAAGAACTTTTCACTGGAGTT | CTGTGTGAGCTGCTGCCTCAGCGAATTTCGAGGCGCAGTAAAGGAG... |
| 104 | CTGTGTGAGCTGCTGCCTCAGCGAATTTCG | AGGTCC | AGTAAAGGAGAAGAACTTTTCACTGGAGTT | CTGTGTGAGCTGCTGCCTCAGCGAATTTCGAGGTCCAGTAAAGGAG... |
| 105 | CTGTGTGAGCTGCTGCCTCAGCGAATTTCG | AGGTGC | AGTAAAGGAGAAGAACTTTTCACTGGAGTT | CTGTGTGAGCTGCTGCCTCAGCGAATTTCGAGGTGCAGTAAAGGAG... |
| 106 | CTGTGTGAGCTGCTGCCTCAGCGAATTTCG | AGGACC | AGTAAAGGAGAAGAACTTTTCACTGGAGTT | CTGTGTGAGCTGCTGCCTCAGCGAATTTCGAGGACCAGTAAAGGAG... |
| 107 | CTGTGTGAGCTGCTGCCTCAGCGAATTTCG | AGGAGC | AGTAAAGGAGAAGAACTTTTCACTGGAGTT | CTGTGTGAGCTGCTGCCTCAGCGAATTTCGAGGAGCAGTAAAGGAG... |

108 rows × 4 columns
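
As a sanity check that does not depend on `teemi` or pydna, the same library can be rebuilt with the standard library alone. This is a minimal sketch, assuming that `get_combinatorial_list` returns the Cartesian product of the position lists (which is consistent with the 108 combinations and the ordering printed earlier). The two homology arms are copied from the table above, and the helper name `build_repair_oligos` is hypothetical.

```python
from itertools import product

# Allowed nucleotides per Kozak position (copied from the notebook)
POSITIONS = [
    ["C", "T", "A"],
    ["C", "A", "G"],
    ["A", "G"],
    ["C", "T", "A"],
    ["C", "G"],
    ["C"],
]

# Homology arms as they appear in the table above
FIVE_PRIME = "CTGTGTGAGCTGCTGCCTCAGCGAATTTCG"
THREE_PRIME = "AGTAAAGGAGAAGAACTTTTCACTGGAGTT"


def build_repair_oligos(positions, five_prime, three_prime):
    """Return {kozak: oligo} for every combination of the position lists.

    Hypothetical helper: assumes the library is the plain Cartesian product.
    """
    kozaks = ["".join(combo) for combo in product(*positions)]
    return {k: five_prime + k + three_prime for k in kozaks}


repair_oligos = build_repair_oligos(POSITIONS, FIVE_PRIME, THREE_PRIME)

# Every Kozak variant should be unique and every oligo the same length
assert len(repair_oligos) == 108
assert {len(oligo) for oligo in repair_oligos.values()} == {66}

print(list(repair_oligos)[:5])
# ['CCACCC', 'CCACGC', 'CCATCC', 'CCATGC', 'CCAACC']
```

If the counts or the ordering ever differ from the dataframe, that would suggest the assumption about `get_combinatorial_list` does not hold and is worth investigating before ordering oligos.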

```python
from Bio.Seq import Seq

# Turn each primer into a pydna Dseqrecord
oligos = []
for i, row in kozak_df.iterrows():
    repair_oligo = Dseqrecord(
        Seq(row['primer']),
        id=f"kozak_repair_oligo_{i+1}",
        name=f"Repair oligo {i+1} for kozak experiment",
        description="Designed repair oligo from DataFrame"
    )
    oligos.append(repair_oligo)
```

```python
# Inspect the first oligo (not the loop variable, which still points at the last one)
print(f'{oligos[0].name} : {oligos[0].seq} ')
print(f'{oligos[0].name} length : {len(oligos[0].seq)} ')
```

    Repair oligo 1 for kozak experiment : CTGTGTGAGCTGCTGCCTCAGCGAATTTCGCCACCCAGTAAAGGAGAAGAACTTTTCACTGGAGTT 
    Repair oligo 1 for kozak experiment length : 66 

```python
from pydna.assembly2 import in_vivo_assembly

# Cut the construct once and reuse the two fragments
upstream, downstream = full_seq.cut(enzyme)

# Loop through all repair oligos and assemble each with the cut construct
assembled_KOs = []
for repair_oligo in oligos:
    products = in_vivo_assembly((upstream, repair_oligo, downstream), limit=30)
    assembled_KOs.append(products[0])

# assembled_KOs now contains one assembly per oligo; print the products of the last one
for p in products:
    print(p)
```

    Dseqrecord
    circular: False
    size: 1019
    ID: id
    Name: name
    Description: description
    Number of features: 4
    /molecule_type=DNA
    Dseq(-1019)
    GACG..CTAG
    CTGC..GATC

The product is 1019 bp, as expected: the 20 bp between the two homology arms, which contain the PAM, are replaced by the 6-nt Kozak variant (1033 − 20 + 6 = 1019).

# They seem to repair as expected, so let's buy them from IDT

```python
oligo_data = []
for oligo in oligos:
    oligo_data.append({
        "Oligo Name": oligo.id,
        "Sequence": str(oligo.seq),
        # Optional IDT columns (set your preferred defaults)
        "Scale": "25nm",
        "Purification": "STD"
    })

idt_df = pd.DataFrame(oligo_data)
idt_df
```

| | Oligo Name | Sequence | Scale | Purification |
|---|---|---|---|---|
| 0 | kozak_repair_oligo_1 | CTGTGTGAGCTGCTGCCTCAGCGAATTTCGCCACCCAGTAAAGGAG... | 25nm | STD |
| 1 | kozak_repair_oligo_2 | CTGTGTGAGCTGCTGCCTCAGCGAATTTCGCCACGCAGTAAAGGAG... | 25nm | STD |
| 2 | kozak_repair_oligo_3 | CTGTGTGAGCTGCTGCCTCAGCGAATTTCGCCATCCAGTAAAGGAG... | 25nm | STD |
| 3 | kozak_repair_oligo_4 | CTGTGTGAGCTGCTGCCTCAGCGAATTTCGCCATGCAGTAAAGGAG... | 25nm | STD |
| 4 | kozak_repair_oligo_5 | CTGTGTGAGCTGCTGCCTCAGCGAATTTCGCCAACCAGTAAAGGAG... | 25nm | STD |
| ... | ... | ... | ... | ... |
| 103 | kozak_repair_oligo_104 | CTGTGTGAGCTGCTGCCTCAGCGAATTTCGAGGCGCAGTAAAGGAG... | 25nm | STD |
| 104 | kozak_repair_oligo_105 | CTGTGTGAGCTGCTGCCTCAGCGAATTTCGAGGTCCAGTAAAGGAG... | 25nm | STD |
| 105 | kozak_repair_oligo_106 | CTGTGTGAGCTGCTGCCTCAGCGAATTTCGAGGTGCAGTAAAGGAG... | 25nm | STD |
| 106 | kozak_repair_oligo_107 | CTGTGTGAGCTGCTGCCTCAGCGAATTTCGAGGACCAGTAAAGGAG... | 25nm | STD |
| 107 | kozak_repair_oligo_108 | CTGTGTGAGCTGCTGCCTCAGCGAATTTCGAGGAGCAGTAAAGGAG... | 25nm | STD |

108 rows × 4 columns
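
To place an order, the table usually has to leave the notebook as a file. The sketch below writes records shaped like `oligo_data` to CSV text using only the standard library. It assumes that a plain CSV with these four columns is acceptable for upload; the template the vendor actually expects is not specified in this notebook, so check it before ordering. The function name `oligos_to_csv` and the two example records are hypothetical stand-ins for the full list.

```python
import csv
import io

# Column order used in the ordering table above
COLUMNS = ["Oligo Name", "Sequence", "Scale", "Purification"]


def oligos_to_csv(records, columns=COLUMNS):
    """Serialize a list of oligo dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Two example records in the same shape as `oligo_data`
example_records = [
    {
        "Oligo Name": "kozak_repair_oligo_1",
        "Sequence": "CTGTGTGAGCTGCTGCCTCAGCGAATTTCGCCACCCAGTAAAGGAGAAGAACTTTTCACTGGAGTT",
        "Scale": "25nm",
        "Purification": "STD",
    },
    {
        "Oligo Name": "kozak_repair_oligo_2",
        "Sequence": "CTGTGTGAGCTGCTGCCTCAGCGAATTTCGCCACGCAGTAAAGGAGAAGAACTTTTCACTGGAGTT",
        "Scale": "25nm",
        "Purification": "STD",
    },
]

csv_text = oligos_to_csv(example_records)
print(csv_text)
```

Inside the notebook, where pandas is already imported, `idt_df.to_csv("kozak_repair_oligos.csv", index=False)` writes the full table in one call; the file name here is only an example.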