pydna.genbank

This module provides a class called Genbank for downloading sequences from Genbank, and a function called genbank that does the same thing.

The function can be used once the environment variable pydna_email has been set to a valid email address. The easiest way to set it permanently is to edit the pydna.ini file. See the documentation of pydna.open_config_folder().
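
Before calling genbank(), it can help to confirm that pydna_email is actually set. The helper below, get_pydna_email, is hypothetical and not part of pydna. It only reads the environment variable and applies a deliberately loose check, so it cannot tell whether the address is real.

```python
import os


def get_pydna_email() -> str:
    """Return the address stored in pydna_email, or raise if it looks unset.

    Hypothetical helper for illustration; not part of pydna.
    """
    email = os.environ.get("pydna_email", "")
    # Loose sanity check: text before "@" and a dot somewhere after it.
    local, sep, domain = email.partition("@")
    if not (sep and local and "." in domain):
        raise RuntimeError(
            "Set the pydna_email environment variable to a valid email address."
        )
    return email


os.environ["pydna_email"] = "someone@example.com"  # placeholder address
print(get_pydna_email())
```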

class pydna.genbank.Genbank(users_email: str, *, tool: str = 'pydna')

Bases: object

Class to facilitate downloads from Genbank. It is easier and quicker to use the pydna.genbank.genbank() function directly.

Parameters:

users_email (str) – Has to be a valid email address. You should always tell Genbank who you are so that they can contact you.

Examples

>>> from pydna.genbank import Genbank
>>> gb = Genbank("bjornjobb@gmail.com")
>>> rec = gb.nucleotide("LP002422.1")   # <- entry from genbank
>>> print(len(rec))
1

nucleotide(item: str, seq_start: int | None = None, seq_stop: int | None = None, strand: Literal[1, 2] = 1) → GenbankRecord

This method downloads a nucleotide record from Genbank. The method is cached by default; caching is controlled by the pydna_cached_funcs environment variable. The best way to change this permanently is to edit the pydna.ini file. See the documentation of pydna.open_config_folder().
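
The idea of switching caching on and off through an environment variable can be illustrated with a small decorator. This is a sketch only: cached_if_listed and the name "pydna_genbank" are hypothetical, the variable is assumed here to hold a comma-separated list of names, and results are kept in memory with functools.lru_cache. This is not pydna's implementation; configure the real cache through pydna.ini as described above.

```python
import functools
import os


def cached_if_listed(name):
    """Hypothetical decorator: use a cache only when `name` appears in the
    comma-separated pydna_cached_funcs variable (an assumed format)."""
    def decorator(func):
        cached = functools.lru_cache(maxsize=None)(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # The variable is read on every call, so changes apply immediately.
            listed = os.environ.get("pydna_cached_funcs", "").split(",")
            if name in (entry.strip() for entry in listed):
                return cached(*args, **kwargs)
            return func(*args, **kwargs)
        return wrapper
    return decorator


calls = []


@cached_if_listed("pydna_genbank")
def fake_download(accession):
    calls.append(accession)  # record every real "download"
    return accession.lower()


os.environ["pydna_cached_funcs"] = "pydna_genbank"
fake_download("CS570233.1")
fake_download("CS570233.1")
print(len(calls))  # 1: the second call is served from the cache
```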

Item is a string containing one Genbank accession number for a nucleotide record. Genbank nucleotide accession numbers have one of these formats:

A12345 = 1 letter + 5 numerals
AB123456 = 2 letters + 6 numerals

The accession number is sometimes followed by a period and a version number, for example:

BK006936.2
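
The two formats above, with the optional version suffix, translate directly into a regular expression. The sketch below (is_plain_accession is a hypothetical name) covers only those two formats; other identifiers accepted by the method, such as the RefSeq id NM_005546 or the GI number 21614549 used further down, deliberately do not match.

```python
import re

# 1 letter + 5 numerals, or 2 letters + 6 numerals, optionally ".version".
ACCESSION = re.compile(r"(?:[A-Z]\d{5}|[A-Z]{2}\d{6})(?:\.\d+)?")


def is_plain_accession(item: str) -> bool:
    """Hypothetical check for the two accession formats listed above."""
    return ACCESSION.fullmatch(item) is not None


print(is_plain_accession("BK006936.2"), is_plain_accession("NM_005546"))  # True False
```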

Item can also contain optional interval information in the following formats:

BK006936.2 REGION: complement(613900..615202)
NM_005546 REGION: 1..100
NM_005546 REGION: complement(1..100)
21614549:1-100
21614549:c100-1
21614549 1-100
21614549 c100-1

Setting an interval is useful for large Genbank records because it limits the download time. Strings with interval information like those above can be obtained directly by looking up an entry in Genbank and using "Change region shown" on the upper right side of the page. The ACCESSION line of the displayed Genbank file will then have the format shown.

Alternatively, seq_start and seq_stop can be set explicitly to the sequence intervals to be downloaded.
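
To show how the interval strings relate to seq_start, seq_stop and strand, the hypothetical helper parse_item below splits each format listed above into keyword arguments named like the nucleotide() parameters. Normalising complement intervals to seq_start <= seq_stop with strand 2 is an assumption of this sketch, not documented pydna behaviour. The method accepts the strings directly, so a helper like this is only needed if you want the parts separately.

```python
import re

# "ID REGION: a..b" or "ID REGION: complement(a..b)"
REGION = re.compile(
    r"(?P<id>\S+)\s+REGION:\s*(?P<comp>complement\()?"
    r"(?P<a>\d+)\.\.(?P<b>\d+)(?(comp)\))"
)
# "ID:a-b", "ID:ca-b", "ID a-b" or "ID ca-b"
SHORT = re.compile(r"(?P<id>[^\s:]+)[:\s](?P<comp>c)?(?P<a>\d+)-(?P<b>\d+)")


def parse_item(text: str) -> dict:
    """Hypothetical: split an item string into nucleotide()-style kwargs."""
    text = text.strip()
    m = REGION.fullmatch(text) or SHORT.fullmatch(text)
    if m is None:
        return {"item": text}  # no interval information present
    a, b = int(m["a"]), int(m["b"])
    return {
        "item": m["id"],
        "seq_start": min(a, b),
        "seq_stop": max(a, b),
        "strand": 2 if m["comp"] else 1,  # complement -> antisense strand
    }


print(parse_item("21614549:c100-1"))
# {'item': '21614549', 'seq_start': 1, 'seq_stop': 100, 'strand': 2}
```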

If strand is 2, “c”, “C”, “crick”, “Crick”, “antisense”, “Antisense”, “2”, “-” or “-1”, the antisense (Crick) strand is returned; otherwise the sense (Watson) strand is returned.
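
The rule above amounts to a membership test against a fixed set of aliases. The sketch below mirrors that list with a hypothetical normalise_strand helper; pydna's internal handling may differ in detail.

```python
# Aliases that select the antisense (Crick) strand, taken from the list above.
ANTISENSE = {"c", "C", "crick", "Crick", "antisense", "Antisense", "2", 2, "-", "-1"}


def normalise_strand(strand) -> int:
    """Hypothetical: map a strand alias to 2 (antisense) or 1 (sense)."""
    return 2 if strand in ANTISENSE else 1


print(normalise_strand("Crick"), normalise_strand("+"))  # 2 1
```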

The result is returned as a pydna.genbankrecord.GenbankRecord object.

pydna.genbank.genbank(accession: str = 'CS570233.1', *args, **kwargs) → GenbankRecord

Download a Genbank nucleotide record.

This function takes the same parameters as the pydna.genbank.Genbank.nucleotide method. The email address stored in the pydna_email environment variable is used. The easiest way to set this permanently is to edit the pydna.ini file. See the documentation of pydna.open_config_folder().

If no accession is given, a very short Genbank entry, CS570233.1, is used as an example (see below). This can be useful for testing the connection to Genbank.
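
The delegation described above (read pydna_email, build a Genbank object, pass everything else to nucleotide) can be sketched without network access by using a stand-in class. StubGenbank and genbank_sketch are hypothetical names; this mirrors the documented behaviour and is not pydna's source.

```python
import os


class StubGenbank:
    """Stand-in for pydna.genbank.Genbank that returns its arguments
    instead of contacting Genbank (hypothetical, for illustration only)."""

    def __init__(self, users_email: str, *, tool: str = "pydna"):
        self.email = users_email
        self.tool = tool

    def nucleotide(self, item, seq_start=None, seq_stop=None, strand=1):
        return (self.email, item, seq_start, seq_stop, strand)


def genbank_sketch(accession: str = "CS570233.1", *args, **kwargs):
    # Read the address at call time, then pass all remaining
    # arguments through to the method unchanged.
    email = os.environ["pydna_email"]
    return StubGenbank(email).nucleotide(accession, *args, **kwargs)


os.environ["pydna_email"] = "someone@example.com"  # placeholder address
print(genbank_sketch())
# ('someone@example.com', 'CS570233.1', None, None, 1)
```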

Note that this result is also cached by default, according to the settings in the pydna.ini file. See the documentation of pydna.open_config_folder().

LOCUS       CS570233                  14 bp    DNA     linear   PAT 18-MAY-2007
DEFINITION  Sequence 6 from Patent WO2007025016.
ACCESSION   CS570233
VERSION     CS570233.1
KEYWORDS    .
SOURCE      synthetic construct
  ORGANISM  synthetic construct
            other sequences; artificial sequences.
REFERENCE   1
  AUTHORS   Shaw,R.W. and Cottenoir,M.
  TITLE     Inhibition of metallo-beta-lactamase by double-stranded dna
  JOURNAL   Patent: WO 2007025016-A1 6 01-MAR-2007;
            Texas Tech University System (US)
FEATURES             Location/Qualifiers
     source          1..14
                     /organism="synthetic construct"
                     /mol_type="unassigned DNA"
                     /db_xref="taxon:32630"
                     /note="This is a 14bp aptamer inhibitor."
ORIGIN
        1 atgttcctac atga
//
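
As a quick offline check of the flat-file layout above, the sequence length on the LOCUS line should match the number of bases in the ORIGIN block. The sketch below is a simplified, hypothetical reader for this record only (origin_sequence and locus_length are illustrative names), not a general GenBank parser; use pydna or Biopython for real records.

```python
# Abridged copy of the example record: only the lines the sketch needs.
RECORD = """LOCUS       CS570233                  14 bp    DNA     linear   PAT 18-MAY-2007
ORIGIN
        1 atgttcctac atga
//
"""


def origin_sequence(flatfile: str) -> str:
    """Collect the sequence letters between ORIGIN and the closing //."""
    seq = []
    in_origin = False
    for line in flatfile.splitlines():
        if line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            break
        elif in_origin:
            # Drop the position numbers and spaces; keep only the bases.
            seq.append("".join(ch for ch in line if ch.isalpha()))
    return "".join(seq)


def locus_length(flatfile: str) -> int:
    """Read the length given just before "bp" on the LOCUS line."""
    tokens = flatfile.splitlines()[0].split()
    return int(tokens[tokens.index("bp") - 1])


print(origin_sequence(RECORD), locus_length(RECORD))  # atgttcctacatga 14
```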