Tapir currently indexes a sufficiently representative sample of known, annotated reference DNA for our own research and use of sequencing data. We are growing this reference set quickly (it has doubled every 2-3 months), but do not hesitate to contact us if you would like specific reference DNA of interest added right away, or if you would like general information about the site.
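
To make the quoted growth rate concrete: a set that doubles every d months grows by a factor of 2^(m/d) over m months. The short Python sketch below works this out for both ends of the 2-3 month range. The function name `growth_factor` is ours, and the numbers only illustrate the stated rate under the assumption that it holds steady; they are not a forecast of Tapir's size.

```python
def growth_factor(months: float, doubling_months: float) -> float:
    """Factor by which a set grows over `months` if it doubles every `doubling_months`."""
    return 2 ** (months / doubling_months)


if __name__ == "__main__":
    # Both ends of the "every 2-3 months" range, projected over one year.
    for doubling in (2, 3):
        factor = growth_factor(12, doubling)
        print(f"doubling every {doubling} months -> x{factor:g} per year")
```

Over one year this gives a factor of 16 for a 3-month doubling time and 64 for a 2-month doubling time.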

We are working on a comprehensive search interface for checking whether an organism of interest is already indexed. In the meantime, the following table gives an overview of the sources of reference DNA.

Sources of reference DNA
Name                        # references     # DNA bases
HIV                                4,053      36,471,153
Phage genomes                      1,078      59,538,128
Viral genomes                      3,464      64,859,892
Bacterial genomes                    747   2,418,028,337
Bacterial genes                5,218,077   4,963,568,551
Bacterial genomes (NCBI)           4,693   8,584,324,670
Viral genomes (NCBI)               1,750      60,637,755
Fungi                            202,270     298,736,207
Human Microbiome sequences     1,653,700   1,490,442,185
Plasmids                         159,705     132,800,479
Virii                             78,630      65,110,952
Homo sapiens (Hg19)                3,134   2,844,000,504
Mus musculus                         305   2,745,142,291
Plant (RefSeq)                   558,267   8,622,349,159
Invertebrates (Genbank)        1,123,813  18,429,666,992
Protozoa (Genbank)                47,275   1,997,449,553
Fungi (Genbank)                      200     242,402,709
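
For readers who want to work with these figures, the table can be aggregated with a few lines of Python. This is only a sketch: the names `TABLE`, `parse_rows`, and `totals` are ours and are not part of Tapir, and the parsing assumes each row ends with two comma-grouped integers, as in the table above. The sources may overlap (for example, the two bacterial genome sets), so the sums are best read as upper bounds rather than counts of distinct sequences.

```python
# Rows copied from the table above: name, number of references, number of DNA bases.
TABLE = """\
HIV 4,053 36,471,153
Phage genomes 1,078 59,538,128
Viral genomes 3,464 64,859,892
Bacterial genomes 747 2,418,028,337
Bacterial genes 5,218,077 4,963,568,551
Bacterial genomes (NCBI) 4,693 8,584,324,670
Viral genomes (NCBI) 1,750 60,637,755
Fungi 202,270 298,736,207
Human Microbiome sequences 1,653,700 1,490,442,185
Plasmids 159,705 132,800,479
Virii 78,630 65,110,952
Homo sapiens (Hg19) 3,134 2,844,000,504
Mus musculus 305 2,745,142,291
Plant (RefSeq) 558,267 8,622,349,159
Invertebrates (Genbank) 1,123,813 18,429,666,992
Protozoa (Genbank) 47,275 1,997,449,553
Fungi (Genbank) 200 242,402,709
"""


def parse_rows(text):
    """Return a list of (name, n_references, n_bases) tuples."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The name may contain spaces, so split off the last two fields only.
        name, refs, bases = line.rsplit(None, 2)
        rows.append((name, int(refs.replace(",", "")), int(bases.replace(",", ""))))
    return rows


def totals(rows):
    """Sum the reference and base counts over all rows."""
    return sum(r[1] for r in rows), sum(r[2] for r in rows)


if __name__ == "__main__":
    rows = parse_rows(TABLE)
    n_refs, n_bases = totals(rows)
    largest = max(rows, key=lambda r: r[2])
    print(f"{len(rows)} sources, {n_refs:,} references, {n_bases:,} bases")
    print(f"largest source by bases: {largest[0]}")
```

Summed naively, the 17 sources come to roughly 9.06 million references and 53.1 billion bases, with the Invertebrates (Genbank) set contributing the most bases.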