Access Content

Genomic Annotations


In collaboration with colleagues from Princeton University, we have been studying the genome of Herpesvirus 5. You can browse and download the results of our analyses by visiting this IBM DB2-based repository.

Additionally, we make available automatically generated annotations for the proteomes of more than 120 complete genomes. This genomic repository can be accessed by visiting this web site. Soon, from the same location, we will also make available a DB2-based system that will let users interact with these data and retrieve information by constructing complex queries.




Bio-Dictionaries for Individual Genomes


In a number of publications, we have presented and discussed the idea of the Bio-Dictionary: a collection of recurrent amino acid combinations ('seqlets') that completely cover the sequence space defined by the largest possible collection of amino acid sequences. We normally recompute the contents of the Bio-Dictionary on a regular basis, typically once a year. For several research activities, it is also very useful to compute such amino acid combinations from smaller input datasets, such as the proteome of an individual genome. The following list provides access to Bio-Dictionaries computed from the proteomes of the corresponding genomes:

ARCHAEAL GENOMES
  • M. jannaschii DSM2661 - 1,715 proteins processed (1,318,340 a.a. in total)

  • M. thermoautotrophicum Delta H - 1,869 proteins processed (8,015,019 a.a. in total)

  • A. fulgidus DSM4304 - 2,407 proteins processed (1,350,529 a.a. in total)

  • P. horikoshii OT3 - 2,064 proteins processed (1,048,035 a.a. in total)

  • A. pernix K1 - 2,694 proteins processed (3,099,660 a.a. in total)




BACTERIAL GENOMES
  • H. influenzae KW20 - 1,709 proteins processed (6,475,343 a.a. in total)

  • M. genitalium G37 - 480 proteins processed (1,048,890 a.a. in total)

  • Synechocystis sp. PCC6803 - 3,169 proteins processed (2,931,688 a.a. in total)

  • M. pneumoniae M129 - 677 proteins (1,257,574 a.a. in total)

  • E. coli K12_MG1655 - 4,289 proteins (5,260,277 a.a. in total)

  • H. pylori 26695 - 1,565 proteins (515,475 a.a. in total)

  • B. burgdorferi B31 - 1,255 proteins (5,685,228 a.a. in total)

  • A. aeolicus VF5 - 1,522 proteins (1,229,662 a.a. in total)

  • T. pallidum Nichols - 1,031 proteins (3,677,786 a.a. in total)

  • R. prowazekii Madrid E - 834 proteins (2,545,129 a.a. in total)

  • C. pneumoniae CWL029 - 1,052 proteins (3,413,340 a.a. in total)

  • T. maritima MSB8 - 1,846 proteins (11,101,845 a.a. in total)
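
To make the notion of recurrent amino acid combinations covering a set of sequences more concrete, the following is a minimal Python sketch. It is a deliberate simplification and not the pattern-discovery method used to build the Bio-Dictionary: it treats only fixed-length contiguous substrings (k-mers) as candidate combinations, and the function names, the parameters `k` and `min_support`, and the toy sequences are all hypothetical, chosen for illustration.

```python
from collections import defaultdict


def recurrent_kmers(proteins, k=4, min_support=2):
    """Return the k-mers that occur in at least `min_support` distinct sequences.

    Simplifying assumption: a "recurrent combination" is a contiguous
    substring of fixed length k. This is an illustration only.
    """
    support = defaultdict(set)  # k-mer -> indices of sequences containing it
    for idx, seq in enumerate(proteins):
        for i in range(len(seq) - k + 1):
            support[seq[i:i + k]].add(idx)
    return {kmer for kmer, ids in support.items() if len(ids) >= min_support}


def coverage(proteins, kmers, k=4):
    """Fraction of residues that lie inside at least one recurrent k-mer."""
    covered = 0
    total = 0
    for seq in proteins:
        mask = [False] * len(seq)
        for i in range(len(seq) - k + 1):
            if seq[i:i + k] in kmers:
                for j in range(i, i + k):
                    mask[j] = True
        covered += sum(mask)
        total += len(seq)
    return covered / total if total else 0.0


# Hypothetical toy sequences, not taken from any of the genomes listed above.
toy = ["MKTAYIAKQR", "GSTAYIAKLL", "MKTWWWWWQR"]
seqlets = recurrent_kmers(toy, k=4)
print(sorted(seqlets))
print(coverage(toy, seqlets, k=4))
```

In this toy input, the first two sequences share a stretch that yields the recurrent 4-mers, while the third sequence shares none and so stays uncovered. A collection that completely covers its input, as described above, would correspond to a coverage value of 1.0 in this simplified measure.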