One Thousand Microbial Genomes

GenBank, the central public database for genomic information, just completed its thousandth microbial genome. An interesting Nature News article discusses how this isn’t nearly enough for many evolutionary scientists and microbiologists. According to one scientist: “There have been four billion years of evolution and we can really benefit from having some of that information in our databases.”
There are a few interesting projects going on now to try to sample more of that diversity, including:
- The Human Microbiome Project, which aims to sequence and characterize all of the bacteria living in or on the human body.
- The Beijing Genomics Institute’s Ten Thousand Microbial Genomes Project, which aims to sequence, you guessed it, ten thousand strains of bacteria, archaea, fungi, algae, and viruses from many different environments in China.
- The DOE Joint Genome Institute’s ambitious plan to sequence all 15,000 known bacterial strains that can survive in the laboratory.
- The BioWeatherMap Initiative, which hopes to use a crowd-sourced approach to collecting and sequencing environmental samples from all over, in order to understand the distribution and spread of bacteria and viruses in our world.
- Not for microbes, but an interesting project nonetheless, the Personal Genome Project aims to sequence the genomes of thousands of individual people, in order to improve sequencing technology and better understand how our genetic code affects our health.
What will all of this information be used for? From Nature News:
All these new genomes should improve researchers’ understanding of the evolution, physiology and metabolic capacity of microbes, says Eisen. They will also help match DNA sequences to their proper species from large-scale, high-throughput metagenomic studies from environmental samples, and ultimately contribute in the fields of synthetic biology and genetic engineering.
In synthetic biology, we use genomic information as a starting point for the design of new biological pathways, and more genomic data tied to more evolutionary, metabolic, and biochemical information will vastly improve our ability to design new systems. Using gene synthesis technology, we can make genes that come from any organism, even those that cannot be cultured in the lab, and express them in an organism that is easy and safe to work with in the lab, like E. coli. We can thus pick and choose genes out of the genomic data from multiple organisms in order to design a pathway the way we want.
Most of the time, however, we can’t really tell how the genes will work together from only the sequence information. We can identify genes that do the same thing in different species, but which one will work the best in our chosen strain? At this point, the best way to know is still trial-and-error. A great recent paper by Travis Bayer from Chris Voigt’s lab used just such an approach to find which of 89 enzymes was the best at producing methyl halides (an important precursor for the production of fuels and other valuable chemicals) in E. coli. They started by mining the database of sequenced genomes for enzymes that matched known methyl halide transferases, synthesizing the genes of a large representative set, and then testing all of them until they found the best. This kind of large-scale synthetic approach, in conjunction with more and better genomic data and bioinformatics tools, will be vital to the ability of synthetic biology to achieve its goals.
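To make the mine-then-test idea concrete, here is a rough Python sketch of the two steps. It is not the method from the paper: the sequences, enzyme names, and activity numbers are all made up for illustration, and the similarity score from Python's difflib is only a crude stand-in for a real homology search against a genome database.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Crude similarity score between 0 and 1 (a stand-in for a real homology search)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def mine_candidates(query: str, database: dict, min_similarity: float = 0.5) -> list:
    """Return the names of database sequences that resemble the query sequence."""
    return [
        name
        for name, seq in database.items()
        if similarity(query, seq) >= min_similarity
    ]


def pick_best(candidates: list, measured_activity: dict) -> str:
    """Return the candidate with the highest measured activity."""
    return max(candidates, key=lambda name: measured_activity[name])


# Hypothetical toy data: short made-up protein fragments, not real enzymes.
query = "MKTAYIAKQR"
database = {
    "enzyme_A": "MKTAYIAKQR",  # identical to the query
    "enzyme_B": "MKTAYLAKQR",  # one substitution
    "enzyme_C": "GGGGSGGGGS",  # unrelated sequence
}

# Step 1: mine the "database" for sequences that look like the known enzyme.
candidates = mine_candidates(query, database)

# Step 2: the trial-and-error part. In the lab, each candidate gene would be
# synthesized and assayed; these readings are invented placeholders.
measured_activity = {"enzyme_A": 1.0, "enzyme_B": 3.5}

best = pick_best(candidates, measured_activity)
print(candidates, best)
```

The point of the sketch is that sequence similarity only narrows the list. The winner is decided by the measurements, which is why the most similar sequence is not necessarily the one that gets picked.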