Showing posts from 2012

Quick update - our paper is out

The contents of this blog, plus more detailed analyses and text, are available in our BMC Microbiology paper. One reviewer had a very interesting suggestion: they asked us to add the average bootstrap scores to our heat map figure so that readers could get a sense of sequences that may be "novel", that is, sequences that the RDPII-NBC hadn't seen before. See the PLoS ONE article by Lan et al. As a result you can clearly see that the greengenes training set is the most diverse and best at capturing the diversity found within the honey bee gut (Figure 2A). That said, a fair number of unique sequences (>1000 out of ~4000) are still unclassified using this training set. The classifications improve with the addition of honey bee gut specific sequences, as do average bootstrap scores (Figure 2B). Also interesting, and expected, is that trainin

The utility of bacterial nomenclature

Lately I've been thinking a lot about the "culture" behind bacterial nomenclature. Of course there are the extremes: Shigella and Escherichia coli are classified as different genera, but often the phenotype used to characterize these strains is horizontally transmitted (a plasmid harboring pathogenicity determinants) [1]. Then you've got the case of Wolbachia pipientis, the bug inside a bug that my lab currently investigates. Researchers in the field have decided not to name each strain found in each distinct host, regardless of the divergence between strains [2] (but see [3] for an interesting counterpoint). Obviously, the species concept in bacteria is extremely difficult to define and has been reviewed at length elsewhere [4-6]. What is absolutely true is that the markers we use to characterize diversity in the environment (be they rRNA genes, core proteins, or enzymes) are simply that – markers. They do not tell us whether or not these organisms a

Creating a bee-specific database

Arguably, 454 pyrosequencing has revolutionized the field of microbial ecology.  Where it was once costly to generate libraries of a few hundred 16S rRNA gene sequences, 454 pyrosequencing allows researchers to deeply probe a microbial community at relatively little cost per sequence.  The ultimate goal of 454 pyrosequencing amplicon studies is to characterize a microbial community, either in terms of composition (DNA) or activity (RNA).  A large number of groups have been using the Ribosomal Database Project's Naïve Bayesian Classifier (RDP-NBC) to achieve this goal (Wang et al., 2007). The advantages are numerous but I'll list a few of the practical ones here: classification is straightforward (putting sequences in their taxonomic context), efficient (especially when considering tens of thousands of sequences) and does not require full length 16S sequences (making it an appropriate tool for pyrosequencing studies).  However, the NBC relies on an accurate training set – on
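To make the idea of a word-based Naïve Bayesian classifier and its bootstrap score a little more concrete, here is a toy sketch in Python. It is not the RDP code: the function names, the tiny training set, and the word size of 4 are all assumptions made for illustration (as I understand Wang et al., 2007, the real classifier uses 8-base words and 100 bootstrap trials, each on roughly one eighth of the query's words).

```python
import math
import random
from collections import defaultdict

K = 4  # toy word size for short demo sequences (assumption; RDP uses 8-mers)


def kmers(seq, k=K):
    """Return the set of overlapping words of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def train(training_set, k=K):
    """Count word occurrences per taxon from (taxon, sequence) pairs."""
    seqs_per_taxon = defaultdict(int)
    word_in_taxon = defaultdict(lambda: defaultdict(int))
    word_overall = defaultdict(int)
    for taxon, seq in training_set:
        seqs_per_taxon[taxon] += 1
        for w in kmers(seq, k):
            word_in_taxon[taxon][w] += 1
            word_overall[w] += 1
    return {
        "k": k,
        "n_seqs": len(training_set),
        "seqs_per_taxon": dict(seqs_per_taxon),
        "word_in_taxon": {t: dict(c) for t, c in word_in_taxon.items()},
        "word_overall": dict(word_overall),
    }


def log_score(words, taxon, model):
    """Sum of log word probabilities for one taxon, with a word-specific prior."""
    n, m_taxon = model["n_seqs"], model["seqs_per_taxon"][taxon]
    counts = model["word_in_taxon"].get(taxon, {})
    score = 0.0
    for w in words:
        prior = (model["word_overall"].get(w, 0) + 0.5) / (n + 1)
        score += math.log((counts.get(w, 0) + prior) / (m_taxon + 1))
    return score


def best_taxon(words, model):
    """Taxon with the highest score (ties go to the alphabetically first taxon)."""
    taxa = sorted(model["seqs_per_taxon"])
    return max(taxa, key=lambda t: log_score(words, t, model))


def classify(seq, model, n_boot=100, seed=0):
    """Return (assigned taxon, fraction of bootstrap trials that agree)."""
    words = sorted(kmers(seq, model["k"]))
    if not words:
        raise ValueError("sequence is shorter than the word size")
    assigned = best_taxon(words, model)
    rng = random.Random(seed)
    subset_size = max(1, len(words) // 8)  # resample about 1/8 of the words
    agree = sum(
        best_taxon(rng.choices(words, k=subset_size), model) == assigned
        for _ in range(n_boot)
    )
    return assigned, agree / n_boot
```

Calling `classify(query, train(pairs))` returns the best-scoring taxon together with a bootstrap score between 0 and 1. A query that shares few words with anything in the training set tends to get a low score, which is why the result can only be as good as the training set behind it.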

Putting our honey bee data in context

Our group recently addressed the effect of within-colony genetic diversity on the associated microbial community of the honey bee Apis mellifera [1]. We obtained more than 70,000 pyrosequences from samples of whole worker bees, worker guts, and from bee bread taken from 22 colonies (n=12 colonies were genetically diverse; n=10 colonies were genetically uniform). Our research found that honey bee colonies benefit from the promiscuous mating of queens; diverse colonies were characterized by a reduction in potential pathogens and enrichment for possible probiotic species. We used well established approaches for clustering (based on 97% sequence identity using average neighbor) and de novo classifying short pyrosequences [2-5], as these sequences are arguably too short for robust phylogenetic analyses [6]. Our approach used the Naïve Bayesian Classifier trained on the Arb-Silva dataset and targeted diversity in the V1-V2 region. The utility of this approach is that we cou
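For readers unfamiliar with average neighbor clustering at 97% identity, the following Python sketch shows the general idea. It is a simplified illustration, not the pipeline we ran: it assumes the sequences are already aligned to the same length, measures distance as the fraction of mismatched positions, and uses hypothetical function names.

```python
from itertools import combinations


def distance(a, b):
    """Fraction of mismatched positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def average_neighbor_otus(seqs, cutoff=0.03):
    """Group sequence indices into OTUs by average-linkage clustering.

    Clusters are merged, closest pair first, while the mean pairwise
    distance between their members is at most `cutoff` (0.03 = 97% identity).
    """
    pairwise = {
        (i, j): distance(seqs[i], seqs[j])
        for i, j in combinations(range(len(seqs)), 2)
    }

    def mean_distance(c1, c2):
        total = sum(pairwise[min(i, j), max(i, j)] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    clusters = [[i] for i in range(len(seqs))]
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average distance.
        (a, b), d = min(
            (
                (pair, mean_distance(clusters[pair[0]], clusters[pair[1]]))
                for pair in combinations(range(len(clusters)), 2)
            ),
            key=lambda item: item[1],
        )
        if d > cutoff:
            break  # nothing left that is close enough to merge
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)
```

Each returned list holds the indices of the sequences in one OTU. Because the merge criterion is the average distance between all members, a single outlying read is less likely to chain two otherwise distinct groups together than it would be under nearest neighbor clustering.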