The contents of this blog post, along with more detailed analyses and text, are available in our BMC Microbiology paper here:
One reviewer had a very interesting suggestion: they asked us to add the average bootstrap scores to our heat map figure so that readers could get a sense of which sequences may be "novel", that is, sequences the RDPII-NBC had not seen before. See the PLoS ONE article by Lan et al. (http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0032491). With the bootstrap scores added, you can clearly see that the greengenes training set is the most diverse and the best at capturing the diversity found within the honey bee gut (Figure 2A). That said, a fair number of unique sequences (>1000 out of ~4000) still go unclassified with this training set. Both the classifications and the average bootstrap scores improve when honey bee gut-specific sequences are added to the training set (Figure 2B). Also interesting, and expected, is that training sets based on the 16S clone library data previously generated by others were unable to fully classify the diversity found in the 454 dataset (Figure 2C). This shows up in two ways: a drop in confidence, reflected in lower average bootstrap scores across the board, and the RDPII-NBC's inability to classify 650 of these sequences.
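To make the two summary numbers concrete (average bootstrap score and number of unclassified sequences per training set), here is a minimal Python sketch of one way to compute them. It is not the pipeline used for the paper. It assumes the classifier output has already been parsed into simple records of (sequence ID, training set, bootstrap score), and it treats a sequence as "unclassified" when its bootstrap score falls below a cutoff. The `Assignment` record, the `summarize` function, the 80% default cutoff, and the toy records are all hypothetical and chosen for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class Assignment:
    """Hypothetical parsed classifier result for one unique sequence."""
    seq_id: str
    training_set: str
    bootstrap: float  # bootstrap confidence at the rank of interest, 0-100


def summarize(
    assignments: Iterable[Assignment], cutoff: float = 80.0
) -> Dict[str, Tuple[float, int, int]]:
    """Return {training_set: (mean_bootstrap, n_unclassified, n_total)}.

    A sequence counts as unclassified when its bootstrap score is below
    `cutoff` (an illustrative assumption, not the paper's threshold).
    """
    scores: Dict[str, List[float]] = defaultdict(list)
    unclassified: Dict[str, int] = defaultdict(int)
    for a in assignments:
        scores[a.training_set].append(a.bootstrap)
        if a.bootstrap < cutoff:
            unclassified[a.training_set] += 1
    return {
        ts: (sum(vals) / len(vals), unclassified[ts], len(vals))
        for ts, vals in scores.items()
    }


if __name__ == "__main__":
    # Made-up toy records: the same two sequences classified against two training sets.
    toy = [
        Assignment("seq1", "greengenes", 95.0),
        Assignment("seq2", "greengenes", 60.0),
        Assignment("seq1", "clone_library", 70.0),
        Assignment("seq2", "clone_library", 40.0),
    ]
    for ts, (mean_bs, n_unc, n) in summarize(toy).items():
        print(f"{ts}: mean bootstrap {mean_bs:.1f}, {n_unc}/{n} unclassified")
```

With these toy records, the weaker training set shows both symptoms described above at once: a lower mean bootstrap score and more sequences falling below the cutoff.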