Putting our honey bee data in context

Our group recently addressed the effect of within-colony genetic diversity on the associated microbial community of the honey bee Apis mellifera [1]. We obtained more than 70,000 pyrosequences from samples of whole worker bees, worker guts, and bee bread taken from 22 colonies (n=12 colonies were genetically diverse; n=10 colonies were genetically uniform). We found that honey bee colonies benefit from the promiscuous mating of queens; diverse colonies were characterized by a reduction in potential pathogens and enrichment for possible probiotic species. We used well-established approaches for clustering (based on 97% sequence identity using the average neighbor algorithm) and de novo classification of short pyrosequences [2-5], as these sequences are arguably too short for robust phylogenetic analyses [6]. Our approach used the Naïve Bayesian Classifier trained on the Arb-Silva dataset and targeted diversity in the V1-V2 region. The utility of this approach is that we could, through alignment of the 16S rRNA gene, make hypotheses as to what these organisms in the bee gut may be doing and how they might be interacting, without a priori expectations of community composition.

In an important earlier study in 2007, Cox-Foster et al. published a phylogenetic framework for classifying the bacteria that are associated with honey bees [7]. The framework includes 8 phylotypes named Bifidobacterium, γ-1, γ-2, β-1, α-1, α-2, firm-4 and firm-5. Unlike our de novo approach, these phylotypes were generated based on their groupings on a phylogenetic tree. The method of analysis employed by Cox-Foster et al. differs from ours in that the diversity within each clade is not predetermined (that is, no % identity threshold is used in generating these groupings), and therefore taxa grouped into a single clade may be highly divergent (that is, below the traditional 97% identity threshold utilized in the field). We analyzed the sequences generated in the earlier study and computationally formed % identity clusters to explore the amount of diversity within each clade, clustering the sequences at a series of identity thresholds from 90% to 97% using complete linkage clustering (as implemented in RDP Classifier; Table 1) or nearest neighbor clustering (as implemented in blastclust with -b T and -L 0.9). Indeed, each clade holds a relatively large amount of diversity; sequences within each clade are between 3% and 10% divergent (Table 1). According to the methods utilized in Mattila et al. (2012), and using a 97% identity threshold that is more typical for the field, these clades would be considered to harbor numerous species and/or genera.
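
To make the difference between the two linkage rules concrete, the sketch below clusters a handful of toy sequences at a percent-identity threshold. It is a minimal illustration only: the function names (`percent_identity`, `cluster`) and the toy sequences are assumptions made for this example, the sequences are treated as already aligned, and the code does not reproduce the RDP or blastclust implementations, whose results also depend on alignment and coverage settings.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent identity between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def cluster(seqs, threshold, linkage="complete"):
    """Agglomerative clustering at a percent-identity threshold.

    complete: two clusters merge only if EVERY cross pair meets the threshold.
    single (nearest neighbor): they merge if ANY cross pair meets it.
    Returns a list of clusters, each a list of indices into seqs.
    """
    if linkage not in ("complete", "single"):
        raise ValueError("linkage must be 'complete' or 'single'")
    agg = min if linkage == "complete" else max
    clusters = [[i] for i in range(len(seqs))]
    while True:
        best = None
        for (i, ci), (j, cj) in combinations(enumerate(clusters), 2):
            score = agg(
                percent_identity(seqs[a], seqs[b]) for a in ci for b in cj
            )
            if score >= threshold and (best is None or score > best[0]):
                best = (score, i, j)
        if best is None:
            return clusters
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge j into i (i < j)
        del clusters[j]


# Hypothetical toy sequences forming a "chain": neighbors are 95% identical,
# but the two ends are only 90% identical.
toy = [
    "ACGTACGTACGTACGTACGT",
    "CCGTACGTACGTACGTACGT",
    "CAGTACGTACGTACGTACGT",
]

for pct in (90, 95, 97):
    n_complete = len(cluster(toy, pct, "complete"))
    n_single = len(cluster(toy, pct, "single"))
    print(f"{pct}%: complete={n_complete}, nearest neighbor={n_single}")
```

On this toy chain, nearest neighbor clustering joins all three sequences at 95% because each is within 5% of its neighbor, whereas complete linkage keeps the two ends apart because they differ by 10%. Cluster counts from different tools are therefore not directly interchangeable.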

Table 1. The number of clusters generated by complete linkage clustering (or nearest neighbor clustering, in parentheses) of the clades characterized in Cox-Foster et al. [7] as a function of percent identity. Subclusters within clades suggest that these groupings are quite diverse and likely contain several different species/genera.

Phylotype	90%	93%	95%	97%
Alpha-1	1 (3)	1 (3)	1 (3)	2 (3)
Alpha-2.1	1 (2)	1 (2)	1 (2)	2 (4)
Alpha-2.2	1 (1)	1 (1)	3 (1)	5 (3)
Beta	2 (7)	2 (7)	3 (8)	5 (8)
Bifido	1 (2)	1 (2)	1 (2)	1 (2)
Firm-4	1 (2)	2 (2)	3 (4)	5 (5)
Firm-5	1 (2)	2 (2)	2 (2)	4 (2)
Gamma-1	2 (6)	2 (6)	4 (6)	5 (7)
Gamma-2	1 (3)	1 (3)	1 (3)	1 (3)

Below, we compare our data and our approach to that developed by Cox-Foster et al. [7] to contrast the level of diversity estimated by the two approaches. We used blastn to identify which of our 13 most prevalent OTUs (sequences that cluster at 97% identity) corresponded to the 8 clades mentioned above. If one of our top OTUs was 100% identical to a representative sequence considered to be part of one of the 8 clades, we noted this in Table 2 below.
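
The matching step described above can be sketched as follows. This is a hedged illustration, not the pipeline used for Table 2: it assumes blastn results saved in the standard 12-column tabular format (`-outfmt 6`), and the function name `assign_phylotypes`, the OTU and reference identifiers, and the example rows are all hypothetical placeholders.

```python
import csv
import io

# Default column order of blastn tabular output (-outfmt 6).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def assign_phylotypes(blast_tsv, accession_to_phylotype, min_identity=100.0):
    """Keep the top hit (highest bit score) per OTU and report its phylotype.

    An OTU is assigned a phylotype only if its top hit reaches min_identity
    and the hit belongs to a known phylotype; otherwise it is marked "N/A".
    """
    best = {}
    reader = csv.DictReader(
        io.StringIO(blast_tsv), fieldnames=BLAST_COLUMNS, delimiter="\t"
    )
    for row in reader:
        bits = float(row["bitscore"])
        otu = row["qseqid"]
        if otu not in best or bits > best[otu][0]:
            best[otu] = (bits, row["sseqid"], float(row["pident"]))

    result = {}
    for otu, (_, accession, pident) in best.items():
        phylotype = None
        if pident >= min_identity:
            phylotype = accession_to_phylotype.get(accession)
        result[otu] = (accession, phylotype or "N/A")
    return result


# Hypothetical hits and reference labels, for illustration only.
rows = [
    ("OTU_1", "refA", "100.000", "250", "0", "0", "1", "250", "10", "259", "1e-130", "462"),
    ("OTU_1", "refB", "96.000", "250", "10", "0", "1", "250", "10", "259", "1e-110", "407"),
    ("OTU_2", "refC", "98.400", "250", "4", "0", "1", "250", "5", "254", "1e-120", "440"),
]
blast_tsv = "\n".join("\t".join(r) for r in rows)
reference_labels = {"refA": "Gamma-1", "refB": "Gamma-2", "refC": "Firm-5"}

assignments = assign_phylotypes(blast_tsv, reference_labels)
for otu, (accession, phylotype) in sorted(assignments.items()):
    print(otu, accession, phylotype)
```

In this example OTU_1 is labeled with the phylotype of its identical top hit, while OTU_2 is reported as N/A because its best match falls short of 100% identity. Lowering `min_identity` would relax that criterion.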

Table 2. Representative OTUs generated by Mattila et al., 2012, their % prevalence in the bee gut (ranked by abundance), and the accession numbers of their top BLAST hits among the Cox-Foster et al., 2007 phylotypes. Where a particular OTU does not find homologs within the 8-phylotype framework proposed by Cox-Foster et al., we indicate that result with N/A.

Classification (Mattila et al., 2012)	Prevalence in the bee gut (Mattila et al., 2012)	Top BLAST hit (nr/nt)	Phylotype
*Succinivibrionaceae*	38.8%	HE613303	Gamma-1
*Bowmanella*	14.3%	DQ837611	Gamma-2
*Oenococcus*	14.1%	HE613310	Firm-5
*Paralactobacillus*	10.2%	HM113331	Firm-4
*Unclass. Colwelliaceae*	6.4%	HM111973	Gamma-1
*Bifidobacterium*	4.7%	HM113282	Bifido
*Shimazuella*	3.2%	HE613282	Firm-5
*Enterobacter*	1.2%	JF208675	N/A
*Laribacter*	1.0%	JQ437500	Beta
*Saccharibacter*	0.92%	JQ437507	Alpha-2.1
*Rummeliibacillus*	0.52%	HM111947	Firm-5
*Atopobacter*	0.28%	HM113352	Firm-4
*Escherichia/Shigella*	0.17%	HE582599	N/A
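
Summing the prevalence values in Table 2 by phylotype shows how several distinct OTUs collapse into a single clade under the phylotype framework. The sketch below uses only the values listed in Table 2; the names `TABLE_2` and `prevalence_by_phylotype` are introduced here for illustration.

```python
from collections import defaultdict

# (classification, prevalence in %, phylotype), transcribed from Table 2.
TABLE_2 = [
    ("Succinivibrionaceae", 38.8, "Gamma-1"),
    ("Bowmanella", 14.3, "Gamma-2"),
    ("Oenococcus", 14.1, "Firm-5"),
    ("Paralactobacillus", 10.2, "Firm-4"),
    ("Unclass. Colwelliaceae", 6.4, "Gamma-1"),
    ("Bifidobacterium", 4.7, "Bifido"),
    ("Shimazuella", 3.2, "Firm-5"),
    ("Enterobacter", 1.2, "N/A"),
    ("Laribacter", 1.0, "Beta"),
    ("Saccharibacter", 0.92, "Alpha-2.1"),
    ("Rummeliibacillus", 0.52, "Firm-5"),
    ("Atopobacter", 0.28, "Firm-4"),
    ("Escherichia/Shigella", 0.17, "N/A"),
]


def prevalence_by_phylotype(rows):
    """Return {phylotype: (summed prevalence in %, number of OTUs)}."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _, prevalence, phylotype in rows:
        totals[phylotype] += prevalence
        counts[phylotype] += 1
    return {p: (round(totals[p], 2), counts[p]) for p in totals}


summary = prevalence_by_phylotype(TABLE_2)
for phylotype, (total, n_otus) in sorted(
    summary.items(), key=lambda item: item[1][0], reverse=True
):
    print(f"{phylotype}: {total}% across {n_otus} OTU(s)")
```

For example, the three OTUs classified as Oenococcus, Shimazuella, and Rummeliibacillus all map to Firm-5, so a phylotype-level view reports them as one group.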

It is important to note that the 13 OTU representatives in Table 2 do not represent our complete dataset; of the 1,019 OTUs we reported in Mattila et al., 2012, only 358 (about 35%) find homologs (>98% ID) in previously published honey bee datasets. Furthermore, Mattila et al., 2012 classified and analyzed these sequences at a finer taxonomic scale. Two fundamental questions remain to be addressed: 1) is this diversity relevant to honey bee health? and 2) is the level of divergence revealed by our study for the honey bee microbiome important for the function and stability of this community? Our study suggests that fine-scale diversity within the bacterial community (at 97% ID) may be important to the health of a colony; community diversity using this metric correlated with host genetic diversity and with the prevalence of the low-frequency pathogens Melissococcus and Paenibacillus.

References

1. Mattila HR, Rios D, Walker-Sperling VE, Roeselers G, Newton ILG (2012) Characterization of the active microbiotas associated with honey bees reveals healthier and broader communities when colonies are genetically diverse. PLoS ONE 7: e32962.

2. Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, et al. (2009) Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol 75: 7537-7541.

3. Staley JT, Konopka A (1985) Measurement of in situ activities of nonphotosynthetic microorganisms in aquatic and terrestrial habitats. Annual Review of Microbiology 39: 321-346.

4. Werner JJ, Koren O, Hugenholtz P, DeSantis TZ, Walters WA, et al. (2012) Impact of training sets on classification of high-throughput bacterial 16S rRNA gene surveys. ISME J 6: 94-103.

5. Yzerman E, den Boer J, Caspers M, Almal A, Worzel B, et al. (2010) Comparative genome analysis of a large Dutch Legionella pneumophila strain collection identifies five markers highly correlated with clinical strains. BMC Genomics 11.

6. Min XJ, Hickey DA (2007) Assessing the effect of varying sequence length on DNA barcoding of fungi. Mol Ecol Notes 7: 365-373.

7. Cox-Foster DL, Conlan S, Holmes EC, Palacios G, Evans JD, et al. (2007) A metagenomic survey of microbes in honey bee colony collapse disorder. Science 318: 283-287.