Monday, April 2, 2012

Putting our honey bee data in context


Our group recently addressed the effect of within-colony genetic diversity on the microbial community associated with the honey bee Apis mellifera [1].  We obtained more than 70,000 pyrosequences from samples of whole worker bees, worker guts, and bee bread taken from 22 colonies (n=12 colonies were genetically diverse; n=10 colonies were genetically uniform).  We found that honey bee colonies benefit from the promiscuous mating of queens: diverse colonies were characterized by a reduction in potential pathogens and an enrichment for possible probiotic species.  We used well-established approaches for clustering (97% sequence identity, average neighbor) and for de novo classification of short pyrosequences [2-5], as these sequences are arguably too short for robust phylogenetic analyses [6].  Our approach used the Naïve Bayesian Classifier trained on the Arb-Silva dataset and targeted diversity in the V1-V2 region.  The utility of this approach is that we could, through alignment of the 16S rRNA gene, form hypotheses about what these organisms in the bee gut may be doing and how they might be interacting, without a priori expectations of community composition.

In an important earlier study in 2007, Cox-Foster et al. published a phylogenetic framework for classifying the bacteria that are associated with honey bees [7].  The framework includes 8 phylotypes named Bifidobacterium, γ-1, γ-2, β-1, α-1, α-2, firm-4 and firm-5.  Unlike the groups produced by our de novo approach, these phylotypes were defined by their groupings on a phylogenetic tree.  The method of analysis employed by Cox-Foster et al. differs from ours in that the diversity within each clade is not predetermined: no % identity threshold is used in generating these groupings, and therefore taxa grouped into a single clade may be highly divergent (that is, below the traditional 97% identity threshold used in the field).  We analyzed the sequences generated in the earlier study and computationally formed % identity clusters to explore the amount of diversity within each clade, clustering the sequences at progressively higher divergence levels using complete linkage clustering (as implemented in RDP Classifier; Table 1) or nearest neighbor clustering (as implemented in blastclust -b T -L 0.9).  Indeed, each clade holds a relatively large amount of diversity; sequences within each clade are between 3% and 10% divergent (Table 1).  By the methods used in Mattila et al. (2012), with the 97% identity threshold that is more typical for the field, these clades would be considered to harbor numerous species and/or genera.

Table 1. The number of clusters generated by complete linkage clustering (nearest neighbor clustering in parentheses) of the 8 clades characterized in Cox-Foster et al. [7], as a function of percent identity.  Subclusters within clades suggest that these groupings are quite diverse and likely contain several different species/genera.

Phylotype   90%     93%     95%     97%
Alpha-1     1 (3)   1 (3)   1 (3)   2 (3)
Alpha-2.1   1 (2)   1 (2)   1 (2)   2 (4)
Alpha-2.2   1 (1)   1 (1)   3 (1)   5 (3)
Beta        2 (7)   2 (7)   3 (8)   5 (8)
Bifido      1 (2)   1 (2)   1 (2)   1 (2)
Firm-4      1 (2)   2 (2)   3 (4)   5 (5)
Firm-5      1 (2)   2 (2)   2 (2)   4 (2)
Gamma-1     2 (6)   2 (6)   4 (6)   5 (7)
Gamma-2     1 (3)   1 (3)   1 (3)   1 (3)
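
To make the difference between the two linkage rules concrete, here is a minimal Python sketch of threshold-based agglomerative clustering. It is not the RDP or blastclust implementation: the function name, the four toy sequences and their pairwise identities are all invented for illustration. blastclust also applies a length coverage requirement (the -L 0.9 setting) that this sketch does not model, so it is not expected to reproduce the counts in Table 1.

```python
from itertools import combinations


def cluster_at_identity(identity, n, threshold, linkage):
    """Greedy agglomerative clustering of n sequences at a % identity threshold.

    identity maps frozenset({i, j}) to the percent identity of sequences i and j.
    linkage is "complete" (furthest neighbor) or "single" (nearest neighbor).
    """
    # Complete linkage scores two clusters by their least similar pair,
    # single linkage by their most similar pair.
    score = min if linkage == "complete" else max
    clusters = [{i} for i in range(n)]

    def similarity(a, b):
        return score(identity[frozenset((i, j))] for i in a for j in b)

    while len(clusters) > 1:
        # Find the most similar pair of clusters under the chosen linkage.
        a, b = max(
            combinations(range(len(clusters)), 2),
            key=lambda pair: similarity(clusters[pair[0]], clusters[pair[1]]),
        )
        if similarity(clusters[a], clusters[b]) < threshold:
            break  # nothing left that is similar enough to merge
        clusters[a] |= clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


# Invented identities for four sequences forming a "chain": each neighboring
# pair is 98% identical, but the two ends are only 94% identical.
toy_identity = {
    frozenset((0, 1)): 98.0, frozenset((1, 2)): 98.0, frozenset((2, 3)): 98.0,
    frozenset((0, 2)): 96.0, frozenset((1, 3)): 96.0, frozenset((0, 3)): 94.0,
}

for pct in (90, 93, 95, 97):
    n_complete = len(cluster_at_identity(toy_identity, 4, pct, "complete"))
    n_single = len(cluster_at_identity(toy_identity, 4, pct, "single"))
    print(f"{pct}%: complete={n_complete}, single={n_single}")
```

On this toy input, complete linkage splits the chain into two clusters at 95% and 97% because the end sequences are too divergent, while single linkage keeps all four sequences together at every threshold. The number of "groups" therefore depends on the clustering rule as well as on the identity cutoff.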

Below, we compare our data and our approach to that developed by Cox-Foster et al. [7] to contrast the level of diversity estimated by the two approaches.  We used blastn to identify which of our 13 most prevalent OTUs (sequences that cluster at 97% identity) corresponded to the 8 clades mentioned above.  If one of our top OTUs was 100% identical to a sequence representative considered to be part of one of the 8 clades, we noted this in Table 2 below.
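
The matching rule described above can be sketched in a few lines of Python. This is an illustration, not the script we ran. It assumes BLAST hits are available in the standard 12-column tabular layout (query, subject, percent identity, and so on), and the OTU names, reference names, hit values and the reference-to-phylotype lookup are hypothetical placeholders.

```python
import csv
import io

# Hypothetical tabular BLAST hits: qseqid, sseqid, pident, length, mismatch,
# gapopen, qstart, qend, sstart, send, evalue, bitscore.
BLAST_TSV = (
    "OTU_1\tREF_A\t100.000\t250\t0\t0\t1\t250\t30\t279\t1e-130\t462\n"
    "OTU_1\tREF_B\t97.600\t250\t6\t0\t1\t250\t30\t279\t1e-118\t420\n"
    "OTU_2\tREF_C\t98.400\t250\t4\t0\t1\t250\t12\t261\t1e-122\t435\n"
)

# Hypothetical lookup from reference sequence to phylotype.
REF_TO_PHYLOTYPE = {"REF_A": "Gamma-1", "REF_B": "Gamma-2", "REF_C": "Beta"}


def assign_phylotypes(blast_tsv, ref_to_phylotype, otus, min_identity=100.0):
    """Label each OTU with a phylotype only if a hit meets min_identity."""
    assignments = {otu: "N/A" for otu in otus}
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row:
            continue
        otu, ref, pident = row[0], row[1], float(row[2])
        if otu not in assignments or assignments[otu] != "N/A":
            continue  # unknown OTU, or already assigned by an earlier hit
        if pident >= min_identity and ref in ref_to_phylotype:
            assignments[otu] = ref_to_phylotype[ref]
    return assignments


result = assign_phylotypes(BLAST_TSV, REF_TO_PHYLOTYPE, ["OTU_1", "OTU_2"])
print(result)
```

With the strict 100% cutoff, the placeholder OTU_2 stays N/A even though its best hit is 98.4% identical. Lowering min_identity would assign it to a phylotype, which shows how sensitive the mapping is to the threshold chosen.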

Table 2. Representative OTUs generated by Mattila et al., 2012, their % prevalence in the bee gut (ranked by abundance), and the accession numbers of their top BLAST hits, with the corresponding Cox-Foster et al., 2007 phylotypes.  Where an OTU does not find homologs within the 8-phylotype framework proposed by Cox-Foster et al., we indicate that result with N/A.
Classification            Prevalence in bee gut   Top BLAST hit (nr/nt)   Phylotype
Succinivibrionaceae       38.8%                   HE613303                Gamma-1
Bowmanella                14.3%                   DQ837611                Gamma-2
Oenococcus                14.1%                   HE613310                Firm-5
Paralactobacillus         10.2%                   HM113331                Firm-4
Unclass. Colwelliaceae    6.4%                    HM111973                Gamma-1
Bifidobacterium           4.7%                    HM113282                Bifido
Shimazuella               3.2%                    HE613282                Firm-5
Enterobacter              1.2%                    JF208675                N/A
Laribacter                1.0%                    JQ437500                Beta
Saccharibacter            0.92%                   JQ437507                Alpha-2.1
Rummeliibacillus          0.52%                   HM111947                Firm-5
Atopobacter               0.28%                   HM113352                Firm-4
Escherichia/Shigella      0.17%                   HE582599                N/A
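
One way to read Table 2 against the phylotype framework is to total the prevalence of the OTUs that fall into each phylotype. The short Python sketch below does this with the values transcribed from Table 2. It assumes the listed prevalences are percentages of the same total and can simply be added, and the function name is ours, chosen for illustration.

```python
from collections import defaultdict

# (classification, prevalence in %, phylotype), transcribed from Table 2.
TABLE_2 = [
    ("Succinivibrionaceae", 38.8, "Gamma-1"),
    ("Bowmanella", 14.3, "Gamma-2"),
    ("Oenococcus", 14.1, "Firm-5"),
    ("Paralactobacillus", 10.2, "Firm-4"),
    ("Unclass. Colwelliaceae", 6.4, "Gamma-1"),
    ("Bifidobacterium", 4.7, "Bifido"),
    ("Shimazuella", 3.2, "Firm-5"),
    ("Enterobacter", 1.2, "N/A"),
    ("Laribacter", 1.0, "Beta"),
    ("Saccharibacter", 0.92, "Alpha-2.1"),
    ("Rummeliibacillus", 0.52, "Firm-5"),
    ("Atopobacter", 0.28, "Firm-4"),
    ("Escherichia/Shigella", 0.17, "N/A"),
]


def prevalence_by_phylotype(rows):
    """Return {phylotype: (summed prevalence in %, number of OTUs)}."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _name, prevalence, phylotype in rows:
        totals[phylotype] += prevalence
        counts[phylotype] += 1
    return {p: (round(totals[p], 2), counts[p]) for p in totals}


summary = prevalence_by_phylotype(TABLE_2)
for phylotype, (total, n_otus) in sorted(summary.items(), key=lambda kv: -kv[1][0]):
    print(f"{phylotype:10s} {total:6.2f}%  ({n_otus} OTUs)")
```

Several phylotypes (Gamma-1, Firm-4, Firm-5) each absorb more than one of our differently classified OTUs, which is the pattern the comparison above is meant to highlight.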


It is important to note that the 13 OTU representatives in Table 2 do not represent our complete dataset; of the 1,019 OTUs we reported in Mattila et al., 2012, only 358 find homologs (>98% ID) in previously published honey bee datasets.  Furthermore, Mattila et al., 2012 takes the step of classifying and analyzing these sequences at a finer taxonomic scale.  Two fundamental questions remain to be addressed: 1) is this diversity relevant to honey bee health? and 2) is the level of divergence revealed by our study for the honey bee microbiome important for the function and stability of this community?  Our study suggests that fine-scale diversity within the bacterial community (at 97% ID) may be important to the health of a colony; community diversity using this metric correlated with host genetic diversity and with the prevalence of the low-frequency pathogens Melissococcus and Paenibacillus.

 References