The utility of bacterial nomenclature

Lately I've been thinking a lot about the "culture" behind bacterial nomenclature. Of course there are the extremes: Shigella and Escherichia coli are classified as different genera, but the phenotype used to distinguish these strains is often horizontally transmitted (a plasmid harboring pathogenicity determinants) [1]. Then you've got the case of Wolbachia pipientis, the bug inside a bug that my lab currently investigates. Researchers in the field have decided not to name each strain found in each distinct host, regardless of the divergence between strains [2] (but see [3] for an interesting counterpoint). Obviously, the species concept in bacteria is extremely difficult to define and has been reviewed at length elsewhere [4-6]. What is absolutely true is that the markers we use to characterize diversity in the environment (be they rRNA genes, core proteins, or enzymes) are simply that: markers. On their own, they do not tell us whether or not these organisms are similar in function, phenotype, or genomic structure. For example, organisms with nearly identical 16S rRNA genes (such as Shigella and Escherichia) can exhibit dramatically different phenotypes during infection. However, I have yet to find a set of organisms for which 16S rRNA gene divergence doesn’t correlate with genomic divergence. That is to say, although 16S rRNA gene similarity may mask genomic divergence, differences at this locus have, in every case I know of, corresponded to genomic differences (please do respond if you know of a counterexample!).

So, what is the point of naming bacterial species if we don’t have a species concept? The fundamental utility of nomenclature is to be certain that groups are discussing or researching the same organism: what Ralph Isberg’s lab calls Legionella pneumophila Philadelphia 1 should be the same strain (well, taking generations in two different labs into account) as what was sequenced back in 2004 [7]. Most labs use the 16S rRNA gene to classify organisms taxonomically. Now, this is a tricky proposition to begin with because we don’t have a good sense of a bacterial species concept. That said, a generally agreed upon threshold for the divergence between species based on the 16S rRNA gene is 3%. That is, we expect organisms that are of the same “species” to be 97% similar or more at that locus. For example, if I isolated an organism with >97% identity to a Legionella strain, I’d name it after the known species.
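To make the 97% rule concrete, here is a minimal Python sketch of how one might apply it. It assumes the two 16S sequences have already been aligned to the same length (with "-" for gaps). The function names, the gap handling, and the toy sequences are my own illustrative choices, not a standard tool or a published protocol.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences.

    Columns where both sequences have a gap are skipped; a gap opposite
    a base counts as a mismatch (an assumption, conventions differ).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue  # uninformative column
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def same_species_by_16s(aligned_a: str, aligned_b: str,
                        threshold: float = 97.0) -> bool:
    """Apply the conventional 97% identity cutoff (3% divergence)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy example: a made-up 100-column "reference" and an "isolate"
# that differs from it at two positions.
reference = "ACGT" * 25
isolate = "TT" + reference[2:]
print(percent_identity(reference, isolate))
print(same_species_by_16s(reference, isolate))
```

In practice the alignment step matters a great deal (which region of the gene, how gaps are scored), so the number that comes out depends on choices that this sketch deliberately leaves out.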

What about when exploring relatively “novel” (read: underexplored) groups of bacteria? I think it’s interesting to consider how researchers in that specific field deal with the taxonomic task ahead of them: do you lump groups together or do you split them? Do you come up with novel names or do you name them after isolates? Let’s consider the honey bee gut microbiota, something our group has been investigating recently. There are some clades, or phylogenetically related groups, that have been considered important by others in the field [In a previous blog post I presented a phylogeny of all near-full length 16S sequences in GenBank within this framework: here]. As you can tell from the phylogeny, these groups are quite diverse. In fact, the percent divergence within each of these clades, based on the 16S rRNA gene, is quite large (above 10% by nearest neighbor clustering). So, these clades clearly represent something above the species level, perhaps the family or order level. Interestingly, there are several isolated strains that fall within these groups (Table 1), many isolated from non-honey bee sources, suggesting that the groups may not be as bee-specific as previously thought.

Recently, two new genus and species names were proposed for the so-called Beta and Gamma-1 groups [8]. Importantly, beyond Enterobacteriaceae bacterium Acj204, there are no previously named isolates within these two clades. Because some of the bee-associated groups include bacterial isolates that can be studied with regard to their metabolic capabilities (in some cases, their genome sequences have been completed, see NCBI accession #CP001562), we can begin to determine whether or not there are functional differences relevant to the classification of an organism as either Commensalibacter intestini or Saccharibacter floricola. For example, the pathogen Bartonella henselae sequence CP00156 (B. henselae) falls within the Alpha-1 sequences, a group that is often found in honey bee colonies, although its fitness effects on the host are unclear. Is there a difference between Enterobacteriaceae bacterium Acj204 and Candidatus Gilliamella apicola? Clearly, the relevance of the taxonomic designation below the family level for these bee-specific groups remains to be determined.
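As an aside on the "above 10% by nearest neighbor clustering" figure: a rough Python sketch shows why within-group divergence can get large under this kind of clustering. I am assuming here that "nearest neighbor" means single-linkage clustering, where a sequence joins a group if it is within the cutoff of any one member. The sequence names and distances below are invented for illustration and are not taken from the bee gut data.

```python
from itertools import combinations


def single_linkage_clusters(names, dist, threshold):
    """Group names so that any pair within `threshold` ends up linked.

    `dist` maps frozenset({a, b}) to a fractional divergence (0.10 = 10%).
    Uses a small union-find; membership is transitive, so chaining occurs.
    """
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path compression
            n = parent[n]
        return n

    for a, b in combinations(names, 2):
        if dist[frozenset((a, b))] <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(clusters.values())


def max_within_divergence(cluster, dist):
    """Largest pairwise divergence among members of one cluster."""
    return max((dist[frozenset(p)] for p in combinations(cluster, 2)),
               default=0.0)


# Hypothetical distances: A is close to B, B is close to C,
# but A and C are 15% apart; D is far from everything.
names = ["seqA", "seqB", "seqC", "seqD"]
dist = {
    frozenset(("seqA", "seqB")): 0.08,
    frozenset(("seqB", "seqC")): 0.09,
    frozenset(("seqA", "seqC")): 0.15,
    frozenset(("seqA", "seqD")): 0.30,
    frozenset(("seqB", "seqD")): 0.30,
    frozenset(("seqC", "seqD")): 0.30,
}
for cluster in single_linkage_clusters(names, dist, threshold=0.10):
    print(cluster, max_within_divergence(cluster, dist))
```

With these made-up numbers, seqA and seqC land in the same group even though they are more divergent from each other than the cutoff, because seqB links them. That chaining is one reason a group defined this way can span far more than species-level divergence.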

Bee-specific group name	Strain taxonomic designation	Where isolated?
Alpha-2.2	Saccharibacter floricola strain S-877	Pollen
Alpha-2.1	Commensalibacter intestini strain A911	Drosophila melanogaster
Alpha-1	Bartonella grahamii as4aup	Mouse gut
Firm-5	Lactobacillus apis strain 1F1	Honey bee
Gamma-1	Enterobacteriaceae bacterium Acj204	Honey bee