Sunday, November 10, 2013

Does your inner scientist wear heels or a beard?

What should a scientist look like?

A pair of articles was published this morning in our local paper (the Herald Times).  They detail the many inequities faced by female scientists at Indiana University, including some startling statistics: the percentage of female faculty in the sciences at IU (~10%) and, directly relevant to me, the inequity in salary (in Biology, the seven female full professors make ~$131,000 while the 22 male full professors make ~$161,000).  In addition, there were some poignant anecdotes from female professors: "I never had children" said one, "I or my partner. We never had time we could take out of our careers without feeling like we might lose what we had gained."

It's high time to address the root of this insidious problem.  It cannot be fixed by hiring more women -- there are few of us to begin with and, since we all agree we want to hire both the best and most diverse faculty (so do Harvard, Stanford, etc.), it's hard to compete for those few.  Nor can this problem be fixed by simply providing maternity leave.  Family leave should be provided to faculty, male or female, but even when it is provided, you need to feel your department supports you in taking it.  In order to overcome the gender gap in the sciences, we need to provide work-life friendly policies but ALSO change everyone's perceptions of what a scientist looks like, does with their spare time, or wears to the lab/office.  That goes for both the folks currently occupying the upper ranks of the academy and the would-be scientists themselves (altering a young girl's depiction of a scientist to include someone who looks like her).

I'm a relatively young, Hispanic, female scientist.  Barbie "I can be computer engineer" doll aside, I'm certainly not anyone's stereotype of a biologist or computer scientist (indeed, a well-meaning but perhaps socially maladroit coworker once told me that I "don't look like a scientist").  Since so very few women enter science, and computer science in particular, perhaps the view my clumsy colleague holds is the norm.  In this NYTimes blog post, Catherine Rampell suggests that try as we might to expose young girls with an aptitude for STEM fields to the subject matter, without tech-savvy, scientist role models, these budding minds will choose other professions.  I agree wholeheartedly that providing female scientist role models to both male and female students will help to curb our biased stereotypes.

I'll continue with an anecdote.  My son's 1st grade classroom had an opportunity the other day - a colleague of mine, in Informatics, offered to teach those 6-7 year olds some computing (through the increasingly popular Scratch platform).  After hearing about this computing club, the teacher asked the students to please raise their hand if they were interested in joining.  Not a single girl participated.  Even in 1st grade! Perhaps you don't find that so surprising -- after all, few women enter computer science as an undergraduate major, which is particularly striking since the gender bias in college is actually the inverse (>50% women).  At this point, I should let you in on one important detail: my Informatics colleague is female, and her own daughter (let's call her Samantha) refused to participate in the computing club.  How could that be? Hasn't Samantha, for her entire life, had her own mother as an example of what a computer scientist is?  The sad truth is that Samantha's peer group had a bigger influence on her decision than the positive exemplar provided by her mother; when queried, Samantha's reason for not joining the coding club was that "none of the other girls raised their hands."

We must provide both role models and an inclusive environment

There's that old logic puzzle about a child and their dad being taken into the ER and the attending physician proclaiming "I cannot operate on my son" - the punch line being that women can be doctors as well.  It takes folks a sad amount of time to figure this out.  If we continue to imagine the scientist as an old, white male, that is who will populate our ranks.  Regrettably, my inner scientist sometimes sees this reflection in the mirror - it fills me with all sorts of doubts about whether or not I "belong" in science (Impostor Syndrome).  How can we alter the perceptions of others if we cannot see ourselves as filling these roles? How can we convince young, aspiring scientists to enter the track if they cannot conceive of a multidimensional, multifaceted scientist?

In a topsy-turvy counterexample that proves the point, while a faculty member in the Biological Sciences at Wellesley College, I had a laboratory full of bright, female students.  My son, then 3, was asked by one of these students: "what do you want to be when you grow up?"  His response: "I'd like to be a scientist, but I'm not a girl."  Let's show the world that scientists are not who they think we are.

PS - some fun links to continue the discussions below

Tuesday, November 5, 2013

Transcriptional Regulation of Culex pipiens Mosquitoes by Wolbachia influences Cytoplasmic Incompatibility

An intriguing title for an article published in PLoS Pathogens on Halloween.  Needless to say, many of us in the Wolbachia community anxiously await the discovery of the mechanism behind cytoplasmic incompatibility.  For those of you who aren't Wolbachia-philes, the short of it is that this bacterium has figured out a nifty way to spread through insect populations.  First, it's transmitted via the germ line -- eggs come pre-loaded with Wolbachia from infected mothers.  Second, Wolbachia affect reproduction in a variety of ways, the most common being cytoplasmic incompatibility (CI): infected females produce viable offspring whether they mate with infected or uninfected males, but when an uninfected female -- one that doesn't carry the bacterium -- mates with an infected male, the embryos fail to hatch.  This drops the fecundity of uninfected females and allows Wolbachia, and the hosts carrying it, to spread.  There are also some interesting incompatibilities that result when hosts are infected with two different Wolbachia strains -- sometimes the crosses are compatible and sometimes they are not.  This has led the Wolbachia research community to come up with all sorts of complex mathematical models and explanations for the observed data.
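The single-strain crossing rules above can be written as a two-line truth table.  This is just my own toy illustration to make the pattern explicit, not anything from the paper:

```python
def embryos_hatch(female_infected: bool, male_infected: bool) -> bool:
    """Single-strain cytoplasmic incompatibility (CI) rules:
    the only failing cross is an uninfected female x infected male."""
    return female_infected or not male_infected

# Enumerate all four crosses:
for female in (True, False):
    for male in (True, False):
        print(f"female infected={female}, male infected={male} "
              f"-> hatch={embryos_hatch(female, male)}")
```

Note that with these rules, infection never hurts a female's crosses; that asymmetry is exactly what drives Wolbachia through a population.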

So, what is the molecular mechanism behind these incompatible crosses? How does Wolbachia prevent embryos from hatching when an infected male mates with an uninfected female?  Lots of elegant work published by both the Sullivan and Frydman labs suggests that Wolbachia both associate with cytoskeletal elements (microtubules) and alter cell cycle progression.  This cell cycle defect has been correlated with cytoplasmic incompatibility.  So, it makes sense to focus on the cell cycle when you're going after CI.

The Pinto et al. paper in PLoS Pathogens this last month takes a candidate gene approach to the CI phenotype.  They start with the Drosophila melanogaster gene grauzone (grau).  This gene is pretty darn important to the fly -- mutants are sterile and lay eggs with aberrant chromosome segregation that arrest in development at metaphase II.  Pinto et al. figure out that this gene is over-expressed in Wolbachia-infected mosquitoes -- by almost 2-fold at some time points -- compared to uninfected mosquitoes (see Figure 1 below from the paper; A and B are females and males, respectively, while C and D are their reproductive organs).

Figure 1. Transcription analysis of CPIJ005623 in the Culex pipiens complex.

I think it's neat to find host genes that are differentially expressed when a bacterium invades.  Transcriptomics has been done before in Wolbachia-infected cell lines, but many of the candidates identified were immunity genes, seemingly irrelevant to the CI phenotype.  Even more interesting is the fact that this gene is important to female reproduction and, specifically, meiosis.

Pinto et al. then ask whether suppressing the grau homolog in mosquitoes can replicate the CI phenotype.  I guess the idea is that this factor could be the "rescue" factor or the "key" produced by the female when crossed with an infected male.  An alternative hypothesis would be that this factor is upregulated in infected insects as a result of Wolbachia increasing the mitotic activity of germ line cells, a result reported by the Frydman lab a few years ago.  Anyway, they go with hypothesis #1 and use RNAi knockdown (KD) of this gene, then observe what happens when KDed infected females are crossed with infected males.  Interestingly, they observed an increase in the percentage of unhatched embryos compared to a LacZ control (see Figure 2D below):

Figure 2. Knockdown analysis of CPIJ005623 in C. molestus Italy females.

However, they didn't include some important experiments here -- how do we know that this grau knockdown is specific to the CI phenotype? Wouldn't it have been important to cross these KDed infected females with an uninfected male?  What if the observed increase in unhatched embryos is similar?  Since it seems that Wolbachia actually increase the expression of grau, wouldn't you want to take an uninfected female, overexpress grau, and cross her to an infected male?

Next, they do some interesting crosses.  As it turns out, the Wolbachia they work on exists as many different strains in mosquitoes, each exhibiting interesting incompatibilities.  For example, two of their Wolbachia strains, wPel and wItaly, cannot be crossed to each other -- that is, infected females or males from either background do not produce viable embryos (no eggs hatch).  Pinto et al. knocked down grau in wPel-infected males and crossed them to wItaly-infected females; they also performed a KD in wPel and in wItaly and crossed these two KDed lines.  The results: no difference - no viable embryos.  BUT, they do observe a statistically significant change in the number of embryos reaching stages II and III (which, I forgot to mention, was the opposite phenotype observed for infected KDed females crossed with infected males - compare Figure 4C below to Figure 2D above).

Figure 4. Knockdown analysis of CPIJ005623 in C. pipiens males.

So it seems like Wolbachia CI-like effects can be modulated by host grau expression (although clearly this isn't the entire "rescue" or the "key" story).

The final part of this paper is a bit disappointing.  As has been done time and time again, the researchers attempt to identify genomic differences among the Wolbachia strains infecting these mosquitoes to determine what might be the factor changing grau expression.  Disappointingly, although they identify some regions present in some strains and absent from others, they don't go far enough toward establishing a mechanism.  Specifically, they point out that a transcriptional regulator, which they call wtrM, is present in wPipMol but absent in wPipPel.  What is their evidence that this transcriptional regulator is altering grau expression? Sadly, none.  They show it is expressed in ovaries -- note: this is not that surprising, since this is where Wolbachia actually hang out.  They go out on a limb and say that wtrM is secreted by Wolbachia and modulates host gene expression, so it presumably makes its way to the nucleus and actually binds host DNA.  No evidence is presented for this presumed activity -- no chromatin immunoprecipitation, no DNA footprinting, not even nuclear localization when expressed in their mosquitoes.

wtrM, it turns out, is a pretty well conserved XRE transcriptional regulator -- that is, a Xenobiotic Response Element.  It's found within the Rickettsiales -- there are even homologs in Anaplasma, Ehrlichia, and Bartonella.  What do XREs do, you ask? Well, they are known to respond to environmental stimuli in many systems but are probably most famous for their involvement in phage response.  It is therefore not so surprising that Pinto et al. find this gene associated with the Wolbachia prophage.  That's not to say that the prophage isn't interesting -- it sure darn is! -- but the connection between this specific XRE element and grau is tenuous, at best.

Thursday, September 5, 2013

It's DDIG Season Again

The NSF Doctoral Dissertation Improvement Grant deadline is almost upon us (October 10, 2013).  What is funded by the DDIG FOA? From the NSF website: “Allowable items include travel to specialized facilities or field research locations and professional meetings, use of specialized research equipment, purchase of supplies and services not otherwise available, the hiring of field or laboratory assistants, fees for computerized or other forms of data, and rental of environmental chambers or other research facilities.”  Importantly, you CANNOT use the DDIG funds for stipends or tuition.

Note:  If your PI already has a grant on this topic, you are not likely to be funded; if “existing funds” are available for the proposed work, you are disqualified.  The important point being that these proposals are used to gauge the independence of the PhD candidate from the PI’s major research thrust.

Why else consider writing a DDIG? The fame …The glory…The practice and most importantly…Excellent odds (25-30% funding rate)

Last Spring I was able to participate in the review process for one of these clusters (Evolution) within DEB.  It was an interesting experience and I'm putting together a workshop for students at my institution to pass on pearls of wisdom gained.  In the meantime, and with the help of colleagues, I've got a list of helpful links for those preparing a DDIG.  See below and enjoy!

1. The NSF DDIG Page

2. Links to the two sections of NSF through which DDIGs are funded: DEB and the Behavior cluster at IOS.

3. Student resources at the Indiana University Biology Department for DDIG submissions
*warning, some of the links on that page are out of date but the advice documents are still useful

4. Successful DDIGs from Stony Brook

5. One and Two posts from Joan Strassman's blog. 

Wednesday, September 4, 2013

What? There are bacteria where?

There have been some interesting, thought provoking publications of late concerning the last vestiges of "sterility" if you will, with regards to bacterial colonization.  Remember back when folks thought the healthy lung was sterile? Yeah, out the window with that hypothesis (see here, and here).  Remember urine being sterile? Nope, you must've missed this and this.  Is nothing sacred, people! Are bacteria truly everywhere![1]

In their publication, "Mother knows best: the universality of maternal microbial transmission", Seth Bordenstein and his student Lisa Funkhouser review for us the evidence in support of the vertical transmission strategy for host-associated bacteria.  While the invertebrates are undoubtedly best known for maternal transmission (the vent giant clams from my youth are highlighted by Funkhouser and Bordenstein as are the pea aphids), they present the evidence currently in hand to support a decidedly unsterile environment in utero.  Jesus, now I really need a drink.

However, the idea that any environment can be kept clean of these amazing metabolic machines persists.  Take, for example, the honey bee.  These little buggers develop in a soup of glandular secretions (potentially antibacterial), sugars (high osmolarity, likely selective) and other bee associated stuff (cool pic of these maggots below):

credit: waugsberg, wikipedia commons

So, either bees have some seriously powerful mechanism to prevent any bacterial growth in these little pools of sugar (which would be some amazing stuff) or, instead, this environment is highly selective for some ingenious microbe to colonize.  Some folks sampled these larvae through development and concluded, from the absence of PCR evidence, that the larvae, at least the young ones, are sterile.  I can hear my PhD advisor Colleen Cavanaugh in the back of my head: "absence of evidence is not evidence of absence".  Indeed, a recent publication using surface-sterilized larvae and three selective media for culture (anaerobically) found that larvae, even the tiniest ones, are colonized (albeit not by very many bacteria, at least as detected by their culture methods).

I think the pattern of colonization from "birth" through development is likely to hold in the honey bee, as it does in mammals, although inoculation may occur much earlier than initially predicted for both.

 [1] <in case you need to be hit over the head with it, that was tongue in cheek>

Monday, July 29, 2013

PhyBin: tree binning by topology

Just submitted our first paper to PeerJ - the awesome new, open access journal aimed at tipping the entire publishing establishment on its head.  I'm looking forward to a smooth review process -- hopefully as sleek and helpful as the submission process. <UPDATE: our paper was accepted.  I will post a link to the paper once it's online!>

Our manuscript presents PhyBin, a computer program aimed at binning precomputed sets of trees in Newick format, a file format produced by the majority of tree building software.  As we assert in the manuscript, PhyBin is a utility rather than a complete solution; it can serve as a component in many genomics pipelines, and provides a useful addition to the landscape of tools for dissecting and visualizing large numbers of trees.  After the user applies their chosen ortholog prediction and tree-building algorithms, PhyBin offers a quick way to visualize and browse the different evolutionary histories, either binned by topology and sorted by bin size, or in the form of a full hierarchical clustering based on Robinson-Foulds (RF) distance: i.e. a tree of trees.

In the manuscript, we explore two functionalities in PhyBin: 1) the ability to bin trees with identical topologies and 2) the ability to cluster similar trees by RF distance.  Lots of folks interested in the "landscape" of topologies produced by orthologous genes across a genome use RF distance as a measure of topological similarity.  What is RF distance?  It is essentially the number of steps you'd have to take to create one tree out of another -- the edit distance between two topologies.  So, according to the original Robinson-Foulds publication, for example, the trees below (trees 1 and 2) are edit distance 2 apart because in order to convert one to the other, you must collapse a node and then re-form it.
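The collapse-and-re-form idea above boils down to a symmetric difference of splits.  Here's a toy sketch (my own, not PhyBin's internals): trees are hard-coded as sets of internal clades rather than parsed from Newick, and I use rooted trees for simplicity.

```python
def rf_distance(clades_a, clades_b):
    """Robinson-Foulds distance between two rooted trees, each represented
    as the set of its internal clades. Every clade present in one tree but
    not the other costs one edit: collapse it, or re-form it."""
    return len(clades_a ^ clades_b)

# Tree 1: ((A,B),(C,D));   internal clades {A,B} and {C,D}
# Tree 2: (((A,B),C),D);   internal clades {A,B} and {A,B,C}
tree1 = {frozenset("AB"), frozenset("CD")}
tree2 = {frozenset("AB"), frozenset("ABC")}

print(rf_distance(tree1, tree2))  # 2: collapse {C,D}, then re-form {A,B,C}
```

Two identical topologies share every split, so their distance is 0; trees that disagree on every internal node sit at the maximum distance for that number of taxa.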

PhyBin does some pretty neat pre-processing of trees to facilitate comparisons.  For example, you can set a branch length threshold to collapse branches that are essentially noise in your dataset (say, from very closely related taxa).  It also checks your dataset for number of taxa and is quite robust to file formatting.  PhyBin then calculates the edit distance for a large group of trees (a distance matrix) and displays these distances as a tree of trees - a dendrogram that links each tree in the dataset to the others based on the edit distances between them (how you'd get from one to another).  In our manuscript, we used a set of orthologs generated from 10 published Wolbachia genomes.  Here's what the dendrogram looks like for those 508 trees without (A) and with (B) clustering by RF distance.  In the figure, you can see that many of the trees in the Wolbachia ortholog set are similar; they fall into 9 large clusters, many of which support the monophyly of Wolbachia supergroups but others of which (in fact, a good fraction!) do not.
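To make the branch-length thresholding concrete, here's a minimal sketch of the idea (my own nested-tuple representation, not PhyBin's actual code): internal branches shorter than the threshold are dissolved, promoting their children into the parent and leaving a polytomy.

```python
def collapse_short_branches(node, threshold):
    """node is (branch_length, children_list) for internal nodes or
    (branch_length, leaf_name) for leaves. Returns the list of nodes
    that should replace `node` in its parent."""
    length, content = node
    if isinstance(content, str):
        return [node]  # leaves are never collapsed
    children = []
    for child in content:
        children.extend(collapse_short_branches(child, threshold))
    if length < threshold:
        return children  # dissolve this node: promote children upward
    return [(length, children)]

# ((A:1,B:1):0.001,C:1);  the 0.001 internal branch is treated as noise
tree = (1.0, [(0.001, [(1.0, "A"), (1.0, "B")]), (1.0, "C")])
root_len, kids = tree
collapsed = (root_len,
             [n for k in kids for n in collapse_short_branches(k, 0.01)])
print(collapsed)  # (1.0, [(1.0, 'A'), (1.0, 'B'), (1.0, 'C')])
```

The payoff is that two trees differing only in a near-zero branch now collapse to the same topology and land in the same bin, instead of inflating the count of distinct topologies.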

Want to download it?  Check it out here

Saturday, June 22, 2013

Like Farts in the Wind

I recently heard some interesting 2nd-hand advice from Matt Welsh, whose blog has plenty of advice/rants/raves for tenure-track junior faculty.  Matt ended up leaving academia for Google, but with regards to writing grants and getting funding, he said "Focus on writing papers, send out proposals like farts in the wind" -- basically, spam granting agencies without a lot of thought.

For some reason this really bothers me.  It effectively concludes that grantsmanship and actual intellectual merit cannot be accurately measured by the review process -- that funding is effectively a crapshoot and you just need to play the game lots and lots in order to get funded.  It's succumbing to the view that the folks reviewing your proposals are not going to know much about your area or understand its significance.  Papers, on the other hand, tend to be reviewed by researchers closer to your area of expertise, and this is where you need to spend your time polishing and honing your message.

Now, after many rounds of NSF review, perhaps I need to stop deluding myself and conclude the same.  But I think the low funding rate has a lot to do with the perception of randomness in the review process.  Many meritorious proposals just cannot be funded at the current rate (around 20% for the whole of NSF, though it varies widely by program).

Perhaps it's time to get "gassy."

Friday, June 14, 2013

From the horse's mouth to the fly (on the wall)'s ear: phrases overheard at an NIH section meeting and what a beginning investigator can glean from them

This is a continuation of my blog post yesterday concerning a recent NIH section meeting.  I was there as an Early Career Reviewer -- a great opportunity to learn about the process and listen in on the discussions.  While most reviewers get assigned many more, I was assigned only 4 applications to review.  This gave me plenty of time to listen to what folks were saying and really pay attention to the conversation about each application being discussed.  Each application gets between 10 and 30 minutes of discussion (depending on time of day, enthusiasm or discordance among reviewers, etc.).  Not much can be generalized across all applications we reviewed.  However, in the R01 category, I think several general pieces of advice can be gleaned - at least in the GVE context - with regards to what works and what doesn't (FYI, R01s are those large, major awards -- after you get one of these, you are no longer a "new investigator").

Note: Although the names of panelists serving on an NIH section are made public, I am keeping confidential the identity of the folks who uttered these phrases and, in addition, I've paraphrased these comments if they would in any way reveal the application under consideration during the discussion.

"What exactly is this proposal aiming to do?"
Make sure this is crystal clear from your aims, as written.  Not so good at conveying this information in prose?  Use a graphic, if you have to!  It can be helpful to sit down and actually think about every step of the project, the kind of data it will produce, and what you will need to analyze the data.  Present that in your application (with appropriate citations or letters of support, if necessary).  

"Most of my enthusiasm for this proposal comes from the papers cited therein, not from the proposal itself" and  "I could not figure out how it could be important to do X"
If you are excited about your project, find ways to engage your reader as well. In your writing, convey your excitement about your work - this will be infectious (in a good way).  Even if you are working on what you think is an important system, remember that your reviewers are coming from a relatively broad audience and they may not agree! My PhD advisor Colleen Cavanaugh always said we should aim to sell our work to our extended family -- if you can convince your uncle that he should fund you, you can convince anyone.

"This is a system I really wanted to love...but"
It's not enough to rely on the "cool" factor of your system or how sexy the topic is in the literature.  Reviewers are intelligent folks with lots of background necessary to find -- and expose -- the gaping holes in your experimental design, background, and understanding.  It may help to have a colleague read the proposal ahead of time (yes, that means writing it ahead of time).

"This is a fishing expedition without any hypothesis"
While reviewers recognize that hypothesis-generating aims are important, without any framework of expectations it is difficult to ascertain whether or not your strategy will work.  It is always useful to consider the kinds of data your project would produce (even if exploratory) and how those would be analyzed.  This allows you to present potential hypotheses based on expected results.  For example (and this is a purely fictional example), instead of "Describing the microbiome associated with subway seats" try "Do commuters carry their microflora to work?"  Although the study is still exploratory, this framing allows a reviewer to see where you might take this project and potential downstream hypothesis testing.

A couple of final tidbits to my fellow New Investigators -- those having never been awarded a major grant from NIH:

Be adventurous: "safe" projects and problems may not yield the highest scores
Be enthusiastic without appearing naive: don't over-interpret the literature 
Don't be afraid to involve collaborators or consultants: unlike your well-established colleagues, you are as yet untested.  Like it or not, you have much to prove. You DO NOT KNOW ALL.  It is a good idea to ask for letters of collaboration or support from folks in areas where you haven't published, or for techniques you plan on learning or using for the first time.  Even if you think you know what you are doing, if you don't have a proven track record (read: publication record) of doing it, you should consider a letter of support.
Proofread...and then proofread again...and then have someone else proofread: Grantsmanship, although not scored, can make reviewers angry. If they are having a hard time understanding your aims due to writing issues, this can only detract from your overall impression.

So get to it! New R01s are due October 5th (renewals July 5th).

Thursday, June 13, 2013

NIH is not spelled "NSF"

I've often wondered what it is like on an NIH study section, and how it differs from an NSF panel, so I happily agreed to serve this past week on my first Genetic Variation and Evolution (GVE) study section.  

What is an NIH study section like? 

Well, on the face of it, much like an NSF panel.  Before you come to the meeting, you are given a set of applications to review.  As at NSF, your review of each individual application is based on certain criteria, and you are asked to comment on them.  In an NSF review, you first score the proposal (Excellent, Very Good, Good, Fair, or Poor).  At NSF, the criteria you should comment on or focus on in your review are "Intellectual Merit" and "Broader Impacts" (with strengths and weaknesses for each).  You also give a summary statement at the end of your review that should reflect your score.

For the NIH, the scored criteria are "Significance", "Investigator", "Innovation", "Approach", and "Environment".  Unlike the NSF, the NIH has tried to make these scores quantitative and comparable across study sections.  So, scores of 1-3 are "good" or "high" (contrary to the numerical trend), scores of 4-6 are supposed to be "average" (an application may be of high importance but weaknesses bring down the overall impact, or the application topic is of moderate importance), and scores of 7-9 are reserved for applications with serious problems or of low or no importance.  Each of the criteria is scored in this way (so you can get a sense of whether the approach is something the panel didn't like or the significance was seen as lacking).  You also write statements reflecting the strengths and weaknesses for each, and you write an "overall impact" section - akin to the summary statement of NSF.  Finally, you provide your "overall" score -- this is what will be used to rank the applications during the section meeting.
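Since the inverted scale trips everyone up at first, here's a quick mnemonic -- my own summary of the bands described above, not an official NIH table:

```python
def score_band(score: int) -> str:
    """Map an NIH criterion score (1 = best, 9 = worst) to its rough band."""
    if not 1 <= score <= 9:
        raise ValueError("NIH criterion scores run 1-9")
    if score <= 3:
        return "high"      # strong application, minor weaknesses
    if score <= 6:
        return "average"   # important topic, but weaknesses lower impact
    return "low"           # serious problems, or low/no importance

print(score_band(2), score_band(5), score_band(8))  # high average low
```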

You submit your reviews electronically -- like the NSF.  One big deviation from the NSF panel review is that after the deadline, you actually get to see the reviews of others and alter your scores and reviews.  This was interesting to me for two reasons: I wondered if folks would generally get bullied into lowering their scores (giving worse scores) or raising their scores (giving better scores).  I also wondered if clearly good or clearly bad proposals would get unanimously consistent scores.  The Scientific Review Officer also assumes a role at this point, guiding folks to match their text (in their reviews) with their scores -- for example, if you only wrote negative comments, why did you give the application a "3"? Or, if you had only glowing things to say about the proposal, why did you rank it a "5"?

Then we all arrive in a hotel at a predetermined location.  Like NSF panels, a relatively large group (25 of us in the case of GVE) meets in one room over a few days to discuss a large set of proposals.  Unlike NSF, NIH sections don't discuss all applications! Let me repeat that -- in case you've never applied to the NIH -- if you score poorly, your application won't be discussed.  The only feedback you will get is from the reviews and those scores (which, granted, can be significant).  Many colleagues have told me that if you are not discussed, it is very likely that you will never get that grant funded.  That is because of the "two strikes" rule at NIH -- you can only submit the same application twice.  What's the likelihood you'll go from not-discussed to discussed or funded in two rounds without the feedback actually provided by a discussion?  Who decides what gets discussed? The Scientific Review Officer -- and this is largely based on how the applications scored.  However, anyone on the study section can "rescue" an application from "triage" by asking for it to be discussed.

There's a wonderful aspect to the triaging of "bad" applications -- as a reviewer, it means you don't have to rush through a huge pile of proposals, or waste time/energy on clearly poor applications, or spend over 2 days stuck in a room away from your work/family/friends/etc.  However, as an applicant, I rather like that NSF provides pretty substantial feedback even to applications that are not the top of the heap -- although I know that some NSF panels do effectively "triage" poor scoring proposals, most of the applications get discussed at NSF while a large fraction are being triaged at NIH.

Ok, so now we've submitted our reviews, we have looked over the reviews of our colleagues and are in the conference room bright and early awaiting the start of the section. What happens?

- Application discussion order generally follows scores (best up first)
- An application to be discussed is brought up 
- Everyone takes a minute to read the abstract and aims 
- Assigned reviewers present their preliminary scores
- Each reviewer discusses strengths and weaknesses
- The application is opened up to discussion by all
- Human subjects, vertebrate animals, and biohazards are discussed
- Assigned reviewers are asked again what their scores will be (often these change after discussion).  These scores provide a voting "range" -- that is to say, if the three reviewers ranked your application overall as 2, 4, and 5, then everyone on the panel can vote within the range (2-5), or they have to raise their hand and announce that they will vote out of the range (either above or below -- although this fact remains private).  Although you could imagine a situation where folks feel odd about voting out of range, or somehow peer pressured into the range, this didn't seem to be the case on the GVE panel at all.  In addition, you don't have to say why you are voting out of range, but simply state that you are.
- Everyone casts a vote
- The budget and other non-scoring criteria are discussed

Remember: The only people reading your full proposal are those assigned to review it!!! That is, only THREE people.  Unless they go out of their way to download and read the full proposal <unlikely>, everyone else ONLY READS THE ABSTRACT AND AIMS!! Make those pages good, folks! Make them readable, stand on their own, and capture the imagination.

Come back tomorrow for some nuggets of advice I've learned from being on this panel (albeit from a currently unfunded beginning investigator).

Monday, April 22, 2013

Thinking about sampling scales in host-associated microbiota

A recent publication by R.T. Jones, L.G. Sanchez, and N. Feirer, "A cross-taxon analysis of insect-associated bacterial diversity," started me thinking about how we sample insects.  You can see the article at PLoS ONE.

In this publication, the authors collect a large set of diverse insects and then perform a 16S rRNA gene amplicon analysis.  They use the V1-V2 region and 454 pyrosequencing.  They use whole insects as the source material for the DNA extractions and perform a set of downstream bioinformatic processing steps in an attempt to reduce artifacts that may inflate diversity estimates in their samples.   From their published methods, this seems to be the order in which reads were processed:
1.  Reads are truncated (to compare across the same 16S positions) and "low quality reads" are removed using QIIME's default settings – I am not sure whether amplicon noise removal was used as implemented therein, nor do they mention alignment to any 16S model.
2.  Sequences are binned at 97% identity using uclust and the authors picked the most abundant sequence for downstream analyses
3. Then, phylotypes that did not represent at least 1% of community membership within a sample were removed.  I should point out that, from their methods, each sample is a different total size at this point, so this 1% cutoff necessarily corresponds to a different absolute number of sequences removed from each sample.
4. Any sample that does not have at least 500 sequences is removed – perhaps because some totals were quite small; they say the minimum was 61 in some cases.
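As I read their methods, steps 3 and 4 amount to something like the following (a sketch of my interpretation; the thresholds come from their text, but the function name and data representation are my own):

```python
from collections import Counter

def cull_phylotypes(reads, min_frac=0.01, min_sample_size=500):
    """Sketch of steps 3-4 as I read the methods: drop phylotypes below
    min_frac of a sample's reads, then drop the sample entirely if fewer
    than min_sample_size reads survive.  Returns a dict of phylotype ->
    read count, or None if the sample is discarded."""
    counts = Counter(reads)                  # phylotype -> read count
    total = sum(counts.values())
    # Step 3: the 1% cutoff is relative to THIS sample's size, so the
    # absolute number of reads removed differs between samples.
    kept = {otu: n for otu, n in counts.items() if n / total >= min_frac}
    # Step 4: discard the whole sample if too few reads remain.
    if sum(kept.values()) < min_sample_size:
        return None
    return kept

# A 605-read sample: 'rare' is below 1% (needs >= 7 reads) and is culled.
sample = ['wolbachia'] * 590 + ['rare'] * 5 + ['mid'] * 10
print(cull_phylotypes(sample))  # {'wolbachia': 590, 'mid': 10}
```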

In the end, across all samples, they were left with 477 unique “phylotypes” (read: 97%-identity OTUs).  Their goals in this study were to identify links between diet and microbiome composition and also to analyze the diversity of microbes associated with insects of different, diverse genera.  They do not find any effect of diet, at least based on their (morphological) taxonomic classification of hosts and published understanding of what these hosts might consume.  See Figure 2 from the paper below.  Let’s explore whether or not, given the sampling, this is a surprising result.

Figure 2. Bray-Curtis cluster of insect species based on their associated bacterial communities (all 477 bacterial phylotypes used for Bray-Curtis analysis) and Z-scores of the 96 most abundant bacterial phylotypes with lowest scores in light blue and highest scores in dark blue.
Each column is a unique bacterial phylotype. Phylotypes are arranged according to taxonomic classification. Insect species are identified by a four-letter code (Table 1) with the first letter indicating the order, as follows: (C) Coleoptera, (D) Diptera, (H) Hemiptera, (I) Isoptera, (L) Lepidoptera, (N) Neuroptera, (S) Siphonaptera, and (Y) Hymenoptera.
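For reference, the Bray-Curtis dissimilarity underlying the Figure 2 clustering is simple to compute between two phylotype count vectors (a minimal sketch of the standard formula, not the authors' code; the example counts are hypothetical):

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two count vectors indexed over
    the same set of phylotypes: sum(|x_i - y_i|) / sum(x_i + y_i).
    0 = identical communities, 1 = no shared phylotypes."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den

# Two hypothetical insects' phylotype counts (columns = phylotypes):
print(bray_curtis([90, 10, 0], [90, 10, 0]))  # 0.0 -- identical communities
print(bray_curtis([100, 0, 0], [0, 50, 50]))  # 1.0 -- no phylotypes shared
```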

Sampling of insects

How many bacterial cells do you suppose cover the surfaces of a single insect? How many live within their cells in specialized organs or within their reproductive tracts?  Estimates based on microscopic observations and qPCR data on single-copy functional genes suggest that insects infected with bacterial symbionts harbor upwards of 10^9 bacterial cells (for that one specific symbiont) [1, 2], and indeed, 200,000 Buchnera are estimated to be transmitted in each generation [3].  Ok, so if your insect happens to be infected with a bacterial symbiont, your dataset will include this organism in abundance (as found by the study reviewed here for several of their samples – some of which were >90% Wolbachia).  Also, these kinds of symbioses tend to be very species specific – a single rRNA gene phylotype is likely to be found for these bacteria.  How many other bacterial cells do you think there are on and in this host?  Do you think you can effectively sample them in the presence of an over-abundant single phylotype?

Now let’s consider the fact that when most of us sample insects, we grind up the entire body. This isn’t just a laziness issue; insects are small, difficult to dissect, and often preserved in ways that make dissection even more difficult post-mortem.  How many different habitats do you think we are combining into a single sample when we grind up entire insects? At least from my perspective (Wolbachia-centric and all), the reproductive tract and the gut are distinct, and that’s not counting possible specialized organs.  When we sampled the honey bee gut vs. the entire body of the bee, we saw considerable differences in terms of both diversity sampled and the composition of the community, largely due to the rank abundance distribution of the reads and the ability to deeply sample the gut vs. a diverse set of habitats on the honey bee [4]. Now consider that in the Jones et al. study, each insect yielded on the order of 500 sequences.  Do we still think it reasonable to perform the kind of frequency-based OTU culling performed here and by others?
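To put some rough numbers on this, here's a back-of-the-envelope calculation (the community composition is entirely hypothetical): under multinomial sampling, the expected number of OTUs seen at least once in n reads, given relative abundances p_i, is the sum over i of 1 - (1 - p_i)^n.

```python
def expected_otus_detected(proportions, n_reads):
    """Expected number of OTUs observed at least once when n_reads reads
    are drawn from a community with the given relative abundances
    (multinomial sampling): sum_i [1 - (1 - p_i)^n]."""
    return sum(1 - (1 - p) ** n_reads for p in proportions)

# Illustrative community: one symbiont at 90% of cells, plus 100 other
# OTUs splitting the remaining 10% evenly (0.1% each).
community = [0.90] + [0.001] * 100
print(round(expected_otus_detected(community, 500)))    # ~40 of 101 OTUs seen
print(round(expected_otus_detected(community, 10000)))  # ~101: all recovered
```

With 500 reads and a dominant symbiont soaking up ~450 of them, more than half of even this modest residual community goes undetected before any culling happens.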

Consider that for sampling the human microbiome, we very deeply probe – 2,000 sequences or more – individual stool samples (containing a subset of the gut, at best).  How many OTUs do you think we would find if we ground up an entire human, used 1 g of that homogenate to run a PCR, and produced 500 sequences?

To be fair, the authors are aware of the sampling limitation of their work – the fact that they used whole insects – and address it head on here:

“Our research, however, has two limitations that may restrict our ability to detect the effect of diet on bacterial community composition: 1) the broad diversity of insect samples, and 2) the use of whole insects. A study focused on a less diverse group of insects with varied diets may be better suited to resolve the effects of taxonomy and diet on bacterial communities. By using whole insects, we maximized our detection of insect-associated bacteria but also included endosymbionts (i.e. intracellular bacteria) in our analyses.”

I would argue that by using whole insects they maximized their ability to detect very abundant symbionts but they do not sample all habitats on the host evenly.

What does frequency-based culling do to our data? Why do it?

One of the first studies to explore potential contaminants in the “rare biosphere” is the E. coli sequencing paper by Kunin et al., published back in 2009 [5].  In that study, the authors sequence a laboratory strain of E. coli (a mock community of a single organism with multiple rRNA operons).  They find multiple OTUs that are not classified as E. coli and yet pass the QC filtration steps used previously by others.  In their own words:

"These contaminants represent only 0.03% of the reads obtained in the present study and suggest that all PCR-based surveys that use broad-specificity primers will likely suffer from similar low-level background contamination, a point worth bearing in mind when interpreting rare biosphere data."[5]

How do we incorporate these results in our data interpretation from natural communities? How do we bias our dataset when we set an arbitrary threshold for how many times an OTU must occur before we consider it “real”?

Several studies following the Kunin et al. analysis chose not to perform a frequency-based cull (or at least did not report one in their methods): the Dominguez-Bello infant study used 2,000 reads per sample [6], and the Caporaso study, with 3.1 million reads, decided that 2,000 per sample was sufficient to characterize that environment [7].  Some other authors have taken advantage of the rank abundance curve to quality-filter their reads without completely deleting the rare biosphere [8], and others have pointed out that severe amplicon noise filtration – indeed, any diversity metric – has to make assumptions about the underlying population from which we are sampling (an unknown), and that amplicon noise reduction algorithms may not be based on actual read characteristics and can bias the data [9, 10].

All noise filtration pipelines not strictly based on Phred scores and read-specific parameters make assumptions about the underlying, unsampled community.  By removing OTUs that don’t occur at 1% or greater in a single sample, you are necessarily making an assumption about the diversity you expect to find in that habitat and biasing the results.  It’s one thing to consider removing singletons – reads that occur only once and may be contamination – but quite another to keep only OTUs above a larger, arbitrary frequency.
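To see the difference, compare a singleton cull with a 1% cull on a shallow, symbiont-dominated sample (all numbers hypothetical; the point is that the threshold, not the data, sets the apparent richness):

```python
from collections import Counter

def richness_after_cull(counts, min_frac):
    """Number of OTUs surviving a frequency cull: keep an OTU only if its
    read count is at least min_frac of the sample total."""
    total = sum(counts.values())
    return sum(1 for n in counts.values() if n / total >= min_frac)

# A shallow 500-read sample: one dominant symbiont plus a tail of 20
# low-abundance OTUs carrying 1-4 reads each.
sample = Counter({'symbiont': 450})
sample.update({f'otu{i}': 1 + i % 4 for i in range(20)})

print(richness_after_cull(sample, 0.0))    # 21 OTUs: no cull
print(richness_after_cull(sample, 2/500))  # 16 OTUs: only singletons removed
print(richness_after_cull(sample, 0.01))   # 1 OTU: the 1% cull leaves only
                                           # the symbiont standing
```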

One of the results in the Jones et al. paper really highlights the fact that they did not deeply sample their insects.  Consider the termite (Isoptera: Termitidae) Coptotermes formosanus.  This wood-eating termite hosts many diverse bacteria and protists – it’s often used in microbiology courses on symbiosis because of the neat morphologies and behaviors you can observe under the microscope after dissecting the hindgut.  Others who have more deeply sampled the termite gut (admittedly, of a different species) have found between 300 and 700 bacterial OTUs present there (depending on the primer set used; see [11]), and a metagenomic sampling of Coptotermes formosanus itself also revealed a large amount of bacterial diversity [12].  Although we don’t have the raw number of OTUs found in the Coptotermes sample in the Jones et al. study, you can see that the average richness listed in Table 2 is 5.83.  This is extremely low given our understanding of the termite gut and expectations from these natural environments.