
Sunday, November 9, 2014

Validation of RNAseq Experiments by qPCR?



This post is in response to a couple of Twitter discussions regarding whether it's "useful" to do qPCR validation of RNAseq hits:





In response to those, I posted a reply I had made to a similar query from a reviewer for a recent manuscript (original available here). Since this came up twice in two weeks, I thought I'd lay out my thoughts more clearly (and put them somewhere easier for others to find). This is how we address this issue experimentally and in response to reviewer requests. Feel free to use any of these arguments yourself, but your mileage with your supervisor or with manuscript/grant reviewers may vary. Most importantly, if you have suggestions/data/papers that we should include, please comment below and I'll try to keep this post up to date.

The Answer We Gave:

We had considered performing qPCR studies to ‘re-validate’ some of our gene-expression findings but there is little evidence that qPCR analyses from the same samples will add any extra utility to our data so we decided to eschew those experiments. Previous studies have shown extremely close correlations between qPCR and RNAseq data [1-4]. Ideally, we would re-validate our findings (potentially by qPCR) in a separate cohort of samples, but due to the difficulty in accessing these samples, those experiments are not possible at this time.

What Do We Mean By Validation of RNAseq Results?

RNAseq, like microarrays before it, generates a lot of data. Ideally you can determine the levels of every gene/transcript/exon in the genome and, given proper experimental design, identify a large number of significantly differentially expressed genes. To follow up on these findings, we often want to test how valid those observations are. In my interpretation, this can mean a few different things:
  1. Are these transcripts really differentially expressed in these samples (technical reproducibility)?
  2. Are these transcripts really generally differentially expressed in other samples (biological reproducibility)?
  3. Do these transcriptional changes represent phenotypic differences (significance)?
For the third question, I'd say you'd want to perform some non-transcriptional assay, such as a western blot or an enzymatic or cell-based assay, to show that the transcriptional change has some biological phenotype. Ideally, you might even go further and manipulate the expression of a gene of interest back to the control condition, to test the hypothesis that a change in that gene causes a particular phenotype.

Why was qPCR the traditional validation experiment from microarray studies?

Normally, though, the question of validation stems from one of the first two questions. This is probably based on prior work with microarrays. Microarrays were/are great tools for transcriptomic analysis (though I am hard pressed to think of a reason to do them in lieu of RNAseq). One major problem with microarrays was probe bias: a microarray experiment has a limited number of hybridization probes, and it's possible that a probe is not representative of the transcript as a whole. Furthermore, not all transcripts of interest may be present on the microarray of choice. As a result, the standard in the field was to examine transcript abundance by qPCR to technically validate microarray results.

How similar are RNAseq and qPCR results?

Probe bias, poor sensitivity and limited linear range are much less problematic in RNAseq experiments, since the entire transcript is assessed in a more or less unbiased manner [5]. Several studies have compared RNAseq results to qPCR data and found excellent correlation between the methods [1-4]. In cases where discrepancies exist, I would argue they are most likely due to bias in the qPCR experiment (which has its own probe bias, based on which region of the cDNA is amplified). Therefore, qPCR is unlikely to yield new information, and when it does, that information is probably of lower quality than the RNAseq data.
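As an illustration, the agreement reported in these comparisons is usually summarized as a correlation between the log2 fold changes measured by each method. The sketch below shows the calculation; the gene-level numbers are entirely made up for demonstration (real comparisons [1-4] report correlations in a similar range):

```python
# Illustrative only: correlating hypothetical qPCR and RNAseq
# log2 fold changes for the same ten genes. The values are made
# up to demonstrate the calculation, not taken from any study.
import math

rnaseq_lfc = [2.1, -1.4, 0.3, 3.0, -0.8, 1.2, -2.5, 0.0, 1.8, -1.1]
qpcr_lfc   = [2.4, -1.2, 0.1, 2.7, -1.0, 1.5, -2.2, 0.2, 1.6, -0.9]

def pearson_r(x, y):
    """Pearson correlation coefficient, computed from scratch."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

r = pearson_r(rnaseq_lfc, qpcr_lfc)
print(f"Pearson r = {r:.3f}")
```

When the two methods track each other this closely, re-measuring the same RNA by qPCR mostly re-derives the number you already have.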

Under What Conditions Would qPCR Be a Good Validation Method?

This isn't to say qPCR isn't useful; we use it all the time in my lab. It's a great tool for looking at a small number of genes across many samples, for example. But when would it be a good validation in the context of an RNAseq experiment? I would argue it is most useful when you have samples independent of those used in your RNAseq study. For example, maybe for economic reasons you sequenced 5 control samples and 5 drug-treated samples, but you have another 20 samples available. qPCR would be a great way to test whether the observed differences also hold in separate samples, thus answering question #2. I think this is especially important since, in my experience, a lot of RNAseq studies are underpowered for the questions asked (if you want a quick and easy way to check the power of your experimental design, I like Scotty).
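To make the underpowering point concrete, here is a back-of-the-envelope power estimate using a normal approximation to the two-sample t-test. This is only a rough sketch (a purpose-built tool like Scotty models RNAseq count noise and multiple testing properly, which this does not); the effect size and standard deviation below are assumed values for illustration:

```python
# Rough power sketch: probability of detecting a given mean
# difference between two groups, via a normal approximation to
# the two-sample t-test. Ignores count noise and multiple testing.
import math

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def approx_power(delta, sigma, n_per_group, z_alpha=1.959964):
    """Approximate power to detect a mean difference `delta`
    between two groups of size n, given within-group SD `sigma`,
    at a two-sided alpha of 0.05 (z_alpha = 1.96)."""
    ncp = delta / (sigma * math.sqrt(2.0 / n_per_group))
    return normal_cdf(ncp - z_alpha)

# Assumed: a 1.5-fold change (log2 FC ~ 0.585) with SD 0.5
print(f"n=5 per group:  power ~ {approx_power(0.585, 0.5, 5):.2f}")
print(f"n=20 per group: power ~ {approx_power(0.585, 0.5, 20):.2f}")
```

Under these assumptions, 5 samples per group has well under 60% power for a modest fold change, while 20 per group is comfortably powered, which is exactly why following up in the larger independent cohort is worthwhile.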

References

Thanks to Matthew MacManes (@peroMHC) and Alejandro Montenegro (@aemonten) for spurring this discussion, and sending me towards a nice paper by Timothy Hughes [6], which is a great summary of similar, more general issues.
  1. Griffith M, Griffith OL, Mwenifumbo J, Goya R, Morrissy AS, et al. (2010) Alternative expression analysis by RNA sequencing. Nat Methods 7: 843–847. doi:10.1038/nmeth.1503.
  2. Asmann YW, Klee EW, Thompson EA, Perez EA, Middha S, et al. (2009) 3’ tag digital gene expression profiling of human brain and universal reference RNA using Illumina Genome Analyzer. BMC Genomics 10: 531. doi:10.1186/1471-2164-10-531.
  3. Wu AR, Neff NF, Kalisky T, Dalerba P, Treutlein B, et al. (2014) Quantitative assessment of single-cell RNA-sequencing methods. Nat Methods 11: 41–46. doi:10.1038/nmeth.2694.
  4. Shi Y, He M (2014) Differential gene expression identified by RNA-Seq and qPCR in two sizes of pearl oyster (Pinctada fucata). Gene 538: 313–322. doi:10.1016/j.gene.2014.01.031.
  5. Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10: 57–63. doi: 10.1038/nrg2484.
  6. Hughes TR (2009) “Validation” in genome-scale research. J Biol 8: 3. doi:10.1186/jbiol104.

Sunday, February 26, 2012

Future Bridges Lab Rules version 0.1

Lab Rules


Version 0.1.3 on July 14, 2012 by Dave Bridges

Remember when you were growing up and you would say, "Well, when I'm older I will (or won't) do that"?
I have been thinking about that for my future, when I run my own group.
It is fairly easy (and as a bit of a blowhard I do this all the time) to say I would do this, or I would do that.
I think posting this publicly will encourage me to stick to these rules.
Below are the rules, along with a bit of rationale and some caveats.
This is the first version of this post but future versions will include links to the previous versions.
The version numbering is described in the Lab Policies README.
Check out the GitHub Repository for more granular changes.

Supervision of Trainees

Trainee-Advisor Contract
Trainees (either mine or co-supervised) and I will read, discuss and sign a contract describing our roles and responsibilities as trainee/employee and as mentor. This will include data dissemination/publishing rules, expectations of productivity, note keeping and time commitment, rules for dealing with other members both in my group and in collaborations, rules for sharing reagents and data, rules for adjudicating disagreements, and grounds and procedures for termination. These rules will conform to any institutional rules. Exceptions can be discussed, and the agreement can be modified throughout the term of the relationship. I will post a generic version of this agreement in a publicly viewable location.
Online Presence
All trainees will appear on the laboratory website and write a blurb about their research interests and goals. Trainees will be strongly encouraged to blog, tweet and otherwise engage in social networking tools regarding their research and the work of others, but this is not required. Links to their publicly available social network profiles will be posted on the laboratory website.
Open Access Policy
Trainees will be made aware of the open publishing, dissemination, software and data/reagent sharing policies of the laboratory at the outset and will have to agree to these standards.

Reagents, Software and Tools

Software Usage
Wherever possible, free open source software will be used for data acquisition, analysis and dissemination. Exceptions will be made if necessary, but trainees will be encouraged to use/incorporate/develop free tools.
Software Development
If software, scripts or the like are generated, they will be released under a permissive open source license such as CC-BY, and the license will be attached explicitly to the source code. Scripts and programs will be uploaded to a public revision control service such as GitHub or similar (my GitHub profile is here).
Publishing of Protocols and Scripts
When not present in the published article, detailed step-by-step protocols, data analysis scripts and other material that cannot fit into the methods and materials section or supplementary materials will be posted online and linked to the publication’s online presence (as a post, or as a comment on the paper’s website).
Protocol Sharing
Protocols will be made available online in a wiki format in a publicly accessible location, whether or not they have been published. Editing will be restricted to laboratory members and collaborators.
Reagent and Tool Sharing
Reagents generated by my group will be shared upon request without condition (aside from potential restrictions placed by other collaborators, funding agencies and the institution). These reagents will be shipped with an explicit statement of free use/sharing/modification. Once a reagent sharing license is generated/identified it will be linked to in this document. This policy includes unpublished reagents and will never require attribution as a condition. If a reagent is obtained from another group and modified, we will offer the modified reagent back to the originator immediately.

Publishing and Data Dissemination

Open Access Journals
I believe that all work should be available to the public to read, evaluate and discuss. I am strongly against the mentality that data/knowledge should be restricted to experts and the like. I will therefore send all papers on which I am corresponding author and have supervised the majority of the work to journals (or their equivalent) which are publicly available. The major caveat will be work on which I am a minor (less than 50% effort) collaborator and the primary group leader wants to submit the work elsewhere. Potential high-impact publications will not be exempt, no matter how awesome they may be. Delayed open access does not count in this respect.
Open Peer Review
Journals will be selected which publish non-anonymous reviewer comments alongside the articles whenever possible. If this is not done, then, if permitted by the publisher and/or reviewers, I will re-post the reviewer comments online without any modifications.
Public Forum for Article Discussion
Although I will encourage discussion of articles to occur at the point of publication (for example via the posting of comments directly at the website of the publisher), I will also provide a publicly available summary of every published finding on which I am an author (corresponding or not) and allow commenting there too. This discussion post will also link to or contain the reviewer and editor comments where possible. This summary might be a blog post, a Facebook post, a Google Plus post or anything else that might come up in the future. If I am not the first or corresponding author, I will encourage the first or corresponding author to write the post, and will link to/quote that directly.
Presentations
All presentations of published data will be posted on an online repository such as Slideshare or something similar. My Slideshare profile is here. If unpublished or preliminary data is presented privately and later published, those slides will be posted upon publication. As with papers, an online blog post or the like will also accompany that upload. If audio or video of the presentation is available, that will be uploaded as well.
Data Sets
All datasets, once published, will be made available in manipulable (preferably non-proprietary) formats for further analysis. Based on the scheme set out by the Linked Data Research Center Laboratory, all data will be provided at level 2 or above.