Saturday, March 31, 2012

What Should Be Done about Reproducibility

A recent Commentary and linked editorial in Nature regarding reproducible science (or rather the lack thereof) have been troubling me for a few days now. The Commentary brings to light a huge problem in the current academic science enterprise.

What am I talking about?

In the Commentary, two former Amgen researchers describe some of that company's efforts to reproduce "landmark studies" in cancer biology. Amgen had a team of about a hundred researchers, called the reproducibility team, whose job was to test new basic science findings before the company invested in following up these targets. Shockingly, according to the authors, only 6 of 53 of these landmark studies were actually reproduced. When findings could not be reproduced, the team contacted the original authors to try to work through the potential problems. This is an incredibly dismal 11% reproducibility rate!
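As a rough check on that figure, here is a minimal Python sketch of the arithmetic. The only inputs taken from the Commentary are the two counts (6 reproduced out of 53). The Wilson score interval is my own addition to show how much uncertainty a sample of 53 carries; it is not something the authors report, and the function name `wilson_interval` is just an illustrative helper.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Approximate 95% Wilson score interval for a proportion.

    Assumption: the 53 studies are treated as independent trials,
    which the Commentary does not claim.
    """
    p = successes / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half_width, center + half_width


reproduced = 6   # from the Commentary
attempted = 53   # from the Commentary

rate = reproduced / attempted
low, high = wilson_interval(reproduced, attempted)

print(f"Reproducibility rate: {rate:.1%}")
print(f"Approximate 95% interval: {low:.1%} to {high:.1%}")
```

Under those assumptions the interval is wide, roughly 5% to 23%, so the headline 11% should be read as "somewhere between dismal and very dismal" rather than as a precise estimate.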

Could it really be that bad?

The first problem is what exactly is meant by reproducibility. In the Commentary the authors acknowledge that they did attempt to use additional models in the validation process and that technical issues may have underlain some of these differences. They also point out that their sample set is biased with respect to the findings: these were often novel, cutting-edge findings and typically more surprising than the general research finding. Also, their definition of reproducibility is unclear. If a researcher says drug X has a 10-fold effect on something and the Amgen guys say it has a 3-fold effect on the process, is that a reproducible finding? My initial reaction was that the 89% were cases where the papers said something like "thing X does thing Y" and there was no evidence supporting that. We don't know, and in a bit of an ironic twist, since no data are provided (either which papers were good and which were bad, or within those, which findings were good and bad), this Commentary could itself be considered both unscientific and non-reproducible (also, we are awfully close to April Fools' Day).

So there are some bad papers out there, who cares?

Reproducibility is at the heart of everything we do as scientists. No one cares if you did something once and, for reasons you can't really explain, were never able to do it again. If something is not replicable and reproducible, then for all intents and purposes it should be ignored. We need measures of both to be able to evaluate research claims, and we need context specificity to understand the breadth of those claims. I'll toss out a few reasons why this problem really matters, both to those of us who do science and to everyone else.

This is a massive waste of time and money

From the commentary:
Some non-reproducible preclinical papers had spawned an entire field, with hundreds of secondary publications that expanded on elements of the original observation, but did not actually seek to confirm or falsify its fundamental basis.
Wow, really? Whole fields have been built on these? In a way I don't feel bad for these fields at all. If you are going to work in a field and are never going to bother even indirectly testing the axioms on which it is built, then you are really not so good at science. If you are going to rely on everyone else being correct and never test it, then your entire research enterprise might as well be made from tissue paper. More importantly, if you are on top of these things, you are still going to waste time and money figuring out not to follow this up. Hopefully this is the more common case. This really goes back to the difficulty of publishing negative data to let people know which conditions work and which don't.

The reward system for science is not in sync with the goals of the enterprise

Why are people publishing things that they know only happen one out of six times? Why are they over-extending their hypotheses, and why are they reluctant to back away from their previous findings? All of these things happen because we are judged for jobs, for tenure and for grants on our ability to do them. The person who spends 3 years proving that a knockout mouse model does not actually extend lifespan walks away with nothing, while the one who shows that it does (even if done incorrectly) gets a high impact paper and a job. Even if it didn't take an unreasonable amount of time and effort to publish data showing that a finding is not reproducible, the risk of insulting another researcher or of not contributing anything new might be enough to prevent it. Until the rewards of publishing negative or contravening data are on par with the effort, people just won't do it.

This reflects really poorly on science and scientists

Science is, and probably always has been, under some type of "attack". Science as an entity, and scientists as its representatives, need to not shrug this off or ignore it. We have to deal with this problem head-on, whether it be at the review level or at the post-publication level. People who are distrustful of science are right to point at this and say, why are we giving tens of billions of dollars to the NIH when they are 89% wrong? Why not just give that money to Amgen, who seem to be the ones actually searching for the truth (not that they will share that data with anyone else)?

Can anything be done?

The short answer is that it's really going to be difficult and it's going to rely on a lot of moving parts. Reviewers should (and in my experience do) ask for explicit reproducibility statements in the papers. This can go further: if someone says this blot is representative of 5 experiments, then there is no reason the other 4 couldn't be put in the supplement. If they looked at 100 cells and show just one, then why can't the rest be quantified in some way? Post-publication, there should be open (i.e. not just in lab meetings) discussion of papers, their problems, and where they match or mismatch with the rest of the literature. Things like blogs and the Faculty of 1000 are great, but how often have you seen a negative F1000 review? Finally, there eventually ought to be some type of network of research findings. If I am reading a paper and would like to know what other results agree or disagree with it, it would be fantastic to get there in a reasonable way. This is probably the most complicated part, as it requires not only publication of disagreeing findings, but also some network to link them together.

Begley, C., & Ellis, L. (2012). Drug development: Raise standards for preclinical cancer research. Nature, 483(7391), 531-533. DOI: 10.1038/483531a

What Should Be Done about Reproducibility by Dave Bridges is licensed under a Creative Commons Attribution 3.0 Unported License.