
Saturday, April 19, 2014

Preprints: Trying Something New in Publishing

As a trainee, having my papers reviewed by experts in the field has been both a frustrating and a positive experience.  It has been positive in that, in nearly every case, my publications have been improved by the process.  The enhancements range from fixes of little embarrassing typos to new ways of conceptualizing our data, and these papers are certainly better for it.

On the other hand, sometimes it takes forever.  One paper went through 18 rounds of submission/resubmission, lasting over 3 years.  Another took almost 3 years and 10 submissions.  Some of these delays were certainly self-inflicted, but in general it takes a really long time for papers to work through the publication pipeline.  Steven Royle recently looked at this more rigorously for the papers his group has published here.  In his experience, the average has been about 9 months.

This can be bad for the careers of those involved, and for those whom the data might help.  To get around this, we are trying something new with our next paper: we submitted it to bioRxiv as a preprint.  The paper can be found here, so go ahead and take a look, I'll wait.  The posted version is identical to the submitted version, which was sent to a normal peer-reviewed journal.

What do I hope to gain?

This has been covered really well around the internet, including in Science, at Haldane's Sieve, and in PLOS Biology.  I hope that this will give people in my field a chance to read our work earlier.  I also hope that the people who may be interested in reading it will provide some feedback.  This paper, like all of our papers, gets informally reviewed by colleagues and lab members before it goes out.  By putting it out online, I would like a broader audience to be empowered to take a look and give us their thoughts before the 'final' version is done.

What are the downsides?

There are a couple: we could be scooped, or it could affect our ability to publish in another journal.  For the latter, we used SHERPA/RoMEO to pick a journal with an established policy that preprints are acceptable.  As for getting scooped, I am even less concerned about that.  The data for this paper are freely available on GitHub for anyone to use, and I think the risk of scooping is dramatically overstated in science.

So take a look, and let us know here or at the paper what you think.

Thursday, July 19, 2012

Why Isn't There Anonymous Post-Publication Peer Review?

If pre-publication review is anonymous, and it almost always is, why isn't there anonymous post-publication peer review? If there is a benefit to anonymous review, then isn't it odd that the Faculty of 1000 and most journals' commenting and letter-to-the-editor mechanisms require the submitter to provide a real name and appointment? Would post-publication review of articles suck less if it were anonymous?

Saturday, March 31, 2012

What Should Be Done about Reproducibility

A recent Commentary and linked editorial in Nature regarding reproducible science (or rather the lack thereof) have been troubling me for a few days now. The Commentary brings to light a huge problem in the current academic science enterprise.

What am I talking about?


In the Commentary, two former Amgen researchers describe some of that company's efforts to reproduce "landmark studies" in cancer biology. Amgen had a team of about a hundred researchers, called the reproducibility team, whose job was to test new basic science findings prior to investing in following up these targets. Shockingly, according to the authors, only 6 of 53 of these landmark studies were actually reproduced. When findings did not reproduce, the team contacted the original authors to attempt to work through the potential problems. This is an incredibly dismal 11% reproducibility rate!

Could it really be that bad?

The first problem is what exactly is meant by reproducibility. In the Commentary the authors acknowledge that they did attempt to use additional models in the validation process and that technical issues may have underlain some of these differences. They also point out that their sample set is biased with respect to the findings: these were often novel, cutting-edge findings, typically more surprising than the average research finding. Also, their definition of reproducibility is unclear. If a researcher says drug X has a 10-fold effect on a process and the Amgen team finds a 3-fold effect, is that a reproducible finding? My initial reaction was that the 89% were cases where a paper claimed that thing X does thing Y and there was no evidence supporting that. We don't know, and in a bit of an ironic twist, since no data are provided (either which papers were good and which were bad, or, within those, which findings were good and bad), this Commentary could be considered both unscientific and non-reproducible itself (also, we are awfully close to April Fools' Day).


So there are some bad papers out there, who cares?

Reproducibility is at the heart of everything we do as scientists. No one cares if you did something once and, for reasons you can't really explain, were never able to do it again. If something is not replicable and reproducible, for all intents and purposes it should be ignored. We need measures of these qualities to be able to evaluate research claims, and we need context specificity to understand the breadth of those claims. I'll toss out a few reasons why this problem really matters, both to those of us who do science and to everyone else.

This is a massive waste of time and money

From the commentary:
Some non-reproducible preclinical papers had spawned an entire field, with hundreds of secondary publications that expanded on elements of the original observation, but did not actually seek to confirm or falsify its fundamental basis.
Wow, really? Whole fields have been built on these? In a way I don't feel bad for these fields at all. If you are going to work in a field, and are never going to bother even indirectly testing the axioms on which your field is built, then you are really not so good at the science. If you are going to rely on everyone else being correct and never test it, then your entire research enterprise might as well be made from tissue paper. More importantly, even if you are on top of these things, you are going to waste time and money figuring out that you should not follow this up. Hopefully this is the more common case. This really goes back to the difficulty in publishing negative data to let people know which conditions work and which don't.

The reward system for science is not in sync with the goals of the enterprise

Why are people publishing things that they know only happen one out of six times? Why are they over-extending their hypotheses, and why are they reluctant to back away from their previous findings? All of these things happen because we are judged for jobs and for tenure and for grants on our ability to do them. The person who spends 3 years proving that a knockout mouse model does not actually extend lifespan walks away with nothing; the one who shows that it does (even if done incorrectly) gets a high-impact paper and a job. Even if it didn't take an unreasonable amount of time and effort to publish non-reproducible data, the risk of insulting another researcher or not contributing anything new might be enough to prevent this. Until the rewards of publishing negative or contravening data are on par with the effort, people just won't do it.

This reflects really poorly on science and scientists

Science is, and probably always has been, under some type of "attack". Science as an entity, and scientists as its representatives, cannot shrug this off or ignore it. We have to deal with this problem head-on, whether at the review level or at the post-publication level. People who are distrustful of science can rightly point at this and ask: why are we giving tens of billions of dollars to the NIH when they are 89% wrong? Why not just give that money to Amgen, who seem to be the ones actually searching for the truth (not that they will share that data with anyone else)?

Can anything be done?

The short answer is that it's really going to be difficult, and it's going to rely on a lot of moving parts. Reviewers should (and in my experience do) ask for explicit reproducibility statements in papers. This can go farther: if someone says a blot is representative of 5 experiments, then there is no reason the other 4 couldn't be put in the supplement. If they looked at 100 cells and show just one, then why can't the rest be quantified in some way? Post-publication, there should be open (i.e. not just in lab meetings) discussion of papers and their problems, and of where they match or mismatch with the rest of the literature. Things like blogs and the Faculty of 1000 are great, but how often have you seen a negative F1000 review? Finally, there ought eventually to be some type of network of research findings. If I am reading a paper and would like to know what other results agree or disagree with it, it would be fantastic to get there in a reasonable way. This is probably the most complicated piece, as it requires not only publication of disagreeing findings, but also some network to link them together.



Begley, C. G., & Ellis, L. M. (2012). Drug development: Raise standards for preclinical cancer research. Nature, 483(7391), 531-533. DOI: 10.1038/483531a

Creative Commons License
What Should Be Done about Reproducibility by Dave Bridges is licensed under a Creative Commons Attribution 3.0 Unported License.

Sunday, February 26, 2012

Future Bridges Lab Rules version 0.1

Lab Rules


Version 0.1.3 on July 14, 2012 by Dave Bridges

Remember when you were growing up and you would say, "Well, when I'm older I will (or won't) do that"?
I have been thinking about that for my future, when I run my own group.
It is fairly easy (and as a bit of a blowhard I do this all the time) to say I would do this, or I would do that.
I think posting this publicly will encourage me to stick to these rules.
Below are some rules with a bit of rationale and caveats.
This is the first version of this post but future versions will include links to the previous versions.
The version numbering is described in the Lab Policies README.
Check out the GitHub Repository for more granular changes.

Supervision of Trainees

Trainee-Advisor Contract
Trainees (either mine or co-supervised) and I will read, discuss and sign a contract describing our roles and responsibilities as trainee/employee and mentor. This will include data dissemination/publishing rules, expectations of productivity, note keeping and time commitment, rules for dealing with other members both in my group and in collaborations, rules for sharing of reagents and data, rules for adjudicating disagreements, and grounds and procedures for termination. These rules will conform with any institutional rules. Exceptions can be discussed, and the agreement can be modified throughout the term of the relationship. I will post a generic version of this agreement in a publicly viewable location.
Online Presence
All trainees will appear on the laboratory website and write a blurb about their research interests and goals. Trainees will be strongly encouraged to blog, tweet and otherwise engage in social networking tools regarding their research and the work of others, but this is not required. Links to their publicly available social network profiles will be posted on the laboratory website.
Open Access Policy
Trainees will be made aware of the open publishing, dissemination, software and data/reagent sharing policies of the laboratory at the outset and will have to agree to these standards.

Reagents, Software and Tools

Software Usage
Wherever possible, free open-source software will be used for data acquisition, analysis and dissemination. Exceptions will be made if necessary, but trainees will be encouraged to use, incorporate and develop free tools.
Software Development
If software, scripts or the like are generated, they will be released under a permissive open license such as CC-BY, and the license will be attached explicitly to the source code. Scripts and programs will be uploaded to a public revision control repository such as GitHub or similar (my GitHub profile is here).
Publishing of Protocols and Scripts
When not present in the published article, detailed step-by-step protocols, data analysis scripts and other items which cannot fit into the methods and materials section or the supplementary materials will be posted online and linked to the publication’s online presence (as a post, or as a comment on the paper’s website).
Protocol Sharing
Protocols will be made available online in a wiki format in a publicly available location, whether they have been published or not. Editing will be restricted to laboratory members and collaborators.
Reagent and Tool Sharing
Reagents generated by my group will be shared upon request without condition (aside from potential restrictions placed by other collaborators, funding agencies or the institution). These reagents will be shipped with an explicit statement of free use, sharing and modification. Once a reagent-sharing license is generated or identified, it will be linked in this document. This policy includes unpublished reagents and will never require attribution as a condition. If a reagent obtained from another group is modified, we will offer the modified reagent back to the originator immediately.

Publishing and Data Dissemination

Open Access Journals
I believe that all work should be available to the public to read, evaluate and discuss. I am strongly against the mentality that data and knowledge should be restricted to experts. I will therefore send all papers on which I am corresponding author and have supervised the majority of the work to journals (or their equivalent) which are publicly available. The major caveat will be for work in which I am a minor (less than 50% effort) collaborator and the primary group leader wants to submit the work elsewhere. This will not exempt any potential major-impact publications, no matter how awesome they may be. Delayed open access does not count in this respect.
Open Peer Review
Journals which publish non-anonymous reviewer comments alongside the articles will be selected whenever possible. If this is not done, and if permitted by the publisher and/or reviewers, I will re-post the reviewer comments online without any modifications.
Public Forum for Article Discussion
Although I will encourage discussion of articles to occur at the point of publication (for example via comments posted directly at the website of the publisher), I will also provide a publicly available summary of every published finding of which I am an author (corresponding or not) and allow commenting there too. This discussion post will also link to or contain the reviewer and editor comments where possible. The summary might be a blog post, a Facebook post, a Google Plus post or anything else that might come up in the future. If I am not the first or corresponding author, I will encourage the first or corresponding author to write the post and will link to or quote it directly.
Presentations
All presentations of published data will be posted in an online repository such as Slideshare or something similar (my Slideshare profile is here). If unpublished or preliminary data are presented privately and later published, those slides will be posted upon publication. As with papers, an accompanying blog post or the like will go with that upload. If audio or video of the presentation is available, it will be uploaded as well.
Data Sets
All datasets, once published, will be made available in manipulable (preferably non-proprietary) formats for further analysis. Based on the scheme set out by the Linked Data Research Center Laboratory, all data will be provided at level 2 or above.

Sunday, February 5, 2012

Chickens and Eggs

Yesterday two posts appeared in my feed, both challenging the requirement of glamour mag (Nature, Science, Cell) level publications for career advancement.  Michael Eisen (@mbeisen) wrote a post in response to this idea, suggesting that this is not a criterion for hiring in his experience (he is referring to job applications where he is, at UC Berkeley, as well as to the experiences of his trainees).  Key point:

My own lab provides several examples that demonstrate this reality. My graduate students have gone on to great postdocs and many have landed prestigious fellowships “despite” having only published in open access journals. More curiously, I have had four postdoctoral fellows go out onto the academic job market, who  all got great jobs: at Wash U., Wisconsin, Idaho and Harvard Medical School. Not only did none of them have glamour mag publications from my lab. None of them had yet published the work on the basis of which they were hired! They got their interviews on the basis of my letters and their research statements, and got the jobs because they are great scientists who had done outstanding, as of yet unpublished, work. If anything demonstrates the fallacy of the glamour mag or bust mentality this is it.

In fact, as a co-founder of PLOS and a strong, vocal advocate of open science, his group primarily publishes in PLOS journals.  He hasn't been the last author on a non-PLOS paper since a PNAS paper in 1998, so he is certainly putting his science where his mouth is. Earlier in the day, William Gunn (@mrgunn) made a similar argument:

I'm starting to think that the plodding careerists who always raise the "but I have to publish in X journal for my career" criticism just need to be routed around. You shouldn't be in science because you want a stable career, you should be here because you can't fathom doing anything else.

Now these are both admirable positions to take.  But as the title alludes, this is a chicken-and-egg problem.  If a postdoc decides to publish only in open access journals, then he hopes that prospective departments and grant committees agree with his stance.  If a faculty member takes this stance, he hopes that grant and tenure/promotion committees agree.  If tenure and promotion committees agree, then they hope granting agencies agree.  If granting agencies agree, then they hope that the public (or their foundations or government agencies) agree.  If any link breaks, then it's a risk.  As someone who agrees with this stance, I might find a department happy with this policy, but if my NIH study section isn't on board then I am in some trouble.

Taking this further, I thought: why should this apply only to open access and open review?  Let's say I do all my research totally in the open, self-publishing it online either on my own site or on a preprint server like arXiv or the newer Faculty of 1000 Research, and engaging in discussion in those forums.  If I ignored journals entirely, would anyone accept this as being OK?  I posted the question on Twitter, and there were positive responses, but that is really hard to imagine.

Without anonymous peer review, how could I (or the reader) be assured that controls were done properly and the context of the work was appropriately stated?  If that is done via peer review posted anonymously with the article, how could the reader be sure I didn't just delete the bad reviews or comments?  If I post some data on a blog or preprint server and some other person finds it, expands on it and publishes it in Science, then do I have any right to feel aggrieved?  Who should get the credit?

In an ideal world, things might work analogously to how Rosie Redfield (@rosieredfield) has addressed the arsenic life question.  After posting an initial rebuttal online, Dr. Redfield did some experiments, engaged with the community about the data and put it all together.  This was (to me at least) the first archetypal example of the open evaluation of a research claim.  It was done in the open, public suggestions were incorporated, the work was posted to a preprint archive and then, in the end.... it was submitted to Science.

Now this is not entirely fair to Dr. Redfield; Science was chosen because that was where the first arsenic paper was published and where her critique (and others) were published.  But if this, the most open and public scientific re-evaluation, still needs glamour mag validation, what hope does the rest of research have?

So who should mandate this?  Various public access policies have helped make publicly funded research open access, and a new generation of scientists has shown more proclivity toward this goal, but who needs to take the first step?  Dr. Eisen suggests it has to be everyone.  For this change to happen, primary researchers, group leaders, departments, granting agencies and the public all need to take that step, and leave those who are betrothed to impact-factor chasing looking like relics of the past.

Creative Commons License
Chickens and Eggs by Dave Bridges is licensed under a Creative Commons Attribution 3.0 Unported License.

Saturday, January 15, 2011

Last Year in Science

I hope to put together a series of posts on papers from about a year ago. Quite often the context of a paper can get lost in the flurry surrounding its initial release. My hope is that I can provide a little bit of insight on these papers with a little more hindsight since publication. If you have any ideas for things that might be interesting to go over (again), just let me know. For now I'll try to read some of the glamour mags in my field (Cell, Cell Metabolism, Nature Cell Biology, Nature and Science) and see if anything strikes my interest.

Sunday, January 9, 2011

The Web of Data and Experimental Observations

A few things this week got me thinking about the idealized best way to think of experimental data.  One was a technical problem I had been pondering: if I wanted to publish some experimental observation (not a paper, just a single observation), what is the best way to do this?  It got me thinking a little about the Semantic Web (or the Web of Data) and how it could relate to 'wet' biology. I am far from an expert on any of these things, so feel free to make your thoughts public.

The Web of Data and Ontologies

One of the new things the architects of the internet have been concerning themselves with lately is the Semantic Web. Some authoritative links are here, here and here.   The idea is that there are lots of things out there which are data, but the web mostly deals with things that are documents.  The world will be a better place when computers can make connections between these things.  This involves two concepts: one is the thing, and the other is the connection.

Things on the Web

Not everything is on the web.  I, for example, am sitting in my living room and am definitely not on the web.  Therefore, to locate me, I need some kind of identifier.  These are called Uniform Resource Identifiers (URIs).  Mine could be something like http://davebridges.github.com#davebridges.  URIs need to be unique and they need to be available on the internet.  Anything can have a URI, and something could have several URIs. The key is that a URI should not belong to more than one thing.  Things which have multiple URIs can be cross-referenced with specific vocabulary terms (e.g. owl:sameAs).
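
To make this concrete, here is a minimal sketch using Python's rdflib library (one of several RDF toolkits); the second URI and its example.org domain are made up purely for illustration:

```python
from rdflib import Graph, URIRef
from rdflib.namespace import OWL

g = Graph()

# my URI, as suggested above
me = URIRef("http://davebridges.github.com#davebridges")
# a second, entirely hypothetical URI that also identifies me
also_me = URIRef("http://example.org/people#dave-bridges")

# owl:sameAs cross-references the two identifiers, so anything
# stated about one URI also holds for the other
g.add((me, OWL.sameAs, also_me))

print(g.serialize(format="turtle"))
```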

Connections (Ontologies)

Once things are on the internet, the basis of linked data is how these things relate to one another.  For example, this blog post was created by me.  If there were some kind of explicit statement connecting the two, any computer could figure out that I wrote this post, or conversely that this post was written by me.  The connections are defined by specific vocabularies or ontologies.  For example, Dublin Core is a vocabulary about documents and includes a term "creator".  Therefore one could create a link between me and this post by writing something like this:

This Blog Post has a Creator named Dave Bridges

The important thing is that the ontology specifically defines the relationship between two URIs.  Given this knowledge, a computer could retrieve the creator of the page, or all pages created by me.
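
As a rough sketch of how that statement might be encoded, again using rdflib (the URI standing in for this post is hypothetical):

```python
from rdflib import Graph, URIRef, Literal
from rdflib.namespace import DC, FOAF

g = Graph()

post = URIRef("http://example.org/blog/web-of-data")  # hypothetical URI for this post
me = URIRef("http://davebridges.github.com#davebridges")

# "This Blog Post has a Creator named Dave Bridges"
g.add((post, DC.creator, me))
g.add((me, FOAF.name, Literal("Dave Bridges")))

# the same triple answers the inverse question:
# which pages were created by this URI?
for page in g.subjects(DC.creator, me):
    print(page)
```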

How Would This Work in Science

What got me thinking about this was how great it would be to have defined vocabularies to describe experimental results.  For example, there is an ontology that describes protein-protein interactions (it's at http://bioportal.bioontology.org/ontologies/39508), and one could use, for example, two PubMed links as URIs to indicate a molecular interaction and the two proteins involved.  Given a large enough catalog of these, it would be possible to get a list of all molecular interactions for a particular protein.
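
Sketching that idea with rdflib, using entirely hypothetical URIs and predicate names where a real interaction ontology (like the BioPortal one above) would supply proper terms:

```python
from rdflib import Graph, URIRef, Namespace

g = Graph()

# hypothetical namespace standing in for a real
# protein-protein interaction ontology
PPI = Namespace("http://example.org/ppi#")

# made-up URIs for two proteins and a supporting publication
protein_a = URIRef("http://example.org/protein/ProteinA")
protein_b = URIRef("http://example.org/protein/ProteinB")
evidence = URIRef("http://example.org/pubmed/00000000")

# assert the interaction and point at the data behind it
g.add((protein_a, PPI.interactsWith, protein_b))
g.add((protein_a, PPI.hasEvidence, evidence))

# given a large enough catalog of triples like these, listing all
# interaction partners of a protein becomes a single lookup
for partner in g.objects(protein_a, PPI.interactsWith):
    print(partner)
```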


What About Non-Canonical Findings

I might talk about this later, but one important thing would be to be able to obtain not just a list of interactions, but also links to the specific data supporting (or refuting) each point.  Ideally this would go deeper than just a link to the paper, perhaps to a separate URI describing a particular experiment.

The things that got me thinking about this were a question I posted on BioStar, a blog post on MolBio Research Highlights and a paper at Nature Precedings.