Scientists Must Share Early and Share Often to Boost Citations

First Posted: Oct 03, 2013 10:34 PM EDT

“Publish or perish” is a well-known maxim within academia. It is introduced to researchers early in their careers, often by a PhD supervisor keen for his or her students to start building a career.

By Timothy Smith, University of Melbourne

While we researchers tell ourselves that we publish for important and admirable reasons (to add to the global body of knowledge, to advance our fields), deep down we all know that, at least in part, we are publishing for selfish reasons.

We want to build our CVs and improve our chances of landing jobs and grant money. To survive in academia, one must “publish early and publish often”.

Unfortunately, this reality has contributed to a culture of secrecy, especially within the sciences. The fear of being “scooped” is often talked about and is usually a major argument against sharing the data underlying a research paper.

But in these days of “connected science,” “eResearch” and all those other buzzwords and neologisms, can we really afford not to share data?

A paper published overnight by American researchers Heather Piwowar and Todd Vision in the open access journal PeerJ has finally provided reliable evidence for what many data sharing advocates have been saying for a long time.

Far from hurting the ability to publish, sharing data in a public repository can actually lead to a tangible benefit to your publication record through increased citations.

Share and share alike

The benefits of data sharing are accepted, at least in theory, by the academic community, but data sharing rates are still low. In a bid to raise these rates, research funders have begun, if not to mandate, then to “strongly encourage” data sharing practices.

Both the US National Institutes of Health and National Science Foundation now require grant applicants to at least include a data sharing plan in their applications. In Australia, the National Health and Medical Research Council (NHMRC) is a signatory to a joint statement that says some nice things about data sharing, but only in the field of public health.

However, these policies do not yet mandate full data sharing because, unlike open access publishing, open access data sharing is still seen as a potential threat by the very researchers who generate the data being made open.

Truly free and open data sharing will only occur when the people who generate the data feel comfortable sharing it. And that will only happen when it can be shown that sharing your data doesn’t hurt your ability to publish.

More citations, please

Citations of your work in other peer-reviewed journal articles are an important factor in determining your track record. They can be thought of (rather crudely) as the intellectual equivalent of Facebook “likes”. Someone who publishes a lot but isn’t cited often is like that friend we all have who posts a lot but has nothing interesting to say.

Combining the number of papers someone has published with the number of times those papers have been cited (as Hirsch’s h-index does) can give a very useful indication of the impact and relevance of their work.
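Hirsch’s h-index has a simple definition: a researcher has an index of h if h of their papers have each been cited at least h times. The short Python sketch below is only an illustration of that definition, not part of Piwowar and Vision’s study; the function name `h_index` and the citation counts are hypothetical.

```python
# Hypothetical helper: computes Hirsch's h-index from a list of
# per-paper citation counts (illustrative data only).
def h_index(citations: list[int]) -> int:
    # Rank papers from most to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        # The paper at position `rank` needs at least `rank` citations
        # for the index to reach `rank`.
        if count >= rank:
            h = rank
        else:
            # Counts only fall from here on, so no later rank can qualify.
            break
    return h


# Made-up records: a prolific but rarely cited author and a focused one.
prolific = [2, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # 10 papers
focused = [30, 12, 9, 5, 4]                # 5 papers

print(h_index(prolific))  # 1
print(h_index(focused))   # 4
```

With these made-up numbers, the author with ten papers scores an h-index of 1, while the author with five well-cited papers scores 4: publishing a lot without being cited does little for the index.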

What Piwowar and Vision have shown is that papers that reported on research where the underlying data was made available in a public repository received 9% more citations than similar studies for which the data was not made available.

To arrive at this conclusion they analysed the citation counts of 10,555 papers on gene expression studies that created microarray data. These types of studies routinely generate large amounts of raw data by measuring the activity of sometimes thousands of different genes in multiple samples.

A quarter of the papers analysed in this study described research that made its data discoverable in one of the two most widely used gene expression microarray repositories: the US National Center for Biotechnology Information’s Gene Expression Omnibus and the European Bioinformatics Institute’s ArrayExpress. The remaining 75% merely reported the outcomes of each study’s analysis. The underlying data was not made available for reuse in other studies or to confirm the veracity of the original claims.

By comparing the number of citations received by papers with and without publicly shared data, Piwowar and Vision demonstrated a small but significant increase in the number of times papers with available data were cited.

The “citation benefit” has been studied before, but these prior studies have suffered from a number of confounding factors. Citation rates can be affected by many things: the journal that published the paper, its impact factor, citation half-life and open access policy, to name a few.

The large size of this new study meant that these factors (43 in total) could be corrected for and the association between data availability and citation rate isolated with more accuracy.

These findings should go some way towards convincing academics that data sharing can bring direct personal benefits as well as benefits to their field at large. Most interestingly, the benefit of increased third-party citations appears to persist for six years, well beyond the two-year window in which most authors publish follow-up papers on the same underlying data.

Sad as it may seem, personal interest may just be the thing that allows academia to transition, as Piwowar and Vision say, to a culture that simply expects data to be part of the published record.

Timothy Smith does not work for, consult to, own shares in or receive funding from any company or organisation that would benefit from this article, and has no relevant affiliations.
