2015-08-06

Cite your data

Science should be reproducible.  Citing your data as clearly and unambiguously as possible is as important a part of science communication as explaining your methodology.

It's not enough to say that you used FNL or GFS or CALIPSO data. The software that produces the FNL and GFS analyses and forecasts evolves over time: new parameters or methods are introduced, and bugs are fixed. Satellite processing algorithms evolve in the same way.

These changes are often not well documented. For someone else to understand (and possibly reproduce) your work, they need to know exactly which version of the data you used.



Data provenance (the history of the data and where you obtained it) is key. For instance, FNL analyses downloaded directly from NCEP between Dec 15, 2009 and Jan 14, 2015 contained many forecast fields set to zero. To reduce confusion for new users, the RDA stripped those misleading zero-valued fields out of the FNL files we distribute. (This also saves us disk space and bandwidth.) Other data centers may store and distribute only the parameters their core users need. So "FNL" obtained from one source is not necessarily identical to "FNL" obtained from another.

Access time is also critically important. Data can change as bugs are fixed or algorithms are improved, and these small changes are not always marked by a new version number. With complete provenance information, including where and when you accessed the data, it is possible to trace back exactly which version you used.
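One lightweight way to keep that provenance is to write it down the moment you download a file. The Python sketch below records where a file came from, when you accessed it (in UTC), and a SHA-256 checksum, so you can later tell whether a copy obtained elsewhere or at another time is byte-for-byte the same. The function names, record fields, and example URL are our own illustrative choices, not an RDA tool.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks to bound memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_provenance(path, source_url, accessed=None):
    """Build a provenance record for a downloaded file (illustrative field names).

    source_url: wherever you actually obtained the file.
    accessed:   timezone-aware datetime; defaults to the current UTC time.
    """
    if accessed is None:
        accessed = datetime.now(timezone.utc)
    return {
        "file": Path(path).name,
        "source_url": source_url,
        "accessed_utc": accessed.isoformat(timespec="seconds"),
        "sha256": sha256_of_file(path),
    }
```

Saving the returned dictionary with `json.dump` next to the data file keeps the provenance with the data, ready to go into your citation.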

These issues can cause confusion unless you cite not just the dataset and version number, but also where you obtained the data and the date you accessed it.
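As a minimal sketch of a citation that carries all of those pieces, here is a hypothetical Python helper that writes a BibTeX `@misc` entry. Classic BibTeX has no dedicated dataset entry type, so the version and access date go into the free-text `note` field; BibTeX styles that do not recognize `url` or `doi` simply ignore those fields. The helper and all example values are illustrative, not the exact output of our citation widget.

```python
def bibtex_dataset_entry(key, author, title, year, version, url, accessed,
                         doi=None):
    """Return a BibTeX @misc entry recording version, source, and access date.

    Hypothetical helper: @misc is used because classic BibTeX has no
    dataset type; version and access date go into the `note` field.
    """
    fields = [
        ("author", author),
        ("title", title),
        ("year", str(year)),
        ("url", url),  # where you actually obtained the data
    ]
    if doi:
        fields.append(("doi", doi))
    fields.append(("note", f"Version {version}. Accessed {accessed}."))
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields)
    return f"@misc{{{key},\n{body}\n}}"


print(bibtex_dataset_entry(
    "example2015", "Example Data Center", "Example Analysis Dataset",
    2015, "1.0", "https://example.org/ds000.0", "2015-08-06",
    doi="10.1234/example",  # placeholder DOI
))
```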

Convinced?  Now let's see how easy it can be to cite your data!

Scroll down near the end of any RDA dataset home page, found at rda.ucar.edu/dsnnn.n (where dsnnn.n is the dataset ID), and you will see our user-friendly citation widget.

[Figure: The user-friendly RDA data citation widget.]

Select the citation format you want from the pull-down menu, then copy and paste the resulting citation. If you use reference management software, e.g., EndNote or a BibTeX-based workflow, click the blue RIS or BibTeX button to download the citation in that format.
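RIS is a plain-text format in which each line carries a two-character tag (for example `TY` for the reference type, `AU` for author, `TI` for title, `UR` for URL), two spaces, a hyphen, and a space, with `ER` closing each record. If you want to check a downloaded RIS file before importing it, say to confirm it contains the URL you expect, a small parser like this hypothetical Python sketch is enough. It is not a complete implementation of the RIS specification.

```python
import re

# A tagged RIS line: two-character tag, two spaces, a hyphen, optional space, value.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")


def parse_ris(text):
    """Parse RIS text into a list of records, each a dict of tag -> list of values."""
    records, current = [], None
    for line in text.lstrip("\ufeff").splitlines():  # drop a leading byte-order mark
        m = RIS_LINE.match(line.rstrip())
        if not m:
            continue  # skip blank lines and anything that is not a tagged line
        tag, value = m.group(1), m.group(2).strip()
        if tag == "TY":
            current = {"TY": [value]}  # TY starts a new record
        elif tag == "ER":
            if current is not None:
                records.append(current)  # ER closes the current record
            current = None
        elif current is not None:
            current.setdefault(tag, []).append(value)  # tags like AU may repeat
    return records
```

Keeping every value in a list matters because tags such as `AU` can legitimately appear more than once in a record.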

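Notice that many of our datasets have DOIs (Digital Object Identifiers), persistent identifiers that continue to resolve to the dataset even if its web address changes. Include the DOI in your citation whenever one is available.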
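DOIs are also machine-readable. The doi.org resolver supports content negotiation for DataCite and Crossref DOIs, so requesting a DOI with an `Accept: application/x-bibtex` header returns a BibTeX record rather than the dataset's landing page. The Python sketch below builds such a request with the standard library. The helper names and the DOI `10.1234/example` are placeholders, and `fetch_bibtex` needs network access.

```python
import urllib.request

DOI_RESOLVER = "https://doi.org/"
# Common ways people write a DOI; all are reduced to the bare "10.xxxx/..." form.
_PREFIXES = ("https://doi.org/", "http://doi.org/",
             "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalize_doi(doi):
    """Strip resolver URLs or a 'doi:' prefix and return the bare DOI."""
    doi = doi.strip()
    for prefix in _PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return doi


def bibtex_request(doi):
    """Build a content-negotiation request asking the DOI resolver for BibTeX."""
    return urllib.request.Request(
        DOI_RESOLVER + normalize_doi(doi),
        headers={"Accept": "application/x-bibtex"},
    )


def fetch_bibtex(doi, timeout=30):
    """Fetch the BibTeX record for a DOI (requires network access)."""
    with urllib.request.urlopen(bibtex_request(doi), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

The resolver answers with a redirect; `urlopen` follows it and carries the `Accept` header along, so no extra handling is needed.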

It's so easy that you have no excuse not to cite your data. Go forth and do reproducible data science!
