It's not enough to say that you used FNL or GFS or CALIPSO data. The software that produces FNL and GFS analyses and forecasts evolves over time. New parameters or methods are introduced. Bugs are fixed. Satellite processing algorithms similarly evolve.
These changes are often not well documented. For someone else to understand (and possibly reproduce) your work, they need to know exactly which version of the data you used.
Data provenance--the history of the data and where you got it from--is key. For instance, FNL analyses downloaded directly from NCEP between Dec 15, 2009 and Jan 14, 2015 contained many forecast fields set to zero. To reduce confusion for neophyte users, the RDA stripped those misleading zero fields out of the FNL files we disseminate. (This also saves us disk space and bandwidth.) Other data centers may save and distribute only the parameters their core users need.
Access time is also critically important. Data can evolve as bugs are fixed or algorithms are improved. These small changes may not always be identified by a change in version number. But with complete provenance information, which includes access location and time, it's possible to trace back which version of the data was used.
These issues cause confusion unless you cite not just the dataset and version number, but also the place you got the data from and the date you accessed it.
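As a rough illustration of the four pieces of information named above (dataset, version, source, and access date), here is a minimal Python sketch. The `DataCitation` class, its field names, and the output layout are hypothetical, made up for this example, and the values are placeholders. It is not the RDA's citation format, only one way to keep these details together so none of them is forgotten.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DataCitation:
    """Hypothetical record of the provenance details worth citing."""

    dataset: str    # name of the dataset you used
    version: str    # version label, if the provider publishes one
    source: str     # where you got the data (the distributing archive)
    accessed: date  # the day you downloaded it

    def format(self) -> str:
        # Illustrative layout only, not an official citation style.
        return (
            f"{self.dataset}, version {self.version}. "
            f"Obtained from {self.source}. "
            f"Accessed {self.accessed.isoformat()}."
        )


# Placeholder values, not a real dataset or archive.
citation = DataCitation(
    dataset="Example Dataset",
    version="1.0",
    source="Example Data Archive",
    accessed=date(2020, 1, 1),
)
print(citation.format())
```

Recording the source and access date at download time, rather than reconstructing them when you write the paper, is the main point of keeping such a record.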
Convinced? Now let's see how easy it can be to cite your data!
Scroll down near the end of any RDA dataset home page, found at rda.ucar.edu/dsnnn.n (where nnn.n is the dataset number), and you will see our user-friendly citation widget.
The user-friendly RDA data citation widget.
Notice that many of our datasets have DOIs, Digital Object Identifiers.
It's so easy that you have no excuse to omit data citations. Go forth and do reproducible data science!