2016-11-28

NCAR's Research Data Archive featured on UCARConnect

The November 2016 issue of UCARConnect, Data: The Currency of Science, is now live. The NCAR RDA is proud to have contributed to this issue.

RDA Data Specialist Grace Peng wrote "Big Data, Big Planet":
We’re experiencing a big data explosion both in cultural awareness and in penetration into many aspects of everyday life. How did we get here? What role do weather and climate play in our data moment?

Big data is characterized by its "Vs": Volume, Variety and Velocity. Data volume is “big” if it is too large to reside on or be processed with a personal computer.

Did you know that even in the pre-computer era, weather data was the original big data use case? It’s most useful when we have a lot of it and we need the data in real-time. Furthermore, we use a large variety of physical measurements to characterize the state of the atmosphere and ocean.
RDA manager, Steve Worley, and former intern, Sophie Hou, speak about the history of the RDA and scientific data in general in this video.

Sophie Hou has joined NCAR permanently as our new Data Curation & Stewardship Coordinator. Read her career profile about what led her to this exciting career.

UCARConnect is a (roughly) bimonthly publication of UCAR aimed at science enthusiasts of all ages, including K-12 students and teachers.

2016-11-18

GRIB2 file carpentry

We've learned from several users that the second WRF method in our post "How to work around operational changes to GFS/GDAS/FNL" does not work.

Use the first method described.
First, you can work around the problem by removing the new vertical levels in the files after the GFS/GDAS change. Then all of your WPS input will be consistent. You can do this yourself with wgrib2, or our "Get a Subset" tool.
Here's the detailed recipe for those of you not familiar with wgrib2, a tool for manipulating GRIB2 files.

We're going to use tips #4 and #4a from Wesley's Tricks for wgrib2 page to extract a range of records from a GRIB2 file.

Start by making a short inventory to identify the records that you want to keep and to remove.
wgrib2 gdas1.fnl0p25.2016051106.f00.grib2 > gdas1.fnl0p25.2016051106.f00.inv

wgrib2 gdas1.fnl0p25.2016051112.f00.grib2 > gdas1.fnl0p25.2016051112.f00.inv

Note that gdas1.fnl0p25.2016051106.f00.inv has 322 lines and gdas1.fnl0p25.2016051112.f00.inv has 352 lines. The extra 30 lines/records are in records 5-34.
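One way to double-check which records are new is to compare the two inventories by variable and level rather than by eye. The following is a minimal bash sketch, not part of wgrib2: the function name extra_records is hypothetical, and it assumes the default wgrib2 inventory layout, in which colon-separated field 1 is the record number and fields 4 and 5 are the variable name and the level.

```bash
#!/bin/bash
# Hypothetical helper: print the record numbers in NEW_INV whose
# variable:level pair never appears in OLD_INV.
# Assumes default wgrib2 inventory lines such as
#   1:0:d=2016051112:HGT:10 mb:anl:
extra_records() {
  local old_inv=$1 new_inv=$2
  awk -F: '
    { key = $4 ":" $5 }                 # variable and level identify a field
    NR == FNR { seen[key] = 1; next }   # first file: remember its fields
    !(key in seen) { print $1 }         # second file: report unseen fields
  ' "$old_inv" "$new_inv"
}
```

Calling extra_records gdas1.fnl0p25.2016051106.f00.inv gdas1.fnl0p25.2016051112.f00.inv should list the record numbers that exist only after the change. If your inventories match the counts noted above, those are the records to leave out.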
We'll extract records 1-4 and 35-352 separately and then recombine them with the UNIX command cat. Typing these commands for every file would be tedious, so I put all the files I wanted to convert into one directory and ran this bash script.
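#!/bin/bash
# Rewrite each post-change file without the extra records 5-34.

for file in gdas1*f00.grib2; do
  wgrib2 "$file" -for 1:4 -grib temp1      # records before the extra ones
  wgrib2 "$file" -for 35:352 -grib temp2   # records after the extra ones
  cat temp1 temp2 > "$file.27"             # recombined file, original name plus .27
done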

2016-10-31

NCAR RDA outage Nov 1, 2016

UPDATE Nov 1 15:11:39 MDT 2016: All RDA services are back in service. Thank you for your patience.

NCAR CISL central file disk systems will be down for system maintenance from 8:00 MDT on Tuesday, November 1, 2016 for approximately 4 hours.

All data downloads, including access through THREDDS servers, will be unavailable during this outage.

NCAR CISL supercomputers will be down for system maintenance from 8:00 MDT on Tuesday, November 1, 2016 for approximately 8 hours.

Custom data orders will be held and automatically restarted after the work is complete.

Change notice from the Daily B:
The Yellowstone, Geyser, and Caldera clusters will be unavailable from 8 a.m. to 6 p.m. MDT Tuesday, November 1, to allow CISL staff to apply a critical security patch. The GLADE file systems will be unavailable from 8 a.m. to noon.

A system reservation will be put in place 12 hours before the scheduled downtime. Users’ jobs with specified job times that overlap the reservation period will remain on hold until the system is restored to service. Running jobs that have not finished by 8 a.m. Tuesday will be killed and will need to be resubmitted after the maintenance period.

We apologize for any inconvenience this might cause. Users will be informed via the CISL Notifier service when the systems are returned to production.

2016-10-28

How to work around operational changes to GFS/GDAS/FNL

Operational models are moving targets.
Any changes in software for an operational analysis can result in spurious signals or shifts. Operational software changes frequently as developers uncover and fix bugs or biases and improve code segments to better represent atmospheric phenomena. The changes are usually not announced ahead of time, and the change log may be difficult for the non-expert user to decipher. Thus, operational analyses are not appropriate for compiling a long time series to study changes over time, e.g. to look for climate signals.
A handful of users in the last week have contacted rdahelp@ucar.edu about problems initializing WRF with FNL data spanning 2016-05-11. Why does WPS, the WRF Preprocessing System, crash right at this juncture?

The answer lies in the operational changes made to the GFS/GDAS system at NOAA NCEP. You can view the full list of changes to GFS/GDAS since 1991 including the announcement of the changes effective 2016-05-11 at 12 UTC.
- Addition of five layers in the upper stratosphere in gridded output
This change affects ds084.1 NCEP GFS 0.25 Degree, ds083.2 NCEP FNL 1.0 Degree, ds083.3 NCEP FNL 0.25 Degree, and ds335.0 Historical Unidata Internet Data Distribution (IDD) Gridded Model Data.

The first thing to do when things break is to examine your data. You can use wgrib2, or take a close look at the metadata for your input data set, as I've shown earlier, in WRF Vtable Carpentry.

Scroll down to the "Vertical Levels" section of the data set information page and click on "detailed metadata."

You will see that six parameters (HGT, TMP, RH, UGRD, VGRD, O3MR), many used by WRF, received five additional levels (1, 2, 3, 5, 7 mb) beginning 2016-05-11 at 12 UTC.


These changes do not affect the performance of ungrib.exe, but they may impact real.exe and wrf.exe.

The RDA team (rdahelp@ucar.edu) helps our data users with WRF WPS up to providing an accurate Vtable that describes the data obtained from us. The WRF team (wrfhelp@ucar.edu) picks up support from there. This problem spans both support groups.

I contacted WRF to coordinate our help and I am posting information beyond RDA's normal scope here as a service to our users so that all the information to get unstuck resides in one place.

First, you can work around the problem by removing the new vertical levels in the files after the GFS/GDAS change.  Then all of your WPS input will be consistent.  You can do this yourself with wgrib2, or our "Get a Subset" tool.

UPDATE Nov 18, 2016
The following method, provided by wrfhelp, does not work. You need to use the first method above. For details on how to achieve this using wgrib2, read GRIB2 file carpentry.

If you want to use all of the available data in a WRF run that spans a discontinuity of data:
  1. Run ungrib.exe as usual
  2. Using a namelist that accurately describes the number of vertical levels found in the input GRIB2 files before the change (27 in this case), run metgrid.exe and real.exe for the times before the change.
  3. Edit your namelist to the new number of vertical levels (32) and then run metgrid.exe and real.exe on the remainder of the GRIB2 input files for the period after the change.
  4. This generates a complete set of wrfbdy and wrfinput files for your entire WRF run.
  5. Run wrf.exe up until the discontinuity with 27 in the namelist.  
  6. Then edit the namelist for 32 vertical levels and restart wrf.exe for the remainder of the run.
Pat yourself on the back. You've just completed numerical integration of a complex system of differential equations around a discontinuity. ;-)

2016-10-18

Trip Report: Rocky Mountain Celebration of Women in Computing 2016

Last month, I attended Rocky Mountain Celebration of Women in Computing 2016 (RMCWiC) along with Dr Natasha Flyer and (soon-to-be Dr) Delilah Feng.  The 280+ registered attendees (including over 160 college students) overwhelmed the Hotel RL in downtown Salt Lake City.

Can you find the NCAR CISL representatives in this photo?
The Rocky Mountain Celebration of Women in Computing, an ACM (Association for Computing Machinery) Celebration event, is the biennial event for the Rocky Mountain Region that encourages the career interests of women in computing...
NCAR's Computational Information Systems Laboratory (CISL) jointly sponsors RMCWiC and we actively recruit visiting students and scientists at these meetings.

My favorite part was the poster session, where students presented their work.  The breadth, quality of work and enthusiasm were very high.  I enjoyed talking to the students so much that they ran out of dessert before I remembered they were serving it!
A very lively poster session.
Computing is very broad; even the work within CISL is very broad.  I lingered longest at the talks featuring embedded sensors, real-time data processing and data visualization.  When one student told me she was ready to tackle higher dimensional data problems, I was thrilled.
This student presented data visualization research about how to effectively convey uncertainty, a subject of high importance in meteorology and climatology.
If you are a student or recent PhD, consider applying to one of NCAR CISL's visitor programs.  We are particularly interested in hosting undergraduate and graduate students for our Summer Internships in Parallel Computational Science (SIParCS) program.  These summer internships pay local living expenses and cash stipends.  Furthermore, SIParCS students work with one or more mentors throughout their stay and may be awarded travel stipends to present their SIParCS work at conferences such as AGU and AMS.

I worked part-time or was a stay-at-home mom for many years before re-entering the scientific computing workforce.  Career re-entry candidates are also encouraged to apply.  Feel free to contact me about career re-entry opportunities offered by NSF, NASA and other government agencies.

2016-09-28

WRF Vtable Carpentry

When we announced the introduction of WRF-able data sets (Vtables auto-generated from RDA-collected metadata) in January 2016, the WRF Preprocessing System (WPS) component, ungrib, was able to use the RDA Vtables without any hiccups.

Around the time of the release of WRF 3.8.0 in April 2016, rdahelp@ucar.edu began receiving reports of error messages when running ungrib with RDA-provided Vtables.  If the Vtable includes lines describing fields that are not contained in that particular input GRIB file, ungrib stops instead of ignoring that field (as it had done previously.)

WPS V3.8 Updates

The simple solution is for the user to edit the RDA-provided Vtable to remove the extraneous line(s) referring to fields not contained in the input GRIB file.

After investigation and consideration, wrfhelp and rdahelp decided not to change either the WPS code or the auto-generated Vtables to return to plug and play.  Here's why.

The RDA provides archives of analysis, forecast and reanalysis data sets.  Reanalysis data sets contain the same variables/parameters throughout the entire time series.  Operational analysis and forecast systems vary over time.  Parameters or vertical levels may (and should!) change over the time series.

With advances in computational power and modeling of physical processes, newer fields and levels can be added.  Bandwidth is limited.  Dropping vestigial parameters when they are replaced with better ones is necessary.

RDA-provided Vtables are complete.  They include lines describing every WRF-usable parameter found in the data set time series.  To see how parameters and vertical levels changed over time, click on the "Variables by dataset product" or "detailed metadata for level information" links from the data set home pages.

Consider the long-running NCEP analysis series, FNL, ds083.2. When it began in 1999, it was in GRIB (aka GRIB1) format. Since 2007, it has been offered in GRIB2 format. Several times, the model was changed and fields were added (and sometimes removed).

View the vertical level information of long-running NCEP analysis series, FNL, ds083.2.  Select "detailed metadata" to view GRIB1 details; select "GRIB2 level table" to view GRIB2 details.

Defaults to "Parameter View".  Also explore "Vertical Level View."
Notice that the soil layers expanded from a single layer 10-200 cm below the surface to three layers in 2005.
Vtable.RDA_ds083.2 contains lines describing the entire time series from 1999 to the present. GRIB1 codes are on the left; GRIB2 codes are on the right. Notice how there are lines with GRIB1 or GRIB2 codes only. (One set of columns is blank.) If you are using a GRIB2 file (2007 and later), then edit out the lines that pertain only to GRIB1 files (GRIB2 columns blank.) If you are using a GRIB1 file, delete the lines referring to parameters found only in the GRIB2 files.

Delete the lines that refer to variables not found in the file you are using.
Similarly, if you are using a GRIB1 file from before 2005-05-31, delete the six lines referring to the three layers of soil temperature or moisture between 10 and 200 cm. If you are using a GRIB1 or GRIB2 file from 2005-05-31 or later, delete the two lines referring to soil temperature/moisture 10-200 cm below ground.

These soil layers are found only in FNL files 2005-05-31 and later.  Delete for earlier dates.
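The GRIB1/GRIB2 part of this edit can also be scripted. This is a minimal bash sketch, not an RDA or WPS tool: the function name vtable_for_format is hypothetical, and it assumes the usual WPS Vtable layout, with "|"-separated columns in which column 1 is the GRIB1 parameter code and column 8 is the GRIB2 discipline. Rows whose code column for the chosen format is blank are dropped; header, separator, and comment lines pass through unchanged.

```bash
#!/bin/bash
# Hypothetical helper: keep only the Vtable rows that carry codes for one format.
# Usage: vtable_for_format grib1|grib2 Vtable.in > Vtable.out
# Assumes '|'-separated columns: 1 = GRIB1 param, 8 = GRIB2 discipline.
vtable_for_format() {
  local format=$1 vtable=$2 col
  case "$format" in
    grib1) col=1 ;;
    grib2) col=8 ;;
    *) echo "usage: vtable_for_format grib1|grib2 VTABLE" >&2; return 1 ;;
  esac
  # Lines with fewer than 8 columns (separators, comments) pass through;
  # table rows are kept only if the chosen code column is not blank.
  awk -F'|' -v c="$col" 'NF < 8 || $c ~ /[^ \t]/' "$vtable"
}
```

This does not handle date-dependent fields such as the soil layers described above, so those lines still need a manual check against the file you are using.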
In closing, "plug and play" is both a blessing and a curse.  We all want things that "just work" right out of the box.  But, your data should not be a black box.  You don't need to become an expert in the arcana of data syntax, but you should peek "under the hood" and understand what is in your data file.

To learn more about GRIB data sets, read our GRIB blog series, particularly Setting up to work with GRIB2. Use wgrib and wgrib2 to explore the contents of your GRIB or GRIB2 files. You might find a parameter that you aren't already using but that might be helpful. Or, you may see a lot of parameters and vertical layers that you don't need. In that case, may I suggest you Subset to save time and bandwidth?

2016-09-16

NCAR RDA partial outage September 19, 2016

The NCAR CISL High Performance Storage System (HPSS) will be down on Monday, September 19, 2016, to prepare for the installation of Cheyenne, NCAR's new high-performance supercomputer.


Although the RDA's primary and backup copies of data are stored on HPSS, the most popular data, approximately a third of our holdings, are replicated on our web server for faster data access. Thus, most users will not be impacted.

Users requesting HPSS data will experience a delay, but need not do anything special.  HPSS data services will automatically resume upon completion of this work.
SCHEDULED OUTAGE NOTIFICATION

HPSS will be down next Monday Sept 19 from 8:30 AM to 6:30 PM for hardware and systems integration work to help prepare the way for HPSS support of Cheyenne.

START: Mon Sep 19 2016 8:30 AM MDT
END: Mon Sep 19 2016 6:30 PM MDT

2016-09-08

NCAR RDA outage September 14, 2016

UPDATE Wed Sep 14 14:13:35 MDT:
All RDA data services are back online. Thank you for your patience.

UPDATE:
NCAR computers will be down for 10 hours, beginning Wed Sep 14 2016 6:00 AM MDT.  All RDA services will be down during that time.

START: Wed Sep 14 2016 6:00 AM MDT
END: Wed Sep 14 2016 4:00 PM MDT

NCAR CISL central file disk systems will be down for system maintenance from 7:00 on Wednesday September 14, 2016 until 10:00 MDT. All data downloads, including access through THREDDS servers, will be unavailable during this outage. Custom data orders will be held and automatically restarted after the work is complete.
START: Wed Sep 14 2016 7:00 AM MDT
END: Wed Sep 14 2016 10:00 AM MDT

SERVICES AFFECTED
GLADE Administration

2016-08-30

How to access a restricted data set

I alluded earlier to a few* exceptions to open access.  They are open to varying degrees due to restrictions from the data providers (not from RDA.)

To see what is available to you,
  • Click on "dashboard" (at the top of the RDA web portal, right next to "sign out")
  • Then click on "Edit/Change Profile"
  • The bottom half of the screen for an unaffiliated user will look something like this:
Restricted data sets at the RDA.
  • ECMWF Operational data is only accessible to users at UCAR member institutions in the U.S. and Canada, as well as users at all other U.S. universities and government institutions. Eligible users must agree to the ECMWF Terms of Use (TOU).
  • JRA products are available to all affiliated users.
  • COSMIC data access is granted only to affiliated users who have also registered at CDAAC.
  • ECMWF Data (Other Than Operational Data) is available to all registered users who agree to their TOU.
Suppose you want to compare how JRA-55 (ds628.0) and NCEP/NCAR Global Reanalysis (ds090.0) differ over your region of interest.  Read their data set Description pages.  Note that ds628.0 has sections labeled "Access Restrictions" and "Usage Restrictions" while ds090.0 does not.

JRA-55 Access and Usage Restriction information.
Click the "Data Access" tab for JRA-55 and you will see the screen below if you are nonaffiliated. If you are eligible (affiliated), you will be taken to a page with the Terms of Use. Read and accept the Terms of Use and you are then able to access the data.

Nonaffiliated users see this.
If you do have a valid and current affiliation, edit your profile.  Reply to the verification email we send, and then you will be able to read and accept the Terms of Use.
Edit your profile here.  We WILL confirm your email to be sure that it is current.
Some have been reluctant to update their email/affiliations when they leave their universities. Losing access to something you once had is a bummer. I still miss the Olympic-sized swimming pool at Berkeley.

But users of restricted datasets need to reverify their status by replying to verification email notices sent out every six months. Thus, you don't lose anything by keeping your email and affiliation status up to date. In fact, if you let them lapse, you also lose the email notices that we send out about outages and important updates to data you have downloaded from us in the past.

* Only 78 of our 600+ data sets belong to one of the restricted classes of data sets.

2016-08-26

Yes, Ireland does have droughts: Notes from the 9th ACRE Workshop

The RDA hosts many climate reanalysis collections, yet only two in our archive extend back before the modern (post-1948) radiosonde era: the NOAA-CIRES 20th Century Reanalysis (1851–2014; 20CR) and the ERA-20C (1900–2010) produced under ECMWF's ERA-CLIM project.  In order to produce a climate reconstruction dating back over a century, both of these reanalyses rely on assimilating very old surface and marine weather observations, which have been assembled into long-term observational datasets such as the International Surface Pressure Databank (ISPD) and the International Comprehensive Ocean-Atmosphere Dataset (ICOADS).  How do these historic observations find their way into modern climate reconstructions such as 20CR and ERA-20C?

It all starts with the Atmospheric Circulation Reconstructions over the Earth (ACRE) initiative, which is an international collaboration between meteorological organizations, atmospheric scientists, librarians, and volunteers who are dedicated to the recovery and digitization of historic global weather observations.  By facilitating the process of rescuing, digitizing, and performing quality control of data from old weather log books, ACRE ensures that these important observations provide a better understanding of past climate, which in turn aids in modern climate applications and studies of climate impacts.

The 9th ACRE Workshop


ACRE holds annual workshops to discuss all things related to data rescue, and I attended the 9th ACRE Workshop and Historical Weather and Climate Data Forum, held June 20-24 at Maynooth University in Ireland.  The meeting agenda was populated with talks illustrating the full cycle of how historic data observations make their way from being discovered and digitized all the way to being assimilated into climate reconstructions and applications.

Most notably, the annual ACRE workshop is held in different locations around the world, and the local ACRE partners which host the workshop are featured prominently in the meeting agenda.  This not only brings attention to data rescue efforts in the region which hosts the workshop and encourages local groups to become more involved, but it also highlights the importance of how data rescue facilitated by ACRE leads to a better understanding of the past local climate and subsequent future impacts.  At the most recent meeting in June, climate scientists from Maynooth University and Met Éireann, the Irish national weather service, gave illuminating talks on how they're using ACRE-supported observations and reanalyses to understand long-term Irish precipitation trends, the Irish droughts of the 19th and 20th centuries, and how weather played a role in the Easter Rising of April 1916.


St. Patrick's College at Maynooth University provided a beautiful setting for the 9th ACRE Workshop

Other highlights included presentations on the following topics:
  • Dona Cuppett and Rick Crouthamel from the International Environmental Data Rescue Organization (IEDRO) volunteer their time to travel to developing countries around the world to assist in setting up a data rescue and digitization operation.  Their presentation described the logistical challenges involved in undertaking this effort, and underscored the importance of rescuing historic weather observations in developing countries, which are most vulnerable to climate impacts.
  • Reports on efforts to scan and digitize pages from historic weather logs.  These included talks on citizen science projects such as OldWeather.org, Weather Detective, and the Sir Charles Todd folios project led by Mac Benoy.
  • Stefan Brönnimann presented on the Mount Tambora eruption of 1815 and the subsequent period of global cooling, commonly known as the "Year Without a Summer".
  • Dispatches from ACRE regional chapters leading data rescue efforts around the world.
  • Updates from various groups leading climate reanalysis initiatives, including the 20CR, UERRA, and ERA-CLIM2 projects.


Wanted: Southern Hemisphere data


The benchmark of a good climate reanalysis is performing better than climatology, yet the uncertainty in a reanalysis is controlled by the amount of underlying observations assimilated into the analysis.  More observations correspond to a higher confidence in the analysis, whereas a data sparse region returns an analysis with low confidence.  For long-term reanalyses such as the 20CR and ERA-20C, the analysis in the 19th and early 20th centuries poses a challenge due to the overall scarcity of input data.  In particular, there is a great need for data in the Southern Hemisphere, not only for earlier periods, but for more recent years as well.  Philip Brohan's "Fog of Ignorance" visualizations have become a well-known highlight at ACRE meetings, and illustrate how ground-truth observations improve the certainty in historical climate reconstructions.

Antarctic weather 1909-10 from Philip Brohan illustrates the "Fog of Ignorance" in the 20th Century Reanalysis due to a lack of observations. Note the grey fog disappears in the vicinity of observations (marked by yellow dots) made available by ACRE facilitated data rescue.

ACRE is therefore making a strong push to recover historical data from south of the Equator, with efforts being led in regional chapters such as Chile, Southeast Asia, Southwest Pacific, and Antarctica, the latter of which is recovering observations from Antarctic whaling expeditions in the late 19th and early 20th centuries.

Learn more about ACRE at http://met-acre.net/.

2016-08-19

NCAR RDA outage August 22-24, 2016

UPDATE Thu Aug 25, 2016:
All systems are back up and operating normally.  Thank you for your patience.

NCAR CISL supercomputers and central file disk systems will be down for system maintenance from 0:01 MDT on Monday, August 22, 2016 until late Wednesday, August 24.  The HPSS systems will begin to shut down at 21:00 MDT on Sunday, August 21.

When we come back, we'll have more storage and be able to provide more data for you on disks.

From the Daily B:
The Yellowstone, Geyser, Caldera, and GLADE systems will be unavailable beginning at 12:01 a.m. Monday, and the HPSS system will be unavailable starting at 9 p.m. Sunday, earlier than previously announced. All of these systems are expected to be back in production by late Wednesday afternoon, August 24.

Facilities upgrades during the downtime are related to expansion of the GLADE file systems and preparation for the new Cheyenne HPC system at the NCAR-Wyoming Supercomputing Center. Power will be unavailable for much of that period.

When the systems are returned to service following the downtime, users will have larger GLADE home space quotas. The current 10-GB quota will be increased to 25 GB. The larger GLADE file systems will serve both the Yellowstone and Cheyenne supercomputers. Users will be informed via the CISL Notifier service when each system is returned to production.

2016-08-01

NCAR RDA Partial Outage Aug 2, 2016

NCAR CISL supercomputers and central file disk system will be down for system maintenance 7:00 to 17:00 MDT on Tuesday Aug 2, 2016.

The RDA web server will remain in operation, but users will not be able to transfer data while the central file disk system is unavailable.  Additionally, data subset processing and translation services will be unavailable during this outage.

2016-07-13

What spatial resolution can I expect when using gridded dataset products?

Occasionally, rdahelp@ucar.edu fields requests for higher grid resolutions than what we can currently offer. Some users wonder why older data is not available at the higher resolutions found in newer data.

The requests have often come from new users of RDA resources--beginning researchers or experienced researchers from communities (e.g. engineering) outside of weather and climate.

In general, data resolution is limited by computational power and the underlying observations.

Globally-gridded analyses and forecasts are limited by the computational power available at the time they were created.  This table shows typical resolutions in different eras.

Time Range      Res (Deg)   Res (km)
1970s-1980s     2-3         200-500
1990s-2010      1           100
2010s-current   0.2-0.5     25-60

Regional models can offer higher resolution than global models, especially if they have higher data density.  For instance, ds609.0 NCEP North American Mesoscale (NAM) could offer 12 km resolution in 2012, a time when NCEP's global models only provided 0.5 to 1.0 degree (about 50-100 km) resolution.

Models also may not run on rectilinear grids, which span the globe in even degree increments.  Rectilinear grid points are far apart at the equator and close together at the poles.  This makes for vastly differently sized tiles, which can slow down models (or create numerical artifacts).
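To put numbers on that, the east-west distance covered by a fixed longitude increment shrinks with the cosine of latitude. The bash sketch below assumes a spherical Earth of radius 6371 km (a common mean value, chosen here only for illustration), and the function name lon_spacing_km is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: east-west distance (km) spanned by DEG degrees of
# longitude at latitude LAT (degrees), on a sphere of radius 6371 km.
lon_spacing_km() {
  local deg=$1 lat=$2
  awk -v d="$deg" -v lat="$lat" 'BEGIN {
    pi = atan2(0, -1)
    km_per_deg = 2 * pi * 6371 / 360          # about 111 km at the equator
    printf "%.1f\n", d * km_per_deg * cos(lat * pi / 180)
  }'
}
```

For example, lon_spacing_km 1 0 gives roughly 111 km, while lon_spacing_km 1 60 gives about half that. The north-south spacing stays the same at every latitude, which is why the tiles of a regular latitude-longitude grid differ so much in size.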

Some models use Gaussian grids that maintain approximately the same horizontal spacing over the globe by reducing the number of grid points approaching the pole.

ECMWF N80 Reduced Gaussian Grid
Other global models, like GFS, are spectral models that represent the atmospheric state as a superposition of wave functions.  Model output is in the form of spectral coefficients.  The higher the number of basis functions, the higher the resolution of the model.

To learn more about the coupling between the spectral functions and the Gaussian grid points, read this explanation from ECMWF.

The RDA sometimes interpolates spectral and Gaussian models to rectilinear grids to simplify data analysis tasks for our users.  Interested users may access the full spectral coefficients of GFS in ds084.6 and reconstruct the data to their own grid.

While one can interpolate data to higher density (smaller spatial separation) grids, it won't increase the resolution of the underlying data, which is limited by the model resolution of the time.

Reanalyses, retrospective analyses performed with higher resolution model grids, can provide higher resolutions than the operational analyses of the era represented in the early part of the reanalysis span.

If your research involves reconstructing what planners could have been able to foresee, then use analysis data for that time.  For instance, one RDA user constructed a Palmer drought index for Brazil using historical analysis data.  By using only data that was available for that time, decision makers in Brazil can see what information would have been available in past droughts.

If your research involves a long time span and you want the data processed consistently, use reanalysis data.

I alluded earlier to the resolution of the underlying data.  Ground-based sensor networks are limited in spatial coverage and change over time.  While continental US (CONUS) and Europe may enjoy high spatial density coverage of ground stations, many other areas, such as the oceans and sub-Saharan Africa, do not.

Short-duration field campaigns may offer high spatial and/or temporal resolution, but only for a limited time.  We offer a few of those data sets, but their usefulness is limited by their short duration.

For truly global coverage, we turn to satellites.  Satellite data is also not spatially uniform due to differences in satellite orbits and sensors.

Different parameters (T, water vapor, cloud tops, albedo) are measured by different sensors at different wavelengths.  The horizontal resolution of satellite measurements is related to the wavelength and the size of the telescope lens or mirror.

Courtesy of NASA GSFC

Characteristics measured in the visible range (albedo) may have higher resolution than characteristics measured in the longer-wavelength microwave range (T and water vapor).

Remote Sensing Systems (REMSS) offers some excellent materials for getting started with satellite data products.

For instance, the resolutions of the different wavebands on the Special Sensor Microwave Imager (SSM/I) vary, so surface temperature and water vapor, even from the same satellite and instrument, will have different resolutions.


Ocean winds, like those obtained from Windsat, are derived from several microwave bands, whose resolutions also vary by wavelength.


Satellite orbits also determine resolution.  Geostationary weather satellites fly above the equator and in a much higher orbit (35,800 km above MSL) than polar satellites such as NOAA-NN and DMSP-NN (800 km above MSL).  The difference in height also changes the possible resolutions.  Look at the difference in total columnar water vapor resolution between TOVS (geostationary) and SSMI (polar) in Stephens et al.

Notice the blockiness of the contours with geostationary TOVS.
The contours are noticeably smoother with polar SSMI.
Even with the same orbits and same wavelengths, satellite resolutions have improved (and become closer to the theoretical limits) with improvements in optics and vibration isolation on spacecraft.  Satellite data improves over time, but is subject to theoretical limits.
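One such limit is diffraction. As a rough, idealized illustration (the Rayleigh criterion is our choice of example here, not a description of any particular instrument), the smallest resolvable angle is about 1.22 times the wavelength divided by the aperture diameter, and the corresponding footprint on the ground grows with orbit height. The function name and the sample inputs below are hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: diffraction-limited ground footprint (km) at nadir.
# Arguments: wavelength (m), aperture diameter (m), orbit altitude (km).
# Uses the Rayleigh criterion, theta ~ 1.22 * wavelength / aperture (radians),
# and the small-angle approximation footprint ~ altitude * theta.
footprint_km() {
  local wavelength_m=$1 aperture_m=$2 altitude_km=$3
  awk -v w="$wavelength_m" -v d="$aperture_m" -v h="$altitude_km" \
    'BEGIN { printf "%.2f\n", 1.22 * w / d * h }'
}
```

With an assumed 1 m aperture and a 1 cm (microwave) wavelength, footprint_km 0.01 1 800 comes out just under 10 km, while footprint_km 0.01 1 35800 is several hundred km. Real instruments fall short of this ideal, but the scaling with wavelength and altitude is the point.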

No amount of money or technology can change the theoretical limits.  Running models at higher resolutions than the data supports also has limited utility.

One solution for obtaining higher resolutions is to dynamically downscale the data with a physical model such as the Weather Research and Forecasting Model (WRF).  Physics-based models use high-resolution terrain and land-use information, couple them with lower-resolution gridded data, and then model the state of the atmosphere with equations describing atmospheric dynamics.

Online and in-person tutorials can help you learn how to use this free community-supported model.

2016-07-11

NCAR RDA Partial Outage July 12, 2016

UPDATE Tue Jul 12 18:42:57 MDT 2016
Full RDA data services are back in operation.  Thank you for your patience.

UPDATE Tue Jul 12 11:45:28 MDT 2016
The Glade (disk) outage is over. Full data file downloads are available again. I'll update again when full data services are restored.

UPDATE Tue Jul 12 09:40:05 MDT 2016
We are currently experiencing a disk failure and no data are accessible from this webserver, from /glade (for NCAR users), or from directories where data have been prepared for you at your request. Also, many of our dataset file lists are unavailable at this time. We are working to resolve the problem and apologize for any inconvenience.

NCAR CISL supercomputers, Yellowstone and Geyser, will be down for system maintenance 7:00 to 17:00 MDT on Tuesday July 12, 2016.

The RDA web server will remain in operation and users may continue to browse for and download whole data files.

However, data subsetting and translation services will be unavailable during this outage.

2016-06-24

NCAR RDA System Outage June 28, 2016

UPDATE Jun 28 15:40:05 MDT 2016:
All systems and services are up and running. Thank you for your patience.

All RDA services will be unavailable starting at 6:00 AM MDT on June 28 due to system maintenance. It is estimated that RDA services will be back online by early evening.

We apologize for the inconvenience.

2016-06-03

NCAR RDA System Outage June 7, 2016

UPDATE Jun 7 21:36:04 MDT 2016:
All systems and services are up and running. Thank you for your patience.

SCHEDULED OUTAGE NOTIFICATION

The Yellowstone, Geyser, and Caldera clusters will be unavailable during a scheduled maintenance period from 8 a.m. to 5 p.m. MDT on Tuesday, June 7. CISL staff will use this downtime to upgrade the system's GPFS client software to version 4.2.0.3.

A system reservation will be put in place 12 hours before the scheduled downtime. Users' jobs with specified job times that overlap the reservation period will remain on hold until the system is restored to service. Longer-running jobs that have not finished by 7 a.m. Tuesday will be killed and need to be resubmitted after the maintenance period. We apologize for the short notice and for any inconvenience this might cause.

START: Tue Jun 7 2016 6:00 AM MDT
DURATION: 11 hours

SERVICES AFFECTED
Geyser and Caldera
Yellowstone
Direct access of RDA data from /GLADE will not be available during this time.

The RDA web server will remain up and continue to accept data requests, but the requests will not be filled until after the clusters are brought back up. Jobs submitted up to 12 hours before the system outage may also be held until after the outage. Data request jobs should run automatically after system restart.

To avoid uncertainty, submit your data requests early, so that they may complete before the system maintenance outage.

We apologize for the inconvenience.

You can check the CISL Resource Status Page for updates.

2016-06-02

Open Access Data

The RDA is an open access data archive. Everyone, regardless of affiliation status--including no affiliation--can obtain an RDA account.

The majority of RDA data holdings are freely available to everyone without restriction other than the standard UCAR/NCAR Terms of Use.

Registering for an RDA account

Affiliation with an academic or research institution is NOT required for registration, but we recommend that you register with one, if you qualify.  Selecting an Organization Type and giving us the Organization Name helps us learn about our users, and may enable you to access restricted datasets.

Your Organization Type choices:
  1. UCAR: Select only if your registration email address ends with @ucar.edu
  2. University/College
  3. Other Education: K-12 students and teachers, other education not covered by the first two categories.  
  4. Commercial
  5. Government: Federal, State, Local
  6. Military
  7. Public/Non-profit: National Labs, FFRDCs, NGOs, .org
  8. Unaffiliated: Those who do not fit into one of the above categories.  We encourage and support citizen-scientists.
Fill in ALL required fields in the registration screen (denoted with a *).
After you submit the registration form, you will receive a confirmation message to the email you used for your registration.  You must respond within 48 hours or else your registration will be deleted.  If you don't receive an email from us, check your spam folder.

If you forgot your password, select "Forgot Password?" and we will send a change password link to your email account on record.

If you already have an account, but have changed affiliations, log in to your RDA account using your old email address and the password you set up.  Then edit your profile to reflect your new affiliation.

If you no longer have access to that email account and you forgot your password, email rdahelp@ucar.edu with your name, your former email account and your new one.  We'll help you update your information and you will be accessing data again in no time.

Please do not create more than one account per person.  Do not share your account with others.  Doing so may cause you to be banned from our site.

In summary

  • Register.
  • Keep your registration up-to-date and accurate.
  • We'll support you, no matter your affiliation status.

2016-04-25

NCAR RDA System Outage April 27, 2016

UPDATE Apr 27 22:04:20 MDT 2016: All systems and services are up and running. Thank you for your patience.

UPDATE Apr 27 17:00:00 MDT 2016: RDA systems are still down due to extended maintenance. It is estimated that they should be back online by 5PM MDT.
We have no estimate of when they will be back on line. We appreciate your patience during this extended outage.

RDA services, including data download and data request processing, will be unavailable during system maintenance on Wednesday, April 27, 2016.

Work will start at 4:30 MDT and complete around 9:00 MDT. Data request jobs should run automatically after system restart.

We apologize for the inconvenience.

2016-04-12

Accessing RDA data files from GLADE

Do you have an account on NCAR's supercomputer, Yellowstone (or Geyser or Caldera)?  Are you eligible for an account?

If so, you can take advantage of direct access to 500 Terabytes (or the most popular 25%) of the RDA's data holdings on GLADE.
The Globally Accessible Data Environment—a centralized file service known as GLADE—uses high-performance GPFS shared file system technology to give users a common view of their data across the HPC, analysis, and visualization resources that CISL manages.

GLADE file spaces are intended as work areas for day-to-day tasks and are well suited for managing software projects, scripts, code, and data sets. They are available by default except for project spaces.
GLADE access allows you to perform your data analysis tasks without having to download data, a considerable time-saver. As a bonus, CISL staff install and maintain myriad data tools so you can spend more time on science and less time on sys admin tasks.

At the SEA conference last week, I discovered that many Yellowstone/GLADE users were not aware of direct GLADE data access or how to locate specific data that they need on GLADE.

When you find a data set that interests you, click on the Data Access tab.  If the dataset is available on GLADE, "GLADE File Listing" links will appear in the green columns on the right.
Select GLADE File Listing.
Then narrow down your file selection until you see individual file names.
Prepend the GLADE dataset path to the data file path.
The GLADE path to each dataset will be /glade/p/rda/data/dsnnn.n/

Then add the path to the individual file. E.g.
/glade/p/rda/data/ds083.2/grib2/2016/2016.03/fnl_20160301_00_00.grib2
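
The path construction above can be sketched in a few lines of Python. The function name and the dataset-ID check are illustrative assumptions; only the /glade/p/rda/data/dsnnn.n/ layout and the example file come from the steps above.

```python
import posixpath
import re

GLADE_RDA_ROOT = "/glade/p/rda/data"  # root of RDA datasets on GLADE, per the post


def glade_path(dataset_id, relative_path):
    """Join the GLADE dataset directory with a file path from the GLADE File Listing."""
    # Dataset IDs follow the dsnnn.n pattern, e.g. ds083.2.
    if not re.fullmatch(r"ds\d{3}\.\d", dataset_id):
        raise ValueError(f"expected an ID like ds083.2, got {dataset_id!r}")
    # Strip a leading slash so the listing path stays relative to the dataset.
    return posixpath.join(GLADE_RDA_ROOT, dataset_id, relative_path.lstrip("/"))


print(glade_path("ds083.2", "grib2/2016/2016.03/fnl_20160301_00_00.grib2"))
```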

wgrib, wgrib2, and a plethora of software for reading, manipulating, and visualizing weather and climate data are already installed on Yellowstone.

Graduate students and postdocs at US universities can apply for *FREE* Small Allocation accounts.  Faculty and staff of US universities or national labs can apply for Data Access only accounts.  Check your eligibility.

This could make your research life much easier.

2016-04-11

NCAR RDA HPSS Request Outage April 12, 2016

Update: This work is now complete. Thank you for your patience.
HPSS will experience some maintenance downtime on Tuesday Apr 12 to allow Oracle to work on some known hardware issues.

START: Tue Apr 12 2016 9:30 AM MDT
END: Tue Apr 12 2016 2:30 PM MDT
All RDA data services that utilize the High Performance Storage System (HPSS, aka the tape libraries) will be down during that time.

Submit your jobs early so that they can complete before the scheduled down time. HPSS data requests in progress before HPSS is taken down for maintenance should restart after HPSS is back on line. If your data request job remains in limbo after system maintenance is complete, please contact rdahelp@ucar.edu so we can restart the job for you.

2016-03-11

ds083.3: GDAS/FNL 0.25 degree global grids

Since July 8, 2015, RDA has been archiving 0.25 degree GDAS/FNL analysis and forecast (hours 3, 6, 9) globally gridded data in GRIB2 format.

We call it ds083.3: NCEP GDAS/FNL 0.25 Degree Global Tropospheric Analyses and Forecast Grids.

I hope you call it useful.  ;-)

The analysis files are named gdas1.fnl0p25.YYYYMMDDHH.f00.grib2.

You may be able to find and download FNL at this resolution from NCEI beginning in mid-January 2015, but the RDA archive begins in July 2015 with gdas1.fnl0p25.2015070800.f00.grib2.

Forecast grids corresponding to 3, 6 and 9 hours after analysis time are named gdas1.fnl0p25.YYYYMMDDHH.fFH.grib2 where Forecast Hour, FH, can be 03, 06, 09.
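
If you script your downloads, the naming convention above is easy to generate. This is a minimal sketch; the function name is hypothetical, and it only encodes the gdas1.fnl0p25.YYYYMMDDHH.fFH.grib2 pattern and the forecast hours listed above.

```python
from datetime import datetime

VALID_FORECAST_HOURS = (0, 3, 6, 9)  # f00 is the analysis; 03, 06, 09 are forecasts


def gdas_fnl_filename(analysis_time, forecast_hour=0):
    """Build a ds083.3 file name for an analysis time and forecast hour."""
    if forecast_hour not in VALID_FORECAST_HOURS:
        raise ValueError(f"forecast hour must be one of {VALID_FORECAST_HOURS}")
    # YYYYMMDDHH comes from the analysis time; FH is zero-padded to two digits.
    return f"gdas1.fnl0p25.{analysis_time:%Y%m%d%H}.f{forecast_hour:02d}.grib2"


# First analysis file in the RDA archive, then its 6-hour forecast.
print(gdas_fnl_filename(datetime(2015, 7, 8, 0)))
print(gdas_fnl_filename(datetime(2015, 7, 8, 0), forecast_hour=6))
```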

wrfhelp@ucar.edu expects most WRF users will see forecast improvement with 0.25 degree instead of 1.0 degree input.  You can use a series of f00 analysis files to provide initial conditions and lateral boundary conditions for your WRF run.  You can also add forecast files to extend your WRF run past the time of the latest available FNL analysis file.

Each 0.25 degree GDAS/FNL analysis file is about 180 MB while the corresponding 1.0 degree file is about 17 MB.

Also note that the forecast 0.25 degree grids are ~205 MB because they include water (APCP, ACPCP, PRATE...) and cloud variables that are not in the analysis grids.  This is why some people call the analysis grids "dry".

Please order subsets of the global grids in order to efficiently use shared bandwidth.
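
To see why subsetting matters, here is a back-of-the-envelope estimate of the download volume for whole global analysis files. It assumes four analysis files per day (00Z, 06Z, 12Z, 18Z), which is an assumption of this sketch, and uses the approximate file sizes quoted above.

```python
ANALYSIS_MB_0P25 = 180  # approximate size of one 0.25 degree analysis file
ANALYSIS_MB_1P0 = 17    # approximate size of one 1.0 degree analysis file
CYCLES_PER_DAY = 4      # assumption: one analysis at 00Z, 06Z, 12Z, and 18Z


def analysis_volume_gb(days, file_mb, cycles_per_day=CYCLES_PER_DAY):
    """Estimate the total download volume in GB for whole global analysis files."""
    return days * cycles_per_day * file_mb / 1000.0


for label, size_mb in (("0.25 degree", ANALYSIS_MB_0P25), ("1.0 degree", ANALYSIS_MB_1P0)):
    print(f"30 days of {label} analyses: about {analysis_volume_gb(30, size_mb):.1f} GB")
```

Under these assumptions a month of whole 0.25 degree analysis files is roughly ten times the volume of the 1.0 degree files, so a regional subset can save a great deal of transfer time.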

Update:
We will not be back-filling the 0.25 degree GDAS/FNL to Jan 2015.  If you need FNL files we don't offer, order them directly from NOAA.

2016-03-07

What a difference 6 hours makes

I illustrated NCEP Model Performance with verification statistics for the 18Z GFS forecast cycle because 18Z corresponds to noon CST--near the peak daily temperatures over the Continental United States (CONUS). However, that may have been misleading or only told a partial truth.
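
If you want to check the cycle-to-local-time conversion yourself, here is a small sketch. It uses a fixed UTC-6 offset for CST and ignores daylight saving time; the function name is made up for this example.

```python
from datetime import datetime, timedelta, timezone

CST = timezone(timedelta(hours=-6), "CST")  # fixed offset; daylight saving ignored


def utc_cycle_to_cst(year, month, day, cycle_hour_utc):
    """Convert a model cycle time given in UTC (Z) to Central Standard Time."""
    cycle_time = datetime(year, month, day, cycle_hour_utc, tzinfo=timezone.utc)
    return cycle_time.astimezone(CST)


for cycle in (0, 6, 12, 18):
    local = utc_cycle_to_cst(2016, 3, 7, cycle)
    print(f"{cycle:02d}Z -> {local:%Y-%m-%d %H:%M} {local.tzname()}")
```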

Let's look at the statistics for 18Z again.
18Z analysis cycle GFS temperature bias for forecast hours 0-168, compared to conventional upper air soundings. Operational GFS on the left and experimental GFS on the right.  Scale [-1.0, 1.0] C
Note the data scales.
Bias between upper air stations and the 48-hour GFS forecast for the 18Z cycle.  Bias statistics computed over ~2,750 observations.
Now let's look at the same visualizations for the 12Z data analysis/forecast cycle.

12Z analysis cycle GFS temperature bias for forecast hours 0-168, compared to conventional upper air soundings.  Operational GFS on the left and experimental GFS on the right.  Scale [-0.5, 0.5] C
Notice that the temperature bias scale is reduced from [-1.0, 1.0] to [-0.5, 0.5] C? The bias reduction is less dramatic than a factor of two.

In the next graph, the scale doesn't change, but the peak of the bias is reduced from ~4.0C to ~2.5C at the tropopause.

Bias between upper air stations and the 48-hour GFS forecast for the 12Z cycle.  Bias statistics computed over ~135,000 observations.
What causes that difference? The graph on the right gives a clue.

The horizontal scale gives the total number of observations used to compute the statistics. At 18Z, there were ~2,750. At 12Z, there were ~135,000. More radiosondes are released near 12Z than any other time of day (with 0Z coming in a close second).

2016-03-04

NCEP Model Performance

Users of gridded analysis or forecast data sets may wonder how well the analyses reflect the measurements that were assimilated into them.  Furthermore, how do the forecasts compare to reality?

Welcome to the world of verification statistics.

If you took all the radiosondes that were ingested into an analysis like GDAS/FNL, then you could compute the mean difference (bias) and the RMSE between the measurements and the analysis for the exact same time.  The overall goal is to minimize the biases globally (but allow small biases at individual stations.)
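
As a minimal sketch of those two statistics, the code below computes bias and RMSE from paired values. It assumes the analysis has already been interpolated to the observation locations and times, and it takes the difference as analysis minus observation so that a warm bias is positive; both choices are assumptions of this example, not a description of how NCEP computes its statistics. The numbers are made up.

```python
import math


def bias_and_rmse(analysis_values, observed_values):
    """Return (bias, RMSE) for paired analysis and observation values."""
    if len(analysis_values) != len(observed_values) or not analysis_values:
        raise ValueError("need two non-empty sequences of equal length")
    # Assumed sign convention: analysis minus observation.
    differences = [a - o for a, o in zip(analysis_values, observed_values)]
    bias = sum(differences) / len(differences)
    rmse = math.sqrt(sum(d * d for d in differences) / len(differences))
    return bias, rmse


# Made-up temperatures (C) at three stations.
analysis = [1.5, 2.5, 2.0]
observed = [1.0, 2.0, 3.0]
bias, rmse = bias_and_rmse(analysis, observed)
print(f"bias = {bias:.3f} C, RMSE = {rmse:.3f} C")
```

Note that the bias in this toy example is zero even though every station disagrees with the analysis. That is why bias and RMSE are usually examined together.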

This NCEP EMC site allows you to view some useful statistics for each analysis cycle (00Z, 06Z, 12Z, 18Z).  For instance, if you compared the analyses and forecasts from the 18Z analysis/forecast cycle against all upper air measurements, then you see a slight warm bias in the troposphere and a slight cool bias in the stratosphere at Forecast Hour 0 (analysis time).
18Z analysis cycle GFS temperature bias for forecast hours 0-168, compared to conventional upper air soundings. Operational GFS on the left and experimental GFS on the right.
Notice that the fit is not perfect. The operational GFS model is shown on the left; an experimental version (GFSX) is shown on the right. GFSX appears to be a slight improvement.

Let's look at the Root Mean Squared Error (RMSE).  Are you amazed that we can forecast the global temperature to within 2.5 degrees 5 days ahead?  Or are you young enough to take that for granted?
18Z analysis cycle GFS temperature RMSE for forecast hours 0-168, compared to conventional upper air soundings. RMSE of GFSX is smaller than GFS (right).
Again, GFSX appears to be an improvement over the current operational GFS model.  After monitoring both, NCEP EMC scientists may decide to implement GFSX as the new operational model, GFS.

Tweaks like this are common as I explained in Analysis, forecast, reanalysis--what's the difference?  If consistent processing is important for your work, always use a reanalysis.

Here's a vertical cross-section of the same verification data, at forecast hour 48.  The web site does not offer a 0 hour graph, but the first plot shows that the 48 hour forecast errors are slightly larger than the analysis.
Bias between upper air stations and the 48-hour GFS forecast.
The NCEP EMC Mesoscale Verification site offers further insight into GFS vs GDAS/FNL.  If you read What's the difference between GFS and FNL?, you may recall that the GDAS/FNL analysis takes place several hours later than GFS, so that it can incorporate more observations. By the time GDAS/FNL is ready, the 12-hour GFS forecast representing the same time should be ready.

The 500 mb height, aka the half-height of the atmosphere, gives you an indication of temperatures and major atmospheric features such as highs and lows.  GDAS/FNL shows slightly sharper features than GFS, but notice how well they agree with each other overall.

I hope, in studying these statistics, you agree with me that NWP is a major triumph of human ingenuity and cooperation.

2016-02-29

Learning about data before learning with data

I've put together a short and opinionated guide to books about data worth reading. If your favorite book didn't make the list, please leave a comment listing the book and why it is worth including.

You'll find Books for scientists and data scientists under the "Pages" heading at the top of the right column of this blog along with the RDA Video Tutorials page. Can you think of other permanent pages we should create and list there?

2016-02-23

NCAR RDA Sporadic Outages February 25-26, 2016

UPDATE 2016-02-26 14:48 MST: This work is complete and RDA web services are back in operation. Thank you for your patience.

UPDATE 2016-02-25 16:00 MST:  Work on the development servers is complete.  Work on the production servers will commence on 2016-02-26 at 8:30 MST.
--------
System upgrades and maintenance of the RDA servers will commence at 9:00 MST on Thursday, February 25, 2016.
Both development and production systems will experience periods of downtime during this operation. Work will start with the development systems and once complete, a status update will be sent out. At that time work will begin on the production systems. Another status update will be sent upon completion of operations.
RDA web services will be impacted. Users may not experience outages until work begins on the production servers, most likely in the afternoon MST. However, schedule your data requests to finish before mid-morning Feb 25 to be safe.

Progress updates will be posted here.

2016-02-12

Upcoming improvements to the Globus login process

On February 13, starting at 10 AM CST, Globus will be upgrading its data transfer services and Globus transfers will not be available while this release is deployed to their production environment.  This downtime will affect all Globus data transfers initiated from the RDA website.  Globus services are expected to be back online at approximately 3 PM CST.

Included in the release are upgrades to the Globus authentication and authorization mechanisms which will allow RDA users to access the service using their RDA login e-mail and password.  A separate Globus username and password are no longer required.  To log in using your RDA credentials, select the 'NCAR RDA' organization on the Globus login page, and then follow the instructions to enter in your RDA user e-mail and password.  RDA users who currently have a Globus username and password may continue to log in with their Globus credentials by selecting 'GlobusID' from the list of organizations.

For more information on the upcoming release and improvements, see the Globus blog post at https://www.globus.org/blog/enhanced-login-mechanism-streamlines-access-globus.

2016-02-04

Subsetting to save time and bandwidth

Many people use RDA data to initialize the WRF model.  Some researchers ask for help to cope with bandwidth constraints.

Others may not even be aware that they are running into bandwidth constraints until too late--when they get the "429 Too Many Requests" error after their IP address has been blocked for overuse of RDA resources.

We want to help you get the data you need to do your work.   But you will save time and bandwidth if you keep a few simple points in mind.

2016-02-01

NCAR RDA System Outage February 2-3, 2016

Second UPDATE 18:00 MST Feb 3, 2016: All RDA services are back on line.  We thank you for your patience during this extended outage.

UPDATE Feb 3, 2016: System maintenance continues.  Requests for data on tape (HPSS), data translation, and subsetting jobs will be held until maintenance concludes.  Data downloads of whole files on disk (webfiles) are operational.

RDA services, including data download and data request processing, will be unavailable during system maintenance on Tuesday, February 2, 2016.

Work will start at 7:00 MST and the system should be up by 16:00 MST or earlier. Data request jobs should run automatically after system restart.

We apologize for the inconvenience.

2016-01-28

ERA5 compared to ERA-Interim

The RDA anticipates downloading and processing a significant portion of ECMWF ERA5 beginning calendar year 2016. In response to user inquiries about the characteristics of ERA5, the following basic information from various sources has been tabulated:

Preliminary Comparison of ECMWF ERA-Interim and ERA-5 Reanalyses
Based on Core-Climax Workshop, Brussels, January 15-16, 2015,  and  "ECMWF – Computing and Forecast System", iCAS 2015, Annecy, France, September 2015 (Isabella Weger)
Category | ERA-Interim | ERA-5
Start of Production | August 2006; IFS [1] Cy31r2 | June 2015; IFS [1] Cy41r1
Model Input | As in operations (inconsistent SST) | Appropriate for Climate (CMIP5 [2], HadISST.2 [3])
Model Horizontal Grid | Reduced Gaussian and spectral coefficients | Reduced Gaussian and spectral coefficients
Model Horizontal Resolution | Nominally 79 km global (0.703125°); ECMWF T255 N128 (∼512 x 256) | Nominally 31 km global (0.140625°); ECMWF T1279 N640 (∼2560 x 1280)
Model Vertical Resolution | 60 levels to 10 Pa (hybrid coordinate) | 137 levels to 1 Pa (hybrid coordinate)
Time Period | 1979 to present time | 1979 to present time; possible extension back to ∼1950
Dissemination | Monthly (up to 3 month lag at NCAR) | Monthly (up to 3 month lag at NCAR); daily for ERA-5T
Observations | Primarily ERA-40, GTS [4] | Various reprocessed CDRs [5]
Radiative Transfer | RTTOV 7 [6] | RTTOV 11 [6]
Analysis Method | 4D-Var [7]; 1D + 4D-Var for rain | 10-member ensemble 4D-VAR (EDA [8]); all-sky microwave
Variational Bias Correction | Satellite radiances | Satellite radiances, ozone, aircraft, surface pressure, radiosondes
Maximum Volume | 50 TB | 1.5 PB
[1] IFS, Integrated Forecast System
[2] CMIP5, Coupled Model Intercomparison Project Phase 5
[3] HadISST.2, Hadley Centre Sea Ice and Sea Surface Temperature data set version 2
[4] GTS, Global Telecommunication System
[5] CDR, Climate Data Record
[6] RTTOV, Radiative Transfer for TOVS, TOVS being TIROS Operational Vertical Sounder (originally hosting the Microwave Sounding Unit – MSU, the High Resolution Infrared Radiation Sounder – HIRS, and the Stratospheric Sounding Unit – SSU)
[7] 4D-Var, 4-Dimensional Variational Data Assimilation
[8] EDA, Ensemble Data Assimilation
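
The grid dimensions and degree spacings in the resolution row follow from the Gaussian grid number N. As a rough sketch, a full (non-reduced) Gaussian grid has 4N longitudes at the equator and 2N latitudes, so the nominal longitude spacing is 360/(4N) degrees. The function name is hypothetical, and the reduced grids actually used have fewer points per latitude circle toward the poles.

```python
def full_gaussian_grid(n):
    """Return (longitudes, latitudes, nominal spacing in degrees) for grid number N."""
    n_lon = 4 * n                 # points around the equator
    n_lat = 2 * n                 # latitude circles, pole to pole
    spacing_deg = 360.0 / n_lon   # nominal spacing along the equator
    return n_lon, n_lat, spacing_deg


for grid_number in (128, 640):
    n_lon, n_lat, spacing = full_gaussian_grid(grid_number)
    print(f"N{grid_number}: {n_lon} x {n_lat}, about {spacing}° spacing")
```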

2016-01-22

The data starts here 2

Professor Catherine D'Ignazio asked What would feminist data visualization look like? I've never thought about data visualization and feminism together before, but her essay over at the MIT Center for Civic Media is well worth a read.  So are all the essays from the recent Responsible Data Forum's event about Data Visualization.

Her concept of feminist data visualization is just plain sound data visualization (dataviz).
  1. Invent new ways to represent uncertainty, outsides, missing data, and flawed methods
  2. Invent new ways to reference the material economy behind the data.
  3. Make dissent possible
How this applies at the RDA

2016-01-20

ds735.0 NCEP GDAS Satellite Data Extended

New and improved, with more satellites!
Visualization of DMSP F17 WV courtesy of REMSS.
A chance encounter with members of the WRF Data Assimilation (WRFDA) team led me down a rabbit hole to improve the usefulness of ds735.0 for users of WRFDA.

In 2009, RDA began archiving satellite data products that were ingested into GDAS.  (We backfilled the data to 2004 or 2005.)  Since then, the number and types of satellite data products ingested into GDAS have grown.  It was time for ds735.0 to keep up.

2016-01-15

WRF-able data sets

RDA data specialists have been working with wrfhelp to help Weather Research and Forecasting Model (WRF) users more easily get set up and running with RDA-supplied data sets.

Gridded data in GRIB format is used in the WRF Preprocessing System (WPS) to create both initial conditions inside the WRF domains and lateral boundary conditions outside of them.  In order to read in the GRIB data and write it out into a WRF input file, users need to supply a Vtable--a Rosetta stone of sorts that tells WPS which variables to pull out from the GRIB file.

2016-01-04

AGU Poster Session Basics

I enjoyed meeting many RDA data users at the 2015 American Geophysical Union Fall Meeting, including two graduate students from IIT Delhi.

The AGU Fall Meeting is always a bit of a homecoming for me as I catch up with colleagues and school friends. (I attended both high school and college within 20 miles of Moscone Center so it is literally a homecoming as well.)

With over 24,000 attendees over the five days of the meeting, it is not logistically possible to give everyone an oral presentation slot.  Moreover,  many find condensing their work into a 12-minute talk difficult.

Most attendees will be offered a spot in a poster session, which allows ample time for face-to-face (f2f) discussion.  Many find the f2f discussion so helpful, they present their work twice in oral AND poster sessions.  I've often seen professors give oral talks and refer the audience to the poster(s) of the graduate student(s) for more details about the work in the talk.

If you haven't attended an AGU before, the AGU Poster Presenter Guidelines are a good place to start.  But, they don't adequately give you a feel for what these sessions are like.