Trip Report: Rocky Mountain Celebration of Women in Computing 2016

Last month, I attended the Rocky Mountain Celebration of Women in Computing 2016 (RMCWiC) along with Dr. Natasha Flyer and (soon-to-be Dr.) Delilah Feng.  The 280+ registered attendees (including over 160 college students) overwhelmed the Hotel RL in downtown Salt Lake City.

Can you find the NCAR CISL representatives in this photo?
The Rocky Mountain Celebration of Women in Computing, an ACM (Association for Computing Machinery) Celebration event, is the biennial event for the Rocky Mountain Region that encourages the career interests of women in computing...
NCAR's Computational Information Systems Laboratory (CISL) jointly sponsors RMCWiC and we actively recruit visiting students and scientists at these meetings.

My favorite part was the poster session, where students presented their work.  The breadth, quality of work and enthusiasm were very high.  I enjoyed talking to the students so much that they ran out of dessert before I remembered they were serving it!
A very lively poster session.
Computing is a broad field; even the work within CISL spans many areas.  I lingered longest at the talks featuring embedded sensors, real-time data processing and data visualization.  When one student told me she was ready to tackle higher-dimensional data problems, I was thrilled.
This student presented data visualization research about how to effectively convey uncertainty, a subject of high importance in meteorology and climatology.
If you are a student or recent PhD, consider applying to one of NCAR CISL's visitor programs.  We are particularly interested in hosting undergraduate and graduate students for our Summer Internships in Parallel Computational Science (SIParCS) program.  These summer internships pay local living expenses and cash stipends.  Furthermore, SIParCS students work with one or more mentors throughout their stay and may be awarded travel stipends to present their SIParCS work at conferences such as AGU and AMS.

I worked part-time or was a stay-at-home mom for many years before re-entering the scientific computing workforce.  Career re-entry candidates are also encouraged to apply.  Feel free to contact me about career re-entry opportunities offered by NSF, NASA and other government agencies.


WRF Vtable Carpentry

When we announced the introduction of WRF-able data sets (Vtables auto-generated from RDA-collected metadata) in January 2016, the WRF Preprocessing System (WPS) component ungrib was able to use the RDA Vtables without any hiccups.

Around the time of the release of WRF 3.8.0 in April 2016, rdahelp@ucar.edu began receiving reports of error messages when running ungrib with RDA-provided Vtables.  If the Vtable includes lines describing fields that are not contained in that particular input GRIB file, ungrib stops instead of ignoring that field (as it had done previously).

WPS V3.8 Updates

The simple solution is for the user to edit the RDA-provided Vtable to remove the extraneous line(s) referring to fields not contained in the input GRIB file.

After investigation and consideration, wrfhelp and rdahelp decided not to change either the WPS code or the auto-generated Vtables to restore plug-and-play behavior.  Here's why.

The RDA provides archives of analysis, forecast and reanalysis data sets.  Reanalysis data sets contain the same variables/parameters throughout the entire time series.  Operational analysis and forecast systems vary over time.  Parameters or vertical levels may (and should!) change over the time series.

With advances in computational power and modeling of physical processes, newer fields and levels can be added.  Bandwidth is limited.  Dropping vestigial parameters when they are replaced with better ones is necessary.

RDA-provided Vtables are complete.  They include lines describing every WRF-usable parameter found in the data set time series.  To see how parameters and vertical levels changed over time, click on the "Variables by dataset product" or "detailed metadata for level information" links from the data set home pages.

Consider the long-running NCEP analysis series, FNL, ds083.2. When it began in 1999, it was in GRIB (aka GRIB1) format. Beginning in 2007, it has been offered in GRIB2 format. Several times, the model was changed and fields were added (and sometimes removed).

View the vertical level information of long-running NCEP analysis series, FNL, ds083.2.  Select "detailed metadata" to view GRIB1 details; select "GRIB2 level table" to view GRIB2 details.

Defaults to "Parameter View".  Also explore "Vertical Level View."
Notice that the soil layers expanded from a single layer 10-200 cm below the surface to three layers in 2005.
Vtable.RDA_ds083.2 contains lines describing the entire time series from 1999 to the present. GRIB1 codes are on the left; GRIB2 codes are on the right. Notice that some lines have only GRIB1 codes or only GRIB2 codes (one set of columns is blank). If you are using a GRIB2 file (2007 and later), edit out the lines that pertain only to GRIB1 files (GRIB2 columns blank). If you are using a GRIB1 file, delete the lines referring to parameters found only in the GRIB2 files (GRIB1 columns blank).

Delete the lines that refer to variables not found in the file you are using.
Similarly, if you are using a GRIB1 file dated before 2005-05-31, delete the six lines referring to the three layers of soil temperature or moisture between 10 and 200 cm. If you are using a GRIB1 or GRIB2 file dated 2005-05-31 or later, delete the two lines referring to soil temperature/moisture 10-200 cm below ground.

These soil layers are found only in FNL files 2005-05-31 and later.  Delete for earlier dates.
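The line edits described above can be scripted. Below is a minimal Python sketch, assuming the standard WPS Vtable layout: '|'-separated columns with GRIB1 codes in the first four columns, the metgrid Name in the fifth, and GRIB2 codes in the last four. The function name filter_vtable and its parameters are hypothetical (not part of WPS or any RDA tool), so compare its output against your file's inventory before running ungrib.

```python
def filter_vtable(lines, edition="grib2", drop_names=()):
    """Return Vtable lines minus rows that the input GRIB file cannot satisfy.

    Assumes the standard WPS Vtable layout: '|'-separated columns, where
    columns 0-3 hold GRIB1 codes, column 4 the metgrid Name, columns 5-6 the
    units and description, and columns 7-10 the GRIB2 codes.
    """
    if edition not in ("grib1", "grib2"):
        raise ValueError("edition must be 'grib1' or 'grib2'")
    kept = []
    for line in lines:
        cols = line.split("|")
        # Data rows carry a numeric code in the GRIB1 or GRIB2 parameter column;
        # header rows, '----+----' separators and trailing notes are kept as-is.
        is_row = len(cols) >= 11 and (cols[0].strip().isdigit() or cols[7].strip().isdigit())
        if not is_row:
            kept.append(line)
            continue
        has_grib1 = cols[0].strip() != ""
        has_grib2 = any(c.strip() for c in cols[7:11])
        if edition == "grib2" and not has_grib2:
            continue  # GRIB1-only row: GRIB2 columns are blank
        if edition == "grib1" and not has_grib1:
            continue  # GRIB2-only row: GRIB1 columns are blank
        if cols[4].strip() in drop_names:
            continue  # field known to be absent for this file's date
        kept.append(line)
    return kept

# Example usage (the output name "Vtable" is what ungrib reads in the WPS directory):
# with open("Vtable.RDA_ds083.2") as f:
#     lines = f.read().splitlines()
# with open("Vtable", "w") as f:
#     f.write("\n".join(filter_vtable(lines, edition="grib2")) + "\n")
```

To drop date-dependent fields such as the soil layers above, pass their metgrid names in drop_names; read the actual names from the Name column of your Vtable rather than assuming them.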
In closing, "plug and play" is both a blessing and a curse.  We all want things that "just work" right out of the box.  But your data should not be a black box.  You don't need to become an expert in the arcana of data syntax, but you should peek "under the hood" and understand what is in your data file.

To learn more about GRIB data sets, read our GRIB blog series, particularly "Setting up to work with GRIB2." Use wgrib and wgrib2 to explore the contents of your GRIB1 or GRIB2 files. You might find a parameter you aren't already using that could be helpful. Or you may see many parameters and vertical layers that you don't need. In that case, may I suggest you Subset to save time and bandwidth?
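As a starting point for that exploration, here is a small Python sketch that runs wgrib2 -s (or wgrib -s for GRIB1 files) and lists each distinct variable/level pair. It assumes the tool is on your PATH and that each short-inventory line is colon-separated, with the variable name in the fourth field and the level description in the fifth; the helper names parse_inventory and inventory are hypothetical.

```python
import subprocess

def parse_inventory(text):
    """Return the distinct (variable, level) pairs in a wgrib/wgrib2 '-s' inventory.

    Short-inventory lines look like '1:0:d=2016091900:HGT:1000 mb:anl:',
    i.e. colon-separated fields with the variable in field 3 and the level in field 4.
    """
    pairs, seen = [], set()
    for line in text.splitlines():
        fields = line.split(":")
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        key = (fields[3], fields[4])
        if key not in seen:  # keep first occurrence, preserve file order
            seen.add(key)
            pairs.append(key)
    return pairs

def inventory(path, tool="wgrib2"):
    """Run 'wgrib2 -s' (GRIB2) or 'wgrib -s' (GRIB1) on a file and parse the output."""
    result = subprocess.run([tool, "-s", path], capture_output=True, text=True, check=True)
    return parse_inventory(result.stdout)
```

Comparing this list with the rows of your Vtable shows which lines describe fields that are not in the file.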


NCAR RDA partial outage September 19, 2016

The NCAR CISL High Performance Storage System (HPSS) will be down on Monday, September 19, 2016, to prepare for the installation of Cheyenne, NCAR's new high-performance supercomputer.

Although the RDA's primary and backup copies of data are stored on HPSS, the most popular data, approximately a third of our holdings, are replicated on our web server for faster data access. Thus, most users will not be impacted.

Users requesting HPSS data will experience a delay, but need not do anything special.  HPSS data services will automatically resume upon completion of this work.

HPSS will be down next Monday Sept 19 from 8:30 AM to 6:30 PM for hardware and systems integration work to help prepare the way for HPSS support of Cheyenne.

START: Mon Sep 19 2016 8:30 AM MDT
END: Mon Sep 19 2016 6:30 PM MDT


NCAR RDA outage September 14, 2016

UPDATE Wed Sep 14 14:13:35 MDT:
All RDA data services are back online. Thank you for your patience.

NCAR computers will be down for 10 hours, beginning Wed Sep 14 2016 6:00 AM MDT.  All RDA services will be down during that time.

START: Wed Sep 14 2016 6:00 AM MDT
END: Wed Sep 14 2016 4:00 PM MDT

NCAR CISL central file disk systems will be down for system maintenance from 7:00 to 10:00 AM MDT on Wednesday, September 14, 2016. All data downloads, including access through THREDDS servers, will be unavailable during this outage. Custom data orders will be held and automatically restarted after the work is complete.
START: Wed Sep 14 2016 7:00 AM MDT
END: Wed Sep 14 2016 10:00 AM MDT

GLADE Administration


How to access a restricted data set

I alluded earlier to a few* exceptions to open access.  They are open to varying degrees due to restrictions from the data providers (not from the RDA).

To see what is available to you,
  • Click on "dashboard" (at the top of the RDA web portal, right next to "sign out")
  • Then click on "Edit/Change Profile"
  • The bottom half of the screen for an unaffiliated user will look something like this:
Restricted data sets at the RDA.
  • ECMWF Operational data is only accessible to users at UCAR member institutions in the U.S. and Canada, as well as users at all other U.S. universities and government institutions. Eligible users must agree to the ECMWF Terms of Use (TOU).
  • JRA products are available to all affiliated users.
  • COSMIC data access is granted only to affiliated users who have also registered at CDAAC.
  • ECMWF Data (Other Than Operational Data) is available to all registered users who agree to their TOU.
Suppose you want to compare how JRA-55 (ds628.0) and NCEP/NCAR Global Reanalysis (ds090.0) differ over your region of interest.  Read their data set Description pages.  Note that ds628.0 has sections labeled "Access Restrictions" and "Usage Restrictions" while ds090.0 does not.

JRA-55 Access and Usage Restriction information.
Click the "Data Access" tab for JRA-55 and you will see the screen below if you are nonaffiliated. If you are eligible (affiliated), you will be taken to a page with the Terms of Use. Read and accept the Terms of Use and you are then able to access the data.

Nonaffiliated users see this.
If you do have a valid and current affiliation, edit your profile.  Reply to the verification email we send, and then you will be able to read and accept the Terms of Use.
Edit your profile here.  We WILL confirm your email to be sure that it is current.
Some have been reluctant to update their email/affiliations when they leave their universities. Losing access to something you once had is a bummer. I still miss the Olympic-sized swimming pool at Berkeley.

But users of restricted data sets need to reverify their status by replying to verification emails sent out every six months. Thus, you don't lose anything by keeping your email and affiliation status up to date. In fact, if you let them lapse, you lose the email notices we send out about outages and important updates to data you have downloaded from us in the past.

* Only 78 of our 600+ data sets belong to one of the restricted classes of data sets.


Yes, Ireland does have droughts: Notes from the 9th ACRE Workshop

The RDA hosts many climate reanalysis collections, yet only two in our archive are unique in that they extend beyond the modern (post-1948) radiosonde era: the NOAA-CIRES 20th Century Reanalysis (1851–2014; 20CR) and the ERA-20C (1900–2010) produced under ECMWF's ERA-CLIM project.  In order to produce a climate reconstruction dating back over a century, both of these reanalyses rely on assimilating very old surface and marine weather observations, which have been assembled into long-term observational datasets such as the International Surface Pressure Databank (ISPD) and the International Comprehensive Ocean-Atmosphere Dataset (ICOADS).  How do these historic observations find their way into a modern climate reconstruction such as 20CR and ERA-20C?

It all starts with the Atmospheric Circulation Reconstructions over the Earth (ACRE) initiative, which is an international collaboration between meteorological organizations, atmospheric scientists, librarians, and volunteers who are dedicated to the recovery and digitization of historic global weather observations.  By facilitating the process of rescuing, digitizing, and performing quality control of data from old weather log books, ACRE ensures that these important observations provide a better understanding of past climate, which in turn aids in modern climate applications and studies of climate impacts.

The 9th ACRE Workshop

ACRE holds annual workshops to discuss all things related to data rescue, and I attended the 9th ACRE Workshop and Historical Weather and Climate Data Forum, held June 20-24 at Maynooth University in Ireland.  The meeting agenda was populated with talks illustrating the full cycle of how historic data observations make their way from being discovered and digitized all the way to being assimilated into climate reconstructions and applications.

Most notably, the annual ACRE workshop is held in different locations around the world, and the local ACRE partners which host the workshop are featured prominently in the meeting agenda.  This not only brings attention to data rescue efforts in the region which hosts the workshop and encourages local groups to become more involved, but it also highlights the importance of how data rescue facilitated by ACRE leads to a better understanding of the past local climate and subsequent future impacts.  At the most recent meeting in June, climate scientists from Maynooth University and Met Éireann, the Irish national weather service, gave illuminating talks on how they're using ACRE supported observations and reanalyses to understand long-term Irish precipitation trends, the Irish droughts of the 19th and 20th centuries, and how weather played a role in the Easter Rising of April 1916.

St. Patrick's College at Maynooth University provided a beautiful setting for the 9th ACRE Workshop

Other highlights included presentations on the following topics:
  • Dona Cuppett and Rick Crouthamel from the International Environmental Data Rescue Organization (IEDRO) volunteer their time to travel to developing countries around the world to assist in setting up a data rescue and digitization operation.  Their presentation described the logistical challenges involved in undertaking this effort, and underscored the importance of rescuing historic weather observations in developing countries, which are most vulnerable to climate impacts.
  • Reports on efforts to scan and digitize pages from historic weather logs.  These included talks on citizen science projects such as OldWeather.org, WeatherDetective, and the Sir Charles Todd folios project led by Mac Benoy.
  • Stefan Brönnimann presented on the Mount Tambora eruption of 1815 and the subsequent period of global cooling, commonly known as the "Year Without a Summer".
  • Dispatches from ACRE regional chapters leading data rescue efforts around the world.
  • Updates from various groups leading climate reanalysis initiatives, including the 20CR, UERRA, and ERA-CLIM2 projects.

Wanted: Southern Hemisphere data

The benchmark of a good climate reanalysis is performing better than climatology, yet the uncertainty in a reanalysis is governed by the number of underlying observations assimilated into the analysis.  More observations correspond to higher confidence in the analysis, whereas a data-sparse region yields an analysis with low confidence.  For long-term reanalyses such as the 20CR and ERA-20C, the analysis in the 19th and early 20th centuries poses a challenge due to the overall scarcity of input data.  In particular, there is a great need for data in the Southern Hemisphere, not only for earlier periods, but for more recent years as well.  Philip Brohan's "Fog of Ignorance" visualizations have become a well-known highlight at ACRE meetings, and illustrate how ground-truth observations improve the certainty in historical climate reconstructions.

Antarctic weather 1909-10 from Philip Brohan illustrates the "Fog of Ignorance" in the 20th Century Reanalysis due to a lack of observations. Note the grey fog disappears in the vicinity of observations (marked by yellow dots) made available by ACRE facilitated data rescue.

ACRE is therefore making a strong push to recover historical data from south of the Equator, with efforts being led in regional chapters such as Chile, Southeast Asia, Southwest Pacific, and Antarctica, the latter of which is recovering observations from Antarctic whaling expeditions in the late 19th and early 20th centuries.

Learn more about ACRE at http://met-acre.net/.


NCAR RDA outage August 22-24, 2016

UPDATE Thu Aug 25, 2016:
All systems are back up and operating normally.  Thank you for your patience.

NCAR CISL supercomputers and central file disk systems will be down for system maintenance from 0:01 MDT on Monday, August 22, 2016 until late Wednesday, August 24.  The HPSS systems will begin to shut down at 21:00 MDT on Sunday, August 21.

When we come back, we'll have more storage and be able to provide more data for you on disks.

From the Daily B:
The Yellowstone, Geyser, Caldera, and GLADE systems will be unavailable beginning at 12:01 a.m. Monday, and the HPSS system will be unavailable starting at 9 p.m. Sunday, earlier than previously announced. All of these systems are expected to be back in production by late Wednesday afternoon, August 24.

Facilities upgrades during the downtime are related to expansion of the GLADE file systems and preparation for the new Cheyenne HPC system at the NCAR-Wyoming Supercomputing Center. Power will be unavailable for much of that period.

When the systems are returned to service following the downtime, users will have larger GLADE home space quotas. The current 10-GB quota will be increased to 25 GB. The larger GLADE file systems will serve both the Yellowstone and Cheyenne supercomputers. Users will be informed via the CISL Notifier service when each system is returned to production.


NCAR RDA Partial Outage Aug 2, 2016

NCAR CISL supercomputers and central file disk system will be down for system maintenance 7:00 to 17:00 MDT on Tuesday Aug 2, 2016.

The RDA web server will remain in operation, but users will not be able to transfer data while the central file disk system is unavailable.  Additionally, data subset processing and translation services will be unavailable during this outage.


What spatial resolution can I expect when using gridded dataset products?

Occasionally, rdahelp@ucar.edu fields requests for higher grid resolutions than what we can currently offer. Some users wonder why older data is not available at the higher resolutions of newer data.

The requests have often come from new users of RDA resources--beginning researchers or experienced researchers from communities (e.g. engineering) outside of weather and climate.

In general, data resolution is limited by computational power and the underlying observations.

Globally gridded analyses and forecasts are limited by the computational power available at the time they were created.  This table shows typical resolutions in different eras.

Time Range | Res (deg) | Res (km)

Regional models can offer higher resolution than global models, especially if they have higher data density.  For instance, ds609.0 NCEP North American Mesoscale (NAM) could offer 12 km resolution in 2012, a time when NCEP's global models only provided 0.5 to 1.0 degree (about 50-100 km) resolution.

Models also may not run on rectilinear grids, which span the globe in even degree increments.  Rectilinear grid points are far apart at the equator and close together at the poles.  This makes for vastly differently sized tiles, which can slow down models (or create numerical artifacts).
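To put numbers on this, the sketch below computes the approximate spacing of a regular latitude-longitude grid on a spherical Earth (mean radius 6371 km, an assumption). A 0.5-degree grid is roughly 56 km across at the equator, but its east-west spacing shrinks with the cosine of latitude toward the poles while the north-south spacing stays fixed. The function name grid_spacing_km is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical Earth is assumed

def grid_spacing_km(lat_deg, dlon_deg, dlat_deg=None):
    """Approximate (east-west, north-south) spacing in km of a regular lat-lon grid."""
    dlat_deg = dlon_deg if dlat_deg is None else dlat_deg
    km_per_deg = math.pi * EARTH_RADIUS_KM / 180.0  # about 111.2 km per degree of arc
    east_west = km_per_deg * dlon_deg * math.cos(math.radians(lat_deg))
    north_south = km_per_deg * dlat_deg
    return east_west, north_south

for lat in (0, 30, 60, 85, 89.5):
    ew, ns = grid_spacing_km(lat, 0.5)
    print(f"lat {lat:5.1f}: east-west {ew:6.1f} km, north-south {ns:6.1f} km")
```

Near the poles the tiles become long, thin slivers, which is the size mismatch that can slow models down or create numerical artifacts.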

Some models use reduced Gaussian grids, which maintain approximately the same horizontal spacing over the globe by reducing the number of grid points along each latitude circle approaching the poles.

ECMWF N80 Reduced Gaussian Grid
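The sketch below, using NumPy, computes Gaussian latitudes (the arcsines of the roots of a Legendre polynomial) and then applies a simplified reduction rule that scales the number of points on each latitude circle by the cosine of latitude. That rule is an illustrative assumption; ECMWF's operational reduced grids use their own tabulated row lengths. It uses a small N48 grid (96 latitude rows) because NumPy's leggauss documentation notes testing only up to degree 100.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # spherical Earth assumed

def gaussian_latitudes(n):
    """Latitudes (degrees, north to south) of an N<n> Gaussian grid, which has 2n rows.

    They are the arcsines of the roots of the Legendre polynomial of degree 2n.
    """
    roots, _weights = np.polynomial.legendre.leggauss(2 * n)  # roots ascend from -1 to 1
    return np.degrees(np.arcsin(roots))[::-1]

def reduced_row_lengths(lats_deg, n, min_points=4):
    """Hypothetical simplified reduction: 4n points near the equator, scaled by cos(latitude).

    Illustrative only; operational reduced grids use tabulated row lengths.
    """
    counts = np.round(4 * n * np.cos(np.radians(lats_deg))).astype(int)
    return np.maximum(counts, min_points)

n = 48
lats = gaussian_latitudes(n)
regular = np.full(lats.shape, 4 * n)  # a regular Gaussian grid keeps 4n points on every row
reduced = reduced_row_lengths(lats, n)
circumference = 2 * np.pi * EARTH_RADIUS_KM * np.cos(np.radians(lats))  # km per latitude circle

for name, row_lengths in (("regular", regular), ("reduced", reduced)):
    spacing = circumference / row_lengths
    print(f"{name}: {row_lengths.sum()} points, east-west spacing "
          f"{spacing[0]:.1f} km (polar row) vs {spacing[n - 1]:.1f} km (equatorial row)")
```

The reduced grid uses fewer points in total while keeping the east-west spacing of the polar and equatorial rows much closer to each other than the regular grid does.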
Other global models, like GFS, are spectral models that represent the atmospheric state as a superposition of wave functions.  Model output is in the form of spectral coefficients.  The higher the number of basis functions, the higher the resolution of the model.

To learn more about the coupling between the spectral functions and the Gaussian grid points, read this explanation from ECMWF.
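Spherical harmonics are hard to show in a few lines, but a one-dimensional Fourier series around a latitude circle behaves analogously. In this NumPy sketch (an analogy, not the GFS spectral transform), a hypothetical field is rebuilt from only the wavenumbers up to a cutoff. When the cutoff falls below the wavenumber of the small-scale feature, that feature vanishes from the reconstruction; keeping more basis functions recovers it.

```python
import numpy as np

def truncate_waves(values, max_wavenumber):
    """Rebuild a periodic signal from Fourier wavenumbers 0..max_wavenumber only."""
    coeffs = np.fft.rfft(values)
    coeffs[max_wavenumber + 1:] = 0.0  # discard the smallest-scale waves
    return np.fft.irfft(coeffs, n=len(values))

# A hypothetical field around a latitude circle: a planetary-scale wave (wavenumber 3)
# plus a smaller-scale feature (wavenumber 15).
x = np.linspace(0.0, 2.0 * np.pi, 128, endpoint=False)
field = np.sin(3 * x) + 0.5 * np.sin(15 * x)

for k in (5, 10, 20):
    err = np.max(np.abs(truncate_waves(field, k) - field))
    print(f"waves kept up to wavenumber {k:2d}: max error {err:.3f}")
```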

The RDA sometimes interpolates spectral and Gaussian models to rectilinear grids to simplify data analysis tasks for our users.  Interested users may access the full spectral coefficients of GFS in ds084.6 and reconstruct the data to their own grid.

While one can interpolate data to higher density (smaller spatial separation) grids, it won't increase the resolution of the underlying data, which is limited by the model resolution of the time.
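A small NumPy experiment illustrates the point. The hypothetical true field below contains a small-scale wave that the coarse grid cannot see (every coarse point lands on one of its zeros). Linearly interpolating the coarse values onto a grid ten times denser produces a smooth-looking field, but the small-scale wave stays missing, so the error remains close to its amplitude of 0.3.

```python
import numpy as np

def true_field(x):
    # Hypothetical "truth": a large-scale wave plus a small-scale feature (wavenumber 40)
    return np.sin(x) + 0.3 * np.sin(40 * x)

coarse_x = np.linspace(0.0, 2.0 * np.pi, 40, endpoint=False)  # coarse model grid
fine_x = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)   # 10x denser target grid

coarse_values = true_field(coarse_x)
# Linear interpolation onto the dense grid; period=2*pi handles the wrap-around.
interpolated = np.interp(fine_x, coarse_x, coarse_values, period=2.0 * np.pi)

err = np.max(np.abs(interpolated - true_field(fine_x)))
print(f"max error after interpolating to the fine grid: {err:.2f}")
```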

Reanalyses, retrospective analyses performed with higher-resolution model grids, can provide higher resolution than the operational analyses of the era represented in the early part of the reanalysis span.

If your research involves reconstructing what planners could have been able to foresee, then use analysis data for that time.  For instance, one RDA user constructed a Palmer drought index for Brazil using historical analysis data.  By using only data that was available for that time, decision makers in Brazil can see what information would have been available in past droughts.

If your research involves a long time span and you want the data processed consistently, use reanalysis data.

I alluded earlier to the resolution of the underlying data.  Ground-based sensor networks are limited in spatial coverage and change over time.  While continental US (CONUS) and Europe may enjoy high spatial density coverage of ground stations, many other areas, such as the oceans and sub-Saharan Africa, do not.

Short-duration field campaigns may offer high spatial and/or temporal resolution, but only for a limited time.  We offer a few of those data sets, but their usefulness is limited by their short duration.

For truly global coverage, we turn to satellites.  Satellite data is also not spatially uniform due to differences in satellite orbits and sensors.

Different parameters (T, water vapor, cloud tops, albedo) are measured by different sensors at different wavelengths.  The horizontal resolution of satellite measurements is related to the wavelength and the size of the telescope lens or mirror.

Courtesy of NASA GSFC

Characteristics measured in the visible range (albedo) may have higher resolution than characteristics measured in the longer-wavelength microwave range (T and water vapor).

Remote Sensing Systems (REMSS) offers some excellent materials to get started in understanding satellite data products.

For instance, the resolutions of the different wavebands on the Special Sensor Microwave Imager (SSM/I) vary, so surface temperature and water vapor, even from the same satellite and instrument, will have different resolutions.

Ocean winds, like those obtained from WindSat, are derived from several microwave bands, whose resolutions also vary by wavelength.

Satellite orbits also determine resolution.  Geostationary weather satellites fly above the equator in a much higher orbit (35,800 km above MSL) than polar-orbiting satellites such as NOAA-NN and DMSP-NN (about 800 km above MSL).  The difference in height also changes the possible resolutions.  Sensor type matters too: look at the difference in total columnar water vapor resolution between TOVS and SSM/I in Stephens et al.  Both flew on polar orbiters (TOVS on NOAA satellites, SSM/I on DMSP satellites), so the contrast there comes from the instruments rather than the orbit.

Notice the blockiness of the contours with TOVS.
The contours are noticeably smoother with SSM/I.
Even with the same orbits and same wavelengths, satellite resolutions have improved (and become closer to the theoretical limits) with improvements in optics and vibration isolation on spacecraft.  Satellite data improves over time, but is subject to theoretical limits.

No amount of money or technology can change the theoretical limits.  Running models at higher resolutions than the data supports also has limited utility.
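One such theoretical limit is diffraction. The Python sketch below applies the Rayleigh criterion (angular resolution of about 1.22 times the wavelength divided by the aperture diameter) and multiplies by the distance to the ground, assuming a nadir view and the small-angle approximation. The wavelengths, apertures and altitudes are illustrative round numbers, not the specifications of any instrument named above; real footprints are larger because of scan geometry and antenna patterns. The function name is hypothetical.

```python
def diffraction_footprint_km(wavelength_m, aperture_m, altitude_km):
    """Approximate nadir footprint (km) of a diffraction-limited instrument.

    Rayleigh criterion: theta ~ 1.22 * wavelength / aperture (radians),
    times the distance to the surface (small-angle approximation).
    """
    theta = 1.22 * wavelength_m / aperture_m
    return theta * altitude_km

# Illustrative values only.
cases = {
    "visible, 0.65 um, 0.3 m optics, 800 km": (0.65e-6, 0.3, 800.0),
    "microwave, ~1.6 cm (about 19 GHz), 0.6 m dish, 800 km": (0.016, 0.6, 800.0),
    "microwave, same dish, 35,800 km (geostationary)": (0.016, 0.6, 35800.0),
}
for label, (wavelength, aperture, altitude) in cases.items():
    print(f"{label}: ~{diffraction_footprint_km(wavelength, aperture, altitude):.3g} km")
```

Longer wavelengths and higher orbits both widen the footprint, and only a larger aperture can offset them, which is why no amount of engineering removes the limit entirely.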

One solution for obtaining higher resolution is to dynamically downscale the data with a physical model such as the Weather Research and Forecasting Model (WRF).  Physics-based models combine high-resolution terrain and land-use information with lower-resolution gridded data, and then model the state of the atmosphere with equations describing atmospheric dynamics.

Online and in-person tutorials can help you learn how to use this free community-supported model.


NCAR RDA Partial Outage July 12, 2016

UPDATE Tue Jul 12 18:42:57 MDT 2016
Full RDA data services are back in operation.  Thank you for your patience.

UPDATE Tue Jul 12 11:45:28 MDT 2016
The GLADE (disk) outage is over. Full data file downloads are available again. I'll update again when full data services are restored.

UPDATE Tue Jul 12 09:40:05 MDT 2016
We are currently experiencing a disk failure and no data are accessible from this webserver, from /glade (for NCAR users), or from directories where data have been prepared for you at your request. Also, many of our dataset file lists are unavailable at this time. We are working to resolve the problem and apologize for any inconvenience.

NCAR CISL supercomputers, Yellowstone and Geyser, will be down for system maintenance 7:00 to 17:00 MDT on Tuesday July 12, 2016.

The RDA web server will remain in operation and users may continue to browse for and download whole data files.

However, data subsetting and translation services will be unavailable during this outage.