2016-11-28

NCAR's Research Data Archive featured on UCARConnect 2

The November 2016 issue of UCARConnect, Data: The Currency of Science, is now live.  The NCAR RDA is proud to have contributed to this issue.

RDA Data Specialist Grace Peng wrote "Big Data, Big Planet":
We’re experiencing a big data explosion both in cultural awareness and in penetration into many aspects of everyday life. How did we get here? What role do weather and climate play in our data moment?

Big data is characterized by its "Vs": Volume, Variety and Velocity. Data volume is “big” if it is too large to reside on or be processed with a personal computer.

Did you know that even in the pre-computer era, weather data was the original big data use case? It’s most useful when we have a lot of it and we need the data in real-time. Furthermore, we use a large variety of physical measurements to characterize the state of the atmosphere and ocean.
RDA manager, Steve Worley, and former intern, Sophie Hou, speak about the history of the RDA and scientific data in general in this video.

Sophie Hou has joined NCAR permanently as our new Data Curation & Stewardship Coordinator. Read her career profile about what led her to this exciting career.

UCARConnect is a (roughly) bimonthly publication of UCAR aimed at science enthusiasts of all ages, including K-12 students and teachers.

2016-11-18

GRIB2 file carpentry

We've learned from several users that the second WRF method in our post about How to work around operational changes to GFS/GDAS/FNL does not work.

Use the first method described.
First, you can work around the problem by removing the new vertical levels in the files after the GFS/GDAS change. Then all of your WPS input will be consistent. You can do this yourself with wgrib2, or our "Get a Subset" tool.
Here's the detailed recipe for those of you not familiar with the wgrib2 tool for manipulating GRIB2 files.

We're going to use tip #4 and #4a from Wesley's Tricks for wgrib2 page to extract a range of records from a grib2 file.

Start by making a short inventory to identify the records that you want to keep and to remove.
wgrib2 gdas1.fnl0p25.2016051106.f00.grib2 > gdas1.fnl0p25.2016051106.f00.inv

wgrib2 gdas1.fnl0p25.2016051112.f00.grib2 > gdas1.fnl0p25.2016051112.f00.inv

Note that gdas1.fnl0p25.2016051106.f00.inv has 322 lines and gdas1.fnl0p25.2016051112.f00.inv has 352 lines. The extra 30 lines/records are in records 5-34.
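If you would rather not compare the two inventories by eye, here is a minimal sketch that lists the records whose variable and level appear only in the newer inventory. It assumes the default wgrib2 short inventory layout (record number, byte offset, date, variable, level, and so on, separated by colons). The function name extra_records is my own invention, not part of wgrib2.

```shell
#!/bin/bash
# Print the records of NEW_INV whose variable:level pair never appears in OLD_INV.
# Assumed line layout: record:offset:d=date:VAR:level:...
extra_records() {
  local old_inv=$1 new_inv=$2
  awk -F: '
    NR == FNR { seen[$4 ":" $5] = 1; next }      # first file: remember each VAR:level
    !(($4 ":" $5) in seen) { print $1, $4, $5 }  # second file: report unseen pairs
  ' "$old_inv" "$new_inv"
}
```

Called as extra_records gdas1.fnl0p25.2016051106.f00.inv gdas1.fnl0p25.2016051112.f00.inv, it should print one line per extra record (record number, variable, level) if your files match the counts above.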
We'll extract records 1-4 and 35-352 separately and then recombine them with the UNIX command cat.  It would be tedious to type in the commands for each and every file I want to convert. To save time, I put all the files I want to convert into one directory and ran this bash script.
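#!/bin/bash
# Run this only on post-change files (352 records); it drops records 5-34.

for file in gdas1*f00.grib2; do
  wgrib2 "$file" -for 1:4 -grib temp1      # records before the new levels
  wgrib2 "$file" -for 35:352 -grib temp2   # records after the new levels
  cat temp1 temp2 > "$file.27"             # recombined file
done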
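
The loop above assumes that every file in the directory is a post-change file with 352 records. If older files could be mixed in, a guard like the one sketched here may help. It checks the line count of an inventory you have already written with wgrib2; the function name has_new_levels and the default of 352 are my own choices for illustration.

```shell
#!/bin/bash
# Succeed only when the inventory has the expected post-change record count.
# 352 is the count reported above for gdas1.fnl0p25.2016051112.f00.grib2.
has_new_levels() {
  local inv=$1 expected=${2:-352}
  [ "$(awk 'END { print NR }' "$inv")" -eq "$expected" ]
}
```

Inside the loop, you could write the inventory first (wgrib2 "$file" > "$file.inv") and then skip the file with has_new_levels "$file.inv" || continue.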

2016-10-31

NCAR RDA outage Nov 1, 2016

UPDATE Nov 1 15:11:39 MDT 2016: All RDA services are back in service. Thank you for your patience.

NCAR CISL central file disk systems will be down for system maintenance from 8:00 MDT on Tuesday, November 1, 2016 for approximately 4 hours.

All data downloads, including access through THREDDS servers, will be unavailable during this outage.

NCAR CISL supercomputers will be down for system maintenance from 8:00 MDT on Tuesday, November 1, 2016 for approximately 8 hours.

Custom data orders will be held and automatically restarted after the work is complete.

Change notice from the Daily B:
The Yellowstone, Geyser, and Caldera clusters will be unavailable from 8 a.m. to 6 p.m. MDT Tuesday, November 1, to allow CISL staff to apply a critical security patch. The GLADE file systems will be unavailable from 8 a.m. to noon.

A system reservation will be put in place 12 hours before the scheduled downtime. Users’ jobs with specified job times that overlap the reservation period will remain on hold until the system is restored to service. Running jobs that have not finished by 8 a.m. Tuesday will be killed and will need to be resubmitted after the maintenance period.

We apologize for any inconvenience this might cause. Users will be informed via the CISL Notifier service when the systems are returned to production.

2016-10-28

How to work around operational changes to GFS/GDAS/FNL

Operational models are moving targets.
Any changes in software for an operational analysis can result in spurious signals or shifts. Operational systems' software is frequently changed as they uncover bugs or biases and fix them; code segments are improved to better represent atmospheric phenomena. The changes are usually not announced ahead of time. The change log may be difficult for the non-expert user to decipher. Thus, operational analyses are not appropriate for compiling a long time series to study changes over time, e.g. to look for climate signals.
A handful of users in the last week have contacted rdahelp@ucar.edu about problems initializing WRF with FNL data spanning 2016-05-11. Why does WPS, the WRF Preprocessing System, crash right at this juncture?

The answer lies in the operational changes made to the GFS/GDAS system at NOAA NCEP. You can view the full list of changes to GFS/GDAS since 1991 including the announcement of the changes effective 2016-05-11 at 12 UTC.
- Addition of five layers in the upper stratosphere in gridded output
This change affects ds084.1 NCEP GFS 0.25 Degree, ds083.2 NCEP FNL 1.0 Degree, ds083.3 NCEP FNL 0.25 Degree and ds335.0 Historical Unidata Internet Data Distribution (IDD) Gridded Model Data.

The first thing to do when things break is to examine your data. You can use wgrib2, or take a close look at the metadata for your input data set, as I've shown earlier, in WRF Vtable Carpentry.

Scroll down to the "Vertical Levels" section of the data set information page and click on "detailed metadata."

You will see that six parameters (HGT, TMP, RH, UGRD, VGRD, O3MR), many used by WRF, received five additional levels (1, 2, 3, 5, 7 mb) beginning 2016-05-11 at 12 UTC.
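
If you prefer the command line to the metadata page, here is a rough sketch that pulls the same information out of a wgrib2 inventory saved as a text file. It assumes the default short inventory format, where the variable name and level appear as ":VAR:level:" (with the ozone mixing ratio written as O3MR). The function name added_level_records is hypothetical.

```shell
#!/bin/bash
# Print inventory lines for the six parameters on the five levels
# (1, 2, 3, 5, 7 mb) added on 2016-05-11 at 12 UTC.
added_level_records() {
  grep -E ':(HGT|TMP|RH|UGRD|VGRD|O3MR):(1|2|3|5|7) mb:' "$1"
}
```

On an inventory from after the change, this should list the records on the new levels; on one from before the change, it should print nothing.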


These changes do not affect the performance of ungrib.exe, but they may impact real.exe and wrf.exe.

The RDA team (rdahelp@ucar.edu) helps our data users with WRF WPS up to providing an accurate Vtable that describes the data obtained from us. The WRF team (wrfhelp@ucar.edu) picks up support from there. This problem spans both support groups.

I contacted WRF to coordinate our help and I am posting information beyond RDA's normal scope here as a service to our users so that all the information to get unstuck resides in one place.

First, you can work around the problem by removing the new vertical levels in the files after the GFS/GDAS change.  Then all of your WPS input will be consistent.  You can do this yourself with wgrib2, or our "Get a Subset" tool.

UPDATE Nov 18, 2016
The following method, provided by wrfhelp, does not work. You need to use the first method above. For details on how to achieve this using wgrib2, read GRIB2 file carpentry.

If you want to use all of the available data in a WRF run that spans a discontinuity of data:
  1. Run ungrib.exe as usual
  2. Using a namelist that accurately describes the number of vertical levels found in the input GRIB2 files before the change (27 in this case), run metgrid.exe and real.exe for the times before the change.
  3. Edit your namelist to the new number of vertical levels (32) and then run metgrid.exe and real.exe on the remainder of the GRIB2 input files for the period after the change.
  4. This generates a complete set of wrfbdy and wrfinput files for your entire WRF run.
  5. Run wrf.exe up until the discontinuity with 27 in the namelist.  
  6. Then edit the namelist for 32 vertical levels and restart wrf.exe for the remainder of the run.
Pat yourself on the back. You've just completed numerical integration of a complex system of differential equations around a discontinuity. ;-)

2016-10-18

Trip Report: Rocky Mountain Celebration of Women in Computing 2016

Last month, I attended Rocky Mountain Celebration of Women in Computing 2016 (RMCWiC) along with Dr Natasha Flyer and (soon-to-be Dr) Delilah Feng.  The 280+ registered attendees (including over 160 college students) overwhelmed the Hotel RL in downtown Salt Lake City.

Can you find the NCAR CISL representatives in this photo?
The Rocky Mountain Celebration of Women in Computing, an ACM (Association for Computing Machinery) Celebration event, is the biennial event for the Rocky Mountain Region that encourages the career interests of women in computing...
NCAR's Computational Information Systems Laboratory (CISL) jointly sponsors RMCWiC and we actively recruit visiting students and scientists at these meetings.

My favorite part was the poster session, where students presented their work.  The breadth, quality of work and enthusiasm were very high.  I enjoyed talking to the students so much that they ran out of dessert before I remembered they were serving it!
A very lively poster session.
Computing is very broad; even the work within CISL is very broad.  I lingered longest at the talks featuring embedded sensors, real-time data processing and data visualization.  When one student told me she was ready to tackle higher dimensional data problems, I was thrilled.
This student presented data visualization research about how to effectively convey uncertainty, a subject of high importance in meteorology and climatology.
If you are a student or recent PhD, consider applying to one of NCAR CISL's visitor programs.  We are particularly interested in hosting undergraduate and graduate students for our Summer Internships in Parallel Computational Science (SIParCS) program.  These summer internships pay local living expenses and cash stipends.  Furthermore, SIParCS students work with one or more mentors throughout their stay and may be awarded travel stipends to present their SIParCS work at conferences such as AGU and AMS.

I worked part-time or was a stay-at-home mom for many years before re-entering the scientific computing workforce.  Career re-entry candidates are also encouraged to apply.  Feel free to contact me about career re-entry opportunities offered by NSF, NASA and other government agencies.

2016-09-28

WRF Vtable Carpentry

When we announced the introduction of WRF-able data sets (Vtables auto-generated from RDA-collected metadata) in January 2016, the WRF Preprocessing System (WPS) component, ungrib, was able to use the RDA Vtables without any hiccups.

Around the time of the release of WRF 3.8.0 in April 2016, rdahelp@ucar.edu began receiving reports of error messages when running ungrib with RDA-provided Vtables.  If the Vtable includes lines describing fields that are not contained in that particular input GRIB file, ungrib now stops instead of ignoring those fields (as it had done previously).

WPS V3.8 Updates

The simple solution is for the user to edit the RDA-provided Vtable to remove the extraneous line(s) referring to fields not contained in the input GRIB file.

After investigation and consideration, wrfhelp and rdahelp decided not to change either the WPS code or the auto-generated Vtables to return to plug and play.  Here's why.

The RDA provides archives of analysis, forecast and reanalysis data sets.  Reanalysis data sets contain the same variables/parameters throughout the entire time series.  Operational analysis and forecast systems vary over time.  Parameters or vertical levels may (and should!) change over the time series.

With advances in computational power and modeling of physical processes, newer fields and levels can be added.  Bandwidth is limited.  Dropping vestigial parameters when they are replaced with better ones is necessary.

RDA-provided Vtables are complete.  They include lines describing every WRF-usable parameter found in the data set time series.  To see how parameters and vertical levels changed over time, click on the "Variables by dataset product" or "detailed metadata for level information" links from the data set home pages.

Consider the long-running NCEP analysis series, FNL, ds083.2. When it began in 1999, it was in GRIB (aka GRIB1) format. Since 2007, it has been offered in GRIB2 format. Several times, the model was changed and fields were added (and sometimes removed).

View the vertical level information of long-running NCEP analysis series, FNL, ds083.2.  Select "detailed metadata" to view GRIB1 details; select "GRIB2 level table" to view GRIB2 details.

Defaults to "Parameter View".  Also explore "Vertical Level View."
Notice that the soil layers expanded from a single layer 10-200 cm below the surface to three layers in 2005
Vtable.RDA_ds083.2 contains lines describing the entire time series from 1999 to the present. GRIB1 codes are on the left; GRIB2 codes are on the right. Notice how there are lines with GRIB1 or GRIB2 codes only. (One set of columns is blank.) If you are using a GRIB2 file (2007 and later), then edit out the lines that pertain only to GRIB1 files (GRIB2 columns blank.) If you are using a GRIB1 file, delete the lines referring to parameters found only in the GRIB2 files.

Delete the lines that refer to variables not found in the file you are using.
Similarly, if you are using a GRIB1 file from before 2005-05-31, delete the six lines referring to three layers of soil temperature or moisture between 10 and 200 cm. If you are using a GRIB1 or GRIB2 file from 2005-05-31 or later, delete the two lines referring to Soil temperature/moisture 10-200 cm below ground.

These soil layers are found only in FNL files 2005-05-31 and later.  Delete for earlier dates.
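
For GRIB2 users, the first edit described above (dropping the GRIB1-only lines) can be roughed out with awk. This is only a sketch under an assumption about the layout: that data lines are pipe-delimited with at least 11 fields and that the GRIB2 discipline code sits in the 8th field, as in the Vtables distributed with WPS. Check your own Vtable before trusting it. The function name grib2_vtable_lines is made up for this example, and the soil-layer lines still need to be edited by hand.

```shell
#!/bin/bash
# Keep header and separator lines, plus any data line whose GRIB2 Discp
# column (assumed to be field 8) is filled in. GRIB1-only lines are dropped.
grib2_vtable_lines() {
  awk -F'|' 'NF < 11 || $8 !~ /^[ \t]*$/' "$1"
}
```

For example, grib2_vtable_lines Vtable.RDA_ds083.2 > Vtable.grib2 writes the trimmed table to a new file (Vtable.grib2 is just a placeholder name) and leaves the original untouched.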
In closing, "plug and play" is both a blessing and a curse.  We all want things that "just work" right out of the box.  But, your data should not be a black box.  You don't need to become an expert in the arcana of data syntax, but you should peek "under the hood" and understand what is in your data file.

To learn more about GRIB data sets, read our GRIB blog series, particularly Setting up to work with GRIB2. Use wgrib and wgrib2 to explore the contents of your GRIB or GRIB2 files. You might find a parameter that you aren't already using but that might be helpful. Or, you may see a lot of parameters and vertical layers that you don't need. In that case, may I suggest you Subset to save time and bandwidth?
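
As a starting point for that exploration, here is a small sketch that summarizes a wgrib2 inventory saved to a text file, printing each variable name with the number of records it occupies. It assumes the default short inventory format, where the variable name is the fourth colon-separated field. The function name records_per_variable is mine.

```shell
#!/bin/bash
# Summarize a wgrib2 short inventory: one line per variable with its record count.
records_per_variable() {
  awk -F: '{ n[$4]++ } END { for (v in n) print v, n[v] }' "$1" | sort
}
```

Variables with many records usually span many vertical levels, which makes them good candidates to check when deciding what to subset.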

2016-09-16

NCAR RDA partial outage September 19, 2016

The NCAR CISL High Performance Storage System (HPSS) will be down on Monday, September 19, 2016, to prepare for the installation of Cheyenne, NCAR's new high-performance supercomputer.


Although the RDA's primary and backup copies of data are stored on HPSS, the most popular data, approximately a third of our holdings, are replicated on our web server for faster data access. Thus, most users will not be impacted.

Users requesting HPSS data will experience a delay, but need not do anything special.  HPSS data services will automatically resume upon completion of this work.
SCHEDULED OUTAGE NOTIFICATION

HPSS will be down next Monday Sept 19 from 8:30 AM to 6:30 PM for hardware and systems integration work to help prepare the way for HPSS support of Cheyenne.

START: Mon Sep 19 2016 8:30 AM MDT
END: Mon Sep 19 2016 6:30 PM MDT

2016-09-08

NCAR RDA outage September 14, 2016

UPDATE Wed Sep 14 14:13:35 MDT:
All RDA data services are back online. Thank you for your patience.

UPDATE:
NCAR computers will be down for 10 hours, beginning Wed Sep 14 2016 6:00 AM MDT.  All RDA services will be down during that time.

START: Wed Sep 14 2016 6:00 AM MDT
END: Wed Sep 14 2016 4:00 PM MDT

NCAR CISL central file disk systems will be down for system maintenance from 7:00 on Wednesday September 14, 2016 until 10:00 MDT. All data downloads, including access through THREDDS servers, will be unavailable during this outage. Custom data orders will be held and automatically restarted after the work is complete.
START: Wed Sep 14 2016 7:00 AM MDT
END: Wed Sep 14 2016 10:00 AM MDT

SERVICES AFFECTED
GLADE Administration

2016-08-30

How to access a restricted data set

I alluded earlier to a few* exceptions to open access.  They are open to varying degrees due to restrictions from the data providers (not from RDA.)

To see what is available to you,
  • Click on "dashboard" (at the top of the RDA web portal, right next to "sign out")
  • Then click on "Edit/Change Profile"
  • The bottom half of the screen for an unaffiliated user will look something like this:
Restricted data sets at the RDA.
  • ECMWF Operational data is only accessible to users at UCAR member institutions in the U.S. and Canada, as well as users at all other U.S. universities and government institutions. Eligible users must agree to the ECMWF Terms of Use (TOU).
  • JRA products are available to all affiliated users.
  • COSMIC data access is granted only to affiliated users who have also registered at CDAAC.
  • ECMWF Data (Other Than Operational Data) is available to all registered users who agree to their TOU.
Suppose you want to compare how JRA-55 (ds628.0) and NCEP/NCAR Global Reanalysis (ds090.0) differ over your region of interest.  Read their data set Description pages.  Note that ds628.0 has sections labeled "Access Restrictions" and "Usage Restrictions" while ds090.0 does not.

JRA-55 Access and Usage Restriction information.
Click the "Data Access" tab for JRA-55 and you will see the screen below if you are nonaffiliated. If you are eligible (affiliated), you will be taken to a page with the Terms of Use. Read and accept the Terms of Use and you are then able to access the data.

Nonaffiliated users see this.
If you do have a valid and current affiliation, edit your profile.  Reply to the verification email we send, and then you will be able to read and accept the Terms of Use.
Edit your profile here.  We WILL confirm your email to be sure that it is current.
Some have been reluctant to update their email/affiliations when they leave their universities. Losing access to something you once had is a bummer. I still miss the Olympic-sized swimming pool at Berkeley.

But users of restricted data sets need to reverify their status by replying to verification email notices sent out every six months. Thus, you don't lose anything by keeping your email and affiliation status up-to-date. In fact, if you let them go stale, you lose the email notices that we send out about outages and important updates about data you have downloaded from us in the past.

* Only 78 of our 600+ data sets belong to one of the restricted classes of data sets.