2015-11-11

What is GRIB2?

I want to expand upon a couple of points that I made in What is GRIB? because we continue to receive questions at rdahelp related to these two ideas.  I hope that explaining the why of GRIB will help you make better use of data in this format.


I ended What is GRIB? with a paragraph alluding to its atomic* nature.
The gridded fields of values within GRIB files are sometimes referred to as 'messages', a nod to the format's heritage as a transmission format. Each message is 'atomic' in the sense that it is the smallest unit of data that makes any sense. Each message contains a starting point, an end point (in bytes), GRIB codes describing the message contents, and the data values themselves.
GRIB and BUFR are WMO transmission formats.  Data is highly vulnerable to network interruptions during transmission.  If a network outage occurs after transmitting 299 MB of a 300 MB file, it would be desirable to restart the transmission mid-file rather than retransmit from the beginning. With atomic data, you can restart the transmission at the point of the last complete atomic message.

This means that a GRIB file can be an aggregation of smaller GRIB units.  You can do this yourself with the simple UNIX command, cat.

e.g. 'cat infile1.grib infile2.grib .... infilen.grib > outfile.grib'
This works with GRIB1 or GRIB2 files.  Try this at home.
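
If you would rather script this than type it in a shell, the same concatenation can be sketched in Python. This is only an illustration of the idea: the function name concat_grib and the file names are hypothetical, and the code does nothing more than copy bytes in order, which is all that cat does.

```python
import shutil


def concat_grib(inputs, output):
    """Append the bytes of each input file to output, in order (like cat).

    The output path must not be one of the input paths.
    """
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                # Stream the file in chunks so large files are not
                # read into memory all at once.
                shutil.copyfileobj(src, out)


if __name__ == "__main__":
    # Hypothetical file names, mirroring the cat example above.
    concat_grib(["infile1.grib", "infile2.grib"], "outfile.grib")
```

No GRIB library is needed because each message carries its own start, end, and description, so a plain byte-for-byte append is enough.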

Transmission of large GRIB analysis files can be interrupted by transmission of smaller data messages such as station data in BUFR format.  The entire analysis file, with all records for a time stamp, can be reconstituted by the data receiver by removing the interleaved unrelated data messages.
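
To make the start and end points concrete, here is a minimal Python sketch that scans a byte stream for complete GRIB messages. It assumes the standard indicator section layout: every message begins with the characters GRIB, the edition number is in octet 8, the total message length is a 3-byte integer in octets 5-7 for GRIB1 and an 8-byte integer in octets 9-16 for GRIB2, and every message ends with 7777. All of the function names are hypothetical, and the fake_grib1 and fake_grib2 helpers build synthetic stand-ins that are not valid GRIB. The sketch does not validate message contents or handle any nonstandard length encodings.

```python
def iter_grib_messages(buf):
    """Yield (start, end, edition) for each complete GRIB message in buf."""
    pos = 0
    while True:
        start = buf.find(b"GRIB", pos)
        if start < 0 or start + 8 > len(buf):
            return
        edition = buf[start + 7]  # octet 8 holds the edition number
        if edition == 1:
            # GRIB1: total length is a 3-byte big-endian integer (octets 5-7)
            length = int.from_bytes(buf[start + 4:start + 7], "big")
        elif edition == 2 and start + 16 <= len(buf):
            # GRIB2: total length is an 8-byte big-endian integer (octets 9-16)
            length = int.from_bytes(buf[start + 8:start + 16], "big")
        else:
            pos = start + 4  # unknown edition or cut-off header
            continue
        end = start + length
        if length >= 12 and end <= len(buf) and buf[end - 4:end] == b"7777":
            yield start, end, edition
            pos = end
        else:
            pos = start + 4  # false match or truncated message; keep scanning


def resume_offset(buf):
    """Byte offset just past the last complete message (0 if there is none)."""
    end = 0
    for _, end, _ in iter_grib_messages(buf):
        pass
    return end


def strip_interleaved(buf):
    """Keep only the complete GRIB messages, dropping everything in between."""
    return b"".join(buf[s:e] for s, e, _ in iter_grib_messages(buf))


def fake_grib1(payload):
    """Synthetic stand-in for a GRIB1 message (not valid GRIB)."""
    length = 8 + len(payload) + 4
    return b"GRIB" + length.to_bytes(3, "big") + b"\x01" + payload + b"7777"


def fake_grib2(payload):
    """Synthetic stand-in for a GRIB2 message (not valid GRIB)."""
    length = 16 + len(payload) + 4
    return (b"GRIB" + b"\x00\x00\x00\x02" + length.to_bytes(8, "big")
            + payload + b"7777")


if __name__ == "__main__":
    m1, m2 = fake_grib1(b"abcd"), fake_grib2(b"xyz")
    # Two complete messages, unrelated bytes between them, and a final
    # message that was cut off mid-transmission.
    stream = m1 + b"BUFRjunk" + m2 + m2[:10]
    print(list(iter_grib_messages(stream)))
    print(resume_offset(stream))
    print(strip_interleaved(stream) == m1 + m2)
```

In this sketch, resume_offset gives the point from which an interrupted transfer could restart, and strip_interleaved shows how a receiver might drop unrelated bytes that arrived between GRIB messages.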

The atomic nature of GRIB files means certain elements, such as the grid information, can be repeated many times in a file.  While this repetition is helpful during data transmission, it can bloat file sizes without compression.

This is one reason why GRIB2 is compressed.

Wesley Ebisuzaki wrote in Introduction to GRIB2 using the GFS forecasts:
GRIB2 is a transmission format so compression is a high priority. Starting with a GFS forecast file,  converting it to netcdf-3 increases the file size by 6.4 times.
This is why I warn most users against converting from GRIB2 to GRIB1 or NetCDF.

Because gridded data is similar in nature to imagery, it's not surprising that JPEG compression works so well to reduce GRIB file sizes.

Many users have begged for translation from GRIB2 to GRIB1 format so that they don't have to install JPEG code libraries to use GRIB2 data.  I sympathize--I don't like installing code libraries only to chase down compile errors and code dependencies either.

But, the RDA has limited bandwidth.  We need to use it wisely so that we can continue to serve all users efficiently.  We cannot afford to use more than twice the bandwidth necessary to send data.

I've written a guide to painlessly get started with using GRIB2 data on UNIX systems.  The guide walks you through installation of wgrib2, the Ginsu knife of GRIB2 tools.  This guide works for various flavors of UNIX including Linux and OS X.  There will be no Windows guide.

If you must use NetCDF, you can download data in GRIB2 and then use wgrib2 to convert to NetCDF as you need it.
'wgrib2 infile.grib2 -netcdf outfile.nc'

The wgrib2 tar package includes the JasPer library, an open-source implementation of the JPEG-2000 standard.  This is much easier than downloading and installing JasPer alone.

Key points:
  1. GRIB files need compression, hence GRIB2.  Setting up to work with GRIB2 is not hard if you follow the guide.
  2. The atomic nature of GRIB allows you to make your own GRIB files simply by appending messages or files together.  This is a powerful ability that can impress others who don't know how easy it is (like your PhD advisor).

* I mean atomic in the original Greek meaning of the word--the smallest unit that cannot be broken down further without losing its nature.

I do not entirely agree with Wesley's statement that, "Grib messages are like atoms. Each atom is complete and takes up a finite volume. If you had small fingers, you could grab that sodium atom and place it over there. A grib message has a starting byte location and an ending byte location."

This is nit-picky, but I sometimes supported myself in graduate school as a teaching assistant for undergraduate Quantum Mechanics. Atoms do not have a finite volume in the sense of a box of a certain size. They are highly localized waves that extend out infinitely. However, their probability function drops off rapidly and becomes negligible beyond a certain volume.

Aside:
Optical tweezers can now manipulate and move single cells and even chromosomes within the cells. As yet, though, optical tweezers can grab a single atom but not move it.

However, you can grab single GRIB atomic messages and concatenate them together.
