11 June 2015

Transferring RDA Data with Globus

The RDA supports many data access pathways for our users, and it should come as no surprise that the most popular is web download via HTTP.  With HTTP, our users can download data files directly from our website or with cURL and wget commands.

Web downloads from the RDA have grown rapidly in recent years.  In 2014 alone, we served more than 1.1 petabytes of data to over 11,000 users.  We also fulfilled more than 4,000 customized data requests, most (if not all) of which were downloaded via HTTP.

To keep up with this growth in data usage, the RDA now lets its users transfer data via the Globus data transfer service.

What is Globus?

In the simplest terms, Globus (globus.org) is a third-party data transfer service that provides a fast, secure, and reliable method for transferring large data volumes.  Transfers of RDA data are carried out using the GridFTP protocol and are facilitated by a Globus Connect server running on an IBM cluster housed at NCAR.

Step-by-step instructions on how to create a Globus account and transfer files are available on the Globus website.  To initiate a transfer, a user defines two endpoints (one origination endpoint and one receiving endpoint) and then submits the transfer task to Globus.  Once a transfer has been submitted, Globus takes care of the rest, including processing the transfer and monitoring its performance and progress.  After the transfer completes, Globus sends the user an e-mail notification with a report detailing the transfer statistics and performance.

Users can initiate and manage Globus data transfers from their web browser; users who wish to integrate data transfers into automated workflows can use the Globus command line interface (CLI) or the Python/Java API client.
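For readers thinking about scripted workflows, the idea of "two endpoints plus a list of files" can be sketched in a few lines of Python.  This is only an illustration, not the RDA's or Globus's own code: the helper name build_transfer_document, the endpoint identifiers, and the file paths are all hypothetical, and the field names reflect our understanding of the JSON document accepted by the Globus Transfer API, so check them against the current Globus documentation before relying on them.  The sketch only assembles the task description; it does not contact Globus.

```python
import json


def build_transfer_document(source_endpoint, destination_endpoint, items, label=None):
    """Assemble a transfer task description as a plain dictionary.

    `items` is an iterable of (source_path, destination_path) pairs.
    Field names are an assumption based on the Globus Transfer API
    document format; verify them against the Globus documentation.
    """
    data = [
        {
            "DATA_TYPE": "transfer_item",
            "source_path": source_path,
            "destination_path": destination_path,
            "recursive": False,  # each item here is a single file
        }
        for source_path, destination_path in items
    ]
    if not data:
        raise ValueError("a transfer needs at least one file")

    document = {
        "DATA_TYPE": "transfer",
        "source_endpoint": source_endpoint,            # origination endpoint
        "destination_endpoint": destination_endpoint,  # receiving endpoint
        "DATA": data,
    }
    if label:
        document["label"] = label
    # Note: a real submission also needs a submission ID issued by the
    # service and an authenticated client; both are omitted from this sketch.
    return document


# Hypothetical placeholders: substitute the identifiers and paths shown
# for your own endpoints on the Globus website.
example = build_transfer_document(
    source_endpoint="RDA-SHARED-ENDPOINT-ID",
    destination_endpoint="MY-RECEIVING-ENDPOINT-ID",
    items=[("/example/file1.nc", "/home/user/data/file1.nc")],
    label="RDA example transfer",
)
print(json.dumps(example, indent=2))
```

In practice, the Globus CLI or API client would take care of authentication and submission; the point here is simply that a transfer task is a small, declarative description that a script can generate for many files at once.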

How to transfer RDA data using Globus

Starting from any RDA dataset description page (see, for example, the 20th Century Reanalysis dataset page), select the Data Access tab, then select the link labeled 'Request Globus Invitation'.  Globus transfers are also available for customized data requests: select the 'Globus Download' button on the data download page.

Example Data Access matrix from a dataset page on the RDA website.  Selecting the link labeled 'Request Globus Invitation' initiates a Globus data share for the dataset.

At this point, a pop-up window will ask you to confirm that you are requesting a Globus invitation to transfer data from the dataset.  Once you have submitted the request, Globus will send you an e-mail invitation containing a unique URL for accepting the data share.  After accepting the invitation, you may begin transferring data to your receiving endpoint via the Globus website.

The data transfer interface on the Globus website.  The RDA shared endpoint is shown on the left and the user's receiving endpoint is displayed on the right.


Alternate Identity login

To use the Globus service, users must register for a (free) Globus account.  If you do not have one, Globus will prompt you to register before you accept the data share and transfer data.  Once you have a Globus account, you may link your RDA and Globus accounts under your Globus user settings (after logging in, select Account -> Identities -> Link another identity).  Doing so will allow you to log into Globus using your RDA user e-mail and password, so that you need to remember only one username/password combination.

The Globus alternate identity provider login interface.  RDA users may log into their Globus account using their RDA user e-mail and password by selecting the 'NCAR RDA' identity provider.


Data endpoint management

All Globus data shares created for our users are listed under each user's 'dashboard' (select 'Dashboard' at the top of the RDA home page).  Here, users can view and manage all active Globus shares assigned to their account.  Data shares can be deleted, and if a user has misplaced the Globus e-mail invitation for a data share, a new one can be sent.

A listing of Globus data shares, displayed in the RDA user's dashboard.