If you have access to Lonestar or Stampede, we recommend that you incorporate the BioITeam start-up script into your profile as described here, and then run the command gsaf_download.sh to download your data. The content of the file "gsaf_download.sh" is shown in the code block below.
To use the script below, use this general step-wise procedure for downloading your data onto any unix system:
- Create a file on the unix system called "gsaf_download.sh" and insert the contents of the code block. Then run chmod a+x gsaf_download.sh to make the script executable.
- Copy the URL of the web page that is linked from the GSAF email notifying you that the data is ready to download. This web page URL is the data key.
- Insert the data key into the command line that executes "gsaf_download.sh" as shown below. The key must be enclosed in double quotation marks.
- Execute the command to retrieve your data.
The iRODS/iweb system that data delivery relies on is the TACC Corral system, which is research-grade software.
When your data is available, you will receive an email from the GSAF with a link to a web page, which in turn has links to single-use download tickets for your data.
You do NOT need a TACC account to access your data, but since you will be downloading from TACC we advise you to download your data to a TACC resource for the fastest download speeds possible.
NOTE that the TACC web server providing download functionality does NOT have a proper security certificate, so you must use "--no-check-certificate" with your wget command.
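As a minimal sketch of that note, the wrapper below passes "--no-check-certificate" to wget. The function name "fetch_page" and the URL in the usage comment are hypothetical placeholders and are not taken from the GSAF script.

```shell
#!/usr/bin/env bash
# fetch_page URL OUTFILE
# Downloads URL to OUTFILE without validating the server's certificate.
# fetch_page is a hypothetical helper name used only for illustration.
fetch_page() {
    wget --no-check-certificate -O "$2" "$1"
}

# Usage (placeholder URL; substitute the link from your GSAF email):
#   fetch_page "https://example.org/your-data-key" page.html
```

Skipping certificate validation means wget no longer checks the identity of the server, so use the flag only for this download.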
The following suggested bash script takes the URL to the link entitled, "Access your data for JAyynnn from sequencing run SAyynnn here." It fetches that web page (which is accessible many times) and starts downloading the data files. It then compares the md5sum checksums of each file to those computed by the GSAF when your data was created to verify the integrity of the data.
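The integrity check can be sketched as follows. This is an illustration under stated assumptions, not the GSAF script itself: it assumes the expected checksums are available in a file in standard md5sum format, and the names "sample.fastq" and "expected.md5" are stand-ins created here so that the example runs on its own.

```shell
#!/usr/bin/env bash
set -eu

# Work in a scratch directory so no real data is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in data file and checksum list (hypothetical names). In practice the
# data file is downloaded and the expected checksums come from the GSAF.
printf 'ACGT\n' > sample.fastq
md5sum sample.fastq > expected.md5

# md5sum -c recomputes the checksum of each listed file and compares it to
# the listed value; it exits non-zero if any file does not match.
if md5sum -c expected.md5 >/dev/null 2>&1; then
    status="OK"
else
    status="MISMATCH"
fi
echo "sample.fastq: $status"
```

A mismatch usually means the transfer was interrupted or corrupted, and the affected file should be downloaded again.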
To use this script, create a file on a linux system called (for example) "gsaf_download.sh". Insert the contents above into that file (for example with the text editor nano), then make sure to run chmod a+x gsaf_download.sh to make it executable.
Running it should look something like this (NOTE the use of quotation marks around the web address):
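A minimal end-to-end sketch of these steps follows. Two things in it are placeholders, not real values: the script body is a stand-in that only echoes its argument (paste the real "gsaf_download.sh" contents instead), and the data key is a made-up URL on example.org.

```shell
#!/usr/bin/env bash
set -eu

# Work in a scratch directory for the purposes of this example.
workdir=$(mktemp -d)
cd "$workdir"

# Create the script file. This body is a stand-in for illustration only.
cat > gsaf_download.sh <<'EOF'
#!/usr/bin/env bash
echo "would download from: $1"
EOF

# Make the script executable.
chmod a+x gsaf_download.sh

# Hypothetical data key. The real one is the URL copied from the GSAF email.
key='https://example.org/gsaf?job=JA00000&run=SA00000'

# Run with the key in double quotes so the shell passes it as one argument.
# Unquoted, characters such as '&' and '?' would be interpreted by the shell.
result=$(./gsaf_download.sh "$key")
echo "$result"
```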
Specific data keys are only valid for a short period of time, but the GSAF keeps your data indefinitely (unless you request that it be deleted). You can therefore request new data access keys at any time from your job home page. Both your job submission email and your data delivery email have a link to your job home page.
To get new data access keys, go to your job home page (follow the link in either your job submission email or your data delivery email), expand the "Sequencing Data" section, and click the link corresponding to the sequencing run from which you want data. Keys arrive via email at ALL email addresses listed in the job submission.
If your email address changes, please email the GSAF staff with the job number and both the old and new email addresses. We will update the email address associated with the job.
All data is archived at TACC on both their Ranch and Corral subsystems, but net data transfer speeds from these resources are more limited than from AWS. Consequently we use AWS as an intermediary for data delivery.
The most common error is missing double quotes around the key. Another possibility is that the script file is not in unix text format. If you created it in a Windows environment, use the utility "dos2unix" to convert it to the correct format.
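The line-ending check can be sketched as below. The helper name "to_unix" is hypothetical. It uses dos2unix when that utility is installed and otherwise falls back to stripping carriage returns with tr, which is an assumption of this sketch and not something the instructions above prescribe.

```shell
#!/usr/bin/env bash
set -eu

# to_unix FILE
# Converts FILE to unix line endings if it contains carriage returns.
# to_unix is a hypothetical helper name used only for illustration.
to_unix() {
    cr=$(printf '\r')
    if grep -q "$cr" "$1"; then
        if command -v dos2unix >/dev/null 2>&1; then
            dos2unix "$1"
        else
            # Fallback: strip carriage returns, then write the result back
            # in place so the file keeps its executable permission.
            tr -d '\r' < "$1" > "$1.tmp"
            cat "$1.tmp" > "$1"
            rm -f "$1.tmp"
        fi
    fi
}

# Usage:
#   to_unix gsaf_download.sh
```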