NOTICE: This method of data delivery (AWS) will soon be obsolete; the facility is transitioning to Illumina's cloud service, BaseSpace. Please visit Illumina's site here to create an account. The GSAF will use the email address provided in the job submission form to transfer data through BaseSpace, so please be sure to provide the email address that is linked to your BaseSpace account. We expect the transition to be complete by the beginning of February.

For those interested, see Introduction to BaseMount.

If you have old data that you cannot retrieve through BaseSpace, please send an email to requesting the data. Provide your job ID (example: JA12345) and sequencing run ID (example: SA17018), and we will make alternate arrangements to deliver the data to you. Please note that this is a courtesy service we are providing during the transition period; your first mode of obtaining data should always be BaseSpace.

Data Download

Short summary: 

If you have access to Lonestar or Stampede, we recommend that you incorporate the BioITeam start-up script into your profile as described here and then run the command to download your data. The contents of the file "" are shown in the code block below.

To use the script below, follow this general step-by-step procedure to download your data onto any Unix system:

  1. Create a file on the Unix system called "" and insert the contents of the code block. Run chmod a+x on the file to make it executable.
  2. Copy the URL of the web page linked from the GSAF email notifying you that your data is ready to download. This web page URL is the data key.
  3. Pass the data key to "" on the command line, as shown below. The key must be enclosed in double quotes.
  4. Execute the command to retrieve your data.
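Step 3 requires double quotes because an unquoted URL containing & makes the shell end the command at that character and run it in the background, so the script receives a truncated key. The hypothetical check_data_key function below (not part of the GSAF script) is a minimal sketch of a guard that could sit at the top of the download script. It can only catch a missing or extra argument, or an argument that does not start with http:// or https:// (an assumption about the key's form); it cannot detect a URL that was already cut off at an &.

```shell
#!/bin/bash
# Hypothetical helper, not part of the GSAF script: sanity-check the
# data-key argument before any download starts.
check_data_key() {
    # Exactly one argument (the quoted data key URL) is expected.
    if [ "$#" -ne 1 ]; then
        echo "Usage: give exactly one data key URL, in double quotes" >&2
        return 1
    fi
    # ASSUMPTION: the data key is an http:// or https:// web page URL.
    case "$1" in
        http://*|https://*)
            return 0
            ;;
        *)
            echo "Data key does not look like a web address: $1" >&2
            return 1
            ;;
    esac
}
```

Placed after the shebang line of the download script, a line such as check_data_key "$@" || exit 1 would stop the script before wget runs with a bad argument.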

More information:

The download relies on the iRODS/iweb system of TACC's Corral storage system, which is research-grade software.

When your data is available, you will receive an email from the GSAF with a link to a web page, which in turn links to single-use download tickets for your files.

You do NOT need a TACC account to access your data, but because you will be downloading from TACC, we advise downloading your data to a TACC resource for the fastest possible transfer speeds.

NOTE that the TACC web server providing download functionality does NOT have a proper security certificate, so you must use "--no-check-certificate" with your wget command.

The following suggested bash script takes as its only argument the URL of the link titled "Access your data for JAyynnn from sequencing run SAyynnn here." It fetches that web page (which can be accessed many times) and downloads each data file listed on it. It then verifies the integrity of the data by comparing each file's md5sum checksum against the one computed by the GSAF when your data was created.

#!/bin/bash
# Fetch the data-key web page (passed as the first argument, in double quotes)
wget -O files.html --no-check-certificate "$1"
# Download every .gz file listed in the page's gsafdata comment lines
for file in $(grep '^<!--gsafdata' files.html | grep '\.gz' | awk '{print $2}')
do
    echo "$file"
    url=$(grep -v json files.html | grep -m 1 "$file" | awk 'BEGIN {FS="\""} {print $2}')
    echo "Downloading: $url"
    wget -o "$file.wget.log" -O "$file" --no-check-certificate "$url"
done
# Build an md5sum check file ("checksum  filename") and verify every download
grep '^<!--gsafdata' files.html | grep '\.gz' | awk '{print $5"  "$2}' > md5.txt
numfiles=$(wc -l < md5.txt)
if md5sum -c md5.txt
then
    echo "Downloaded $numfiles files successfully."
else
    echo "Calculated md5sums do not match those provided by the GSAF.  Try requesting a new key and downloading again.  If that fails, contact the GSAF."
fi


To use this script, create a file on a Linux system called (for example) "". Insert the contents above into that file (for example, with the text editor nano), then run chmod a+x on it to make it executable.

Running it should look something like this (NOTE the double quotes around the web address): ""
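To see what the script's parsing and checksum steps do without downloading anything, the following offline rehearsal runs the same grep/awk pipeline on a made-up files.html. The sample page, its example.org URL, the placeholder file content, and the field3/field4 columns are all invented for illustration; only the column positions (file name in column 2 and checksum in column 5 of each <!--gsafdata line, download URL as the first double-quoted string on the first non-json line naming the file) are taken from the script. Like the script, it assumes GNU md5sum.

```shell
#!/bin/bash
# Offline rehearsal of the download script's parsing and checksum steps.
# ASSUMPTION: files.html below is invented; real GSAF pages may differ.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Placeholder "download" (not real gzip data) and its checksum
printf 'ACGT\n' > sample_R1.fastq.gz
sum=$(md5sum sample_R1.fastq.gz | awk '{print $1}')

# Hypothetical page: a link line, then a gsafdata comment line
cat > files.html <<EOF
<a href="https://example.org/ticket/123">sample_R1.fastq.gz</a>
<!--gsafdata sample_R1.fastq.gz field3 field4 $sum -->
EOF

# Same URL extraction as the script, printing instead of calling wget
for file in $(grep '^<!--gsafdata' files.html | grep '\.gz' | awk '{print $2}')
do
    url=$(grep -v json files.html | grep -m 1 "$file" | awk 'BEGIN {FS="\""} {print $2}')
    echo "Would download $file from: $url"
done

# Same check-file construction, then verification (prints "sample_R1.fastq.gz: OK")
grep '^<!--gsafdata' files.html | grep '\.gz' | awk '{print $5"  "$2}' > md5.txt
md5sum -c md5.txt

# Simulate a corrupted download: md5sum now reports FAILED and exits non-zero
printf 'corrupt\n' >> sample_R1.fastq.gz
if ! md5sum -c md5.txt; then
    echo "Checksum mismatch detected; request a new key and download again."
fi
```

In a real run, the differences are that files.html comes from the data-key URL and each file is fetched with wget before the checksums are compared.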



How long can I access my data through your data key system?

Individual data keys are valid for only a short period of time, but the GSAF keeps your data indefinitely (unless you request that it be deleted). Therefore, you can request new data access keys at any time from your job home page. Both your job submission email and your data delivery email contain a link to your job home page.

My data access key expired - how do I get access to my data again?

To get new data access keys, go to your job home page (follow the link in either your job submission email or your data delivery email), expand the "Sequencing Data" section, and click the link corresponding to the sequencing run from which you want data. Keys are sent by email to ALL email addresses listed in the job submission.

What do I do if my email address changed and I need to get new data access keys?

If your email address changes, please email the GSAF staff with the job number and both the old and new email addresses. We will update the email address associated with the job.

Why does my data come from AWS instead of TACC? Don't you use TACC to store data?

All data is archived at TACC on both their Ranch and Corral subsystems, but net data transfer speeds from these resources are more limited than from AWS. Consequently we use AWS as an intermediary for data delivery.

If the download does not work the first time, what should I try?

The most common error is leaving out the double quotes around the key. Another common cause is a script file that is not in Unix text format. If you created the file in a Windows environment, use the utility "dos2unix" to convert it to the correct format.
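If dos2unix is not installed, the Windows carriage returns can also be removed with the standard tr utility. The hypothetical fix_line_endings function below is a sketch, not a GSAF-provided tool: it checks a file for carriage returns, uses dos2unix when it is available, and otherwise falls back to tr. The fallback writes the cleaned text back into the original file rather than moving a new file into place, so the permissions set earlier with chmod a+x are kept.

```shell
#!/bin/bash
# Hypothetical helper: detect and strip Windows (CRLF) line endings in a file.
fix_line_endings() {
    local f="$1"
    # No carriage returns means the file is already in Unix format
    if ! grep -q $'\r' "$f"; then
        echo "$f already has Unix line endings"
        return 0
    fi
    if command -v dos2unix >/dev/null 2>&1; then
        # dos2unix converts the file in place
        dos2unix "$f"
    else
        # Delete every carriage return; copy back over the original file
        # (instead of mv) so its existing permissions are preserved
        tr -d '\r' < "$f" > "$f.tmp" && cat "$f.tmp" > "$f" && rm -f "$f.tmp"
    fi
    echo "$f converted to Unix line endings"
}
```

For example, fix_line_endings on the saved download script before running it again.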
