
Short summary: 

If you have access to Lonestar or Stampede, we recommend you incorporate the BioITeam start-up script into your profile as described here, and then run the command gsaf_download.sh to download your data.  The content of the file "gsaf_download.sh" is shown in the code block below. 

To use the script below, use this general step-wise procedure for downloading your data onto any unix system:

  1. Create a file on the unix system called "gsaf_download.sh" and insert the contents of the code block. Run chmod a+x gsaf_download.sh to make the script executable.
  2. Copy the URL of the web page that is linked from the GSAF email notifying you that the data is ready to download. This web page URL is the data key.
  3. Pass the data key to "gsaf_download.sh" as its command-line argument, as shown below. The key must be enclosed in double quotes.
  4. Execute the command to retrieve your data.
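
Step 3 insists on double quotes because a data key URL contains "&" characters (see the example command on this page). An unquoted "&" is a command separator in the shell, so the script would receive only the part of the URL before the first "&". The sketch below is one way to catch a truncated key before downloading. The function name looks_like_full_key is hypothetical, and the check assumes real keys carry the same Expires and Signature parameters as the example key.

```shell
#!/bin/bash
# looks_like_full_key URL: succeed if URL starts with http and still contains
# both the Expires= and Signature= query parameters.
# Assumption: real data keys follow the pattern of the example key on this page.
looks_like_full_key() {
    case "$1" in
        http*Expires=*Signature=*|http*Signature=*Expires=*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, looks_like_full_key "$1" || echo "Key looks truncated; did you quote it?" could be run at the top of a wrapper script before the download starts.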

More information:

The underlying iRODS/iweb system that the download relies on is the TACC Corral system, which is research-grade software.

When your data is available, you will receive an email from the GSAF with a link to a web page, which in turn has links to the single-use tickets for your files.

You do NOT need a TACC account to access your data, but since you will be downloading from TACC we advise you to download your data to a TACC resource for the fastest download speeds possible.  

NOTE that the TACC web server providing download functionality does NOT have a proper security certificate, so you must use "--no-check-certificate" with your wget command.

The following suggested bash script takes the URL of the link entitled "Access your data for JAyynnn from sequencing run SAyynnn here."  It fetches that web page (which is accessible many times) and starts downloading the data files.  It then compares the md5sum checksum of each file to the one computed by the GSAF when your data was created to verify the integrity of the data.

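#!/bin/bash
# Usage: gsaf_download.sh "<data key URL>"

# Fetch the web page that lists the data files.
wget -O files.html "$1"

# Each '<!--gsafdata' line describes one file: field 2 is used as the file name, field 5 as its md5sum.
for file in $(grep '^<!--gsafdata' files.html | grep '\.gz' | awk '{print $2}')
do
    echo "$file"
    # The download URL is the first double-quoted string on the first non-JSON line that names the file.
    url=$(grep -v json files.html | grep -m 1 -F -- "$file" | awk 'BEGIN {FS="\""} {print $2}')
    echo "Downloading: $url"
    wget -o "$file.wget.log" -O "$file" --no-check-certificate "$url"
done

# Write the checksums in the "md5sum  filename" format that md5sum -c reads.
grep '^<!--gsafdata' files.html | grep '\.gz' | awk '{print $5"  "$2}' > md5.txt
numfiles=$(wc -l < md5.txt)

# Verify every downloaded file against the checksum provided by the GSAF.
if md5sum -c md5.txt
then
    echo "Downloaded $numfiles files successfully."
else
    echo "Calculated md5sums do not match those provided by the GSAF.  Try requesting a new key and downloading again.  If that fails, contact the GSAF."
fi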

 

To use this script, create a file on a linux system called (for example) "gsaf_download.sh".  Insert the contents above into that file (for example with the text editor nano), then run chmod a+x gsaf_download.sh to make it executable.

Running it should look something like this (NOTE the use of quotations for the web address):

gsaf_download.sh "http://gsaf.s3.amazonaws.com/JA14227.SA14043.html?AWSAccessKeyId=AKIAJ724J4ZGKJIUA6XA&Expires=1401714026&Signature=obZfDLiPkrZjfieKheJwOe4nbd%2Bs%3D"
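
The script leaves its checksum list in md5.txt, so the integrity check can be repeated later without downloading again. The following is a minimal sketch of that check, assuming GNU md5sum is available. The function name verify_downloads and the demo file name are hypothetical and are not part of the GSAF script.

```shell
#!/bin/bash
# verify_downloads LIST: check each file named in LIST against its checksum.
# LIST holds "checksum  filename" lines, the format that md5sum -c reads.
# Run it from the directory that holds the downloaded files.
verify_downloads() {
    if md5sum -c "$1"; then
        echo "All files listed in $1 verified."
    else
        echo "At least one file listed in $1 failed verification." >&2
        return 1
    fi
}

# Demo on a stand-in file (hypothetical name, not real sequencing data).
demo_dir=$(mktemp -d)
(
    cd "$demo_dir" || exit 1
    printf 'example reads\n' > sample_R1.fastq.gz
    md5sum sample_R1.fastq.gz > md5.txt
    verify_downloads md5.txt
)
rm -rf "$demo_dir"
```

In the download directory, verify_downloads md5.txt repeats the same comparison that the script runs as its last step.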

 

FAQ:

 How long can I access my data through your data key system?

Specific data keys are only valid for a short period of time, but the GSAF keeps your data indefinitely (unless you request that it be deleted). You can therefore request new data access keys at any time from your job home page. Both your job submission email and your data delivery email have a link to your job home page.

 My data access key expired - how do I get access to my data again?

To get new data access keys, go to your job home page (follow the link in either your job submission email or your data delivery email), expand the "Sequencing Data" section, and click the link corresponding to the sequencing run from which you want data. Keys arrive via email to ALL email addresses listed in the job submission.

 What do I do if my email address changed and I need to get new data access keys?

If your email address changes, please email the GSAF staff with the job number and both the old and new email address. We will update the email address associated with the job.

 Why does my data come from AWS instead of TACC? Don't you use TACC to store data?

All data is archived at TACC on both their Ranch and Corral subsystems, but net data transfer speeds from these resources are more limited than from AWS. Consequently we use AWS as an intermediary for data delivery.

 If the download does not work the first time, what should I try?

The most common error is omitting the double quotes around the key.  Another possibility is that the text is not in unix format.  If you use the Windows environment, use the utility "dos2unix" to convert it to the correct format.
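
To check whether a file has Windows line endings, and to convert it when "dos2unix" is not installed, something like the sketch below may help. It assumes the affected file is the script itself, for instance after pasting it from a Windows editor. The function names has_crlf and to_unix are hypothetical. Unlike dos2unix, to_unix removes every carriage return in the file, not only those at line ends.

```shell
#!/bin/bash
# has_crlf FILE: succeed if FILE contains carriage returns (Windows line endings).
has_crlf() {
    grep -q "$(printf '\r')" "$1"
}

# to_unix FILE: strip carriage returns in place while keeping the file's permissions.
to_unix() {
    to_unix_tmp=$(mktemp) || return 1
    tr -d '\r' < "$1" > "$to_unix_tmp" && cat "$to_unix_tmp" > "$1"
    to_unix_status=$?
    rm -f "$to_unix_tmp"
    return "$to_unix_status"
}
```

For example, has_crlf gsaf_download.sh && to_unix gsaf_download.sh converts the script only when it actually contains carriage returns.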
