...

  • Most sequencing facilities will give you compressed sequencing data files
    • gzip format (.gz extension) for individual files
    • tar or zip format for directories of files
  • Even with compression it's easy to run out of storage space!

You may be tempted to decompress your sequencing files to manipulate them more directly.

  • resist the temptation to gunzip!
  • nearly all modern bioinformatics tools are able to work on .gz files
  • there are techniques for working with compressed files without ever decompressing them
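As a minimal sketch of such techniques, the commands below stream a gzipped file through a pipe instead of writing a decompressed copy to disk. The file names and the two made-up reads are assumptions for illustration only; `gzip -dc` is used rather than `zcat` because it behaves the same on Linux and macOS.

```shell
# Create a tiny gzipped FASTQ file so the example is self-contained
# (hypothetical file name and reads, not real data).
printf '@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\nIIII\n' | gzip > sample.fastq.gz

# View the first record (4 lines) without creating a decompressed file.
gzip -dc sample.fastq.gz | head -n 4

# Count reads: each FASTQ record is 4 lines, so divide the line count by 4.
n_lines=$(gzip -dc sample.fastq.gz | wc -l)
n_reads=$((n_lines / 4))
echo "reads: $n_reads"

# Extract a subset and re-compress it in the same pipeline,
# so no uncompressed intermediate ever touches the disk.
gzip -dc sample.fastq.gz | head -n 4 | gzip > first_read.fastq.gz
```

The same pattern works with most line-oriented tools (`grep`, `awk`, `cut`): decompress to standard output, process the stream, and compress again if the result needs to be kept.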

arrange adequate storage space

  • At TACC
    • Obtain an allocation on TACC's corral disk array (initial 5 TB are no-cost)
    • Stage your active projects on corral or $WORK2
      • copy data to $SCRATCH for analysis
      • copy important analysis products back to corral or $WORK2
    • Periodically back up corral or $WORK2 directories to ranch tape archive
  • On a UT Biomedical Research Computing Facility (BRCF) "POD"
    • See https://wikis.utexas.edu/display/RCTFusers
      • Home and Work areas on POD servers are automatically backed up weekly
        • and archived to ranch every 4-6 months
    • GSAF customers can obtain a no-cost 2 TB allocation on the shared GSAF POD
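The TACC staging pattern described above (keep projects on corral or $WORK2, analyze in $SCRATCH, copy products back) can be sketched as follows. This is only an illustration: the `demo_work` and `demo_scratch` directories, the `projectA` name, and the line-count "analysis" are hypothetical stand-ins so the commands run anywhere, not actual TACC paths or tools.

```shell
# Stand-in directories (assumptions): on TACC these would be your
# corral or $WORK2 area and your $SCRATCH area.
work_dir="$PWD/demo_work"
scratch_dir="$PWD/demo_scratch"
mkdir -p "$work_dir/projectA" "$scratch_dir"
echo "raw data" > "$work_dir/projectA/reads.txt"

# 1. Copy the project data to scratch for analysis.
cp -r "$work_dir/projectA" "$scratch_dir/"

# 2. Run the analysis in scratch (a trivial line count stands in here).
wc -l < "$scratch_dir/projectA/reads.txt" > "$scratch_dir/projectA/line_count.txt"

# 3. Copy only the important analysis product back to long-term storage.
cp "$scratch_dir/projectA/line_count.txt" "$work_dir/projectA/"
```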

back up analysis artifacts regularly

  • All TACC users automatically have a 2 TB allocation on TACC's ranch tape archive system
    • larger allocations can be requested by project owners in the TACC User Portal
    • free! and under-utilized
  • Periodically back up your corral or $WORK2 directories to ranch tape archive
    • large directories should be combined first using the tar program
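A minimal sketch of combining a directory with `tar` before archiving it is shown below. The `my_project` directory and its contents are hypothetical, created only so the example is self-contained; the transfer to ranch itself is site-specific and is not shown.

```shell
# Hypothetical project directory, created here so the example runs anywhere.
mkdir -p my_project/results
echo "sample output" > my_project/results/summary.txt

# Bundle the whole directory into a single gzip-compressed tar file.
# Tape archives handle one large file far better than many small ones.
tar -czf my_project.tar.gz my_project

# List the archive contents to confirm it is readable before transferring it.
tar -tzf my_project.tar.gz
```

If the directory already consists mostly of .gz files, dropping the `z` option (`tar -cf my_project.tar my_project`) avoids spending time re-compressing data that will not shrink further.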

distinguish between types of data

...

While a project is active you will want to keep more intermediate artifacts for reference. Many of these can be removed after publication.

track your analysis steps

...