Here's an example of a "best practice". Wherever your permanent storage area is, it should have a rational sub-directory structure that reflects its contents. It's easy to process a few NGS datasets, but when they start multiplying like tribbles, good organization and naming conventions will be the only thing standing between you and utter chaos!

For example:

  • original – for original sequencing data (compressed FASTQ files)
    • sub-directories named, for example, by year_month.<project_name>
  • aligned – for alignment artifacts (BAM files, etc.)
    • sub-directories named, e.g., by year_month.<project_name>
  • analysis – further downstream analysis
    • reasonably named sub-directories, often by project
  • genome – reference genomes and other annotation files used in alignment and analysis
    • sub-directories for different reference genomes
    • e.g. ucsc/hg19, ucsc/sacCer3, mirbase/v20
  • code – for scripts and programs you and others in your organization write
    • ideally maintained in a version control system such as Git, Subversion, or CVS
    • easiest to name sub-directories after the people who write the code
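
The layout above can be set up with a small script. The following is only a sketch under stated assumptions: the function names `dated_name` and `make_layout`, the `YYYY_MM` rendering of year_month, and the example project, genome, and owner names are all illustrative and not part of the original recommendations.

```python
from datetime import date
from pathlib import Path

# Top-level areas taken from the list above.
TOP_LEVEL = ("original", "aligned", "analysis", "genome", "code")


def dated_name(project, when):
    """Build a year_month.<project_name> directory name.

    Assumption: year_month is rendered as YYYY_MM (e.g. 2014_05).
    """
    return f"{when:%Y_%m}.{project}"


def make_layout(root, project, when, genomes=(), owner=None):
    """Create the storage skeleton under root and return the project paths.

    Hypothetical helper: existing directories are left untouched.
    """
    root = Path(root)
    for name in TOP_LEVEL:
        (root / name).mkdir(parents=True, exist_ok=True)

    dated = dated_name(project, when)
    created = [
        root / "original" / dated,    # compressed FASTQ files
        root / "aligned" / dated,     # BAM files and other alignment artifacts
        root / "analysis" / project,  # downstream analysis, named by project
    ]
    # Reference genomes, e.g. "ucsc/hg19" or "mirbase/v20".
    created += [root / "genome" / genome for genome in genomes]
    # Code sub-directories are named after people.
    if owner is not None:
        created.append(root / "code" / owner)

    for path in created:
        path.mkdir(parents=True, exist_ok=True)
    return created


if __name__ == "__main__":
    # Illustrative values only; substitute your own storage root and names.
    for path in make_layout("ngs_storage", "my_project", date.today(),
                            genomes=["ucsc/hg19"], owner="your_name"):
        print(path)
```

Because every directory is created with `exist_ok=True`, the script can be re-run for each new project without disturbing what is already there.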
