...
Here's an example of a "best practice". Wherever your permanent storage area is, it should have a rational sub-directory structure that reflects its contents. It's easy to process a few NGS datasets, but when they start multiplying like tribbles, good organization and naming conventions will be the only thing standing between you and utter chaos!
For example:
- original – for original sequencing data (compressed FASTQ files)
  - sub-directories named, for example, by year_month.<project_name>
- aligned – for alignment artifacts (BAM files, etc.)
  - sub-directories named, e.g., by year_month.<project_name>
- analysis – further downstream analysis
  - reasonably named sub-directories, often by project
- genome – reference genomes and other annotation files used in alignment and analysis
  - sub-directories for different reference genomes
    - e.g. ucsc/hg19, ucsc/sacCer3, mirbase/v20
- code – for scripts and programs you and others in your organization write
  - ideally maintained in a version control system such as Git, Subversion, or CVS
  - easiest to name sub-directories after the people who write the code
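As a rough illustration, the layout above could be created with a short Python script. This is only a sketch: the function name `make_layout`, the project label `2024_01.my_project`, and the user name `some_user` are hypothetical placeholders, and the storage root is whatever path you pass in.

```python
from pathlib import Path

# Hypothetical project label following the year_month.<project_name> convention.
EXAMPLE_PROJECT = "2024_01.my_project"

# Reference genome sub-directories taken from the examples in the list above.
EXAMPLE_GENOMES = ("ucsc/hg19", "ucsc/sacCer3", "mirbase/v20")


def make_layout(root, project=EXAMPLE_PROJECT, genomes=EXAMPLE_GENOMES,
                user="some_user"):
    """Create the suggested directory skeleton under `root`.

    `user` is a placeholder for a person's name under code/.
    Returns the list of directories that now exist.
    """
    root = Path(root)
    subdirs = [
        Path("original") / project,   # compressed FASTQ files
        Path("aligned") / project,    # BAM files and other alignment artifacts
        Path("analysis") / project,   # downstream analysis, here named by project
        Path("code") / user,          # scripts, ideally under version control
    ]
    subdirs += [Path("genome") / g for g in genomes]

    created = []
    for sub in subdirs:
        path = root / sub
        # parents=True builds intermediate directories (e.g. genome/ucsc);
        # exist_ok=True makes the call safe to repeat.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `make_layout("/path/to/your/storage")` (a placeholder path) once per new project keeps the naming consistent; because existing directories are left alone, it is safe to re-run.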
...