
...

Start an idev session (bash):
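# -m: minutes requested, -N: number of nodes, -A: allocation (project) to charge, -r: reservation name
idev -m 120 -N 1 -A OTH21164 -r CoreNGS-Tue
# -or- use the development queue (-p) instead of a reservation, for 90 minutes
idev -m 90 -N 1 -A OTH21164 -p development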

Data staging

Set ourselves up to process some yeast data in $SCRATCH, using some best practices for organizing our workflow.

Set up directory for working with FASTQs (bash):
# Create a $SCRATCH area to work on data for this course,
# with a sub-directory for pre-processing raw fastq files
mkdir -p $SCRATCH/core_ngs/fastq_prep

# Make symbolic links to the original yeast data:
cd $SCRATCH/core_ngs/fastq_prep
ln -s -f $CORENGS/yeast_stuff/Sample_Yeast_L005_R1.cat.fastq.gz
ln -s -f $CORENGS/yeast_stuff/Sample_Yeast_L005_R2.cat.fastq.gz

# or
ln -s -f ~/CoreNGS/yeast_stuff/Sample_Yeast_L005_R1.cat.fastq.gz
ln -s -f ~/CoreNGS/yeast_stuff/Sample_Yeast_L005_R2.cat.fastq.gz

# or
ln -s -f /work/projects/BioITeam/projects/courses/Core_NGS_Tools/yeast_stuff/Sample_Yeast_L005_R1.cat.fastq.gz
ln -s -f /work/projects/BioITeam/projects/courses/Core_NGS_Tools/yeast_stuff/Sample_Yeast_L005_R2.cat.fastq.gz
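After creating the links, it is worth checking that they actually resolve, since `ln -s` will happily create a link to a path that does not exist. Below is a minimal sketch in a throwaway directory; the file names are hypothetical stand-ins, not the course data. `ls -l` shows each link and its target, and the test `[ -e ]` follows the link, so it fails for a dangling one.

```bash
# Throwaway directory with a stand-in "original" file (hypothetical names, not the course data)
demo=$(mktemp -d)
mkdir -p "$demo/originals" "$demo/fastq_prep"
echo "stand-in data" > "$demo/originals/sample_R1.fastq.gz"

cd "$demo/fastq_prep"
# Same form as above: -s makes a symbolic link, -f replaces an existing link of that name
ln -s -f "$demo/originals/sample_R1.fastq.gz"
ln -s -f "$demo/originals/missing_R2.fastq.gz"   # target does not exist: a dangling link

# ls -l shows each link as "name -> target"
ls -l

# [ -e ] follows the link, so it is false when the target is missing
for f in *.fastq.gz; do
  if [ -e "$f" ]; then
    echo "OK:     $f -> $(readlink "$f")"
  else
    echo "BROKEN: $f -> $(readlink "$f")"
  fi
done
```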

...

Tip: Pathname wildcarding

The asterisk character ( * ) is a pathname wildcard that matches 0 or more characters.

Read more about pathname wildcards here: Pathname wildcards and special characters
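A small illustration, using empty stand-in files that share the yeast data's naming pattern plus a hypothetical notes.txt. The shell expands each pattern into the list of matching names before the command runs, which is why `echo` can show the expansion too.

```bash
# Empty stand-in files (for illustration only; not the real FASTQs)
demo=$(mktemp -d)
cd "$demo"
touch Sample_Yeast_L005_R1.cat.fastq.gz Sample_Yeast_L005_R2.cat.fastq.gz notes.txt

# * stands in for "1" or "2" here, so both FASTQ files match but notes.txt does not
ls Sample_Yeast_L005_R*.cat.fastq.gz

# * can also match zero characters: notes*.txt matches notes.txt
ls notes*.txt

# The shell, not ls, expands the pattern: echo prints the expanded argument list
echo *.gz
```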

Exercise: Roughly how big are the compressed files? The uncompressed files? What is the approximate compression factor?
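One way to approach the exercise is sketched below on a small generated stand-in file rather than the real yeast data. `ls -lL` reports sizes; the `-L` makes it report the size of a symlink's target rather than the link itself, which matters for the symlinks created earlier. `gunzip -c ... | wc -c` counts uncompressed bytes without writing a file, and the compression factor is the ratio of the two.

```bash
# Small, highly repetitive stand-in FASTQ (made-up reads, not sequencing data)
demo=$(mktemp -d)
cd "$demo"
for i in $(seq 1 1000); do
  printf '@read_%d\nACGTACGTACGTACGTACGT\n+\nIIIIIIIIIIIIIIIIIIII\n' "$i"
done > demo.fastq
gzip -c demo.fastq > demo.fastq.gz   # -c writes to stdout, so demo.fastq is kept

# Compressed size in bytes (field 5 of ls -l); -L follows symlinks to the real file
compressed=$(ls -lL demo.fastq.gz | awk '{print $5}')
# Uncompressed size: decompress to stdout and count bytes (nothing written to disk)
uncompressed=$(gunzip -c demo.fastq.gz | wc -c)

echo "compressed:   $compressed bytes"
echo "uncompressed: $uncompressed bytes"
# Compression factor = uncompressed / compressed
awk -v u="$uncompressed" -v c="$compressed" 'BEGIN { printf "factor: %.1fx\n", u / c }'
```

For the real files, `ls -lhL` gives human-readable sizes. `gzip -l` also reports an uncompressed size, but the gzip format stores that value modulo 4 GiB, so it is wrong for larger files; counting bytes from the stream is slower but reliable.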

...

Warning

Both gzip and gunzip are extremely I/O intensive when run on large files.

While TACC has tremendous compute resources and its specialized parallel file system is great, that file system has its limitations. It is not difficult to overwhelm the TACC file system if you gzip or gunzip more than a few files at a time – as few as 5-6!

The intensity of compression/decompression operations is another reason you should compress your sequencing files once (if they aren't already), then leave them that way.
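Two habits follow from this warning. The sketch below uses stand-in files with made-up names and reads, not the course data. It first compresses a batch of files while letting at most 2 `gzip` processes run at once via `xargs -P`; the limit of 2 is an arbitrary illustration, not a TACC guideline. It then reads a compressed file as a stream with `gunzip -c`, which writes decompressed data to standard output instead of to disk.

```bash
demo=$(mktemp -d)
cd "$demo"
# Stand-in uncompressed FASTQ files, 3 reads each (hypothetical names and reads)
for n in 1 2 3 4 5 6; do
  printf '@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n@r3\nTTAA\n+\nIIII\n' > "sample_$n.fastq"
done

# One file per gzip command (-n 1), at most 2 gzip processes at a time (-P 2).
# Assumes file names contain no whitespace.
printf '%s\n' *.fastq | xargs -n 1 -P 2 gzip

# Peek at the first record (4 lines) without writing an uncompressed copy
gunzip -c sample_1.fastq.gz | head -n 4

# Count reads: FASTQ uses 4 lines per read
lines=$(gunzip -c sample_1.fastq.gz | wc -l)
echo "reads: $(( lines / 4 ))"
```

On Linux, `zcat` is equivalent to `gunzip -c`; some macOS versions of `zcat` expect `.Z` files, so `gunzip -c` is the more portable spelling.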

...

  • q – quit
  • Ctrl-f or space – page forward
  • Ctrl-b – page backward
  • /<pattern> – search for <pattern> in forward direction
    • n – next match
    • N – previous match
  • ?<pattern> – search for <pattern> in backward direction
    • n – previous match going back
    • N – next match going forward

If you start less with the -N option, it will display line numbers.

Exercise: What line of small.fq contains the read name with grid coordinates 2316:10009:100563?
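Besides searching interactively in `less -N` (type `/2316:10009:100563` after opening the file), `grep -n` prints each matching line prefixed by its line number. Below is a sketch using a made-up two-record file in place of small.fq; the read names are invented, and only the grid coordinates come from the exercise.

```bash
demo=$(mktemp -d)
cd "$demo"
# Made-up stand-in for small.fq; read names are invented for illustration
printf '%s\n' \
  '@HWI-EXAMPLE:1:FC:5:2316:10009:100563 1:N:0:ACGT' 'ACGTACGT' '+' 'IIIIIIII' \
  '@HWI-EXAMPLE:1:FC:5:2316:10010:100600 1:N:0:ACGT' 'TTGGCCAA' '+' 'IIIIIIII' > demo.fq

# Interactive alternative (not run here): less -N demo.fq, then type /2316:10009:100563

# -n prefixes each matching line with "<line number>:"
grep -n '2316:10009:100563' demo.fq

# Keep only the line number (the text before the first colon)
grep -n '2316:10009:100563' demo.fq | cut -d: -f1
```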

...

So what if you want to see line numbers on your head or tail output? Neither command seems to have an option to do this.

...

So what is that vertical bar ( | ) all about? It is the pipe operator!

The pipe operator ( | ) connects one program's standard output to the next program's standard input. The power of the Linux command line is due in no small part to the power of piping. Read more about piping here: Piping. And read more about standard I/O streams here: Standard Streams in Unix/Linux.
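One common way to get line numbers on head or tail output is to pipe through `cat -n`, which numbers every line it reads. A sketch with a generated 10-line stand-in file; note that where `cat -n` sits in the pipeline changes which numbers you see.

```bash
demo=$(mktemp -d)
cd "$demo"
# A 10-line stand-in file: lines "a" through "j"
printf '%s\n' a b c d e f g h i j > demo.txt

# Number every line, then keep the first 3
cat -n demo.txt | head -n 3

# Number first, then keep the last 3: numbers are positions in the whole file (8, 9, 10)
cat -n demo.txt | tail -n 3

# Keep the last 3 first, then number: cat -n only sees 3 lines, so they become 1, 2, 3
tail -n 3 demo.txt | cat -n
```

To see where lines sit in the original file, number before cutting (`cat -n file | tail`); numbering after cutting only counts the lines that survived.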

...