...

There are three types of quoting in the shell:

  1. single quoting (e.g. 'some text') – this serves two purposes
    • it groups together all text inside the quotes into a single argument that is passed to the command
    • it tells the shell not to "look inside" the quotes to perform any evaluations
      • any environment variables in the text – or anything that looks like an environment variable – are not evaluated
      • no pathname globbing (e.g. *) is performed
  2. double quoting (e.g. "some text") – also serves two purposes
    • it groups together all text inside the quotes into a single argument that is passed to the command
    • it inhibits pathname globbing, but allows environment variable evaluation
  3. backtick quoting (e.g. `date`)
    • evaluates the expression inside the backticks
    • the resulting standard output of the expression replaces the backticked text
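
The three quoting styles can be compared side by side in a short session. This is only a minimal sketch: the variable name greeting and the helper function count_args are made up for illustration.

```shell
# count_args is a hypothetical helper that prints how many arguments it received
count_args() { echo $#; }

greeting="hello world"

# single quotes: one argument, nothing inside is evaluated
single=$(echo '$greeting *')           # the literal text: $greeting *

# double quotes: one argument, variables are evaluated but * is not globbed
double=$(echo "$greeting *")           # hello world *

# backticks: the standard output of the command replaces the backticked text
backtick=`echo nested`                 # nested

# quoting groups text into one argument; unquoted text is split on spaces
unquoted_count=$(count_args some text)      # 2
quoted_count=$(count_args 'some text')      # 1

echo "$single | $double | $backtick | $unquoted_count | $quoted_count"
```

The final echo should print the five values separated by " | ", which makes it easy to see which text was evaluated and which was passed through untouched.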

...

Sitting at the computer, you should have some idea what you need to do. There's probably a command to do it. If you have some idea what it starts with, you can type a few characters and hit Tab twice to get some help. If you have no idea, you Google it or ask someone else.

...

Most built-in commands in Linux use a common syntax to ask more of a command. They usually add a dash ( - ) followed by a code letter that names the added function. These "command line switches" are called options.

Options are, well, optional – you only add them when you need them. The parts of the command line after the options, like filenames, are called arguments. Arguments can also be optional, but you can tell them from options because they don't start with a dash.

Code Block
languagebash
titleUseful options for ls
# long listing option (-l)
ls -l

# long listing (-l), all files (-a) and human readable file sizes (-h) options. $HOME is an argument (directory name)
ls -l -a -h $HOME

# sort by modification time (-t) displaying a long listing (-l) that includes the date and time
ls -lt

Almost all built-in Linux commands, and especially NGS tools, use options heavily.

Like dialects in a language, there are at least three basic schemes by which commands/programs accept options:

  1. Single-letter short options, which start with a single dash ( - ) and can often be combined, like:

    Code Block
    languagebash
    titleExamples of different short options
    head -20 # show 1st 20 lines
    ls -lhtS # equivalent to ls -l -h -t -S
    
  2. Long options use the convention that double dashes ( -- ) precede the multi-character option name, and they can never be combined. By convention, a long option is separated from its value by an equals sign ( = ), although most programs also let you use a space as the separator. Strictly speaking, long options are a GNU extension: the POSIX standard (see https://en.wikipedia.org/wiki/POSIX) only describes single-letter options. Here's an example using the mira genome assembler:

    Code Block
    languagebash
    titleExample of long options
    mira --project=ct --job=denovo,genome,accurate,454 -SK:not=8
    
  3. Word options, illustrated in the GATK command line to call SNPs below. Word options combine aspects of short and long options – they usually start with a single dash ( - ), but can be multiple letters and are never combined. Sometimes the option (e.g. java's -Xms initial memory heap size option) and its value (512m, which means 512 megabytes) are smashed together. Other times a multi-letter switch and its value are separated by a space (e.g. -glm BOTH).

    Code Block
    languagebash
    titleExamples of word options
    java -d64 -Xms512m -Xmx4g -jar /work/projects/BioITeam/common/opt/GenomeAnalysisTK.jar -glm BOTH -R $reference -T UnifiedGenotyper -I $outprefix.realigned.recal.bam --dbsnp $dbsnp -o $outprefix.snps.vcf -metrics snps.metrics -stand_call_conf 50.0 -stand_emit_conf 10.0 -dcov 1000 -A DepthOfCoverage -A AlleleBalance
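
The short and long option styles can be tried out with a command most Linux systems already have. The sketch below assumes the GNU version of head, whose --lines long option is the counterpart of the short -n option; other versions of head (for example on macOS) may not accept the long forms. The variable names are made up for illustration.

```shell
# three lines of sample input
sample=$(printf 'one\ntwo\nthree\n')

# short option with a separate value
short=$(echo "$sample" | head -n 2)

# long option joined to its value with an equals sign
long_equals=$(echo "$sample" | head --lines=2)

# long option separated from its value by a space
long_space=$(echo "$sample" | head --lines 2)

# with GNU head, all three forms select the same first two lines
[ "$short" = "$long_equals" ] && [ "$short" = "$long_space" ] && echo "same output"
```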
    

...

Getting help

So you've noticed that options can be complicated – not to mention program arguments. Some options have values and others don't. Some are short, others long. How do you figure out what kinds of functions a command (or NGS tool) offers? You need help!

--help option

Many (but not all) built-in shell commands will give you some help if you provide the long --help option. This can often be many pages, so you'll probably want to pipe the output to a pager like more. This is most useful to remind yourself what the name of that dang option was, assuming you know something about it.
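
As a small illustration, here is one way to peek at the help for ls without flooding the Terminal. It assumes your version of ls accepts --help (the GNU version on Linux does); the variable name help_text is made up for illustration.

```shell
# capture the help text; 2>&1 also catches commands that print help to standard error
help_text=$(ls --help 2>&1)

# show only the first 5 lines
echo "$help_text" | head -5

# to page through all of it interactively instead (press q to quit):
#   ls --help | more
```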

...

Many 3rd party tools will provide extensive usage information if you just type the program name then hit Enter.

For example:

Code Block
titleUse the program name alone as a command to get help
module load bwa
bwa

...

Code Block
titlebwa top-level help information
Program: bwa (alignment via Burrows-Wheeler transformation)
Version: 0.7.12-r1039
Contact: Heng Li <lh3@sanger.ac.uk>

Usage:   bwa <command> [options]

Command: index         index sequences in the FASTA format
         mem           BWA-MEM algorithm
         fastmap       identify super-maximal exact matches
         pemerge       merge overlapping paired ends (EXPERIMENTAL)
         aln           gapped/ungapped alignment
         samse         generate alignment (single ended)
         sampe         generate alignment (paired ended)
         bwasw         BWA-SW for long queries

         shm           manage indices in shared memory
         fa2pac        convert FASTA to PAC format
         pac2bwt       generate BWT from PAC
         pac2bwtgen    alternative algorithm for generating BWT
         bwtupdate     update .bwt to the new format
         bwt2sa        generate SA from BWT and Occ

Note: To use BWA, you need to first index the genome with `bwa index'.
      There are three alignment algorithms in BWA: `mem', `bwasw', and
      `aln/samse/sampe'. If you are not sure which to use, try `bwa mem'
      first. Please `man ./bwa.1' for the manual.

Notice that bwa, like many NGS programs, is written as a set of sub-commands. This top-level help displays the sub-commands available. You then type bwa <command> to see help for the sub-command:

...

Code Block
titlebwa index help information
Usage:   bwa index [options] <in.fasta>

Options: -a STR    BWT construction algorithm: bwtsw or is [auto]
         -p STR    prefix of the index [same as fasta name]
         -b INT    block size for the bwtsw algorithm (effective with -a bwtsw) [10000000]
         -6        index files named as <in.fasta>.64.* instead of <in.fasta>.*

Warning: `-a bwtsw' does not work for short genomes, while `-a is' and
         `-a div' do not work not for long genomes. Please choose `-a'
         according to the length of the genome.

Google

If you don't already know much about a command (or NGS tool), just Google it! Try something like "bwa manual" or "rsync man page". Many tools have websites that combine tool overviews with detailed option help. Even for built-in Linux commands, you're likely to get hits of a tutorial style, which are more useful when you're getting started.

...

  • ls - list the contents of the specified directory (the current directory if none is given)
    • -l says produce a long listing (including file permissions, sizes, owner and group)
    • -a says show all files, even normally-hidden dot files whose names start with a period ( . )
    • -h says to show file sizes in human readable form (e.g. 12M instead of 12201749)
  • cd <whereto> - change the current working directory to <whereto>. Some special <wheretos>:
    •  .. (period, period) means "up one level"
    •  ~ (tilde) means "my home directory"
  • file <file> tells you what kind of file <file> is
  • df shows you the file systems mounted on the system you're working on, along with how much disk space is available on each
    • -h says to show sizes in human readable form (e.g. 12G instead of 12318201749)
  • pwd - display the present working directory
    • -P says to display the full absolute path, with any symbolic links resolved
    • the format is something like /home/myID
    • just like on most computer systems, this represents a location in the tree of the file system structure, also called a "path"
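
The navigation commands above can be practiced safely in a throwaway directory. This is a minimal sketch; the scratch directory comes from mktemp -d, and the names scratch, data, here and up are made up for illustration.

```shell
# make a scratch directory to explore (mktemp -d prints the new directory's name)
scratch=$(mktemp -d)
mkdir "$scratch/data"

cd "$scratch/data"        # change into the new directory
here=$(pwd -P)            # full path, with symbolic links resolved
cd ..                     # go up one level, into the scratch directory
up=$(pwd -P)

ls -l -a "$scratch"       # long listing, including hidden dot files
df -h "$scratch"          # space on the file system that holds the scratch directory

cd ~                      # back to the home directory
rm -rf "$scratch"         # clean up the scratch directory
```

After the two cd commands, the path saved in here should be the path saved in up with /data added to the end.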

Create, rename, link to, delete files

  • touch <file> – create an empty file, or update the modification timestamp on an existing file
  • mkdir -p <dirname> – create directory <dirname>
    • -p says to create any needed parent directories also
  • mv <file1> <file2> – renames <file1> to <file2>
    • mv <file1> <file2> ... <fileN> <dir>/  – moves files <file1> <file2> ... <fileN> into directory <dir>
  • ln -s <path> creates a symbolic (-s) link to <path> in the current directory
    • default link name corresponds to the last name component in <path>
    • always change into (cd) the directory where you want the link before executing ln -s
    • a symbolic link can be deleted without affecting the linked-to file
  • rm <file> deletes a file. This is permanent - not a "trash can" deletion.
    • rm -rf <dirname> deletes an entire directory and everything in it (be careful!)
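
Here is one way the commands in this section might fit together, run inside a scratch directory so nothing real is touched. It is only a sketch: all file and directory names are made up, and it assumes your ln accepts a single path and names the link after its last component (as the GNU version does).

```shell
scratch=$(mktemp -d)              # scratch directory for the experiment
cd "$scratch"

mkdir -p project/raw/run1         # -p creates project and project/raw as needed
touch notes.txt                   # create an empty file
mv notes.txt readme.txt           # rename it
mv readme.txt project/            # move it into a directory

cd project/raw/run1               # change into the directory where the link should live
ln -s "$scratch/project/readme.txt"      # link name defaults to readme.txt
link_target=$(readlink readme.txt)       # the path the link points to

rm readme.txt                     # deletes only the link
[ -f "$scratch/project/readme.txt" ] && still_there=yes   # the linked-to file survives

cd ~
rm -rf "$scratch"                 # remove the whole scratch tree (be careful!)
```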

Displaying file contents

  • cat outputs all the contents of its input (one or more files and/or standard input)
    • CAUTION – only use on small files!
  • zcat <file.gz> like cat, but understands the gzip (.gz) format, and decompresses the data before writing it to standard output
    • CAUTION – only use on small files!
    • Another CAUTION – does not understand .zip or .bz2 compression formats
  • more and less "pagers"
    • both display their (possibly very long) input one Terminal "page" at a time
    • in more:
      • use spacebar to advance a page
      • use q or Ctrl-c to exit more
    • in less:
      • q – quit
      • Ctrl-f or space – page forward
      • Ctrl-b – page backward
      • /<pattern> – search for <pattern> in forward direction
        • n – next match
        • N – previous match
      • ?<pattern> – search for <pattern> in backward direction
        • n – previous match going back
        • N – next match going forward
  • head and tail
    • show you the top or bottom 10 lines (by default) of their input
    • head -20 shows the top 20 lines
    • tail -2 shows the last 2 lines
    • tail -n +100 shows lines starting at line 100
    • tail -f shows the last lines of a file, then follows the output as more lines are written (Ctrl-c to quit)
  • gunzip -c <file.gz> | more (or less) – like zcat, uncompresses lines of <file.gz> and outputs them to standard output
    • <file.gz> is not altered on disk
    • always pipe the output to a pager!
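
The head, tail and gunzip -c behavior described above can be checked on a small generated file. This is a minimal sketch; the file names and variable names are made up, and seq is used only to produce numbered lines.

```shell
scratch=$(mktemp -d)
seq 1 200 > "$scratch/numbers.txt"                           # a file with the lines 1 to 200
gzip -c "$scratch/numbers.txt" > "$scratch/numbers.txt.gz"   # a compressed copy

top=$(head -3 "$scratch/numbers.txt")              # lines 1, 2 and 3
bottom=$(tail -2 "$scratch/numbers.txt")           # lines 199 and 200
from_199=$(tail -n +199 "$scratch/numbers.txt")    # everything from line 199 on

# uncompress to standard output and keep only the first 3 lines;
# the .gz file on disk is not altered
unzipped_top=$(gunzip -c "$scratch/numbers.txt.gz" | head -3)

echo "$top"
echo "$bottom"
rm -rf "$scratch"                                  # clean up
```

Because the file has exactly 200 lines, tail -2 and tail -n +199 select the same two lines here.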

...