
This page should serve as a reference for the many "things Linux" we use in this course.

Getting around in the shell

Important keyboard shortcuts

Type as little and as accurately as possible by using keyboard shortcuts!

Tab key completion

The Tab key is your best friend! Hit the Tab key once or twice - it's almost always magic! Hitting Tab invokes "shell completion", instructing the shell to try to guess what you're doing and finish the typing for you. On most modern Linux shells, Tab completion will:

  • complete file or directory names up to any ambiguous part (single Tab)
    • if nothing shows up, there is no unambiguous match
  • display all possible completions (Tab twice)
    • you then decide where to go next
  • work for shell commands too (like rsync or chmod)

Up arrow

Use "up arrow" to retrieve any of the last 500 commands you've typed (bash's default; the HISTSIZE variable sets how many are kept), going backwards through your history. You can then edit a command and hit Enter (even with the cursor in the middle of the line) and the shell will run it. The down arrow "scrolls" forward from where you are in the command history.

Ctrl-a, Ctrl-e

You can use Ctrl-a (holding down the "control" key and "a") to jump the cursor right to the beginning of the line. The omega to that alpha is Ctrl-e, which jumps the cursor to the end of the line. Arrow keys work, and Ctrl-arrow will skip by word forward and backward.

Wildcards and special file names

The shell has shorthand to refer to groups of files by allowing wildcards in file names.

* (asterisk) is the most common filename wildcard. It matches "any length of any characters", including no characters at all.

Other useful ones are brackets ( [ ] ), which match any one character from the list of characters between the brackets. Inside the brackets you can use a hyphen ( - ) to specify a range of characters.

For example:

  • ls *.bam – lists all files in the current directory that end in .bam
  • ls [A-Z]*.bam – does the same, but only if the first character of the file name is a capital letter
  • ls [ABab]*.bam – lists all .bam files whose 1st letter is A, B, a or b.
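To try these patterns without touching real data, here is a minimal sketch that makes a throwaway directory (with mktemp -d) full of empty files whose names are made up for illustration, then runs the same ls commands on them. The order ls prints names in can depend on your locale settings.

```shell
# Throwaway directory with empty placeholder files (made-up names)
demo=$(mktemp -d)
cd "$demo"
touch Apple.bam banana.bam Cherry.bam date.bam notes.txt

ls *.bam          # the four .bam files, but not notes.txt
ls [A-Z]*.bam     # Apple.bam and Cherry.bam (with the usual settings)
ls [ABab]*.bam    # Apple.bam and banana.bam
```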

Three special file names:

  1. . (single period) means "this directory".
  2. .. (two periods) means "directory above current." So ls -l .. means "list contents of the parent directory."
  3. ~ (tilde) means "my home directory".
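Here is a small sketch of the three special names in action. The project/data directories are hypothetical and live inside a throwaway directory made with mktemp -d.

```shell
# Hypothetical project/data directories inside a throwaway directory
demo=$(mktemp -d)
mkdir -p "$demo/project/data"
cd "$demo/project/data"

ls -a .     # "." is this directory: it's empty, so only . and .. are listed
ls ..       # ".." is the parent (project), so this lists: data
cd ..       # move up one level
pwd         # now ends in /project
echo ~      # "~" expands to your home directory, the same as $HOME
```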

Environment variables

Environment variables are just like variables in a programming language (in fact bash is a complete programming language): they are "pointers" that reference data assigned to them. In bash, you assign an environment variable as shown below:

Set an environment variable
export varname="Some value, here it's a string"

Be careful – do not put spaces around the equals sign when assigning environment variable values. Also, always use double quotes if your value contains (or might contain) spaces.

You set environment variables using the bare name (varname above).

You then refer to or evaluate an environment variable using a dollar sign ( $ ) before the name:

Refer to an environment variable
echo $varname
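A quick sketch of the two pitfalls mentioned above, using a made-up variable name (sample_name). Notice the double quotes when referring to the value too; they keep the spaces in it intact.

```shell
# Right: no spaces around "=", double quotes around a value with spaces
sample_name="mouse liver rep 1"
echo "$sample_name"       # mouse liver rep 1

# Wrong: with spaces around "=", the shell looks for a command named
# sample_name, fails, and leaves the variable unchanged
# (2>/dev/null hides the "command not found" error message)
sample_name = "oops" 2>/dev/null || echo "not an assignment!"

# Wrong (left commented out): without quotes, the shell treats
# sample_name=mouse as a setting for one command, then tries to run
# a command named "liver"
# sample_name=mouse liver rep 1
```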

Using the export keyword when you set a variable ensures that any sub-processes you invoke will inherit its value. Without export, only the current shell process will have that variable set.

Use the env command to see all the environment variables you currently have set.
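The difference export makes is easy to see by starting a child shell with sh -c and asking it to print the variable. This is a minimal sketch; greeting is just an example name.

```shell
greeting="hello"                       # a plain shell variable
sh -c 'echo "child sees: [$greeting]"' # child sees: []

export greeting                        # now mark it for export
sh -c 'echo "child sees: [$greeting]"' # child sees: [hello]

env | grep '^greeting='                # greeting=hello
```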

Using Commands

Command options

Sitting at the computer, you should have some idea what you need to do. There's probably a command to do it. If you have some idea what it starts with, you can type a few characters and hit Tab twice to get some help. If you have no idea, you Google it or ask someone else.

Once you know a basic command, you'll soon want it to do a bit more - like seeing the sizes of files in addition to their names.

Most built-in commands in Linux use a common syntax to ask more of a command. They usually add a dash ( - ) followed by a code letter that names the added function. These "command line switches" are called options.

Options are, well, optional – you only add them when you need them. The parts of the command line after the options, like filenames, are called arguments. Arguments can also be optional, but you can tell them apart from options because they don't start with a dash.

Useful options for ls
# long listing option (-l)
ls -l

# long listing (-l), all files (-a) and human readable file sizes (-h) options. $HOME is an argument (directory name)
ls -l -a -h $HOME

# sort by modification time (-t) displaying a long listing (-l) that includes the date and time
ls -lt

Almost all commands, and especially NGS tools, use options heavily.

Like dialects in a language, there are at least three basic schemes in which commands/programs accept options:

  1. Single-letter short options, which start with a single dash ( - ) and can often be combined, like:

    Examples of different short options
    head -20 # show 1st 20 lines
    ls -lhtS # equivalent to ls -l -h -t -S
    
  2. Long options use the convention that double dashes ( -- ) precede the multi-character option name, and they can never be combined. Strictly speaking, long options are a GNU convention (the POSIX standard only describes single-letter options), and the GNU style separates a long option from its value with an equals sign ( = ). But most programs let you use a space as separator also. Here's an example using the mira genome assembler:

    Example of long options
    mira --project=ct --job=denovo,genome,accurate,454 -SK:not=8
    
  3. Word options, illustrated in the GATK command line to call SNPs below. Word options combine aspects of short and long options – they usually start with a single dash ( - ), but can be multiple letters and are never combined. Sometimes the option (e.g. java's -Xms initial memory heap size option) and its value (512m, which means 512 megabytes) are smashed together. Other times a multi-letter switch and its value are separated by a space (e.g. -glm BOTH).

    Examples of word options
    java -d64 -Xms512m -Xmx4g -jar /work/01866/phr254/gshare/Tools_And_Programs/bin/GenomeAnalysisTK.jar -glm BOTH -R $reference -T UnifiedGenotyper -I $outprefix.realigned.recal.bam --dbsnp $dbsnp -o $outprefix.snps.vcf -metrics snps.metrics -stand_call_conf 50.0 -stand_emit_conf 10.0 -dcov 1000 -A DepthOfCoverage -A AlleleBalance
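One way to convince yourself that these different spellings mean the same thing is to run them side by side. This sketch assumes GNU sort (standard on most Linux systems, and it accepts both short and long options) and a small made-up file of gene names and counts.

```shell
cd "$(mktemp -d)"                                     # scratch space
printf 'geneB 30\ngeneA 5\ngeneC 12\n' > counts.txt   # made-up counts

# Short options, combined and separate: sort on field 2, numerically, reversed
sort -k 2 -nr counts.txt
sort -k 2 -n -r counts.txt

# The same thing with GNU long options, using "=" or a space before the value
sort --key=2 --numeric-sort --reverse counts.txt
sort --key 2 --numeric-sort --reverse counts.txt
# each of the four prints geneB 30, geneC 12, geneA 5 (one per line)
```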
    

Getting help

So you've noticed that options can be complicated – not to mention program arguments. Some options have values and others don't. Some are short, others long. How do you figure out what kinds of functions a command (or NGS tool) offers? You need help!

--help option

Many (but not all) built-in shell commands will give you some help if you provide the long --help option. This can often be many pages, so you'll probably want to pipe the output to a pager like more. This is most useful to remind yourself what the name of that dang option was, assuming you know something about it.
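For example, assuming GNU ls (the version on most Linux systems; the macOS ls doesn't accept --help), you could page through its help, or search it for the option you half remember:

```shell
# Page through the whole help text (press q to quit more early)
ls --help | more

# Or pull out just the lines mentioning the option you're after;
# the -- tells grep that '--human-readable' is a pattern, not a grep option
ls --help | grep -- '--human-readable'
```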

-h option

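The -h option is similar to --help. If --help doesn't work, try -h. Again, output can be lengthy and is most useful if you already have an idea what the program does. Be aware that -h doesn't always mean help: for ls, for example, it means "human readable" file sizes.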

just type the program name

Many 3rd party tools will provide extensive usage information if you just type the program name then hit Enter.

For example, typing bwa by itself and hitting Enter lists bwa's top-level sub-commands.

Google

If you don't already know about a command (or NGS tool), just Google it. Try something like "bwa manual" or "rsync man page". Many tools have websites that combine tool overviews with detailed option help.

man pages

Man pages - Unix, and therefore Linux, has had built-in help files since the early 1970s, way before Macs or PCs thought of such things. In Linux they're called man pages - short for "manual"; it's not a gender thing (I assume). man intro will give you an introduction to all user commands.

man pages should detail all options available for a command.

Exercise:

Try "man grep", or "man du", or "man sort" - you'll want these sometime.

Tip: Type the letter q to quit man, j (or Enter) and k to move down and up by line, and spacebar and b to move down and up by page. Want to search? Just hit the slash key /, enter the search word and hit Enter. These are actually the key bindings of the less pager, which man uses to display pages.

Sometimes man lets you down - no man page. Don't fret, try one of these:

  1. Just type in the command and hit Enter - it will usually try to help you.
  2. Type the command followed by one of -h, -help, --help or -? and it may give you some help.
    Sometimes the command by itself will give you short help, and will list the magic option for full help.

Basic linux commands you need to know

Here's a copy of the cheat sheet we passed out.

And here's a set of commands you should know, by category.

File system navigation

  • ls - list the contents of the current directory
  • cd <whereto> - change the present working directory to <whereto>. Some special <wheretos>:
    • .. (period, period) means "up one level"
    • ~ (tilde) means "my home directory"
  • file <file> tells you what kind of file <file> is
  • df shows the file systems mounted on the system you're working on, along with how much disk space each one has used and has available
  • pwd - display the present working directory. The format is something like /home/myID - just like on most computer systems, it lists the directories from the top of the file system tree (/) down to where you are, and is also called a "path".
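A short practice session with these commands might look like the sketch below. The scratch directory from mktemp -d and the notes.txt file are made up for the demo, and the exact output of file and df will vary from system to system.

```shell
cd ~                      # jump to your home directory
pwd                       # e.g. /home/myID
scratch=$(mktemp -d)      # a throwaway directory to practice in
mkdir -p "$scratch/sub"
cd "$scratch/sub"         # change to another directory
cd ..                     # up one level, back to $scratch
pwd                       # prints the same path as $scratch
echo "hello" > notes.txt
file notes.txt            # typically: notes.txt: ASCII text
df -h .                   # the file system holding this directory: size, used, available
```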

Create and delete files

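  • mkdir -p <dirname> creates directory <dirname>; -p also creates any missing parent directories and doesn't complain if <dirname> already exists.
  • rm <file> deletes a file. This is permanent - not a "trash can" deletion.
  • ln -s <target> <linkname> creates a symbolic link named <linkname> that points to <target>.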
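Here is a minimal sketch of these three commands working together in a throwaway directory. The results/2024/run1 directory and the latest link are hypothetical names chosen for the example.

```shell
cd "$(mktemp -d)"                      # scratch space for the demo
mkdir -p results/2024/run1             # also creates results/ and results/2024/
mkdir -p results/2024/run1             # no error even though it already exists
echo "some data" > results/2024/run1/out.txt

ln -s results/2024/run1 latest         # link named "latest" pointing at run1
ls -l latest                           # ... latest -> results/2024/run1
cat latest/out.txt                     # some data (read through the link)

rm latest                              # deletes only the link, not run1
rm results/2024/run1/out.txt           # gone for good: no trash can
```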

Displaying file contents

  • cat <file> outputs all the contents of <file> - CAUTION - only use on small files.
  • more <file> and less <file> both display the contents of <file> one screen at a time. Read the bit above about man to figure out how to navigate and search when using less.
  • head <file> and tail <file> show you the first or last 10 lines of <file>.
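To see head and tail at work, this sketch uses seq to make a hypothetical 100-line file, one number per line, so it's obvious which lines each command prints. The -n option sets how many lines to show instead of the default 10.

```shell
cd "$(mktemp -d)"
seq 1 100 > numbers.txt     # hypothetical file: the numbers 1 to 100, one per line

head numbers.txt            # first 10 lines: 1 to 10
tail numbers.txt            # last 10 lines: 91 to 100
head -n 3 numbers.txt       # first 3 lines: 1 to 3
tail -n 3 numbers.txt       # last 3 lines: 98 to 100
# less numbers.txt          # interactive: page through it, q to quit
```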

Copying files and directories

  • cp <source> <destination> copies the file <source> to the location and/or file name <destination>. Using . (period) as <destination> means "here, with the same name".
  • cp -r <dirname> <destination> will recursively copy the directory <dirname> and all its contents to the directory <destination>.
  • scp <user>@<host>:<source> <destination> works just like cp but copies source from the user user's directory on remote machine host to the local file destination
  • wget <url> fetches a file from a valid URL.
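A minimal sketch of the cp variations above, run in a throwaway directory with a made-up one-line file. scp and wget are left out because they need a remote host or a network connection.

```shell
cd "$(mktemp -d)"
mkdir -p project/data
echo "ACGT" > project/data/reads.txt    # a tiny made-up file

cp project/data/reads.txt .             # "." means here, with the same name
cp project/data/reads.txt backup.txt    # a copy under a new name
cp -r project project_copy              # the directory and everything in it
ls project_copy/data                    # reads.txt
```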
