Getting to a remote computer

The

...

Terminal window

SSH

ssh is an executable program that runs on your local computer and allows you to connect securely to a remote computer.

On Macs, Linux and Windows Git-bash or Cygwin, you run it from a Terminal window. Answer yes to the SSH security question prompt.

Code Block
titleSSH to access Stampede at TACC
ssh your_TACC_userID@stampede.tacc.utexas.edu

If you're using PuTTY as your Terminal from Windows:

  • Double-click the Putty.exe icon
  • In the PuTTY Configuration window
    • make sure the Connection type is SSH
    • enter stampede.tacc.utexas.edu for Host Name
    • click Open button
    • answer Yes to the SSH security question
  • In the PuTTY terminal
    • enter your TACC user id after the login as: prompt, then Enter

The bash shell

 

When you log in to a Linux computer, the operating system checks your login credentials and if they're OK it sets up some configuration for you and then runs a program called a "shell" which acts like your fast-food drive-thru window to the rest of the operating system. You type commands and hit "enter" to send something into the drive-thru window, and then the OS passes output back through the drive-thru window.

Every time you exchange stuff through this window, it's within a context, like one specific drive-thru window at one restaurant. The directory within the file system is one part of that context; the programs and environment variables available to you are other parts of that context. When you log in, the system and shell agree that you'll start off in your home directory on the system.
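
As a minimal sketch of what that context looks like, the commands below print the three pieces mentioned above: the current directory, an environment variable, and where the shell finds a program. The variable names start_dir and ls_location are only for illustration.

```bash
# Part 1 of the context: the current directory
start_dir=$(pwd)
echo "current directory: $start_dir"

# Part 2: environment variables, such as HOME (your home directory)
echo "home directory: $HOME"

# Part 3: the programs available to you.
# command -v reports where the shell finds a given program.
ls_location=$(command -v ls)
echo "ls is found at: $ls_location"
```

Right after logging in, the first two lines of output should normally show the same directory, since the shell starts you off in your home directory.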

 

Essential command-line tricks to look like an expert quickly, or figure out what's going on.

Type as little and as accurately as possible by cheating:

  • Cheat 1: Use "up arrow" to retrieve any of the last 500 commands you've typed. You can then edit them and hit enter (even in the middle of the command) and the shell will use that command.

...

This taps into a feature of the shell - your history. The command history will print to the screen the last 500 commands you've typed. You can modify this number if you'd like. VERY USEFUL TIP - every so often do history >> what which will write your history to a file called "what". I leave these lying around in directories so I can remember what I was doing, how I generated output data, etc. These can often become the basis for a shell script (we'll get to those). Advanced topic: use history to be super-fast at the command line.

  • Windows 10 or later has ssh and scp in Command Prompt or PowerShell (may require latest Windows updates)
    • Open the Start menu → Search for Command
Expand
titleOther Windows ssh/Terminal options

If your Windows version does not have ssh in Command Prompt or PowerShell:

More advanced options for those who want a full Linux environment on their Windows system:

From now on, when we refer to "Terminal", it is either the Mac/Linux Terminal program, Windows Command Prompt or PowerShell, or the PuTTY program.

SSH

ssh is an executable program that runs on your local computer and allows you to connect securely to a remote computer. We're going to use ssh to access the Lonestar6 compute cluster at TACC (Texas Advanced Computing Center), where the remote host name is ls6.tacc.utexas.edu.

In your local Terminal window:

Code Block
languagebash
titleSSH to Lonestar6 at TACC
ssh <your_TACC_userID>@ls6.tacc.utexas.edu

# For example:
ssh abattenh@ls6.tacc.utexas.edu
  • Answer yes to the SSH security question prompt
    • this will only be asked the 1st time you access ls6
  • Enter the password associated with your TACC account
    • for security reasons, your password characters will not be echoed to the screen
  • Get your 2-factor authentication code from your phone's TACC Token app, and type it in
Expand
titleLogging in with PuTTY

If you're using PuTTY as your Terminal from Windows:

  • Double-click the PuTTY icon
  • In the PuTTY Configuration window
    • make sure the Connection type is SSH
    • enter ls6.tacc.utexas.edu for Host Name
      • Optional: to save this configuration for further use:
        • Enter Lonestar6 into the Saved Sessions text box, then click Save
        • Next time select Lonestar6 from the Saved Sessions list and click Load.
    • click Open button
    • answer Yes to the SSH security question
  • In the PuTTY terminal
    • enter your TACC user id after the "login as:" prompt, then Enter
    • enter the password associated with your TACC account
    • provide your 2-factor authentication code

The bash shell

You're now at a command line! It looks as if you're running directly on the remote computer, but really there are two programs communicating:

  1. your local Terminal
  2. the remote shell

There are many shell programs available in Linux, but the default is bash (Bourne-again shell).

The Terminal is pretty "dumb" – just sending what you type over its encrypted SSH connection to TACC, then displaying the text sent back by the shell. The real work is being done on the remote computer, by executable programs called by the bash shell (also called commands, since you call them on the command line).


About the command line

Read more about the command line and commands on our Linux fundamentals page:

Setting up your environment

Set up your login profile (~/.bashrc)

Now execute the lines below to set up a login script, called ~/.bashrc. [ Note the tilde ( ~ ) is shorthand for "my Home directory". See Linux fundamentals: pathname syntax ]

When you log in via an interactive shell, a well-known script is executed to establish your favorite environment settings. The well-known filename is ~/.bashrc (or ~/.profile on some systems), which is specific to the bash shell.

We've pre-created a common login script for you that will help you know where you are in the file system and make it easier to access some of our shared resources. To set it up, perform the steps below:

Tip

You can copy and paste these lines from the code block below into your Terminal window. Just make sure you hit Enter after the last line.


Warning

If you already have a .bashrc set up, make a backup copy first.

Code Block
languagebash
cd
ls -la 
# Do you see a .bashrc file? If so, save it off
cp .bashrc .bashrc.beforeNGS

You can restore your original login script after this class is over.

If your Terminal has a dark background (e.g. black), copy this file:

Code Block
languagebash
titleCopy a pre-configured login script for dark background Terminals
cp /corral-repl/utexas/BioITeam/core_ngs_tools/login/bashrc.corengs.ls6.dark_bg  ~/.bashrc
chmod 600 ~/.bashrc

If your Terminal has a light background (e.g. white), copy this file:

Code Block
languagebash
titleCopy a pre-configured login script for light background Terminals
cp /corral-repl/utexas/BioITeam/core_ngs_tools/login/bashrc.corengs.ls6.light_bg  ~/.bashrc
chmod 600 ~/.bashrc

So why don't you see the .bashrc file you just copied when you do ls? Because all files starting with a period (dot files) are hidden by default. To see them add the -l (long listing) and -a (all) options to ls:

Code Block
languagebash
# show a long listing of all files in the current directory, including "dot files" that start with a period
ls -la  

(Read more about File attributes)

Expand
titleWhat is chmod doing?

What's going on with chmod?

The chmod 600 ~/.bashrc command marks the file as readable and writable only by you.
The .bashrc script file will not be executed unless it has these exact permissions settings.
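
To see what those permission settings look like, here is a small sketch that applies chmod 600 to a throwaway file rather than to your real ~/.bashrc. The names demo_file and perms are hypothetical and used only for this illustration.

```bash
# Work on a throwaway file so your real ~/.bashrc is not touched
demo_file=$(mktemp)

# 600 = read+write (4+2) for the owner, nothing (0) for the group, nothing (0) for others
chmod 600 "$demo_file"

# The first 10 characters of a long listing show the file type and permissions
perms=$(ls -l "$demo_file" | cut -c1-10)
echo "$perms"

# Clean up
rm -f "$demo_file"
```

The listing should start with -rw-------, meaning a regular file that only its owner can read and write.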

Since your ~/.bashrc is executed when you log in, to ensure it is set up properly you should first log off Lonestar6 like this:

Code Block
languagebash
titleLog off Lonestar6
exit

Your Terminal has logged off of Lonestar6 and is back on your local computer.

Now log back in to ls6.tacc.utexas.edu. This time your ~/.bashrc will be executed to establish your environment:

Tip
titlell alias

Your new ~/.bashrc file defines an ll alias command, so when you type ll it is short for ls -la.

You should see a new command line prompt:

Code Block
ls6:~$

The great thing about this prompt is that it always tells you where you are, so you don't have to execute the pwd (print working directory) command every time you want to know what the current directory is. Execute these commands to see how the prompt reflects your current directory.

Code Block
languagebash
# mkdir -p says to create all parent directories in the specified path
mkdir -p ~/tmp/a/b/c
cd ~/tmp/a/b/c

# Your prompt should look like this:
ls6:~/tmp/a/b/c$ 

The prompt now tells you you are in the c sub-directory of the b sub-directory of the a sub-directory of the tmp sub-directory of your Home directory ( ~ ).

Your login script has configured this command prompt behavior, along with a number of other things.

Create some symbolic links and directories

Create some symbolic links that will come in handy later:

Code Block
languagebash
titleCreate symbolic directory links
cd  # makes your Home directory the "current directory"
ln -s -f $SCRATCH scratch
ln -s -f $WORK work
ln -sf /work/projects/BioITeam/projects/courses/Core_NGS_Tools CoreNGS

ls # you'll see the 3 symbolic links you just created

Symbolic links (a.k.a. symlinks) are "pointers" to files or directories elsewhere in the file system hierarchy. You can almost always treat a symlink as if it is the actual file or directory.
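
If you'd like to experiment with symlinks without touching your Home directory, the sketch below builds a small sandbox in a temporary directory. All directory and file names here (sandbox, deeply/nested/data, file.txt) are made up for the illustration.

```bash
# Build a small sandbox (all names here are hypothetical)
sandbox=$(mktemp -d)
mkdir -p "$sandbox/deeply/nested/data"
echo "hello" > "$sandbox/deeply/nested/data/file.txt"

cd "$sandbox"
# Create a symlink named "data" that points to the nested directory
ln -s "$sandbox/deeply/nested/data" data

# A long listing shows the link and, after "->", its target
ls -l data

# The link can be used as if it were the directory itself
contents=$(cat data/file.txt)

# readlink prints just the target the link points to
target=$(readlink data)
echo "$contents"
echo "$target"
```

Note that the file is read through the short data/file.txt path even though it really lives several directories down.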

Tip

$WORK and $SCRATCH are TACC environment variables that refer to your Work and Scratch file system areas – more on these file system areas soon. (Read more about Environment variables)


Expand
titleWhat is "ln -s" doing?

The ln -s command creates a symbolic link, a shortcut to the linked file or directory.

  • Here the link targets are your Work and Scratch file system areas
  • Having these link shortcuts will help when you want to copy files to your Work or Scratch, and when you navigate the TACC file system using a remote SFTP client
  • Always change directory (cd) to the directory where we want the links created before executing ln -s
    • Here we want the links under your home directory (cd with no arguments)

Want to know where a link points to? Use ls with the -l (long listing) option.

Code Block
languagebash
titlels -l shows where links go
ls -l


Set up a ~/local/bin directory and link a script there that we will use in the class.

Code Block
languagebash
titleSet up ~/local/bin directory
mkdir -p ~/local/bin
cd ~/local/bin
ln -s -f /work/projects/BioITeam/common/bin/launcher_creator.py

Since our ~/.bashrc login script added ~/local/bin to our $PATH, we can call any script or command in that directory with just its file name. And Tab completion works on program names too:

Code Block
languagebash
cd

# hit Tab once after typing "laun"
# This will expand to launcher_creator.py

Details about your login script

Let's take a look at the contents of your ~/.bashrc login script, using the cat (concatenate files) command. cat simply reads a file and writes each line of content to standard output (here, your Terminal):

Code Block
languagebash
titleDisplay .bashrc file contents
cd  
cat .bashrc


Tip
titleDon't use cat for large files

The cat command just displays the entire file's content, line by line, without pausing, so should not be used to display large files. Instead, use a pager like more or less. For example:

more ~/.bashrc

This will display one "page" (Terminal screen) of text at a time, then pause. Press space to advance to the next page, or q (or Ctrl-c) to exit more.

You'll see the following (you may need to scroll up a bit to see the beginning):

Code Block
languagebash
titleContents of your .bashrc file
#!/bin/bash
# TACC startup script: ~/.bashrc version 2.1 -- 12/17/2013
#   This file is NOT automatically sourced for login shells.
# Your ~/.profile can and should "source" this file.
# Note neither ~/.profile nor ~/.bashrc are sourced automatically
# by bash scripts.
#   In a parallel mpi job, this file (~/.bashrc) is sourced on every
# node so it is important that actions here not tax the file system.
# Each nodes' environment during an MPI job has ENVIRONMENT set to
# "BATCH" and the prompt variable PS1 empty.
#################################################################
# Optional Startup Script tracking. Normally DBG_ECHO does nothing
if [ -n "$SHELL_STARTUP_DEBUG" ]; then DBG_ECHO "${DBG_INDENT}~/.bashrc{"; fi
##########
# SECTION 1 -- modules
if [ -z "$__BASHRC_SOURCED__" -a "$ENVIRONMENT" != BATCH ]; then
  export __BASHRC_SOURCED__=1
  module load launcher
fi
############
# SECTION 2 -- environment variables
if [ -z "$__PERSONAL_PATH__" ]; then
  export __PERSONAL_PATH__=1
  export PATH=.:$HOME/local/bin:$PATH
fi
# For better colors using a dark background terminal, un-comment this line:
#export LS_COLORS=$LS_COLORS:'di=1;33:fi=01:ln=01;36:'
# For better colors using a white background terminal, un-comment this line:
#export LS_COLORS=$LS_COLORS:'di=1;34:fi=01:ln=01;36:'
export LANG="C"  # avoid the annoying Perl locale warnings 
export BIWORK=/work/projects/BioITeam
export CORENGS=$BIWORK/projects/courses/Core_NGS_Tools
export BI=/corral-repl/utexas/BioITeam
export ALLOCATION=OTH21164        # For ls6        Group is G-824651
##export ALLOCATION=UT-2015-05-18 # For stampede2  Group is G-816696

##########
# SECTION 3 -- controlling the prompt
if [ -n "$PS1" ]; then PS1='ls6:\w$ '; fi
##########
# SECTION 4 -- Umask and aliases
#alias ls="ls --color=always"
alias ll="ls -la"
alias lah="ls -lah"
alias lc="wc -l"
alias hexdump='od -A x -t x1z -v'
umask 002
##########
# Optional Startup Script tracking
if [ -n "$SHELL_STARTUP_DEBUG" ]; then DBG_ECHO "${DBG_INDENT}}"; fi

There's a lot of stuff here; let's look at just a few things.

Environment variables

The login script sets several environment variables.

Code Block
languagebash
titleSetting environment variables to useful locations
export BIWORK=/work/projects/BioITeam
export CORENGS=$BIWORK/projects/courses/Core_NGS_Tools

Environment variables are like variables in other programming languages like Python or Perl (in fact bash is a complete programming language).

They have a name (like BIWORK above) and a value (the value of $BIWORK is the pathname of the shared /work/projects/BioITeam directory).

To see the value of an environment variable, use the echo command, then the variable name after a dollar sign ( $ ):

Code Block
languagebash
echo $CORENGS

We'll use the $CORENGS environment variable to avoid typing out a long pathname:

Code Block
languagebash
ls $CORENGS

(Read more about Environment variables)
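
Here is a small sketch of how export affects a variable, using made-up names (DEMO_ROOT, DEMO_DATA, not_exported) and a made-up path instead of the real course variables. It assumes nothing beyond a standard shell.

```bash
# Define a variable and build a second one from it (the path is hypothetical)
export DEMO_ROOT=/tmp/demo_project
export DEMO_DATA=$DEMO_ROOT/data
echo "$DEMO_DATA"

# export makes a variable visible to programs started from this shell
child_sees=$(sh -c 'echo "$DEMO_DATA"')

# Without export, the variable stays private to the current shell
not_exported=private
child_misses=$(sh -c 'echo "${not_exported:-unset}"')

echo "child sees exported variable: $child_sees"
echo "child sees plain variable:    $child_misses"
```

This is why the login script uses export for variables such as BIWORK and CORENGS: programs you launch from the shell can then read them too.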

Shell completion with Tab

You can use these environment variables to shorten typing, for example, to look at the contents of the shared /work/projects/BioITeam directory as shown below, using the magic Tab key to perform shell completion.

Tip
titleImportant Tip -- the Tab key is your BFF!

The Tab key is one of your best friends in Linux. Hitting it invokes shell completion, which is as close to magic as it gets!

  • Tab once will expand the current command line contents as far as it can unambiguously.
    • if nothing shows up, there is no unambiguous match
  • Tab twice will give you a list of everything the shell finds matching the current command line.
    • you then decide where to go next

Follow along with this:

Code Block
languagebash
titleShell completion exercise
# hit Tab once to expand the environment variable name
ls $BIW 

# hit Tab again to expand the environment variable
ls $BIWORK/

# now hit Tab twice to see the contents of the directory
ls /work/projects/BioITeam/

# type "pr" and hit Tab again
ls /work/projects/BioITeam/pr

# type "co" and hit Tab again
ls /work/projects/BioITeam/projects/co

# type "Co" and hit Tab again
ls /work/projects/BioITeam/projects/courses/Co

# your command line should now look like this
ls /work/projects/BioITeam/projects/courses/Core_NGS_Tools/

# now type "mi" and one Tab
ls /work/projects/BioITeam/projects/courses/Core_NGS_Tools/mi
 
# your command line should now look like this
ls /work/projects/BioITeam/projects/courses/Core_NGS_Tools/misc/

# now hit Tab once
# There is no unambiguous match, so hit Tab again
# After hitting Tab twice you should see several filenames:
# fastqc/ small.bam  small.fq   small2.fq

# now type "sm" and one Tab
# your command line should now look like this
ls /work/projects/BioITeam/projects/courses/Core_NGS_Tools/misc/small
 
# type a period (".") then hit Tab twice again
# You're narrowing down the choices -- you should see two filenames
ls /work/projects/BioITeam/projects/courses/Core_NGS_Tools/misc/small
# small.bam  small.fq

# finally, type "f" then hit Tab again. It should complete to this:
ls /work/projects/BioITeam/projects/courses/Core_NGS_Tools/misc/small.fq

Extending the $PATH

When you type a command name the shell has to have some way of finding what program to run. The list of places (directories) where the shell looks is stored in the $PATH environment variable. You can see the entire list of locations by doing this:

Code Block
languagebash
titleSee where the bash shell looks for programs
echo $PATH

As you can see, there are a lot of locations on the $PATH.

Here's how the common login script adds the ~/local/bin directory you created above, to the location list, along with a special dot character ( . ) that means "here", or "whatever the current directory is". In the statement below, colon ( : ) separates directories in the list. (Read more about pathname syntax)

Code Block
languagebash
titleAdding directories to PATH
export PATH=.:$HOME/local/bin:$PATH
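
To try the same idea safely, the sketch below adds a throwaway directory to the PATH and runs a tiny script from it by name. The directory demo_bin and the script hello_ngs are hypothetical and exist only for this illustration.

```bash
# Show each PATH entry on its own line (colons separate the entries)
echo "$PATH" | tr ':' '\n'

# Create a throwaway directory holding a tiny script (names are hypothetical)
demo_bin=$(mktemp -d)
printf '#!/bin/sh\necho "hello from hello_ngs"\n' > "$demo_bin/hello_ngs"
chmod +x "$demo_bin/hello_ngs"

# Put the directory at the front of the list, as the login script does for ~/local/bin
export PATH="$demo_bin:$PATH"

# The shell now finds the script by its file name alone
found=$(command -v hello_ngs)
output=$(hello_ngs)
echo "$found"
echo "$output"
```

Because the shell searches the list from left to right, a directory placed at the front takes priority over later entries that contain a program with the same name.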

Setting up the friendly command prompt

The complicated-looking if statement in SECTION 3 of your .bashrc sets up a friendly shell prompt that shows the current working directory. This is done by setting the special PS1 environment variable and including a special \w directive that the shell knows means "current directory".

Code Block
languagebash
titleSetting up the friendly shell prompt for Lonestar6
##########
# SECTION 3 -- controlling the prompt
if [ -n "$PS1" ]; then PS1='ls6:\w$ '; fi
  • Cheat 2: Hit the tab key twice - it's almost always magic. This instructs the shell to try to guess what you're doing and finish the typing for you. On most modern linux shells, it works for commands (like "ls" or "scp") and for completing file or directory names.

...

This is really useful if you can't remember whether fasta2fastq.sh is fastaToFastq or fastaToFastq.sh or Fasta2fastq or Fasta2Fastq.sh or something else. It's also helpful for reconstructing directory paths or filenames on-the-fly.

You might write out a long command with a ton of options in the terminal and then find out that you misspelled something at the very beginning of the line. It can be really annoying to hold down the arrow key to get back to that point.

  • Cheat 3: You can use control-a (holding down the "control" key and "a") to jump the cursor right to the beginning of the line. The omega to that alpha is control-e, which jumps the cursor to the end of the line. Arrow keys work, and control-arrow will skip by word forward and backward.

Unfortunately, you are pretty much out of luck if you want to jump to the middle of the line. In this case you might want to copy the whole command into a nice text editor on your desktop, change it, and copy it back.

Advanced topic: command line editors.

Exercise:

Type "modu" then hit tab twice - it presents two choices, module and modutil. Type the next character l, hit tab twice and it will complete the rest of the typing. If you hit tab twice again, the shell will show you all the files in your current working directory, which doesn't make any sense for the command "module" - it's smart, but not smart enough to figure out that the next word in the command needs to be one of module's built-in commands.

Inline help

Man pages - linux has had built-in help files since the mid-1500's, way before Macs or PCs thought of such things. In linux they're called man pages - short for "manual"; it's not a gender thing (I assume). man intro will give you an introduction to all user commands.

Exercise:

Try "man grep", or "man du", or "man sort" - you'll want these sometime.

Tip: Type the letter q to quit man, j or <CR> to move down a line and k to move up a line, spacebar to move down a page and b to move back up a page. Want to search? Just hit the slash key /, enter the search word and hit enter. These are actually the keys of the less command, which man uses to display pages.

Basic linux commands you need to know like breathing air

  • ls - list the contents of the current directory
  • pwd - print the present working directory - which restaurant am I at right now - the format is something like /home/myID - just like on most computer systems, this represents leaves on the tree of the file system structure, also called a "path".
  • cd <whereto> - change the present working directory to <whereto> - pick up my drive-thru window (shell) and move it so that I'm now looking thru to the directory <whereto>
    • Some special <wheretos>: .. (period, period) means "up one level". ~ (tilde) means "my home directory". ~myfriend (tilde "myfriend") means "myfriend's home directory".
  • df shows you the file systems mounted on the system you're working on, along with how much disk space is available on each
  • head <file> and tail <file> show you the top or bottom 10 lines of a file <file>
  • more <file> and less <file> both display the contents of <file> in nice ways. Read the bit above about man to figure out how to navigate and search when using less
  • file <file> tells you what kind of file <file> is.
  • cat <file> outputs all the contents of <file> - CAUTION - only use on small files.
  • rm <file> deletes a file. This is permanent - not a "trash can" deletion.
  • cp <source> <destination> copies the file source to the location and/or file name destination. Using . (period) means "here, with the same name".
    • cp -r <dirname> <destination> will recursively copy the directory dirname and all its contents to the directory destination.
  • scp <user>@<host>:<source> <destination> works just like cp but copies source from the user user's directory on remote machine host to the local file destination
  • mkdir <dirname> and rmdir <dirname> make and remove the directory "dirname". rmdir only removes empty directories - "rm -r <dirname>" will remove a directory and everything in it.
  • wget <url> fetches a file with a valid URL. It's not that common but we'll use wget to pull data from one of TACC's web-based storage devices.
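
Here is a short practice session that strings several of these commands together, as a sketch you can paste into a Terminal. It works in a throwaway directory; the names practice, project, numbers.txt and copy.txt are made up for the example, and it assumes the seq command is available (it is on most Linux and Mac systems).

```bash
practice=$(mktemp -d)      # throwaway directory (hypothetical name)
cd "$practice"
mkdir project
cd project
pwd

# Make a 25-line file, then look at its top and bottom
seq 1 25 > numbers.txt
head numbers.txt                                    # shows the first 10 lines
first_ten=$(head numbers.txt | wc -l | tr -d ' ')   # count the lines head printed
last_line=$(tail -n 1 numbers.txt)                  # just the final line
echo "head printed $first_ten lines; the last line of the file is $last_line"

# Copy the file, then delete the original (rm is permanent)
cp numbers.txt copy.txt
rm numbers.txt
ls

# rmdir refuses to remove a directory that still has files in it
cd ..
rmdir project 2>/dev/null || echo "rmdir refused: project is not empty"

# rm -r removes the directory and everything in it
rm -r project
```
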
Wildcards and special file names.

The shell has shorthand to refer to groups of files by allowing wildcards in file names. * (asterisk) is the most common; it is a wildcard meaning "any length of any characters". Other useful ones are [<characters>] to match any single character in the set <characters>, and [<first>-<last>] to match any single character in a range of characters.

For example: ls *.bam lists all files in the current directory that end in .bam; ls [A-Za-z]*.bam does the same, but only if the first character of the file name is a letter (upper or lower case).
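
The sketch below tries these wildcards on a few empty files in a throwaway directory. The file names are made up, and the script sets LC_ALL=C as an assumption so that a range like A-Z means plain ASCII capital letters regardless of your system's language settings.

```bash
export LC_ALL=C            # assumption: fix the locale so ranges are plain ASCII
demo=$(mktemp -d)          # throwaway directory (hypothetical name)
cd "$demo"
touch sample1.bam Sample2.bam 3rd.bam notes.txt

# The shell expands each pattern into the matching file names before echo runs
all_bam=$(echo *.bam)             # every name ending in .bam
letter_bam=$(echo [A-Za-z]*.bam)  # first character is any letter
upper_bam=$(echo [A-Z]*.bam)      # first character is a capital letter

echo "*.bam         -> $all_bam"
echo "[A-Za-z]*.bam -> $letter_bam"
echo "[A-Z]*.bam    -> $upper_bam"
```

Using echo instead of ls is a handy way to preview what a wildcard will match before handing it to a command like rm.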

Three special file names:

  1. . (single period) means "this directory".
  2. .. (two periods) means "directory above current." So ls -l .. means "list contents of the parent directory."
  3. ~ (tilde) means "my home directory".
Exercises:

Scavenger hunt practice; on Lonestar issue the following commands:

Code Block
titlePlay a scavenger hunt for more practice
cp -r /corral-repl/utexas/BioITeam/linuxpractice .
cd linuxpractice
cd what
cat readme

and follow the instructions. Hints: use <tab><tab> to fill in filenames as much as you can.

...

Use variables to store where you are, move away, and then back. Try this and see if you can figure out what the shell is doing for you:

Code Block
titlePractice some linux basics
pwd
here=`pwd`
cd /scratch/01057
pwd
cd $here
pwd

...

Learn about these few advanced tricks (by trying, man pages, Google...)

Code Block
titleAdvanced tricks
pushd / popd
cd -
which <command>
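
As a starting point, here is a hedged sketch of cd - together with command -v, a shell built-in that answers much the same question as which. The variable names are only for illustration; pushd and popd are left for you to try at the prompt.

```bash
here=$(pwd)          # remember where we are
cd /tmp              # go somewhere else
cd - > /dev/null     # jump back to the previous directory (cd - also prints where it lands)
back=$(pwd)
echo "$back"

# command -v reports which program the shell would run for a given name
ls_path=$(command -v ls)
echo "$ls_path"
```
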

If you've done all those too, you might consider looking over some advanced command-line tool usage

Options: the lifeblood of linux commands

Sitting at the computer, you should have some idea what you need to do. There's probably a command to do it. If you have some idea what it starts with, you can type a few characters and hit tab twice to get some help. If you have no idea, you google it or ask someone else. But soon you want those commands to do a bit more - like seeing the sizes of files in addition to their names.

Most commands in linux use a common syntax to ask more of a command; they usually add a dash "-" followed by a code letter that means "do the basic command, but with a bit more..."

Code Block
titleUseful options for ls
ls -l
ls -lh
ls -t

These little toggle-like things are often called "command line switches"; there can be other options, like filenames, that aren't switches.

Almost all commands, and especially NGS tools, use options heavily.

Like dialects in a language, there are at least three basic schemes commands/programs accept options in:

  1. One letter options which can sometimes be combined, or other single options like:

    Code Block
    titleExamples of different option types
    head -10
    ls -lhtS (equivalent to ls -l -h -t -S)
    
  2. Word options, like -d64 and -Xms512m in this command, that are never combined (this is the GATK command to call SNPs):

    Code Block
    titleExamples of word options
    java -d64 -Xms512m -Xmx4g -jar /work/01866/phr254/gshare/Tools_And_Programs/bin/GenomeAnalysisTK.jar -glm BOTH -R $reference -T UnifiedGenotyper -I
    $outprefix.realigned.recal.bam --dbsnp $dbsnp -o $outprefix.snps.vcf -metrics snps.metrics -stand_call_conf 50.0 -stand_emit_conf 10.0 -dcov 1000
    -A DepthOfCoverage -A AlleleBalance
    
  3. "Long option" forms, using the convention that a single dash - precedes single-letter options, and double dashes -- precede word options, like this command to run the mira assembler:

    Code Block
    titleExample of long options
    mira --project=ct --job=denovo,genome,accurate,454 -SK:not=8
    

man pages should detail all options available for a command. Unless there's no man page.
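
To check for yourself that combined one-letter options behave like separate ones, here is a small sketch using ls in a throwaway directory. The file names small.fa and big.bin are made up for the example.

```bash
demo=$(mktemp -d)                       # throwaway directory (hypothetical name)
cd "$demo"
printf 'ACGT\n' > small.fa              # a 5-byte file
head -c 2048 /dev/zero > big.bin        # a 2048-byte file

# Combined single-letter options ...
combined=$(ls -lS)
# ... give the same result as listing them separately
separate=$(ls -l -S)

# -S sorts by size, largest first
largest=$(ls -S | head -n 1)

echo "$combined"
echo "largest file: $largest"
```
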

More help please

Sometimes man lets you down - no man page. Don't fret, try one of these:

  1. Just type in the command and hit return - it will usually try to help you.
  2. Type the command followed by one of: -h -help --help -? and it may give you some help.
    Sometimes the command by itself will give you short help, and will list the magic option for full help.

...

First do:

Code Block
module load blast

Now figure out how to run some kind of blast program on lonestar with options. Hints: try <tab><tab>, man, running some blast command, use options to figure out other options.

I've put nr, nt, and refseq_rna blast databases on Lonestar here:
/corral-repl/utexas/BioITeam/blastdb/
along with a test sequence: the human JAG1 gene, here:
/corral-repl/utexas/BioITeam/sphsmith/jag1.fa

Expand
titleHint/solution

blastn -query jag1.fa -db /corral-repl/utexas/BioITeam/blastdb/hs37d5.fa -evalue 1e-100 -outfmt 6
But of course you wouldn't run this on the head node - you'd instead enter it into a file called "commands" using a text editor, or do exactly this:

Code Block
titleCreate launcher script for blastn
echo "blastn -query $BI/sphsmith/jag1.fa -db \
   $BI/blastdb/hs37d5.fa -evalue 1e-100 -outfmt 6" > commands
/corral-repl/utexas/BioITeam/bin/launcher_creator.py -l blast.sge -n blast_jag1 -t 00:30:00 -j commands
qsub blast.sge

...