Getting to a remote computer
The
...
Terminal window
- Macs and Linux have a Terminal program built in – find it now on your computer
- Windows needs help
- Putty – http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html
- simple Terminal and file copy programs
- download putty.exe (terminal) and pscp.exe (secure copy client)
- Git-bash – http://msysgit.github.io/
- terminal plus minimal Linux environment
- Cygwin – http://www.cygwin.com/
- a full Linux environment, including X-windows for running GUI programs remotely
- complicated to install
SSH
ssh is an executable program that runs on your local computer and allows you to connect securely to a remote computer.
On Macs, Linux and Windows Git-bash or Cygwin, you run it from a Terminal window. Answer yes to the SSH security question prompt.
Code Block | ||
---|---|---|
| ||
ssh your_TACC_userID@stampede.tacc.utexas.edu
|
If you're using Putty as your Terminal from Windows:
- Double-click the Putty.exe icon
- In the PuTTY Configuration window
- make sure the Connection type is SSH
- enter stampede.tacc.utexas.edu for Host Name
- click the Open button
- answer Yes to the SSH security question
- In the PuTTY terminal
- enter your TACC user id after the login as: prompt, then Enter
The bash shell
When you log in to a Linux computer, the operating system checks your login credentials and if they're OK it sets up some configuration for you and then runs a program called a "shell" which acts like your fast-food drive-thru window to the rest of the operating system. You type commands and hit "enter" to send something into the drive-thru window, and then the OS passes output back through the drive-thru window.
Every time you exchange stuff through this window, it's within a context, like one specific drive-thru window at one restaurant. The directory within the file system is one part of that context; the programs and environment variables available to you are other parts of that context. When you log in, the system and shell agree that you'll start off in your home directory on the system.
Essential command-line tricks to look like an expert quickly, or figure out what's going on.
Type as little and as accurately as possible by cheating:
- Cheat 1: Use "up arrow" to retrieve any of the last 500 commands you've typed. You can then edit them and hit enter (even in the middle of the command) and the shell will use that command.
...
This taps into a feature of the shell - your history. The command history will print to the screen the last 500 commands you've typed. You can modify this number if you'd like. VERY USEFUL TIP - every so often do history >> what which will append your history to a file called "what". I leave these lying around in directories so I can remember what I was doing, how I generated output data, etc. These can often become the basis for a shell script (we'll get to those). Advanced topic: use history to be super-fast at the command line.
- Windows 10 or later has ssh and scp in Command Prompt or PowerShell (may require latest Windows updates)
- Open the Start menu → Search for Command
Expand | ||
---|---|---|
| ||
If your Windows version does not have ssh in Command Prompt or PowerShell:
More advanced options for those who want a full Linux environment on their Windows system:
|
From now on, when we refer to "Terminal", it is either the Mac/Linux Terminal program, Windows Command Prompt or PowerShell, or the PuTTY program.
SSH
ssh is an executable program that runs on your local computer and allows you to connect securely to a remote computer. We're going to use ssh to access the Lonestar6 compute cluster at TACC (Texas Advanced Computing Center), where the remote host name is ls6.tacc.utexas.edu.
In your local Terminal window:
Code Block | ||||
---|---|---|---|---|
| ||||
ssh <your_TACC_userID>@ls6.tacc.utexas.edu
# For example:
ssh abattenh@ls6.tacc.utexas.edu
|
- Answer yes to the SSH security question prompt
- this will only be asked the 1st time you access ls6
- Enter the password associated with your TACC account
- for security reasons, your password characters will not be echoed to the screen
- Get your 2-factor authentication code from your phone's TACC Token app, and type it in
Expand | ||
---|---|---|
| ||
If you're using PuTTY as your Terminal from Windows:
|
The bash shell
You're now at a command line! It looks as if you're running directly on the remote computer, but really there are two programs communicating:
- your local Terminal
- the remote shell
There are many shell programs available in Linux, but the default is bash (Bourne-again shell).
The Terminal is pretty "dumb" – just sending what you type over its encrypted SSH connection to TACC, then displaying the text sent back by the shell. The real work is being done on the remote computer, by executable programs called by the bash shell (also called commands, since you call them on the command line).
About the command line
Read more about the command line and commands on our Linux fundamentals page:
- The bash shell REPL and commands
- Getting help
- Literal characters and metacharacters
- About command line input
Setting up your environment
Set up your login profile (~/.bashrc)
Now execute the lines below to set up a login script, called ~/.bashrc. [ Note the tilde ( ~ ) is shorthand for "my Home directory". See Linux fundamentals: pathname syntax ]
When you log in via an interactive shell, a well-known script is executed to establish your favorite environment settings. For the bash shell, the well-known filename is ~/.bashrc (or ~/.profile on some systems).
We've pre-created a common login script for you that will help you know where you are in the file system and make it easier to access some of our shared resources. To set it up, perform the steps below:
Tip |
---|
You can copy and paste these lines from the code block below into your Terminal window. Just make sure you hit Enter after the last line. |
Warning | |||||
---|---|---|---|---|---|
If you already have a .bashrc set up, make a backup copy first.
You can restore your original login script after this class is over. |
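One way to make that backup is sketched below. The backup file name is an arbitrary choice for illustration, not something the class requires; any name you will recognize later works.

```shell
# Back up an existing ~/.bashrc before overwriting it.
# ".bashrc.bak_pre_corengs" is a hypothetical name chosen for this example.
if [ -f ~/.bashrc ]; then
    cp -p ~/.bashrc ~/.bashrc.bak_pre_corengs   # -p preserves timestamps and permissions
fi
```

To restore it after the class, copy the backup over ~/.bashrc again.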
If your Terminal has a dark background (e.g. black), copy this file:
Code Block | ||||
---|---|---|---|---|
| ||||
cp /corral-repl/utexas/BioITeam/core_ngs_tools/login/bashrc.corengs.ls6.dark_bg ~/.bashrc
chmod 600 ~/.bashrc |
If your Terminal has a light background (e.g. white), copy this file:
Code Block | ||||
---|---|---|---|---|
| ||||
cp /corral-repl/utexas/BioITeam/core_ngs_tools/login/bashrc.corengs.ls6.light_bg ~/.bashrc
chmod 600 ~/.bashrc |
So why don't you see the .bashrc file you just copied when you do ls? Because all files starting with a period (dot files) are hidden by default. To see them add the -l (long listing) and -a (all) options to ls:
Code Block | ||
---|---|---|
| ||
# show a long listing of all files in the current directory, including "dot files" that start with a period
ls -la |
(Read more about File attributes)
Expand | ||
---|---|---|
| ||
What's going on with chmod? The chmod 600 ~/.bashrc command marks the file as readable and writable only by you. |
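To see what chmod 600 does without touching your real login script, you can try it on a throwaway file. This is a small sketch; the temporary file is created only for the demonstration.

```shell
demo_file=$(mktemp)            # throwaway file for the demonstration
chmod 600 "$demo_file"         # owner: read + write; group and others: nothing
ls -l "$demo_file"             # the listing starts with the permission string
perms=$(ls -l "$demo_file" | cut -c1-10)
echo "$perms"
rm -f "$demo_file"
```

The first ten characters of the long listing are the file type and permissions: rw- for you, --- for your group, and --- for everyone else.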
Since your ~/.bashrc is executed when you log in, to ensure it is set up properly you should first log off Lonestar6 like this:
Code Block | ||||
---|---|---|---|---|
| ||||
exit |
Your Terminal has logged off of Lonestar6 and is back on your local computer.
Now log back in to ls6.tacc.utexas.edu. This time your ~/.bashrc will be executed to establish your environment:
Tip | ||
---|---|---|
| ||
Your new ~/.bashrc file defines an ll alias, so typing ll is short for ls -la. |
You should see a new command line prompt:
Code Block |
---|
ls6:~$ |
The great thing about this prompt is that it always tells you where you are, so you don't have to execute the pwd (print working directory) command every time you want to know what the current directory is. Execute these commands to see how the prompt reflects your current directory.
Code Block | ||
---|---|---|
| ||
# mkdir -p says to create all parent directories in the specified path
mkdir -p ~/tmp/a/b/c
cd ~/tmp/a/b/c
# Your prompt should look like this:
ls6:~/tmp/a/b/c$ |
The prompt now tells you you are in the c sub-directory of the b sub-directory of the a sub-directory of the tmp sub-directory of your Home directory ( ~ ).
Your login script has configured this command prompt behavior, along with a number of other things.
Create some symbolic links and directories
Create some symbolic links that will come in handy later:
Code Block | ||||
---|---|---|---|---|
| ||||
cd # makes your Home directory the "current directory"
ln -s -f $SCRATCH scratch
ln -s -f $WORK work
ln -sf /work/projects/BioITeam/projects/courses/Core_NGS_Tools CoreNGS
ls # you'll see the 3 symbolic links you just created
|
Symbolic links (a.k.a. symlinks) are "pointers" to files or directories elsewhere in the file system hierarchy. You can almost always treat a symlink as if it is the actual file or directory.
Tip |
---|
$WORK and $SCRATCH are TACC environment variables that refer to your Work and Scratch file system areas – more on these file system areas soon. (Read more about Environment variables) |
Expand | |||||||
---|---|---|---|---|---|---|---|
| |||||||
The ln -s command creates a symbolic link, a shortcut to the linked file or directory.
Want to know where a link points to? Use ls with the -l (long listing) option.
|
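The behavior of symbolic links can be tried out safely in a scratch directory. The directory and file names below are made up for this sketch.

```shell
demo_dir=$(mktemp -d)                          # scratch area, removed at the end
mkdir "$demo_dir/real_data"
echo "hello" > "$demo_dir/real_data/note.txt"

# Create a symlink named "shortcut" that points at real_data
ln -s -f "$demo_dir/real_data" "$demo_dir/shortcut"

ls -l "$demo_dir/shortcut"                     # long listing shows: shortcut -> .../real_data
link_target=$(readlink "$demo_dir/shortcut")   # readlink prints just the target
via_link=$(cat "$demo_dir/shortcut/note.txt")  # files are reachable through the link
echo "$link_target"
echo "$via_link"

rm -r "$demo_dir"
```

Removing a symlink removes only the pointer, not the file or directory it points to.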
Set up a ~/local/bin directory and link a script there that we will use in the class.
Code Block | ||||
---|---|---|---|---|
| ||||
mkdir -p ~/local/bin
cd ~/local/bin
ln -s -f /work/projects/BioITeam/common/bin/launcher_creator.py
|
Since our ~/.bashrc login script added ~/local/bin to our $PATH, we can call any script or command in that directory with just its file name. And Tab completion works on program names too:
Code Block | ||
---|---|---|
| ||
cd
# hit Tab once after typing "laun"
# This will expand to launcher_creator.py
|
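The same idea can be demonstrated with a throwaway directory instead of ~/local/bin. In this sketch the script name hello_corengs and the temporary directory are hypothetical; only the mechanism (a directory on $PATH makes its programs callable by name) comes from the text above.

```shell
demo_bin=$(mktemp -d)                 # stands in for ~/local/bin
cat > "$demo_bin/hello_corengs" <<'EOF'
#!/bin/bash
echo "hello from a script found via PATH"
EOF
chmod +x "$demo_bin/hello_corengs"    # scripts must be executable to be run by name

PATH="$demo_bin:$PATH"                # put the directory at the front of the search list
found_at=$(command -v hello_corengs)  # where the shell finds the command
greeting=$(hello_corengs)             # call it with just its file name
echo "$found_at"
echo "$greeting"
```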
Details about your login script
Let's take a look at the contents of your ~/.bashrc login script, using the cat (concatenate files) command. cat simply reads a file and writes each line of content to standard output (here, your Terminal):
Code Block | ||||
---|---|---|---|---|
| ||||
cd
cat .bashrc
|
Tip | ||
---|---|---|
| ||
The cat command just displays the entire file's content, line by line, without pausing, so should not be used to display large files. Instead, use a pager like more or less. For example: more ~/.bashrc This will display one "page" (Terminal screen) of text at a time, then pause. Press space to advance to the next page, or Ctrl-c to exit more. |
You'll see the following (you may need to scroll up a bit to see the beginning):
Code Block | ||||
---|---|---|---|---|
| ||||
#!/bin/bash
# TACC startup script: ~/.bashrc version 2.1 -- 12/17/2013
# This file is NOT automatically sourced for login shells.
# Your ~/.profile can and should "source" this file.
# Note neither ~/.profile nor ~/.bashrc are sourced automatically
# by bash scripts.
# In a parallel mpi job, this file (~/.bashrc) is sourced on every
# node so it is important that actions here not tax the file system.
# Each nodes' environment during an MPI job has ENVIRONMENT set to
# "BATCH" and the prompt variable PS1 empty.
#################################################################
# Optional Startup Script tracking. Normally DBG_ECHO does nothing
if [ -n "$SHELL_STARTUP_DEBUG" ]; then DBG_ECHO "${DBG_INDENT}~/.bashrc{"; fi
##########
# SECTION 1 -- modules
if [ -z "$__BASHRC_SOURCED__" -a "$ENVIRONMENT" != BATCH ]; then
export __BASHRC_SOURCED__=1
module load launcher
fi
############
# SECTION 2 -- environment variables
if [ -z "$__PERSONAL_PATH__" ]; then
export __PERSONAL_PATH__=1
export PATH=.:$HOME/local/bin:$PATH
fi
# For better colors using a dark background terminal, un-comment this line:
#export LS_COLORS=$LS_COLORS:'di=1;33:fi=01:ln=01;36:'
# For better colors using a white background terminal, un-comment this line:
#export LS_COLORS=$LS_COLORS:'di=1;34:fi=01:ln=01;36:'
export LANG="C" # avoid the annoying Perl locale warnings
export BIWORK=/work/projects/BioITeam
export CORENGS=$BIWORK/projects/courses/Core_NGS_Tools
export BI=/corral-repl/utexas/BioITeam
export ALLOCATION=OTH21164 # For ls6 Group is G-824651
##export ALLOCATION=UT-2015-05-18 # For stampede2 Group is G-816696
##########
# SECTION 3 -- controlling the prompt
if [ -n "$PS1" ]; then PS1='ls6:\w$ '; fi
##########
# SECTION 4 -- Umask and aliases
#alias ls="ls --color=always"
alias ll="ls -la"
alias lah="ls -lah"
alias lc="wc -l"
alias hexdump='od -A x -t x1z -v'
umask 002
##########
# Optional Startup Script tracking
if [ -n "$SHELL_STARTUP_DEBUG" ]; then DBG_ECHO "${DBG_INDENT}}"; fi |
There's a lot of stuff here; let's look at just a few things.
Environment variables
The login script sets several environment variables.
Code Block | ||||
---|---|---|---|---|
| ||||
export BIWORK=/work/projects/BioITeam
export CORENGS=$BIWORK/projects/courses/Core_NGS_Tools
|
Environment variables are like variables in other programming languages such as Python or Perl (in fact bash is a complete programming language).
They have a name (like BIWORK above) and a value (the value of $BIWORK is the pathname of the shared /work/projects/BioITeam directory).
To see the value of an environment variable, use the echo command, then the variable name after a dollar sign ( $ ):
Code Block | ||
---|---|---|
| ||
echo $CORENGS
|
We'll use the $CORENGS environment variable to avoid typing out a long pathname:
Code Block | ||
---|---|---|
| ||
ls $CORENGS
|
(Read more about Environment variables)
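The export keyword in the login script matters: it is what makes a variable visible to programs started from the shell. A small sketch, using two made-up variable names:

```shell
MY_SHELL_VAR="only in this shell"                 # plain shell variable (hypothetical name)
export MY_ENV_VAR="visible to child processes"    # exported environment variable (hypothetical name)

# Start a child bash process and ask it what it can see
child_sees_shell_var=$(bash -c 'echo "${MY_SHELL_VAR:-unset}"')
child_sees_env_var=$(bash -c 'echo "${MY_ENV_VAR:-unset}"')
echo "$child_sees_shell_var"
echo "$child_sees_env_var"
```

The child process reports the un-exported variable as unset, which is why the login script exports BIWORK and CORENGS.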
Shell completion with Tab
You can use these environment variables to shorten typing, for example, to look at the contents of the shared /work/projects/BioITeam directory as shown below, using the magic Tab key to perform shell completion.
Tip | ||
---|---|---|
| ||
The Tab key is one of your best friends in Linux. Hitting it invokes shell completion, which is as close to magic as it gets!
|
Follow along with this:
Code Block | ||||
---|---|---|---|---|
| ||||
# hit Tab once to expand the environment variable name
ls $BIW
# hit Tab again to expand the environment variable
ls $BIWORK/
# now hit Tab twice to see the contents of the directory
ls /work/projects/BioITeam/
# type "pr" and hit Tab again
ls /work/projects/BioITeam/pr
# type "co" and hit Tab again
ls /work/projects/BioITeam/projects/co
# type "Co" and hit Tab again
ls /work/projects/BioITeam/projects/courses/Co
# your command line should now look like this
ls /work/projects/BioITeam/projects/courses/Core_NGS_Tools/
# now type "mi" and one Tab
ls /work/projects/BioITeam/projects/courses/Core_NGS_Tools/mi
# your command line should now look like this
ls /work/projects/BioITeam/projects/courses/Core_NGS_Tools/misc/
# now hit Tab once
# There is no unambiguous match, so hit Tab again
# After hitting Tab twice you should see several filenames:
# fastqc/ small.bam small.fq small2.fq
# now type "sm" and one Tab
# your command line should now look like this
ls /work/projects/BioITeam/projects/courses/Core_NGS_Tools/misc/small
# type a period (".") then hit Tab twice again
# You're narrowing down the choices -- you should see two filenames
ls /work/projects/BioITeam/projects/courses/Core_NGS_Tools/misc/small.
# small.bam small.fq
# finally, type "f" then hit Tab again. It should complete to this:
ls /work/projects/BioITeam/projects/courses/Core_NGS_Tools/misc/small.fq |
Extending the $PATH
When you type a command name the shell has to have some way of finding what program to run. The list of places (directories) where the shell looks is stored in the $PATH environment variable. You can see the entire list of locations by doing this:
Code Block | ||||
---|---|---|---|---|
| ||||
echo $PATH
|
As you can see, there are a lot of locations on the $PATH.
Here's how the common login script adds the ~/local/bin directory you created above to the location list, along with a special dot character ( . ) that means "here", or "whatever the current directory is". In the statement below, colon ( : ) separates directories in the list. (Read more about pathname syntax)
Code Block | ||||
---|---|---|---|---|
| ||||
export PATH=.:$HOME/local/bin:$PATH
|
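Because the entries are separated by colons, the list is easier to read one directory per line. This sketch uses only standard tools; the output will differ from system to system.

```shell
# Print each $PATH directory on its own line, in search order
echo "$PATH" | tr ':' '\n'

# Ask the shell which file it would actually run for a command name
ls_location=$(command -v ls)
echo "$ls_location"
```

The shell searches the directories from first to last and runs the first match it finds.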
Setting up the friendly command prompt
The complicated looking if statement in SECTION 3 of your .bashrc sets up a friendly shell prompt that shows the current working directory. This is done by setting the special PS1 environment variable and including a special \w directive that the shell knows means "current directory".
Code Block | ||||
---|---|---|---|---|
| ||||
##########
# SECTION 3 -- controlling the prompt
if [ -n "$PS1" ]; then PS1='ls6:\w$ '; fi
|
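Other backslash directives can be combined with \w in the same way. The prompt below is only an illustration, not the class setting: \u and \h are standard bash prompt escapes for the user name and short host name.

```shell
# \u = user name, \h = short host name, \w = current directory
# \$ shows "$" for ordinary users and "#" for root
PS1='\u@\h:\w\$ '
```

Typing this at the command line changes the prompt for the current session only; the next login runs ~/.bashrc again and restores the ls6 prompt.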
- Cheat 2: Hit the tab key twice - it's almost always magic. This instructs the shell to try to guess what you're doing and finish the typing for you. On most modern linux shells, it works for commands (like "ls" or "scp") and for completing file or directory names.
...
This is really useful if you can't remember whether fasta2fastq.sh is fastaToFastq or fastaToFastq.sh or Fasta2fastq or Fasta2Fastq.sh or something else. It's also helpful for reconstructing directory paths or filenames on-the-fly.
You might write out a long command with a ton of options in the terminal and then find out that you misspelled something at the very beginning of the line. It can be really annoying to hold down the arrow key to get back to that point.
- Cheat 3: You can use control-a (holding down the "control" key and "a") to jump the cursor right to the beginning of the line. The omega to that alpha is control-e, which jumps the cursor to the end of the line. Arrow keys work, and control-arrow will skip by word forward and backward.
Unfortunately, you are pretty much out of luck if you want to jump to the middle of the line. In this case you might want to copy the whole command into a nice text editor on your desktop, change it, and copy it back.
Advanced topic: command line editors.
Exercise:
Type "modu" then hit tab twice - it presents two choices, module and modutil. Type the next character l, hit tab twice and it will complete the rest of the typing. If you hit tab twice again, the OS will show you all the files in your current working directory which doesn't make any sense for the command "module" - it's smart, but not smart enough to figure out that the next word in the command needs to be one of module's built-in commands.
Inline help
Man pages - Linux has had built-in help files since the mid-1500's, way before Macs or PCs thought of such things. In Linux they're called man pages - short for "manual"; it's not a gender thing (I assume). man intro will give you an introduction to all user commands.
Exercise:
Try "man grep", or "man du", or "man sort" - you'll want these sometime.
Tip: Type the letter q to quit man, j or <CR> to move down a line and k to move up a line, spacebar or b to move down or up a page. Want to search? Just hit the slash key /, enter the search word and hit enter. These are actually the tools of the less command which man is using.
Basic Linux commands you need to know like breathing air
- ls - list the contents of the current directory
- pwd - print the present working directory - which restaurant am I at right now - the format is something like /home/myID - just like on most computer systems, this represents leaves on the tree of the file system structure, also called a "path".
- cd <whereto> - change the present working directory to <whereto> - pick up my drive-thru window (shell) and move it so that I'm now looking thru to the directory <whereto>. Some special <wheretos>: .. (period, period) means "up one level". ~ (tilde) means "my home directory". ~myfriend (tilde "myfriend") means "myfriend's home directory".
- df shows you the file systems mounted on the system you're working on, along with how much disk space is available on each
- head <file> and tail <file> show you the top or bottom 10 lines of a file <file>
- more <file> and less <file> both display the contents of <file> in nice ways. Read the bit above about man to figure out how to navigate and search when using less
- file <file> tells you what kind of file <file> is.
- cat <file> outputs all the contents of <file> - CAUTION - only use on small files.
- rm <file> deletes a file. This is permanent - not a "trash can" deletion.
- cp <source> <destination> copies the file source to the location and/or file name destination. Using . (period) means "here, with the same name".
- cp -r <dirname> <destination> will recursively copy the directory dirname and all its contents to the directory destination.
- scp <user>@<host>:<source> <destination> works just like cp but copies source from the user user's directory on remote machine host to the local file destination
- mkdir <dirname> and rmdir <dirname> make and remove the directory "dirname". rmdir only removes empty directories - "rm -r <dirname>" will remove everything.
- wget <url> fetches a file with a valid URL. It's not that common but we'll use wget to pull data from one of TACC's web-based storage devices.
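A short practice session that strings several of these commands together might look like the sketch below. It works entirely inside a temporary directory, and the file and directory names are made up for the example.

```shell
start_dir=$(pwd)
practice_dir=$(mktemp -d)      # scratch directory, removed at the end
cd "$practice_dir"

seq 1 25 > numbers.txt         # make a 25-line practice file
head numbers.txt               # first 10 lines
tail -3 numbers.txt            # last 3 lines
head_count=$(head numbers.txt | wc -l)

mkdir results
cp numbers.txt results/        # copy a file into a directory
cp -r results results_copy     # copy a directory and all its contents
last_line=$(tail -1 results_copy/numbers.txt)

rm numbers.txt                 # permanent, no trash can
rm -r results_copy             # removes the directory and everything in it
rm results/numbers.txt
rmdir results                  # succeeds only because results is now empty
remaining=$(ls)

cd "$start_dir"
rmdir "$practice_dir"
```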
Wildcards and special file names.
The shell has shorthand to refer to groups of files by allowing wildcards in file names. * (asterisk) is the most common; it is a wildcard meaning "any length of any characters". Other useful ones are [] to allow for any one character in the set of characters listed between the brackets, and [] containing a dash (as in [A-Z]) for a range of characters.
For example: ls *.bam lists all files in the current directory that end in .bam; ls [A-Za-z]*.bam does the same, but only if the first character of the file name is a letter (upper or lower case).
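The wildcard patterns above can be tried on a few empty files. This is a sketch with made-up file names; LC_ALL=C is set so that letter ranges and sort order follow plain ASCII rules, which is an assumption of the example (other locales may order the results differently).

```shell
export LC_ALL=C                 # ASCII ordering, so ranges and sorting are predictable
start_dir=$(pwd)
glob_dir=$(mktemp -d)
cd "$glob_dir"
touch sample1.bam Sample2.bam 3rd.bam notes.txt

all_bam=$(echo *.bam)               # every name ending in .bam
letter_bam=$(echo [A-Za-z]*.bam)    # .bam names that start with a letter
digit_bam=$(echo [0-9]*.bam)        # .bam names that start with a digit
echo "$all_bam"
echo "$letter_bam"
echo "$digit_bam"

cd "$start_dir"
rm -r "$glob_dir"
```

The shell expands the pattern before the command runs, so echo is a harmless way to preview what a wildcard will match before handing it to rm or cp.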
Three special file names:
- . (single period) means "this directory".
- .. (two periods) means "directory above current." So ls -l .. means "list contents of the parent directory."
- ~ (tilde) means "my home directory".
Exercises:
Scavenger hunt practice; on Lonestar issue the following commands:
Code Block | ||
---|---|---|
| ||
cp -r /corral-repl/utexas/BioITeam/linuxpractice .
cd linuxpractice
cd what
cat readme
|
and follow the instructions. Hints: use <tab><tab> to fill in filenames as much as you can.
...
Use variables to store where you are, move away, and then back. Try this and see if you can figure out what the shell is doing for you:
Code Block | ||
---|---|---|
| ||
pwd
here=`pwd`
cd /scratch/01057
pwd
cd $here
pwd
|
...
Learn about these few advanced tricks (by trying, man pages, Google...)
Code Block | ||
---|---|---|
| ||
pushd <dirname>
popd
cd -
which <command>
|
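If you want a starting point for experimenting, here is one possible exploration of pushd, popd and cd - in bash. The temporary directory is created only for the sketch.

```shell
start_dir=$(pwd)
stack_dir=$(mktemp -d)

pushd "$stack_dir" > /dev/null   # go to stack_dir, remembering start_dir
pushd /tmp > /dev/null           # go to /tmp, remembering stack_dir
popd > /dev/null                 # back to stack_dir
popd > /dev/null                 # back to start_dir
after_popd=$(pwd)

cd /tmp
cd "$start_dir"
cd - > /dev/null                 # "cd -" returns to the previous directory (/tmp)
after_cd_dash=$(pwd)

cd "$start_dir"
rmdir "$stack_dir"
```

pushd and popd keep a stack of directories, while cd - only remembers the single previous one. Without the redirection to /dev/null, each pushd and popd prints the current stack.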
If you've done all those too, you might consider looking over some advanced command-line tool usage
Options: the lifeblood of linux commands
Sitting at the computer, you should have some idea what you need to do. There's probably a command to do it. If you have some idea what it starts with, you can type a few characters and hit tab twice to get some help. If you have no idea, you google it or ask someone else. But soon you want those commands to do a bit more - like seeing the sizes of files in addition to their names.
Most commands in linux use a common syntax to ask more of a command; they usually add a dash "-" followed by a code letter that means "do the basic command, but with a bit more..."
Code Block | ||
---|---|---|
| ||
ls -l
ls -lh
ls -t
|
These little toggle-like things are often called "command line switches"; there can be other options, like filenames, that aren't switches.
Almost all commands, and especially NGS tools, use options heavily.
Like dialects in a language, there are at least three basic schemes commands/programs accept options in:
- One letter options which can sometimes be combined, or other single options like:
Code Block | ||
---|---|---|
| ||
# Examples of different option types
head -10
ls -lhtS    # equivalent to ls -l -h -t -S
|
- Word options, like -d64 and -Xms512m in this command, that are never combined (this is the GATK command to call SNPs):
Code Block | ||
---|---|---|
| ||
# Examples of word options
java -d64 -Xms512m -Xmx4g -jar /work/01866/phr254/gshare/Tools_And_Programs/bin/GenomeAnalysisTK.jar -glm BOTH -R $reference -T UnifiedGenotyper -I $outprefix.realigned.recal.bam --dbsnp $dbsnp -o $outprefix.snps.vcf -metrics snps.metrics -stand_call_conf 50.0 -stand_emit_conf 10.0 -dcov 1000 -A DepthOfCoverage -A AlleleBalance
|
- "Long option" forms, using the convention that a single dash - precedes single-letter options, and double dashes -- precede word options, like this command to run the mira assembler:
Code Block | ||
---|---|---|
| ||
# Example of long options
mira --project=ct --job=denovo,genome,accurate,454 -SK:not=8
|
man pages should detail all options available for a command. Unless there's no man page.
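To see why ls -lht and ls -l -h -t mean the same thing, it can help to look at how a script typically reads one-letter options. The sketch below uses the shell's built-in getopts; show_flags is a hypothetical function written for this example, not a real command.

```shell
# show_flags is a made-up function that reports which one-letter switches it received
show_flags() {
    local OPTIND=1 opt seen=""
    while getopts "lht" opt; do          # accept the switches -l, -h and -t
        case "$opt" in
            l|h|t) seen="$seen$opt " ;;
            *) return 1 ;;               # unknown switch
        esac
    done
    echo "flags: ${seen% }"              # strip the trailing space
}

combined=$(show_flags -lht)
separate=$(show_flags -l -h -t)
echo "$combined"
echo "$separate"
```

Both calls report the same three flags, because getopts treats a cluster of letters after one dash as separate switches. Programs that use word options, such as the java command above, do their own parsing and do not allow this.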
More help please
Sometimes man lets you down - no man page. Don't fret, try one of these:
- Just type in the command and hit return - it will usually try to help you.
- Type the command followed by one of -h, -help, --help, or -? and it may give you some help.
Sometimes the command by itself will give you short help, and will list the magic option for full help.
...
First do:
Code Block |
---|
module load blast
|
Now figure out how to run some kind of blast program on lonestar with options. Hints: try <tab><tab>, man, running some blast command, use options to figure out other options.
I've put nr, nt, and refseq_rna blast databases on Lonestar here:
/corral-repl/utexas/BioITeam/blastdb/
along with a test sequence: the human JAG1 gene, here:
/corral-repl/utexas/BioITeam/sphsmith/jag1.fa
...