ssh is an executable program that runs on your local computer and allows you to connect securely to a remote computer.
ssh programs exist for all major operating systems - Windows, Mac, and linux. Mac and linux come with these commands built-in; Windows needs some help. If you're using a Windows box and are part of UT Austin, Bevoware provides two free ssh programs, "ssh secure client" and "putty". We won't describe how to use these here - you're on your own to get this started.
Let's try it now by entering this:
ssh <your user ID>@lonestar.tacc.utexas.edu
When you log in to a linux computer, the operating system checks your login credentials and if they're OK it sets up some configuration for you and then runs a program called a "shell" which acts like your fast-food drive-thru window to the rest of the operating system. You type commands and hit "enter" to send something into the drive-thru window, and then the OS passes output back through the drive-thru window.
Every time you exchange stuff through this window, it's within a context, like one specific drive-thru window at one restaurant. The directory within the file system is one part of that context; the programs and environment variables available to you are other parts of that context. When you log in, the system and shell agree that you'll start off in your home directory on the system.
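To make "context" a little more concrete, here is a minimal sketch of commands that show the pieces of it mentioned above. It only reads standard shell variables (HOME, SHELL, PATH); what they contain will differ from system to system.

```shell
# Where am I right now? (the directory part of the context)
pwd

# Where is my home directory, and which shell was started for me?
echo "$HOME"
echo "$SHELL"

# Where does the shell look for programs? (one directory per line)
echo "$PATH" | tr ':' '\n'

# A bare "cd" takes you back to your home directory
cd
pwd
```

These are all safe to run at any time - they only print information.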
Create a new window that will allow you to have a second login to Lonestar, and again use:
ssh <your user ID>@lonestar.tacc.utexas.edu
to connect to Lonestar.
Once you're logged in, issue this command (we'll explain later):
idev -m 200 -q serial
If you have any errors, try to copy and paste the command from the wiki into the shell.
Please switch back to your first login shell once you've entered the idev command in the second shell.
Type as little and as accurately as possible by cheating:
This taps into a feature of the shell - your history.
You might write out a long command with a ton of options in the terminal and then find out that you misspelled something at the very beginning of the line. It can be really annoying to hold down the arrow key to get back to that point. Use control-a (holding down the "control" key and "a") to jump the cursor right to the beginning of the line. The omega to that alpha is control-e, which jumps the cursor to the end of the line. Arrow keys work, and control-arrow will skip by word forward and backward. Unfortunately, you are pretty much out of luck if you want to jump to the middle of the line. In this case you might want to copy the whole command into a nice text editor on your desktop, change it, and copy it back.
Advanced topic: command line editors.
Type "modu" then hit tab twice - it presents two choices, module and modutil. Type the next character l, hit tab twice and it will complete the rest of the typing. If you hit tab twice again, the shell will show you all the files in your current working directory, which doesn't make any sense for the command "module" - it's smart, but not smart enough to figure out that the next word in the command needs to be one of module's built-in commands.
Man pages - linux has had built-in help files since the mid-1500's, way before Macs or PCs thought of such things. In linux they're called man pages - short for "manual"; it's not a gender thing (I assume). man intro will give you an introduction to all user commands.
Try "man grep", or "man du", or "man sort" - you'll want these sometime.
Tip: Type the letter q to quit man, j or <CR> to move down and k to move up by line, spacebar or b to move down or up by page. Want to search? Just hit the slash key /, enter the search word and hit enter. These are actually the tools of the less command which man is using.
* ls - list the contents of the current directory
* pwd - print the present working directory - which restaurant am I at right now - the format is something like /home/myID - just like on most computer systems, this represents leaves on the tree of the file system structure, also called a "path".
* cd <whereto> - change the present working directory to <whereto> - pick up my drive-thru window (shell) and move it so that I'm now looking thru to the directory <whereto>
  * <wheretos>: .. (period, period) means "up one level". ~ (tilde) means "my home directory". ~myfriend (tilde "myfriend") means "myfriend's home directory".
* df shows you the top level of the directory structure (the mounted file systems) of the system you're working on, along with how much disk space is available
* head <file> and tail <file> show you the top or bottom 10 lines of a file <file>
* more <file> and less <file> both display the contents of <file> in nice ways. Read the bit above about man to figure out how to navigate and search when using less
* file <file> tells you what kind of file <file> is.
* cat <file> outputs all the contents of <file> - CAUTION - only use on small files.
* rm <file> deletes a file. This is permanent - not a "trash can" deletion.
* cp <source> <destination> copies the file source to the location and/or file name destination. Using . (period) means "here, with the same name".
* cp -r <dirname> <destination> will recursively copy the directory dirname and all its contents to the directory destination.
* scp <user>@<host>:<source> <destination> works just like cp but copies source from the user user's directory on remote machine host to the local file destination
* mkdir <dirname> and rmdir <dirname> make and remove the directory "dirname". rmdir only removes empty directories - "rm -r <dirname>" will remove everything.
* wget <url> fetches a file with a valid URL. It's not that common but we'll use wget to pull data from one of TACC's web-based storage devices.
The shell has shorthand to refer to groups of files by allowing wildcards in file names. * (asterisk) is the most common; it is a wildcard meaning "any length of any characters". Other useful ones are [<characters>] to allow for any single character in the set <characters> and [<first>-<last>] for a range of characters.
For example: ls *.bam lists all files in the current directory that end in .bam; ls [A-Za-z]*.bam does the same, but only if the first character of the file name is a letter (upper or lower case).
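If you want to see wildcards in action without touching any real data, here is a small sketch that works in a throwaway directory. The file names (sampleA.bam and friends) and the variable names are made up for the demo, and it assumes mktemp is available, as it is on most linux and Mac systems.

```shell
# Work in a throwaway directory so nothing real is touched
scratch=$(mktemp -d)
cd "$scratch"

# Make a few empty files to match against (names are invented for this demo)
touch sampleA.bam sampleB.bam 1control.bam notes.txt

ls                      # everything in the current directory
ls *.bam                # every name ending in .bam
ls [A-Za-z]*.bam        # .bam names whose first character is a letter
ls sample[AB].bam       # sampleA.bam or sampleB.bam only

# Keep the matches in variables so they can be looked at afterwards
all_bam=$(echo *.bam)
letter_bam=$(echo [A-Za-z]*.bam)
echo "$all_bam"
echo "$letter_bam"

# Clean up: step out of the directory, then remove it and its contents
cd /
rm -r "$scratch"
```

Note that it is the shell, not ls, that expands the wildcard - which is why echo shows the same matches.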
Three special file names:
* . (single period) means "this directory".
* .. (two periods) means "directory above current." So ls -l .. means "list contents of the parent directory."
* ~ (tilde) means "my home directory".
Scavenger hunt practice; on Lonestar issue the following commands:
cp -r /corral-repl/utexas/BioITeam/linuxpractice .
cd linuxpractice
cd what
cat readme
and follow the instructions. Hints: use <tab><tab> to fill in filenames as much as you can.
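If you're not on Lonestar, the same moves can be rehearsed locally. This is only a sketch: the directory tree it builds (source/practice/what/readme) is invented for the example and is not the real scavenger hunt.

```shell
# Build a small practice tree in a throwaway location
scratch=$(mktemp -d)
mkdir "$scratch/source" "$scratch/work"
mkdir "$scratch/source/practice" "$scratch/source/practice/what"
printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 11 12 > "$scratch/source/practice/what/readme"

# Copy the whole directory "here, with the same name", then walk into it
cd "$scratch/work"
cp -r "$scratch/source/practice" .
cd practice
cd what
pwd

# Look at the file without dumping all of it
head readme                     # first 10 lines
tail readme                     # last 10 lines
first=$(head -n 1 readme)
last=$(tail -n 1 readme)

# .. is the parent directory, . is the current one
ls -l ..
cd ..
here=$(basename "$(pwd)")
echo "$here"

# Clean up
cd /
rm -r "$scratch"
```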
Use variables to store where you are, move away, and then back. Try this and see if you can figure out what the shell is doing for you:
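One way to do this, as a minimal sketch (START_DIR is just an illustrative variable name, and /tmp stands in for "somewhere else"):

```shell
# Remember the current directory in a variable
START_DIR=$(pwd)

# Wander off somewhere else
cd /tmp
pwd

# Come straight back using the stored value
cd "$START_DIR"
pwd

# The shell also remembers the previous directory for you in OLDPWD
echo "$OLDPWD"
```

The quotes around "$START_DIR" matter if the directory name contains spaces.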
Learn about these few advanced tricks (by trying,
If you've done all those too, you might consider looking over some advanced command-line tool usage
Sitting at the computer, you should have some idea what you need to do. There's probably a command to do it. If you have some idea what it starts with, you can type a few characters and hit tab twice to get some help. If you have no idea, you google it or ask someone else. But soon you want those commands to do a bit more - like seeing the sizes of files in addition to their names.
Most commands in linux use a common syntax to ask more of a command; they usually add a dash "-" followed by a code letter that means "do the basic command, but with a bit more..."
ls -l
ls -lh
ls -t
These little toggle-like things are often called "command line switches"; there can be other options, like filenames, that aren't switches.
Almost all commands, and especially NGS tools, use options heavily.
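To see what those ls switches actually change, here is a small sketch using two throwaway files. The file names, sizes and dates are invented for the demo; it assumes mktemp and head -c are available, as they are on typical linux and Mac systems.

```shell
scratch=$(mktemp -d)
cd "$scratch"

# Two files with different sizes and modification times
printf 'tiny\n' > old_small.txt
head -c 5000 /dev/zero > new_big.bin
touch -t 202001010000 old_small.txt     # pretend it was last changed in 2020
touch -t 202101010000 new_big.bin       # pretend it was last changed in 2021

ls          # names only
ls -l       # long listing: permissions, owner, size in bytes, date
ls -lh      # same, but sizes in human-readable units
ls -t       # names only, newest first

newest=$(ls -t | head -n 1)
echo "$newest"

# Clean up
cd /
rm -r "$scratch"
```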
Like dialects in a language, there are at least three basic schemes by which commands/programs accept options:
* Single-letter options preceded by a single dash, which can often be combined, like:
head -10
ls -lhtS (equivalent to ls -l -h -t -S)
* Multi-character options preceded by a single dash, like -d64 and -Xms512m in this command, that are never combined (this is the GATK command to call SNPs):
java -d64 -Xms512m -Xmx4g -jar /work/01866/phr254/gshare/Tools_And_Programs/bin/GenomeAnalysisTK.jar -glm BOTH -R $reference -T UnifiedGenotyper -I $outprefix.realigned.recal.bam --dbsnp $dbsnp -o $outprefix.snps.vcf -metrics snps.metrics -stand_call_conf 50.0 -stand_emit_conf 10.0 -dcov 1000 -A DepthOfCoverage -A AlleleBalance
* Double dashes -- precede word options, like this command to run the mira assembler:
mira --project=ct --job=denovo,genome,accurate,454 -SK:not=8
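The first scheme is easy to check for yourself. This sketch compares the bundled and spelled-out forms of the ls example above on two made-up files, and does the same for head's line-count option; it assumes an ls that understands -h and -S, which is the case for the usual linux and Mac versions.

```shell
scratch=$(mktemp -d)
cd "$scratch"
printf 'a\n' > one.txt
printf 'bbbbbbbbbb\n' > two.txt

# Short options can be given one at a time or bundled behind a single dash
separate=$(ls -l -h -t -S)
combined=$(ls -lhtS)
if [ "$separate" = "$combined" ]; then
    echo "ls -l -h -t -S and ls -lhtS give the same listing"
fi

# head takes its line count as -2 or, spelled out, as -n 2
short_form=$(printf '1\n2\n3\n' | head -2)
n_form=$(printf '1\n2\n3\n' | head -n 2)
echo "$n_form"

# Clean up
cd /
rm -r "$scratch"
```

Bundling only works for single-letter options; something like -Xms512m in the java command has to be typed exactly as shown.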
man pages should detail all options available for a command. Unless there's no man page.
Sometimes man lets you down - no man page. Don't fret, try one of these options after the command name:
-h
-help
--help
-?
Any of these may give you some help.
First do:
module load blast
Now figure out how to run some kind of blast program on lonestar with options. Hints: try <tab><tab>, man, running some blast command, use options to figure out other options.
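If you get tired of trying the help options by hand, a small shell function can cycle through them. This is only a sketch: try_help is a made-up helper name, not a standard command, and it simply reports the first option that makes the program print anything at all, which may be a usage or error message rather than full help. Only point it at programs you trust, since it really does run them.

```shell
# try_help is a hypothetical helper: try each common help option in turn
try_help() {
    cmd=$1
    for flag in -h -help --help '-?'; do
        # Many programs print help on stderr, so capture both streams;
        # </dev/null keeps a program from waiting for keyboard input
        out=$("$cmd" "$flag" 2>&1 </dev/null | head -n 5)
        if [ -n "$out" ]; then
            echo "== $cmd $flag =="
            echo "$out"
            return 0
        fi
    done
    echo "no help found for $cmd"
    return 1
}

# grep is used here only because it is installed almost everywhere
try_help grep
```

On Lonestar you could call it with the name of whichever blast program you found with <tab><tab>.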
I've put nr, nt, and refseq_rna blast databases on Lonestar here:
/corral-repl/utexas/BioITeam/blastdb/
along with a test sequence: the human JAG1 gene, here:
/corral-repl/utexas/BioITeam/sphsmith/jag1.fa
Now let's go on to establishing a useful login profile on Lonestar.