This page should serve as a reference for the many "things Linux" we use in this course. It is by no means complete – Linux is **huge** – but offers introductions to many important topics.
Terminal programs
...
- Macs and Linux have a Terminal program built-in
- Windows options:
- Windows 10+
- Command Prompt and PowerShell both have ssh and scp (may require the latest Windows updates)
- Start menu → Search for Command
- Windows Subsystem for Linux – Windows 10 Professional includes an Ubuntu-like bash shell
- See https://docs.microsoft.com/en-us/windows/wsl/install-win10
- We recommend the Ubuntu Linux distribution, but any Linux distribution will have an SSH client
- or
- Putty – http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html
- simple Terminal and file copy programs
- download either the Putty installer (https://the.earth.li/~sgtatham/putty/latest/w64/putty-64bit-0.70-installer.msi)
- or just putty.exe (terminal) and pscp.exe (secure copy client)
- Cygwin – http://www.cygwin.com/
- A full Linux environment, including X-windows for running GUI programs remotely
- Complicated to install
...
- brackets ( [ ] ) to allow for any character in the list of characters between the brackets.
- and you can use a hyphen ( - ) to specify a range of characters (e.g. [A-G])
- braces ( { } ) enclose a list of comma-separated strings to match (e.g. {dog,pony})
For example:
- ls *.bam – lists all files in the current directory that end in .bam
- ls [A-Z]*.bam – does the same, but only if the first character of the file name is a capital letter
- ls [ABcd]*.bam – lists all .bam files whose 1st letter is A, B, c or d.
- ls *.{fastq,fq}.gz – lists all .fastq.gz and .fq.gz files.
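To try these patterns without touching real data, the sketch below builds a scratch directory of empty files and lists them with each pattern. The file names are made up for illustration. It assumes bash (brace expansion is a bash feature) and sets the C locale so that [A-Z] matches capital letters only.

```bash
# Assumption: bash; LC_ALL=C makes [A-Z] match capital letters only and fixes the sort order
export LC_ALL=C
glob_dir=$(mktemp -d)          # scratch directory so no real files are touched
cd "$glob_dir"

# Hypothetical file names, made up for this illustration
touch Asample.bam Bsample.bam csample.bam xsample.bam reads1.fastq.gz reads2.fq.gz

all_bam=$(ls *.bam)            # all four .bam files
caps_bam=$(ls [A-Z]*.bam)      # Asample.bam and Bsample.bam
abcd_bam=$(ls [ABcd]*.bam)     # Asample.bam, Bsample.bam and csample.bam
fastqs=$(ls *.{fastq,fq}.gz)   # reads1.fastq.gz and reads2.fq.gz

echo "$caps_bam"
echo "$fastqs"

cd / && rm -rf "$glob_dir"     # clean up the scratch directory
```

Note that the shell expands each pattern before ls runs, so ls only ever sees the matching file names.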
...
- single quoting (e.g. 'some text') – this serves two purposes
- It groups together all text inside the quotes into a single argument that is passed to the command.
- It tells the shell not to "look inside" the quotes to perform any evaluations.
- Anything in the text that looks like an environment variable or a bash meta-character is not evaluated.
- No pathname globbing (e.g. *) is performed.
- double quoting (e.g. "some text") – also serves two purposes
- It groups together all text inside the quotes into a single argument that is passed to the command.
- It allows environment variable evaluation (but inhibits pathname globbing).
- backtick quoting (e.g. `date`)
- It evaluates the expression inside the backticks.
- The resulting standard output of the expression replaces the backticked text.
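The three quoting styles can be compared side by side. This is a minimal sketch: the variable animal and the helper function count_args are hypothetical names introduced only for this illustration.

```bash
# Hypothetical variable and helper, used only for this illustration
animal="pony"
count_args() { echo $#; }          # prints how many arguments it received

single='my $animal costs *'        # single quotes: nothing inside is evaluated
double="my $animal costs *"        # double quotes: $animal is replaced, * is left alone
year=`date +%Y`                    # backticks: replaced by the command's standard output

echo "$single"                     # prints: my $animal costs *
echo "$double"                     # prints: my pony costs *
echo "The year is $year"

count_args some text               # prints 2: two separate arguments
count_args 'some text'             # prints 1: quoting groups the words into one argument
```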
Using Commands
Command options
...
Single-letter short options, which start with a single dash ( - ) and can often be combined, like:
Examples of different short options:

```bash
head -20     # show 1st 20 lines
ls -lhtS     # equivalent to ls -l -h -t -S
```
Long options use the convention that double dashes ( -- ) precede the multi-character option name, and they can never be combined. Strictly speaking, long options are a GNU extension to the option syntax in the POSIX standard (see https://en.wikipedia.org/wiki/POSIX), and the GNU convention separates a long option from its value with an equals sign ( = ). But most programs also let you use a space as the separator. Here's an example using the mira genome assembler:
Example of long options:

```bash
mira --project=ct --job=denovo,genome,accurate,454 -SK:not=8
```
Word options, illustrated in the GATK command line to call SNPs below.
Word options combine aspects of short and long options – they usually start with a single dash ( - ), but can be multiple letters and are never combined.
Sometimes the option (e.g. java's -Xms initial memory heap size option) and its value (512m, which means 512 megabytes) are smashed together (-Xms512m).
Other times a multi-letter switch and its value are separated by a space (e.g. -glm BOTH).
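The short and long styles can be tried with standard tools. This sketch assumes GNU coreutils (the default on Linux), since the --lines long option is a GNU feature; the scratch file and its contents are made up for illustration.

```bash
# Assumption: GNU coreutils versions of head and ls
opt_file=$(mktemp)                          # scratch file with three made-up lines
printf 'line1\nline2\nline3\n' > "$opt_file"

short_space=$(head -n 2 "$opt_file")        # short option, value after a space
short_joined=$(head -n2 "$opt_file")        # short option with its value attached
long_equals=$(head --lines=2 "$opt_file")   # long option, value after an equals sign
long_space=$(head --lines 2 "$opt_file")    # long option, value after a space

combined=$(ls -lh "$opt_file")              # combined short options
separate=$(ls -l -h "$opt_file")            # the same options given one by one

echo "$long_equals"
rm "$opt_file"                              # clean up the scratch file
```

All four head invocations print the same first two lines, and the two ls invocations produce the same listing.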
...
Notice that bwa, like many NGS programs, is written as a set of sub-commands. This top-level help displays the sub-commands available. You then type bwa <command> to see help for the sub-command:
...
- ls - list the contents of the specified directory
- -l says produce a long listing (including file permissions, sizes, owner and group)
- -a says show all files, even normally-hidden dot files whose names start with a period ( . )
- -h says to show file sizes in human readable form (e.g. 12M instead of 12201749)
- cd <whereto> - change the current working directory to <whereto>. Some special <wheretos>:
- .. (period, period) means "up one level"
- ~ (tilde) means "my home directory"
- file <file> tells you what kind of file <file> is
- df shows you the file systems mounted on the system you're working on, along with how much disk space is used and available on each
- -h says to show sizes in human readable form (e.g. 12G instead of 12318201749)
- pwd - display the present working directory
- -P says to display the full physical path, with any symbolic links resolved
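A short session using these commands might look like the sketch below. The directory layout and file names are made up, and everything happens in a scratch directory created by mktemp.

```bash
nav_dir=$(mktemp -d)                 # scratch directory; its name is generated
mkdir "$nav_dir/sub"
touch "$nav_dir/sub/.hidden" "$nav_dir/sub/visible.txt"   # hypothetical file names

cd "$nav_dir/sub"
plain=$(ls)                          # dot files are not listed
with_all=$(ls -a)                    # also lists ., .. and .hidden
ls -lh                               # long listing with human readable sizes

cd ..                                # up one level, into the scratch directory
up_one=$(pwd -P)                     # physical path of the scratch directory
df -h .                              # disk space on the file system holding it

cd /                                 # leave the scratch directory before deleting it
rm -rf "$nav_dir"
```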
Create, rename, link to, delete files
- touch <file> – create an empty file, or update the modification timestamp on an existing file
- mkdir -p <dirname> – create directory <dirname>.
- -p says to create any needed parent directories also
- mv <file1> <file2> – renames <file1> to <file2>
- mv <file1> <file2> ... <fileN> <to_dir>/ – moves files <file1> <file2> ... <fileN> into directory <to_dir>
- mv -t <to_dir> <file1> <file2> ... <fileN> – same as above but specifies the target directory as an option (-t <to_dir>).
- ln -s <path> creates a symbolic (-s) link to <path> in the current directory
- a symbolic link can be deleted without affecting the linked-to file
- default link name corresponds to the last name component in <path>
- always change into (cd) the directory where you want the link before executing ln -s
- rm <file> deletes a file. This is permanent - not a "trash can" deletion.
- rm -rf <dirname> deletes an entire directory – be careful!
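The commands above can be practiced safely in a scratch directory. This is a minimal sketch; the file and directory names (notes.txt, project, and so on) are hypothetical.

```bash
work_dir=$(mktemp -d)                # scratch directory so nothing real is deleted
cd "$work_dir"

touch notes.txt                      # create an empty file
mkdir -p project/data/raw            # -p also creates project/ and project/data/
mv notes.txt readme.txt              # rename notes.txt to readme.txt
touch a.txt b.txt
mv a.txt b.txt project/data/         # move several files into a directory
moved=$(ls project/data)             # a.txt, b.txt and raw

cd project                           # cd to where the link should live
ln -s ../readme.txt                  # link is named readme.txt, after the last path component
link_target=$(readlink readme.txt)   # shows what the link points to
rm readme.txt                        # deletes only the link
cd ..
[ -e readme.txt ] && original_survives=yes   # the linked-to file is untouched

rm -rf project                       # deletes the directory and everything in it
[ -e project ] || project_gone=yes

cd / && rm -rf "$work_dir"           # clean up the scratch directory
```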
...