UNIXary

This is a work in progress. If you want something added or explained better, please let us know.

argument

argument(s) are provided to commands to tell them what to act on. For example, a file or #directory #path could be provided to the ls command to alter its default behavior of listing out the current working #directory.

cat

cat displays the contents of the files listed as its arguments, in order, even if the output scrolls off the screen. It is more useful with standard output redirection; eg,

cat a b c > d

concatenates files a, b, c and stores the result in d.
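
The concatenation above can be tried safely in a scratch directory. This is a minimal sketch; the file contents are made up for illustration.

```shell
# Scratch directory so no real files are touched.
tmp=$(mktemp -d)
cd "$tmp"

printf 'first\n'  > a
printf 'second\n' > b
printf 'third\n'  > c

# Concatenate a, b, and c in argument order and store the result in d.
cat a b c > d

result=$(cat d)
echo "$result"

# Clean up the scratch directory.
cd / && rm -rf "$tmp"
```

The lines in d appear in the same order as the arguments given to cat.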

cd

cd is the command used to change your current working #directory; eg,

cd /share/apps

cp

The copy command, cp, is used to copy files from one #pathname to another. The basic usage is

cp src dest

where dest may be a file or a directory; if it is a directory, the file dest/src is created. More than one src can be provided, in which case the destination must be a directory. For example,

cp src1 src2 dest

creates the files dest/src1 and dest/src2. cp provides some useful #options:

-p => preserve the permissions and timestamps of the original files in the newly created copies
-r => copy directories and their contents recursively
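
A small sketch of these options in a scratch directory. All file and directory names here are made up for illustration, and the old timestamp is set by hand only so the effect of -p is visible.

```shell
tmp=$(mktemp -d)
cd "$tmp"

mkdir dest tree
echo one > src1
echo two > src2
echo deep > tree/leaf

# Several sources: the destination must be a directory.
cp src1 src2 dest
copied=$(ls dest)

# Give src1 an old modification time, then copy it with and without -p.
touch -t 200001010000 src1
cp src1 plain       # plain gets the current time
cp -p src1 kept     # kept keeps the old time of src1

# List which of the two copies are newer than src1.
newer=$(find plain kept -newer src1)

# -r copies a directory together with its contents.
cp -r tree treecopy
leaf=$(cat treecopy/leaf)

echo "$copied"; echo "$newer"; echo "$leaf"

cd / && rm -rf "$tmp"
```

Only the copy made without -p shows up as newer than the original.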

diff

Environment

Every program runs in an environment, which is a collection of variables that it inherits from its parent, and which it passes on to its children. Programs use these variables to modify how they run. One important variable is #PATH which is used to find executable programs, but there are many others. For example, BLAST uses a variable called BLASTDB to find its databases.
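
The inheritance described above can be sketched as follows. MYDB and its value are hypothetical, standing in for a variable such as BLASTDB; each sh -c starts a child process.

```shell
# MYDB is a hypothetical variable name used only for this sketch.
unset MYDB
MYDB=/tmp/databases     # set in this shell only, not yet in the environment

# A child process does not see a variable that has not been exported.
unexported_view=$(sh -c 'echo "${MYDB:-unset}"')

export MYDB             # now part of the environment passed to children
child_view=$(sh -c 'echo "$MYDB"')

# A child can change its own copy, but the parent's value is untouched.
sh -c 'MYDB=/elsewhere'
parent_view=$MYDB

echo "$unexported_view"; echo "$child_view"; echo "$parent_view"
```

Variables flow from parent to child only; nothing a child does to its environment is visible to the parent.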

globs

Globs are shell constructs which are used on the command line to stand for patterns of file names. They are related to Regular Expressions, so care must be taken to understand their differences. There are 3 types of globs:

* which matches any number of characters or none at all
? which matches any single character
[] which matches any single one of the characters listed inside the brackets

Some example uses:

ls *.tar.gz (matches any file ending in .tar.gz)
ls c?t (matches cut, cat, and any other 3 letter words starting with c and ending with t)
ls c[au]t (matches cut, cat, but not other 3 letter words starting with c and ending with t)

WARNING When using globs with rm you need to be very careful to make sure the glob doesn't match more than intended, even more so when rm is used with -r. Running ls (possibly with the -d option) on the same glob first is suggested. Some people run into problems because * does not match files whose names begin with a leading . as these are 'hidden' files, and are usually config files of some sort. This leads them to run

rm .*

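and disaster ensues if -r is also specified. This is because .* can match the special directory entry .. which stands for the parent directory. Use .??* instead as a glob for hidden files; it cannot match . or .. because it requires at least two characters after the dot (for the same reason it also skips two-character names such as .a).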
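
A safe way to see what a glob matches is to hand it to echo (or ls -d) before handing it to rm. The sketch below uses made-up file names in a scratch directory. Whether .* includes . and .. depends on the shell and its version, so no expected output is shown for that line.

```shell
tmp=$(mktemp -d)
cd "$tmp"
touch cat cut cot coat data.tar.gz .hidden

star=$(echo *.tar.gz)      # data.tar.gz
question=$(echo c?t)       # cat cot cut
bracket=$(echo c[au]t)     # cat cut
hidden=$(echo .??*)        # .hidden

echo "$star"; echo "$question"; echo "$bracket"; echo "$hidden"

# Shell dependent: in many shells this also lists . and ..
echo .*

cd / && rm -rf "$tmp"
```

Note that coat is not matched by c?t, because ? stands for exactly one character.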

grep

head/tail

kernel

The kernel is the core of the operating system. It is responsible for managing devices like disk drives, enforcing user permissions and access controls, and scheduling processes to be run on the CPU.

ln

The ln command creates links, which are file tree entries. Most common are symbolic links, which are just alternate names that point to an existing name. You create a symbolic link using

ln -s old new

where old and new are paths. Less common, at least for normal use, are hard links. A hard link is a second name for the same blocks of data. To create a hard link you use

ln old new

They are different in many ways, but the most notable difference is that with symbolic links you know that the link exists. Thus, if you remove the old file then the link dangles, waiting for it to be recreated. On the other hand, there is no overt indication that a hard link exists. Either name may be removed without affecting the fact that the other still exists, and can be used to refer to the blocks on the disk that store a particular file. The next major difference is that you cannot hard link a directory, but you can symbolically (soft) link one. Finally, symbolic links can be seen using ls -l which shows them as new -> old.
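
The difference between the two kinds of link shows up when the original name is removed. A minimal sketch in a scratch directory, where the names old, soft, and hard are chosen only for illustration:

```shell
tmp=$(mktemp -d)
cd "$tmp"
echo data > old

ln -s old soft    # symbolic link: a name that points at the name "old"
ln old hard       # hard link: a second name for the same data

listing=$(ls -l soft)     # the line ends in: soft -> old

rm old                    # remove the original name

# The hard link still reaches the data.
hard_content=$(cat hard)

# The symbolic link now dangles, since the name it points at is gone.
if [ -e soft ]; then soft_state=ok; else soft_state=dangling; fi

echo "$listing"; echo "$hard_content"; echo "$soft_state"

cd / && rm -rf "$tmp"
```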

ls

ls is, next to cd, probably the most used UNIX command. It is used to list out the contents of a directory. For example, you can do

ls path

to get a listing of #path. The #path #argument is optional; if you don't provide one, then the current working #directory is listed out. If #path is a directory, the contents of the directory are listed out. If #path is a file, then only the file is listed out, which at least confirms its existence. You can provide multiple paths, too.

ls has many useful #options:

-l => show also meta-data such as modification and access times, permissions, sizes, and ownerships
-a => show all files (by default, files which start with . are not listed out since they are usually configuration files)
-d => list out a directory, and not its contents
-t => sort by modification time (puts newly created or modified files at top)
-r => reverse sort
-F => add 1 character at end to indicate file type:
- * => executable (or at least executable permissions granted)
- / => directory
- other characters mark other file types (eg, @ for a symbolic link)
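
A few of these options can be compared side by side in a scratch directory. The names dir, plain, run, and .hidden are made up for this sketch, and exact ordering and columns vary between systems, so no expected output is shown.

```shell
tmp=$(mktemp -d)
cd "$tmp"
mkdir dir
touch plain .hidden
printf '#!/bin/sh\n' > run
chmod +x run

all=$(ls -a)           # includes names that start with .
flagged=$(ls -F)       # marks dir with / and run with *
dir_only=$(ls -d dir)  # the directory itself, not its contents

echo "$all"; echo "$flagged"; echo "$dir_only"

cd / && rm -rf "$tmp"
```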

man

The man command displays the manual page (as in Read The Fine Manual) for a particular command. To view more about the use of man run the command

man man

The -k option allows you to search the manual by keyword (eg, man -k list shows you all of the commands with list in their synopsis), which might be a way to search for information about listing files with the ls command. Finding and reading man pages is an art that needs practice.

mkdir

mkdir is the command to create the one or more directories whose pathnames are provided as arguments. It has one useful option, -p, which is used to create any missing intermediate #path elements. For example,

mkdir /home/foobar/doesnotexist/newdir

fails if any of /home, /home/foobar, or /home/foobar/doesnotexist does not exist. When -p is specified, though, any needed directories are created.
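
The same behavior can be seen on a small scale in a scratch directory. The path a/b/c is made up for this sketch.

```shell
tmp=$(mktemp -d)
cd "$tmp"

# Without -p this fails, because the intermediate directories a and a/b are missing.
if mkdir a/b/c 2>/dev/null; then plain=created; else plain=failed; fi

# With -p the missing directories are created along the way.
mkdir -p a/b/c
if [ -d a/b/c ]; then withp=created; else withp=failed; fi

echo "$plain"; echo "$withp"

cd / && rm -rf "$tmp"
```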

more / less

more and less are used to view the contents of one or more files a screenful at a time. They differ because less allows you to move both forwards and backwards in the file. See also head/tail.

mv

The move command, mv, is used when you want to rename a file or move it to another #directory. The basic usage is

mv src dest

NFS

options

options are used to modify the behavior of a command.

Permissions

pathname or path

UNIX stores and locates files in a tree fashion. The top of the tree is called / (referred to as #root). / contains several directories such as etc, usr, home, and so on. These can be referred to by the pathnames /etc, /usr, /home, etc. These may contain other directories, or files. Each file in the UNIX filesystem is then located by the full pathname that specifies exactly which branches must be followed, starting with / and ending with the file. Thus, /home/foobar/specialfiles/data.fasta is the full pathname for the file called data.fasta which is in user foobar's home directory in a directory called specialfiles. Since each #process has a current working directory, you can use the special . and .. directories to specify a relative path to an object in the tree. For example, if /home/foobar/specialdir is your current working directory, ../anotherspecialdir/specialfile specifies the file /home/foobar/anotherspecialdir/specialfile. On the other hand, ./specialfile refers to /home/foobar/specialdir/specialfile. Thus,

ls ../anotherspecialdir

and

ls ./specialfile

work as expected. Note that in the second case the ./ may be left off, since by default ls looks in its current directory if you don't specify either a full or relative path to an object. This is not true when running a command. In order for a command to be executed, it must either be specified as a full path, or its directory needs to be listed in your #PATH variable. Thus, if there is a program called foo in your current working directory and you type

foo

it will not be executed unless your current working directory is listed in your PATH, but

./foo

will (and overrides searching the PATH as does typing the full path name to foo). Information about a path can be found using #ls.

PATH

PATH is an environment variable which specifies a list of directories to be searched for programs. It is passed to child processes by their invoking process, and is typically set by your #shell initialization to a value such as /usr/bin:/bin, which are the default locations for OS provided programs. You can add to it by typing

PATH=DIR:$PATH
export PATH

or

PATH=$PATH:DIR
export PATH

depending on where you want the entry to be added (the directories are searched in order, and the first match is used). Typically you would do this to avoid having to type out a full path such as /share/apps/python-2.6.5/bin/python to launch a program, or when there is some ambiguity and you want to clearly state which program you want. For example, there is a /usr/bin/python, and a /share/apps/python-2.6.5/bin/python which we provide. If you do nothing, then /usr/bin/python is found first, and it is run. CCBB provided apps are installed in /share/apps (either /share/apps/bin, or a more specific application directory). Since there are so many, some in several versions, we do not add them to the PATH. Instead we expect you will use #ls and #cd to find the ones you need, or look them up elsewhere on this wiki.
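
The search order can be tried without touching any real installation. In this sketch, hello-demo is a hypothetical program created in a scratch directory, and command -v is used to ask the shell where it would find a command.

```shell
# hello-demo is a hypothetical program name used only for this sketch.
tmp=$(mktemp -d)
mkdir "$tmp/bin"
printf '#!/bin/sh\necho hello\n' > "$tmp/bin/hello-demo"
chmod +x "$tmp/bin/hello-demo"

old_path=$PATH

# Not found yet: its directory is not listed in PATH.
before=$(command -v hello-demo || echo "not found")

# Put the new directory first so it is searched before the others.
PATH=$tmp/bin:$PATH
export PATH
after=$(command -v hello-demo)
output=$(hello-demo)

echo "$before"; echo "$after"; echo "$output"

# Restore the original PATH and clean up.
PATH=$old_path
rm -rf "$tmp"
```

Adding the directory at the front means a program there wins over one of the same name later in the list; adding it at the end has the opposite effect.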

pipe

A pipe is a connection from the standard output of one program to the standard input of another, and is specified with |. For example,

a | b

runs the program a, and tells b to read the output produced by a. Thus,

cat foo | sort | more

sorts file foo, and then displays it one screen at a time. The pipe is the cornerstone of UNIX philosophy: write simple programs that do one thing well, and use pipes to glue them together into more complex statements.
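
A small sketch of gluing simple programs together, with printf supplying made-up input lines so that no file is needed:

```shell
# sort reads the three lines that printf writes to the pipe.
sorted=$(printf 'pear\napple\nfig\n' | sort)

# A longer pipeline: sort the lines, then keep only the first one.
first=$(printf 'pear\napple\nfig\n' | sort | head -n 1)

echo "$sorted"
echo "$first"
```

Each program only knows that it reads standard input and writes standard output; the shell does the connecting.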

process

A process is a program that has been loaded into memory, and which is being executed. A process which spends most of its time waiting on disk or network reads and writes is said to be I/O bound. At the opposite extreme are CPU bound jobs, which spend most of their time computing. While keeping the CPU busy is ideal, too many CPU bound jobs can make the system sluggish for interactive users.

Regular Expression

rm

The rm command is used to remove paths which are files. By default it doesn't ask for confirmation, and once executed the file is gone forever (unless there is a copy, or a backup to restore from). rm has several options:

-f => force removal even if permissions say it should be protected
-r => used to remove a directory AND its contents AND the contents of any subdirectories AND their contents (ie, USE WITH CAUTION).
-i => ask before removing

rmdir

The rmdir command is used to remove one or more directories which are specified as its arguments. To be removed, directories need to be empty. To get around this limitation without actually removing each item in the directory you can run rm -r. This is convenient if you are removing a directory tree which has several levels of sub-directories, but it is also potentially dangerous.
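
The difference between rmdir and rm -r can be seen in a scratch directory. The directory names are made up for this sketch.

```shell
tmp=$(mktemp -d)
cd "$tmp"
mkdir -p full/sub empty
touch full/sub/file

# rmdir removes an empty directory.
rmdir empty
if [ -d empty ]; then empty_state=present; else empty_state=removed; fi

# rmdir refuses to remove a directory that still has contents.
if rmdir full 2>/dev/null; then full_state=removed; else full_state=refused; fi

# rm -r removes the directory and everything below it.
rm -r full
if [ -d full ]; then after_rm=present; else after_rm=removed; fi

echo "$empty_state"; echo "$full_state"; echo "$after_rm"

cd / && rm -rf "$tmp"
```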

root

Either the top of the filesystem tree, the #user account which can perform any task on the computer, or the not to be trifled with person who has access to the root account. Due to the 2nd usage, a UNIX system which has been broken into is said to be "rooted". This is not considered a good thing.

shell

Shells are used by #users to interact with the system. Besides the ability to launch and manage jobs, shells provide a powerful scripting language. This is the power of UNIX. By writing scripts it's easy to customize routine, repeatable actions that you might take to process your data. This also gives you a way to keep a record of how the process was done, and what might have resulted.

Accounts in CCBB are created using the Bourne Again Shell (BASH), which is a descendant of the original Bourne Shell. This has been chosen as the standard because (1) it's the default for Linux systems, and (2) it has all of the nicer interactive features of other shells, while maintaining the scripting features of the Bourne and Korn shells.

Learning to use the shell effectively is a major step towards harnessing the power of UNIX. Please review our New Users Guide for some good references.

standard input/output/error and redirection

Just as every program has an environment, every program has 3 input / output streams. These are called standard input, standard output, and standard error. By default, standard input is the keyboard, and standard output and standard error are both the screen. They differ because standard output is buffered to avoid overwhelming the windowing environment with lots of little changes, while standard error is intended for program coders to use to immediately notify a program's user of an error. This can be a problem, since error messages may then appear out of order relative to the normal output: buffered output can reach the screen later than an error message that was written after it.

Using shell redirection you can connect these to files. For example, suppose you have a program that expects you to type some input before it actually processes your files. If you wanted to run this program over and over again with the same inputs you could use

foo < input

where foo is the name of the program, and input is a file that contains the input that you would have otherwise had to type. You can also redirect standard output using >. For example,

foo > output

which runs foo, and saves the standard output in a file called output. Likewise, standard error can be redirected with 2> as in

foo 2> error

You can redirect both with

foo > out 2>&1

Here order matters: 2>&1 ties (redirects) standard error to wherever standard output (&1) points at that moment, which by then is already the file out. If you ran

foo 2>&1 >out

standard error would first be tied to where standard output points at that moment (ie, the screen), and only then would standard output be redirected to the file out, so errors would still appear on the screen. You can also do all 3:

foo < in > out 2>error

Standard error, and output are useful because they let you save a record of the output of your data processing. You can then later refer to it to remind yourself of how a data processing run went. This is so important that the clustering batch processing system automatically creates output and error files for you.
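
The effect of redirection order can be checked with a stand-in for foo. In this sketch, noisy is a hypothetical shell function that writes one line to each stream, and the file names are made up. In the last case a command substitution plays the role of the screen, so that what would have appeared there can be inspected.

```shell
tmp=$(mktemp -d)
cd "$tmp"

# noisy is a hypothetical stand-in for a program that writes to both streams.
noisy() { echo "normal output"; echo "error message" >&2; }

# Separate files for standard output and standard error.
noisy > out 2> err
out_content=$(cat out)
err_content=$(cat err)

# Both streams in one file: redirect stdout first, then tie stderr to it.
noisy > both 2>&1
both_content=$(cat both)

# Reversed order: stderr is tied to the original stdout (captured here),
# and only afterwards is stdout sent to the file.
captured=$(noisy 2>&1 > only_out)
only_out_content=$(cat only_out)

echo "$both_content"; echo "$captured"; echo "$only_out_content"

cd / && rm -rf "$tmp"
```

With the reversed order the error message does not land in the file, which is usually not what was wanted.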

swap

Swap is disk based virtual memory. When swap space is being used, the system is said to be swapping. This could be bad performance wise, but when memory is limited it may be the only way to get programs to run.

touch

user

A login to the system, though some user accounts are system accounts that do nothing more than own programs, or run services.

variable

zombie

A zombie is a #process which has terminated, but which is being kept by the #kernel in the process list until its parent either queries the state of the #process or itself terminates. When either occurs, the zombie is reaped from the process list.
