One of the steepest parts of the Unix/Linux learning curve is the sheer number of built-in commands, each with many options -- most of which you'll never use. On top of that, there are a number of advanced commands that are extremely powerful but also quite complex.

To help address this, the page below introduces a number of built-in Linux utilities, organized by category, along with some of their common options.

Command line arguments can be replaced by standard input

Most built-in Linux commands that obtain data from a file can also accept the data piped in on their standard input.
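
For example, both of these commands report the same line count (assuming a text file named data.txt exists -- the name here is just an illustration):

wc -l data.txt          # read the file named on the command line
cat data.txt | wc -l    # pipe the file's contents to wc on its standard input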

And here's a Linux commands cheat sheet you may find useful:

Basic commands

Displaying file contents

  • cat outputs the entire contents of its input (one or more files and/or standard input)
    • cat -n prefixes each line of output with its line number
    • CAUTION – only use on small files!
  • zcat <file.gz> like cat, but understands the gzip (.gz) format, and decompresses the data before writing it to standard output
    • CAUTION – only use on small files!
    • Another CAUTION – does not understand .zip or .bz2 compression formats
  • more and less pagers
    • both display their (possibly very long) input one Terminal "page" at a time
    • in more:
      • use spacebar to advance a page
      • use q or Ctrl-c to exit more
    • in less:
      • q – quit
      • Ctrl-f or space – page forward
      • Ctrl-b – page backward
      • /<pattern> – search for <pattern> in forward direction
        • n – next match
        • N – previous match
      • ?<pattern> – search for <pattern> in backward direction
        • n – next match, continuing backward
        • N – next match, going forward
    • use less -N to display line Numbers
    • use less -I to use case Insensitive pattern searches
    • less can be used directly on .gz format files
  • head and tail
    • show you the first or last 10 lines (by default) of their input
    • head -n 20 or just head -20 shows the first 20 lines
    • tail -n 2 or just tail -2 shows the last 2 lines
    • tail -n +100 (or, on some older systems, tail +100) shows lines starting at line 100
    • tail -n +100 | head -20 shows 20 lines starting at line 100
    • tail -f shows the last lines of a file, then follows the output as more lines are written (Ctrl-c to quit)
  • gunzip -c <file.gz> | more (or less) – like zcat, decompresses <file.gz> and writes the text to standard output
    • <file.gz> is not altered on disk
    • always pipe the output to a pager!
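
For example, here's one way to combine the commands above to view a slice of a compressed file without altering it on disk (the file name big_data.txt.gz is just an illustration):

zcat big_data.txt.gz | tail -n +100 | head -20    # show 20 lines starting at line 100
less big_data.txt.gz                              # or page through the file; less reads .gz files directly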

File system navigation

  • ls - list the contents of the specified directory
    • -l says produce a long listing (including file permissions, sizes, owner and group)
    • -a says show all files, even normally-hidden dot files whose names start with a period ( . )
    • -h says to show file sizes in human readable form (e.g. 12M instead of 12201749)
    • -t says to sort files on last modification time
    • -r says to reverse the current sort order
    • -d says to show directory listing information only, instead of directory contents
      • usually combined with -l, e.g.:  ls -ld <dirname>
  • cd <whereto> - change the current working directory to <whereto>. Some special <wheretos>:
    •  .. (period, period) means "up one level"
    •  ~ (tilde) means "my home directory"
    • - (dash) means "the last directory I was in"
  • find <in_directory> [ operators ] -name <expression> [  tests ]
    • looks for files matching <expression> in <in_directory> and its sub-directories
    • <expression> can be a double-quoted string including pathname wildcards (e.g. "[a-g]*.txt")
    • there are tons of operators and tests:
      • -type f (file) and -type d (directory) are useful tests
      • -maxdepth NN is a useful operator to limit the depth of recursion.
  • file <file> tells you what kind of file <file> is
  • df shows you the file systems mounted on the system you're working on, along with how much disk space each has used and available
    • -h says to show sizes in human readable form (e.g. 12G instead of 12318201749)
  • pwd - display the present working directory
    • -P says to display the physical path, with any symbolic links resolved
  • tree <directory> -  shows the file system hierarchy of the specified directory
    • tree is not available on all Linux systems
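
For example, a quick sketch of find in action (the ~/projects directory name is hypothetical):

find ~/projects -maxdepth 2 -type f -name "*.txt"    # list .txt files at most 2 levels down
ls -ltr ~/projects                                   # long listing of the top level, oldest-modified first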

Create, rename, link to, delete files

  • touch <file> – create an empty file, or update the modification timestamp on an existing file
  • mkdir -p <dirname> – create directory <dirname>.  
    • -p says to also create any needed intermediate (parent) directories, and not to complain if <dirname> already exists
  • mv <file1> <file2> – renames <file1> to <file2>
    • mv <file1> ... <fileN> <to_dir>/  – moves files <file1> ... <fileN> into directory <to_dir>
    • mv -t <to_dir> <file1> ... <fileN> – same as above, but specifies the target directory via the -t option
  • ln -s <path> creates a symbolic (-s) link (symlink) to <path> in the current directory
    • a symbolic link can be manipulated as if it is the linked-to file or directory
      • and can be deleted without affecting the linked-to file/directory
    • the default link file name corresponds to the last name component in <path>
    • always specify the -s option to create a symbolic link
      • without the -s option a difficult-to-manage "hard link" is created
    • always change into (cd) the directory where you want the link before executing ln -s
    • ln -sf -t <target_dir>  <file1> <file2> ... <fileN> 
      • creates symbolic links to <file1> <file2> ... <fileN> in target directory <target_dir>
  • rm <file> deletes a file. This is permanent - not a "trash can" deletion.
    • rm -rf  <dirname> deletes an entire directory – be careful!
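
Putting the symbolic-link advice together, here is a minimal sketch (all paths are hypothetical):

cd ~/analysis                          # first cd to where the links should live
ln -s /work/data/sample1.fastq.gz      # creates ./sample1.fastq.gz pointing to the original file
ln -sf -t . /work/data/*.fastq.gz      # or create links to many files at once in the current directory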

Copying files and directories

  • cp <source> [<source>...] <destination> copies the file(s) <source> [<source>...] to <destination>, which must be a directory if more than one <source> is given
    • using . (period) as the destination means "here, with the same name"
    • -p option says to preserve file modification timestamps
    • cp -r <dirname>/ <destination>/ will recursively copy the directory <dirname>/ and all its contents to the directory <destination>/.
    • cp -t <dirname>/ <file> [<file>...] copies one or more specified files to the target directory.
  • scp <user>@<host>:<remote_source_path> <local_destination_path>
    • Works just like cp but copies <remote_source_path> from the remote host machine to the <local_destination_path>
    • -p (preserve file times) and -r (recursive) options work the same as cp
    • scp <local_source_path>... <user>@<host>:<remote_destination_path> is similar, but copies one or more <local_source_path> to the <remote_destination_path> on the remote host machine.
    • A nice scp syntax resource is located here.
  • wget <url> fetches a file from a valid URL (e.g. http, https, ftp).
    • -O <file> specifies the name for the local file (defaults to the last component of the URL)
  • rsync -arvW <source_directory>/ <target_directory>/
    rsync -ptlrvP <source_directory>/ <target_directory>/
    • Recursively copies <source_directory> contents to <target_directory>, but only if <source_directory> files are newer or don't yet exist in <target_directory>
    • Remote path syntax (<user>@<host>:<absolute_or_home-relative_path>) can be used for either source or target, but not both.

    • Always include a trailing slash ( / ) after the source and target directory names!
    • -a means "archive" mode (equivalent to -rlpt plus some other options)
    • -r means recursively copy sub-directories (implied by -a)
    • -v means verbose
    • -W means Whole file only
      • Normally the rsync algorithm compares the contents of files that need to be copied and only transfers the different portions. This option disables file content comparisons, which are not appropriate for large and/or binary files.
    • -p means preserve file permissions
    • -t means preserve file times
    • -l means copy symbolic links as links (implied by -a)
      • vs -L which means dereference the link and copy the file it refers to
    • -P means show transfer Progress (useful when large files are being transferred)
    • see https://manpages.ubuntu.com/manpages/trusty/man1/rsync.1.html
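
For example, a sketch of copying a local project to a remote system (user name, host, and paths are hypothetical):

rsync -arvW ~/my_project/ user@host.example.com:/backups/my_project/    # archive mode, recursive, verbose, whole files
scp -p ~/my_project/notes.txt user@host.example.com:/backups/           # copy one file, preserving its timestamp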

Miscellaneous commands

  • echo <text> prints the specified text on standard output
    • evaluation of metacharacters (special characters) inside the text may be performed first
    • -e says to enable interpretation of backslash escapes such as \t (Tab) and \n (newline)
    • -n says not to output the trailing newline
  • wc -l  reports the number of lines (-l) in its input
    • wc -c reports the number of bytes (-c) in its input (use wc -m for characters)
    • wc -w reports the number of words (-w) in its input
  • history lists your command history to the terminal
    • redirect to a file to save a history of the commands executed in a shell session
    • pipe to grep to search for a particular command
    • re-execute a previous command via !<NN> where <NN> is the history line number
  • env lists all the environment variables currently defined in your login session
  • seq N produces the numbers 1 through N, each on a separate line
  • xargs transfers data on its standard input to the command line of the specified command
    • e.g. ls ~/*.txt | xargs echo
  • which <pgm> searches all $PATH directories to find <pgm> and reports its full pathname
    • will report all the places it looked if <pgm> was not found
    • type <pgm> is more general and works for functions and aliases
  • du <file_or_directory> [<file_or_directory>...]
    • shows the disk usage (size) of the specified files/directories
    • -h says report the size in human-readable form (e.g. 12M instead of 12201749)
    • -s says to report only the total (summarized) size of each directory argument
    • -c says print a grand total when multiple items are specified
  • groups - lists the Unix groups you belong to
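
A few quick illustrations of the commands above (the ~/data directory name is hypothetical):

du -sch ~/data/*        # human-readable size summary of each item under ~/data, plus a grand total
history | grep rsync    # search your command history for previous rsync commands
seq 5 | xargs echo      # passes "1 2 3 4 5" to echo's command line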

Advanced commands

cut, sort, uniq

  • cut command lets you isolate ranges of data from its input lines
    • cut -f <field_number(s)> extracts one or more fields (-f) from each line of its input
      • use -d <delim> to change the field delimiter (Tab by default)
    • cut -c <character_number(s)> extracts one or more characters (-c) from each line of input
    • the <numbers> can be
      • a comma-separated list of numbers (e.g. 1,4,7)
      • a hyphen-separated range (e.g. 2-5)
      • a trailing hyphen says "and all items after that" (e.g. 3,7-)
    • cut does not re-order fields, so cut -f 5,3,1 acts like -f 1,3,5
  • sort sorts its input lines using an efficient algorithm
    • by default sorts each line lexically (as strings), low to high
      • use -n to sort numerically
      • use -V for Version sort (numbers with consistent surrounding text)
      • use -r to reverse the sort order
    • use one or more -k <start_field_number>,<end_field_number> options to specify a range of "keys" (fields) to sort on
      • e.g. -k1,1 -k2,2nr to sort field 1 lexically and field 2 as a number high-to-low
      • by default, fields are delimited by whitespace -- one or more spaces or Tabs 
        • use -t <delim> to change the field delimiter (e.g. -t $'\t' for Tab only, ignoring spaces)
  • uniq -c counts groupings of its input (which should be sorted) and reports the text and count for each group
    • use cut | sort | uniq -c for a quick-and-dirty histogram
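
For example, here is a quick-and-dirty histogram built from the standard /etc/passwd file (colon-delimited; field 7 is the login shell):

cut -d ':' -f 7 /etc/passwd | sort | uniq -c | sort -nr    # count accounts per login shell, most common first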

sed

  • sed (stream editor) can be used to edit text using pattern substitution.
    • general form: sed 's/<search pattern>/<replacement>/'
    • note that sed's pattern matching syntax is quite different from grep's
    • Grymoire sed tutorial
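
A minimal sed sketch using the general form above (the input text is just an illustration):

echo "sample_01.fastq" | sed 's/\.fastq$/.fq/'    # outputs: sample_01.fq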

awk

  • awk is a powerful scripting language that is easily invoked from the command line
    • awk '<script>' - the '<script>'  is applied to each line of input (generally piped in)
      • always enclose '<script>' in single quotes to inhibit shell evaluation
      • awk has its own set of metacharacters that are different from the shell's
  • General structure of an awk script:
    • BEGIN {<expressions>}  –  use to initialize variables before any script body lines are executed
      • e.g. BEGIN {FS=":"; OFS="\t"; sum=0}
        • says use colon ( : ) as the input field separator (FS), and Tab ( \t ) as the output field separator (OFS)
          • the default input field separator (FS) is whitespace
            • one or more spaces or Tabs
          • the default output field separator (OFS) is a single space
        • initializes the variable sum to 0
    • {<body expressions>}  – expressions to apply to each line of input
      • use $1, $2, etc. to pick out specific input fields
      • e.g. {print $3,$4} outputs fields 3 and 4 of the input, separated by the output field separator.
      • some special expressions:
        • NF - Number of Fields on the line
        • NR - Number of the Record (i.e., line number)
    • END {<expressions>} – executed after all input is complete
      • e.g. END {print sum}
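
For example, a small sketch that uses all three blocks on the standard /etc/passwd file (colon-delimited; field 1 is the account name, field 3 the numeric user ID):

cat /etc/passwd | awk 'BEGIN{FS=":"; OFS="\t"; total=0} {total = total + 1; print $1,$3} END{print "total accounts:", total}'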

Here is an excellent awk tutorial, very detailed and in-depth

cut versus awk

The basic functions of cut and awk are similar – both are field oriented. Here are the main differences:

  • Default field separators
    • Tab is the default field separator for cut
    • whitespace (one or more spaces or Tabs) is the default field separator for awk
  • Re-ordering
    • cut cannot re-order fields
    • awk can re-order fields, based on the order you specify
  • awk is a full-featured programming language while cut is just a single-purpose utility.
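
To see the re-ordering difference, consider a Tab-delimited file (the genes.txt file name is hypothetical):

cut -f 3,1 genes.txt                      # fields still come out in file order: 1 then 3
awk -F '\t' '{print $3, $1}' genes.txt    # fields come out in the order requested: 3 then 1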

grep and regular expressions

The grep command

  • grep -P '<pattern>' searches for <pattern> in its input, and only outputs lines containing it
    • always enclose '<pattern>' in single quotes to inhibit shell evaluation!
      • pattern-matching metacharacters are very different from those in the shell
    • -P says to use Perl patterns, which are much more powerful than standard grep patterns
    • -v (inverse match) – only print lines with no match
    • -n (line number) – prefix output with the line number of the match
    • -i  (case insensitive) – ignore case when matching
    • -l says return only the names of files that do contain the pattern match
    • -L says return only the names of files that do not contain the pattern match
    • -c says just return a count of line matches
    • -A <n> (After) and -B <n> (Before) – output '<n>' number of lines after or before a match
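
For example (the pipeline.log file name is hypothetical):

grep -P -n -i 'error|warn' pipeline.log    # show matching lines and their line numbers, ignoring case
grep -P -c ':/bin/bash$' /etc/passwd       # count accounts whose login shell is bash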

Regular expressions

A regular expression (regex) is a pattern of characters and metacharacters that control and modify how search matching is done.

  • <pattern> (a regular expression, or regex) can contain special match metacharacters and modifiers. The ones below are Perl metacharacters, which are supported by most languages (e.g. grep -P)
    • ^ – matches beginning of line
    • $ – matches end of line
    • .  – (period) matches any single character
    • * – modifier; place after an expression to match 0 or more occurrences
    • + – modifier, place after an expression to match 1 or more occurrences
    • ? – modifier, place after an expression to match 0 or 1 occurrences
    • \s – matches any whitespace character (\S any non-whitespace)
    • \d – matches digits 0-9
    • \w – matches any word character: A-Z, a-z, 0-9 and _ (underscore)
    • \t – matches Tab
    • \n – matches linefeed (newline)
    • \r – matches carriage return
    • [xyz123] – matches any single character (including special characters) among those listed between the brackets [ ]
      • this is called a character class.
      • use [^xyz123] to match any single character not listed in the class
    • (Xyz|Abc) – matches either Xyz or Abc or any text or expressions inside parentheses separated by | characters
      • note that parentheses ( ) may also be used to capture matched sub-expressions for later use
  • in Perl, where a pattern is delimited by //, modifiers appear after the pattern:
    • i - perform case-insensitive text matching
    • g - perform the specified substitution globally on each input record, not just on the 1st match
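
A small sketch that exercises several of these metacharacters with grep -P (the echoed text is just an illustration):

echo -e "gene1\t100\ngene2\tNA" | grep -P '\t\d+$'    # matches only "gene1<Tab>100": a Tab, then one or more digits at end of line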

Regular expression modules are available in nearly every programming language (Perl, Python, Java, PHP, awk, even R)

  • each "flavor" is slightly different
  • even the command line offers multiple grep variants: grep, egrep (extended regular expression syntax), and fgrep (fixed strings, not regular expressions)

There are many good online regular expression tutorials, but be sure to pick one tailored to the language you will use.

Field delimiter summary

Be aware of the default field delimiter for the various command-line utilities, and how to change it:

  • cut
    • default delimiter: Tab
    • how to change: -d or --delimiter option
    • example: cut -d ':' -f 1 /etc/passwd
  • sort
    • default delimiter: whitespace (one or more spaces or Tabs)
    • how to change: -t or --field-separator option
    • example: sort -t ':' -k1,1 /etc/passwd
  • awk
    • default delimiter: whitespace (one or more spaces or Tabs)
      • Note: some older versions of awk do not treat Tabs as field separators.
    • how to change:
      • in the BEGIN { } block: FS= (input field separator) and OFS= (output field separator)
      • -F or --field-separator option
    • examples:
      • cat /etc/fstab | grep -v '^#' | awk 'BEGIN{OFS="\t"}{print $2,$1}'
      • cat /etc/passwd | awk -F ":" '{print $1}'
  • join
    • default delimiter: one or more spaces
    • how to change: -t option
    • example: join -t $'\t' -j 2 file1 file2
  • perl
    • default delimiter: whitespace (one or more spaces or Tabs) when auto-splitting input with -a
    • how to change: -F'/<pattern>/' option
    • example: cat /etc/fstab | grep -v '^#' | perl -F'/\s+/' -a -n -e 'print "$F[1]\t$F[0]\n";'
  • read
    • default delimiter: whitespace (one or more spaces or Tabs)
    • how to change: the IFS (input field separator) environment variable
      • Note that a bare IFS= removes any field separator, so whole lines are read in each loop iteration.

Other bash resources



