Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Code Block
languagebash
titleStart an idev session
idev -m 180 -N 1 -A OTH21164 -r CoreNGSday5CoreNGS-Fri
# or
idev -m 90 -N 1 -A OTH21164 -p development

Next set up a directory for these exercises, and copy an indexed BAM file there. This is a renamed version of the yeast_pairedendpe.sort.bam file from our The Basic Alignment workflow Workflow exercises.

Code Block
languagebash
titleSetup for samtools exercises
mkdir -p $SCRATCH/core_ngs/samtools
cd $SCRATCH/core_ngs/samtools
cp $CORENGS/catchup/for_samtools/* .

...

Now we use grep with the pattern '[ID]' to select lines (which are now just CIGAR strings) that have Insert or Deletion operators. The brackets ( [ ] ) denote a character class pattern, which matches any of the characters inside the brackets. Be sure to ask for Perl regular expressions (-P) to be sure so that this character class syntax is recognized.

...

In addition to the rate, converted to a percentage, we also output the literal percent sign ( % ). The double quotes ( " ) denote a literal string in awk, and the comma between the number and the string says to separate the two fields with whatever the default Output Field Separator (OFS) is. By default, both awk's Input Field Separator (FS) is whitespace (any number of space and Tab characters) and its Output Field Separator is a single space.

So what if we want to get fancy and do all this in a one-liner command? We can use echo and backtick evaluation to put both numbers in a string, then pipe that string to awk. This time we use awk body code, and refer to the two whitespace-separated fields being passed in: total count ($1) and indel count ($2).

Code Block
languagebash
titleOne-liner for calculating BAM alignment indel rate
echo "`samtools view -F 0x4 -c yeast_pe.sort.bam` `samtools\
    $( samtools view -F 0x4 yeast_pe.sort.bam | cut -f 6 | grep -P '[ID]' | wc -l`"l )" \
    | awk '{ print 100 * $2/$1,"%" }'

...

Tip

awk also has a printf function, which can take the standard formatting commands (see https://en.wikipedia.org/wiki/Printf_format_string#Type_field).

Code Block
languagebash
awk 'BEGIN{ printf("%.2f %%\n", 100*6697/547664) }'

Notes:

  • The printf arguments are enclosed in parentheses since it is a true function.
  • The 1st argument is the format string, enclosed in double quotes.
    • The %.2f format specifier says to output a floating point number with 2 digits after the decimal place.
    • The %% format specifier is used to output a single, literal percent sign.
    • Unlike the standard print statement, the printf function does not automatically append a newline to the output, so \n is added to the format string here.
  • Remaining arguments to printf are the values to be substituted for the format specifiers (here our percentage computation).

...

Code Block
languagebash
# look at a subset of field for chrI:1000-2000 reads
#   2=flags, 3=contig, 4=start, 5=mapping quality, 6=CIGAR, 9=insert size
samtools view -F 0x4 yeast_pe.sort.bam chrI:1000-2000 | cut -f 2-6,9 | \
  awk 'BEGIN{FS=OFS="\t"}
       {$1 = sprintf("0x%x", $1); print}'

...