...
Code Block | ||||
---|---|---|---|---|
| ||||
idev -m 180 -N 1 -A OTH21164 -r CoreNGSday5CoreNGS-Fri # or idev -m 90 -N 1 -A OTH21164 -p development |
Next set up a directory for these exercises, and copy an indexed BAM file there. This is a renamed version of the yeast_pairedendpe.sort.bam file from our The Basic Alignment workflow Workflow exercises.
Code Block | ||||
---|---|---|---|---|
| ||||
mkdir -p $SCRATCH/core_ngs/samtools cd $SCRATCH/core_ngs/samtools cp $CORENGS/catchup/for_samtools/* . |
...
Now we use grep with the pattern '[ID]' to select lines (which are now just CIGAR strings) that have Insert or Deletion operators. The brackets ( [ ] ) denote a character class pattern, which matches any of the characters inside the brackets. Be sure to ask for Perl regular expressions (-P) to be sure so that this character class syntax is recognized.
...
In addition to the rate, converted to a percentage, we also output the literal percent sign ( % ). The double quotes ( " ) denote a literal string in awk, and the comma between the number and the string says to separate the two fields with whatever the default Output Field Separator (OFS) is. By default, both awk's Input Field Separator (FS) is whitespace (any number of space and Tab characters) and its Output Field Separator is a single space.
So what if we want to get fancy and do all this in a one-liner command? We can use echo and backtick evaluation to put both numbers in a string, then pipe that string to awk. This time we use awk body code, and refer to the two whitespace-separated fields being passed in: total count ($1) and indel count ($2).
Code Block | ||||
---|---|---|---|---|
| ||||
echo "`samtools view -F 0x4 -c yeast_pe.sort.bam` `samtools\ $( samtools view -F 0x4 yeast_pe.sort.bam | cut -f 6 | grep -P '[ID]' | wc -l`"l )" \ | awk '{ print 100 * $2/$1,"%" }' |
...
Tip | |||||
---|---|---|---|---|---|
awk also has a printf function, which can take the standard formatting commands (see https://en.wikipedia.org/wiki/Printf_format_string#Type_field).
Notes:
|
...
Code Block | ||
---|---|---|
| ||
# look at a subset of field for chrI:1000-2000 reads
# 2=flags, 3=contig, 4=start, 5=mapping quality, 6=CIGAR, 9=insert size
samtools view -F 0x4 yeast_pe.sort.bam chrI:1000-2000 | cut -f 2-6,9 | \
awk 'BEGIN{FS=OFS="\t"}
{$1 = sprintf("0x%x", $1); print}' |
...