...
Code Block |
---|
language | bash |
---|
title | Start an idev session |
---|
|
idev -p normal -m 180 -A UT-2015-05-18 -N 1 -n 68 --reservation=BIO_DATA_week_1 |
Next set up a directory for these exercises, and copy an indexed BAM file there. This is a renamed version of the yeast_pairedend.sort.bam file from our Alignment workflow exercises.
...
Code Block |
---|
|
module load biocontainers # takes a while
module load samtools
cd $SCRATCH/core_ngs/alignment/samtools
samtools view yeast_pe.sort.bam | head |
...
Code Block |
---|
|
samtools view yeast_pe.sort.bam | cut -f 6 | head -20 |
Next, make sure we're only looking at alignment records that represent mapped reads. The -F 0x4 option says to filter records where the 0x4 flag (read unmapped) is 0, resulting it only mapped reads being output.
Code Block |
---|
|
samtools view -F 0x4 yeast_pe.sort.bam | cut -f 6 | head -20 |
Now we use grep with the pattern '[ID]' to select lines (which are now just CIGAR strings) that have Insert or Deletion operators. The brackets ( [ ] ) denote a character class pattern, which matches any of the characters inside the brackets. Be sure to ask for Perl regular expressions (-P) to be sure that this character class syntax is recognized.
...
Tip |
---|
awk also has a printf function, which can take the standard formatting commands (see https://en.wikipedia.org/wiki/Printf_format_string#Type_field). Code Block |
---|
| awk 'BEGIN{ printf("%.2f %%\n", 100*6697/547664) }' |
Notes: - The printf arguments are enclosed in parentheses since it is a true function.
- The 1st argument is the format string, enclosed in double quotes.
- The %.2f format specifier says to output a floating point number with 2 digits after the decimal place.
- The %% format specifier is used to output a single, literal percent sign.
- Unlike the standard print statement, the printf function does not automaticlly append a newline to the output, so \n is added to the format string here.
- Remaining arguments to printf are the values to be substituted for the format specifiers.
- Here just our percentage computation
|
...
Code Block |
---|
|
cd $SCRATCH/core_ngs/alignment/samtools
# count the number of reads mapped to chromosome 2 (chrII)
samtools view -c -F 0x4 yeast_pe.sort.bam chrII
# count the number of reads mapped to chromosomes 1 or M (chrI, chrM)
samtools view -c -F 0x4 yeast_pe.sort.bam chrI chrM
# count the number of reads mapped to chromosomes 1 that overlap coordinates 1000-2000
samtools view -c -F 0x4 yeast_pe.sort.bam chrI:1000-2000
# since there are only 20 reads in the chrI:1000-2000 region, examine them individually
samtools view -F 0x4 yeast_pe.sort.bam chrI:1000-2000
# look at a subset of field for chrI:1000-2000 reads
# 2=flags, 3=contig, 4=start, 5=mapping quality, 6=CIGAR, 9=insert size
samtools view -F 0x4 yeast_pe.sort.bam chrI:1000-2000 | cut -f 2-6,9 |
...