Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

  • The bwa mem alignment
    • the program's progress output (on standard error) is redirected to a log file (2>hs_rna.bwamem.log)
    • its alignment records (on standard output) is piped to the next step (conversion to BAM)
  • Conversion of bwa mem's SAM output to BAM format
    • recall that the -b option to samtools view says to output in BAM format
  • Sorting the BAM file
    • samtools sort takes the binary output from samtools view and writes a sorted BAM file.

...

Expand
titleAnswer:

Count the FASTQ file reads:

Code Block
languagebash
cd $SCRATCH/core_ngs/alignment/bwamem
zcat ./fastq/human_rnaseq.fastq.gz | wc -l | awk '{print $1/4}'

The file has 100,000 reads.

Generate alignment statistics from the sorted BAM file:

Code Block
languagebash
cd $SCRATCH/core_ngs/alignment/bwamem
samtools flagstat human_rnaseq.sort.bam | tee hs_rnaseq.flagstat.txt

Output will look like this:

Code Block
133570 + 0 in total (QC-passed reads + QC-failed reads)
33570 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
133450 + 0 mapped (99.91% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

There were 133,570 alignment records reported for the 100,000 input reads.

Because bwa mem can split reads and report two alignment records for the same read, there are 33,570 secondary reads reported here.

...

Tip

Be aware that some downstream tools (for example the Picard suite, often used before SNP calling) do not like it when a read name appears more than once in the SAM file. Such reads can be filtered, but only if they can be identified as secondary by specifying the bwa mem -M option as we did above. This option reports the longest alignment normally but marks additional alignments for the read as secondary (the 0x100 BAM flag). This designation also allows you to easily filter out the secondary reads with samtools view -F 0x104 if desired.

...