Page tree

Versions Compared


  • This line was added.
  • This line was removed.
  • Formatting was changed.


Check the length of the SAM file you generated with wc -l. Since there is one alignment per line, there must be 586266 alignments (minus no more than 100 header lines), which is more than the number of sequences in the FASTQ file. This is bwa mem can report multiple alignment records for the same read, hopefully on either side of a splice junction. These alignments can still be tied together because they have the same read ID.

titleMore advanced piping...

To get an idea of how often each read aligned, and what the 'real' alignment rate is, use the following commands:

Code Block
cut -f 1 alignments/human_rnaseq_mem.sam | sort | uniq -c | less #This
# This gives you a view where each read is listed next to the number of entries it has in the SAM file
cut -f 1 alignments/human_rnaseq_mem.sam | sort | uniq -c | 

$1}' | sort | uniq -c | less
#This gives essentially a histogram of the number of times each read aligned - a plurality of reads aligned twice, which seems reasonable since these are all reads crossing a junction, but plenty aligned more or less
cut -f 1 alignments/human_rnaseq.sam | sort | uniq -c | awk '{print $1}' | 
sort | uniq -
c | less	

#This gives a better idea of the alignment rate, which is how many reads aligned at least once to the genome.  Divided by the number of reads in the original file, the real alignment rate is around 64.19 %.
cut -f 
1 alignments/human_rnaseq.sam | sort | uniq | wc -l	

# NOTE: 
Some of these one-liners are only reasonably fast if the files are relatively small (around a million reads or less).
 For bigger files, there are better ways to get this information, mostly using samtools
, which is a utility explicitly designed for manipulating SAM/BAM files. We'll cover samtools in the next section

This alignment rate is pretty good, but it could get better by playing around with the finer details of bwa mem.


Be aware that some downstream tools (for example the Picard suite, often used before SNP calling) do not like it when a read name appears more than once in the SAM file. To mark the extra alignment records as secondary, specify the bwa mem -M option. This option leaves the best (longest) alignment for a read as -is but marks additional alignments for the read as secondary (the 0x100 BAM flag). This designation also allows you to easily filter the secondary reads with samtools if desired.