Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Expand
titleMore advanced piping...

To get an idea of how often each read aligned, and what the 'real' alignment rate is, use the following commands:

Code Block
languagebash
# This gives you a view where each read is listed next to the number of entries it has in the SAM file
cut -f 1 alignments/human_rnaseq_mem.sam | sort | uniq -c | less

#This gives essentially a histogram of the number of times each read aligned - a plurality of reads aligned twice, which seems reasonable since these are all reads crossing a junction, but plenty aligned more or less
cut -f 1 alignments/human_rnaseq_mem.sam | sort | uniq -c | awk '{print $1}' | sort | uniq -c | less	

#This gives a better idea of the alignment rate, which is how many reads aligned at least once to the genome.  Divided by the number of reads in the original file, the real alignment rate is aroundmuch 64higher.19 %.
cut -f 1 alignments/human_rnaseq_mem.sam | sort | uniq | wc -l	

# NOTE: Some of these one-liners are only reasonably fast if the files are relatively small (around a million reads or less). For bigger files, there are better ways to get this information, mostly using samtools.

This alignment rate is pretty good, but it could get better by playing around with the finer details of bwa mem.

...