Thus, a large set of computational tools has been developed to align each read quickly, and with sufficient (but not absolute) accuracy, to its best location, if any, in a reference. Although many mapping tools exist, a few individual programs hold a dominant "market share" of the NGS world. These programs vary widely in their design, inputs, outputs, and applications. In this section, we will focus primarily on two of the most versatile mappers: BWA and Bowtie2, the latter being part of the Tuxedo suite (which also includes, e.g., the transcriptome-aware Tophat2).
Connect to login8.stampede.tacc.utexas.edu
This should be second nature by now
You have already worked with a paired-end yeast ChIP-seq dataset, which we will continue to use here. The paired-end data should already be located at:
bwa mem hg19/hg19.fa fq/human_rnaseq.fastq.gz > human_rnaseq_mem.sam
Now, check the length of the SAM file you generated with 'wc -l'. Since there is one alignment per line, there must be 586266 alignments (minus no more than 100 header lines), which is more than the number of sequences in the FASTQ file. This is because bwa mem can report multiple alignment records for the same read, hopefully on either side of a splice junction. These alignments can still be tied together because they have the same read ID, but they are reflected in more than one line.
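The header-line caveat can be checked directly rather than estimated. The following is a minimal sketch, not part of the original exercise: `sam_line_summary` is a hypothetical helper name, and it relies only on the fact that SAM header lines begin with `@` while read names cannot.

```shell
# Hypothetical helper: split a SAM file's line count into header lines
# and alignment records. Header lines start with '@'; record lines do not.
sam_line_summary() {
    sam="$1"
    # Wrap wc in arithmetic expansion so any padding in its output is dropped.
    total=$(( $(wc -l < "$sam") ))
    # grep -c prints 0 (and exits non-zero) when nothing matches; ignore the status.
    header=$(grep -c '^@' "$sam" || true)
    records=$(( total - header ))
    echo "total=$total header=$header records=$records"
}
```

Running `sam_line_summary human_rnaseq_mem.sam` on the file generated above would print the three counts on one line; the `records` value is the number to compare against the number of sequences in the FASTQ file.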
To get an idea of how often each read aligned, and what the 'real' alignment rate is, use the following commands: