Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

We might be able to get away with just using this literal alone as our regex, specifying '>' as the command line argument. But for grep, the more specific the pattern, the better. So we constrain where the > can appear on the line. The special carat ( ^ ) character represents "beginning of line". o meansSo ^> means "beginning of a line followed by a > character (followed by anything). (Aside: the dollar sign ( $ ) character represents "end of line" in a regex. There are many other special characters, including period ( . ), question mark ( ? ), pipe ( | ), parentheses ( ( ) ), and brackets ( [ ] ), to name the most common.)

Exercise: How many contigs are there in the sacCer3 reference?

Expand
titleHint

grep -P '^>' sacCer3.fa | wc -l

or use grep's -c option that says "just count the line matches"

grep -P -c '^>' sacCer3.fa

Expand
titleAnswer

There are 17 contigs.

Performing the alignment

Now, we're ready to execute the actual alignment, with the goal of producing a SAM/BAM file from the input FASTQ files and reference.  We will first generate SAI files from each of the FASTQ files with the reference individually using the 'aln' command, then combine them (with the reference) into one SAM/BAM output file using the 'sampe' command.   We need a directory to put the alignments when they are finished, as well as any intermediate files, so create a directory called 'alignments'.  The command flow, all together, is as follows. Notice how each file is in its proper directory, which requires us to specify the whole file path in the alignment commands.

...