Overview:

The problem with yesterday's SAMtools tutorial was determined to be a rather annoying typo in the mpileup command. Specifically, the SAMtools mpileup -f option is supposed to reference the reference.fasta file that you have previously indexed, NOT the indexed file itself. My most profound apologies for not having caught this quicker when it became clear there was a problem.

Solutions:

  1. The SNV_tutorial is now fixed on the website and the commands will work as they appear with the output from the bowtie2 mapping tutorial that you completed in class.
  2. I have created a single text file that you can execute on an idev node (NOT as a commands file submitted to the queue!) This file will execute the entire samtools pipeline that exists within the SNV tutorial in ~7 minutes using the output of the bowtie2 tutorial stored in a permanent directory.

Use a single file to execute the entire pipeline.

At the end of class it was asked if there was a way to execute multiple commands on a personal computer rather than at TACC. The answer is: Yes, there are several ways, all of which should work at TACC as well as on your personal computer. The only thing that is specific to TACC is the use of the idev node or the submission queue. 

  1. Rather than stringing commands together with pipes and redirection,  you can chain them together with the semicolon (;) to force everything on the left to be completed before reading through to the next command. Consider the following example and explanation:
    1. mkdir test;cd test
    2. Will create a new folder named test and then immediately change into that new directory. 
    3. Note that the spacing around the ; symbol does not matter.
  2. Much like the commands file during TACC queue submission, a single file can be used to execute multiple commands. The difference is that when a single file is executed it takes each line as a new command and executes them in the order that it encounters them rather than splitting them up and sending them to multiple processors or computers. The requirement is that the file itself must be executable. To make a file executable the following example commands can be used to make a file executable by everyone:
    1. chmod +x file
    2. chmod 777 file
    3. wikipedia is a good source of information for how to make things executable for narrower swaths of people, or to modify read and write permissions as well

In this case, I have made the full_snv_pipeline executable by everyone.

 

type (or copy) these 3 commands, waiting for the idev session to open after the first.
idev -A UT-2015-05-18
cp $BI/gva_course/snv_tutorial_fix/full_snv_pipeline .
./full_snv_pipeline
This is the output you will see
[samopen] SAM header is present: 1 sequences.
[bam_sort_core] merging from 2 files...
[mpileup] 1 samples in 1 input files
<mpileup> Set max per-file depth to 8000
[bcfview] 100000 sites processed.
[afs] 0:99968.324 1:23.090 2:8.586
[bcfview] 200000 sites processed.
[afs] 0:99972.734 1:22.265 2:5.001
[bcfview] 300000 sites processed.
[afs] 0:99969.505 1:25.152 2:5.343
[bcfview] 400000 sites processed.
[afs] 0:99977.665 1:19.328 2:3.007
[bcfview] 500000 sites processed.
[afs] 0:99964.202 1:24.792 2:11.006
[bcfview] 600000 sites processed.
[afs] 0:99979.628 1:9.529 2:10.843
[afs] 0:49405.456 1:8.543 2:1.001
full_snv_pipeline contents
mkdir samtools_tutorial_fix  # make a new directory
cd samtools_tutorial_fix/  # change into that directory
cp /corral-repl/utexas/BioITeam/gva_course/mapping/bowtie2/SRR030257.sam .  # copy the *.sam output file from the bowtie2mapping tutorial to the current directory
cp /corral-repl/utexas/BioITeam/gva_course/mapping/bowtie2/NC_012967.1.fasta .  # copy the reference.fasta file from the bowtie2mapping tutorial to the current directory
samtools faidx NC_012967.1.fasta  # index the reference file
samtools view -b -S -o SRR030257.bam SRR030257.sam  # convert the *.sam file to a *.bam file
samtools sort SRR030257.bam SRR030257.sorted  # sort the alignments based on the leftmost alignment position
samtools index SRR030257.sorted.bam  # index the sorted bam file for fast/easy access
samtools mpileup -u -f NC_012967.1.fasta SRR030257.sorted.bam > SRR030257.bcf  # create a BCF file identifying variant locations
bcftools view -v -c -g SRR030257.bcf > SRR030257.vcf  # convert the *.bcf file to a *.vcf format so it can be interogated.

Once the command prompt is returned, you can begin to interrogate the VCF file as you wish. As the tutorial notes, there is some meta data about the file at the top of it with lines preceeded by a double pound sign (##), followed by a quite useful header line, and then row after row after row of potential genetic variants.

  • No labels