...
Expand | ||
---|---|---|
| ||
Parameters are:
|
...
As noted earlier, many microbial genomes are available through repositories like GenBank that use specific file format conventions for storage and distribution of genome sequence and annotations. The "Genbank" file format is a text file that can be parsed to yield other files that are compatible with the pipelines we have been implementing. Go ahead and look at some of the contents of a Genbank file with the following commands:
Code Block |
---|
cd $SCRATCH$WORK/core_ngs/references less vibCho.O395.gbk grep -A 5 ORIGIN vibCho.O395.gbk |
...
To build the reference index for alignment, we actually only need the FASTA file, since annotations are frequently often not necessary for alignment purposes. (This is not always true , as - extensively spliced transcriptomes requires splice junction annotations to align RNA-seq data properly, but for now we will only use the FASTA file.) We build the reference files exactly as we did with mirbase, only swapping out the FASTA files as follows:
Code Block |
---|
bp_genbank2gff3.plmkdir --format Genbank p $WORK/core_ngs/references/bt2/vibCho mv $WORK/core_ngs/references/vibCho.O395.gbk mv vibCho.O395.gbk.gff fa $WORK/core_ngs/references/fasta cd $WORK/core_ngs/references/bt2/vibCho ln -s -f ../../fasta/vibCho.O395.gfffa less vibCho.O395.gff |
Performing the bowtie2 alignment
ls -la |
Now build the index using bowtie2-build like we did for the mirbase index:
Code Block |
---|
bowtie2-build vibCho.O395.fa vibCho.O395 |
This should also go pretty fast: you can see the resulting files using ls like before.
Performing the bowtie2 alignment
Now we will go back to our scratch area to do the alignment, and set up symbolic links to the index in the work area to simplify the alignment command:
Code Block |
---|
cd $SCRATCH/core_ngs/alignment
ln -s -f $WORK/core_ngs/references/bt2/vibCho vibCho |
In the previous exercise, we set specific local alignment parameters for bowtie2 that took into account the fact that the dataset was derived from small RNA. However, in this case the data is from standard mRNA sequencing, meaning that the DNA fragments are typically longer than the reads. There is likely to be very little contamination that would require using a local rather than global alignment, or many other pre-processing steps (e.g. adapter trimming). Thus, we will run bowtie2 with default parameters, omitting any options other than the input, output, and reference index. The command would look like this:
Code Block |
---|
bowtie2 -x vibCho/vibCho.O395.fa -U fastq/cholera_rnaseq.fastq.gz -S cholera_rnaseq.sam |
Expand | ||
---|---|---|
| ||
Parameters are:
|
Create a commands file called bt2_vibCho.cmds with this task definition then generate and submit a batch job for it (time 1 hour, normal queue).
Expand | |||||||
---|---|---|---|---|---|---|---|
| |||||||
Use nano to create the bt2_vibCho.cmds file. Then:
|
When the job is complete you should have a cholera_rnaseq.sam file that you can examine using whatever commands you like. In the past two exercises, we have explored a few different sorts of commands that you might run on this file to get different sorts of information. Those commands will provide similar information here, since this file is also in SAM/BAM format. In the last exercise, we will come back to this SAM file (and the others) to explore ways to compress them effectively and to extract basic quality statistics.(this will stop here because samtools index/idxstats/flagstat will be Exercise #5)
Exercise #4: BWA-MEM - Human mRNA-seq
...
Expand | |||||
---|---|---|---|---|---|
| |||||
|
...