A selection of resources relevant to this course - not a comprehensive catalog.
Sequencing Technologies
Community Resources
- SEQanswers forum - many NGS questions answered here
- UCSC Genome Browser - visualize and download NGS data (see more below)
- Galaxy website for online sequencing data analysis
- Broad Institute Integrative Genomics Viewer (IGV) - especially good for bam files
Getting started with Linux and NGS
- Cheat sheet of useful Unix commands
- A funny SEQanswers post about biologists starting to analyze NGS data
- http://seqanswers.com/forums/showthread.php?t=4589
- Wikipedia FASTQ format page
- FastQC from Babraham Bioinformatics - produces a nice quality report for fastq files.
- Cutadapt - An excellent command line tool for adapter sequence removal.
- FASTX Toolkit - Command line tools for fastq analysis and manipulation
- Illumina library construction on GSAF user wiki - useful for contaminant detection or adapter removal.
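As a small illustration of the FASTQ format referenced above, the sketch below reads four-line FASTQ records and converts the quality string to Phred scores. It assumes the Phred+33 encoding (Sanger / Illumina 1.8+) and unwrapped, four-line records; the function names `read_fastq` and `phred_scores` and the example read are made up for illustration and are not part of any tool listed here.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality_string) from a 4-line-per-record FASTQ.

    Assumes each record is exactly four lines (no line-wrapped sequences).
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError("malformed FASTQ record: " + header)
        yield header[1:], seq, qual


def phred_scores(qual, offset=33):
    """Convert a quality string to integer Phred scores (assumes Phred+33)."""
    return [ord(c) - offset for c in qual]


# Hypothetical one-read example, not real sequencing data.
example = "@read1\nACGT\n+\nII5!\n"
for name, seq, qual in read_fastq(io.StringIO(example)):
    scores = phred_scores(qual)
    print(name, seq, scores, sum(scores) / len(scores))
```

For real data, FastQC or the FASTX Toolkit listed above are the better choice; this only shows what the quality line encodes.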
Alignment and aligners
- Jeff Barrick's introduction to NGS presentation
- Comparison of different aligners
- by Heng Li, developer of BWA and MAQ
- by Nils Homer, developer of BFAST
- Aligners
- bowtie (http://bowtie-bio.sourceforge.net/index.shtml) - very fast, not very sensitive
- BFAST wiki & manual - slow and relatively complicated, but tunable sensitivity
- bwa - fast, sensitive and easy to use
- bowtie2 - fast, sensitive, configurable, easy to use
- File formats
Alignment analysis
- SAM (Sequence Alignment Map) format specification (pdf)
- sam/bam tools
- samtools - sam/bam conversion, flag filtering, bam sort/index
- Picard - sam/bam utilities that are read-group aware
- Translate SAM file flags - type in a decimal number to see which flags are set
- SAMstat - produces detailed graphical statistics for sam/bam files.
- BEDTools - region overlap, merge, coverage & much more, w/bed, bam, vcf, gff support
- BEDTools manual on readthedocs.org
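The SAM flag translation mentioned above amounts to testing individual bits of the decimal FLAG value. A minimal Python sketch follows, with bit values taken from the SAM specification; the function name `decode_sam_flag` and the short descriptions are illustrative wording, not the output of any tool listed here.

```python
# Bit values as defined in the SAM specification; descriptions are paraphrased.
SAM_FLAGS = [
    (0x1, "read paired"),
    (0x2, "read mapped in proper pair"),
    (0x4, "read unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "read on reverse strand"),
    (0x20, "mate on reverse strand"),
    (0x40, "first in pair"),
    (0x80, "second in pair"),
    (0x100, "secondary alignment"),
    (0x200, "fails quality checks"),
    (0x400, "PCR or optical duplicate"),
    (0x800, "supplementary alignment"),
]


def decode_sam_flag(flag):
    """Return the descriptions of all bits set in a SAM FLAG value."""
    if not 0 <= flag < 0x1000:
        raise ValueError("SAM flag out of range: %r" % flag)
    return [name for bit, name in SAM_FLAGS if flag & bit]


# 99 = 1 + 2 + 32 + 64
print(decode_sam_flag(99))
```

The same bit values are what `samtools view -f` and `-F` filter on, so decoding a flag by hand is a quick way to check a filter before running it.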
UCSC Genome Browser
- intro on this wiki
- Main UCSC Genome Browser web site
- Beta Test browser site - most up-to-date datasets and features; can be buggy
- File formats - BED format especially is widely used
- Table browser - Browse and download data in different formats
- ENCODE data downloads at UCSC - useful for getting data to work with
Variant calling
- The 1000 Genomes project - catalog of human genetic variants
- Tools
- Broad Institute GATK - complex but powerful; used by 1000 Genomes
- File formats
- VCF (Variant Call Format) v4.0 - developed by 1000 Genomes project
Transcriptome analysis
- The Tuxedo pipeline: RNAseq with tophat/cufflinks
- tophat - exon-aware sequence alignment (uses bowtie)
- cufflinks - transcript assembly, differential expression & regulation
- RNAseq analysis protocol article in Nature Protocols
- cufflinks resource bundles for selected organisms (gff annotations, pre-built bowtie references, etc.)
Format converters and miscellaneous tools
- SRA (Sequence Read Archive) from NCBI
- overview on this wiki
- SRA search home page
- SRA Toolkit
- Mason program for simulating second-generation sequencing reads.
Other courses with online tutorials
- 2012 Next-Gen Sequence Analysis Workshop (Michigan State University) - tutorials similar to our course, plus introductions to Amazon EC2 (where you can "rent" Linux machines; useful if you don't have access to TACC), Python, R, ChIP-Seq, etc.
- maintained mRNA-seq protocols from this workshop