A sampling of the resources available, selected specifically for this course; it is not a comprehensive catalog.
Sequencing Technologies
Community Resources
- SEQanswers forum - many NGS questions are answered here
- A funny SEQanswers post about biologists starting to analyze NGS data: http://seqanswers.com/forums/showthread.php?t=4589
- UCSC Genome Browser - visualize and download NGS data (see more below)
- Galaxy website for online sequencing data analysis
- Broad Institute Integrative Genomics Viewer (IGV) - especially good for bam files
- 2012 Next-Gen Sequence Analysis Workshop (Michigan State University) has similar tutorials to our course
  - also includes introductions to using Amazon EC2, where you can "rent" Linux machines (useful if you don't have access to TACC)
  - Python, R, ChIP-Seq, etc. Maintained mRNA-seq protocols from this workshop are here
Fastq analysis/manipulation
- Wikipedia FASTQ format page
- FastQC from Babraham Bioinformatics - produces a nice quality report for fastq files.
- Cutadapt - An excellent command line tool for adapter sequence removal.
- FASTX Toolkit - Command line tools for fastq analysis and manipulation
- Illumina library construction on GSAF user wiki - useful for contaminant detection or adapter removal.
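As a quick illustration of what the tools above operate on, here is a minimal Python sketch that reads four-line FASTQ records and converts the quality string to Phred scores. It assumes Phred+33 encoding and unwrapped (single-line) sequences; the function names `read_fastq` and `mean_quality` are made up for this example and do not come from any of the listed tools.

```python
from io import StringIO

PHRED_OFFSET = 33  # assumption: Sanger / Illumina 1.8+ quality encoding


def read_fastq(handle):
    """Yield (name, sequence, qualities) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or a blank line) stops the reader
            break
        seq = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line is ignored
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed record near {header!r}")
        # Each quality character encodes one Phred score as ASCII code minus offset
        yield header[1:], seq, [ord(c) - PHRED_OFFSET for c in qual]


def mean_quality(quals):
    """Arithmetic mean of a list of Phred scores (0.0 for an empty read)."""
    return sum(quals) / len(quals) if quals else 0.0


# Two tiny made-up reads, only for demonstration
example = StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!5I\n")
records = list(read_fastq(example))
for name, seq, quals in records:
    print(name, seq, quals, mean_quality(quals))
```

For real data, FastQC or the FASTX Toolkit will be far faster and more thorough; the sketch is only meant to show the record layout.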
Alignment and aligners
- Jeff Barrick's introduction to NGS presentation
- Comparison of different aligners
  - by Heng Li, developer of BWA and MAQ
  - by Nils Homer, developer of BFAST
- Aligners
  - bowtie (http://bowtie-bio.sourceforge.net/index.shtml) - very fast, not very sensitive
  - BFAST wiki & manual - slow and relatively complicated, but tunable sensitivity
  - bwa - fast, sensitive and easy to use
  - bowtie2 - fast, sensitive, configurable, easy to use
- File formats
Alignment analysis
- SAM (Sequence Alignment Map) format specification (pdf)
- sam/bam tools
  - samtools - sam/bam conversion, flag filtering, bam sort/index
  - Picard - sam/bam utilities that are read-group aware
- Translate SAM file flags - type in a decimal number to see which flags are set
- SAMstat - produces detailed graphical statistics for sam/bam files.
- BEDTools - region overlap, merge, coverage & much more, w/bed, bam, vcf, gff support
  - BEDTools manual on readthedocs.org
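The flag translation mentioned above (type in a decimal number, see which flags are set) can be sketched in a few lines of Python. The bit values follow the SAM specification; the descriptive labels and the function name `decode_sam_flag` are chosen for this example only. Note that the 0x800 (supplementary) bit was added in later versions of the specification.

```python
# (bit value, label) pairs; bit values per the SAM spec, labels are informal
SAM_FLAGS = [
    (0x1, "read paired"),
    (0x2, "read mapped in proper pair"),
    (0x4, "read unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "read reverse strand"),
    (0x20, "mate reverse strand"),
    (0x40, "first in pair"),
    (0x80, "second in pair"),
    (0x100, "not primary alignment"),
    (0x200, "read fails quality checks"),
    (0x400, "PCR or optical duplicate"),
    (0x800, "supplementary alignment"),
]


def decode_sam_flag(flag):
    """Return the labels of all bits set in a decimal SAM FLAG value."""
    if not 0 <= flag < 0x1000:
        raise ValueError(f"flag out of range: {flag}")
    # A bit is set when the bitwise AND with its mask is non-zero
    return [label for bit, label in SAM_FLAGS if flag & bit]


for value in (99, 147, 4):
    print(value, decode_sam_flag(value))
```

The same bit masks are what `samtools view -f` and `-F` use to require or exclude flags when filtering alignments.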
UCSC Genome Browser
- Visualize mapped data at UCSC genome browser on this wiki
- Main UCSC Genome Browser web site
- Beta Test browser site - most up-to-date datasets and features; can be buggy
- File formats - BED format especially is widely used
- Table browser - Browse and download data in different formats
- ENCODE data downloads at UCSC - useful for getting data to work with
Format converters and generation tools
- SRA (Sequence Read Archive) from NCBI
  - overview on this wiki
  - SRA search home page
  - SRA Toolkit
- UCSC file format conversion scripts - useful for getting to/from wig and bed to corresponding binary formats.
  - Make sure you download the correct script for your operating system!
  - A directory containing these tools can be found on Stampede at /work/01063/abattenh/local/UCSC_utilities
- Mason program for simulating second-generation sequencing reads.
Transcriptome analysis
- The Tuxedo pipeline: RNAseq with tophat/cufflinks
- RNAseq analysis protocol article in Nature Protocols
- tophat - exon-aware sequence alignment (uses bowtie2/bowtie)
- cufflinks - transcript assembly, differential expression & regulation
- cufflinks resource bundles for selected organisms (gff annotations, pre-built bowtie2 references, etc.)
Variant calling
- The 1000 Genomes project - catalog of human genetic variants
- Tools
  - Broad Institute GATK - complex but powerful; used by 1000 Genomes
- File formats
  - VCF (Variant Call Format) v4.0 - developed by the 1000 Genomes project