A sampling of the resources available, selected specifically for this course; it is not a comprehensive catalog.
Linux
Community Resources
Sequencing Technologies
- Overviews
Technology intros
- Illumina (Solexa) – most common "short" (< 300 bp) read sequencing
- Newer "single molecule" sequencing
- "Single cell" sequencing
- Older technologies (less common now)
Fastq analysis/manipulation/QC
Reference genomes
Basic alignment and aligners
- Comparison of different aligners
- by Heng Li, developer of BWA, samtools, and many other tools
- File formats
- input: fastq format
- output: the SAM (Sequence Alignment/Map) format specification (SAM1.pdf)
- Aligners
- The BioITeam has some TACC-aware alignment scripts you might find useful:
- bwa alignment
/work/projects/BioITeam/common/script/align_bwa_illumina.sh
- bowtie2 alignment
/work/projects/BioITeam/common/script/align_bowtie2_illumina.sh
- merging sorted BAM files (read-group aware)
/work/projects/BioITeam/common/script/merge_sorted_bams.sh
- email or come talk to me if you have questions or problems
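As a quick illustration of the fastq input format listed in this section, here is a minimal Python sketch that reads four-line fastq records and decodes their base qualities. It assumes unwrapped four-line records and Phred+33 quality encoding; the function names and the two example reads are hypothetical, made up for illustration only.

```python
import io

def read_fastq(handle):
    """Yield (name, sequence, quality_string) from a four-line-per-record fastq stream."""
    while True:
        header = handle.readline()
        if not header.strip():
            break  # end of file (or a trailing blank line)
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored here
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@"):
            raise ValueError("expected '@' at start of record: " + header.strip())
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ for " + header.strip())
        yield header[1:].strip(), seq, qual

def phred_scores(qual, offset=33):
    """Convert a quality string to integer Phred scores (assumes Phred+33 encoding)."""
    return [ord(c) - offset for c in qual]

# Hypothetical two-read example, not real data
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!5I\n")
records = list(read_fastq(example))
for name, seq, qual in records:
    print(name, seq, phred_scores(qual))
```

For real data, the dedicated tools under "Fastq analysis/manipulation/QC" are a better choice; this sketch only shows how the records are laid out.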
Transcriptome-aware aligners
Alignment analysis
File formats and conversion
- SAM format specification – http://samtools.github.io/hts-specs/SAMv1.pdf
- essential reading when converting between formats, which ChIP-seq analysis often requires
- Genome browser file formats – http://genome.ucsc.edu/FAQ/FAQformat.html
- BED, bedGraph, narrowPeak and many more
- SRA (Sequence Read Archive) from NCBI
- UCSC file format conversion scripts - useful for converting wig and bed files to and from their corresponding binary formats (bigWig and bigBed).
- Make sure you download the correct script for your operating system!
- Directories containing these tools can be found on ls5 at:
/work/projects/BioITeam/common/opt/UCSC_utils.2013_03
/work/projects/BioITeam/common/opt/UCSC_utils.2017_07
- Mason program for simulating NGS sequencing reads
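One detail that often trips up conversions between the formats listed in this section is the coordinate system: SAM positions are 1-based, while BED intervals are 0-based with an exclusive end. The Python sketch below shows the arithmetic for turning one SAM record into a BED-style interval, using the CIGAR string to work out how much of the reference the alignment covers. The function name and the example record are hypothetical, and the sketch ignores optional tags and paired-end details.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference

def sam_to_bed(sam_line):
    """Return (chrom, start, end, name, mapq, strand) for one SAM record, or None if unmapped."""
    fields = sam_line.rstrip("\n").split("\t")
    qname, flag, rname, pos, mapq, cigar = fields[:6]
    flag = int(flag)
    if flag & 0x4 or cigar == "*":
        return None  # unmapped read, or no alignment information
    # Sum the lengths of operations that consume reference bases
    ref_len = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)
    start = int(pos) - 1   # SAM POS is 1-based; BED start is 0-based
    end = start + ref_len  # BED end is exclusive
    strand = "-" if flag & 0x10 else "+"  # flag 0x10 marks a reverse-strand alignment
    return rname, start, end, qname, int(mapq), strand

# Hypothetical record: reverse-strand read aligned at chr1:101 with an insertion and a deletion
example = "read1\t16\tchr1\t101\t60\t5M2I3M1D2M\t*\t0\t0\tACGTACGTACGT\t*"
bed = sam_to_bed(example)
print("\t".join(str(x) for x in bed))
```

Here the alignment covers 5 + 3 + 1 + 2 = 11 reference bases (the 2-base insertion does not consume reference), so the interval runs from 100 to 111 in BED coordinates.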
UCSC Genome Browser
RNAseq/Transcriptome analysis
Variant calling
Genome Annotation