A healthy taste of the resources available specifically for this course; not a comprehensive catalog.
Linux
Community Resources
Sequencing Technologies
- Overviews
Technology intros
- Illumina (Solexa) – the most common "short" read (< 300 bp) sequencing technology
- Newer single molecule sequencing
- Single cell sequencing
- Older technologies (less common now)
FASTQ analysis/manipulation/QC
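FASTQ stores each read as a four-line record: an `@` header, the bases, a `+` separator, and an ASCII-encoded quality string with one character per base. The following is a minimal pure-Python sketch of reading records and decoding qualities. It assumes the common unwrapped four-line layout and Phred+33 encoding (Sanger / Illumina 1.8+). `read_fastq` and `mean_quality` are hypothetical helpers for illustration, not part of any course script; dedicated QC tools should be used on real data.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str  # header line without the leading "@"
    seq: str
    qual: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records, assuming the 4-line layout (no wrapped sequence lines)."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], seq, qual)


def phred_scores(qual: str, offset: int = 33) -> list[int]:
    """Decode an ASCII quality string; offset 33 assumes Phred+33 encoding."""
    return [ord(c) - offset for c in qual]


def mean_quality(record: FastqRecord) -> float:
    """Average Phred score of one read (0.0 for an empty read)."""
    scores = phred_scores(record.qual)
    return sum(scores) / len(scores) if scores else 0.0


example = io.StringIO(
    "@read1\nACGTACGT\n+\nIIIIIIII\n"
    "@read2\nACGTNNNN\n+\nIIII####\n"
)
for rec in read_fastq(example):
    print(rec.name, mean_quality(rec))  # read1 40.0, then read2 21.0
```

Here `I` decodes to Q40 and `#` to Q2, so the second read's low-quality tail pulls its mean down. This is the kind of signal that trimming and QC tools act on.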
Reference genomes
Basic alignment and aligners
- Comparison of different aligners
- by Heng Li, developer of bwa, samtools, and many other bioinformatics tools
- File formats
- input: FASTQ format
- output: the SAM (Sequence Alignment Map) format specification (SAM1.pdf)
- Aligners
- The BioITeam has some TACC-aware alignment scripts you might find useful:
- bwa alignment
/work/projects/BioITeam/common/script/align_bwa_illumina.sh
- bowtie2 alignment
/work/projects/BioITeam/common/script/align_bowtie2_illumina.sh
- merging sorted BAM files (read-group aware)
/work/projects/BioITeam/common/script/merge_sorted_bams.sh
- email or come talk to Anna if you have questions or problems
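The SAM output format mentioned above is tab-delimited text. Header lines start with `@`, and each alignment line has 11 mandatory columns (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL) followed by optional tags. The following is a minimal sketch of parsing alignment lines and decoding two FLAG bits (0x4 = unmapped, 0x10 = reverse strand). `parse_sam_line`, `iter_alignments` and `describe` are hypothetical helpers for illustration. In practice, samtools (or a library such as pysam) should handle SAM/BAM.

```python
from typing import Iterable, Iterator, NamedTuple

# Two FLAG bits from the SAM specification.
FLAG_UNMAPPED = 0x4
FLAG_REVERSE = 0x10


class SamRecord(NamedTuple):
    qname: str
    flag: int
    rname: str
    pos: int  # 1-based leftmost mapping position (0 if unavailable)
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str
    tags: tuple[str, ...]  # optional TAG:TYPE:VALUE fields, left unparsed


def parse_sam_line(line: str) -> SamRecord:
    cols = line.rstrip("\r\n").split("\t")
    if len(cols) < 11:
        raise ValueError("SAM alignment lines need at least 11 tab-separated fields")
    return SamRecord(
        cols[0], int(cols[1]), cols[2], int(cols[3]), int(cols[4]), cols[5],
        cols[6], int(cols[7]), int(cols[8]), cols[9], cols[10], tuple(cols[11:]),
    )


def iter_alignments(lines: Iterable[str]) -> Iterator[SamRecord]:
    """Skip '@' header lines and parse the remaining alignment lines."""
    for line in lines:
        if line.startswith("@"):
            continue
        yield parse_sam_line(line)


def describe(rec: SamRecord) -> str:
    if rec.flag & FLAG_UNMAPPED:
        return f"{rec.qname}: unmapped"
    strand = "-" if rec.flag & FLAG_REVERSE else "+"
    return f"{rec.qname}: {rec.rname}:{rec.pos} ({strand}) MAPQ={rec.mapq}"


sam_text = [
    "@SQ\tSN:chr1\tLN:1000",
    "read1\t16\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\tNM:i:0",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
for rec in iter_alignments(sam_text):
    print(describe(rec))  # read1: chr1:100 (-) MAPQ=60, then read2: unmapped
```

FLAG is a bit field, so each property is tested with `&`. A single read can be reverse-strand, secondary, and a duplicate all at once.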
Transcriptome-aware aligners
Alignment analysis
File formats and conversion
- SAM format specification – http://samtools.github.io/hts-specs/SAMv1.pdf
- crucial for performing format conversions, and ChIP-seq analysis can involve many of them
- Genome browser file formats – http://genome.ucsc.edu/FAQ/FAQformat.html
- BED, bedGraph, narrowPeak and many more
- SRA (Sequence Read Archive) from NCBI
- UCSC file format conversion scripts – useful for converting WIG and BED files to and from their corresponding binary formats (bigWig and bigBed).
- Make sure you download the correct scripts for your operating system!
- Directories containing these tools can be found at TACC:
- /work/projects/BioITeam/common/opt/UCSC_utils.2013_03
- /work/projects/BioITeam/common/opt/UCSC_utils.2017_07
- Mason program for simulating NGS reads
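Many of the conversions listed above come down to coordinate conventions. SAM POS is 1-based with an inclusive end. The UCSC BED format uses 0-based, half-open intervals. An alignment's end on the reference comes from its CIGAR string: only M, D, N, = and X consume reference bases. The following is a minimal sketch of turning one alignment into a BED6 line. `reference_length` and `sam_to_bed` are hypothetical helpers, and the score column is simply set to 0. For real BAM files, a dedicated tool such as `bedtools bamtobed` is the safer choice.

```python
import re

# CIGAR operations that consume reference bases (per the SAM specification).
REF_CONSUMING = set("MDN=X")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_length(cigar: str) -> int:
    """Number of reference bases spanned by an alignment's CIGAR string."""
    if cigar == "*":
        raise ValueError("CIGAR unavailable ('*')")
    ops = CIGAR_RE.findall(cigar)
    # Reject strings containing anything the regex did not consume.
    if "".join(n + op for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return sum(int(n) for n, op in ops if op in REF_CONSUMING)


def sam_to_bed(rname: str, pos: int, cigar: str, name: str, reverse: bool) -> str:
    """Convert a 1-based SAM POS + CIGAR into a 0-based, half-open BED6 line."""
    start = pos - 1                         # BED chromStart is 0-based
    end = start + reference_length(cigar)   # BED chromEnd is exclusive
    strand = "-" if reverse else "+"
    return "\t".join([rname, str(start), str(end), name, "0", strand])


# 5 soft-clipped bases (S) do not touch the reference; 10M + 2D + 8M span 20 bases.
print(sam_to_bed("chr1", 100, "5S10M2D8M", "read1", reverse=False))
```

The read covers 1-based positions 100 through 119, which becomes the BED interval 99 to 119. Insertions (I) and clips (S, H) change the read length but not the reference span.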
UCSC Genome Browser
RNAseq/Transcriptome analysis
Variant calling
Genome Annotation