DNAseq Variant Calling Pipeline

Identification and annotation of SNPs and/or somatic mutations relative to a reference genome. 10-hour minimum ($730 internal, $930 external) per project.

1. Quality Assessment

Quality of data assessed by FastQC and SAMStat; results of quality assessment will be evaluated prior to downstream analysis.

  • Deliverables:
    • reports generated by FastQC and SAMStat
    • metrics specific to hybrid selection analysis, calculated using Picard, are also available
  • Tools Used:
    • FastQC: (Andrews 2010) used to generate quality summaries of data:
      • Per base sequence quality report: useful for deciding whether trimming is necessary.
      • Sequence duplication levels: evaluation of library complexity.
      • Overrepresented sequences: evaluation of adapter contamination.
    • SAMStat: (Lassmann et al. 2011) provides summary statistics at both FASTQ and SAM/BAM alignment levels.
    • Picard CalculateHsMetrics: evaluates hybrid selection protocols (target coverage and AT/GC dropout levels).
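
As an illustration of what the per base sequence quality report summarizes, the sketch below computes the mean Phred score at each read position. It is not FastQC's implementation; it assumes 4-line FASTQ records with Phred+33 encoding, and the function name and example reads are hypothetical.

```python
from collections import defaultdict
from io import StringIO


def per_base_mean_quality(fastq_handle, phred_offset=33):
    """Return a list of mean Phred quality scores, one per read position."""
    totals = defaultdict(int)  # position -> sum of quality scores
    counts = defaultdict(int)  # position -> number of reads covering it
    for line_number, line in enumerate(fastq_handle):
        # Assumption: 4-line records, so the quality string is every fourth line.
        if line_number % 4 != 3:
            continue
        for position, char in enumerate(line.rstrip("\n")):
            totals[position] += ord(char) - phred_offset
            counts[position] += 1
    return [totals[p] / counts[p] for p in sorted(counts)]


# Two hypothetical 4-base reads; 'I' encodes Q40 and '#' encodes Q2.
example = StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\nII##\n")
print(per_base_mean_quality(example))  # [40.0, 40.0, 21.0, 21.0]
```

A drop in mean quality toward the 3' end, as in the last two positions here, is the kind of pattern that would suggest trimming.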

2. Mapping

Mapping to genome reference using BWA-mem (alternative algorithms available on request).

  • Deliverables:
    • bam files from the initial alignment (BWA-mem by default, though other algorithms are available if desired)
    • bam files resulting from further processing using GATK
  • Tools Used:
    • BWA-mem: (Li 2013) primary aligner used to generate first pass read alignments (BWA-aln and BWA-sampe also available if desired, as are bowtie/bowtie2).
    • GATK: (McKenna et al. 2010, Van der Auwera et al. 2013) IndelRealigner and BaseRecalibrator applied to correct indel-based misalignments and increase accuracy/dispersion of individual base quality scores.
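
The mapping steps above might be strung together roughly as follows. This is a sketch, not the facility's actual scripts: it assumes GATK 3.x command-line syntax (the series that includes IndelRealigner), adds the companion RealignerTargetCreator and PrintReads steps that the tools listed above normally depend on, and omits the sorting and indexing of the BWA output. All file names, the jar path, and the read-group fields are hypothetical.

```python
def mapping_commands(reference, fastq_1, fastq_2, sample, sorted_bam,
                     known_sites, gatk_jar="GenomeAnalysisTK.jar", threads=4):
    """Return alignment and GATK 3.x post-processing commands as argument lists."""
    # bwa expects tabs in the read-group string written as a literal backslash-t.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    gatk = ["java", "-jar", gatk_jar]
    targets = f"{sample}.realign.intervals"
    realigned = f"{sample}.realigned.bam"
    table = f"{sample}.recal.table"
    return [
        # First-pass alignment; bwa mem writes SAM to standard output.
        ["bwa", "mem", "-t", str(threads), "-R", read_group,
         reference, fastq_1, fastq_2],
        # Find intervals that may need local realignment around indels.
        gatk + ["-T", "RealignerTargetCreator", "-R", reference,
                "-I", sorted_bam, "-o", targets],
        # Realign reads within those intervals.
        gatk + ["-T", "IndelRealigner", "-R", reference, "-I", sorted_bam,
                "-targetIntervals", targets, "-o", realigned],
        # Model systematic base quality errors against known variant sites.
        gatk + ["-T", "BaseRecalibrator", "-R", reference, "-I", realigned,
                "-knownSites", known_sites, "-o", table],
        # Write a new bam with the recalibrated base qualities applied.
        gatk + ["-T", "PrintReads", "-R", reference, "-I", realigned,
                "-BQSR", table, "-o", f"{sample}.recal.bam"],
    ]


for command in mapping_commands("ref.fa", "s1_R1.fastq", "s1_R2.fastq", "s1",
                                "s1.sorted.bam", "dbsnp.vcf"):
    print(" ".join(command))
```

Building each command as an argument list keeps it ready to pass to `subprocess.run` without shell quoting issues.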

3a. Variant Calling Option 1: GATK

Genome Analysis Toolkit (GATK) used to call SNPs and indels according to best practices recommended by the Broad Institute.

  • Deliverables:
    • individual sample vcf files output by HaplotypeCaller
    • regenotyped, merged vcf file output by GenotypeGVCFs, with variant calls recalibrated by VariantRecalibrator
  • Tools Used (GATK):
    • HaplotypeCaller: reassembles "active regions" and applies PairHMM algorithm to select most likely genotype
    • GenotypeGVCFs: jointly re-genotypes, re-annotates and merges individual sample gVCFs from HaplotypeCaller into single aggregated vcf file
    • VariantRecalibrator: recalibrates variant call probabilities based on call annotations
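
To make the "select most likely genotype" step concrete, here is a simplified sketch of diploid genotype selection at a biallelic site. It is not the GATK implementation: it assumes the per-read likelihoods under the reference and alternate haplotypes (the kind of quantity a PairHMM produces) are already available, and the function names and example values are hypothetical.

```python
import math


def genotype_log10_likelihoods(read_likelihoods):
    """Compute log10 P(reads | genotype) for 0/0, 0/1 and 1/1.

    read_likelihoods: list of (P(read | ref haplotype), P(read | alt haplotype)).
    """
    genotypes = {"0/0": (0, 0), "0/1": (0, 1), "1/1": (1, 1)}
    result = {}
    for name, (first, second) in genotypes.items():
        # Each read is assumed to come from either chromosome copy with probability 1/2.
        result[name] = sum(
            math.log10(0.5 * lik[first] + 0.5 * lik[second])
            for lik in read_likelihoods
        )
    return result


def call_genotype(read_likelihoods):
    """Return the most likely genotype and Phred-scaled likelihoods (PL-style)."""
    logs = genotype_log10_likelihoods(read_likelihoods)
    best = max(logs, key=logs.get)
    # Scale so that the best genotype gets 0 and less likely ones get larger values.
    scaled = {g: round(-10 * (value - logs[best])) for g, value in logs.items()}
    return best, scaled


# Hypothetical site: three reads fit the reference haplotype, three fit the alternate.
reads = [(0.99, 0.01)] * 3 + [(0.01, 0.99)] * 3
print(call_genotype(reads))
```

With an even split of supporting reads, the heterozygous genotype comes out as the most likely call.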

3b. Variant Calling Option 2: Somatic Mutation Identification

MuTect and MutSig from the Broad Institute are available for calling somatic mutations; other methods may be available upon request.

  • Deliverables:
    • MuTect and MutSig output files.
  • Tools Used:
    • MuTect: (Cibulskis et al. 2013) identifies somatic point mutations based on two Bayesian classifiers, each a log odds (LOD) score:
      • LOD for observed tumor data given mutant site compared to observed tumor data given reference site,
      • LOD for observed normal data given reference site compared to observed normal data given mutant site.
    • MutSig: (Lawrence et al. 2013) assesses significance of mutation calls using null model based on background mutation processes.
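
The two LOD scores can be sketched as below. This is a simplified reading of the model in Cibulskis et al. (2013), not MuTect's code: each read contributes a base call and an error probability, the tumor score compares the estimated alternate allele fraction against a fraction of zero, and the normal score compares a fraction of zero against a heterozygous fraction of 0.5. The function names and the example pileups are hypothetical, and the thresholds mentioned in the comments are assumptions that should be checked against the paper.

```python
import math


def site_log10_likelihood(bases, error_probs, ref, alt, allele_fraction):
    """log10 likelihood of the observed bases given an alternate allele fraction."""
    total = 0.0
    for base, e in zip(bases, error_probs):
        if base == ref:
            p = allele_fraction * e / 3 + (1 - allele_fraction) * (1 - e)
        elif base == alt:
            p = allele_fraction * (1 - e) + (1 - allele_fraction) * e / 3
        else:
            p = e / 3  # base matches neither allele, so it must be an error
        total += math.log10(p)
    return total


def tumor_lod(bases, error_probs, ref, alt):
    """Mutant model (at the observed allele fraction) versus reference model."""
    fraction = bases.count(alt) / len(bases)
    return (site_log10_likelihood(bases, error_probs, ref, alt, fraction)
            - site_log10_likelihood(bases, error_probs, ref, alt, 0.0))


def normal_lod(bases, error_probs, ref, alt):
    """Reference model versus heterozygous germline model in the matched normal."""
    return (site_log10_likelihood(bases, error_probs, ref, alt, 0.0)
            - site_log10_likelihood(bases, error_probs, ref, alt, 0.5))


# Hypothetical pileups: 5 of 20 tumor reads carry T, all 20 normal reads carry A.
errors = [0.001] * 20  # corresponds to Q30 base calls
tumor_score = tumor_lod("A" * 15 + "T" * 5, errors, "A", "T")
normal_score = normal_lod("A" * 20, errors, "A", "T")
# Assumed thresholds (to be verified): about 6.3 for tumor and 2.2 for normal.
print(round(tumor_score, 1), round(normal_score, 1))
```

A candidate is treated as somatic only when both scores are high: the tumor data favor a mutation and the normal data favor the reference.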

4. Annotation

Further annotation of variant calls may be provided using ANNOVAR.

  • Deliverables:
    • ANNOVAR output in tabular format (plain text, CSV, or Excel, as desired).
  • Tools Used:
    • ANNOVAR: (Wang et al. 2010) provides functional annotation of genetic variation encompassing multiple modalities (e.g., gene and region annotation and/or filtration based on established data sets).
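
For the csv deliverable, a tab-delimited annotation table can be converted with the standard library alone. The sketch below assumes a plain tab-separated file with a header row; the column names and the example record are illustrative only and do not come from a real project.

```python
import csv
from io import StringIO


def tab_to_csv(tab_handle, csv_handle):
    """Copy a tab-delimited table to CSV and return the number of rows written."""
    reader = csv.reader(tab_handle, delimiter="\t")
    writer = csv.writer(csv_handle)
    rows = 0
    for row in reader:
        # csv.writer quotes any field that itself contains a comma.
        writer.writerow(row)
        rows += 1
    return rows


# Illustrative table: one header row and one variant overlapping two genes.
table = StringIO("Chr\tStart\tRef\tAlt\tGene\nchr1\t12345\tA\tT\tGENE1,GENE2\n")
output = StringIO()
print(tab_to_csv(table, output))
print(output.getvalue())
```

Using the `csv` module rather than a plain string replace keeps fields such as multi-gene annotations intact, since they often contain commas.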

