This ChIP-seq pipeline identifies regions of significant protein binding ("peaks") using an annotated reference genome. Pricing: 10-hour minimum ($470 internal, $600 external) per project.

1. Quality Assessment

Data quality is assessed with FastQC, and the results are reviewed before any downstream analysis.

  • Deliverables:
    • FastQC reports
  • Tools Used:
    • FastQC (Andrews 2010): used to generate quality summaries of the data, including:
      • Per-base sequence quality: useful for deciding whether trimming is necessary.
      • Sequence duplication levels: used to evaluate library complexity. Higher duplication levels may be expected in high-coverage RNA-seq data.
      • Overrepresented sequences: used to detect adapter contamination.
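
FastQC produces these summaries itself. As a rough illustration of what the per-base quality and duplication metrics measure, the Python sketch below computes the mean Phred score at each read position and the fraction of reads that repeat a sequence already seen. It assumes uncompressed four-line FASTQ records with Phred+33 quality encoding; the function names are hypothetical, and the duplicate measure is a simplification, not FastQC's exact method.

```python
from collections import Counter
from typing import List, Tuple

Record = Tuple[str, str]  # (sequence, quality string)


def parse_fastq(text: str) -> List[Record]:
    """Parse uncompressed, four-line-per-record FASTQ text (hypothetical helper)."""
    lines = [line.strip() for line in text.strip().splitlines()]
    if len(lines) % 4:
        raise ValueError("FASTQ text must contain a multiple of four lines")
    records = []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record at line {i + 1}")
        records.append((seq, qual))
    return records


def per_base_mean_quality(records: List[Record], offset: int = 33) -> List[float]:
    """Mean Phred score at each read position, assuming Phred+33 encoding."""
    totals: List[int] = []
    counts: List[int] = []
    for _, qual in records:
        for pos, char in enumerate(qual):
            if pos == len(totals):  # first read long enough to reach this position
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(char) - offset
            counts[pos] += 1
    return [t / c for t, c in zip(totals, counts)]


def duplicate_fraction(records: List[Record]) -> float:
    """Fraction of reads that repeat a sequence already seen (simplified)."""
    counts = Counter(seq for seq, _ in records)
    total = sum(counts.values())
    return (total - len(counts)) / total if total else 0.0


fastq = "@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\nII##\n@r3\nTTGA\n+\nIIII\n"
reads = parse_fastq(fastq)
print(per_base_mean_quality(reads))  # 'I' encodes Q40, '#' encodes Q2
print(duplicate_fraction(reads))     # one of three reads is a repeat
```

A drop in mean quality toward the 3' end of reads is the kind of pattern that motivates trimming.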

2. Mapping

Reads are mapped to the reference genome with BWA.

  • Deliverables:
    • Mapping results as BAM files, plus mapping statistics.
  • Tools Used:
    • BWA (Li 2013): primary aligner used to generate read alignments.
    • Samtools (Li 2009): used to prepare BAM files and generate mapping statistics.
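
Samtools reports mapping statistics directly (for example, `samtools flagstat`). To show what a mapping rate is derived from, the Python sketch below reads the FLAG field (column 2) of SAM records, skips secondary and supplementary alignments so each read is counted once, and tallies mapped and duplicate-marked reads. The function and class names are hypothetical; this is an illustration of the FLAG bits, not how samtools computes its report.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

# FLAG bits defined by the SAM specification.
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_DUPLICATE = 0x400
FLAG_SUPPLEMENTARY = 0x800


@dataclass
class MappingStats:
    primary: int = 0     # one record per read (secondary/supplementary skipped)
    mapped: int = 0
    duplicates: int = 0  # only meaningful if duplicates were marked upstream

    @property
    def mapping_rate(self) -> float:
        return self.mapped / self.primary if self.primary else 0.0


def flags_from_sam(sam_text: str) -> Iterator[int]:
    """Yield the FLAG (column 2) of each alignment line, skipping @ headers."""
    for line in sam_text.splitlines():
        if line and not line.startswith("@"):
            yield int(line.split("\t")[1])


def summarize_flags(flags: Iterable[int]) -> MappingStats:
    stats = MappingStats()
    for flag in flags:
        if flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
            continue  # extra alignment of a read that is counted via its primary record
        stats.primary += 1
        if not flag & FLAG_UNMAPPED:
            stats.mapped += 1
        if flag & FLAG_DUPLICATE:
            stats.duplicates += 1
    return stats


sam = (
    "@HD\tVN:1.6\n"
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n"
    "r2\t16\tchr1\t200\t60\t4M\t*\t0\t0\tACGT\tIIII\n"
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n"
    "r1\t256\tchr2\t50\t0\t4M\t*\t0\t0\tACGT\tIIII\n"
)
stats = summarize_flags(flags_from_sam(sam))
print(stats.primary, stats.mapped, f"{stats.mapping_rate:.0%}")  # 3 2 67%
```

Skipping secondary and supplementary records matters: counting them would inflate both the number of reads and the apparent mapping rate.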

4. Peak Calling

Normalized ChIP-seq read counts are compared against a background control (input or mock ChIP) to identify regions of binding enrichment.
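
MACS2 performs this comparison internally. To make the idea concrete, the Python sketch below scores a single window: the control count is scaled to the ChIP library size, the expected count λ is taken as the larger of that scaled control and a genome-wide background, and the observed ChIP count is tested against a Poisson(λ) distribution. This is a simplified illustration, not MACS2's implementation; MACS2's model is more involved (for example, it estimates local background over several window sizes around each candidate peak), and all function and parameter names here are hypothetical.

```python
import math


def poisson_sf(k: int, lam: float) -> float:
    """Upper tail P(X >= k) for X ~ Poisson(lam), with lam > 0."""
    if k <= 0:
        return 1.0

    def pmf(i: int) -> float:
        # Log space avoids overflow in lam**i and i!.
        return math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))

    if k <= lam:
        # Large tail: the complement of the lower sum is accurate enough.
        return max(0.0, 1.0 - sum(pmf(i) for i in range(k)))
    # Small tail: sum it directly; terms only shrink once i > lam.
    total, i = 0.0, k
    while True:
        term = pmf(i)
        total += term
        if term <= total * 1e-16:
            break
        i += 1
    return total


def window_enrichment(chip_count: int, control_count: int,
                      chip_total: int, control_total: int,
                      background: float) -> tuple:
    """Fold enrichment and Poisson p-value for one genomic window.

    background: expected ChIP reads per window under a uniform genome-wide
    distribution (a hypothetical input for this sketch).
    """
    # Scale the control to the ChIP library size so the counts are comparable.
    scaled_control = control_count * chip_total / control_total
    # Never expect fewer reads than the genome-wide background.
    lam = max(background, scaled_control)
    return chip_count / lam, poisson_sf(chip_count, lam)


fold, p = window_enrichment(chip_count=30, control_count=10,
                            chip_total=20_000_000, control_total=10_000_000,
                            background=5.0)
print(f"fold enrichment {fold:.2f}, p = {p:.3g}")
```

Taking the maximum of the control-based and background expectations keeps a sparsely covered control region from making a modest ChIP signal look highly significant.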

  • Deliverables:
    • Peak calls as narrowPeak (BED6+4) files, containing p-value, q-value, and fold enrichment scores for each peak.
    • Per-base normalized signal tracks as bigWig files.
  • Tools Used:
    • MACS2 (Zhang 2008): used to identify and score peak regions.
    • bedtools (Quinlan 2010): used for optional blacklist filtering.
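
One common way to run the blacklist filter is `bedtools intersect -v -a peaks.narrowPeak -b blacklist.bed`, where `-v` keeps only the entries of `-a` that have no overlap in `-b`. The file names are placeholders, and whether this pipeline uses exactly that invocation is an assumption. The Python sketch below reproduces the same overlap logic on in-memory intervals with BED-style half-open coordinates, purely for illustration; the function names are hypothetical.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Index = Dict[str, Tuple[List[int], List[int]]]


def build_index(blacklist: Sequence[Tuple[str, int, int]]) -> Index:
    """Merge blacklist intervals per chromosome into sorted, disjoint spans."""
    by_chrom = defaultdict(list)
    for chrom, start, end in blacklist:
        by_chrom[chrom].append((start, end))
    index: Index = {}
    for chrom, spans in by_chrom.items():
        spans.sort()
        merged: List[List[int]] = []
        for start, end in spans:
            if merged and start <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], end)  # extend overlapping span
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def overlaps(index: Index, chrom: str, start: int, end: int) -> bool:
    """True if half-open [start, end) shares at least one base with the blacklist."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect.bisect_left(starts, end) - 1  # last merged span starting before `end`
    return i >= 0 and ends[i] > start


def filter_blacklisted(peaks: Sequence[tuple], blacklist) -> List[tuple]:
    """Keep peaks (chrom, start, end, ...) with no blacklist overlap."""
    index = build_index(blacklist)
    return [p for p in peaks if not overlaps(index, p[0], p[1], p[2])]


blacklist = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 0, 50)]
peaks = [("chr1", 50, 100, "p1"), ("chr1", 250, 260, "p2"),
         ("chr1", 300, 400, "p3"), ("chr2", 49, 60, "p4")]
print([p[3] for p in filter_blacklisted(peaks, blacklist)])  # ['p1', 'p3']
```

Merging the blacklist first makes the spans disjoint and sorted, so a single binary search per peak is enough to decide overlap.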

5. Significance Threshold Analysis

Statistical analysis and informed heuristics are used to determine appropriate significance threshold(s) for selecting peaks for downstream analysis.

  • Deliverables:
    • Summary file of peak counts at selected stringency levels (High, Medium, and Low).
    • Master file of peak counts over a wide range of q-value and fold enrichment thresholds.
    • Plots of peak count vs. q-value and vs. fold enrichment.
  • Tools Used:
    • R, in-house scripts and ggplot: used to produce peak count statistics and plots.
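
The in-house R scripts are not reproduced here. As an illustration of how the master file and summary counts relate, the Python sketch below reads MACS2 narrowPeak lines (MACS2 writes fold enrichment in column 7 and -log10(q-value) in column 9) and counts peaks passing every combination of q-value and fold enrichment cutoffs. The function names are hypothetical, and the High/Medium/Low cutoffs are placeholders, not the values this service uses.

```python
from typing import Dict, List, Sequence, Tuple


def parse_narrowpeak(text: str) -> List[Tuple[float, float]]:
    """Return (fold_enrichment, minus_log10_q) for each MACS2 narrowPeak line."""
    peaks = []
    for line in text.splitlines():
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue  # skip blank lines and UCSC/comment header lines
        fields = line.split("\t")
        peaks.append((float(fields[6]), float(fields[8])))
    return peaks


def count_grid(peaks: Sequence[Tuple[float, float]],
               q_cutoffs: Sequence[float],
               fold_cutoffs: Sequence[float]) -> Dict[Tuple[float, float], int]:
    """Count peaks passing each (q-value, fold enrichment) cutoff pair.

    q_cutoffs are given as -log10(q); for example 2.0 means q <= 0.01.
    """
    return {
        (q_cut, f_cut): sum(1 for fold, mlq in peaks if mlq >= q_cut and fold >= f_cut)
        for q_cut in q_cutoffs
        for f_cut in fold_cutoffs
    }


# Hypothetical stringency presets: (-log10 q cutoff, fold enrichment cutoff).
# 1.3 is roughly q = 0.05.
LEVELS = {"High": (5.0, 4.0), "Medium": (2.0, 2.0), "Low": (1.3, 2.0)}

narrowpeak = (
    "chr1\t100\t200\tpeak1\t50\t.\t5.2\t8.1\t6.3\t40\n"
    "chr1\t500\t700\tpeak2\t20\t.\t2.5\t3.0\t1.8\t90\n"
    "chr2\t10\t90\tpeak3\t30\t.\t3.1\t4.4\t2.9\t35\n"
)
peaks = parse_narrowpeak(narrowpeak)
grid = count_grid(peaks, q_cutoffs=[1.3, 2.0, 5.0], fold_cutoffs=[2.0, 4.0])
summary = {name: count_grid(peaks, [q], [f])[(q, f)] for name, (q, f) in LEVELS.items()}
print(summary)  # {'High': 1, 'Medium': 2, 'Low': 3}
```

Comparing in -log10 space avoids converting back to raw q-values, and plotting `grid` counts against either cutoff gives the kind of peak count vs. threshold curves listed above.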
