This pipeline identifies regions of significant protein binding ("peaks") based on a reference genome. 12 hour minimum ($876 internal, $1116 external) per project.

1. Quality Assessment

Quality of data assessed by FastQC and aggregated with MultiQC. Results of quality assessment will be evaluated prior to downstream analysis.

  • Deliverables:
    • Reports generated by FastQC and MultiQC.
  • Tools used:
    • FastQC: (Andrews 2010) used to generate quality summaries of data:
      • Per base sequence quality report: useful for deciding if trimming is necessary.
      • Sequence duplication levels: evaluation of library complexity. Higher levels of sequence duplication may be expected for high coverage RNAseq data. 
      • Overrepresented sequences: evaluation of adapter contamination.
    • MultiQC: (https://multiqc.info/) used to aggregate FastQC, alignment, and other reports.
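
The quality assessment step can be sketched as two command lines: one FastQC run over the raw reads and one MultiQC run over the resulting report directory. The Python sketch below only builds and prints those commands. The file names, the `qc` output directory, and the helper name `qc_commands` are hypothetical, and the options the pipeline actually uses are not stated on this page.

```python
import shlex
from pathlib import Path


def qc_commands(fastq_files, out_dir="qc"):
    """Build FastQC and MultiQC command lines (nothing is executed here)."""
    out = Path(out_dir)
    # One FastQC call over all FASTQ files; -o names the report directory,
    # which FastQC expects to exist already.
    fastqc = ["fastqc", "-o", str(out), *map(str, fastq_files)]
    # MultiQC scans the directory for FastQC output and writes its
    # aggregated report to the directory given by -o.
    multiqc = ["multiqc", str(out), "-o", str(out)]
    return [fastqc, multiqc]


# Hypothetical paired-end sample, for illustration only.
for cmd in qc_commands(["sample_R1.fastq.gz", "sample_R2.fastq.gz"]):
    print(shlex.join(cmd))
```

Returning argument lists rather than shell strings means each command can be passed directly to `subprocess.run` without extra quoting.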

2. Mapping

Mapping to genome reference performed using BWA.

  • Deliverables
    • Mapping results as BAM files, and mapping statistics.
  • Tools Used:
    • BWA: (Li 2013) primary aligner used to generate read alignments.
    • Samtools: (Li 2009) used to prepare BAMs and generate mapping statistics.
    • In-house statistics generation scripts 
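
As a rough illustration of the mapping step, the sketch below assembles a typical BWA and Samtools command sequence: align, coordinate-sort, index, and summarize. It assumes the `bwa mem` algorithm and `samtools flagstat` for statistics. The page names neither, and the in-house statistics scripts are not reproduced. The sample name, thread count, file names, and the helper `mapping_commands` are hypothetical.

```python
import shlex


def mapping_commands(reference, reads, sample, threads=4):
    """Return shell command strings for align -> sort -> index -> flagstat."""
    bam = f"{sample}.sorted.bam"
    # Assumption: bwa mem as the alignment algorithm, writing SAM to stdout.
    align = ["bwa", "mem", "-t", str(threads), reference, *reads]
    # samtools sort reads SAM from stdin ("-") and writes a sorted BAM.
    sort = ["samtools", "sort", "-o", bam, "-"]
    flagstat = ["samtools", "flagstat", bam]
    return [
        f"{shlex.join(align)} | {shlex.join(sort)}",
        shlex.join(["samtools", "index", bam]),
        f"{shlex.join(flagstat)} > {shlex.quote(sample + '.flagstat.txt')}",
    ]


# Hypothetical inputs, for illustration only.
for line in mapping_commands("genome.fa", ["s1_R1.fastq.gz", "s1_R2.fastq.gz"], "s1"):
    print(line)
```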

4. Peak Calling

Normalized ChIP-seq read counts are compared against a background control (Input or mock ChIP) to identify regions of binding enrichment.

  • Deliverables
    • Peak calls as narrowPeak (BED 6+) files, containing p-value, q-value, and fold enrichment scores for each peak.
    • Per-base normalized signal files as bigWigs.
  • Tools Used:
    • MACS2: (Zhang 2008) used to identify and score peak regions.
    • bedtools: (Quinlan 2010) used for optional blacklist filtering.
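
To show how the scores in a narrowPeak deliverable can be read, here is a minimal parsing sketch. It relies on the standard ten-column narrowPeak layout, in which columns 7 to 9 hold the signal value (fold enrichment in MACS2 output), -log10(p-value), and -log10(q-value), and column 10 holds the summit offset from the peak start. The `Peak` type, the function name, and the example record are made up for illustration.

```python
from typing import NamedTuple


class Peak(NamedTuple):
    chrom: str
    start: int
    end: int
    name: str
    fold_enrichment: float
    neg_log10_p: float
    neg_log10_q: float
    summit_offset: int


def parse_narrowpeak(lines):
    """Parse tab-separated narrowPeak lines into Peak records."""
    peaks = []
    for line in lines:
        # Skip blank lines and optional track/browser/comment headers.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        f = line.rstrip("\n").split("\t")
        peaks.append(Peak(f[0], int(f[1]), int(f[2]), f[3],
                          float(f[6]), float(f[7]), float(f[8]), int(f[9])))
    return peaks


# Invented example record; real files come from the peak caller.
example = ["chr1\t1000\t1250\tpeak_1\t87\t.\t6.5\t11.2\t8.7\t120\n"]
peaks = parse_narrowpeak(example)
print(peaks[0].chrom, peaks[0].end - peaks[0].start, peaks[0].fold_enrichment)
```

Because the p-value and q-value columns are stored as -log10 values, larger numbers mean more significant peaks; a q-value of 0.05 corresponds to roughly 1.3 on this scale.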

5. Significance Threshold Analysis

Statistical analysis and informed heuristics to determine appropriate significance threshold(s) for identifying peaks for downstream analysis.

  • Deliverables
    • Summary file outlining peak counts at selected levels (High, Medium, and Low stringency) and master file containing counts over a wide range of q-values and fold enrichment values.
    • Peak count vs. q-value and fold enrichment plots.
  • Tools Used:
    • R and in-house scripts used to produce peak count statistics and plots.
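
The peak count summaries described above amount to counting peaks that pass a pair of cutoffs, one on q-value and one on fold enrichment. The sketch below illustrates that idea in Python rather than R, and it is not the in-house implementation. The cutoff values assigned to the Low, Medium, and High tiers and the example peaks are hypothetical. Each peak is given as a (-log10 q-value, fold enrichment) pair, matching the scale used in narrowPeak files.

```python
def count_peaks(peaks, min_neg_log10_q, min_fold):
    """Count peaks that meet both the q-value and fold enrichment cutoffs."""
    return sum(1 for q, fe in peaks if q >= min_neg_log10_q and fe >= min_fold)


def threshold_grid(peaks, q_cutoffs, fold_cutoffs):
    """Peak counts for every combination of cutoffs (a 'master' table)."""
    return {(q, fe): count_peaks(peaks, q, fe)
            for q in q_cutoffs for fe in fold_cutoffs}


# Invented peaks as (-log10 q-value, fold enrichment) pairs.
peaks = [(2.1, 3.0), (5.4, 6.2), (12.0, 15.5), (1.4, 2.2), (8.3, 4.9)]

# Hypothetical stringency tiers; real cutoffs would be chosen per project.
tiers = {"Low": (1.3, 2.0), "Medium": (5.0, 4.0), "High": (10.0, 8.0)}
summary = {name: count_peaks(peaks, q, fe) for name, (q, fe) in tiers.items()}
print(summary)

grid = threshold_grid(peaks, q_cutoffs=[1.3, 5.0, 10.0], fold_cutoffs=[2.0, 4.0, 8.0])
print(grid[(5.0, 4.0)])
```

Plotting the grid counts against each cutoff gives the kind of peak count vs. q-value and fold enrichment curves listed in the deliverables.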
