Byte Club, October 18 2017. Using MultiQC to produce consolidated QC Reports. Anna Battenhouse, CSSB & CCBB.
ATAC-seq is a transposon-insertion sequencing method in which an engineered, activated transposon inserts into accessible ("open") chromatin. It is considered a much simpler protocol than standard DNase-seq, and it requires less starting material as well.
For data, we will use some ATAC-seq datasets produced in Igor Ponomarev's lab in WCAAR. As a proof-of-concept for future work, they performed the ATAC-seq protocol on 5k and 50k cell nuclei from mouse brain, producing 2 paired-end datasets.
Log in to ls5 at TACC. Execute these commands to set up access to the multiqc binary:
    module load python
    export PATH="/work/projects/BioITeam/ls5/bin/multiqc-1.0:$PATH"
    export PYTHONPATH="/work/projects/BioITeam/ls5/lib/python2.7/annab-packages:$PYTHONPATH"
    # make sure it is working...
    multiqc --help
The FastQC tool is great for producing detailed reports for every individual fastq file. For example, for Igor's 2 PE datasets, 4 reports are produced from running fastqc (http://web.corral.tacc.utexas.edu/iyer/igor/fastqc/).
The shortcoming is that you have to browse through all the individual reports one at a time, which can be tedious for large experiments.
This is where MultiQC's power comes in. You can point MultiQC to a directory where FastQC has been run and it will magically produce a consolidated report.
For example, logged in to ls5 at TACC, first stage a directory where FastQC has been run:
    mkdir -p $SCRATCH/byteclub/multiqc/01_fastq
    cd $SCRATCH/byteclub/multiqc/01_fastq
    ln -s -f /work/01063/abattenh/projects/byteclub/multiqc/fastqc
    ln -s -f $SCRATCH ~/scratch
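Before running MultiQC on a staged directory, it can help to confirm that FastQC output is actually visible there. The sketch below assumes FastQC's usual naming, where each input file yields a `<name>_fastqc.zip` archive; the function name `count_fastqc_reports` is hypothetical and not part of FastQC or MultiQC.

```shell
# Count FastQC result archives under a directory (default: current directory).
# Assumption: FastQC's usual "<name>_fastqc.zip" output naming.
count_fastqc_reports() {
    local dir="${1:-.}"
    # -L follows symbolic links, so output staged through a symlink is counted too.
    # tr strips the padding some wc implementations add around the number.
    find -L "$dir" -type f -name '*_fastqc.zip' | wc -l | tr -d ' '
}

# Example: count_fastqc_reports "$SCRATCH/byteclub/multiqc/01_fastq"
```

A count of zero would suggest the symbolic link is broken or that only the HTML reports were kept, in which case this check is not informative.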
Now this is all it takes to produce a basic MultiQC report:
    cd $SCRATCH/byteclub/multiqc/01_fastq
    multiqc .
When this completes you'll see a new file, multiqc_report.html, and a new directory, multiqc_data.
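A quick way to confirm the run finished is to check for its outputs. This is a minimal sketch assuming MultiQC's default output names (`multiqc_report.html` and `multiqc_data`); the function name `check_multiqc_outputs` is hypothetical.

```shell
# Check that a directory holds MultiQC's default outputs.
# Assumption: the report was written with default names (no custom file name).
check_multiqc_outputs() {
    local dir="${1:-.}"
    if [ -f "$dir/multiqc_report.html" ] && [ -d "$dir/multiqc_data" ]; then
        echo "MultiQC outputs found in $dir"
    else
        echo "MultiQC outputs missing in $dir" >&2
        return 1
    fi
}

# Example: check_multiqc_outputs "$SCRATCH/byteclub/multiqc/01_fastq"
```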
Here's what this basic MultiQC report looks like: http://web.corral.tacc.utexas.edu/iyer/byteclub/multiqc/01_basic.multiqc_report.html
To view the file you created in a web browser, it must be copied somewhere a browser can open it. An easy way to do this is to copy it to your laptop, for example with scp, changing the user name from abattenh and the scratch path as appropriate.
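The copy step might look like the sketch below, run on your laptop rather than on ls5. The host name `ls5.tacc.utexas.edu`, the helper name `copy_report`, and the default remote path (which relies on the `~/scratch` symbolic link created during staging and on MultiQC's default report name) are all assumptions; adjust them to your own account.

```shell
# Run this on your laptop, not on ls5.
# Assumptions: host name, default remote path and helper name are illustrative.
copy_report() {
    local user="$1"
    # Relative remote paths are resolved from the remote home directory.
    local remote_path="${2:-scratch/byteclub/multiqc/01_fastq/multiqc_report.html}"
    local host="ls5.tacc.utexas.edu"
    if [ -n "${DRY_RUN:-}" ]; then
        # DRY_RUN=1 prints the command instead of running it.
        echo "scp ${user}@${host}:${remote_path} ."
    else
        scp "${user}@${host}:${remote_path}" .
    fi
}

# Example: copy_report abattenh
```

Once the file is in your current local directory, open multiqc_report.html in a browser.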
Below are descriptions of two projects I've assisted with lately using MultiQC to help pull together visualizations assessing experiment quality.
I recommend using Chrome to view MultiQC reports. The HTML reports generated by MultiQC rely heavily on JavaScript and other dynamic web content scripting tools, and not all browsers support them equally well.
ATAC-seq is a transposon-insertion sequencing method in which an engineered, activated transposon inserts into accessible ("open") chromatin. It is considered a much simpler protocol than standard DNase-seq, and it requires less starting material as well.
Igor Ponomarev's lab (in WCAAR) performed the ATAC-seq protocol on 5k and 50k cell nuclei from mouse brain, producing 2 paired-end datasets.
The Marcotte lab is working on a deep mutational screening project of a human gene transformed into yeast as an amplicon on a plasmid. Here, the gene is MVK, a gene in the yeast cholesterol biosynthesis pathway. The hsMVK gene is amplified with an error-prone polymerase to produce point mutations. Both the native yeast gene and the human ortholog (with which it shares no sequence similarity) are under on/off promoter control. The idea is to compare the mutations that accumulate in the active hsMVK gene, after many growth cycles, with a background in which the hsMVK gene is present but not active (the yeast MVK is doing the work) to see which mutations are favored or disfavored. As part of this project, Riddhiman Garge produced 19 datasets.