...

  • MultiQC produces neat, interactive plots in an HTML file.
    • So it can be used as a basic plotting tool for many kinds of reports and data, not just those produced by NGS tools!

Code Workshop

ATAC-seq is a transposon-insertion sequencing method in which an engineered, active transposon inserts into accessible ("open") chromatin. It is considered a much simpler protocol than standard DNase-seq, and it requires less starting material as well.

For data, we will use some ATAC-seq datasets produced in Igor Ponomarev's lab in WCAAR. As a proof-of-concept for future work, they performed the ATAC-seq protocol on 5k and 50k cell nuclei from mouse brain, producing 2 paired-end datasets.

Setup to follow along

Log in to ls5 at TACC, then execute these commands to set up access to the multiqc binary:

Code Block
languagebash
module load python
export PATH="/work/projects/BioITeam/ls5/bin/multiqc-1.0:$PATH"
export PYTHONPATH="/work/projects/BioITeam/ls5/lib/python2.7/annab-packages:$PYTHONPATH"
 
# make sure it is working...
multiqc --help

Produce a consolidated FastQC report

The FastQC tool is great for producing detailed reports for every individual fastq file. For example, for Igor's 2 PE datasets, 4 reports are produced from running fastqc (http://web.corral.tacc.utexas.edu/iyer/igor/fastqc/).

The shortcoming is that you have to browse through all the individual reports one at a time, which can be tedious for large experiments.

This is where MultiQC's power comes in. You can point MultiQC to a directory where FastQC has been run and it will magically produce a consolidated report.

For example, logged in to ls5 at TACC, first stage a directory where FastQC has been run:

Code Block
languagebash
mkdir -p $SCRATCH/byteclub/multiqc
cd $SCRATCH/byteclub/multiqc
ln -s -f /work/01063/abattenh/projects/byteclub/multiqc/fastqc

Now this is all it takes to produce a basic MultiQC report:

Code Block
languagebash
cd $SCRATCH/byteclub/multiqc
multiqc .

When this completes you'll see a new file and directory:

  • multiqc_report.html – the MultiQC HTML report with its default name
  • multiqc_data – directory with text files containing  MultiQC data used in the report as well as a log file
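A quick sanity check after a run is to confirm that both outputs exist. The sketch below is a hypothetical helper (check_multiqc_outputs is not part of MultiQC); it assumes the default output names listed above unless other names are passed in.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MultiQC): confirm that a run left behind
# both the HTML report and the data directory.
# Usage: check_multiqc_outputs DIR [REPORT_NAME] [DATA_DIR_NAME]
check_multiqc_outputs() {
    local dir="$1"
    local report="${2:-multiqc_report.html}"   # default report name
    local data_dir="${3:-multiqc_data}"        # default data directory name
    local status=0

    if [ ! -f "$dir/$report" ]; then
        echo "missing report: $dir/$report"
        status=1
    fi
    if [ ! -d "$dir/$data_dir" ]; then
        echo "missing data directory: $dir/$data_dir"
        status=1
    fi
    if [ "$status" -eq 0 ]; then
        echo "ok: $report and $data_dir found in $dir"
    fi
    return "$status"
}

# Example call (commented out so the block has no side effects):
# check_multiqc_outputs "$SCRATCH/byteclub/multiqc"
```

If the output names are changed in a config file, pass the new names as the second and third arguments.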

...

  • basic FastQC report

      ...

      ...

      ...

      titleTip

      To view the file you created in a web browser, it must be copied somewhere a browser can open it. An easy way to do this is to copy it to your laptop, for example like this, changing the user name from abattenh and the scratch path as appropriate.

      ...

      languagebash

      ...

      ...

      ...

      ...

      ...

        • this tool looks only at the overlapping portions of paired-end R1 and R2 reads

      Add a few customizations

      MultiQC reports can be customized by creating a file called multiqc_config.yaml in the directory where you call multiqc.

      Use your favorite text editor to create a file called multiqc_config.yaml in your $SCRATCH/byteclub/multiqc directory as shown below. This will add report title lines and change the names of the MultiQC output files.

      Code Block
      titlemultiqc_config.yaml
      # Titles to use for the report.
      title: "ATAC-Seq QC Reports"
      subtitle: null
      intro_text: "MultiQC reports for Igor's ATAC-Seq proof-of-concept project."
      report_header_info:
          - Sequenced by: 'GSAF'
          - Job: 'JA17277'
          - Run: 'SA17121'
          - Setup: '2x150'
      
      # Change the output filenames
      output_fn_name: mqc_report.html
      data_dir_name: mqc_report_data
      Expand
      titleCatch up

      To catch up, just stage Anna's pre-made files:

      Code Block
      languagebash
      mkdir -p $SCRATCH/byteclub/multiqc/
      cd $SCRATCH/byteclub/multiqc/
      rsync -avrP --delete /work/01063/abattenh/projects/byteclub/multiqc/01_fastq/ .

      After saving this file, remove the previous MultiQC outputs and re-run the program:

      Code Block
      languagebash
      cd $SCRATCH/byteclub/multiqc
      rm -rf multiqc_data multiqc_report.html
      multiqc .

      If all went well, you should now see a mqc_report.html file and a mqc_report_data directory. Your newly generated mqc_report.html file should look like this (note the new title and header): http://web.corral.tacc.utexas.edu/iyer/byteclub/multiqc/02_custom.mqc_report.html.

      For comparison, here's what the basic FastQC report from the previous step looks like: http://web.corral.tacc.utexas.edu/iyer/byteclub/multiqc/01_basic.multiqc_report.html.

      Tips for working with the MultiQC configuration file

      Here are a few tips for working with the MultiQC configuration file.

      • Always use spaces (not tabs!) in the multiqc_config.yaml file.
      • Make sure the file is saved with Unix line endings (not Windows or Mac).
      • Pay attention to the output when running multiqc. It will tell you if there are issues parsing the config file.
      • Always delete any previous MultiQC output files before running multiqc
        • While their documentation says existing files will just be updated, I have seen MultiQC get confused when previous reports exist.
      • It is a good idea to change the name of the MultiQC output files
        • If output files with those names are not created, something went wrong!
      • Consult example config files
      • Avoid running multiqc on large complex directory trees.
        • Instead, create a separate directory (or directory tree) only for MultiQC 
          • Copy or link the files you want MultiQC to look for there, and use it as MultiQC's target directory.
        • MultiQC will run much faster and have fewer confusions.
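The first two tips above can be checked mechanically before running multiqc. The following is a minimal sketch, not a MultiQC feature: check_yaml_whitespace is a hypothetical helper that only looks for tab characters and carriage returns, and it does not validate the YAML itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: flag tab characters and carriage returns
# (Windows or old Mac line endings) in a config file.
# Usage: check_yaml_whitespace FILE
check_yaml_whitespace() {
    local file="$1"
    local status=0
    local tab cr
    tab=$(printf '\t')
    cr=$(printf '\r')

    if grep -q "$tab" "$file"; then
        echo "tabs found in $file"
        status=1
    fi
    if grep -q "$cr" "$file"; then
        echo "carriage returns (non-Unix line endings) found in $file"
        status=1
    fi
    if [ "$status" -eq 0 ]; then
        echo "ok: $file"
    fi
    return "$status"
}

# Example call (commented out so the block has no side effects):
# check_yaml_whitespace "$SCRATCH/byteclub/multiqc/multiqc_config.yaml"
```

A non-zero return status means at least one problem was reported, so the call can be chained in front of the multiqc command with &&.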

      Expand
      titleTip

      To view the file you created in a web browser, it must be copied somewhere a browser can open it. An easy way to do this is to copy it to your laptop, for example like this, changing the user name from abattenh and the scratch path as appropriate.

      Code Block
      languagebash
      # from your laptop:
      scp -p abattenh@ls5.tacc.utexas.edu:/scratch/01063/abattenh/byteclub/multiqc/multiqc_report.html .

      Add reports from a bowtie2 alignment

      First stage some mm10 bowtie2 alignment data:

      Code Block
      languagebash
      cd $SCRATCH/byteclub/multiqc
      rsync -avrP /work/01063/abattenh/projects/byteclub/multiqc/bowtie2/ bowtie2/

      Take a look at the contents of the bowtie2 directory. It contains typical output files from running Anna's align_bowtie2_illumina.sh alignment script.

      MultiQC will look at all files in this directory looking for report formats it understands. Here, reports that MultiQC will recognize as-is include:

      • <prefix>.flagstat.txt - output from running samtools flagstat 
      • <prefix>.idxstats.txt - output from running samtools idxstats 
      • <prefix>.dupinfo.txt - output from running Picard MarkDuplicates 

       

      Expand
      titleCatch up

      To catch up, just use Anna's pre-made files:

      Code Block
      languagebash
      mkdir -p $SCRATCH/byteclub/multiqc/
      cd $SCRATCH/byteclub/multiqc/
      rsync -avrP --delete /work/01063/abattenh/projects/byteclub/multiqc/03_bowtie/ .

      Now run multiqc again:

      Code Block
      languagebash
      cd $SCRATCH/byteclub/multiqc
      rm -rf mqc_report*
      multiqc .

      If all went well, you should now see a mqc_report.html file that looks like this: http://web.corral.tacc.utexas.edu/iyer/byteclub/multiqc/03_bowtie.mqc_report.html, with new sections for Picard and Samtools reports.

      Fix the Picard MarkDuplicates sample name

      Notice there is something odd going on in the  new General Statistics section. We see M Reads Mapped entries for samples called brain_50k_nuclei and brain_5k_nuclei, but % Dups entries for samples named brain_50k_nuclei.sort and brain_5k_nuclei.sort.

      To see where a General Statistics column comes from, hover over the column header. Doing this tells us that the M Reads Mapped figures come from the samtools flagstat report, while the % Dups figures come from Picard MarkDuplicates.

      Take a look at one of the <prefix>.dupinfo.txt files to see what might be going on. Below I've added line breaks to the command line info for clarity.

      Code Block
      titlebrain_5k_nuclei.dupinfo.txt
      ## htsjdk.samtools.metrics.StringHeader
      # picard.sam.markduplicates.MarkDuplicates INPUT=[brain_5k_nuclei.sort.bam] 
      OUTPUT=brain_5k_nuclei.sort.dup.bam 
      METRICS_FILE=brain_5k_nuclei.dupinfo.txt ASSUME_SORTED=true VALIDATION_STRINGENCY=LENIENT MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 
      SORTING_COLLECTION_SIZE_RATIO=0.25 TAG_DUPLICATE_SET_MEMBERS=false 
      REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag REMOVE_DUPLICATES=false 
      DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES 
      PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates 
      READ_NAME_REGEX=<optimized capture of last three ':' separated fields as numeric values> 
      OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 
      MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
      GA4GH_CLIENT_SECRETS=client_secrets.json
      ## htsjdk.samtools.metrics.StringHeader
      # Started on: Wed Jul 05 23:20:57 CDT 2017
      
      ## METRICS CLASS        picard.sam.DuplicationMetrics
      LIBRARY UNPAIRED_READS_EXAMINED READ_PAIRS_EXAMINED     SECONDARY_OR_SUPPLEMENTARY_RDS  UNMAPPED_READS       UNPAIRED_READ_DUPLICATES        READ_PAIR_DUPLICATES    READ_PAIR_OPTICAL_DUPLICATES    PERCENT_DUPLICATION  ESTIMATED_LIBRARY_SIZE
      brain_5k_nuclei 0       28666117        0       16562322        0       12504025        902024  0.436195        23118972
      
      ## HISTOGRAM    java.lang.Double
      BIN     VALUE
      1.0     1.016471
      ...
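One plausible reading of the header above is that the Picard sample name is taken from the INPUT= file rather than from the name of the metrics file. That is an assumption here, although it would match the .sort suffix seen in the report. The sketch below uses a hypothetical helper, picard_input_name, to pull that name out of a metrics file so it can be compared with the names the other tools report.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the INPUT file named in a Picard metrics
# header, with the surrounding brackets and the .bam extension removed.
# Usage: picard_input_name METRICS_FILE
picard_input_name() {
    # Take the INPUT=... token from the first line that has one,
    # then strip "INPUT=", "[", "]" and a trailing ".bam".
    grep -m 1 -o 'INPUT=[^ ]*' "$1" \
        | sed -e 's/^INPUT=//' -e 's/^\[//' -e 's/\]$//' -e 's/\.bam$//'
}

# Example call (commented out so the block has no side effects):
# picard_input_name bowtie2/brain_5k_nuclei.dupinfo.txt
```

For the header shown above, the INPUT token is [brain_5k_nuclei.sort.bam], so the helper prints brain_5k_nuclei.sort, the same name that appears in the % Dups column.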

      ...

      Code Block
      titlemultiqc_config.yaml
      # Titles to use for the report.
      title: "ATAC-Seq QC Reports"
      subtitle: null
      intro_text: "MultiQC reports for Igor's ATAC-Seq proof-of-concept project."
      report_header_info:
          - Sequenced by: 'GSAF'
          - Job: 'JA17277'
          - Run: 'SA17121'
          - Setup: '2x150'
      
      # Change the output filenames
      output_fn_name: mqc_report.html
      data_dir_name: mqc_report_data
      
      # Ignore these files / directories / paths when searching for reports
      fn_ignore_files:
          - '*.dupinfo.txt'
      
      # Modules that should come at the top of the report
      top_modules:
          - 'generalstats'
          - 'fastqc'
          - 'samtools'
          - 'picard'
      
      # --------------------------------
      # Custom data
      # --------------------------------
      custom_content:
        order:
          - bowtie2_isize_section
          - bowtie2_mapq_section
          - genome_coverage_section
      custom_data:
          bowtie2_isize:
              id: 'bowtie2_isize_section'
              section_name: 'Bowtie2 insert size'
              description: 'distribution for alignments (bowtie2 --local -X2000 --no-mixed --no-discordant)'
              file_format: 'tsv'
              plot_type: 'linegraph'
              pconfig:
                  id: 'bowtie2_isize_plot'
                  title: 'Insert sizes for proper pairs'
                  xlab: 'Insert size'
                  ylab: 'Count'
          bowtie2_mapq:
              id: 'bowtie2_mapq_section'
              section_name: 'Mapping quality'
              description: 'distribution for aligned reads before filtering'
              file_format: 'tsv'
              plot_type: 'bargraph'
              pconfig:
                  id: 'bowtie2_mapq_plot'
                  title: 'Mapping quality scores'
                  ymax: 60000000
          genome_coverage:
              id: 'genome_coverage_section'
              section_name: 'Genome coverage'
              description: 'of mapped inserts (bedtools genomecov -fs), grouped into coverage count categories'
              file_format: 'tsv'
              plot_type: 'bargraph'
              pconfig:
                  id: 'genome_coverage_plot'
                  title: 'Position coverage by coverage count category'
                  logswitch: True
                  stacking: null
      sp:
          bowtie2_isize_section:
              fn: '*.bowtie2_isizes.tsv'
          bowtie2_mapq_section:
              fn: '*.mapq_histogram.tsv'
          genome_coverage_section:
              fn: 'combined_genomecov.tsv'
      
      # file suffixes to remove when generating sample names...
      extra_fn_clean_exts:
          - type: 'replace'
            pattern: '.mapq_histogram.tsv'
          - type: 'replace'
            pattern: '.genomecov.tsv'
      Expand
      titleCatch up

      To catch up, just use Anna's pre-made files:

      Code Block
      languagebash
      mkdir -p $SCRATCH/byteclub/multiqc
      cd $SCRATCH/byteclub/multiqc
      rsync -avrP /work/01063/abattenh/projects/byteclub/multiqc/07_custom_bargraph/ .

      Then the usual...

      Code Block
      languagebash
      cd $SCRATCH/byteclub/multiqc; rm -rf mqc_report*; multiqc .

      Resulting in a report that includes our new Mapping quality and Genome coverage sections, that should look like this: http://web.corral.tacc.utexas.edu/iyer/byteclub/multiqc/07_custom_bargraph.mqc_report.html.

      Making MultiQC run faster and be less confused

      By default, MultiQC scans all files in the analysis directory you specify. This can take quite a while for complex directory hierarchies with many files that will not be used by MultiQC.

      Additionally, MultiQC can get confused when the same (or similar) data is found in different files, or in different directories.

      To address these issues, it is a good practice to copy everything you want MultiQC to process into a single directory, then either specify just that directory on the multiqc command line (e.g. multiqc for_multiqc), or exclude other directories in the multiqc_config.yaml file.

      For example, here we can stage all the reports we want MultiQC to process in our for_multiqc directory:

      Code Block
      languagebash
      mkdir -p $SCRATCH/byteclub/multiqc/for_multiqc
      cd $SCRATCH/byteclub/multiqc/for_multiqc
      ln -s -f ../fastqc
      cp -p ../bowtie2/*.flagstat.txt  .
      cp -p ../bowtie2/*.idxstats.txt  .

      Your for_multiqc directory should now contain everything we want MultiQC to use:

      Code Block
      brain_50k_nuclei.bowtie2_isizes.tsv
      brain_50k_nuclei.dupmetrics.txt
      brain_50k_nuclei.flagstat.txt
      brain_50k_nuclei.idxstats.txt
      brain_50k_nuclei.mapq_histogram.tsv
      brain_5k_nuclei.bowtie2_isizes.tsv
      brain_5k_nuclei.dupmetrics.txt
      brain_5k_nuclei.flagstat.txt
      brain_5k_nuclei.idxstats.txt
      brain_5k_nuclei.mapq_histogram.tsv
      combined_genomecov.tsv
      fastqc
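Staging of this kind can also be wrapped in a small helper. The sketch below is one possible approach, not the workshop's own script: stage_for_multiqc is a hypothetical function that symlinks files matching the given patterns into a staging directory, on the assumption that links are acceptable in place of copies.

```shell
#!/usr/bin/env bash
# Hypothetical helper: symlink files matching the given glob patterns
# from a source directory into a staging directory for MultiQC.
# Usage: stage_for_multiqc SRC_DIR DEST_DIR PATTERN [PATTERN...]
stage_for_multiqc() {
    local src="$1" dest="$2"
    shift 2
    mkdir -p "$dest"

    local pattern file abs_dir
    for pattern in "$@"; do
        # $pattern is left unquoted on purpose so the glob expands.
        for file in "$src"/$pattern; do
            [ -e "$file" ] || continue      # pattern matched nothing
            # Link by absolute path so the link resolves from anywhere.
            abs_dir=$(cd "$(dirname "$file")" && pwd)
            ln -s -f "$abs_dir/$(basename "$file")" "$dest/"
        done
    done
}

# Example call (commented out so the block has no side effects):
# stage_for_multiqc bowtie2 for_multiqc '*.flagstat.txt' '*.idxstats.txt'
```

Quote the patterns when calling the function so the shell passes them through unexpanded; they are matched relative to the source directory inside the helper.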

      Expand
      titleCatch up

      To catch up, just use Anna's pre-made files:

      Code Block
      languagebash
      mkdir -p $SCRATCH/byteclub/multiqc
      cd $SCRATCH/byteclub/multiqc
      rsync -avrP --delete /work/01063/abattenh/projects/byteclub/multiqc/08_final/ .

      Run MultiQC again, but this time just point it at the for_multiqc directory:

      Code Block
      languagebash
      cd $SCRATCH/byteclub/multiqc
      rm -rf mqc_report*
      multiqc for_multiqc

      Alternatively, you could exclude the bowtie2 directory entirely via a fn_ignore_dirs section list item in multiqc_config.yaml, like this: 

      Code Block
      fn_ignore_dirs:
          - 'bowtie2'

      In either case, the final report should look just as it did for the previous section: http://web.corral.tacc.utexas.edu/iyer/byteclub/multiqc/08_final.mqc_report.html.

      References

      MultiQC configuration files

      MultiQC custom data support

      Example Reports from Anna

      Below are descriptions of two projects I've assisted with lately using MultiQC to help pull together visualizations assessing experiment quality.

      Tip

      I recommend using Chrome to view MultiQC reports.

      The HTML reports generated by MultiQC rely heavily on JavaScript and other dynamic web content scripting tools, and not all browsers support them equally well.

      • These example MultiQC reports below were generated by running the multiqc binary on a command line.
      • After inspecting them locally (by just opening them as files in a web browser), they were copied to a web-accessible location to share with others. Here, that location is Iyer Lab's web-accessible directory on corral 

      Igor Ponomarev ATAC-seq data

      ATAC-seq is a transposon-insertion sequencing method in which an engineered, active transposon inserts into accessible ("open") chromatin. It is considered a much simpler protocol than standard DNase-seq, and it requires less starting material as well.

      Igor Ponomarev's lab (in WCAAR) performed the ATAC-seq protocol on 5k and 50k cell nuclei from mouse brain, producing 2 paired-end datasets.

      Marcotte lab amplicon sequencing

      The Marcotte lab is working on a deep mutational screening project of a human gene transformed into yeast as an amplicon on a plasmid. Here, the gene is MVK, a gene in the yeast cholesterol biosynthesis pathway. The hsMVK gene is amplified with an error-prone polymerase to produce point mutations. Both the native yeast gene and the human ortholog (with which it shares no sequence similarity) are under on/off promoter control. The idea is to compare the mutations that accumulate in the active hsMVK gene, after many growth cycles, with a background in which the hsMVK gene is present but not active (the yeast MVK is doing the work) to see which mutations are favored or disfavored. As part of this project, Riddhiman Garge produced 19 datasets.
