MAINTENANCE OUTAGE: The University Wiki Service will undergo maintenance on September 26th, 2017, from 6 pm to 8 pm. During this 2 hour time period https://wikis.utexas.edu may be unavailable. Users are advised to save content locally that may be needed during this time and to otherwise save all edits as unsaved work may be lost. Please contact the UT Service Desk at 512-475-9400 for any questions.
The University Wiki Service has upgraded the Confluence Server software, from version 5.9.14 to 5.10.8. Please refer to the knowledge base article, KB0015891, for a high level summary of upgrade changes. Thank you!
Skip to end of metadata
Go to start of metadata

If you see in your bam file, that reads are piling up with same start end end coordinates, these may be pcr duplicates, which should be removed/flagged in your bam files.

Picard MarkDuplicates is the preferred tool for this, but it is very fickle with the type of bam file it will work on.

Samtools can be an easier option to start with for removing potential pcr duplicates in your data.

1. (OPTIONAL) samtools fixmate

Because samtools rmdup works better when the insert size is set correctly, samtools fixmate can be run to fill in mate coordinates, ISIZE and mate related flags from a name-sorted alignment.

samtools fixmate <in.nameSrt.bam> <out.bam>

2. samtools rmdup -sS <input.srt.bam> <out.bam>
Remove potential PCR duplicates: if multiple read pairs have identical external coordinates, only retain the pair with highest mapping quality. In the paired-end mode, this command ONLY works with FR orientation and requires ISIZE is correctly set. It does not work for unpaired reads (e.g. two ends mapped to different chromosomes or orphan reads).

OPTIONS:
-s Remove duplicate for single-end reads. By default, the command works for paired-end reads only.
-S Treat paired-end reads as single-end reads.

default:
samtools rmdup <input.bam> <output.bam>
or
samtools rmdup -s <input.bam> <output.bam>

Load the output.bam file into IGV to check on areas which showed evidence of pcr duplicates before.

  • No labels