Page History

...

For me the resulting 16 attributes are as follows (they may have a different order for you). I've numbered them below for convenience.

Code Block

title	Converted BED attributes

 1. chrom          2. start   3. end     4. featureType  5. length  6. strand
 7. source         8. frame   9. Alias  10. ID          11. Name   12. Note
13. Ontology_term 14. dbxref 15. gene   16. orf_classification

The final transformation is to do a bit of re-ordering, dropping some fields. We'll do this with awk, because cut can't re-order fields. While this is not strictly required, it can be helpful to have the critical fields (including the gene ID) in the 1st 6 columns. We do this separately for the header line and the rest of the file so that the BED file we give bedtools does not have a header (but we know what those fields are). We would normally preserve valuable annotation information such as Ontology_term, dbxref and Note, but drop them here for simplicity.

Expand

title	Setup (if needed)

Code Block

language	bash

mkdir -p $SCRATCH/core_ngs/bedtools
cd $SCRATCH/core_ngs/bedtools
cp $CORENGS/yeast_rnaseq/*.gff .
cp $CORENGS/yeast_rnaseq/sc_genes.converted.bed .

Code Block

language	bash
title	Re-order the final BED fields

head -1 sc_genes.converted.bed | sed 's/\r//' | awk '
 BEGIN{FS=OFS="\t"}{print $1,$2,$3,$10,$5,$6,$15,$16}
 ' > sc_genes.bed.hdr

tail -n +2 sc_genes.converted.bed | sed 's/\r//' | awk '
 BEGIN{FS=OFS="\t"}
 { if($15 == "") {$15 = $10} # make sure gene name is populated
   print $1,$2,$3,$10,$5,$6,$15,$16}
 ' > sc_genes.bed

...

Exercise: How many features have non-zero overlap counts? (The count from bedtools multicov is in field 9)

Expand

title	Answer

Code Block

language	bash

cut -f 9 yeast_mrna_gene_counts.bed | grep -v '^0' | wc -l
# or
cat yeast_mrna_gene_counts.bed | \
  awk '{if ($9 > 0) print $9}' | wc -l

Most of the genes (6235/6607) have non-zero read overlap counts.

...

Page tree

Versions Compared

Old Version 64

New Version 65

Key