...
For me the resulting 16 attributes are as follows (they may have a different order for you). I've numbered them below for convenience.
Code Block |
---|
title | Converted BED attributes |
---|
|
1. chrom 2. start 3. end 4. featureType 5. length 6. strand
7. source 8. frame 9. Alias 10. ID 11. Name 12. Note
13. Ontology_term 14. dbxref 15. gene 16. orf_classification |
The final transformation is to do a bit of re-ordering, dropping some fields. We'll do this with awk, because cut can't re-order fields. While this is not strictly required, it can be helpful to have the critical fields (including the gene ID) in the 1st 6 columns. We do this separately for the header line and the rest of the file so that the BED file we give bedtools does not have a header (but we know what those fields are). We would normally preserve valuable annotation information such as Ontology_term, dbxref and Note, but drop them here for simplicity.
Expand |
---|
|
Code Block |
---|
| mkdir -p $SCRATCH/core_ngs/bedtools
cd $SCRATCH/core_ngs/bedtools
cp $CORENGS/yeast_rnaseq/*.gff .
cp $CORENGS/yeast_rnaseq/sc_genes.converted.bed . |
|
Code Block |
---|
language | bash |
---|
title | Re-order the final BED fields |
---|
|
head -1 sc_genes.converted.bed | sed 's/\r//' | awk '
BEGIN{FS=OFS="\t"}{print $1,$2,$3,$10,$5,$6,$15,$16}
' > sc_genes.bed.hdr
tail -n +2 sc_genes.converted.bed | sed 's/\r//' | awk '
BEGIN{FS=OFS="\t"}
{ if($15 == "") {$15 = $10} # make sure gene name is populated
print $1,$2,$3,$10,$5,$6,$15,$16}
' > sc_genes.bed |
...
Exercise: How many features have non-zero overlap counts? (The count from bedtools multicov is in field 9)
Expand |
---|
|
Code Block |
---|
| cut -f 9 yeast_mrna_gene_counts.bed | grep -v '^0' | wc -l
# or
cat yeast_mrna_gene_counts.bed | \
awk '{if ($9 > 0) print $9}' | wc -l |
Most of the genes (6235/6607) have non-zero read overlap counts. |
...