BioMart Perl API

BioMart is a service for looking up almost any piece of information from a wide array of databases.

The Perl API lets you automate these lookups so you can script them or run large numbers of queries.

For more details, see the BioMart documentation.

The Fourierseq scripts "ensg2ncbiprot" and "ensg2entrezgene" use this API; see them for further examples.

Note that on Fourierseq, your scripts must do two things:
a) Name the package location explicitly at the start of your code:

use lib '/home/scott/Downloads/biomart-perl/lib';

b) Reference the central BioMart repository as the source of info (we don't have a local mart) by locating the example registry file, just after (a):

my $confFile = (grep { m/biomart-perl\/lib+$/ } @INC)[0]."/../conf/apiExampleRegistry.xml";
die ("Can't find configuration file $confFile\n") unless (-f $confFile);
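
The registry lookup in (b) can also be wrapped in a small subroutine so that a missing library path is caught before the path is built. The following is only a sketch: find_registry_file is a hypothetical helper, not part of BioMart, and it assumes the same biomart-perl/lib layout described above.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not part of BioMart): given a registry file name and a
# list of directories (typically @INC), build the path to the registry file
# from the first directory ending in biomart-perl/lib. Returns undef if no
# such directory is in the list.
sub find_registry_file {
    my ($registry_name, @search_dirs) = @_;
    my ($lib_dir) = grep { m{biomart-perl/lib/?$} } @search_dirs;
    return unless defined $lib_dir;
    # conf/ sits next to lib/ in the biomart-perl tree
    return File::Spec->catfile($lib_dir, File::Spec->updir, 'conf', $registry_name);
}

# Demonstration with an explicit directory list; in a real script pass @INC.
my @dirs = ('/usr/lib/perl5', '/home/scott/Downloads/biomart-perl/lib');
my $conf = find_registry_file('apiExampleRegistry.xml', @dirs);
print defined $conf ? "$conf\n" : "biomart-perl/lib not found\n";
```

In a real script you would call find_registry_file('apiExampleRegistry.xml', @INC) after the "use lib" line and die if the result is undefined or is not an existing file.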

An example script using the BioMart API to fetch gene info given Ensembl gene IDs:

#!/usr/bin/perl
# An example script demonstrating the use of the BioMart Perl API.
# This Perl API representation is only available for configuration versions >= 0.5
use strict;
use warnings;
use lib '/home/scott/Downloads/biomart-perl/lib';

my $confFile = (grep { m/biomart-perl\/lib+$/ } @INC)[0]."/../conf/apiExampleRegistry.xml";
die ("Can't find configuration file $confFile\n") unless (-f $confFile);

use BioMart::Initializer;
use BioMart::Query;
use BioMart::QueryRunner;

# $confFile must point to a registry file under biomart-perl/conf/.
# For the BioMart Central Registry, navigate to
# http://www.biomart.org/biomart/martservice?type=registry
#
# NB: change action to 'clean' if you wish to start a fresh configuration
# and to 'cached' if you want to skip the configuration step on subsequent
# runs from the same registry
#

my $action='cached';
my $initializer = BioMart::Initializer->new('registryFile'=>$confFile, 'action'=>$action);
my $registry = $initializer->getRegistry;

my $query = BioMart::Query->new('registry'=>$registry,'virtualSchemaName'=>'default');


$query->setDataset("hsapiens_gene_ensembl");
$query->addFilter("ensembl_gene_id", [
    "ENSG00000224813", "ENSG00000248149", "ENSG00000239664",
    "ENSG00000237491", "ENSG00000241768", "ENSG00000241180",
]);
$query->addAttribute("ensembl_gene_id");
$query->addAttribute("protein_id");

$query->formatter("TSV");

my $query_runner = BioMart::QueryRunner->new();
############################## GET COUNT ############################
# $query->count(1);
# $query_runner->execute($query);
# print $query_runner->getCount();
#####################################################################


############################## GET RESULTS ##########################
# to obtain unique rows only
# $query_runner->uniqueRowsOnly(1);

$query_runner->execute($query);
$query_runner->printHeader();
$query_runner->printResults();
$query_runner->printFooter();
#####################################################################
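
The script above prints one tab-separated row per gene ID / protein ID pair. If you save that output and want one line per gene, something like the following could do the grouping. This is only a sketch: group_tsv_by_gene is a hypothetical helper, not part of BioMart, and it assumes two columns in the order the attributes were added and no header row. The protein IDs in the sample rows are placeholders, not real results.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioMart): group tab-separated
# "gene_id<TAB>protein_id" lines into a hash of gene ID => [protein IDs].
sub group_tsv_by_gene {
    my (@lines) = @_;
    my %proteins_for;
    for my $line (@lines) {
        chomp $line;
        next unless length $line;                          # skip blank lines
        my ($gene, $protein) = split /\t/, $line, 2;
        next unless defined $protein && length $protein;   # no protein ID in this row
        push @{ $proteins_for{$gene} }, $protein;
    }
    return \%proteins_for;
}

# Placeholder rows: the gene IDs come from the query above,
# the protein IDs are made up for illustration.
my @sample = (
    "ENSG00000224813\tPLACEHOLDER_1\n",
    "ENSG00000224813\tPLACEHOLDER_2\n",
    "ENSG00000248149\t\n",
);

my $grouped = group_tsv_by_gene(@sample);
for my $gene (sort keys %$grouped) {
    print join("\t", $gene, join(',', @{ $grouped->{$gene} })), "\n";
}
```

To use it on real output, redirect the script's output to a file and pass the file's lines to group_tsv_by_gene instead of @sample.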

Version 0.7 is installed on Fourierseq as of 6/16/11 by SPHS.
