PharmGKB:  The Pharmacogenetics and Pharmacogenomics Knowledge Base

PharmGKB Abstract, 2003

Pharmacogenetics and Pharmacogenomics Knowledge Base

Goals

The goal of this Pharmacogenetics Research Network (PGRN) program is to develop, implement, and disseminate a public genotype-phenotype resource focused on pharmacogenetics and pharmacogenomics. In the short term, this resource is designed to facilitate basic research. In the long term, it will impact how medicine is delivered. This resource will serve a broad community including geneticists, molecular biologists, pharmacologists, physicians, policy makers and the lay public. In addition, the PharmGKB team performs research to support this goal and to catalyze scientific discovery in both pharmacogenetics/pharmacogenomics and biomedical informatics.

Progress

The PharmGKB is a pharmacogenetic and pharmacogenomic knowledge base built to support the representation, storage, analysis and dissemination of genotype and phenotype data relevant to variation in the response to drugs. PharmGKB currently contains a rich set of genetic sequence data in which genes of pharmacogenetic interest have been studied for polymorphism discovery and characterization in populations. It also has a small but growing collection of phenotype data sets that are, in general, related to particular genotypes. The PharmGKB includes a curated set of literature references to key gene-drug interactions of relevance to pharmacogenetics. The key concepts around which the PharmGKB is built, and by which it supports search and browsing, are genes, drugs, diseases and categories of phenotypic information. PharmGKB uses standard nomenclatures for these classes, and has defined a standard format (using XML Schema) for exchanging detailed genetic polymorphism information. Structuring phenotype information and defining standards for it is currently a primary challenge for the network.

The previous release of PharmGKB (2.X) has been modified in response to multiple focus groups and user interaction sessions in the spring and summer of 2003. The new release (3.X) does not remove any of the existing functionality. Additions to the functionality include: a more intuitive user interface; genotype display pages with variants, population frequencies and a graphical gene viewer; updated links from gene pages to external resources including LocusLink; mandatory login for all users to see any genotype or phenotype data that is linked to individual identifiers, for HIPAA compliance; full documentation of Genotype XML and phenotype submission methods; online "pop-up" help; automated dbSNP submissions; semi-automated generation of Pharmacological Reviews reports; validation of data upon submission; submission of genotype (limited), phenotype and literature data via Excel spreadsheet; and phenotype data stored in a relational database and linked to genotype data, so that all genotype-phenotype associations are made and are downloadable in tabular format for local analysis. The first pathway (irinotecan) user interface is being prototyped with live links to genes, drugs, and PharmGKB data for user feedback.

To support our curated literature references, we are developing natural language processing methods to identify important pharmacogenetics data sets and extract pharmacogenetics information from the literature. Specifically, we are developing a method to scan MEDLINE for articles that contain pharmacogenetics data or that report studies of gene-drug associations. We are developing classifiers to automatically assign categories of pharmacogenetics evidence to these articles. The goal is to automate the process of acquiring literature submissions that support gene-drug relationships for PharmGKB. An additional goal is to find important studies containing large pharmacogenetics data sets so that we may contact the authors and request that these data be submitted to PharmGKB.
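
As a rough illustration of what assigning evidence categories to an article might look like, the sketch below tags an abstract by matching cue terms. It is not the PharmGKB classifier: the category names, the cue terms, and the function categorize are all hypothetical, and a real system would be trained on curated examples rather than rely on a fixed keyword list.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical evidence categories and cue terms, chosen only for illustration.
EVIDENCE_CUES: Dict[str, Tuple[str, ...]] = {
    "genotype": ("polymorphism", "allele", "genotype", "variant"),
    "pharmacokinetics": ("clearance", "metabolism", "half-life"),
    "clinical_outcome": ("toxicity", "adverse", "efficacy"),
}


def categorize(abstract: str) -> List[str]:
    """Return the sorted category names whose cue terms occur in the abstract."""
    text = abstract.lower()
    matched = []
    for category, cues in EVIDENCE_CUES.items():
        # A cue matches at a word start, so "polymorphisms" also counts.
        if any(re.search(r"\b" + re.escape(cue), text) for cue in cues):
            matched.append(category)
    return sorted(matched)


if __name__ == "__main__":
    # Made-up sentence, not taken from any article.
    sample = "A polymorphism in the gene was associated with irinotecan toxicity."
    print(categorize(sample))
```

An article that matches no category could simply be skipped, while articles matching several categories would be candidates for curator review.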

We are also developing innovative methods to improve the quality of data in PharmGKB by applying rule-based inferencing methods to data validation. There are complex relationships among the data fields; thus, simple field-based validation rules can only catch a small percentage of all possible data errors. We are using Algernon to represent and process data validation rules that are executed on data submitted to PharmGKB. Executing these rules will identify suspect data, allowing us to find and correct data errors and improve data quality in PharmGKB.
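
To make the contrast between field-based and cross-field validation concrete, here is a minimal sketch written as plain Python predicates. It does not use Algernon or its rule syntax, and the field names (gene_start, variant_pos, allele_freqs, and so on) are assumptions made for illustration, not the PharmGKB schema.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Rule = Tuple[str, Callable[[Record], bool]]

# Hypothetical cross-field rules: each relates two or more fields,
# which a per-field type or range check could not catch.
RULES: List[Rule] = [
    ("variant position must fall inside the gene region",
     lambda r: r["gene_start"] <= r["variant_pos"] <= r["gene_end"]),
    ("allele frequencies must sum to 1",
     lambda r: abs(sum(r["allele_freqs"]) - 1.0) < 1e-6),
    ("a phenotype value requires a subject identifier",
     lambda r: r.get("phenotype") is None or bool(r.get("subject_id"))),
]


def validate(record: Record) -> List[str]:
    """Return the description of every rule the record violates."""
    return [message for message, check in RULES if not check(record)]


if __name__ == "__main__":
    suspect = {
        "gene_start": 100, "gene_end": 500, "variant_pos": 750,
        "allele_freqs": [0.7, 0.4],
        "phenotype": "reduced clearance", "subject_id": "",
    }
    for problem in validate(suspect):
        print("suspect:", problem)
```

Each field in the suspect record is individually plausible; the errors only appear when fields are compared with one another, which is the case the rule-based approach is meant to address.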

Conclusions

The PharmGKB project has reached a crucial stage in which we are confident that genotype and phenotype information can be collected, stored and displayed for users. This provides a basic level of functionality that will be useful to the pharmacogenetic community. Upon this framework, we are planning our research in two areas: direct extensions to the current user interface, and longer-term informatics approaches for extending and maintaining the database. The next six months will stress 1) the addition of pathway graphics (and their underlying computational representations) that are searchable, integrated with our data sets, and representative of current knowledge of each pathway, 2) the continued and accelerated collection of data from the network sites, using tools that can be run locally to enter and validate data before sending it to the PharmGKB, 3) the addition of analytic tools to support genotype-phenotype correlation studies, and 4) the integration of current text and inference research into useful production functionality. In this period, we will also begin to plan for data mining experiments on the knowledge base, with the goal of discovering associations missed in previous analyses.

The PGRN is financially supported by grants from NIGMS, NHLBI, NHGRI, NIEHS, NCI, and NLM within the NIH, HHS. PharmGKB is managed at Stanford University. This work is supported by the NIH/NIGMS Pharmacogenetics Research Network and Database (U01GM61374). ©2001-2008 PharmGKB.