NLP, Genomics/Pharmacogenomics, PheWAS, Genetic association studies

Methods to identify gene-disease associations primarily rely on clinical trials or observational cohorts and, more recently, electronic medical record (EMR)-linked DNA biobanks. At Vanderbilt, we have used an EMR-linked DNA biobank called BioVU to derive case and control populations, using data within the EMR to define clinical phenotypes. Genetic data from these EMR-linked association studies are redeposited into BioVU for future EMR-linked studies. This has opened the possibility of "reverse GWAS," or phenome-wide association studies (PheWAS).

This package contains methods for performing PheWAS. Please contact us if you encounter any errors or apparent bugs. The documentation is provided natively in R. Once the package is loaded, the command ?PheWAS opens the package description, which includes references to each function and an example. The command vignette("PheWAS-package") displays the package vignette with further how-tos.

An accurate computable representation of food and drug allergy is essential for safe healthcare. We developed and evaluated a SQL-based method to map free-text allergy/adverse reaction entries to structured entries, using RxNorm as the target vocabulary. The system was developed and tested on entries from a perioperative management system at Vanderbilt University, using a training set of 24,599 entries and a test set of 24,857 entries.
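
As a rough illustration of what a SQL-based mapping of free-text allergy entries could look like, the sketch below joins raw entries against a small vocabulary table using whole-word matching. It is not the published method: the table layout, the function name map_allergy_entries, and the concept IDs (HYP-1, HYP-2) are hypothetical placeholders, not real RxNorm identifiers, and a real system would need to handle misspellings, brand names, and reaction terms.

```python
import sqlite3


def map_allergy_entries(entries, vocabulary):
    """Map free-text allergy entries to vocabulary concepts with a SQL join.

    `vocabulary` is a list of (concept_id, name) pairs. The IDs are
    hypothetical placeholders, not real RxNorm identifiers.
    Entries with no match are returned with None for the concept columns.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE vocab (concept_id TEXT, name TEXT)")
    conn.execute("CREATE TABLE entry (entry_id INTEGER PRIMARY KEY, raw TEXT)")
    conn.executemany("INSERT INTO vocab VALUES (?, ?)", vocabulary)
    conn.executemany("INSERT INTO entry (raw) VALUES (?)", [(e,) for e in entries])

    # Pad both sides with spaces so that only whole-word matches succeed.
    rows = conn.execute(
        """
        SELECT e.raw, v.concept_id, v.name
        FROM entry AS e
        LEFT JOIN vocab AS v
          ON ' ' || lower(trim(e.raw)) || ' ' LIKE '% ' || lower(v.name) || ' %'
        ORDER BY e.entry_id, v.name
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    vocab = [("HYP-1", "penicillin"), ("HYP-2", "latex")]
    for row in map_allergy_entries(["PENICILLIN - rash", "NKDA"], vocab):
        print(row)
```

The LEFT JOIN keeps unmatched entries in the output, which makes it easy to review what the mapping missed.
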
MEDI (MEDication Indication) is an ensemble medication indication resource for primary and secondary uses of electronic medical record (EMR) data. MEDI was created from multiple commonly used medication resources (RxNorm, MedlinePlus, SIDER 2, and Wikipedia) by leveraging both ontology and natural language processing (NLP) techniques.
We replicated known genetic associations for five diseases. We genotyped the first 10,000 samples accrued into BioVU (the Vanderbilt EMR-associated DNA biobank) for twenty-one loci that were associated with five common diseases (reported odds ratios 1.14-2.36) in at least two previous studies. We developed automated phenotype identification algorithms that used NLP techniques (to identify key findings, medication names, and family history), billing code queries, and structured data elements (such as laboratory results) to identify cases (n=70-698) and controls (n=808-3818).

MedEx processes free-text clinical records to recognize medication names and signature information, such as drug dose, frequency, route, and duration. It uses a context-free grammar and regular expression parsing to process free-text clinical notes. After finding medication information, it maps it to RxNorm and UMLS concepts at the most specific match it can find (e.g., medication name + strength is preferred to medication name alone). It was applied in the 2009 i2b2 Medication Extraction challenge, where it placed second, and has been formally evaluated on Vanderbilt discharge summaries and clinical notes.
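
To make the idea of regular expression parsing of medication signatures concrete, here is a minimal Python sketch. It is not MedEx's implementation: it covers only the regex side (no context-free grammar, no RxNorm or UMLS lookup), and the drug lexicon, the pattern, and the function names extract_medications and most_specific_term are all assumptions made for illustration.

```python
import re

# Hypothetical, deliberately tiny lexicon; a real system needs a full drug vocabulary.
DRUG_NAMES = ["aspirin", "lisinopril", "metformin"]

# Each signature field is optional and must appear in this fixed order.
SIG_PATTERN = re.compile(
    r"\b(?P<name>" + "|".join(DRUG_NAMES) + r")\b"
    r"(?:\s+(?P<strength>\d+(?:\.\d+)?\s*(?:mg|mcg|g|ml|units?)\b))?"
    r"(?:\s+(?P<route>po|iv|im|sc|by mouth)\b)?"
    r"(?:\s+(?P<frequency>daily|bid|tid|qid|qhs|prn|once a day|twice a day)\b)?"
    r"(?:\s+(?:for|x)\s*(?P<duration>\d+\s*(?:days?|weeks?|months?)\b))?",
    re.IGNORECASE,
)


def extract_medications(text):
    """Return one dict per medication mention, keeping only the fields found."""
    results = []
    for match in SIG_PATTERN.finditer(text):
        results.append({k: v for k, v in match.groupdict().items() if v is not None})
    return results


def most_specific_term(med):
    """Prefer 'name + strength' over the name alone as the lookup term."""
    if "strength" in med:
        return f"{med['name']} {med['strength']}"
    return med["name"]


if __name__ == "__main__":
    note = "Started metformin 500 mg po bid for 2 weeks. Continue aspirin daily."
    for med in extract_medications(note):
        print(most_specific_term(med), med)
```

The most_specific_term helper mirrors the preference described above: when a strength is found, the more specific "name + strength" string would be the one to look up in a vocabulary.
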

Executable versions are available for Linux and Windows below.
