How do I search in PharmGKB?

Overview

A search query is broken up into terms and operators. There are two types of terms: single terms and phrases. A single term is a single word such as gene or pharmacodynamics. A phrase is a group of words surrounded by double quotes such as "functional assays".

Multiple terms can be combined with Boolean operators to form a more complex query.

Term Modifiers

Query terms can be modified to provide a wide range of searching options.

Single and multiple character wildcard searches are supported. To perform a single character wildcard search use the "?" (question mark) symbol. To perform a multiple character wildcard search use the "*" (asterisk) symbol.

The single character wildcard search matches terms in which exactly one character takes the place of the "?" symbol. For example, to search for text or test you can use the search:

te?t

The multiple character wildcard search looks for zero or more characters. For example, to search for test, tests or tester, you can use the search:

test*

You can also use a wildcard search in the middle of a term:

te*t

Note: You cannot use a "?" or "*" symbol as the first character of a search.
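
The wildcard rules above can be illustrated with a small Python sketch. It is not how PharmGKB implements searching; it only translates a wildcard term into a regular expression so the matching behavior can be tried out. The function names wildcard_to_regex and wildcard_matches are hypothetical, and matching here is case-sensitive, which is an assumption.

```python
import re

def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a term containing ? and * wildcards into a regular expression."""
    if pattern[:1] in ("?", "*"):
        # Mirrors the note above: a wildcard cannot start a search.
        raise ValueError("a wildcard cannot be the first character of a search")
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")            # exactly one character
        elif ch == "*":
            parts.append(".*")           # zero or more characters
        else:
            parts.append(re.escape(ch))  # literal character
    return re.compile("".join(parts))

def wildcard_matches(pattern: str, term: str) -> bool:
    """Return True if the whole term matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(term) is not None

for pattern, term in [("te?t", "text"), ("test*", "tester"), ("te*t", "tempest")]:
    print(pattern, term, wildcard_matches(pattern, term))
```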

Fuzzy searches based on the Levenshtein distance (edit distance) algorithm are supported. To do a fuzzy search, use the "~" (tilde) symbol at the end of a single-word term. For example, to search for a term similar in spelling to assay, use the fuzzy search:

assay~

This search will find terms like essay and assays.
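
To show why essay and assays count as similar to assay, here is a minimal Python sketch of the Levenshtein distance, which counts the single-character insertions, deletions and substitutions needed to turn one word into another. The function name edit_distance is hypothetical, and the sketch does not show how close a term must be for the search to treat it as a match, since that threshold is not stated here.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b, computed row by row."""
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]

print(edit_distance("assay", "essay"))   # one substitution
print(edit_distance("assay", "assays"))  # one insertion
```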

Proximity searches, which find words that are within a specific distance of each other, are also supported. To do a proximity search, use the "~" (tilde) symbol at the end of a phrase. For example, to search for TPMT and mercaptopurine within 10 words of each other in a document, use the search:

"TPMT mercaptopurine"~10

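The idea behind a proximity search can be sketched in Python as follows. This is a simplified illustration only: it splits text on whitespace, ignores case and measures the gap between word positions, all of which are assumptions; the search engine may tokenize and count distance differently. The function name within_distance and the sample sentence are hypothetical.

```python
def within_distance(text: str, first: str, second: str, max_distance: int) -> bool:
    """Return True if both words occur at most max_distance positions apart."""
    words = text.lower().split()  # naive tokenization: whitespace only
    first_positions = [i for i, word in enumerate(words) if word == first.lower()]
    second_positions = [i for i, word in enumerate(words) if word == second.lower()]
    return any(
        abs(i - j) <= max_distance
        for i in first_positions
        for j in second_positions
    )

sentence = "TPMT activity predicts toxicity of mercaptopurine"
print(within_distance(sentence, "TPMT", "mercaptopurine", 10))
print(within_distance(sentence, "TPMT", "mercaptopurine", 3))
```
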
Search results are ordered based on the calculated relevance of the matching document. The importance of a term in determining the relevancy of a matching document can be boosted using the "^" (caret) symbol with a boost factor at the end of a term.

For example, if you are searching for TPMT mercaptopurine and you want the term TPMT to be more relevant you could use the search:

TPMT^4 mercaptopurine

You can also boost phrases:

"mutant alleles"^4 "standard doses"

By default, the standard boost factor is 1. Although the boost factor must be positive, it can be less than 1 (e.g. 0.2).

Boolean Operators

Boolean operators allow terms to be combined through logic operators. The operators AND, OR, NOT and the "+" (plus) and "-" (minus) symbols are supported as Boolean operators. (Note: Boolean operators must be in ALL CAPS.)

The OR operator is the default conjunction operator. This means that if there is no Boolean operator between two terms, the OR operator is used. The OR operator links two terms and finds a matching document if either of the terms exists in a document. This is equivalent to a union using sets. The symbol "||" can be used in place of the word OR.

To search for documents that contain either "mutant alleles" or just alleles use the query:

"mutant alleles" alleles

or

"mutant alleles" OR alleles

The AND operator matches documents where both terms exist anywhere in the text of a single document. This is equivalent to an intersection using sets. The symbol "&&" can be used in place of the word AND.

To search for documents that contain "mutant alleles" and doses use the query:

"mutant alleles" AND doses

The NOT operator excludes documents that contain the term after NOT. This is equivalent to a difference using sets. The "!" (exclamation point) symbol can be used in place of the word NOT.

To search for documents that contain "mutant alleles" but not doses use the query:

"mutant alleles" NOT doses

Note: The NOT operator cannot be used with just one term. For example, the following search will return no results:

NOT "mutant alleles"
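
The set analogy used for OR, AND and NOT can be made concrete with Python sets. The document numbers below are invented for illustration; each set stands for the documents that contain a given term.

```python
# Hypothetical document IDs containing each term.
docs_with_mutant_alleles = {1, 2, 3}
docs_with_doses = {2, 3, 4}

# "mutant alleles" OR doses: union of the two sets.
either = docs_with_mutant_alleles | docs_with_doses

# "mutant alleles" AND doses: intersection of the two sets.
both = docs_with_mutant_alleles & docs_with_doses

# "mutant alleles" NOT doses: difference of the two sets.
without_doses = docs_with_mutant_alleles - docs_with_doses

print(sorted(either))         # [1, 2, 3, 4]
print(sorted(both))           # [2, 3]
print(sorted(without_doses))  # [1]
```

A difference needs a set to subtract from, which is one way to see why NOT cannot be used with just one term.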

Required Operator

The required operator (the "+" (plus) symbol) requires that the term after the "+" symbol exist somewhere in a single document.

To search for documents that must contain "mutant alleles" and may contain doses use the query:

+"mutant alleles" doses

Prohibit Operator

The prohibit operator (the "-" (minus) symbol) excludes documents that contain the term after the "-" symbol.

To search for documents that contain "mutant alleles" but not doses use the query:

"mutant alleles" -doses

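The required and prohibit operators can be sketched as a simple filter in Python. This is an illustration under stated assumptions, not the search engine's implementation: a document is modeled as a set of terms, and when no term is required, at least one optional term must be present (consistent with OR being the default operator). The function name document_matches is hypothetical.

```python
def document_matches(doc_terms, required=(), prohibited=(), optional=()):
    """Apply +term (required), -term (prohibited) and plain (optional) terms."""
    terms = set(doc_terms)
    if any(term in terms for term in prohibited):
        return False  # a prohibited term excludes the document
    if required:
        return all(term in terms for term in required)
    # Assumption: with no required terms, at least one optional term must match.
    return any(term in terms for term in optional)

doc_a = {"mutant alleles", "doses"}
doc_b = {"mutant alleles"}

# +"mutant alleles" doses
print(document_matches(doc_b, required=["mutant alleles"], optional=["doses"]))
# "mutant alleles" -doses
print(document_matches(doc_a, optional=["mutant alleles"], prohibited=["doses"]))
```
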
Grouping

Parentheses can be used to group clauses to form subqueries. This can be very useful if you want to control the Boolean logic for a query.

To search for documents that contain TPMT and either "mutant alleles" or doses use the query:

("mutant alleles" OR doses) AND tpmt

The parentheses remove any ambiguity: TPMT must exist, and at least one of "mutant alleles" or doses must also exist.

Escaping Special Characters

It is possible to escape special characters that are part of the query syntax. The current list of special characters is:

+ - && || ! ( ) { } [ ] ^ " ~ * ? : \

To escape these characters, use the "\" symbol before the character. For example, to search for (1+1):2 use the query:

\(1\+1\)\:2
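
When queries are built by a script, the escaping can be automated. The following Python sketch places a backslash before each special character from the list above. The function name escape_query is hypothetical, and the sketch escapes every "&" and "|" individually, which is an assumption that is stricter than escaping only the "&&" and "||" pairs.

```python
# Single characters drawn from the special character list above.
SPECIAL_CHARACTERS = set('+-&|!(){}[]^"~*?:\\')

def escape_query(text: str) -> str:
    """Put a backslash before every special character in text."""
    return "".join(
        "\\" + ch if ch in SPECIAL_CHARACTERS else ch
        for ch in text
    )

print(escape_query("(1+1):2"))  # \(1\+1\)\:2
```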