Accelerating Rare Disease Drug Development with NLP: Three Use Cases

December 7, 2020
Jane Z. Reed

Utilizing natural language processing technologies to improve search capabilities and advance the development of therapies.

Drug development for rare diseases presents a host of challenges, primarily because the number of affected patients is so limited for each rare disease. A rare disease is any disease, disorder, illness, or condition that affects fewer than 200,000 people in the U.S. Of the approximately 7,000 rare diseases that have been identified in the U.S., more than 90 percent do not have a treatment approved by the U.S. Food and Drug Administration. However, rare diseases have a huge impact on our healthcare systems, as around one in 17 people will be affected by a rare disease at some point in their lives.

Challenges come at all stages in drug development. For example, clinical researchers are often unable to conduct randomized trials for certain rare diseases because patient populations are too small. Even when there are enough patients for a randomized clinical trial, rare diseases may present the problem of disease heterogeneity; that is, affected patients exhibit a wide range of differences in symptoms, severity, and progression.

The scarcity of data presents additional challenges for drug discovery and development. To advance understanding of rare diseases, researchers at pharmaceutical companies have traditionally searched large volumes of published data to find nuggets of information that link rare diseases with specific genes and gene variants. This can be a tedious task when done manually. More recently, however, utilizing natural language processing (NLP) technologies, pharma companies have improved their search capabilities, advancing the development and delivery of life-enhancing drugs.

Following are three use cases that showcase how pharmaceutical companies are leveraging NLP to accelerate the development of drugs to treat rare diseases.

Identifying specific gene mutations

Biopharmaceutical company Takeda uses NLP tools to examine gene-disease associations to assess potential disease severity for patients with Hunter Syndrome, a rare genetic disorder caused by a missing or malfunctioning enzyme that can lead to permanent damage, affecting appearance, mental development, organ function and physical abilities. Takeda developed an enzyme replacement therapy drug that showed potential to help young patients with the severe forms of Hunter Syndrome (particularly impacting brain function), but the company needed to identify patients with the greatest potential to benefit from the invasive therapeutic delivery.

Takeda researchers developed a suite of NLP queries to search scientific literature (abstracts and full text papers) for any associations between the relevant gene (iduronate-2-sulfatase, IDS), mutations, and cognitive impairment. The queries identified and classified every published mention of patients with Hunter Syndrome or related symptoms, in addition to associated gene variants and mutations.

These queries yielded results that enabled researchers to identify the location of specific mutations within the IDS gene and associate specific gene mutations with specific phenotypes. Takeda’s use of NLP text mining produced excellent results that matched or exceeded results from structured genetic databases. These results enable clinicians to make data-driven decisions, by providing the understanding of which genetic variants in their pediatric patients would be likely to lead to severe cognitive impairment.

Classifying candidate target genes

Other drug developers have used NLP to search significant volumes of published literature for small, valuable chunks of information on rare disease patients to uncover associations between different genes and gene variants. For example, Agios Pharmaceuticals developed a virtual portfolio for orphan diseases by using NLP to systematically map the space around inborn errors of metabolism and link diseases to targets. Agios’ rare disease drug discovery program was founded on an understanding of the space gleaned from NLP queries that the company used to identify candidate diseases and candidate target genes.

Building a rare disease knowledgebase for drug repurposing

Sanofi wanted to improve its search capabilities for drugs that could be used for rare disease patients. For this indication expansion or drug repurposing work, Sanofi needed a rapid systematic method to discover target-indication links, both direct and indirect. The traditional approach to find new target-indication proposals is generally an ad hoc process that relies on expert knowledge and experiment observations, which are usually limited and time consuming.

With NLP text-mining, the Sanofi team built an internal Rare Genetic Diseases Knowledgebase (RGDKb). NLP queries were developed to capture and integrate all the available information around drug targets from scientific literature, including causal mutations associated with diseases, the underlying causal pathways, cell types and clinical phenotypes, and associations with known drugs. These data were integrated with information from rare and genetic disease databases (Orphanet and OMIM) into RGDKB, and visualized in a dashboard, RareView, that illustrates the relationships between the diseases, underlying causal genes, pathways, and drug compounds by network view.

The approach of combining text mining with network visualizations allowed Sanofi researchers to quickly identify new indication opportunities from target-disease pairs that have various commonalities based on broad scientific, medical and strategic values, and the approach has been applied in several disease areas.

NLP – a potential rare disease game-changer

With small patient populations and a shortage of data, pharmaceutical companies and researchers will continue to struggle developing drugs for rare diseases. NLP, however, represents a potential game-changer, relieving pharmaceutical researchers of the need to spend hours scouring sometimes-obscure medical literature for scarce jewels of data linking rare diseases with certain genes or gene variants. With NLP, researchers can significantly advance their search capabilities, speeding the drug development process and accelerating the market availability of life-enhancing therapies for patients with rare diseases.

Jane Z. Reed, Director, Life Sciences, Linguamatics, an IQVIA company