The New Gold Rush

Pharmaceutical Executive, 03-01-2002

Every night at Merck facilities around the world, an automated computer clicks and whirs as it collects proteins and DNA sequences, in an inexorable process that has built a mountainous terabyte of data at four research centers. And that data will double every eight months.

Perhaps more impressive is that 1,000 Merck scientists, using a web interface, have daily access to that information to perform molecular simulation and modeling in developing new drugs for disorders such as strokes, depression, and chronic pain. The technology they use is called a data warehouse; the technique is data mining.

Data warehousing is the process of storing disparate information such as marketing and sales numbers and DNA sequencing data in a standard form for easy analysis. The data warehouse provides a home for information originating from different applications or external sources. It organizes that information to support management decision making and is ideal for comparative "what if" questions. It provides an enterprise-wide snapshot of all relevant statistics and may contain a variety of information sources, including documents.

A similar technology is a data mart, which focuses on a single business area. It can stand independently or be a subset of a warehouse. It also contains more frequently used information than a warehouse does.
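The relationship between the two can be sketched in a few lines of Python using the standard sqlite3 module. This is only an illustration under stated assumptions: the table layout, field names, and figures are invented, and a production warehouse would be far more elaborate. Two sources with different layouts are loaded into one standard table, and the "mart" is simply a view restricted to one business area.

```python
import sqlite3

# Hypothetical source extracts; all names and numbers are invented for illustration.
sales_rows = [  # sales system: (product, period, units)
    ("drug_a", "2001-Q4", 1200),
    ("drug_b", "2001-Q4", 800),
]
trial_rows = [  # clinical system: same kind of facts, different field names
    {"compound": "drug_a", "qtr": "2001-Q4", "enrolled": 150},
    {"compound": "drug_c", "qtr": "2001-Q4", "enrolled": 90},
]


def build_warehouse(conn):
    """Load both sources into one table that shares a standard layout."""
    conn.execute(
        "CREATE TABLE facts (source TEXT, product TEXT, period TEXT, "
        "measure TEXT, value REAL)"
    )
    conn.executemany(
        "INSERT INTO facts VALUES ('sales', ?, ?, 'units_sold', ?)",
        sales_rows,
    )
    conn.executemany(
        "INSERT INTO facts VALUES ('clinical', :compound, :qtr, "
        "'subjects_enrolled', :enrolled)",
        trial_rows,
    )
    # The data mart: a subset of the warehouse focused on one business area.
    conn.execute(
        "CREATE VIEW clinical_mart AS "
        "SELECT product, period, measure, value FROM facts "
        "WHERE source = 'clinical'"
    )


conn = sqlite3.connect(":memory:")
build_warehouse(conn)
warehouse_count = conn.execute("SELECT COUNT(*) FROM facts").fetchone()[0]
mart_rows = conn.execute(
    "SELECT product, value FROM clinical_mart ORDER BY product"
).fetchall()
print(warehouse_count, mart_rows)
```

Defining the mart as a view keeps it a true subset of the warehouse; a stand-alone mart would instead hold its own copy of the data.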

This article focuses on key applications of those technologies and offers general definitions and goals for their use in the pharma industry.

Acquire and Analyze

Data warehouses differ from standard information systems in several ways. First, a warehouse is not an operational system that supports day-to-day business functions. It does not provide or replace operational reports best handled by source systems, nor is it used to store up-to-the-minute details. A data warehouse is a strategic decision support system that aids planning and analysis.

Typically, the goals of a data warehouse are to:

  • acquire information for operational management

  • analyze information for strategic and more effective decision making and planning

  • garner information that it would be impractical or nearly impossible to obtain otherwise

  • achieve a faster path to the information needed for competitive advantage

  • leverage IT investments for better business intelligence.

The use of data warehouses and data marts across all R&D functions grew by more than a third in 2001 to 77 percent. The number of pharma companies that use such data technology will likely grow to 100 percent by 2004.

On average, data warehouses and marts support about 300 users each. About half of all pharma companies use them, most frequently in early discovery, and about 30 percent of the time for clinical trials. Common applications are cleaning and storing information for clinical trials, efficacy and safety profiling, pharmacovigilance, pharmacokinetics, pharmacogenomics, pharmacogenetics, toxicology, patient data, and medical informatics.

There are several advantages to using data warehouses in clinical development, including the ability to integrate information from many studies and to efficiently analyze that information. They also make it easier to track trends and accelerate "go" or "no-go" decisions as well as compile operational data that may be used to assess efficiency and reduce waste. The goal of a data warehouse can be summed up in the phrase, "All the information employees need to do their jobs should be on their desktops."

Another area in which warehouses create significant benefits is pharmacovigilance. A global drug-safety warehouse supports surveillance through identification of trends in adverse drug reactions or signal detection, compliance in both individual patient and periodic regulatory reports, and reconciliation of cases with clinical development. The latter task consumes significant pharma company resources in both drug safety and clinical operations and amounts to duplication of effort.

Other data warehouse uses include:

  • protocol simulations

  • predictive modeling - linking genomic attributes to clinical results

  • product-level research costs

  • product lifecycle analysis

  • key feeder system for a portal.

Data warehouses also work with, rather than replace, existing legacy systems. A warehouse can reduce systems overload caused by excessive reports and queries and push operational systems toward data-quality standards. The warehouse's position as a relatively autonomous complementary system can also provide a practical solution for merging companies that must integrate disparate computing environments.

Challenges exist, of course. Unfortunately, as with much of IT, return on investment for data warehouses is difficult to measure. Data warehouses involve complex design processes and may be expensive and time-consuming to implement. Companies must ask themselves, "How much are better decisions worth?" Also, because data marts focus on specific business areas, there's the risk that "stovepipe" applications will arise that have little or no relation to the rest of the enterprise.

Pattern Identification

Data mining offers a way to extract new insights from warehouses and other databases. It looks for hidden patterns and helps identify previously unknown relationships within the data. Although it is based on sound statistical methods, it differs from traditional statistical analysis: data mining is a discovery-based approach, whereas traditional analysis is verification-based.

In other words, a clinical trial protocol is based on a hypothesis and a predefined statistical plan that describes how to analyze the data to verify or refute the hypothesis. In contrast, data mining's discovery-based approach automatically uses pattern matching and other algorithms to identify interrelationships. And, although traditional statistical analysis is limited by the analysis plan, data mining reviews multidimensional data relationships concurrently and highlights dominant and exceptional patterns for further exploration. Thus, when applied appropriately, data mining is complementary to statistical analysis.

There are many methods of data mining:

Influence-based. Complex information in large databases is scanned for cause-and-effect relationships. Among the data collected from all clinical trials of one anti-arthritic drug, which factors appear to have the greatest influence on pain reduction?
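A rough sketch of that kind of scan, assuming a handful of invented patient records, ranks each factor by the strength of its correlation with pain reduction. The field names and values are hypothetical, and correlation is used here only as a simple stand-in: it flags association, not proven cause and effect.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def rank_factors(records, outcome):
    """Rank every non-outcome field by absolute correlation with the outcome."""
    factors = [name for name in records[0] if name != outcome]
    ys = [r[outcome] for r in records]
    scores = {f: pearson([r[f] for r in records], ys) for f in factors}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Invented records for illustration only; not data from any real trial.
patients = [
    {"dose_mg": 10, "age": 62, "weeks_on_drug": 2, "pain_reduction": 1.0},
    {"dose_mg": 20, "age": 45, "weeks_on_drug": 6, "pain_reduction": 2.1},
    {"dose_mg": 30, "age": 58, "weeks_on_drug": 3, "pain_reduction": 2.9},
    {"dose_mg": 40, "age": 50, "weeks_on_drug": 8, "pain_reduction": 4.2},
    {"dose_mg": 50, "age": 66, "weeks_on_drug": 5, "pain_reduction": 4.8},
    {"dose_mg": 60, "age": 41, "weeks_on_drug": 4, "pain_reduction": 6.1},
]

ranking = rank_factors(patients, "pain_reduction")
for factor, score in ranking:
    print(f"{factor}: {score:+.2f}")
```

In this made-up data set, dose tracks pain reduction far more closely than age or time on drug, so it heads the ranking; a real analysis would go on to test whether the relationship holds up.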

Affinity-based. Similar to influence-based mining, it looks for hierarchical associations of data and defines underlying rules. Which side effects are likely to occur at the same time as drug A? Given its mechanism of action, does that offer any surprises?
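One common way to express such rules, shown here only as a sketch, is to count how often pairs of side effects appear together and score each pair by support, confidence, and lift. The reports below are invented, and the 30 percent support threshold is an arbitrary assumption.

```python
from collections import Counter
from itertools import combinations

# Invented adverse-event reports: each set lists the side effects one patient reported.
reports = [
    {"nausea", "headache"},
    {"nausea", "headache", "dizziness"},
    {"nausea"},
    {"headache", "dizziness"},
    {"nausea", "headache"},
    {"rash"},
]


def pair_rules(reports, min_support=0.3):
    """Return (a, b, support, confidence, lift) for pairs that co-occur often enough."""
    n = len(reports)
    item_counts = Counter(item for r in reports for item in r)
    pair_counts = Counter(
        pair for r in reports for pair in combinations(sorted(r), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n                 # share of reports containing both
        if support < min_support:
            continue
        confidence = count / item_counts[a]  # share of reports with a that also have b
        lift = confidence / (item_counts[b] / n)  # above 1: together more than chance
        rules.append((a, b, support, confidence, lift))
    return sorted(rules, key=lambda rule: rule[4], reverse=True)


rules = pair_rules(reports)
for a, b, support, confidence, lift in rules:
    print(f"{a} -> {b}: support={support:.2f} "
          f"confidence={confidence:.2f} lift={lift:.2f}")
```

A lift well above 1 marks a pairing that occurs more often than the individual frequencies would suggest, which is the kind of result worth checking against the drug's mechanism of action.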

Time-delay. The initial data set is small, and patterns are confirmed or rejected as additional data are interpreted. Are there hidden temporal patterns in the bioavailability studies that confirm or refute the drug metabolism model for therapy B?

Trends-based. This method highlights changes in data elements over time. What are the characteristics of patients who experienced a significant reduction in LDL cholesterol in five weeks versus ten weeks of treatment?

Comparative. The process looks for dissimilarities in data collected under similar conditions. For subjects whose blood pressure was lowered by drug C, what genotypes are present in Gene 21 compared with Gene 5?

Predictive. Based on trends identified from current information, the method simulates future data sets. What is the likelihood of finding at least 1,200 subjects who meet those trial inclusion and exclusion criteria within 18 months?
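The enrollment question lends itself to a simple simulation. The sketch below is illustrative only: it assumes monthly enrollment varies around a historical average of 70 subjects with a standard deviation of 15 (both figures invented), approximates that variation with a normal distribution, and replays the 18 months many times to estimate the chance of reaching 1,200 subjects.

```python
import random


def enrollment_probability(target, months, mean_monthly, sd_monthly,
                           runs=2000, seed=0):
    """Estimate the chance that cumulative enrollment reaches the target.

    Each run draws one enrollment figure per month from a normal
    distribution (an assumption), clipped at zero and rounded to whole subjects.
    """
    rng = random.Random(seed)  # fixed seed so repeated calls agree
    hits = 0
    for _ in range(runs):
        total = 0
        for _ in range(months):
            total += max(0, round(rng.gauss(mean_monthly, sd_monthly)))
        if total >= target:
            hits += 1
    return hits / runs


# Hypothetical inputs: 70 subjects a month on average, give or take 15.
chance = enrollment_probability(target=1200, months=18,
                                mean_monthly=70, sd_monthly=15)
print(f"Estimated chance of enrolling 1,200 subjects in 18 months: {chance:.0%}")
```

Changing the assumed monthly rate or the trial window shows how sensitive the enrollment plan is to those inputs, which is the practical value of simulating future data sets.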

On The Market

Pharsight, a data-mining software and services vendor, offers a computer-aided trial design tool kit that simulates how patient variability may affect a trial's outcome. It also tests assumptions using mathematical and statistical models and a large data warehouse. Its warehouse was developed partly from sponsor data and partly from data it had permission to access from several institutions conducting large clinical trials. The tool is used by 12 of the top 20 pharmaceutical companies, including AstraZeneca, which uses Pharsight to design all its trials in one therapeutic area.

Another vendor, Spotfire, offers a web-based decision analytics tool that interactively and graphically explores and analyzes large chunks of data. Used by 25 leading pharma and 80 biotech companies, the tool aids collaboration among individuals or groups throughout an organization. It also provides a common platform from which researchers can access, analyze, and share information, allowing for faster, more informed decisions and accelerating drug discovery and development.

In clinical studies at Procter & Gamble, drug development teams perform analyses on large healthcare data warehouses containing many critical variables and risk factors. The company uses MineSet, a data-mining software tool, to predict drug efficacy. Teams test theories about how risk factors such as blood pressure combined with a therapeutic agent may affect the disease state.

The software finds associations among the data and predicts outcomes through "what-if" scenarios. Major benefits of the technique include seamless access to corporate databases on a variety of platforms such as Oracle, Sybase, and Informix, and wider access to warehouse information that was previously limited to statisticians. The resulting turnaround is also much faster, allowing time to explore a greater number of theoretical scenarios.

Other examples include GlaxoSmithKline's use of HelixTree from Golden Helix to understand complex gene/gene, gene/environment, and environment/environment interactions. The technology approach can help identify subpopulations of patients who respond well to treatment or, more important, patients who do not tolerate the therapy well. That information permits sponsors to better manage their risk and protect patients' safety. Merck-Medco is mining its one-terabyte-plus data warehouse to uncover links between illnesses and known treatments and to identify trends that help pinpoint the drugs that are most effective for different types of patients. The results are more effective and less costly treatments. According to the company, Merck-Medco's data-mining project has helped customers save an average of 10-15 percent on their prescriptions.

For pharma and biotech companies, a comprehensive data warehouse provides the foundation for communicating and leveraging information throughout the enterprise. Nearly all companies recognize the importance of data warehouses and are building them. With ever more data available in the public domain and more partner relationships, data standards have become an urgent industry priority. Many pharma companies, contract research organizations, and vendors are supporting the Clinical Data Interchange Standards Consortium to that end. Moreover, data mining's ability to find previously unrecognized patterns helps reduce research time while providing more return on research investments. The data-mining rush is on.