Health Economics: Data Mining

September 1, 2007

Pharmaceutical Executive


It's certainly not headline news that these are tough days for the pharmaceutical industry. More than $60 billion in revenue from blockbuster drugs will evaporate as these products go generic over the next five years, while the productivity of clinical development has hit a particularly rough patch. Even at companies with relatively strong pipelines, many of the new treatments are biotech products acquired from outside the company. The expected authorization of biogenerics will only squeeze profits further.


This shift toward biotech products portends a general evolution in healthcare. Such products are often stunningly effective. They are also typically targeted to narrow patient populations. The movement toward targeted therapies for niche markets intersects with a second major trend: consumer-directed healthcare. Because consumers are paying for more and more of their healthcare, patients who need treatment with biologics can face out-of-pocket costs stretching into thousands of dollars. This nexus is making health economics and data mining increasingly important to drugmakers eager to show the value of a new drug.

A few years ago, pharma's use of health economics was largely confined to modeling cost-effectiveness for payers in order to differentiate a product from its competitors. Database analyses are now applied much more widely: to estimate the size of potential markets, to establish a pricing strategy, to inform protocol design and evaluation, and to model cost-for-value based on a drug's actual clinical performance.

Payers, consumers, and regulators all need answers to two basic questions: Is the drug safe in real-world populations? And is the drug effective in real-world populations? The answers, coupled with information about competing treatments and the product's price, largely determine its market success or failure. And given their higher price, biologics often have to meet even higher standards to be considered cost-effective from a payer's standpoint.


At a drug's launch, real-world data on effectiveness and safety are, of course, not available. Consequently, the health-economics evidence presented to payers generally involves modeling cost-effectiveness and budget impact relative to other treatments for the same indication. To get technical for a moment, cost-effectiveness models estimate the incremental cost associated with each additional unit of clinical benefit obtained from treating patients with drug A vs. drug B. Budget-impact models estimate the total healthcare costs, and the change in those costs, associated with offering the new therapy. Both models draw on data from clinical trials.
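To make the two models concrete, here is a minimal Python sketch. It assumes per-patient annual costs and a single effectiveness measure (for example, the share of patients responding) taken from trials. The function names and all inputs are hypothetical illustrations, and the budget-impact function counts drug costs only, whereas a full model would also include other healthcare costs.

```python
def icer(cost_a, effect_a, cost_b, effect_b):
    """Incremental cost-effectiveness ratio: extra cost of drug A over drug B
    per extra unit of clinical benefit (e.g., per additional responder)."""
    delta_effect = effect_a - effect_b
    if delta_effect == 0:
        raise ValueError("equal effectiveness: the ratio is undefined")
    return (cost_a - cost_b) / delta_effect


def budget_impact(n_treated, share_switching, annual_cost_new, annual_cost_current):
    """Change in a plan's annual drug spending if a share of treated members
    moves from the current therapy to the new one (drug costs only)."""
    return n_treated * share_switching * (annual_cost_new - annual_cost_current)


# Hypothetical trial-based inputs: annual cost per patient and response rate
print(icer(cost_a=12_000, effect_a=0.60, cost_b=3_000, effect_b=0.45))  # about 60,000
print(budget_impact(1_000, 0.10, 12_000, 3_000))  # 900000.0
```

In this toy case, each additional responder gained with drug A costs roughly $60,000. Moving 10 percent of 1,000 treated members to the new drug adds $900,000 a year in drug spending.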

Once the drug hits the market, real-world data on healthcare utilization (such as doctor's-office visits, emergency room visits, hospitalizations, and concomitant medication use) begin to accumulate in retrospective databases. When a sufficient period of time has passed (say, a year), statistical comparisons are often made between patients who started treatment on the new drug and those on one or more competing treatments. These analyses enable drugmakers to track product uptake, medication-refill persistency, and patient healthcare utilization. Payers are often especially interested in the utilization data because they reflect real-world use patterns, which are the evidence payers constantly evaluate when revisiting coverage and reimbursement decisions.
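One common way to measure refill persistency from pharmacy claims is to chain fills by days' supply and to treat a long enough gap as discontinuation. The sketch below assumes that each fill is a (fill date, days' supply) pair. It uses a hypothetical 30-day allowable gap, whereas real studies choose this threshold to suit the drug and condition.

```python
from datetime import date, timedelta


def persistence_days(fills, max_gap_days=30):
    """Days from the first fill until therapy is considered discontinued.

    fills: list of (fill_date, days_supply) tuples for one patient and drug.
    A refill arriving more than max_gap_days after the previous supply ran
    out marks discontinuation (max_gap_days is a hypothetical threshold).
    """
    fills = sorted(fills)
    start = fills[0][0]
    supply_end = start + timedelta(days=fills[0][1])
    for fill_date, days_supply in fills[1:]:
        if (fill_date - supply_end).days > max_gap_days:
            break  # gap too long: patient discontinued before this refill
        # An early refill adds its supply after the current supply runs out
        supply_end = max(supply_end, fill_date) + timedelta(days=days_supply)
    return (supply_end - start).days


fills = [(date(2007, 1, 1), 30), (date(2007, 2, 5), 30), (date(2007, 5, 1), 30)]
print(persistence_days(fills))  # 65: the May refill follows a 55-day gap
```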

Despite their usefulness, however, these analyses of retrospective data are limited because they do not fully control for confounding factors that can lead to incorrect conclusions. One concern is "launch bias," the tendency of physicians to prescribe new drugs to patients for whom other treatments have stopped working. These patients are likely to be sicker than average and may therefore have poorer outcomes, which can make the new drug appear less effective or less safe than the competition.

Launch bias would not be a problem if it were possible to factor in disease severity, but this information is rarely available. For example, when comparing antidepressants by way of retrospective data, medical claims generally contain diagnosis codes but not clinical measures of severity. Statistical methods that reduce such biases exist, but a better (and pricier) solution is to design studies that collect the relevant data in the first place.

In addition to the real-world analysis of patient outcomes, considerable value can be gained from using medical-claims databases to conduct market assessments when products are in Phase II or III. For example, a developer of a new central nervous system agent may want to compare the number of patients with major depressive disorder with the numbers who have obsessive-compulsive disorder, social phobias, and anxiety disorders, as well as the overlap among these diagnoses. These data can inform strategic decisions about pricing and about which indications to prioritize during product development.
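A market assessment of this kind comes down to counting distinct patients per condition and per pair of conditions. This sketch assumes that diagnosis codes have already been mapped to condition groups, which is a hypothetical preprocessing step. Under that assumption, each claim is a (patient ID, condition) pair.

```python
from collections import defaultdict
from itertools import combinations


def condition_counts(claims):
    """claims: iterable of (patient_id, condition) pairs, with diagnosis codes
    already mapped to condition groups (a hypothetical preprocessing step).
    Returns distinct-patient counts per condition and per pair of conditions."""
    patients = defaultdict(set)
    for patient_id, condition in claims:
        patients[condition].add(patient_id)  # sets count each patient once
    sizes = {condition: len(ids) for condition, ids in patients.items()}
    overlaps = {
        (a, b): len(patients[a] & patients[b])
        for a, b in combinations(sorted(patients), 2)
    }
    return sizes, overlaps


claims = [(1, "MDD"), (1, "anxiety"), (2, "MDD"), (3, "OCD"), (3, "MDD"), (1, "MDD")]
sizes, overlaps = condition_counts(claims)
print(sizes["MDD"], overlaps[("MDD", "anxiety")])  # 3 1
```

Sets are used so that a patient with many claims for the same condition is counted only once.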

Claims data can also answer such questions as: What are the patterns of medical comorbidities among patients with different conditions? How are these patients currently being treated? What are the indicators of treatment nonresponse (medication switching, say, or emergency room visits and hospitalizations)? What are the costs associated with these measures of unmet need?

Drugmakers must also demonstrate a product's real-world safety. One of the inherent benefits of targeted therapies is that their "targeted-ness" often limits the size and diversity of the patient population prescribed the treatment. When the therapy is coupled with a diagnostic, the exposed population can be narrowed even further to the group most likely to respond. This personalized-medicine approach reduces the risk and cost of demonstrating real-world safety. However, reliable diagnostic tests and biomarkers are not yet available for most treatments.

Retrospective databases are most accurate at detecting safety signals when a particular treatment cohort is statistically matched with a comparison group. This approach controls for baseline rates of safety events and can therefore identify adverse events, and their markers, that occur at higher or lower rates than expected. When possible, these differences should be confirmed with reviews of the patients' medical charts or with more carefully controlled, randomized clinical trials.
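As one simple form of statistical matching, the sketch below pairs each treated patient 1:1 with a comparison patient of the same age band and sex. It then compares adverse-event rates in the two matched groups. The field names are hypothetical, and exact matching on two variables is only an illustration. Propensity-score methods are a common, richer alternative that is not shown here.

```python
import random
from collections import defaultdict


def matched_event_rates(treated, comparison, seed=0):
    """Match each treated patient 1:1 to a comparison patient with the same
    (age_band, sex), then return (treated rate, comparison rate, pairs).

    Patients are dicts with hypothetical keys 'age_band', 'sex', and 'event'
    (True if the adverse event of interest occurred during follow-up).
    """
    rng = random.Random(seed)
    pool = defaultdict(list)
    for patient in comparison:
        pool[(patient["age_band"], patient["sex"])].append(patient)
    for candidates in pool.values():
        rng.shuffle(candidates)  # random choice among eligible controls

    matched_treated, matched_controls = [], []
    for patient in treated:
        candidates = pool[(patient["age_band"], patient["sex"])]
        if candidates:  # treated patients with no eligible control are dropped
            matched_treated.append(patient)
            matched_controls.append(candidates.pop())  # match without replacement

    pairs = len(matched_treated)
    if pairs == 0:
        raise ValueError("no matched pairs")

    def event_rate(group):
        return sum(p["event"] for p in group) / pairs

    return event_rate(matched_treated), event_rate(matched_controls), pairs
```

Because both groups share the same age and sex mix after matching, a difference in event rates is less likely to reflect those baseline differences alone. Unmeasured factors such as disease severity can still bias the comparison.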

Even though all drugs are approved based on effectiveness and safety data from clinical trials, it is worth noting that a drug's withdrawal from the market is most often based on evidence from retrospective databases or observational data collected after launch. It is critically important that policy decisions about a drug's safety be based on the best scientific evidence. That means controlling for the baseline risk of adverse events in the entire population of candidates for the treatment.


Medical-claims databases are one of the richest sources of real-world retrospective data to support health-economics and safety analyses of drugs. Yet despite their tremendous detail, they are not designed primarily for research purposes. Claims databases exist because they are the record of the financial transactions between healthcare providers and payers (insurance companies, health plans, and government agencies) involving service charges and reimbursement.

The chart "Enhanced Medical Claims Database" (below) illustrates various components of patient data and how they fit together. In the first column are administrative data on health-insurance enrollment. These data are fairly basic, including the unique patient identifier (this appears on every claim, enabling them all to be linked together into a longitudinal record for the patient), demographic information (typically, gender and age), and dates of insurance coverage. Although limited in content, these data are vital for calculating rates of illness and treatments, as well as for distinguishing between a patient's lack of healthcare utilization and the inability to observe utilization because of health-insurance cancellation.
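Because enrollment dates determine who could have generated claims, they belong in the denominator of any rate. This sketch uses hypothetical data structures and keeps only members who were continuously enrolled throughout an observation window. As a result, a member who disenrolled is not mistaken for one who simply used no care.

```python
from datetime import date, timedelta


def continuously_enrolled(spans, start, end):
    """True if the inclusive enrollment spans [(begin, finish), ...]
    cover every day from start through end with no gap."""
    covered_through = start - timedelta(days=1)
    for begin, finish in sorted(spans):
        if begin > covered_through + timedelta(days=1):
            break  # coverage gap before this span
        covered_through = max(covered_through, finish)
    return covered_through >= end


def period_prevalence(enrollment, patients_with_dx, start, end):
    """Share of members continuously enrolled from start through end who have
    the diagnosis. enrollment maps a (hypothetical) patient ID to its spans."""
    eligible = {pid for pid, spans in enrollment.items()
                if continuously_enrolled(spans, start, end)}
    if not eligible:
        raise ValueError("no continuously enrolled members")
    return len(eligible & set(patients_with_dx)) / len(eligible)
```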

Data from prescription-drug claims are listed in the second column. Note that while all the standard drug-related information (name, dose, etc.) is available, no patient diagnosis is listed. To tie a prescription to a particular indication, the drug claim must be linked to a diagnosis code on one of the patient's medical claims from approximately the same date.
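A drug claim can be linked to an indication, approximately, by searching the same patient's medical claims for diagnoses dated near the fill. The window of plus or minus 30 days, the tuple layouts, and the diagnosis labels below are hypothetical choices for illustration.

```python
from datetime import date


def link_rx_to_dx(rx_claims, medical_claims, window_days=30):
    """Attach diagnoses from the same patient's medical claims dated within
    window_days (before or after) of each prescription fill.

    rx_claims: list of (patient_id, fill_date, drug)
    medical_claims: list of (patient_id, service_date, diagnosis)
    """
    by_patient = {}
    for pid, service_date, dx in medical_claims:
        by_patient.setdefault(pid, []).append((service_date, dx))
    linked = []
    for pid, fill_date, drug in rx_claims:
        dxs = sorted({dx for service_date, dx in by_patient.get(pid, [])
                      if abs((fill_date - service_date).days) <= window_days})
        linked.append((pid, fill_date, drug, dxs))  # empty list: no linkable diagnosis
    return linked


rx = [(1, date(2007, 3, 10), "drug X"), (2, date(2007, 3, 10), "drug X")]
med = [(1, date(2007, 3, 1), "dx_depression"),
       (1, date(2006, 6, 1), "dx_hypertension"),
       (2, date(2007, 1, 1), "dx_depression")]
for row in link_rx_to_dx(rx, med):
    print(row)
```

A fill with no diagnosis inside the window, like patient 2's here, is left unlinked rather than guessed. The width of the window trades linkage rate against the risk of attaching an unrelated diagnosis.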

In the third column are medical claims. In addition to diagnosis codes, these inpatient and outpatient claims contain procedure codes and records of healthcare-service utilization. Together, these provide most of the clinical content available from retrospective databases, such as indicators of medical comorbidities.

The next two columns show data less commonly found in retrospective databases. Column four lists laboratory data. Claims typically record that a lab test was performed but rarely the resulting value. Lab values can be good measures of both a patient's response to treatment and the severity of illness in certain diseases, including diabetes (HbA1c levels) and cardiovascular disease (total, LDL, and HDL cholesterol and other lipid levels). They are also increasingly valuable for diagnostic testing.

The fifth column lists a much broader set of demographic and socioeconomic variables than those found in a typical claims database. Linking retrospective data from large payer organizations to information of the sort that credit agencies collect on individuals enables a much more robust analysis of medication-refill behaviors. In particular, income and net worth provide measures of the ability of consumers to pay.


Fewer than 5 percent of commercial-health-plan members are treated with biologics. These products are mainly approved for complex conditions with limited treatment options, such as cancer or multiple sclerosis. However, the importance of biologics has been expanding rapidly, to the point where they account for half of all products in late-stage development. And many of these new drugs are targeted at conditions with much higher prevalence rates, such as diabetes, osteoporosis, and rheumatoid arthritis.

This flood of costly new biologics has payers very concerned. Despite the potential clinical benefits, it is not obvious how commercial and government payers will be able to afford them.

Health Economics Made Easy

As a starting point, drugmakers need to work with payers to help them manage their financial risk. From the payers' perspective, the real problem is how to obtain the maximum clinical benefit for each dollar spent treating a particular patient population. They do not want to pay, say, $1,000 per month for a novel biologic when a patient would fare just as well on a conventional pharmaceutical at $90 a month. On the other hand, they may very well be willing to pay $1,000 per month for a treatment that has no equally effective substitutes.

Because of their high prices, targeted therapies tend to have very high cost-effectiveness ratios when based on a patient's own healthcare utilization. However, to the extent that a new drug is highly effective compared with already available treatments, it may be possible to reduce the costs associated with nonresponse. (For example, treatment nonresponse for asthma or depression can be as high as 50 percent.) New treatments that work in formerly nonresponsive patient groups will raise overall average response rates. Thus, the most cost-effective approach from the payer's perspective is likely to be a mix of large- and small-molecule therapies.
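A toy calculation shows how a costly drug that works in nonresponders can raise a plan's overall response rate. It takes the $90 and $1,000 monthly prices used earlier in this article and a 50 percent first-line response rate. The 60 percent response rate for the second drug and the 12-month course are purely hypothetical.

```python
def stepped_strategy(n, resp_first, cost_first, resp_second, cost_second):
    """Treat n patients with a first-line drug, then switch nonresponders to a
    second drug. Returns (overall response rate, cost per responder).
    Costs are per patient per course; all inputs are hypothetical."""
    responders_first = n * resp_first
    nonresponders = n - responders_first
    responders_second = nonresponders * resp_second
    total_cost = n * cost_first + nonresponders * cost_second
    total_responders = responders_first + responders_second
    return total_responders / n, total_cost / total_responders


# 12-month courses at $90 and $1,000 per month
conventional_only = stepped_strategy(1_000, 0.50, 12 * 90, 0.0, 0)
with_targeted = stepped_strategy(1_000, 0.50, 12 * 90, 0.60, 12 * 1_000)
print(conventional_only)  # (0.5, 2160.0)
print(with_targeted)      # (0.8, 8850.0)
```

With these inputs, the overall response rate rises from 50 to 80 percent, while the cost per responder rises from $2,160 to $8,850. Whether that trade is worthwhile is exactly the judgment a payer must make.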

A targeted therapy, in this sense, does not necessarily have to be a biologic; it can even be one of those much-maligned me-too drugs. Drugs whose effectiveness is not demonstrably superior to their competitors' may still work better in certain patients. All else being equal, more product choices ought to lead to more price competition—and lower costs for payers. When clinical trials have demonstrated similar average efficacy for alternative drugs, payers are generally skeptical of claims that me-too drugs can offer value. But if there were biomarkers for a drug's or class's effectiveness, me-toos could offer much more value than they currently do because the markers would embody the scientific rationale for response differences. Biomarkers would also help doctors prescribe the right drug for a patient the first time—rather than by trial and error, which is very expensive for the payer.

The move to consumer-directed healthcare requires knowledge of the drivers of patient-buying patterns. We know remarkably little about how people make choices regarding their healthcare—and much of what we think we know is probably wrong.

In the early 1990s, commercial payers began introducing two-tier drug formularies to "guide" patients toward generic drugs rather than more expensive brand-name products. Over the years, payers experimented with how large the difference between generic and brand co-pays needed to be to change consumer behavior. Two-tier plans later added a third tier (distinguishing between preferred and nonpreferred brands), and, most recently, some payers have introduced a fourth tier for some biologic products.

Throughout this evolution in plan design, patient sensitivity to out-of-pocket co-payments has been measured mainly using the payment made for the drug actually received, rather than the prices of competitor drugs that might have been prescribed for the same condition. In other words, payers use formularies to manage prescription-drug utilization, but there are no good studies showing how much of the patient variation in prescription-refill behavior is due to drug-benefit design rather than other factors.

The most glaring omission, from an economic standpoint, is the failure to control for a patient's ability to pay as measured by factors like income or net worth. It is reasonable to assume that patients with lower incomes are more sensitive to co-pay differentials. This raises potential issues of differential access to novel treatments based on a consumer's income level. There is also growing evidence that price sensitivity is affected by the nature of the condition being treated.

A recent Health Affairs study of benefit design's effect on the use of biologics found that cancer patients were relatively insensitive to co-pay differences. Because out-of-pocket co-payments for biologics and other targeted products can run to several hundred dollars per month, it is all the more important to understand consumers' buying behavior.


As useful as retrospective data can be in estimating a product's value for cost, ultimately payers, consumers, and regulators want stronger clinical evidence. Similarly, although much can be learned from modeling refill behavior against co-payments and income, the best way to understand the reasons for medication nonadherence is to ask the patient.

The administrative data typically used to describe drug-use patterns do not contain the variables that explain what motivates patient decision-making. Did patients discontinue their medication because (a) it was too expensive, (b) they didn't feel that it was working, (c) they experienced side effects, (d) they hated feeling dependent on it, or (e) some other reason? You need to survey patients to get answers.

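[Chart: Enhanced Medical Claims Database. Enrollment data, prescription-drug claims, medical claims, laboratory results, and socioeconomic data, linked by a unique patient identifier]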

Understanding medication adherence, particularly among patients with chronic conditions, is critical not only because adherence is necessary to obtain the clinical effect but also because it matters to the drugmaker. Building patient use of a product is partly a matter of minimizing discontinuation and switching. Understanding the reasons for nonadherence therefore helps manufacturers develop strategies to support patient compliance.

The situation is complicated further by the fact that it is the physician who writes the prescription. Data on physician prescribing patterns is available, and drug companies routinely use it in their communications with doctors. But the growth of consumer-directed healthcare has increasingly involved patients in decision making. Still, little is known about the doctor–patient interaction and how it influences prescribing patterns.

Such needs for more detailed clinical and behavioral information are best met by primary data collection. As a result, the pharmaceutical industry is moving rapidly to design and implement late-phase studies that capture safety, health-economic, patient-reported-outcome, and real-world effectiveness measures. Moreover, the FDA is under increasing pressure to require manufacturers to monitor real-world drug safety after their products are approved for marketing. Such studies can range from simple product registries with no comparison group, to cohort studies validating safety signals found in medical claims, to Phase IV trials.

Use of retrospective healthcare data may make the recruitment of investigators and patients more efficient, provide a preliminary assessment of protocol feasibility, and help determine the sample size needed to detect statistically significant differences in outcomes. The data also have a variety of other uses, including determining the length of follow-up needed to detect certain endpoints, such as hospitalization risk. In fact, many features of a study protocol can be tested this way.
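For example, a plan's claims history can supply the baseline rate of an endpoint such as hospitalization, which then feeds a standard sample-size calculation. The sketch below uses the usual normal-approximation formula for comparing two proportions. The 10 percent baseline rate and 7 percent target rate are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Patients per arm to detect a difference between event rates p1 and p2
    with a two-sided test at level alpha (normal approximation)."""
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


# Hypothetical: 10% one-year hospitalization rate observed in claims, target 7%
print(n_per_group(0.10, 0.07))
```

With these inputs, the function calls for a little over 1,350 patients per arm. The same claims data can then show whether that many eligible patients are realistically available.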

Dramatic changes in the marketplace have greatly increased the need for health economics and data mining in all phases of development—especially for targeted therapies. The value proposition for such treatments needs to be carefully developed and presented to payers. Payers may be willing to pay premium prices for targeted therapies, but only if they are demonstrably safer or more effective than existing therapies.

William H. Crown is the president of i3 Innovus. He can be reached at