Defensible Data Disposal

Sep 30, 2012
By Pharmaceutical Executive


Lorrie Luellig
For pharmaceutical companies, the strategic importance of effective information governance has never been greater. Processes related to research and development, clinical trials, pharmacovigilance, drug registration, and the pharma supply chain face increasingly complex information management regulations. Changes to IP laws relating to patents and "first to file" make it imperative that companies identify and properly retain all critical patent information. Pharma companies operating globally, whether in manufacturing, R&D, or marketing, must continually adapt to the diverse and evolving legal and regulatory environments around the world. All this comes at a time when pharma companies face exploding amounts of data and ever-increasing data storage costs.

This column will explore how a cross-functional "defensible disposal" program can help companies satisfy their legal and regulatory requirements around the world while also controlling costs and meeting other research and business objectives.

Saving everything doesn't work

According to McKinsey & Company, 90 percent of data in the world today was created over the last two years. For pharma companies already burdened by the cost and complexity of the vast amounts of research data they generate, this new onslaught of information in the form of social media, RFID tagging, electronic lab notebooks, raw data, and more is far outpacing their ability to effectively collect, analyze, store, produce, archive, and delete it. As a result, many companies opt to save everything.

Pharma companies may believe there is ample justification for saving all data. Scientists may believe that by definition all research data has business value and is critical to regulatory compliance. Legal and compliance officers may believe that the safest response to the complex requirements of the FDA, FTC, SEC, IRS, and health authorities around the world is to save everything. And business users and executives may believe that saving everything is a thrifty way to keep a permanent record of business activities while also reducing risk.

Unfortunately, none of this is true. A huge portion of stored research data is redundant. Storing it all makes it harder for scientists to find the data they need when they need it, and makes it more difficult to extract new results from old data. In addition, both positioning a company to respond effectively to an e-discovery request and complying with new regulations related to privacy (e.g., HIPAA in the United States and the European Directive on Protection of Personal Data in the European Union) require companies to delete some data. Companies must realize that the supposed safe harbor of "saving everything" can actually put them in legal jeopardy and at risk of regulatory violations and penalties.

Storing data isn't cheap

A key justification for saving everything is the misconception that storage is relatively cheap and that constantly investing in new storage infrastructure won't impact the bottom line. But the McKinsey & Company research showing an overall annual data growth rate of 40 percent means that companies that stored 15 petabytes of data in 2011 will need to find space for some 39 petabytes by the end of 2014. Even with a 20 percent decline in storage unit costs, the per-petabyte cost of tier-one storage for most large companies will likely swell to between $1.5 million and $5 million, consuming close to 20 percent of the typical IT budget.
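
The growth projection can be checked with a short compounding calculation. The sketch below is illustrative only: the function name is hypothetical, and it assumes the 40 percent rate compounds once a year over the three years from 2011 to the end of 2014. Under those assumptions the result is about 41 petabytes, in the same range as the figure of some 39 petabytes quoted above, which presumably rests on slightly different assumptions.

```python
def projected_petabytes(start_pb: float, annual_growth: float, years: int) -> float:
    """Compound a starting data volume by a fixed annual growth rate.

    Assumption: growth compounds once per year at a constant rate.
    """
    return start_pb * (1 + annual_growth) ** years


# Figures from the text: 15 PB stored in 2011, 40 percent growth,
# three years of growth to reach the end of 2014.
projection = projected_petabytes(15, 0.40, 3)
print(f"{projection:.1f} PB")  # prints "41.2 PB"
```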

But deleting the right data isn't easy

Clearly, data that has no legal, regulatory, research, or business value should be deleted. But who is in a position to delete it? Only IT has the power to perform the physical disposal of electronic information, but on its own, IT has no way to determine what is of value. In fact, even in a relatively small pharma company, IT may need to know which of 100 legal holds and 300 record categories apply to which of 10,000 people working in which of 2,000 departments whose data is located on which of 1,000 servers or apps. Multiplied together, that's hundreds of trillions of possible choices with no mechanism for making good ones.
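
To give a sense of the scale, the figures in the example above can simply be multiplied together. This is a rough sketch, not a model of any real system: the dictionary keys are hypothetical labels, and it assumes each dimension varies independently of the others, which overstates the number of combinations that would occur in practice. Even so, however the total is counted, it runs far beyond what IT could review by hand.

```python
from math import prod

# Counts taken from the example in the text; the labels are illustrative.
dimensions = {
    "legal_holds": 100,
    "record_categories": 300,
    "people": 10_000,
    "departments": 2_000,
    "servers_or_apps": 1_000,
}

# Assumption: every dimension is independent, so the raw number of
# combinations is the product of the individual counts.
combinations = prod(dimensions.values())
print(f"{combinations:,}")  # prints "600,000,000,000,000"
```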

But how can scientists and business users determine what is of legal or regulatory value? How can legal determine what is of scientific or business value? And even if these determinations can be made, how can they be communicated to IT?