AI-Powered De-Identification and Data Analysis

Anita Kawatra discusses advancing clinical and pharmaceutical progress while protecting patient privacy.

We live in unprecedented times for healthcare advances and life sciences discovery. The primary driver, of course, is digital technology, but the real fuel for today’s scientific breakthroughs is the data being processed: most significantly, de-identified patient health information, including clinical notes, test results, pathology reports and other valuable intelligence from electronic health records (EHRs). Within this data is the knowledge that, once extracted, generates insights that inspire innovations, leading to improved patient care and setting the stage for even more groundbreaking innovations in the future. Yet this vital knowledge has historically been inaccessible, until now.

What changed? In simple terms, the introduction of new, advanced technologies capable of both rapidly curating siloed, unstructured and previously “uncomputable” biomedical data and removing identifying patient markers — since protected health information (PHI) is rightfully protected by privacy regulations. Previously, the only way to accomplish all this was through manual intervention, a process so slow and cumbersome we were only able to scratch the surface of the knowledge locked within the world’s vast treasury of EHRs. Finding a way to efficiently and effectively keep patient data private while making it accessible for analysis has been one of the greatest obstacles preventing dramatic healthcare advances.

Now, by employing the latest artificial intelligence (AI)-enabled curation and de-identification technologies, scientists can include the collective wisdom from thousands, if not millions, of EHRs in their research, using data from real-world patient outcomes to cast a bright new light on disease diagnosis and pharmaceutical development.

Clearer paths to action

Among the most common challenges in healthcare are hidden disease and elusive diagnoses, either because a patient does not yet present visible symptoms or current symptoms do not indicate a clear cause. Leading-edge algorithms built upon the knowledge gleaned from de-identified patient records can advance diagnoses by months, even years, by looking at past data and drawing conclusions from patterns that would not be obvious in the course of human analysis. What’s more, this data can illuminate clearer paths for clinical action and treatment that improve patient quality of life and outcomes, while also expanding opportunities for pharmaceutical companies to extend and develop therapeutics that benefit patients everywhere.

Pulmonary hypertension (PH), for example, is a chronic and life-threatening condition whose symptoms are often mistaken for those of other common conditions, or missed altogether. Diagnosis typically takes place in the later stages of the disease, when it can only be identified through difficult-to-obtain tests and treated with costly and invasive procedures. Through an AI-powered retrospective analysis of patient electrocardiogram (ECG) records, however, researchers at nference and Mayo Clinic were able to identify previously unseen signs of PH and then create a screening tool that identifies at-risk patients from easily obtainable ECGs. Using this technology, a clinician can identify and predict the onset of PH in a patient an average of 18 months after the patient first presents with symptoms, instead of the four to six years that have been the norm.

The consequences of this advance for millions of people cannot be overstated. It also serves as a powerful example of the potential impact of analyzing de-identified EHRs with AI: By applying new diagnostic algorithms built from this knowledge at the point of care, physicians can predict with greater accuracy what could be affecting a patient’s health, even if they don’t see any indicators. This empowers doctors to choose the right treatments for patients sooner than ever before, and an earlier diagnosis makes those treatments more effective.

Faster, more effective drug development

Predicting and diagnosing disease is just one aspect of how AI is transforming healthcare. Algorithms based on de-identified EHRs can also help pharmaceutical companies accelerate drug discovery and development and decide which drugs should or should not be studied in clinical trials, speeding time to market while enhancing regulatory compliance.

Traditionally, this process has been driven by subjective analysis. Such a scattershot approach, coupled with an estimated 1.8 million scientific articles published each year, makes it challenging for any one team or organization to make informed, data-driven decisions about which targets and therapeutics to pursue. Synthesizing the overwhelming volume of data available and accurately conducting all the real-world research involved in ascertaining whether a drug is worth the next step in clinical trials is simply impossible. Thus, pharma firms have been more reliant on trial and error than they would like, costing time and money and holding back advances, while raising healthcare costs for everyone.

AI-enabled data synthesis, de-identification and curation are changing all this. By unlocking the knowledge in the world’s vast treasuries of biomedical information, pharmaceutical companies can now connect the dots between previously incompatible and unavailable data to provide real-world evidence in real time, delivering on-demand answers to specific research questions and accelerating the development of pharmaceuticals while identifying new opportunities for drugs that may already be in the pipeline.

Again, the key element at the heart of this newfound power is automated de-identification technology centered around data and patient privacy. That privacy is central to ensuring that patient data remains secure, while also making available the data that is vital to the work researchers and clinicians are doing every day. There can be no cutting corners in this area. “In partnership with nference, we have developed a de-identification approach that takes patient privacy to the next level,” says John Halamka, MD, president, Mayo Clinic Platform. “We are using a multi-layered defense referred to as ‘data behind glass,’ which means that the de-identified data is stored in an encrypted container, always under control of Mayo Clinic Cloud. This prevents merging the data with other external data sources, so no bad actor can take the data and attempt to re-identify it. At Mayo Clinic, the patient always comes first, so we have committed to continuously adopt novel technologies that keep information private.”

To the next level

Strategic partnerships between researchers, clinicians and technology developers are now showing what the future of data-driven research can look like, but these industries have only scratched the surface of what the world’s collective biomedical wisdom can bring to the table.

Leaders in the de-identification technology space are constantly looking at ways to improve how we apply technology and data security to science and medicine, and that is ultimately what will move us forward. The best data and technology in the world are worth nothing without people (whether physicians, research biologists or computer scientists) who are invested in approaches that blend technology and bioscience to push medical understanding and insights forward. Together we are transforming both the business and the benefits of healthcare in every meaningful way, advancing diagnoses and treatments by months, even years, and helping thousands of patients faster while saving countless research hours and billions of dollars in investment.

Anita Kawatra is Chief Corporate Affairs Officer, nference.