Transforming Drug Safety Through Artificial Intelligence and Large Language Models
Large language models and natural language processing are reshaping drug safety surveillance by enabling automated adverse event detection, large-scale analysis of regulatory labeling data, and faster, citation-grounded safety assessments while maintaining human oversight and regulatory compliance.
Introduction
Adverse drug reactions are a major cause of mortality, making the effective monitoring of drug safety a critical priority. For decades, this process has relied on manual, time-consuming, and expensive expert review.
Today, artificial intelligence (AI), particularly large language models (LLMs) and natural language processing (NLP), is revolutionizing the analysis of pharmaceutical safety data. These technologies amplify human expertise, enabling regulatory scientists and researchers to extract critical safety insights with unprecedented speed and accuracy.1
This article explores the practical applications, real-world implementations, and the current state of AI in drug safety, providing essential knowledge for professionals in an increasingly AI-driven industry.
The Challenge: Why AI Matters in Drug Safety
The FDA manages a database of over 140,000 drug labeling documents, which are constantly updated with new safety information.2 With 82% of American adults taking at least one medication, the sheer scale of this data necessitates automated solutions.
Traditional methods of safety data extraction fall into three categories, each with significant drawbacks. Manual annotation requires highly trained experts and is slow, expensive, and prone to inconsistency.
Keyword matching and pattern-based systems are faster but inflexible, often missing varied phrasing for the same medical concept. Traditional machine learning requires extensive manual preparation of training data and struggles with evolving terminology.
These limitations create what researchers call "the data gap": critical safety information locked in unstructured text that cannot be easily accessed or analyzed at scale.3
How Large Language Models Work in Drug Safety
Large language models (LLMs), built on the transformer architecture, represent a paradigm shift in text processing. Unlike traditional approaches, LLMs learn contextual relationships between words across entire documents, enabling a deeper understanding of complex medical language.4
Several domain-specific models have been developed for pharmaceutical applications:
- PharmBERT is a BERT-based model pre-trained specifically on drug labeling documents. In a recent study, PharmBERT achieved 89% accuracy in identifying adverse events from drug labels, outperforming both human review and generic BERT models.
- RxBERT was developed by the FDA's National Center for Toxicological Research. Trained directly on the FDA's corpus of 44,990 human prescription drug labeling documents, RxBERT achieved an F1-score of 86.5% in adverse event classification and 87% accuracy in labeling section classification.5
- AskFDALabel combines retrieval-augmented generation (RAG) with locally hosted large language models, allowing regulatory scientists to query FDA labeling documents in natural language. The system achieved an F1-score of 0.978 for drug-induced liver injury (DILI) classification and 0.931 for drug-induced cardiotoxicity (DICT) classification.6
These implementations demonstrate remarkable real-world performance: 91.1% accuracy in adverse event extraction, 99.4% accuracy in identifying DILI risk, and 99.5% accuracy for cardiotoxicity classification.6 Generative models such as GPT-4 now match BERT-based approaches for drug-drug interaction identification without requiring custom training data.7
Systems trained on FDA labels also perform comparably when evaluating European drug safety data.8 A manual process that once took two to three weeks can now be completed in hours, with improved consistency across all 140,000+ labels.
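The F1-scores and accuracies cited above all derive from the same confusion-matrix arithmetic. As a minimal sketch, the counts below are invented for illustration, not drawn from any of the cited benchmarks:

```python
# Compute precision, recall, and F1 from confusion-matrix counts.
# The example counts for a hypothetical DILI classifier are invented.

def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Derive precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# Hypothetical classifier evaluated on a small benchmark of labels.
p, r, f1 = precision_recall_f1(tp=95, fp=3, fn=2)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

Reporting F1 alongside raw accuracy matters in this domain because, as discussed later, severe adverse events are rare and a classifier can score high accuracy while missing them entirely.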
Real-World Applications in 2025/26
- Automated Adverse Event Detection and Classification. AI systems can query drug label databases for reported adverse events, categorize results with citations to specific label sections, identify novel safety signals across multiple drugs, and flag potentially severe reactions for expert review. In one practical example, a pharmacovigilance team used AskFDALabel to assess whether a cardiovascular medication might cause renal injury. The system identified both explicitly stated renal effects and subtler indicators—such as elevated creatinine and reduced glomerular filtration rate—providing exact label citations for immediate verification.9
- Rapid Packaging and Label Evaluation. AI-powered sentiment analysis and machine learning are being used to analyze customer reviews and complaints, predict patient attention patterns on medication labels, and detect design elements that improve the likelihood that patients will read critical safety information. One study applied deep learning to enhance label design for over-the-counter medications, predicting attention patterns with 90% accuracy.
- Drug Property Prediction and Comparative Analysis. AI systems can extract and compare indications, contraindications, dosages, and precautions across a portfolio of products. This capability is especially powerful for comparing drugs across regulatory jurisdictions, where different terminology may describe identical safety concerns. AI systems trained on multiple-label databases can recognize these equivalences, enabling researchers to identify safety signals across markets that might otherwise remain isolated.
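The citation-grounded lookup described in the first bullet can be sketched in miniature. Everything here is invented for illustration (the label text, the drug scenario, and the hand-written synonym map); a production system would use an LLM or a controlled vocabulary such as MedDRA rather than string matching:

```python
# Toy citation-grounded adverse-event lookup over structured label text.
# Label sections and synonym list are hypothetical.

# Hypothetical drug label, keyed by labeling section.
LABEL = {
    "Warnings and Precautions": "Elevated serum creatinine and reduced "
                                "glomerular filtration rate have been observed.",
    "Adverse Reactions": "Common reactions include headache and nausea.",
}

# Hand-written synonym map linking one safety concept to surface phrases.
RENAL_INJURY_TERMS = ["renal injury", "elevated serum creatinine",
                      "reduced glomerular filtration rate"]

def find_citations(label: dict[str, str], terms: list[str]) -> list[tuple[str, str]]:
    """Return (section, matched term) pairs so an expert can verify each hit."""
    hits = []
    for section, text in label.items():
        for term in terms:
            if term.lower() in text.lower():
                hits.append((section, term))
    return hits

for section, term in find_citations(LABEL, RENAL_INJURY_TERMS):
    print(f"{term!r} found in section: {section}")
```

The key design point mirrors the AskFDALabel example above: every finding carries an exact label-section citation, so a reviewer can verify it immediately rather than trusting the system's summary.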
Implementation in Regulatory Environments
Data security is a primary concern for regulatory agencies, driving the development of locally hosted LLM solutions. AskFDALabel, for example, runs on in-house GPU servers using open-source models such as Llama 3.1-70B, ensuring that drug-labeling data never leaves FDA systems.8
This approach provides data protection, customization, reproducibility, and compliance with audit trail requirements. While training domain-specific models requires significant GPU resources (230+ computing hours for comprehensive pre-training), inference is increasingly practical on standard enterprise hardware.10
AI is being integrated into existing workflows through template-based query systems, retrieval-augmented generation (RAG) for grounding outputs in actual regulatory documents, and human-in-the-loop validation to ensure expert oversight.8 Organizations that have successfully implemented these systems report that phased rollout, staff training, quality management, and change management are essential.
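The retrieval-augmented-generation pattern mentioned above can be sketched as follows. This is a deliberately crude illustration: the passages and query are invented, and the token-overlap score stands in for the dense-embedding retrieval a real system would use before handing the retrieved context to a locally hosted LLM:

```python
# Minimal RAG sketch: retrieve the most relevant label passage for a
# question, then ground the model prompt in that passage. The passages,
# query, and scoring function are simplified placeholders.

def score(query: str, passage: str) -> int:
    """Crude relevance score: number of shared lowercase tokens."""
    return len(set(query.lower().split()) & set(passage.lower().split()))

def retrieve(query: str, passages: list[str]) -> str:
    """Return the highest-scoring passage for the query."""
    return max(passages, key=lambda p: score(query, p))

passages = [
    "Hepatotoxicity: cases of severe drug-induced liver injury reported.",
    "Dosage and administration: take once daily with food.",
]
query = "Does this drug carry a liver injury risk?"
context = retrieve(query, passages)

# The retrieved passage is inserted into the prompt so the LLM's answer
# stays grounded in actual label text; a human expert reviews the output.
prompt = f"Answer using only this label excerpt:\n{context}\n\nQuestion: {query}"
print(context)
```

Grounding the prompt in retrieved document text, rather than relying on the model's parametric memory, is what makes the human-in-the-loop validation step tractable: the reviewer checks a cited passage, not the model's entire training corpus.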
One FDA-based implementation found that reviewing 211 drugs for DILI risk using AskFDALabel took approximately 15 minutes, compared to an estimated 40–60 hours for manual review.8
Emerging Capabilities and Future Directions
Future systems will integrate data from multiple sources—electronic health records, clinical trial data, published literature, and real-time pharmacovigilance data from spontaneous reporting systems—enabling more comprehensive safety assessments.11 Beyond identifying correlations, future AI will address causality, determining whether an adverse event is caused by a drug or by the patient's underlying condition.
Columbia University researchers are developing methods to classify adverse reactions as causally associated versus coincidental using AI-assisted analysis.12 Proactive signal detection, by monitoring shifts in adverse event frequencies, new safety information from different regions, and patterns across structurally similar compounds, could accelerate identification of safety problems before they affect large populations.13
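One classical statistic used for the kind of frequency-shift monitoring described above is the proportional reporting ratio (PRR). The article does not name this method, and the counts below are invented, but it illustrates how disproportionate reporting of an event for one drug can be quantified for follow-up review:

```python
# Proportional reporting ratio (PRR) over spontaneous-report counts.
# A standard pharmacovigilance statistic; the counts are hypothetical.

def prr(a: int, b: int, c: int, d: int) -> float:
    """PRR = [a/(a+b)] / [c/(c+d)]
    a: reports of the event with the drug of interest
    b: reports of other events with the drug of interest
    c: reports of the event with all other drugs
    d: reports of other events with all other drugs
    """
    return (a / (a + b)) / (c / (c + d))

# Hypothetical counts: the event appears far more often, proportionally,
# for the drug of interest than for the rest of the database.
value = prr(a=30, b=970, c=100, d=99900)
print(f"PRR = {value:.1f}")  # values well above 2 are commonly flagged for review
```

In practice such statistics generate candidate signals, not conclusions; each flagged drug-event pair still goes to expert review, consistent with the human-oversight theme throughout this article.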
Practical Considerations and Limitations
Despite their capabilities, current AI systems have important limitations. Context dependency means systems may misinterpret safety information when drugs are used in novel ways or in underrepresented patient populations.
Class imbalance makes rare adverse events harder to detect; specialized techniques like SMOTE help but do not fully solve this problem.14 Explainability remains a challenge, as some stakeholders require full transparency into how "black box" deep learning models reach conclusions.15
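The SMOTE idea mentioned above can be sketched in a few lines: synthesize new minority-class points by interpolating between a minority sample and a nearby minority neighbor. This is a simplified illustration, not the reference implementation (which lives in libraries such as imbalanced-learn), and the data points are invented:

```python
import random

# SMOTE-style oversampling sketch: interpolate between minority samples
# to synthesize new ones. Simplified for illustration.

def nearest_neighbor(point, others):
    """Return the closest other minority point (squared Euclidean distance)."""
    return min(others, key=lambda q: sum((a - b) ** 2 for a, b in zip(point, q)))

def smote_like(minority, n_new, rng=None):
    """Generate n_new synthetic minority samples by interpolation."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        p = rng.choice(minority)
        q = nearest_neighbor(p, [m for m in minority if m != p])
        t = rng.random()  # position along the segment from p toward q
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(p, q)))
    return synthetic

# Three hypothetical minority-class points (e.g., rare adverse-event cases).
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1)]
new_points = smote_like(minority, n_new=4)
print(len(new_points), "synthetic samples generated")
```

Because the synthetic points lie between real minority samples rather than duplicating them, the classifier sees a denser minority region without exact repeats; but as the text notes, this mitigates rather than solves the rare-event problem.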
Cost and computational requirements, including GPU infrastructure, data science expertise, and ongoing model maintenance, are also significant considerations. Any AI system used in regulatory science must undergo rigorous validation, including testing against benchmark datasets, cross-validation, error analysis, and continuous monitoring.
For drug safety applications, this validation must meet regulatory standards with thorough documentation, audit trails, and periodic revalidation.10
Best Practices for Implementation
For regulatory professionals, it is recommended to start with a pilot project on a specific task, establish performance baselines before implementing AI, invest in team training, maintain human oversight, and document all AI system versions, outputs, and validation results.
For organizations building these systems, key practices include using domain-specific training data (generic LLMs significantly underperform compared to those trained on pharmaceutical text),7 implementing local hosting for regulatory use, designing for interpretability with citations and confidence scores, planning for rigorous validation, and building continuous feedback loops based on expert input.
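The documentation practice recommended above (recording model versions, outputs, and validation results) can be as simple as writing an auditable record per evaluation run. The field names and scores below are placeholders for whatever an organization's quality system requires:

```python
import datetime
import json

# Audit-trail sketch: serialize one validation run so results are
# reproducible and reviewable. Field names and values are illustrative.

def log_validation_run(model_version: str, dataset: str, metrics: dict) -> str:
    """Serialize one validation run as a JSON audit-trail record."""
    record = {
        "model_version": model_version,
        "dataset": dataset,
        "metrics": metrics,
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }
    return json.dumps(record, sort_keys=True)

entry = log_validation_run("ae-classifier-v1.2", "dili-benchmark-2025",
                           {"f1": 0.93, "accuracy": 0.95})
print(entry)
```

Keeping these records machine-readable makes the periodic revalidation described earlier straightforward: rerun the benchmark, append a new record, and diff the metrics against the baseline.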
Conclusion
Artificial intelligence and large language models are fundamentally changing how the pharmaceutical industry approaches drug safety. What once required weeks of manual expert review now takes hours, with improved consistency across thousands of documents.
These are operational realities in 2025: FDA scientists are using these tools, pharmaceutical companies are deploying them, and regulatory reviewers are incorporating AI-generated safety assessments into their decision-making. The transition requires investment in infrastructure, training, and validation. Regulators must ensure AI systems meet rigorous standards, and scientists must maintain oversight and critical judgment.
But the trajectory is clear: the future of drug safety lies in combining AI’s capacity for scale and consistency with human expertise in judgment and oversight. That combination, properly implemented, will make medications safer for the billions of patients who depend on them.
About the Authors
Partha Anbil works at the intersection of the Life Sciences industry and Management Consulting. He is currently SVP, Life Sciences, at Coforge Limited, a $1.7B multinational digital solutions and technology consulting services company. He has held senior leadership roles at WNS, IBM, Booz & Company, Symphony, IQVIA, KPMG Consulting, and PwC. Mr. Anbil has consulted with and counseled Health and Life Sciences clients on structuring solutions to address strategic, operational, and organizational challenges. He was a member of the IBM Industry Academy, a highly selective, invitation-only group and one of the highest honors at IBM. He is a healthcare expert member of the World Economic Forum (WEF). He is also a Life Sciences industry advisor at MIT, his alma mater.
Niraj B. Patel is a technology executive and AI strategist with over 25 years of experience driving digital transformation and AI integration across financial services, real estate, fintech, and life sciences sectors. He has held senior leadership roles including CIO and Chief AI Officer at Greystone, President of AI, Analytics, and Platforms at DMI, and CIO at IBM's lending platforms. His work has earned industry recognition, including the Best AI Implementation in Commercial Real Estate from RealComm, and the InfoWorld CTO 25 and CIO 100 Awards. A Temple University graduate with degrees in Finance and MIS, Niraj completed the Wharton Advanced Management Program. He has taught AI and Digital Business Strategy at the Fox School of Business, where he mentored MBA students on practical AI implementation and governance. His cross-industry perspective brings valuable insights to life sciences organizations navigating AI industrialization, regulatory compliance, and sustainable capability building.
References
1. Wu, L., Fang, H., Qu, Y., Xu, J., & Tong, W. (2025). Leveraging FDA labeling documents and large language models to enhance annotation, profiling, and classification of drug adverse events with AskFDALabel. Drug Safety, 48, 655–665. https://doi.org/10.1007/s40264-025-01520-1
2. Wu, L., Gray, M., Dang, O., Xu, J., Fang, H., & Tong, W. (2024). RxBERT: Enhancing drug labeling text mining and analysis with AI language modeling. Experimental Biology and Medicine, 248, 1937–1943. https://doi.org/10.1177/15353702231220669
3. Gísladóttir, Ú. O. (2025). The data gap: Challenges in extracting and utilizing drug safety information. In Leveraging large language models to enable drug safety research (pp. 20–23). Columbia University.
4. Vaswani, A., et al. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.
5. Wu, L., Gray, M., Dang, O., Xu, J., Fang, H., & Tong, W. (2024). RxBERT model pretraining and evaluation. Experimental Biology and Medicine, 248, 1938–1940.
6. Wu, L., Fang, H., Qu, Y., Xu, J., & Tong, W. (2025). AskFDALabel framework and implementation. Drug Safety, 48, 656–665.
7. Gísladóttir, Ú. O. (2025). Generative language models in drug safety information extraction. In Leveraging large language models to enable drug safety research (Chapter 3). Columbia University.
8. Wu, L., Fang, H., Qu, Y., Xu, J., & Tong, W. (2025). FDA drug labeling documents: An invaluable resource. Drug Safety, 48, 655.
9. Wu, L., Fang, H., Qu, Y., Xu, J., & Tong, W. (2025). AskFDALabel application example. Drug Safety, 48, 658–665.
10. Wu, L., Gray, M., Dang, O., Xu, J., Fang, H., & Tong, W. (2024). RxBERT model pretraining infrastructure and requirements. Experimental Biology and Medicine, 248, 1939.
11. Gísladóttir, Ú. O. (2025). Multi-source data integration for drug safety research. In Leveraging large language models to enable drug safety research (Chapter 5). Columbia University.
12. Gísladóttir, Ú. O. (2025). Identifying causal relationships in adverse reactions. Leveraging large language models to enable drug safety research, Chapter 4.
13. Wu, L., Fang, H., Qu, Y., Xu, J., & Tong, W. (2025). Advanced AE surveillance and future applications. Drug Safety, 48, 664.
14. Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321–357.
15. Bilska, M. (2022). Model explainability and performance assessment. In Extracting drug indications from structured product labels using deep learning techniques (pp. 45–67). University of Groningen.
Disclaimer: The views expressed in the article are those of the authors and not of the organizations they represent.