More Than an Algorithm: Building Medical AI

Key Takeaways

  • AI in medicine enhances early diagnosis and personalized treatment, but developing viable models is complex, requiring problem definition and data preparation.
  • Ensuring AI models are safe, validated, and explainable is crucial, with adherence to medical regulations and standards like ISO 13485 and ISO 14971.

The first step in developing a medical AI model is defining the parameters of the problem that needs to be solved.

Anna Vuolo
Head of Medical AI
Camgenium

Artificial intelligence has been used in medicine for decades in various forms, quietly revolutionizing healthcare, but only now does it have the potential to reshape the entire medical field. There are numerous ways in which AI could do this, from earlier diagnoses to personalized treatment. While the vision is compelling, the path from concept to clinically viable AI is complex. Medical AI models must be safe, validated, and trustworthy, so how are these sophisticated algorithms built?

There’s No Solution Without a Problem

The core purpose of AI is to solve problems, no matter how simple or complex. Naturally, the first step in developing a medical AI model is defining the parameters of the problem that needs to be solved. Reframing the idea as a clear, clinically grounded problem is essential before an appropriate technical strategy can be developed. Without this step, teams risk building models that perform well but are disconnected from clinical needs.

A crucial part of this phase is involving clinical experts in problem definition as they possess the domain knowledge that data scientists may lack. Their insight helps ensure the problem is truly grounded in real-world needs and that the AI solution will be relevant and practical for end users.

Not All Data Is Equal

Once the problem has been defined, the relevant data required to develop the solution must be identified and prepared. The model uses this data to learn, generalize, and make predictions, so it is important that the data is relevant, sufficient, and clean. This stage can be a surprisingly intensive process, as it can involve multiple iterations of investigation to define the parameters of the final collated dataset. Examples of questions that can be asked during this stage include:

  • Which patient populations should the data cover?
  • Across which hospital specialties will this model be used?
  • Will this model be relevant to emergency procedures, or should this data be excluded?

The reality is that clinical data is often messy and complex, and much of it requires cleaning. This includes simple checks, such as removing duplicates or identifying and handling missing data (for example, when patient age is missing from a portion of records), although many data preparation decisions are more nuanced and highly problem-specific. Consider a patient who is admitted for routine surgery and discharged the next day, but undergoes emergency surgery 10 days later and then develops complications. It may be unclear whether the complications should be attributed to the initial procedure, the emergency procedure, or a combination of both. These are difficult questions that require careful clinical input and expertise to answer appropriately.
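As a minimal sketch of the simpler checks described above, the snippet below assumes a hypothetical tabular extract with illustrative file and column names; it is not Camgenium's actual pipeline.

    import pandas as pd

    # Load a hypothetical admissions extract (names are illustrative).
    records = pd.read_csv("admissions.csv")

    # Simple check: remove exact duplicate rows, e.g. from repeated extracts.
    records = records.drop_duplicates()

    # Simple check: identify records with missing patient age.
    missing_age = records["age"].isna()
    print(f"{missing_age.sum()} of {len(records)} records are missing age")

    # One possible policy is to exclude incomplete records; whether that is
    # appropriate is a clinical judgment, not a purely technical one.
    records = records.loc[~missing_age]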

These added complexities mean data preparation can take significant time and effort. Conducting a comprehensive data audit early in the development process is therefore critical. This includes evaluating whether there is enough high-quality data to train a model, assessing for imbalances, clarifying the definition and measurability of the target outcome, and understanding any ethical or legal restrictions on data use.
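To make the audit concrete, a brief sketch continuing from the hypothetical records above might check outcome balance overall and per subgroup; the "complication" and "specialty" columns are illustrative assumptions.

    # Assess class balance for a hypothetical binary outcome column; a very
    # rare positive class complicates both training and evaluation.
    print(records["complication"].value_counts(normalize=True))

    # Check how the outcome rate and sample size vary across a hypothetical
    # subgroup column; large gaps here flag potential imbalance.
    print(records.groupby("specialty")["complication"].mean())
    print(records.groupby("specialty").size())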

Designing a Safe and Trusted Model

Designing AI for healthcare isn't just about building a model that solves a problem. It also means ensuring the system is safe and compliant with medical regulations, including the internationally recognized standards ISO 13485 and ISO 14971 for medical devices. Failing to plan from the outset to meet these regulations can lead to costly rework or outright failure to gain regulatory approval. At Camgenium, we follow IEC 62304, the standard for medical device software lifecycle processes, which establishes requirements for planning, design, and development through to testing, verification, and maintenance. For example, a Clinical Evaluation Report will document dataset parameters and model performance to provide an audit trail and facilitate regulatory approval.

One aspect of development that is often overlooked in medical AI is model explainability: the ability to understand how and why a model makes its decisions. AI models that affect patient care must be interpretable by the clinicians overseeing development, so they can understand the rationale behind predictions, trust the system's output and, if necessary, challenge it. Furthermore, all decisions made during the model development process are documented and traceable, ensuring that design choices are not guesswork but evidence-based and justifiable.

Camgenium, in partnership with C2-Ai, developed an AI risk triage system for hospital-acquired pneumonia (HAP) and acute kidney injury (AKI), two of the most common and serious complications resulting from prolonged hospital stays. The model, which has been deployed in multiple NHS Trusts, allows hospital staff to individually assess incoming patients within a couple of minutes, providing clinicians with the data needed to evaluate the level of risk each patient faces. By enabling clinicians to prioritize high-risk patients at the point of admission, the model has the potential to significantly reduce avoidable complications and shorten unnecessary hospital stays. Importantly, the model was developed in close collaboration with expert clinicians, and it adhered to regulatory standards to ensure that its outputs were safe, interpretable, and relevant.

Evidence of a Successful Model

Prior to deployment, the design, training, and optimization of the model also involve evaluating it against expected performance metrics. Particularly in medical settings where positive cases may be infrequent, a highly accurate model can still be unsafe if it fails to detect rare but critical cases. Appropriate evaluation metrics must therefore be selected to understand how well the model performs on both negative and positive cases. Additionally, the training data should be representative of population variability, and model performance should be checked on subpopulations. Any imbalances in performance across specialties, gender, age, or ethnic groups should be addressed to prevent systemic disparities.
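A minimal sketch of why accuracy alone can mislead, using scikit-learn with made-up held-out labels and predictions (all values are purely illustrative):

    import numpy as np
    from sklearn.metrics import accuracy_score, precision_score, recall_score

    # Hypothetical held-out labels and model predictions: 2 positives in 10.
    y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
    y_pred = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 0])

    # Accuracy looks strong even though half the rare positives are missed...
    print("accuracy: ", accuracy_score(y_true, y_pred))    # 0.9
    # ...which is why sensitivity (recall) and precision matter here.
    print("recall:   ", recall_score(y_true, y_pred))      # 0.5
    print("precision:", precision_score(y_true, y_pred))   # 1.0

    # Repeating the same metrics per subgroup (age band, sex, specialty)
    # exposes performance imbalances that aggregate scores can hide.
    groups = np.array(["A", "A", "A", "A", "B", "B", "B", "B", "A", "B"])
    for g in np.unique(groups):
        mask = groups == g
        print(g, "recall:", recall_score(y_true[mask], y_pred[mask]))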

Testing the model on unseen data confirms that it has learned generalized patterns rather than memorizing the exact patterns of the training data. Visualizations of model output can also help highlight when a model is performing well versus when it is producing biased or misleading results. Ultimately, a combination of visual and statistical evaluations builds a stronger picture of a model's performance.
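One common visual check is a ROC curve on held-out data; the sketch below uses matplotlib and scikit-learn with invented probabilities, as an illustration rather than a prescribed workflow.

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.metrics import roc_auc_score, roc_curve

    # Hypothetical held-out labels and predicted probabilities.
    y_true = np.array([0, 0, 1, 0, 1, 0, 1, 0, 0, 1])
    y_prob = np.array([0.1, 0.3, 0.8, 0.2, 0.45, 0.55, 0.9, 0.1, 0.5, 0.7])

    # The ROC curve shows the trade-off between sensitivity and the false
    # positive rate across decision thresholds on unseen data.
    fpr, tpr, _ = roc_curve(y_true, y_prob)
    plt.plot(fpr, tpr, label=f"AUC = {roc_auc_score(y_true, y_prob):.2f}")
    plt.plot([0, 1], [0, 1], linestyle="--", label="chance")
    plt.xlabel("False positive rate")
    plt.ylabel("True positive rate (sensitivity)")
    plt.legend()
    plt.show()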

Keeping Relevant in a Changing World

Deploying an AI model into a clinical setting isn't the end of the process. Models that perform well at launch can degrade over time as the real world shifts in ways the model wasn't trained for. This can happen through two major phenomena:

  • Data drift: When the characteristics of the input or output data change over time. For example, changes in the way symptoms are reported (e.g., due to new guidelines) could alter their distribution in the recorded data.
  • Concept drift: When the underlying relationships between inputs and outputs evolve. For instance, the adoption of a new surgical technique could change the risk profile of a procedure, invalidating the success outcomes the model previously learned to predict.

To address these challenges, monitoring is essential. This might include regular re-evaluation of performance metrics, monitoring for drift in input features, revalidating and recalibrating the model with new or updated datasets, and engaging clinicians to identify when model outputs no longer align with observed reality. Such monitoring prevents even a well-designed AI model from becoming outdated or misleading.
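As one hedged example of such monitoring, the sketch below uses a two-sample Kolmogorov-Smirnov test from SciPy to compare a single input feature between training-era and recent data; the age values are invented, and in practice drift checks would run across many features on real extracts.

    from scipy.stats import ks_2samp

    # Distribution of one input feature at training time vs. in recent
    # live data (both lists are illustrative).
    train_ages = [54, 61, 47, 70, 66, 58, 73, 49, 62, 68]
    recent_ages = [38, 45, 52, 41, 60, 36, 48, 55, 43, 50]

    # A small p-value suggests the feature's distribution has shifted,
    # i.e. possible data drift that warrants investigation.
    stat, p_value = ks_2samp(train_ages, recent_ages)
    print(f"KS statistic = {stat:.2f}, p = {p_value:.3f}")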

The Future of Medical AI

There is an abundance of new and exciting technology in the medical field, with the next wave of medical AI moving toward more integrated, personalized, and clinically embedded models. Multi-modal AI systems that can analyze and synthesize multiple types of medical data simultaneously, such as medical images, 3D scans, clinical notes, and lab results, could give richer insights than relying on a single source. This holistic view has the potential to allow for more accurate diagnoses and better-tailored treatment recommendations.

Medical AI is also driving the shift from one-size-fits-all approaches in healthcare to tailored treatment for individuals, considering genetics, medical history and environmental factors. Personalized models might help predict how a patient will respond to a particular drug, identify those at higher risk of complications, or shape preventative care strategies long before symptoms appear.

Though the field is exciting and rapidly advancing, the success of future AI systems will continue to rely on trust and clinical alignment. To be adopted at scale, medical AI must integrate seamlessly into existing workflows and, more importantly, be trusted by the people who use it.
