Human–AI collaboration in radiology

AI models should complement clinicians rather than replace them. Here is what recent research on human–AI teams in radiology, together with our conversations with radiologists, tells us about building an intelligent layer for collaboration.

Most AI models simply discover and memorize patterns from their training data, with no notion of the humans they will ultimately work alongside. Our view is that these models should complement clinicians rather than replace them, a point reinforced by recent work on human–AI collaboration in radiology.[1]

Complementary teams outperform either alone

Kocak and Cuocolo show conceptually that AI and humans working in silos each make diagnostic errors the other does not, and that a well-designed human–AI team can therefore significantly outperform either one alone (Figure 1 of the paper).[1] Capturing that gain depends on placing AI in the workflow strategically, in the specific spots where it complements the radiologist. Figure 3 of the same work lays out a range of radiology workflows where this can happen.[1]
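The complementarity argument can be illustrated with a small arithmetic sketch. The case counts and error sets below are invented purely for illustration and are not taken from the paper. `ideal_team_errors` is a hypothetical helper that assumes an idealized team which reaches the right answer whenever at least one of the two parties does, so it gives an upper bound rather than a model of a real workflow.

```python
def accuracy(n_cases: int, errors: set[int]) -> float:
    """Fraction of cases read correctly, given the set of misread case ids."""
    return (n_cases - len(errors)) / n_cases


def ideal_team_errors(human_errors: set[int], ai_errors: set[int]) -> set[int]:
    """Cases an idealized team still gets wrong.

    Assumption: the team is correct whenever either the human or the AI is
    correct, so it errs only where both err (the overlap of the error sets).
    """
    return human_errors & ai_errors


# Toy numbers, invented for illustration only.
N_CASES = 100
human_errors = set(range(0, 10))  # radiologist misreads cases 0..9
ai_errors = set(range(5, 17))     # model misreads cases 5..16

team_errors = ideal_team_errors(human_errors, ai_errors)

human_acc = accuracy(N_CASES, human_errors)
ai_acc = accuracy(N_CASES, ai_errors)
team_acc = accuracy(N_CASES, team_errors)

print(f"human {human_acc:.2f}, AI {ai_acc:.2f}, ideal team {team_acc:.2f}")
```

The smaller the overlap between the two error sets, the larger the potential gain. If the errors coincided completely, even this idealized team could do no better than its members.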

What we are building at Nambi

At Nambi AI, we are building an intelligent layer to make that kind of human–AI collaboration effective. One signal we rely on is uncertainty, which we demonstrate in our recent blog post[2] and which is especially useful in the triage/worklist-prioritization and image-interpretation workflows from Figure 3 of the paper.[1] Used alongside model predictions, uncertainty estimates enable more effective triaging of cases and review of AI model outputs. Our current goal is to build tools that help clinicians systematically review model outputs and, by surfacing failure patterns, develop a clear picture of the model’s error boundaries.[3]
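As a rough illustration of how an uncertainty signal might feed a worklist, here is a minimal sketch. It is not our implementation: it uses binary predictive entropy as a stand-in uncertainty measure, and the names `build_worklist`, `p_abnormal`, and `entropy_threshold`, as well as the threshold value, are assumptions made for this example.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy in bits of a binary prediction with probability p (0 at p=0 or 1)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def build_worklist(cases, entropy_threshold=0.9):
    """Order cases for reading and flag uncertain model outputs.

    cases: iterable of (case_id, p_abnormal) pairs, where p_abnormal is the
    model's predicted probability of an abnormal finding (hypothetical input).
    entropy_threshold: assumed cut-off at or above which an output is flagged.
    """
    rows = []
    for case_id, p_abnormal in cases:
        uncertainty = binary_entropy(p_abnormal)
        rows.append({
            "case_id": case_id,
            "p_abnormal": p_abnormal,
            "uncertainty": uncertainty,
            "flag_for_review": uncertainty >= entropy_threshold,
        })
    # Likely-abnormal cases are read first.
    rows.sort(key=lambda r: -r["p_abnormal"])
    return rows


worklist = build_worklist([("case-a", 0.05), ("case-b", 0.97), ("case-c", 0.55)])
for row in worklist:
    print(row["case_id"], round(row["uncertainty"], 2), row["flag_for_review"])
```

The prediction sets the reading order, while the uncertainty flag marks outputs that deserve a closer look before the radiologist relies on them. A real system would likely use a richer uncertainty estimate than the entropy of a single probability.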

Lessons from a kidney imaging workflow

These priorities are grounded in what we hear from radiologists in practice. In one kidney imaging workflow we looked at, an in-house segmentation model had cut the time to process a case from a full day to roughly ten minutes. That gain matters, but the more interesting lessons were about collaboration. The model was an optional add-on rather than a requirement, and how much radiologists leaned on it varied by their goal: for volume estimation, for instance, perfectly precise segmentation was not essential, so some errors were reasonable to leave in place. Other failures mattered far more, such as weak performance on specific labels like identifying the largest cyst in the right kidney, and these only came to light after the tool had been in use for a while.

Beyond surfacing such failure patterns, the radiologists pointed to needs that go to the heart of human–AI collaboration: adaptive elicitation of relevant clinical context for systematic review of model output, and a clearer baseline of their own error patterns to weigh against the model’s. Designing for that, rather than for standalone accuracy alone, is what we mean by building an intelligent layer for collaboration.
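The earlier observation about volume estimation can be made concrete with a toy sketch. The masks and voxel size below are invented for illustration and do not come from the workflow we looked at; `volume_ml` and `dice` are hypothetical helper names, with volume taken as a simple voxel count times an assumed voxel volume.

```python
def volume_ml(mask, voxel_volume_mm3):
    """Volume of a binary mask in millilitres (1 ml = 1000 mm^3)."""
    return sum(mask) * voxel_volume_mm3 / 1000.0


def dice(mask_a, mask_b):
    """Dice overlap between two equal-length binary masks (1.0 if both empty)."""
    intersection = sum(a & b for a, b in zip(mask_a, mask_b))
    total = sum(mask_a) + sum(mask_b)
    return 2.0 * intersection / total if total else 1.0


# Toy flattened masks, invented for illustration: the prediction is the
# reference shifted by one voxel.
reference = [0, 0, 1, 1, 1, 1, 0, 0]
predicted = [0, 1, 1, 1, 1, 0, 0, 0]
VOXEL_VOLUME_MM3 = 2.0  # assumed voxel size

volume_error_ml = abs(
    volume_ml(predicted, VOXEL_VOLUME_MM3) - volume_ml(reference, VOXEL_VOLUME_MM3)
)
overlap = dice(reference, predicted)

print(f"volume error: {volume_error_ml} ml, Dice: {overlap:.2f}")
```

Here the shifted mask overlaps the reference imperfectly but contains the same number of voxels, so a count-based volume is unaffected. A task that depends on exact boundaries or on one specific label would not be as forgiving, which is why the acceptable error depends on the radiologist's goal.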

Beyond reliability: jointly optimizing human–AI systems

Reliability is where we are starting at Nambi, but it is only one part of improving human–AI collaboration. The broader problem has been studied extensively: Vodrahalli et al., for example, found evidence supporting the joint optimization of human–AI systems, and showed that an AI algorithm that is optimal in isolation may not be optimal for human use.[4] Collecting the data needed for that kind of joint optimization, as Vodrahalli et al. did,[4] is a direction we are keen to explore.

Stakes and autonomy

The question of stakes and autonomy is just as important. A recent frontier model card notes:[5]

“Importantly, we find that when used in an interactive, synchronous, ‘hands-on-keyboard’ pattern, the benefits of the model were less clear. When used in this fashion, some users perceived [our model] as too slow and did not realize as much value. Autonomous, long-running agent harnesses better elicited the model’s coding capabilities.”

In healthcare, the equivalent of fully hands-off autonomy (Figure 2 of Kocak and Cuocolo[1]) will not be possible, because the stakes are too high. Thinking Machines Lab is prioritizing human–AI collaboration for general-purpose models;[6] we are doing the same for clinical AI models, and as we work with more radiologists we are excited to discover and develop methods that empower them.

References