Conditional Diffusion Models are Medical Image Classifiers that Provide Explainability and Uncertainty for Free

McGill University, Mila - Quebec AI Institute
In-Review

*Indicates Equal Contribution
Architecture

Diffusion classifiers extract implicit classifiers from conditional diffusion models. First, a sample, \( \boldsymbol{x} \), is corrupted with noise \( \boldsymbol{\epsilon}_k \) at a randomly chosen noise level \( \lambda_k \), giving the noised sample \( \boldsymbol{z}_\lambda \). The noised sample is then denoised by the diffusion network under each possible conditioning input, \( c_j \). The conditioning variable \( c_j \) that yields the denoised output, \( \hat{\boldsymbol{x}}_\theta (\boldsymbol{z}_\lambda, c_j) \), with the smallest reconstruction error is selected as the predicted class. This process is repeated over a set of \( N \) noise levels \( (\boldsymbol{\epsilon}, \lambda) \), and the reconstruction errors are aggregated (e.g., by averaging or majority voting) for a more accurate prediction.
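As a concrete illustration, the sketch below implements this scoring loop in PyTorch-style code. It assumes a generic conditional denoiser denoise(z, noise_level, c) that returns the reconstruction \( \hat{\boldsymbol{x}} \); the function name, signature, and the (alpha, sigma) noise parameterization are assumptions for illustration, not the paper's actual implementation.

import torch

def diffusion_classify(x, denoise, num_classes, noise_levels, aggregate="majority"):
    """Pick the class whose conditioning yields the smallest reconstruction error."""
    votes = torch.zeros(num_classes)
    summed_err = torch.zeros(num_classes)
    for alpha, sigma in noise_levels:                 # N sampled noise levels
        eps = torch.randn_like(x)                     # epsilon_k
        z = alpha * x + sigma * eps                   # noised sample z_lambda
        errs = torch.stack([((denoise(z, (alpha, sigma), c) - x) ** 2).mean()
                            for c in range(num_classes)])
        summed_err += errs                            # for average-error aggregation
        votes[errs.argmin()] += 1                     # one vote per noise level
    return int(votes.argmax()) if aggregate == "majority" else int(summed_err.argmin())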

Abstract

Discriminative classifiers have become a foundational tool in deep learning for medical imaging, excelling at learning separable features of complex data distributions. However, these models often need careful design, augmentation, and training techniques to ensure safe and reliable deployment. Recently, diffusion models have become synonymous with generative modeling in 2D. These models showcase robustness across a range of tasks including natural image classification, where classification is performed by comparing reconstruction errors across images generated for each possible conditioning input. This work presents the first exploration of the potential of class conditional diffusion models for 2D medical image classification. First, we develop a novel majority voting scheme shown to improve the performance of medical diffusion classifiers. Next, extensive experiments on the CheXpert and ISIC Melanoma skin cancer datasets demonstrate that foundation and trained-from-scratch diffusion models achieve competitive performance against SOTA discriminative classifiers without the need for explicit supervision. In addition, we show that diffusion classifiers are intrinsically explainable, and can be used to quantify the uncertainty of their predictions, increasing their trustworthiness and reliability in safety-critical, clinical contexts.

Novel Majority Voting Algorithm

We propose a simple but effective majority voting scheme that, instead of accumulating errors across timesteps, tallies the number of times each test condition achieved the smallest reconstruction error and then selects the class with the most votes. Majority voting increases classification performance across the board, especially at larger values of $N$. This result is intuitive: at larger $N$, more reconstructions are attempted from heavily noised inputs, which introduces large sources of variance in an average-error scheme. Results here are shown for the CheXpert classification task; a toy comparison of the two aggregation schemes follows below.
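As a toy illustration (not the paper's code), the snippet below aggregates a precomputed error matrix of shape (N, num_classes) under both schemes. The numbers are fabricated to show how a single erratic high-noise level can flip the average-error prediction while leaving the majority vote unchanged.

import numpy as np

def aggregate_predictions(errors):
    """errors[k, j] = reconstruction error at noise level k under condition c_j."""
    mean_pred = int(errors.mean(axis=0).argmin())              # average-error scheme
    votes = np.bincount(errors.argmin(axis=1), minlength=errors.shape[1])
    majority_pred = int(votes.argmax())                        # majority-vote scheme
    return mean_pred, majority_pred, votes

# Toy example: 5 noise levels, 3 candidate classes; the third row mimics a
# high-noise level whose large, erratic errors dominate the average.
errors = np.array([[0.9, 0.4, 0.5],
                   [0.8, 0.3, 0.6],
                   [2.0, 4.5, 4.8],
                   [0.7, 0.2, 0.5],
                   [0.6, 0.3, 0.4]])
print(aggregate_predictions(errors))   # mean-error picks class 0, majority picks class 1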

Majority Voting Algorithm
Performance Comparison

Competitive vs. SOTA Discriminative Classifiers

Our experiments on the CheXpert and ISIC Melanoma skin cancer datasets demonstrate that foundation and trained-from-scratch diffusion models achieve competitive performance against SOTA discriminative classifiers. Notably, diffusion classifiers achieve this performance with minimal hyperparameter tuning, no augmentations, and without being trained on a classification objective. $^*$ and $^\dagger$ denote fine-tuned and zero-shot versions, respectively. Diffusion classifier results use 501 classification steps and majority voting.

Intrinsic Explainability

Importantly, diffusion classifiers can produce counterfactual explanations, as opposed to other interpretability methods that simply highlight regions of interest. The counterfactual image of a sick patient shows decreased disease pathology in the left and right lungs, while the factual reconstruction shows minimal differences. The natural interpretability of diffusion classifiers provides both transparency into what the model is learning (allowing, for example, the identification of shortcut learning) and class-specific information that improves understanding of the disease.
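A minimal sketch of how such a counterfactual could be generated, reusing the hypothetical denoise(z, noise_level, c) interface from above (the noise settings and names are assumptions for illustration, not the paper's exact procedure):

import torch

def counterfactual_map(x, denoise, factual_c, counter_c, alpha=0.7):
    """Reconstruct a partially noised image under both classes and diff the results."""
    sigma = (1 - alpha ** 2) ** 0.5              # keep a variance-preserving mix
    z = alpha * x + sigma * torch.randn_like(x)  # partially noised input
    x_factual = denoise(z, (alpha, sigma), factual_c)   # factual reconstruction
    x_counter = denoise(z, (alpha, sigma), counter_c)   # counterfactual reconstruction
    return x_counter, (x_counter - x_factual).abs()     # image + difference heatmap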

Explainability Visualization

Uncertainty Quantification

In medical imaging, uncertainty measures are validated by confirming that when the model is confident, the prediction is correct, and when it is uncertain, the prediction is incorrect. We therefore validate the diffusion model's uncertainty quantification by filtering out the most uncertain predictions and examining the change in performance. Each of the models shows an increase in accuracy as the most uncertain predictions are filtered out, for both CheXpert (dashed) and ISIC (solid). This indicates that these models are most uncertain about their incorrect predictions, confirming the effectiveness of the uncertainty measure and its high value across medical applications.
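A sketch of this filtering analysis, assuming per-sample uncertainty scores (e.g., one minus the winning class's vote fraction); all names are illustrative:

import numpy as np

def accuracy_vs_rejection(preds, labels, uncertainty, fractions=(0.0, 0.1, 0.2, 0.3)):
    """Drop the most uncertain fraction of predictions and recompute accuracy."""
    order = np.argsort(uncertainty)                       # most confident first
    curve = []
    for f in fractions:
        keep = order[: int(round(len(preds) * (1 - f)))]  # retain the (1 - f) most confident
        curve.append((f, float((preds[keep] == labels[keep]).mean())))
    return curve

If the uncertainty measure is well calibrated, accuracy rises as the rejected fraction grows, which is the trend observed for both datasets.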

BibTeX

@misc{favero2025conditionaldiffusionmodelsmedical,
      title={Conditional Diffusion Models are Medical Image Classifiers that Provide Explainability and Uncertainty for Free},
      author={Gian Mario Favero and Parham Saremi and Emily Kaczmarek and Brennan Nichyporuk and Tal Arbel},
      year={2025},
      eprint={2502.03687},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2502.03687},
}