Discriminative classifiers have become a foundational tool in deep learning for medical imaging, excelling at learning separable features of complex data distributions. However, these models often need careful design, augmentation, and training techniques to ensure safe and reliable deployment. Recently, diffusion models have become synonymous with generative modeling in 2D. These models showcase robustness across a range of tasks, including natural image classification, where classification is performed by comparing reconstruction errors across images generated for each possible conditioning input. This work presents the first exploration of the potential of class-conditional diffusion models for 2D medical image classification. First, we develop a novel majority voting scheme shown to improve the performance of medical diffusion classifiers. Next, extensive experiments on the CheXpert and ISIC Melanoma skin cancer datasets demonstrate that foundation and trained-from-scratch diffusion models achieve competitive performance against SOTA discriminative classifiers without the need for explicit supervision. In addition, we show that diffusion classifiers are intrinsically explainable and can be used to quantify the uncertainty of their predictions, increasing their trustworthiness and reliability in safety-critical, clinical contexts.
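For concreteness, below is a minimal sketch of this reconstruction-error classification rule, assuming a DDPM-style class-conditional noise predictor `eps_model(x_t, t, y)` and a precomputed cumulative noise schedule; the function and argument names are illustrative, not the paper's API.

```python
import torch

def diffusion_classify(x0, eps_model, alphas_cumprod, class_ids, n_steps=100):
    """Classify one image by comparing denoising errors under each candidate class.

    x0:              clean image tensor, shape (1, C, H, W), scaled to the model's range
    eps_model:       class-conditional noise predictor, eps_model(x_t, t, y) -> predicted noise
    alphas_cumprod:  cumulative alpha-bar schedule, shape (T,)
    class_ids:       list of candidate class labels
    """
    device = x0.device
    alphas_cumprod = alphas_cumprod.to(device)
    T = alphas_cumprod.shape[0]
    # Use the same timesteps and the same noise for every class so errors are comparable.
    timesteps = torch.randint(0, T, (n_steps,), device=device)
    noise = torch.randn((n_steps, *x0.shape[1:]), device=device)

    errors = {}
    for y in class_ids:
        per_step = []
        for i, t in enumerate(timesteps):
            a_bar = alphas_cumprod[t]
            # Forward-diffuse the clean image to timestep t.
            x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise[i:i + 1]
            y_batch = torch.tensor([y], device=device)
            with torch.no_grad():
                eps_hat = eps_model(x_t, t.view(1), y_batch)
            per_step.append(((eps_hat - noise[i:i + 1]) ** 2).mean())
        # Average denoising error under this conditioning class.
        errors[y] = torch.stack(per_step).mean()

    # The class whose conditioning best explains the image has the lowest error.
    return min(errors, key=errors.get)
```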
We propose a simple but effective majority voting scheme that, instead of accumulating errors over all timesteps, tallies the number of times the reconstruction error was smallest for each test condition and then chooses the class with the most votes. Using a majority voting scheme increases classification performance across the board, particularly at larger values of $N$. This result is intuitive: at larger values of $N$, more reconstructions are attempted from heavily noised inputs, which introduces large variance into an averaged-error scheme. Results here are shown for the CheXpert classification task.
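A minimal sketch of this voting rule, assuming the per-timestep, per-class errors have already been computed (for example by a loop like the one sketched above); the variable names are illustrative.

```python
import torch

def majority_vote_classify(per_step_errors):
    """Aggregate per-timestep errors by majority vote instead of averaging.

    per_step_errors: tensor of shape (n_steps, n_classes); entry [i, c] is the
    reconstruction (noise-prediction) error at timestep i under class c.
    """
    # One vote per timestep, awarded to the class with the smallest error at that step.
    votes = per_step_errors.argmin(dim=1)                              # (n_steps,)
    counts = torch.bincount(votes, minlength=per_step_errors.shape[1])
    return counts.argmax().item()

# Compared with averaging, a single high-noise timestep with a very large error
# can no longer dominate the decision; it contributes at most one vote.
```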
Our experiments on the CheXpert and ISIC Melanoma skin cancer datasets demonstrate that foundation and trained-from-scratch diffusion models achieve competitive performance against SOTA discriminative classifiers. Notably, diffusion classifiers achieve this performance with minimal hyperparameter tuning, no augmentations, and without being trained on a classification objective. $^*$ and $^\dagger$ denote fine-tuned and zero-shot versions, respectively. Diffusion classifier results use 501 classification steps and a majority vote.
Importantly, diffusion classifiers are able to produce counterfactual explanations, unlike other interpretability methods that simply highlight regions of interest. The counterfactual image of a sick patient shows decreased disease pathology in the left and right lungs, while the factual reconstruction shows minimal differences from the input. The natural interpretability of diffusion classifiers provides both transparency into what the model is learning (allowing the identification of shortcut learning) and class-specific information that improves understanding of disease.
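A sketch of how such a counterfactual difference map could be computed; `reconstruct_fn` is a hypothetical helper that noises the image to an intermediate timestep and denoises it under a chosen class label, and the exact reconstruction procedure depends on the diffusion pipeline used.

```python
import torch

def counterfactual_difference(x0, reconstruct_fn, factual_y, counterfactual_y):
    """Produce a counterfactual explanation as a difference map.

    reconstruct_fn(x0, y): assumed (hypothetical) helper that partially noises
    the image and denoises it under class label y.
    factual_y:        the predicted / true class (e.g. "diseased")
    counterfactual_y: the alternative class (e.g. "healthy")
    """
    with torch.no_grad():
        factual = reconstruct_fn(x0, factual_y)                 # should stay close to x0
        counterfactual = reconstruct_fn(x0, counterfactual_y)   # e.g. "healthy" conditioning
    # Regions that change between the two reconstructions localize the evidence
    # the model associates with the predicted class.
    diff_map = (counterfactual - factual).abs().mean(dim=1, keepdim=True)
    return factual, counterfactual, diff_map
```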
In medical imaging, uncertainty measures are validated by confirming that when the model is confident, the prediction is correct, and when it is uncertain, the prediction is more likely to be incorrect. We therefore validate the diffusion model's uncertainty quantification by filtering out the most uncertain predictions and examining the change in performance. Each of the models shows an increase in accuracy as the most uncertain predictions are filtered out, for both CheXpert (dashed lines) and ISIC (solid lines). This indicates that these models are most uncertain about their incorrect predictions, confirming the effectiveness of their uncertainty measure and its high value across medical applications.
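A small sketch of this filtering evaluation (an accuracy-versus-rejection curve), assuming per-sample uncertainty scores and correctness flags are already available; the function name is illustrative.

```python
import numpy as np

def accuracy_vs_rejection(uncertainty, correct, fractions=np.linspace(0.0, 0.9, 10)):
    """Accuracy on the retained subset as the most uncertain predictions are removed.

    uncertainty: array of per-sample uncertainty scores (higher = less confident)
    correct:     boolean array, True where the prediction was correct
    """
    order = np.argsort(uncertainty)   # most confident samples first
    n = len(uncertainty)
    accuracies = []
    for f in fractions:
        keep = order[: max(1, int(round(n * (1 - f))))]   # drop the top-f most uncertain
        accuracies.append(correct[keep].mean())
    return fractions, np.array(accuracies)

# If the uncertainty measure is well calibrated, accuracy should rise as the
# rejection fraction increases, as observed for both CheXpert and ISIC.
```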
@misc{favero2025conditionaldiffusionmodelsmedical,
      title={Conditional Diffusion Models are Medical Image Classifiers that Provide Explainability and Uncertainty for Free},
      author={Gian Mario Favero and Parham Saremi and Emily Kaczmarek and Brennan Nichyporuk and Tal Arbel},
      year={2025},
      eprint={2502.03687},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2502.03687},
}