Swin-UNet: A Unified Transformer–CNN Framework for Multi-Organ Medical Image Segmentation

Xiuru Li

doi:10.1051/itmconf/20268401003

Open Access

Issue		ITM Web Conf., Volume 84, 2026: 2026 International Conference on Advent Trends in Computational Intelligence and Data Science (ATCIDS 2026)


Article Number		01003
Number of page(s)		5
Section		Intelligent Computing in Healthcare and Bioinformatics
DOI		https://doi.org/10.1051/itmconf/20268401003
Published online		06 April 2026

ITM Web of Conferences 84, 01003 (2026)


Xiuru Li^*

School of Information Science and Engineering, Lanzhou University, Lanzhou, China

^* Corresponding author

Abstract

Transformer-based architectures have shown significant promise in medical image segmentation because they model long-range contextual relationships well. However, the standard Vision Transformer (ViT) modules used in hybrid networks such as TransUNet are limited in their ability to represent fine-grained and coarse features together. To overcome this limitation, this paper introduces Swin-UNet, a hybrid framework that combines a hierarchical Swin Transformer encoder with a U-Net-inspired decoder. The encoder uses shifted-window self-attention for efficient local-global feature learning, while the decoder integrates residual convolutional paths and multi-scale patch embeddings for improved reconstruction and scale robustness. Evaluated on the Synapse multi-organ CT dataset, the model achieves competitive Dice scores and lower Hausdorff distances than U-Net and TransUNet, highlighting its potential as a robust and generalizable approach to medical image segmentation. These results suggest that Swin-UNet effectively balances computational efficiency with segmentation accuracy, offering a strong foundation for future medical imaging applications.
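
The abstract reports results as Dice scores and Hausdorff distances. As a rough illustration of what these two metrics measure, the following minimal sketch computes both for a pair of small binary masks. It is not the paper's evaluation code: the function names `dice_score`, `foreground`, and `hausdorff_distance` are hypothetical, the masks are toy data, and the distance shown is the plain (maximum) Hausdorff distance over all foreground pixels measured in pixel units. Whether the paper uses this form or a percentile variant such as HD95, or measures over organ surfaces only, is not stated in the abstract.

```python
from math import hypot


def dice_score(pred, target):
    """Dice = 2|P ∩ T| / (|P| + |T|) for binary masks given as nested lists of 0/1."""
    overlap = sum(p * t for p_row, t_row in zip(pred, target)
                  for p, t in zip(p_row, t_row))
    total = sum(map(sum, pred)) + sum(map(sum, target))
    # Assumption: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * overlap / total


def foreground(mask):
    """Return the (row, col) coordinates of all foreground pixels."""
    return [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]


def hausdorff_distance(pred, target):
    """Symmetric Hausdorff distance between the foreground pixel sets."""
    a, b = foreground(pred), foreground(target)
    if not a or not b:
        # Assumption: the distance is treated as infinite if either mask is empty.
        return float("inf")

    def directed(src, dst):
        # Largest distance from any point in src to its nearest point in dst.
        return max(min(hypot(r1 - r2, c1 - c2) for r2, c2 in dst)
                   for r1, c1 in src)

    return max(directed(a, b), directed(b, a))


# Toy example: the prediction has one extra pixel compared with the target.
pred = [[0, 1, 1],
        [0, 1, 1],
        [0, 0, 0]]
target = [[0, 1, 1],
          [0, 1, 0],
          [0, 0, 0]]

print(round(dice_score(pred, target), 4))  # 0.8571
print(hausdorff_distance(pred, target))    # 1.0
```

A higher Dice score indicates greater region overlap, while a lower Hausdorff distance indicates that the worst-case mismatch between the two masks is smaller, which is why the abstract pairs "competitive Dice scores" with "lower Hausdorff distances".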

This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
