Exploring the Current State of Transformer's Application in the Field of Target Detection

Open Access

Issue		ITM Web Conf. Volume 78, 2025 International Conference on Computer Science and Electronic Information Technology (CSEIT 2025)


Article Number		04035
Number of page(s)		8
Section		Foundations and Frontiers in Multimodal AI, Large Models, and Generative Technologies
DOI		https://doi.org/10.1051/itmconf/20257804035
Published online		08 September 2025

