| Issue | ITM Web Conf., Volume 79, 2025: International Conference on Knowledge Engineering and Information Systems (KEIS-2025) |
|---|---|
| Article Number | 01033 |
| Number of page(s) | 8 |
| DOI | https://doi.org/10.1051/itmconf/20257901033 |
| Published online | 08 October 2025 |
Transformer-Based Knowledge Distillation with Ghost Attention for Multimodal Edge-Based Smart Surveillance
Department of Computers Techniques Engineering, College of Technical Engineering, The Islamic University, Najaf, Iraq
* Corresponding author: zahrah.sataar.iu@gmail.com
Knowledge distillation has gained attention as an important technique for edge-based smart surveillance, enabling accurate yet lightweight models that can be deployed on resource-constrained devices. However, the existing YOLOv8-based method, which integrates Coordinate Attention (CA) and Masked Generative Distillation (MGD), faces several challenges: it relies only on infrared data, loses potentially useful features through excessive use of Learnable Dilated Convolution (LDConv), and is limited by the rigidity of fixed-mask distillation. This research proposes an enhanced framework built on cross-architecture knowledge distillation. Infrared (IR) images are collected from the Forward Looking Infrared (FLIR) dataset, and Red Green Blue (RGB) images are collected from the Korea Advanced Institute of Science and Technology (KAIST) dataset. This is followed by preprocessing using letterbox resizing, mosaic augmentation, and class-balanced sampling. In the proposed cross-architecture distillation setup, a transformer-based detector is employed as the teacher to capture long-range dependencies and contextual relations across the image, while a lightweight YOLOv8n optimized with Ghost Attention (GA) and a hybrid convolutional design is employed as the student. Finally, Adaptive Masked Generative Distillation (A-MGD), which dynamically adjusts the mask ratio and distills multilevel features, is used to enhance knowledge transfer. The experimental results demonstrated that the proposed Transformer-teacher Knowledge Distillation for YOLOv8n student (TransKD-YOLOv8n) framework achieved higher precision (74.85%) and recall (68.90%) than the baseline.
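To make the A-MGD idea concrete, the sketch below shows one way such a loss could be written in PyTorch: the student's feature map is aligned to the teacher's channel width, randomly masked with a ratio that changes over training, passed through a small generation head that reconstructs the teacher's features, and penalized with a reconstruction loss. The module name `AMGDLoss`, the linear mask-ratio schedule, and all hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Minimal A-MGD sketch (assumptions labeled): mask student features with a
# dynamically scheduled ratio, regenerate teacher features, score with MSE.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AMGDLoss(nn.Module):
    def __init__(self, student_ch: int, teacher_ch: int,
                 base_ratio: float = 0.5):
        super().__init__()
        self.base_ratio = base_ratio
        # 1x1 conv aligns student channels to the teacher's channel width.
        self.align = nn.Conv2d(student_ch, teacher_ch, kernel_size=1)
        # Generation head that reconstructs teacher features from the
        # masked, aligned student features (as in standard MGD).
        self.generate = nn.Sequential(
            nn.Conv2d(teacher_ch, teacher_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(teacher_ch, teacher_ch, kernel_size=3, padding=1),
        )

    def forward(self, f_student: torch.Tensor, f_teacher: torch.Tensor,
                progress: float) -> torch.Tensor:
        # "Adaptive" schedule (an assumption): mask aggressively early in
        # training and relax toward the end; progress is in [0, 1].
        # Teacher and student features are assumed spatially aligned.
        ratio = self.base_ratio * (1.0 - 0.5 * progress)
        x = self.align(f_student)
        b, _, h, w = x.shape
        # Random spatial mask: 0 hides a location, 1 keeps it.
        keep = (torch.rand(b, 1, h, w, device=x.device) > ratio).float()
        rec = self.generate(x * keep)
        return F.mse_loss(rec, f_teacher)
```

For multilevel distillation as described in the abstract, one such loss would be instantiated per matched feature stage and the per-stage losses summed, e.g. `loss = loss_fn(s_feat, t_feat, progress=epoch / num_epochs)` at each level.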
© The Authors, published by EDP Sciences, 2025
This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.