Facial Image Analysis for Autism Spectrum Disorder Detection Using Vision Transformer and State of the Art Deep Learning Frameworks

Sivakumar S; Lakshmi D

doi:10.1051/itmconf/20268203022

Open Access

Issue		ITM Web Conf. Volume 82, 2026: International Conference on NextGen Engineering Technologies and Applications for Sustainable Development (ICNEXTS’25)


Article Number		03022
Number of page(s)		6
Section		Information and Technology
DOI		https://doi.org/10.1051/itmconf/20268203022
Published online		04 February 2026

ITM Web of Conferences 82, 03022 (2026)

Sivakumar S¹^* and Lakshmi D²

¹ Assistant Professor, Department of Electrical and Electronics Engineering, St. Joseph’s College of Engineering, OMR, Chennai 600119, India.
² Associate Professor, Department of Electronics and Communication Engineering, St. Joseph’s College of Engineering, OMR, Chennai 600119, India.

^* Corresponding author

Abstract

Autism Spectrum Disorder (ASD) is a complex neurodevelopmental condition that can make it difficult for individuals to communicate and connect with others. It touches families and communities everywhere, regardless of geography or background. For those living with ASD, even everyday tasks can become significant hurdles. Diagnosing autism typically requires a series of medical evaluations, which are both time-consuming and costly. As a result, many people with ASD go undiagnosed or unsupported. This often happens because awareness and understanding of autism are still insufficient, and because resources such as diagnostic services and specialized therapies (for example, speech or behavioral support) remain limited. This work presents an extensive evaluation of automated early ASD detection using children's facial images. A range of advanced deep learning models, including Vision Transformer (ViT), AlexNet, MobileNet V2, MobileNet V3, DenseNet-121, DenseNet-169, and ResNet x-400MF, were implemented and systematically compared. The models, spanning both convolutional and transformer-based architectures, were trained on preprocessed datasets to extract features relevant to ASD-specific facial markers. The results indicate that these architectures achieve strong classification performance in identifying ASD from facial images, supporting early, scalable, and non-invasive ASD detection. The approach could further be used to identify levels of ASD and to support timely diagnosis and intervention.
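As an illustration of what a systematic comparison of classifiers could look like, the following minimal Python sketch ranks several models on a shared test set. The abstract does not state which metrics were used, so accuracy, precision, recall, and F1 are assumptions here. The function names (`evaluate`, `rank_models`), the model labels, and the toy labels and predictions are all hypothetical and are not results from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def evaluate(y_true: list[int], y_pred: list[int]) -> BinaryMetrics:
    """Compute binary metrics, treating label 1 (assumed here to mean ASD) as positive."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return BinaryMetrics(accuracy, precision, recall, f1)


def rank_models(
    y_true: list[int], predictions: dict[str, list[int]]
) -> list[tuple[str, BinaryMetrics]]:
    """Score every model on the same labels and sort by F1, best first."""
    scored = [(name, evaluate(y_true, y_pred)) for name, y_pred in predictions.items()]
    return sorted(scored, key=lambda item: item[1].f1, reverse=True)


# Toy data for illustration only; these are NOT values from the paper.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
toy_predictions = {
    "model_a": [1, 1, 0, 0, 0, 0, 1, 0],  # hypothetical model output
    "model_b": [1, 0, 0, 0, 1, 0, 1, 1],  # hypothetical model output
}

ranking = rank_models(y_true, toy_predictions)
for name, m in ranking:
    print(f"{name}: acc={m.accuracy:.3f} prec={m.precision:.3f} "
          f"rec={m.recall:.3f} f1={m.f1:.3f}")
```

Sorting by F1 rather than accuracy is a design choice of this sketch: in a screening setting, missed positive cases and false alarms both matter, and F1 balances the two. Any other metric could be substituted as the sort key.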

Key words: Autism Spectrum Disorder (ASD) / Deep Learning / Vision Transformer (ViT) / Transformer networks / Convolutional neural networks (CNN) / Autism detection / Neurodevelopmental disorders

This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
