| Issue | ITM Web Conf., Volume 78, 2025: International Conference on Computer Science and Electronic Information Technology (CSEIT 2025) | |
|---|---|---|
| Article Number | 02006 | |
| Number of page(s) | 7 | |
| Section | Machine Learning Applications in Vision, Security, and Healthcare | |
| DOI | https://doi.org/10.1051/itmconf/20257802006 | |
| Published online | 08 September 2025 | |
Multiple Machine Learning Models-Based Diabetes Prediction and Feature Importance Analysis
School of Data Science and Engineering, South China Normal University, Shanwei, China
The number of diabetic patients has risen in recent years, and traditional diabetes prediction methods are inadequate, which motivates the use of machine learning models for diabetes prediction. This study takes a data set from Kaggle and analyzes it with four prediction models: logistic regression, k-nearest neighbor, decision tree, and random forest. The optimal model is selected by comparing the prediction accuracy of the four models, and the features most important for predicting diabetes are then analyzed on the basis of that model. The results indicate that the random forest model is the most effective, achieving an accuracy of 79.870%, while the decision tree model performs worst, with an accuracy of 72.727%. With random forest as the optimal model, the study finds that glucose, Body Mass Index (BMI), and age are the three most influential features.
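The workflow summarized in the abstract (fit four classifiers, compare test accuracy, then rank features with the random forest) might look roughly like the following sketch. It is not the authors' code. It assumes scikit-learn, uses synthetic data from `make_classification` in place of the Kaggle data set, and the feature names, the 80/20 split, the scaling step, and all hyperparameters are illustrative assumptions that the abstract does not specify.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def compare_models(X, y, feature_names, seed=0):
    """Fit four classifiers, report test accuracy, and rank features."""
    # Assumed 80/20 stratified split; the abstract does not state the split.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )

    # Default-ish hyperparameters, chosen only for illustration.
    # Scaling is added for the two models that are sensitive to feature scale.
    models = {
        "logistic_regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "k_nearest_neighbor": make_pipeline(
            StandardScaler(), KNeighborsClassifier()
        ),
        "decision_tree": DecisionTreeClassifier(random_state=seed),
        "random_forest": RandomForestClassifier(random_state=seed),
    }

    accuracy = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        # score() returns the fraction of correctly classified test samples.
        accuracy[name] = model.score(X_test, y_test)

    best = max(accuracy, key=accuracy.get)

    # Impurity-based importances from the fitted random forest,
    # sorted from most to least important.
    forest = models["random_forest"]
    ranking = sorted(
        zip(feature_names, forest.feature_importances_),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return accuracy, best, ranking


# Synthetic stand-in data with placeholder feature names (hypothetical).
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=4, random_state=0
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

accuracy, best, ranking = compare_models(X, y, feature_names)
for name, acc in accuracy.items():
    print(f"{name}: {acc:.3f}")
print("best model:", best)
print("top three features:", [name for name, _ in ranking[:3]])
```

On real data, the synthetic `X` and `y` would be replaced by the feature matrix and the diabetes label loaded from the data set. The accuracies printed here come from synthetic data and are not comparable to the figures reported in the abstract.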
© The Authors, published by EDP Sciences, 2025
This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

