| Issue | ITM Web Conf. Volume 70, 2025; 2024 2nd International Conference on Data Science, Advanced Algorithm and Intelligent Computing (DAI 2024) |
|---|---|
| Article Number | 03001 |
| Number of page(s) | 11 |
| Section | Image Processing and Computer Vision |
| DOI | https://doi.org/10.1051/itmconf/20257003001 |
| Published online | 23 January 2025 |
Text vectorization in sentiment analysis: A comparative study of TF-IDF and Word2Vec from Amazon Fine Food Reviews
ECS, University of Southampton, SO16 1BJ, United Kingdom
* Corresponding author: jl29u23@soton.ac.uk
Sentiment analysis is a practical tool for marketing and branding teams. Companies can collect and analyze opinions and reviews from social media platforms, blog posts, and numerous other forums. This can help them identify positive feedback that reinforces their strengths and detect negative sentiment that points to areas for improvement. This study compares two text vectorization methods for opinion mining, Term Frequency-Inverse Document Frequency (TF-IDF) and Word2Vec, using the Amazon Fine Food Reviews dataset. Each method is used to vectorize the preprocessed review text, and the resulting vectors are fed into a sentiment classification model to evaluate how well each method performs on the classification task. The results indicate that TF-IDF outperforms Word2Vec on large datasets, particularly in distinguishing between sentiment categories, whereas Word2Vec is better at capturing semantic relationships between words. The study therefore suggests combining the strengths of both methods in practical applications to improve accuracy and efficiency.
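As a minimal sketch of the TF-IDF side of this pipeline (vectorize review text, then pass the vectors to a classifier), the Python below computes L2-normalised TF-IDF vectors using the plain weighting tf(t, d) · log(N / df(t)) and classifies them with a nearest-centroid rule. The weighting variant, the simple tokenizer, the nearest-centroid classifier and the four toy reviews are all illustrative assumptions: the abstract does not state the paper's preprocessing or classification model, and the examples are not taken from the Amazon Fine Food Reviews dataset. Word2Vec is not shown, since it requires training dense word embeddings.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumed minimal preprocessing: lowercase, replace punctuation with spaces,
    # split on whitespace. The paper's actual preprocessing is not reproduced here.
    cleaned = "".join(ch.lower() if ch.isalnum() or ch.isspace() else " " for ch in text)
    return cleaned.split()


def fit_idf(docs):
    """Compute idf(t) = log(N / df(t)) over a list of token lists."""
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term at most once per document
    return {term: math.log(n_docs / count) for term, count in df.items()}


def tfidf_vector(tokens, idf):
    """Return an L2-normalised sparse TF-IDF vector as a dict term -> weight."""
    counts = Counter(tokens)
    total = len(tokens)
    # Terms unseen during fitting are ignored.
    vec = {t: (c / total) * idf[t] for t, c in counts.items() if t in idf}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm > 0 else {}


def cosine(u, v):
    # Both vectors are L2-normalised, so their dot product is the cosine similarity.
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def train_centroids(texts, labels):
    """Fit IDF weights, then build one normalised centroid vector per label."""
    docs = [tokenize(t) for t in texts]
    idf = fit_idf(docs)
    sums = {}
    for tokens, label in zip(docs, labels):
        acc = sums.setdefault(label, {})
        for t, w in tfidf_vector(tokens, idf).items():
            acc[t] = acc.get(t, 0.0) + w
    centroids = {}
    for label, acc in sums.items():
        norm = math.sqrt(sum(w * w for w in acc.values()))
        centroids[label] = {t: w / norm for t, w in acc.items()} if norm > 0 else {}
    return idf, centroids


def predict(text, idf, centroids):
    """Assign the label whose centroid is most cosine-similar to the text."""
    vec = tfidf_vector(tokenize(text), idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical toy reviews, for illustration only.
reviews = [
    "Delicious coffee, great flavor, will buy again",
    "Great snack, tasty and fresh",
    "Terrible taste, stale and awful",
    "Awful packaging, the cookies arrived stale",
]
labels = ["positive", "positive", "negative", "negative"]

idf, centroids = train_centroids(reviews, labels)
print(predict("tasty fresh coffee", idf, centroids))       # positive
print(predict("stale and awful cookies", idf, centroids))  # negative
```

The nearest-centroid rule is used here only because it needs no external libraries; any classifier that accepts fixed-length vectors could take its place. A term that occurs in every training document receives an IDF of zero under this weighting, which is one reason libraries often apply a smoothed IDF instead.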
© The Authors, published by EDP Sciences, 2025
This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.