Issue | ITM Web Conf., Volume 56, 2023: First International Conference on Data Science and Advanced Computing (ICDSAC 2023)
---|---
Article Number | 02002
Number of page(s) | 10
Section | Data Science
DOI | https://doi.org/10.1051/itmconf/20235602002
Published online | 09 August 2023
Categorical Embeddings for Tabular Data using PyTorch
1 Multidisciplinary Engineering Department, Vishwakarma Institute of Technology, Pune, 411037, Maharashtra, India
2 Multidisciplinary Engineering Department, Vishwakarma Institute of Technology, Pune, 411037, Maharashtra, India
3 Multidisciplinary Engineering Department, Vishwakarma Institute of Technology, Pune, 411037, Maharashtra, India
4 Multidisciplinary Engineering Department, Vishwakarma Institute of Technology, Pune, 411037, Maharashtra, India
* Corresponding author: sanskruti.khedkar21@vit.edu
Deep learning has received much attention in computer vision and natural language processing, but far less for tabular data, which is the most prevalent type of data used in industry. Embeddings offer a solution by representing each categorical variable as a continuous vector in a low-dimensional space. PyTorch supports GPU acceleration and provides pre-built functions and modules that make it easier to work with embeddings and categorical variables. In this paper, we apply a feedforward neural network in PyTorch to a multiclass classification problem using the Shelter Animal Outcome dataset, estimating the probability that an animal's outcome belongs to each of five categories. We also examine feature importance using two common techniques: mean decrease in impurity (MDI) and permutation importance. Understanding feature importance helps in building better models, improving performance, and interpreting and communicating results. Our findings demonstrate the usefulness of embeddings and PyTorch for deep learning on tabular data and highlight the importance of feature selection for building effective machine learning models.
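The approach summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `TabularNet`, the column cardinalities, the hidden width, and the embedding-size rule of thumb are all assumptions chosen for illustration. Only `nn.Embedding`, the feedforward structure, and the five output classes follow from the text.

```python
import torch
import torch.nn as nn


class TabularNet(nn.Module):
    """Feedforward classifier with one embedding table per categorical column.

    Hypothetical sketch: names and sizes are illustrative, not from the paper.
    """

    def __init__(self, cardinalities, n_continuous, n_classes=5, hidden=64):
        super().__init__()
        # Assumed rule of thumb: embedding width = min(50, (cardinality + 1) // 2).
        self.embeddings = nn.ModuleList(
            [nn.Embedding(c, min(50, (c + 1) // 2)) for c in cardinalities]
        )
        emb_dim = sum(e.embedding_dim for e in self.embeddings)
        self.mlp = nn.Sequential(
            nn.Linear(emb_dim + n_continuous, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, x_cat, x_cont):
        # x_cat: (batch, n_categorical) integer codes; x_cont: (batch, n_continuous) floats.
        embedded = [emb(x_cat[:, i]) for i, emb in enumerate(self.embeddings)]
        x = torch.cat(embedded + [x_cont], dim=1)
        return self.mlp(x)  # raw logits, one per class


torch.manual_seed(0)
cardinalities = [4, 10, 3]  # made-up numbers of distinct values per categorical column
model = TabularNet(cardinalities, n_continuous=2)

# Synthetic batch of 8 rows: integer category codes plus 2 continuous features.
x_cat = torch.stack([torch.randint(0, c, (8,)) for c in cardinalities], dim=1)
x_cont = torch.randn(8, 2)

logits = model(x_cat, x_cont)
probs = torch.softmax(logits, dim=1)  # per-row probabilities over the 5 outcome classes
```

In training, the logits would typically be passed to `nn.CrossEntropyLoss`, which applies the softmax internally. The explicit `torch.softmax` call here only shows how per-class probabilities are obtained at inference time.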
© The Authors, published by EDP Sciences, 2023
This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.