A sight on defect detection methods for imbalanced industrial data

Product defect detection is a challenging task, especially when defect samples are difficult and costly to collect. This scarcity makes supervised algorithms hard to apply, because their performance degrades when they are trained on imbalanced data. To tackle this problem, researchers have used data augmentation and one-class classification to detect defects in industrial settings. In this paper, we list defect detection applications for imbalanced industrial data and report the benefits and limitations of those methods.


INTRODUCTION
An efficient defect detection system is one of the key pillars of a successful industrial company [1]. As we witness the transition to the era of Quality 4.0, moving towards automatic defect detection is more than ever a high priority for manufacturers. Recently, approaches using artificial intelligence and deep learning have become increasingly common. Neogi et al. [2] introduced a thresholding method to detect defective regions on steel surfaces. Shi et al. [3] proposed an edge-based method using an improved Sobel algorithm. Lin et al. [4] presented a deep learning method that detects steel defects by integrating a Faster Region-based Convolutional Neural Network and the Single Shot MultiBox Detector. In [5], a cascading convolutional neural network was proposed: a two-phase system in which the first phase learns possible defects and the second classifies the defect types.
Many of those methods have proven promising and effective [6]. However, they remain difficult to deploy in real applications, mainly because of the large amount of data required to train the classifier. In industrial settings, the number of good samples far exceeds the number of defective ones, and collecting a fair amount of defect-class data is either expensive or very difficult, which results in imbalanced data. This limited availability of data makes automatic defect detection based on machine learning algorithms more challenging. Traditional classification methods fail to provide good results in imbalanced data scenarios: the algorithm pays more attention to the majority class, so its performance on the minority (defect) class decreases.
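The effect of imbalance on a conventional accuracy score can be illustrated with a small worked example. The sketch below assumes a hypothetical batch of 990 defect-free and 10 defective parts (illustrative counts, not taken from any cited study) and a degenerate classifier that always predicts the majority class.

```python
# Hypothetical inspection batch (assumed counts, for illustration only):
# 990 defect-free parts (label 0) and 10 defective parts (label 1).
y_true = [0] * 990 + [1] * 10

# A degenerate "classifier" that always predicts the majority class.
y_pred = [0] * len(y_true)

# Overall accuracy: fraction of samples labelled correctly.
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

# Defect recall: fraction of real defects that were detected.
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
actual_defects = sum(y_true)
defect_recall = true_positives / actual_defects

print(f"accuracy      = {accuracy:.2%}")
print(f"defect recall = {defect_recall:.2%}")
```

Under these assumptions the classifier reaches 99% accuracy while detecting none of the defects, which is why accuracy alone is a misleading criterion on imbalanced data.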
To handle class imbalance, data augmentation is a popular strategy used in conjunction with binary or multiclass classifiers. However, when the class imbalance is extreme, one-class classification algorithms are better suited to the task [7].
In this paper, we list industrial applications for imbalanced data based on data augmentation and one-class classification, and discuss the benefits and limitations of those methods.

DEFECT DETECTION METHODS WITH DATA AUGMENTATION
Data augmentation is a technique used to enlarge a small dataset without having to gather more data. It can be done in two ways. The first applies geometric and photometric transformations to existing samples, such as rotation, added noise, and blurring. The second generates synthetic samples using generative models trained on the existing data.
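The first, transformation-based family can be sketched in a few lines. The example below is a minimal illustration using only the Python standard library; the function names, the 3x3 sample image, and the noise level are assumptions made for this sketch and do not come from any of the cited works, which typically rely on image-processing libraries.

```python
import random

def rotate90(img):
    """Rotate a 2D image (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def flip_horizontal(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]

def add_gaussian_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise and clip to the 0..255 range."""
    return [[min(255.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
            for row in img]

def box_blur(img):
    """3x3 mean filter; border pixels average over the neighbours that exist."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            vals = [img[rr][cc]
                    for rr in range(max(0, r - 1), min(h, r + 2))
                    for cc in range(max(0, c - 1), min(w, c + 2))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out

def augment(img, rng):
    """Return four transformed copies of one sample."""
    return [
        rotate90(img),
        flip_horizontal(img),
        add_gaussian_noise(img, sigma=5.0, rng=rng),
        box_blur(img),
    ]

# Hypothetical 3x3 grayscale patch standing in for a defect image.
sample = [[10, 20, 30],
          [40, 50, 60],
          [70, 80, 90]]
variants = augment(sample, random.Random(0))
print(len(variants), "augmented samples from 1 original")
```

Each original defect sample yields several additional training samples, so the minority class grows without new data collection. The transformed copies remain derived from the same defect, however, so they add no information about defect types that were never observed.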
Han et al. [8] proposed a defect detection method using stacked convolutional autoencoders. The autoencoders were trained on non-defective data and on synthetic defect data generated from expert knowledge of defect characteristics. However, this system has a limitation: because the synthetic defect data were generated from expert knowledge, the classifier fails to detect unknown defects. Jain et al. [9] suggested a data augmentation method that uses generative adversarial networks to generate synthetic data, and then used a convolutional neural network to classify surface defects in hot-rolled steel strips. Shon et al. [10] proposed automatic data augmentation with rotation, flipping, shifting, shearing, and zooming, combined with a deep learning method, to identify wafer defects. Zheng et al. [11] developed a generic semi-supervised deep learning model for automated surface inspection using data augmentation. However, a large unlabeled dataset is required to achieve high performance.
ITM Web of Conferences 43, 01012 (2022), ICAIE'2022, https://doi.org/10.1051/itmconf/20224301012

ONE-CLASS CLASSIFICATION ALGORITHMS
One-class classification denotes a category of classification algorithms that address cases where few or no defect samples are available for training, which is quite common in industrial settings. Defects are then treated as deviations from the defect-free class. One-class classification algorithms use only defect-free samples during training [12].
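The principle can be sketched with a deliberately simple stand-in model. The example below is not one of the cited methods: it assumes a single hypothetical scalar feature per part (for instance, mean surface brightness), fits a mean and standard deviation on defect-free samples only, and flags any part that deviates by more than an assumed factor k of standard deviations.

```python
import statistics

def fit_normal_model(normal_values):
    """Estimate mean and standard deviation from defect-free samples only."""
    return statistics.mean(normal_values), statistics.stdev(normal_values)

def is_defect(value, model, k=3.0):
    """Flag a sample whose feature deviates more than k standard deviations.

    k = 3.0 is an assumed threshold; in practice it would be tuned.
    """
    mean, std = model
    return abs(value - mean) > k * std

# Hypothetical feature values measured on defect-free parts (training set).
normal_train = [100.2, 99.8, 100.5, 99.5, 100.1, 99.9, 100.3, 99.7]
model = fit_normal_model(normal_train)

# No defect sample was needed to build the model.
print(is_defect(100.4, model))  # close to the defect-free distribution
print(is_defect(104.0, model))  # far from the defect-free distribution
```

The first test value lies within the spread of the training data and is accepted, while the second is flagged as a deviation. The autoencoder-based methods discussed in this section follow the same logic, with the reconstruction error playing the role of the deviation score.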
Liu et al. [13] presented an anomaly detection method using variational autoencoders, while Wang et al. [14] addressed the same task using the Vector Quantised-Variational AutoEncoder (VQ-VAE). In [15], a one-class learning method is applied to defect detection. First, normal samples are fed into a deep autoencoder to create a reference descriptor feature vector from the encoder's output layer; then, in the test phase, the trained model generates a test descriptor that is compared to the reference vector via the L2 norm. Works that adopt autoencoders as one-class classifiers trained only on the normal class presume that the model produces a higher reconstruction error for defective samples. However, in some cases, abnormal samples can be reconstructed as well as normal ones. To mitigate this, works [16] and [17] developed two approaches. Bergmann et al. [16] replaced the per-pixel error with a structural similarity metric to measure the reconstruction accuracy of the autoencoder, as this metric takes luminance, contrast, and structural information into account, while Gong et al. [17] augmented the autoencoder with a memory that records relevant normal data during training.
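The difference between a per-pixel error and a structural similarity metric can be illustrated numerically. The sketch below is a simplification and not the implementation used in [16]: it computes the structural similarity index over a single global window rather than over local sliding windows, using the commonly used constants K1 = 0.01 and K2 = 0.03, and the three 16-pixel patches are hypothetical.

```python
def mse(x, y):
    """Mean squared per-pixel error between two flattened patches."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

def global_ssim(x, y, dynamic_range=255.0):
    """Structural similarity computed over one global window (simplified)."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1 = (0.01 * dynamic_range) ** 2  # stabilises the luminance term
    c2 = (0.03 * dynamic_range) ** 2  # stabilises the contrast/structure term
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator

# Hypothetical reference patch: a smooth intensity ramp.
reference = [100 + 2 * i for i in range(16)]
# Same structure, uniformly brighter by 10 grey levels.
brighter = [p + 10 for p in reference]
# Same average brightness, but with an alternating +/-10 texture added.
textured = [p + (10 if i % 2 == 0 else -10) for i, p in enumerate(reference)]

for name, patch in [("brighter", brighter), ("textured", textured)]:
    print(f"{name}: MSE = {mse(reference, patch):.1f}, "
          f"SSIM = {global_ssim(reference, patch):.3f}")
```

Both modified patches have the same mean squared error with respect to the reference, so a per-pixel criterion cannot tell them apart. The structural similarity score stays close to 1 for the uniform brightness shift and drops markedly for the patch whose structure was altered, which is the kind of change a surface defect tends to introduce.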
Most of those one-class classification methods use reconstruction error as the criterion for defect detection and, as stated above, in some applications the autoencoder tends to reconstruct abnormal samples as well as normal ones, especially for images with complex textures.

DISCUSSION AND CONCLUSION
In industrial settings, owing to the nature of some sectors or to the improvement approaches adopted to reduce defective products, it is difficult to collect enough data to train binary or multiclass classifiers effectively. Researchers have adopted data augmentation and one-class classification as solutions for handling imbalanced data.
Data augmentation with binary classifiers and one-class classification have both shown promising results in handling imbalanced industrial data. However, there is still room for improvement. It would be fruitful to evaluate those methods on more datasets with different imbalance rates, to get a clear understanding of how the imbalance rate affects each algorithm's performance and, as a result, to better inform the choice between binary classification with data augmentation and one-class classification. More effort is also needed to examine the limits on the amount of data that can be added to improve performance. Other valuable research directions are investigating which algorithms benefit most from data augmentation and developing more hybrid one-class classification models to create powerful defect detection systems.