Detecting and recognizing seven-segment digits using a deep learning approach

Abstract. Recognizing seven-segment digits is a specific task within the broader field of text detection and recognition. Seven-segment digits are commonly used to display numerical information in a wide range of applications. However, accurately detecting and recognizing these digits can be challenging due to factors such as LED bleeding, glare, and the presence of printed text alongside the digits. The experiment described in this paper aims to identify the most effective models for detecting and recognizing text and to assess their accuracy and performance under different environmental conditions. The experiment reveals that DBNet from PaddleOCR is the best model for text detection, while PARSeq achieves the best accuracy for recognizing seven-segment digits on the 7Seg dataset. PARSeq also performs well on a custom dataset with a lower ratio of LED displays but struggles under glare conditions. Excluding special characters improves accuracy in all conditions.


Introduction
Seven-segment displays are widely used for showing numbers on screens and are known for their energy efficiency and affordability in applications ranging from industry to households. This has given rise to Seven-Segment Optical Character Recognition (SSOCR) as a branch of Optical Character Recognition (OCR), driven by progress in deep learning. While SSOCR has mainly focused on specific fields such as healthcare [1,2] and industry [3,4,5,6], its use in everyday settings with hardware limitations and environmental factors has been limited. In these scenarios, challenges arise when images contain both printed text and seven-segment displays. Assessing text detection in real-life situations, especially on devices that show both printed text and seven-segment digits, can be complex. Pre-trained models may struggle to differentiate between them, leading to unreliable outcomes. Furthermore, issues like LED bleeding, which blurs the decimal point into nearby digits, make precise seven-segment digit detection difficult. Fortunately, recent advancements in scene text detection and recognition offer more robust solutions for diverse scenes and font variations.

OCR Pipeline
OCR, a subfield of computer vision, involves converting images of text into a machine-readable format. Traditionally it extracts text from scanned documents; modern work focuses on scene text recognition. Figure 1 depicts the core OCR pipeline processes: text detection and text recognition.

Research methodology
Pretrained models were chosen for transfer learning based on benchmark results from ICDAR 2013 and ICDAR 2015. The datasets used in this experiment were captured using smartphones and consist of horizontal seven-segment digits, often with slight rotation and blurriness. The similarity between these datasets and those used to pretrain the models allows the learned features to be applied to improve seven-segment digit detection and recognition. This transfer-learning approach results in significantly better performance than training from scratch with a limited amount of data. Detailed information about the ICDAR 2013 and ICDAR 2015 datasets is presented in Table 2.

Experiment on seven-segment digit detection using scene text detection pre-trained models
The purpose of the experiment is to identify an appropriate scene text detection model that can detect the presence of seven-segment digits in images. The metric used for evaluation is the F-measure, specifically the balanced F-measure, defined as the harmonic mean of precision and recall. By considering both precision and recall, it provides a single score for assessing the performance of classification models. The ICDAR 2015 evaluation protocol for text detection [9], based on IoU (Intersection over Union), is employed to automatically calculate the F-score for each model. The experiment involves running each model on the same dataset and comparing the computed F1 scores. The evaluation script, ground truth, predicted texts, and results are available at: https://github.com/Milsk01/FYP.
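The balanced F-measure described above can be sketched as follows. This is a minimal illustration, not the evaluation script from the repository; `detection_scores` is a hypothetical helper that turns match counts (predictions matched to ground truth at some IoU threshold) into precision, recall, and F-score.

```python
def detection_scores(num_matches, num_preds, num_gts):
    """Compute precision, recall, and the balanced F-measure from
    counts of matched detections, total predictions, and ground truths."""
    precision = num_matches / num_preds if num_preds else 0.0
    recall = num_matches / num_gts if num_gts else 0.0
    # Balanced F-measure: harmonic mean of precision and recall.
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return precision, recall, f_score
```

For example, 8 correct detections out of 10 predictions against 16 ground-truth boxes gives precision 0.8 and recall 0.5.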

Dataset collection and pre-processing
For simplicity, the testing split of the Seven Segment Display (SSDI) dataset, which consists of 257 labelled images featuring only seven-segment digits, is used for the evaluation. The SSDI dataset stores labels in .pickle format, containing bounding boxes for each seven-segment digit in the test images. To align with the ICDAR 2015 evaluation script, a Python script was created. It generates a ground-truth text file for each test image, consolidating the multiple per-digit bounding boxes into one for streamlined evaluation. The watermark logo in the bottom right of the test images has been removed to ensure accurate results.
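The conversion step described above could be sketched as follows. This is a hypothetical reimplementation, not the authors' script: the structure of the SSDI .pickle labels (a dict mapping image names to lists of `(x1, y1, x2, y2)` boxes) is an assumption and would need to be adjusted to the real label format. ICDAR 2015 ground truth uses one comma-separated line per region: four corner points followed by a transcription.

```python
import os
import pickle

def boxes_to_icdar_line(boxes, transcription="###"):
    """Consolidate multiple per-digit boxes into one enclosing box and
    format it as an ICDAR 2015 ground-truth line (clockwise quad)."""
    x_min = min(b[0] for b in boxes)
    y_min = min(b[1] for b in boxes)
    x_max = max(b[2] for b in boxes)
    y_max = max(b[3] for b in boxes)
    quad = [x_min, y_min, x_max, y_min, x_max, y_max, x_min, y_max]
    return ",".join(str(v) for v in quad) + "," + transcription

def convert_labels(pickle_path, out_dir):
    """Write one gt_<image>.txt file per labelled image (assumed format:
    {image_name: [(x1, y1, x2, y2), ...]})."""
    with open(pickle_path, "rb") as f:
        labels = pickle.load(f)
    os.makedirs(out_dir, exist_ok=True)
    for name, boxes in labels.items():
        stem = os.path.splitext(name)[0]
        with open(os.path.join(out_dir, f"gt_{stem}.txt"), "w") as out:
            out.write(boxes_to_icdar_line(boxes) + "\n")
```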

Evaluation metrics
Apart from precision and recall, Intersection over Union (IoU) is also used in this experiment. IoU is a measure based on the Jaccard index that quantifies the degree of overlap between two bounding boxes. Figure 2 shows the equation used for the calculation of IoU.
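The IoU computation can be sketched as a small function; boxes are assumed to be axis-aligned and given as `(x1, y1, x2, y2)` tuples.

```python
def iou(box_a, box_b):
    """IoU = area of intersection / area of union of two boxes."""
    # Coordinates of the intersection rectangle (empty if boxes are disjoint).
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

In the ICDAR 2015 protocol a detection is typically counted as a match when its IoU with a ground-truth box reaches 0.5.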

Results and Discussion
Table 3 shows the result of the experiment.

Experiment on seven-segment digit recognition using scene text recognition pre-trained models
Seven-segment digit recognition is the process of predicting the seven-segment digits contained in input images, as shown in Figure 3. The purpose of the experiment is to identify an appropriate scene text recognition model that can reliably recognise seven-segment digits in images under different environmental conditions. The datasets used in this experiment are 7Seg [10] and a custom dataset collected from data contributors at Universiti Tenaga Nasional.

Fig. 3. Example of a text recognition result using the PARSeq model.
The first stage of the experiment is conducted by running each text recognition algorithm on the 7Seg dataset. The list of predicted texts is then compared against the ground truth. The ICDAR 2015 evaluation protocol, based on the standard edit distance metric [11], is used to evaluate text recognition accuracy and character error rate. Edit distance, also known as Levenshtein distance, measures the difference between two strings as the minimum number of single-character insertions, deletions, and substitutions required to transform one into the other. The more similar two strings are, the fewer edits are needed, and thus the lower the Levenshtein distance. For comparison purposes, the Levenshtein distance is used to evaluate the Character Error Rate (CER), computed as the edit distance between the prediction and the ground truth divided by the number of characters in the ground truth. The well-performing pretrained models are used in the second stage of the experiment to assess the performance of the selected scene text recognition algorithms under various environmental conditions.
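The edit distance and CER described above can be sketched with a standard dynamic-programming implementation; this is an illustrative version, not the evaluation script used in the experiment.

```python
def levenshtein(ref, hyp):
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn ref into hyp (single-row DP)."""
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                           # deletion
                        dp[j - 1] + 1,                       # insertion
                        prev + (ref[i - 1] != hyp[j - 1]))   # substitution
            prev = cur
    return dp[n]

def cer(ref, hyp):
    """Character Error Rate: edit distance normalized by reference length."""
    if not ref:
        return float(len(hyp) > 0)
    return levenshtein(ref, hyp) / len(ref)
```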

Results and Discussion
Table 4 shows the experimental results on the 7Seg dataset. Two main evaluation metrics, accuracy and average character error rate (CER), are used to tabulate the results. The average CER is obtained by totalling the character error rate for each image and dividing it by the total number of images. Thus, the pretrained models can be evaluated at both word level and character level. Overall, the PARSeq model has the best accuracy of 56.97% with the lowest average CER compared to the other pretrained models from PaddleOCR and MMOCR. One possible reason for the performance gap is the use of multiple datasets, including synthetic and real-world datasets, for training PARSeq. Besides that, as all text recognition pretrained models from PaddleOCR use MJSynth and SynthText for training, the best-performing PaddleOCR model on the 7Seg dataset is SVTR, which uses a single vision model. Finally, although NRTR has the best accuracy among the pre-trained models from MMOCR, ABINet has the lowest average character error rate. Table 5 shows the result of using PARSeq to recognize seven-segment digits on the custom dataset. Compared to its performance on the 7Seg dataset, PARSeq has higher accuracy on the custom dataset due to the lower ratio of LED displays in the dataset. A common problem in recognizing seven-segment digits on LEDs is noise from the turned-off segments and improper cropping. After excluding all special characters, PARSeq shows a significant improvement in accuracy in all four conditions. This is due to the lack of representation of these characters in scene text. Besides that, the decimal point is easily fused into adjacent digits due to the 'bleeding' effect, as shown in Figure 4(b).
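The averaging and special-character filtering steps above could be sketched as follows. These helper names are hypothetical, and the CER function is passed in as a parameter so any character-error-rate implementation (e.g. normalized edit distance) can be plugged in.

```python
import re

def strip_special(text):
    """Keep only the digits, dropping '.', ':', '-' and other special
    characters, mirroring the 'excluding special characters' setting."""
    return re.sub(r"[^0-9]", "", text)

def average_cer(pairs, char_error_rate):
    """Average CER over a dataset: sum the per-image character error
    rates and divide by the number of images.

    pairs: iterable of (ground_truth, prediction) strings.
    char_error_rate: function mapping (ground_truth, prediction) to a rate.
    """
    rates = [char_error_rate(gt, pred) for gt, pred in pairs]
    return sum(rates) / len(rates) if rates else 0.0
```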

Conclusion
The objective of this paper is to evaluate the performance of various pretrained models for text detection and recognition on different datasets, focusing specifically on the detection and recognition of seven-segment digits. Overall, DBNet from PaddleOCR is the best model for text detection, while PARSeq has the best accuracy for recognizing seven-segment digits on the 7Seg dataset. PARSeq also performs well on a custom dataset with a lower ratio of LED displays but struggles under glare conditions. Excluding special characters improves accuracy in all conditions.
The successful completion of this paper and the associated research owes much to the outstanding support extended by the Yayasan Canselor Universiti Tenaga Nasional (YCU), under Grant No. 202210037YCU.

Fig. 1. Overview of the OCR pipeline. Scene text detection is a critical step in OCR pipelines, as it pinpoints text areas using bounding boxes. Text detection methods can be classified into two main groups: regression-based and segmentation-based. Regression-based techniques adapt object detection to regular text but struggle with irregular text. Segmentation-based methods excel at diverse text shapes but may be slower and have difficulty with overlapping text. Scene text recognition predicts text from cropped scene images, often obtained through text detection. These algorithms can be grouped into two categories: regular text recognition, handling horizontal text such as printed fonts, and irregular text recognition, managing curved, blurred, occluded, or non-horizontal text. More recognition options are available for irregular text. Several OCR toolboxes are available, including PaddleOCR, MMOCR, EasyOCR, and Tesseract. Refer to Table 1 for a comparison between open-source OCR toolboxes.
Due to an incompatibility issue, TextFuseNet and CharNet were run on Google Colab instead of the local environment, so the Frames Per Second (FPS) results for these two models are not included. TextFuseNet, DB, and PANet are the best models from the ICDAR 2013 & 2015 benchmark, PaddleOCR, and MMOCR, respectively, in terms of F-score and FPS on the SSDI dataset. In general, all models except CharNet achieved an average F-score of 79.88 ± 0.0822. Low precision is observed in all models except CharNet, reflecting the prevalent problem of misdetection in text detection algorithms. Due to the limited size and variability of the dataset, the results showed no correlation between the F-score on the ICDAR 2013 & 2015 benchmark and seven-segment digit detection performance. Overall, DBNet from PaddleOCR is the best model among all tested pretrained models in terms of F-score and FPS.
CER = (S + D + I) / N, where S is the number of substitutions, D the number of deletions, I the number of insertions, and N the number of characters in the ground truth text.
Out of the four conditions, the PARSeq model performs worst under glare. Parts of the digits may be occluded by glare or light reflections, making them difficult to recognize, as shown in Figure 4(c).

Table 3. Effectiveness of scene text detection algorithms in detecting seven-segment digits.

Table 4. Effectiveness of scene text recognition algorithms in recognizing seven-segment digits.

Table 5. Results of using PARSeq on the custom dataset.

Table 6. Effectiveness of PARSeq in recognizing seven-segment digits under different conditions.