The Research and Improvement of SDT Algorithm for Historical Data in SCADA

Abstract. With the rapid development of Internet of Things and big data technology, the amount of data collected by SCADA (Supervisory Control And Data Acquisition) systems is growing exponentially, and the traditional SDT algorithm can no longer meet the requirements of SCADA systems for historical data compression. Based on an in-depth study of data compression methods, especially Swinging Door Trending (SDT), this paper proposes the ASDT (Advanced SDT) algorithm, which builds on SDT and is implemented in Java. ASDT compresses data by fitting it with a sine curve and achieves better compression than the traditional SDT algorithm. The experimental results show that, compared with the traditional SDT algorithm, ASDT improves the compression ratio by nearly 50% without a significant increase in compression error.


Introduction
A SCADA system is a process-industry system built on configuration software. Through real-time monitoring and control of industrial equipment, it provides real-time acquisition and storage of process data, device parameter configuration, real-time display of warning signals, and other functions.
With the development of Internet of Things and cloud storage technology, the amount of data collected by RTUs (remote terminal units) keeps growing while the acquisition cycle keeps getting shorter [1]. The historical database, which continuously stores historical data, is an important part of the SCADA system: it makes it possible to analyze and predict system faults in advance and so helps ensure safe and stable operation. Because the massive volume of historical data places a great burden on data storage and processing, efficient data compression saves storage space and improves storage utilization. Applying data compression when storing historical data is therefore very useful for storing mass data efficiently in a SCADA system.
In industry, there are two kinds of data compression: lossless and lossy [2]. With lossless compression, the decompressed data is identical to the data before compression; with lossy compression, the decompressed data deviates from the original data, but the deviation does not affect the variation trend [3]. For SCADA systems, there are three families of methods for compressing historical data: piecewise linear methods, vector quantization, and signal transform methods. Piecewise linear methods include the boxcar method, the backward slope method, the SDT method and the PLOT method [4]. SDT is the most widely used in industrial processes because it is simple to operate and fast to execute. Signal transform methods include, among others, the DCT (discrete cosine transform) and the WT (wavelet transform) [5]. The WT is widely used today, but it is more complicated than SDT.
Strategy for SCADA historical data compression

Compressing quantity of state
A state-quantity signal takes only two values, 0 and 1, and is sampled by the acquisition module at a fixed interval. In practice a state quantity often stays unchanged for some time, that is, the state signal just collected is the same as the one collected previously. State quantities are therefore compressed by a change-based rule: a value is saved only when it differs from the previously collected value. In Figure 1, the state quantities at times t0 to t7 are 1, 0, 0, 1, 0, 0, 0, 0. Under this rule, only the values at t0, t1, t3 and t4 (the initial value and each subsequent change) are saved; the others are not.
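The change-based rule can be sketched in a few lines of Java. This is only an illustrative sketch, not the paper's implementation: the class name `StateChangeCompressor`, the method `storedIndices`, and the sample sequence are hypothetical, and the sketch assumes the first sample is always stored.

```java
import java.util.ArrayList;
import java.util.List;

class StateChangeCompressor {
    /** Returns the indices of the samples that the change-based rule would store. */
    static List<Integer> storedIndices(int[] states) {
        List<Integer> stored = new ArrayList<>();
        for (int i = 0; i < states.length; i++) {
            // Assumption: the first sample is always stored; afterwards a sample
            // is stored only if it differs from the previous sample.
            if (i == 0 || states[i] != states[i - 1]) {
                stored.add(i);
            }
        }
        return stored;
    }

    public static void main(String[] args) {
        int[] states = {0, 0, 1, 1, 1, 0, 0, 1}; // hypothetical samples at t0..t7
        System.out.println(storedIndices(states)); // prints [0, 2, 5, 7]
    }
}
```

Storing the index (or timestamp) together with the value is enough to reconstruct the full sequence, because every unsaved sample equals the most recently saved one.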

SDT algorithm
SDT is a data compression method based on linear fitting. It builds a parallelogram around the data: points covered by the parallelogram are not saved, and only points that fall outside it are saved [6]. See Figure 2 for the steps of SDT. In Figure 2, point a is assumed to be the first data point of the first compression interval, and the two points o and o', each at distance E from a, serve as the two pivots (supporting points) of the revolving door. Whenever a new data point arrives, the slopes of the lines from o and o' to the new point are calculated. When there is only one data point (a), no door has been opened yet: the slope is taken as 0, which means that the revolving door is closed. As more data arrive, the doors gradually open and their slopes change: the door at the upper pivot keeps the maximum slope seen so far, and the door at the lower pivot keeps the minimum slope [7]. Within a compression interval, a door that has opened cannot close again; the interval continues until the slope at the upper pivot equals the slope at the lower pivot. As long as the two doors are not parallel, that is, the sum of the two interior angles is less than 180 degrees, the revolving-door operation continues.
From the way SDT works, several problems can be identified. First, the threshold E is the only controllable parameter, so E directly determines the compression performance. Once E is set, it cannot be changed during compression; a suitable E has to be found through lengthy experimentation, and an unsuitable E greatly degrades SDT performance. Second, if the data change slowly for some time, a single compression interval becomes very long, and the linear trend loses its timeliness [8]. Third, SDT fits data with a linear trend, yet in most situations the data do not change linearly.
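The door mechanism described in this section can be sketched as follows. This is a minimal sketch of the conventional swinging-door procedure under stated assumptions, not the paper's code: the class `SwingingDoor` and method `compress` are hypothetical names, timestamps are assumed strictly increasing, E is assumed positive, and when the doors reach parallel the previous point is stored and becomes the start of the next interval.

```java
import java.util.ArrayList;
import java.util.List;

class SwingingDoor {
    /** Returns the indices of the points kept by SDT for tolerance e (assumed > 0). */
    static List<Integer> compress(double[] t, double[] v, double e) {
        List<Integer> kept = new ArrayList<>();
        if (t.length == 0) {
            return kept;
        }
        kept.add(0);                               // the first point is always stored
        int anchor = 0;                            // start of the current interval
        double upMax = Double.NEGATIVE_INFINITY;   // max slope seen from the upper pivot
        double lowMin = Double.POSITIVE_INFINITY;  // min slope seen from the lower pivot
        for (int i = 1; i < t.length; i++) {
            double dt = t[i] - t[anchor];
            double up = (v[i] - (v[anchor] + e)) / dt;   // slope from upper pivot
            double low = (v[i] - (v[anchor] - e)) / dt;  // slope from lower pivot
            double newUp = Math.max(upMax, up);
            double newLow = Math.min(lowMin, low);
            if (newUp >= newLow) {
                // Doors are parallel or wider: close the interval at the previous
                // point, store it, and reopen the doors from there.
                anchor = i - 1;
                kept.add(anchor);
                dt = t[i] - t[anchor];
                upMax = (v[i] - (v[anchor] + e)) / dt;
                lowMin = (v[i] - (v[anchor] - e)) / dt;
            } else {
                upMax = newUp;
                lowMin = newLow;
            }
        }
        if (kept.get(kept.size() - 1) != t.length - 1) {
            kept.add(t.length - 1);                // always store the last point
        }
        return kept;
    }

    public static void main(String[] args) {
        double[] t = {0, 1, 2, 3, 4, 5};
        double[] step = {0, 0, 0, 5, 5, 5};        // hypothetical step signal
        System.out.println(compress(t, step, 0.5)); // prints [0, 2, 3, 5]
    }
}
```

In the step example, the flat segments collapse to their end points while both sides of the jump are kept, which is the behavior expected from a linear-trend method. Note that `e` is fixed for the whole run, which is exactly the first limitation listed above.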

Improved SDT
Researchers have proposed several improvements to address these problems. Wang Ju [9], targeting the defect that E cannot be changed, proposed adjusting E starting from an initial threshold value. Yu Songtao [10] proposed dynamic regulation of the tolerance, that is, the threshold is set dynamically according to the tolerance. Qu Yilin [11] used a feedback control system so that the tolerance changes dynamically during compression. Zhang Jingtao [12] argued that compression performance is largely determined by the compression deviation, so the compression deviation parameter should be selected dynamically. Duan Peiyong [13] argued that the slope of a single straight line cannot represent the trend of several data points, and that all saved data should be obtained by calculation rather than taken from the compressed data. Ning Hainan [14] proposed fitting the trend with a parabola to achieve compression. Zhang Jian [15] proposed detecting and removing abnormal points and setting a recording-limit length, among other measures.

Basic principle of ASDT
The improved SDT variants still fit the compressed data with a straight line; in practice, however, data do not keep increasing or decreasing for long. Even when a curve is used for fitting, the fitted trend does not change fundamentally: within an interval the data are still treated as changing monotonically. Moreover, the improved SDT variants set an FSRL (Forced Storage-Recording Limit), which makes the calculation more complicated [16]. ASDT instead fits the data with a sine curve, so within one interval the fitted data can both increase and decrease, which is consistent with how the data actually change. The period of the sine curve also settles the question of how to set the recording limit.
In the experiment, the data are remote temperature measurements acquired by SCADA. Four groups of data are selected, containing 1000, 5000, 10000 and 50000 data points respectively. Figure 4 compares the compression ratio of SDT and ASDT, Figure 5 compares their compression error, and Table 1 lists the experimental results. The figures and the table show that the compression errors of SDT and ASDT are almost the same, whereas the compression ratio of ASDT is much higher than that of SDT.
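The text does not state how compression ratio and compression error are computed. As an illustration only, the sketch below uses one common convention: the ratio is the number of original points divided by the number of stored points, and the error is the mean absolute difference between the original values and a reconstruction by linear interpolation between stored points (which suits SDT; an ASDT reconstruction would use the fitted sine curve instead). The class `CompressionMetrics`, both method names, and these definitions are assumptions, not taken from the paper.

```java
import java.util.List;

class CompressionMetrics {
    /** Assumed definition: original point count divided by stored point count. */
    static double compressionRatio(int originalCount, int storedCount) {
        return (double) originalCount / storedCount;
    }

    /**
     * Assumed definition: mean absolute error of a linear-interpolation
     * reconstruction. `kept` must be ascending and contain the first and
     * last index of the series.
     */
    static double meanAbsoluteError(double[] t, double[] v, List<Integer> kept) {
        double sum = 0.0;
        for (int k = 0; k + 1 < kept.size(); k++) {
            int a = kept.get(k);
            int b = kept.get(k + 1);
            for (int i = a; i < b; i++) {
                // Reconstruct v[i] on the straight line between stored points a and b.
                double frac = (t[i] - t[a]) / (t[b] - t[a]);
                double reconstructed = v[a] + frac * (v[b] - v[a]);
                sum += Math.abs(v[i] - reconstructed);
            }
        }
        // The last stored point reconstructs exactly and adds no error.
        return sum / v.length;
    }

    public static void main(String[] args) {
        double[] t = {0, 1, 2, 3, 4};
        double[] v = {0, 1, 4, 3, 4};              // hypothetical measurements
        List<Integer> kept = List.of(0, 4);        // hypothetical stored indices
        System.out.println(compressionRatio(v.length, kept.size())); // prints 2.5
        System.out.println(meanAbsoluteError(t, v, kept));
    }
}
```

With definitions of this kind, a higher ratio at a nearly unchanged error is what the comparison between SDT and ASDT reports.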

Conclusion
This paper proposes an improved algorithm based on revolving-door compression and evaluates it by simulation on test data with a purpose-written program. ASDT addresses problems of the traditional method, such as straight-line fitting not reflecting the actual changes in the collected data; most importantly, ASDT does not need the length of the compression interval to be set. The experiments comparing SDT and ASDT show that ASDT can greatly improve the compression ratio without a significant increase in error.

Figure 1. Compressing quantity of state

ITM Web of Conferences 11, 01009 (2017), IST2017. DOI: 10.1051/itmconf/20171101009

Figure 2. The basic steps of SDT algorithm. The first compression interval runs from a to d, and the straight line from a to d represents the data (a, b, c, d); similarly, the second compression interval starts at e, and the straight line from e to h represents the data (e, f, g, h).

Figure 4. Comparison of the compression ratio between SDT and ASDT