Optimized use of Wavelet Packet Trees for the analysis of electrical waveforms

. Wavelet packet trees represent a topic which grows in popularity when it comes to analysis of electrical waveforms. It allows for time-frequency analysis providing information on narrower ranges of frequency (as compared to the faster Discrete Wavelet Decomposition), but the computational resources are significantly greater than that involved in other types of wavelet-based analysis. In order to allow for this type of analysis to be usable in real-time applications, that is – to reduce the runtime, original algorithms were conceived and tested. In the first part of this work, previously implemented algorithms are briefly described, along with their pros and cons. Afterward, a new runtime optimization algorithm is proposed. Details on data structures, workflow, tests and study of errors are provided. This algorithm diminishes with up to 59% the runtime required by the application of the superposition theorem in order to evaluate the contribution of clustered harmonics using WPT.


Introduction
Over the years engineers and researchers realized that the conventional Fast Fourier Transform (FFT) was not suitable to process signals of complex, dynamic nature, often transient and non-stationary [1]. Among other disadvantages, FFT lacks time localization and assumes that the 2-nd half period is symmetrical to the 1-st one. To address this problem, time-frequency representations were sought and developed [1,2].
In this context the wavelet transforms (WT), which transform a function from the time domain to the time-scale domain (the scale being indirectly associated with frequency), became more and more popular. Furthermore, the WT is a reversible transform, which makes the reconstruction or evaluation of certain signal components possible. WT is particularly useful for the transaction of 2 major tasks in signals of complex (transient and/or nonstationary) nature: de-noising and feature extraction [1,2]. Special capabilities related to "low-pass" filtering were revealed by some of the authors of this work in [3].
An important fact observed when using wavelet mothers (WM) from the Daubechies family, was the overlap between the frequency bands (frequency aliasing) associated with the Discrete Wavelet Transform (DWT) decomposition of their signals [1,2]. The decompositions relying on Wavelet Packet Trees (WPT) reduce up to a certain extent this drawback [4]. WPT provides a uniform cover of the signal and thus its frequency resolution is superior to that provided by DWT [5].
When using a high-order Daubechies wavelet for signal decomposition, this effect is less intense than when using a low-order one. But longer filters mean more computational resources, the runtime becoming critical [6].
The authors obtained useful results relative to filtering properties of terminal nodes from a WPT tree with the following topology (referred from this point on as T7): 512 components in the root node, 7 levels and filters from the Daubechies family with 28 components [7]. The associated wavelet mother will be referred from this point on as ‚db14', this name being also used by the wavelet toolbox from Matlab.

Previous work -pros and cons 2.1 Using truncated filters
Some of the authors of this paper were firstly concerned with shortening the runtime required by the T7 decomposition when using truncated versions of the low-pass and high-pass filters [6]. They started from the idea that, when rounded to 4 decimal places, 6 coefficients from the 'db14' filters become 0. The number of iterations supporting the convolution products involved by the Wavelet analysis with a wavelet mother that uses filters with only 18 components (all non-zero) instead of 24 is restricted to 3/4 as compared to those involved by the use of 'db14'. The wavelet mother with shorter filters obtained from 'db14' as described above will be referred from this point on as 'db14r'.
Statistical evaluations (considering sets of 20 consecutive programs executions) were performed considering full decompositions and evaluation of power quality parameters of test signals. The mean runtime savings proved to be 20.4% [6].
Tests relative to the errors introduced by truncations were made on synthetic signals whose harmonic spectra is similar to electric signals from power applications.
The analysis of the relative percentage errors (RPE) introduced by 'db14r' when evaluating the energies of the nodes from the 7-th level (nodes referred from this point on as N7 or terminal nodes) revealed that their absolute value did not exceed 2%. Similar maximal deviations of 2% in absolute value were noticed for the weights of harmonic orders which are more significant in power applications (e.g. 5,7, 11,13, 17 and 19) [6].
The advantages provided by this runtime optimization technique are: • a constant runtime saving, irrespective to the decomposed signal; • simple implementation, not involving additional data structures and consequently not requiring more memory; • applicability to hardware devices of combinational type (it can provide energies of terminal nodes using only blocks which perform summations and products).
The main drawbacks associated to this runtime optimization technique are: • inherent errors introduced by filters' truncation (their evaluation presented above was made on restricted sets of data, but it is impossible to make an exhaustive study such as to take into account all possible combinations of harmonic weights and phase shifts); • significant runtime savings obtained when only nodes affected by low order harmonics are computed are not possible in this "pure truncation-based" implementation.

Using joint analysis relying on DWT and WPT trees
A tree segmentation in WPT subtrees was conceived and implemented, relying on the T7 terminal nodes' connection to the harmonic orders (HO) which affect their energies and also affect the energies of the vectors of details for various levels of an unbalanced DWT tree with 7 levels and relying on 'db14' [2]. Details on WPT and DWT trees implementation are provided in [6]. Deviations in the harmonic spectra involving relatively narrow ranges of harmonic orders (e.g. electromagnetic interferences -EMI) can be correctly detected and identified by means of a time-efficient joint analysis, relying on DWT and WPT trees [2]. The appropriate properties relative to adjacent harmonic ranges exhibited by the energies of DWT vectors of details allow for the identification of (combination of) levels with energy deviations. Once the combination is known, the WPT decomposition is restricted to a specific subtree. This is significantly reducing the number of nodes from the WPT tree which have to be computed, strictly to those which are affected by the harmonic spectrum change. Afterward the energy deviations of the selected WPT terminal nodes are computed in order to identify the deviations.
To an end, the clustering properties of the terminal WPT nodes with individual harmonic orders [7] are applied in order to evaluate the specific harmonic orders responsible for the modification of the harmonic spectrum. The tests on both synthetic and real signals validated the algorithm and proved runtime savings varying from 10% to 90% [2].
The main advantages provided by this technique are: • It can be successfully applied both for faults detection and EMI detection and evaluation; • It can provide maximum runtime savings of 90%. The main drawbacks associated to this technique are: • It has a limited applicability (when more than 3 levels from the DWT tree are affected, it is not applicable); • Requires additional data structures which have to be stored and this can be a problem for controllers with limited memory.

Using time-efficient WPT decompositions to analyse Electromagnetic Compatibility issues
The decompositions relying on WPT trees can be used as a reliable tool for evaluation of (un)steady signals' components characterized by high frequencies belonging to bandwidths narrower than those obtained with techniques addressing the same goal [3]. A time-frequency analysis is possible as well, as an intrinsic feature of any wavelet-based technique, the identification of frequency ranges being more accurate than that provided by DWT. The vectors hosted by all WPT tree nodes can be decomposed in an optimal way, requiring significantly smaller runtimes. This is possible by using labels for each node, which instruct the underlying decomposition algorithm to compute for each node, when possible, only the vector from one of the "children nodes" and decide that the vector from the other node is 0, or even to avoid any decomposition, the associated children nodes remaining "all 0" vectors. The algorithm relies on original functions conceived by authors, which implement the WPT decomposition by using DWT based decompositions of vectors hosted by nodes by performing either only high pass filtering, or only low pass filtering [3].
Tests on synthetic signals and respectively those on real electric waveforms acquired from the terminals of a converter used for auxiliary services from a locomotive validated the algorithm and proved its usability in analysing EMC interferences [3]. The mean runtime value computed across 14 combinations of the input parameters (cutoff frequency and accuracy) is equal to 0.0052 sec., revealing a 67.5% runtime saving in the global picture and a 77.5% runtime saving for lower cut-off frequencies [3]. The advantages provided by this technique are: • Possibility to compute only the nodes affected by harmonics lower than a "cut-off" frequency, provided as input data, considering a selectable accuracy, with high usability at signals where the Fast Fourier Transform cannot be applied; • Usability in analysing EMC interferences, which is an intensively studied topic today. The main drawbacks associated to this technique are: • Requires additional data structures which have to be stored; • There is a limited number of cut-off frequencies which can be used as input data (smaller for higher accuracies).
3 Time-efficient application of the superposition theorem to evaluate the contribution of clustered harmonics using WPT

Clusters of terminal nodes affected only by low harmonic orders
A new runtime optimization technique related to WPT decompositions/recompositions is proposed in this paper. It was conceived because statistics revealed that the most significant harmonic orders which pollute the electric waveforms from power systems are odd and lower than 41 (e.g. 5,7,11,13,17 etc). Therefore, most probably a deviation from stationarity is produced by a variation in the weight/phase shift of such harmonic order. According to the " pairing" mechanism explained in [7], the odd HOs with orders lower than 43 can be grouped in disjunct sets (clusters) with 2 or 4 members. Harmonics from such set can affect only the vectors hosted by certain N7 nodes in a sort of "exclusive" manner. In other words, for a certain set of 4 N7 nodes (CN7x), the odd harmonic orders Hx1,Hx2,Hx3 and Hx4 associated to CN7x can affect at most the vectors hosted by the N7 nodes Nx1,Nx2,Nx3 and Nx4 from CN7x and the only odd harmonic orders which can affect these nodes are Hx1,Hx2,Hx3 and Hx4. Table 1 gathers information of the sets consisting of 2 nodes whilst Tables 2...4 gathers the sets consisting of 4 nodes [7], [8].
One can notice that the maximum N7 node identifier appearing in these tables is 31. As long as the full WPT tree has 128 N7 nodes, it means that, when only HOs lower than 43 are addressed, the runtime involved by the WPT tree decompositions/ recompositions is expected to be reduced by a factor close to the value 4.
Yet, the data structures presented below will consider a number of 32 N7 nodes, because one uses binary trees structures which have an even number of N7 nodes. The weights in these tables were determined as follows [8]: a perfect sinusoid was polluted by a single harmonic H, whose energy EH is known, in this way getting a polluted signal S. S was decomposed with T7 and only a limited number of N7 nodes (from a set SH) were found to host non-zero vectors (NV), as an effect of H. The sum of all energies of NVs equals EH, so the non-zero weight W associated to H and a node N from SH means that the vector V hosted by N is affected by H and the energy of V due to H equals W x EH. Unfortunately, even for the case when paired harmonics are perfect sinusoids along the decomposed signal, the pairing mechanism cannot provide the harmonic magnitudes as solutions yielded by a linear system solving. This is happening [8] when more harmonics from a set act jointly, because the phase-shift between them influences the ratio between the paired nodes energies for identical harmonic magnitudes, as depicted in Figs. 1,2

Data structures
A first step in evaluating the parameters of the harmonics associated to a specific cluster of N7 nodes (CN7) consists in estimating the signal (SH) obtained from their superposition. This involves the applying of inverse WPT transform functions in a "bottom-up" approach all over the 7 levels from T7 considering as input values the nodes from CN7, and assuming that all the remaining N7 nodes are hosting null vectors.
Original data structures were conceived in order to reduce the runtime associated to the obtaining of SH.
Firstly a "membership matrix" M(7,4) was built in order to describe the content of sets according to the tables presented above. Each row from this matrix is associated to a set of nodes. The last 2 components of a row associated to a set consisting of only 2 members are filled with 0. For example M(5,3)=14, because the identifier of the 3-rd N7 node from the 5-th cluster is 14, whilst M(2,3)=0 because only 2 N7 nodes are clustered in the 2-nd set.
Based on M, a matrix of flags (C(32,7)) was deduced, in order to obtain a fast retrieve of the N7 nodes belonging to a given CN7. If the N7 node "y" belongs to the CN7 "x", then C(y,x)=1. Otherwise, it is 0.
C(32,7) will be afterward used to determine, for each CN7: -a minimal recomposition subtree (made of the nodes to be computed for each level) ; -a minimal number of operations related to recomposition for every node from each level L in this subtree, based on the labels of the nodes from the level L+1 which are used to compute it, as described below.     F6(16,7) was built in order to allow for the identification of the nodes from the 6-th level (nodes referred from this point on N6) which are associated to each subtree and the fastest modality to compute them.
An element from F6(x,y) is used for labeling a N6 node "x" with respect to the CN7 set "y" as follows: • the value 0 means that x does not belong to the subtree associated to y (both C(2*x-1,y) and C(2*x,y) are 0) and therefore this node will be set to 0 and should not be computed (all runtime associated to this recomposition will be saved); • the value 1 means that a "missing left member" recomposition should be done (C(2*x-1,y)=0 and C(2*x,y) = 1)). The original recomposition function (called i_dwt_l()) used for this case can save up to 50% of the runtime associated to this recomposition, as long as the N6 left node (2*x-1) from the 7-th level involved in the recomposition is 0; • the value 2 means that a "missing right member" recomposition should be done (C(2*x-1,y)=1 and C(2*x,y) = 0)). The original recomposition function (called i_dwt_r()) used for this case can save up to 50 % of the runtime associated to this recomposition, similar to the particular case explained above as long as the right N6 node (2*x) from the 7-th level involved in the recomposition is 0; • the value 3 means that a regular recomposition should be done, involving both N6 nodes associated to the node x (but no such values were found!). Table 6 gathers few rows from the matrix F6.  In the same idea, a matrix of labels F5(8,7) was built in order to allow for the identification of the nodes from the 5-th level (nodes denoted by N5) which are associated to each subtree and the fastest modality to compute them.
An element from F5(x,y) is used for labeling a N5 node (x) with respect to the set y as follows: • the value 0 means that x does not belong to the subtree (both F6(2*x-1,y) and F6(2*x,y) are 0) and therefore this node will not be computed; • the value 1 means that a "missing left member" recomposition should be done (F6(2*x-1,y)=0 and F6(2*x,y) >0)). i_dwt_l() will be used, saving runtime; • the value 2 means that a "missing right member" recomposition should be done (F6(2*x-1,y)=1 and F6(2*x,y) = 0)). i_dwt_r() will be used, saving runtime; • the value 3 means that a regular recomposition should be done, involving both N6 nodes associated to the node x.    The elements of the matrices F2...F4 are gathered by Table 8.  Id of N2 node 1 1 1 3 1 3 3 1

Metrics
Synthetic data were used to test the runtime related efficiency of the algorithm described in this section. In all of them, harmonics are superposed over a perfect sinusoid. The harmonic weights were chosen such as to have spectra similar to those for electric waveforms in power applications whilst the phase-shifts were generated in a random manner. Sets of 50 consecutive runs (decomposition followed by recomposition) were performed considering 3 scenarios. In the 1-st one, one used only the matrix C. In the 2-nd one, C and F6 were used whilst in the 3-rd one, all matrices were used to reduce the runtime. The computed mean runtimes are gathered by  The analysis of data from Table 9 reveals that the worst mean runtime is associated to the scenario when only the matrix C is used. The use of C cannot be avoided, because it contains the minimal information on the nodes from cluster. Fig. 4 is dedicated to the other 3 sets. 3 figures can be identified for each set: left -the perfect sinusoid overlapped with the signal composed from the superposition of the harmonics associated to the associated CN7; middle -generated and harmonic content reconstructed with WPT and right -difference between the generated and reconstructed harmonic content.

Fig.3 is dedicated to the first 4 sets (clusters) of nodes and
As expected, the smallest errors were noticed for the CN7s with only 2 N7 nodes, characterized by almost perfect filtering properties (weights close to 1). On the other hand, higher errors are noticed for clusters with 4 nodes and weaker filtering properties. The magnitude of these errors is not due to the runtime optimization algorithm. In order to deduce this, we computed the energies of all nodes when only harmonics clustered in a set were present. We noticed the presence of spectral leakages (N7 nodes not belonging to the associated cluster might be slightly affected for particular combinations of magnitudes and phase-shifts). For other wavelet mothers (e.g. Daubechies with filter of length 40), characterized by better filtering properties, the errors are smaller whilst the runtime optimization algorithm can be used in a similar way, but one has to consider that the runtimes is direct proportional to the length of filter and therefore they are higher than those associated to 'db14'.

Conclusions
Various techniques can be applied to WPT decompositions and/or recompositions in order to diminish the required runtime.
The simplest of all, which does not require additional data structures and consequently no additional memory is needed is the one relying on truncated filters. This technique yields a constant runtime saving, equal to 20.4% (which is actually small as compared to the other techniques presented in this paper), irrespective to the decomposed signal. A major advantage of this technique is related to its applicability to hardware devices of combinational type (it can compute the energies of N7 nodes using only blocks which perform summations and products). Unfortunately in this case inherent errors can be introduced by filters' truncation and it is impossible to make an exhaustive study on this topic because the harmonic weights   . Results of decompositions/recompositions yielded by the optimized algorithm for synthetic signals. From top to bottom, 5-th CN7, 6-th CN7 and 7-th CN7. From left to right : perfect sinusoid overlapped with the signal composed from the superposition of the harmonics associated to the cluster, middle -generated and harmonic content reconstructed with WPT, right -difference between the generated and reconstructed harmonic content. and phase shifts can take any real value. Moreover, significant runtime savings provided by assuming as 0 the N7 nodes affected only by high order harmonics are not possible in this "pure truncation-based" implementation.
The 2-nd approached technique is related to a joint analysis relying on DWT and WPT trees. It can be successfully applied both for faults detection and EMI detection and evaluation, being able to provide significant runtime savings within the range [10. . . 90]%.
This technique can be applied only when at most 3 levels from the DWT tree are affected and requires additional data structures.
The 3-rd runtime optimization technique approached in this paper involves the use of labels for each WPT node and functions which can perform only half of the operation required by per-node decomposition/ recomposition. A sort of "low-pass" filtering (which is correct even for cases when Fast Fourier Transform cannot be applied) is optimized . with input data under the form of configurable cut-off frequencies and accuracy. The mean runtime saving is ITM Web of Conferences 49, 01006 (2022) Fourth ICAMNM 2022 https://doi.org/10.1051/itmconf/20224901006 equal to 67.5% and can reach a value of 77.5% for lower cut-off frequencies. For example this technique can be used intensively in analysing EMC interferences. Unfortunately, additional data structures are needed and there is a limited number of cut-off frequencies which can be used as input data.
The runtime optimization technique introduced in this paper is dedicated to the application of the superposition theorem to evaluate the contribution of clustered harmonics using WPT. 7 sets of clustered even harmonics with orders lower than 43 were studied. Original data structures with labelling role were conceived and tested.
A significant diminishing of the mean runtime (with up to 45%) can be noticed when using both the matrix of flags for the terminal nodes and the matrix with labels for the nodes in the 6-th level. A maximum runtime diminishing (with up to 59%) is obtained for the fully optimized algorithm. These values should be considered when choosing one of the algorithms if the available memory is restricted.
One should consider the usability scenarios of the runtime optimization techniques presented in this paper when using the last-hour practical applications of wavelets , as those presented in [9] and [10].