Integrated extreme gradient boost with C4.5 classifier for high level synthesis in very large scale integration circuits



Introduction
An HLS algorithm synthesizes a design for different technologies using different module libraries. In [1], GenFin included non-dominated logic circuits for which timing, leakage power and dynamic power yields were established. However, runtime adaptability (RA) under power-process-voltage-temperature-delay (PPVTD) variations was not considered. A multi-objective mixed-integer linear programming (MOMILP) model was designed in [2] to resolve complex issues. The MOMILP model covered the scheduling, allocation and binding algorithms. It integrated decisions about multiplexers, which are necessary components of an integrated circuit, and was solved through the augmented ε-constraint method. However, the error rate (ER) was higher. To overcome this issue, a novel method termed the C4.5-XGBCHLS method is designed.
In [3], a high-level data-flow optimization and synthesis method was presented. The optimization used functional decomposition of multivariate polynomials to acquire good building blocks, and vanishing polynomials to add or delete redundancy. However, the optimal functional unit (FU) was not selected for providing RA. Based on motion energy with VLSI architecture and inexpensive embedded systems, a bio-inspired visual motion assessment algorithm was designed in [4]. Although the computational cost was reduced, the performance of VLSI circuits was not improved. An analytical optimization model based on Integer Linear Programming was introduced in [5] for the prediction of hardware cost. However, the time consumption was higher.
In [6], Power Shut-Off (PSO) was combined with a model-based hardware flow to obtain an automated Low Power-HLS (LP-HLS) methodology. The methodology was introduced to reduce the design effort and attain target system implementations. A Design Space Exploration (DSE) methodology was introduced in [7]. However, its computational complexity remained unaddressed. In [8], dynamic MCML circuits integrated the benefits of MCML circuits with those of the dynamic logic families to achieve high performance at a low supply voltage with low power dissipation.
High level synthesis (HLS) methodologies were introduced in [9] with two approaches to lessen complexity and time consumption. However, the ER was higher. A set of new methods was introduced in [10] to enhance the HLS solutions and the conventional design metrics. In [11], a transparent HLS approach with the scheduling and binding process was introduced. The issues identified in the above existing works include lower functional unit selection accuracy (FUSA), high computational complexity, higher ER, higher circuit adaptability time (CAT) and higher computational cost. To resolve these issues, a novel method termed the C4.5-XGBCHLS method is designed.
The major contributions of the C4.5-XGBCHLS method are as follows:

• The C4.5-XGBCHLS method is introduced to improve the performance of VLSI circuits with better RA and a lower error rate.
• The compilation process converts the language description of the VLSI circuit into an intermediate representation (IR) in the form of a control/data flow graph (CDFG) through code optimization.
• The C4.5-XGBCHLS method applies the XGBoost classifier to classify the error causing FU with minimal error rate. The XGBoost classifier uses the C4.5 decision tree as the base learner for finding the error causing FU in VLSI circuits. All the C4.5 decision trees are combined to form the strong classification result. As a novelty, the hinge loss function is used to measure the deviation between the actual output and the predicted outcome, which improves the classification accuracy and reduces the error rate.
• A suitable functional unit is allocated, scheduled and bound from the functional library to replace error causing functional units based on the design objectives and PPVTD constraints.
• RTL generation is carried out for efficient VLSI design with better RA and minimal time consumption.
This article is organized as follows. Related works are described in Section 2. Section 3 gives a brief description of the C4.5-XGBCHLS method. Simulation settings and results analysis of the C4.5-XGBCHLS method are presented in Section 4. The conclusion is given in Section 5.
High-level synthesis (HLS) is used to design high-performance and energy-efficient heterogeneous systems. HLS assists in designing field-programmable gate array circuits, where hardware implementations are refined and replaced in the target device. However, power-process-voltage-temperature-delay (PPVTD) variation in VLSI circuits causes many problems and reduces performance. To address these problems, the C4.5 with eXtreme Gradient Boosting Classification based High Level Synthesis (C4.5-XGBCHLS) method is designed to afford better runtime adaptability (RA) with minimal error rate. VLSI circuits are designed using the behavioral input, and results are measured at running condition. When the results of a VLSI circuit degrade, the language description of the circuit is taken as input. Then, the compilation process converts the high-level specification into an Intermediate Representation (IR) in the form of a control/data flow graph (CDFG). The CDFG captures the data and control dependencies among operations. The eXtreme Gradient Boosting (XGBoost) classifier is used in the C4.5-XGBCHLS method to classify the error causing functional unit (FU) with minimal error rate. The XGBoost classifier uses the C4.5 decision tree as base classifier to enhance the classification of error causing FUs in VLSI circuits. After that, an FU from the functional library is allocated in place of the error causing FU based on the design objectives and PPVTD variations. Finally, operation scheduling and binding are executed for Register Transfer Level (RTL) generation to form VLSI circuits with improved RA. The simulation results show that the C4.5-XGBCHLS method enhances functional unit selection accuracy (FUSA) with minimal error rate (ER) and circuit adaptability time (CAT).

Related Works
A hardware implementation of the Jacobi algorithm was introduced in [12]. However, the CAT was not reduced using the Jacobi algorithm. In [13], a parametric yield-driven resource binding algorithm was designed to categorize the power and delay distributions and improve power yield. However, it failed to provide better RA in VLSI circuits under delay and power variations.
Auto Pilot was introduced in [14] for unbiased performance, usability and productivity of HLS. Auto Pilot was applied to embedded benchmark kernels. To demonstrate the suitability of HLS for real-world applications, stereo matching techniques were exploited. However, the ER was not reduced using Auto Pilot. An integrated approach was introduced in [15] for high-level verification with a formal HLS tool. However, the CAT was not reduced using the integrated approach.
In [16], a bacterial foraging optimization algorithm (BFOA) was designed for design space exploration (DSE) of the datapath in HLS. However, the computational cost was not reduced using BFOA.
Profit enhancement was carried out in [17], but the ER was not lessened. A new approach was introduced in [18] to create multi-cycle transient and multiple transient fault resilient designs in HLS via dual modular redundancy. However, the computational complexity remained unaddressed.
In [19], a new methodology was designed for the synthesis of nested, uneven loops via outer vectorization. A port assignment algorithm was introduced in [20] to identify the valid solution. However, the FUSA was not improved using the port assignment algorithm.
For RDR architectures, a thermal-aware HLS algorithm was presented in [21]. The algorithm balanced the energy consumption between islands and lessened the peak temperature. However, its computational complexity was higher.

Methodology
HLS is an automated design procedure that interprets an algorithmic description of a desired behavior and creates digital hardware that implements it. The code is analyzed, constrained and scheduled to produce a register-transfer level (RTL) design in a hardware description language (HDL). The main objective of HLS is to allow hardware designers to build and verify hardware by giving them better control over the optimization of the design architecture and by letting them describe the design at a higher level of abstraction. Conventional HLS methods fail to afford RA under PPVTD variations.
To improve the performance of VLSI circuits with better RA, the C4.5-XGBCHLS method is introduced. Its main aim is to identify the error causing FU so as to offer better RA with lower ER. The architectural diagram of the C4.5-XGBCHLS method is portrayed in Fig. 1. Initially, VLSI circuits are taken as input. Then, the language description is given for the input VLSI circuits. High-level transformation converts one behavioral description into another behavioral description. CDFG analysis identifies the series of operations in the DFG that implement the specified behavior. Classification is performed to separate the error causing FU from the other FUs. After classification, scheduling, allocation and binding are the three essential processes of HLS. Scheduling defines the states of the finite-state machine, that is, the control steps. Each control step comprises one small section carried out in a single clock cycle in hardware.

Fig. 1. Architectural diagram of C4.5-XGBCHLS method
The allocation and binding processes in HLS map the instructions and variables to the hardware components, multiplexers and registers of the data path.

C4.5 with eXtreme Gradient Boosting Classification based High Level Synthesis (C4.5-XGBCHLS) Method
High level synthesis in the C4.5-XGBCHLS method translates the behavioral specification of a process into a structural description, which is provided as a netlist at RTL. Synthesis denotes the process of transforming a digital system from a behavioral specification into an implementation structure. The input to the HLS process is an algorithmic-level specification, which gives the required mapping from input sequences to output sequences. From the input specification, the synthesis system creates the data path description, that is, the registers, FUs, multiplexers and buses. In HLS, the essential steps of the C4.5-XGBCHLS method are behavioral analysis, classification, design-style selection, operation scheduling, data-path allocation and module binding.

Compilation
In the C4.5-XGBCHLS method, compilation is a one-to-one transformation of the initial specification into a new internal representation of the behavior for synthesis. Graphs are employed for the internal representation. Code optimization is the initial process carried out by the compiler, through which the compiler improves the runtime quality of the program. The improvement includes changing the instruction sequence, removing instructions and modifying instructions while preserving the behavior of the original code. These changes are termed code transformations. Code optimization comprises four components, namely discovering transformation opportunities, guaranteeing that a transformation is safe, ensuring that its application is profitable, and rewriting the code. Code optimization lessens the runtime by avoiding unnecessary computations. It also minimizes the number of circuit modules, giving lower area, lower power and higher frequency. Compiler optimizations include constant propagation, dead-code elimination, common sub-expression elimination, global flow analysis, inline expansion of subprograms and loop unrolling.
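Two of the optimizations listed above, constant propagation (with folding) and dead-code elimination, can be illustrated on straight-line code. The following is a minimal sketch, not the compiler used in this work: the three-address tuple format `(dest, op, a, b)`, the function names and the sample program are all assumptions made for illustration.

```python
import operator

# Hypothetical three-address IR: (dest, op, a, b); operands are ints or variable names.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def propagate_and_fold(code):
    """Constant propagation and folding over straight-line code."""
    known = {}   # variable name -> constant value discovered so far
    out = []
    for dest, op, a, b in code:
        a = known.get(a, a)   # substitute operands whose value is already known
        b = known.get(b, b)
        if op == "const":
            known[dest] = a
        elif isinstance(a, int) and isinstance(b, int):
            known[dest] = OPS[op](a, b)          # fold the operation at compile time
            op, a, b = "const", known[dest], None
        else:
            known.pop(dest, None)                # result depends on a runtime value
        out.append((dest, op, a, b))
    return out

def eliminate_dead_code(code, live_out):
    """Drop instructions whose result is never needed for the live outputs."""
    live = set(live_out)
    kept = []
    for dest, op, a, b in reversed(code):
        if dest in live:
            live.discard(dest)
            live.update(x for x in (a, b) if isinstance(x, str))
            kept.append((dest, op, a, b))
    return kept[::-1]

program = [
    ("a", "const", 2, None),
    ("b", "const", 3, None),
    ("c", "+", "a", "b"),   # both operands constant, so this folds
    ("d", "*", "c", "x"),   # x is a runtime input
    ("e", "-", "a", "x"),   # e is never used
]
optimized = eliminate_dead_code(propagate_and_fold(program), live_out={"d"})
print(optimized)
```

In this sketch five instructions reduce to a single multiplication, which in a hardware setting corresponds to fewer operators to allocate.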

Control/Data Flow Graph Analysis
Control/data-flow analysis gathers information about the possible values computed at different points in a computer program. The CDFG identifies the parts of the program where a particular value is assigned to a variable. The gathered information is used when optimizing the program. The HLS process begins by examining the data dependencies among the different steps of the algorithm. This analysis results in the Data Flow Graph (DFG) description.

Fig. 2 Data Flow description
In Fig. 2, the data flow description has four inputs and one output. Data-flow analysis sets up data-flow equations for each node of the control flow graph and solves them by repeatedly calculating the output from the input at every node until the system stabilizes. Every node of the DFG symbolizes an operation described in the C++ code, here the add operator. The connections between nodes indicate the data dependencies and the order of operations. The CDFG denotes the design specification at a different level than the final hardware implementation. Nodes symbolize hardware operators. The CDFG does not include any specification of the multiplexers or of the control logic needed for execution. Edges in the CDFG signify hardware values, which become register values depending on the schedule.
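The construction of a data flow graph from a sequence of operations can be sketched as follows. This is an illustrative example only: the variable names, the statement format and the three-addition program (four inputs, one final result) are assumptions chosen to resemble the kind of description discussed above, not the content of Fig. 2.

```python
def build_dfg(statements):
    """Return the primary inputs and the data-dependency edges of a DFG.

    Each statement is (dest, op, src1, src2). An edge (p, c, var) means that
    node c consumes the value `var` produced by node p.
    """
    producer = {}   # variable -> index of the node that last wrote it
    edges = []
    inputs = []
    for node, (dest, _op, *srcs) in enumerate(statements):
        for var in srcs:
            if var in producer:
                edges.append((producer[var], node, var))
            elif var not in inputs:
                inputs.append(var)     # never written before: a primary input
        producer[dest] = node
    return inputs, edges

# Hypothetical behavior: t3 = (a + b) + (c + d)
statements = [
    ("t1", "+", "a", "b"),
    ("t2", "+", "c", "d"),
    ("t3", "+", "t1", "t2"),
]
inputs, edges = build_dfg(statements)
print(inputs)
print(edges)
```

The two edges into the last node show that it must wait for both earlier additions, while those two additions have no mutual dependency and may run concurrently.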

eXtreme Gradient Boosting (XGBoost) Classification
Gradient boosting addresses regression and classification problems with a prediction model built from decision trees. It creates the model in a stage-wise manner and generalizes it by optimizing an arbitrary differentiable loss function. The XGBoost classifier uses the gradient boosting concept while controlling over-fitting.

Fig. 3. Flow process of eXtreme Gradient Boosting (XGBoost) Classifier
In VLSI circuits, the C4.5 with XGBoost classifier separates the error causing FU from the other FUs with minimal ER to provide RA. The flow process of the C4.5 with XGBoost classification, in which the XGBoost classifier uses C4.5 decision trees to identify the error causing FU, is described in Fig. 3. Consider the set of functional units collected from the input VLSI circuits, as given in (1). The XGBoost classifier model exploits a number of weak learners and combines them to produce a strong classification result. It is trained on a set of input-output pairs, where each input is a functional unit and each output is the corresponding classifier output. In the C4.5 decision tree, the data are divided into a training set and a test set for predicting the class labels. The training set is placed at the root. Each node in the tree acts as a test on some attribute, and every edge descending from the node corresponds to one of the possible answers to that test. This process is repeated for each subtree rooted at a new node. Predictive accuracy depends heavily on the choice of test and training data, and the C4.5 decision tree is used to provide better predictive accuracy. As shown in Fig. 3, the C4.5 decision tree is used as the base learner of the ensemble boosting classifier. C4.5 builds decision trees from the training set samples using information entropy. At each decision node of the tree, C4.5 picks the attribute of the data that partitions the training set samples into subsets to produce the classification outcomes. The splitting criterion depends on the normalized information gain. The information gain in (2) is obtained from the entropy of the FU set at the node and the entropy of the subsets produced by splitting that set.
The FU entropy in (3) is computed from the probability of a functional unit belonging to each class. When all functional units belong to the same class, a leaf node is created for the decision tree to select that class. Otherwise, the process is repeated by splitting the nodes and adding the new nodes as children. In this way, classification results are produced by every C4.5 decision tree, and each decision tree result is considered a base (weak) classifier. However, a weak classifier alone does not classify accurately with minimum loss. To enhance classification accuracy, the XGBoost classifier computes the loss function for the C4.5 decision trees to create a strong classifier, whose output is the summation of the individual C4.5 decision trees. The pseudo residuals in (7) are computed for every functional unit, and the base learner is then fitted to the pseudo residuals. Steepest descent is applied to the pseudo residual values to identify the minimum of the loss function of the base classifier. In (8), the steepest descent step size is obtained through the arg min function, which finds the minimum error of the weak learner, with a coefficient that identifies the local minimum of the function. The predictive model is then updated as in (9) to categorize each FU as an error causing FU or a normal FU, which yields the classification results of the functional units. The resulting strong classifier output categorizes FUs for better RA. In this way, the ensemble of weak classifiers is combined into a strong classifier to discover the error causing FU.
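In standard textbook notation, the quantities described above can be written as follows. The symbols are assumptions chosen for illustration and need not match those of equations (2), (3) and (7) to (9): $S$ is the set of functional units at a node, $S_v$ the subset obtained for value $v$ of the splitting attribute $A$, $p(c)$ the probability of class $c$, $L$ the loss function, $F_m$ the model after $m$ boosting rounds, $T_m$ the $m$-th base tree, $r_{im}$ the pseudo residual of the $i$-th functional unit and $\gamma_m$ the step size.

```latex
\begin{align*}
IG(S, A) &= h(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|}\, h(S_v) \\
h(S) &= -\sum_{c} p(c) \log_2 p(c) \\
r_{im} &= -\left[ \frac{\partial L\bigl(y_i, F(x_i)\bigr)}{\partial F(x_i)} \right]_{F = F_{m-1}} \\
\gamma_m &= \arg\min_{\gamma} \sum_{i=1}^{n} L\bigl(y_i,\; F_{m-1}(x_i) + \gamma\, T_m(x_i)\bigr) \\
F_m(x) &= F_{m-1}(x) + \gamma_m\, T_m(x)
\end{align*}
```

C4.5 normalizes the information gain by the split information of the attribute, which gives the gain ratio used as the splitting criterion.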
Input: Number of functional units
Output: Error causing functional unit
Step 1: Begin
Step 2: For each training data
Step 3: Construct a C4.5 decision tree using information gain and entropy
Step 4: Classify the FU based on the information gain
Step 5: For each iteration
Step 6: Compute pseudo residuals
Step 7: Fit base learner to the pseudo residuals
Step 8: Find steepest descent step-size
Step 9: Update the model
Step 10: Obtain strong classification results
Step 11: End for
Step 12: End for
Step 13: End

Algorithm 1 Extreme Gradient Boosting Classifier Algorithm
Algorithm 1 illustrates the XGBoost classifier used to separate the error causing FU from the other FUs. Initially, the FUs of the input VLSI circuits are taken as input. Each decision tree classifies an FU as an error causing FU or a normal unit by using information gain and entropy. After that, the XGBoost classifier combines all C4.5 decision trees to form a strong classifier. The pseudo residuals are calculated, and the base learner is then fitted to them. The best steepest descent step size is calculated to attain the strong classifier result, which reduces the ER. Consequently, the C4.5-XGBCHLS method effectively identifies the error causing FU.
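The boosting loop of Algorithm 1 (pseudo residuals, base-learner fit, step-size search, model update) can be sketched in plain Python. This is a simplified illustration under stated assumptions rather than the implementation evaluated here: depth-1 regression stumps stand in for the C4.5 trees, squared-error loss replaces the hinge loss so that the pseudo residuals are simply `y - F(x)`, and the single feature and labels are hypothetical data (1 marks an error causing FU).

```python
def fit_stump(xs, targets):
    """Depth-1 regression tree: choose the threshold with least squared error.
    Assumes at least two distinct feature values."""
    best = None
    for t in sorted(set(xs))[:-1]:          # thresholds leaving both sides non-empty
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def boost(xs, ys, rounds=5):
    base = sum(ys) / len(ys)                # initial constant model
    preds = [base] * len(xs)
    ensemble = []
    for _ in range(rounds):
        # Pseudo residuals: negative gradient of squared-error loss.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)    # fit base learner to the residuals
        outputs = [stump(x) for x in xs]
        # Closed-form line search for the step size under squared error.
        denom = sum(h * h for h in outputs)
        gamma = sum(r * h for r, h in zip(residuals, outputs)) / denom if denom else 0.0
        ensemble.append((gamma, stump))
        preds = [p + gamma * h for p, h in zip(preds, outputs)]   # model update
    return lambda x: base + sum(g * s(x) for g, s in ensemble)

# Hypothetical feature per FU (e.g. a normalized delay deviation) and labels.
xs = [0.1, 0.2, 0.3, 0.4, 0.8, 0.9, 1.1, 1.2]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
model = boost(xs, ys)

def is_error_causing(x):
    return model(x) >= 0.5

print(is_error_causing(1.0), is_error_causing(0.25))
```

With a different loss function only the residual and line-search lines change; the stage-wise structure of the loop stays the same.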

Operation Scheduling
HLS introduces the notion of time into the design during the process called scheduling. Operation scheduling is the allocation of each operation to a time slot, that is, a control step. The input to the task comprises the CDFG together with the hardware resources and their limitations. The schedule is created so that the data and control dependencies are respected and the performance constraints are met. Scheduling takes the operations described in the CDFG and decides when each one is performed. Operations allocated to the same time slot execute concurrently, so scheduling affects the degree of concurrency. The maximum number of concurrent operations in a schedule is a lower bound on the hardware resources required. Different schedules lead to different implementation costs, so scheduling plays a crucial part in HLS.
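A resource-constrained schedule of this kind can be sketched with a simple list scheduler. The example below is illustrative only: the operation names, the dependency graph and the assumption of a single pool of `max_units` identical functional units are hypothetical, and the input order of the operations serves as the priority.

```python
def list_schedule(deps, max_units):
    """Assign each operation to the earliest control step that respects its
    dependencies, with at most `max_units` operations per step.
    `deps` maps an operation to the operations it depends on (a DAG is assumed)."""
    step_of = {}
    remaining = list(deps)
    step = 0
    while remaining:
        # An operation is ready once all its predecessors finished in earlier steps.
        ready = [op for op in remaining
                 if all(step_of.get(p, step) < step for p in deps[op])]
        for op in ready[:max_units]:
            step_of[op] = step
            remaining.remove(op)
        step += 1
    return step_of

deps = {
    "add1": [],
    "add2": [],
    "add3": [],
    "mul1": ["add1", "add2"],
    "add4": ["mul1", "add3"],
}
schedule = list_schedule(deps, max_units=2)
print(schedule)
```

With two units the three independent additions cannot all run in step 0, so one is deferred. Raising `max_units` to 3 would start all three in the first step at the cost of an extra functional unit.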

Resource Allocation
Resource allocation deals with which resources are used in the physical implementation. Resources include registers, memory units, the different FUs and communication channels. The main objective is to distribute the resources so that the additional design criteria are satisfied. Allocation and binding perform the selection and assignment of hardware resources for the VLSI design to offer better RA. Allocation identifies the type and number of hardware resources for the design. As shown in Fig. 5, the mapping of every operation onto a hardware resource during the scheduling process is called resource allocation. The resource corresponds to the physical implementation of a hardware operator. The implementation is annotated in the scheduling process with timing and area information. An operator may have several hardware implementations with different area/delay/latency trade-offs. Resources are selected from a pre-characterized library that includes sufficient data points to cover a broad range of bit widths and clock frequencies. Designers can control resource allocation, for example to include pipeline registers or to limit the available resources.

Binding
Binding assigns a hardware resource instance to a particular data path node. Data path operations can share the same hardware resource if they are not performed simultaneously. For example, an adder can be shared between two additions if they are not performed in the same clock cycle. Likewise, a register can store the values of two variables when the lifetimes of the two variables do not overlap.
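The sharing rule above can be sketched as a greedy binding pass over an already scheduled design. The schedule, the operation types and the instance naming below are hypothetical, and the greedy strategy is one simple choice among many binding algorithms.

```python
def bind(schedule, op_type):
    """Bind each operation to a functional-unit instance of its type such that
    no instance is used by two operations in the same control step."""
    occupied = {}   # (type, index) -> set of control steps already taken
    binding = {}
    for op in sorted(schedule, key=schedule.get):
        kind, step = op_type[op], schedule[op]
        index = 0
        while step in occupied.get((kind, index), set()):
            index += 1                      # instance busy in this step, try the next
        occupied.setdefault((kind, index), set()).add(step)
        binding[op] = f"{kind}{index}"
    return binding

# Hypothetical schedule: operation -> control step
schedule = {"add1": 0, "add2": 0, "add3": 1, "mul1": 1, "add4": 2}
op_type = {"add1": "adder", "add2": "adder", "add3": "adder",
           "add4": "adder", "mul1": "mul"}
binding = bind(schedule, op_type)
print(binding)
```

Four additions are served by two adders here because only the two additions in step 0 overlap in time.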

Synthesizable RTL
Finally, the RTL architecture is synthesized using all the design decisions. Control generation synthesizes the controller that creates the appropriate control signals for the given schedule and resource binding. As described in Algorithm 2, the VLSI circuit is taken as input. PPVTD variations are monitored in the VLSI circuits for HLS. Then, the language description is compiled to form the CDFG. After that, the classification process separates the functional units that cause errors due to PPVTD variation from the other functional units. The allocation process selects FUs that meet the design objectives. Then, the scheduling and binding processes provide improved RA. The RTL is synthesized to form an efficient VLSI design with RA.

The C4.5-XGBCHLS method is implemented in MATLAB Simulink on a 3.4 GHz Intel Core i3 processor with 4 GB RAM and the Windows 7 platform to offer RA via HLS in VLSI circuits. The experiments are performed with ISCAS-89 benchmark circuits. The ISCAS-89 benchmark circuit comprises four inputs, one output, three D-type flip-flops, two inverters and eight gates (one AND + one NAND + two OR + four NOR). The C4.5-XGBCHLS method is designed for finding the error causing FU and is compared with [1] and [2]. It is evaluated with the metrics ER, FUSA and CAT. In the proposed HLS algorithm, the PPVTD measurement/estimation scenarios include the C4.5 with XGBoost classifier. To conduct the experiments, the PPVTD variation settings comprise gate length, oxide thickness, fin thickness and fin height. The VLSI circuits are taken as input, and the language description is obtained from them. The compilation process converts the high-level specification into an IR in the form of a control/data flow graph (CDFG). The CDFG captures the data and control dependencies between operations. The error causing FU is determined by the classification process, which minimizes the error rate. The XGBoost classifier uses the C4.5 decision tree as base classifier to improve the classification of the error causing FU. Next, scheduling, allocation and binding are performed to choose the FU depending on the design objectives and PPVTD variations. Lastly, the efficient VLSI design is created through RTL generation.

Algorithm 2 C4.5 with eXtreme Gradient Boosting Classification based High Level Synthesis Algorithm

Error Rate (ER)
ER is measured as the difference between the predicted value and the obtained value when offering RA under PPVTD variation in VLSI circuits. It is expressed as a percentage (%). The C4.5-XGBCHLS method attains a lower ER than the existing [1] and [2]. This is because the C4.5 with XGBoost classifier accurately discovers the error causing FU in the VLSI circuits, offering improved RA during PPVTD variation with lower ER.
Ten different benchmark VLSI circuits are taken for experimentation. The C4.5 classifier separates the error causing FU from the other functional units in the VLSI circuits. After that, the XGBoost classifier boosts the performance of the C4.5 classifier. The C4.5-XGBCHLS method lessens the ER by 59% and 33% compared with the existing [1] and [2], respectively.

Functional Unit Selection Accuracy (FUSA)
FUSA is defined as the ratio of the number of FUs that are accurately selected based on PPVTD variation for providing better RA to the total number of FUs. It is computed as a percentage (%) and given by,
$FUSA = \frac{\text{number of FUs accurately selected}}{\text{total number of FUs}} \times 100$ (11)
From (11), the FUSA is determined. The higher the FUSA, the more efficient the method is said to be. … [2], 84 functional units and 87 functional units are accurately classified, and the FUSA is 95%, respectively. For each method, ten different results are observed. The performance of the proposed C4.5-XGBCHLS method is better than that of the other existing methods. The XGBoost classifier is used in the C4.5-XGBCHLS method to classify the error causing FU and so improve RA in VLSI circuits during PPVTD variation. After the classification process, an appropriate FU is assigned to offer improved RA.

Circuit Adaptability Time (CAT)
CAT is defined as the difference between the starting time and the ending time of the circuit adapting to PPVTD variation in VLSI circuits, that is, the total time taken to provide circuit adaptability. It is measured in milliseconds (ms).
$CAT = T_{\text{end}} - T_{\text{start}}$ (12)
From (12), CAT is computed. The lower the CAT, the more effective the method is said to be.

Table 2. Circuit adaptability time (ms)
Benchmark circuit    GenFin Technique [1]    MOMILP Model [2]    C4.5-XGBCHLS Method
s38584               41                      30                  15
s38417               44                      33                  18
s35932               42                      30                  15
s15850               47                      36                  20
s9234                51                      40                  22
s5378                44                      34                  18
s1494                46                      37                  20
s1488                43                      31                  17
s1423                40                      28                  16
s1238                47                      39

Table 2 reports the CAT for various benchmark circuits. The CAT of the C4.5-XGBCHLS method is compared with two existing methods, namely the GenFin technique [1] and the MOMILP model [2], for HLS in VLSI circuits. From the table, the C4.5-XGBCHLS method takes 18 ms for circuit adaptability on the s38417 benchmark circuit, while the GenFin technique [1] and the MOMILP model [2] take 44 ms and 33 ms, respectively. The CAT varies across benchmark circuits.
The CAT results for the different benchmark circuits are portrayed in Fig. 8.
Fig. 8. Comparison of Circuit Adaptability Time
From Fig. 8, the CAT of the C4.5-XGBCHLS method is lower than that of the existing [1] and [2] owing to the use of the XGBoost classifier. The designed classifier classifies the error causing FU accurately with minimal ER. After the FUs are categorized, a suitable FU is allocated and scheduled in place of each error causing FU. Finally, the FU is bound in the VLSI circuit to provide better RA in less time.
Various VLSI circuits are considered for HLS with improved RA under PPVTD variations. For each input benchmark circuit, the CAT differs among the three techniques. The XGBoost classifier classifies the error causing FU, and a suitable FU is scheduled for circuit adaptability with less time consumption. The C4.5-XGBCHLS method reduces the CAT by 59% and 46% compared with the existing [1] and [2], respectively.
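The reported reductions can be cross-checked against the CAT values of Table 2. The sketch below aggregates total CAT over the nine benchmark circuits for which all three values are available (s1238 is left out because its third value is not available). Aggregating by totals is an assumption, since the averaging procedure is not stated, so the figures are only expected to agree approximately with the reported 59% and 46%.

```python
# CAT in ms per benchmark: (GenFin [1], MOMILP [2], C4.5-XGBCHLS), from Table 2.
cat_ms = {
    "s38584": (41, 30, 15),
    "s38417": (44, 33, 18),
    "s35932": (42, 30, 15),
    "s15850": (47, 36, 20),
    "s9234":  (51, 40, 22),
    "s5378":  (44, 34, 18),
    "s1494":  (46, 37, 20),
    "s1488":  (43, 31, 17),
    "s1423":  (40, 28, 16),
}

# Column totals over the nine circuits.
genfin, momilp, proposed = (sum(col) for col in zip(*cat_ms.values()))

reduction_vs_genfin = 100 * (genfin - proposed) / genfin
reduction_vs_momilp = 100 * (momilp - proposed) / momilp
print(f"{reduction_vs_genfin:.1f}% {reduction_vs_momilp:.1f}%")  # prints "59.5% 46.2%"
```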

Conclusion
The C4.5-XGBCHLS method is designed for better RA in VLSI circuits. Initially, VLSI circuits are constructed from the behavioral input, and results are measured at running condition. In the C4.5-XGBCHLS method, the language description of the circuit is taken as input, and the compilation process converts the high-level specification into a CDFG to reveal the dependencies between operations. The XGBoost classifier classifies the error causing FU with lower ER. Finally, a suitable FU from the functional library is scheduled, allocated and bound based on the PPVTD constraints for RTL generation with better RA. The simulation results show that the C4.5-XGBCHLS method improves the RA. It reduces the ER by 46% and enhances the FUSA by 10% compared with conventional works.