Attribute Reduction Algorithm Based on Structure Discernibility Matrix in Composite Information Systems

Attribute reduction, an important preprocessing step for knowledge acquisition in data mining, is one of the key issues in rough set theory. Classical rough set methods can only deal with attributes of a specific type in an information system, using a specific binary relation. However, information systems in real-life applications may contain attributes of multiple different types. A composite relation has been proposed to process attributes of multiple different types simultaneously in composite information systems. To address the high time cost of traditional heuristic attribute reduction algorithms, this paper proposes a novel attribute reduction algorithm based on the structure discernibility matrix. The proposed algorithm selects the same attribute reduct as its original version, but it accelerates the heuristic process of attribute reduction by avoiding the process of intersection and adopting a forward greedy attribute reduction approach. Theoretical analysis and experimental results on UCI data sets show that the proposed algorithm accelerates the heuristic process of attribute reduction.


Introduction
Pawlak proposed rough set theory in the 1980s [1], and it has since become a powerful mathematical tool for analyzing various types of data [2,3]. Within an attribute-value representation model, it can be used to describe dependencies among attributes, evaluate the significance of attributes and derive reducts [4,5].
The classical rough set model can only deal with categorical attributes. However, real-life applications may involve attributes of multiple different types, and many extended rough set models have been developed to handle them. Hu proposed a neighborhood relation to deal with numerical attributes [6]. Guan defined a tolerance relation and used the maximal tolerance classes to derive optimal decision rules from set-valued information systems [7]. Qian used a binary dominance relation to process set-valued data in set-valued ordered information systems [8]. Leung defined α-tolerance relations and employed the α-misclassification rate for rule acquisition from interval-valued information systems [9]. To deal with missing data, the tolerance and similarity relations, as well as the limited tolerance relation, were proposed [10]. Grzymała-Busse combined the tolerance and similarity relations and presented characteristic relations for missing data in information systems [11].
Most classical rough set methods cannot deal with attributes of more than two different types. Many scholars have introduced the composite rough set model and proposed the basic idea of dealing with attributes of multiple different types [12][13][14]. In this paper, we introduce a structure discernibility matrix [15] to address the high time cost of traditional heuristic attribute reduction algorithms. The proposed algorithm selects the same attribute reduct as its original version, but it accelerates the heuristic process of attribute reduction by avoiding the process of intersection and adopting a forward greedy attribute reduction approach. Extensive experiments on different UCI data sets show that the proposed structure discernibility matrix-based method can process large data sets efficiently.

Composite rough set model
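In many practical problems, an information system contains attributes of multiple different types; we call it a composite information system. A composite information system can be written as $S = (U, A, V, f)$, where $U$ is a non-empty finite set of objects, $A$ is a non-empty finite set of attributes, $V$ is the union of the attribute domains and $f: U \times A \to V$ is an information function. More specifically, a composite information system is also called a composite decision table if its attributes consist of condition attributes and decision attributes; it is denoted by $S = (U, A \cup D, V, f)$, where $A$ is the set of condition attributes, $D$ is the set of decision attributes and $A \cap D = \emptyset$. Let $U/D = \{D_1, D_2, \dots, D_r\}$ be a partition over the decision $D$, and let $[x]_B$ denote the composite class of $x$ with respect to $B \subseteq A$. Then the lower and upper approximations of each decision class $D_j$ with respect to attributes $B$ are defined as $\underline{R_B}(D_j) = \{x \in U : [x]_B \subseteq D_j\}$ and $\overline{R_B}(D_j) = \{x \in U : [x]_B \cap D_j \neq \emptyset\}$. The positive region of $D$ with respect to $B$ is $POS_B(D) = \bigcup_{j=1}^{r} \underline{R_B}(D_j)$. For the composite decision table in Table 1, we set the neighborhood parameter $\delta = 0.15$ and adopt the Manhattan distance.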
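As a minimal sketch of these definitions, the Python code below forms each composite class as the intersection of per-attribute relations and then derives the lower and upper approximations and the positive region. It assumes that every condition attribute carries its own binary relation (an equivalence relation for categorical values, a single-attribute neighborhood relation with $\delta = 0.15$ for numerical values); the four-object table and all function names are hypothetical. It illustrates the intersection-based computation performed by traditional heuristic algorithms, not the authors' implementation.

```python
from typing import Callable, Dict, Sequence, Set, Tuple

Relation = Callable[[int, int], bool]  # binary relation on object indices


def equivalence(values: Sequence) -> Relation:
    """Equivalence relation for a categorical attribute (equal values)."""
    return lambda i, j: values[i] == values[j]


def neighborhood(values: Sequence[float], delta: float) -> Relation:
    """Neighborhood relation for one numerical attribute; on a single
    attribute the Manhattan distance is just |v_i - v_j|."""
    return lambda i, j: abs(values[i] - values[j]) <= delta


def composite_class(i: int, B: Sequence[str], rels: Dict[str, Relation], n: int) -> Set[int]:
    """[x_i]_B: objects related to x_i under every attribute relation in B,
    i.e. the class induced by the intersection of those relations."""
    return {j for j in range(n) if all(rels[a](i, j) for a in B)}


def approximations(target: Set[int], B: Sequence[str], rels: Dict[str, Relation],
                   n: int) -> Tuple[Set[int], Set[int]]:
    """Lower and upper approximations of one decision class with respect to B."""
    classes = [composite_class(i, B, rels, n) for i in range(n)]
    lower = {i for i in range(n) if classes[i] <= target}
    upper = {i for i in range(n) if classes[i] & target}
    return lower, upper


def positive_region(B: Sequence[str], rels: Dict[str, Relation],
                    decision: Sequence, n: int) -> Set[int]:
    """POS_B(D): union of the lower approximations of all decision classes."""
    pos: Set[int] = set()
    for d in set(decision):
        target = {i for i in range(n) if decision[i] == d}
        pos |= approximations(target, B, rels, n)[0]
    return pos


# Hypothetical four-object table: two categorical attributes, one numerical.
color = ["red", "red", "blue", "blue"]
size = [0.10, 0.20, 0.50, 0.55]
shape = ["a", "b", "a", "a"]
decision = ["y", "n", "n", "n"]
rels = {"color": equivalence(color), "size": neighborhood(size, 0.15),
        "shape": equivalence(shape)}

print(positive_region(["color", "size"], rels, decision, 4))           # {2, 3}
print(positive_region(["color", "size", "shape"], rels, decision, 4))  # {0, 1, 2, 3}
```

Every class is recomputed by checking all relations for every pair of objects; this repeated intersection is the cost that the structure discernibility matrix is meant to avoid.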
Following the definitions in Section 2, the composite classes of all objects can be computed directly; the results are listed in Table 2.

It is easy to obtain that $POS_{A-\{a_1\}}(D) \neq POS_A(D)$, so $a_1$ is necessary in $A$ relative to $D$. The same method is applied to every attribute in $A$; for simplicity, we show the computation only for $a_1$. Finally, the core, which contains four of the condition attributes, is a reduct of the attribute set $A$ relative to $D$.
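The necessity test applied to each attribute above can be written compactly, assuming the standard positive-region formulation of necessity and core and assuming that the composite relation of a set of attributes is the intersection of the single-attribute relations. Under that assumption $[x]_A \subseteq [x]_{A-\{a\}}$, so the positive region can only shrink when an attribute is removed, and comparing the sizes of the two positive regions is enough.

```latex
\begin{aligned}
&POS_{A-\{a\}}(D) \subseteq POS_A(D) \quad \text{for every } a \in A, \\
&a \text{ is necessary in } A \text{ relative to } D
  \iff \lvert POS_{A-\{a\}}(D) \rvert < \lvert POS_A(D) \rvert, \\
&\mathrm{Core}_D(A) = \{\, a \in A : POS_{A-\{a\}}(D) \neq POS_A(D) \,\}.
\end{aligned}
```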
Traditional heuristic attribute reduction algorithms are time-consuming. Section 3 therefore presents a structure discernibility matrix-based heuristic attribute reduction algorithm, which accelerates the heuristic process of attribute reduction by avoiding the process of intersection and adopting a forward greedy attribute reduction approach.

Structure discernibility matrix-based attribute reduction algorithm
In this section, the attribute reduction algorithm based on the structure discernibility matrix in composite information systems is presented, and a worked example is given.

Sample analysis of the attribute reduction algorithm based on structure discernibility matrix
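Definition 8 [12]. Given a composite information system $S = (U, A, V, f)$ with $U = \{x_1, x_2, \dots, x_n\}$ and $a \in A$, the relation matrix of the relation $R_a$ is $M(R_a) = (m_{ij})_{n \times n}$, where $m_{ij} = 1$ if $(x_i, x_j) \in R_a$ and $m_{ij} = 0$ otherwise.
Definition 9 [15]. Given a composite information system $S = (U, A, V, f)$, … can be calculated by the same method.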
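One way to read the relation matrices of Step 1 and their summation in Step 2 is sketched below in Python. The sketch rests on stated assumptions: only the equivalence relation (categorical attributes) and a single-attribute neighborhood relation with $\delta = 0.15$ are modelled, the toy table and function names are hypothetical, and plain integers are used instead of the char storage mentioned in the text. Entry $(i, j)$ of the summed matrix counts the attributes on which $x_i$ and $x_j$ fall into the same class.

```python
from typing import List, Sequence

Matrix = List[List[int]]


def equivalence_matrix(values: Sequence) -> Matrix:
    """0/1 relation matrix of a categorical attribute: 1 iff the values agree."""
    n = len(values)
    return [[1 if values[i] == values[j] else 0 for j in range(n)] for i in range(n)]


def neighborhood_matrix(values: Sequence[float], delta: float) -> Matrix:
    """0/1 relation matrix of a numerical attribute: 1 iff the single-attribute
    Manhattan distance |v_i - v_j| is at most delta."""
    n = len(values)
    return [[1 if abs(values[i] - values[j]) <= delta else 0 for j in range(n)]
            for i in range(n)]


def summed_matrix(matrices: Sequence[Matrix]) -> Matrix:
    """Entry (i, j) = number of attributes on which x_i and x_j are in the same class."""
    n = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) for j in range(n)] for i in range(n)]


# Hypothetical toy table with four objects and three condition attributes.
color = ["red", "red", "blue", "blue"]
size = [0.10, 0.20, 0.50, 0.55]
shape = ["a", "b", "a", "a"]

M_color = equivalence_matrix(color)
M_size = neighborhood_matrix(size, delta=0.15)
M_shape = equivalence_matrix(shape)
S = summed_matrix([M_color, M_size, M_shape])
for row in S:
    print(row)
```

With three attributes, an entry of 3 marks a pair that is in the same class on every attribute, which is exactly membership in the composite class.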
In this example, we again set the neighborhood parameter $\delta = 0.15$ and adopt the Manhattan distance. Obtaining one attribute reduct usually involves three steps.
Step 1
The construction of the relation matrix. According to Definition 8, we have …
Step 2
The calculation of core attributes. First, we add the structure discernibility matrices of all the condition attributes, namely … . If the $(i, j)$ entry of this sum equals $k$, then $x_i$ and $x_j$ are in the same class on $k$ attributes. In this example there are 5 condition attributes, so if the entry equals 5, $x_i$ and $x_j$ are in the same class on all the condition attributes. If, for all locations with value 5 in the $i$-th row, the corresponding locations in the … According to Definition 3, we have … . To save storage space, we store the matrix using the char type. If, for all locations with value 1 in the $i$-th row, the corresponding locations in the … are the core attributes of $A$ relative to $D$.
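The core computation of Step 2 can be sketched as follows, assuming per-attribute 0/1 relation matrices and a decision matrix whose entry is 1 when two objects share a decision value; names and data are hypothetical. Because the summed matrix for $A - \{a\}$ equals the summed matrix for $A$ minus the matrix of $a$, each candidate removal is checked with one matrix subtraction instead of recomputing the intersection of relations. This illustrates the idea and is not the authors' C++ code.

```python
from typing import Dict, List, Set

Matrix = List[List[int]]


def positive_region_from_sum(S: Matrix, k: int, MD: Matrix) -> Set[int]:
    """x_i is in POS_B(D) iff every x_j with S[i][j] == k = |B|
    (i.e. x_j in the composite class of x_i) has the same decision as x_i."""
    n = len(S)
    return {i for i in range(n)
            if all(MD[i][j] == 1 for j in range(n) if S[i][j] == k)}


def core_from_sum(M: Dict[str, Matrix], MD: Matrix) -> List[str]:
    """Core attributes: a is in the core iff POS_{A-{a}}(D) != POS_A(D)."""
    n, attrs = len(MD), list(M)
    k = len(attrs)
    # Summed matrix of all condition attributes, computed once.
    S = [[sum(M[a][i][j] for a in attrs) for j in range(n)] for i in range(n)]
    pos_all = positive_region_from_sum(S, k, MD)
    core = []
    for a in attrs:
        # Summed matrix of A - {a} is S - M_a; no intersection is recomputed.
        S_minus = [[S[i][j] - M[a][i][j] for j in range(n)] for i in range(n)]
        if positive_region_from_sum(S_minus, k - 1, MD) != pos_all:
            core.append(a)
    return core


# Hypothetical relation matrices for three condition attributes and the decision.
M = {
    "color": [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]],
    "size": [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]],
    "shape": [[1, 0, 1, 1], [0, 1, 0, 0], [1, 0, 1, 1], [1, 0, 1, 1]],
}
MD = [[1, 0, 0, 0], [0, 1, 1, 1], [0, 1, 1, 1], [0, 1, 1, 1]]

print(core_from_sum(M, MD))  # ['shape']
```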

Step 3
The calculation of attribute significance. After the core attributes have been calculated, we take them as the starting point of attribute reduction. First, we add the structure discernibility matrices of the core attributes, namely … ; we then add the attribute whose significance is the largest, one at a time, until a reduct of $A$ relative to $D$ is found.
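The forward greedy search of Step 3 can be sketched in the same style. In place of Definition 7, the sketch assumes that the significance of an attribute is the growth of the positive region when it is added, and it stops once the positive region equals that of the full condition attribute set. The running summed matrix starts from the core attributes and is updated by adding one relation matrix per candidate; names and data are hypothetical.

```python
from typing import Dict, List, Set

Matrix = List[List[int]]


def positive_region_from_sum(S: Matrix, k: int, MD: Matrix) -> Set[int]:
    """POS_B(D) from the summed matrix S of B, where k = |B|."""
    n = len(S)
    return {i for i in range(n)
            if all(MD[i][j] == 1 for j in range(n) if S[i][j] == k)}


def forward_greedy(core: List[str], M: Dict[str, Matrix], MD: Matrix) -> List[str]:
    n, attrs = len(MD), list(M)
    S_all = [[sum(M[a][i][j] for a in attrs) for j in range(n)] for i in range(n)]
    target = len(positive_region_from_sum(S_all, len(attrs), MD))
    red = list(core)
    # Start from the summed matrix of the core attributes.
    S = [[sum(M[a][i][j] for a in red) for j in range(n)] for i in range(n)]
    current = len(positive_region_from_sum(S, len(red), MD))
    while current < target:
        best, best_size, best_S = None, -1, None
        for a in attrs:
            if a in red:
                continue
            # Summed matrix of red + {a} is S + M_a.
            S_a = [[S[i][j] + M[a][i][j] for j in range(n)] for i in range(n)]
            size = len(positive_region_from_sum(S_a, len(red) + 1, MD))
            # Assumed significance = size - current; maximizing size is equivalent.
            if size > best_size:
                best, best_size, best_S = a, size, S_a
        red.append(best)
        S, current = best_S, best_size
    return red


# Hypothetical relation matrices (same toy table as in the core sketch).
M = {
    "color": [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]],
    "size": [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]],
    "shape": [[1, 0, 1, 1], [0, 1, 0, 0], [1, 0, 1, 1], [1, 0, 1, 1]],
}
MD = [[1, 0, 0, 0], [0, 1, 1, 1], [0, 1, 1, 1], [0, 1, 1, 1]]

print(forward_greedy(["shape"], M, MD))  # ['shape', 'color']
```

Ties are broken by attribute order, so `color` is chosen over `size` here even though both raise the positive region equally.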

Attribute reduction algorithm based on structure discernibility matrix
In this section, we design an attribute reduction algorithm based on the structure discernibility matrix for composite decision tables. The MRPR algorithm is a structure discernibility matrix reduction algorithm based on the positive region.
Step 1 constructs the relation matrix; its key operations are computing the equivalence relation, the neighborhood relation, the tolerance relation and the characteristic relation. Suppose …

IST2017
Step 2 calculates the sum of the structure discernibility matrices of all the condition attributes; its time complexity is … . Step 3 calculates the core attributes of the condition attributes $A$ relative to $D$; its time consumption is … . This is the key step in which the algorithm outperforms previous heuristic attribute reduction algorithms, because it avoids the process of intersection.
Step 4 calculates the significance of each attribute in $A$ but not in the core; its time complexity is … . Step 5 adds the attribute with the largest significance, one at a time, until a reduct of $A$ relative to $D$ is found. We adopt a greedy forward search strategy: the search starts with a nonempty set and adds the attribute of highest significance to the pool at each iteration until the dependency no longer increases. Its time complexity is $O(|A||U|)$ [18].
Algorithm. A structure discernibility matrix reduction algorithm based on positive region (MRPR)
Output: A reduct of the composite decision table.
begin
1 Construct the relation matrix:

Experimental analysis
The experiments are carried out on a PC running Windows 7 (64-bit) with 4 GB of main memory and an Intel Core(TM) i3-3240 CPU with a clock frequency of 3.40 GHz. All the algorithms are coded in C++ and compiled with Dev-C++.
To test and compare the performance of the MRPR algorithm against the traditional heuristic attribute reduction algorithm based on positive region (RPR) [17] and a general improved feature selection algorithm based on the positive region (FSPR) [18], we downloaded six data sets from UCI [21]. All these data sets are outlined in Table 3. Table 4 shows the experimental results; each reported time is the average of 10 reduction runs. Figures 1-6 present the results more clearly. MRPR is faster than both RPR and FSPR on all six data sets, which shows that the proposed structure discernibility matrix-based method processes data sets more efficiently. In some cases, it cuts the computation time by more than half. For example, on the Lung data set, MRPR takes 0.012 seconds, compared with 0.025 seconds for FSPR and 0.035 seconds for RPR. The advantage is larger on large data sets: on the Chess data set, MRPR takes 19.590 seconds, compared with 96.542 seconds for FSPR and 119.479 seconds for RPR. Therefore, the proposed structure discernibility matrix-based method accelerates the heuristic process of attribute reduction and processes large data sets more efficiently.
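The speedups implied by the reduction times quoted above can be checked with a few lines of arithmetic. Only the Lung and Chess figures given in the text are used; the dictionary layout and the `speedup` helper are illustrative.

```python
# Reduction times in seconds, as quoted in the text for two data sets.
times = {
    "Lung": {"MRPR": 0.012, "FSPR": 0.025, "RPR": 0.035},
    "Chess": {"MRPR": 19.590, "FSPR": 96.542, "RPR": 119.479},
}


def speedup(baseline: float, proposed: float) -> float:
    """How many times faster the proposed algorithm is than the baseline."""
    return baseline / proposed


for name, t in times.items():
    for algo in ("FSPR", "RPR"):
        print(f"{name}: {algo} / MRPR = {speedup(t[algo], t['MRPR']):.2f}x")
```

On these figures, MRPR is roughly 2.1 and 2.9 times faster than FSPR and RPR on Lung, and roughly 4.9 and 6.1 times faster on Chess.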

Conclusions
To overcome the time limitations of existing heuristic attribute reduction schemes, this paper has proposed a theoretical framework based on rough set theory, called attribute reduction based on the structure discernibility matrix, which can be used to accelerate heuristic attribute reduction algorithms. Based on this framework, a structure discernibility matrix reduction algorithm based on positive region (MRPR) has been presented. Note that the MRPR algorithm selects the same feature subset as the previous attribute reduction algorithm. Experiments on six UCI data sets show that the modified algorithm significantly reduces the computing time of attribute reduction while producing the same attribute reducts and classification accuracy as the previous methods. The results show that the attribute reduction algorithm based on the structure discernibility matrix is an effective accelerator and can efficiently obtain an attribute reduct. In future work, we will develop a parallel method for attribute reduction.
… relation, neighborhood relation, neighborhood relation and characteristic relation, respectively. By Definition 4, … . Using the same method, we obtain the relation matrix on $a$ … . Then

If $x_i$ and $x_j$ are in the same class, we use 1 to represent it; otherwise, we use 0.
If, for all locations with value 4 in the $i$-th row, the corresponding locations in $D$ … of $A$ relative to $D$. If the core attribute set is not a reduct of $A$ relative to $D$, we calculate the significance of each attribute in $A - \mathrm{core}$ according to Definition 7, and then add the attribute … relation, the neighborhood relation, the tolerance relation and the characteristic relation on the universe. The time complexity of computing the equivalence relation …

Figure 1. Lenses database
Figure 2. Lung database
Figure 3. Zoo database
… is an indiscernibility relation defined by an attribute $a$ on $U$.
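… Let $B \subseteq A$ and $a \in B$. If $POS_{B-\{a\}}(D) = POS_B(D)$, $a$ is unnecessary in $B$ relative to $D$; otherwise, $a$ is necessary in $B$ relative to $D$. If every attribute in $B$ is necessary relative to $D$, $B$ is independent relative to $D$. If $B$ is independent relative to $D$ and $POS_B(D) = POS_A(D)$, $B$ is a reduct of $A$ relative to $D$. If $a$ is necessary in $B$ relative to $D$, $a$ is a core attribute of $B$ relative to $D$.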

Table 1. A composite decision table

Table 2. Results of composite classes
… we have that $a_1$ is necessary in $A$ relative to $D$.
3 Calculate the core attributes of the condition attributes $A$ relative to $D$. // According to Definition 6
4 Calculate the significance of each attribute in $A$ but not in the core. // According to Definition 7
5 Add the attribute whose significance is the largest, one at a time, until a reduct of $A$ relative to $D$ is found.

Table 3. A description of data sets

Table 4. Time consumption of different attribute reduction algorithms