GPU-Accelerated Apriori Algorithm

This paper proposes a GPU-based parallel Apriori algorithm (GPUApriori) for frequent itemset mining and designs a storage structure based on a bit table (BIT) matrix to replace the traditional storage mode. In addition, a parallel computing scheme on the GPU is discussed. The experimental results show that GPUApriori can effectively improve the efficiency of frequent itemset mining.


Introduction
The overhead of association rule mining comes mainly from the generation and processing of frequent itemsets. Obtaining frequent itemsets requires a large amount of computation and sufficient storage. Therefore, improvements to association rule mining algorithms should focus on increasing processing speed and reducing storage space. This paper combines the Apriori algorithm with GPU technology and introduces the GPUApriori algorithm. On the one hand, a compressed storage scheme is designed to simplify the join operation and reduce the algorithm's storage usage. On the other hand, to fit the parallel computing architecture of the GPU [4], a parallelization of the Apriori algorithm is proposed.

Improvement of Storage Structure
Apriori Algorithm
Apriori is based on the property that every subset of a frequent itemset must also be frequent. It uses an iterative, level-wise procedure in which the frequent itemsets found in one stage are used to produce the frequent itemsets of the next stage [7]. The Apriori algorithm has two parts. First, all 1-item candidates are generated and their occurrences are counted to obtain the frequent 1-itemsets. Second, the frequent k-itemsets are joined to generate (k+1)-item candidates, whose support counts are tested and which are checked against the property above to obtain the frequent (k+1)-itemsets. The algorithm repeats this process until no (k+1)-item candidates are generated.
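To make the two parts concrete, here is a minimal CPU-only sketch of classic Apriori in Python, assuming transactions are lists of sortable, hashable items and `min_support` is an absolute count. The function name `apriori` and the sample data are illustrative and do not come from the paper.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frequent itemset (sorted tuple): support count}."""
    sets = [frozenset(t) for t in transactions]

    # Part 1: count every 1-item candidate and keep the frequent ones.
    counts = {}
    for t in sets:
        for item in t:
            counts[(item,)] = counts.get((item,), 0) + 1
    frequent = {c: n for c, n in counts.items() if n >= min_support}
    result = dict(frequent)

    k = 1
    while frequent:
        # Part 2a (join): merge frequent k-itemsets that share their first k-1 items.
        prev = sorted(frequent)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                a, b = prev[i], prev[j]
                if a[:-1] == b[:-1]:
                    cand = a + (b[-1],)
                    # Apriori property: every k-subset must already be frequent.
                    if all(sub in frequent for sub in combinations(cand, k)):
                        candidates.add(cand)
        # Part 2b (count): scan the data for each candidate's support.
        frequent = {}
        for cand in candidates:
            n = sum(1 for t in sets if t.issuperset(cand))
            if n >= min_support:
                frequent[cand] = n
        result.update(frequent)
        k += 1  # loop ends once no candidate survives
    return result
```

For example, with the transactions `[["a","b","c"], ["a","b"], ["a","c"], ["b","c"], ["a","b","c"]]` and `min_support=3`, all three single items and all three pairs are frequent, while `("a","b","c")` is dropped because it occurs in only two transactions.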

BIT Matrix Storage Structure
The Apriori algorithm mainly uses horizontal and vertical storage structures. This paper introduces another mode, the bit table (BIT) matrix. It maps well onto the multi-core concurrency of the GPU and allows the complex join operation of Apriori to be replaced by a bitwise "&" between two matrices, which is why the BIT matrix storage structure can improve the efficiency of the algorithm. Figure 1 shows the difference between the BIT matrix storage structure and the traditional one.
In the BIT matrix, each column (item) is represented by one bit, and the item information of each row (transaction) is packed into a number of "unsigned integer" words; how many words are needed depends on the number of columns (items) in the dataset. Figure 2 shows the layout of the BIT matrix in memory. Because this structure differs from an ordinary array, it needs its own read and write logic. Both operations use a tag variable (an "unsigned integer") whose value is "0x80000000". A read shifts the tag variable to the position of the target bit and applies "&" between the shifted tag and the corresponding word of the BIT matrix. A write applies "|=" (to store "1") or "-=" (to store "0", which is only correct when the bit is currently set) between the corresponding word and the shifted tag.
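The read and write logic can be sketched as follows. This is a minimal sketch under our own assumptions: 32-bit words, a row-major layout where item `j` of a transaction lives in word `j // 32` of that row, and the most significant bit of each word holding the lowest item index within the word (which is what shifting a 0x80000000 tag right by `j % 32` implies). The class and method names are hypothetical.

```python
TAG = 0x80000000   # tag variable: only the most significant bit is set
WORD_BITS = 32     # bits per "unsigned integer" word

class BitTable:
    """Rows are transactions; item j is bit (j % 32), counted from the MSB, of word j // 32."""

    def __init__(self, n_rows, n_items):
        # The number of words per row grows with the number of items (columns).
        self.words_per_row = (n_items + WORD_BITS - 1) // WORD_BITS
        self.data = [[0] * self.words_per_row for _ in range(n_rows)]

    def read(self, row, item):
        mask = TAG >> (item % WORD_BITS)      # shift the tag to the item's bit
        return 1 if self.data[row][item // WORD_BITS] & mask else 0

    def write(self, row, item, value):
        w = item // WORD_BITS
        mask = TAG >> (item % WORD_BITS)
        if value:
            self.data[row][w] |= mask         # "|=" stores a 1
        elif self.data[row][w] & mask:
            self.data[row][w] -= mask         # "-=" stores a 0; safe only if the bit is set
```

Because the tag is only ever shifted right, every value stays within 32 bits even though Python integers are unbounded. The guard before `-=` matters: subtracting the mask from a word whose bit is already 0 would borrow from higher bits, which is why `&= ~mask` is the more common way to clear a bit.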
The BIT-based Apriori algorithm must also convert the traditional storage structure into BIT form. First, the BIT matrix array is initialized with all elements set to "0". Then every transaction in the database is traversed, and a "1" is written to the corresponding position of the BIT for each item read from the transaction; the support count of each item can be updated in the same pass. Finally, when this procedure is complete, the 1-item candidates have been built.
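A sketch of this conversion step, assuming items have already been mapped to integer ids `0..n_items-1` and using the same assumed layout as above (32-bit words, most significant bit = lowest item index within a word). The function name `to_bit_table` is hypothetical. It fills the table and counts per-item supports in a single pass, as described.

```python
TAG = 0x80000000
WORD_BITS = 32

def to_bit_table(transactions, n_items):
    """Convert horizontal transactions (lists of item ids) into a packed BIT.

    Returns (bit_table, item_counts), where item_counts[j] is the support of item j.
    """
    words_per_row = (n_items + WORD_BITS - 1) // WORD_BITS
    # Step 1: initialise every word of the BIT to 0.
    bit_table = [[0] * words_per_row for _ in transactions]
    item_counts = [0] * n_items
    # Step 2: traverse each transaction and write a 1 for every item it contains.
    for row, items in enumerate(transactions):
        for j in set(items):                  # ignore duplicate items within a transaction
            bit_table[row][j // WORD_BITS] |= TAG >> (j % WORD_BITS)
            item_counts[j] += 1               # update the item's support in the same pass
    # Step 3: item_counts now describes the 1-item candidates.
    return bit_table, item_counts
```

For instance, the transaction `[0, 2]` becomes the single word `0xA0000000` (bits for items 0 and 2 set from the top).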

Candidate Itemsets Generation
Candidate itemset generation based on the BIT is equivalent to the join step of the Apriori algorithm. It applies "&" between the k-item frequent BIT (FBIT) and the 1-item FBIT to generate the (k+1)-item candidate BIT (CBIT) and computes the support counts of the (k+1)-item CBIT (Figure 3). The BIT matrix suits the Grid-Block-Thread structure of the GPU, since multiple threads can process different parts of the table in parallel. Algorithm 3-1 describes how a single GPU thread performs candidate itemset generation based on the BIT matrix.
Algorithm 1: Join Step of GPUApriori
Input: k-item FBIT and 1-item FBIT
Output: (k+1)-item CBIT; support count of the (k+1)-item CBIT: S1
1. Divide the 1-item FBIT into small F1 matrices of size 32*100, and divide the k-item FBIT into FK columns.
2. Each GPU thread processes one F1 matrix and one FK column in parallel, producing a CK+1 matrix (part of the (k+1)-item CBIT) and its support count (refer to Figure 4).
3. Each GPU thread writes its CK+1 matrix to the corresponding position of the (k+1)-item CBIT (atomic operation).
4. Return to step 2 until all F1 matrices and FK columns have been processed.
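The work decomposition of the join step can be sketched in Python. As our own interpretation, we assume each column of an FBIT is the bit vector of one frequent itemset over all transactions, stored as a Python integer with bit t set when transaction t contains the itemset; "&" of a k-itemset column with a 1-item column then yields the transactions of the (k+1)-itemset, and a popcount gives its support. The `tile` parameter stands in for the paper's 32*100 F1 matrices, a `ThreadPoolExecutor` work item stands in for a GPU thread, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def popcount(x):
    return bin(x).count("1")

def join_step(fk, f1, tile=32):
    """fk: {k-itemset tuple: transaction bitmask}; f1: {item: transaction bitmask}.

    Returns the (k+1)-item CBIT as {itemset: (bitmask, support)}.
    """
    items = sorted(f1)
    # Split the 1-item FBIT into tiles of `tile` columns (the F1 matrices).
    tiles = [items[i:i + tile] for i in range(0, len(items), tile)]

    def work(task):
        # One work item pairs one FK column with one F1 tile, like one GPU thread.
        itemset, tile_items = task
        col = fk[itemset]
        out = {}
        for item in tile_items:
            if item > itemset[-1]:            # extend in sorted order to avoid duplicates
                bits = col & f1[item]         # the "&" join
                out[itemset + (item,)] = (bits, popcount(bits))
        return out

    tasks = [(s, t) for s in fk for t in tiles]
    cbit = {}
    with ThreadPoolExecutor() as pool:
        for part in pool.map(work, tasks):
            cbit.update(part)                 # merge each partial CK+1 block into the CBIT
    return cbit
```

Each work item returns its own partial result and the main thread merges them, so no atomic operation is needed here; on a GPU, where threads write into a shared CBIT, the atomic write of step 3 plays that role. Python threads only emulate the decomposition and give no real speedup under CPython's GIL; in CUDA, the per-word popcount would typically use the `__popc` intrinsic.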

Frequent Itemsets Generation
This section focuses on the pruning step of the GPUApriori algorithm. The core of the pruning step is to build a Range table that records the indexes of all entries of the (k+1)-item FBIT, that is, of the (k+1)-item candidates that reach the minimum support, and then to filter the (k+1)-item CBIT according to this index set. Algorithm 3-2 describes how the (k+1)-item CBIT is filtered to obtain the (k+1)-item FBIT.
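Algorithm 3-2 itself is not reproduced here, so the following is only a sketch of one way to build and use a Range table: one flag per candidate, an exclusive prefix sum that assigns each surviving candidate its output slot, and a gather. The prefix-sum step is the standard GPU stream-compaction pattern, which we assume rather than take from the paper; the function name `prune` and the entry layout `(itemset, bitmask, support)` are hypothetical.

```python
from itertools import accumulate

def prune(cbit, min_support):
    """Filter the (k+1)-item CBIT into the (k+1)-item FBIT via a Range table.

    cbit: list of (itemset, bitmask, support) entries.
    Returns (range_table, fbit).
    """
    # One flag per candidate (one GPU thread each): 1 if it meets the minimum support.
    flags = [1 if support >= min_support else 0 for _, _, support in cbit]
    # Exclusive prefix sum: the output slot of each surviving candidate.
    slots = [0] + list(accumulate(flags))[:-1]
    n_out = slots[-1] + flags[-1] if cbit else 0
    # Range table: range_table[slot] = index of that survivor in the CBIT.
    range_table = [0] * n_out
    for i, flag in enumerate(flags):
        if flag:
            range_table[slots[i]] = i
    # Gather the surviving entries to form the (k+1)-item FBIT.
    fbit = [cbit[i] for i in range_table]
    return range_table, fbit
```

Because every survivor knows its slot from the prefix sum, all threads can write their indexes into the Range table independently, without contention.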

The Whole Process of GPUApriori
The algorithm requires cooperation between the CPU and the GPU. It consists of four steps: reading the data and generating the 1-item CBIT, building the 1-item FBIT, joining two FBITs, and pruning the CBIT. The flow chart of GPUApriori is shown in Figure 4. The experiments in this paper were run on a GPU platform, whose details are listed in TABLE I.
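The four steps can be outlined as a host-side loop. In this sketch every stage runs on the CPU in plain Python, and the comments only mark which stages the paper parallelizes on the GPU. It assumes vertical bitmasks (bit t set when transaction t contains the itemset), sortable item labels, and an absolute `min_support`; the function name `gpuapriori_host` is hypothetical.

```python
def popcount(x):
    return bin(x).count("1")

def gpuapriori_host(transactions, min_support):
    """CPU-only outline of the four GPUApriori steps; returns {itemset: support}."""
    # Step 1: read the data and build the 1-item CBIT (one bitmask per item).
    cbit1 = {}
    for t, items in enumerate(transactions):
        for item in set(items):
            cbit1[item] = cbit1.get(item, 0) | (1 << t)
    # Step 2: keep the frequent columns to form the 1-item FBIT.
    f1 = {i: m for i, m in cbit1.items() if popcount(m) >= min_support}
    fk = {(i,): m for i, m in f1.items()}
    result = {s: popcount(m) for s, m in fk.items()}
    while fk:
        # Step 3 (GPU in the paper): join the k-item FBIT with the 1-item FBIT via "&".
        cbit = {s + (i,): fk[s] & f1[i] for s in fk for i in f1 if i > s[-1]}
        # Step 4 (GPU in the paper): prune candidates below the minimum support.
        fk = {s: m for s, m in cbit.items() if popcount(m) >= min_support}
        result.update({s: popcount(m) for s, m in fk.items()})
    return result
```

This outline skips the Apriori subset check during the join. Because the "&" yields exact supports, that omission only increases the number of candidates examined, not the final result.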

Experimental Result
The experimental results are shown in Figure 5, Figure 6, Figure 7 and Figure 8. Figure 5 and Figure 6 show the running times of the two algorithms on the Retail and T40I10D100K datasets at different support levels. The comparison shows that GPUApriori has a clear advantage over the CPU algorithm. Moreover, the smaller the minimum support, the higher the speedup of GPUApriori over the CPU algorithm. In addition, Figure 7 and Figure 8 show that GPUApriori achieves a much larger speedup on dense datasets.
The experiments show that GPUApriori performs well in performance testing. In addition, when the Apriori algorithm processes larger and denser datasets, the GPU version achieves a greater speedup over the CPU algorithm.

Discussion
This paper studies association rule mining in data mining and presents the parallel processing of the join and pruning steps of GPUApriori based on the BIT matrix. The parallel GPUApriori algorithm was implemented in the experiments, and the results show that it runs more efficiently, reflecting the parallel acceleration provided by the GPU. In future work, research should focus on other frequent itemset mining algorithms in order to develop further GPU-based parallel algorithms.

Figure 1. BIT Matrix and Traditional Storage Structure

DOI: 10
Figure 2. Storage Form of BIT Matrix in Memory

Figure 4. The Flow Chart of GPUApriori

Table 1.
Our benchmark datasets are from the Frequent Itemset Mining Repository. The Retail, Mushroom, Chess, and T10I4D100K datasets were selected for the experiments, and these datasets are presented in