A Weight-Based Clustering Method

This paper proposes a weight-based self-constructing clustering method for time series data. Self-constructing clustering processes all the data points incrementally. If a data point is not similar enough to any existing cluster, then (1) if the point currently does not belong to any cluster, it forms a new cluster of its own; (2) otherwise, the point is removed from the cluster it currently belongs to before a new cluster is formed. However, if a data point is similar enough to an existing cluster, then (1) if the point currently does not belong to any cluster, it is added to the most similar cluster; (2) otherwise, it is removed from the cluster it currently belongs to and added to the most similar cluster. During the clustering process, weights are learned and considered in the calculations of similarity between data points and clusters. Experimental results show that our proposed approach performs more effectively than other methods on real-world time series datasets.


Introduction
Clustering is an unsupervised classification technology whose purpose is to form meaningful clusters for the objects under consideration. Usually, similar objects are grouped in the same cluster, and different objects are grouped in different clusters. Clustering techniques play a very important role in the field of artificial intelligence [1][2][3][4]. In particular, they are widely applied in time series data analysis in a variety of areas, such as bioengineering [5], environmental monitoring [6], economic applications, and so on. In the process of clustering time series data, using the same weight for each dimension may degrade the quality of the clusters. To deal with this difficulty, Huang et al. proposed TSKmeans [7], which is K-means with weights, to assign different weights to different dimensions of the data. A similarity measure based on the weighted Euclidean distance was adopted. Through quadratic programming, a smooth subspace in time stamps can be produced. It was shown that TSKmeans can result in better clusters than the original K-means for time series data.
This paper proposes another weight-based clustering method for time series data. Instead of using K-means, an iterative self-constructing clustering method is adopted. The method performs several rounds of clustering until convergence is reached. In each round, all the data points are processed incrementally. If a data point is not similar enough to any existing cluster, then (1) if the point currently does not belong to any cluster, it forms a new cluster of its own; (2) otherwise, the point is removed from the cluster it currently belongs to before a new cluster is formed. However, if a data point is similar enough to an existing cluster, then (1) if the point currently does not belong to any cluster, it is added to the most similar cluster; (2) otherwise, it is removed from the cluster it currently belongs to and added to the most similar cluster. During the clustering process, weights are learned and considered in the calculations of similarity between data points and clusters. If the cluster assignment of any instance has changed in the current round, the next round of clustering continues. Otherwise, the cluster assignments are stable and the whole clustering process stops with a desired number of clusters.
The rest of this paper is organized as follows. TSKmeans is briefly reviewed in Section II. Our proposed method is presented in Section III. Experimental results are shown in Section IV. Section V gives a conclusion.

Related Work
Many clustering methods have been proposed for time series data [8][9][10][7]. Among them, TSKmeans [7] is the most recently published. TSKmeans is K-means incorporated with weights. It tries to make the distance between the data points contained in a cluster and the center of the cluster small through the use of weights of time stamps. The weights are obtained by minimizing an objective function, Eq. (1), subject to constraints, by the application of quadratic programming, and they are used in each iteration of K-means until convergence is reached. At the beginning, TSKmeans randomly generates the centers of clusters and sets initial values for the weights of clusters. Then the following three steps are done iteratively:
Step 1. For each pattern X_i, compute the weighted distance D_pi between it and cluster p, for 1 ≤ p ≤ k, 1 ≤ i ≤ n. A pattern is assigned to the cluster with the smallest distance. If pattern i is assigned to cluster p, then u_ip is set to 1 and u_ij is set to 0 for j ≠ p.
Step 2. Update the centers of all clusters, for 1 ≤ p ≤ k.
Step 3. Use the known U and Z to update W by applying quadratic programming to Eq. (1) with U and Z fixed.
If clusters have changed in the current iteration, Steps 1-3 are performed again. Otherwise, TSKmeans stops. However, TSKmeans suffers from the same problem as K-means does: the number of clusters has to be specified in advance. Our proposed approach can overcome this shortcoming.
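Steps 1 and 2 can be illustrated with a minimal Python sketch. It assumes that the distance is the weighted squared Euclidean distance (in line with the weighted Euclidean measure mentioned in the Introduction) and that a center is the mean of the patterns assigned to it; both are assumptions of this sketch, not formulas quoted from [7]. The function names are hypothetical, and Step 3 (the quadratic programming update of W) is left out, so the weights stay fixed.

```python
def weighted_distance(x, z, w):
    """Weighted squared Euclidean distance between pattern x and center z (assumed form)."""
    return sum(wj * (xj - zj) ** 2 for xj, zj, wj in zip(x, z, w))


def assign_patterns(X, Z, W):
    """Step 1: assign every pattern to the cluster with the smallest distance."""
    labels = []
    for x in X:
        dists = [weighted_distance(x, Z[p], W[p]) for p in range(len(Z))]
        labels.append(dists.index(min(dists)))
    return labels


def update_centers(X, labels, Z):
    """Step 2: move each center to the mean of its assigned patterns.

    An empty cluster keeps its previous center (an assumption of this sketch).
    """
    new_Z = []
    for p, z_old in enumerate(Z):
        members = [x for x, lab in zip(X, labels) if lab == p]
        if not members:
            new_Z.append(list(z_old))
            continue
        new_Z.append([sum(x[j] for x in members) / len(members)
                      for j in range(len(z_old))])
    return new_Z


def run_steps_1_and_2(X, Z, W, max_iter=100):
    """Alternate Steps 1 and 2 with W held fixed until assignments stop changing."""
    labels = None
    for _ in range(max_iter):
        new_labels = assign_patterns(X, Z, W)
        if new_labels == labels:
            break
        labels = new_labels
        Z = update_centers(X, labels, Z)
    return labels, Z
```

For example, calling `run_steps_1_and_2` on four two-dimensional patterns forming two well-separated pairs, with one initial center near each pair and uniform weights, assigns each pair to its own cluster after a single update of the centers.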

Proposed Method
We propose a self-constructing clustering (SCC) method which does not require the number of clusters to be specified by the user in advance. We describe the clustering in detail below. We also improve the method by incorporating weights in the calculation of similarity, just as TSKmeans does to K-means. SCC performs several rounds of clustering until convergence is reached. In each round, one full training cycle on the training set of N patterns X^(1), X^(2), . . ., X^(N) is done. Let K be the number of existing clusters. Each cluster C_p, 1 ≤ p ≤ K, is characterized by its center Z_p, deviation V_p, size S_p, and weight W_p. Initially, K is 0. Suppose we are in the rth round, r ≥ 1. For pattern i, X^(i), 1 ≤ i ≤ N, we calculate the similarity between X^(i) and each existing cluster C_p, for 1 ≤ p ≤ K. Two cases are considered:
Case 1. If X^(i) is not similar enough to C_p for all 1 ≤ p ≤ K, we do the following:
1) If X^(i) currently does not belong to any cluster, it forms a new cluster C_{K+1} of its own. We then have K = K + 1, Z_K = X^(i), V_K = {v_0, v_0, . . ., v_0}, S_K = 1, and W_K containing m randomly generated numbers.
2) If X^(i) currently belongs to cluster C_a, we remove X^(i) from C_a and update the characteristics of C_a. A new cluster C_{K+1} containing only X^(i) is then created as in 1).
Case 2. If X^(i) is similar enough to some existing clusters, we do the following:
1) If X^(i) currently does not belong to any cluster, it is added to the most similar cluster, say C_t, and the characteristics of C_t are updated.
2) If X^(i) currently belongs to cluster C_a, we remove X^(i) from C_a, update the characteristics of C_a, and add X^(i) to the most similar cluster C_t as in 1).
After all the patterns are considered, if none of the cluster assignments has changed, SCC stops with K clusters. If the cluster assignments of some patterns have changed, we update the weights W by minimizing the objective function, and the next round of clustering begins.
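The round structure described above can be sketched in Python as follows. Several ingredients are assumptions of this sketch rather than the method as specified: the similarity is taken to be a weighted Gaussian function of the deviation-scaled distance, "similar enough" is taken to mean that the similarity reaches a hypothetical threshold `rho`, the deviation is the per-dimension standard deviation plus an initial value `v0`, and the weights are uniform and never updated (the text uses random initial weights and a weight update between rounds). All names are hypothetical.

```python
import math


def similarity(x, c):
    """Assumed weighted Gaussian similarity in (0, 1]; not the paper's formula."""
    s = sum(w * ((xj - zj) / vj) ** 2
            for xj, zj, vj, w in zip(x, c["Z"], c["V"], c["W"]))
    return math.exp(-s)


def refresh(c, X, v0):
    """Recompute center Z, deviation V, and size S from the member indices."""
    pts = [X[i] for i in c["members"]]
    n, m = len(pts), len(pts[0])
    c["S"] = n
    c["Z"] = [sum(p[j] for p in pts) / n for j in range(m)]
    c["V"] = [v0 + math.sqrt(sum((p[j] - c["Z"][j]) ** 2 for p in pts) / n)
              for j in range(m)]


def new_cluster(i, X, v0):
    """Create a singleton cluster for pattern i (uniform weights: a simplification)."""
    m = len(X[i])
    c = {"members": [i], "W": [1.0 / m] * m}
    refresh(c, X, v0)
    return c


def scc_round(X, clusters, owner, rho, v0):
    """Process all patterns once; return True if any assignment changed."""
    changed = False
    for i, x in enumerate(X):
        sims = [(similarity(x, c), c) for c in clusters]
        best_sim, best = max(sims, key=lambda t: t[0], default=(0.0, None))
        target = best if best is not None and best_sim >= rho else None
        current = owner[i]
        if target is not None and target is current:
            continue  # already in its most similar cluster
        if current is not None:
            # Remove the pattern from its current cluster C_a and update C_a.
            current["members"].remove(i)
            if current["members"]:
                refresh(current, X, v0)
            else:
                clusters[:] = [c for c in clusters if c is not current]
        if target is None:
            # Case 1: not similar enough to any cluster, so form a new one.
            target = new_cluster(i, X, v0)
            clusters.append(target)
        else:
            # Case 2: join the most similar cluster C_t and update it.
            target["members"].append(i)
            refresh(target, X, v0)
        owner[i] = target
        changed = True
    return changed


def scc(X, rho=0.5, v0=1.0, max_rounds=50):
    """Repeat rounds until no assignment changes; return one cluster index per pattern."""
    clusters, owner = [], [None] * len(X)
    for _ in range(max_rounds):
        if not scc_round(X, clusters, owner, rho, v0):
            break
        # The weight update between rounds is omitted in this sketch.
    return [next(k for k, c in enumerate(clusters) if c is o) for o in owner]
```

Note that the number of clusters is not an input: it emerges from the threshold and the initial deviation, which is the property that distinguishes SCC from K-means-type methods.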

Experimental Results
In this section, we present and compare the experimental results of several clustering methods on six real-world time series datasets: SynControl, Trace, CBF, ECGFiveDays, FaceFour, and OliveOil [11]. For convenience, our proposed method is called SCC with weights, abbreviated as SCC-W. The characteristics of the six datasets are listed in Table I. In this table, column 1 indicates the name of the dataset, and the remaining columns indicate the number of instances, the number of features, and the number of classes, respectively, in each dataset. Note that these datasets are single-labeled, i.e., an instance belongs to only one class.
For the sake of fairness in comparison, each method was applied to every dataset ten times, and the average of the results of the ten runs is presented. To evaluate the effectiveness of these methods, the following performance measures are adopted [12]: Fscore, RI, and NMI. All these measures have a common property: a higher value indicates a better clustering performance.
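As an illustration of one of these measures, the following Python sketch computes RI, assuming that RI denotes the standard Rand index: the fraction of pattern pairs on which the clustering and the class labels agree (both place the pair together, or both place it apart). The function name is hypothetical, and the exact definitions used in the experiments are those of [12].

```python
from itertools import combinations


def rand_index(true_labels, pred_labels):
    """Fraction of pattern pairs treated consistently by both labelings."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("label lists must have the same length")
    pairs = list(combinations(range(len(true_labels)), 2))
    if not pairs:
        raise ValueError("at least two patterns are required")
    agree = sum(
        (true_labels[i] == true_labels[j]) == (pred_labels[i] == pred_labels[j])
        for i, j in pairs
    )
    return agree / len(pairs)
```

The measure depends only on co-membership, so renaming the clusters does not change its value, and a clustering identical to the class labels up to renaming scores 1.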
The results after clustering are shown in Table II. It can be seen that SCC-W outperforms SCC on all six datasets. However, more CPU time is required by SCC-W.

Conclusion
We have presented a weight-based self-constructing clustering method for time series data. Self-constructing clustering processes all the data points incrementally. If a data point is not similar enough to any existing cluster, then (1) if the point currently does not belong to any cluster, it forms a new cluster of its own; (2) otherwise, the point is removed from the cluster it currently belongs to before a new cluster is formed. However, if a data point is similar enough to an existing cluster, then (1) if the point currently does not belong to any cluster, it is added to the most similar cluster; (2) otherwise, it is removed from the cluster it currently belongs to and added to the most similar cluster. During the clustering process, weights are learned and considered in the calculations of similarity between data points and clusters. Experimental results have shown that our proposed approach performs more effectively than other methods on real-world time series datasets.
In TSKmeans [7], weights are attached to time stamps. Let X = {X_1, X_2, . . ., X_n} be a set of n time series patterns. Each pattern X_i = {x_i1, x_i2, . . ., x_im} is the ith pattern characterized by m values, i.e., m time stamps. The membership matrix U is an n × k binary matrix, where k is the total number of clusters, with u_ip = 1 indicating that X_i belongs to cluster p and u_ij = 0 for j ≠ p. The centers and weights of clusters are represented by two sets of k vectors Z = {Z_1, Z_2, . . ., Z_k} and W = {W_1, W_2, . . ., W_k}, with w_pj being the weight of the jth time stamp for the pth cluster. The purpose of TSKmeans is to minimize an objective function, Eq. (1), over U, Z, and W.
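For reference, the TSKmeans objective, as we recall it from [7], can be written with the notation above as follows. This is a reconstruction and should be checked against [7]; the balancing parameter \alpha and the exact form of the smoothness term are assumptions, since neither appears in the text here.

```latex
P(U, Z, W) = \sum_{p=1}^{k} \sum_{i=1}^{n} \sum_{j=1}^{m} u_{ip}\, w_{pj}\, (x_{ij} - z_{pj})^{2}
           + \frac{\alpha}{2} \sum_{p=1}^{k} \sum_{j=1}^{m-1} (w_{pj} - w_{p,j+1})^{2}
\quad \text{subject to} \quad
\sum_{p=1}^{k} u_{ip} = 1,\; u_{ip} \in \{0, 1\},\;
\sum_{j=1}^{m} w_{pj} = 1,\; 0 \le w_{pj} \le 1 .
```

Under this form, the first term is the weighted within-cluster distance, and the second term penalizes differences between the weights of adjacent time stamps, which would account for the smooth subspace in time stamps. With U and Z fixed, the objective is quadratic in W with linear constraints, which is why quadratic programming applies.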