An Overlapping Communities Detection Algorithm via Maxing Modularity in Opportunistic Networks

Community detection in opportunistic networks is an important and active research topic, because analyzing community structure helps reveal the characteristics of a network. A community is a group of nodes that have more connections with each other than with nodes outside the group. However, most existing community detection algorithms focus on binary networks or on disjoint communities. In this paper, we propose a novel algorithm based on maximizing the modularity of communities (MMC) to find overlapping community structure in opportunistic networks. MMC uses the contact history of nodes to calculate the relation intensity between them. It takes the nodes with the highest relation intensity as an initial community and extends the community with the nodes of highest belonging degree. By continuously maximizing the modularity of each community, the algorithm detects overlapping communities rapidly and efficiently. Experiments show that MMC is effective at uncovering overlapping communities and that it achieves better performance than COPRA and Conductance.


Introduction
Opportunistic networks [1] are special networks in which nodes contact each other opportunistically to forward information. Because node mobility is unpredictable and there is no fixed infrastructure, an end-to-end path does not exist in most situations. Unlike the store-and-forward manner of traditional networks, information is forwarded in a store-carry-and-forward manner, so applications in opportunistic networks need to tolerate long delays. For example, people can use portable smart devices with short-range wireless communication capability (e.g., Bluetooth, WiFi) or some computing power to store and forward information, which makes forwarding convenient even without network infrastructure.
Community detection in opportunistic networks has become an important and active research topic, because analyzing community structure helps reveal the characteristics of a network. A community is a group of nodes in an opportunistic network whose internal connections are denser than their external connections. Community detection helps uncover and understand local community structure in both offline mobile trace analysis and online applications, and it helps reduce forwarding time as well as the storage that nodes require. Since the relationships between nodes tend to be more stable and less volatile than node mobility, community-based forwarding schemes [2][3][4][5][6] outperform traditional approaches [7,8]. Overlapping community detection, one of the most interesting research directions in community detection, is the primary focus of this paper. Overlapping means that a node may participate in more than one community in the network. Most real-world networks, such as social networks, information networks and biological networks, exhibit overlapping communities. For example, if we divide people into communities according to their interests, one person may belong to multiple communities: someone who likes sports may also be interested in music, and others may also be interested in cooking.
The rest of this paper is organized as follows. Section 2 introduces related work in community detection. Section 3 presents our community detection algorithm for opportunistic networks. Section 4 evaluates the performance of the proposed algorithm against COPRA and Conductance. Section 5 concludes the paper and outlines future work.

Related Work
Several community detection algorithms have been proposed for opportunistic networks. In this section, we divide the community detection algorithms into three categories: modularity-based, label propagation-based and attribute-based.
Among modularity-based detection algorithms, an early scheme is the GN algorithm [9]. Girvan and Newman proposed edge betweenness, which is the number of shortest paths that include a given edge. In every step, the method removes the edges with the highest edge betweenness score. However, it has to recalculate the edge betweenness of all edges after each removal, so it is computationally intensive and scales poorly. Newman et al. [10] proposed a bottom-up hierarchical approach that optimizes the modularity score in a greedy manner. Initially, every node is its own community. Communities are then merged iteratively according to the best modularity score until the modularity score no longer increases. In [4], Pan Hui et al. proposed a detection algorithm based on cliques and modularity. In addition, CPM is proposed in [11]. It is based on the assumption that a community consists of all k-cliques that can be reached from each other through a series of adjacent k-cliques. A k-clique is a fully connected subgraph of k nodes, and two k-cliques are adjacent if they share k-1 nodes. However, CPM does not consider all the characteristics of links, such as connection time or connection frequency. The value of k is also hard to determine, and the method is better suited to networks with densely connected parts.
Among label propagation-based detection algorithms, the typical one is the Label Propagation Algorithm (LPA) [12], which assigns labels to the nodes in the network and updates each node's label according to the most frequent label in its neighborhood. This method is faster than others, but it produces different results on each run depending on the initial configuration, so one needs to run the algorithm several times to build a consensus, which is time-consuming. SLPA [13] is an extension of LPA in which each node has a memory and uses the information it has observed in the past to make its current decision. COPRA [14] can achieve good performance in some cases, but it limits the number of communities each node can belong to with a global parameter ν, which decreases accuracy whenever ν is too big or too small. In [15], the authors propose a balanced multi-label propagation algorithm (BMLPA) for overlapping community detection. Compared with COPRA, the advantage of this strategy is that it allows nodes to belong to any number of communities without a global limit ν.
Among attribute-based detection algorithms, the Interest Community Routing (ICR) algorithm [16] is founded on social network theory. Its authors define an interest metric and a message header to represent individual interests and data types in the network. By comparing the similarity between the message header and the interest metric of a node, the algorithm places the node into the corresponding interest community. However, interests are not stable, and one person may have many kinds of interests in reality, so the approach is hard to apply in practice.
These methods detect only disjoint communities, apply only to networks whose nodes are in frequent contact, or need to run many times to reach a stable result. They are therefore not sufficient to process opportunistic networks with overlapping communities effectively.
In this paper, we first calculate the relation intensity between nodes in opportunistic networks, which addresses the problem that community structure is fuzzy in binary networks: in a binary network, multiple contacts and a single contact are both recorded simply as a contact. We then take the nodes with the highest relation intensity as the initial community. Finally, we extend the community with the nodes of highest belonging degree, so that the modularity of the community increases continuously while overlapping communities are detected.

MMC Algorithm
We aggregate node mobility traces into weighted contact graphs. The vertices of a graph are nodes, the edges are relationships, and the weight of an edge is computed from the number of contacts and the duration of each contact between two nodes. As in social networks, the more often two nodes meet, the more familiar they are, and the more time they spend together, the closer they are. For simplicity, we use contact graphs $G = (V, E)$, where $V$ is the set of nodes and $E = \{(v, w)\}$ is the set of edges. We use the adjacency matrix $A$ to represent the edges in $E$, and $A_{vw}$ denotes the weight of the edge between two nodes $v, w \in V$.

Relation Intensity
To present the contact information between two nodes in a form that is easier to process, we simplify the contact record of [19]. We record each contact as a three-tuple, where $G_v$ is the contact history of node $v$. From $G_v$ we obtain the following equations for the relation intensity between node $v$ and any other node $w$.
$int_v^w$ is the fraction of the time that node $v$ spends with node $w$ over the time that node $v$ spends with all nodes. The relation intensity between node $v$ and node $w$ is given in Eq. (3), which combines its terms through two weighting factors. The values of the weighting factors can be adjusted to reflect the relative importance of the number of contacts and the duration of each contact.
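The following is a minimal sketch of how a relation intensity of this kind could be computed from one node's contact history. It is an illustration under stated assumptions, not the paper's Eqs. (1) to (3): the record layout `(peer, start, end)`, the contact-count fraction, the linear combination, and the names `relation_intensity`, `alpha` and `beta` are all hypothetical.

```python
from collections import defaultdict


def relation_intensity(contacts, alpha=0.5, beta=0.5):
    """Relation intensity of one node v towards each peer w.

    contacts: list of (peer, start, end) records from v's contact history
              (hypothetical layout, not the paper's three-tuple).
    alpha, beta: assumed weights of the frequency and duration terms.
    """
    counts = defaultdict(int)       # number of contacts with each peer
    durations = defaultdict(float)  # total contact time with each peer
    for peer, start, end in contacts:
        counts[peer] += 1
        durations[peer] += end - start

    total_count = sum(counts.values())
    total_duration = sum(durations.values())

    intensity = {}
    for peer in counts:
        # Share of v's contacts that involve this peer (assumed term).
        freq = counts[peer] / total_count if total_count else 0.0
        # Share of v's contact time spent with this peer (int_v^w in the text).
        share = durations[peer] / total_duration if total_duration else 0.0
        intensity[peer] = alpha * freq + beta * share
    return intensity


# Node v met w twice for 10 time units each and x once for 20 time units.
history_v = [("w", 0, 10), ("w", 20, 30), ("x", 40, 60)]
ri = relation_intensity(history_v)
```

With equal weights, the peer that was met more often scores higher even though both peers account for the same total contact time, which is the distinction a binary contact graph cannot express.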

Modularity
In this paper, we use Newman's weighted modularity proposed in [4] to measure the quality of the detected community structure. The corresponding fitness value is given by the definition of modularity (Q) in formula (4). According to formula (4), Q is the fraction of the edge weight that falls within communities minus the fraction that would be expected to fall within the communities if the edges were assigned randomly while the degrees of the vertices were kept unchanged. Generally, the greater Q is, the clearer the community structure; conversely, the smaller Q is, the more ambiguous the community structure.
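As an illustration of the quantity described above, the sketch below computes the standard form of Newman's weighted modularity for a disjoint partition. It may differ in detail from the paper's formula (4), and how the paper scores overlapping covers cannot be recovered from the text; the edge representation and the name `weighted_modularity` are assumptions.

```python
def weighted_modularity(weights, communities):
    """Newman's weighted modularity Q for a disjoint partition.

    weights: dict mapping frozenset({v, w}) -> edge weight A_vw
             (undirected, no self-loops; an assumed representation).
    communities: iterable of node sets that partition the graph.
    """
    total = sum(weights.values())  # total edge weight of the graph
    if total == 0:
        return 0.0

    strength = {}  # weighted degree of each node
    for edge, a in weights.items():
        for node in edge:
            strength[node] = strength.get(node, 0.0) + a

    q = 0.0
    for community in communities:
        # Weight of the edges with both endpoints inside the community.
        inside = sum(a for edge, a in weights.items() if edge <= community)
        # Total weighted degree of the community's nodes.
        k_c = sum(strength.get(node, 0.0) for node in community)
        # Observed internal fraction minus the fraction expected at random.
        q += inside / total - (k_c / (2.0 * total)) ** 2
    return q


# Two unit-weight triangles joined by a single edge c-d.
pairs = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
graph = {frozenset(p): 1.0 for p in pairs}
q_split = weighted_modularity(graph, [{"a", "b", "c"}, {"d", "e", "f"}])
q_whole = weighted_modularity(graph, [{"a", "b", "c", "d", "e", "f"}])
```

Splitting the graph into its two triangles gives a positive Q, while treating the whole graph as one community gives Q = 0, which matches the reading that a greater Q indicates clearer community structure.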

Belonging Degree
During the detection of community structure, we use the belonging degree $B(v, C)$ between a node $v$ and a community $C$ proposed in [17]. If all neighbors of a node $v$ are included in community $C$, then $B(v, C) = 1$.
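A minimal sketch of a belonging degree with the property stated above is given here. It assumes that $B(v, C)$ is the share of $v$'s total edge weight that leads to nodes in $C$, which equals 1 when every neighbor of $v$ lies in $C$; the exact definition in [17] may differ, and the adjacency layout and the name `belonging_degree` are hypothetical.

```python
def belonging_degree(adj, node, community):
    """Share of node's total edge weight that leads into community.

    adj: dict mapping node -> {neighbor: weight} (assumed layout).
    community: set of nodes.
    """
    total = sum(adj[node].values())
    if total == 0:
        return 0.0  # an isolated node belongs to no community
    inside = sum(w for nbr, w in adj[node].items() if nbr in community)
    return inside / total


# Node v has three neighbors with relation intensities 2, 1 and 1.
adj = {"v": {"a": 2.0, "b": 1.0, "c": 1.0}}
partial = belonging_degree(adj, "v", {"a", "b"})
full = belonging_degree(adj, "v", {"a", "b", "c"})
```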

The Community Detection Algorithm
Our detection algorithm works as follows:
Step 1: After a warm-up period t, each node begins to calculate the relation intensity between itself and the other nodes according to its contact history. This yields a weighted contact graph.
Step 2: Sort the edges of the contact graph by weight. Then choose the two nodes with the highest relation intensity as a new community C and calculate its modularity according to Eq. (4).
Step 3: Expand C. Put the neighbors of the nodes in C into the set $N_C$ and sort them by belonging degree. Add the node with the highest belonging degree to C to form a new community C', and calculate Q of C'. Two cases need to be handled: if Q increases, repeat Step 3 and continue the expanding process for community C'; otherwise, the expanding process of C is finished, and the edges within community C are removed from the edge set E.
Step 4: Repeat Step 2 and Step 3 until E is empty.
For completeness, the pseudo code of the detection algorithm is shown as follows.
Input: traces of nodes and node set V
Output: communities of the nodes

According to formula (4), if Q increases when node u joins community C, we obtain formula (7), and from formula (7) we obtain condition (9). Node u is added to community C when (9) holds.

We use an example to illustrate the expanding process. Fig. 1 shows an example of overlapping community detection using our detection algorithm. Two communities are detected and shown in different circles.
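Steps 2 to 4 can be sketched in Python as follows. This is an illustrative reading of the steps, not the authors' pseudo code. In particular, the local score `community_q` (one community's term of Newman's weighted modularity, computed on the full contact graph), the assumed form of the belonging degree, and all function and variable names are assumptions.

```python
def detect_communities(weights):
    """Greedy seed-and-expand sketch of Steps 2-4.

    weights: dict mapping frozenset({v, w}) -> positive relation intensity
             (undirected, no self-loops; an assumed representation).
    Returns a list of node sets; a node may appear in several sets.
    """
    total = sum(weights.values())
    adj = {}
    for edge, a in weights.items():
        v, w = tuple(edge)
        adj.setdefault(v, {})[w] = a
        adj.setdefault(w, {})[v] = a
    strength = {v: sum(nbrs.values()) for v, nbrs in adj.items()}

    def community_q(nodes):
        # Assumed local score: this community's term of weighted modularity.
        inside = sum(a for edge, a in weights.items() if edge <= nodes)
        k_c = sum(strength[v] for v in nodes)
        return inside / total - (k_c / (2.0 * total)) ** 2

    def belonging(node, nodes):
        # Assumed form: share of the node's edge weight that leads into nodes.
        inside = sum(a for nbr, a in adj[node].items() if nbr in nodes)
        return inside / strength[node]

    remaining = set(weights)  # the edge set E
    communities = []
    while remaining:  # Step 4: repeat until E is empty
        # Step 2: seed a community with the strongest remaining edge.
        seed = max(remaining, key=lambda e: weights[e])
        community = set(seed)
        q = community_q(community)
        # Step 3: add the best-attached neighbor while Q keeps increasing.
        while True:
            frontier = {n for v in community for n in adj[v]} - community
            if not frontier:
                break
            best = max(sorted(frontier), key=lambda n: belonging(n, community))
            grown_q = community_q(community | {best})
            if grown_q <= q:
                break
            community.add(best)
            q = grown_q
        communities.append(community)
        # Remove the edges inside the finished community from E.
        remaining = {e for e in remaining if not e <= community}
    return communities


# Two triangles (a, b, c) and (d, e, f) joined by a weak edge c-d.
edges = {("a", "b"): 5, ("a", "c"): 4, ("b", "c"): 4,
         ("d", "e"): 3, ("e", "f"): 2, ("d", "f"): 2, ("c", "d"): 1}
weights = {frozenset(pair): float(w) for pair, w in edges.items()}
communities = detect_communities(weights)
```

Because only the edges inside a finished community are removed, the weak edge c-d remains in E after both triangles have been found and seeds one more community. Node c therefore appears in more than one community, which is how overlap arises in this sketch.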
As shown in Fig.1

Experiments
In this section, we present representative numerical results to validate the effectiveness of MMC in comparison with two other schemes, Conductance [17] and COPRA [14], both of which can detect overlapping communities.
We use the LFR benchmark [18] proposed by Lancichinetti et al. to test the overlapping community detection algorithms. LFR is a well-known benchmark that generates synthetic unweighted or weighted networks. It provides power-law distributions of node degree and community size, and it allows overlaps between communities. Many parameters can be set to control the generated network, such as the fraction of nodes that belong to more than one community and the number of communities that a node belongs to simultaneously. The parameter settings we use are shown in Table 1.
The parameters have the following meanings. N is the number of nodes, and the overlapping fraction is the fraction of overlapping nodes over all nodes. mut and muw denote the mixing parameter for topology and the mixing parameter for edge weights, respectively. Table 1 also gives the exponent of the weight distribution. k and kmax represent the average node degree and the maximum node degree, respectively. We use the above metrics and different parameter settings: mut = muw = 0.1 and mut = muw = 0.3, each with N = 1000 and N = 4000. For each setting, the overlapping fraction is varied from 0 to 0.5 to evaluate the performance of the algorithms. Fig. 2 to Fig. 5 show the experimental results of the three algorithms: COPRA, Conductance, and MMC.
Fig. 2 and Fig. 3 show the modularity of the community structure detected by the three algorithms when mut = muw = 0.1 for the different numbers of nodes. As shown, the modularity of all algorithms is relatively high, possibly because the network structure is not very complex. The modularity of both COPRA and Conductance is lower than that of MMC. In addition, the modularity of COPRA decreases dramatically as the overlapping fraction increases in all settings.
Fig. 4 and Fig. 5 show the modularity of the community structure detected by the three algorithms when mut = muw = 0.3 for the different numbers of nodes. When mut and muw are higher and the number of nodes does not change, the modularity of all algorithms is lower. COPRA again performs worse as the overlapping fraction increases.
In real situations, it is hard to determine the global limit ν, which has a great influence on the performance of COPRA. MMC, however, remains stable as the overlapping fraction increases. Thus, we can conclude that MMC is able to detect overlapping communities.

Conclusions
In this paper, we propose an overlapping community detection algorithm for opportunistic networks. First, we introduce relation intensity to measure the relationships among nodes. Then we design a new detection algorithm based on maximizing the modularity of communities. Simulation results on synthetic networks show that our community detection algorithm achieves higher modularity than COPRA and Conductance. In future work, we will validate the algorithm in different environments and evaluate it on large-scale networks derived from real-world data.