An Improved Particle Swarm Optimization Algorithm and Its Application in the Community Division

With the deepening of research on complex networks, methods for detecting and classifying communities in social networks are springing up. In this essay, the basic particle swarm algorithm is improved on the basis of the GN algorithm, and modularity is taken as the measure of community division [1]. For the division of dynamic network communities, a scrolling calculation method is put forward. Experiments show that the improved particle swarm optimization algorithm can improve the accuracy of community division and also obtains higher values of modularity in the dynamic community.

Among the classic community division algorithms, the GN algorithm and the Newman algorithm are the most representative. The GN algorithm deletes the edge with the highest betweenness weight at each step and stops when every node has degenerated into its own community. The GN algorithm is based on a greedy strategy. Its advantage is low space complexity; its defect lies in the fact that the termination point of the community division is unknown when the result of the division is uncertain. The Newman fast algorithm is improved on the basis of the GN algorithm and performs well on static networks. However, social networks are interactive and dynamic, as the relationships in the network change over time. In order to solve the problems above, this essay refers to the concept of modularity and designs an objective function based on the particle swarm optimization algorithm. By using the penalty function method and other constraint strategies, particles which do not meet the requirements are eliminated in a timely manner. Through a mutation strategy and the optimization of the convergence factor, the convergence of the algorithm is improved. This essay also adopts a scrolling calculation method to update the community structure of the dynamic network.
ITM Web of Conferences 7, 05003 (2016). DOI: 10.1051/itmconf/20160705003. ITA 2016. © The Authors, published by EDP Sciences. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0 (http://creativecommons.org/licenses/by/4.0/).
Liu-Yi ZHANG


1 An Improved Particle Swarm Optimization Algorithm

1.1 Description of particle swarm optimization
Particle swarm optimization was proposed by Kennedy and Eberhart in 1995 [2]. It is an evolutionary computation technique based on swarm intelligence and inspired by the behavior of bird flocks. The algorithm eventually converges to the globally optimal particle through cooperation between the particles. The particles adjust their velocity and position in order to find the locally optimal and globally optimal particles. The process is described as follows. First of all, each particle variable is assigned a value, including the position vector and velocity vector in the search space. Then the iterative calculation is started. The particles renew themselves through two "extreme values" at each iteration. One "extreme value" is the optimal solution found by each particle itself, called pbest, and the other is the one found by the whole swarm, called gbest. The particles then update their velocity and position vectors according to the following formula:

$v_{id} = w v_{id} + c_1 r_1 (pbest_{id} - x_{id}) + c_2 r_2 (gbest_{id} - x_{id})$
$x_{id} = x_{id} + v_{id}$    (1)
In formula (1), the learning factors $c_1$ and $c_2$ are nonnegative constants, $r_1$ and $r_2$ are random numbers in the interval (0, 1), and the inertia factor $w$ is usually set no greater than 1.
The basic particle swarm optimization algorithm finds the best target through cooperation between particles. However, it easily falls into a local optimum on real problems, so some improvements are often needed.
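As a concrete illustration, one step of the update in formula (1) can be sketched in Python. The function name and default parameter values are illustrative, not from the paper:

```python
import random

def pso_update(x, v, pbest, gbest, w=0.7, c1=2.0, c2=2.0):
    """One application of formula (1): new velocity, then new position.
    x, v, pbest, gbest are equal-length lists (one entry per dimension d)."""
    new_v = [w * v[d]
             + c1 * random.random() * (pbest[d] - x[d])
             + c2 * random.random() * (gbest[d] - x[d])
             for d in range(len(x))]
    new_x = [x[d] + new_v[d] for d in range(len(x))]
    return new_x, new_v
```

When a particle already sits on both pbest and gbest, the attraction terms vanish and only the inertia term $w v_{id}$ moves it, which is why a smaller $w$ damps the search.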

1.2 Improvements on particle swarm optimization

1.2.1 Optimization based on the parameters

In formula (1), the parameter which affects the search ability of the algorithm is the inertia weight $w$, and the parameters which affect the convergence rate of the algorithm are the acceleration weights $c_1$ and $c_2$. There is no unified standard for setting these parameters; it depends on experience. Reference [3] points out that the greater $w$ is, the stronger the global search ability is, and the smaller $w$ is, the stronger the local search ability is. So we need to set a proper inertia weight to balance the global and local search abilities. Meanwhile, the convergence speed of the PSO algorithm is determined by the acceleration weights $c_1$ and $c_2$. This essay assigns time-varying values to $c_1$ and $c_2$ respectively because this is advantageous for speeding up convergence. The formula is as follows:

$c_1 = a - \lambda \cdot iteration / IT_{max}$
$c_2 = b + \lambda \cdot iteration / IT_{max}$    (2)

In formula (2), $IT_{max}$ is the largest number of iterations and $iteration$ is the current iteration, so $iteration / IT_{max}$ is in the interval (0, 1]. As the number of iterations increases, $c_1$ is reduced and $c_2$ is increased gradually. $a$, $b$ and $\lambda$ are constants which are set between 1 and 2.

1.2.2 Optimization based on particle diversity
As the number of iterations increases, the similarity between the particles gets higher in the PSO algorithm. If the optimal solution cannot be further optimized, the algorithm very easily falls into a local optimum. For this kind of situation, the diversity of the particles should be optimized. This optimization is mainly revealed in two aspects. One is to take a random variation for each particle variable. Each particle variable contains the increase or decrease of the community ID, represented as 1 or -1. During the process in which each particle finds the optimal solution, these two integer variables undergo complementary mutation; this will be further elaborated in the part on community division. The other aspect is to take a random search around the current global optimum, gbest. Since the particles eventually converge towards the globally optimal particle, if gbest is only a locally optimal particle, the algorithm cannot search any other space. Reference [4] puts forward a formula which is described as follows:

$gbest_d' = gbest_d \cdot (1 + c \cdot N(0,1))$    (3)

In formula (3), the value $c$ is a noise factor which is set in the interval [0, 1], and $N(0,1)$ is a standard Gaussian random number. In this essay, combined with the conditions in social networks, the formula is designed as follows:

$gbest_d = random(l_d, u_d)$    (4)

where $random(l_d, u_d)$ is a random number between $l_d$ and $u_d$, and $l_d$ and $u_d$ bound the corresponding part of the position vector. Using formula (4), a randomly chosen bit of gbest is inverted and the result is assigned to the position vector after being calculated. Experiments prove that this kind of random mutation can maintain the diversity of the particles.
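The random perturbation of gbest described above can be sketched as a bounded random reset of a single component. This is one plausible reading of the text; the per-dimension bounds l_d and u_d are taken as given arrays:

```python
import random

def mutate_gbest(gbest, lower, upper):
    """Pick one dimension of gbest at random and replace it with a
    uniform random value inside that dimension's bounds [l_d, u_d]."""
    g = list(gbest)                       # leave the original untouched
    d = random.randrange(len(g))          # dimension to perturb
    g[d] = random.uniform(lower[d], upper[d])
    return g
```

Only one component changes per call, so the perturbed gbest stays close to the old one while still allowing escape from a local optimum.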

2 Community division

2.1 Particle fitness function
In order to add termination conditions, this essay makes reference to the concept of modularity. Assuming that the network is divided into k communities, a k*k matrix $E = (e_{ij})$ is defined, in which $e_{ij}$ represents the proportion, among all edges, of the edges connecting nodes of community i with nodes of community j. The sum of the elements on the diagonal is $Trace(E) = \sum_i e_{ii}$, and the sum of the elements in each row is $a_i = \sum_j e_{ij}$, which represents the proportion of edge ends attached to the i-th community. On this basis, the fitness function is:

$Q = \sum_i (e_{ii} - a_i^2)$    (5)

In formula (5), Q is the proportion of edges inside communities minus the expected value of that proportion if edges were placed at random. If the proportion of inside edges is no greater than expected for any connection, Q = 0. The upper limit of Q is 1; the closer Q gets to 1, the more obvious the community structure is.
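The modularity of formula (5) can be computed directly from an edge list and a node-to-community assignment. A minimal sketch (function name and input format are illustrative):

```python
def modularity(edges, community):
    """Compute Q = sum_i (e_ii - a_i^2) from formula (5).
    edges: list of (u, v) node pairs; community: maps node -> community ID."""
    m = len(edges)
    e = {}  # e[c]: fraction of edges with both endpoints inside community c
    a = {}  # a[c]: fraction of edge ends attached to community c
    for u, v in edges:
        cu, cv = community[u], community[v]
        if cu == cv:
            e[cu] = e.get(cu, 0.0) + 1.0 / m
        a[cu] = a.get(cu, 0.0) + 0.5 / m
        a[cv] = a.get(cv, 0.0) + 0.5 / m
    return sum(e.get(c, 0.0) - a[c] ** 2 for c in a)
```

For example, two disjoint edges split into two communities give Q = 0.5, while putting all nodes in one community gives Q = 0, matching the behavior described above.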

2.2 Particle coding
Reference [5] puts forward that each particle variable stores the ID number of its current community. Because the number of divided communities is unknown, the initial community ID is set randomly from 1 to the total number of nodes. If the community division is reasonable, the convergence of the algorithm can be accelerated, so this essay proposes a new particle coding scheme. Each particle variable stores both the current community ID and the increase or decrease of the community ID. The increase or decrease is represented by 1 and -1: -1 means that the current community ID is decreased by 1, and 1 means that it is increased by 1. At each period of time, the modularity is decided by both of them. These two variables are coded into a coding chain. For example, if there are eight nodes, each particle in the search space at each time step is represented as follows:

8 7 2 8 7 4 2 5 | -1 1 -1 1 -1 1 1 -1

In this coding chain, 8 7 2 8 7 4 2 5 is the distribution of community IDs over the eight nodes, and -1 1 -1 1 -1 1 1 -1 is the direction of increase or decrease of each community ID.
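The coding chain and the effect of applying the +1/-1 directions can be sketched as follows (helper names are illustrative):

```python
def encode_particle(ids, directions):
    """Build the coding chain: community IDs followed by the +1/-1 moves."""
    return ''.join(str(i) for i in ids) + ''.join(str(d) for d in directions)

def step_ids(ids, directions):
    """Apply each node's +1/-1 direction to its community ID
    (e.g. ID 8 with direction -1 becomes 7)."""
    return [i + d for i, d in zip(ids, directions)]
```

With the eight-node example from the text, the chain reproduces the "87287425" ID block followed by the direction block.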

2.3 Penalty function strategy
During the community division, the decision variables are affected by various constraints such as the community ID, the modularity and so on. Under these constraints, if the algorithm strays even slightly beyond the feasible region, the optimal solution will not be reached. By using the penalty function strategy, the constrained problem is transformed into an unconstrained one.

2.3.1 Penalty function
In the community division, if the modularity does not meet the prescribed requirements, some isolated points are generated. Isolated points are dissimilar to the majority of the nodes, and their appearance is the result of an improper convergence direction of the community. In this essay, isolated points are avoided by adjusting the penalty factor. The core idea is to increase the weight coefficient as the error between the modularity and the expected modularity increases; when the error reaches the maximum limit, the punishment becomes infinite in order to eliminate those particles in a timely manner.
Its mathematical formula is as follows:

$f = s \cdot |Q_0 - Q|$ when $|Q_0 - Q|$ is below the maximum limit, and $f = \infty$ otherwise    (6)

In formula (6), $Q_0$ represents the expected modularity, $Q$ represents the actual modularity, and $s$ represents the penalty factor for the isolated points.
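The penalty idea described above can be sketched as a function that grows with the modularity error and jumps to infinity at a maximum limit. The scale factor and limit below are illustrative assumptions:

```python
import math

def isolation_penalty(q, q0, s=10.0, max_err=0.3):
    """Penalty grows linearly with the error |Q0 - Q| and becomes
    infinite once the error reaches max_err, eliminating the particle."""
    err = abs(q0 - q)
    if err >= max_err:
        return math.inf   # particle is culled outright
    return s * err
```

Because an infinite penalty makes the extended objective arbitrarily bad, particles drifting toward isolated points are discarded as soon as the error limit is reached.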

2.3.2 Integer penalty function
In the improved PSO algorithm, the projection of the random solution found by the particles should be 1 or -1. If the direction of increase or decrease of the community ID in a random solution is not 1 or -1, an integer penalty function is designed in order to eliminate the particles which do not meet the requirements. The formula is as follows:

$con = 0$ when $y \in \{-1, 1\}$, and $con = \infty$ otherwise    (7)

In formula (7), $con$ represents the integer penalty function and $y$ represents the direction variable; the penalty applies whenever $y$ is not -1 or 1.
Taking the two custom penalty functions into consideration, the extended objective function is as follows:

$F = Q - con - f$    (8)

In formula (8), F represents the extended objective function, Q represents the current modularity, con represents the integer penalty function, and f represents the penalty function.
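The combination of modularity with the two penalties can be sketched as follows. Treating the combined score as modularity minus both penalty terms is an assumption consistent with the surrounding text (Q is maximized, infinite penalties disqualify a particle):

```python
import math

def integer_penalty(y):
    """Zero when the direction y is a legal +1/-1 move, infinite otherwise."""
    return 0.0 if y in (-1, 1) else math.inf

def extended_objective(q, directions, f):
    """Combined score: modularity reduced by the integer penalties of all
    direction variables and by the isolation penalty f."""
    con = sum(integer_penalty(y) for y in directions)
    return q - con - f
```

Any illegal direction makes the score negative infinity, so such particles can never become pbest or gbest.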

2.4 Mutation strategy
In the process of finding the optimal solutions, taking random mutations of the particles can enhance search efficiency. This paper puts forward a mutation strategy for the increase or decrease of the community ID. For example, if the direction series is 1 1 1 1 -1 -1 -1 -1, it becomes -1 -1 -1 -1 1 1 1 1 after the bit-inversion mutation. Whether to mutate a particle is determined by the fitness change rate delta, which is the rate of change of the best position found by the particles over their history. When delta is less than a certain threshold, the mutation strategy is applied. Because it avoids wasted iterations, this strategy saves time and space cost.
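The bit-inversion mutation and its trigger can be sketched as follows. The exact definition of the change rate delta is an assumption (relative change of the best fitness between checks); the inversion itself matches the example in the text:

```python
def invert_directions(directions):
    """Bit-inversion mutation: flip every +1 to -1 and vice versa."""
    return [-d for d in directions]

def should_mutate(prev_best, curr_best, threshold=1e-3):
    """Assumed trigger: mutate when the relative improvement of the best
    fitness (delta) has stalled below the threshold."""
    delta = abs(curr_best - prev_best) / max(abs(prev_best), 1e-12)
    return delta < threshold
```

A stalled delta suggests the swarm has converged prematurely, which is exactly when flipping the direction chain is most useful.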

3 Scrolling calculation for the dynamic community
Traditional community division methods are mostly applied to static social networks. However, in real social networks the interaction among members changes over time. A static network is a snapshot of the social network, but the structure of real social networks changes constantly and the connections between nodes are dynamic, so the newest community structure can no longer be obtained from a single snapshot. For this kind of situation, this essay proposes a daily scrolling calculation which analyzes the evolution of the network according to the separate community structures at discrete times.
According to the current social network structure, the nodes in the community are monitored and the current value of the modularity is calculated. In Fig. 1, operation 1 is monitoring the current nodes and calculating the current modularity, and operation i is monitoring the nodes and calculating the modularity at time i. The algorithm is described as follows:

Step 1 (Initializing the community): Traverse the network and assign a community ID to each node. Code the particles and initialize the population size m and the number of iterations (generally 3 to 4 times the population size).
Step 2 (Finding the optimal solution): Duplicate each particle's current location to its historical location and calculate the fitness of the current position. Find the particle with the highest fitness and take its position vector as the global optimal position.
Step 3 (Iterative evolution): In every iteration, the particles update themselves according to the two "extreme values": one is the optimal solution that the particle finds by itself, and the other is the global optimal position.
Step 4 (Termination conditions): If the current number of iterations is no less than the maximum number of iterations set at the start, or the minimum-error requirement is met, the iteration terminates. Otherwise, the algorithm returns to Step 3 to continue the iteration.
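The four steps above can be sketched as one compact loop. This is a heavily simplified illustration, not the authors' exact procedure: gbest-copy and +-1 moves on the community IDs stand in for the full velocity update and penalty handling:

```python
import random

def modularity(edges, labels):
    # Q = sum over communities of (e_ii - a_i^2), as in formula (5)
    m = len(edges)
    e, a = {}, {}
    for u, v in edges:
        cu, cv = labels[u], labels[v]
        if cu == cv:
            e[cu] = e.get(cu, 0.0) + 1.0 / m
        a[cu] = a.get(cu, 0.0) + 0.5 / m
        a[cv] = a.get(cv, 0.0) + 0.5 / m
    return sum(e.get(c, 0.0) - a[c] ** 2 for c in a)

def improved_pso(edges, n_nodes, pop=20, iters=60, seed=1):
    """Steps 1-4 in miniature: random community-ID initialization,
    pbest/gbest bookkeeping, and simple moves toward gbest."""
    rng = random.Random(seed)
    # Step 1: each particle assigns a random community ID to every node
    particles = [[rng.randrange(1, n_nodes + 1) for _ in range(n_nodes)]
                 for _ in range(pop)]
    pbest = [list(p) for p in particles]
    pfit = [modularity(edges, p) for p in particles]
    g = max(range(pop), key=lambda i: pfit[i])
    gbest, gfit = list(pbest[g]), pfit[g]          # Step 2
    for _ in range(iters):                         # Steps 3-4
        for i, p in enumerate(particles):
            for d in range(n_nodes):
                r = rng.random()
                if r < 0.5:
                    p[d] = gbest[d]                # attraction toward gbest
                elif r < 0.75:
                    p[d] = max(1, p[d] - 1)        # community ID decreases
                else:
                    p[d] = min(n_nodes, p[d] + 1)  # community ID increases
            q = modularity(edges, p)
            if q > pfit[i]:                        # update personal best
                pbest[i], pfit[i] = list(p), q
                if q > gfit:                       # update global best
                    gbest, gfit = list(p), q
    return gbest, gfit
```

On a small graph of two triangles joined by one edge, the loop steadily raises the modularity of gbest toward the two-community split.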

4 Experiments and Result Analysis
In order to verify the effectiveness of the community division, this paper adopts two commonly used data sets, the Zachary data set and the arXiv data set. The Zachary data set reflects the true relationships in a karate club; this network contains 34 nodes and 78 edges.
The Zachary data set is commonly used in community division, so the correctness of its community classification can be measured. This paper uses the following formula to measure the correctness:

$accuracy = w_s / w_0$    (10)

In formula (10), $w_s$ represents the number of nodes in the correct division and $w_0$ represents the total number of nodes. This paper makes comparisons among the three algorithms below. Reference [6] mentions that the Fast-Newman algorithm is the quick Newman algorithm based on modularity.
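The accuracy measure above is a simple ratio; as a sketch, it can be computed from a predicted assignment and a ground-truth assignment (function name and input format are illustrative):

```python
def division_accuracy(predicted, truth):
    """Formula (10): w_s / w_0, the share of correctly divided nodes.
    predicted, truth: dicts mapping node -> community label."""
    correct = sum(1 for n in truth if predicted.get(n) == truth[n])
    return correct / len(truth)
```

For instance, 33 of the 34 Zachary nodes placed correctly gives 33/34, which is approximately 0.97.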

Table 1. The Accuracy Compared with the Three Algorithms
As can be seen from Table 1, the best result comes from the improved PSO: the correctly divided nodes account for 97% and the value of Q reaches 0.47, close to the maximum.
Then, we take the Zachary data set as the input data and run the algorithms. The tendency of the modularity is recorded in Fig. 2.

Fig. 2. Comparisons of Modularity in the Static Community
As can be seen from Fig. 2, the value of the modularity increases during the iteration and becomes steady at the end. The improved PSO proposed in this paper has a higher value of modularity at each iteration.
The Zachary data set is a static community, but real communities are dynamic. This paper uses the arXiv data set as the initial data set and adds network-change behavior to the community. We gather data at 24 time points in a day and start the community division each time. The CNM algorithm is used as the comparison; Reference [7] mentions that the CNM algorithm was developed on the basis of the Newman fast algorithm.

Fig. 3. Comparisons of Modularity in the Dynamic Community
As can be seen from Fig. 3, this algorithm obtains a better value of modularity according to the latest community structure. When the modularity is calculated at each point in time, the result of the improved PSO is higher than that of the CNM algorithm.

5 Conclusion and prospect
In this paper, the particle variables are designed and coded creatively on the basis of the basic particle swarm optimization algorithm and other community division algorithms. The fitness function of the particle swarm is established by referring to the concept of modularity. We also put forward a penalty function strategy to eliminate the particles which do not meet the requirements. Furthermore, we adopt a mutation strategy to improve the efficiency of the algorithm. In order to extend the application of this algorithm to dynamic communities, a scrolling calculation method is put forward which monitors the nodes of the current network and calculates the value of the modularity. Due to limitations of the theoretical level, there are still many aspects to be improved: for example, the particle fitness function is simple, and the value of the modularity could be calculated every hour.
