Analysing Big Social Graphs Using a List-based Graph Folding Algorithm

In this paper, we explore ways to represent big social graphs using adjacency lists and edge lists. Furthermore, we describe a list-based algorithm for graph folding that makes it possible to analyze conditionally infinite social graphs on resource-constrained mobile devices. The steps of the algorithm are (a) to partition the graph into clusters of different levels, (b) to represent each cluster of the graph as an edge list, and (c) to absorb the current cluster into the cluster of the next level. The proposed algorithm is illustrated by the example of a sparse social graph.


Introduction
Currently, there is a growing interest in modeling and analyzing interrelationships between professional, religious, ethnic, and other types of social groups [1][2][3][4]. These models draw on personality psychology and the social psychology of interpersonal relationships, as well as anthropology, sociology, theology, ethnography, conflict theory, and history.
The basic unit of modeling here is either an actor (a human person or a group) or, in Actor-Network Theory (ANT) [5], an output of the actor's activity, that is, a material artefact. Both are collectively called actants.
In terms of the graph-hypergraph paradigm [6], an entity (actor or actant) can be represented by a vertex in a graph or by a hyperedge in a hypergraph. In the second case, the set of vertices forming a hyperedge can be associated with certain characteristics of the entity: situation awareness, evaluation of an event, communication, or individual activities.
The two most common representations of a social graph are an adjacency matrix (sociomatrix) and an incidence matrix [7]. Both are memory-intensive, especially for sparse high-dimensional graphs, where most of the matrix elements are zero.
In this paper, we use a list data structure [8,9] to represent and analyze sparse social graphs. Furthermore, we describe a list-based algorithm for graph folding that makes it possible to analyze conditionally infinite social graphs on resource-constrained mobile devices.
Using a list data structure to analyze social graphs was previously considered in [10][11][12][13]. There are two ways to represent a graph as a list. An adjacency list describes the set of neighbors of each vertex in the graph; its main data structure is an array of linked lists, one linked list for each vertex. An edge list is a list of node pairs (edges). As an example, consider an undirected graph GU in Fig. 1 and the corresponding directed graph GD in Fig. 2.
The adjacency list of a graph is an n-element array of linked lists, one for each vertex, where the ith element points to a linked list of edges for vertex i. This linked list represents the edges by the vertices adjacent to vertex i. Every element (node) of the list consists of two fields. The left field is a pointer to the vertex index, e.g. 17, while the right field is a pointer to the next node, as shown in Fig. 3. The adjacency list of the graph GU is Xc = (17, 12, 13, 18, 21), (12, 17), (13, 17), (18, 17), (21, 17), as shown in Fig. 3.
The edge list of a graph is a list, or array, of edges. When an undirected graph is represented by an edge list, every edge is listed twice, once in each direction.
Edge list for graph in Fig
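The two representations can be compared on the graph GU with a minimal sketch. It assumes a Python dictionary of lists in place of the array of linked lists described above, and the function name to_edge_list is hypothetical.

```python
# Adjacency list of the undirected graph GU from the text:
# vertex 17 is adjacent to vertices 12, 13, 18 and 21.
# Assumption: a dict of Python lists stands in for the array of linked lists.
adjacency = {
    17: [12, 13, 18, 21],
    12: [17],
    13: [17],
    18: [17],
    21: [17],
}


def to_edge_list(adj):
    """Flatten an adjacency list into (source, target) pairs.

    For an undirected graph each edge appears twice, once per direction.
    """
    return [(u, v) for u, neighbors in adj.items() for v in neighbors]


edge_list = to_edge_list(adjacency)
print(edge_list)
```

The four undirected edges of GU yield eight pairs, because each edge is stored in both directions.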

Operations on the lists
In this section, we consider split and join operations on lists. The corresponding functions were described in [8]. To split a list, we use the functions hd (x) and tl (x), which return the head (the first element) and the tail (all but the first element) of a list, respectively. To join, we use cons (a, b), which builds a list with head a and tail b.
It follows that hd (x) and tl (x) are inverse to cons: hd (cons (a, b)) = a, tl (cons (a, b)) = b. The converse does not hold in general: cons (hd (x), tl (x)) returns a new cell that contains the same pointers as the source cell x, that is, a copy of the cell x rather than the cell x itself.
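The relation between cons, hd and tl can be checked with a small sketch. The class name Cell and the use of None for the empty list are assumptions made for illustration; [8] may define these functions differently.

```python
class Cell:
    """A list cell: `head` holds an element, `tail` points to the rest of the list."""

    def __init__(self, head, tail):
        self.head = head
        self.tail = tail


def cons(a, b):
    """Join: build a new cell with head a and tail b."""
    return Cell(a, b)


def hd(x):
    """Split: return the first element of the list."""
    return x.head


def tl(x):
    """Split: return everything but the first element."""
    return x.tail


NIL = None  # assumption: the empty list is represented by None

rest = cons(12, cons(13, NIL))
x = cons(17, rest)

# hd and tl undo cons.
assert hd(cons(17, rest)) == 17
assert tl(cons(17, rest)) is rest

# The converse yields a copy: a new cell holding the same pointers as x.
copy = cons(hd(x), tl(x))
assert copy is not x
assert tl(copy) is tl(x)
```

The last two assertions show the asymmetry stated above: the rebuilt cell shares its tail with x but is a distinct object.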
The concept of a doubly linked list is given in [9]. It can be viewed as two singly linked lists formed from the same data items but in opposite sequential orders. Each node of a doubly linked list contains three fields: an integer value (key), a link to the next node (next), and a link to the previous node (prev), as shown in Fig. 5. Let x = (x1, x2, x3, x4, …, xn) be such a list. Given a node xi, next (xi) refers to the next node and prev (xi) refers to the previous node. If prev (xi) = NIL, then xi is the entry point of the list, called the head hd (x). If next (xi) = NIL, then xi is the last node in the list, called the tail tl (x).
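A doubly linked list with the three fields key, next and prev can be sketched as follows. The names Node, from_keys, forward and backward are hypothetical helpers, and None plays the role of NIL.

```python
class Node:
    """A node of a doubly linked list with fields key, next and prev."""

    def __init__(self, key):
        self.key = key
        self.next = None  # None plays the role of NIL
        self.prev = None


def from_keys(keys):
    """Link the keys into a doubly linked list; return its first and last nodes."""
    first = last = None
    for key in keys:
        node = Node(key)
        if last is None:
            first = node          # prev stays NIL: this node is the entry point
        else:
            last.next = node
            node.prev = last
        last = node
    return first, last


def forward(node):
    """Collect keys by following next links."""
    keys = []
    while node is not None:
        keys.append(node.key)
        node = node.next
    return keys


def backward(node):
    """Collect keys by following prev links."""
    keys = []
    while node is not None:
        keys.append(node.key)
        node = node.prev
    return keys


first, last = from_keys([17, 12, 13, 18, 21])
assert first.prev is None and last.next is None
```

Traversing from the first node with next and from the last node with prev visits the same keys in opposite orders, which matches the view of two singly linked lists over the same items.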
Among the adjacency lists considered earlier, there are 2-element and 5-element lists. The 5-element list x = (17, 12, 13, 18, 21) means that nodes 12, 13, 18 and 21 can be reached from node 17. Meanwhile, the only way to get from node 12 to node 13 is through node 17.
Eventually we end up with the edge lists discussed earlier. From this we can conclude that it is reasonable to represent graphs as lists of edges.

Graph folding algorithm
Let G1 and G2 be graphs and f: G1→G2 be a continuous function. Then f is called a graph map if (i) for each vertex v ∈ V(G1), f(v) is a vertex in V(G2), and (ii) for each edge e ∈ E(G1), dim(f(e)) ≤ dim(e).
A graph map f: G1→G2 is called a graph folding if f maps vertices to vertices and edges to edges, i.e., for each vertex v ∈ V(G1), f(v) is a vertex in V(G2) and for each edge e ∈ E(G1), f(e) is an edge in E(G2) [14]. In this section, we describe a list-based algorithm for graph folding in a sequence of steps (1) to (8).
First, we introduce the following notation: Nodes = {1…(n-1)} is the vertex set of a social graph; Nodej is the jth vertex in the graph; cNode is the source vertex in the graph; Clusteri is the cluster at the ith level; Neighbors (cNode) is the set of neighboring (adjacent) vertices of cNode; C is the number of neighboring vertices. The basic steps of the algorithm are:
1) Define the level of the cluster, i = 0;
2) Define the source vertex, cNode = Nodej;
3) Search for the neighboring vertices of cNode, C = get Neighbors (cNode);
4) If C ≠ 0, then i = i + 1;
5) Define the cluster of the ith level as the aggregated set of the source vertex cNode and its neighbors Neighbors (cNode); the cluster is a list Clusteri = cons (cNode, Neighbors (cNode)), where hd (Clusteri) = cNode and tl (Clusteri) = Neighbors (cNode);
6) Fold the vertices of the cluster at the ith level into a new source vertex, cNode = Clusteri;
7) Return to step (4);
8) If C = 0, then terminate.
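One possible reading of steps (1) to (8) is sketched below. It is not the authors' implementation, and several points are assumptions: the neighbors of a folded vertex are taken to be the vertices adjacent to any of its members and not yet absorbed, the neighbor search is repeated on every pass so that C reflects the current cNode, a cluster cons (cNode, Neighbors (cNode)) is modeled as a Python pair, vertices are assumed to be integers, and the small graph and the names neighbors, members and fold are hypothetical.

```python
def members(c_node):
    """Original vertices contained in a (possibly nested) folded vertex."""
    if isinstance(c_node, tuple):      # a cluster: (head, tail)
        head, tail = c_node
        return members(head) + list(tail)
    return [c_node]                    # a plain vertex


def neighbors(adj, c_node):
    """Vertices adjacent to any member of c_node that are not yet absorbed."""
    inside = members(c_node)
    seen = set(inside)
    found = []
    for u in inside:
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                found.append(v)
    return found


def fold(adj, source):
    """Fold the graph around `source`; return the clusters level by level."""
    level = 0                          # step 1
    c_node = source                    # step 2
    clusters = []
    while True:
        found = neighbors(adj, c_node) # step 3 (repeated each pass: assumption)
        if not found:                  # step 8: C = 0, terminate
            break
        level += 1                     # step 4
        cluster = (c_node, found)      # step 5: cons(cNode, Neighbors(cNode))
        clusters.append(cluster)
        c_node = cluster               # step 6: fold cluster into new source
    assert level == len(clusters)
    return clusters


# Hypothetical graph: GU from the text plus two extra vertices, 5 and 30.
adj = {
    17: [12, 13, 18, 21],
    12: [17, 5],
    13: [17],
    18: [17],
    21: [17, 30],
    5: [12],
    30: [21],
}
clusters = fold(adj, 17)
```

In this example the first-level cluster absorbs the neighbors of vertex 17, and the second-level cluster absorbs vertices 5 and 30, after which no unabsorbed neighbors remain and the loop terminates.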
We explain the algorithm using the example in Fig. 6. The initial graph has n = 42 vertices numbered from 1 to 42. They are connected by edges numbered from 1 to 73. The size of the incidence matrix is 42x73. The size of the adjacency matrix is 42x42. Both matrices are sparse.
We emphasize that some merging vertices are connected by multiple edges, e.g. in the list (43, 9) the vertices are connected by three edges: 16, 17 and 18. We also note the case when two or more merging vertices are linked not only to the source vertex but also to each other. To map such merged vertices correctly, we add the list (9, 10).
Edge lists with merging vertices in the cluster of the 3rd level are produced accordingly:
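A rough count of stored entries for this example illustrates why the matrices are costly. The sketch below only counts cells and list entries; it assumes nothing about the size of each entry and ignores pointer overhead in linked lists, so it is an order-of-magnitude comparison rather than a memory measurement.

```python
n_vertices = 42
n_edges = 73

# Matrix representations store one cell per (row, column) pair.
incidence_cells = n_vertices * n_edges       # 42 x 73 = 3066
adjacency_cells = n_vertices * n_vertices    # 42 x 42 = 1764

# An edge list stores two vertex indices per edge.
edge_list_entries = 2 * n_edges              # 146
# If every undirected edge is repeated in reverse order, the count doubles.
edge_list_entries_both = 2 * edge_list_entries  # 292

print(incidence_cells, adjacency_cells, edge_list_entries, edge_list_entries_both)
```

Even with every edge stored in both directions, the edge list holds far fewer entries than either matrix for this sparse graph.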

Conclusion
In this paper, we described a list-based algorithm for graph folding. According to our experimental results, by reducing redundant computation, the algorithm uses 30% less memory and runs 20-30% faster, depending on the density of the graph, than algorithms that use adjacency or incidence matrices and iterate over vertices. In our case, it was the only feasible algorithm for analyzing conditionally infinite graphs on resource-constrained mobile devices.