Abstract

Optimizing average path length (APL) by adding shortcut edges has been widely discussed in connection with social networks, but the relationship between network diameter and APL is generally ignored in the dynamic optimization of APL. In this paper, we analyze this relationship and transform the problem of optimizing APL into the problem of decreasing the diameter to 2. We propose a mathematical model based on a memetic algorithm. Experimental results show that our algorithm can efficiently solve this problem as well as optimize APL.

1. Introduction

Following the introduction of models for small-world and scale-free networks, much research has been devoted to analyzing network characteristics [1–5]. In particular, there has been a focus on finding indices to quantify features of network structure such as structural entropy, robustness, or modularity [6–8]. These indices play an important role in measuring specific performance aspects of networks, and optimizing them can help to improve network performance.

Average path length (APL), the average shortest distance between all pairs of nodes in a network, is not only a measurement of static characteristics such as connectivity and robustness but also an important control variable in dynamic processes, such as the spread of diseases or target searching [9–11]. Optimizing APL has also attracted attention in the field of structural optimization. Decreasing APL by adjusting nodes or edges can effectively enhance the transfer efficiency and synchronization ability [12–17]. In addition, optimization of APL has been widely used in urban planning and site selection [14, 18, 19]. Xuan et al. [20] proposed a simulated annealing model to optimize APL in order to speed up convergence. Keren [21] employed a spectral technique to reduce APL in binary decision diagrams.

In order to optimize APL, many scholars focus on adding a given number of edges to produce the largest decrease in APL. These added edges are called “shortcut” edges and the problem of finding the best set of shortcut edges is defined as the “shortcut-selection” problem [22]. A series of methods have been proposed to solve this problem. Meyerson and Tagiku proposed an approximation method, which involved finding a source node and then connecting other nodes to this node to decrease APL [22]. Parotsidis et al. analyzed the exact effect of a single edge insertion on APL and proposed the EdgeEffect Algorithm to maximize the effect of edge insertion [23]. A greedy algorithm, which adds edges one by one and which makes the maximum reduction of APL for each added edge, has proved to be efficient [24, 25]. These methods have solved the shortcut-selection problem to some extent. However, a common phenomenon has been ignored in the process of adding edges. In experiments to optimize APL by adding edges, we find that no matter which method is used to add edges, there always exists a turning point at which APL begins to decrease linearly as more edges are added. This phenomenon can be related to the network diameter.

In this paper, we define the network diameter at the turning point as the “critical diameter,” and analyze both this critical diameter and APL in the process of adding edges. We transform the problem of optimizing APL into the problem of optimizing the critical diameter. Specifically, we focus on adding the minimum number of shortcut edges to make the network diameter decrease to 2. Research on predicting missing links, whose algorithms can extract missing information or identify spurious interactions, has attracted much attention in recent years [26–29]. Gao et al. analyzed the features of predicted networks and found that the network diameter and APL show a negative linear relation for all of the tested prediction methods [29]. Therefore, our research can also provide some a priori knowledge for designing link prediction methods. In the next section, we introduce the critical diameter and explore the special relationship between critical diameter and APL; the algorithm for optimizing the critical diameter is proposed in Section 3; Section 4 gives results of testing our method on generated networks; our conclusions and further work are presented in Section 5.

2. Critical Diameter and APL

Network diameter, the maximum shortest-path length over all pairs of nodes, is closely related to APL; they both contain information about connectivity and transfer efficiency [30, 31]. Imase and Itoh gave the inequalities to describe the static relationship between network diameter, $D$, and APL [32]. In the dynamic process of adding shortcut edges, there exists a turning point, as shown in Figure 1. APL declines nonlinearly with the number of added edges $m$ until a turning point and then decreases linearly as $m$ increases further. We compute the path length between every pair of nodes and find that the longest path length of the network is larger than 2 before $m$ reaches the turning point (i.e., the network diameter $D > 2$); when $m$ reaches the turning point, the diameter equals 2 ($D = 2$). This is because if $D = 2$, a newly added edge between a pair of nodes can only change the path length between these two nodes from 2 to 1 but cannot change the path lengths of other pairs of nodes, so the APL is reduced by just $2/(N(N-1))$ for each added edge, where $N$ is the number of nodes, which constitutes a linear decline.

In fact, the APL can be computed exactly once the network diameter has declined to 2. For a network with $N$ nodes and $M$ edges, when the diameter equals 2, the path length of every pair of nodes is at most 2. If we add $m$ edges to the network, the number of pairs of nodes whose path length equals 1 will be $M + m$ and the number of pairs of nodes whose path length equals 2 will be $N(N-1)/2 - (M + m)$. Therefore, the value of APL achieved by adding $m$ edges with the network diameter $D = 2$ is

$$\mathrm{APL} = \frac{(M + m) + 2\left[\frac{N(N-1)}{2} - (M + m)\right]}{N(N-1)/2} = 2 - \frac{2(M + m)}{N(N-1)}. \tag{1}$$

In the process of adding edges, APL will ultimately decrease linearly and will become equal to the term on the right of (1). Figure 2 shows the results of 20 simulations adding edges randomly to decrease APL; the maximum, mean, and minimum APL of these 20 runs are shown. The three curves become overlapping and linear when the number of added edges becomes large enough. The curve of the minimum APL becomes linear earliest, while the curve of maximum APL is the last to become linear.
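The linear regime can be checked numerically. The following is a minimal sketch, not the authors' experiment: it assumes a hypothetical 12-node ring as the initial network, adds the missing edges in a random order, and, once the diameter is at most 2, compares the APL obtained by breadth-first search with the closed form $2 - 2E/(N(N-1))$, where $E$ is the total number of edges after adding. The function names `path_lengths` and `apl_closed_form` are illustrative only.

```python
from collections import deque
from itertools import combinations
import random


def path_lengths(adj):
    """Shortest-path lengths of all unordered node pairs (BFS from each node)."""
    lengths = []
    nodes = sorted(adj)
    for idx, src in enumerate(nodes):
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        lengths.extend(dist[t] for t in nodes[idx + 1:])
    return lengths


def apl_closed_form(n_nodes, n_edges):
    """APL of any diameter-2 network with n_edges edges in total."""
    return 2 - 2 * n_edges / (n_nodes * (n_nodes - 1))


# Hypothetical initial network: a ring of 12 nodes (diameter 6).
n = 12
adj = {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}
missing = [(u, v) for u, v in combinations(range(n), 2) if v not in adj[u]]
random.Random(0).shuffle(missing)

checked = 0
for u, v in missing:  # add shortcut edges one by one in random order
    adj[u].add(v)
    adj[v].add(u)
    lengths = path_lengths(adj)
    if max(lengths) <= 2:  # past the turning point
        total_edges = sum(len(nb) for nb in adj.values()) // 2
        apl = sum(lengths) / len(lengths)
        assert abs(apl - apl_closed_form(n, total_edges)) < 1e-12
        checked += 1
```

Because the closed form depends only on the total number of edges, every run that has reached diameter 2 lies on the same straight line, whichever edges were added.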

In this case, if we attempt to minimize APL by adding a large number of shortcut edges, we can find a solution for which adding a small number of edges has decreased the diameter to 2 and add the remaining edges randomly. Therefore, the problem of optimizing APL can be transformed into the problem of finding shortcut edges that quickly decrease the diameter to 2.

Here, we propose a formal definition for the diameter when APL begins to decline linearly.

Definition 1. In the process of adding edges to a network, APL begins to decline linearly once the network diameter has declined to 2. The network diameter in this case, $D = 2$, is defined as the “critical diameter,” denoted as $D_c$.

If we add $m$ shortcut edges to make the network diameter become $D_c$, the set of these shortcut edges must be an optimal solution for minimizing APL by adding $m$ edges. Every pair of nodes that is not directly connected has a path length of at least 2, so a solution that made APL lower by adding another set of $m$ edges would need the number of pairs of nodes whose path length equals 1 to be bigger than $M + m$ ($M$ being the number of edges of the initial network), which cannot be realized by adding only $m$ edges. This relationship between APL and $D_c$ suggests that if we can minimize the number of added edges needed to reduce the diameter to $D_c$, the APL can efficiently decrease to its lowest level.

In this paper, we focus on how to decrease the diameter to 2 by adding the minimum number of edges; this problem is defined as “optimizing the critical diameter.” The objective function can be formulated as

$$\min m \quad \text{s.t.} \quad \max_{i \neq j} d_{ij} \le 2,$$

where $m$ represents the number of added edges and $d_{ij}$ represents the path length between node $i$ and node $j$.

It should be noted that the problem of optimizing the critical diameter is an NP-hard problem. Given a connected network with $N$ nodes and $M$ edges, there are $N(N-1)/2 - M$ nonexistent edges and hence $2^{N(N-1)/2 - M}$ ways of adding shortcut edges. Since the network diameter must also be computed for every candidate set, finding the best set of shortcut edges by exhaustive search takes time exponential in the number of nonexistent edges, which is high even for a small network.
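To make the size of the search space concrete, the sketch below performs the exhaustive search on a hypothetical 6-node network (not one of the networks used in this paper). It enumerates candidate sets of nonexistent edges in order of increasing size and returns the first set that brings the diameter down to 2. The helper names are illustrative, and the approach is only feasible for very small networks.

```python
from collections import deque
from itertools import combinations


def diameter(adj):
    """Longest shortest-path length; infinity if the network is disconnected."""
    worst = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) < len(adj):
            return float("inf")
        worst = max(worst, max(dist.values()))
    return worst


def build_adj(n_nodes, edges):
    """Adjacency sets for nodes 0..n_nodes-1."""
    adj = {i: set() for i in range(n_nodes)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def min_shortcuts_bruteforce(n_nodes, edges):
    """Smallest set of added edges that gives diameter <= 2 (exhaustive)."""
    existing = {frozenset(e) for e in edges}
    missing = [p for p in combinations(range(n_nodes), 2)
               if frozenset(p) not in existing]
    for size in range(len(missing) + 1):  # try m = 0, 1, 2, ...
        for subset in combinations(missing, size):
            candidate = list(edges) + list(subset)
            if diameter(build_adj(n_nodes, candidate)) <= 2:
                return list(subset)


# Hypothetical network with diameter 3 and 8 nonexistent edges (2**8 subsets).
edges = [(0, 1), (0, 2), (0, 3), (1, 4), (2, 4), (2, 5), (3, 5)]
best = min_shortcuts_bruteforce(6, edges)
print(len(best), best)
```

For this small example one shortcut edge is enough, but the number of subsets doubles with every additional nonexistent edge, which is what motivates a heuristic search.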

An efficient way to optimize $D_c$ is to establish connections between the highest-degree node and the rest of the nodes, which adds $N - 1 - k_{\max}$ edges ($N$ is the number of nodes and $k_{\max}$ is the highest degree of the network). The network then contains a star centered on the highest-degree node, so its diameter is at most 2. We call this method “HDN” (connecting to the highest-degree node).

However, HDN fails to generate the globally optimal solution. As shown in Figure 3, node 6 is the highest-degree node of the network. To decrease the diameter to 2 by HDN, we should add two edges connecting nodes 3 and 6 and nodes 4 and 6; but if we establish a single connection between nodes 3 and 4, the network diameter also becomes $D_c$. Thus we need to design a more efficient method to decrease the network diameter to 2.
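The same effect can be reproduced on a small example. The sketch below uses a hypothetical 6-node network (it is not the network of Figure 3) in which HDN adds two edges while a single well-chosen edge already gives diameter 2. The function `hdn_edges` is an illustrative name; ties for the highest degree are broken by taking the first such node, which is an assumption.

```python
from collections import deque


def diameter(adj):
    """Longest shortest-path length; infinity if the network is disconnected."""
    worst = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) < len(adj):
            return float("inf")
        worst = max(worst, max(dist.values()))
    return worst


def hdn_edges(adj):
    """Edges HDN adds: connect the highest-degree node to every non-neighbour."""
    hub = max(adj, key=lambda u: len(adj[u]))  # first node with the top degree
    return [(hub, v) for v in adj if v != hub and v not in adj[hub]]


def with_edges(adj, extra):
    """Copy of the network with the extra edges added."""
    new = {u: set(nb) for u, nb in adj.items()}
    for u, v in extra:
        new[u].add(v)
        new[v].add(u)
    return new


# Hypothetical network: node 0 has the highest degree (3); the diameter is 3.
adj = {0: {1, 2, 3}, 1: {0, 4}, 2: {0, 4, 5},
       3: {0, 5}, 4: {1, 2}, 5: {2, 3}}

hdn = hdn_edges(adj)
print("HDN adds", len(hdn), "edges:", hdn)
print("diameter after HDN:", diameter(with_edges(adj, hdn)))
print("diameter after adding only (4, 5):", diameter(with_edges(adj, [(4, 5)])))
```

Here HDN adds $N - 1 - k_{\max} = 6 - 1 - 3 = 2$ edges, whereas the single edge between nodes 4 and 5 is sufficient.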

3. The Algorithm for Optimizing Critical Diameter

Memetic algorithms combined with techniques of long-distance and short-distance search have proved to be effective in solving NP-hard problems [33, 34]. In this section, we introduce a memetic algorithm that combines a genetic algorithm and a heuristic local search to optimize critical diameter. We call the method “MA-CD.”

3.1. Framework

The framework of MA-CD is shown in Algorithm 1. We first input the necessary parameters, such as the maximum iteration number and the population size, as well as the adjacency matrix of the network, and generate an initial population. Next, we repeat the process for optimizing $D_c$ until the number of iterations reaches the maximum or the objective function has remained unchanged for 50 iterations. In each iteration, we first select the parent population for the genetic operations; then we apply two-point crossover and one-point mutation to generate offspring chromosomes; we apply some a priori knowledge to carry out a local search on the offspring chromosomes; and we construct a new population from the better-performing chromosomes. Finally, we output the results.

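Algorithm 1: Framework of MA-CD.
Input: the maximum iteration number; population size; mating pool size;
         tournament size; crossover probability $p_c$; mutation probability $p_m$; the initial network
         adjacency matrix $A$.
Generate the initial population;
Repeat
  select the parent population (mating pool) by tournament selection;
  generate offspring chromosomes by two-point crossover and one-point mutation;
  apply local search to the offspring chromosomes;
  update the population with the better-performing chromosomes;
Until the maximum iteration number is reached or the objective function has remained unchanged for 50 iterations
Output: the number of added edges, the position of added edges.
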
3.2. Representation and Initialization

We aim to find those positions at which we should add edges to optimize $D_c$. To this end, we find all the positions of the nonexistent edges and encode them as genes in the chromosome $x = (x_1, x_2, \ldots, x_{N_{non}})$, where $N_{non}$ is the number of nonexistent edges. $x_i = 1$ represents adding a new edge at the corresponding position, while $x_i = 0$ represents not adding an edge. Figure 4 shows an illustration of the representation. We identify the nonexistent edges between nodes 1–3, 1–4, 2–4, 2–5, and 3–5. For the initial network, all the genes are assigned 0 because there are no added edges. If we assign 1 to the first and second genes as shown in Chromosome 2, then the edges between nodes 1–3 and 1–4 will be added. Similarly, when we assign 1 to the first and fourth genes, the edges between 1–3 and 2–5 will be added.

In the initialization, we generate a population of chromosomes and randomly assign 0 or 1 to every gene in the chromosomes.
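The encoding can be sketched as follows. The nonexistent edges are those listed above for Figure 4; the existing edges (a ring 1–2–3–4–5–1) are inferred here as the complement of that list on five nodes and should be read as an assumption, as should the function names and the population size of 4.

```python
from itertools import combinations
import random


def nonexistent_edges(n_nodes, edges):
    """Gene positions: every node pair (1-based labels) that is not yet an edge."""
    existing = {frozenset(e) for e in edges}
    return [p for p in combinations(range(1, n_nodes + 1), 2)
            if frozenset(p) not in existing]


def decode(chromosome, positions):
    """Edges added by a chromosome: the positions whose gene equals 1."""
    return [pos for gene, pos in zip(chromosome, positions) if gene == 1]


def init_population(pop_size, n_genes, rng):
    """Random initialization: each gene is independently set to 0 or 1."""
    return [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]


# Assumed initial network: the ring 1-2-3-4-5-1.
edges = [(1, 2), (2, 3), (3, 4), (4, 5), (1, 5)]
positions = nonexistent_edges(5, edges)
print(positions)                              # gene order
print(decode([0, 0, 0, 0, 0], positions))     # initial network: nothing added
print(decode([1, 1, 0, 0, 0], positions))     # first and second genes set
print(decode([1, 0, 0, 1, 0], positions))     # first and fourth genes set

population = init_population(4, len(positions), random.Random(0))
```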

3.3. The Genetic Operation

The genetic operation consists of two-point crossover and one-point mutation. The crossover operation is described in the appendix. Given two parent chromosomes $x^a$ and $x^b$, we randomly choose two points, which divide each parent chromosome into three parts. Next, we randomly select one part, and all the genes in this part are swapped between $x^a$ and $x^b$ with probability $p_c$ (the crossover probability), to generate two offspring chromosomes. Mutation is also described in the appendix: with probability $p_m$ (the mutation probability), we choose a gene and reassign its value.
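A minimal sketch of the two operators is given below. It follows the description above rather than the exact appendix procedures, and it makes two assumptions that the text does not settle: the two cut points are chosen so that all three parts are nonempty, and “reassigning” a binary gene is implemented as flipping it. Function and parameter names are illustrative.

```python
import random


def two_point_crossover(parent_a, parent_b, p_c, rng):
    """Swap one randomly chosen part (of three) between the parents."""
    child_a, child_b = list(parent_a), list(parent_b)
    n = len(child_a)  # requires n >= 3 so that three nonempty parts exist
    i, j = sorted(rng.sample(range(1, n), 2))  # cut points, 1 <= i < j <= n-1
    if rng.random() < p_c:
        start, stop = rng.choice([(0, i), (i, j), (j, n)])
        child_a[start:stop], child_b[start:stop] = (
            child_b[start:stop], child_a[start:stop])
    return child_a, child_b


def one_point_mutation(parent, p_m, rng):
    """With probability p_m, pick one gene at random and flip it (assumption)."""
    child = list(parent)
    if rng.random() < p_m:
        k = rng.randrange(len(child))
        child[k] = 1 - child[k]
    return child


rng = random.Random(42)
a = [0, 0, 0, 0, 0, 0]
b = [1, 1, 1, 1, 1, 1]
child_a, child_b = two_point_crossover(a, b, p_c=1.0, rng=rng)
mutant = one_point_mutation(child_a, p_m=1.0, rng=rng)
print(child_a, child_b, mutant)
```

The swapped part shows up as a contiguous run of 1s in `child_a` and of 0s in `child_b`.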

3.4. Local Search

By incorporating some a priori knowledge, a local search can efficiently reduce useless exploration and speed up the convergence of algorithms [35]. In experiments to optimize APL, we find that most of the optimized networks appear to be disassortative, and we propose a “disassortativeness-learning” technique to apply this knowledge in the local search. Then, we use “Edge-Adding Learning” and “Edge-Dropping Learning” to find the local minimum of our solutions.

3.4.1. Disassortativeness Learning

The detailed algorithm is described in the appendix. We first find the added edge which has the minimum sum of the degrees of the two connected nodes and drop it. Then we find the nonexistent edge which has the maximum difference in degree between the two disconnected nodes and add this new edge to the network.

3.4.2. Edge-Adding Learning

As described in the appendix, we first judge if the diameter of the updated network has decreased to 2. If the diameter exceeds 2, we randomly add edges until the diameter equals 2. Then we output the offspring chromosomes with the updated network diameter equal to 2.
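A minimal sketch of this repair step is shown below, assuming a hypothetical path of six nodes as the initial network and illustrative helper names. Genes equal to 0 are switched on at random until the decoded network has diameter at most 2; a chromosome that already satisfies the constraint is returned unchanged.

```python
from collections import deque
from itertools import combinations
import random


def diameter(adj):
    """Longest shortest-path length; infinity if the network is disconnected."""
    worst = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) < len(adj):
            return float("inf")
        worst = max(worst, max(dist.values()))
    return worst


def decode_network(nodes, base_edges, chromosome, positions):
    """Adjacency sets of the initial network plus the edges encoded as 1."""
    adj = {u: set() for u in nodes}
    added = [pos for g, pos in zip(chromosome, positions) if g == 1]
    for u, v in list(base_edges) + added:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def edge_adding_learning(chromosome, positions, nodes, base_edges, rng):
    """Randomly switch on genes until the decoded network has diameter <= 2."""
    genes = list(chromosome)
    while diameter(decode_network(nodes, base_edges, genes, positions)) > 2:
        zero_genes = [k for k, g in enumerate(genes) if g == 0]
        genes[rng.choice(zero_genes)] = 1
    return genes


# Hypothetical initial network: the path 1-2-3-4-5-6 (diameter 5).
nodes = list(range(1, 7))
base_edges = [(i, i + 1) for i in range(1, 6)]
positions = [p for p in combinations(nodes, 2) if p not in base_edges]
start = [0] * len(positions)
repaired = edge_adding_learning(start, positions, nodes, base_edges, random.Random(3))
print(sum(repaired), "edges added")
```

The loop always terminates, because switching on every gene yields the complete network.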

3.4.3. Edge-Dropping Learning

We select every added edge and check whether dropping the added edge will leave the network diameter unchanged. If the drop cannot increase the diameter, we drop the added edge; if the drop increases the diameter, we do not drop the added edge. As a result, some useless added edges may be dropped. The appendix gives the specific procedure of Edge-Dropping Learning.
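The procedure can be sketched as below, again on a hypothetical six-node path with illustrative names. It assumes that the input chromosome already decodes to a network of diameter at most 2 (for example, after Edge-Adding Learning) and keeps a drop only if the diameter stays at most 2.

```python
from collections import deque
from itertools import combinations
import random


def diameter(adj):
    """Longest shortest-path length; infinity if the network is disconnected."""
    worst = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) < len(adj):
            return float("inf")
        worst = max(worst, max(dist.values()))
    return worst


def decode_network(nodes, base_edges, chromosome, positions):
    """Adjacency sets of the initial network plus the edges encoded as 1."""
    adj = {u: set() for u in nodes}
    added = [pos for g, pos in zip(chromosome, positions) if g == 1]
    for u, v in list(base_edges) + added:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def edge_dropping_learning(chromosome, positions, nodes, base_edges, rng):
    """Drop added edges, in random order, while the diameter stays <= 2."""
    genes = list(chromosome)
    is_local = False
    while not is_local:
        is_local = True
        order = list(range(len(genes)))
        rng.shuffle(order)
        for k in order:
            if genes[k] == 1:
                genes[k] = 0  # tentatively drop this added edge
                if diameter(decode_network(nodes, base_edges, genes, positions)) <= 2:
                    is_local = False  # drop kept; scan again afterwards
                else:
                    genes[k] = 1      # the edge is needed; restore it
    return genes


# Hypothetical start: every nonexistent edge of the path 1-2-...-6 is added.
nodes = list(range(1, 7))
base_edges = [(i, i + 1) for i in range(1, 6)]
positions = [p for p in combinations(nodes, 2) if p not in base_edges]
pruned = edge_dropping_learning([1] * len(positions), positions, nodes,
                                base_edges, random.Random(5))
print(sum(pruned), "added edges remain")
```

The result is a local minimum only: no single remaining added edge can be dropped, but a different set with fewer edges may still exist.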

3.5. Complexity Analysis

The time complexity of MA-CD with the network size , the number of edges of initial network , and added edges can be formulated as follows. Each iteration requires times for crossover and times for mutation, where is the size of mating pool for the genetic operation. Since computing the network diameter costs , the total time of genetic operation is . For local search, when executing Disassortativeness Learning, the time to update the matrix is ; finding the added edge with the minimum sum of degree requires ; finding the nonexistent edge with the maximum difference in degree requires . The time for Disassortativeness Learning is . To perform Edge-Adding Learning and Edge-Dropping Learning, we should check at most genes for each chromosome and it will cost at most to compute the updated diameter of all changed genes. Therefore, the overall time complexity of MA-CD for each iteration is .

4. Experiments

In this section, we test the performance of MA-CD on different computer-generated networks. The experiments were carried out on a 2.40 GHz CPU, 4.00 GB Memory, and Windows 10 operating system. We use MATLAB to execute the procedure. Table 1 shows the parameters necessary for the experiments.

Our proposed algorithm is carried out ten times on two different network structures: a random network structure and a regular network structure. The detailed information of the two network structures is shown in the Appendix. We compare the solution of MA-CD with that of two other methods: (i) HDN, which adds edges between the highest-degree node and the other nodes as described in Section 2; and (ii) a greedy algorithm that adds edges one by one, with each added edge chosen to give the minimum diameter, denoted as “GA-CD” (greedy algorithm for optimizing critical diameter). We compare the minimum and mean value of MA-CD with the minimum value of HDN and GA-CD (for these two methods the minimum and mean values are equal).
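For reference, the greedy baseline can be sketched as follows. This is an illustrative reading of the one-sentence description above, not the code used in the experiments: at each step, every nonexistent edge is tried and the one that yields the smallest diameter is kept. Ties are broken by taking the first candidate, which is an assumption, and the ring of eight nodes is a hypothetical test network.

```python
from collections import deque
from itertools import combinations


def diameter(adj):
    """Longest shortest-path length; infinity if the network is disconnected."""
    worst = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) < len(adj):
            return float("inf")
        worst = max(worst, max(dist.values()))
    return worst


def greedy_critical_diameter(nodes, edges):
    """Add edges one by one, each minimizing the resulting diameter."""
    adj = {u: set() for u in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    def diameter_with(edge):
        u, v = edge
        adj[u].add(v)
        adj[v].add(u)
        d = diameter(adj)
        adj[u].discard(v)
        adj[v].discard(u)
        return d

    added = []
    while diameter(adj) > 2:
        candidates = [(u, v) for u, v in combinations(nodes, 2)
                      if v not in adj[u]]
        u, v = min(candidates, key=diameter_with)  # first best candidate
        adj[u].add(v)
        adj[v].add(u)
        added.append((u, v))
    return added, adj


# Hypothetical regular network: a ring of 8 nodes (diameter 4).
nodes = list(range(8))
ring = [(i, (i + 1) % 8) for i in range(8)]
added, final_adj = greedy_critical_diameter(nodes, ring)
print(len(added), "edges added by the greedy baseline")
```

Because a single edge often leaves the diameter unchanged, many candidates tie at each step, so the outcome of such a greedy rule depends heavily on how ties are broken.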

Compared with the other two methods for optimizing the critical diameter, MA-CD can always find fewer edges to make the network diameter become the critical diameter, as shown in Figure 5. Further, we find that the results for MA-CD and HDN appear to become the same as the network size becomes larger. In other words, HDN becomes more efficient for larger networks in optimizing $D_c$. Compared with MA-CD and HDN, GA-CD has worse performance in optimizing $D_c$, especially for regular networks, even though the greedy strategy performs well in decreasing APL or diameter [23–25].

We show that our proposed method can also efficiently decrease APL. We compute the APL of the optimal networks obtained by adding the shortcut edges found by MA-CD, and compare it with the APL obtained by two other methods that add the same number of edges: the EdgeEffect Algorithm, which maximizes the effect of edge insertion to optimize APL [23], denoted as “EA-APL”; and a greedy algorithm that adds edges one by one, with each added edge minimizing APL, which has previously been shown to be effective [24], denoted as “GA-APL.”

The optimal networks’ APL is shown in Figure 6. EA-APL performs worse for random networks, while GA-APL becomes less efficient in regular networks. MA-CD gives the best performance; it can always decrease the APL to its lowest level compared with the other two algorithms. Thus, we conclude that MA-CD can be used to optimize APL. If we can add a large number of edges to decrease APL, we just need to find a solution for optimizing $D_c$ and then add the remaining edges randomly.

5. Conclusion

In this paper, we find a critical case in which the network diameter declines to 2 when a new edge is added to the network in the process of solving the shortcut-selection problem. Using the relationship between APL and the network diameter, we transform the problem of optimizing APL into the problem of finding shortcut edges to quickly decrease the diameter to 2, which we define as the problem of optimizing the critical diameter. Further, we suggest a method to solve this problem based on a memetic algorithm. The experimental results show that our proposed method can efficiently optimize the critical diameter and is efficient in solving the shortcut-selection problem to decrease the APL.

Appendix

See Algorithms 2, 3, 4, 5, and 6 and Table 2.

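Algorithm 2: Two-point crossover.
Input: The parent chromosomes $x^a$ and $x^b$.
The number of nonexistent edges of the initial network: $N_{non}$. Crossover probability: $p_c$.
$y^a \leftarrow x^a$; $y^b \leftarrow x^b$;
randomly generate two positions $i$ and $j$, which obey $1 \le i < j < N_{non}$;
randomly generate $r \in (0, 1]$;
if $r < p_c$
  randomly generate $s \in \{1, 2, 3\}$;
  if $s = 1$
    for $k = 1$; $k \le i$; $k$++
      $t \leftarrow y^a_k$; $y^a_k \leftarrow y^b_k$; $y^b_k \leftarrow t$;
    end for
  end if
  if $s = 2$
    for $k = i + 1$; $k \le j$; $k$++
      $t \leftarrow y^a_k$; $y^a_k \leftarrow y^b_k$; $y^b_k \leftarrow t$;
    end for
  end if
  if $s = 3$
    for $k = j + 1$; $k \le N_{non}$; $k$++
      $t \leftarrow y^a_k$; $y^a_k \leftarrow y^b_k$; $y^b_k \leftarrow t$;
    end for
  end if
end if
Output: $y^a$ and $y^b$.
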
Algorithm 3: Mutation.
Input: The parent chromosome $x$.
The number of nonexistent edges of the initial network: $N_{non}$. Mutation probability: $p_m$.
$y \leftarrow x$;
randomly generate $r_0 \in (0, 1]$;
if $r_0 < p_m$
  randomly generate $q \in (0, 1]$;
  $mt$ = ceil($N_{non}$ * $q$);
  for $i = 1$; $i \le mt$; $i$++
    randomly generate $r \in (0, 1]$;
    $pos$ = ceil($N_{non}$ * $r$);
    reassign the value of gene $y_{pos}$;
  end for
end if
Output: $y$.

Algorithm 4: Disassortativeness Learning.
Input: the offspring chromosome: $y$. The adjacency matrix of the initial network: $A$.
$A' \leftarrow A$;
Update the matrix $A'$ by decoding $y$;
Find the element $y_p = 1$ of $y$ with the minimum sum of the degrees of its node pair;
$y_p \leftarrow 0$;
Find the element $y_q = 0$ of $y$ with the maximum difference between the degrees of its node pair;
$y_q \leftarrow 1$;
Output: $y$.

Algorithm 5: Edge-Adding Learning.
Input: the offspring chromosome: $y$. The adjacency matrix of the initial network: $A$.
$A' \leftarrow A$;
Update the matrix $A'$ by decoding $y$;
While the diameter of the updated matrix $D > 2$
  randomly choose a gene $y_i = 0$; $y_i \leftarrow 1$; update $A'$;
End while
Output: $y$.

Algorithm 6: Edge-Dropping Learning.
Input: the offspring chromosome: $y$.
The adjacency matrix of the initial network: $A$. The number of nonexistent edges of the initial network: $N_{non}$.
$A' \leftarrow A$; update the matrix $A'$ by decoding $y$;
Repeat
  islocal ← TRUE;
  rearrange the sequence number of the chromosome: seq = randperm($N_{non}$);
  for $i = 1$; $i \le N_{non}$; $i$++
    if $y$(seq($i$)) = 1
      $y$(seq($i$)) = 0;
      if the diameter of the updated network $D = 2$
        islocal ← FALSE;
      else
        $y$(seq($i$)) = 1;
      end if
    end if
  end for
Until islocal is TRUE;
Output: $y$.

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work is jointly supported by Key Project of the National Social Science Foundation of China (Grant no. 12AZD110) and Humanities and Social Science Talent Plan, Fundamental Research Funds for the Central Universities (Grant no. 2011jdgz08).