Abstract

Networks are prevalent in real life, and the study of network evolution models is important for understanding the nature and laws of real networks. In existing classical models, the initial degree of a new node is either constant or uniformly distributed. In the model we propose, the initial degree follows a binomial distribution, which is consistent with real network data. Theoretical analysis shows that the proposed model is scale-free for different values of the connection probability and that its clustering coefficient is adjustable; the Barabasi-Albert model is a special case of our model. In addition, the clustering coefficient can be estimated analytically using mean-field theory. The mean clustering coefficients calculated from the simulated data and from the analytical results both tend to a stable value. The model also exhibits small-world characteristics and reproduces the short distances of real networks well. Our model combines three network characteristics (a scale-free degree distribution, a high clustering coefficient, and small-world behavior), which is a significant improvement over traditional models that have only one or two of them. The theoretical analysis procedure can serve as a reference for estimating clustering coefficients in other network models. The existence of a stable equilibrium point of the model offers an explanation for the controversy over whether scale-free structure is universal, and this explanation provides a new way of thinking about the problem.

1. Introduction

Since the emergence of two landmark network models, the small-world model by Watts and Strogatz (WS model) [1] and the scale-free network model by Barabasi and Albert (BA model), research on network models has increased rapidly [2]. Network models have been applied to data from online social networks [3], mail networks [4], biological networks [5], annotated networks [6], and the online dating market [7]. Recently, complex networks have been studied extensively and applied in many settings by physicists [8–15]. Most of these networks have been shown to reach a stationary state and become scale-free. A common conclusion in network research is that most or all real-world networks are scale-free [16–21], that is, the degree $k$ of such a network follows a power law distribution. Scale-free networks have also been widely employed in network science [2, 22–25], and studies have evaluated how a scale-free structure influences system operation [20–27]. Most studies consider networks to be scale-free and small-world, but some scholars have found networks that are not [28–30]. Golosovsky reports that the citation distribution of a citation network is not scale-free [31]. Stumpf et al. sampled subnets of scale-free networks and found that these subnets are also not scale-free [32].

Complex network models can describe a large number of systems. Growing network models are characterized by the following properties: clustering coefficient, average path length, community structure, and degree distribution. It is generally postulated that scale-free systems can be generated by the preferential attachment mechanism [8]. Besides the BA model, representative growth models that build on preferential attachment include the Price model with an adjustable power law exponent [33, 34], the model proposed by Holme and Kim (HK model) with an adjustable clustering coefficient [35], the fitness model based on individual differences [36], and the local-world evolving network model based on local information [32]. Moreover, many real networks share common features, such as a power law degree distribution, a small average shortest path length, and high clustering. However, these classical models do not reproduce all three properties. For example, the WS model has a small average shortest path length and high clustering but no power law degree distribution, whereas the BA network exhibits low clustering [37–41].

To some extent, numerical simulation of the HK model shows a small average path length, high clustering, and a power law degree distribution. However, Holme and Kim do not provide analytical results because of the model’s complexity [35]. Many network models have been studied through simulation experiments, but few studies have examined such models theoretically.

The basic assumption in the BA and HK models is that, when a new node joins the network, its edges link to old nodes. At the initial moment, the new node’s degree is a constant, $m$, an assumption shared by many other models. Moreover, the initial distribution in the WS model is uniform. However, this is not the case in real networks. For instance, at the initial moment, a person can have different numbers of friends in a social network, a new paper can have different numbers of references in a citation network, and a new paper can have different numbers of coauthors in a scientific collaboration network. The Price model reports the distribution of the number of references [33], which can be regarded as the degree distribution of new nodes at the initial time. That study indicates that the distribution is neither uniform nor constant and that the fraction of papers drops quickly as the number of references increases. Given that scale-free networks have been used extensively for numerical simulation and experimentation [2, 18, 42–44], studying the generation mechanisms of scale-free structures is meaningful. The initial degree distribution is the foundation of a scale-free network model and influences network evolution and development. Therefore, it is important to develop a model whose initial degree distribution agrees with real networks.

In this study, the proposed model yields a binomial initial degree distribution, which agrees with some real network systems. We aim to develop a model with all three properties: a power law degree distribution, a small average shortest path length, and high clustering. Our theoretical analysis can provide a feasible reference for this kind of work.

The main contents are organized in the following way: first, we show the evolution rules of the network model; second, we establish the differential equations using probability theory and mean-field theory and prove that the network degree distribution is consistent with power law distribution; then, we prove the existence of the limit of the network average clustering coefficient and derive the analytic expression; finally, we verify the theoretical results by numerical simulations, compare them with some real networks, and summarize and analyze the conclusions.

2. Network Model

This model is based on the BA model, but at each time $t$, the number of new edges introduced by the newly added node is a random variable that follows a binomial distribution. $k_i$ denotes the degree of node $i$. The network development rules are as follows:
(1) Initial moment: the network contains $m_0$ points, and it is fully connected
(2) Growth: at time $t$, a new point is added to the network
(3) Preferential attachment (PA): the new node is connected to an existing node $i$ with probability proportional to its degree $k_i$
(4) Triad formation (TF): a node is randomly selected from the neighbors of $i$ and connected to the new node with probability $p$
(5) Steps (3) and (4) are repeated $m$ times

The network evolution model is shown in Figure 1. Our model produces a binomial initial degree distribution. Moreover, each new node introduces $m(1+p)$ edges on average ($m$ PA edges plus, in expectation, $mp$ TF edges); therefore, the average network degree is $2m(1+p)$.

When the connection probability $p = 0$, the model reduces to the BA model. $p$ is the control parameter of the model: it adjusts the final clustering coefficient of the network. The larger $p$ is, the greater the clustering coefficient. These results are verified by theoretical analysis and numerical simulation.

The PA mechanism selects a node with probability proportional to its degree, because people are more likely to connect with people who are already well connected socially. The TF mechanism reflects the social postulate that we are more likely to get to know a friend of an existing friend than a person with whom we have no connection; Newman provided evidence for this mechanism [10].
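The growth rules of this section can be sketched in a few lines of Python. This is a minimal illustration and not the author's implementation: the function name `grow_network`, the default seed size `m0 = 2 * m`, the redraw of a PA target that is already linked, the skipping of a TF step when no free neighbor exists, and the freezing of degrees within one time step are all assumptions made here so that the sketch is well defined.

```python
import random


def grow_network(n_nodes, m, p, m0=None, seed=None):
    """Grow a network by m repetitions of (PA, then TF with probability p).

    Returns (adj, initial_degrees): adj maps node -> set of neighbors, and
    initial_degrees maps each added node -> its degree right after joining.
    """
    rng = random.Random(seed)
    m0 = 2 * m if m0 is None else m0
    # Assumption of this sketch: m0 >= 2m guarantees a free PA target exists.
    if m0 < 2 or m0 < 2 * m:
        raise ValueError("this sketch needs m0 >= max(2, 2 * m)")

    # Rule (1): fully connected seed of m0 nodes.
    adj = {i: {j for j in range(m0) if j != i} for i in range(m0)}
    # Each node appears once per unit of degree, so a uniform draw from this
    # list is a degree-proportional (preferential attachment) draw.
    stubs = [i for i in range(m0) for _ in range(m0 - 1)]
    initial_degrees = {}

    for new in range(m0, n_nodes):              # Rule (2): growth
        adj[new] = set()
        for _ in range(m):                      # Rule (5): repeat m times
            # Rule (3): PA; redraw if the target is already linked (assumption).
            target = rng.choice(stubs)
            while target in adj[new]:
                target = rng.choice(stubs)
            adj[new].add(target)
            adj[target].add(new)
            # Rule (4): TF with probability p.
            if rng.random() < p:
                free = sorted(v for v in adj[target]
                              if v != new and v not in adj[new])
                if free:                        # skip TF if no free neighbor (assumption)
                    friend = rng.choice(free)
                    adj[new].add(friend)
                    adj[friend].add(new)
        initial_degrees[new] = len(adj[new])
        # Degrees used for PA are frozen during the step (assumption), so the
        # stub list is updated only after all edges of the new node exist.
        for v in adj[new]:
            stubs.extend((v, new))
    return adj, initial_degrees
```

With `p = 0` every new node joins with exactly `m` edges, which is the BA special case; with `p > 0` the initial degree of a new node lies between `m` and `2 * m`.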

3. Degree Distribution

Degree distribution of the model is analyzed using the mean-field theory.

Proposition 1 (the degree distribution follows a power law). At time $t$, when a new node joins the network, the probability that the degree of node $i$ changes in a single connection operation has two contributions: the probability of being connected directly by the new node (PA) and the probability of being connected because one of its neighbors was chosen (TF). The total probability is therefore the sum of these two contributions, where the TF contribution is taken over the neighbor set of node $i$.
Repeating this process $m$ times gives the rate equation for the degree, equation (2). Solving equation (2) gives the degree of node $i$ as a function of time. Nodes are added at a uniform rate, so the arrival time of a node is uniformly distributed.
From this, one obtains the degree probability density function, which shows that the model follows a power law distribution.
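The mean-field argument of Proposition 1 can be sketched as follows. This is a reconstruction under stated assumptions, not the paper's own equations: the symbols $\Pi_i$, $\Omega_i$ (neighbor set of node $i$), and $t_i$ (arrival time of node $i$) are notation introduced here, the degree sum is approximated by $\sum_j k_j \approx 2m(1+p)t$, and every node is assumed to start with the mean initial degree $m(1+p)$.

```latex
\begin{align*}
\Pi_i &= \frac{k_i}{\sum_j k_j}
        + p \sum_{n \in \Omega_i} \frac{k_n}{\sum_j k_j}\,\frac{1}{k_n}
       = \frac{(1+p)\,k_i}{\sum_j k_j}, \\
\frac{\partial k_i}{\partial t} &= m\,\Pi_i
       = \frac{m(1+p)\,k_i}{2m(1+p)\,t} = \frac{k_i}{2t}, \\
k_i(t) &= m(1+p)\left(\frac{t}{t_i}\right)^{1/2}, \\
P\bigl(k_i(t) < k\bigr) &= P\!\left(t_i > \frac{m^2(1+p)^2\,t}{k^2}\right)
       = 1 - \frac{m^2(1+p)^2\,t}{k^2\,(m_0+t)}, \\
P(k) &= \frac{\partial P\bigl(k_i(t) < k\bigr)}{\partial k}
       = \frac{2m^2(1+p)^2\,t}{m_0+t}\,k^{-3}
       \;\longrightarrow\; 2m^2(1+p)^2\,k^{-3} \quad (t \to \infty).
\end{align*}
```

Under these assumptions the exponent is 3 for every $p$, and setting $p = 0$ recovers the familiar BA result $2m^2 k^{-3}$.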

4. Clustering Coefficient

The clustering coefficient of node $i$ is defined by the formula of reference [40], which is based on the number of edges that directly connect the neighbors of node $i$.
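As an illustration, the widely used local clustering coefficient, $2e_i / (k_i(k_i - 1))$ with $e_i$ the number of edges among the neighbors of node $i$, can be computed as below. Whether reference [40] uses exactly this normalization is an assumption here, and the names `local_clustering` and `average_clustering` as well as the adjacency format (a dict mapping each node to a set of neighbors) are hypothetical.

```python
def local_clustering(adj, node):
    """Local clustering coefficient of `node` in an undirected graph.

    adj maps each node to the set of its neighbors.
    Nodes with fewer than two neighbors get 0.0 by convention (assumption).
    """
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    # Every edge between two neighbors is seen from both endpoints,
    # so the raw count is halved.
    links = sum(1 for u in neighbors for v in adj[u] if v in neighbors) // 2
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)
```

For example, every node of a triangle has coefficient 1, and the center of a star has coefficient 0.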

We next consider the probability that node $i$ is connected to node $j$ at time $t$. Without loss of generality, an ordering of the two nodes can be assumed when this probability is derived.

Proposition 2 (the limit of the average clustering coefficient exists). We consider the change in the clustering coefficient of node $i$ at time $t$. This change involves two events. In the first event, the new node connects to node $i$ by the PA rule and to one of its neighbors by the TF rule. In the second event, the new node connects to a neighbor of node $i$ by the PA rule, and node $i$ is then randomly chosen by the TF rule. The second event can occur several times within one time step, and the number of occurrences follows a binomial distribution.
The analysis first calculates the probability that a neighbor of node $i$ is connected by PA, then the probability of the first event and the change in the clustering coefficient that it causes. It next calculates the probability that PA connects to a neighbor of node $i$ while node $i$ is connected by TF at the same time, then the probability of the second event and the change in the clustering coefficient that it causes. Because this probability is small, the second event is unlikely to occur more than twice. For the convenience of the subsequent analysis, only one case of the second event is considered in the calculation. This approximation has little effect on the results but greatly simplifies the analysis.
Based on the above analysis, one obtains equation (14). Inserting (3) and (9)–(13) into (14) gives (15) after some simplification, with the initial condition (16). Analytical solutions of (15) can be obtained from the theory of nonhomogeneous linear ordinary differential equations. Some terms in the analytical solution are very small relative to the main part and are difficult to integrate in the subsequent steps, so they are omitted as an approximation. The analytical solution can therefore be rewritten as (17).
Let us denote the three terms in (17) separately.
The average clustering coefficient is then calculated using formula (18). With a suitable substitution, analytical results for part of this expression can be obtained easily.
For the remaining part, we could not obtain an analytical result, so we used the mean value theorem for integrals to approximate it.
Substituting these results into (18), one obtains (21). To investigate the network’s stable clustering coefficient when time is sufficiently large, the limit of (21) is calculated.
Therefore, the network’s clustering coefficient has a stable theoretical value that depends only on the number of repetitions $m$ and the TF probability $p$. The clustering coefficient of the BA model tends to 0 over time, so the results of our model are more in line with the high clustering of real networks than those of the classical BA model. This feature makes the local connections between network nodes tighter, which is beneficial to the stability of the network.

5. Simulation and Result Verification

Simulations were performed to validate the estimates obtained from the theoretical analysis. The simulation results are averaged over 1000 networks constructed from the model, each with 10000 nodes.

The probability that a newly added node connects to a given number of existing nodes is given by the binomial expression (23).

That is, as shown in Figure 2, the degree distribution of newly added nodes at the initial moment follows a binomial distribution. In the hyperlink network of online dictionary entries [45], the physician trust-relationship network [46], and the student dormitory interpersonal relationship network [47], the degree distribution of nodes when they first enter the network shows characteristics similar to a binomial distribution. Whether a friendship is established is essentially a random event with the same probability for each potential friend. Thus, the number of friendships established between a new person and the roommates follows a binomial distribution (23). This indicates that the model agrees with real networks and supports the result of (23). This property improves on most growth models, such as the BA and WS models, whose initial node degree distribution is either constant or uniform.
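A binomial initial degree distribution of this kind can be tabulated directly. The sketch below assumes that each of the $m$ repetitions always adds one PA edge and adds one further TF edge with probability $p$, so that the initial degree equals $m$ plus a Binomial($m$, $p$) variable; it ignores TF attempts that fail for lack of a free neighbor, and the function name `initial_degree_pmf` is hypothetical.

```python
from math import comb


def initial_degree_pmf(m, p):
    """Probability of each possible initial degree m, m + 1, ..., 2m.

    Assumes initial degree = m + Binomial(m, p): m guaranteed PA edges plus
    one TF edge per repetition with probability p.
    """
    return {m + s: comb(m, s) * p**s * (1 - p)**(m - s) for s in range(m + 1)}


pmf = initial_degree_pmf(4, 0.3)
mean_degree = sum(k * q for k, q in pmf.items())  # close to m * (1 + p)
```

The mean of this distribution is $m(1+p)$, which matches the average number of edges introduced by each new node, and for $p = 0$ all probability mass sits at degree $m$, the constant initial degree of the BA model.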

The simulation results in Figure 3 support our analytical findings. The degree distribution is scale-free for different values of the probability $p$; when $p = 0$, the model reduces to the BA model. As $p$ increases, the model’s distribution is shifted upward relative to the BA model’s distribution. Notably, the scale-free structure has a certain universality in real networks. Figure 3 also shows the yeast protein interaction network [48], the Erdős network [49], and the bible vocabulary network [50]. All of them are approximately scale-free, which agrees with the model results.

The scatter points in Figure 4 represent the clustering coefficients of all nodes in the simulated networks. The line in the middle is given by equation (17). The node clustering coefficients are scattered around the curve of equation (17), which is obtained through mean-field theory. The binning plot below shows that the two are consistent in trend. The error results from the smaller terms that are neglected in the analytical solution. The clustering coefficients of all nodes are high, in line with real networks, whereas in the BA model the clustering coefficients of all nodes approach 0.

Figure 5 compares the network’s average clustering coefficient estimated from the simulation data with the analytical result. Over time, the network’s average clustering coefficient tends to a stable value that is not 0. The evolution of the clustering coefficient obtained from equation (21), from numerical integration of (18), and from the simulation data agrees closely, and the errors between them are small. These results are consistent with real network data. For instance, the clustering coefficient of the power grid network is 0.08, that of the C. elegans neural network is 0.28, and that of the film actor network is 0.79 [1]. Notably, the average clustering coefficients of these real networks do not vary widely as the number of nodes increases or as time changes. Compared with the BA model, the proposed model clearly has a greater clustering coefficient.

Figure 6(c) shows the relationship between the clustering coefficient, $m$, and $p$. It indicates that the initial number of friends and the linking probability within a small community determine the network’s clustering coefficient. Figure 6(b) shows that the larger the initial number of friends when each node joins the network, the smaller the network’s stable clustering coefficient. Conversely, Figure 6(a) shows that the larger the linking probability within a small community, the larger the network’s stable clustering coefficient.

Therefore, the clustering coefficient in these two exceptional cases can be calculated as

The small-world property is an essential feature of real networks; it means that the average distance between any two points in the network is very small. Figure 7 shows that as $m$ and $p$ decrease, the average length decreases, while the distance remains small. We use different values of the parameters $m$ and $p$ to match the average degree of real networks and employ the model to replicate them. The results show that our model has the same average shortest path characteristic as the BA model. Table 1 shows the results. The average path length of the simulated networks is very close to that of the real networks, with only the Erdős network exhibiting an apparent error. The comparison of model simulations with real networks shows that our model exhibits the small-world characteristic and can reproduce the short distances of real networks.
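The average shortest path length discussed here can be measured with a breadth-first search from every node. The sketch below is one straightforward way to do it and is not taken from the paper; the function name `average_path_length` and the adjacency format (a dict mapping each node to a set of neighbors) are assumptions, and unreachable pairs are simply left out of the average.

```python
from collections import deque


def average_path_length(adj):
    """Mean shortest-path length over all ordered pairs of connected nodes."""
    total = 0
    pairs = 0
    for source in adj:
        # Breadth-first search gives hop distances in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1      # exclude the source itself
    return total / pairs if pairs else 0.0
```

Each search costs time proportional to the number of edges, so for networks of the size used in the simulations it may be preferable to average over a random sample of source nodes rather than all of them.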

6. Conclusion

Many studies have considered the preferential attachment mechanism to be the cause of scale-free networks. In contrast, nonpreferential attachment has been proposed to explain some non-scale-free networks [51–53]. A significant question is how to generate networks with a high clustering coefficient, small-world behavior, and scale-free characteristics. Findings from theoretical analysis are of great significance for the construction of complex networks, especially for the design and control of complex systems [22–25, 54].

Our study presents a network model that has an adjustable clustering coefficient together with scale-free and small-world characteristics. Notably, the initial degree of the nodes generated by our model follows a binomial distribution; it is neither constant, as in the BA and HK models, nor uniform, as in the WS model. This feature is more consistent with data from existing networks [45–47]. Therefore, our study provides an alternative way of studying evolution models. These results provide a new train of thought for understanding the degree distribution of networks.

The results of the simulation experiments agree with our analytical results, and the change in the clustering coefficient under various parameter changes can be obtained. An increase in $p$ increases the clustering coefficient, whereas an increase in $m$ decreases it. At $p = 0$, our model reduces to the BA model, so it is a more general model that contains the BA model. In our model a single TF step follows each PA step, which makes the model easy to analyze theoretically, although the clustering coefficient attained is lower than that of some real networks. To obtain a network with a higher clustering coefficient, we only need to modify the mechanism so that TF is applied several times after each PA step instead of once. A clustering coefficient between 0 and 1 can then be obtained by adjusting the number of TF steps, and the degree distribution is still scale-free. Nonetheless, this added complexity makes the modified model unsuitable for theoretical analysis.

Formula (21) shows the analytical relationship between the clustering coefficient and the model parameters and gives a theoretical method for adjusting the clustering coefficient. The effectiveness of the adjustment is demonstrated in the simulation section, where simulation results for the relationship between the clustering coefficient and the parameters are given. Another conclusion is that the clustering coefficient has a limit; that is, it has a stable equilibrium point. This indicates that the clustering coefficient no longer changes with network size once the network has evolved for a sufficiently long time.

Many previous studies have shown that real networks have scale-free degree distributions, but different views have emerged recently. Broido and Clauset reported that scale-free networks are rare [15]. They tested the universality of scale-free structure on a large corpus of nearly 1000 network data sets by fitting the power law model to each degree distribution and testing its statistical plausibility. They found that scale-free networks are rare, with only 4% exhibiting the strongest possible evidence of scale-free structure and 52% exhibiting the weakest possible evidence. The proposed model may provide an explanation for these contrasting views. We have shown that the evolution of the network has a stable equilibrium. The scale-free structure is the final stable state, but this state appears only when the time is sufficiently large. Figure 2 shows the initial degree distribution, and the network finally develops a power law distribution. The degree distribution of the network is always in a process of random evolution. Therefore, for all real network structures currently being studied, the observed degree distribution is only a certain stage in the evolution process, rather than the final state of the network at sufficiently large time. The limit state of network evolution may be scale-free. Many real networks exhibit power law tails and are not strictly scale-free because they have not yet reached maturity or stability in their evolution. Our model can thus provide a new idea for understanding degree distributions in network research.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The author declares that there are no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant No. 11502062.