Abstract

Social networks have become an indispensable part of modern life. Signed networks, a class of social networks with positive and negative edges, are becoming increasingly important. Many social networks have adopted signed networks to model like (trust) or dislike (distrust) relationships. Consequently, how to rank nodes from both positive and negative views has become an open issue in social network data mining. Traditional ranking algorithms usually separate the signed network into positive and negative graphs and rank positive and negative scores separately. However, much of the global information in the signed network is lost by such methods, e.g., the influence of a friend’s enemy. In this paper, we propose a novel ranking algorithm that computes a positive score and a negative score for each node in a signed network. We introduce a random walking model for signed networks in which the walker carries a positive or negative emotion. The steady-state probability of the walker visiting a node with positive or negative emotion represents the node’s positive or negative score, respectively. To evaluate our algorithm, we apply it to the sign prediction problem, and the results show that it achieves higher prediction accuracy than several well-known ranking algorithms.

1. Introduction

A signed network [1] is a kind of social network whose edges carry positive or negative signs. Positive edges denote friendship, trust, or agreement, while negative edges denote enmity, distrust, or disagreement. Online social network websites have become a favorite place for many people to share their opinions, and signed networks are very useful for understanding the relationships between their users.

How to measure the importance of nodes in a social network has always been an important issue in many data mining fields such as collaborative filtering [2], community detection [3], and link prediction [4]. Signed networks make the node ranking problem more complex because the signs associated with edges carry additional information about nodes, and this information is significant in many scenarios. For example, internet celebrities (users who are famous in the network) of an online social network can gain a lot from word-of-mouth marketing. However, a person can become famous either because many people like him or because many people dislike him, and an unwelcome celebrity will make the marketing fail. Therefore, it is crucial to recognize the difference between positive influence and negative influence. Furthermore, the two kinds of influence do not necessarily cancel each other out. For example, a person with many supporters and the same number of opponents is still more influential than an ordinary person. It is therefore attractive to rank nodes from both positive and negative views.

A classical node ranking method is PageRank [5], which measures a node from a global view. If there is a path from node A to node B, it can be viewed as A supporting B. HITS [6] is another classical ranking algorithm, which measures the authority of websites by external links. These methods assume the network contains only positive edges, so a node that has many friends is not distinguished from a node that has many enemies because the signs on the edges are not considered. With the development of signed networks, there is growing interest in ranking nodes by a criterion that evaluates trustworthiness. Some modified ranking algorithms [7, 8] separate the signed network into a positive subgraph and a negative subgraph and then compute the corresponding ranks using PageRank or HITS. Some important global information may be lost in those modified methods. For example, suppose node A has a friend B, and B has an enemy C. Methods [7] and [8] ignore the relationship between A and C. SRWR [9] can compute positive and negative scores of nodes in a signed network, but these scores are from a personal view.

In this paper, we propose a novel ranking method for signed network. Our contributions are as follows:

A novel random walking model for signed networks: this model simulates the human activity of visiting an online social network website. A walker visits nodes with a positive or negative emotion, which denotes that the walker likes (trusts) or dislikes (distrusts) the node. When the walker visits a node and turns negative, he leaves with a certain probability, just as one would close an unpleasant website. The steady-state probabilities of the walker visiting a node with positive or negative emotion represent its positive or negative score, respectively.

A ranking algorithm referred to as SignRank: we propose an iterative algorithm to rank the nodes and prove that it converges.

Experiment: we compare our method with some well-known classical algorithms. It is difficult to show whether our method is better directly, so we apply these methods to a classic problem, i.e., sign prediction problem in signed network. The experiment shows our method has a higher accuracy than other methods.

This paper is organized as follows. Section 2 reviews related work. Section 3 describes the SignRank method for signed networks. Section 4 presents the experiments, and Section 5 gives conclusions.

2. Related Work

In recent years, signed networks have attracted more and more attention for their ability to specify trust or distrust relationships between nodes. Network modeling and network topology analysis are important foundations for the study of signed networks. BSCL [10] is a signed network generative model that can learn parameters automatically and generate signed networks from a given real network. The generated network keeps some key properties unchanged, especially the distribution of balanced and unbalanced triangles. M. Ludwig et al. [11] proposed an evolutionary model based on balance theory [12, 13], which adds or removes edges in a social network until it reaches a steady state. IB [14] is inspired by balance theory and ant behavior and considers that interactions between individuals can cause edges to change. Measuring node relevance on signed networks is becoming an important and attractive issue. Tyler Derr et al. [15] have proposed a series of methods from both local and global perspectives, such as Signed Common Neighbors (SCN) and the Signed Jaccard Index (SJI).

PageRank [5] and HITS [6] are the most popular ranking algorithms for unsigned networks (positive edges only). Their focus is on the global topology of the social network. Personalized ranking is another ranking research area, which ranks nodes from a specified node’s view, such as Personalized PageRank [16] and Bayesian Personalized Ranking [17].

In order to make traditional methods applicable to negative edges, researchers have proposed some new methods. Modified PageRank [7] separates the signed network into positive and negative subgraphs and then computes a PageRank score on each separately. This method ignores much of the information in the global topology. PageTrust [18] is also an extension of PageRank. It uses a random walk model in which the walker chooses a negative edge with a lower probability, so it can rank how trustworthy nodes are. Another trust-based ranking algorithm is TrollTrust [19], which scores nodes with a probability of trustworthiness. Prestige [20] subtracts the number of negative edges from the number of positive edges and normalizes the result. In these methods, a positive or trust score can be counteracted by a negative or distrust score. However, positive and negative scores do not cancel each other out in some cases. For example, a famous rock star may have a lot of supporters and the same number of opponents, but we do not think he has the same influence as an ordinary person. SRWR [9] is a ranking method based on a random walk with restart model, which can rank nodes from a personal view in a signed network, but it cannot be applied to large-scale networks. SWR [21] is another random walk based method, in which the walker chooses a negative edge with a smaller probability than a positive edge.

3. Ranking Method

3.1. Random Walking Model for Signed Network

At first, a signed network is denoted by a weighted graph . and are defined as follows:

We will introduce a novel random walk model for signed network (RWSN for short) and simulate the behavior of users accessing online social network sites. RWSN supposes a walker randomly visits a user’s home page. After that, the walker will visit one of the neighbors of this page.

The walker could have a positive or negative emotion when accessing social networks, and the edges of social networks have positive or negative signs. The reasons why the walker has positive emotion may be the following ones:

The walker trusts the visited node.

The walker agrees with the visited node’s political views.

Some external factors are the reason.

In contrast, the walker gets into negative emotion because of the following:

The walker distrusts the visited node.

The walker disagrees with the visited node’s political views.

Some external factors are the reason.

In our model, if a walker travels through a negative link, he/she flips his/her sign, whereas the walker keeps the sign unchanged if he/she travels through a positive link. We define these rules according to the structural balance theory of signed networks [13].

For example, Figure 1 shows a walker named Alice visits an online social network with emotion. At first Alice visits node A with positive emotion; then she has two choices denoted by actions 1 and 2. In the case of action 1, she visits A’s friend B through a positive edge and keeps positive emotion. In action 2, she visits A’s enemy C through a negative edge and turns to negative emotion.

Figure 2 shows another example, in which Alice starts walking with negative emotion, and she has three choices. In action 1, she visits A’s friend B and remains unhappy. In action 2, she visits A’s enemy C and becomes happy. In action 3, Alice is tired of them and leaves; then she will visit a node in the network randomly with random emotion.
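The two examples can be mimicked with a small simulation of a single step of the walker. This is only a sketch under stated assumptions: the function name `walk_step`, the `out_edges` data layout, the encoding of emotions as +1 and -1, and the choice to restart a walker that sits on a node without outgoing edges are illustrative and not taken from the paper.

```python
import random

def walk_step(node, emotion, out_edges, nodes, tiredness, rng):
    """Advance the walker by one step and return (next_node, next_emotion).

    emotion is +1 (positive) or -1 (negative); out_edges maps a node to a
    list of (neighbor, sign) pairs with sign in {+1, -1}.
    """
    neighbors = out_edges.get(node, [])
    # Assumption: only a negative walker gets tired (with probability
    # `tiredness`), and a walker with no outgoing edge always restarts.
    if not neighbors or (emotion == -1 and rng.random() < tiredness):
        # Restart at a random node with a random emotion.
        return rng.choice(nodes), rng.choice([+1, -1])
    neighbor, sign = rng.choice(neighbors)
    # A positive edge keeps the emotion, a negative edge flips it.
    return neighbor, emotion * sign

# Toy network in the spirit of Figures 1 and 2: A trusts B and distrusts C.
out_edges = {"A": [("B", +1), ("C", -1)]}
nodes = ["A", "B", "C"]
rng = random.Random(0)
next_node, next_emotion = walk_step("A", +1, out_edges, nodes, 0.5, rng)
```

Starting from A with positive emotion, the step can only end at B with positive emotion or at C with negative emotion, which corresponds to actions 1 and 2 in Figure 1.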

We say that is the probability of Alice visiting node with positive emotion at time . In contrast, we use to represent the probability of Alice visiting this node with negative emotion. Therefore, the probability of Alice visiting the node i with positive emotion at time can be calculated as

We say that the subscripts and denote the nodes in the signed network. belongs to the set under the condition that there is a positive link going from to . Similarly, belongs to under the condition that there is a negative link going from to . Here is the probability of Alice accessing the node after accessing the node without taking Alice’s emotion into account, and is the number of the nodes, and is the probability of random jump due to a bad mood of Alice. We name as tiredness probability. is computed as

where is the number of out-degree of node . The probability of Alice visiting the node with negative emotion at time is as follows:

Figure 3 is an example of a trap. It shows that B only treats C as a friend, and vice versa. If Alice visits B with a positive emotion, she will circulate between B and C forever. We must therefore solve this trap problem.

We use a hopping probability to solve the trap problem: after visiting a node, Alice jumps to a random node with this probability, no matter what emotion she has. The corrected update equations for the positive and negative visiting probabilities are as follows:

Then we can use an iterative approach to update both probabilities until they converge.

3.2. Convergence Proof

In this section, we will prove the convergence of (6) and (7). They can be rewritten as follows:

where is a vector expressed as . In the above equation, P is a probability matrix, and the calculation method of P is provided as follows:

where represents the sum of row in and represents the sum of row in . We can figure out that the sum of each row in is 1.

According to Markov’s convergence theorem [22], the iteration converges.
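The convergence argument can be illustrated numerically on a generic example. The sketch below repeatedly multiplies a probability vector by a row-stochastic matrix with strictly positive entries until it stops changing. The 2 × 2 matrix is a hypothetical example and is not the matrix P of the model, and the row-vector convention is an assumption chosen to match the statement that each row sums to 1.

```python
def power_iteration(P, tol=1e-12, max_iter=10_000):
    """Iterate x <- x P for a row-stochastic matrix P given as a list of rows."""
    n = len(P)
    x = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        nxt = [sum(x[j] * P[j][i] for j in range(n)) for i in range(n)]
        # Stop when the L1 change between two iterations is negligible.
        if sum(abs(a - b) for a, b in zip(nxt, x)) < tol:
            return nxt
        x = nxt
    return x

# Hypothetical transition matrix: every row sums to 1 and all entries are positive.
P = [[0.1, 0.9],
     [0.5, 0.5]]
pi = power_iteration(P)  # approaches the stationary distribution (5/14, 9/14)
```

Solving the stationary condition by hand for this matrix gives 0.9 · pi[0] = 0.5 · pi[1], hence (5/14, 9/14), which is the vector the iteration approaches.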

3.3. SignRank Algorithm

We can calculate the scores by adjacency matrix operations according to the previous equations, but processing the sparse matrices costs a lot of time and memory. We therefore introduce a fast ranking algorithm referred to as SignRank. The input of SignRank includes the following: the positive edge set, the negative edge set, the tiredness probability, the hopping probability, the maximum number of iterations, and the stop threshold.

First, we initialize the positive and negative scores with equal values (lines 1-3). Then, during each iteration, we perform the following operations.

Each positive edge is accessed, and scores of source node are added to destination node (lines 5-8).

Each negative edge is accessed, and scores of source node are added to destination node. It is worth noting that the positive score of source node is added to the negative score of destination node or vice versa (lines 9-12).

The positive and negative scores are updated according to (6) and (7) (lines 13-19).

Finally, we calculate the error tolerance. If it is less than the stop threshold, the algorithm finishes.
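The steps above can be sketched in Python as an edge-list iteration. This is a hedged sketch rather than the paper's Algorithm 1: the function name `sign_rank`, the exact form of the update (a negative walker follows an edge with probability one minus the tiredness probability, and tired or stuck walkers restart uniformly over all node and emotion pairs), the L1 error measure, and the hopping value 0.15 are assumptions made for illustration.

```python
def sign_rank(pos_edges, neg_edges, tiredness, hop, max_iter=1000, eps=1e-10):
    """Return (pos_score, neg_score) dicts for a directed signed graph.

    pos_edges and neg_edges are lists of (source, destination) pairs.
    """
    nodes = sorted({v for edge in pos_edges + neg_edges for v in edge})
    n = len(nodes)
    out_deg = {v: 0 for v in nodes}
    for src, _ in pos_edges + neg_edges:
        out_deg[src] += 1

    # Start from equal positive and negative scores that sum to 1.
    pos = {v: 1.0 / (2 * n) for v in nodes}
    neg = {v: 1.0 / (2 * n) for v in nodes}

    for _ in range(max_iter):
        acc_pos = {v: 0.0 for v in nodes}
        acc_neg = {v: 0.0 for v in nodes}
        # Positive edges keep the emotion of the walker.
        for src, dst in pos_edges:
            acc_pos[dst] += pos[src] / out_deg[src]
            acc_neg[dst] += (1 - tiredness) * neg[src] / out_deg[src]
        # Negative edges flip the emotion of the walker.
        for src, dst in neg_edges:
            acc_neg[dst] += pos[src] / out_deg[src]
            acc_pos[dst] += (1 - tiredness) * neg[src] / out_deg[src]
        # Assumed handling: tired walkers, and walkers on nodes without
        # out-edges, restart at a random node with a random emotion.
        restart = sum(
            tiredness * neg[v] if out_deg[v] else pos[v] + neg[v] for v in nodes
        )
        base = ((1 - hop) * restart + hop) / (2 * n)
        new_pos = {v: (1 - hop) * acc_pos[v] + base for v in nodes}
        new_neg = {v: (1 - hop) * acc_neg[v] + base for v in nodes}
        error = sum(abs(new_pos[v] - pos[v]) + abs(new_neg[v] - neg[v]) for v in nodes)
        pos, neg = new_pos, new_neg
        if error < eps:
            break
    return pos, neg

# Four-node example: A and B, A and C are mutual enemies; D and B, D and C are
# mutual friends (reciprocal edges are an assumption of this sketch).
neg_edges = [("A", "B"), ("B", "A"), ("A", "C"), ("C", "A")]
pos_edges = [("D", "B"), ("B", "D"), ("D", "C"), ("C", "D")]
pos, neg = sign_rank(pos_edges, neg_edges, tiredness=0.9, hop=0.15)
```

Working directly on the edge lists avoids building a matrix over all node and emotion pairs, so each iteration touches every edge once. Under these assumptions the scores always sum to 1, and in the four-node example the negative score of A exceeds its positive score whenever the tiredness probability is greater than 0.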

4. Experimental Results

4.1. Verifying

In the experiment, first we use a simple example to verify the effectiveness of the algorithm; then we compare our algorithm with some other ranking algorithms to prove that SignRank is better.

An example of signed network is shown in Figure 4. There are four nodes in the network, with node A being hostile to nodes B and C, which are also hostile to A. On the other hand, node D is friendly with B and C. Obviously, in the PageRank’s view, the four nodes are the same. The result of SignRank is shown in Figures 5, 6, and 7. In these three figures, the values of tiredness probability are 0.9, 0.5, and 0, respectively. Figures 5 and 6 reveal that the negative rank of node A is significantly higher than its positive rank. It should be noted that node A does not have any positive edge, but its positive rank is not 0, because people who hate B and C may bring a positive rank to A.

In Figure 7 we set the tiredness probability to 0, which means that the walker will never be tired and run away. In this case, SignRank degenerates into a PageRank algorithm. Positive and negative signs no longer have any influence on the ranking. As a result, the positive scores and negative scores of all the nodes are equal.

4.2. Evaluation Method

It is very difficult to prove directly that our ranking algorithm is better than other algorithms, so we adopt an indirect method that many researchers have used to demonstrate the superiority of their algorithms [7, 9, 19]. The method uses the output of a ranking algorithm for sign prediction, and the quality of the prediction is then used to evaluate the ranking algorithm. Sign prediction is an important research topic on signed networks: when there is an edge with an unknown sign in a signed network, we predict the sign from the features of the edge. To implement sign prediction, we use the ranking scores to generate features [7] for each edge. rep is the abbreviation of reputation, which represents the popularity of a node in the network; a reputation value is computed for each of the two endpoints of the edge. opt is the abbreviation of optimism, which quantifies the voting pattern of a node in the network; an optimism value is likewise computed for each of the two endpoints. After extracting features for each edge, a classification model can be used for sign prediction.

Reputation and optimism can be calculated through the ranking score of nodes.

represents a set of nodes that have a positive edge pointing to , and represents a set of nodes that have a negative edge to .

represents a set of nodes pointed from i through positive edges.
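To make the idea of score-based reputation and optimism concrete, the following sketch computes both from a dictionary of node scores. The normalized-difference form used here (score-weighted positive neighbors minus negative neighbors, divided by their sum) is an assumption in the spirit of [7], not a transcription of the paper's equations, and all function and variable names are hypothetical.

```python
def signed_ratio(positive_neighbors, negative_neighbors, score):
    """Score-weighted (positive - negative) / (positive + negative)."""
    liked = sum(score[j] for j in positive_neighbors)
    disliked = sum(score[j] for j in negative_neighbors)
    total = liked + disliked
    # Nodes without any relevant neighbor get a neutral value of 0.
    return (liked - disliked) / total if total else 0.0

def reputation(node, in_pos, in_neg, score):
    """How the nodes pointing to `node` vote on it."""
    return signed_ratio(in_pos.get(node, ()), in_neg.get(node, ()), score)

def optimism(node, out_pos, out_neg, score):
    """How `node` votes on the nodes it points to."""
    return signed_ratio(out_pos.get(node, ()), out_neg.get(node, ()), score)

# Hypothetical scores and neighborhoods: a trusts c, b distrusts c.
score = {"a": 0.5, "b": 0.3, "c": 0.2}
in_pos, in_neg = {"c": ["a"]}, {"c": ["b"]}
out_pos, out_neg = {"a": ["c"]}, {"b": ["c"]}
rep_c = reputation("c", in_pos, in_neg, score)   # (0.5 - 0.3) / (0.5 + 0.3)
opt_b = optimism("b", out_pos, out_neg, score)   # b casts only negative votes
```

Weighting each vote by the score of the voter means that an endorsement from a highly ranked node counts for more than one from a low-ranked node.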

In this paper, each node is measured by both a positive and a negative score, so we can extend rep to a positive and a negative variant. They capture the popularity and unpopularity of the node. They are calculated as follows:

Correspondingly, opt is extended to a positive and a negative variant.

Therefore, in this paper, we generate eight features denoted by vector v for each edge and then use logistic regression for sign prediction.

4.3. Evaluation Metrics

We choose accuracy, recall, precision, and F1 to evaluate the quality of our method and the comparative methods. Their definitions are as follows:
(i) Accuracy is the proportion of correctly predicted edges.
(ii) Recall is the proportion of correctly predicted edges in actually positive edges.
(iii) Precision is the proportion of correctly predicted edges in predicted positive edges.
(iv) F1 is the harmonic mean of precision and recall.
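These four definitions can be written out as a short helper. The sketch assumes that edge signs are encoded as +1 and -1 and that the positive sign is the positive class; the function name and the example labels are hypothetical.

```python
def sign_prediction_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 with +1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)    # true positives
    fp = sum(1 for t, p in pairs if t == -1 and p == 1)   # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == -1)   # false negatives
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Five hypothetical edges: three predicted correctly, one false positive,
# one false negative.
metrics = sign_prediction_metrics([1, 1, 1, -1, -1], [1, 1, -1, 1, -1])
```

In this example the accuracy is 3/5, while precision, recall, and F1 are all 2/3.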

4.4. Comparative Methods

To study the performance of our algorithm, we apply it to sign prediction and compare it with the following ranking algorithms.
(i) PageRank [5] is a page scoring algorithm proposed by Larry Page of Google. We calculate the PageRank value for each node and consider that .
(ii) HITS [6] is an algorithm for analyzing the link topology of web pages. We use the authority value as the score of a node and consider that .
(iii) Modified PageRank [7] divides the signed network into two subgraphs, one containing only the positive edges and one containing only the negative edges. It uses the PageRank algorithm to calculate the node scores for each subgraph.
(iv) TrollTrust [19] interprets a positive edge in the signed network as trust and a negative edge as distrust. The calculated score represents the reliability of a node.

4.5. Experimental Results

In the experiment, we used three signed network datasets described in Table 1. They can be downloaded from http://snap.stanford.edu/data/.

Epinions: Epinions.com is a consumer review website where members present their opinions toward each other, and these opinions can be trusted or distrusted. Epinions records these trust or distrust relationships.

Slashdot: slashdot.org is a technology-related news website where users could tag each other as friend or foe. Slashdot records these friend or foe relationships.

Wiki-RFA: Wikipedia is a free online encyclopedia. If a Wikipedia editor wants to become an administrator, a request for adminship (RfA) must be submitted. Any Wikipedia member may cast a supporting, neutral, or opposing vote. Wiki-RFA records these supporting or opposing relationships.

We execute SignRank and comparison methods (PageRank, Hits, MPR, and TrollTrust) on Slashdot, Epinions, and Wiki-RFA datasets. Then we calculate features according to (17), (18), (19), and (20) for each edge based on node scores. At last, we train classifiers for sign prediction. Our experiments use 10-fold cross-validation and all results are the average of 10 repeated calculations.

Figures 8, 9, and 10 show the performance comparison of the five algorithms on Slashdot, Epinions, and Wiki-RFA, respectively. The prediction accuracies of SignRank on the three datasets are 91%, 97%, and 90%, which are the best on every dataset. SignRank is also the top performer in prediction precision, with 93%, 97%, and 90%, respectively. Its recall is a little lower than that of the comparison algorithms; however, its F1 scores are higher. When precision and recall pull in opposite directions, the F1 score is the most important measure. Therefore, SignRank performs better in sign prediction.

5. Conclusions and Summary

This paper presents a novel random walk model for signed networks. It simulates the action of visiting online social network websites with emotion. When the visitor feels unhappy, he/she leaves. In this way, our model gives the ranking score a clear semantic interpretation: it is the steady-state probability of the walker visiting the node with a given emotion. Furthermore, this paper presents an iterative algorithm named SignRank, described in Algorithm 1, to calculate these probabilities for each node. We also apply our method to sign prediction, and the results show that it performs better than the compared methods.

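Algorithm 1: SignRank.
 Input: positive edge set, negative edge set, tiredness probability, hopping probability, maximum number of iterations, stop threshold
 Output: positive score and negative score of each node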
1 ;
2 ;
3 ;
4 while do
5  for each in do
6   ;
7   ;
8  end
9  for each in do
10   ;
11   ;
12  end
13  ;
14  for each in do
15   ;
16  end
17  for each in do
18   ;
19  end
20  ;
21  if then
22   return ;
23  end
24 end
25 return

Data Availability

The datasets (Epinions, Slashdot, and Wiki-RFA) used to support the findings of this study are open access and they can be downloaded from http://snap.stanford.edu/data/.

Conflicts of Interest

The authors declare no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grants 61702089 and 61501102, the Basic Scientific Research Operating Foundation of central universities under Grant N182304021, and the Science and Technology Support Program of Northeastern University at Qinhuangdao (XNK201401).