Abstract

circRNAs are a novel class of noncoding RNAs with a closed-loop structure. A growing number of biological experiments have shown that circRNAs play an important role in many diseases by acting as miRNA sponges that indirectly regulate the expression of miRNA target genes. Therefore, predicting associations between circRNAs and miRNAs can promote the understanding of disease pathogenesis. In this paper, we propose a new computational method, NECMA, based on network embedding to predict potential associations between circRNAs and miRNAs. In our method, the Gaussian interaction profile (GIP) kernel similarities of circRNAs and miRNAs are each calculated from the known circRNA-miRNA associations. Then, the circRNA-miRNA association network, the circRNA GIP kernel similarity network, and the miRNA GIP kernel similarity network are combined to construct a heterogeneous network. Next, a network embedding algorithm is used to extract latent features of circRNAs and miRNAs from the heterogeneous network. Finally, the associations between circRNAs and miRNAs are predicted by using weighted neighborhood regularized logistic matrix factorization and the inner product. The performance of NECMA is evaluated by ten-fold cross-validation. The results show that NECMA achieves a higher AUROC (0.8264) than the three state-of-the-art methods it is compared with.

1. Introduction

circRNAs are a new group of endogenous noncoding RNAs that are highly represented in the mammalian transcriptome [1]. Compared with other noncoding RNAs (such as lncRNAs and miRNAs), circRNAs did not receive extensive attention at first. With the development of high-throughput sequencing technology, more and more circRNA molecules have been discovered. Studies have shown that a circRNA has neither a 5′-terminal cap nor a 3′-terminal poly(A) tail and forms a covalently closed ring structure [2]. Like other noncoding RNAs, circRNAs are widely found in eukaryotic tissues such as the brain [3], stomach [4], and mammary gland [5]. Meanwhile, circRNAs are more stable than linear noncoding RNAs owing to their circular structure [6]. In addition, this structure enables circRNAs to regulate gene transcription and expression [7]. For example, ciRS-7 can bind miRNAs and act as a miR-7 sponge, affecting the binding of miR-7 to its target genes [8]. It has also been found that circHIPK3 can sponge miR-124 and inhibit its activity in malignant tumors, thereby regulating cell growth [9]. Considerable evidence shows that miRNAs are closely related to a variety of diseases [10]. For example, miR-145 inhibits colon cancer cell growth by targeting the insulin receptor substrate [11]. Therefore, predicting potential associations between circRNAs and miRNAs can help biologists understand the complex pathogenesis of diseases and can further contribute to disease diagnosis.

With the continuous development of high-throughput sequencing technology, a large number of circRNAs have been discovered. At the same time, many databases have been developed to store circRNA-related information, such as circBase [12], circR2Disease [13], circRNADisease [14], and circ2Disease [15]. circBase is an online database that provides users with basic circRNA information such as circRNA ID, sequence, gene description, and location [12]. circR2Disease is a public database that stores experimentally verified circRNA-related disease information. It contains 793 circRNA-disease associations involving 661 circRNAs and 100 diseases [13]. The circRNADisease database contains 354 circRNA-disease interactions involving 330 circRNAs and 48 diseases [14]. Similar to circR2Disease and circRNADisease, the circ2Disease database stores in vitro-validated circRNA-disease associations, through which users can obtain circRNA-disease associations as well as the associations between miRNAs and their targets [15]. These databases enable users to identify potential associations between circRNAs and miRNAs by using computational methods.

Compared with traditional biological experiments, computational prediction of circRNA-miRNA associations can maintain high accuracy while being much less time-consuming. Therefore, more and more attention has been paid to computational circRNA-miRNA association prediction. At present, a large number of computational prediction models have been applied in many fields of biology, for example, to predict disease-gene associations, miRNA-disease associations [16, 17], circRNA-disease associations [18, 19], lncRNA-disease associations [20, 21], protein function [22, 23], drug-target interactions [24, 25], and lncRNA-miRNA associations [26, 27]. Compared with these fields, few computational prediction models exist for circRNA-miRNA associations. Therefore, there is a pressing need for an effective computational method to infer circRNA-miRNA associations.

To this end, we propose a new computational algorithm based on network embedding, NECMA, to predict circRNA-miRNA associations. In our method, the circRNA-miRNA network is constructed from experimentally verified circRNA-miRNA associations. Then, based on these associations, the Gaussian interaction profile (GIP) kernel similarities of circRNAs and miRNAs are calculated. Next, the circRNA GIP kernel similarity network, the miRNA GIP kernel similarity network, and the circRNA-miRNA association network are integrated to construct the circRNA-miRNA heterogeneous network. A network embedding model is then employed to learn the features of circRNAs and miRNAs from this heterogeneous network. Finally, the weighted neighborhood regularized logistic matrix factorization and the inner product are combined to predict potential circRNA-miRNA associations. Ten-fold cross-validation is used to evaluate the performance of our method. The experimental results show that NECMA achieves better performance than other state-of-the-art methods. In addition, the case study shows that NECMA can effectively infer potential circRNA-miRNA associations that are confirmed by recent literature.

Numerous experiments have shown that circRNAs and miRNAs are closely associated with diseases. Current circRNA-disease association prediction algorithms can be divided into the following categories. (1) Network-based circRNA-disease association prediction methods: Fan et al. [28] used known circRNA-disease associations, circRNA expression profile similarities, and disease phenotype similarities to construct a circRNA-disease heterogeneous network and then used KATZ to predict potential associations between circRNAs and diseases. Li et al. [29] integrated known circRNA-disease associations, circRNA functional similarities, and disease semantic similarities and used network-consistent projection to identify potential circRNA-disease associations. Zhao et al. [30] developed an ensemble learning algorithm to predict potential associations between circRNAs and diseases. In this method, the circRNA-disease heterogeneous network is constructed from the known circRNA-disease association network, the circRNA similarity network, and the disease similarity network, and circRNA-disease associations are predicted by using KATZ and bipartite network projection. Lei and Bian [31] used random walk with restart and KNN algorithms to identify potential associations between circRNAs and diseases based on known circRNA-disease associations, circRNA similarities, and disease similarities. Li et al. [32] predicted potential circRNA-disease associations from the known circRNA-disease association network, circRNA similarity, and disease similarity by using inductive matrix completion. Wei and Liu [18] reconstructed the circRNA-disease association network using circRNA similarities and disease similarities and then used nonnegative matrix factorization to predict potential associations.

(2) Machine learning-based circRNA-disease association prediction methods: Lei and Fang [33] fused the circRNA expression profile similarity network, the circRNA sequence similarity network, and the circRNA functional annotation similarity network to construct a circRNA similarity network. The disease similarity network was constructed by integrating the disease functional similarity network and the disease semantic similarity network. Finally, based on the known circRNA-disease association network, the circRNA similarity network, and the disease similarity network, the potential features of circRNAs and diseases were extracted, and the gradient boosting decision tree algorithm was then used to predict potential circRNA-disease associations.

Similarly, miRNA-disease association prediction algorithms can be classified into similar categories. Peng et al. [34] developed a miRNA-disease association prediction model (ThrWRDE) that runs a random walk with restart algorithm on a variety of miRNA-related biological data and then integrates the results obtained from the multiple random walk models. You et al. [35] integrated known miRNA-disease associations, miRNA similarity, and disease similarity to construct a miRNA-disease heterogeneous network and then used a depth-first search algorithm to count the paths between miRNAs and diseases. Finally, the different paths between a miRNA and a disease are integrated to obtain the predicted association score between them. Chen et al. [36] constructed miRNA similarity by integrating miRNA functional similarity with miRNA Gaussian interaction profile kernel similarity, and disease similarity by integrating disease semantic similarity with disease Gaussian interaction profile kernel similarity. Inductive matrix completion is then used to obtain the final predicted miRNA-disease associations. Chen et al. [37] extracted the potential representations of miRNAs and diseases using a stacked autoencoder and then obtained the predicted miRNA-disease association scores by using a support vector machine (SVM).

3. Materials and Methods

3.1. Materials

The circRNA-miRNA associations are downloaded from the circR2Cancer database (http://www.biobdlab.cn:8000/). The circR2Cancer database [38] is a manually curated database that contains not only circRNA-cancer association data but also circRNA-miRNA association data and miRNA-cancer association data. After removing redundant data, 130 circRNAs, 412 miRNAs, and 477 associations are retained. Furthermore, the adjacency matrix $CM \in \mathbb{R}^{m \times n}$ is constructed to represent the circRNA-miRNA associations, where m represents the number of circRNAs and n represents the number of miRNAs. The value of element $CM(i, j)$ is equal to 1 when circRNA $c_i$ is related to miRNA $m_j$, and 0 otherwise.
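
As an illustration of how such an adjacency matrix can be assembled, the following minimal sketch builds a 0/1 matrix from a list of (circRNA, miRNA) pairs. The function name `build_association_matrix` and the toy identifiers are hypothetical and are not taken from circR2Cancer; duplicate pairs are assumed to collapse to a single entry, in the spirit of the redundancy removal described above.

```python
import numpy as np

def build_association_matrix(pairs, circrnas, mirnas):
    """Return an m x n 0/1 matrix; rows index circRNAs, columns index miRNAs."""
    row = {name: i for i, name in enumerate(circrnas)}
    col = {name: j for j, name in enumerate(mirnas)}
    cm = np.zeros((len(circrnas), len(mirnas)), dtype=int)
    for circ, mir in pairs:
        cm[row[circ], col[mir]] = 1  # a repeated pair still yields a single 1
    return cm

# Toy, made-up identifiers (hypothetical, for illustration only).
circrnas = ["circ_A", "circ_B", "circ_C"]
mirnas = ["miR_1", "miR_2"]
pairs = [("circ_A", "miR_1"), ("circ_B", "miR_2"), ("circ_A", "miR_1")]

CM = build_association_matrix(pairs, circrnas, mirnas)
```

In this toy case `CM` has shape (3, 2), and the row for `circ_C` is all zeros because it has no known association.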

3.2. circRNA and miRNA Similarity Calculation

In this study, the Gaussian interaction profile (GIP) kernel similarity is used to calculate the similarities of circRNAs and miRNAs. Based on the assumption that circRNAs with similar functions are often associated with similar miRNAs, the circRNA GIP kernel similarity and the miRNA GIP kernel similarity are each calculated from the circRNA-miRNA interaction network. For a pair of circRNAs $c_i$ and $c_j$, the GIP kernel similarity between $c_i$ and $c_j$ is defined as follows:

$$CS(c_i, c_j) = \exp\left(-\gamma_c \left\lVert CM(c_i) - CM(c_j) \right\rVert^2\right)$$

where $CM(c_i)$ represents the $i$-th row of the matrix CM and $\gamma_c$ represents the kernel bandwidth, which is defined as follows:

$$\gamma_c = \frac{1}{\frac{1}{m}\sum_{i=1}^{m} \left\lVert CM(c_i) \right\rVert^2}$$

where $m$ represents the number of rows in matrix CM.

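Similarly, the miRNA GIP kernel similarity between miRNA $m_i$ and miRNA $m_j$ is defined as follows:

$$MS(m_i, m_j) = \exp\left(-\gamma_m \left\lVert CM(m_i) - CM(m_j) \right\rVert^2\right)$$

where $CM(m_i)$ represents the $i$-th column of the matrix CM and $\gamma_m$ represents the kernel bandwidth, which is defined as follows:

$$\gamma_m = \frac{1}{\frac{1}{n}\sum_{j=1}^{n} \left\lVert CM(m_j) \right\rVert^2}$$

where $n$ represents the number of columns in the interaction matrix CM.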
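
The GIP kernel computation can be sketched in a few lines of NumPy. This is a minimal sketch rather than the authors' implementation: the function name `gip_kernel` is hypothetical, and the bandwidth is assumed to be the reciprocal of the mean squared norm of the interaction profiles (that is, with a unit scaling factor). Applying the function to CM gives the circRNA similarity, and applying it to the transpose gives the miRNA similarity.

```python
import numpy as np

def gip_kernel(profiles):
    """GIP kernel similarity between the rows of `profiles`."""
    profiles = np.asarray(profiles, dtype=float)
    sq_norms = (profiles ** 2).sum(axis=1)
    mean_sq = sq_norms.mean()
    if mean_sq == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = 1.0 / mean_sq  # assumed bandwidth: 1 / mean squared profile norm
    # Pairwise squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (profiles @ profiles.T)
    sq_dist = np.maximum(sq_dist, 0.0)  # guard against tiny negative round-off
    return np.exp(-gamma * sq_dist)

# Toy association matrix: 2 circRNAs (rows) x 3 miRNAs (columns).
CM = np.array([[1, 1, 0],
               [1, 0, 0]])

CS = gip_kernel(CM)    # circRNA-circRNA similarity, shape (2, 2)
MS = gip_kernel(CM.T)  # miRNA-miRNA similarity, shape (3, 3)
```

Note that an entity with an empty interaction profile still receives nonzero similarity to the others, which is one reason methods of this kind depend heavily on the known associations.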

3.3. Construction of Heterogeneous Network

The heterogeneous network used for circRNA-miRNA association prediction is composed of three subnetworks: the circRNA-miRNA interaction network, the circRNA GIP kernel similarity network, and the miRNA GIP kernel similarity network. Based on these three subnetworks, the heterogeneous network is constructed as follows:

$$H = \begin{bmatrix} CS & CM \\ CM^{T} & MS \end{bmatrix}$$

where $CM$ represents the circRNA-miRNA interaction network, $CM^{T}$ represents the transpose of the circRNA-miRNA interaction network, $CS$ represents the circRNA GIP kernel similarity network, and $MS$ represents the miRNA GIP kernel similarity network.
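
A minimal sketch of this assembly step follows. It assumes a block layout in which circRNA nodes come first and miRNA nodes second, so that the similarity networks sit on the diagonal and the interaction network and its transpose sit off the diagonal; the function name `build_heterogeneous_network` and the identity matrices used as stand-in similarities are hypothetical.

```python
import numpy as np

def build_heterogeneous_network(cs, cm, ms):
    """Stack CS, CM, CM^T and MS into one (m + n) x (m + n) matrix."""
    cs = np.asarray(cs, dtype=float)
    cm = np.asarray(cm, dtype=float)
    ms = np.asarray(ms, dtype=float)
    m, n = cm.shape
    if cs.shape != (m, m) or ms.shape != (n, n):
        raise ValueError("similarity matrices do not match the shape of CM")
    return np.block([[cs, cm],
                     [cm.T, ms]])

# Toy inputs: 2 circRNAs, 3 miRNAs; identity matrices stand in for CS and MS.
CM = np.array([[1, 0, 1],
               [0, 1, 0]])
CS = np.eye(2)
MS = np.eye(3)

H = build_heterogeneous_network(CS, CM, MS)
```

Because CS and MS are symmetric, the resulting `H` is symmetric as well, so it can be treated as the weighted adjacency matrix of an undirected graph.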

3.4. The Feature Extraction Based on NetMF

After obtaining the heterogeneous network H, the network embedding as matrix factorization (NetMF) algorithm [39] is used to extract the potential features of circRNAs and miRNAs from the heterogeneous network. NetMF is a matrix factorization framework based on the original DeepWalk algorithm. To be specific, the NetMF model is the matrix factorization form of the DeepWalk algorithm, derived from the implicit factorization performed by the skip-gram with negative sampling (SGNS) model [40, 41]. It can reduce the noise in the matrix H and improve the performance of the prediction model. The NetMF model starts from the probability distribution of the truncated random walk, which is calculated as

$$P = D^{-1} H$$

where $D$ denotes a diagonal matrix whose elements represent the generalized degrees of the nodes in the matrix $H$.

Then, a $T$-step random walk is conducted on the heterogeneous network according to the truncated random walk probability distribution $P$ calculated above. It is used to sample the nodes in the heterogeneous network to obtain the transition matrix $S$, which is defined as follows:

$$S = \sum_{r=1}^{T} P^{r}$$

After obtaining the transition matrix $S$, the DeepWalk matrix $M$ is obtained as

$$M = \frac{N}{bT}\, S\, D^{-1}$$

where $N$ represents the dimension of the heterogeneous network H, $D$ is the diagonal generalized degree matrix, and $b$ represents the number of negative samples.

Since the density of the DeepWalk matrix $M$ increases the time complexity of subsequent calculations, the approximate matrix $M'$ is defined as follows:

$$M' = \log\left(\max(M, 1)\right)$$

where the maximum and the logarithm are applied element-wise.

After obtaining the matrix $M'$ of the circRNA-miRNA heterogeneous network H, the low-dimensional feature vectors of circRNAs and miRNAs are obtained by using the singular value decomposition (SVD) model [42–44], which is defined as follows:

$$M' \approx U_d \Sigma_d V_d^{T}$$

where d represents the dimension of the low-dimensional space.

Finally, the eigenmatrix $E$ is calculated:

$$E = U_d \sqrt{\Sigma_d}$$

The dimension of $E$ is $(m + n) \times d$. This matrix is composed of the circRNA feature vectors $u$ and the miRNA feature vectors $v$, of which there are m and n, respectively.
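
The feature extraction steps of this subsection can be sketched as follows. This is a hedged illustration of the NetMF recipe from [39], not the authors' code: the function name `netmf_embedding` is hypothetical, the scaling constant is assumed to be the sum of all entries of H (the graph volume used in NetMF), which may or may not coincide with what the text calls the dimension of H, and a dense SVD is used because the toy matrix is tiny.

```python
import numpy as np

def netmf_embedding(H, T=1, b=1, d=2):
    """Embed the nodes of a weighted graph H into d dimensions (NetMF-style)."""
    H = np.asarray(H, dtype=float)
    degree = H.sum(axis=1)                 # generalized degree of each node
    if np.any(degree <= 0):
        raise ValueError("every node needs a positive generalized degree")
    vol = degree.sum()                     # assumed scaling constant (graph volume)
    P = H / degree[:, None]                # random walk matrix D^{-1} H
    S = np.zeros_like(P)
    P_power = np.eye(H.shape[0])
    for _ in range(T):                     # S = P + P^2 + ... + P^T
        P_power = P_power @ P
        S += P_power
    M = (vol / (b * T)) * S / degree[None, :]   # right-multiply by D^{-1}
    M_log = np.log(np.maximum(M, 1.0))          # element-wise truncated logarithm
    U, sigma, _ = np.linalg.svd(M_log)
    return U[:, :d] * np.sqrt(sigma[:d])        # U_d * sqrt(Sigma_d)

# Toy symmetric network with 2 "circRNA" nodes followed by 3 "miRNA" nodes.
rng = np.random.default_rng(0)
A = rng.random((5, 5))
H = (A + A.T) / 2.0

m = 2
E = netmf_embedding(H, T=1, b=1, d=2)
u, v = E[:m], E[m:]   # circRNA rows first, miRNA rows after (assumed ordering)
```

The split into `u` and `v` relies on the assumption that circRNA nodes occupy the first m rows of H.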

3.5. circRNA-miRNA Association Prediction

The potential feature vectors of circRNAs and miRNAs are obtained by applying NetMF to the heterogeneous network H. Then, the weighted neighborhood regularized logistic matrix factorization [45] and the inner product are utilized to reconstruct the circRNA-miRNA association matrix.

The weighted neighborhood regularized logistic matrix factorization is defined as follows:

The inner product is defined as follows:where represents the feature vector of circRNA and represents the feature vector of miRNA . represents the weight coefficient to balance the influence of two feature vectors on the reconstructed matrix.

Finally, the score of circRNA-miRNA association matrix is defined as follows:where denotes the predicted score between circRNA and miRNA .

The flowchart of NECMA is shown in Figure 1. It mainly contains the following steps: first, the Gaussian interaction profile kernel similarity is utilized to calculate circRNA similarity and miRNA similarity based on the known circRNA-miRNA associations, respectively. Then, the heterogeneous network H is constructed based on the circRNA-miRNA association network, miRNA similarity network, and circRNA similarity network. Furthermore, the NetMF is used to extract the low-dimensional features of circRNA and miRNA on heterogeneous network H, respectively. Finally, the weighted neighborhood regularized logistic matrix factorization and the inner product are utilized to reconstruct the circRNA-miRNA association matrix based on the circRNA feature vector and miRNA feature vector (Algorithm 1).

Algorithm 1: NECMA.
Input: circRNA-miRNA association matrix CM
Output: Predicted association matrix Pre
(1) Calculate the circRNA GIP kernel similarity CS based on known circRNA-miRNA associations
(2) Calculate the miRNA GIP kernel similarity MS based on known circRNA-miRNA associations
(3) Construct a heterogeneous network H based on CM, CS, and MS
(4) Calculate the generalized degree matrix D based on the heterogeneous network H
(5) Calculate the probability of the truncated random walk
(6) Calculate the transition matrix
(7) Calculate the DeepWalk matrix
(8) Calculate the approximate matrix of the DeepWalk matrix
(9) Reduce the dimensionality through SVD
(10) Calculate the circRNA-miRNA feature matrix eigen_matrix
(11) Extract the circRNA features and the miRNA features from eigen_matrix
(12) Calculate the weighted neighborhood regularized logistic matrix factorization
(13) Calculate the inner product
(14) Return the predicted association matrix Pre

4. Results

4.1. Ten-Fold Cross-Validation

To evaluate the performance of NECMA, we conduct ten-fold cross-validation. The known circRNA-miRNA associations are randomly divided into ten subsets. In each round of cross-validation, one subset is selected as the test samples and the other nine subsets are used as the training samples for model training. The trained model then predicts the final score of each circRNA-miRNA association. The higher the score of an association, the higher the probability of a circRNA-miRNA interaction. The scores are then sorted in descending order, and the true positive rate (TPR) and false positive rate (FPR) are calculated by varying the threshold. The TPR and FPR are defined as follows:

$$TPR = \frac{TP}{TP + FN}, \qquad FPR = \frac{FP}{FP + TN}$$

where TP and FP represent the numbers of true positives and false positives, respectively, and TN and FN represent the numbers of true negatives and false negatives, respectively. Finally, the receiver operating characteristic (ROC) curve is plotted based on TPR and FPR, and the area under the ROC curve (AUROC) is calculated to evaluate the predictive power of the model. The higher the AUROC value, the better the performance of the model.
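
The threshold sweep described above can be sketched as follows: scores are sorted in descending order, cumulative counts give TPR and FPR at every cut-off, and the area is obtained with the trapezoidal rule. The function names are hypothetical, the toy scores and labels are made up, and tied scores are not given special treatment in this minimal version.

```python
import numpy as np

def roc_points(scores, labels):
    """Return (FPR, TPR) arrays obtained by lowering the threshold step by step."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    if labels.sum() == 0 or labels.sum() == labels.size:
        raise ValueError("both positive and negative samples are required")
    order = np.argsort(-scores, kind="stable")   # descending by score
    sorted_labels = labels[order]
    tp = np.cumsum(sorted_labels)                # true positives above each cut-off
    fp = np.cumsum(1 - sorted_labels)            # false positives above each cut-off
    tpr = np.concatenate([[0.0], tp / tp[-1]])   # TP / (TP + FN)
    fpr = np.concatenate([[0.0], fp / fp[-1]])   # FP / (FP + TN)
    return fpr, tpr

def auroc(scores, labels):
    """Area under the ROC curve by the trapezoidal rule."""
    fpr, tpr = roc_points(scores, labels)
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# Made-up scores: two positives (label 1) and two negatives (label 0).
scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]
area = auroc(scores, labels)
```

For these toy values three of the four positive-negative pairs are ranked correctly, so `area` is 0.75.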

Similarly, the area under the precision-recall curve (AUPR) is used to evaluate the performance of the prediction model. Precision and recall are defined as follows:

$$Precision = \frac{TP}{TP + FP}, \qquad Recall = \frac{TP}{TP + FN}$$

where precision represents the proportion of predicted positive samples that are actually positive, and recall represents the proportion of actual positive samples that are correctly predicted as positive, which measures the classifier's ability to recognize positive cases.
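
For a single threshold, precision and recall can be computed directly from the confusion counts, as in the minimal sketch below. The function name `precision_recall` and the toy values are hypothetical; returning 0.0 when a denominator is zero is a convention chosen here, not something specified in the text.

```python
import numpy as np

def precision_recall(scores, labels, threshold):
    """Precision and recall when pairs scoring >= threshold are called positive."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    predicted = scores >= threshold
    tp = int(np.sum(predicted & (labels == 1)))
    fp = int(np.sum(predicted & (labels == 0)))
    fn = int(np.sum(~predicted & (labels == 1)))
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # convention for empty predictions
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Made-up scores: two positives (label 1) and two negatives (label 0).
scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]

strict = precision_recall(scores, labels, threshold=0.75)
loose = precision_recall(scores, labels, threshold=0.65)
```

Lowering the threshold from 0.75 to 0.65 raises recall from 0.5 to 1.0 in this toy case, and sweeping the threshold over all scores traces out the precision-recall curve whose area is the AUPR.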

In addition, to demonstrate the superiority of NECMA in predicting potential circRNA-miRNA associations, we compare NECMA with three state-of-the-art algorithms: RWRLncD [46], NCPLDA [47], and LRLSLDA [48]. Figures 2 and 3 show the AUROC and AUPR values obtained by the different prediction models in ten-fold cross-validation, respectively. The results show that the AUROC and AUPR of NECMA are better than those of the other three prediction algorithms. Figure 2 shows that the AUROC value of NECMA is 0.8264, which is higher than those of RWRLncD (0.5243), NCPLDA (0.6985), and LRLSLDA (0.7661). Likewise, Figure 3 shows that the AUPR value of NECMA is 0.0048, which is higher than those of RWRLncD (0.0016), NCPLDA (0.0011), and LRLSLDA (0.0026). The overall results of ten-fold cross-validation are shown in Table 1. It can be concluded that NECMA is an effective method for identifying associations between circRNAs and miRNAs.

4.2. Effect of Parameters

The NetMF model contains three parameters (context window $T$, negative sampling number b, and embedding dimension d). Following the previous study [39], both the context window $T$ and the negative sampling number b are set to 1. To test the effect of the embedding dimension d, we vary d from 8 to 128. The result is shown in Figure 4. The AUROC value of NECMA is the highest when the embedding dimension d of NetMF is set to 8. In addition, we test the effect of the weight coefficient α in the neighborhood regularized logistic matrix factorization, which is used to balance the influence of the two feature vectors during matrix reconstruction. The parameter α ranges from 0.1 to 0.9 in steps of 0.1. The influence of α on the prediction performance is shown in Figure 5. The AUROC value is the highest when α = 0.6.

4.3. Case Study

To further illustrate the ability of NECMA to predict potential circRNA-miRNA associations, we conduct a case study on miR-130a-3p. We select the top 10 circRNAs predicted by NECMA and verify these associations by manually searching related databases and literature.

Numerous experiments have shown that miR-130a-3p is associated with the proliferation and migration of many cancer cells [49]. For example, miR-130a-3p can regulate its target Smad4 to inhibit the migration and invasion of gemcitabine-resistant (GR) hepatocellular carcinoma (HCC) cells [50]. Therefore, correctly predicting the circRNAs associated with miR-130a-3p is useful for understanding complex disease mechanisms. The top 10 predicted circRNAs of miR-130a-3p are shown in Table 2. Nine of them (hsa_circ_0068942, hsa_circ_0089378, hsa_circ_0083357, hsa_circ_0006323, hsa_circ_0032970, hsa_circ_0051172, hsa_circ_0054537, hsa_circ_0057576, and hsa_circ_0082824) have been confirmed in the literature. The moderately upregulated hsa_circ_0068942, ranked first, can serve as a miR-130a-3p sponge and as a disease marker for coronary artery disease (CAD) [51]. hsa_circ_0089378, ranked second, can act as a sponge for miR-130a-3p to affect the expression of its target mRNA in coronary artery disease [52]. hsa_circ_0083357, ranked third, can play an important role in coronary artery disease through miR-130a-3p-mediated circRNA-mRNA competitive endogenous RNA (ceRNA) networks [53]. hsa_circ_0006323, ranked fourth, can inhibit the expression of miR-130a-3p in coronary artery disease cells [54]. hsa_circ_0032970, ranked fifth, can bind to miR-130a-3p binding sites in coronary artery disease cells [55]. hsa_circ_0051172, ranked sixth, can regulate the expression of TRPM3 by targeting miR-130a-3p in coronary artery disease [56]. In addition, the differential expression of hsa_circ_0054537, ranked seventh, and hsa_circ_0057576, ranked eighth, can not only inhibit miR-130a-3p but also lead to the upregulation of TRPM3 [55]. Finally, hsa_circ_0082824, ranked ninth, can promote the expression of TRPM3 in target cells in coronary artery disease by inhibiting miR-130a-3p [54].

5. Conclusion

Accumulating experiments have shown that predicting associations between circRNAs and miRNAs not only helps to understand complex disease mechanisms but also contributes to the prevention and diagnosis of diseases [57]. In this study, we propose a computational method, NECMA, to infer circRNA-miRNA associations. In this model, we first construct the circRNA-miRNA association matrix from known circRNA-miRNA associations. Then, the Gaussian interaction profile kernel similarity is used to calculate circRNA similarity and miRNA similarity from the known associations. Next, the heterogeneous network is constructed from three subnetworks (the circRNA-miRNA association network, the circRNA similarity network, and the miRNA similarity network). NetMF is then employed to extract the subspace features of circRNAs and miRNAs from the heterogeneous network. Finally, the scores of circRNA-miRNA associations are predicted by using weighted neighborhood regularized logistic matrix factorization and the inner product. To show the performance of NECMA, we compare it with three state-of-the-art methods (RWRLncD, NCPLDA, and LRLSLDA) under ten-fold cross-validation. The experimental results show that NECMA achieves a higher AUROC value (0.8264) than the other three prediction models. In addition, a case study on miR-130a-3p demonstrates that NECMA can correctly identify potential associations between circRNAs and miRNAs.

Although the NECMA model can effectively predict potential circRNA-miRNA associations, it still has several limitations. First, the NECMA model mainly relies on known circRNA-miRNA association data, and the imbalance between positive and negative samples greatly affects its prediction accuracy. Second, the parameter settings also affect the prediction results. In addition, integrating more kinds of circRNA and miRNA information could further improve the predictive power of the model [58–60]. Moreover, the NECMA model cannot make predictions for a new circRNA or miRNA that has no known associations. Therefore, we will integrate more biological data on circRNAs and miRNAs in the future to make the model more reliable [61–63].

Data Availability

The underlying data supporting the results of our study can be found at http://www.biobdlab.cn:8000/.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this study.

Acknowledgments

This work was partially supported by the National Natural Science Foundation of China (Nos. 62072124, 61963004, and 61972185), the Natural Science Foundation of Guangxi (Nos. 2021GXNSFAA075041 and 2018GXNSFBA281193), the Science and Technology Base and Talent Special Project of Guangxi (No. AD20159044), the Natural Science Foundation of Yunnan Province of China (No. 2019FA024), the Hunan Provincial Science and Technology Program (No. 2018WK4001), and the Scientific Research Foundation of Hunan Provincial Education Department (No. 18B469).