A Credit Conflict Detection Model Based on Decision Distance and Probability Matrix

Zhang, Xiaodong; Lv, Congdong; Sun, Zhoubao

doi:https://doi.org/10.1155/2022/3795183

Wireless Communications and Mobile Computing

Special Issue

Channel Estimation and Sensing in Intelligent Reflecting Surface (IRS)-Assisted Communication Systems

Research Article | Open Access

Volume 2022 | Article ID 3795183 | https://doi.org/10.1155/2022/3795183

Xiaodong Zhang,¹ Congdong Lv,¹ and Zhoubao Sun²

Academic Editor: Chunguo Li

Received 17 Sept 2021

Accepted 25 Nov 2021

Published 07 Jan 2022

Abstract

Platforms such as Internet finance, e-commerce, and health and elderly care differ in how they calculate credit indices and in the semantics of those indices, and they may hold false data. As a result, the credit of a subject can deviate from its trusted range, and related information about the subject can be missing. In this paper, we propose a cross-platform service credit conflict detection model based on decision distance to support the transmission, integration, migration, and application of credit information across platforms. First, we give a scoring table of influencing factors, in which each score is the probability that the factor affects credit. From these probabilities, the distance matrix between influencing factors is generated. Second, the similarity matrix is calculated from the distance matrix. Third, the support vector is calculated from the similarity matrix. Fourth, the credit vector is calculated from the support vector. Finally, the credibility is calculated from the credit vector and the probabilities.

1. Introduction

In recent years, with the development of the Internet, online services have emerged in all walks of life. The Internet lets users obtain the services they want through simple processes in a wide range of environments, but the virtual nature of the network also makes fraud easy to commit. This poses a challenge to the credit evaluation system of each platform. The requirement to accelerate the construction of the social credit system was put forward in the 12th Five-Year Plan and is explained more clearly in the 14th Five-Year Plan: strengthen the collection, sharing, disclosure, and application of credit information; promote credit products and services that benefit the people and facilitate enterprises; establish a mechanism for sharing and integrating public credit information and financial information; cultivate internationally competitive enterprise credit investigation institutions and credit rating institutions; strengthen credit investigation supervision; and promote the healthy development of the credit service market. Now that big data technology is widely used, each platform organization meets these challenges by using the data it collects to calculate credit indicators and build its own credit evaluation system. However, this process has many problems. The collected information cannot fully evaluate and describe the credit indicators; errors and omissions arise when information is collected and entered; and platforms differ in the aspects of credit they focus on and in their evaluation models. Consequently, the information and evaluation results for the same object differ across platforms, and there is no good coordination mechanism. The data are scattered, heterogeneous, and of low quality, so they are difficult to apply directly to judging the overall credit level of an object.
The Outline of the Plan for the Construction of the Social Credit System (2014–2020), issued by the State Council, states that “accelerating the construction of credit information system and improving the recording, integration and application of credit information are the basis and premise for the formation of trustworthy incentive and dishonest punishment mechanism.” From this point of view, solving the data problem in credit evaluation requires all platforms to establish a sound information exchange mechanism, gradually form a credit service network with wide coverage and complete categories, and build an objective, fair, reasonable, and balanced international credit rating system model.

The core of building a cross-platform credit index evaluation model is fusing multisource heterogeneous credit data, and the information conflicts caused by data fusion are the focus of this research. The same information may be described by different attributes, and the same attribute may carry different names on different platforms; the same attribute may take different values in different sources; and the data collection methods themselves may introduce erroneous or false data. To sum up, the purpose of conflict detection modelling is to match attributes, resolve conflicts, clean up false data, and obtain data with unified standards, reliable sources, and strong authenticity, so that subsequent credit models can be built efficiently and on authentic data.

2. Related Works

When multisource data contain conflicting descriptions of the same attribute, pattern matching technology is used to reconcile the attributes of different sources [1, 2]. Pattern matching works at two levels: the pattern level and the instance level. Pattern-level methods analyse the correspondence between the attribute descriptions of different source data, such as attribute name, abbreviation, or attribute storage type [3]. This kind of similarity analysis is simple, intuitive, and cheap in time, but attribute descriptions carry little information, so it is difficult to use directly [4–6]; a unified standard must be built before it can be used effectively. Instance-level methods carry out feature extraction and similarity analysis on data from different sources to obtain the mapping between different attributes. Their advantage is that they reduce the dependence on domain knowledge, but they need good data of sufficient scale [7]. In related research, pattern-based solutions are considered at three levels. The most intuitive is the language level, where research mainly considers the semantics of attribute descriptions [8–10]. For example, the COMA system determines the relationship between attributes by constructing an attribute language association diagram; it takes attribute pairs as input and returns a value between 0 and 1 that measures their degree of similarity. Other studies further consider the attribute constraint level and the attribute structure level [11]. For example, the similarity flooding algorithm builds a graph of attribute descriptions and data types from the input information and obtains the mapping between attributes through iterative fixed-point computation [12].
However, data description information lacks a unified standard, is scarce, and is difficult to obtain, identify, and use, which hinders wide application. Instance-based solutions are therefore now used more often: they obtain the mapping between attributes through feature extraction and comparative analysis of data from different sources. The SMDD method uses neural network technology to find sets of elements with similar distributions according to the data instance information and returns the matching results with high similarity [13]. Mehdi et al. analysed the similarity of different types of data, which reduces the scale of the similarity matrix, and introduced the Google similarity distance to measure the semantic relationship between character-type data [14]. Zhou et al. established a pattern matching system based on the Hungarian algorithm to analyse the characteristics of multisource data and obtain the mapping relationship, which enhanced the generalization ability of the model. Natural language processing technology is also increasingly used for similarity analysis [15, 16]. For example, Nozaki et al. used the word2vec tool to compare the semantic similarity of string attributes in a data set, and Northrop et al. used BERT-based semantic similarity calculation to align knowledge base indexes [17, 18].
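
To make the idea of a similarity score between 0 and 1 concrete, the following minimal Python sketch compares attribute names with the standard-library `difflib.SequenceMatcher`. It is not the COMA system or any of the cited methods; the function names `name_similarity` and `match_attributes`, the normalization step, and the 0.6 threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Return a score in [0, 1] for two attribute names (1.0 means identical)."""
    # Assumed normalization: ignore case, underscores, and spaces.
    norm_a = a.lower().replace("_", "").replace(" ", "")
    norm_b = b.lower().replace("_", "").replace(" ", "")
    return SequenceMatcher(None, norm_a, norm_b).ratio()


def match_attributes(source, target, threshold=0.6):
    """Map each source attribute to its most similar target attribute.

    Attributes whose best score is below the (hypothetical) threshold
    are left unmapped.
    """
    mapping = {}
    for s in source:
        best = max(target, key=lambda t: name_similarity(s, t))
        if name_similarity(s, best) >= threshold:
            mapping[s] = best
    return mapping
```

A name-only comparison of this kind corresponds to the pattern level described above; it cannot detect attributes that share a meaning but not a spelling, which is where instance-level analysis becomes necessary.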

To detect data conflicts in multisource data fusion, the abnormal points in a conflict are regarded as outliers, and point outlier detection technology is used to detect and handle the conflict [19–22]. In traditional data mining, outlier detection is carried out with statistical, clustering, classification, proximity, and other methods [23–29]. These methods are effective, simple, and direct, but they rely on prior knowledge, and their results are directly affected by the quality of that knowledge. Some studies have also applied machine learning to this field, learning the attributes where conflicts arise and replacing the conflicting values with predictions. Numerical conflict detection based on outlier detection looks for data objects that differ significantly from the distribution of the other data. The traditional statistical method assumes a known data distribution and classifies data that do not conform to it as outliers, but it requires a priori knowledge that is difficult to obtain, which limits its practical use. In practice, outlier detection methods based on clustering or classification are used more often, and common clustering algorithms such as DBSCAN [30, 31] and BIRCH [32, 33] can be applied to outlier detection. Clustering algorithms require authentic data sources. Jia et al. used a clustering-based outlier detection algorithm to clean wrong and missing data in a medical database, showing better performance than the distance-based algorithm. Another idea is proximity-based outlier detection, which identifies outliers by calculating the distance or density of data points [34]. Riahi-Madvar et al. used LOF (local outlier factor) as a measure of the local outlier degree of an object: the k-distance neighbourhood is determined, the local reachability density of the object is calculated from it, and the outlier degree of each data point is obtained [35].
Algorithms that use LOF as a measure also have many applications [36, 37]. Building on this work, Liu et al. mined outliers with LOF and constructed a local outlier degree measurement method that reduces the complexity of the algorithm and, by relying on local characteristics, avoids misjudgement to a certain extent [38, 39]. Because it does not depend on the characteristics of the overall data distribution, it is well suited to anomaly detection on data with different density distributions [40].
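
The LOF computation described above can be sketched in a few lines of plain Python. This is a simplified illustration for one-dimensional values with no duplicate points, not the implementation used in any cited work; the function name `lof_scores` and the default `k=2` are assumptions.

```python
def lof_scores(points, k=2):
    """Local outlier factor of each 1-D point (values near 1 are inliers).

    Assumes distinct values and more than k points.
    """
    n = len(points)
    dist = [[abs(p - q) for q in points] for p in points]

    neighbours, k_dist = [], []
    for i in range(n):
        # Sort the other points by distance; the k-th one fixes the k-distance.
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        kd = dist[i][order[k - 1]]
        k_dist.append(kd)
        # The k-distance neighbourhood keeps ties at exactly the k-distance.
        neighbours.append([j for j in order if dist[i][j] <= kd])

    def reach_dist(i, j):
        # Reachability distance of point i with respect to its neighbour j.
        return max(k_dist[j], dist[i][j])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        total = sum(reach_dist(i, j) for j in neighbours[i])
        lrd.append(len(neighbours[i]) / total)

    # LOF: mean density of the neighbours relative to the point's own density.
    return [
        sum(lrd[j] for j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
        for i in range(n)
    ]
```

For the values `[0, 1, 2, 3, 10]` with `k=2`, the four clustered values score 1 and the isolated value 10 scores 5, so a score well above 1 marks a candidate conflict.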

2.1. The Credit Conflict Detection Model

To apply decision distance measurement and the probability distance matrix to multisource conflicting evidence, the evidence must first be converted into decision distance values and a probability distance matrix. On this basis, the similarity matrix, support vector, reliability, and average trust function are obtained. The process is as follows.

Each record of credit data is represented by a company attribute or personal information attribute, such as age. We use a multisource group to represent

For any record, we need to calculate its basic probability:

The algorithm is shown in Algorithm 1, and the judgment conditions are based on Table 1.

Algorithm 1
Require: The set of credit data, y
Ensure: m(y)
if […] then
    […];
end if
if […] then
    […];
end if
if […] then
    […];
end if
if […] then
    […];
end if
...
return m(y);
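
Algorithm 1 assigns basic probabilities through a chain of conditions taken from the scoring table (Table 1). The table's factors and scores are not reproduced here, so the rules below (`SCORING_RULES`, the `age` and `overdue_count` fields, and every numeric score) are hypothetical placeholders; the sketch only illustrates the rule-lookup structure, not the paper's actual conditions.

```python
# Hypothetical scoring rules as (field, predicate, score) triples.
# These are NOT the entries of Table 1; they only show the structure.
SCORING_RULES = [
    ("age", lambda v: v < 18, 0.2),
    ("age", lambda v: v >= 18, 0.7),
    ("overdue_count", lambda v: v == 0, 0.9),
    ("overdue_count", lambda v: v > 0, 0.3),
]


def basic_probabilities(record):
    """Return {field: score}, using the first matching rule for each field."""
    m = {}
    for field, predicate, score in SCORING_RULES:
        if field in record and field not in m and predicate(record[field]):
            m[field] = score
    return m
```

Keeping the rules in a table rather than in nested `if` statements makes it straightforward to swap in the real scoring table once it is available.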

After obtaining the basic probability vector, we need to calculate the distance matrix:

Each entry of the distance matrix is calculated as follows:

The algorithm is shown in Algorithm 2.

Similarity matrix is as follows:

The algorithm is shown in Algorithm 3.

Algorithm 2
Require: The set of basic probability, m(y)
Ensure: D(y)
for each item […] do
    for each item […] do
        […];
    end for
end for
return D(y)
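
The double loop of Algorithm 2 can be sketched as follows. The distance formula itself is not available in this text, so the sketch assumes the simplest choice for scalar probabilities, the absolute difference |m_i − m_j|; the function name `distance_matrix` is likewise an assumption.

```python
def distance_matrix(m):
    """Pairwise distances between basic probabilities.

    Assumption: the distance between two probabilities is their
    absolute difference, which lies in [0, 1].
    """
    return [[abs(mi - mj) for mj in m] for mi in m]
```

Under this assumption the matrix is symmetric with a zero diagonal, so two factors that receive the same score are at distance 0.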

Algorithm 3
Require: The distance matrix, D(y)
Ensure: S(y)
for each item […] do
    for each item […] do
        […];
    end for
end for
return S(y)
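
A minimal sketch of Algorithm 3 follows. It assumes the conversion that is common in evidence-combination methods, s_ij = 1 − d_ij, with distances already scaled to [0, 1]; both the formula and the name `similarity_matrix` are assumptions, since the original formula is not available in this text.

```python
def similarity_matrix(d):
    """Convert a distance matrix with entries in [0, 1] into similarities.

    Assumption: s_ij = 1 - d_ij, so identical items have similarity 1.
    """
    return [[1.0 - dij for dij in row] for row in d]
```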

The column cells of the similarity matrix are summed to obtain the support vector Sup(y). The calculation formula is as follows:

The algorithm is shown in Algorithm 4.

Algorithm 4
Require: The similarity matrix, S(y)
Ensure: Sup(y)
for each item […] do
    for each item […] do
        […];
    end for
end for
return Sup(y)
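
The column summation can be sketched as below. The text says only that the column cells are summed, so this sketch includes every cell of a column, diagonal included; some related methods exclude the self-similarity term, and which variant the authors use is not confirmed here. The name `support_vector` is an assumption.

```python
def support_vector(s):
    """Sum each column of a square similarity matrix to get the support.

    Assumption: the diagonal (self-similarity) is included in the sum.
    """
    n = len(s)
    return [sum(s[i][j] for i in range(n)) for j in range(n)]
```

An item that is similar to most of the others receives a large support value, while an item that disagrees with the rest receives a small one.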

The credibility vector Crd(y) is obtained by normalizing the support vector. The calculation formula is as follows:

The algorithm is shown in Algorithm 5.

Algorithm 5
Require: The support vector, Sup(y)
Ensure: Crd(y)
for each item […] do
    […];
end for
for each item […] do
    […];
end for
return Crd(y)
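
The two loops of Algorithm 5 are consistent with a sum followed by a division, which the sketch below assumes: each support value is divided by the total support so that the credibilities sum to 1. The name `credibility_vector` and the error handling are assumptions.

```python
def credibility_vector(sup):
    """Normalize a support vector so that its entries sum to 1."""
    total = sum(sup)  # first pass: accumulate the total support
    if total == 0:
        raise ValueError("total support is zero; credibility is undefined")
    return [x / total for x in sup]  # second pass: divide each entry
```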

Next, the average trust value of the evidence is obtained. The calculation formula is as follows:

The algorithm is shown in Algorithm 6.

Algorithm 6
Require: The credibility vector, Crd(y); the set of basic probability, m(y)
Ensure: The average trust value
for each item […] do
    […];
end for
return the average trust value;
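
Algorithm 6 combines the credibility vector with the basic probabilities in a single loop. A natural reading, assumed here rather than confirmed by the text, is a credibility-weighted sum of the probabilities; the name `average_trust` is also an assumption.

```python
def average_trust(crd, m):
    """Credibility-weighted average of the basic probabilities.

    Assumption: the average trust value is sum(crd_i * m_i).
    """
    if len(crd) != len(m):
        raise ValueError("crd and m must have the same length")
    return sum(c * p for c, p in zip(crd, m))
```

Because the credibilities sum to 1, the result stays within the range of the input probabilities, and scores that disagree with the majority contribute less to it.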

Finally, we have our credit conflict detection model. Given a threshold, the average trust value of a record is compared with it: if the comparison is satisfied, the record has no conflict; otherwise, the record has a conflict. The algorithm is shown in Algorithm 7.

Algorithm 7
Require: The average trust value
Ensure: Conflict
if […] then
    return True;
end if
return False;
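
The final decision is a single comparison. The direction of the comparison and the threshold value are not recoverable from this text, so the sketch assumes that a record is flagged when its average trust value falls below the threshold; the name `has_conflict` and the default of 0.5 are hypothetical.

```python
def has_conflict(average_trust_value, threshold=0.5):
    """Return True when the record is judged to be in conflict.

    Assumptions: a low average trust value signals a conflict, and the
    default threshold of 0.5 is a placeholder, not a value from the paper.
    """
    return average_trust_value < threshold
```

In practice the threshold would need to be calibrated on records whose conflict status is already known.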

3. Conclusions

A large amount of new data is generated every day, and in credit research the credit evaluation of each credit subject changes dynamically with it. In future research, detecting and handling conflicts in dynamic data and improving the timeliness and accuracy of the model will become key issues in building conflict models. Because detection methods are diverse, no general model can be applied to all data, so extending the use of new technologies will also be one focus of conflict detection.

In this paper, we analyse clustering, outlier detection, machine learning, and other technologies and design a cross-platform, multilevel, multidimensional, and multigranularity service credit conflict detection model. The model deals with the semantic differences, calculation differences, false data, and other problems of credit indicators among platforms such as Internet finance, e-commerce, and health and elderly care. It is expected to effectively detect and handle credit information conflicts such as credit deviation and missing credit subject information.

In the future, given a certain number of labelled samples, artificial intelligence can play a greater role. This is a research direction well worth pursuing.

Data Availability

The data cannot be fully disclosed for the time being because it contains private data. The data structure and its implementation code have been uploaded to GitHub; please visit https://github.com/juckylv/Credit-data.

Conflicts of Interest

There is no conflict of interest regarding the publication of this paper.

Acknowledgments

This study was supported by the National Key R&D Program (Grant No. 2019YFB1404602) and Natural Science Research Project of Jiangsu Province Universities and Colleges (No. 21KJB520022).

References

[1] M. Lee, “Plastic pollution mitigation - net plastic circularity through a standardized credit system in Asia,” Ocean & Coastal Management, vol. 210, no. 1, article 105733, 2021.
[2] R. Iqbal, F. Doctor, B. More, S. Mahmud, and U. Yousuf, “Big data analytics and computational intelligence for cyber-physical systems: recent trends and state of the art applications,” Future Generation Computer Systems, vol. 105, pp. 766–778, 2020.
[3] J. Trávníček, J. Janoušek, B. Melichar, and L. Cleophas, “On modification of Boyer-Moore-Horspool's algorithm for tree pattern matching in linearised trees,” Theoretical Computer Science, vol. 830-831, pp. 60–90, 2020.
[4] S. Cruz and A. Aguiar, “MagLand: magnetic landmarks for road vehicle localization,” IEEE Transactions on Vehicular Technology, vol. 69, no. 4, pp. 3654–3667, 2020.
[5] G. Ding, S. Sun, and G. Wang, “Schema matching based on SQL statements,” Distributed and Parallel Databases, vol. 38, no. 1, pp. 193–226, 2020.
[6] M. C. Freund, J. A. Etzel, and T. S. Braver, “Neural coding of cognitive control: the representational similarity analysis approach,” Trends in Cognitive Sciences, vol. 25, no. 7, pp. 115–136, 2021.
[7] X. Zhang, R. Li, B. Zhang, Y. Yang, J. Guo, and X. Ji, “An instance-based learning recommendation algorithm of imbalance handling methods,” Applied Mathematics and Computation, vol. 351, pp. 204–218, 2019.
[8] S. Song, G. Gu, C. Ryu, S. Faro, T. Lecroq, and K. Park, “Fast algorithms for single and multiple pattern Cartesian tree matching,” Theoretical Computer Science, vol. 849, pp. 47–63, 2021.
[9] S. Munir, F. Khan, and M. A. Riaz, “An instance-based schema matching between opaque database schemas,” in Proceedings of the 4th International Conference on Engineering Technology and Technopreneuship, pp. 177–182, Kuala Lumpur, 2014.
[10] H. Zhao and S. Ram, “Combining schema and instance information for integrating heterogeneous data sources,” Data & Knowledge Engineering, vol. 61, no. 2, pp. 281–303, 2007.
[11] P. A. Bernstein, J. Madhavan, and E. Rahm, “Generic schema matching, ten years later,” in Proceedings of the 37th International Conference on Very Large Data Bases, pp. 695–701, Seattle, 2011.
[12] H. Yang, L. Shen, X. Dong, Q. Ding, P. An, and G. Jiang, “Low complexity CTU partition structure decision and fast intra mode decision for versatile video coding,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 4, pp. 1–18, 2019.
[13] M. Shrestha, T. X. Tran, B. Bhattarai, M. L. Pusey, and R. S. Aygun, “Schema matching and data integration with consistent naming on protein crystallization screens,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 15, no. 39, pp. 1–1, 2019.
[14] Z. Gao, Y. Wu, M. Harandi, and Y. Jia, “A robust distance measure for similarity-based classification on the SPD manifold,” IEEE Transactions on Neural Networks and Learning Systems, vol. 16, no. 29, pp. 1–15, 2019.
[15] T. Zhou, M. Chen, and J. Zou, “Reinforcement learning based data fusion method for multi-sensors,” IEEE/CAA Journal of Automatica Sinica, vol. 6, pp. 128–149, 2020.
[16] F. Y. Rao, J. Cao, E. Bertino, and M. Kantarcioglu, “Hybrid private record linkage,” ACM Transaction on Information and System Security, vol. 22, no. 3, pp. 1–36, 2019.
[17] K. Nozaki, T. Hochin, and H. Nomiya, “Semantic schema matching for string attribute with word vectors and its evaluation,” International Journal of Networked and Distributed Computing, vol. 7, pp. 100–106, 2019.
[18] J. A. Northrop and A. Papandreou-Suppappola, “Computationally efficient estimation of compound K-distribution sea clutter in thermal noise and its application to sea echo reflectivity observations,” IEEE Transactions on Aerospace and Electronic Systems, vol. 56, no. 3, pp. 2340–2350, 2020.
[19] J. Li, Z. Zhang, X. Li, and H. Chen, “Kernel-based learning for biomedical relation extraction,” Journal of the American Society for Information Science and Technology, vol. 59, no. 5, pp. 756–769, 2008.
[20] W. Liao, B. Zeng, X. Yin, and P. Wei, “An improved aspect-category sentiment analysis model for text sentiment analysis based on RoBERTa,” Applied Intelligence, vol. 4, pp. 1–12, 2020.
[21] S. Yang and C. Tan, “Detection of conflicts between resource authorization rules in extensible access control markup language based on dynamic description logic,” Ingénierie des Systèmes d'Information, vol. 25, no. 3, pp. 178–201, 2020.
[22] L. Li, F. Zhu, H. Sun, Y. Hu, Y. Yang, and D. Jin, “Multi-source information fusion and deep-learning-based characteristics measurement for exploring the effects of peer engagement on stock price synchronicity,” Information Fusion, vol. 69, no. 3, pp. 1–21, 2021.
[23] L. Ying, L. Junting, and G. Fuxiang, “Adaptive conflict detection algorithm based on Rochester Software Transactional Memory,” Journal of Physics: Conference Series, vol. 1746, no. 1, p. 012050, 2021.
[24] Y. Jin, W. Cao, M. Wu, and Y. Yuan, “Simplified outlier detection for improving the robustness of a fuzzy model,” Science China, vol. 63, no. 4, 2020.
[25] X. Qin, J. Li, W. Hu, and J. Yang, “Machine learning K-means clustering algorithm for interpolative separable density fitting to accelerate hybrid functional calculations with numerical atomic orbitals,” The Journal of Physical Chemistry A, vol. 124, no. 48, pp. 10066–10074, 2020.
[26] H. H. Pajouh, R. Javidan, R. Khayami, A. Dehghantanha, and K. K. R. Choo, “A two-layer dimension reduction and two-tier classification model for anomaly-based intrusion detection in IoT backbone networks,” IEEE Transactions on Emerging Topics in Computing, vol. 7, no. 2, pp. 314–323, 2019.
[27] L. Zhongbao and Z. Wenjuan, “Study on stellar spectral outlier mining based on fuzzy large margin and minimum ball classification model,” Spectroscopy and Spectral Analysis, vol. 36, no. 4, pp. 1245–1248, 2016.
[28] S. Cai, R. Sun, S. Hao, S. Li, and G. Yuan, “An efficient outlier detection approach on weighted data stream based on minimal rare pattern mining,” China Communications, vol. 16, no. 10, pp. 83–99, 2019.
[29] Q. Pu, K. Y. Ng, M. Zhou, and J. Wang, “A joint rogue access point localization and outlier detection scheme leveraging sparse recovery technique,” IEEE Transactions on Vehicular Technology, vol. 70, no. 2, pp. 1866–1877, 2021.
[30] L. Zhang, Y. Xie, X. Luan, and X. Zhang, “Multi-source heterogeneous data fusion,” in 2018 International Conference on Artificial Intelligence and Big Data (ICAIBD), IEEE, 2018.
[31] J. Li, I. Tobore, Y. Liu, A. Kandwal, L. Wang, and Z. Nie, “Non-invasive monitoring of three glucose ranges based on ECG by using DBSCAN-CNN,” IEEE Journal of Biomedical and Health Informatics, vol. 25, no. 9, pp. 3340–3350, 2021.
[32] S. Lee, J. Jung, I. Park, K. Park, and D. S. Kim, “A deep learning and similarity-based hierarchical clustering approach for pathological stage prediction of papillary renal cell carcinoma,” Computational and Structural Biotechnology Journal, vol. 18, no. 2, pp. 2639–2646, 2020.
[33] L. Yufeng, J. H. Tian, L. Jiyong, L. Xiaozhong, S. Zhiwei, and L. Min, “MR-BIRCH: a scalable MapReduce-based BIRCH clustering algorithm,” Journal of Intelligent & Fuzzy Systems, vol. 40, no. 3, pp. 1432–1451, 2021.
[34] P. Jia, X. Wang, and K. Zheng, “Distributed clock synchronization based on intelligent clustering in local area industrial IoT systems,” IEEE Transactions on Industrial Informatics, vol. 16, no. 6, pp. 3697–3707, 2020.
[35] M. Riahi-Madvar, A. Akbari Azirani, B. Nasersharif, and B. Raahemi, “A new density-based subspace selection method using mutual information for high dimensional outlier detection,” Knowledge-Based Systems, vol. 216, no. 2, pp. 106733–106733, 2021.
[36] O. Alghushairy, R. Alsini, X. Ma, and T. Soule, “A genetic-based incremental local outlier factor algorithm for efficient data stream processing,” in International Conference On Compute And Data Analysis (Iccda 2020), Silicon Valley, CA, USA, 2020.
[37] M. Yang and D. Ergu, “Anomaly detection of vehicle data based on LOF algorithm,” Frontiers in Signal Processing, vol. 4, no. 1, pp. 678–694, 2020.
[38] F. Liu, Y. Yu, P. Song, Y. Fan, and X. Tong, “Scalable KDE-based top-n local outlier detection over large-scale data streams,” Knowledge-Based Systems, vol. 204, no. 9, pp. 106186–106186, 2020.
[39] J. Jiang, Q. Ma, X. Jiang, and J. Ma, “Ranking list preservation for feature matching,” Pattern Recognition, vol. 111, no. 8, pp. 107665–107665, 2021.
[40] L. Chen, W. Wang, and Y. Yang, “CELOF: effective and fast memory efficient local outlier detection in high-dimensional data streams,” Applied Soft Computing, vol. 102, no. 12, pp. 107–129, 2021.

Copyright

Copyright © 2022 Xiaodong Zhang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
