Abstract

Differences in credit index calculation, semantic differences, false data, and other problems between platforms such as Internet finance, e-commerce, and health and elderly care cause credit values to deviate from the trusted range of credit subjects and leave the related information of credit subjects incomplete. To address these problems, we propose a crossplatform service credit conflict detection model based on the decision distance to support the transmission, integration, migration, and application of crossplatform credit information. First, we give a scoring table of influencing factors, in which each score is the probability that the factor affects credit. From these probabilities, the distance matrix between influencing factors is generated. Second, the similarity matrix is calculated from the distance matrix. Third, the support vector is calculated from the similarity matrix. Fourth, the credit vector is calculated from the support vector. Finally, the credibility is calculated from the credit vector and the probabilities.

1. Introduction

In recent years, with the development of the Internet, online services have emerged in all walks of life. The Internet lets users obtain the services they want through simple processes in many environments, but the virtual nature of the network also makes fraud easy to commit. This poses a challenge to the credit evaluation system of each platform. The requirement to accelerate the construction of the social credit system was put forward in the 12th Five-Year Plan and is explained more clearly in the 14th Five-Year Plan: strengthen the collection, sharing, disclosure, and application of credit information; promote credit products and services that benefit the people and facilitate enterprises; establish a sharing and integration mechanism for public credit information and financial information; cultivate internationally competitive enterprise credit investigation institutions and credit rating institutions; strengthen credit investigation supervision; and promote the healthy development of the credit service market. Now that big data technology is widely used, each platform organization meets these challenges by using the data it collects to calculate credit indicators and build its own credit evaluation system. However, this process has many problems: the collected information cannot fully evaluate and describe the credit indicators, errors and omissions arise when the information is collected and entered, and the platforms differ in their credit focus and evaluation models. As a result, the information and evaluation results for the same object differ between platforms, and there is no good coordination mechanism. The data are scattered, heterogeneous, and of low quality, so they are difficult to apply directly when judging the overall credit level of an object.
The Outline of the Plan for the Construction of the Social Credit System (2014-2020), issued by the State Council, states that “accelerating the construction of credit information system and improving the recording, integration and application of credit information are the basis and premise for the formation of trustworthy incentive and dishonest punishment mechanism.” From this point of view, solving the data problem in credit evaluation requires all platforms to establish a sound information exchange mechanism, gradually form a credit service network with wide coverage and complete categories, and build an objective, fair, reasonable, and balanced international credit rating system model.

The core of building a crossplatform credit index evaluation model is the fusion of multisource heterogeneous credit data, and the information conflict caused by data fusion is the focus of this research: the same information carries different attributes and different attribute names on each platform, the same attribute takes different values in different sources, and further discrepancies arise from the data collection methods. To sum up, the purpose of conflict detection modelling is to match attributes, resolve conflicts, clean up false data, and obtain data with unified standards, reliable sources, and strong authenticity, so that the subsequent credit models can be built efficiently and on authentic data.

When multisource data contain conflicting descriptions of the same attribute, pattern matching technology is used to reconcile the attributes from different sources [1, 2]. Pattern matching works at two levels: the pattern level and the instance level. Pattern-level methods analyse the correspondence between the attribute descriptions of different data sources, such as attribute name, abbreviation, or attribute storage type [3]. Similarity analysis at this level is simple, intuitive, and cheap in time, but the attribute descriptions carry little information, so it is difficult to use directly [4–6]; a unified standard has to be built before it can be used effectively. Instance-level methods perform feature extraction and similarity analysis on the data from different sources to obtain the mapping relationship between attributes. Their advantage is reduced dependence on domain knowledge, but they need good-quality data of sufficient scale [7]. In related research, pattern-based solutions are considered at three levels. The most intuitive is the language level, where research mainly considers the semantics of attribute descriptions [8–10]. For example, the COMA system determines the relationship between attributes by constructing an attribute language association graph; it takes attribute pairs as input and returns a value between 0 and 1 that measures their degree of similarity. Other studies further consider the attribute constraint level and the attribute structure level [11]. For example, the similarity flooding algorithm builds a graph of attribute descriptions and data types from the input information and obtains the mapping between attributes through iterative fixed-point computation [12].
However, data description information lacks a unified standard, is scarce, and is difficult to obtain, identify, and use, which hinders wide application. Instance-based solutions are therefore now mostly used: they obtain the mapping relationship between attributes through feature extraction and comparative analysis of data from different sources. The SMDD method uses neural network technology to find sets of elements with similar distributions according to the data instance information and returns the matching results with high similarity [13]. Mehdi et al. analysed the similarity of different types of data, which reduces the scale of the similarity matrix, and introduced the Google similarity distance to capture the semantic relationship between character-type data [14]. Zhou et al. established a pattern matching system based on the Hungarian algorithm to analyse the characteristics of multisource data and obtain the mapping relationship, which enhanced the generalization ability of the model. Natural language processing technology is also increasingly used for similarity analysis [15, 16]. For example, Nozaki et al. used the word2vec tool to compare and analyse the semantic similarity of string attributes in a data set, and Northrop et al. used BERT-based semantic similarity calculation to align knowledge base indexes [17, 18].
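The pattern-level idea of scoring attribute pairs between 0 and 1 can be illustrated with a minimal sketch. It is not the COMA system or any of the cited methods: it only compares attribute names with Python's standard `difflib.SequenceMatcher`, and the attribute names, the function names, and the 0.6 threshold are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Return a similarity in [0, 1] between two attribute names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_attributes(source, target, threshold=0.6):
    """Map each source attribute to its most similar target attribute.

    Pairs whose best similarity is below `threshold` are left unmatched.
    """
    mapping = {}
    for s in source:
        best = max(target, key=lambda t: name_similarity(s, t))
        if name_similarity(s, best) >= threshold:
            mapping[s] = best
    return mapping


# Hypothetical attribute names from two platforms.
platform_a = ["user_name", "birth_date", "credit_score"]
platform_b = ["username", "date_of_birth", "creditScore", "address"]

mapping = match_attributes(platform_a, platform_b)
print(mapping)
```

In this sketch `birth_date` and `date_of_birth` stay unmatched even though they describe the same attribute, which illustrates why name-level descriptions alone are hard to use directly and why instance-level analysis is often preferred.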

To detect data conflicts in multisource data fusion, the conflicting points are regarded as outliers, and point outlier detection technology is used to detect and process the conflict [19–22]. In traditional data mining, outlier detection is carried out using statistical, clustering, classification, proximity, and other methods [23–29]. These methods are effective, simple, and direct, but they rely on prior knowledge, and their results depend directly on the quality of that knowledge. Some studies have also applied machine learning to this field, learning the attributes in which conflicts arise and replacing the conflicting values with predicted results. Numerical conflict detection based on outlier detection looks for data objects whose distribution differs significantly from that of the other data. The traditional statistical approach assumes a known data distribution and classifies the data that do not conform to it as outliers, but it requires a priori knowledge that is difficult to obtain, which limits its practical use. In practice, outlier detection methods based on clustering or classification are used more often, and common clustering algorithms such as DBSCAN [30, 31] and BIRCH [32, 33] can be readily applied to outlier detection. Clustering algorithms, however, depend on the authenticity of the data sources. Jia et al. used a clustering-based outlier detection algorithm to clean wrong and missing data in a medical database, showing better performance than the distance-based algorithm. Another idea is proximity-based outlier detection, which determines outliers by calculating the distance or density of data points [34]. Riahi-Madvar et al. used LOF (local outlier factor) as the measure; it represents the local outlier degree of an object: the k-distance neighbourhood is determined, the local reachability density of the object is calculated from it, and the outlier degree of the data point is obtained [35].
Algorithms that use LOF as the measure also have many applications [36, 37]. Building on this work, Liu et al. constructed a local outlier degree measurement method based on LOF outlier mining, which reduces the complexity of the algorithm and, by relying on local characteristics, avoids misjudgement to a certain extent [38, 39]. Because it does not depend on the characteristics of the overall data distribution, it is well suited to anomaly detection in data with different density distributions [40].
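The LOF computation described above (k-distance neighbourhood, local reachability density, outlier degree) can be sketched in a few lines. This is a plain illustration of the standard LOF definition on one-dimensional values, not the implementation used in any of the cited works; the function name, the sample values, and the choice k = 2 are assumptions. Values are assumed to be distinct enough that no reachability sum is zero.

```python
def lof_scores(values, k=2):
    """Return the local outlier factor of each value (1-D data)."""
    n = len(values)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(values)")

    dist = [[abs(a - b) for b in values] for a in values]

    k_dist = []   # distance from each point to its k-th nearest neighbour
    neigh = []    # k-distance neighbourhood of each point
    for i in range(n):
        others = sorted((dist[i][j], j) for j in range(n) if j != i)
        kd = others[k - 1][0]
        k_dist.append(kd)
        neigh.append([j for d, j in others if d <= kd])

    def lrd(i):
        # Local reachability density: inverse of the mean reachability
        # distance from point i to its neighbours.
        total = sum(max(k_dist[o], dist[i][o]) for o in neigh[i])
        return len(neigh[i]) / total

    lrds = [lrd(i) for i in range(n)]

    # LOF: mean ratio of the neighbours' density to the point's own density.
    return [
        sum(lrds[o] for o in neigh[i]) / (len(neigh[i]) * lrds[i])
        for i in range(n)
    ]


# Hypothetical values of one attribute reported by five sources.
scores = lof_scores([10, 11, 12, 13, 50], k=2)
print(scores)
```

Points inside the dense group receive a factor close to 1, while the isolated value receives a much larger factor and would be treated as the conflicting record.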

2.1. The Credit Conflict Detection Model

In order to effectively apply decision distance measurement and probability distance matrix to deal with multisource conflict evidence, it is necessary to convert multisource conflict evidence into decision distance measurement value and probability distance matrix. On this basis, the similarity matrix, support vector, reliability, and average trust function are obtained. The treatment process is as follows.

Each record of credit data is represented by a company attribute or personal information attribute, such as age. We use a multisource group to represent

For any record y, we need to calculate its basic probability m(y):

The algorithm is shown in Algorithm 1, and the judgment conditions are based on Table 1.

Algorithm 1
Require:
 The set of credit data, y;
Ensure: m(y)
if … then
  …;
end if
if … then
  …;
end if
if … then
  …;
end if
if … then
  …;
end if
...
return m(y);
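The rule-based assignment in Algorithm 1 can be sketched as a lookup over a scoring table. Table 1 is not reproduced here, so every factor, condition, and score below is an invented placeholder rather than a value from the paper; only the structure (a chain of conditions, each yielding a probability) follows the description.

```python
# Hypothetical stand-in for Table 1: each rule is (factor, condition, score).
# The factors, conditions and scores are invented for illustration only.
RULES = [
    ("age", lambda r: 25 <= r["age"] <= 60, 0.8),
    ("age", lambda r: not 25 <= r["age"] <= 60, 0.4),
    ("overdue_count", lambda r: r["overdue_count"] == 0, 0.9),
    ("overdue_count", lambda r: r["overdue_count"] > 0, 0.3),
]


def basic_probability(record):
    """Return {factor: score}, taking the first matching rule per factor."""
    m = {}
    for factor, condition, score in RULES:
        if factor not in m and condition(record):
            m[factor] = score
    return m


# Hypothetical record y.
record = {"age": 42, "overdue_count": 2}
m = basic_probability(record)
print(m)
```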

After obtaining the basic probability vector, we need to calculate the distance matrix D(y):

Each element of D(y) is calculated as follows:

The algorithm is shown in Algorithm 2.

The similarity matrix S(y) is as follows:

The algorithm is shown in Algorithm 3.

Algorithm 2
Require:
 The set of basic probability, m(y);
Ensure: D(y)
for each item do
  for each item do
    …;
  end for
end for
return D(y)

Algorithm 3
Require:
 The distance matrix, D(y);
Ensure: S(y)
for each item do
  for each item do
    …;
  end for
end for
return S(y)
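The double loops of Algorithms 2 and 3 can be sketched as follows. The element formulas are not recoverable from the text above, so this sketch assumes the distance between two factors is the absolute difference of their probabilities and the similarity is one minus the distance; both are illustrative assumptions, not the paper's decision distance. The function names and sample probabilities are also hypothetical.

```python
def distance_matrix(m):
    """Pairwise distances between basic probabilities.

    Assumption: d_ij = |m_i - m_j| (placeholder for the decision distance).
    """
    return [[abs(mi - mj) for mj in m] for mi in m]


def similarity_matrix(d):
    """Similarity derived from distance. Assumption: s_ij = 1 - d_ij."""
    return [[1.0 - dij for dij in row] for row in d]


# Hypothetical basic probabilities of three influencing factors.
m = [0.9, 0.8, 0.2]
D = distance_matrix(m)
S = similarity_matrix(D)

for row in S:
    print([round(x, 2) for x in row])
```

Under these assumptions both matrices are symmetric, the diagonal of D is zero, and the diagonal of S is one.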

The cells in each column of the similarity matrix are summed to obtain the support vector Sup(y). The calculation formula is as follows:

The algorithm is shown in Algorithm 4.

Algorithm 4
Require:
 The similarity matrix, S(y);
Ensure: Sup(y)
for each item do
  for each item do
    …;
  end for
end for
return Sup(y)
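The column sum described above can be sketched directly. The function name and the sample similarity matrix are assumptions; the sketch sums every cell of a column, as the text states, including the diagonal.

```python
def support_vector(s):
    """Sum each column of the similarity matrix s (a list of rows)."""
    n = len(s)
    return [sum(s[i][j] for i in range(n)) for j in range(n)]


# Hypothetical similarity matrix for three influencing factors.
S = [
    [1.0, 0.9, 0.3],
    [0.9, 1.0, 0.4],
    [0.3, 0.4, 1.0],
]
sup = support_vector(S)
print([round(x, 2) for x in sup])
```

A factor that is similar to most of the others receives a high support value, while a factor that disagrees with the rest receives a low one.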

The credibility vector Crd(y) is obtained by normalizing the support vector. The calculation formula is as follows:

The algorithm is shown in Algorithm 5.

Algorithm 5
Require:
 The support vector, Sup(y);
Ensure: Crd(y)
for each item do
  …
end for
for each item do
  …;
end for
return Crd(y)
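The two passes of Algorithm 5 can be read as accumulating a total and then dividing by it. The sketch below assumes that the normalization divides each support value by the sum of all support values, so that the credibilities add up to one; the function name and sample values are hypothetical.

```python
def credibility_vector(sup):
    """Normalize the support vector so that its entries sum to 1."""
    total = sum(sup)              # first pass: accumulate the total support
    if total == 0:
        raise ValueError("support vector sums to zero")
    return [s / total for s in sup]   # second pass: divide by the total


# Hypothetical support vector.
sup = [2.2, 2.3, 1.7]
crd = credibility_vector(sup)
print([round(c, 4) for c in crd])
```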

Finally, the average trust value of the evidence is obtained from the credibility vector and the basic probabilities. The calculation formula is as follows:

The algorithm is shown in Algorithm 6.

Algorithm 6
Require:
 The credibility vector, Crd(y);
 The set of basic probability, m(y);
Ensure: Crd(y)
for each item do
  …
end for
return;
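The single loop of Algorithm 6 can be sketched as a weighted sum. It is assumed here that the average trust value is the sum of each basic probability multiplied by the credibility of the same factor; the function name and sample values are hypothetical.

```python
def average_trust_value(crd, m):
    """Credibility-weighted average of the basic probabilities."""
    if len(crd) != len(m):
        raise ValueError("crd and m must have the same length")
    return sum(c * p for c, p in zip(crd, m))


# Hypothetical credibility vector and basic probabilities.
crd = [0.4, 0.4, 0.2]
m = [0.9, 0.8, 0.2]
avg = average_trust_value(crd, m)
print(round(avg, 2))
```

Because the credibilities sum to one, the result always lies between the smallest and the largest basic probability.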

Finally, we obtain the credit conflict detection model: the average trust value of a record is compared with a given threshold to decide whether the record has a conflict. The algorithm is shown in Algorithm 7.

Algorithm 7
Require:
 The average trust value;
Ensure: Conflict
if … then
  return True;
end if
return False;
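The whole chain from basic probabilities to the conflict decision can be sketched end to end. Several choices here are assumptions rather than the paper's definitions: the distance is the absolute difference of probabilities, the similarity is one minus the distance, a record is flagged when its average trust value falls below the threshold, and the threshold 0.5 is arbitrary.

```python
def average_trust(m):
    """Average trust value of a list of basic probabilities."""
    n = len(m)
    # Similarity matrix (assumption: 1 - |m_i - m_j|).
    sim = [[1.0 - abs(mi - mj) for mj in m] for mi in m]
    # Support vector: column sums of the similarity matrix.
    sup = [sum(sim[i][j] for i in range(n)) for j in range(n)]
    # Credibility vector: support normalized to sum to 1.
    total = sum(sup)
    crd = [s / total for s in sup]
    # Credibility-weighted average of the probabilities.
    return sum(c * p for c, p in zip(crd, m))


def has_conflict(m, threshold=0.5):
    """Assumption: a record conflicts when its average trust is too low."""
    return average_trust(m) < threshold


# Hypothetical records described by three influencing factors.
print(has_conflict([0.9, 0.8, 0.2]))
print(has_conflict([0.1, 0.2, 0.9]))
```

In practice the threshold and the direction of the comparison would have to be taken from the model definition and tuned on labelled records.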

3. Conclusions

A large amount of new data is generated every day. In credit research, the credit evaluation of each credit subject therefore changes dynamically with the new data. In future research, detecting and processing conflicts in dynamic data and improving the timeliness and accuracy of the model will become key issues in conflict model construction. Because detection methods are diverse and no general model can be applied to all data, wider use of new technologies will also be one of the focuses of conflict detection.

In this paper, we analyse clustering, outlier detection, machine learning, and other technologies and design a crossplatform, multilevel, multidimensional, and multigranularity service credit conflict detection model. The model is intended to deal with the semantic differences, calculation differences, false data, and other problems of credit indicators among platforms such as Internet finance, e-commerce, and health and elderly care. It is expected to effectively detect and deal with credit information conflicts such as credit deviation and missing credit subject information.

In the future, given a certain number of labelled samples, artificial intelligence can play a greater role in conflict detection, which is a research direction well worth pursuing.

Data Availability

The data cannot be fully disclosed for the time being because it contains private data. The data structure and its implementation code have been uploaded to GitHub; please visit https://github.com/juckylv/Credit-data.

Conflicts of Interest

There is no conflict of interest regarding the publication of this paper.

Acknowledgments

This study was supported by the National Key R&D Program (Grant No. 2019YFB1404602) and Natural Science Research Project of Jiangsu Province Universities and Colleges (No. 21KJB520022).