Complexity

Research Article

Application of the Variable Precision Rough Sets Model to Estimate the Outlier Probability of Each Element

Table 4

Characteristics of the RS-based methods compared to the limitations of conventional methods.

Comparison to STATISTICAL and DISTANCE-BASED METHODS

(i) Applicable to datasets with a mixture of continuous and discrete attributes, because equivalence relations provide a natural way to discretise continuous data.

(ii) Neither knowing the data distribution nor establishing data distance criteria is required.

(iii) Specifically, for , the quadratic time complexity of most distance-based methods is overcome.

(iv) The dimensionality and dataset size do not limit the execution of the algorithms.
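To illustrate points (i) and (iii), the following minimal Python sketch groups a small mixed-type dataset into equivalence classes of the indiscernibility relation. It rests on stated assumptions: equal-width binning is only one possible discretisation, and the names `discretise` and `partition` are hypothetical, not taken from FIND_OUTLIER_REGION or βμ_PROB. Because each row is hashed once, grouping n rows with m attributes takes expected O(n·m) time, whereas a naive distance-based method evaluates n(n-1)/2 pairwise distances.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def discretise(value: float, lo: float, hi: float, bins: int) -> int:
    """Map a continuous value to an equal-width bin index in [0, bins - 1]."""
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * bins)
    return min(idx, bins - 1)  # the maximum value falls into the last bin


def partition(rows: Sequence[Sequence[object]],
              continuous: Sequence[int],
              bins: int = 3) -> List[List[int]]:
    """Group row indices into equivalence classes (hypothetical helper).

    Continuous attributes (given by column index) are discretised first;
    discrete attributes are compared as-is. Rows with identical keys are
    indiscernible and end up in the same class.
    """
    if not rows:
        return []
    # Range of each continuous column, needed for equal-width binning.
    ranges = {c: (min(r[c] for r in rows), max(r[c] for r in rows))
              for c in continuous}
    classes: Dict[Tuple[object, ...], List[int]] = defaultdict(list)
    for i, row in enumerate(rows):
        key = tuple(
            discretise(v, *ranges[c], bins) if c in ranges else v
            for c, v in enumerate(row)
        )
        classes[key].append(i)  # one hash lookup per row, no pairwise distances
    return list(classes.values())
```

For example, the rows (1.0, "red"), (1.2, "red"), (5.0, "blue"), (9.8, "red") and (1.1, "blue") with column 0 treated as continuous yield the classes [[0, 1], [2], [3], [4]]: only the first two rows share both a bin and a colour.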

Comparison to DENSITY- and DEPTH-BASED METHODS

(i) There is no need to establish data density criteria in the dataset.

(ii) The dimensionality of the dataset does not limit the execution of the algorithm.

(iii) No time-consuming calculations are necessary, such as computing the convex hull required by most depth-based methods.

(iv) FIND_OUTLIER_REGION and βμ_PROB provide unsupervised results without requiring the user to preset specific analysis parameters before running the algorithm, as density-based methods such as DBSCAN require (e.g., its neighbourhood radius ε and minimum number of points MinPts).

(v) Pawlak rough sets and VPRS offer lower time complexity than depth-based methods.
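To make the VPRS side concrete, the sketch below computes β-lower and β-upper approximations using Ziarko's relative classification error c(E, X) = 1 - |E ∩ X| / |E|, under the convention β ∈ [0, 0.5); other papers parameterise β as an inclusion threshold instead. Each equivalence class is examined once, with no geometric computation such as a convex hull. The function names are hypothetical, β is taken as an input purely for illustration, and with β = 0 the result reduces to the Pawlak lower and upper approximations.

```python
from fractions import Fraction
from typing import List, Set, Tuple


def misclassification(block: Set[int], target: Set[int]) -> Fraction:
    """Relative classification error c(E, X) = 1 - |E ∩ X| / |E|."""
    return 1 - Fraction(len(block & target), len(block))


def vprs_approximations(classes: List[Set[int]], target: Set[int],
                        beta: Fraction) -> Tuple[Set[int], Set[int]]:
    """Return (beta-lower, beta-upper) approximations of `target`.

    Assumes Ziarko's convention 0 <= beta < 1/2. Exact fractions avoid
    floating-point surprises at the threshold boundaries.
    """
    if not 0 <= beta < Fraction(1, 2):
        raise ValueError("beta must lie in [0, 0.5)")
    lower: Set[int] = set()
    upper: Set[int] = set()
    for block in classes:  # a single pass over the equivalence classes
        c = misclassification(block, target)
        if c <= beta:       # class is included in X up to error beta
            lower |= block
        if c < 1 - beta:    # class overlaps X beyond error 1 - beta
            upper |= block
    return lower, upper
```

With classes {0, 1, 2, 3}, {4, 5}, {6, 7, 8, 9} and X = {0, 1, 2, 4, 9}, β = 1/4 places the first class in the lower approximation and adds {4, 5} to the upper one, so {4, 5} forms the boundary region; with β = 0 the lower approximation is empty and the upper approximation is the whole universe.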

Comparison to METHODS BASED ON NEURAL NETWORKS

(i) No time-consuming preliminary processes are required, such as the network training that some neural network models need in order to learn.

(ii) The dimensionality of the dataset does not limit the execution of the algorithms.

(iii) The functionality of the algorithms does not depend on data density criteria, in contrast to some supervised models.

(iv) There is no need to model the data distribution, in contrast to some supervised models.

(v) Some approaches based on supervised networks rely on thresholds for various purposes during the outlier detection process. The FIND_OUTLIER_REGION and βμ_PROB algorithms are designed to avoid this.

Comparison to GENERAL OUTLIER DETECTION METHODS

(i) In contrast to most detection methods, which require successive executions of the algorithm until the set of outliers that actually meets the analysis criteria is obtained, the βμ_PROB algorithm determines the outlier probability of each element from a specific universe of data in a single, unsupervised run.
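The single-run, per-element character of such a result can be illustrated with Pawlak's rough membership function μ_X(x) = |[x] ∩ X| / |[x]|, which assigns every element a value in [0, 1]. This sketch is not the βμ_PROB algorithm, whose formula is not reproduced here; the function name is hypothetical, and the example only shows that one pass over the equivalence classes scores the whole universe without repeated executions.

```python
from fractions import Fraction
from typing import Dict, List, Set


def rough_membership(classes: List[Set[int]],
                     target: Set[int]) -> Dict[int, Fraction]:
    """Pawlak rough membership mu_X(x) = |[x] ∩ X| / |[x]| for every x.

    Each equivalence class is visited once and its value is shared by all
    of its members, so every element receives a value in a single run.
    """
    mu: Dict[int, Fraction] = {}
    for block in classes:
        value = Fraction(len(block & target), len(block))
        for x in block:
            mu[x] = value
    return mu
```

For the classes {0, 1, 2, 3}, {4, 5}, {6, 7, 8, 9} and X = {0, 1, 2, 4, 9}, elements 0 to 3 receive 3/4, elements 4 and 5 receive 1/2, and elements 6 to 9 receive 1/4.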