Research Article

Application of the Variable Precision Rough Sets Model to Estimate the Outlier Probability of Each Element

Table 5

Comparative table of RS-based algorithms.

Algorithm | Advantages | Disadvantages

Pawlak rough sets algorithm
Advantages:
(i) Shows the computational viability of the Pawlak rough sets-based detection method.
(ii) Linear temporal and spatial complexity with respect to the cardinality of the dataset.
Disadvantages:
(i) Deterministic classification.
(ii) The user must define the outlier threshold.

VPRS algorithm
Advantages:
(i) Shows the computational viability of the VPRS-based detection method.
(ii) Linear temporal and spatial complexity with respect to the cardinality of the dataset.
(iii) Nondeterministic classification.
Disadvantages:
(i) The user must define the outlier threshold and the classification error.
(ii) An inadequate selection of the error may lead to unsatisfactory results; choosing it well requires sufficient knowledge of specific aspects of the dataset.

FIND_OUTLIER_REGION algorithm
Advantages:
(i) Shows the computational viability of the βμ Method.
(ii) Maintains the nondeterminism of VPRS.
(iii) Any specific result that could be obtained with the Pawlak rough sets and VPRS algorithms can be determined from the result obtained.
(iv) The obtained region allows us to establish a stochastic approach to solving the problem of determining the outlier probability of a given element from a given dataset.
(v) It is especially suitable when the outlier condition of the elements of the dataset must be determined for a given set of threshold values.
Disadvantages:
(i) Temporal complexity:
in the worst case.
(ii) Spatial complexity:
in the worst case.

βμ_PROB algorithm
Advantages:
(i) Shows the computational viability of the method defined.
(ii) Maintains the nondeterminism of VPRS.
(iii) Has the same advantages as the FIND_OUTLIER_REGION algorithm.
(iv) The user does not need to define the outlier threshold, the allowed classification error, or other criteria, such as distance or density.
(v) No specific knowledge of the dataset, such as its distribution, is required.
(vi) The result obtained is more general than that obtained with Pawlak rough sets and VPRS.
(vii) Is valid for datasets with mixed types of attributes (continuous and discrete).
Disadvantages:
(i) Temporal complexity:
in the worst case.
(ii) Spatial complexity:
in the worst case.
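
The contrast the table draws between the Pawlak and VPRS rows (no tolerance versus a user-defined classification error) can be illustrated with a minimal sketch. It assumes the standard textbook definitions: the Pawlak lower approximation keeps only equivalence classes fully contained in the target set, while the VPRS β-lower approximation also keeps classes whose misclassification error is at most β, with 0 ≤ β < 0.5. This is not the authors' detection algorithm; the function names, the grouping key, and the toy data are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def equivalence_classes(universe, key):
    """Group elements that are indiscernible under `key` (hypothetical helper)."""
    classes = defaultdict(set)
    for x in universe:
        classes[key(x)].add(x)
    return list(classes.values())


def misclassification_error(block, target):
    """Fraction of `block` that lies outside `target`."""
    return 1 - Fraction(len(block & target), len(block))


def lower_approximation(classes, target, beta=0):
    """Union of the classes whose error relative to `target` is at most `beta`.

    beta = 0 reproduces the classical Pawlak lower approximation;
    0 < beta < 0.5 gives the VPRS beta-lower approximation.
    """
    result = set()
    for block in classes:
        if misclassification_error(block, target) <= beta:
            result |= block
    return result


# Toy data (assumed for illustration): integers 0..11, indiscernible in blocks of four.
classes = equivalence_classes(range(12), key=lambda x: x // 4)
target = {0, 1, 2, 3, 4, 5, 6, 8}

pawlak_lower = lower_approximation(classes, target)
vprs_lower = lower_approximation(classes, target, beta=Fraction(1, 4))
```

With this toy data the block {4, 5, 6, 7} has an error of 1/4 with respect to the target, so it is excluded under the Pawlak definition but included once β = 1/4 is allowed. This also shows why the choice of β matters: a slightly smaller value would leave the result identical to the Pawlak case.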