Research Article

A Dynamic Ensemble Framework for Mining Textual Streams with Class Imbalance

Algorithm 1

Clustering forest.
Output: Class label of the testing instance
Input: : The data chunk at each time stamp
    : The number of CTs in CF
    : The maximum number of CTs selected in the clustering forest
    : The current ensemble model
    : Misclassified subset at the -th time stamp
    : Rare-class subset at the -th time stamp.
    : The Rare-class subset as a proportion of the training chunk; : The threshold of
 (1) at the -th time stamp
 (2) built the training set by Algorithm 2
 (2) create a new CT using
 (3) if
 (4)
 (5) Endif
 (6) if
 (7)
 (8) Endif
 (9) compute the accuracy weight of each clustering tree
 (10) UPDATE( ) based on the adaptive selection method
 (11) obtain the misclassified subset and the rare-class subset
 (12) for each testing instance do:
 (13) PREDICT( ) by the voting method.
 (14) Endfor.