The Scientific World Journal

Research Article

Simple-Random-Sampling-Based Multiclass Text Classification Algorithm

Algorithm 1: Pseudo-code of the SRSMTC algorithm.

//SRSMTC: SRS-based MTC Algorithm
//the order n of the n-grams
Int n;
//the simple random sampling rate
Float r;
//the number of categories
Int m;
//the token level memory
TLM tlm;
//extract tokens based on overlapping word-level n-grams model
String[] Tokenizer(Document d)
//sample tokens based on the simple random sampling rate r
String[] SimpleRandomSampler(String[] T)
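The Tokenizer and SimpleRandomSampler declarations above give only signatures. A minimal Python sketch of what they might look like follows. It rests on several assumptions that the listing does not state: documents are plain strings split on whitespace, only n-grams of exactly order n are emitted, and sampling draws a fixed-size subset without replacement. The function names, the `seed` parameter, and the rounding rule are hypothetical choices made for illustration.

```python
import random
from typing import List, Optional


def tokenize(document: str, n: int = 2) -> List[str]:
    """Return overlapping word-level n-grams (assumes whitespace-separated words)."""
    words = document.split()
    # A document with fewer than n words yields an empty list.
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def simple_random_sample(tokens: List[str], rate: float,
                         seed: Optional[int] = None) -> List[str]:
    """Draw round(rate * len(tokens)) tokens uniformly, without replacement.

    Assumption: the sampling rate fixes the sample size; a per-token
    Bernoulli draw would be an equally plausible reading of the listing.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    k = round(rate * len(tokens))
    return random.Random(seed).sample(tokens, k)
```

For example, `tokenize("a b c d", 2)` yields the three bigrams `"a b"`, `"b c"`, and `"c d"`, and sampling them at rate 1.0 returns all three in some random order.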
//compute conditional probability P(c_i | token) for each of the m categories
Float[] BayesianPredictor(String token)
(1) Float[] P := new Float[m];
(2) Loop: For Each Int i in [1, m] Do:
  (2.1) P[i] := tlm.TF(c_i, token)/tlm.DF(c_i);
(3) Float sum := Sum(P); //add the floats to a sum
(4) Loop: For Each Int i in [1, m] Do:
  (4.1) P[i] := P[i]/sum;
(5) Output P.
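The BayesianPredictor steps above can be sketched in Python as follows. The token level memory is represented here by two plain containers, `tf` (token to per-category counts) and `df` (per-category document counts). That representation is an assumption, as is the handling of unseen tokens and empty categories, which the pseudo-code leaves undefined.

```python
from typing import Dict, List


def bayesian_predictor(token: str,
                       tf: Dict[str, List[int]],
                       df: List[int]) -> List[float]:
    """Return a normalized per-category score vector for one token."""
    m = len(df)
    counts = tf.get(token, [0] * m)
    # Step (2.1): term frequency divided by document frequency per category.
    # Assumption: a category with no training documents contributes 0.
    ratios = [counts[i] / df[i] if df[i] > 0 else 0.0 for i in range(m)]
    total = sum(ratios)
    if total == 0:
        # Assumption: an unseen token carries no evidence, so return uniform.
        return [1.0 / m] * m
    # Step (4.1): normalize so the scores sum to 1.
    return [x / total for x in ratios]
```

With `tf = {"good": [3, 1]}` and `df = [2, 2]`, the per-category ratios are 1.5 and 0.5, so the token "good" scores 0.75 for the first category and 0.25 for the second.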
//SRSMTC.T: Training Procedure of SRSMTC
SRSMTC.T(Document[] D; Category[] C)
(1) Int k := D.size; //get the number of training documents
(2) Loop: For Each Int j in [1, k] Do:
  (2.1) Document d := D[j];
  (2.2) Category c := C[j];
  (2.3) String[] T := Tokenizer(d);
  (2.4) String[] S := SimpleRandomSampler(T);
  (2.5) tlm.DF(c) := tlm.DF(c) + 1;
  (2.6) Loop: For Each String t in S Do:
    (2.6.1) If tlm.contain(t) Then: tlm.TF(c, t) := tlm.TF(c, t) + 1;
    (2.6.2) Else:
      (2.6.2.1) tlm.TF(c, t) := 1;
      (2.6.2.2) tlm.TF(c', t) := 0; //c' means all other categories
      (2.6.2.3) tlm.put(t).
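As an illustration of the training procedure above, the following Python sketch builds the token level memory from documents that have already been tokenized and sampled. Categories are assumed to be integer indices starting at 0, and the memory is assumed to be a dictionary plus a list; the listing does not specify either representation, and the function name is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def train(samples: Iterable[Tuple[List[str], int]],
          m: int) -> Tuple[Dict[str, List[int]], List[int]]:
    """Build (tf, df) from (sampled_tokens, category_index) pairs."""
    tf: Dict[str, List[int]] = {}   # token -> occurrence count per category
    df: List[int] = [0] * m         # training documents seen per category
    for tokens, c in samples:
        df[c] += 1                  # step (2.5)
        for t in tokens:            # step (2.6)
            if t not in tf:
                # Steps (2.6.2.x): register a new token, zero everywhere.
                tf[t] = [0] * m
            tf[t][c] += 1           # steps (2.6.1) and (2.6.2.1)
    return tf, df
```

Because the inner loop runs over every sampled token, a token that occurs twice in one document is counted twice, which matches the loop in step (2.6).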
//SRSMTC.P: Predicting Procedure of SRSMTC
Category[] SRSMTC.P(Document[] D)
(1) Int k := D.size; //get the number of testing documents
(2) Category[] C := new Category[k];
(3) Loop: For Each Int j in [1, k] Do:
  (3.1) Document d := D[j];
  (3.2) String[] T := Tokenizer(d);
  (3.3) Float[] ep := new Float[m];
  (3.4) Loop: For Each String t in T Do:
    (3.4.1) Float[] P := BayesianPredictor(t);
    (3.4.2) Loop: For Each Int i in [1, m] Do:
      (3.4.2.1) ep[i] := ep[i] + P[i];
  (3.5) Float sum := Sum(ep); //add the floats to a sum
  (3.6) Loop: For Each Int i in [1, m] Do:
    (3.6.1) ep[i] := ep[i]/sum;
  (3.7) Int index := Math.max(ep).getIndex;
  (3.8) C[j] := c_index;
(4) Output C.
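The per-document part of the predicting procedure above (steps 3.3 to 3.7) can be sketched in Python as follows. The per-token predictor is passed in as a function so that the example stays self-contained; that signature, the zero-based category indices, and the tie-breaking rule (lowest index wins) are assumptions not found in the listing.

```python
from typing import Callable, List, Sequence, Tuple


def predict(tokens: Sequence[str],
            predictor: Callable[[str], List[float]],
            m: int) -> Tuple[int, List[float]]:
    """Return (best category index, normalized ensemble scores) for one document."""
    ep = [0.0] * m
    for t in tokens:                 # step (3.4)
        p = predictor(t)
        for i in range(m):           # step (3.4.2)
            ep[i] += p[i]
    total = sum(ep)                  # step (3.5)
    if total > 0:                    # assumption: skip normalization if no evidence
        ep = [x / total for x in ep]
    # Step (3.7): index of the maximum score; ties go to the lowest index.
    index = max(range(m), key=ep.__getitem__)
    return index, ep


# Usage with a hypothetical lookup table standing in for BayesianPredictor.
table = {"x": [0.75, 0.25], "y": [0.5, 0.5]}
best, scores = predict(["x", "y"], lambda t: table[t], 2)
```

In this usage the accumulated scores are 1.25 and 0.75, which normalize to 0.625 and 0.375, so the first category is selected.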