Research Article

Probing for Sparse and Fast Variable Selection with Model-Based Boosting

Table 1

Number of selected variables (diagonal entries) and pairwise intersection sizes (off-diagonal entries) for four variable selection techniques on three gene expression data sets: boosting with the stopping iteration chosen by 25-fold bootstrap (labelled "Cross-validation"), probing, stability selection, and the lasso tuned by 10-fold cross-validation. The last column gives each algorithm's runtime in seconds.

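| Data set / method     | Cross-validation | Probing | Stability selection | Lasso (glmnet) | Runtime (sec.) |
|-----------------------|------------------|---------|---------------------|----------------|----------------|
| Colon cancer          |                  |         |                     |                |                |
| Cross-validation      | 9                |         |                     |                | 10.52          |
| Probing               | 5                | 5       |                     |                | 1.78           |
| Stability selection   | 3                | 3       | 3                   |                | 49.4           |
| Lasso (glmnet)        | 7                | 5       | 3                   | 7              | 0.4            |
| Breast carcinoma      |                  |         |                     |                |                |
| Cross-validation      | 32               |         |                     |                | 24             |
| Probing               | 14               | 14      |                     |                | 4.39           |
| Stability selection   | 1                | 1       | 1                   |                | 102.28         |
| Lasso (glmnet)        | 14               | 14      | 1                   | 14             | 1.13           |
| Riboflavin production |                  |         |                     |                |                |
| Cross-validation      | 50               |         |                     |                | 14.2           |
| Probing               | 10               | 10      |                     |                | 6.89           |
| Stability selection   | 5                | 5       | 5                   |                | 66.46          |
| Lasso (glmnet)        | 23               | 7       | 4                   | 30             | 0.68           |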
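The entries of Table 1 follow a simple rule. A diagonal cell counts the variables a method selects, and an off-diagonal cell counts the variables two methods both select. The short Python sketch below shows how such a lower-triangular matrix can be computed from each method's set of selected variables. The method names mirror the table. The helper names `intersection_table` and `format_table` and the selected-variable sets are hypothetical, chosen only for illustration. They are not the data behind Table 1.

```python
# Minimal sketch: build a Table 1-style intersection matrix from the sets of
# variables selected by each method. The sets below are HYPOTHETICAL and are
# not the selections reported in Table 1.


def intersection_table(selections):
    """Return {(row, col): count} for the lower triangle including the diagonal.

    Diagonal entries (row == col) give the number of variables a method selects;
    off-diagonal entries give the size of the overlap between two methods.
    """
    methods = list(selections)
    table = {}
    for i, row in enumerate(methods):
        for col in methods[: i + 1]:
            table[(row, col)] = len(selections[row] & selections[col])
    return table


def format_table(selections):
    """Render the lower-triangular matrix as fixed-width plain text."""
    methods = list(selections)
    table = intersection_table(selections)
    width = max(len(m) for m in methods) + 2
    lines = ["".ljust(width) + "".join(m.ljust(width) for m in methods)]
    for i, row in enumerate(methods):
        cells = [str(table[(row, col)]).ljust(width) for col in methods[: i + 1]]
        lines.append(row.ljust(width) + "".join(cells))
    return "\n".join(lines)


# Hypothetical variable indices selected by each method (illustration only).
selections = {
    "Cross-validation": {1, 2, 3, 4, 5, 6},
    "Probing": {1, 2, 3},
    "Stability selection": {1, 2},
    "Lasso (glmnet)": {1, 2, 3, 4, 7},
}

print(format_table(selections))
```

Because every intersection is bounded by the totals of both methods involved, each off-diagonal cell of Table 1 should be no larger than the two corresponding diagonal cells. This gives a quick consistency check on the reported counts.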