Mathematical Problems in Engineering

Research Article

Instructor Activity Recognition through Deep Spatiotemporal Features and Feedforward Extreme Learning Machines

Table 7

Performance comparison of the proposed approach with state-of-the-art techniques.


Dataset	Validation scheme	Method	Accuracy

IAVID-1	Splits (70-30)	Proposed technique	81.43%
		C3D features with SVM classifier [17]	48.77%
		C3D features with CNN [17]	40.0%
		HOG representation of MHI with nearest neighbor classifier [20]	63.5%
		HOG and LBP representation of MHI with SVM classifier [31]	55%
		Harris 3D and HOG 3D with BOE [32]	26.67%
		Harris 3D, HOG/HOF, BoF with MCV-ELM [29]	13.33%
		Harris 3D, HOG/HOF, BoF with MV-ELM [30]	13.33%

MuHAVI-Uncut	LOAO	Proposed technique	93.66%
		HOG representation of MHI with nearest neighbor classifier [20]	84.1%
		Observable Markov model [33]	83.90%
		The sequence of key poses [34]	81.50%
		Learning discriminative key poses [35]	56.70%
	LOCO	Proposed technique	82.04%
		Deep spatiotemporal representation of MHI with MCV-ELM [29]	74.75%
		Deep spatiotemporal representation of MHI with MV-ELM [30]	74.75%
		HOG representation of MHI with nearest neighbor classifier [20]	52.2%
		The sequence of key poses [34]	50.4%
		Learning discriminative key poses [35]	31.4%
	LOSO	Proposed technique	97.02%
		HOG representation of MHI with nearest neighbor classifier [20]	96.6%
		The sequence of key poses [34]	86.5%
		Learning discriminative key poses [35]	56.6%

IXMAS	LOSO	Proposed technique	71.94%
		Substructure and boundary modeling [36]	76.5%
		Self-organizing map of action poses and fuzzy distance for MLP [37]	89.9%
		The sequence of key poses [34]	85.9%
		Multiview spatiotemporal histogram [38]	81.4%
		Spatiotemporal volumes (3DSTVs) mapped to 4D [39]	78%
	LOCO	Proposed technique	74.52%
		Spatiotemporal visual words to learn SVM model [40]	57.30%
		3D grid to learn HMM model for action recognition [41]	57.90%
		Sphere and rectangular feature trees with nearest neighbor classifier [42]	72.60%
		Histogram of silhouettes, horizontal and vertical optical flow for action recognition [43]	58.10%