Research Article

LTTng CLUST: A System-Wide Unified CPU and GPU Tracing Tool for OpenCL Applications

Table 2

Synchronous OpenCL API function overhead benchmark.

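| Loop size | Base avg. (ns/call) | Base std. dev. (ns/call) | Preload avg. (ns/call) | Preload std. dev. (ns/call) | Trace avg. (ns/call) | Trace std. dev. (ns/call) | Preload overhead (ns/call) | Trace overhead (ns/call) |
|---|---|---|---|---|---|---|---|---|
| 1 | 16 | 3 | 18 | 3 | 383 | 8 | 2 | 367 |
| 10 | 5.2 | 0.5 | 7.8 | 0.6 | 366.5 | 2.2 | 2.6 | 361.3 |
| | 4.64 | 0.04 | 6.66 | 0.05 | 365.68 | 6.38 | 2.02 | 361.04 |
| | 4.291 | 0.006 | 6.058 | 0.028 | 365.168 | 2.88 | 1.767 | 360.877 |
| | 4.277 | 0.012 | 6.283 | 0.036 | 359.780 | 13.425 | 2.006 | 355.503 |
| | 4.526 | 0.005 | 6.484 | 0.101 | 359.379 | 1.055 | 1.958 | 354.853 |
| | 4.531 | 0.029 | 6.467 | 0.097 | 363.313 | 5.138 | 1.936 | 358.782 |
| | 4.537 | 0.018 | 6.499 | 0.150 | 361.145 | 2.791 | 1.962 | 356.608 |
| | 4.535 | 0.022 | 6.460 | 0.026 | 361.108 | 1.966 | 1.925 | 356.573 |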

Sample size = 100.
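
The two overhead columns in Table 2 appear to be the difference between each variant's mean and the base mean on the same row (for example, 18 - 16 = 2 and 383 - 16 = 367 ns/call on the first row). The short Python sketch below checks that reading against every row. The names `ROWS`, `overhead`, and `check_rows` are illustrative and not taken from the article, and the tolerance is an assumption chosen to absorb rounding in the last printed digit.

```python
# Values copied from Table 2, all in ns/call:
# (base mean, preload mean, trace mean,
#  reported preload overhead, reported trace overhead)
ROWS = [
    (16,    18,    383,     2,     367),
    (5.2,   7.8,   366.5,   2.6,   361.3),
    (4.64,  6.66,  365.68,  2.02,  361.04),
    (4.291, 6.058, 365.168, 1.767, 360.877),
    (4.277, 6.283, 359.780, 2.006, 355.503),
    (4.526, 6.484, 359.379, 1.958, 354.853),
    (4.531, 6.467, 363.313, 1.936, 358.782),
    (4.537, 6.499, 361.145, 1.962, 356.608),
    (4.535, 6.460, 361.108, 1.925, 356.573),
]


def overhead(variant_mean, base_mean):
    """Per-call overhead of a variant relative to the base run (ns/call)."""
    return variant_mean - base_mean


def check_rows(rows, tol=5e-4):
    """Return the indices of rows whose reported overheads differ from
    (variant mean - base mean) by more than tol (an assumed tolerance)."""
    mismatches = []
    for i, (base, preload, trace, rep_preload, rep_trace) in enumerate(rows):
        preload_ok = abs(overhead(preload, base) - rep_preload) <= tol
        trace_ok = abs(overhead(trace, base) - rep_trace) <= tol
        if not (preload_ok and trace_ok):
            mismatches.append(i)
    return mismatches


if __name__ == "__main__":
    # An empty list means every reported overhead matches the subtraction.
    print(check_rows(ROWS))
```

Under this reading, each overhead figure is a difference of two sample means, so its uncertainty depends on the standard deviations of both columns it is derived from.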