Research Article

Massive Exploration of Perturbed Conditions of the Blood Coagulation Cascade through GPU Parallelization

Figure 3

Schematic description of CUDA’s architecture in terms of thread and memory hierarchies. Left Side. Thread organization: a single kernel is launched from the host (the CPU) and executed in multiple threads on the device (the GPU). Threads are organized in three-dimensional structures named blocks, which are in turn organized in three-dimensional grids; the dimensions of blocks and grids are explicitly defined by the programmer. Right Side. Memory hierarchy: threads can access data in several memories with different scopes. Registers and local memory are private to each thread. Shared memory lets threads belonging to the same block communicate, with low access latency. All threads can access the global memory, which suffers from high latency but has been cached since the introduction of the Fermi architecture. Texture and constant memory can be read by any thread and are cached as well; in this work we exploit the constant memory. Both diagrams are taken from Nvidia’s CUDA programming guide [60].
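The launch configuration and the constant-memory usage described in the caption can be made concrete with a minimal CUDA sketch. This is not taken from the paper’s code base: the kernel name, array size, and rate constants below are illustrative assumptions. It shows a single kernel launched from the host with explicitly chosen grid and block dimensions, each thread deriving its global index from its position inside block and grid, and all threads reading shared parameters through the cached constant memory.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical reaction-rate constants placed in constant memory:
// read-only for every thread and served through a dedicated cache.
__constant__ float d_rates[8];

// Each thread computes a unique global index from its position inside
// its block (threadIdx) and its block's position inside the grid (blockIdx).
__global__ void scaleByRate(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i] * d_rates[i % 8];
}

int main()
{
    const int n = 1 << 20;                       // illustrative problem size
    float h_rates[8] = {0.1f, 0.2f, 0.3f, 0.4f,
                        0.5f, 0.6f, 0.7f, 0.8f}; // hypothetical constants

    float *d_in, *d_out;
    cudaMalloc(&d_in,  n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));

    // Copy the constants from the host into the device's constant memory.
    cudaMemcpyToSymbol(d_rates, h_rates, sizeof(h_rates));

    // Block and grid dimensions are chosen explicitly by the programmer;
    // a 1D layout is used here, although up to three dimensions are allowed.
    dim3 block(256);
    dim3 grid((n + block.x - 1) / block.x);
    scaleByRate<<<grid, block>>>(d_in, d_out, n);
    cudaDeviceSynchronize();

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

Keeping per-thread indices in registers, block-wide data in shared memory, and read-only simulation parameters in constant memory follows the latency ordering the figure depicts.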