Description
Our objective was to create an efficient multi-GPU implementation of the classical Feldkamp cone-beam algorithm. No quality trade-offs were allowed: every optimization had to be carried out without compromising the final result. The main focus was on processing large datasets (≥ 5K), but efficiency on smaller ones also had to remain high.
The module developed over the course of the project is integrated into the Neoscan microCT software and supports all common options, such as 360/180+ scans, beam hardening and ring artifact correction, smoothing, misalignment compensation, and ROI.
Our optimization goals were to build the solution on CUDA, with support for multiple GPUs and server clusters. We maximized the performance of every computational stage by balancing the load across GPU subsystems such as memory bandwidth, caches, texture units, and SMs. We also implemented direct asynchronous disk access and handle the main image file formats directly, bypassing slow standard codecs.
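As an illustration of the multi-GPU idea only, the sketch below shows one simple way a reconstruction volume could be divided into contiguous slabs of slices, one slab per GPU. It is not taken from the module described here, whose actual scheduling is not documented in this text. The function name `split_slices` and the even-split policy are assumptions made for the example.

```python
def split_slices(num_slices, num_gpus):
    """Divide num_slices volume slices into contiguous half-open ranges.

    Hypothetical helper: returns one (start, stop) pair per GPU, with slab
    sizes differing by at most one slice so the load is roughly even.
    """
    if num_slices < 0:
        raise ValueError("num_slices must be non-negative")
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")

    # Every GPU gets `base` slices; the first `extra` GPUs get one more.
    base, extra = divmod(num_slices, num_gpus)

    ranges = []
    start = 0
    for gpu in range(num_gpus):
        count = base + (1 if gpu < extra else 0)
        ranges.append((start, start + count))
        start += count
    return ranges


if __name__ == "__main__":
    # 10 slices over 4 GPUs
    print(split_slices(10, 4))  # prints [(0, 3), (3, 6), (6, 8), (8, 10)]
```

An even split like this assumes identical GPUs. With mixed hardware, a scheduler would more plausibly weight each slab by measured device throughput or hand out smaller chunks dynamically.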