Advanced Imaging Magazine

Updated: January 12th, 2011 10:01 AM CDT

High Performance 3D Image Reconstruction

Improved performance of high-end platforms accelerates reconstruction times and enhances image quality
3D geometry: One of the first steps is to re-sample the raw projections to compensate for source-detector geometry distortions. The projections are up-sampled so that a nearest-neighbor approach can be used during the backprojection. For a given sub-volume, only a small part of the projection needs to be considered and loaded into fast memory for the best performance.
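The up-sampling idea in this caption can be illustrated with a minimal sketch. It assumes a single 1D detector row, linear interpolation as the up-sampling kernel and a factor of 4; none of these choices comes from the article, the function names are hypothetical, and the geometry correction itself is not modeled. The point is only that, once the row has been up-sampled, a cheap nearest-neighbor lookup stays close to a direct linear interpolation.

```python
import math

def upsample_linear(row, factor):
    """Linearly interpolate a 1D detector row onto a grid `factor` times finer."""
    out = []
    for i in range(len(row) - 1):
        for k in range(factor):
            t = k / factor
            out.append((1.0 - t) * row[i] + t * row[i + 1])
    out.append(row[-1])  # keep the last original sample
    return out

def sample_nearest(fine_row, factor, u):
    """Nearest-neighbor lookup at detector coordinate u (original pixel units)."""
    idx = int(math.floor(u * factor + 0.5))
    idx = max(0, min(len(fine_row) - 1, idx))  # clamp to the row
    return fine_row[idx]

def sample_linear(row, u):
    """Reference: direct linear interpolation on the original row."""
    i = max(0, min(len(row) - 2, int(math.floor(u))))
    t = u - i
    return (1.0 - t) * row[i] + t * row[i + 1]

# Hypothetical detector row and up-sampling factor, for illustration only.
row = [0.0, 1.0, 4.0, 9.0, 16.0]
factor = 4
fine = upsample_linear(row, factor)

u = 2.3  # fractional detector position hit by a backprojected voxel
print(sample_nearest(fine, factor, u), sample_linear(row, u))
```

In this sketch the nearest-neighbor lookup is off by at most half a fine-grid step, so a larger up-sampling factor trades memory for accuracy while keeping the inner backprojection loop free of interpolation arithmetic.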
Run-Time Stack chart: The software tools available for the Cell platform include all traditional development tools, such as compilers, debuggers and real-time trace analyzers. The run-time software layers simplify the multiprocessing paradigm and the required data distribution techniques so that productivity can be maintained.

By Dr. Marc Kachelriess, Dr. Michael Knaup, and Olivier Bockenbach

However, image quality takes on a different meaning when it is evaluated in connection with reconstruction speed. For example, a Cell processor takes approximately the same time to reconstruct a 1024³ volume as a PC takes to reconstruct a 512³ volume. Provided that the detector supports this increased resolution, the higher performance can be spent on a much finer volume resolution and, in any case, on better image quality.
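To put the two volume sizes in perspective, the following short sketch compares their voxel counts and memory footprints. The 4 bytes per voxel (single-precision values) is an assumption made here for illustration; the article does not state the storage format, and the helper names are hypothetical.

```python
def voxel_count(n):
    """Number of voxels in an n x n x n volume."""
    return n ** 3

def volume_bytes(n, bytes_per_voxel=4):
    """Memory footprint of the volume; 4 bytes per voxel is an assumption."""
    return voxel_count(n) * bytes_per_voxel

# Doubling the edge length multiplies the voxel count by 2**3.
ratio = voxel_count(1024) // voxel_count(512)

MIB = 1024 ** 2
print("voxel ratio:", ratio)
print("512^3 volume:", volume_bytes(512) // MIB, "MiB")
print("1024^3 volume:", volume_bytes(1024) // MIB, "MiB")
```

Under these assumptions the larger volume holds eight times as many voxels, which is why matching the PC's reconstruction time at 1024³ represents a substantial gain.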

Performance

The Cell processor offers the best performance of all the architectures compared. The Cell processor and the GPU we selected are among the most recent technologies available on the market. The Virtex-II, by contrast, is not among the most recent FPGA packages, and the PC reference platform has more powerful successors in the dual-core and quad-core processors.

The newest Virtex-4 and Virtex-5 versions can run at clock speeds about five times faster than the version we investigated. Furthermore, Xilinx proposes designs that implement DDR-2 interfaces on the Virtex-4 and Virtex-5 chips, giving the same 5x increase in performance for the memory subsystem. Even without considering the layout improvements in the newest FPGAs, a 5x increase in performance is the minimum to be expected.

The improvement in performance obtained with the newest dual-core and quad-core architectures is more difficult to estimate, because the way in which the I/O and memory access resources are shared and distributed depends on the design of the processor. Nevertheless, even in the best case of a 4x performance improvement, a quad-core system does not come close to the performance of a Cell processor or a GPU, not to mention the newest FPGAs: reconstruction on a standard quad-core system is at least six times slower than on a Cell processor and four times slower than on a modern GPU.

Software complexity

The most obvious and stable implementation has been realized using a PC. The Cell processor offers a multi-computing platform comparable to multi-computers such as those developed by Mercury Computer Systems with RACEWay, RACE++ and RapidIO. Even though all GPU boards support OpenGL and DirectX, the level of efficiency varies considerably between GPU boards, even when they come from the same manufacturer. As a result, performance is not predictable across GPU board generations. Coding reconstruction algorithms is much more difficult on FPGAs, mainly because they offer neither floating-point operators nor operators such as multiply, divide, sine and cosine. These functions must be coded as application-dependent look-up tables (LUTs).
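The look-up-table approach can be sketched in software as follows. This is an illustration of the general technique only, not the authors' FPGA design: the table size (1024 entries), the Q15 fixed-point scaling and the function names are all assumptions chosen for the example. The table is filled once, offline, and the run-time functions use nothing but integer indexing and masking, which is the kind of operation that maps directly onto FPGA block memory.

```python
import math

TABLE_BITS = 10                # assumed table size: 2**10 entries per full turn
TABLE_SIZE = 1 << TABLE_BITS
Q15_ONE = 32767                # assumed fixed-point scale (Q15)

# Built once, offline; on an FPGA this would be a ROM or block-RAM initialization.
SINE_LUT = [
    int(round(math.sin(2.0 * math.pi * i / TABLE_SIZE) * Q15_ONE))
    for i in range(TABLE_SIZE)
]

def lut_sin(phase):
    """Sine in Q15 for an integer phase, where TABLE_SIZE steps make one turn."""
    return SINE_LUT[phase & (TABLE_SIZE - 1)]

def lut_cos(phase):
    """Cosine via the same table, shifted by a quarter turn."""
    return SINE_LUT[(phase + TABLE_SIZE // 4) & (TABLE_SIZE - 1)]

# Example: an angle of 30 degrees expressed as an integer phase.
phase = round(30 / 360 * TABLE_SIZE)
print(lut_sin(phase) / Q15_ONE, lut_cos(phase) / Q15_ONE)
```

The table resolution and word width determine the accuracy, which is why such tables are application-dependent: a reconstruction with a known, fixed set of projection angles could store exactly those angles rather than a uniform grid.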


