CAPS Entreprise was at SC09 announcing the latest version of its HMPP compiler toolsuite. If you are not familiar with the product, it is a toolsuite that aids hybrid GPU/CPU software development. They were kind enough to invite us for a talk and give us a demonstration of their product.
At its heart, HMPP is an extension to your existing compiler that helps with GPU compute code. With the wide variety of compilers (Microsoft, PGI, GNU, Intel), IDEs (Visual Studio, Eclipse, Emacs), and GPU languages (CUDA, OpenCL, Streams, Brook), it can be daunting to develop applications that work efficiently across all of these environments. HMPP aims to make this much simpler by letting you insert simple directives into your code indicating the data structures and routines to accelerate.
You may have noticed that I didn’t say insert code. That’s one of the true strengths of HMPP: you write your code in your usual language (C and Fortran are both supported) and surround it with these directives. The directives mark the beginning and end of code to move to the GPU, describe data flow patterns, and cover a few other “meta” concerns such as selecting a target platform (like CUDA or Streams). When HMPP processes your code, it finds these directives and automatically transforms the marked code into the necessary GPU code, applying several advanced algorithms to optimize execution and dataflow for maximum throughput. The result can then be processed by your usual compiler.
That’s not all, though. At runtime, when your program reaches one of the accelerated blocks, it quickly determines whether the required hardware is available. If it is, it’s used; if not, the original code you wrote is executed instead. This pattern makes HMPP great for writing portions of your code in plain C and letting it generate several different GPU languages for you, helping you get the best performance available on a variety of platforms.
The HMPP product has been around for a little while, but the French company CAPS Entreprise has been hard at work on the latest version. The new release adds “codelets,” a way of chaining multiple functions together in an optimized fashion. Previously, if your code needed to run multiple kernels on a single block of GPU memory, you had to read the results back into main memory and upload them again (or go to some lengths to prevent it). With the new model you simply define the relationships between the functions’ input and output parameters, and you wind up with a single codelet that executes all of the functions on the GPU.
If you’re into GPGPU programming but tired of dealing with the many quirks and subtleties of the various implementations, then HMPP is a tool you should definitely give serious consideration. For more information:
If you’ve tried out HMPP, give us your opinions of it in the comments below.
