Speculation runs rampant on what Fermi can do, but ExtremeTech takes a deep look, talks to Nvidia to get some real answers, and shares them in a new writeup on their site. Most of this is stuff I’ve already heard and confirmed from multiple other sources, but it’s a good consolidated writeup with a few important nuggets. One such tidbit is on C++ support:
Before Fermi, Nvidia ran C natively on the GPU, but not C++. Fermi changes that and facilitates running C++ by supporting features such as virtual functions, function pointers, the “new” and “delete” operators for dynamic object allocation and de-allocation, and try/catch/throw exception handling. It is important to note, however, that full C++ support will not be available with the first shipments of Fermi-based boards, but will evolve over time as Nvidia continues to update its software drivers.
Also, on recursion support:
For example, despite the fact that Fermi has a stack, one extremely powerful programming technique called recursion will not be available until sometime after the initial launch. Recursion is the ability for a program function to call itself, a technique frequently used by algorithms to break a big problem or data set into smaller pieces—this is fundamental to certain powerful rendering algorithms like ray tracing. But despite the lack of recursion, ray tracing has been available on Nvidia boards for a few years now. The software vendors get around the lack of recursion by implementing a limited depth state machine or in-lining multiple nested function calls to emulate it. Once recursion becomes available, developers will no longer have to spend time performing additional coding acrobatics.
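As a rough illustration of the "coding acrobatics" mentioned above, here is a minimal C++ sketch (plain CPU code, not actual GPU code) of replacing a recursive tree walk with a loop over a fixed-size, explicitly managed stack. The array-based tree layout, the function names, and the `kMaxStack` bound are hypothetical choices for illustration only; GPU ray tracers commonly apply a similar idea to acceleration-structure traversal, but real implementations differ in the details.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <vector>

// Hypothetical example: a complete binary tree stored in an array,
// where node i has children at 2*i+1 and 2*i+2.

// Recursive version: the natural formulation, which needs hardware/compiler
// support for a function calling itself.
int sumRecursive(const std::vector<int>& tree, std::size_t i) {
    if (i >= tree.size()) return 0;
    return tree[i] + sumRecursive(tree, 2 * i + 1) + sumRecursive(tree, 2 * i + 2);
}

// Assumed upper bound on pending nodes; a GPU kernel would size this to the
// deepest structure it expects. This bound is what makes the emulation "limited depth".
constexpr std::size_t kMaxStack = 64;

// Iterative version: the call stack is replaced by a fixed-capacity array.
// Returns false (and leaves `out` untouched) if the bound would be exceeded.
bool sumIterative(const std::vector<int>& tree, int& out) {
    std::array<std::size_t, kMaxStack> stack{};
    std::size_t top = 0;
    int sum = 0;
    if (!tree.empty()) stack[top++] = 0;  // start at the root
    while (top > 0) {
        std::size_t i = stack[--top];      // pop the next node to visit
        assert(i < tree.size());           // only valid indices are ever pushed
        sum += tree[i];
        for (std::size_t child : {2 * i + 1, 2 * i + 2}) {
            if (child < tree.size()) {
                if (top == kMaxStack) return false;  // depth limit hit
                stack[top++] = child;                // push instead of recursing
            }
        }
    }
    out = sum;
    return true;
}
```

Both functions visit the same nodes; the iterative one just makes the bookkeeping explicit, which is the extra work developers would no longer need once true recursion is supported.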
Another big feature is “Concurrent Kernel Execution” via the new GigaThread scheduler. So how much better is it?
Fermi’s GigaThread scheduler, the master code that schedules all the threads on the board for execution, also has improved context switching. GigaThread can perform a context switch in as little as 20 microseconds, providing another important performance boost to programs on the GPU because the time penalty for swapping code in and out for execution is significantly less. Fermi’s context switching is ten times faster than the GT200.
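To put those numbers in perspective, here is a back-of-the-envelope calculation. It assumes the "ten times faster" figure implies roughly 200 µs per switch on GT200, and it uses a purely hypothetical workload that switches contexts 1,000 times per second; real switch rates depend entirely on the application.

```latex
\begin{aligned}
t_{\text{GT200}} &\approx 10 \times t_{\text{Fermi}} = 10 \times 20\,\mu\text{s} = 200\,\mu\text{s} \\
\text{overhead per second} &= N \cdot t_{\text{switch}}, \qquad N = 1000\ \text{switches/s (assumed)} \\
\text{Fermi:}\quad 1000 \times 20\,\mu\text{s} &= 20\,\text{ms} \;\;(\approx 2\%\ \text{of each second}) \\
\text{GT200:}\quad 1000 \times 200\,\mu\text{s} &= 200\,\text{ms} \;\;(\approx 20\%\ \text{of each second})
\end{aligned}
```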
Read the full details on everything Fermi at their site.