One more great feature of the higher-end Quadro cards is the “Dual Copy Engines”: two DMA transfer units that let you send data to the GPU and read data back from it at the same time. This means you can transfer the data for the next step (or frame, or whatever) to the GPU while the current frame is still being processed and its results are read back. If you can restructure your algorithm to work this way, it can offer a significant performance boost.
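The overlap described above can be pictured as a three-stage pipeline. The sketch below is a conceptual model only. It does not touch the GPU or any NVIDIA API, and the function name `pipeline_schedule` is hypothetical. It lists which frame each unit works on in each time slot, assuming one copy engine handles "download" (host to GPU), the compute units handle processing, and the second copy engine handles "readback" (GPU to host), all in parallel.

```python
def pipeline_schedule(num_frames):
    """Return, per time slot, the frame each unit works on (None = idle).

    Idealized model (an assumption, not measured behavior): in slot t,
    copy engine 1 downloads frame t, the GPU processes frame t - 1,
    and copy engine 2 reads back frame t - 2.
    """
    def frame(i):
        # Valid frame index, or None when the unit has nothing to do.
        return i if 0 <= i < num_frames else None

    # Two extra slots drain the last frames out of the pipeline.
    return [
        {"download": frame(t), "process": frame(t - 1), "readback": frame(t - 2)}
        for t in range(num_frames + 2)
    ]


for slot, units in enumerate(pipeline_schedule(4)):
    print(slot, units)
```

From slot 2 onward all three units are busy at once, which is the situation the dual copy engines make possible. With a single copy engine, the download of one frame and the readback of another could not share a slot.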

The results in Figure 6 come from a download-processing-readback pipeline that streams HD (8 MB per frame) and 4K (32 MB per frame) images with processing times of 10 ms, 20 ms, and 30 ms. They compare four methods:

  • Synchronous
  • CPU asynchronous with pixel buffer objects (PBOs)
  • GPU asynchronous using the copy engine for download
  • Static or cached case where no streaming is involved
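The quoted frame sizes match uncompressed 8-bit RGBA frames, which also explains the 4× factor between HD and 4K. The quick check below assumes 1920×1080 for HD, 3840×2160 for 4K, and 4 bytes per pixel. None of these are stated in the text, and `frame_bytes` is a hypothetical helper.

```python
def frame_bytes(width, height, bytes_per_pixel=4):
    """Size of one uncompressed frame; 4 bytes/pixel assumes 8-bit RGBA."""
    return width * height * bytes_per_pixel


MIB = 1024 * 1024
hd = frame_bytes(1920, 1080)   # 8,294,400 bytes
uhd = frame_bytes(3840, 2160)  # 33,177,600 bytes

# Prints "HD: 7.9 MiB, 4K: 31.6 MiB, ratio: 4x"
print(f"HD: {hd / MIB:.1f} MiB, 4K: {uhd / MIB:.1f} MiB, ratio: {uhd / hd:.0f}x")
```

Doubling both width and height quadruples the pixel count, so each 4K frame carries four times the data of an HD frame.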

For every processing time, the frame rate (fps) is almost the same for HD and 4K streaming, even though each 4K frame means four times as much data to download. The extra transfer time is hidden behind processing, which shows that download and processing truly run asynchronously on the GPU using the Quadro copy engines.
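Why transfer size can drop out of the frame rate is easiest to see with an idealized steady-state model. The sketch below is a rough illustration, not a reproduction of Figure 6. The 5 GB/s transfer rate is a hypothetical figure, readback is assumed to be the same size as the download, and overlap is assumed to be perfect with no synchronization cost. The function and mode names are made up for illustration.

```python
def frame_time_ms(mode, frame_mb, process_ms, bandwidth_gb_s=5.0, readback_mb=None):
    """Steady-state time per frame under an idealized pipeline model.

    bandwidth_gb_s is a hypothetical transfer rate, not a measured value.
    readback_mb defaults to frame_mb (assumes results are the same size).
    """
    if readback_mb is None:
        readback_mb = frame_mb
    # MB / (GB/s) gives milliseconds when 1 GB is taken as 1000 MB.
    download_ms = frame_mb / bandwidth_gb_s
    readback_ms = readback_mb / bandwidth_gb_s
    if mode == "synchronous":
        # Each stage waits for the previous one to finish.
        return download_ms + process_ms + readback_ms
    if mode == "single_copy_engine":
        # Transfers overlap with compute, but the two directions share one engine.
        return max(download_ms + readback_ms, process_ms)
    if mode == "dual_copy_engines":
        # Download, processing, and readback of different frames all overlap,
        # so the slowest stage sets the pace once the pipeline is full.
        return max(download_ms, process_ms, readback_ms)
    if mode == "static":
        # Cached data: no per-frame transfers at all.
        return process_ms
    raise ValueError(f"unknown mode: {mode}")


modes = ("synchronous", "single_copy_engine", "dual_copy_engines", "static")
for frame_mb in (8, 32):
    for process_ms in (10, 20, 30):
        fps = {m: round(1000 / frame_time_ms(m, frame_mb, process_ms), 1) for m in modes}
        print(frame_mb, "MB,", process_ms, "ms:", fps)
```

Under these assumptions, a 32 MB transfer takes 32 / 5 = 6.4 ms, which is shorter than even the 10 ms processing time. The dual-copy-engine frame time is therefore 10 ms for both HD and 4K, while the synchronous frame time grows from 13.2 ms to 22.8 ms.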

Read the entire whitepaper here.
