Last week we interviewed Berk Geveci, Kitware’s Director of Scientific Computing, about Kitware’s dedication to open-source computing and its popular VTK and ParaView products.  Berk leads a team of 18-20 people focused on scientific visualization for high-performance computing and, more recently, on informatics and information visualization.

Kitware is approximately 12 years old, and it started as a research company built around the Visualization Toolkit (VTK).  The business model is founded on supporting and consulting on open-source software, though the company now has five distinct groups with different focuses.  Berk runs the scientific visualization group, but there are also groups for computer vision, medical imaging and computing, data publication, and software-process tools like CMake and CDash.

During the course of the interview we discussed how Kitware deals with open-source and proprietary technologies, their government and industry collaborations, and what we might see in the next version of these popular products.

Read the interview after the break.

Q: Kitware is well-known for several open-source products. How does it sustain itself when it gives so much away?

When we started, the idea was that we would build open-source infrastructure and then sell traditional commercial products based on it.  As things evolved, we discovered that the commercial products were more of a distraction, since the majority of our revenue came from the service business.

We are essentially a service company and a collaboration company, getting most of our revenue from consulting fees and grants.  What really enabled that is Kitware’s strong foundation in open-source software: we are not simply selling software services in the IT sense, but rather providing the open-source framework and then the consulting services needed to extend and customize it for particular uses.  We’ve been very successful at that.  The important thing is that we have a very strong team, with several Ph.D.s and several well-known figures in the field, and we have become experienced at writing grant proposals.

Q: With so much funding coming from grants through NSF, DOE, DOD, and SciDAC, have export controls become a problem?

That problem is not isolated to government agencies; it applies to commercial industry as well, since many companies do not want to release their private IP or company secrets.  Individual partners can specify whether they want their work released, but we advocate on behalf of releasing everything as open source.  We tell them that they are contributing to the community, but also getting the benefits of the community for maintenance and updates.

There are simply some cases where the code must remain proprietary.  In my group, that doesn’t happen often, since we come in with our existing open-source software, and that convinces people to relax and more carefully select what needs to be restricted and why.  Groups like Sandia and Los Alamos actually go through their local security channels to get clearance to release everything as open source before work begins.  But even when things are export-controlled, we try to keep them open (if not open-source) by sharing them with other agencies under similar restrictions.  Such collaborations occur between DOE and DOD labs.

Q: How have patent restrictions impacted you, such as the GE Marching Cubes patent that recently expired?

A 10-foot pole is the best answer.

We try our best to avoid patented algorithms and third-party libraries under restrictive licenses.  MPEG4 is a good example: we purchased a license allowing us to include MPEG4 in VTK, but we’ve decided to move away from it toward less restrictive and more open alternatives.

Originally, Kitware went down the ‘patent’ path, but it conflicted with our business model.  Some users may remember a time when we removed all the patented algorithms from VTK and placed them in an optional module that produced warnings when enabled.  At the time, GE allowed use of Marching Cubes for noncommercial purposes.  Once the patent expired, we returned the algorithm to the main source tree.

Q: Regarding the MPEG4 situation, have you considered implementing Google’s WebM or the Ogg formats?

Ogg Theora is actually part of ParaView right now, as is Motion JPEG.  Ogg has difficulties with playback and with embedding into PowerPoint presentations, which is a popular use, so Motion JPEG is a good fallback.

The Ogg Theora encoder is a good open-source success story: we didn’t add it ourselves; it came from a community contribution.

Q: Some people believe open-source software is low quality.  How do you respond?

That’s an image we try to dispel, especially with our own software.  The highly skilled, talented people who maintain software don’t necessarily have to be part of a central company; they simply need a dedication to the product.  Obviously, we have ways to deal with quality, such as an internal quality process built on software designed to manage it, like CMake and CDash.  We tell people what our software process is and what we do to develop high-quality software.

The way we pitch our software is, beyond anything else, by talking about our software process and how widely the software is used.  It makes a huge difference when we tell people that DOE and DOD have been using our software, developing for it, and integrating it into their own products.  All of those together help us convince people.

One issue we have encountered is that most open-source software lacks commercial support, and companies in particular want their users to be able to call some entity for professional support and assistance.  Kitware is unique in that we can provide that support for our products, products that we develop ourselves with community involvement.

Q:  Percentage-wise, how much of the contributor base is international?

The percentages vary widely depending on the particular product.  For ParaView, the large majority is in the US, but recently we’ve been receiving more and more contributions from European partners.  We have had a fair number of contributions from the Swiss National Supercomputing Center and EDF (Électricité de France), as well as from CEA, which focuses on French military applications of technology.

For VTK, there have been proportionally more European contributions, most of them related to medical imaging.  The medical community has been more closely involved in Europe, and the same goes for ITK.  For VTK and ParaView, the split is probably somewhere around 80-90% US, with the majority of the remainder from Europe.

Q: When Kitware created an image processing toolkit (ITK), it released it as a separate library, but when it recently began work on an informatics toolkit, it rolled that work into VTK. Why the difference?

ITK wasn’t developed entirely under our control.  A consortium organized by the National Library of Medicine originally developed ITK, and they wanted ITK to be separate from VTK, branded and licensed individually rather than as an add-on to VTK.  That’s the main reason, but there were also software development reasons.  ITK is based on generic programming and is heavily templated, while VTK uses a much simpler flavor of C++.  Basing ITK on VTK would have kept its developers from techniques they wanted to use.

When it came to information visualization, we didn’t have those same restrictions.  The group of organizations, led mainly by Sandia, was already familiar with VTK and inclined to build on it.  Now it’s more of a technical issue: as the toolkits moved forward and we wanted to use existing technologies, we began to integrate with external libraries.  We didn’t want VTK to become a kitchen sink, so the informatics work lives in an external tool that’s optionally available.

The toolkit is called ‘Titan’, and it includes a working distribution of VTK.  Currently, it is heavily based on text analysis, with a dependency on the Qt library for rendering and database connectivity.

We have begun thinking about how we can make VTK more modular, allowing users to enable only certain libraries.  For example, volume rendering shouldn’t require the informatics toolkit.  What drives me is reducing the size of executables and libraries, in particular for supercomputers.

Q: At last year’s IEEE VisWeek ParaView tutorial, there was discussion of a modular compile design for in-situ work.  Has any of this come to fruition yet?

Currently, in particular for in-situ ParaView, we have resolved most of our dependency issues so that we can rely on the static linker to pull in only the subset of libraries and objects required.  For example, it won’t link in all of the writer objects, just the one writer you need.  However, you still have to compile the entire toolkit once.

We’ve been able to implement some of this and have managed to create an in-situ build that’s only 8 megabytes, rather than the hundreds of megabytes it is currently.

Currently, we’ve been able to run ParaView in-situ on up to 1,800 processors on a BlueGene system, and we are actively working to collect more data.

Q: Recently, there was a great IEEE paper analyzing VisIt scaled up to 32,000 and 64,000 cores.  What are you doing along these lines?

That was a really great paper, and I’ve been anxiously awaiting it for some time, so I’m very happy to see it finally published.

First off, we’re looking in depth at some of the issues they ran into at high-end scale, such as point-to-point versus tree-based communication structures, and we are implementing similar techniques.

In the near term, we are going to continue to develop and test the in-situ uses of ParaView until we max out our allocation.  As a company, we have to work with our partners to get these large allocations, since we are not eligible to write grants for access to such systems, which restricts our ability to run these massively scalable tests.

We are also working on scaling to a smaller number of cores while maintaining interactivity.  The VisIt tests were, I believe, all batch, but we believe interactive visualization is still a useful mode.  We are interested in scaling to 10,000 to 15,000 cores while staying interactive, since several sites have stopped purchasing dedicated analysis clusters and instead run directly on the supercomputers.

Q: What else is Kitware working on right now?

One project that I’m very excited about enables collaboration within the ParaView framework, and part of this is supporting web visualization.  What we’ve done previously is build a complete off-the-shelf solution for visualization, but we’re now looking at a different strategy where we build a set of applets for interactive visualization and a JavaScript library API for accessing the server.  This will make the system more of a web toolkit for visualization, allowing people to build their own web applications or embed the applets into existing applications.
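As a rough illustration of how a JavaScript client library might talk to such a visualization server over HTTP, here is a minimal sketch. Every endpoint, method name, and parameter below is a hypothetical placeholder for illustration only, not Kitware’s actual API:

```javascript
// Hypothetical sketch of a JSON-RPC-style call to a remote
// visualization server. The method name and URL are illustrative
// placeholders, not part of any real Kitware interface.

// Build a JSON-RPC 2.0 request body asking the server to render
// one view at a given pixel size.
function buildRenderRequest(id, viewId, width, height) {
  return {
    jsonrpc: "2.0",
    id: id,
    method: "viewport.image.render", // hypothetical method name
    params: { view: viewId, size: [width, height] },
  };
}

// In a browser, the request could be posted to the server like so
// (left commented out since the server URL is imaginary):
//
// fetch("https://example.org/visualization/rpc", {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: JSON.stringify(buildRenderRequest(1, "view-0", 800, 600)),
// }).then((r) => r.json()).then((msg) => {
//   // msg.result might carry the rendered image, e.g. base64-encoded
// });

const req = buildRenderRequest(1, "view-0", 800, 600);
console.log(JSON.stringify(req.params.size)); // prints [800,600]
```

Because everything travels as plain JSON over HTTP or HTTPS, a client built this way passes through firewalls that would block a custom binary protocol, which matches the security and firewall concerns mentioned above.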

We have an operational demo up and running as a proof of concept, and we are working to deploy a larger public demo for trial use.  One big difference in this system is that we support everything over HTTP or HTTPS, hopefully resolving some of the earlier security concerns and firewall issues.

We are currently 8 months into the 2-year project.  The goal, in its final form, is to enable collaborative visualization where multiple users can view the data from multiple clients: some users on web clients, some on desktop clients, and perhaps one viewing on a large tiled display, all simultaneously.

Q: How does the web product compare to Tableau Public or TACC EnVision?

Obviously, one major difference is that this is an open-source toolkit rather than a COTS solution.  A parallel backend is not required, but it is supported if the user needs it, since the backend is just a regular ParaView server.

This solution offers the full functionality of VTK and ParaView, so you can do information and scientific visualization via the web.

Q: How capable is the default deployment?

We have some sample applications that roughly expose large portions of the underlying capabilities, but we believe that full-blown ParaView is not very suitable for this use.

Right now we are mostly looking at customized, targeted applications for certain workflows.  As an example, for large-scale climate data we might chain five filters to create a popular visualization and then expose only the relevant controls to the users.

We want to make it very easy for users to build these workflow-specific targeted applications.  One possibility is to provide more “visualization as a service” options connecting to a cloud rendering server, but that’s a rather long-term goal.

Q: What does Kitware see for the future?

We are currently investigating what we consider the “next generation” of visualization products. We are looking at a new pipeline architecture that will integrate a wide variety of parallel libraries.  It will, of course, include VTK’s and VisIt’s pipeline architectures, as well as work on making NumPy parallel-capable.  None of this will be on a user’s desktop anytime soon, of course, but I’m excited about the possibilities.

We are also looking at the VisTrails product.  I’m very interested in it because it’s a full pipeline product that transfers not only the data but also the parameters through the pipeline.  It’s loosely coupled, so it can be used as a great workflow tool for more than just visualization.  We are cooperating with the VisTrails team to integrate it more closely as a workflow management tool beyond visualization, possibly adding data management and HPC controls as well.