Let’s talk about Vulkan®. When you’re submerged deep within a subject, any subject, it’s sometimes difficult to see how fast the landscape changes in a relatively short period of time. This is certainly the case for the Vulkan API. When I started the design and implementation of our OpenGL® SC 1.0 and OpenGL® SC 2.0 application libraries on Vulkan, VkCoreGL® SC1 and VkCoreGL SC2, a little over two years ago, most people I interacted with, including some of our customers, had very limited knowledge of what the Vulkan API was and what it could do. Since then, Vulkan has seen wider industry adoption, and the conversations I have these days are less about convincing people why Vulkan is great, and more about specific technical details: how to take advantage of the Vulkan API to squeeze as much performance as possible out of a system, and how to scale up from a Vulkan foundation.
Although more people are familiar with Vulkan these days, there are still some concerns among the unconverted, which I hope to address in this piece. These concerns relate to the low-level nature of the API, and to the perceived amount of code that needs to be written before anything really happens (for example, before a triangle is drawn or a compute workload is processed). But before we get to that, there is a latent question that should be addressed: “Why Vulkan?” This is a question that we, at CoreAVI, had to answer for ourselves.
In the fall of 2017, we were winding down the development of our safety critical OpenGL® drivers for the previous generation of GPUs, and we began planning for the future. We were already operating in a wide range of markets: mil-aero, industrial, rail. Automotive was just on the horizon. The range of requirements for system resources across these markets (for example, memory resources, power consumption, CPU and GPU capabilities) meant that we had to continue to support a wide range of system architectures: some very low power SoCs, and others much hungrier, discrete GPUs. Customer needs were also rapidly evolving. Although operating with a much slower adoption rate than other sectors, the safety critical world was beginning to catch up to the explosion of machine learning and artificial intelligence algorithms already being used in the mainstream world. We knew we needed to move beyond the OpenGL API. OpenGL was great when GPUs were just graphics devices. But the world had changed. Now most of the work performed by GPUs went toward accelerating computation. On the pure-compute side there was another Khronos API, OpenCL™. The problem with OpenCL for us was that we still needed to provide a path for graphics to our customers, and although OpenCL had interoperability APIs with OpenGL, those were never great. There were other popular compute APIs, but they were vendor specific, and impossible to standardize.
So, why Vulkan? Because Vulkan presented an interesting set of possibilities. It provides graphics and compute capabilities combined into a single API. This means that an application running inference on a video stream with a convolutional neural network could easily visualize the results of the neural net’s computations and display them on the video stream itself, for example, by drawing a bounding box around the objects and displaying the predicted labels. Vulkan exposes capabilities available in modern GPUs which were not present in OpenGL SC 1.0 or OpenGL SC 2.0, as those standards were based on much older versions of OpenGL, from an era when GPUs were much simpler devices. Vulkan has the added benefit of being a very thin API. The API itself defines many functions, but each individual function typically does very little work compared to higher level APIs like OpenGL or OpenCL, where each function performs a lot of work under the hood. The benefit of a thin API is low overhead, and a high degree of control over resource management from the application’s point of view.
When it comes to API design there is always a tradeoff between performance and the barrier to entry, that is, the degree of difficulty a new developer faces in adopting the new technology compared with similar industry standards. The higher level an API is, the more assumptions an implementation must make when performing any given operation, and the more overhead each API call has. Low level APIs have the advantage that the application gets to decide exactly the state that it wants to create, and the application can define how to use and manage that state. Still, we must acknowledge that a lower-level API such as Vulkan places increased responsibilities on the application. But how much of a burden is this, really? Does it translate into thousands of extra lines of code that must be written? I often hear concerns of this nature, that Vulkan developers need to do a lot more work to achieve the same thing compared to other APIs. I hope to provide some background, and some examples that might help alleviate this concern.
Let’s define an example to help crystallize the different tasks that a Vulkan developer might have to do vs. an OpenGL developer. Suppose that we want to write a simple graphics application that draws a single triangle. In OpenGL such an application might require 500 lines of code. In Vulkan, the same application might be written in 1500 lines of code. At first, it seems like there is a significant increase in effort required for Vulkan. It turns out that in practice, the extra effort is performed only once, and not for every single application. This is because a Vulkan developer will implement a suite of helper functions that will be re-used for any future applications that they write. Let me explain. A Vulkan application has to create a number of objects and pipeline states before any work can be submitted to the GPU. During initialization, an application typically creates a Vulkan instance object, uses it to find the physical device, creates the Vulkan logical device, and then creates all resources related to surfaces and framebuffers. The application must also create memory pools for command buffers and resource descriptors, as well as prepare pipeline states such as depth/stencil and color blending. If you’re not an engineer, you don’t have to understand what each of these things means. What is important to understand is that these are the extra steps that a Vulkan developer must take. This is work that an OpenGL implementation performs under the hood, hidden from the application. What is also important to understand is that in practice this work is performed once with Vulkan, when the first Vulkan application is written, and not every single time a new application is written. Typically, application developers create a framework of helper functions that defines exactly the state that they require for most of their applications.
This means that although that suite of Vulkan-based helper functions must be implemented initially, it ends up being re-used in over 90% of future use cases. The advantage of the upfront investment in the helper functions is that, unlike with higher level APIs like OpenGL or OpenCL, the application developer implements exactly the state that they require for their use cases, resulting in lower overhead and a smaller memory footprint.
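To make the "pay once, reuse afterwards" argument concrete, here is a rough back-of-the-envelope sketch in Python. It uses the illustrative 500 and 1500 line figures from the triangle example. It assumes (the article does not state this precisely) that the entire 1000-line difference is reusable helper code, and that every later Vulkan application needs about as much application-specific code as its OpenGL counterpart. The function names are made up for this illustration.

```python
# Illustrative figures from the triangle example above: roughly 500 lines
# in OpenGL and roughly 1500 lines for the first Vulkan application.
OPENGL_LINES_PER_APP = 500
VULKAN_LINES_FIRST_APP = 1500

# Assumption (not stated in the article): the whole difference is reusable
# setup code that is written once and shared by every later application.
HELPER_LINES = VULKAN_LINES_FIRST_APP - OPENGL_LINES_PER_APP


def opengl_total(n_apps: int) -> int:
    """Total lines written for n_apps OpenGL applications."""
    if n_apps < 1:
        raise ValueError("n_apps must be at least 1")
    return OPENGL_LINES_PER_APP * n_apps


def vulkan_total(n_apps: int) -> int:
    """Total lines for n_apps Vulkan applications sharing one helper suite."""
    if n_apps < 1:
        raise ValueError("n_apps must be at least 1")
    return HELPER_LINES + OPENGL_LINES_PER_APP * n_apps


def overhead_ratio(n_apps: int) -> float:
    """Vulkan total divided by OpenGL total for the same number of apps."""
    return vulkan_total(n_apps) / opengl_total(n_apps)


if __name__ == "__main__":
    for n in (1, 2, 5, 10, 20):
        print(f"{n:>2} apps: OpenGL {opengl_total(n):>6}  "
              f"Vulkan {vulkan_total(n):>6}  ratio {overhead_ratio(n):.2f}")
```

Under these assumptions the Vulkan total is three times the OpenGL total for the first application, but the ratio falls to about 1.2 by the tenth application, since the helper code is a fixed cost spread over every program that reuses it. Real projects will differ, because helper suites grow over time and not every application reuses all of them.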
There is one more topic that I’d like to cover: building up from a Vulkan foundation. At CoreAVI, we are building a suite of libraries on top of our safety critical implementation of Vulkan. Thus far I have been referring to Vulkan in general, because the concepts I was discussing apply to the API as a whole. Of course, our implementations are based on the safety critical version of the API, which CoreAVI is currently helping define in the Vulkan® SC Khronos working group. Our suite of libraries includes full implementations of OpenGL SC 1.0 and OpenGL SC 2.0 on top of our safety critical Vulkan driver. On the ML/AI front we are adding our SafeAI platform, which includes ComputeCore™, a safety critical implementation of the BLAS and FFT APIs, and VkCoreVX® SC, our safety critical OpenVX™ 1.3 implementation. These are libraries that run on top of our safety critical Vulkan driver. I mention our suite of libraries for two reasons. First, we want to show the importance of a thin API. Vulkan allows us to build up into the application stack by adding frameworks and helper libraries, specifically because it is a thin API with minimal overhead. This means that products built on top of Vulkan lose very little control and performance to the implementation. If this weren’t the case, it would have been impossible to build an OpenGL stack or an OpenVX implementation on top of Vulkan. The second reason is to show that CoreAVI recognizes that, in some cases, our customers are willing to trade control over resource management for the lower complexity of a higher-level API. The advantage of a framework built on top of Vulkan is that for sophisticated algorithms (like computer vision pre-processing and neural network inferencing), the application can leverage the higher level OpenVX and ComputeCore platform, and for proprietary algorithms that require a high level of resource management and control, it can interact directly with the Vulkan implementation.
To visualize the result of compute operations, the application could interact directly with Vulkan, or with one of our VkCoreGL SC1 or SC2 libraries for a safety critical OpenGL implementation.
We believe the future of safety critical compute and graphics lies in Vulkan SC. The technical arguments for Vulkan are simple. First, it’s a modern, industry-defined API capable of meeting the graphics and compute use cases of the future, and it is the only API with a clear path to meet safety critical certification requirements for both graphics and compute across multiple industry sectors: automotive, avionics and industrial. Second, the thin nature of the API and the increased application control guarantee that a software stack can be built to match market demands. At CoreAVI we are happy to meet our customers at different levels of complexity: close to the metal using Vulkan SC, and higher up the stack with our graphics and compute libraries.
*Vulkan, Vulkan SC, OpenVX 1.3 and OpenGL SC 2.0 products are based on a published Khronos Specification and are expected to pass the Khronos Conformance Process. Current conformance status can be found at www.khronos.org/conformance.