OpenCL 2.0 provisional spec gets outlined, OpenGL 4.4 released

SIGGRAPH has only just begun, but the Khronos Group is already giving folks of the graphics programming persuasion some fresh APIs to talk about. Yesterday marked the release of the OpenCL 2.0 provisional specification, and it's boasting an Android installable client driver extension, along with improvements to image handling, shared virtual memory and more. It's expected that the new version of OpenCL will be finalized in six months' time, and feedback regarding the changes is being welcomed. The fresh OpenGL 4.4 spec revamps everything from shaders to asynchronous queries while keeping full backwards compatibility, and includes additional functions to make porting Direct3D apps a smoother process. If parallel programming and cross-platform graphics are your thing, hit the break for the full feature breakdown in the press releases.

Khronos Releases OpenCL 2.0

New generation of industry open standard for cross-platform parallel programming delivers increased flexibility, functionality and performance

July 22nd 2013 – SIGGRAPH - Anaheim, CA – The Khronos™ Group today announced the ratification and public release of the OpenCL™ 2.0 provisional specification. OpenCL 2.0 is a significant evolution of the open, royalty-free standard that is designed to further simplify cross-platform, parallel programming while enabling a significantly richer range of algorithms and programming patterns to be easily accelerated. As the foundation for these increased capabilities, OpenCL 2.0 defines an enhanced execution model and a subset of the C11 and C++11 memory model, synchronization and atomic operations. The release of the specification in provisional form is to enable developers and implementers to provide feedback before specification finalization, which is expected within 6 months. The OpenCL 2.0 provisional specification and reference cards are available at

"The OpenCL working group has combined developer feedback with emerging hardware capabilities to create a state-of-the-art parallel programming platform - OpenCL 2.0," said Neil Trevett, chair of the OpenCL working group, president of the Khronos Group and vice president of mobile content at NVIDIA. "OpenCL continues to gather momentum on both desktop and mobile devices. In addition to enabling application developers it is providing foundational, portable acceleration for middleware libraries, engines and higher-level programming languages that need to take advantage of heterogeneous compute resources including CPUs, GPUs, DSPs and FPGAs."

Updates and additions to OpenCL 2.0 include:

Shared Virtual Memory
Host and device kernels can directly share complex, pointer-containing data structures such as trees and linked lists, providing significant programming flexibility and eliminating costly data transfers between host and devices.

Dynamic Parallelism
Device kernels can enqueue kernels to the same device with no host interaction, enabling flexible work scheduling paradigms and avoiding the need to transfer execution control and data between the device and host, often significantly offloading host processor bottlenecks.

Generic Address Space
Functions can be written without specifying a named address space for arguments, especially useful for those arguments that are declared to be a pointer to a type, eliminating the need for multiple functions to be written for each named address space used in an application.

Improved Images
Improved image support including sRGB images and 3D image writes, the ability for kernels to read from and write to the same image, and the creation of OpenCL images from a mip-mapped or a multi-sampled OpenGL texture for improved OpenGL interop.

C11 Atomics
A subset of C11 atomics and synchronization operations to enable assignments in one work-item to be visible to other work-items in a work-group, across work-groups executing on a device or for sharing data between the OpenCL device and host.
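
The visibility guarantee described above is the release/acquire pattern from C11. The following is a minimal sketch in plain host-side C11 using `<stdatomic.h>`, not OpenCL C kernel code; the `mailbox` type and its functions are hypothetical names introduced only for illustration. A producer writes ordinary data and then sets a flag with release ordering, and a consumer that observes the flag with acquire ordering is guaranteed to see the data.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mailbox shared between one producer and one consumer. */
typedef struct {
    int payload;        /* ordinary, non-atomic data */
    atomic_bool ready;  /* flag that publishes the payload */
} mailbox;

void mailbox_init(mailbox *m) {
    m->payload = 0;
    atomic_init(&m->ready, false);
}

/* Producer: write the data first, then set the flag with release ordering
   so the payload write cannot be reordered after the flag write. */
void mailbox_publish(mailbox *m, int value) {
    m->payload = value;
    atomic_store_explicit(&m->ready, true, memory_order_release);
}

/* Consumer: returns true and fills *out only once the flag has been
   observed with acquire ordering, which makes the payload visible. */
bool mailbox_try_consume(mailbox *m, int *out) {
    assert(out != NULL);
    if (!atomic_load_explicit(&m->ready, memory_order_acquire))
        return false;
    *out = m->payload;
    return true;
}
```

In a real program the two functions would run on different threads, or in the OpenCL 2.0 case on different work-items or on host and device; the ordering arguments are what carry the guarantee.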

Pipes
Pipes are memory objects that store data organized as a FIFO and OpenCL 2.0 provides built-in functions for kernels to read from or write to a pipe, providing straightforward programming of pipe data structures that can be highly optimized by OpenCL implementers.
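
To make the FIFO behavior concrete, here is a small host-side model in plain C. It is only an analogy for the ordering semantics and does not use the OpenCL pipe built-ins; the `fifo_pipe` type, the function names and the capacity of four packets are all assumptions made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PIPE_CAPACITY 4  /* hypothetical capacity, in packets */

typedef struct {
    int packets[PIPE_CAPACITY];
    size_t head;   /* index of the oldest packet */
    size_t count;  /* number of packets currently stored */
} fifo_pipe;

void fifo_init(fifo_pipe *p) {
    p->head = 0;
    p->count = 0;
}

/* Append one packet at the tail; returns false when the pipe is full. */
bool fifo_write(fifo_pipe *p, int packet) {
    assert(p != NULL);
    if (p->count == PIPE_CAPACITY)
        return false;
    p->packets[(p->head + p->count) % PIPE_CAPACITY] = packet;
    p->count++;
    return true;
}

/* Remove the oldest packet; returns false when the pipe is empty. */
bool fifo_read(fifo_pipe *p, int *out) {
    assert(p != NULL && out != NULL);
    if (p->count == 0)
        return false;
    *out = p->packets[p->head];
    p->head = (p->head + 1) % PIPE_CAPACITY;
    p->count--;
    return true;
}
```

Packets come out in the order they went in, and both operations report failure instead of blocking, which is one plausible way to model a bounded pipe.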

Android Installable Client Driver Extension
Enables OpenCL implementations to be discovered and loaded as a shared object on Android systems.

OpenCL SPIR 1.2 Provisional Specification
In addition, the OpenCL Working Group also today released the OpenCL SPIR 1.2 provisional specification for public review. 'SPIR' stands for Standard Portable Intermediate Representation and is a portable non-source representation for OpenCL 1.2 device programs. It enables application developers to avoid shipping kernel source and to manage the proliferation of devices and drivers from multiple vendors. OpenCL SPIR will enable consumption of code from third party compiler front-ends for alternative languages, such as C++, and is based on LLVM 3.2. Khronos has contributed open source patches for Clang 3.2 to enable SPIR code generation.

Industry Support
"These 2 new OpenCL specifications will allow software developers to accelerate a much wider variety of applications on a greater range of devices than previously possible. OpenCL 2.0 will allow applications to process more complex data and algorithms in parallel than was possible in previous standards, while OpenCL SPIR will allow a variety of different programming languages to be compiled directly into OpenCL code for heterogeneous systems," said Andrew Richards, CEO of Codeplay. "These are 2 big steps forwards to enable software developers to embrace heterogeneous platforms and Codeplay is actively involved in developing for both already."

"Intel has been deeply involved in shaping new OpenCL 2.0 features like Shared Virtual Memory and OpenCL SPIR", said Jonathan Khazam, vice president and general manager of Intel's Visual & Parallel Computing Group. "We are very excited about the improved programmability of OpenCL 2.0 and the potential to create new experiences with Intel® Iris™ Graphics Products."

Tony King-Smith, EVP marketing for Imagination Technologies, said: "As a long-standing Promoter, Imagination is delighted to see Khronos release this major upgrade to the OpenCL API standard. We see an ever widening portfolio of markets relevant to OpenCL, from mobile and consumer multimedia-rich devices through automotive infotainment up to advanced cloud servers and supercomputers. OpenCL is gaining traction among our customers as a means to deliver high-performance compute on our widely deployed PowerVR GPUs as well as our MIPS CPUs. Indeed we have been among the first to enable OpenCL in GPUs for mobile and embedded SoC devices already in production, including some of the leading smartphones and tablets shipping today. We have also been one of the first to demonstrate the significant power saving advantages of OpenCL on GPU running alongside OpenGL ES in real applications – a benefit often overlooked by application developers today. We look forward to continued industry momentum behind OpenCL as a key enabling API for GPU compute and heterogeneous processing."

"The ability to perform compute-intensive tasks in parallel, using virtually any processor present in the device opens the door for significant performance and functionality improvements in several industries from Automotive to SmartTVs, game consoles and the smartphones. Vivante's GPU family have been utilizing OpenCL API for long time and we continue to be in forefront to support this new major API version as it will further improve the flow of getting even more complex things done, much faster and better," said Weijin Dai, CEO of Vivante Corp. "We're pleased to equip our customers with our GPUs that are faster, smaller and cooler when we see OpenCL to become a significant standard in our customers' multi-core implementations."

Khronos Releases OpenGL 4.4 Specification

Conformance Tests created to accompany API specification and reference documentation available now; Full backwards compatibility maintained

July 22nd 2013 – SIGGRAPH - Anaheim, CA – The Khronos™ Group today announced the immediate release of the OpenGL® 4.4 specification, bringing the very latest graphics functionality to the most advanced and widely adopted cross-platform 2D and 3D graphics API (application programming interface). OpenGL 4.4 unlocks capabilities of today's leading-edge graphics hardware while maintaining full backwards compatibility, enabling applications to incrementally use new features while portably accessing state-of-the-art graphics processing units (GPUs) across diverse operating systems and platforms. Also, OpenGL 4.4 defines new functionality to streamline the porting of applications and titles from other platforms and APIs. The full specification and reference materials are available for immediate download at

In addition to the OpenGL 4.4 specification, the OpenGL ARB (Architecture Review Board) Working Group at Khronos has created the first set of formal OpenGL conformance tests since OpenGL 2.0. Khronos will offer certification of drivers from version 3.3, and full certification is mandatory for OpenGL 4.4 and onwards. This will help reduce differences between multiple vendors' OpenGL drivers, resulting in enhanced portability for developers.

"The delivery of conformance tests for OpenGL 4.4 is a significant milestone – as it is vital for developers to be able to rely on the API they are trusting to accelerate their content across multiple platforms," said Barthold Lichtenbelt, OpenGL ARB working group chair. "The OpenGL ARB is committed to continue to deepen communications with the developer community so we can continue to build OpenGL functionality that creates real-world business opportunities for the 3D industry."

New functionality in the OpenGL 4.4 specification includes:

Buffer Placement Control (GL_ARB_buffer_storage)
Significantly enhances memory flexibility and efficiency through explicit control over the position of buffers in the graphics and system memory, together with cache behavior control - including the ability of the CPU to map a buffer for direct use by a GPU.

Efficient Asynchronous Queries (GL_ARB_query_buffer_object)
Buffer objects can be the direct target of a query to avoid the CPU waiting for the result and stalling the graphics pipeline. This provides significantly boosted performance for applications that intend to subsequently use the results of queries on the GPU, such as dynamic quality reduction strategies based on performance metrics.

Shader Variable Layout (GL_ARB_enhanced_layouts)
Detailed control over placement of shader interface variables, including the ability to pack vectors efficiently with scalar types. Includes full control over variable layout inside uniform blocks and enables shaders to specify transform feedback variables and buffer layout.

Efficient Multiple Object Binding (GL_ARB_multi_bind)
New commands which enable an application to bind or unbind sets of objects with one API call instead of separate commands for each bind operation, amortizing the function call, name space lookup, and potential locking overhead. The core rendering loop of many graphics applications frequently binds different sets of textures, samplers, images, vertex buffers, and uniform buffers, so this can significantly reduce CPU overhead and improve performance.

Streamlined Porting of Direct3D applications
A number of core functions ease the porting of applications and games written in Direct3D. These include GL_ARB_buffer_storage for buffer placement control; GL_ARB_vertex_type_10f_11f_11f_rev, which creates a vertex data type that packs three components into a 32-bit value, improving performance for lower-precision vertices and matching a format used by Direct3D; and GL_ARB_texture_mirror_clamp_to_edge, which provides a texture clamping mode also used by Direct3D.
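
As an illustration of the packed vertex type, the sketch below encodes three floats into one 32-bit word: an 11-bit unsigned float in bits 0-10, another in bits 11-21, and a 10-bit unsigned float in bits 22-31, each with a 5-bit exponent (bias 15) and no sign bit. The function names are hypothetical, and the encoder is deliberately simplified: it truncates the mantissa, maps negative values and NaN to zero, flushes values below the smallest normal to zero, and saturates overflow to the largest finite value. A production converter may round and handle special values differently.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Encode a float as an unsigned small float with a 5-bit exponent (bias 15)
   and mant_bits mantissa bits: 6 for the 11-bit form, 5 for the 10-bit form.
   Simplified: truncation instead of rounding, no denormals, no NaN/Inf. */
uint32_t encode_small_float(float value, int mant_bits) {
    assert(mant_bits == 5 || mant_bits == 6);
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);  /* reinterpret the IEEE 754 bits */

    /* Largest finite code: exponent 30 with an all-ones mantissa. */
    uint32_t max_finite = (30u << mant_bits) | ((1u << mant_bits) - 1u);

    if (value != value || (bits >> 31))
        return 0;                        /* NaN or negative maps to zero */

    int32_t exp = (int32_t)((bits >> 23) & 0xFF) - 127 + 15;   /* rebias */
    uint32_t mant = (bits & 0x7FFFFFu) >> (23 - mant_bits);    /* truncate */

    if (exp <= 0)
        return 0;                        /* zero or too small: flush to zero */
    if (exp >= 31)
        return max_finite;               /* overflow and +Inf saturate */
    return ((uint32_t)exp << mant_bits) | mant;
}

/* Pack x and y as 11-bit floats and z as a 10-bit float, x in the low bits. */
uint32_t pack_10f_11f_11f_rev(float x, float y, float z) {
    return encode_small_float(x, 6)
         | (encode_small_float(y, 6) << 11)
         | (encode_small_float(z, 5) << 22);
}
```

Under these assumptions, packing (1.0, 1.0, 1.0) yields 0x781E03C0, so three components occupy 4 bytes instead of the 12 needed for three 32-bit floats.
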

Extensions released alongside the OpenGL 4.4 specification include:

Bindless Texture Extension (GL_ARB_bindless_texture)
Shaders can now access an effectively unlimited number of texture and image resources directly by virtual addresses. This bindless texture approach avoids the application overhead due to explicitly binding a small window of accessible textures. Ray tracing and global illumination algorithms are faster and simpler with unfettered access to a virtual world's entire texture set.

Sparse Texture Extension (GL_ARB_sparse_texture)
Enables handling of huge textures that are much larger than the GPU's physical memory by allowing an application to select which regions of the texture are resident for 'mega-texture' algorithms and very large data-set visualizations.
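
Selecting resident regions comes down to page arithmetic on the application side. The sketch below is a host-side model only and makes no OpenGL calls; the `page_rect` type and function names are hypothetical, and any page size passed in is an assumption, since real page dimensions are reported by the implementation. It computes how many pages a texture spans and which pages must be resident to cover a given texel region.

```c
#include <assert.h>
#include <stddef.h>

/* Number of pages needed to cover `extent` texels with `page`-texel pages. */
size_t pages_along(size_t extent, size_t page) {
    assert(page > 0);
    return (extent + page - 1) / page;  /* ceiling division */
}

/* Half-open rectangle [x0, x1) x [y0, y1), measured in pages. */
typedef struct {
    size_t x0, y0, x1, y1;
} page_rect;

/* Smallest page-aligned rectangle covering texels [x, x+w) x [y, y+h). */
page_rect pages_for_region(size_t x, size_t y, size_t w, size_t h,
                           size_t page_w, size_t page_h) {
    assert(page_w > 0 && page_h > 0);
    page_rect r;
    r.x0 = x / page_w;                  /* round the origin down */
    r.y0 = y / page_h;
    r.x1 = pages_along(x + w, page_w);  /* round the far edge up */
    r.y1 = pages_along(y + h, page_h);
    return r;
}

/* Number of pages that must be resident for the rectangle. */
size_t page_count(page_rect r) {
    return (r.x1 - r.x0) * (r.y1 - r.y0);
}
```

For example, with an assumed 128 x 128 page size, a 16384 x 16384 texture spans 128 x 128 pages, while a 300 x 50 texel region at (100, 200) touches only 4 of them.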

Industry Support
"AMD has a long tradition of supporting open industry standards, and congratulates the Khronos Group on the announcement of the OpenGL 4.4 specification for state-of-the-art graphics processing," said Matt Skynner, corporate vice president and general manager, Graphics Business Unit, AMD. "Maintaining and enhancing OpenGL as a strong and viable graphics API is very important to AMD in support of our APUs and GPUs. We're proud to continue support for the OpenGL development community."

"We worked closely with Khronos on OpenGL 4.4, so we wanted to make sure the day it was announced we had compliant drivers for our Fermi and Kepler GPUs," said Tony Tamasi, senior vice president, Content and Technology at NVIDIA. "We're also working to bring support to Tegra, so developers can create amazing content that scales from high-end PCs down to mobile devices." (These products are based on the published OpenGL 4.4 Specification, and are submitted to, and are expected to pass, the Khronos Conformance Testing Process. Current conformance status can be found at