
Vulkan Tutorial - 101

Welcome. In this tutorial we will be learning about Vulkan and specifically all the steps and code we need to render a triangle to the screen.

First things first, let me just warn you that this is not a beginner's tutorial to rendering APIs. I assume you have some experience with something like OpenGL or DirectX and that you are here to get to know the particularities of Vulkan. My main goal with this "tutorial" is to get to a complete but minimal C program running a graphics pipeline using Vulkan on Windows (Linux too, if I get the time. If you are interested in the ongoing, but working, port of this code to Linux/XCB you can check commit 15914e3). So, let's start.

Housekeeping

I will be posting all the code on this page. I will do it progressively, but you will be able to see every piece of code going into the tutorial. If you want to follow along and compile the code on your side you can clone the following git repo:

git clone https://bitbucket.org/jose_henriques/vulkan_tutorial.git

I have successfully compiled and run every commit on Windows 7 and 10 using Visual Studio 2013. Because I don't use the IDE (I only use the compiler), I provide a build.bat that you should be able to use to compile the code. You do need to have the cl compiler on your path before you can call build.bat from your console. [You need to find and run the right vcvars*.bat for your setup. For the setup I'm using you can find it at "C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\x86_amd64\vcvarsx86_amd64.bat".]

For each step I will point out the commit that you can check out to compile and play around with on your side. For example, to get the initial commit with the platform skeleton code, you can do the following:

git checkout 39534dc

Now, as important as our goals, here are the things that I will not be trying to accomplish with this tutorial. First, I will not be creating a "framework" that you can take and start coding your next engine... I will not even try to create functions for code that repeats itself a few times. I see some value in having all the code involved in a process laid out directly, instead of having to navigate a couple of indirections to get the full picture, especially for a tutorial.
This tutorial will finish once we get a triangle on the screen with a graphics pipeline running a vertex and a fragment shader. I might do some other tutorials on other topics, but this is not it!

You can use this code free of charge if it brings you any value... I think this code is only useful for learning the API, but if you do end up using it, credits are welcome.

Windows platform code

[Commit: 39534dc]

This is your typical Windows platform code to register and open a new window. If you are familiar with this, feel free to skip it. We will be starting with this minimal setup and adding to/completing it until we have our rendering going. [I will not explain this code, sorry...]

#include <windows.h>
                        
LRESULT CALLBACK WindowProc( HWND hwnd, UINT uMsg, WPARAM wParam, LPARAM lParam ) {
    switch( uMsg ) {
        case WM_CLOSE: { 
            PostQuitMessage( 0 );
            break;
        }
        default: {
            break;
        }
    }
    
    // a pass-through for now. We will return to this callback
    return DefWindowProc( hwnd, uMsg, wParam, lParam );
}

int CALLBACK WinMain( HINSTANCE hInstance, HINSTANCE hPrevInstance, LPSTR lpCmdLine, int nCmdShow ) {

    WNDCLASSEX windowClass = {};
    windowClass.cbSize = sizeof(WNDCLASSEX);
    windowClass.style = CS_OWNDC | CS_VREDRAW | CS_HREDRAW;
    windowClass.lpfnWndProc = WindowProc;
    windowClass.hInstance = hInstance;
    windowClass.lpszClassName = "VulkanWindowClass";
    RegisterClassEx( &windowClass );

    HWND windowHandle = CreateWindowEx( NULL, "VulkanWindowClass", "Core",
                                        WS_OVERLAPPEDWINDOW | WS_VISIBLE,
                                        100, 
                                        100, 
                                        800,    // some random values for now. 
                                        600,    // we will come back to these soon.
                                        NULL,
                                        NULL,
                                        hInstance,
                                        NULL );
                               
    MSG msg = {};
    bool done = false;
    while( !done ) {
        if( PeekMessage( &msg, NULL, 0, 0, PM_REMOVE ) ) {
            if( msg.message == WM_QUIT ) {
                done = true;
            } else {
                TranslateMessage( &msg ); 
                DispatchMessage( &msg );
            }
        }

        RedrawWindow( windowHandle, NULL, NULL, RDW_INTERNALPAINT );
    }

    return msg.wParam;
}

If you cloned the repo and checked out this commit you should be able to call build.bat to compile this code. Here are the contents of the bat file if you just want to copy/paste the code and compile it on your own:

@echo off

mkdir build
pushd build
cl /Od /Zi ..\main.cpp user32.lib
popd

This should compile our test application and create a binary called main.exe in your <project>/build folder. If you run this application you will get a white window at position (100,100) with size (800,600) that you can quit. That's it for the platform code... Almost... We still need some more setup before we are done with the platform code.

Dynamically Loading Vulkan

[Commit: bccc3df]

Ok, now we need to start talking about how we get Vulkan on our system... It is not made very clear by Khronos or by LunarG whether or not you need their SDK. The short answer is no, you do not need their SDK to start programming your Vulkan application. In a later chapter I will show you that even for the validation layers you can, if you want, skip the SDK.

So, we need two things: the library and the headers. The library should already be on your system, as it is provided by your GPU driver. On Windows it is called vulkan-1.dll (libvulkan.so.1 on Linux) and should be in your system folder.
Khronos says that the headers provided with a loader and/or driver should be sufficient. I did not find them on my machine, so I just got them from the Khronos registry Vulkan-Docs repo:

git clone https://github.com/KhronosGroup/Vulkan-Docs.git

I found myself also needing the following repo:

git clone https://github.com/KhronosGroup/Vulkan-LoaderAndValidationLayers.git

We will need this one later. But for now, just copy vulkan.h and vk_platform.h to your application folder. (If you are following along with the git repo, I added these headers in this commit.)

We include vulkan.h and start loading the API functions we need. We will be dynamically loading the Vulkan functions, and we want to make sure we are using the Windows platform specific defines. So we will add the following code:

#define VK_USE_PLATFORM_WIN32_KHR
#define VK_NO_PROTOTYPES
#include "vulkan.h"

For every Vulkan function we want to use we must first declare it and load it from the dynamic library. This process is platform dependent, so for now let us create a win32_LoadVulkan() function.
Pay special attention to the fact that code similar to the vkCreateInstance() loading code below must be added to this function for every Vulkan function we need to call.

PFN_vkCreateInstance vkCreateInstance = NULL;

void win32_LoadVulkan( ) {

    HMODULE vulkan_module = LoadLibrary( "vulkan-1.dll" );
    assert( vulkan_module, "Failed to load vulkan module." );

    vkCreateInstance = (PFN_vkCreateInstance) GetProcAddress( vulkan_module, "vkCreateInstance" );    
    assert( vkCreateInstance, "Failed to load vkCreateInstance function pointer." );
    
}

I have also created a helper function assert() that does what you would expect. This will be our "debugging" facility! :) (Do feel free to use your preferred version of this function.)

void assert( bool flag, char *msg = "" ) {
							
    if( !flag ) {
        OutputDebugStringA( "ASSERT: " );
        OutputDebugStringA( msg );
        OutputDebugStringA( "\n" );
        int *base = 0;
        *base = 1;
    }
    
}

And that should be all we need that is Windows specific. Next we will start talking about Vulkan proper and its specific quirks.

Creating a Vulkan Instance

[Commit: 52259bb]

A word on Vulkan data structures and their use: filling them follows a generic mechanism, and their main use is to pass function parameters. Here is an example:

VkApplicationInfo applicationInfo;
applicationInfo.sType = VK_STRUCTURE_TYPE_APPLICATION_INFO; // sType is a member of all structs
applicationInfo.pNext = NULL;                               // as is pNext (and, in most structs, flags)
applicationInfo.pApplicationName = "First Test";            // The name of our application
applicationInfo.pEngineName = NULL;                         // The name of the engine
applicationInfo.engineVersion = 1;                          // The version of the engine
applicationInfo.apiVersion = VK_MAKE_VERSION(1, 0, 0);      // The version of Vulkan we're using

Now, if we take a look at what the specification has to say about VkApplicationInfo we find out that most of these fields can be zero. In all cases .sType is known (always VK_STRUCTURE_TYPE_<uppercase_structure_name>). While at some points in this tutorial I will try to be explicit about most of the values we use to fill up a data structure, I might leave some members at 0, because I will always be zero-initialising the structs like this:

VkApplicationInfo applicationInfo = { };   // notice me senpai!
applicationInfo.sType = VK_STRUCTURE_TYPE_APPLICATION_INFO;
applicationInfo.pApplicationName = "First Test";
applicationInfo.engineVersion = 1;
applicationInfo.apiVersion = VK_MAKE_VERSION(1, 0, 0);

Next, almost all functions return a VkResult enum. So, let's write a simple helper leveraging our awesome debug facilities:

void checkVulkanResult( VkResult &result, char *msg ) {
    assert( result == VK_SUCCESS, msg );
}

During the creation of the graphics pipeline we will be setting up a whole lot of state and creating/initialising a whole lot of "context". To help us keep track of all this Vulkan state, we will create the following:

struct vulkan_context {

    uint32_t width;
    uint32_t height;

    VkInstance instance;
};

vulkan_context context;

This context will grow... but for now let's keep marching. You probably noticed that I sneaked in a thing called an instance into our context. Vulkan keeps no global state at all; every time Vulkan requires some application state you will need to pass it your VkInstance. This is true for many constructs, including our graphics pipeline. It's just one of the things we need to create, initialise, and keep around. So let's do it.

Because this process will repeat itself for almost all function calls, I will be a bit more detailed for this first instance (pun intended!).
So, checking the spec, to create a VkInstance we need to call:

VkResult vkCreateInstance( const VkInstanceCreateInfo* pCreateInfo,
                           const VkAllocationCallbacks* pAllocator, 
                           VkInstance* pInstance);

A quick note about allocators: as a rule of thumb, whenever a function asks for a pAllocator you can pass NULL and Vulkan will use the default allocator. Using a custom allocator is not a topic I will be covering in this tutorial. Suffice it to notice them and know that Vulkan does allow your application to control its memory allocation.

Now, the process I was talking about is that the function requires you to fill some data structure, generally some Vk*CreateInfo, and pass it to the Vulkan function, in this case vkCreateInstance(), which returns the result in its last parameter:

VkInstanceCreateInfo instanceInfo = { };
instanceInfo.sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;
instanceInfo.pApplicationInfo = &applicationInfo;
instanceInfo.enabledLayerCount = 0;
instanceInfo.ppEnabledLayerNames = NULL;
instanceInfo.enabledExtensionCount = 0;
instanceInfo.ppEnabledExtensionNames = NULL;

result = vkCreateInstance( &instanceInfo, NULL, &context.instance );
checkVulkanResult( result, "Failed to create vulkan instance." );

You can compile and run this code, but nothing new will happen... We need to fill the instance info with the validation layers we might want to use and with the extensions we will require, so that we can do something more interesting than a white window...

Validation Layers

[Commit: eb1cf65]

One of the core principles of Vulkan is efficiency. The counterpart to this is that validation and error checking are basically nonexistent! Vulkan will indeed crash and/or exhibit undefined behaviour if you make a mistake. This is all fine, but while developing our application we might want to know why it is not showing what we expect or, when it crashed, exactly why it crashed.

Enter Validation Layers.

Vulkan is a layered API. There is a core layer that we are calling into, but in between our API calls and the loader other "layers" can intercept them. The ones we are interested in here are the validation layers that will help us debug and track problems with our usage of the API.
You want to develop your application with these layers on, but when shipping you should disable them.

To find out the layers our loader knows about we need to call:

uint32_t layerCount = 0;
vkEnumerateInstanceLayerProperties( &layerCount, NULL );

assert( layerCount != 0, "Failed to find any layer in your system." );

VkLayerProperties *layersAvailable = new VkLayerProperties[layerCount];
vkEnumerateInstanceLayerProperties( &layerCount, layersAvailable );

(Don't forget to add the declaration at the top and the loading of the vkEnumerateInstanceLayerProperties to the win32_LoadVulkan() function.)
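
If you are not following along with the repo, that loading code follows exactly the same pattern as the vkCreateInstance() one. A minimal sketch:

PFN_vkEnumerateInstanceLayerProperties vkEnumerateInstanceLayerProperties = NULL;

// inside win32_LoadVulkan():
vkEnumerateInstanceLayerProperties = (PFN_vkEnumerateInstanceLayerProperties) 
    GetProcAddress( vulkan_module, "vkEnumerateInstanceLayerProperties" );
assert( vkEnumerateInstanceLayerProperties, 
        "Failed to load vkEnumerateInstanceLayerProperties function pointer." );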

This is another recurring mechanism: we call the function twice. The first time we pass NULL as the VkLayerProperties pointer to query the layer count. Then we allocate enough space to hold that many elements and call the function a second time to fill our data structures.

If you run this piece of code you might notice that you found no layers at all... This is because, at least on my system, the loader could not find any layer. To get some validation layers we need the SDK and/or to compile the code in Vulkan-LoaderAndValidationLayers.git.

What I found out while trying to figure out whether you need the SDK or not is that you only need the *.json and the *.dll of the layer you want somewhere in your project folder, and then you can set the VK_LAYER_PATH environment variable to the path of the folder with those files. I kinda prefer this solution over the more obscure way where the SDK sets up layer information in the Windows registry key HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Khronos\Vulkan\ExplicitLayers, because this way you can better control which ones are loaded by your application. (I do wonder about the security problems this might raise?)

The layer we will be using is called VK_LAYER_LUNARG_standard_validation. This layer works as a kind of superset of a bunch of other layers. [This one comes from the SDK.] So, I will assume you have either installed the SDK or you have copied the VkLayer_*.dll and VkLayer_*.json files of the layers you want to use to a layers folder and set VK_LAYER_PATH=/path/to/layers/folder.

We can now complete this validation layer section by making sure we found the VK_LAYER_LUNARG_standard_validation layer and configuring the instance with this info:

bool foundValidation = false;
for( uint32_t i = 0; i < layerCount; ++i ) {
    if( strcmp( layersAvailable[i].layerName, "VK_LAYER_LUNARG_standard_validation" ) == 0 ) {
        foundValidation = true;
    }
}
assert( foundValidation, "Could not find validation layer." );
const char *layers[] = { "VK_LAYER_LUNARG_standard_validation" };
// update the VkInstanceCreateInfo with:
instanceInfo.enabledLayerCount = 1;
instanceInfo.ppEnabledLayerNames = layers;

The sad thing is, this commit will still produce the same result as before. We need to handle the extensions to start producing some debug info.

Extensions

[Commit: 9c416b3]

Much like in OpenGL and other APIs, extensions can add functionality to Vulkan that is not part of the core API.
To start debugging our application we need the VK_EXT_debug_report extension. The following code is similar to the layer loading code, the notable difference being that we are looking for 3 specific extensions. I am sneaking in two other extensions that we will need later, so don't worry about them for now.

uint32_t extensionCount = 0;
vkEnumerateInstanceExtensionProperties( NULL, &extensionCount, NULL );
VkExtensionProperties *extensionsAvailable = new VkExtensionProperties[extensionCount];
vkEnumerateInstanceExtensionProperties( NULL, &extensionCount, extensionsAvailable );

const char *extensions[] = { "VK_KHR_surface", "VK_KHR_win32_surface", "VK_EXT_debug_report" };
uint32_t numberRequiredExtensions = sizeof(extensions) / sizeof(char*);
uint32_t foundExtensions = 0;
for( uint32_t i = 0; i < extensionCount; ++i ) {
    for( uint32_t j = 0; j < numberRequiredExtensions; ++j ) {
        if( strcmp( extensionsAvailable[i].extensionName, extensions[j] ) == 0 ) {
            foundExtensions++;
        }
    }
}
assert( foundExtensions == numberRequiredExtensions, "Could not find the required extensions" );

This extension adds three new functions: vkCreateDebugReportCallbackEXT(), vkDestroyDebugReportCallbackEXT(), and vkDebugReportMessageEXT().
Because these functions are not part of the core Vulkan API, we cannot load them the same way we have been loading the other functions. We need to use vkGetInstanceProcAddr(). Once we add that function to win32_LoadVulkan() we can define another helper function that should look familiar:

PFN_vkCreateDebugReportCallbackEXT vkCreateDebugReportCallbackEXT = NULL;
PFN_vkDestroyDebugReportCallbackEXT vkDestroyDebugReportCallbackEXT = NULL;
PFN_vkDebugReportMessageEXT vkDebugReportMessageEXT = NULL;

void win32_LoadVulkanExtensions( vulkan_context &context ) {

    *(void **)&vkCreateDebugReportCallbackEXT = vkGetInstanceProcAddr( context.instance, 
                                                "vkCreateDebugReportCallbackEXT" );
    *(void **)&vkDestroyDebugReportCallbackEXT = vkGetInstanceProcAddr( context.instance, 
                                                "vkDestroyDebugReportCallbackEXT" );
    *(void **)&vkDebugReportMessageEXT = vkGetInstanceProcAddr( context.instance, 
                                                "vkDebugReportMessageEXT" );
}

The extension expects us to provide a callback to which all debugging info will be delivered. Here is ours:

VKAPI_ATTR VkBool32 VKAPI_CALL MyDebugReportCallback( VkDebugReportFlagsEXT flags, 
    VkDebugReportObjectTypeEXT objectType, uint64_t object, size_t location, 
    int32_t messageCode, const char* pLayerPrefix, const char* pMessage, void* pUserData ) {

    OutputDebugStringA( pLayerPrefix );
    OutputDebugStringA( " " );
    OutputDebugStringA( pMessage );
    OutputDebugStringA( "\n" );
    return VK_FALSE;
}

Nothing fancy, as we only need to know the layer the message is coming from and the message itself.
I have not yet talked about this, but I normally debug with Visual Studio. I told you I don't use the IDE, but for debugging there really is no alternative. What I do is simply start a debugging session with devenv .\build\main.exe. You might need to load main.cpp and then you are set to start setting breakpoints, watches, etc...

The only thing missing is adding the call to load our Vulkan extension functions, registering our callback, and destroying it at the end of the app:
(Notice that we can control the kind of reporting we want with callbackCreateInfo.flags and that we added a VkDebugReportCallbackEXT member to our vulkan_context structure.)

win32_LoadVulkanExtensions( context );
	                        
VkDebugReportCallbackCreateInfoEXT callbackCreateInfo = { };
callbackCreateInfo.sType = VK_STRUCTURE_TYPE_DEBUG_REPORT_CREATE_INFO_EXT;
callbackCreateInfo.flags =  VK_DEBUG_REPORT_ERROR_BIT_EXT |
                            VK_DEBUG_REPORT_WARNING_BIT_EXT |
                            VK_DEBUG_REPORT_PERFORMANCE_WARNING_BIT_EXT;
callbackCreateInfo.pfnCallback = &MyDebugReportCallback;
callbackCreateInfo.pUserData = NULL;

result = vkCreateDebugReportCallbackEXT( context.instance, &callbackCreateInfo, 
                                         NULL, &context.callback );
checkVulkanResult( result, "Failed to create degub report callback." );

When finished we can clean up with:

vkDestroyDebugReportCallbackEXT( context.instance, context.callback, NULL );

So, we are now ready to start creating our rendering surfaces, but for that I need to explain those two extra extensions.

Devices

[Commit: b5d2444]

We have everything in place to start setting up our Windows rendering backend. Now we need to create a rendering surface and find out which of our machine's physical devices support rendering to it. This is where the two extra extensions we sneaked into our instance creation come in: VK_KHR_surface and VK_KHR_win32_surface. The VK_KHR_surface extension should be present on all systems, as it abstracts each platform's way of showing a native window/surface. Then we have another extension that is responsible for creating the VkSurface on a particular system; for Windows this is VK_KHR_win32_surface.

Before that though, a word about physical and logical devices, and queues. A physical device represents a single GPU on your system, and you can have several. A logical device is how the application keeps track of its use of the physical device. Each physical device defines the number and type of queues it supports (think compute and graphics queues). What we need to do is enumerate the physical devices in our system and pick the one we want to use. In this tutorial we will just pick the first one we find that has a graphics queue and that can present our renderings... if we cannot find any, we fail miserably!

We start by creating a surface for our rendering that is connected to the window we created. (Notice that vkCreateWin32SurfaceKHR() is an instance-level function provided by the VK_KHR_win32_surface extension. You must add it to win32_LoadVulkanExtensions().)

VkWin32SurfaceCreateInfoKHR surfaceCreateInfo = {};
surfaceCreateInfo.sType = VK_STRUCTURE_TYPE_WIN32_SURFACE_CREATE_INFO_KHR;
surfaceCreateInfo.hinstance = hInstance;
surfaceCreateInfo.hwnd = windowHandle;

result = vkCreateWin32SurfaceKHR( context.instance, &surfaceCreateInfo, NULL, &context.surface );
checkVulkanResult( result, "Could not create surface." );

Next, we need to iterate over all physical devices and find the one that supports rendering to this surface and has a graphics queue:

uint32_t physicalDeviceCount = 0;
vkEnumeratePhysicalDevices( context.instance, &physicalDeviceCount, NULL );
VkPhysicalDevice *physicalDevices = new VkPhysicalDevice[physicalDeviceCount];
vkEnumeratePhysicalDevices( context.instance, &physicalDeviceCount, physicalDevices );
    
for( uint32_t i = 0; i < physicalDeviceCount; ++i ) {
        
    VkPhysicalDeviceProperties deviceProperties = {};
    vkGetPhysicalDeviceProperties( physicalDevices[i], &deviceProperties );

    uint32_t queueFamilyCount = 0;
    vkGetPhysicalDeviceQueueFamilyProperties( physicalDevices[i], &queueFamilyCount, NULL );
    VkQueueFamilyProperties *queueFamilyProperties = new VkQueueFamilyProperties[queueFamilyCount];
    vkGetPhysicalDeviceQueueFamilyProperties( physicalDevices[i], 
                                              &queueFamilyCount, 
                                              queueFamilyProperties );

    for( uint32_t j = 0; j < queueFamilyCount; ++j ) {

        VkBool32 supportsPresent;
        vkGetPhysicalDeviceSurfaceSupportKHR( physicalDevices[i], j, context.surface, 
                                              &supportsPresent );

        if( supportsPresent && ( queueFamilyProperties[j].queueFlags & VK_QUEUE_GRAPHICS_BIT ) ) {
            context.physicalDevice = physicalDevices[i];
            context.physicalDeviceProperties = deviceProperties;
            context.presentQueueIdx = j;
            break;
        }
    }
    delete[] queueFamilyProperties;

    if( context.physicalDevice ) {
        break;
    }   
}
delete[] physicalDevices;
    
assert( context.physicalDevice, "No physical device detected that can render and present!" );

That is a lot of code, but we have already seen something similar for most of it. First, there are a lot of new functions that you need to load dynamically (check the repo code) and our vulkan_context gained some new members. Of note is that we now know the index of a queue family on the physical device to which we can submit our rendering work.
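
For reference, at this point our vulkan_context has grown to something like this (a sketch listing only the members used so far; check the repo for the exact version):

struct vulkan_context {

    uint32_t width;
    uint32_t height;

    VkInstance instance;
    VkDebugReportCallbackEXT callback;
    VkSurfaceKHR surface;

    VkPhysicalDevice physicalDevice;
    VkPhysicalDeviceProperties physicalDeviceProperties;
    uint32_t presentQueueIdx;
};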

What is missing is creating the logical device, i.e. our connection to the physical device. I will again sneak in something we will be using in the next step: the VK_KHR_swapchain device extension:

// info for accessing one of the device's rendering queues:
VkDeviceQueueCreateInfo queueCreateInfo = {};
queueCreateInfo.sType = VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO;
queueCreateInfo.queueFamilyIndex = context.presentQueueIdx;
queueCreateInfo.queueCount = 1;
float queuePriorities[] = { 1.0f };   // ask for highest priority for our queue. (range [0,1])
queueCreateInfo.pQueuePriorities = queuePriorities;

VkDeviceCreateInfo deviceInfo = {};
deviceInfo.sType = VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO;
deviceInfo.queueCreateInfoCount = 1;
deviceInfo.pQueueCreateInfos = &queueCreateInfo;
deviceInfo.enabledLayerCount = 1;
deviceInfo.ppEnabledLayerNames = layers;
    
const char *deviceExtensions[] = { "VK_KHR_swapchain" };
deviceInfo.enabledExtensionCount = 1;
deviceInfo.ppEnabledExtensionNames = deviceExtensions;

VkPhysicalDeviceFeatures features = {};
features.shaderClipDistance = VK_TRUE;
deviceInfo.pEnabledFeatures = &features;

result = vkCreateDevice( context.physicalDevice, &deviceInfo, NULL, &context.device );
checkVulkanResult( result, "Failed to create logical device!" );

Don't forget to remove the layers information when you stop debugging your application. VkPhysicalDeviceFeatures gives us access to the fine-grained optional features that an implementation may support; they are enabled per feature, and you can check the spec for the full list of members. Our shader will require this one particular feature (shaderClipDistance) to be enabled; without it our pipeline does not work properly. (By the way, I got this information out of the validation layers. So they are useful!) Next we will create our swap chain, which will finally enable us to put something on the screen.
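
Before that, one small aside: we enabled shaderClipDistance without asking the device first. If you want to be safe, a quick check like the following can go right before filling pEnabledFeatures. This is a minimal sketch, not part of the repo code, and vkGetPhysicalDeviceFeatures needs to be loaded in win32_LoadVulkan() like every other function:

// make sure the feature we are about to enable is actually supported:
VkPhysicalDeviceFeatures supportedFeatures = {};
vkGetPhysicalDeviceFeatures( context.physicalDevice, &supportedFeatures );
assert( supportedFeatures.shaderClipDistance == VK_TRUE, 
        "shaderClipDistance is not supported on this device." );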

Swap Chain

[Commit: 3f07df7]

Now that we have the surface we need to get a handle on the image buffers we will be writing to. We use the swap chain extension to do this. On creation we pass the number of buffers we want (think single/double/n-buffered), the resolution, the color format and color space, and the presentation mode. There is a significant amount of setup before we can create a swap chain, but there is nothing hard to understand.

We start by figuring out what color format and color space we will be using:

uint32_t formatCount = 0;
vkGetPhysicalDeviceSurfaceFormatsKHR( context.physicalDevice, context.surface, 
                                      &formatCount, NULL );
VkSurfaceFormatKHR *surfaceFormats = new VkSurfaceFormatKHR[formatCount];
vkGetPhysicalDeviceSurfaceFormatsKHR( context.physicalDevice, context.surface, 
                                      &formatCount, surfaceFormats );

// If the format list includes just one entry of VK_FORMAT_UNDEFINED, the surface has
// no preferred format. Otherwise, at least one supported format will be returned.
VkFormat colorFormat;
if( formatCount == 1 && surfaceFormats[0].format == VK_FORMAT_UNDEFINED ) {
    colorFormat = VK_FORMAT_B8G8R8_UNORM;
} else {
    colorFormat = surfaceFormats[0].format;
}
VkColorSpaceKHR colorSpace;
colorSpace = surfaceFormats[0].colorSpace;
delete[] surfaceFormats;

Next we need to check the surface capabilities to figure out the number of buffers we can ask for and the resolution we will be using. We also need to decide if we will be applying some surface transformation (like rotating 90 degrees... we are not). We must make sure that the resolution we ask for in the swap chain matches surfaceCapabilities.currentExtent. In the case where both width and height are -1 (it is always both or neither!), the surface size is undefined and can effectively be set to any value. However, if the size is set, the swap chain size MUST match!

VkSurfaceCapabilitiesKHR surfaceCapabilities = {};
vkGetPhysicalDeviceSurfaceCapabilitiesKHR( context.physicalDevice, context.surface, 
                                           &surfaceCapabilities );

// we are effectively looking for double-buffering:
// if surfaceCapabilities.maxImageCount == 0 there is actually no limit on the number of images! 
uint32_t desiredImageCount = 2;
if( desiredImageCount < surfaceCapabilities.minImageCount ) {
    desiredImageCount = surfaceCapabilities.minImageCount;
} else if( surfaceCapabilities.maxImageCount != 0 && 
           desiredImageCount > surfaceCapabilities.maxImageCount ) {
    desiredImageCount = surfaceCapabilities.maxImageCount;
}

VkExtent2D surfaceResolution =  surfaceCapabilities.currentExtent;
if( surfaceResolution.width == -1 ) {
    surfaceResolution.width = context.width;
    surfaceResolution.height = context.height;
} else {
    context.width = surfaceResolution.width;
    context.height = surfaceResolution.height;
}

VkSurfaceTransformFlagBitsKHR preTransform = surfaceCapabilities.currentTransform;
if( surfaceCapabilities.supportedTransforms & VK_SURFACE_TRANSFORM_IDENTITY_BIT_KHR ) {
    preTransform = VK_SURFACE_TRANSFORM_IDENTITY_BIT_KHR;
}

For the presentation mode we have some options. VK_PRESENT_MODE_MAILBOX_KHR maintains a single-entry queue for presentation, from which it removes an entry at every vertical sync if the queue is not empty; when a new frame is committed it replaces the one already waiting. So, in a sense it does not vertically synchronise, because a frame might not be displayed at all if a newer one was generated in between syncs, yet it does not screen-tear either. This is our preferred presentation mode if supported, as it is the lowest-latency non-tearing mode. VK_PRESENT_MODE_IMMEDIATE_KHR does not vertically synchronise and will screen-tear if a frame is late. VK_PRESENT_MODE_FIFO_RELAXED_KHR keeps a queue and will v-sync, but will screen-tear if a frame is late. VK_PRESENT_MODE_FIFO_KHR is similar to the previous one but it won't screen-tear. This is the only present mode that the spec requires to be supported, and as such it is our default value:

uint32_t presentModeCount = 0;
vkGetPhysicalDeviceSurfacePresentModesKHR( context.physicalDevice, context.surface, 
                                           &presentModeCount, NULL );
VkPresentModeKHR *presentModes = new VkPresentModeKHR[presentModeCount];
vkGetPhysicalDeviceSurfacePresentModesKHR( context.physicalDevice, context.surface, 
                                           &presentModeCount, presentModes );

VkPresentModeKHR presentationMode = VK_PRESENT_MODE_FIFO_KHR;   // always supported.
for( uint32_t i = 0; i < presentModeCount; ++i ) {
    if( presentModes[i] == VK_PRESENT_MODE_MAILBOX_KHR ) {
        presentationMode = VK_PRESENT_MODE_MAILBOX_KHR;
        break;
    }   
}
delete[] presentModes;

And the only thing missing is putting this all together and creating our swap chain:

VkSwapchainCreateInfoKHR swapChainCreateInfo = {};
swapChainCreateInfo.sType = VK_STRUCTURE_TYPE_SWAPCHAIN_CREATE_INFO_KHR;
swapChainCreateInfo.surface = context.surface;
swapChainCreateInfo.minImageCount = desiredImageCount;
swapChainCreateInfo.imageFormat = colorFormat;
swapChainCreateInfo.imageColorSpace = colorSpace;
swapChainCreateInfo.imageExtent = surfaceResolution;
swapChainCreateInfo.imageArrayLayers = 1;
swapChainCreateInfo.imageUsage = VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT;
swapChainCreateInfo.imageSharingMode = VK_SHARING_MODE_EXCLUSIVE;   // <--
swapChainCreateInfo.preTransform = preTransform;
swapChainCreateInfo.compositeAlpha = VK_COMPOSITE_ALPHA_OPAQUE_BIT_KHR;
swapChainCreateInfo.presentMode = presentationMode;
swapChainCreateInfo.clipped = true;     // If we want clipping outside the extents
                                        // (remember our device features?)

result = vkCreateSwapchainKHR( context.device, &swapChainCreateInfo, NULL, &context.swapChain );
checkVulkanResult( result, "Failed to create swapchain." );

The sharing mode deserves a note. In all the code of this tutorial there is no sharing of work queues or any other resource. Managing multiple work queues and synchronising their execution is worth investigating in another tutorial, as this is one of the main benefits of Vulkan over other APIs, like OpenGL.

Our swap chain is now created and ready to use. But before moving on we need to talk about image layouts, which will lead us to talk about memory barriers, semaphores, and fences, essential Vulkan constructs that we must use and understand.
The swap chain provides us with the number of VkImages we asked for in desiredImageCount. It has allocated and owns the resources backing these images. A VkImage is created in either the VK_IMAGE_LAYOUT_UNDEFINED or the VK_IMAGE_LAYOUT_PREINITIALIZED layout. To be able to, for example, render to an image, its layout must change to either VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL or VK_IMAGE_LAYOUT_GENERAL.

So what are layouts and why are they important? The image data is stored in memory in an implementation-dependent way. By knowing the intended use of a specific piece of memory beforehand, and possibly applying limitations to what kind of operations are possible on the data, implementations can make decisions about how the data is stored that make accesses more performant. Image layout transitions can be costly and require us to synchronise all access by using memory barriers when changing layouts. For example, transitioning from VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL to VK_IMAGE_LAYOUT_PRESENT_SRC_KHR requires us to make sure we are done writing all our color information before the image is moved to the present layout. We accomplish this by calling vkCmdPipelineBarrier(). Here is the function definition:

void vkCmdPipelineBarrier( VkCommandBuffer commandBuffer,
                           VkPipelineStageFlags srcStageMask,
                           VkPipelineStageFlags dstStageMask,
                           VkDependencyFlags dependencyFlags,
                           uint32_t memoryBarrierCount,
                           const VkMemoryBarrier* pMemoryBarriers,
                           uint32_t bufferMemoryBarrierCount,
                           const VkBufferMemoryBarrier* pBufferMemoryBarriers,
                           uint32_t imageMemoryBarrierCount,
                           const VkImageMemoryBarrier* pImageMemoryBarriers);

This one function allows us to insert into our queues an execution dependency and a set of memory dependencies between the commands recorded before and after the barrier in the command buffer. vkCmdPipelineBarrier() is part of the set of vkCmd*() functions that record work into a command buffer that can later be submitted to our work queues. There is a lot going on here... First, you must have already realised that command buffers are recorded ahead of time and that you must take care that, at processing (submit) time, your commands are processed in the order you intend. We will make a small detour from our swap chain to learn about queues, command buffers and submitting work.

Queues & Command Buffers

[Commit: 6734ea6]

Command buffers are submitted to a work queue. Queues are created at logical device creation time. If you look back you will see that we filled up a VkDeviceQueueCreateInfo before we created the logical device. This created our graphics queue, to which we can submit our rendering commands. The only thing missing is getting the queue's handle and storing it in our vulkan_context structure:

vkGetDeviceQueue( context.device, context.presentQueueIdx, 0, &context.presentQueue );

To create command buffers we need to create a command pool. Command pools are opaque objects from which we allocate command buffers. They allow the Vulkan implementation to amortise the cost of resource creation across multiple command buffers.

VkCommandPoolCreateInfo commandPoolCreateInfo = {};
commandPoolCreateInfo.sType = VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO;
commandPoolCreateInfo.flags = VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT;
commandPoolCreateInfo.queueFamilyIndex = context.presentQueueIdx;

VkCommandPool commandPool;
result = vkCreateCommandPool( context.device, &commandPoolCreateInfo, NULL, &commandPool );
checkVulkanResult( result, "Failed to create command pool." );

Command buffers allocated from this command pool can be reset individually (instead of only with an entire pool reset) and can only be submitted to our work queue. We are finally ready to create a couple of command buffers. We will create one for our setup and another one exclusively for our rendering commands:

VkCommandBufferAllocateInfo commandBufferAllocationInfo = {};
commandBufferAllocationInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO;
commandBufferAllocationInfo.commandPool = commandPool;
commandBufferAllocationInfo.level = VK_COMMAND_BUFFER_LEVEL_PRIMARY;
commandBufferAllocationInfo.commandBufferCount = 1;

result = vkAllocateCommandBuffers( context.device, &commandBufferAllocationInfo, 
                                   &context.setupCmdBuffer );
checkVulkanResult( result, "Failed to allocate setup command buffer." );

result = vkAllocateCommandBuffers( context.device, &commandBufferAllocationInfo, 
                                   &context.drawCmdBuffer );
checkVulkanResult( result, "Failed to allocate draw command buffer." );

Command buffers start and end recording with:

VkResult vkBeginCommandBuffer( VkCommandBuffer commandBuffer,
                               const VkCommandBufferBeginInfo* pBeginInfo);
                               
VkResult vkEndCommandBuffer( VkCommandBuffer commandBuffer);

In between these two functions we can record commands with the vkCmd*() class of functions.
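
Here is the shape that takes, as a minimal sketch (we will do exactly this, for real, in the image layout code below):

VkCommandBufferBeginInfo beginInfo = {};
beginInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;
beginInfo.flags = VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT;

vkBeginCommandBuffer( context.setupCmdBuffer, &beginInfo );
// ... vkCmd*() calls are recorded here ...
vkEndCommandBuffer( context.setupCmdBuffer );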

To submit our command buffers we call:

VkResult vkQueueSubmit( VkQueue queue,
                        uint32_t submitCount,
                        const VkSubmitInfo* pSubmits,
                        VkFence fence );

We will take a closer look at both VkCommandBufferBeginInfo and VkSubmitInfo soon, in the code that changes an image layout, but there is one more very important topic we need to talk about: synchronisation.
For this tutorial we are only worried about synchronising our queue submits and the commands within a command buffer. Vulkan provides a set of synchronisation primitives that includes fences, semaphores and events. Vulkan also offers barriers to help with cache control and flow (exactly what we need for the image layout).

We are not using events in this tutorial. Fences and semaphores are your typical constructs: they can be in a "signaled" or "unsignaled" state.
Fences are normally used by the host to determine completion of work submitted to queues (as you saw, a fence is a parameter of vkQueueSubmit()). Semaphores are used to coordinate operations between queues and between submissions to the same queue. They are signaled by queues and can be waited on in the same or a different queue. We will be using semaphores later on, when we get to our rendering.
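
For reference, creating a semaphore looks like any other Vulkan object creation. A minimal sketch (the semaphore name here is mine, and, as always, vkCreateSemaphore must be loaded in win32_LoadVulkan() first):

VkSemaphoreCreateInfo semaphoreCreateInfo = {};
semaphoreCreateInfo.sType = VK_STRUCTURE_TYPE_SEMAPHORE_CREATE_INFO;

VkSemaphore presentCompleteSemaphore = VK_NULL_HANDLE;
result = vkCreateSemaphore( context.device, &semaphoreCreateInfo, NULL, &presentCompleteSemaphore );
checkVulkanResult( result, "Failed to create semaphore." );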

I will repeat myself, but this is important: to make proper use of Vulkan you need to know about synchronisation. I advise you to read chapter 6 of the specification. Ok, onwards to the image layout changing!

Image Layouts

[Commit: a85127b]

[Well, this was bound to happen, wasn't it?... Even if the validation layers have nothing to say about it, there is some incorrect usage of the API in this section. We are not allowed to do the memory barrier/layout change on the swap chain images before we acquire them! This does not however invalidate this chapter. While I rework it, please check this commit for a better/correct way of doing what we do here. (Thanks to ratchet freak for pointing this out!)]

What we will be doing now is grabbing the images that the swap chain created for us and moving them from the VK_IMAGE_LAYOUT_UNDEFINED layout they are initialised in to the VK_IMAGE_LAYOUT_PRESENT_SRC_KHR layout they need to be in to be presented. (At this point we are not yet talking about rendering to them. We will get there... eventually... maybe... don't lose hope!)

At some point we will need to access these images for reading/writing. We cannot do that with VkImages directly; image objects are not accessed by the pipeline as such. We need to create a VkImageView, which represents a contiguous range of the image plus additional metadata that allows access to the image data. There is another significant amount of code incoming, so let's start:

uint32_t imageCount = 0;
vkGetSwapchainImagesKHR( context.device, context.swapChain, &imageCount, NULL );
context.presentImages = new VkImage[imageCount];
vkGetSwapchainImagesKHR( context.device, context.swapChain, &imageCount, context.presentImages );

VkImageViewCreateInfo presentImagesViewCreateInfo = {};
presentImagesViewCreateInfo.sType = VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO;
presentImagesViewCreateInfo.viewType = VK_IMAGE_VIEW_TYPE_2D;
presentImagesViewCreateInfo.format = colorFormat;
presentImagesViewCreateInfo.components = { VK_COMPONENT_SWIZZLE_R, 
                                           VK_COMPONENT_SWIZZLE_G, 
                                           VK_COMPONENT_SWIZZLE_B, 
                                           VK_COMPONENT_SWIZZLE_A };
presentImagesViewCreateInfo.subresourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
presentImagesViewCreateInfo.subresourceRange.baseMipLevel = 0;
presentImagesViewCreateInfo.subresourceRange.levelCount = 1;
presentImagesViewCreateInfo.subresourceRange.baseArrayLayer = 0;
presentImagesViewCreateInfo.subresourceRange.layerCount = 1;

The first thing we do is get the swap chain images and store them in our context; we will need them later. Next we fill up a reusable structure that should by now be familiar. Yes, this is what we will be passing to the function that creates a VkImageView. We still need more init code:

VkCommandBufferBeginInfo beginInfo = {};
beginInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;
beginInfo.flags = VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT;

VkFenceCreateInfo fenceCreateInfo = {};
fenceCreateInfo.sType = VK_STRUCTURE_TYPE_FENCE_CREATE_INFO;
VkFence submitFence;
vkCreateFence( context.device, &fenceCreateInfo, NULL, &submitFence );

Because we will be recording some commands and submitting them to our queue, we need both a VkCommandBufferBeginInfo and a VkFence. Next, we can start looping over the present images and changing their layout:

VkImageView *presentImageViews = new VkImageView[imageCount];
for( uint32_t i = 0; i < imageCount; ++i ) {

    // complete VkImageViewCreateInfo with image i:
    presentImagesViewCreateInfo.image = context.presentImages[i];

    // start recording on our setup command buffer:
    vkBeginCommandBuffer( context.setupCmdBuffer, &beginInfo );

    VkImageMemoryBarrier layoutTransitionBarrier = {};
    layoutTransitionBarrier.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
    layoutTransitionBarrier.srcAccessMask = 0; 
    layoutTransitionBarrier.dstAccessMask = VK_ACCESS_MEMORY_READ_BIT;
    layoutTransitionBarrier.oldLayout = VK_IMAGE_LAYOUT_UNDEFINED;
    layoutTransitionBarrier.newLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR;
    layoutTransitionBarrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    layoutTransitionBarrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    layoutTransitionBarrier.image = context.presentImages[i];
    VkImageSubresourceRange resourceRange = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 };
    layoutTransitionBarrier.subresourceRange = resourceRange;

    vkCmdPipelineBarrier(   context.setupCmdBuffer, 
                            VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, 
                            VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, 
                            0,
                            0, NULL,
                            0, NULL, 
                            1, &layoutTransitionBarrier );

    vkEndCommandBuffer( context.setupCmdBuffer );

    // submitting code to the queue:
    VkPipelineStageFlags waitStageMask[] = { VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT };
    VkSubmitInfo submitInfo = {};
    submitInfo.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;
    submitInfo.waitSemaphoreCount = 0;
    submitInfo.pWaitSemaphores = NULL;
    submitInfo.pWaitDstStageMask = waitStageMask;
    submitInfo.commandBufferCount = 1;
    submitInfo.pCommandBuffers = &context.setupCmdBuffer;
    submitInfo.signalSemaphoreCount = 0;
    submitInfo.pSignalSemaphores = NULL;
    result = vkQueueSubmit( context.presentQueue, 1, &submitInfo, submitFence );

    // waiting for it to finish:
    vkWaitForFences( context.device, 1, &submitFence, VK_TRUE, UINT64_MAX );
    vkResetFences( context.device, 1, &submitFence );

    vkResetCommandBuffer( context.setupCmdBuffer, 0 );

    // create the image view:
    result = vkCreateImageView( context.device, &presentImagesViewCreateInfo, NULL, 
                                &presentImageViews[i] );
    checkVulkanResult( result, "Could not create ImageView." );
}

Don't be scared by the amount of code... it is divided into 3 sections.

The first one records our pipeline barrier command, which changes the image layout from oldLayout to newLayout. The important parts are the srcAccessMask and dstAccessMask, which place a memory access barrier between the commands that execute before and the commands that execute after this vkCmdPipelineBarrier(). Basically we are saying that commands that come after this barrier and need read access to this image memory must wait. In this case there are no other commands, but we will be doing something similar in our render function, where this is not the case!

The second part is the actual submission of work to the queue with vkQueueSubmit() and then waiting for the work to finish by waiting on the fence to be signaled. We pass in the setup command buffer we just finished recording and the fence we will wait on.

The last part is the image view creation, where we make use of the structure we created outside the loop. Note that we could have recorded both vkCmdPipelineBarrier() calls into the same command buffer and then submitted the work just once.
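
Roughly, that alternative would look like this (a sketch only; the repo code keeps the per-image submit shown above):

vkBeginCommandBuffer( context.setupCmdBuffer, &beginInfo );
for( uint32_t i = 0; i < imageCount; ++i ) {
    // ...fill a VkImageMemoryBarrier layoutTransitionBarrier for
    // context.presentImages[i] exactly as above, then record it:
    vkCmdPipelineBarrier( context.setupCmdBuffer,
                          VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT,
                          VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT,
                          0,
                          0, NULL,
                          0, NULL,
                          1, &layoutTransitionBarrier );
}
vkEndCommandBuffer( context.setupCmdBuffer );
// ...followed by a single vkQueueSubmit()/vkWaitForFences(), and then the
// vkCreateImageView() loop for all the images.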

We have covered a lot of ground already but we still have nothing to show for it... Before we go about creating our Framebuffers, we will take it easy for a bit and make the window change color just because.

Rendering Black

[Commit: 0ab9bbf]

We currently have a set of images that we can ping-pong and show in our window. True, they have nothing on them, but we can already set up our rendering loop. And that nothing is actually black, which is remarkably different from white! So, we add some platform code and define our broken render function next:

void render( ) {

    uint32_t nextImageIdx;
    vkAcquireNextImageKHR( context.device, context.swapChain, UINT64_MAX,
                           VK_NULL_HANDLE, VK_NULL_HANDLE, &nextImageIdx );  

    VkPresentInfoKHR presentInfo = {};
    presentInfo.sType = VK_STRUCTURE_TYPE_PRESENT_INFO_KHR;
    presentInfo.pNext = NULL;
    presentInfo.waitSemaphoreCount = 0;
    presentInfo.pWaitSemaphores = NULL;
    presentInfo.swapchainCount = 1;
    presentInfo.pSwapchains = &context.swapChain;
    presentInfo.pImageIndices = &nextImageIdx;
    presentInfo.pResults = NULL;
    vkQueuePresentKHR( context.presentQueue, &presentInfo );
    
}
// add another case to our WindowProc() switch:
case WM_PAINT: {
    render( );
    break;
}

We call vkAcquireNextImageKHR() to get the next available swap chain image. We ask to block until one is available by passing UINT64_MAX as the timeout. Once it returns, nextImageIdx holds the index of the image we can use for our rendering.

Once we are done and want to present our results, we must call vkQueuePresentKHR(), which queues the presentation of our image to the surface.

That was easy, wasn't it?! ...Well, unfortunately, while we do have a black instead of a white window, if you take a look at our validation layers' debug output there is a lot wrong with this code. I did this on purpose, so I could go back and explain the remaining swap chain interface without the emerging complexity of our render function. Don't worry, we will be fixing all of these problems, but we will need to dive back in... I hope you got enough O2 in.

Depth image buffer

[Commit: eaeda89]

The buffers provided by the swap chain are image buffers. There is no depth buffer created for us by the swap chain, and we do need one to create our framebuffers and ultimately to render. This means we will need to go through the process of creating an image buffer, allocating its memory and binding the two together. To do memory handling we need to go back to the physical device creation and get hold of the physical device memory properties:

// Fill up the physical device memory properties: 
vkGetPhysicalDeviceMemoryProperties( context.physicalDevice, &context.memoryProperties );

Now we create a new VkImage that will serve as our depth image buffer:

VkImageCreateInfo imageCreateInfo = {};
imageCreateInfo.sType = VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO;
imageCreateInfo.imageType = VK_IMAGE_TYPE_2D;
imageCreateInfo.format = VK_FORMAT_D16_UNORM;                          // notice me senpai!
imageCreateInfo.extent = { context.width, context.height, 1 };
imageCreateInfo.mipLevels = 1;
imageCreateInfo.arrayLayers = 1;
imageCreateInfo.samples = VK_SAMPLE_COUNT_1_BIT;                       // notice me senpai!
imageCreateInfo.tiling = VK_IMAGE_TILING_OPTIMAL;
imageCreateInfo.usage = VK_IMAGE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT;   // notice me senpai!
imageCreateInfo.sharingMode = VK_SHARING_MODE_EXCLUSIVE;
imageCreateInfo.queueFamilyIndexCount = 0;
imageCreateInfo.pQueueFamilyIndices = NULL;
imageCreateInfo.initialLayout = VK_IMAGE_LAYOUT_UNDEFINED;             // notice me senpai!

result = vkCreateImage( context.device, &imageCreateInfo, NULL, &context.depthImage );
checkVulkanResult( result, "Failed to create depth image." );

One would think this was it, right? Nope. This does not allocate nor bind any memory to this resource. We must allocate some device memory ourselves and then bind it to the image. The thing is, we must look in the physical device memory properties for a memory type index that matches our requirements. We are asking for memory local to the device (VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT). Once we have that info, allocating and binding the memory to the resource is straightforward:

VkMemoryRequirements memoryRequirements = {};
vkGetImageMemoryRequirements( context.device, context.depthImage, &memoryRequirements );

VkMemoryAllocateInfo imageAllocateInfo = {};
imageAllocateInfo.sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO;
imageAllocateInfo.allocationSize = memoryRequirements.size;

// memoryTypeBits is a bitfield where if bit i is set, it means that 
// the VkMemoryType i of the VkPhysicalDeviceMemoryProperties structure 
// satisfies the memory requirements:
uint32_t memoryTypeBits = memoryRequirements.memoryTypeBits;
VkMemoryPropertyFlags desiredMemoryFlags = VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT;
for( uint32_t i = 0; i < 32; ++i ) {
    VkMemoryType memoryType = context.memoryProperties.memoryTypes[i];
    if( memoryTypeBits & 1 ) {
        if( ( memoryType.propertyFlags & desiredMemoryFlags ) == desiredMemoryFlags ) {
            imageAllocateInfo.memoryTypeIndex = i;
            break;
        }
    }
    memoryTypeBits = memoryTypeBits >> 1;
}

VkDeviceMemory imageMemory = {};
result = vkAllocateMemory( context.device, &imageAllocateInfo, NULL, &imageMemory );
checkVulkanResult( result, "Failed to allocate device memory." );

result = vkBindImageMemory( context.device, context.depthImage, imageMemory, 0 );
checkVulkanResult( result, "Failed to bind image memory." );

This image was created in the VK_IMAGE_LAYOUT_UNDEFINED layout. We need to change its layout to VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL. You have already seen similar code, but there are some differences related to handling a depth buffer instead of a color buffer:

VkCommandBufferBeginInfo beginInfo = {};
beginInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;
beginInfo.flags = VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT;

vkBeginCommandBuffer( context.setupCmdBuffer, &beginInfo );

VkImageMemoryBarrier layoutTransitionBarrier = {};
layoutTransitionBarrier.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
layoutTransitionBarrier.srcAccessMask = 0;
layoutTransitionBarrier.dstAccessMask = VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_READ_BIT | 
                                        VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT;
layoutTransitionBarrier.oldLayout = VK_IMAGE_LAYOUT_UNDEFINED;
layoutTransitionBarrier.newLayout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL;
layoutTransitionBarrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
layoutTransitionBarrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
layoutTransitionBarrier.image = context.depthImage;
VkImageSubresourceRange resourceRange = { VK_IMAGE_ASPECT_DEPTH_BIT, 0, 1, 0, 1 };
layoutTransitionBarrier.subresourceRange = resourceRange;

vkCmdPipelineBarrier(   context.setupCmdBuffer, 
                        VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, 
                        VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, 
                        0,
                        0, NULL,
                        0, NULL, 
                        1, &layoutTransitionBarrier );

vkEndCommandBuffer( context.setupCmdBuffer );

VkPipelineStageFlags waitStageMask[] = { VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT };
VkSubmitInfo submitInfo = {};
submitInfo.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;
submitInfo.waitSemaphoreCount = 0;
submitInfo.pWaitSemaphores = NULL;
submitInfo.pWaitDstStageMask = waitStageMask;
submitInfo.commandBufferCount = 1;
submitInfo.pCommandBuffers = &context.setupCmdBuffer;
submitInfo.signalSemaphoreCount = 0;
submitInfo.pSignalSemaphores = NULL;
result = vkQueueSubmit( context.presentQueue, 1, &submitInfo, submitFence );

vkWaitForFences( context.device, 1, &submitFence, VK_TRUE, UINT64_MAX );
vkResetFences( context.device, 1, &submitFence );
vkResetCommandBuffer( context.setupCmdBuffer, 0 );

And we are practically done. We are only missing the VkImageView, and then we have our depth buffer initialised and ready to use:

VkImageAspectFlags aspectMask = VK_IMAGE_ASPECT_DEPTH_BIT;
VkImageViewCreateInfo imageViewCreateInfo = {};
imageViewCreateInfo.sType = VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO;
imageViewCreateInfo.image = context.depthImage;
imageViewCreateInfo.viewType = VK_IMAGE_VIEW_TYPE_2D;
imageViewCreateInfo.format = imageCreateInfo.format;
imageViewCreateInfo.components = { VK_COMPONENT_SWIZZLE_IDENTITY, VK_COMPONENT_SWIZZLE_IDENTITY, 
                                   VK_COMPONENT_SWIZZLE_IDENTITY, VK_COMPONENT_SWIZZLE_IDENTITY };
imageViewCreateInfo.subresourceRange.aspectMask = aspectMask;
imageViewCreateInfo.subresourceRange.baseMipLevel = 0;
imageViewCreateInfo.subresourceRange.levelCount = 1;
imageViewCreateInfo.subresourceRange.baseArrayLayer = 0;
imageViewCreateInfo.subresourceRange.layerCount = 1;

result = vkCreateImageView( context.device, &imageViewCreateInfo, NULL, &context.depthImageView );
checkVulkanResult( result, "Failed to create image view." );

Render Pass & Framebuffers

[Commit: 07dea10]

And now we start setting up our rendering pipeline, which will glue everything together. But first we must create a render pass and, by consequence, all our framebuffers. A render pass binds the attachments, the subpasses and the dependencies between subpasses. We will be creating a render pass with one single subpass and two attachments, one for our color buffer and one for our depth buffer. Things can get complex when setting up render passes where a subpass renders into an attachment that is the input of another subpass... but we will not get that far. So, let's first create our attachment info:

VkAttachmentDescription passAttachments[2] = { };
passAttachments[0].format = colorFormat;
passAttachments[0].samples = VK_SAMPLE_COUNT_1_BIT;
passAttachments[0].loadOp = VK_ATTACHMENT_LOAD_OP_CLEAR;
passAttachments[0].storeOp = VK_ATTACHMENT_STORE_OP_STORE;
passAttachments[0].stencilLoadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE;
passAttachments[0].stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE;
passAttachments[0].initialLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
passAttachments[0].finalLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;

passAttachments[1].format = VK_FORMAT_D16_UNORM;
passAttachments[1].samples = VK_SAMPLE_COUNT_1_BIT;
passAttachments[1].loadOp = VK_ATTACHMENT_LOAD_OP_CLEAR;
passAttachments[1].storeOp = VK_ATTACHMENT_STORE_OP_DONT_CARE;
passAttachments[1].stencilLoadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE;
passAttachments[1].stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE;
passAttachments[1].initialLayout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL;
passAttachments[1].finalLayout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL;

VkAttachmentReference colorAttachmentReference = {};
colorAttachmentReference.attachment = 0;
colorAttachmentReference.layout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;

VkAttachmentReference depthAttachmentReference = {};
depthAttachmentReference.attachment = 1;
depthAttachmentReference.layout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL;

Next, we create a VkRenderPass with a single subpass that uses our two attachments. There is more that could go on in here (subpass dependencies, input attachments and all that fun!) than we care about for this tutorial, so this is all we need for now:

VkSubpassDescription subpass = {};
subpass.pipelineBindPoint = VK_PIPELINE_BIND_POINT_GRAPHICS;
subpass.colorAttachmentCount = 1;
subpass.pColorAttachments = &colorAttachmentReference;
subpass.pDepthStencilAttachment = &depthAttachmentReference;

VkRenderPassCreateInfo renderPassCreateInfo = {};
renderPassCreateInfo.sType = VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO;
renderPassCreateInfo.attachmentCount = 2;
renderPassCreateInfo.pAttachments = passAttachments;
renderPassCreateInfo.subpassCount = 1;
renderPassCreateInfo.pSubpasses = &subpass;

result = vkCreateRenderPass( context.device, &renderPassCreateInfo, NULL, &context.renderPass );
checkVulkanResult( result, "Failed to create renderpass" );

That is it for the render pass. The render pass object basically defines what kind of framebuffers and pipelines we can create and that is why we created it first. We can now create the framebuffers that are compatible with this render pass:

VkImageView frameBufferAttachments[2];
frameBufferAttachments[1] = context.depthImageView;

VkFramebufferCreateInfo frameBufferCreateInfo = {};
frameBufferCreateInfo.sType = VK_STRUCTURE_TYPE_FRAMEBUFFER_CREATE_INFO;
frameBufferCreateInfo.renderPass = context.renderPass;
frameBufferCreateInfo.attachmentCount = 2;  // must be equal to the attachment count on render pass
frameBufferCreateInfo.pAttachments = frameBufferAttachments;
frameBufferCreateInfo.width = context.width;
frameBufferCreateInfo.height = context.height;
frameBufferCreateInfo.layers = 1;

// create a framebuffer per swap chain imageView:
context.frameBuffers = new VkFramebuffer[ imageCount ];
for( uint32_t i = 0; i < imageCount; ++i ) {
    frameBufferAttachments[0] = presentImageViews[ i ];
    result = vkCreateFramebuffer( context.device, &frameBufferCreateInfo, 
                                  NULL, &context.frameBuffers[i] );
    checkVulkanResult( result, "Failed to create framebuffer.");
}

Notice that we create one framebuffer per swap chain image (2 in our case), each with a different present image view as its color attachment, while both share the same depth buffer. Sharing the depth buffer is fine here because we wait for each frame to finish rendering before starting the next one.

Vertex Buffer

[Commit: 8e2efed]

Time to define our vertex data. Our goal is to render a triangle, so we need to define a vertex format, allocate enough memory for 3 vertices, and upload them to a buffer. Let's start by defining a simple struct and creating a buffer big enough for 3 vertices:

struct vertex {
    float x, y, z, w;
};

// create our vertex buffer:
VkBufferCreateInfo vertexInputBufferInfo = {};
vertexInputBufferInfo.sType = VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO;
vertexInputBufferInfo.size = sizeof(vertex) * 3; // size in Bytes
vertexInputBufferInfo.usage = VK_BUFFER_USAGE_VERTEX_BUFFER_BIT;
vertexInputBufferInfo.sharingMode = VK_SHARING_MODE_EXCLUSIVE;

result = vkCreateBuffer( context.device, &vertexInputBufferInfo, NULL, 
                         &context.vertexInputBuffer );  
checkVulkanResult( result, "Failed to create vertex input buffer." );

The buffer is created. As we did for the VkImage, we need to allocate memory for this VkBuffer. The difference is that this time we need memory from a heap that the host can write to, i.e. one with VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT:

VkMemoryRequirements vertexBufferMemoryRequirements = {};
vkGetBufferMemoryRequirements( context.device, context.vertexInputBuffer, 
                               &vertexBufferMemoryRequirements );

VkMemoryAllocateInfo bufferAllocateInfo = {};
bufferAllocateInfo.sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO;
bufferAllocateInfo.allocationSize = vertexBufferMemoryRequirements.size;

uint32_t vertexMemoryTypeBits = vertexBufferMemoryRequirements.memoryTypeBits;
VkMemoryPropertyFlags vertexDesiredMemoryFlags = VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT;
for( uint32_t i = 0; i < 32; ++i ) {
    VkMemoryType memoryType = context.memoryProperties.memoryTypes[i];
    if( vertexMemoryTypeBits & 1 ) {
        if( ( memoryType.propertyFlags & vertexDesiredMemoryFlags ) == vertexDesiredMemoryFlags ) {
            bufferAllocateInfo.memoryTypeIndex = i;
            break;
        }
    }
    vertexMemoryTypeBits = vertexMemoryTypeBits >> 1;
}

VkDeviceMemory vertexBufferMemory;
result = vkAllocateMemory( context.device, &bufferAllocateInfo, NULL, &vertexBufferMemory );
checkVulkanResult( result, "Failed to allocate buffer memory." );
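
As an aside: this is the second time we loop over the memory types looking for a suitable index (we did the same thing for the depth image). The tutorial deliberately keeps this inline, but if you prefer a helper, a minimal sketch could look like the following (findMemoryTypeIndex is a hypothetical function, not part of the repo):

// hypothetical helper, not in the tutorial repo:
static int32_t findMemoryTypeIndex( const VkPhysicalDeviceMemoryProperties *memoryProperties,
                                    uint32_t memoryTypeBits, VkMemoryPropertyFlags desiredFlags ) {
    for( uint32_t i = 0; i < memoryProperties->memoryTypeCount; ++i ) {
        if( ( memoryTypeBits & ( 1 << i ) ) &&
            ( memoryProperties->memoryTypes[i].propertyFlags & desiredFlags ) == desiredFlags ) {
            return i;
        }
    }
    return -1;  // no compatible memory type found
}

// usage would then be something like:
// bufferAllocateInfo.memoryTypeIndex = findMemoryTypeIndex( &context.memoryProperties,
//     vertexBufferMemoryRequirements.memoryTypeBits, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT );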

Even though we asked for host visible memory, it is not directly accessible by the host: what we get is mappable memory. To be able to write to it we must first retrieve a host virtual address pointer to the memory object by calling vkMapMemory(). So let's map the memory, write our vertices, unmap it, and bind it to the buffer:

void *mapped;
result = vkMapMemory( context.device, vertexBufferMemory, 0, VK_WHOLE_SIZE, 0, &mapped );
checkVulkanResult( result, "Failed to map buffer memory." );

vertex *triangle = (vertex *) mapped;
vertex v1 = { -1.0f, -1.0f, 0, 1.0f };
vertex v2 = {  1.0f, -1.0f, 0, 1.0f };
vertex v3 = {  0.0f,  1.0f, 0, 1.0f };
triangle[0] = v1;
triangle[1] = v2;
triangle[2] = v3;

vkUnmapMemory( context.device, vertexBufferMemory );

result = vkBindBufferMemory( context.device, context.vertexInputBuffer, vertexBufferMemory, 0 );
checkVulkanResult( result, "Failed to bind buffer memory." );

There you go: one triangle set to go through our pipeline. Thing is, we don't have a pipeline yet, do we mate? We are almost there... we just need to talk about shaders first!

Shaders

[Commit: d2cf6be]

Our goal is to set up a simple vertex and fragment shader. Vulkan expects the shader code to be in SPIR-V format, but that is not such a big problem because we can use a freely available tool, glslangValidator, to compile our GLSL shaders to SPIR-V. You can get the git repo here:

git clone https://github.com/KhronosGroup/glslang

So, for example, if our simple.vert vertex shader has the following code:

#version 400
#extension GL_ARB_separate_shader_objects : enable
#extension GL_ARB_shading_language_420pack : enable

layout (location = 0) in vec4 pos;

void main() {
    gl_Position = pos;
}

we can call:

glslangValidator -V simple.vert

and this will create a vert.spv in the same folder. Neat, right?
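
By default the output is named after the shader stage. If you would rather pick the output file name yourself, glslangValidator also accepts an -o flag (run glslangValidator -h to see the options available in your version), for example:

glslangValidator -V simple.vert -o simple.vert.spv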

And the same for our simple.frag fragment shader:

#version 400
#extension GL_ARB_separate_shader_objects : enable
#extension GL_ARB_shading_language_420pack : enable

layout (location = 0) out vec4 uFragColor;

void main() {
    uFragColor = vec4( 0.0, 0.5, 1.0, 1.0 );
}

and again we call:

glslangValidator -V simple.frag

And we end up with our frag.spv.
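
If you do not want to run the compiler by hand every time you touch a shader, you could also add these calls to your build script (adjust the paths to wherever your shaders live; this is just a suggestion, the repo does not do this):

glslangValidator -V simple.vert
glslangValidator -V simple.frag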

Keeping to our principle of showing all the code in place, here is how we load the shaders into Vulkan:

uint32_t codeSize;
char *code = new char[10000];
HANDLE fileHandle = 0;

// load our vertex shader:
fileHandle = CreateFile( "..\\vert.spv", GENERIC_READ, 0, NULL, 
                         OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL );
if( fileHandle == INVALID_HANDLE_VALUE ) {
    OutputDebugStringA( "Failed to open shader file." );
    exit(1);
}
ReadFile( (HANDLE)fileHandle, code, 10000, (LPDWORD)&codeSize, 0 );
CloseHandle( fileHandle );

VkShaderModuleCreateInfo vertexShaderCreationInfo = {};
vertexShaderCreationInfo.sType = VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO;
vertexShaderCreationInfo.codeSize = codeSize;
vertexShaderCreationInfo.pCode = (uint32_t *)code;

VkShaderModule vertexShaderModule;
result = vkCreateShaderModule( context.device, &vertexShaderCreationInfo, NULL, &vertexShaderModule );
checkVulkanResult( result, "Failed to create vertex shader module." );

// load our fragment shader:
fileHandle = CreateFile( "..\\frag.spv", GENERIC_READ, 0, NULL, 
                         OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL );
if( fileHandle == INVALID_HANDLE_VALUE ) {
    OutputDebugStringA( "Failed to open shader file." );
    exit(1);
}
ReadFile( (HANDLE)fileHandle, code, 10000, (LPDWORD)&codeSize, 0 );
CloseHandle( fileHandle );

VkShaderModuleCreateInfo fragmentShaderCreationInfo = {};
fragmentShaderCreationInfo.sType = VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO;
fragmentShaderCreationInfo.codeSize = codeSize;
fragmentShaderCreationInfo.pCode = (uint32_t *)code;

VkShaderModule fragmentShaderModule;
result = vkCreateShaderModule( context.device, &fragmentShaderCreationInfo, NULL, &fragmentShaderModule );
checkVulkanResult( result, "Failed to create fragment shader module." );

Notice that we fail hard if we cannot find the shader code, and that we expect to find it in the parent folder of where the executable runs. This is fine if you run it from the Visual Studio devenv, but if you run it from the command line it will simply exit without reporting anything, since OutputDebugStringA does not print to the console. I suggest you change this to whatever fits you better.

A cursory glance at this code and you should be calling me all kinds of names... I will endure it. I know what you are complaining about but, for the purpose of this tutorial, I don't care. Believe me this is not the code I use in my own internal engines. ;)
Hopefully, after you stop calling me names, you should by now know what you need to do to load your own shaders.
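
For reference, if you want something a little less fragile than a fixed 10000 byte buffer, one possible sketch is to size the buffer with GetFileSize first (loadFile is a hypothetical helper, not something you will find in the repo):

static char *loadFile( const char *fileName, uint32_t *outSize ) {

    HANDLE fileHandle = CreateFile( fileName, GENERIC_READ, 0, NULL,
                                    OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL );
    if( fileHandle == INVALID_HANDLE_VALUE ) {
        OutputDebugStringA( "Failed to open file.\n" );
        return NULL;
    }

    DWORD fileSize = GetFileSize( fileHandle, NULL );  // SPIR-V blobs are tiny, 32 bits is plenty
    char *buffer = new char[ fileSize ];

    DWORD bytesRead = 0;
    ReadFile( fileHandle, buffer, fileSize, &bytesRead, NULL );
    CloseHandle( fileHandle );

    *outSize = (uint32_t) bytesRead;
    return buffer;
}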

Ok. I think we are finally ready to start setting up our rendering pipeline.

Graphics Pipeline

[Commit: 0baeb96]

A graphics pipeline keeps track of all the state required to render. It is a collection of shader stages, fixed-function pipeline stages and a pipeline layout. Everything we have been creating up to this point exists so that we can configure the pipeline in one way or another, and it all has to be set up front. Remember that Vulkan keeps no global state, so we need to configure and store all the state we want/need, and we do that by creating a VkPipeline.

As you know, or can at least imagine, there is a whole lot of state in a graphics pipeline: from the viewport to the blend functions, from the shader stages to the bindings... What follows is setting up all of this state. (In this instance we will be leaving out some big parts, like descriptor sets and their bindings.) So, let's start by creating an empty pipeline layout:

VkPipelineLayoutCreateInfo layoutCreateInfo = {};
layoutCreateInfo.sType = VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO;
layoutCreateInfo.setLayoutCount = 0;
layoutCreateInfo.pSetLayouts = NULL;    // Not setting any bindings!
layoutCreateInfo.pushConstantRangeCount = 0;
layoutCreateInfo.pPushConstantRanges = NULL;

result = vkCreatePipelineLayout( context.device, &layoutCreateInfo, NULL, 
                                 &context.pipelineLayout );
checkVulkanResult( result, "Failed to create pipeline layout." );

We might return to this stage later so that we can, for example, add a uniform buffer object to pass some uniform values to our shaders, but for this first tutorial an empty layout is fine! Next we set up our shader stages with the shader modules we loaded:

VkPipelineShaderStageCreateInfo shaderStageCreateInfo[2] = {};
shaderStageCreateInfo[0].sType = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO;
shaderStageCreateInfo[0].stage = VK_SHADER_STAGE_VERTEX_BIT;
shaderStageCreateInfo[0].module = vertexShaderModule;
shaderStageCreateInfo[0].pName = "main";        // shader entry point function name
shaderStageCreateInfo[0].pSpecializationInfo = NULL;

shaderStageCreateInfo[1].sType = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO;
shaderStageCreateInfo[1].stage = VK_SHADER_STAGE_FRAGMENT_BIT;
shaderStageCreateInfo[1].module = fragmentShaderModule;
shaderStageCreateInfo[1].pName = "main";        // shader entry point function name
shaderStageCreateInfo[1].pSpecializationInfo = NULL;

Nothing special going on here. To configure the vertex input handling we follow with:

VkVertexInputBindingDescription vertexBindingDescription = {};
vertexBindingDescription.binding = 0;
vertexBindingDescription.stride = sizeof(vertex);
vertexBindingDescription.inputRate = VK_VERTEX_INPUT_RATE_VERTEX;

VkVertexInputAttributeDescription vertexAttributeDescription = {};
vertexAttributeDescription.location = 0;
vertexAttributeDescription.binding = 0;
vertexAttributeDescription.format = VK_FORMAT_R32G32B32A32_SFLOAT;
vertexAttributeDescription.offset = 0;

VkPipelineVertexInputStateCreateInfo vertexInputStateCreateInfo = {};
vertexInputStateCreateInfo.sType = VK_STRUCTURE_TYPE_PIPELINE_VERTEX_INPUT_STATE_CREATE_INFO;
vertexInputStateCreateInfo.vertexBindingDescriptionCount = 1;
vertexInputStateCreateInfo.pVertexBindingDescriptions = &vertexBindingDescription;
vertexInputStateCreateInfo.vertexAttributeDescriptionCount = 1;
vertexInputStateCreateInfo.pVertexAttributeDescriptions = &vertexAttributeDescription;

// vertex topology config:
VkPipelineInputAssemblyStateCreateInfo inputAssemblyStateCreateInfo = {};
inputAssemblyStateCreateInfo.sType = VK_STRUCTURE_TYPE_PIPELINE_INPUT_ASSEMBLY_STATE_CREATE_INFO;
inputAssemblyStateCreateInfo.topology = VK_PRIMITIVE_TOPOLOGY_TRIANGLE_LIST;
inputAssemblyStateCreateInfo.primitiveRestartEnable = VK_FALSE;

Ok, some explanation is required here. In the first part we bind the vertex position (our (x, y, z, w)) to location = 0 at binding = 0, and then we configure the input assembly to interpret our vertex buffer as a triangle list.
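
If you later add more per-vertex data, each new field gets its own VkVertexInputAttributeDescription with its own location and an offset into the struct. Purely as an illustration (none of this is used in this tutorial), adding a per-vertex color could look something like this:

// hypothetical extension, not used in this tutorial:
struct vertexWithColor {
    float x, y, z, w;    // position -> location = 0
    float r, g, b, a;    // color    -> location = 1
};

VkVertexInputAttributeDescription vertexAttributes[2] = {};
vertexAttributes[0].location = 0;
vertexAttributes[0].binding = 0;
vertexAttributes[0].format = VK_FORMAT_R32G32B32A32_SFLOAT;
vertexAttributes[0].offset = 0;

vertexAttributes[1].location = 1;
vertexAttributes[1].binding = 0;
vertexAttributes[1].format = VK_FORMAT_R32G32B32A32_SFLOAT;
vertexAttributes[1].offset = 4 * sizeof(float);   // where the color starts inside the struct

// the binding stride would then be sizeof(vertexWithColor), vertexAttributeDescriptionCount
// would be 2, and pVertexAttributeDescriptions would point at this array.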

Next, the viewport and scissor clipping are configured. We will later mark this state as dynamic so that we can change it per frame.

VkViewport viewport = {};
viewport.x = 0;
viewport.y = 0;
viewport.width = context.width;
viewport.height = context.height;
viewport.minDepth = 0;
viewport.maxDepth = 1;

VkRect2D scissors = {};
scissors.offset = { 0, 0 };
scissors.extent = { context.width, context.height };

VkPipelineViewportStateCreateInfo viewportState = {};
viewportState.sType = VK_STRUCTURE_TYPE_PIPELINE_VIEWPORT_STATE_CREATE_INFO;
viewportState.viewportCount = 1;
viewportState.pViewports = &viewport;
viewportState.scissorCount = 1;
viewportState.pScissors = &scissors;

Here we set our rasterization configuration. Most of it is self-explanatory:

VkPipelineRasterizationStateCreateInfo rasterizationState = {};
rasterizationState.sType = VK_STRUCTURE_TYPE_PIPELINE_RASTERIZATION_STATE_CREATE_INFO;
rasterizationState.depthClampEnable = VK_FALSE;
rasterizationState.rasterizerDiscardEnable = VK_FALSE;
rasterizationState.polygonMode = VK_POLYGON_MODE_FILL;
rasterizationState.cullMode = VK_CULL_MODE_NONE;
rasterizationState.frontFace = VK_FRONT_FACE_COUNTER_CLOCKWISE;
rasterizationState.depthBiasEnable = VK_FALSE;
rasterizationState.depthBiasConstantFactor = 0;
rasterizationState.depthBiasClamp = 0;
rasterizationState.depthBiasSlopeFactor = 0;
rasterizationState.lineWidth = 1;

Next, sampling configuration:

VkPipelineMultisampleStateCreateInfo multisampleState = {};
multisampleState.sType = VK_STRUCTURE_TYPE_PIPELINE_MULTISAMPLE_STATE_CREATE_INFO;
multisampleState.rasterizationSamples = VK_SAMPLE_COUNT_1_BIT;
multisampleState.sampleShadingEnable = VK_FALSE;
multisampleState.minSampleShading = 0;
multisampleState.pSampleMask = NULL;
multisampleState.alphaToCoverageEnable = VK_FALSE;
multisampleState.alphaToOneEnable = VK_FALSE;

At this stage we enable depth testing and disable stencil:

VkStencilOpState noOPStencilState = {};
noOPStencilState.failOp = VK_STENCIL_OP_KEEP;
noOPStencilState.passOp = VK_STENCIL_OP_KEEP;
noOPStencilState.depthFailOp = VK_STENCIL_OP_KEEP;
noOPStencilState.compareOp = VK_COMPARE_OP_ALWAYS;
noOPStencilState.compareMask = 0;
noOPStencilState.writeMask = 0;
noOPStencilState.reference = 0;

VkPipelineDepthStencilStateCreateInfo depthState = {};
depthState.sType = VK_STRUCTURE_TYPE_PIPELINE_DEPTH_STENCIL_STATE_CREATE_INFO;
depthState.depthTestEnable = VK_TRUE;
depthState.depthWriteEnable = VK_TRUE;
depthState.depthCompareOp = VK_COMPARE_OP_LESS_OR_EQUAL;
depthState.depthBoundsTestEnable = VK_FALSE;
depthState.stencilTestEnable = VK_FALSE;
depthState.front = noOPStencilState;
depthState.back = noOPStencilState;
depthState.minDepthBounds = 0;
depthState.maxDepthBounds = 0;

Color blending, which is disabled for this tutorial, can be configured here:

VkPipelineColorBlendAttachmentState colorBlendAttachmentState = {};
colorBlendAttachmentState.blendEnable = VK_FALSE;
colorBlendAttachmentState.srcColorBlendFactor = VK_BLEND_FACTOR_SRC_COLOR;
colorBlendAttachmentState.dstColorBlendFactor = VK_BLEND_FACTOR_ONE_MINUS_DST_COLOR;
colorBlendAttachmentState.colorBlendOp = VK_BLEND_OP_ADD;
colorBlendAttachmentState.srcAlphaBlendFactor = VK_BLEND_FACTOR_ZERO;
colorBlendAttachmentState.dstAlphaBlendFactor = VK_BLEND_FACTOR_ZERO;
colorBlendAttachmentState.alphaBlendOp = VK_BLEND_OP_ADD;
colorBlendAttachmentState.colorWriteMask = 0xf;

VkPipelineColorBlendStateCreateInfo colorBlendState = {};
colorBlendState.sType = VK_STRUCTURE_TYPE_PIPELINE_COLOR_BLEND_STATE_CREATE_INFO;
colorBlendState.logicOpEnable = VK_FALSE;
colorBlendState.logicOp = VK_LOGIC_OP_CLEAR;
colorBlendState.attachmentCount = 1;
colorBlendState.pAttachments = &colorBlendAttachmentState;
colorBlendState.blendConstants[0] = 0.0;
colorBlendState.blendConstants[1] = 0.0;
colorBlendState.blendConstants[2] = 0.0;
colorBlendState.blendConstants[3] = 0.0;

All of these configurations are baked in for the entirety of the pipeline's life. We might want to change some of this state per frame, like our viewport and scissor, and to mark a piece of state as dynamic we can do:

VkDynamicState dynamicState[2] = { VK_DYNAMIC_STATE_VIEWPORT, VK_DYNAMIC_STATE_SCISSOR };
VkPipelineDynamicStateCreateInfo dynamicStateCreateInfo = {};
dynamicStateCreateInfo.sType = VK_STRUCTURE_TYPE_PIPELINE_DYNAMIC_STATE_CREATE_INFO;
dynamicStateCreateInfo.dynamicStateCount = 2;
dynamicStateCreateInfo.pDynamicStates = dynamicState;

And finally, we put everything together to create our graphics pipeline:

VkGraphicsPipelineCreateInfo pipelineCreateInfo = {};
pipelineCreateInfo.sType = VK_STRUCTURE_TYPE_GRAPHICS_PIPELINE_CREATE_INFO;
pipelineCreateInfo.stageCount = 2;
pipelineCreateInfo.pStages = shaderStageCreateInfo;
pipelineCreateInfo.pVertexInputState = &vertexInputStateCreateInfo;
pipelineCreateInfo.pInputAssemblyState = &inputAssemblyStateCreateInfo;
pipelineCreateInfo.pTessellationState = NULL;
pipelineCreateInfo.pViewportState = &viewportState;
pipelineCreateInfo.pRasterizationState = &rasterizationState;
pipelineCreateInfo.pMultisampleState = &multisampleState;
pipelineCreateInfo.pDepthStencilState = &depthState;
pipelineCreateInfo.pColorBlendState = &colorBlendState;
pipelineCreateInfo.pDynamicState = &dynamicStateCreateInfo;
pipelineCreateInfo.layout = context.pipelineLayout;
pipelineCreateInfo.renderPass = context.renderPass;
pipelineCreateInfo.subpass = 0;
pipelineCreateInfo.basePipelineHandle = NULL;
pipelineCreateInfo.basePipelineIndex = 0;

result = vkCreateGraphicsPipelines( context.device, VK_NULL_HANDLE, 1, &pipelineCreateInfo, NULL, 
                                   &context.pipeline );
checkVulkanResult( result, "Failed to create graphics pipeline." );
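
One small note: once vkCreateGraphicsPipelines() has returned, the pipeline keeps its own internal copy of the shader code, so the two shader modules are no longer needed and could be destroyed at this point if you want to tidy up:

vkDestroyShaderModule( context.device, vertexShaderModule, NULL );
vkDestroyShaderModule( context.device, fragmentShaderModule, NULL );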

That was a lot of code... but it's just setting state. The good news is that we are now ready to start rendering our triangle. We will update our render method to do just that.

Final Render

[Commit: 5613c5d]

We are FINALLY ready to update our render code to put a blue-ish triangle on the screen. Can you believe it? Well, let me show you how:

void render( ) {

    VkSemaphore presentCompleteSemaphore, renderingCompleteSemaphore;
    VkSemaphoreCreateInfo semaphoreCreateInfo = { VK_STRUCTURE_TYPE_SEMAPHORE_CREATE_INFO, 0, 0 };
    vkCreateSemaphore( context.device, &semaphoreCreateInfo, NULL, &presentCompleteSemaphore );
    vkCreateSemaphore( context.device, &semaphoreCreateInfo, NULL, &renderingCompleteSemaphore );
	
    uint32_t nextImageIdx;
    vkAcquireNextImageKHR(  context.device, context.swapChain, UINT64_MAX,
                            presentCompleteSemaphore, VK_NULL_HANDLE, &nextImageIdx );

First we need to take care of synchronising our render calls, so we create a couple of semaphores and hand one of them to our vkAcquireNextImageKHR() call. Then we need to change the presentation image from the VK_IMAGE_LAYOUT_PRESENT_SRC_KHR layout to the VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL layout. We already know how to do this, so here is the code:

    VkCommandBufferBeginInfo beginInfo = {};
    beginInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;
    beginInfo.flags = VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT;
	
    vkBeginCommandBuffer( context.drawCmdBuffer, &beginInfo );
	
    // change image layout from VK_IMAGE_LAYOUT_PRESENT_SRC_KHR
    // to VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL
    VkImageMemoryBarrier layoutTransitionBarrier = {};
    layoutTransitionBarrier.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
    layoutTransitionBarrier.srcAccessMask = VK_ACCESS_MEMORY_READ_BIT;
    layoutTransitionBarrier.dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_READ_BIT | 
                                            VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
    layoutTransitionBarrier.oldLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR;
    layoutTransitionBarrier.newLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
    layoutTransitionBarrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    layoutTransitionBarrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    layoutTransitionBarrier.image = context.presentImages[ nextImageIdx ];
    VkImageSubresourceRange resourceRange = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 };
    layoutTransitionBarrier.subresourceRange = resourceRange;
	
    vkCmdPipelineBarrier(   context.drawCmdBuffer, 
                            VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, 
                            VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, 
                            0,
                            0, NULL,
                            0, NULL, 
                            1, &layoutTransitionBarrier );

This is code you should by now be familiar with. Next we will activate our render pass:

    VkClearValue clearValue[] = { { 1.0f, 1.0f, 1.0f, 1.0f },   // color attachment: clear to white
                                  { 1.0f, 0.0f } };             // depth/stencil attachment: depth 1.0
    VkRenderPassBeginInfo renderPassBeginInfo = {};
    renderPassBeginInfo.sType = VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO;
    renderPassBeginInfo.renderPass = context.renderPass;
    renderPassBeginInfo.framebuffer = context.frameBuffers[ nextImageIdx ];
    renderPassBeginInfo.renderArea = { 0, 0, context.width, context.height };
    renderPassBeginInfo.clearValueCount = 2;
    renderPassBeginInfo.pClearValues = clearValue;
    vkCmdBeginRenderPass( context.drawCmdBuffer, &renderPassBeginInfo, 
                          VK_SUBPASS_CONTENTS_INLINE );

Nothing special here. We just tell it which framebuffer to use and which clear values to set for both attachments. Next we bind all our rendering state by binding our graphics pipeline:

    vkCmdBindPipeline( context.drawCmdBuffer, VK_PIPELINE_BIND_POINT_GRAPHICS, context.pipeline );    

    // take care of dynamic state:
    VkViewport viewport = { 0, 0, context.width, context.height, 0, 1 };
    vkCmdSetViewport( context.drawCmdBuffer, 0, 1, &viewport );

    VkRect2D scissor = { 0, 0, context.width, context.height };
    vkCmdSetScissor( context.drawCmdBuffer, 0, 1, &scissor);

Notice how we set the dynamic state at this stage. Next we render our beautiful triangle by binding our vertex buffer and asking Vulkan to draw one instance of it:

    VkDeviceSize offsets = { };
    vkCmdBindVertexBuffers( context.drawCmdBuffer, 0, 1, &context.vertexInputBuffer, &offsets );

    vkCmdDraw( context.drawCmdBuffer,
               3,   // vertex count
               1,   // instance count
               0,   // first vertex
               0 ); // first instance

    vkCmdEndRenderPass( context.drawCmdBuffer );

We are almost done. Guess what is missing? Right, we need to transition the image back from VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL to VK_IMAGE_LAYOUT_PRESENT_SRC_KHR, and we need to make sure all rendering work is done before that happens!

    VkImageMemoryBarrier prePresentBarrier = {};
    prePresentBarrier.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
    prePresentBarrier.srcAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
    prePresentBarrier.dstAccessMask = VK_ACCESS_MEMORY_READ_BIT;
    prePresentBarrier.oldLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
    prePresentBarrier.newLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR;
    prePresentBarrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    prePresentBarrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    prePresentBarrier.subresourceRange = {VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1};
    prePresentBarrier.image = context.presentImages[ nextImageIdx ];
    
    vkCmdPipelineBarrier( context.drawCmdBuffer, 
                          VK_PIPELINE_STAGE_ALL_COMMANDS_BIT, 
                          VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT, 
                          0, 
                          0, NULL, 
                          0, NULL, 
                          1, &prePresentBarrier );

    vkEndCommandBuffer( context.drawCmdBuffer );

And that is it. We only need to submit the command buffer, present the image, and we are done:

    VkFence renderFence;
    VkFenceCreateInfo fenceCreateInfo = {};
    fenceCreateInfo.sType = VK_STRUCTURE_TYPE_FENCE_CREATE_INFO;
    vkCreateFence( context.device, &fenceCreateInfo, NULL, &renderFence );

    VkPipelineStageFlags waitStageMask = { VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT };
    VkSubmitInfo submitInfo = {};
    submitInfo.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;
    submitInfo.waitSemaphoreCount = 1;
    submitInfo.pWaitSemaphores = &presentCompleteSemaphore;
    submitInfo.pWaitDstStageMask = &waitStageMask;
    submitInfo.commandBufferCount = 1;
    submitInfo.pCommandBuffers = &context.drawCmdBuffer;
    submitInfo.signalSemaphoreCount = 1;
    submitInfo.pSignalSemaphores = &renderingCompleteSemaphore;
    vkQueueSubmit( context.presentQueue, 1, &submitInfo, renderFence );

    vkWaitForFences( context.device, 1, &renderFence, VK_TRUE, UINT64_MAX );
    vkDestroyFence( context.device, renderFence, NULL );

    VkPresentInfoKHR presentInfo = {};
    presentInfo.sType = VK_STRUCTURE_TYPE_PRESENT_INFO_KHR;
    presentInfo.waitSemaphoreCount = 1;
    presentInfo.pWaitSemaphores = &renderingCompleteSemaphore;
    presentInfo.swapchainCount = 1;
    presentInfo.pSwapchains = &context.swapChain;
    presentInfo.pImageIndices = &nextImageIdx;
    presentInfo.pResults = NULL;
    vkQueuePresentKHR( context.presentQueue, &presentInfo );

    vkDestroySemaphore( context.device, presentCompleteSemaphore, NULL );
    vkDestroySemaphore( context.device, renderingCompleteSemaphore, NULL );
}
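
A final remark on this render loop: to keep the code compact we create and destroy the two semaphores and the fence every single frame and then block on vkWaitForFences(). In a real application you would typically create these synchronisation objects once at startup and reuse them, along these lines (the context fields are hypothetical, this is not how the repo is laid out):

// at startup, stored in the context:
VkSemaphoreCreateInfo semaphoreCreateInfo = { VK_STRUCTURE_TYPE_SEMAPHORE_CREATE_INFO, 0, 0 };
vkCreateSemaphore( context.device, &semaphoreCreateInfo, NULL, &context.presentCompleteSemaphore );
vkCreateSemaphore( context.device, &semaphoreCreateInfo, NULL, &context.renderingCompleteSemaphore );

VkFenceCreateInfo fenceCreateInfo = { VK_STRUCTURE_TYPE_FENCE_CREATE_INFO, NULL, 0 };
vkCreateFence( context.device, &fenceCreateInfo, NULL, &context.renderFence );

// then render() only waits and resets, instead of creating and destroying every frame:
// vkWaitForFences( context.device, 1, &context.renderFence, VK_TRUE, UINT64_MAX );
// vkResetFences( context.device, 1, &context.renderFence );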

We made it! We now have a basic skeleton Vulkan application running. Hopefully you have learned enough about Vulkan to figure out how to proceed from here. This is, in any case, the code repo I would have liked to have had when I started... so maybe it will be helpful to someone else.

I am currently writing another tutorial where I go into more detail about the topics left open here (it covers shader uniforms, texture mapping and basic illumination), so do check back regularly for new content. I will also post on my twitter once I finish and publish it here. Feel free to contact me with suggestions and feedback at jhenriques@gmail.com.

Have a nice one, JH.