Friday, March 14, 2025

Vulkan 1.4: Faster app loads, less stutter and lower memory usage | by Shahbaz Youssefi | Android Developers | Dec, 2024


Host Image Copy is a game changer for Android

Vulkan 1.4 was released recently, and with it comes a significant feature for Android: Host Image Copy, based on VK_EXT_host_image_copy.

We have previously written about this extension in a Khronos blog post that explains the technical details of using it. As we’ll see in this post, the extension is particularly useful for Android games.

In short, Host Image Copy is a Vulkan feature that allows the application to transfer image data using the CPU instead of the GPU. It is particularly useful on devices with a unified memory architecture (UMA), such as typical Android devices, but it may place restrictions on images. In particular, most drivers disable framebuffer compression for host-copyable images that are otherwise renderable. Read on to learn where this feature really shines.

To put things in context, Host Image Copy is one way to transfer image data asynchronously. The other is a dedicated transfer queue (one with VK_QUEUE_TRANSFER_BIT but without VK_QUEUE_GRAPHICS_BIT). In Vulkan 1.4, at least one of the two is required. You can expect the vast majority of Android devices shipping with Vulkan 1.4 to implement Host Image Copy, and to implement it optimally for compressed formats. That is, Vulkan requires optimalDeviceAccess to be true for these formats.

As it happens, texture data makes up the largest share of image data in typical games, and textures use compressed formats!

First, let’s see how Host Image Copy differs from copying data on the GPU, such as with vkCmdCopyBufferToImage2.

Without Host Image Copy, texture data loaded from disk reaches an image by way of a Vulkan buffer:

  • A Vulkan buffer is allocated, taking up about as much memory as the Vulkan image does.
  • The CPU maps the buffer and copies the texture data into it (effectively a memcpy).
  • vkCmdCopyBufferToImage2 is recorded in a command buffer that is later submitted.
  • The GPU copies the texture data into the image.
  • The buffer memory is freed several frames later, once the application knows the GPU copy has finished.

In the above, the texture data is copied twice, and for several frames the memory allocated for the texture data is twice the size of the image. Two more things are worth noting here:

  • The copy on the CPU is as fast as it can get, because it is effectively a memcpy.
  • The copy on the GPU efficiently reorders the data to match the physical layout of the image (a.k.a. layout swizzling), but it happens on the graphics queue (assuming no dedicated transfer queue), interfering with rendering in the same frame.
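To put rough numbers on the doubled footprint, the sketch below computes the size of a block-compressed texture with a full mip chain and the peak memory each upload path holds while the copy is in flight. It assumes a format with 4x4-texel, 16-byte blocks (such as ASTC 4x4 or BC7), ignores any alignment padding a driver may add, and models the staging buffer as exactly the size of the image; the function names are illustrative and not part of any API.

```c
#include <stdint.h>

/* Bytes in one mip level of a block-compressed format with 4x4-texel,
 * 16-byte blocks (e.g. ASTC 4x4, BC7, ETC2 RGBA8). Partial blocks at the
 * edges still occupy a whole block. Driver padding is ignored. */
uint64_t level_size_4x4_16(uint32_t width, uint32_t height) {
    uint64_t blocks_x = ((uint64_t)width + 3) / 4;
    uint64_t blocks_y = ((uint64_t)height + 3) / 4;
    return blocks_x * blocks_y * 16;
}

/* Total bytes for a full mip chain, from width x height down to 1x1. */
uint64_t mip_chain_size(uint32_t width, uint32_t height) {
    uint64_t total = 0;
    for (;;) {
        total += level_size_4x4_16(width, height);
        if (width <= 1 && height <= 1)
            break;
        width = width > 1 ? width / 2 : 1;
        height = height > 1 ? height / 2 : 1;
    }
    return total;
}

/* Staging path: the image plus a staging buffer of about the same size,
 * kept alive until the GPU copy is known to be complete. */
uint64_t peak_bytes_staging(uint64_t image_bytes) { return 2 * image_bytes; }

/* Host Image Copy: the CPU writes straight into the image's memory. */
uint64_t peak_bytes_host_copy(uint64_t image_bytes) { return image_bytes; }
```

For a 2048x2048 texture, mip_chain_size(2048, 2048) returns 5,592,432 bytes (about 5.3 MiB), so under these assumptions the staging path holds roughly 10.7 MiB for that texture for several frames, versus about 5.3 MiB when the data is written straight into the image.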

With Host Image Copy instead, the copy is done simply by calling vkCopyMemoryToImage. In this case, the CPU performs both the copy and the layout swizzling. This copy is slower than either of the copies above, because the CPU is not as efficient at reordering the data, but:

  • The copy, even if slower, is done only once.
  • The copy does not interfere with ongoing GPU work.
  • No extra memory is allocated for the texture data.
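Real swizzle layouts are driver-specific and not public, so the following is only a toy model: it reorders 16-byte compressed blocks into Z-order (Morton order) to show why a swizzling copy costs more than a memcpy. Every block needs an index computation and lands at a scattered destination, whereas memcpy streams the whole texture sequentially. morton2d and swizzle_blocks are hypothetical helpers, not what any particular driver does.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_BYTES 16 /* one 4x4 compressed block, e.g. ASTC 4x4 or BC7 */

/* Interleave the bits of x and y (each < 65536) into a Morton index. */
static uint32_t morton2d(uint32_t x, uint32_t y) {
    uint32_t z = 0;
    for (uint32_t bit = 0; bit < 16; ++bit) {
        z |= ((x >> bit) & 1u) << (2 * bit);
        z |= ((y >> bit) & 1u) << (2 * bit + 1);
    }
    return z;
}

/* Copy a row-major grid of blocks into the toy Z-order layout.
 * blocks_per_side must be a power of two so Morton indices are dense.
 * Each block goes to a computed, scattered destination: extra work that
 * a plain memcpy of the whole texture never does. */
void swizzle_blocks(uint8_t *dst, const uint8_t *src, uint32_t blocks_per_side) {
    for (uint32_t y = 0; y < blocks_per_side; ++y)
        for (uint32_t x = 0; x < blocks_per_side; ++x)
            memcpy(dst + (size_t)morton2d(x, y) * BLOCK_BYTES,
                   src + ((size_t)y * blocks_per_side + x) * BLOCK_BYTES,
                   BLOCK_BYTES);
}
```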

FYI, the reason this extension is less useful on NUMA devices, such as those with dedicated GPUs (and dedicated memory), is that the CPU may not have access to all of the GPU memory, or that access may be too slow. This can limit the amount of memory usable for host-copyable textures, or make the copy prohibitively expensive. The identicalMemoryTypeRequirements property indicates whether Host Image Copy limits access to GPU memory: when it is true, giving an image host-transfer usage does not change the memory types the image can be bound to.
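One way an application might act on these properties is a per-image choice of upload path. The sketch below is a conservative heuristic, not a rule from the Vulkan specification: the struct fields are hypothetical names standing in for the hostImageCopy feature, the per-format optimalDeviceAccess value, the identicalMemoryTypeRequirements property, and whether the image is also rendered to (the case where host-copyable images may lose framebuffer compression).

```c
#include <stdbool.h>

typedef enum { UPLOAD_STAGING_BUFFER, UPLOAD_HOST_IMAGE_COPY } upload_path;

/* Hypothetical summary of what the app queried from Vulkan for one image. */
typedef struct {
    bool host_image_copy;                    /* hostImageCopy feature enabled */
    bool optimal_device_access;              /* per-format optimalDeviceAccess */
    bool identical_memory_type_requirements; /* device-wide property */
    bool is_render_target;                   /* image is also rendered to */
} upload_query;

/* Conservative heuristic: use Host Image Copy only when none of the
 * caveats discussed in the text apply; otherwise fall back to staging. */
upload_path choose_upload_path(const upload_query *q) {
    if (!q->host_image_copy)
        return UPLOAD_STAGING_BUFFER;
    /* Many drivers disable framebuffer compression for renderable,
     * host-copyable images. */
    if (q->is_render_target)
        return UPLOAD_STAGING_BUFFER;
    /* Without optimalDeviceAccess, host-transfer usage may slow GPU access. */
    if (!q->optimal_device_access)
        return UPLOAD_STAGING_BUFFER;
    /* Host-transfer usage would restrict the image's memory types, as may
     * happen on devices with dedicated GPU memory. */
    if (!q->identical_memory_type_requirements)
        return UPLOAD_STAGING_BUFFER;
    return UPLOAD_HOST_IMAGE_COPY;
}
```

A less conservative app might still use Host Image Copy when identicalMemoryTypeRequirements is false, as long as the restricted memory types suit its textures; the sketch simply errs on the side of the staging path.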

Below are two scenarios where, with the above properties in mind, Host Image Copy can significantly improve a game.

Removing stutter during texture streaming while simultaneously halving memory usage sounds too good to be true, but that is exactly the kind of thing Host Image Copy enables.

To set the scene: imagine an open-world game. You are nearing a new area, and many new textures need to be loaded from persistent storage. You are cruising at 60 FPS; it would be a shame if that dropped to 20 FPS, or if the game crashed with an out-of-memory error.

Avoiding such stutters with Host Image Copy is very simple.

The application can use a CPU thread to stream texture data directly into new images using Host Image Copy. The GPU continues to render frames of the same complexity as before, maintaining the frame rate, and the memory increase is as small as it can get. For even more efficiency, don’t forget to memory-map the texture data file instead of reading it into a CPU buffer first!
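A minimal POSIX sketch of such a streaming thread, assuming each texture file already holds data ready to upload: the file is memory-mapped and the mapping is handed straight to an upload callback, so there is no intermediate read buffer. upload_fn, stream_job, demo_image and demo_upload are hypothetical names; in a real Vulkan application the callback would pass the mapped pointer to vkCopyMemoryToImage for an image created with VK_IMAGE_USAGE_HOST_TRANSFER_BIT, and error handling would be richer.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical upload callback. In a Vulkan app this is where the mapped
 * pointer would be passed to vkCopyMemoryToImage. */
typedef void (*upload_fn)(const void *src, size_t size, void *user);

typedef struct {
    const char *path;  /* texture file on persistent storage */
    upload_fn upload;  /* consumes the mapped bytes */
    void *user;        /* passed through to upload */
    int status;        /* 0 on success, -1 on failure */
} stream_job;

static void *stream_texture(void *arg) {
    stream_job *job = arg;
    job->status = -1;
    int fd = open(job->path, O_RDONLY);
    if (fd < 0)
        return NULL;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }
    size_t size = (size_t)st.st_size;
    /* Map the file rather than read() it into an intermediate buffer. */
    void *data = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping remains valid after the fd is closed */
    if (data == MAP_FAILED)
        return NULL;
    job->upload(data, size, job->user);
    munmap(data, size);
    job->status = 0;
    return NULL;
}

/* Start streaming on a separate thread so the render loop is not blocked. */
int start_streaming(pthread_t *thread, stream_job *job) {
    return pthread_create(thread, NULL, stream_texture, job);
}

/* Hypothetical stand-in for image memory, used to exercise the sketch. */
typedef struct {
    unsigned char *dst;
    size_t capacity;
    size_t copied;
} demo_image;

void demo_upload(const void *src, size_t size, void *user) {
    demo_image *img = user;
    size_t n = size < img->capacity ? size : img->capacity;
    memcpy(img->dst, src, n);
    img->copied = n;
}
```

The render thread calls start_streaming and keeps drawing frames; it only has to join the thread (or check a completion flag) before it first samples the new image.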

Can we apply the same strategy when the game is being loaded in the first place? Sure: use multiple CPU threads to copy texture data directly into images. Because the CPU copy is slower due to layout swizzling, load times may not actually be any faster, but at least memory usage is halved!
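For the initial load, the same idea extends to several worker threads. The sketch below statically splits a list of textures into contiguous ranges, one per thread; upload_one_fn and mark_loaded are hypothetical placeholders for a per-texture upload that would call vkCopyMemoryToImage on that texture's own image. Static partitioning is simply the easiest choice to show; a shared work queue would balance textures of very different sizes better.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-texture upload; a real loader would call
 * vkCopyMemoryToImage for texture `index` here. */
typedef void (*upload_one_fn)(size_t index, void *user);

typedef struct {
    size_t begin, end;     /* half-open range of texture indices */
    upload_one_fn upload;
    void *user;
} load_range;

static void *load_worker(void *arg) {
    const load_range *r = arg;
    for (size_t i = r->begin; i < r->end; ++i)
        r->upload(i, r->user);
    return NULL;
}

/* Split `count` textures into contiguous ranges over up to MAX_THREADS
 * threads and wait for all of them. Returns 0 on success, -1 if a thread
 * could not be started (threads already started are still joined). */
int load_all_textures(size_t count, unsigned threads,
                      upload_one_fn upload, void *user) {
    enum { MAX_THREADS = 16 };
    pthread_t tid[MAX_THREADS];
    load_range ranges[MAX_THREADS];
    if (threads == 0) threads = 1;
    if (threads > MAX_THREADS) threads = MAX_THREADS;

    size_t chunk = (count + threads - 1) / threads;
    unsigned started = 0;
    int rc = 0;
    for (unsigned t = 0; t < threads; ++t) {
        size_t begin = (size_t)t * chunk;
        if (begin >= count)
            break;
        size_t end = begin + chunk < count ? begin + chunk : count;
        ranges[t] = (load_range){ begin, end, upload, user };
        if (pthread_create(&tid[t], NULL, load_worker, &ranges[t]) != 0) {
            rc = -1;
            break;
        }
        ++started;
    }
    for (unsigned t = 0; t < started; ++t)
        pthread_join(tid[t], NULL);
    return rc;
}

/* Demo callback: marks texture `index` as loaded in a caller-owned array.
 * Each index is written by exactly one thread, so no locking is needed. */
void mark_loaded(size_t index, void *user) {
    ((unsigned char *)user)[index] = 1;
}
```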

But Host Image Copy has a secret way of making this much faster: as fast as memcpy! The CPU copy becomes just as efficient as the CPU copy in the GPU transfer (staging buffer) path, while the GPU copy and the staging buffer disappear entirely. It is all upside and no downside. The key is VK_HOST_IMAGE_COPY_MEMCPY.

This flag is simple: it tells the implementation not to perform layout swizzling. The texture data being copied to the image is therefore assumed to be pre-swizzled, and the copy is just a memcpy. But since the layout swizzling of images on various devices is not public knowledge, how is this useful?

The answer lies in image-to-memory copies with the same flag, that is, reading back swizzled image data without undoing the layout swizzling. Many high-fidelity AAA Android games download large packages of texture data on the first run. Consider the following algorithm:

  • Download the texture data.
  • Use a temporary Vulkan image and call vkCopyMemoryToImage; the CPU performs the layout swizzling.
  • Read back the image contents with vkCopyImageToMemory using the VK_HOST_IMAGE_COPY_MEMCPY flag; the returned data is pre-swizzled for this particular device and driver.
  • Store only the pre-swizzled data to persistent storage, not the original texture data, to minimize the storage footprint.

The next time the game runs, it can simply use vkCopyMemoryToImage with VK_HOST_IMAGE_COPY_MEMCPY to copy the pre-swizzled data into the images, as fast as a simple read of the file contents would be. This also happens to optimize the streaming scenario above!
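The first-run and later-run steps can be modeled end to end without a GPU. In the toy below, a transpose of the block grid stands in for the driver's private layout, and the toy_* functions mimic the behavior described above for vkCopyMemoryToImage and vkCopyImageToMemory with and without the MEMCPY flag; they are illustrative stand-ins, not the Vulkan API. What the sketch demonstrates is the invariant the caching scheme relies on: uploading the cached bytes with a plain memcpy produces exactly the same image memory as the original swizzling upload.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIDE 4 /* toy image: 4x4 grid of compressed blocks */
#define BLOCK_BYTES 16
#define IMAGE_BYTES (SIDE * SIDE * BLOCK_BYTES)

/* Toy stand-in for a driver's private layout: transpose the block grid. */
static size_t swizzled_index(size_t x, size_t y) { return x * SIDE + y; }

/* A host-transferable image, modeled as its raw, swizzled memory. */
typedef struct { uint8_t mem[IMAGE_BYTES]; } toy_image;

/* Like vkCopyMemoryToImage: swizzles unless memcpy_flag is set. */
void toy_copy_memory_to_image(toy_image *img, const uint8_t *src, bool memcpy_flag) {
    if (memcpy_flag) {
        memcpy(img->mem, src, IMAGE_BYTES); /* data assumed pre-swizzled */
        return;
    }
    for (size_t y = 0; y < SIDE; ++y)
        for (size_t x = 0; x < SIDE; ++x)
            memcpy(img->mem + swizzled_index(x, y) * BLOCK_BYTES,
                   src + (y * SIDE + x) * BLOCK_BYTES, BLOCK_BYTES);
}

/* Like vkCopyImageToMemory with the MEMCPY flag: raw bytes, still swizzled. */
void toy_copy_image_to_memory_memcpy(uint8_t *dst, const toy_image *img) {
    memcpy(dst, img->mem, IMAGE_BYTES);
}

/* First run: swizzle once through a temporary image, keep the swizzled bytes. */
void build_cache(uint8_t *cache, const uint8_t *downloaded) {
    toy_image tmp;
    toy_copy_memory_to_image(&tmp, downloaded, false);
    toy_copy_image_to_memory_memcpy(cache, &tmp);
}

/* Later runs: a plain memcpy from the cache, no swizzling. */
void load_from_cache(toy_image *img, const uint8_t *cache) {
    toy_copy_memory_to_image(img, cache, true);
}
```

In the real API the size of the pre-swizzled data is not known up front as it is here; it is reported per subresource by chaining VkSubresourceHostMemcpySize to the output of vkGetImageSubresourceLayout2.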

The only gotcha is that driver updates may change the layout swizzling of images. The game needs to check that optimalTilingLayoutUUID (reported in VkPhysicalDeviceHostImageCopyProperties) has not changed since the pre-swizzled texture data was cached, and redo the above if it ever does. Because only the pre-swizzled data was kept, that means downloading and reprocessing the original texture data again. Fortunately, changes to the layout swizzle are rare, so in practice the game is unlikely to ever need to do so.
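A minimal sketch of that check, assuming the cache file starts with a small header that the game defines itself; swizzle_cache_header, CACHE_MAGIC and CACHE_VERSION are hypothetical. In a real application, current_uuid would be the optimalTilingLayoutUUID value obtained by chaining VkPhysicalDeviceHostImageCopyProperties into vkGetPhysicalDeviceProperties2; VK_UUID_SIZE is 16, which is why the array holds 16 bytes.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LAYOUT_UUID_SIZE 16     /* same value as VK_UUID_SIZE */
#define CACHE_MAGIC 0x48494331u /* arbitrary tag ("HIC1" in ASCII) */
#define CACHE_VERSION 1u        /* bump when the app's cache format changes */

/* Hypothetical header written in front of the pre-swizzled texture data. */
typedef struct {
    uint32_t magic;
    uint32_t version;
    uint8_t layout_uuid[LAYOUT_UUID_SIZE]; /* optimalTilingLayoutUUID at cache time */
} swizzle_cache_header;

/* Fill the header when the cache is built on the first run. */
void write_cache_header(swizzle_cache_header *h,
                        const uint8_t uuid[LAYOUT_UUID_SIZE]) {
    h->magic = CACHE_MAGIC;
    h->version = CACHE_VERSION;
    memcpy(h->layout_uuid, uuid, LAYOUT_UUID_SIZE);
}

/* The cached data is usable only if the driver still reports the same
 * layout UUID; otherwise it must be rebuilt from the original textures. */
bool cache_is_valid(const swizzle_cache_header *h,
                    const uint8_t current_uuid[LAYOUT_UUID_SIZE]) {
    return h->magic == CACHE_MAGIC &&
           h->version == CACHE_VERSION &&
           memcmp(h->layout_uuid, current_uuid, LAYOUT_UUID_SIZE) == 0;
}
```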

The Host Image Copy feature, conditionally required by Vulkan 1.4 and unconditionally required by Android 16 for new devices, is a game changer for games on Android. In this post we looked at a few simple but significant wins made possible by this functionality, but there are others, notably asynchronous image memory defragmentation. No doubt your ingenuity will lead to other optimizations that this feature makes possible.

Be sure to check out the Khronos blog post mentioned above for more technical details on using this functionality. As it becomes prevalent on Android phones, Vulkan games will be better off. Don’t miss out!


