Example of TAA implementation with Vireo RHI

TAA / Vireo RHI / C++ / Vulkan / DirectX 12 / Slang Shader Language / Deferred Rendering

Henri Michelon

Context

This article details the implementation of TAA in the Vireo RHI Samples from the Vireo Rendering Hardware Interface project.

The corresponding source code can be found in the "Deferred" example: https://github.com/HenriMichelon/vireo_samples/tree/main/src/samples/deferred

Introduction: what is TAA?

Temporal Anti-Aliasing (TAA) is an anti-aliasing technique that uses information accumulated across multiple successive frames to smooth out visual artifacts. Unlike MSAA, which operates during rendering on the edges of geometric shapes, or FXAA, which applies a post-processing filter, TAA leverages time as an additional sampling dimension. (The Vireo RHI Samples also include examples of MSAA and FXAA.)

The core idea: each frame, the camera projection is shifted by a small sub-pixel offset (the jitter). The result is then blended with the history accumulated from previous frames. TAA is currently the dominant anti-aliasing technique in real-time 3D engines.
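As a rough illustration of why jitter plus accumulation smooths edges, here is a minimal one-dimensional sketch in C++. It is not taken from the Vireo samples: the function names, the edge position and the single-pixel setup are assumptions made for the example. One pixel covers [0, 1), an edge sits somewhere inside it, each frame takes a single jittered sample, and that sample is blended into a running history.

```cpp
#include <cstdint>

// Radical inverse in base 2: a simple low-discrepancy sequence in [0, 1).
inline float radicalInverseBase2(std::uint32_t index) {
    float result = 0.0f;
    float f = 0.5f;
    while (index > 0) {
        result += f * static_cast<float>(index & 1u);
        index >>= 1;
        f *= 0.5f;
    }
    return result;
}

// Hypothetical 1D "pixel" covering [0, 1): white (1) at and right of `edge`,
// black (0) left of it. Each frame takes ONE sample at a jittered position
// and blends it into the history with weight `historyWeight`.
inline float accumulateEdgeCoverage(float edge, float historyWeight, int frameCount) {
    float history = 0.0f;
    for (int frame = 0; frame < frameCount; ++frame) {
        const float samplePos =
            radicalInverseBase2(static_cast<std::uint32_t>(frame % 16 + 1));
        const float current = samplePos >= edge ? 1.0f : 0.0f;
        // First frame: no history yet, so take the current sample as-is.
        // Afterwards: lerp(current, history, historyWeight).
        history = (frame == 0)
            ? current
            : current + (history - current) * historyWeight;
    }
    return history;
}
```

With `accumulateEdgeCoverage(0.3f, 0.95f, 256)`, 11 of the 16 sample positions fall on the white side, so the accumulated value hovers around 0.7 instead of flipping between 0 and 1 from one frame to the next.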

TAA's place in the pipeline

The execution order of the deferred pipeline is as follows:

Depth Prepass: Z-buffer
→ G-Buffer Pass: Position/Normal/Albedo/Velocity
→ Lighting Pass: Deferred Shading
→ TAA Pass: Accumulation
→ OIT Pass: Transparency
→ Post-process: SMAA/FXAA/Effect

TAA operates on opaque geometries: the G-Buffer provides velocity data for the TAA shader well before transparent geometries are rendered during the OIT pass.

Camera Jitter

Jitter consists of shifting the projection by a different sub-pixel offset each frame, following a quasi-random sequence (or at least one that is sufficiently well distributed for TAA).

TAA requires extending the Global UBO (Uniform Buffer Object) shared between the CPU and the shaders with the fields marked "TAA" below:

struct Global {
    alignas(16) glm::mat4 projection;
    glm::mat4 view;
    glm::mat4 viewInverse;
    glm::mat4 previousProjection; // TAA
    glm::mat4 previousView;       // TAA
    glm::vec2 jitter;             // TAA
    glm::vec2 screenSize;         // TAA
    alignas(16) glm::vec4 ambientLight;
};
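Because this structure is read by both the CPU and the shaders, it can be worth checking its memory layout at compile time. The sketch below is an illustration only: it replaces the glm types with hypothetical plain-float stand-ins (`Mat4`, `Vec2`, `Vec4`) so that it compiles on its own, and the expected offsets assume std140-style rules (vec2 aligned to 8 bytes, vec4 and matrix columns to 16), which the article does not state explicitly.

```cpp
#include <cstddef>

// Hypothetical stand-ins for glm::mat4 / glm::vec2 / glm::vec4 (tightly packed floats).
struct Mat4 { float m[16]; };
struct Vec2 { float x, y; };
struct Vec4 { float x, y, z, w; };

// Same member order and alignas() usage as the Global struct above.
struct GlobalLayoutSketch {
    alignas(16) Mat4 projection;
    Mat4 view;
    Mat4 viewInverse;
    Mat4 previousProjection;
    Mat4 previousView;
    Vec2 jitter;
    Vec2 screenSize;
    alignas(16) Vec4 ambientLight;
};

// Five 64-byte matrices come first, so the two vec2 share one 16-byte slot
// and ambientLight starts on the next 16-byte boundary.
static_assert(offsetof(GlobalLayoutSketch, jitter) == 320, "jitter offset");
static_assert(offsetof(GlobalLayoutSketch, screenSize) == 328, "screenSize offset");
static_assert(offsetof(GlobalLayoutSketch, ambientLight) == 336, "ambientLight offset");
static_assert(sizeof(GlobalLayoutSketch) == 352, "total size");
```

Placing the two vec2 fields next to each other is what lets them fill a single 16-byte slot without padding between them.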

The jitter is traditionally computed from a Halton sequence; the resulting offset is stored in the global UBO.

Implementation in Scene.cpp

void Scene::jitterProjection(const vireo::Extent& extent) {
    static uint32_t frameIndex = 0;

    // Halton sequence: base b, index i
    const auto halton = [](const uint32_t index, const uint32_t base) {
        auto result = 0.0f;
        auto f = 1.0f / static_cast<float>(base);
        auto i = index;
        while (i > 0) {
            result += f * static_cast<float>(i % base);
            i /= base;
            f /= static_cast<float>(base);
        }
        return result;
    };

    global.jitter = {
        (halton(frameIndex % 16 + 1, 2) - 0.5f) / static_cast<float>(extent.width),
        (halton(frameIndex % 16 + 1, 3) - 0.5f) / static_cast<float>(extent.height)
    };

    global.previousProjection = global.projection;
    global.projection[2][0] = global.jitter.x; 
    global.projection[2][1] = global.jitter.y; 

    frameIndex++;
}

The jitter is written directly into the camera's current projection matrix, in the projection[2][0] and projection[2][1] elements, which correspond to the offset of the line of sight in the perspective projection matrix. The current matrix is saved in previousProjection before it is modified.

Per-pixel velocity calculation

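TAA requires knowing where each pixel was located in the previous frame. This information is stored in a velocity buffer (also known as a motion vector buffer): a 2-channel, 16-bit floating-point render target containing the screen-space displacement of each fragment between frame n-1 and frame n. The fragment shader shown later in this section expresses it in normalized [0, 1] screen coordinates, so that it can be subtracted directly from a UV.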

Adding a 5th attachment to the G-Buffer Pass

// Velocity buffer format
vireo::ImageFormat::R16G16_SFLOAT, // TAA Velocity

// Index constant for the descriptor set
static constexpr int BUFFER_VELOCITY{4};

// Add the velocity render target
struct FrameData : FrameDataCommand {
    ...
    std::shared_ptr<vireo::RenderTarget> velocityBuffer; // TAA
};


// Send the render target to the shader in onRender()
renderingConfig.colorRenderTargets[BUFFER_VELOCITY].renderTarget = frame.velocityBuffer;

// Create the render target in onResize()
frame.velocityBuffer = vireo->createRenderTarget(
    pipelineConfig.colorRenderFormats[BUFFER_VELOCITY],
    extent.width, extent.height,
    vireo::RenderTargetType::COLOR,
    renderingConfig.colorRenderTargets[BUFFER_VELOCITY].clearValue,
    1, vireo::MSAA::NONE,
    "Velocity Buffer");

Shader modifications (Slang)

The VertexOutput structure is first modified to pass the previous position between the vertex shader and the fragment shader:

struct VertexOutput {
    float4 position   : SV_POSITION;
    float3 worldPos   : TEXCOORD0;
    float3 normal     : TEXCOORD1;
    float2 uv         : TEXCOORD2;
    float3 tangent    : TEXCOORD3;
    float3 bitangent  : TEXCOORD4;
    float4 previousPos: TEXCOORD5; // TAA
};

The vertex shader calculates the previous position of the vertex using data from the previous frame saved on the CPU side:

// Previous position (frame n-1) for TAA
float4 previousViewPos = mul(global.previousView, worldPos);
output.previousPos = mul(global.previousProjection, previousViewPos);

The fragment shader derives the velocity vector in normalized screen (UV) space, removing the current frame's jitter:

float2 curPos = (input.position.xy / global.screenSize);
curPos -= global.jitter * 0.5;
float2 prevPos = (input.previousPos.xy / input.previousPos.w) * 0.5 + 0.5;
output.velocity = (curPos - prevPos);

Applying TAA

The core of TAA is the shader (taa.frag.slang). It takes three textures as input:

Binding	Texture	Content
t1	inputImage	Current rendering (frame n)
t2	history	TAA result of frame n-1
t3	velocity	Velocity buffer calculated by the G-Buffer pass

Texture2D inputImage : register(t1);
Texture2D history : register(t2);
Texture2D velocity : register(t3);

1. Velocity-based reprojection

float2 vel = velocity.Sample(sampler, uv).xy;
float2 historyUV = uv - vel;

// Reject if off-screen (occlusion, edge)
if (any(historyUV < 0.0) || any(historyUV > 1.0)) {
    return current;
}

2. Catmull-Rom filtering of the history

The history is sampled using a bicubic Catmull-Rom filter rather than the classic bilinear filter. This filter produces a sharper result and reduces the amount of "ghosting".

float4 prev = SampleTextureCatmullRom(
    history, sampler, historyUV, params.imageSize);

3. Variance clipping in YCoCg space

To prevent ghosting (residual images from previous frames), the history is confined to an AABB calculated from the 3×3 neighborhood of the current pixel. This operation is performed in YCoCg space, which is better suited to visual perception than RGB.

clampedPrev = clamp(prevYCoCg, mean − σ, mean + σ)

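// Assumed for this excerpt: the texel size is derived from params.imageSize,
// and the min/max of the 3×3 neighborhood are tracked for the clamp below.
float2 texelSize = 1.0 / params.imageSize;
float3 m1 = 0.0, m2 = 0.0;
float3 neighborMin = 1e6, neighborMax = -1e6;
for (int x = -1; x <= 1; ++x) {
    for (int y = -1; y <= 1; ++y) {
        float2 offset = float2(x, y) * texelSize;
        float3 neighborYCoCg = RGBToYCoCg(
            inputImage.Sample(sampler, uv + offset).rgb);
        m1 += neighborYCoCg;
        m2 += neighborYCoCg * neighborYCoCg;
        neighborMin = min(neighborMin, neighborYCoCg);
        neighborMax = max(neighborMax, neighborYCoCg);
    }
}
// First and second moments give the mean and standard deviation per channel
float3 mean   = m1 / 9.0;
float3 stddev = sqrt(max(0.0, (m2 / 9.0) - (mean * mean)));
// The variance box is further restricted to the neighborhood min/max
float3 minColor = max(mean - 1.0 * stddev, neighborMin);
float3 maxColor = min(mean + 1.0 * stddev, neighborMax);
float3 clampedPrev = clamp(prevYCoCg, minColor, maxColor);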
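The two ingredients of this step can be tried out on the CPU. The C++ sketch below is an illustration under stated assumptions: it uses the common YCoCg transform (the article does not show the body of RGBToYCoCg, so the exact coefficients used by the shader may differ), and it applies the variance clamp to a single channel for brevity. All names are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cmath>

struct Float3 { float x, y, z; };

// Common RGB -> YCoCg transform (assumed, not taken from the shader source).
inline Float3 rgbToYCoCg(Float3 c) {
    return { 0.25f * c.x + 0.5f * c.y + 0.25f * c.z,
             0.5f  * c.x              - 0.5f  * c.z,
            -0.25f * c.x + 0.5f * c.y - 0.25f * c.z};
}

// Exact inverse of the transform above.
inline Float3 yCoCgToRgb(Float3 c) {
    return {c.x + c.y - c.z, c.x + c.z, c.x - c.y - c.z};
}

// Single-channel variance clamp over a 3x3 neighborhood:
// the history is confined to [mean - gamma*sigma, mean + gamma*sigma],
// itself restricted to the neighborhood min/max.
inline float varianceClamp(const std::array<float, 9>& neighbors,
                           float history, float gamma) {
    float m1 = 0.0f, m2 = 0.0f;
    float neighborMin = neighbors[0], neighborMax = neighbors[0];
    for (const float n : neighbors) {
        m1 += n;
        m2 += n * n;
        neighborMin = std::min(neighborMin, n);
        neighborMax = std::max(neighborMax, n);
    }
    const float mean = m1 / 9.0f;
    const float stddev = std::sqrt(std::max(0.0f, m2 / 9.0f - mean * mean));
    const float lo = std::max(mean - gamma * stddev, neighborMin);
    const float hi = std::min(mean + gamma * stddev, neighborMax);
    return std::min(std::max(history, lo), hi);
}
```

On a flat neighborhood (all samples at 0.5) a stale history of 0.9 is pulled all the way back to 0.5. With a single bright sample among eight dark ones, the same history is only pulled down to about 0.43, the upper bound of the variance box.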

4. Adaptive blending

The blending factor between the current frame and the history is not fixed: it adapts based on two signals, the magnitude of the velocity vector and the clamping intensity (a measure of disocclusion).

The fixed constants in this section let you tune the TAA result. The values chosen in this example prioritize anti-aliasing quality over artifact suppression (see "Adjustable quality settings" below).

// Velocity weight: the more movement, the less reliance on the history
float velocityWeight = saturate(length(vel) * 30.0);
// Clamping weight: if the history has been significantly adjusted, reduce its weight
float clampWeight = saturate(length(clampedPrev - prevYCoCg) * 3.0);
// blend ∈ [0.75, 0.95]: 95% history when stationary, 75% during movement
float blendFactor = lerp(0.95, 0.75, max(velocityWeight, clampWeight));
// finalPrevRGB is clampedPrev converted back to RGB (conversion not shown in this excerpt)
float3 result = lerp(current.rgb, finalPrevRGB, blendFactor);

Temporal buffer management (ping-pong)

The TAA pass maintains two color buffers per frame in flight (taaColorBuffer[0] and taaColorBuffer[1]) used in a ping-pong pattern: one is written while the other is read as history.

// Add buffer bindings:
static constexpr vireo::DescriptorIndex BINDING_HISTORY{2}; // TAA Only
static constexpr vireo::DescriptorIndex BINDING_VELOCITY{3}; // TAA Only

// Add the buffers and descriptor sets:
struct FrameData {
    ...
    std::shared_ptr<vireo::DescriptorSet> taaDescriptorSet[2];
    std::shared_ptr<vireo::RenderTarget>  taaColorBuffer[2];
};

// Add the descriptor layout and the TAA pipeline:
std::shared_ptr<vireo::Pipeline> taaPipeline;
std::shared_ptr<vireo::DescriptorLayout> taaDescriptorLayout;

// In onInit(), create the descriptor layout, descriptor sets, pipeline and buffers:
taaDescriptorLayout = vireo->createDescriptorLayout();
taaDescriptorLayout->add(BINDING_PARAMS, vireo::DescriptorType::UNIFORM);
taaDescriptorLayout->add(BINDING_INPUT, vireo::DescriptorType::SAMPLED_IMAGE);
taaDescriptorLayout->add(BINDING_HISTORY, vireo::DescriptorType::SAMPLED_IMAGE);
taaDescriptorLayout->add(BINDING_VELOCITY, vireo::DescriptorType::SAMPLED_IMAGE);
taaDescriptorLayout->build();

const auto taaResources = vireo->createPipelineResources({
    taaDescriptorLayout,
    samplers.getDescriptorLayout() });

pipelineConfig.resources = taaResources;
pipelineConfig.fragmentShader = vireo->createShaderModule("shaders/taa.frag");
taaPipeline = vireo->createGraphicPipeline(pipelineConfig);

for (auto& frame : framesData) {
    ...
    frame.taaDescriptorSet[0] = vireo->createDescriptorSet(taaDescriptorLayout);
    frame.taaDescriptorSet[0]->update(BINDING_PARAMS, paramsBuffer);
    frame.taaDescriptorSet[1] = vireo->createDescriptorSet(taaDescriptorLayout);
    frame.taaDescriptorSet[1]->update(BINDING_PARAMS, paramsBuffer);
}

At each frame, the current buffer becomes the history buffer for the next frame, and vice versa:

// In taaPass():
const auto historyIndex  = (taaIndex + 1) % 2;
const auto currentHistory  = frame.taaColorBuffer[taaIndex];
const auto previousHistory = frame.taaColorBuffer[historyIndex];

// At the end of rendering — advance the index if TAA is enabled
if (applyTAA) {
    taaIndex = (taaIndex + 1) % 2;
}
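The index bookkeeping can be isolated from the rendering code to make the invariant explicit: the buffer written during one frame is the buffer read as history during the next. The C++ sketch below is illustrative; `PingPongSketch` and its members are hypothetical names, not part of the samples.

```cpp
#include <cstdint>

// Minimal model of the taaIndex logic: two slots, one written, one read.
struct PingPongSketch {
    std::uint32_t taaIndex{0};

    // Slot rendered into this frame (currentHistory in taaPass()).
    std::uint32_t writeIndex() const { return taaIndex; }

    // Slot sampled as history this frame (previousHistory in taaPass()).
    std::uint32_t readIndex() const { return (taaIndex + 1) % 2; }

    // Called at the end of the frame; the roles only swap when TAA ran.
    void advance(bool applyTAA) {
        if (applyTAA) {
            taaIndex = (taaIndex + 1) % 2;
        }
    }
};
```

After `advance(true)`, `readIndex()` returns the value `writeIndex()` had during the previous frame. With `advance(false)` the indices stay where they are, so the history is not swapped while TAA is disabled.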

The rest is a standard full-screen rendering pass using Vireo RHI:

cmdList->barrier(
    colorBuffer,
    vireo::ResourceState::RENDER_TARGET_COLOR,
    vireo::ResourceState::SHADER_READ);
cmdList->barrier(
    previousHistory,
    vireo::ResourceState::UNDEFINED,
    vireo::ResourceState::SHADER_READ);
cmdList->barrier(
    currentHistory,
    vireo::ResourceState::UNDEFINED,
    vireo::ResourceState::RENDER_TARGET_COLOR);

frame.taaDescriptorSet[taaIndex]->update(BINDING_INPUT,   colorBuffer->getImage());
frame.taaDescriptorSet[taaIndex]->update(BINDING_HISTORY,  previousHistory->getImage());
frame.taaDescriptorSet[taaIndex]->update(BINDING_VELOCITY, velocityBuffer->getImage());

renderingConfig.colorRenderTargets[0].renderTarget = currentHistory;
cmdList->beginRendering(renderingConfig);
cmdList->setViewport(vireo::Viewport{
    static_cast<float>(extent.width),
    static_cast<float>(extent.height)});
cmdList->setScissors(vireo::Rect{
    extent.width,
    extent.height});
cmdList->bindPipeline(taaPipeline);
cmdList->bindDescriptors({frame.taaDescriptorSet[taaIndex], samplers.getDescriptorSet()});
cmdList->draw(3);
cmdList->endRendering();

cmdList->barrier(
    previousHistory->getImage(),
    vireo::ResourceState::SHADER_READ,
    vireo::ResourceState::UNDEFINED);
cmdList->barrier(
    colorBuffer->getImage(),
    vireo::ResourceState::SHADER_READ,
    vireo::ResourceState::UNDEFINED);

Integration into the rendering pipeline

The TAA output is inserted between the rendering of opaque geometry (Deferred Lighting) and the rendering of transparent geometry (OIT). When TAA is enabled, the following passes in the example (OIT, FXAA, SMAA, Voronoi effect, gamma correction) consume the taaColorBuffer rather than the original colorBuffer:

// After Deferred Lighting:
postProcessing.taaPass(
    frameIndex,
    swapChain->getExtent(),
    samplers,
    cmdList,
    frame.colorBuffer,
    gbufferPass.getVelocityBuffer(frameIndex));
auto colorBuffer = postProcessing.applyTAA
    ? postProcessing.getTAAColorBuffer(frameIndex)
    : frame.colorBuffer;

// OIT and post-processing receive this colorBuffer
transparencyPass.onRender(..., colorBuffer);
postProcessing.onRender(..., colorBuffer);

Other practical considerations

Adjustable quality settings

Parameter	Value	Function
Halton Cycle	16 frames	Number of distinct jitter positions
Variance clipping σ	1.0×	Tolerance around the neighborhood mean before the history is clamped
Blend factor (rest)	0.95	History weight when no movement
Blend factor (movement)	0.75	History weight when camera/object is moving
Velocity scale	30.0×	Sensitivity of the blend to the velocity vector

Limitation

Since the velocity buffer is written only for opaque geometry in the G-Buffer pass, transparent objects (rendered in the OIT pass) do not contribute to it, which can cause ghosting on moving transparent surfaces.