Context
This article details the implementation of SMAA in the Vireo RHI Samples, part of the Vireo Rendering Hardware Interface project.
What is SMAA?
Anti-aliasing is one of the fundamental problems in real-time rendering. SMAA (Subpixel Morphological Anti-Aliasing) proposes a post-process approach that operates on the final rendering result: the image is processed after rendering, without using geometry (unlike MSAA).
The principle is morphological: edge shapes are detected to derive adaptive blending weights, providing quality close to MSAA 4× at a much lower cost.
In this example, SMAA is integrated into the PostProcessing module alongside FXAA and TAA,
and can be toggled at runtime using the M key.
Its implementation consists of three full-screen fragment shader passes: the shaders are written in Slang, the host code in C++23.
The SMAA Pipeline
SMAA requires three passes to produce the final result: edge detection, blending weight calculation, and neighborhood blending.
Each pass draws a single full-screen triangle (three vertices).
The geometry of this triangle is generated entirely in the vertex shader (quad.vert),
without a vertex buffer, using SV_VertexID.
Shaders are written in Slang,
a modern shading language compiled to SPIR-V (Vulkan) or DXIL (DirectX 12).
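The vertex-ID technique can be illustrated on the CPU. The sketch below is an assumption about how a shader such as quad.vert typically derives a full-screen triangle from SV_VertexID. The actual shader is not shown in this article, and the names Float2, FullScreenVertex, and fullScreenVertex are hypothetical.

```cpp
#include <cstdint>

struct Float2 { float x, y; };

struct FullScreenVertex {
    Float2 position; // clip-space position
    Float2 uv;       // texture coordinates
};

// Emulates the common vertex-ID trick (an assumption, not the content of
// quad.vert): ids 0, 1, 2 map to one oversized triangle whose visible
// part covers the whole viewport.
constexpr FullScreenVertex fullScreenVertex(std::uint32_t vertexId) {
    const Float2 uv{
        static_cast<float>((vertexId << 1) & 2u), // 0, 2, 0
        static_cast<float>(vertexId & 2u)};       // 0, 0, 2
    return FullScreenVertex{
        Float2{uv.x * 2.0f - 1.0f, uv.y * 2.0f - 1.0f}, uv};
}
```

The three vertices land at (-1, -1), (3, -1), and (-1, 3). The rasterizer clips the triangle to the viewport, where uv spans [0, 1]. Whether the V axis needs flipping depends on the conventions of the target API.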
Pass 1 — Edge Detection
This pass takes the color image output from the rendering pipeline (inputImage)
and produces a buffer encoding horizontal and vertical edge intensity for each pixel.
#include "postprocess.inc.slang"
ConstantBuffer<Params> params : register(b0);
Texture2D inputImage : register(t1);
SamplerState sampler : register(SAMPLER_LINEAR_EDGE, space1);
float4 fragmentMain(VertexOutput input) : SV_TARGET {
// Weights of the ITU-R BT.601 perceptual luma formula
float3 lumaWeight = float3(0.299, 0.587, 0.114);
// Sample the center pixel and its 4 direct neighbors
float lumaC = dot(inputImage.Sample(sampler, input.uv).rgb, lumaWeight);
float lumaN = dot(inputImage.Sample(sampler, input.uv + float2( 0, -1) / params.imageSize).rgb, lumaWeight);
float lumaS = dot(inputImage.Sample(sampler, input.uv + float2( 0, 1) / params.imageSize).rgb, lumaWeight);
float lumaW = dot(inputImage.Sample(sampler, input.uv + float2(-1, 0) / params.imageSize).rgb, lumaWeight);
float lumaE = dot(inputImage.Sample(sampler, input.uv + float2( 1, 0) / params.imageSize).rgb, lumaWeight);
// Horizontal gradient: left-center difference + center-right difference
float edgeH = abs(lumaW - lumaC) + abs(lumaC - lumaE);
// Vertical gradient: top-center difference + center-bottom difference
float edgeV = abs(lumaN - lumaC) + abs(lumaC - lumaS);
// Output: R = horizontal edge intensity, G = vertical edge intensity
return float4(edgeH, edgeV, 0, 0);
}
Working in luminance instead of RGB provides two advantages:
less data per comparison (a scalar instead of a 3-float vector)
and better correlation with human perception.
The coefficients (0.299, 0.587, 0.114) come from the
BT.601 standard.
The sampler used here is SAMPLER_LINEAR_EDGE
(clamp-to-edge addressing mode, bilinear filtering).
This ensures pixels at image borders do not sample out-of-bounds texels.
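The following CPU reference is a rough sketch of pass 1 on a luma image, with clamp-to-edge addressing modeled by clamping integer coordinates. It assumes that input.uv points at texel centers, so that bilinear filtering returns the texel itself and can be ignored. LumaImage, Edge, and detectEdge are hypothetical names.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct LumaImage {
    int width;
    int height;
    std::vector<float> texels; // row-major luma values

    // Clamp-to-edge addressing: out-of-range coordinates read the
    // nearest border texel instead of going out of bounds.
    float at(int x, int y) const {
        x = std::clamp(x, 0, width - 1);
        y = std::clamp(y, 0, height - 1);
        return texels[static_cast<std::size_t>(y) * width + x];
    }
};

struct Edge { float h, v; };

// Same arithmetic as the shader: sum of the two one-sided differences
// along each axis.
Edge detectEdge(const LumaImage& img, int x, int y) {
    const float c = img.at(x, y);
    const float h = std::fabs(img.at(x - 1, y) - c) + std::fabs(c - img.at(x + 1, y));
    const float v = std::fabs(img.at(x, y - 1) - c) + std::fabs(c - img.at(x, y + 1));
    return Edge{h, v};
}
```

With clamping, a border pixel compares against a copy of itself on the outside, so the image boundary never registers as an edge.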
The smaaEdgeBuffer is allocated as
R16G16_SFLOAT (two 16-bit channels).
Red stores horizontal gradients, Green stores vertical gradients.
Pass 2 — Blending Weight Calculation
This pass consumes the buffer from pass 1 and computes a scalar blending weight per pixel.
#include "postprocess.inc.slang"
ConstantBuffer<Params> params : register(b0);
Texture2D edgeBuffer : register(t1);
SamplerState sampler : register(SAMPLER_NEAREST_BORDER, space1);
float4 fragmentMain(VertexOutput input) : SV_TARGET {
// Read the H and V edges from pass 1
float2 edge = edgeBuffer.Sample(sampler, input.uv).rg;
// The weight is the maximum of the two directions, clamped to [0, 1]
float weight = saturate(max(edge.r, edge.g));
return float4(weight, weight, weight, 0);
}
max(edgeH, edgeV) selects the dominant edge direction.
A pixel located at the intersection of strong edges in both directions
will receive a maximum weight, which is the desired behavior.
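The function saturate(x) is the shader equivalent of
std::clamp(x, 0.0f, 1.0f) in C++.
It is needed because each value from pass 1 is the sum of two absolute differences and can therefore exceed 1.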
Unlike pass 1, the sampler SAMPLER_NEAREST_BORDER
used here does not apply bilinear filtering:
edge data is read exactly per pixel.
Border addressing returns the border color (zero here) outside the texture, so out-of-range reads never contribute an edge.
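As a sketch of pass 2 on the CPU, the code below models nearest sampling with a zero border and the saturate(max(...)) weight. The zero border color follows the description above. EdgeBuffer, fetch, saturate, and blendWeight are hypothetical names, not Vireo APIs.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Edge { float h, v; };

struct EdgeBuffer {
    int width;
    int height;
    std::vector<Edge> texels; // row-major output of pass 1

    // Nearest sampling with border addressing: reads outside the
    // texture return the border color, assumed to be zero.
    Edge fetch(int x, int y) const {
        if (x < 0 || y < 0 || x >= width || y >= height) {
            return Edge{0.0f, 0.0f};
        }
        return texels[static_cast<std::size_t>(y) * width + x];
    }
};

// CPU counterpart of the shader intrinsic saturate().
inline float saturate(float x) { return std::clamp(x, 0.0f, 1.0f); }

// Dominant edge direction, clamped to a valid interpolation factor.
inline float blendWeight(Edge e) { return saturate(std::max(e.h, e.v)); }
```

An edge value of 1.5 from pass 1 becomes a weight of 1.0, which keeps the lerp calls in pass 3 from extrapolating beyond the neighbor color.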
Pass 3 — Neighborhood Blending
The final pass combines the original image with its immediate neighbors based on the weights computed in pass 2.
#include "postprocess.inc.slang"
ConstantBuffer<Params> params : register(b0);
Texture2D inputImage : register(t1);
Texture2D blendBuffer : register(t2); // Output of pass 2
SamplerState sampler : register(SAMPLER_NEAREST_BORDER, space1);
float4 fragmentMain(VertexOutput input) : SV_TARGET {
float4 color = inputImage.Sample(sampler, input.uv); // Center pixel
float2 blend = blendBuffer.Sample(sampler, input.uv).rg; // Weights (pass 2 writes the same value to R and G)
float4 n = inputImage.Sample(sampler, input.uv + float2( 0, -1) / params.imageSize);
float4 e = inputImage.Sample(sampler, input.uv + float2( 1, 0) / params.imageSize);
// Vertical interpolation: blend with the north neighbor
float4 blended = lerp(color, n, blend.r);
// Horizontal interpolation: blend with the east neighbor
blended = lerp(blended, e, blend.g);
return blended;
}
This pass is the only one using the
smaaDescriptorLayout with three bindings:
parameter buffer (b0),
original color image (t1),
and blend buffer (t2).
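The first interpolation (lerp)
blends the current pixel with its north neighbor
based on blend.r.
The second blends the result with the east neighbor
using blend.g.
Because pass 2 writes the same weight to every channel, blend.r and blend.g are equal in this simplified version.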
Where no edge exists (blend ≈ 0),
the pixel is unchanged.
Where a strong edge is detected (blend ≈ 1),
the pixel is heavily blended with its neighbor.
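The two chained interpolations can be checked with a small CPU sketch. It mirrors the arithmetic of the shader only. Color, lerpScalar, lerpColor, and neighborhoodBlend are hypothetical names introduced for illustration.

```cpp
struct Color { float r, g, b, a; };

// Same definition as the shader intrinsic lerp(x, y, s) = x + s * (y - x).
constexpr float lerpScalar(float x, float y, float s) {
    return x + s * (y - x);
}

constexpr Color lerpColor(Color x, Color y, float s) {
    return Color{lerpScalar(x.r, y.r, s), lerpScalar(x.g, y.g, s),
                 lerpScalar(x.b, y.b, s), lerpScalar(x.a, y.a, s)};
}

// Pass 3: blend toward the north neighbor, then toward the east neighbor.
constexpr Color neighborhoodBlend(Color center, Color north, Color east,
                                  float blendR, float blendG) {
    const Color blended = lerpColor(center, north, blendR);
    return lerpColor(blended, east, blendG);
}
```

Because the second lerp is applied to the result of the first, a weight of 1 in both channels returns the east neighbor alone. With both weights at 0.5, the east neighbor contributes 50% while the center and north pixels contribute 25% each.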
Integration into the PostProcessing Pipeline
The samples.common.postprocessing module manages the full lifecycle of post-processing passes.
The architecture uses C++20 modules (export module).
Pipeline Initialization (onInit)
// Create the three SMAA graphics pipelines
pipelineConfig.colorRenderFormats.push_back(vireo::ImageFormat::R16G16_SFLOAT);
pipelineConfig.resources = resources; // standard layout (2 bindings)
pipelineConfig.fragmentShader = vireo->createShaderModule("shaders/smaa_edge_detect.frag");
smaaEdgePipeline = vireo->createGraphicPipeline(pipelineConfig);
pipelineConfig.fragmentShader = vireo->createShaderModule("shaders/smaa_blend_weigth.frag");
smaaBlendWeightPipeline = vireo->createGraphicPipeline(pipelineConfig);
// Pass 3 uses a different format and layout
pipelineConfig.colorRenderFormats[0] = renderFormat; // final image format
pipelineConfig.resources = smaaResources; // layout with 3 bindings
pipelineConfig.fragmentShader = vireo->createShaderModule("shaders/smaa_neighborhood_blend.frag");
smaaBlendPipeline = vireo->createGraphicPipeline(pipelineConfig);
Render Loop (onRender)
In onRender, the SMAA passes are recorded alongside the other post-processing passes.
// === Pass 1: edge detection ===
cmdList->barrier(colorInput, // source image → shader read
vireo::ResourceState::RENDER_TARGET_COLOR,
vireo::ResourceState::SHADER_READ);
cmdList->barrier(frame.smaaEdgeBuffer, // edge buffer → render target
vireo::ResourceState::UNDEFINED,
vireo::ResourceState::RENDER_TARGET_COLOR);
frame.smaaEdgeDescriptorSet->update(BINDING_INPUT, colorInput->getImage());
// ... beginRendering → bindPipeline → draw(3) → endRendering
// === Pass 2: blending weights ===
cmdList->barrier(frame.smaaEdgeBuffer, // edge buffer → shader read
vireo::ResourceState::RENDER_TARGET_COLOR,
vireo::ResourceState::SHADER_READ);
cmdList->barrier(frame.smaaBlendBuffer, // blend buffer → render target
vireo::ResourceState::UNDEFINED,
vireo::ResourceState::RENDER_TARGET_COLOR);
frame.smaaBlendWeightDescriptorSet->update(BINDING_INPUT, frame.smaaEdgeBuffer->getImage());
// ... beginRendering → bindPipeline → draw(3) → endRendering
// === Pass 3: neighborhood blending ===
cmdList->barrier(frame.smaaBlendBuffer, // blend buffer → shader read
vireo::ResourceState::RENDER_TARGET_COLOR,
vireo::ResourceState::SHADER_READ);
frame.smaaBlendDescriptorSet->update(BINDING_INPUT, colorInput->getImage());
frame.smaaBlendDescriptorSet->update(BINDING_SMAA_INPUT, frame.smaaBlendBuffer->getImage());
// ... beginRendering → bindPipeline → draw(3) → endRendering
The pattern is identical for each pass:
barrier on the source (to SHADER_READ),
barrier on the destination (to RENDER_TARGET_COLOR),
descriptor update, then a three-vertex draw call.
The barriers guarantee that one pass has finished writing an image before the next pass samples it.
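The state transitions above can be reasoned about with a small tracker that rejects a barrier whose declared source state does not match the last recorded state. This is a hypothetical debugging aid and not part of the Vireo API. The local ResourceState enum only mirrors the three states used in this article, and accepting UNDEFINED as a source state at any time is an assumption modeled on Vulkan, where it means the previous contents may be discarded.

```cpp
#include <stdexcept>
#include <string>
#include <unordered_map>

// Local stand-in for the three states used above (not vireo::ResourceState).
enum class ResourceState { UNDEFINED, RENDER_TARGET_COLOR, SHADER_READ };

class StateTracker {
public:
    // Records a transition and throws if 'from' contradicts the last
    // recorded state. UNDEFINED is always accepted as a source state
    // (assumption: previous contents are discarded).
    void barrier(const std::string& image, ResourceState from, ResourceState to) {
        if (from != ResourceState::UNDEFINED && from != stateOf(image)) {
            throw std::logic_error("barrier source state mismatch: " + image);
        }
        states_[image] = to;
    }

    // Images never seen before are reported as UNDEFINED.
    ResourceState stateOf(const std::string& image) const {
        const auto it = states_.find(image);
        return it == states_.end() ? ResourceState::UNDEFINED : it->second;
    }

private:
    std::unordered_map<std::string, ResourceState> states_;
};
```

Replaying the barriers of the three passes through such a tracker shows that each intermediate buffer goes from UNDEFINED to RENDER_TARGET_COLOR to SHADER_READ exactly once per frame.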
SMAA vs FXAA vs TAA: Comparison
The Vireo RHI examples include three anti-aliasing techniques:
FXAA (Fast Approximate Anti-Aliasing)
Detects edges using local contrast and applies directional blur in one pass. Very fast, but can slightly blur the image.
SMAA (Subpixel Morphological AA)
Analyzes edge morphology for more precise blending weights. Higher quality than FXAA with less blurring. Slightly more expensive (3 passes).
TAA (Temporal Anti-Aliasing)
Uses frame history to accumulate information. Excellent for static geometry, but requires velocity buffers and ghosting management.
The combination TAA + SMAA is particularly effective: TAA stabilizes temporally, SMAA refines spatial edges.
Conclusion
Several improvements can bring this closer to full SMAA:
- Use precomputed lookup textures in pass 2 (the area and search textures of the reference SMAA) for precise directional weights
- Implement stencil masking to process only edge pixels
- Add SMAA T2× (temporal jitter) for finer subpixel AA