Custom GPU Passes
Blinc’s drawing API has two layers. The first is DrawContext, the
high-level imperative surface used by widgets, canvases, and the
component library: fill_rect, draw_text, push_clip, and so on.
Resolution-independent, analytical, SDF-backed. Most code never needs
anything else.
The second is the custom GPU pass, an escape hatch into raw wgpu.
You bring your own pipeline, your own buffers, and your own bind groups.
Blinc plumbs the device, queue, and target view through, runs your pass
at a frame-accurate position in its render loop, and clips your output
to a layout-aware viewport rect. Use this when you need:
- Retained per-instance buffers and instanced draw calls for thousands of objects in a scene.
- Compute shaders.
- Custom post-process effect chains.
- Direct interop with another wgpu-based crate.
- Any pipeline whose semantics don’t fit the SDF primitive model.
There are two ways to schedule a custom pass. Pick by where it should run.
Two scheduling models
| You want to… | Use |
|---|---|
| Run inline with a canvas closure, clipped to the canvas’s layout bounds | DrawContext::run_gpu_pass |
| Run at a fixed stage (pre-render / scene-3d / post-process) across the whole frame | register_custom_pass on the renderer |
The first is element-scoped and the more common case. The pass appears in the tree alongside everything else, gets the canvas’s bounds as its scissor, and is dispatched inside the canvas’s paint slot. The second is global and best for skybox-style backgrounds or full-frame post-processing.
This chapter focuses on the first. For the second, see the docs on
blinc_gpu::CustomRenderPass and RenderStage.
The CustomRenderPass trait
Both scheduling models share the same trait. Implement it on a struct that owns your GPU state:
use blinc_gpu::custom_pass::{CustomRenderPass, RenderPassContext, RenderStage};

struct Particles {
    pipeline: Option<wgpu::RenderPipeline>,
    vertex_buffer: Option<wgpu::Buffer>,
    instance_buffer: Option<wgpu::Buffer>,
}

impl CustomRenderPass for Particles {
    fn label(&self) -> &str { "particles" }
    fn stage(&self) -> RenderStage { RenderStage::PreRender }

    fn initialize(
        &mut self,
        device: &wgpu::Device,
        queue: &wgpu::Queue,
        format: wgpu::TextureFormat,
    ) {
        // Build pipelines, buffers, bind groups once. Stored on
        // `self`, retained across frames.
    }

    fn render(&mut self, ctx: &RenderPassContext) {
        // Issue draws. `ctx.device`, `ctx.queue`, `ctx.target` are
        // the live wgpu handles for this frame.
    }
}
Key contract:
- initialize runs once, lazily, before the first render call. Build your pipeline and persistent buffers here.
- render runs every frame the pass is dispatched. Update per-frame uniforms with queue.write_buffer, encode a render pass with LoadOp::Load (so you compose over whatever Blinc already drew), draw, and submit.
- ctx.viewport is Some([x, y, w, h]) in physical pixels when the pass is scoped to a canvas. Apply set_viewport + set_scissor_rect to clip your draws to that rect.
Inside initialize and render, you’re writing plain wgpu. Blinc
neither restricts what you do nor inspects what you produce.
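The lifecycle half of that contract can be pictured with a small std-only mock. Pass, Dispatcher, and Counting below are hypothetical stand-ins, not Blinc types, and the wgpu handles are left out. The sketch assumes the host keeps an "initialized" flag per pass, which is one plausible way to get the lazy, run-once behaviour described above.

```rust
/// Hypothetical stand-in for `CustomRenderPass`, with the wgpu handles removed.
trait Pass {
    fn initialize(&mut self);
    fn render(&mut self);
}

/// Hypothetical host that mimics the contract: initialize lazily, then render.
struct Dispatcher<P: Pass> {
    pass: P,
    initialized: bool,
}

impl<P: Pass> Dispatcher<P> {
    fn new(pass: P) -> Self {
        Self { pass, initialized: false }
    }

    /// Called once for every frame in which the pass is dispatched.
    fn dispatch(&mut self) {
        if !self.initialized {
            self.pass.initialize();
            self.initialized = true;
        }
        self.pass.render();
    }
}

/// Records how often each hook ran.
#[derive(Default)]
struct Counting {
    init_calls: u32,
    render_calls: u32,
}

impl Pass for Counting {
    fn initialize(&mut self) {
        self.init_calls += 1;
    }

    fn render(&mut self) {
        // By contract, whatever `initialize` built exists by now.
        assert!(self.init_calls == 1);
        self.render_calls += 1;
    }
}

/// Dispatches `frames` frames and returns (initialize calls, render calls).
fn run_frames(frames: u32) -> (u32, u32) {
    let mut host = Dispatcher::new(Counting::default());
    for _ in 0..frames {
        host.dispatch();
    }
    (host.pass.init_calls, host.pass.render_calls)
}
```

A pass that is never dispatched is never initialized in this model, which is why expensive setup belongs in initialize rather than in the constructor.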
Scoping a pass to a canvas
The natural place to embed a CustomRenderPass is inside a regular
canvas() widget. Wrap your pass with GpuPass::new(...) and call
run_gpu_pass from the canvas closure:
use blinc_gpu::GpuPass;

let pass = GpuPass::new(Particles { /* ... */ });
let pass_for_canvas = pass.clone();

div()
    .w(720.0)
    .h(540.0)
    .rounded(12.0)
    .child(
        canvas(move |ctx, bounds| {
            ctx.run_gpu_pass(
                &pass_for_canvas,
                Some(Rect::new(0.0, 0.0, bounds.width, bounds.height)),
            );
        })
        .size(720.0, 540.0),
    )
What’s happening:
- GpuPass::new wraps your CustomRenderPass in an Arc<Mutex<…>> so the canvas closure (which is Fn, not FnMut) can hold a clone and pass &pass without needing your own RefCell or Mutex.
- ctx.run_gpu_pass records the pass into the paint context’s pending-pass list. The list is drained at composite time, after the SDF cache is blitted onto the swapchain.
- The Some(Rect::new(...)) second argument plumbs through to RenderPassContext::viewport. Your pass uses it for set_viewport + set_scissor_rect. If you pass None, the GPU backend falls back to the current clip-stack AABB.
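The record-now, run-later behaviour in the list above can be sketched with std types only. Handle and PaintCtx are hypothetical stand-ins for GpuPass and the paint context, and the string log exists purely to make the ordering visible. Nothing here is Blinc's actual implementation.

```rust
use std::cell::RefCell;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for `GpuPass`: a cloneable handle to shared state.
/// The shared state here is just an event log.
#[derive(Clone)]
struct Handle(Arc<Mutex<Vec<&'static str>>>);

/// Hypothetical paint context that queues passes instead of running them.
struct PaintCtx {
    pending: RefCell<Vec<Handle>>,
}

impl PaintCtx {
    /// Records the pass; nothing executes yet.
    fn run_gpu_pass(&self, pass: &Handle) {
        self.pending.borrow_mut().push(pass.clone());
    }

    /// Drains the queue, mimicking dispatch at composite time.
    fn composite(&self) {
        for pass in self.pending.borrow_mut().drain(..) {
            pass.0.lock().unwrap().push("render");
        }
    }
}

/// Accepts only `Fn` closures, like the canvas callback described above.
fn paint_with(ctx: &PaintCtx, closure: impl Fn(&PaintCtx)) {
    closure(ctx);
}

/// Paints one frame and returns the order in which events happened.
fn demo() -> Vec<&'static str> {
    let pass = Handle(Arc::new(Mutex::new(Vec::new())));
    let pass_for_canvas = pass.clone();
    let ctx = PaintCtx { pending: RefCell::new(Vec::new()) };

    // The closure only borrows its captured handle, so it stays `Fn`.
    paint_with(&ctx, move |ctx| ctx.run_gpu_pass(&pass_for_canvas));
    pass.0.lock().unwrap().push("blit"); // stand-in for the SDF cache blit
    ctx.composite();

    let log = pass.0.lock().unwrap().clone();
    log
}
```

Because the handle owns the interior mutability, the closure never needs to be FnMut. It only ever hands out shared references.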
Mixing imperative draws with a custom pass in one closure works the way you’d expect. Calls inside the closure run in source order, and the custom pass composes over them inside its scissor:
canvas(move |ctx, bounds| {
    ctx.fill_rect(bounds.rect(), 8.0.into(), Color::BLACK.into());
    ctx.run_gpu_pass(&pass_for_canvas, Some(bounds.rect()));
    ctx.draw_text("particles", origin, &style);
})
Retained buffers + instanced rendering
This is the headline use case. Below is the minimum a CustomRenderPass
needs to instance N objects against retained GPU buffers.
State stored on self (survives every frame):
struct InstancedGrid {
    pipeline: Option<wgpu::RenderPipeline>,
    vertex_buffer: Option<wgpu::Buffer>,   // base mesh (a unit quad)
    instance_buffer: Option<wgpu::Buffer>, // per-instance data
    uniform_buffer: Option<wgpu::Buffer>,  // per-frame uniforms
    bind_group: Option<wgpu::BindGroup>,
}
initialize builds them once:
fn initialize(&mut self, device: &wgpu::Device, queue: &wgpu::Queue, format: wgpu::TextureFormat) {
    // Base mesh: uploaded once, shared by every instance.
    let quad: [Vertex; 6] = /* six verts of a unit quad */;
    let vertex_buffer = device.create_buffer(&wgpu::BufferDescriptor {
        label: Some("grid_vb"),
        size: std::mem::size_of_val(&quad) as u64,
        usage: wgpu::BufferUsages::VERTEX | wgpu::BufferUsages::COPY_DST,
        mapped_at_creation: false,
    });
    queue.write_buffer(&vertex_buffer, 0, bytemuck::bytes_of(&quad));

    // Per-instance data: also uploaded once and retained.
    let instances: Vec<Instance> = /* 4096 instances */;
    let instance_buffer = device.create_buffer(&wgpu::BufferDescriptor {
        label: Some("grid_ib"),
        size: (std::mem::size_of::<Instance>() * instances.len()) as u64,
        usage: wgpu::BufferUsages::VERTEX | wgpu::BufferUsages::COPY_DST,
        mapped_at_creation: false,
    });
    queue.write_buffer(&instance_buffer, 0, bytemuck::cast_slice(&instances));

    // Pipeline, uniforms, bind group… all stored on self. `render`
    // bails out early if any of these is still `None`.
    self.pipeline = Some(/* ... */);
    self.uniform_buffer = Some(/* ... */);
    self.bind_group = Some(/* ... */);
    self.vertex_buffer = Some(vertex_buffer);
    self.instance_buffer = Some(instance_buffer);
}
render issues one instanced draw per frame:
fn render(&mut self, ctx: &RenderPassContext) {
    let (Some(pipeline), Some(vb), Some(ib), Some(ub), Some(bg)) =
        (&self.pipeline, &self.vertex_buffer, &self.instance_buffer,
         &self.uniform_buffer, &self.bind_group) else { return; };

    // Refresh per-frame uniforms (single 16-byte write).
    let uniforms = /* this frame's uniform struct */;
    ctx.queue.write_buffer(ub, 0, bytemuck::bytes_of(&uniforms));

    let mut encoder = ctx.device.create_command_encoder(&Default::default());
    {
        let mut pass = encoder.begin_render_pass(&wgpu::RenderPassDescriptor {
            label: Some("grid_pass"),
            color_attachments: &[Some(wgpu::RenderPassColorAttachment {
                view: ctx.target,
                resolve_target: None,
                depth_slice: None,
                ops: wgpu::Operations {
                    // Compose over whatever Blinc already drew below us.
                    load: wgpu::LoadOp::Load,
                    store: wgpu::StoreOp::Store,
                },
            })],
            depth_stencil_attachment: None,
            ..Default::default()
        });
        pass.set_pipeline(pipeline);

        // Clip to the canvas's bounds. `viewport` is the
        // physical-pixel rect Blinc plumbed in from your
        // `run_gpu_pass` call. It is already clamped to the render
        // target, and zero-area rects are skipped at the dispatch
        // boundary. Pass it through verbatim; don't `.max(1.0)`
        // any axis or you walk into the wgpu scissor overflow
        // panic when the canvas sits exactly on the bottom edge
        // after a resize.
        if let Some([vx, vy, vw, vh]) = ctx.viewport {
            pass.set_viewport(vx, vy, vw, vh, 0.0, 1.0);
            pass.set_scissor_rect(vx as u32, vy as u32, vw as u32, vh as u32);
        }

        pass.set_bind_group(0, bg, &[]);
        pass.set_vertex_buffer(0, vb.slice(..)); // base mesh
        pass.set_vertex_buffer(1, ib.slice(..)); // per-instance
        pass.draw(0..6, 0..INSTANCE_COUNT);      // one draw, N instances
    }
    ctx.queue.submit(std::iter::once(encoder.finish()));
}
The pipeline declares two wgpu::VertexBufferLayout entries (one with
step_mode: Vertex, one with step_mode: Instance), and the GPU steps
through the instance buffer for you. The full working version lives in
examples/blinc_app_examples/examples/gpu_pass_demo.rs:
cargo run -p blinc_app_examples --example gpu_pass_demo
4 096 instanced quads, one draw call per frame, all buffers retained.
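To make the two buffer layouts concrete, here is a std-only sketch of the byte layout they would describe. The Vertex and Instance field sets are assumptions chosen for illustration (the demo's real structs may differ), and no wgpu types are involved. The point is only how strides and the retained buffer size fall out of #[repr(C)] structs.

```rust
use std::mem::size_of;

/// Hypothetical base-mesh vertex, read once per vertex (buffer slot 0).
#[repr(C)]
#[derive(Clone, Copy)]
struct Vertex {
    position: [f32; 2], // byte offset 0
}

/// Hypothetical per-instance record, read once per instance (buffer slot 1).
#[repr(C)]
#[derive(Clone, Copy)]
struct Instance {
    offset: [f32; 2], // byte offset 0
    color: [f32; 4],  // byte offset 8
}

const INSTANCE_COUNT: usize = 4096;

/// Stride for the layout with `step_mode: Vertex`.
const VERTEX_STRIDE: usize = size_of::<Vertex>();
/// Stride for the layout with `step_mode: Instance`.
const INSTANCE_STRIDE: usize = size_of::<Instance>();

/// Bytes the retained instance buffer must hold.
fn instance_buffer_size() -> u64 {
    (INSTANCE_STRIDE * INSTANCE_COUNT) as u64
}

/// Byte range that instance `i` occupies inside the instance buffer.
fn instance_byte_range(i: usize) -> std::ops::Range<usize> {
    i * INSTANCE_STRIDE..(i + 1) * INSTANCE_STRIDE
}
```

With these assumed fields, each instance costs 24 bytes, so the whole retained instance buffer is under 100 KiB. Updating a single instance later would mean one queue.write_buffer call at the offset instance_byte_range gives.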
How the pass composes with the rest of the frame
Blinc’s frame loop is documented in the GPU
Rendering chapter. The short
version: a canvas closure runs every frame inside the paint walker.
SDF primitives the closure emits land in either the static cache or the
dynamic batch. A custom pass scheduled via run_gpu_pass is drained
separately and dispatched after composite_frame has blitted the
static cache onto the swapchain, but before any overlay panels on top.
What this means concretely:
- Your pass draws over the canvas’s other content (its background, whatever fill_rects the closure emitted, ancestor styling).
- Overlay panels rendered by the layout system still appear on top of your pass. Modal dialogs and tooltips don’t get hidden behind custom WebGPU content.
- The canvas wrapper opts the subtree out of the static-cache fast path automatically (canvases run every frame by contract). There’s no extra invalidation work to do.
- Custom passes are not recorded by RecordingContext. If you replay a recorded canvas, custom passes silently no-op during replay (the trait method has a default empty impl, and only GpuPaintContext overrides it).
Bounds vs. clipping
The viewport argument to run_gpu_pass is your contract with the
canvas:
- Some(Rect::new(0.0, 0.0, bounds.width, bounds.height)): clip to the canvas’s full layout box. The most common form.
- Some(Rect::new(margin, margin, w - 2.0*margin, h - 2.0*margin)): clip to a sub-region. Useful if your wgpu output should leave a border for SDF chrome around it.
- None: fall back to whatever clip the GPU paint context has on its stack. The wrapping widget has almost certainly pushed one for its layout bounds, so this is usually equivalent to the first form.
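The sub-region form is easy to get wrong when the margin is larger than half the canvas. A minimal helper might look like the following. The Rect struct here is a hypothetical stand-in with the same four fields as the Rect::new calls above, not Blinc's own type, and flooring at zero is a design choice of this sketch.

```rust
/// Hypothetical stand-in for Blinc's `Rect` (x, y, width, height).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Rect {
    x: f32,
    y: f32,
    width: f32,
    height: f32,
}

/// Shrinks a canvas-local box by `margin` on every side.
/// Width and height are floored at zero so an oversized margin
/// never produces a negative extent.
fn inset(width: f32, height: f32, margin: f32) -> Rect {
    Rect {
        x: margin,
        y: margin,
        width: (width - 2.0 * margin).max(0.0),
        height: (height - 2.0 * margin).max(0.0),
    }
}
```

For a 720 by 540 canvas and a 16-pixel margin this yields a 688 by 508 box at (16, 16), leaving a 16-pixel frame for SDF chrome.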
What you do inside the pass is up to you. Coordinate systems, depth,
and blend state are all local; Blinc doesn’t interpose. The only state
it cares about is the set_scissor_rect you apply to clip your output
to the canvas region. If you skip that, your pass will draw to the
whole frame target.
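wgpu rejects a scissor rect that extends past the render target, which is why the viewport should be passed through unmodified. The check below is a hypothetical debug helper, not part of Blinc or wgpu. It sketches the containment rule (x + w and y + h must stay within the target size) using the same as u32 truncation as the render example.

```rust
/// True when the scissor rect lies entirely inside a target of the given size.
/// `checked_add` guards against u32 overflow on absurd inputs.
fn scissor_fits(target: (u32, u32), [x, y, w, h]: [u32; 4]) -> bool {
    let right = x.checked_add(w);
    let bottom = y.checked_add(h);
    matches!((right, bottom), (Some(r), Some(b)) if r <= target.0 && b <= target.1)
}

/// Truncates a physical-pixel viewport to integer scissor coordinates.
fn scissor_from_viewport([x, y, w, h]: [f32; 4]) -> [u32; 4] {
    [x as u32, y as u32, w as u32, h as u32]
}
```

A rect that touches the bottom edge exactly, such as [0, 500, 720, 40] in a 720 by 540 target, fits. Growing its height by a single pixel does not, so any adjustment that enlarges an axis of an already-clamped rect risks tripping the validation.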
Sharing state with the rest of your UI
Because the canvas closure is Fn (not FnMut), you can’t mutate
captured-by-move variables directly. Two clean patterns:
Time-driven motion inside the pass. Capture a start Instant on the
first render call and compute the elapsed time every frame. No external
state needed.
fn render(&mut self, ctx: &RenderPassContext) {
    // `self.start` is an Option<Instant>, filled on the first frame.
    let start = *self.start.get_or_insert_with(std::time::Instant::now);
    let time = start.elapsed().as_secs_f32();
    // …write `time` into your uniform buffer…
}
External signals. Wrap any state you need to mutate from outside
the closure (frame counter, mouse-derived target, animation
parameters) in an Arc<Mutex<...>> or one of Blinc’s signals. Read it
from inside render:
use std::sync::{Arc, Mutex};

struct Field {
    /* GPU resources… */
    target: Arc<Mutex<glam::Vec2>>, // shared with UI
}

impl CustomRenderPass for Field {
    fn render(&mut self, ctx: &RenderPassContext) {
        // Copy the value out so the lock is released immediately.
        let target = *self.target.lock().unwrap();
        // …feed target into the uniform…
    }
}

// Elsewhere, UI mutates the shared cell. Give the pass its own
// `target.clone()` so both sides point at the same Mutex.
let target = Arc::new(Mutex::new(glam::Vec2::ZERO));
on_mouse_move(|p| *target.lock().unwrap() = p);
Don’t try to peek into GpuPass itself; the wrapper deliberately
doesn’t expose its inner pass. Hold the shared state through your own
Arc instead.
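The external-signal pattern can be exercised end to end without any GPU. The sketch below uses a plain [f32; 2] in place of glam::Vec2 and a last_uploaded field in place of the uniform buffer. Both are substitutions made so the example runs on std alone, and Field::render here is a plain method rather than the trait method.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical pass that reads a UI-owned target each frame.
struct Field {
    target: Arc<Mutex<[f32; 2]>>,
    last_uploaded: [f32; 2], // stand-in for the uniform buffer contents
}

impl Field {
    /// Stand-in for `render`: copy the shared value out, releasing the
    /// lock at the end of the statement, then "upload" it.
    fn render(&mut self) {
        let target = *self.target.lock().unwrap();
        self.last_uploaded = target;
    }
}

/// Simulates one UI event followed by one frame; returns what the pass saw.
fn frame_after_mouse_move(p: [f32; 2]) -> [f32; 2] {
    let target = Arc::new(Mutex::new([0.0, 0.0]));
    // Clone the Arc so the pass and the UI share one Mutex.
    let mut field = Field { target: Arc::clone(&target), last_uploaded: [0.0; 2] };

    // UI side: this closure is `Fn`, yet it can mutate through the Mutex.
    let on_mouse_move = |p: [f32; 2]| *target.lock().unwrap() = p;
    on_mouse_move(p);

    field.render();
    field.last_uploaded
}
```

Copying the value out of the guard keeps the critical section to a single read, so the UI thread is never blocked behind command encoding.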
When NOT to use a custom pass
A custom pass is the right tool when your work doesn’t fit the SDF primitive model: instanced meshes, compute, multi-pass post-effects. For most UI work it’s the wrong tool. Heavier to write, harder to debug, and locked to one GPU backend.
Default to the high-level path:
- For shapes and gradients: DrawContext::fill_rect, stroke_rect, fill_circle, fill_path.
- For animations: spring physics, CSS keyframes, motion bindings. The compositor patches GPU primitives in place for these. See Performance Tips.
- For 3D scenes: blinc_canvas_kit’s SceneKit3D + draw_mesh_data. Handles cameras, lighting, IBL, and shadow maps without you writing any wgpu.
Reach for custom passes only when the simpler tools genuinely can’t express what you need.