concept vi

Background work & durable workflows

The queue lives inside your SQLite file. Enqueue is a plain INSERT — so background work commits or rolls back with the domain write that caused it. No second system, no orphaned work.

Some work outlives the request: report builds, email sends, data imports, webhook fan-outs. The usual answer — a separate queue service, a second database, a broker you keep in sync — introduces a seam where things go wrong. A write succeeds, the queue insert fails, and now you have a database row with no work attached. Or the reverse: a job queued for a report that rolled back.

orion's answer is to put the queue in the same SQLite file as the rest of the store. enqueue() is a plain INSERT. Call it inside a store.tx() and the job commits or rolls back with the domain write. That's the transactional-outbox pattern — and here it costs nothing extra, because the outbox is the store.

The seam: enqueue / handle / start

The jobs interface follows the same shape as Store and Bus: a small, belt-owned contract, a zero-dependency blessed default, and heavier executors behind the same seam for when you outgrow polling.

import { createJobs } from "../../belt/jobs.ts"

// construct once, alongside the store
const jobs = createJobs(store, {
  pollMs:      500,       // idle poll interval; after each job the runner claims the next one immediately
  maxAttempts: 3,         // dead-letter after this many attempts
  backoffMs:   (n) => 1000 * 2 ** (n - 1),  // 1s, 2s, 4s, …
})

// in a feature's start() slot — one line, one stopper
start: (deps) => jobs.start(deps)

Three methods carry the whole v1 contract:

enqueue(type, payload, opts?) — a plain INSERT on the store. Safe inside any store.tx(); publishes nothing (lifecycle events start at claim time). runAt and per-enqueue maxAttempts are the only options.
handle(type, handler) — register the async function that runs when a job of that type is claimed. The runner only claims types it has handlers for — a second process handling other types never steals this one's work.
start(deps) — the polling runner. Returns a stopper that ends the loop, clears the timer, and waits for any in-flight job to finish. Shutdown drains; it never abandons a half-run job.
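The drain-on-stop behaviour of start() can be sketched in isolation. This is a minimal sketch, not orion's runner: `startRunner` and `runOne` are hypothetical names, and `runOne` is assumed to claim and run at most one job, resolve `true` if it ran one, and never reject.

```typescript
type Stopper = () => Promise<void>

// Hypothetical polling loop. runOne() claims and runs at most one job and
// resolves true if it ran one, false if nothing was due. It is assumed to
// catch handler errors itself and never reject.
function startRunner(runOne: () => Promise<boolean>, pollMs: number): Stopper {
  let stopped = false
  let timer: ReturnType<typeof setTimeout> | undefined
  let inFlight: Promise<void> = Promise.resolve()

  const loop = async (): Promise<void> => {
    // While there is work, claim the next job without waiting.
    while (!stopped && (await runOne())) {
      // keep draining
    }
    // Queue is idle: wait one poll interval before looking again.
    if (!stopped) timer = setTimeout(tick, pollMs)
  }
  const tick = (): void => {
    inFlight = loop()
  }
  tick()

  // The stopper: end the loop, clear the timer, wait for the in-flight job.
  return async () => {
    stopped = true
    if (timer !== undefined) clearTimeout(timer)
    await inFlight
  }
}
```

Calling the stopper while a job is mid-run lets that job finish and claims nothing after it, which is the "shutdown drains" property described above.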

The transactional outbox — in four lines

The canonical example: a user requests a report. The report row and its build job must commit together or not at all. No coordinator required:

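// features/reports/reports.commands.ts
import { command } from "../../belt/http.ts"
// Assumed: `reports` (the feature's SQL module) and `reportJobs` (its jobs
// instance factory) are imported from this feature folder; paths not shown.

export const create = command(async (ctx, respond) => {
  const jobs = reportJobs(ctx.store)
  // Assumed: `title` has already been read from the request (parsing elided).
  ctx.store.tx(() => {
    const { lastId } = reports.insert(ctx.store, title)
    const job = jobs.enqueue("report.build", { reportId: Number(lastId) })
    reports.linkJob(ctx.store, Number(lastId), job.id)
  })
  // Sending the response via `respond` is elided here.
})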

The report row and the _orion_jobs INSERT are in the same transaction. A crash between the two is impossible — either both land or neither does. There is no window where a report exists without a pending build, and no build queued for a report that was never committed.

This is the reason jobs-in-your-SQLite is the blessed default. Over a network queue you buy this guarantee with a distributed transaction or an outbox table you maintain by hand. Here it's the natural consequence of a plain INSERT on the same store.
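To see the both-or-neither guarantee without a database, here is a minimal in-memory sketch. Everything in it is hypothetical: `MemoryStore`, `insertReport`, `enqueue`, and `createReport` are stand-ins, and the snapshot-and-restore `tx()` only imitates what a real SQLite transaction provides.

```typescript
// Hypothetical in-memory stand-in for the store; not orion's implementation.
type Report = { id: number; title: string; jobId?: number }
type Job = { id: number; type: string; payload: unknown; status: "queued" }

class MemoryStore {
  reports: Report[] = []
  jobs: Job[] = []

  // Run fn atomically: if it throws, restore both tables to their snapshot.
  tx<T>(fn: () => T): T {
    const snapshot = {
      reports: this.reports.map((r) => ({ ...r })),
      jobs: this.jobs.map((j) => ({ ...j })),
    }
    try {
      return fn()
    } catch (err) {
      this.reports = snapshot.reports
      this.jobs = snapshot.jobs
      throw err
    }
  }
}

function insertReport(store: MemoryStore, title: string): Report {
  const report: Report = { id: store.reports.length + 1, title }
  store.reports.push(report)
  return report
}

// Enqueue is just another insert against the same store.
function enqueue(store: MemoryStore, type: string, payload: unknown): Job {
  const job: Job = { id: store.jobs.length + 1, type, payload, status: "queued" }
  store.jobs.push(job)
  return job
}

function createReport(store: MemoryStore, title: string, failBeforeCommit = false): void {
  store.tx(() => {
    const report = insertReport(store, title)
    const job = enqueue(store, "report.build", { reportId: report.id })
    report.jobId = job.id
    if (failBeforeCommit) throw new Error("simulated failure before commit")
  })
}
```

A failure anywhere inside the `tx()` callback removes the report and its job together; there is no code path that keeps one without the other.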

The claim: atomic, safe across processes

The runner picks up work with a single UPDATE…RETURNING. SQLite serializes writers, so the claim is atomic even when two processes share the same db file — a second consumer cannot claim the same row:

-- the claim: one statement, one lock, no window for a double-run
update _orion_jobs
   set status = 'running', attempts = attempts + 1, updated_at = datetime('now')
 where id = (
   select id from _orion_jobs
    where status = 'queued' and run_after <= ? and type in (…)
    order by run_after, id
    limit 1
 )
 returning id, type, payload, attempts, max_attempts, cause
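The selection logic of that statement can be mirrored in plain TypeScript. This is a sketch under assumptions: `QueuedJob` and `claimNext` are hypothetical names, and single-threaded JavaScript stands in for the writer serialization that SQLite provides in the real claim.

```typescript
// Hypothetical row shape; column names follow the SQL above, in camelCase.
type QueuedJob = {
  id: number
  type: string
  status: "queued" | "running" | "done" | "failed"
  runAfter: number // epoch milliseconds
  attempts: number
}

// Mirror of the claim: pick the earliest due, queued row whose type this
// process handles, then mark it running and count the attempt.
function claimNext(
  rows: QueuedJob[],
  handledTypes: Set<string>,
  now: number,
): QueuedJob | undefined {
  const candidate = rows
    .filter((r) => r.status === "queued" && r.runAfter <= now && handledTypes.has(r.type))
    .sort((a, b) => a.runAfter - b.runAfter || a.id - b.id)[0]
  if (!candidate) return undefined
  candidate.status = "running"
  candidate.attempts += 1
  return candidate
}
```

Because the row leaves `queued` in the same step that selects it, a second call cannot return the same row, and rows of unhandled types are never touched.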

After a successful run the row moves to done. On failure: if attempts remain, the row returns to queued with a run_after computed from exponential backoff (1s · 2ⁿ⁻¹). After maxAttempts it is dead-lettered to failed. A retry is not terminal, so it logs but does not publish jobs.failed — only the dead-letter does.
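The retry and dead-letter rules reduce to a small pure function. The sketch below assumes that `attempts` has already been incremented at claim time (as in the claim statement) and that the backoff argument is that attempt count; `settle` and `Outcome` are hypothetical names.

```typescript
type Outcome =
  | { status: "done" }
  | { status: "queued"; runAfter: number }
  | { status: "failed" }

// Same schedule as the configured default: 1s, 2s, 4s, ...
const backoffMs = (attempt: number): number => 1000 * 2 ** (attempt - 1)

// Decide where a row goes after a run. `attempts` counts the run that just
// finished; `now` is epoch milliseconds.
function settle(ok: boolean, attempts: number, maxAttempts: number, now: number): Outcome {
  if (ok) return { status: "done" }
  if (attempts < maxAttempts) {
    return { status: "queued", runAfter: now + backoffMs(attempts) }
  }
  return { status: "failed" } // dead-letter: the only failure that is terminal
}

// A retry logs but stays quiet on the bus; only the dead-letter publishes.
function publishesFailed(outcome: Outcome): boolean {
  return outcome.status === "failed"
}
```

With `maxAttempts: 3`, a job that keeps failing is retried after 1s and then 2s, and is dead-lettered on its third failed attempt.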

One operational note: a process that dies mid-job leaves the row in running — deliberately. A second consumer cannot tell "crashed" from "still working," so the runner never auto-reclaims stuck rows. Requeue them by hand via orion db jobs, the CLI backlog command.

Live progress is just the loop

Job lifecycle publishes bare events on the jobs topic — jobs.started, jobs.progress, jobs.done, jobs.failed — each carrying only an id and job type. Events carry no state. Live views re-read the job row and render a plain <progress> element; fat-morph updates it in every watcher's browser. No new protocol, no polling endpoint, no client state.

import { live } from "../../belt/http.ts"

// the handler — progress() writes the row AND publishes jobs.progress
jobs.handle("report.build", async (job, { store, bus, progress }) => {
  const steps = buildSteps(job.payload.reportId)
  for (const [i, step] of steps.entries()) {
    await step.run()
    progress(((i + 1) / steps.length) * 100)   // → every watcher's bar morphs
  }
})

// the live view — subscribes to both; renders whatever is current truth
const reportList = live({ topics: ["reports", "jobs"] }, (ctx) =>
  ReportList(reports.all(ctx.store), jobs.recent())
)

The handler lives in the feature folder alongside its SQL and commands. It wires into the process in the feature's start() slot — one line in the manifest, nothing in main.ts changes:

// features/reports/index.ts — the feature manifest
export const reportsFeature = {
  migrate: (store) => reports.migrate(store),
  routes:  (router) => routes.mount(router),
  start:   (deps) => reportJobs(deps.store).start(deps),   // ← the runner
}
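The "events carry no state" rule from this section can be sketched without the belt. All names here (`MemoryBus`, `reportProgress`, `watch`, `renderProgress`) are hypothetical; the point is only that the event is a nudge and the row is the truth.

```typescript
type JobEvent = {
  name: "jobs.started" | "jobs.progress" | "jobs.done" | "jobs.failed"
  id: number
  type: string
}
type ProgressRow = { id: number; type: string; progress: number }

// Hypothetical in-process bus: synchronous fan-out to subscribers.
class MemoryBus {
  private subscribers: Array<(event: JobEvent) => void> = []
  subscribe(fn: (event: JobEvent) => void): void {
    this.subscribers.push(fn)
  }
  publish(event: JobEvent): void {
    for (const fn of this.subscribers) fn(event)
  }
}

// Write the row first, then publish a bare event with no progress value.
function reportProgress(
  rows: Map<number, ProgressRow>,
  bus: MemoryBus,
  id: number,
  pct: number,
): void {
  const row = rows.get(id)
  if (!row) throw new Error(`no job row ${id}`)
  row.progress = pct
  bus.publish({ name: "jobs.progress", id, type: row.type })
}

function renderProgress(row: ProgressRow): string {
  return `<progress value="${row.progress}" max="100"></progress>`
}

// A watcher re-reads the row on every event and renders current truth.
function watch(
  rows: Map<number, ProgressRow>,
  bus: MemoryBus,
  onRender: (html: string) => void,
): void {
  bus.subscribe((event) => {
    const row = rows.get(event.id)
    if (row) onRender(renderProgress(row))
  })
}
```

Since the watcher never trusts the event payload for state, a missed or duplicated event costs at most one stale frame; the next event renders whatever the row says.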

The worker split: a rung, not a default

The blessed setup runs the consumer in-process alongside the web server. Splitting it out to a worker process is a ladder rung — one you climb when you have a specific reason to, and not before:

in-process (default)

One process, one entry point

Live-progress repaints work. The bus is in-process, so a jobs.progress event reaches every open stream immediately. Jobs and renders share the event loop.

worker process (rung)

Separate entry point

Isolates CPU-heavy work from the render loop. The gate: observed event-loop starvation — a number on the Health panel, not a feeling. Live repaints cross processes only once a cross-process Bus adapter (NOTIFY/LISTEN) is in place.

A worker entry point is a second composition root — same feature modules, no router, no listen. It runs the same migrations, which are forward-only and idempotent, so the call is a no-op when the schema is already current:

// worker.ts — same features, no HTTP. Migrations are idempotent:
// only new entries run, so this brings the schema forward or does nothing.
for (const f of features) store.migrate(f.name, f.migrations)

for (const f of features) f.start?.(deps)
// job handling runs; live-progress events stay inside this process

Migrations are append-only and forward-only (store.migrate applies only the entries a feature hasn't seen), so it doesn't matter which process runs first — a worker booted before the web deploy simply migrates, and one booted after no-ops. A worker that won't start is a deploy you fix; the idempotent migration is what keeps that safe.
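Why boot order does not matter can be shown with a toy ledger. This is a sketch under assumptions: `MigrationLedger` and `Migration` are hypothetical, the ledger is an in-memory map where the real one would live in the shared SQLite file, and a list of statements stands in for the schema.

```typescript
type Migration = { id: string; up: (schema: string[]) => void }

// Hypothetical ledger: remembers which entries each feature has applied.
class MigrationLedger {
  private applied = new Map<string, Set<string>>()

  // Apply only the entries this feature has not seen; return how many ran.
  migrate(name: string, entries: Migration[], schema: string[]): number {
    const seen = this.applied.get(name) ?? new Set<string>()
    let ran = 0
    for (const entry of entries) {
      if (seen.has(entry.id)) continue // already applied: skip, never re-run
      entry.up(schema)
      seen.add(entry.id)
      ran += 1
    }
    this.applied.set(name, seen)
    return ran
  }
}
```

Two processes calling `migrate` with the same entries against the same ledger leave the schema identical regardless of which goes first: the first call applies the entries and the second applies none.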

Durable workflows — same seam, upgraded executor

v1 jobs is deliberately enqueue / progress / retry. It does not do steps, sleep, or await-event, the primitives of true durable workflows. Those belong one tier up, behind the same seam, backed by a heavier executor when you outgrow polling.

// durable steps and sleep — behind the jobs seam, different executor
workflow("onboard", async (step, { account }) => {
  await step.do("provision",  () => provisionTenant(account))
  await step.sleep("grace", "3 days")   // durable — survives restarts and deploys
  await step.do("welcome",    () => sendWelcomeEmail(account))
})
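What makes a step durable is usually a recorded result: on replay after a crash, a completed step returns what it returned before instead of running again. The sketch below shows that general technique only. It is not orion's executor; `makeStep`, `StepLog`, and `runOnboard` are hypothetical, the steps are synchronous for brevity, and durable sleep is left out.

```typescript
// Stand-in for a durable table keyed by step name.
type StepLog = Map<string, unknown>

function makeStep(log: StepLog) {
  return {
    // Run fn once per step name; on replay, return the recorded result.
    do<T>(name: string, fn: () => T): T {
      if (log.has(name)) return log.get(name) as T
      const result = fn()
      log.set(name, result) // record only after the step succeeded
      return result
    },
  }
}

// A two-step workflow with an optional simulated crash between the steps.
function runOnboard(log: StepLog, effects: string[], crashAfterProvision: boolean): void {
  const step = makeStep(log)
  step.do("provision", () => {
    effects.push("provision")
    return "tenant-1"
  })
  if (crashAfterProvision) throw new Error("simulated crash")
  step.do("welcome", () => {
    effects.push("welcome")
    return true
  })
}
```

Running the workflow, crashing after the first step, and running it again with the same log executes each side effect once: the replay skips "provision" and picks up at "welcome".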

Feature code that codes against the seam doesn't change a line when you swap executors. The seam is sized so that each executor can implement it:

Tier	Zero-dep default	Self-host upgrade	Cloudflare	Postgres
Jobs	`_orion_jobs` + poll	NOTIFY/LISTEN wake	Queues	Absurd
Workflows	SQLite saga	steps + sleep	Workflows	Absurd
Cron / notify	`setTimeout` poll	cron + NOTIFY	Queues + Cron	LISTEN/NOTIFY

The zero-dep default is honest about what polling costs: with a 500ms idle cadence, a job enqueued on an idle queue can wait up to 500ms before it is claimed. That is fine for most low-volume background work. The wake-on-enqueue upgrade (a SQLite extension that issues NOTIFY the moment a row is inserted) collapses that latency to the commit round-trip. Both are valid rungs; the default earns its place because it requires nothing beyond the SQLite file you already have.

The SQLite polling stance stands on the same performance foundation as the rest of orion's store design. See Anders Murphy's "100,000 TPS over a billion rows" for the benchmark baseline that makes SQLite a credible job store well beyond small workloads.

What the seam owns, what the feature owns

The belt owns the _orion_jobs table and the lifecycle — claim, progress, retry, dead-letter, lifecycle events on the bus. It never reads from feature tables. The feature owns its handler logic, its progress semantics, and the live view that renders job rows back to the user. That boundary is the same read/write split as the rest of orion: the belt gives you a seam; the feature fills it.

jobs.byId(id) — the row a live view renders progress from. One read, no joins; the view owns the surrounding HTML.
jobs.recent(limit) — for job-list views, sorted by recency. Never paginated by the belt — that's the feature's concern.
The jobs topic — subscribe a live region to it and every claim, progress tick, completion, and failure repaints that region. No websocket protocol to design, no polling endpoint to add.
