go · superfly/fsm · devicemapper · sqlite

container image orchestrator

a durable, crash-resistant container image pipeline built on the superfly/fsm library. two-phase workflow: a deterministic prepare phase that runs once per image family, and a per-invocation activate phase that creates a fresh copy-on-write snapshot.

phase 1: prepare · phase 2: activate · boltdb crash recovery · dm thin provisioning · content-addressed storage
~/fsm
$ ./fsm --golang
// check if already prepared
prepared = false
// phase 1: prepare-golang
FSM FetchManifest
FSM DownloadBlobs
FSM PrepareThinBase
FSM UnpackIntoBase
prepared = true ← SQLite
// phase 2: activate-a3f9
FSM ActivateSnapshot
// done
mount=/mnt/images/412930

// 01 — system overview

Architecture

two independent FSM workflows share a single SQLite database and a BoltDB state store. prepare runs with a deterministic ID per family, so after a crash and restart it picks up from the last completed step. activate always creates a fresh thin snapshot.

S3 Bucket: public bucket · anonymous access · image layers · images/{family}/*

PHASE 1: prepare-{family}
1. FetchManifest: list S3 keys
2. DownloadBlobs: fetch layers to blobs/
3. PrepareThinBase: create DM volume + ext4
4. UnpackIntoBase: extract OCI tarballs
prepared=1 written to SQLite by main.go

PHASE 2: activate-{uuid} (each invocation)
1. ActivateSnapshot: thin clone from base
mounted at /mnt/images/{snap_lv_id}

SQLite: fsm.db · WAL mode · images, blobs, activations, locks
DeviceMapper: thin pool · base volumes · cow snapshots · /dev/mapper/*
BoltDB: FSM state · ./fsmdb/ · crash recovery · per-step checkpoints
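
the thin clone in phase 2 can be pictured as a few device-mapper commands. this is a minimal sketch, assuming the snapshot is taken with the kernel's thin-pool `create_snap` message through `dmsetup`; the source does not say which tool the project actually uses. the function name, pool name, device name, and sector count are all hypothetical, and nothing is executed here.

```go
package main

import "fmt"

// snapshotCommands returns the command lines that would clone thin device
// baseID into snapID inside the given pool, expose the clone as
// /dev/mapper/<name>, and mount it under /mnt/images/<snapID>.
//
// Assumption: the base volume is not active. The kernel documentation
// requires an active origin to be suspended around create_snap.
func snapshotCommands(pool, name string, baseID, snapID int, sectors int64) [][]string {
	poolDev := "/dev/mapper/" + pool
	return [][]string{
		// Ask the pool to create snapshot snapID of origin baseID.
		{"dmsetup", "message", poolDev, "0", fmt.Sprintf("create_snap %d %d", snapID, baseID)},
		// Activate the snapshot as a regular block device.
		{"dmsetup", "create", name, "--table", fmt.Sprintf("0 %d thin %s %d", sectors, poolDev, snapID)},
		// Mount it at the path layout described above.
		{"mount", "/dev/mapper/" + name, fmt.Sprintf("/mnt/images/%d", snapID)},
	}
}
```

calling `snapshotCommands("pool", "snap-412930", 1, 412930, 2097152)` yields three commands, the first carrying the pool message `create_snap 412930 1`. returning the argument lists instead of running them keeps the sketch testable without root privileges.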

// 02 — fsm workflow

Workflow Animator

each step is a durable BoltDB checkpoint. after a crash at any point, resume picks up from the last completed transition.
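
the resume behaviour can be sketched without the library. this is an illustration only: a plain map stands in for the BoltDB store (so nothing here is actually durable), and `step`, `checkpoints`, `runWorkflow`, and `errSimulatedCrash` are hypothetical names, not superfly/fsm types.

```go
package main

import "errors"

// errSimulatedCrash lets a step in an experiment fail on purpose.
var errSimulatedCrash = errors.New("simulated crash")

// step is one named unit of work in a workflow.
type step struct {
	name string
	run  func() error
}

// checkpoints records which steps of which run have completed.
// A map stands in for the on-disk store; it does not survive a real crash.
type checkpoints map[string]bool

// runWorkflow executes steps in order, skips any step that already has a
// checkpoint for runID, and records a checkpoint after each success.
func runWorkflow(runID string, steps []step, done checkpoints) error {
	for _, s := range steps {
		key := runID + "/" + s.name
		if done[key] {
			continue // completed in an earlier attempt; do not repeat
		}
		if err := s.run(); err != nil {
			return err // no checkpoint written, so this step reruns on resume
		}
		done[key] = true
	}
	return nil
}
```

if FetchManifest succeeds and DownloadBlobs fails on the first call, a second call with the same run ID and the same `checkpoints` value skips FetchManifest and reruns only DownloadBlobs.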


// 03 — design decisions

Key Decisions

the refactor was driven by one principle: trust the FSM to know which steps ran. remove everything inside steps that duplicates what BoltDB already tracks.

// 01
two-phase FSM design
prepare and activate have different semantics. prepare is idempotent: it runs to completion once per image family and is skipped on every later invocation. activate creates a fresh snapshot every invocation. separating them into distinct FSM workflows makes each phase's intent explicit and independently resumable.
architecture
// 02
deterministic run ID
using prepare-{family} as the FSM run ID means crash recovery naturally resumes the exact same run via BoltDB. no need for a prepared flag inside steps — the FSM already knows which transitions completed. the flag is only checked in main.go before deciding whether to start phase 1.
crash recovery
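
the two ID schemes can be sketched as follows. the `prepare-{family}` and `activate-{uuid}` shapes come from the text; the function names are hypothetical, and a random hex suffix stands in for the uuid because the real suffix format is not specified.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
)

// prepareRunID is deterministic: the same family always maps to the same
// ID, so a restarted process addresses the same stored run.
func prepareRunID(family string) string {
	return "prepare-" + family
}

// activateRunID is unique per invocation. Eight random bytes, hex encoded,
// stand in for the uuid (an assumption made for this sketch).
func activateRunID() (string, error) {
	buf := make([]byte, 8)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return "activate-" + hex.EncodeToString(buf), nil
}
```

`prepareRunID("golang")` always returns `prepare-golang`, which is what lets a restart find the earlier run, while two calls to `activateRunID` return different IDs and therefore never resume each other.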
// 03
trust the response chain
each FSM step receives state from the previous step via r.W.Msg. re-querying the DB inside a step for data already in the response chain defeats the purpose of durable state management. PrepareThinBase and ActivateSnapshot were simplified to read r.W.Msg.BaseLvID directly.
fsm principle
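
a library-free sketch of the idea: each step takes the message produced by the previous step and returns an extended copy, so nothing is looked up again. `msg` is a hypothetical stand-in for what the text calls r.W.Msg; only `BaseLvID` is named in the source, while the other fields, the `/dev/mapper/` naming, and the `chain` helper are assumptions for illustration.

```go
package main

import "errors"

// msg is the state threaded from step to step.
type msg struct {
	BaseLvID string // named in the source
	BaseDev  string // hypothetical: device path produced by prepareThinBase
	Unpacked bool   // hypothetical: set once layers are extracted
}

// prepareThinBase reads BaseLvID from the message instead of querying a database.
func prepareThinBase(m msg) (msg, error) {
	if m.BaseLvID == "" {
		return m, errors.New("BaseLvID missing from message")
	}
	m.BaseDev = "/dev/mapper/" + m.BaseLvID // placeholder naming scheme
	return m, nil
}

// unpackIntoBase relies on the device path left by the previous step.
func unpackIntoBase(m msg) (msg, error) {
	if m.BaseDev == "" {
		return m, errors.New("BaseDev missing from message")
	}
	m.Unpacked = true
	return m, nil
}

// chain threads the message through each step in order and stops at the first error.
func chain(m msg, steps ...func(msg) (msg, error)) (msg, error) {
	var err error
	for _, s := range steps {
		if m, err = s(m); err != nil {
			return m, err
		}
	}
	return m, nil
}
```

`chain(msg{BaseLvID: "base-golang"}, prepareThinBase, unpackIntoBase)` returns a message with `BaseDev` set to `/dev/mapper/base-golang` and `Unpacked` true; starting from an empty message fails in the first step.
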
// 04
prepared flag lives in main.go
checking prepared inside UnpackIntoBase says "don't trust the FSM to know if this step ran." that undermines the entire point of using a durable state machine. the flag is checked once in main.go before starting phase 1, then set there after phase 1 completes.
correctness
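
the control flow described for main.go might look roughly like this. it is a sketch under assumptions: `flagStore` is a hypothetical view of the SQLite row holding the flag, `runner` is a hypothetical stand-in for starting an FSM run and waiting for it, and `memFlags` exists only so the sketch can be exercised without a database.

```go
package main

// flagStore is a hypothetical view of where the prepared flag is kept.
type flagStore interface {
	Prepared(family string) (bool, error)
	SetPrepared(family string) error
}

// runner starts a workflow under the given run ID and blocks until it ends.
type runner func(runID string) error

// orchestrate checks the flag once, runs phase 1 only when the family has
// not been prepared, sets the flag after phase 1 completes, and then always
// runs phase 2. No step ever looks at the flag.
func orchestrate(family, activateID string, flags flagStore, prepare, activate runner) error {
	prepared, err := flags.Prepared(family)
	if err != nil {
		return err
	}
	if !prepared {
		if err := prepare("prepare-" + family); err != nil {
			return err
		}
		if err := flags.SetPrepared(family); err != nil {
			return err
		}
	}
	return activate("activate-" + activateID)
}

// memFlags is an in-memory flagStore for experimentation.
type memFlags map[string]bool

func (m memFlags) Prepared(family string) (bool, error) { return m[family], nil }
func (m memFlags) SetPrepared(family string) error      { m[family] = true; return nil }
```

two invocations for the same family against the same store produce the runs `prepare-golang`, `activate-a3f9`, then only `activate-b771` (the suffixes are example values): the second invocation skips phase 1 entirely.
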
// 05
fsm.Abort for non-retryable errors
if S3 returns zero layers, retrying won't help — the family doesn't exist. fsm.Abort(err) signals the library to stop immediately instead of burning all retry attempts on a hopeless operation. blob download failures propagate normally so the FSM retries them on transient S3 errors.
error handling
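
the distinction between retryable and non-retryable errors can be sketched with a small retry loop. this is not the library's implementation: `abortError`, `abort`, `isAborted`, and `retry` are local stand-ins for the idea behind fsm.Abort, and the attempt limit is an arbitrary parameter.

```go
package main

import "errors"

// Example errors for experimenting with the loop below.
var (
	errNoLayers  = errors.New("manifest lists zero layers")
	errTransient = errors.New("transient S3 error")
)

// abortError marks an error as permanent.
type abortError struct{ err error }

func (a abortError) Error() string { return a.err.Error() }
func (a abortError) Unwrap() error { return a.err }

// abort wraps err so that retry gives up immediately.
func abort(err error) error { return abortError{err} }

// isAborted reports whether err, or anything it wraps, was marked with abort.
func isAborted(err error) bool {
	var a abortError
	return errors.As(err, &a)
}

// retry calls fn up to maxAttempts times (maxAttempts must be at least 1).
// It stops early on success or on an aborted error and returns the number
// of attempts made along with the last error.
func retry(maxAttempts int, fn func() error) (int, error) {
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = fn(); err == nil {
			return attempt, nil
		}
		if isAborted(err) {
			return attempt, err // permanent failure: do not burn more attempts
		}
	}
	return maxAttempts, err
}
```

with a limit of 5, a function that returns `abort(errNoLayers)` is called once, while a function that fails twice with `errTransient` and then succeeds is called three times.
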
// 06
WriteResult removed as FSM step
writing a text file is not a meaningful crash checkpoint. if the process dies after mounting a snapshot, there's no need to re-mount to write a file — the result can be derived from SQLite. moved to main.go after both FSMs complete, making the step graph honest about what actually needs durability.
simplicity