Built with AI, and how to verify it
Écluse is a supply-chain security tool, and I built much of it leaning on an LLM harder than I ever have. If that makes you nervous, good. Here's the honest version: what's mine, what the AI did, and why you don't have to trust either of us to use it.
What's mine, and what's the AI's
- Mine: the design. The architecture, the three-registry origin model, the deny-by-default rules engine, the security invariants (outbound egress / SSRF, identifier canonicalisation, response bounds), and the planning. That's the result of months of thinking about how a small team gets this protection without enterprise licence fees or a registry of its own. I've worked in typed functional programming for years, so Haskell is familiar ground. I own the design and the safety-critical paths, and I can explain them.
- The AI's: the implementation. Writing the Haskell that turns the design into code is mostly done with an LLM right now, for speed. I'm not hands-off: I read every PR and go through the code line by line, just not as tightly as I will near release, since it's still under frequent revision. The release-grade pass comes before release.
How I keep that honest
- I decide. I act as the architect, own the
requirements, and review and merge every PR. The agent does the work but
never merges and never pushes to
main. The process is public, inplanning/orchestration-strategy.md. - Escalate, don't guess. On anything ambiguous,
missing, or contradictory, the agent stops and asks rather than
inventing its way past it. No made-up config or API behaviour, no
quietly-weakened tests, no
undefinedleft in. - A second set of eyes. Each slice is reviewed by separate agents with fresh context: first for whether it does what was asked, then for security and code quality.
- The compiler does a lot of the work. Totality,
-Werror, parse-don't-validate, and invariants baked into the types (you literally can't represent a mount at the root) throw out a category of confident-but-wrong output before anyone reads it. - Tests, not "it works." Every acceptance criterion
has a deterministic test; the rules engine's deny-precedence is
property-tested with Hedgehog; changed lines clear 85% coverage;
Semgrep, lint, and a hermetic Nix build must pass. Details in
docs/testing.md.
Nothing ships until I've audited it
I'm not cutting a release until I've been through the whole codebase
line by line, the way you read code you're about to hand someone else to
run. Écluse is pre-launch on purpose: don't put it in front of a build
yet. The "understand and explain every line" bar that CONTRIBUTING.md
sets for contributors is the bar for release.
You don't have to trust me. Check it
- The image is bit-for-bit reproducible: rebuild from pinned source and diff it against what's published.
- Every release ships keyless SLSA provenance and an SBOM in a public transparency log.
- The security-critical behaviour is in the types and the property tests: read the deny-by-default property and the egress guards.
If you can verify the output, you don't have to trust how it was made. (More in Release & Supply-Chain Operations.)
Why this is public now
I'm sharing Écluse pre-launch to get the design picked apart while
changing it is still cheap; I don't have a budget for an outside
security review yet. So please try to break it: the origin model, the
upstream merge, the deny-by-default rules, the egress story. Start with
MOTIVATION.md and the architecture.
The heavy LLM use is a property of bootstrapping and will taper; the
design is what I'm not backing off.