ecluse
Safe Haskell: None
Language: GHC2021

Ecluse.Proxy

Description

Écluse (package ecluse) sits between consumers (developers, CI) and a package registry, applying a configurable resilience policy before any dependency reaches a build, without hosting packages itself. The name is French for a canal lock: a chamber whose gates never open at once. Every dependency is held and cleared through that controlled passage before it is admitted to a build.

The goal is resilience, not malware detection: shrink the blast radius of a bad publish (a hijacked maintainer account, a race-to-publish, a typosquat) rather than promise to recognise malice. Écluse is not a registry: storage is delegated to whatever backend the operator runs (AWS CodeArtifact, GCP Artifact Registry), and Écluse only governs what may be fetched from, and mirrored to, those backends. npm is the first ecosystem; the domain model is ecosystem-agnostic so PyPI and RubyGems can follow.

How a request is cleared

Écluse speaks a registry's native protocol across three read-path registries (the client's, a private upstream of already-vetted packages, and the public registry), and the two request shapes use them differently:

  • A tarball request is gated for that one version: a private-upstream hit is streamed unfiltered (already vetted); on a miss, the proxy fetches the version's public metadata, evaluates the rules, and either streams it from public and enqueues an asynchronous mirror job or returns a denial.
  • A packument (metadata) request is a merge: the private and public upstreams are fetched in parallel, public versions are filtered by the rules while private versions are trusted, and the two are combined into one document (private wins a version collision, an integrity divergence is flagged as a supply-chain signal, and latest is repointed to the newest survivor).
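The packument merge above can be sketched over plain maps. This is a minimal illustration, not the Écluse implementation: the Version and Integrity types, the Merged record and mergePackument are hypothetical names, "newest" is assumed to mean the highest version in lexicographic order, and divergence is assumed to be checked against the unfiltered public set.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical, simplified shapes; a real packument carries far more.
type Version   = [Int]    -- e.g. [1,2,0], compared lexicographically
type Integrity = String   -- e.g. an SRI hash string

data Merged = Merged
  { mergedVersions :: Map.Map Version Integrity
  , divergent      :: [Version]      -- same version, different integrity
  , latest         :: Maybe Version  -- newest surviving version
  } deriving (Eq, Show)

mergePackument
  :: (Version -> Bool)           -- rule verdict for a public version
  -> Map.Map Version Integrity   -- private upstream (trusted)
  -> Map.Map Version Integrity   -- public upstream (filtered)
  -> Merged
mergePackument allowed private public =
  let survivors = Map.filterWithKey (\v _ -> allowed v) public
      -- Map.union is left-biased, so private wins a version collision.
      combined  = Map.union private survivors
      -- Versions present on both sides whose integrity values differ.
      clashes   = Map.keys (Map.filter id (Map.intersectionWith (/=) private public))
   in Merged combined clashes (fst <$> Map.lookupMax combined)
```

Because latest is computed from the combined map, a denied public version can never be the one it points at.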

Two properties run through both shapes: the rules engine is deny by default (a version is admitted only if some rule allows it and none denies it), and mirroring is demand-driven, so only versions actually pulled are mirrored, never on the request's critical path.
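A minimal sketch of the deny-by-default combination, assuming a hypothetical three-valued Verdict and a Rule type that are not the Écluse rules API:

```haskell
-- Hypothetical types for illustration only.
data Verdict = Allow | Deny | Abstain
  deriving (Eq, Show)

-- A rule inspects a candidate version and returns a verdict.
type Rule version = version -> Verdict

-- Deny by default: a version is admitted only if at least one rule
-- allows it and no rule denies it. An empty rule set admits nothing.
admitted :: [Rule version] -> version -> Bool
admitted rules v =
  let verdicts = map ($ v) rules
   in Allow `elem` verdicts && Deny `notElem` verdicts
```

A rule set in which every rule abstains therefore rejects the version, which is the property that makes a misconfigured or empty policy fail closed.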

How the code is organised

Écluse is a functional core with effects at the edges: the policy and protocol logic is pure and trivially testable, and IO is confined to a thin shell. Swappable backends sit behind handles (records of functions chosen at a single composition root), so a new cloud or a new ecosystem is an added implementation behind an existing handle, not a structural change.
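The handle pattern can be sketched as follows. QueueHandle, inMemoryQueue, shouldMirror and requestMirror are hypothetical names chosen for the example; the real handles and their fields are not shown in this page.

```haskell
import Data.IORef (newIORef, modifyIORef', readIORef)

-- Hypothetical handle: a record of functions standing in for a backend.
data QueueHandle = QueueHandle
  { enqueue :: String -> IO ()
  , pending :: IO Int
  }

-- One implementation, chosen at the composition root. A cloud-backed
-- queue would be another value of the same record type.
inMemoryQueue :: IO QueueHandle
inMemoryQueue = do
  ref <- newIORef []
  pure QueueHandle
    { enqueue = \job -> modifyIORef' ref (job :)
    , pending = length <$> readIORef ref
    }

-- Pure core: decides, performs no IO. A mirror job is wanted only on a
-- private-upstream miss for a version the rules allow.
shouldMirror :: Bool -> Bool -> Bool
shouldMirror privateHit allowedByRules = not privateHit && allowedByRules

-- Thin shell: the effect goes through whichever handle was supplied.
requestMirror :: QueueHandle -> Bool -> Bool -> String -> IO ()
requestMirror q privateHit allowed job
  | shouldMirror privateHit allowed = enqueue q job
  | otherwise                       = pure ()
```

The decision function can be tested without any IO, and the shell can be tested against the in-memory handle.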

The library's vocabulary, roughly from the pure core outward:

runProxy is the entry point the ecluse executable invokes (see Main). It lives in the library, not in app/Main.hs, so the composition root is a single importable unit and app/Main.hs stays a thin shell that only calls it.

Further reading

docs/architecture.md is the systems-design index: the vision, the end-to-end request lifecycle, and a map to the per-concern design documents. CONTRIBUTING.md covers the codebase layout and testing strategy, and STYLE.md the coding and documentation conventions.

Documentation

runProxy :: BootEnv -> IO ()

Start Écluse: the entry point the ecluse executable runs (see Main).

Assemble the composition root from configuration. Parse the environment layer and the optional config document, validate everything and fail fast at boot on any problem (a malformed env, an unresolved rule policy, a configured mount with no adapter, a credential reference that does not resolve, or a mirror-queue backend not built in this binary), aggregating the failures so a single run reports them all.

On success, build the handles (the shared HTTP Manager, the config-selected mirror queue, the metadata cache, the logger, the process-global credential provider, and the telemetry substrate, off unless ECLUSE_TELEMETRY enables it) into an Env, derive the served mount bindings, then run the server and the mirror worker concurrently over that single Env (runServer and runWorker).

Bracketing the Env (and the telemetry providers) for the lifetime of both tears down their shared resources along every exit path.
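Aggregating boot failures, rather than stopping at the first, is commonly done with an error-accumulating Applicative. The sketch below assumes that approach; whether Écluse uses it is not stated here. BootError, Validation, Settings and the EXAMPLE_PORT and EXAMPLE_UPSTREAM variable names are all hypothetical.

```haskell
-- Hypothetical boot errors; the real set is richer.
data BootError
  = MissingVar String
  | BadPort String
  deriving (Eq, Show)

-- Like Either, but <*> accumulates the errors from both sides.
newtype Validation a = Validation { runValidation :: Either [BootError] a }

instance Functor Validation where
  fmap f (Validation e) = Validation (fmap f e)

instance Applicative Validation where
  pure = Validation . Right
  Validation (Left a)  <*> Validation (Left b) = Validation (Left (a ++ b))
  Validation (Left a)  <*> _                   = Validation (Left a)
  Validation (Right f) <*> Validation r        = Validation (fmap f r)

data Settings = Settings { listenPort :: Int, upstreamUrl :: String }
  deriving (Eq, Show)

type EnvVars = [(String, String)]

portField :: EnvVars -> Validation Int
portField env = Validation $ case lookup "EXAMPLE_PORT" env of
  Nothing  -> Left [MissingVar "EXAMPLE_PORT"]
  Just raw -> case reads raw of
    [(n, "")] | n > 0 && n < 65536 -> Right n
    _                              -> Left [BadPort raw]

upstreamField :: EnvVars -> Validation String
upstreamField env = Validation $
  maybe (Left [MissingVar "EXAMPLE_UPSTREAM"]) Right (lookup "EXAMPLE_UPSTREAM" env)

-- Every field is checked; all failures are reported in one run.
validateBoot :: EnvVars -> Either [BootError] Settings
validateBoot env =
  runValidation (Settings <$> portField env <*> upstreamField env)
```

With an empty environment, validateBoot returns both missing-variable errors together instead of only the first.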

runServer :: ServerConfig -> Env -> IO ()

Run the proxy's HTTP front door over the composition-root Env with the config-derived ServerConfig.

This is the npm-aware composition site: mountBindingFor mounts npm -- its path grammar (Ecluse.Core.Registry.Npm.Route) and its denial renderer (Ecluse.Core.Registry.Npm.Serve) -- into the otherwise ecosystem-neutral web layer (runServer), so the agnostic server stays closed over the shared Route set and only this one place names an ecosystem. Splitting the server into its own binary later reuses this same entry.

runWorker :: WorkerPolicies -> Env -> IO ()

Run the supervised mirror worker over the composition-root Env and the per-ecosystem re-evaluation bundles: the consume → re-evaluate → fetch → verify → publish → ack loop against the queue, the publish-side registry client, and the credential handle, in the worker monad (WorkerM) over the worker runtime (workerRuntimeOf). The bundles carry the same prepared rules and public origin the serve path gates with, so the worker re-runs current policy against a job before mirroring it.

This is the composition-root hoist point: it resolves the request-independent dd correlation object (the service identity; no span is active at the worker entry) and installs it as the worker's initial katip context, then discharges the loop to IO through runWorkerM, the worker analogue of the serve path's runHandler boundary. The loop logic lives in Ecluse.Core.Worker; the single-process program runs this alongside runServer.

npmServerConfig :: ServerConfig

The fallback server settings: a single npm mount with no packument-serve or publish dependencies, so the packument route is the recognised-but-unserved 501 stub and a publish is 405 (no publication target). Exposed so the composed front door can be driven directly without binding a socket (e.g. embedded in another wai application, or exercised in tests through application) to assert the routing and the unwired-mount surface; a real launch derives its bindings from configuration in runProxy.

mountBindingFor :: Ecosystem -> Maybe PackumentDeps -> Maybe PublishDeps -> Maybe MountBinding

Resolve an Ecosystem to its complete MountBinding, or Nothing when that ecosystem has no adapter wired. The ecosystem selects its path grammar (the Classifier) and its denial renderer (the MountRenderer), and its path prefix is derived from it (prefixFor) rather than configured, so the ecosystem is the single thing that drives the binding (see docs/architecture/hosting.md, Mounts). The composition root supplies the packument-serve dependencies once the per-mount registry set is resolved; Nothing for them leaves the packument route the recognised-but-unserved 501 stub.

npm is the only ecosystem with an adapter; the others have no registry client or renderer, so they resolve to Nothing, a loud miss at the call site rather than a silently half-wired mount.
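The shape of that resolution can be sketched with simplified stand-ins. The Ecosystem constructors, the Binding record, prefixOf and the prefix strings below are hypothetical and much smaller than the real Ecosystem, MountBinding and prefixFor.

```haskell
import Data.Maybe (isJust)

-- Simplified stand-ins for the real types.
data Ecosystem = Npm | PyPI | RubyGems
  deriving (Eq, Show)

data Binding = Binding
  { bindingPrefix   :: String
  , servesPackument :: Bool   -- False models the 501 stub
  } deriving (Eq, Show)

-- Hypothetical prefixes, derived from the ecosystem rather than configured.
prefixOf :: Ecosystem -> String
prefixOf Npm      = "/npm"
prefixOf PyPI     = "/pypi"
prefixOf RubyGems = "/rubygems"

-- Only Npm has an adapter in this sketch; every other ecosystem is a
-- Nothing that the caller must handle explicitly.
bindingFor :: Ecosystem -> Maybe packumentDeps -> Maybe Binding
bindingFor Npm deps = Just (Binding (prefixOf Npm) (isJust deps))
bindingFor _   _    = Nothing
```

Returning Maybe forces the composition root to decide what an unsupported ecosystem means at boot, rather than discovering a partly wired mount at request time.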

unconfiguredRegistry :: RegistryClient

A registry handle with no backend behind it: every effectful field refuses loudly (a typed RegistryUnconfigured) and every pure parse* field returns Left, so an unconfigured fetch/publish or parse fails explicitly rather than silently returning a fabricated success. It holds the handle slot in the composition root where a configured backend is selected elsewhere.
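A minimal sketch of such a placeholder handle. The Registry record, its two fields and the unconfigured value are hypothetical, and the refusal is modelled here as a Left value; the real RegistryClient may surface RegistryUnconfigured differently.

```haskell
-- Hypothetical error type; only RegistryUnconfigured is named in the text.
data RegistryError
  = RegistryUnconfigured
  | RegistryHttpError Int
  deriving (Eq, Show)

-- Hypothetical two-field handle: one effectful field, one pure parser.
data Registry = Registry
  { fetchTarball :: String -> IO (Either RegistryError String)
  , parseName    :: String -> Either String String
  }

-- Every field fails explicitly; nothing fabricates a success.
unconfigured :: Registry
unconfigured = Registry
  { fetchTarball = \_ -> pure (Left RegistryUnconfigured)
  , parseName    = \_ -> Left "registry unconfigured"
  }
```

A caller that receives this handle by mistake sees a typed failure on first use instead of an empty but plausible response.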

planCveSync :: AppConfig -> IO (Map Ecosystem CveSyncHandle)

Build the advisory-sync plan from config: nothing without a configured vulnerability-database bucket; otherwise one CveSyncHandle per configured mount ecosystem, each against its own stable per-ecosystem object key and canonical on-disk path under the OSV data dir. Prepares the data dir (created if missing; stray .tmp downloads from an interrupted run swept) so the sync tasks start clean. Note the readiness consequence: an operator who mounts an ecosystem Pilot does not compile has declared an artifact that never arrives, and the pod honestly never reports ready.

data CveSyncHandle

One configured ecosystem's advisory-sync wiring.

Constructors

CveSyncHandle 

Fields

  • csSlot :: CveSlot

    The slot this ecosystem's mount rules borrow through.

  • csReady :: TVar Bool

    The one-way first-sync readiness flag.

  • csEnv :: SyncEnv

    The sync task's environment.

cveRuleDepsFor :: Map Ecosystem CveSyncHandle -> BreakerReporter -> Ecosystem -> RuleDeps

The rules' boot-bound capabilities for one mount ecosystem: the CVE lookup borrows through that ecosystem's own slot when the sync plan carries one, and abstains otherwise, so a mount's rules can never read a neighbouring ecosystem's advisory database.
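The per-ecosystem isolation can be sketched as a keyed lookup that abstains when the plan has no entry. The Ecosystem constructors, CveAnswer, AdvisoryDb and lookupFor are hypothetical simplifications; the real slot and RuleDeps types are not shown here.

```haskell
import qualified Data.Map.Strict as Map

-- Simplified stand-ins for illustration.
data Ecosystem = Npm | PyPI
  deriving (Eq, Ord, Show)

data CveAnswer
  = Known [String]   -- advisory identifiers for the package
  | Abstained        -- no database for this ecosystem
  deriving (Eq, Show)

type AdvisoryDb = Map.Map String [String]

-- The lookup is bound to exactly one ecosystem's database at boot, so
-- it has no way to consult a neighbouring ecosystem's data.
lookupFor :: Map.Map Ecosystem AdvisoryDb -> Ecosystem -> String -> CveAnswer
lookupFor plan eco pkg =
  case Map.lookup eco plan of
    Nothing -> Abstained
    Just db -> Known (Map.findWithDefault [] pkg db)
```

Partially applying lookupFor to the plan and one ecosystem yields the function the rules would hold, with the isolation fixed before any request arrives.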

cveSyncReady :: Map Ecosystem CveSyncHandle -> IO Bool

The readiness gate over the sync plan: ready once every configured ecosystem's advisory database has first-synced. The flags flip one way, so readiness never flaps on this; an empty plan (no bucket) is vacuously ready.
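A minimal sketch of such a gate, assuming the plan is reduced to a map of readiness flags. allReady and markReady are hypothetical names, and GHC.Conc from base is used here in place of the stm package to keep the example dependency-free.

```haskell
import GHC.Conc (TVar, atomically, readTVar, writeTVar)
import qualified Data.Map.Strict as Map

-- Ready once every flag is True. `and []` is True, so an empty plan is
-- vacuously ready.
allReady :: Map.Map k (TVar Bool) -> IO Bool
allReady plan = atomically (and <$> traverse readTVar (Map.elems plan))

-- Flags only ever move from False to True, so readiness cannot flap.
markReady :: TVar Bool -> IO ()
markReady flag = atomically (writeTVar flag True)
```

Since nothing in this sketch ever writes False, a True result from allReady stays True for the life of the process.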

cveSyncScheduleFor :: AppConfig -> SyncSchedule

The sync tasks' timing: the shipped boot burst over the configured poll interval. The microsecond conversion cannot wrap: the config decoder bounds the interval to [1, maxBound `div` 1_000_000] seconds.
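The bound can be checked with a small sketch. maxIntervalSeconds and toMicros are hypothetical names, and the interval is assumed to be an Int.

```haskell
-- Largest number of seconds whose microsecond value still fits in Int.
maxIntervalSeconds :: Int
maxIntervalSeconds = maxBound `div` 1000000

-- Convert seconds to microseconds, rejecting anything outside
-- [1, maxIntervalSeconds] so the multiplication cannot overflow.
toMicros :: Int -> Maybe Int
toMicros s
  | s >= 1 && s <= maxIntervalSeconds = Just (s * 1000000)
  | otherwise                         = Nothing
```

Integer division rounds down, so maxIntervalSeconds * 1000000 never exceeds maxBound and the product at the upper bound stays positive.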