Using Écluse: the operator manual
This is the operator manual for deploying and running Écluse: how to configure it, connect your clients, and fence its network egress so it stays a safe link in your supply chain. It's the companion to the internal architecture documents, which explain the why behind everything here.
Status: pre-launch. Écluse is under active development. This manual is the configuration and operational contract: the env vars, the config schema, the client setup, and the security responsibilities. Features still landing are marked (planned); treat this as the deployment contract, not a claim that every capability below is wired today.
Contents
- What Écluse does
- Deployment model
- The Golden Path
- Configuration
- Connecting your clients
- Securing network egress (required)
- Locking down CI egress (recommended)
- Rule policy
- Operating Écluse
- Planned controls
- Learn more
What Écluse does
Écluse sits between your build (developer machine or CI) and the
upstream registry, and applies a deny-by-default policy before any
package reaches a build. It reads through a private upstream first,
falls back to the public registry with rules applied, and mirrors
approved packages asynchronously. It's a policy gate, not a registry,
and hosts nothing itself. npm is the first supported ecosystem; the
engine is ecosystem-agnostic, with PyPI and RubyGems on the roadmap. The
design is in docs/architecture.md.
Deployment model
Écluse ships as a single reproducible container image, a multicall
executable: ecluse proxy (the HTTP proxy),
ecluse pilot (the OSV ingestion pipeline), or
ecluse dredger (the registry cleanup worker), selected by
the container command. All three roles share one config file and rule
set.
ecluse pilot compile --out DIR runs one OSV compilation
and exits: it fetches an ecosystem's advisory export
(--ecosystem, default npm;
--source URL overrides the configured
osvExportBaseUrl), writes osv.db into
DIR, and exits non-zero on failure, so it's safe to script
and schedule. --upload also publishes the artifact to the
vulnerability-database bucket, making one invocation a full sync cycle;
--upload without a configured bucket aborts
immediately.
The default command runs the proxy process (the HTTP
front door on ECLUSE_PORT, default 8080) plus
the mirror worker. The proxy scales horizontally behind a load balancer,
but Pilot and Dredger must run as singletons: multiple
instances race, duplicate API calls, and overlap registry deletions.
Point your package manager at the proxy as a registry (see Connecting your clients).
Before running a published image, verify its provenance and SBOM attestations: the recipe (keyless Sigstore, Rekor, pinned by digest) is in the README.
The Golden Path
This is the recommended, most resilient way to run Écluse, and the posture the threat model treats as canonical. Aim for it unless you have a specific reason to diverge; each step links to its detail.
- Run three registries, not one. Give the three internal roles distinct backends: a first-party store (publication target), a public-derived mirror store (mirror target), and a pull-through read endpoint that unions both (ECLUSE_MOUNTS__NPM__PRIVATE_UPSTREAM). Separating first-party from public-derived inventory lets you scan and police each by provenance, and keeps the mirror auditable. Collapsing onto fewer registries works but muddies auditing and post-incident scoping. The one hard rule: the aggregating endpoint must union trusted stores only, never a direct public upstream, or raw ungated packages reach clients as trusted and bypass the gate. See registry-level composition.
- Let callers use their own identity (passthrough). The default credential strategy forwards each caller's token to the private upstream and publication target, so access matches your registry IAM exactly (no escalation) and Écluse holds no standing read credential. This is the default; nothing to set. See access model.
- Mint the mirror-write token from the container role. Set ECLUSE_MOUNTS__NPM__CREDENTIAL_PROVIDER=codeartifact so the worker mints a short-lived write token under the task/instance role instead of carrying a static secret (static is supported but discouraged). Scope that role write-only to the mirror store and keep ECLUSE_MOUNTS__NPM__MIRROR_CODE_ARTIFACT_TOKEN_DURATION short: it's Écluse's only standing credential and it writes to the trusted store. Scope the mirror queue the same way: a job tells the worker to fetch-and-publish, so grant only the serve role SendMessage and only the worker ReceiveMessage/delete. Anyone who can write to the queue can force a write to the trusted store.
- Let the edge own access; leave ECLUSE_AUTH_TOKEN off. Écluse is not your access boundary. Front it with a gateway, mesh, or IAP that admits only the networks you intend, and restrict reachability both north-south and east-west (pod-to-pod): an ingress-only allow-list that leaves the pod reachable inside the cluster is a common vulnerability. See Connecting your clients.
- Fence egress, keep metadata reachable. Default-deny outbound, allowing only your upstreams, the mirror target, the advisory bucket when ECLUSE_VULNERABILITY_DATABASE_BUCKET is configured (the proxy needs s3:GetObject on it to sync osv.db), and the metadata endpoint; reach CodeArtifact and S3 over VPC endpoints; require IMDSv2 with hop limit 1. Don't block the metadata endpoint; Écluse needs it to mint credentials. See Securing network egress.
- Make the proxy unbypassable. Deny CI runners (and, where practical, workstations) outbound access to the public registries, so the only route to a package is through Écluse. This turns the policy from default into unbypassable. See Locking down CI egress.
- Verify what you run. Pin the image by digest and verify its provenance + SBOM attestations before deploying (see Verifying the image).
The why behind each choice, and the residual risks this posture accepts, is in the threat model and Security invariants.
Deviating from the Golden Path
Écluse still runs if you diverge, but each deviation trades away a protection, and two are silent (Écluse can't detect them, so nothing warns you):
- Collapsing the registries onto one store (leaving ECLUSE_MOUNTS__NPM__MIRROR_TARGET / ECLUSE_MOUNTS__NPM__PUBLICATION_TARGET unset). The perimeter still holds, but first-party and public-derived packages share one store, so you lose provenance separation, per-provenance scanning, and clean post-incident scoping. Écluse Dredger refuses to boot if MIRROR_TARGET equals PUBLICATION_TARGET, since automated pruning on a shared store risks first-party data loss. (Register threat #10 and #16.)
- Pointing the private upstream at a registry that itself draws from public (say a CodeArtifact repo with the stock npm-store upstream to npmjs). This is the dangerous one, and Écluse can't detect it: raw ungated packages reach clients through the trusted read path, behind the gate instead of through it, nullifying the rules, integrity floor, and freshness quarantine. Aggregate trusted stores only into the private upstream (your first-party store plus Écluse's mirror), and let the gated mirror be the only way public content enters. (Register threat #15.)
The other deviations self-announce: an open edge
(ECLUSE_AUTH_TOKEN unset) leans on your network boundary, a
static publish credential fails closed at boot without that edge, and a
static mirror-write secret forgoes the minted token. Each
is covered at its step above.
Configuration
Configuration has two layers: environment variables for process and secret values, and an optional config document (YAML) for the two things too expressive for flat env vars: the rule policy and the mount map. A single-mount npm deployment on the default policy needs no document.
The table below is the complete environment-variable reference. A value resolves as defaults < config document < environment variable, so the environment wins. The resolution model and the rationale behind each setting are in Configuration & Authentication.
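The resolution order above can be pictured as three layered maps, each overriding the one before it. The following is a minimal Python sketch of that idea only, not Écluse's loader: the `resolve` helper is hypothetical, and keying the config document by environment-variable name is an assumption made purely to keep the illustration flat.

```python
from typing import Mapping


def resolve(
    defaults: Mapping[str, str],
    document: Mapping[str, str],
    environment: Mapping[str, str],
) -> dict:
    """Layer the three sources so that later layers win.

    Order: defaults < config document < environment variable.
    (Hypothetical helper; keys are env-var names for illustration.)
    """
    resolved = dict(defaults)      # lowest priority
    resolved.update(document)      # overrides defaults
    resolved.update(environment)   # highest priority: the environment wins
    return resolved


defaults = {"ECLUSE_PORT": "8080", "ECLUSE_LOG_FORMAT": "json"}
document = {"ECLUSE_PORT": "9000"}
environment = {"ECLUSE_PORT": "9090"}

settings = resolve(defaults, document, environment)
```

Here `settings["ECLUSE_PORT"]` comes from the environment, while `ECLUSE_LOG_FORMAT` falls through to its default because neither higher layer sets it.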
Environment variables
| Variable | Required | Default | Description |
|---|---|---|---|
| ECLUSE_PORT | No | 8080 | TCP port the proxy listens on. Must be in 0..65535 (0 binds an OS-assigned ephemeral port); an out-of-range value is rejected at load. |
| ECLUSE_MOUNTS__NPM__PRIVATE_UPSTREAM | Yes | | URL of the private upstream registry (the authority for reads under the default passthrough strategy). |
| ECLUSE_MOUNTS__NPM__PUBLIC_UPSTREAM | No | https://registry.npmjs.org | URL of the public upstream, queried anonymously and gated by the rules. |
| ECLUSE_PUBLIC_URL | Recommended | | The proxy's own externally-reachable base URL (e.g. https://registry.example.com), used to rewrite each served dist.tarball to an absolute URL clients fetch back through the proxy. Unset, tarball URLs are path-relative and the npm CLI can't install from them (it reads a leading-slash dist.tarball as a file: path), so set this for any deployment serving real npm installs. |
| ECLUSE_MOUNTS__NPM__MIRROR_TARGET | No | ECLUSE_MOUNTS__NPM__PRIVATE_UPSTREAM | Registry that approved packages are mirrored to. Unset ⇒ folds onto the private upstream (one registry, read and written). The write credential does not fold, set ECLUSE_MOUNTS__NPM__CREDENTIAL_PROVIDER. |
| ECLUSE_MOUNTS__NPM__CREDENTIAL_PROVIDER | No | codeartifact | Mirror-target write credential: codeartifact (mints a short-lived token under the container/task role, the shipped default) or static (a fixed ECLUSE_MOUNTS__NPM__MIRROR_TARGET_TOKEN). gcp-artifact-registry is recognised but not yet built. |
| ECLUSE_MOUNTS__NPM__MIRROR_TARGET_TOKEN | No | | Static write token, used when ECLUSE_MOUNTS__NPM__CREDENTIAL_PROVIDER=static. |
| ECLUSE_MOUNTS__NPM__MIRROR_CODE_ARTIFACT_DOMAIN | Depends | codeartifact only | CodeArtifact domain, or parsed from a CodeArtifact ECLUSE_MOUNTS__NPM__MIRROR_TARGET host. |
| ECLUSE_MOUNTS__NPM__MIRROR_CODE_ARTIFACT_DOMAIN_OWNER | Depends | codeartifact only | 12-digit owning account id, or parsed from the host (a non-account-id value is rejected at boot). |
| ECLUSE_MOUNTS__NPM__MIRROR_CODE_ARTIFACT_REGION | Depends | codeartifact only | Region, this key, else the host (its authoritative region), else AWS_REGION. |
| ECLUSE_MOUNTS__NPM__MIRROR_CODE_ARTIFACT_TOKEN_DURATION | No | | Token lifetime in seconds, capped at 43200 (12 h). |
| ECLUSE_MOUNTS__NPM__PUBLICATION_TARGET | No | | Where client npm publish (first-party packages) is written. Opt-in: unset ⇒ PUT /{pkg} is 405 (no implicit write path). May be the same registry as the private upstream. Protect this surface; see the warning below. |
| ECLUSE_MOUNTS__NPM__PUBLICATION_TARGET_TOKEN | No | | Static fallback credential for the publication target, forwarded only when a publishing client sends none. The default is passthrough (the publisher's own token). ⚠️ A static token with an open edge lets any unauthenticated client publish under it; see the warning below. |
| ECLUSE_MOUNTS__NPM__PUBLISH_SCOPES | Conditionally | If ECLUSE_MOUNTS__NPM__PUBLICATION_TARGET is set | Comma-separated allow-list of package scopes a client may publish (e.g. @acme,@beta), the anti-shadowing guard: a publish outside the list is refused before any upstream write. It limits names, not callers, and is not authentication. An empty list with a publication target set is a fail-loud boot error. |
| ECLUSE_QUEUE_BACKEND | No | sqs | Mirror-queue backend: sqs (AWS), or memory (a bounded in-process queue: a non-durable, best-effort mirror for single-node or air-gapped deployments, never an automatic fallback, warns loudly at boot). pubsub (GCP) is recognised but not yet built. |
| ECLUSE_QUEUE_URL | Depends | Cloud backends only | Queue identifier: an SQS queue URL or a Pub/Sub projects/<p>/topics/<t> resource. Required for the cloud backends (absent ⇒ fail-loud at boot); not needed for memory (ignored). |
| ECLUSE_QUEUE_MEMORY_MAX_DEPTH | No | 50000 | memory only. Cap on in-process queue depth. An enqueue past the cap is dropped (drop-newest) and rate-limit-logged; a dropped job re-mirrors on next demand, so it's safe. Positive integer. |
| AWS_REGION | Depends | AWS backends only | Region for SQS and CodeArtifact. |
| AWS_ENDPOINT_URL_SQS / AWS_ENDPOINT_URL | No | | SQS endpoint override (AWS-SDK-standard). Point at a local emulator (ministack) or VPC endpoint; with one set, requests are signed with AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY. Unset ⇒ normal AWS resolution. |
| ECLUSE_GOOGLE_PROJECT | Depends | GCP backends only | Project for Pub/Sub and Artifact Registry (credentials via ADC). |
| ECLUSE_AUTH_TOKEN | No | | If set, clients must present this token (Bearer / _authToken). Omit for network-secured deployments. |
| ECLUSE_MOUNTS__NPM__RESPECT_UPSTREAM_TARBALL_HOST | No | false | Secure default. When false, a tarball is fetched only from the same allowlisted upstream that served the packument; set true only for a registry that serves tarballs from a separate CDN/files host (widens the fetch surface to any allowlisted host). See Securing network egress. |
| ECLUSE_ADDITIONAL_BLOCKED_RANGES | No | | Comma-separated list of CIDR ranges (e.g. 10.99.0.0/16,fd12::/8) an operator adds to the fixed internal-address block, applied identically across every mount. Extends the block only, never narrows it; a malformed entry fails closed at boot. See Securing network egress. |
| ECLUSE_HELP_MESSAGE | No | | String appended to every denial message (e.g. a support channel). |
| ECLUSE_LOG_FORMAT | No | json | Log shape: json (one JSON object per line, for log collectors) or console (human-readable). |
| ECLUSE_TELEMETRY | No | off | OpenTelemetry master switch. With it off, no telemetry is emitted. When on, the SDK reads the standard OTEL_* variables. |
| ECLUSE_CVE_SYNC_INTERVAL | Depends | Pilot only, default 3600 | How often the Écluse Pilot singleton refreshes the OSV database from upstream. |
| ECLUSE_VULNERABILITY_DATABASE_BUCKET | No | | The object-store bucket carrying the compiled osv.db advisory artifacts. Pilot uploads to it; the proxy polls it and shadow-swaps fresh artifacts into the rules engine. Unset, the proxy runs no advisory sync and AllowIfRemediatesCve abstains. |
| ECLUSE_CVE_DB_POLL_INTERVAL | No | 60 | Proxy only: how often each configured ecosystem's sync task polls the bucket for a fresh advisory database (a cheap conditional HEAD). Deliberately independent of, and more frequent than, Pilot's ECLUSE_CVE_SYNC_INTERVAL: matching them would nearly double the worst-case advisory age. Positive integer. |
| ECLUSE_MAX_OSV_DB_BYTES | No | 536870912 | Proxy only: refuse to download an advisory database larger than this many bytes (default 512 MiB). The declared length fails fast and the streaming download enforces the cap. |
| ECLUSE_OSV_DATA_DIR | No | data/osv | Directory for the OSV advisory databases: where Pilot compiles them, and where the proxy lands its synced per-ecosystem artifacts. During a swap, actual disk use briefly exceeds what ls or du show: the superseded file is unlinked while its last readers finish, and the kernel frees the space when the drained connection closes. |
| ECLUSE_OSV_EXPORT_BASE_URL | No | https://osv-vulnerabilities.storage.googleapis.com | Base URL of the per-ecosystem OSV advisory exports Pilot compiles from (<base>/<ecosystem>/all.zip). Override it if the upstream moves or you mirror the exports. |
| ECLUSE_SHUTDOWN_DRAIN_TIMEOUT | No | 30 | Seconds the graceful shutdown waits for in-flight requests and in-progress artifact streams to finish before the process exits. Positive integer. |
| ECLUSE_CORES | No | derived | Cores (GHC capabilities) the process claims. Unset ⇒ derived from the container's cgroup CPU quota (floored, at least 1, clamped to the visible processors); with no cgroup limit either, the runtime's own detection stands. Give the container whole cores; see the runtime sizing note. The boot log prints the decision and its provenance. Positive integer. See Operating Écluse → Runtime sizing. |
| ECLUSE_MAX_HEAP_BYTES | No | derived | Heap ceiling in bytes, enforced by the GHC runtime (a breach is a clean heap-overflow error rather than a kernel OOM kill). Unset ⇒ derived from the cgroup memory limit less the nursery budget and 10% slack; with no cgroup limit, unbounded unless your own GHCRTS -M says otherwise. Enforcing a ceiling re-executes the binary once, in place (same PID). Positive integer. |
| ECLUSE_SERVE_MAX_IN_FLIGHT | No | computed | Process-wide cap on concurrent metadata materialisation (whole packument requests and the public-metadata gate a tarball miss reaches). Unset, computed at boot as max(8, 10 x cores) and logged. Over the cap, a request waits up to 1 second for a slot (a bounded waiting room, no queue-jumping) and proceeds when one frees; only a request that finds the room full or waits out that budget gets 503 Service Unavailable with Retry-After: 1. Trusted private tarball hits, health probes, and local routes stream outside the cap. Positive integer. A 503 with Retry-After: 1 is intentional backpressure, not a failure: exclude it from alerts (a real upstream failure returns 503 without that header), and a service mesh can auto-retry it. |
| ECLUSE_PUBLIC_CONNECTIONS_PER_HOST | No | computed | Maximum pooled (kept-for-reuse) connections per public upstream host. Unset, computed at boot as clamp(32, 1024, nofile / 8) and logged. Connections beyond the pool still open, but re-handshake TLS each time. Positive integer. The private pool is sized separately (next row). |
| ECLUSE_PRIVATE_CONNECTIONS_PER_HOST | No | computed | Maximum pooled connections to the private upstream host. Unset, computed at boot as a quarter of the soft RLIMIT_NOFILE, clamped to 64-4096, and logged. Sized for the trusted tarball hit, which streams outside ECLUSE_SERVE_MAX_IN_FLIGHT. The pool governs reuse, not socket count. Positive integer. |
| ECLUSE_CACHE_TTL | No | 60 | Seconds metadata is kept in the shared packument cache. |
| ECLUSE_CACHE_MAX_ENTRIES | No | 1024 | Maximum number of items the metadata cache will hold. |
| ECLUSE_CACHE_MAX_BYTES | No | 268435456, 256 MiB | Resident-byte budget for each of the metadata cache's stores (the full-packument store, the single-version store, and the assembled-representation store), so the worst-case total is three budgets. |
| ECLUSE_MAX_RESPONSE_BYTES | No | 12582912, 12 MiB | Largest upstream metadata body buffered before the fetch aborts fail-closed. Bounds memory against a hostile upstream returning a giant body. Positive integer. |
| ECLUSE_MAX_VERSION_COUNT | No | 100000 | Largest version count a packument may carry before it is refused. Bounds per-version rule evaluation against a version flood. Positive integer. |
| ECLUSE_MAX_NESTING_DEPTH | No | 64 | Deepest JSON nesting a decoded upstream document may reach before it is refused. Bounds CPU/stack against a pathologically nested payload. Positive integer. |
| ECLUSE_MIN_PUBLIC_INTEGRITY | No | sha256 | Minimum integrity algorithm a public (untrusted) version's digest must meet: sha256, sha384, sha512, or blake2b. A weaker or absent digest is refused with 403. Hard-floored at SHA-256: sha1/md5/an unknown name is rejected at startup. The trusted path has its own loosenable floor (ECLUSE_MIN_TRUSTED_INTEGRITY). |
| ECLUSE_MIN_TRUSTED_INTEGRITY | No | sha256 | Minimum integrity algorithm a trusted (private) version's digest must meet. Defaults to sha256, so a SHA-1-only or hashless private version is dropped like a public one, but unlike the public floor is loosenable below SHA-256 (sha1/md5) for a legacy private mirror. An unknown name is rejected at load. |
Configuration is validated in full at startup and the process refuses to start on any problem (an unknown rule type, a bad URL, an unresolved policy reference): a misconfiguration is a loud, immediate failure, never a quietly mis-enforced policy.
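The "computed" defaults in the table are simple arithmetic on the core count and the soft file-descriptor limit. As a rough Python sketch of the formulas as the table states them (the function names are hypothetical, and integer division plus reading clamp(32, 1024, x) as "bound x to the range 32 to 1024" are assumptions, not confirmed behaviour):

```python
def clamp(lo: int, hi: int, value: int) -> int:
    """Bound value to the inclusive range [lo, hi]."""
    return max(lo, min(hi, value))


def serve_max_in_flight(cores: int) -> int:
    # Table formula: max(8, 10 x cores)
    return max(8, 10 * cores)


def public_connections_per_host(nofile: int) -> int:
    # Table formula: clamp(32, 1024, nofile / 8); integer division assumed
    return clamp(32, 1024, nofile // 8)


def private_connections_per_host(nofile: int) -> int:
    # Table formula: a quarter of the soft RLIMIT_NOFILE, clamped to 64-4096
    return clamp(64, 4096, nofile // 4)
```

Under these assumptions, a 2-core container with a soft limit of 1024 descriptors would land on 20 in-flight slots, 128 pooled public connections per host, and 256 pooled private connections; the boot log remains the authority on what a real process actually chose.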
⚠️ The first-party publish surface authorises names, not callers. With publishing enabled (ECLUSE_MOUNTS__NPM__PUBLICATION_TARGET), the ECLUSE_MOUNTS__NPM__PUBLISH_SCOPES allow-list limits which package names may be published; it's not authentication and says nothing about who may publish. So a static ECLUSE_MOUNTS__NPM__PUBLICATION_TARGET_TOKEN (used only when a publisher forwards none) is fail-closed: set it without ECLUSE_AUTH_TOKEN and Écluse refuses to start (PublishStaticCredentialNeedsEdge), so "static publish credential + open edge", which would let any unauthenticated client publish under the operator's credential, is unrepresentable. ECLUSE_AUTH_TOKEN is the edge Écluse can verify itself; an external layer (gateway, mTLS, network policy) is good defence-in-depth but doesn't satisfy this. Pure passthrough (no static token, the default) needs none of this. See Access model → Publishing.
The configuration document
A YAML file mounted at /etc/ecluse/config.yaml. It
carries the rule policy (see Rule policy) and, for multi-mount deployments,
the mount map. Single-mount deployments desugar from
the env vars above and need no document. Schema and examples: Configuration
& Authentication.
Deployments derive their initial policy from the default
baseline configuration (config/default.yaml).
Secrets
Secrets never live in the config document. Client and registry tokens
are always env vars, and cloud-managed registries (CodeArtifact /
Artifact Registry) derive short-lived tokens from ambient cloud
credentials. Écluse always holds a mirror-target write
credential; reads follow the mount's credential
strategy: passthrough (default) forwards the client's
own token to the private upstream and strips it before the public one,
while service reads with Écluse's own credential. See Outbound
Registry Credentials.
Connecting your clients
Point your package manager at the proxy as its registry. With
ECLUSE_AUTH_TOKEN set, supply it the standard npm way:
# .npmrc
registry=https://ecluse.example.internal/
//ecluse.example.internal/:_authToken=${ECLUSE_TOKEN}

Edge authentication to the proxy has three modes (and feeds the mount's credential strategy, which decides how the upstreams are then credentialled):
- Open: ECLUSE_AUTH_TOKEN unset; access control is delegated entirely to the network layer (VPC, service mesh). Appropriate only on a closed network.
- Static token: ECLUSE_AUTH_TOKEN set; clients send it as Authorization: Bearer <token> or .npmrc _authToken.
- Trusted edge identity: a fronting gateway / IAP / mesh asserts a verified identity. Écluse honours it only over a verifiable binding to that edge (mutual TLS, or a shared secret / HMAC on the asserted identity), and refuses to start a trusted-edge mount with neither. A bare trusted header is forgeable wherever the proxy is reachable off the edge, so restrict reachability to the edge east-west as well as north-south.
Securing network egress (required)
Écluse fetches from the registries you point it at, and some URLs it
follows (a version's dist.tarball) come from upstream
responses. Apply least-privilege egress in two layers. Écluse provides
the first in the application, with an origin-aware trust
model:
- Untrusted origins: the public-upstream fetch and every dist.tarball fetch are gated by a host allowlist (Écluse dials only your configured upstream hosts), fetched HTTPS-only with TLS certificate validation, and bounded by response-size limits. A non-HTTPS upstream fails closed at boot, and a dist.tarball is normalised to HTTPS or refused (below). Certificate validation closes the resolve-to-internal and DNS-rebinding SSRF class: an address a name is steered to can't present a CA-trusted certificate for the host. A pure literal internal-range block (loopback, link-local incl. the 169.254.169.254 metadata endpoint, unspecified 0.0.0.0/8 / ::, RFC1918, CGNAT, IPv6 ULA fc00::/7 incl. fd00:ec2::254) stays as cheap defence-in-depth on the dist.tarball host: a tarball whose host is an internal-address literal is refused. Extend it with ECLUSE_ADDITIONAL_BLOCKED_RANGES (comma-separated CIDRs, every mount alike); it only ever widens, never narrows.
- The trusted private origin (ECLUSE_MOUNTS__NPM__PRIVATE_UPSTREAM) is deliberately not subject to the internal-range block: a private registry legitimately lives on your internal network.
SSRF to the instance-metadata endpoint is prevented in the
application, not by blocking metadata at the network. An untrusted
upstream or dist.tarball can't steer a fetch to
169.254.169.254: the proxy dials only allowlisted hosts
over HTTPS with certificate validation, and the literal block refuses a
dist.tarball whose host is that address. Écluse's own
metadata access goes through the AWS SDK to mint its instance-role
credentials, so don't deny the proxy egress to metadata or internal
ranges: doing so breaks its own credentials.
Provide the second layer at the platform, protecting your data targets (registries, mirror):
- Require IMDSv2, hop limit 1 (AWS httpPutResponseHopLimit: 1): keeps the proxy's own credential minting working while stopping a neighbour or forwarded request from reaching metadata through extra hops. Don't deny egress to 169.254.169.254 outright; Écluse needs it for credentials.
- Default-deny egress, allow only your registries + mirror target.
  - AWS: security-group egress rules / network ACLs to the upstream and mirror CIDRs (plus the metadata endpoint the instance role needs).
  - GCP: VPC firewall egress rules and, where applicable, VPC Service Controls.
  - Kubernetes: a default-deny NetworkPolicy with an explicit egress allowlist; allow your private upstream's internal range.
  - Service mesh (Istio/Linkerd): set the sidecar outbound policy to REGISTRY_ONLY, declare each upstream as a ServiceEntry, and constrain it with a Sidecar egress listener and an egress AuthorizationPolicy.
- Grant the proxy only the cloud permissions it needs: the mirror-write credential, the advisory-bucket read (s3:GetObject) when ECLUSE_VULNERABILITY_DATABASE_BUCKET is set, and (under the service strategy) the private-read credential, nothing more.
The dist.tarball host policy.
dist.tarball is upstream-chosen, so by default Écluse
fetches a tarball only from the same allowlisted upstream that served
the packument; a different host is refused even if allowlisted. If your
registry serves artifacts from a separate CDN/files host (the
PyPI-files-host shape), set
ECLUSE_MOUNTS__NPM__RESPECT_UPSTREAM_TARBALL_HOST=true to
allow any allowlisted host. It never escapes the allowlist or
internal-range block, but widens the fetch surface, so opt in
deliberately.
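To make the host policy concrete, here is a minimal Python sketch of the decision as this section describes it: HTTPS only, no internal-address literal, host on the allowlist, and same host as the packument unless the opt-in is set. Everything named here (`tarball_allowed`, `is_blocked_literal`, `parse_blocked_ranges`) is hypothetical, the sketch refuses non-HTTPS URLs outright rather than attempting the normalisation the text mentions, and lenient CIDR parsing (`strict=False`) is an assumption.

```python
import ipaddress
from urllib.parse import urlsplit

# The fixed internal-range block as listed in this section.
FIXED_BLOCKED = [ipaddress.ip_network(c) for c in (
    "127.0.0.0/8", "::1/128",            # loopback
    "169.254.0.0/16", "fe80::/10",       # link-local (incl. 169.254.169.254)
    "0.0.0.0/8", "::/128",               # unspecified
    "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",  # RFC1918
    "100.64.0.0/10",                     # CGNAT
    "fc00::/7",                          # IPv6 ULA (incl. fd00:ec2::254)
)]


def parse_blocked_ranges(value: str):
    """Parse a comma-separated CIDR list; a malformed entry raises ValueError."""
    return [ipaddress.ip_network(p.strip(), strict=False)
            for p in value.split(",") if p.strip()]


def is_blocked_literal(host: str, extra_ranges=()) -> bool:
    """True only when host is an IP literal inside a blocked range."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # a DNS name, not an address literal
    return any(addr.version == net.version and addr in net
               for net in [*FIXED_BLOCKED, *extra_ranges])


def tarball_allowed(tarball_url, packument_host, allowlist,
                    respect_upstream_host=False, extra_ranges=()) -> bool:
    parts = urlsplit(tarball_url)
    host = (parts.hostname or "").lower()
    if parts.scheme != "https" or not host:
        return False                      # sketch: refuse instead of normalising
    if is_blocked_literal(host, extra_ranges):
        return False                      # the block is never escaped
    if host not in allowlist:
        return False                      # the allowlist is never escaped
    # Default: same host that served the packument; opt-in: any allowlisted host.
    return respect_upstream_host or host == packument_host
```

Note that the opt-in only relaxes the last check: with `respect_upstream_host=True` a second allowlisted host becomes fetchable, but an internal-address literal is still refused even if someone adds it to the allowlist.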
The rationale is in Security: outbound-request and input-validation invariants.
Securing Écluse Pilot and Dredger
The auxiliary services (the Écluse Pilot ingestion pipeline and the Écluse Dredger reaper) need distinct, tightly scoped egress, and both must run as singletons (one replica).
- Écluse Pilot: no public ingress. Egress to osv.dev (raw advisories), the instance-metadata endpoint (credentials), and your object store (S3/GCS) with s3:PutObject to upload osv.db. The object is named <ecosystem>-osv-schema<N>.db (e.g. npm-osv-schema1.db, N = the table-schema epoch); the key is stable per ecosystem, so bucket policies and the proxy's ETag polling can target it. On osv.dev 5xx/408/429, Pilot retries with capped, jittered backoff, then logs and waits the full ECLUSE_CVE_SYNC_INTERVAL, so a transient outage can't get your NAT address rate-limited. To avoid an idling pod, schedule the one-shot instead: ecluse pilot compile --out /tmp/osv --upload as a CronJob with concurrencyPolicy: Forbid (which preserves the singleton).
- Écluse Dredger: no public ingress. Egress only to your private mirror (Registry B) for delete requests and to the instance-metadata endpoint for credentials. It holds a standing high-privilege delete capability, so isolate it from all untrusted networks.
Locking down CI egress (recommended)
The controls above secure Écluse's own egress. This one secures your consumers' egress, turning Écluse from a proxy clients are asked to use into the only registry they can reach.
If you control CI, deny runners outbound access to the public
registries (registry.npmjs.org and the equivalents
for other ecosystems) and let them reach only Écluse and your internal
services. Point the runners' package managers at Écluse.
Now a misconfigured job (a stray --registry flag, a
committed .npmrc at the public registry, a tool that
ignores your settings) can't quietly bypass the policy: it can't reach
the public registry, so it fails instead of pulling an unvetted package.
You depend on the network you administer centrally, not on every job
being configured correctly.
This is what makes the policy unbypassable rather than merely default: per-project package-manager and version-manager setups (npm/pnpm config, nvm, Nix shells, containers) can override what you ship to a machine, but none can route around a network that only reaches Écluse. See MOTIVATION → The bar.
The same idea extends to developer workstations (tarball fetches only through Écluse on a managed network, browsing and search left open), though workstations are a softer control than CI.
Rule policy
Écluse evaluates a named map of rules over a built-in deny-by-default policy: a package is admitted only if a rule allows it, and every deny type outranks every allow type by default, so a matching deny wins. The shipped default is small and biased toward resilience rather than blanket bans:
- min-age: admit public versions older than a quarantine window (7 days by default), the core defence against race-to-publish typosquatting and dependency confusion. On at launch.
- AllowIfRemediatesCve (remediation-fast-track): admit a release a synced advisory names as its exact fixed version ahead of the quarantine, provided no other advisory still affects it. On at launch; it abstains until an advisory database has been synced (set ECLUSE_VULNERABILITY_DATABASE_BUCKET and run Pilot), so without one only the quarantine governs. It's a deliberate exact match on fixed: a fix under any other version string waits out the quarantine, with AllowByIdentity as the workaround.
- AllowByIdentity: admit a specific package or package@version past the quarantine (e.g. a security fix the exact-match probe can't see), at the top of the allow band but still below every deny. Available.
- revoke: a hard-deny (DenyByIdentity) rule for a specific package or package@version, at a precedence above the scope allow-list. Available.
You override values, add rules (e.g. opt into
DenyInstallTimeExecution), or suppress a default by name in
the configuration document:
{
"rules": {
"min-age": { "ageSeconds": 1209600 },
"deny-scripts": { "type": "DenyInstallTimeExecution", "precedence": 200 },
"revoke-bad": { "type": "DenyByIdentity", "identity": "bad-package" },
"cve-fast-lane": { "type": "AllowIfRemediatesCve" },
"pin-fix": { "type": "AllowByIdentity", "identity": "left-pad@1.3.0" }
}
}

Full semantics (precedence, the patch/add/suppress merge, and the strict validation) are in Rule policy and Rules Engine.
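As a rough illustration of how the quarantine window and the deny-by-default ordering combine, the Python sketch below models each rule as returning "allow", "deny", or "abstain". It is deliberately simplified: it collapses the configurable precedence into the default ordering described above (any matching deny wins, then any allow, otherwise deny), treats "older than" as strictly older, and both helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

SEVEN_DAYS = 7 * 24 * 3600  # default quarantine window, in seconds


def older_than_window(published_at: datetime, now: datetime,
                      age_seconds: int = SEVEN_DAYS) -> bool:
    """min-age check: has the version outlived the quarantine window?"""
    return now - published_at > timedelta(seconds=age_seconds)


def decide(verdicts) -> str:
    """Combine per-rule verdicts under the default ordering.

    A matching deny outranks every allow; with no allow at all,
    the deny-by-default policy applies.
    """
    if "deny" in verdicts:
        return "deny"
    if "allow" in verdicts:
        return "allow"
    return "deny"


now = datetime(2024, 1, 15, tzinfo=timezone.utc)
fresh = now - timedelta(days=3)
verdicts = ["allow" if older_than_window(fresh, now) else "abstain"]
outcome = decide(verdicts)
```

With the example override of ageSeconds 1209600 (14 days), a version published 10 days ago would still abstain under min-age and therefore be denied unless another rule, such as an identity pin, allows it.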
Always-on: a public version must carry a strong integrity digest
Independent of the rules above, one admission policy is
non-negotiable on public (untrusted) upstreams: a
version is served only if its dist carries an integrity
digest meeting the integrity floor
(ECLUSE_MIN_PUBLIC_INTEGRITY, default
SHA-256). SHA-1 and MD5 have practical collisions, so a
weak-or-absent digest could let a substituted artifact pass. A public
version whose strongest digest is absent or below the floor (e.g. only a
legacy SHA-1 shasum) is inadmissible: its tarball returns
403 and it's filtered from the served packument, so a
client never sees a version it couldn't safely fetch.
The floor may be raised (sha512,
blake2b) but never lowered; a sub-floor value is rejected
at startup. The trusted private path has its own floor,
ECLUSE_MIN_TRUSTED_INTEGRITY, also defaulting to
sha256 (so a SHA-1-only private version is dropped too) but
loosenable below SHA-256
(sha1/md5) for a legacy private mirror, where
trust substitutes for cryptographic strength.
Gotcha. A custom or off-spec public upstream that serves
versions without a floor-meeting digest will have those versions
silently disappear from the served packument, and direct tarball fetches return 403. This is
deliberate. To serve such a source, point it at the
private upstream slot and loosen
ECLUSE_MIN_TRUSTED_INTEGRITY below sha256.
Operating Écluse
Pre-warming the cache. A cold npm install against an empty cache hits the proxy with dozens of heavy requests at once, causing latency spikes or 503 backpressure. Pre-warm as part of deployment: run an npm install (or a script fetching your heavy dependencies) after starting Écluse, before sending production traffic. Once warm, request coalescing absorbs spikes.

Health probes. GET /livez reports process liveness (a stalled mirror worker fails it); GET /readyz reports config loaded and the listener serving. Readiness is deliberately lenient about public-upstream reachability, so a transient blip doesn't pull a healthy pod from rotation. With an advisory bucket configured, readiness also waits for each configured ecosystem's first advisory sync (a one-way flip per ecosystem, so it never flaps); the listener serves throughout, since an absent advisory database abstains into deny-by-default, so the gate governs routing, not whether the process answers. Mounting an ecosystem whose artifact Pilot never publishes declares a sync that never arrives, so the pod never reports ready. The npm liveness probe GET /-/ping answers locally with 200 {}. Pilot and Dredger export the same /livez and /readyz on ECLUSE_PORT.

Logs. One JSON object per line by default (ECLUSE_LOG_FORMAT=json), or console for local development. Bearer tokens render as a redacted placeholder, so token material never reaches a log field.

Telemetry (opt-in). OpenTelemetry traces and metrics are off by default; set ECLUSE_TELEMETRY=on. Set DD_* (DD_SERVICE, DD_ENV, DD_VERSION, DD_AGENT_HOST) for Datadog or the standard OTEL_* for any other backend; DD_* wins where both are set, and the resolved identity stamps both traces and the dd object on every log line. DD_API_KEY/DD_SITE are ignored: Écluse only exports to a node-local collector or Agent.
- You declare the destination. Export goes to http://localhost:4318 by default, or wherever DD_AGENT_HOST/OTEL_EXPORTER_OTLP_ENDPOINT points. Écluse doesn't gate it; for a remote collector, authenticate out of band with OTEL_EXPORTER_OTLP_HEADERS.
- Never on the request path. Export is async and batched, so an unreachable collector never slows a request; an absent endpoint logs one boot warning and falls back to localhost, and persistent errors throttle to a periodic heartbeat.
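The precedence described for telemetry settings can be sketched roughly as below. This is an illustration only, with several assumptions: `resolve_telemetry` is a hypothetical name, OTEL_SERVICE_NAME is taken as the OTEL_* counterpart of DD_SERVICE, and DD_AGENT_HOST is mapped to port 4318 (the manual does not state which port Écluse uses for an Agent host).

```python
DEFAULT_ENDPOINT = "http://localhost:4318"  # the documented default destination


def resolve_telemetry(env):
    """Resolve service identity and export endpoint from an environment mapping.

    Returns None when telemetry is off (the default).
    """
    if env.get("ECLUSE_TELEMETRY") != "on":
        return None
    # DD_* wins where both families are set.
    service = env.get("DD_SERVICE") or env.get("OTEL_SERVICE_NAME")
    if env.get("DD_AGENT_HOST"):
        # Assumption: the Agent accepts OTLP over HTTP on port 4318.
        endpoint = "http://" + env["DD_AGENT_HOST"] + ":4318"
    else:
        endpoint = env.get("OTEL_EXPORTER_OTLP_ENDPOINT") or DEFAULT_ENDPOINT
    return {"service": service, "endpoint": endpoint}


print(resolve_telemetry({"ECLUSE_TELEMETRY": "on"}))
# {'service': None, 'endpoint': 'http://localhost:4318'}
```

Passing `dict(os.environ)` (after `import os`) would evaluate the same precedence against a real process environment.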
Search. GET /-/v1/search returns 501 by design: search is a discovery convenience, not an install path. Use the public registry's website.

Runtime sizing (cores and memory). At boot Écluse resolves how many cores to claim and what heap ceiling to run under, logging each decision with its provenance, so the posture is readable from the start-up lines. Resolution order per knob:
- Explicit config wins: ECLUSE_CORES (or cores) and ECLUSE_MAX_HEAP_BYTES (maxHeapBytes), positive integers.
- Otherwise derive from the cgroup (v2): the CPU quota, floored (at least 1) and clamped to visible processors; the memory limit less the nursery budget (cores x allocation area) less 10% slack, floored at half the limit.
- No limit either way: the GHC runtime's own resolution stands (its defaults plus any GHCRTS), and a GHCRTS heap ceiling you set is never overridden.
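The cgroup derivation in the second step can be sketched as below. This is a reading of the rule as stated above, not Écluse's code: the function names are hypothetical, the 10% slack is assumed to be 10% of the memory limit (the text does not say of what), and the allocation area defaults to 64 MiB to match the shipped -A64m setting. The input format is the cgroup v2 cpu.max file, which holds "<quota> <period>" or "max <period>".

```python
MIB = 1024 * 1024


def derive_cores(cpu_max, visible_processors):
    """Derive a core count from cgroup v2 cpu.max text; None when unlimited."""
    quota, period = cpu_max.split()
    if quota == "max":
        return None  # no quota: the runtime's own resolution stands
    cores = max(1, int(quota) // int(period))  # floor, but never below 1
    return min(cores, visible_processors)      # clamp to visible processors


def derive_heap_ceiling(memory_limit, cores, allocation_area=64 * MIB):
    """Memory limit less nursery budget less slack, floored at half the limit."""
    nursery = cores * allocation_area
    slack = memory_limit // 10  # assumption: 10% of the limit itself
    return max(memory_limit - nursery - slack, memory_limit // 2)


print(derive_cores("350000 100000", 32))              # 3 (a 3.5-CPU quota floors to 3)
print(derive_heap_ceiling(512 * MIB, 2) // MIB)       # 332
print(derive_heap_ceiling(256 * MIB, 2) // MIB)       # 128 (the half-limit floor applies)
```

Under these assumptions, a 2-core pod with a 256 MiB limit hits the half-limit floor, because the 128 MiB nursery budget plus slack would otherwise leave less than half the limit for the heap.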
Give Écluse whole cores. A fractional CPU limit (say 3.5) has no good option: claiming 4 capabilities overruns the CFS quota during stop-the-world GC, freezing the process mid-pause; flooring to 3 never self-throttles but strands the fraction. Écluse floors the derived count, so pair an integer limit with requests = limits (and exclusive cores where offered) to remove throttling structurally. A CPU limit doesn't shrink the processor count the runtime sees, so without ECLUSE_CORES a 2-CPU pod on a 32-core node would claim 32 capabilities and 32 nurseries. Enforcing a heap ceiling needs runtime flags fixed at start, so Écluse re-executes its own binary once, in place (same PID), logging runtime: re-launching with GHCRTS ... first.
Runtime memory arithmetic (proxy pod). For the proxy role; the other roles differ (Pilot runs a scheduled compute, the Dredger follows its pruning rules), so tune their allocation area via GHCRTS separately, though the cores/heap resolution above still applies to every role. The binary ships -A64m -n4m (a 64 MiB per-core allocation area in 4 MiB chunks), trading bounded extra memory for far fewer GCs under load. Budget roughly cores x 64 MiB of nursery, plus the live heap (dominated by the metadata cache), plus up to one live-heap of copying headroom during a major GC. Worked shapes: a 2-CPU / 512 MiB pod runs as-is; a 2-CPU / 256 MiB pod also needs GHCRTS="-A16m"; a 4-CPU pod wants ~750 MiB on defaults, or 512 MiB with -A32m. Taller pods amortise the cache and coalescing better, so prefer 4-CPU-ish shapes. Tune the allocation area with GHCRTS; the boot log prints the effective value.

Revoking a mirrored version (internal yank). The mirror store (Registry B) deliberately resists upstream yanks, so a benign yank doesn't break your installs, but a version later found malicious isn't removed automatically (Écluse never re-gates trusted content). Usually this resolves itself: once the public registry yanks the bad version its bytes change or vanish, re-mirroring can't reproduce them, and you purge the stale copy from Registry B at leisure. When your own scanning is ahead of the public yank, revoke in order: (1) deny the identity (a DenyByIdentity rule), so the serve path stops admitting it and the worker stops re-mirroring, then (2) purge that version from Registry B. Order matters: purge alone is a treadmill, since while the version is live upstream the next install re-admits and re-mirrors it.
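The deny-then-purge order can be sketched as a small helper. This is an illustration, not an Écluse tool: `deny_rule_patch` and `revoke` are hypothetical names, the rule shape follows the DenyByIdentity entry in the configuration example earlier in this manual, and the two callbacks stand in for however you apply configuration and purge Registry B in your environment.

```python
import json


def deny_rule_patch(identity, rule_name):
    """Build a config fragment that hard-denies a package or package@version."""
    return {"rules": {rule_name: {"type": "DenyByIdentity", "identity": identity}}}


def revoke(identity, rule_name, apply_config_patch, purge_from_mirror):
    """Run the two revocation steps in the order that avoids re-mirroring."""
    # Step 1: deny first, so the serve path and the mirror worker stop admitting it.
    apply_config_patch(deny_rule_patch(identity, rule_name))
    # Step 2: only then purge the stale copy from the mirror store.
    purge_from_mirror(identity)


steps = []
revoke(
    "left-pad@1.3.0",
    "revoke-left-pad",
    apply_config_patch=lambda patch: steps.append("deny " + json.dumps(patch)),
    purge_from_mirror=lambda ident: steps.append("purge " + ident),
)
print(steps[0].split()[0], steps[1].split()[0])  # deny purge
```

Keeping both steps in one function makes it harder to run the purge alone, which is the treadmill case described above.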
Planned controls
Documented ahead of implementation so the configuration surface is known.
- GCP backends (planned): the Pub/Sub MirrorQueue and ADC credential leaf. The AWS equivalents (SQS MirrorQueue, CodeArtifact credential leaf, mirror worker, composition root) are built and wired.
- DenyIfCVE rule (planned): a hard-deny over the OSV advisory index. Its allow-side counterpart, AllowIfRemediatesCve, has shipped (see Rule policy).
The full deployment runbook ships with the launch.
Learn more
The internal design, for when you need the why:
- Architecture overview
- Configuration & Authentication
- Security invariants & network egress
- Threat model, the STRIDE register, generated from the OWASP Threat Dragon model (threat-modelling/ecluse.json)
- Rules engine
- Multi-ecosystem hosting & URL rewriting
- Release & supply-chain operations