| Safe Haskell | None |
|---|---|
| Language | GHC2021 |
Ecluse.Core.Server.Pipeline.Packument
Description
The serve paths behind the package routes: the packument merge behind
GET /{pkg}.
This is the data-plane handler module for packuments. It composes the
slices that decide what to serve -- the registry client
(Ecluse.Core.Registry.Npm), the per-version rules (Ecluse.Core.Rules), the structural
filter (Ecluse.Core.Registry.Npm.Filter), the cross-upstream merge
(Ecluse.Core.Package.Merge), the metadata cache (Ecluse.Core.Server.Cache), the
own-ETag conditional (Ecluse.Core.Server.Conditional), and the serve-outcome status
(Ecluse.Core.Server.Response) -- into one action in the
Handler reader, reading its mount's serve dependencies and
the request runtime ServeRuntime from the request's
RequestCtx.
Credential authority
This handler implements the default passthrough credential posture (see
docs/architecture/access-model.md). The invariant that holds under every
strategy is the public strip: the client's credential is stripped before any
public-upstream fetch, which is always anonymous -- sending an internal token to the
public registry would be a credential disclosure, so the public-upstream fetch is built
with no token at all. Under passthrough the client's own credential is additionally
forwarded verbatim to the private upstream, which is the authority for who may
read what. The two origins are fetched concurrently, each with its own credential
posture; nothing shares a token across the trust split.
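The credential split above can be sketched as a pure function from origin to outbound credential. This is a minimal illustration, not the module's code: `Credential`, `Origin`, `outboundCredential` and `authHeaders` are hypothetical names, and the credential is assumed to be the raw `Authorization` header value the client sent.

```haskell
-- Hypothetical names throughout; a sketch of the passthrough posture only.

-- | The client's credential, assumed here to be the raw Authorization value.
newtype Credential = Credential String
  deriving (Eq, Show)

data Origin = PrivateUpstream | PublicUpstream
  deriving (Eq, Show)

-- | The credential attached to an outbound fetch for the given origin.
outboundCredential :: Origin -> Maybe Credential -> Maybe Credential
outboundCredential PublicUpstream  _      = Nothing  -- public strip: always anonymous
outboundCredential PrivateUpstream client = client   -- forwarded verbatim

-- | Render the outbound Authorization header, if any.
authHeaders :: Maybe Credential -> [(String, String)]
authHeaders Nothing               = []
authHeaders (Just (Credential v)) = [("Authorization", v)]
```

Because the public case ignores its second argument, no code path can attach the client's token to a public-upstream request.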
Because passthrough makes the private upstream the per-client authority, its
metadata is not cached across clients here: the private origin is fetched and parsed on
every request with that client's own credential, so the upstream re-authorises each
client itself, and only the anonymous public origin is cached (one shared document, no
per-client authority to preserve). Caching the private origin keyed by base URL alone
would let one client's cached entry serve another client's private document within the
TTL, bypassing the upstream's authorisation -- a cross-client disclosure. (Other
strategies make the private origin shareable by authorising each serve differently; the
metadata cache itself stays credential-free regardless -- see
docs/architecture/access-model.md → Caching.)
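The caching rule under passthrough can be sketched as a key function that has no key for the private origin. The names (`CacheKey`, `metadataCacheKey`) and the choice of base URL plus package name as the public key are assumptions for illustration; the real cache lives in Ecluse.Core.Server.Cache.

```haskell
-- Hypothetical sketch: which origins get a shared metadata cache entry.

data Origin = PrivateUpstream | PublicUpstream
  deriving (Eq, Show)

-- | A credential-free cache key (assumed shape: base URL and package name).
data CacheKey = CacheKey
  { keyBaseUrl :: String
  , keyPackage :: String
  } deriving (Eq, Show)

-- | Nothing means "fetch on every request, never cache across clients".
metadataCacheKey :: Origin -> String -> String -> Maybe CacheKey
metadataCacheKey PublicUpstream  baseUrl pkg = Just (CacheKey baseUrl pkg)
metadataCacheKey PrivateUpstream _       _   = Nothing
```

Returning `Nothing` for the private origin, rather than a key that includes the credential, keeps the cache itself credential-free.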
Merge, not fallback
A packument is the set of available versions, spread across upstreams, so it is
merged rather than short-circuited on a private hit (see
docs/architecture/registry-model.md → "Packument merge across upstreams").
Private versions are trusted and enter unfiltered; public versions are gated
through the rules and the structural filter (the FilterPlan's survivors restrict
the typed view) before they enter; the two are then combined, with private winning any
collision and any integrity divergence flagged. If one upstream
is unavailable while the other succeeds, the best-effort union of what resolved is
served -- only when nothing resolves does the request error.
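A minimal sketch of merge-not-fallback, assuming each upstream's contribution has already been reduced to a map from version to integrity string (and the public one already gated). `mergeVersions` and `MergeResult` are illustrative names, not the real mergePackuments or MergePlan.

```haskell
import qualified Data.Map.Strict as Map
import Data.Maybe (fromMaybe)

type Version   = String
type Integrity = String

data Source = Private | Public
  deriving (Eq, Show)

data MergeResult = MergeResult
  { winners   :: Map.Map Version (Source, Integrity)  -- who won each version
  , divergent :: [Version]  -- present in both with differing integrity
  } deriving (Eq, Show)

-- | Merge two optional contributions. Nothing models an unavailable
-- upstream; the result is Nothing only when neither resolved.
mergeVersions
  :: Maybe (Map.Map Version Integrity)  -- private, trusted as-is
  -> Maybe (Map.Map Version Integrity)  -- public, already gated
  -> Maybe MergeResult
mergeVersions Nothing Nothing = Nothing
mergeVersions priv pub = Just MergeResult
  { winners   = Map.union (tag Private p) (tag Public q)  -- left-biased: private wins
  , divergent = Map.keys (Map.filter id (Map.intersectionWith (/=) p q))
  }
  where
    p = fromMaybe Map.empty priv
    q = fromMaybe Map.empty pub
    tag s = Map.map (\i -> (s, i))
```

`Map.union` is left-biased, which is what gives the private source precedence on a collision without a separate conflict pass.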
Decision surface vs served surface
The merge and filter reason over the typed PackageInfo but the document served
is the raw upstream JSON, so every unmodeled wire key survives (see
docs/architecture/registry-model.md → "Decision surface vs served surface").
The MergePlan names, for each surviving version, the source that won it; the
served body is assembled in one pass by the mount's assembly hook
(assembleMergedPackument for npm): each
survivor's object is taken from the raw Value of its winning source with its
tarball URL rewritten under the mount base as it is placed, the reconciled
dist-tags and time are carried from the plan, and every other top-level key is
relayed from the precedence-winning document. The typed model is never
re-serialised. The two fields the merge owns as a decision -- dist-tags.latest
and the time instants -- are re-rendered from that decision (the times as
normalised ISO-8601), so they may differ byte-for-byte from any single upstream
while denoting the same value; integrity-bearing fields (dist.integrity,
dist.tarball up to the rewrite's own prefix) are relayed raw and untouched. The
served bytes get our own ETag, since a merged/filtered body matches no single
upstream's.
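The one-pass assembly can be illustrated with a simplified stand-in for the raw JSON: each version object is a flat string map rather than an aeson Value. Everything here is hypothetical, including the rewritten tarball shape (`<mount base>/<pkg>/-/<file>`), which is assumed for the example and is not taken from assembleMergedPackument.

```haskell
import qualified Data.Map.Strict as Map

data Source = Private | Public
  deriving (Eq, Ord, Show)

-- | Stand-in for a raw version object: every key, modeled or not.
type RawVersion = Map.Map String String

-- | Stand-in for one upstream's raw document: version -> raw object.
type RawDoc = Map.Map String RawVersion

-- | Last path segment of a URL.
fileName :: String -> String
fileName = reverse . takeWhile (/= '/') . reverse

-- | Rewrite only the tarball key; every other key is left untouched.
rewriteTarball :: String -> String -> RawVersion -> RawVersion
rewriteTarball mountBase pkg =
  Map.adjust (\u -> mountBase ++ "/" ++ pkg ++ "/-/" ++ fileName u) "tarball"

-- | For each version in the plan, take the raw object from the source
-- that won it and rewrite its tarball URL as it is placed.
assembleVersions
  :: String -> String -> Map.Map String Source -> Map.Map Source RawDoc -> RawDoc
assembleVersions mountBase pkg plan docs = Map.mapMaybeWithKey pick plan
  where
    pick version source = do
      doc <- Map.lookup source docs
      raw <- Map.lookup version doc
      pure (rewriteTarball mountBase pkg raw)
```

Since the raw object is copied rather than rebuilt from a typed model, a key the sketch knows nothing about passes through unchanged.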
Synopsis
- servePackument :: PackageName -> Request -> (Response -> IO ResponseReceived) -> Handler ResponseReceived
- headPackument :: PackageName -> Request -> (Response -> IO ResponseReceived) -> Handler ResponseReceived
- withPublicMetadataClient :: ServeRuntime -> PackumentDeps -> Text -> (MetadataClient -> IO a) -> Handler a
- packumentETag :: Text -> PackageName -> [(Provenance, ContentDigest, [Text])] -> ETag
Documentation
servePackument :: PackageName -> Request -> (Response -> IO ResponseReceived) -> Handler ResponseReceived
Serve a GET /{pkg} packument request end to end, over the request's
RequestCtx.
The mount's PackumentDeps and error renderer are read from the matched
MountBinding in context, not threaded as arguments. When the mount has no
packument-serve dependencies wired, the route is recognised but not served -- a
501 in the mount's surface -- rather than fabricating a result.
With dependencies wired: the edge token, if configured, is validated before any
upstream is touched. Then the private and public upstreams are fetched
concurrently -- the client's credential forwarded to the private origin, the public
origin anonymous -- each parse failure or unavailable upstream degrading to a missing
contribution rather than an error. Private versions are trusted as-is; public
versions are gated through the rules and the structural filter (the FilterPlan);
the surviving sets are merged (mergePackuments) and the MergePlan assembled
onto the raw upstream Values to build the served body,
which is then answered against the client's conditional request with our own ETag.
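The concurrent fetch with per-origin degradation can be sketched with base alone. `fetchBoth` and `fetchDegrading` are hypothetical helpers; the real handler presumably uses its own concurrency and error types, and this sketch collapses every exception to a missing contribution for brevity.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (SomeException, try)

toMaybe :: Either SomeException a -> Maybe a
toMaybe = either (const Nothing) Just

-- | Run a fetch, turning any failure into a missing contribution.
-- (Catching SomeException is a simplification for this sketch.)
fetchDegrading :: IO a -> IO (Maybe a)
fetchDegrading act = toMaybe <$> try act

-- | Fetch the private and public origins concurrently. Neither failure
-- aborts the other; the caller decides what to do when both are missing.
fetchBoth :: IO a -> IO b -> IO (Maybe a, Maybe b)
fetchBoth private public = do
  box  <- newEmptyMVar
  _    <- forkIO (fetchDegrading public >>= putMVar box)
  priv <- fetchDegrading private
  pub  <- takeMVar box
  pure (priv, pub)
```

For example, `fetchBoth (pure 1) (ioError (userError "down"))` yields a private contribution and a missing public one instead of propagating the error.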
When nothing survives, the status follows the most recoverable cause via
packumentStatus. An origin whose self-reported packument name disagrees with the
route is validated out -- dropped as untrusted for this request and logged -- so a
single misreporting upstream never denies a package another upstream serves; when
that leaves no valid origin, the request is a 502 (a responding upstream
returned an invalid response), distinct from a genuine absence. Every refusal -- the
edge 401 and the no-survivors 403/503/502/500 -- is rendered through the
mount's MountRenderer.
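The name validation described above amounts to partitioning the responding origins by whether their self-reported name matches the route. The types below are illustrative only; `validateNames` is not a function of this module.

```haskell
import Data.List (partition)

data Origin = Origin
  { originLabel  :: String  -- e.g. "private" or "public"
  , reportedName :: String  -- the name field of the packument it returned
  } deriving (Eq, Show)

data Validation
  = NoneResponded        -- nothing to validate: a genuine absence
  | Proceed [Origin]     -- at least one origin agrees with the route
  | BadGateway [Origin]  -- origins responded, but all misreported the name
  deriving (Eq, Show)

-- | Drop origins whose reported name disagrees with the routed package.
validateNames :: String -> [Origin] -> Validation
validateNames _ [] = NoneResponded
validateNames route responded =
  case partition ((== route) . reportedName) responded of
    ([], dropped) -> BadGateway dropped  -- would be rendered as a 502
    (valid, _)    -> Proceed valid       -- dropped origins would be logged
```

A single misreporting upstream therefore only removes itself; the 502 arises only when no responding origin is left.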
headPackument :: PackageName -> Request -> (Response -> IO ResponseReceived) -> Handler ResponseReceived
Serve a HEAD /{pkg} packument request: the identical pipeline and gating as
servePackument -- the same fetch, merge, filter, rule decision, and no-survivors
status -- answered with the identical status and headers as the GET (the would-be
merged body's Content-Length and the own ETag the conditional-request machinery
computes), but with the body suppressed, as HTTP semantics require of a
HEAD reply.
A packument body is assembled locally (a metadata fetch plus the cross-upstream
merge), so -- unlike the tarball HEAD (headTarball) -- answering it pumps no
artifact body and carries no egress-amplification risk: this is the HTTP-correctness
half of the explicit-HEAD handling, not the DoS lever the tarball path closes. The
merged body is still materialised, to size it and compute its ETag; only the bytes
are withheld from the reply.
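The relationship between the GET and HEAD replies can be sketched with a toy reply record. `Reply`, `getReply` and `headReply` are hypothetical; the real handler works with the Response type shown in the signatures.

```haskell
import qualified Data.ByteString.Char8 as BC

data Reply = Reply
  { replyStatus  :: Int
  , replyHeaders :: [(String, String)]
  , replyBody    :: BC.ByteString
  } deriving (Eq, Show)

-- | The GET reply: the body is materialised, so it can be sized.
getReply :: Int -> String -> BC.ByteString -> Reply
getReply code etag payload = Reply
  { replyStatus  = code
  , replyHeaders = [("ETag", etag), ("Content-Length", show (BC.length payload))]
  , replyBody    = payload
  }

-- | The HEAD reply: identical status and headers, body withheld.
headReply :: Reply -> Reply
headReply r = r { replyBody = BC.empty }
```

Deriving the HEAD reply from the GET reply, rather than building it separately, is one way to keep the two from drifting apart in status or headers.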
withPublicMetadataClient :: ServeRuntime -> PackumentDeps -> Text -> (MetadataClient -> IO a) -> Handler a
The derived validator (exported for its unit spec)
packumentETag :: Text -> PackageName -> [(Provenance, ContentDigest, [Text])] -> ETag
The derived packument validator: a SHA-256 over the serve's inputs -- the mount base URL, the package name, and per source (in merge order) its provenance, its origin body's digest, and the version keys that survived its gate.
The served document is a deterministic function of exactly these (the merge plan
derives from the gated typed views, which derive from the origin bytes and the
survivor sets; the assembly then edits the origin documents under the mount base
URL), so this tag can never call a changed document unchanged. It may change when
the re-assembled bytes would not have -- a spurious 200, never a wrong 304 --
which is the correct slack for a validator. Deriving it from inputs is what lets a
304 skip assembly, encoding, and any output hashing entirely.
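Why an input-derived tag lets a 304 skip assembly can be shown in a few lines. This sketch compares a single `If-None-Match` value by plain equality, which is a simplification of HTTP's comparison rules, and `answerConditional` is a hypothetical name rather than the Ecluse.Core.Server.Conditional API.

```haskell
data Answer
  = NotModified String   -- 304 carrying the ETag
  | Full String String   -- 200 carrying the ETag and the assembled body
  deriving (Eq, Show)

-- | Answer a conditional request. The assembly action is only run when
-- the client's validator does not match the tag derived from the inputs.
answerConditional
  :: Maybe String  -- If-None-Match, if the client sent one
  -> String        -- ETag derived from the serve's inputs
  -> IO String     -- the (expensive) assembly of the served body
  -> IO Answer
answerConditional ifNoneMatch etag assemble
  | ifNoneMatch == Just etag = pure (NotModified etag)
  | otherwise                = Full etag <$> assemble
```

Because the tag is computed before the body exists, the matching branch never touches `assemble`; a tag hashed from output bytes could not offer that.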
Fields are fed to the hash with unambiguous framing: the digest is fixed-width, the
variable-length pieces are NUL-terminated, and each source block closes with an
\SOH terminator, so no concatenation of adjacent fields can collide with another
split of the same bytes. The leading salt versions the scheme: bump it when the
assembly's behaviour changes so pre-change client caches revalidate as modified.