| Safe Haskell | None |
|---|---|
| Language | GHC2021 |
Ecluse.Core.Server.Pipeline.Tarball
Description
The serve paths behind the package routes: the artifact relay behind GET /{pkg}/-/{file}.tgz.
This is the data-plane handler module for artifacts. It composes the
slices that decide what to serve into one action in the
Handler reader, reading its mount's serve dependencies and
the request runtime ServeRuntime from the request's
RequestCtx.
Artifact path
The tarball handler (serveTarball) is the demand-driven artifact relay. Its two legs
locate the tarball differently, according to how far each origin is trusted.
The private leg is a conventional stable read: it fetches the tarball at
{pdPrivateBaseUrl}/{pkg}/-/{file} (artifactRequestByFile), addressed by the
client's requested filename, without a private-packument fetch -- the stable,
cacheable shape an npm ci install issues, so a worst-case lockfile fan-out pays one
artifact round-trip per tarball rather than a packument fetch+decode per tarball it
would only discard. The request forwards the client's credential over the
trusted manager, attached at the single bearer-attach point
(withToken), which pins redirectCount = 0: this
credential-bearing read never follows a redirect (a private CDN 302 is returned to
the serve path, not chased with the bearer). The constructed URL is on the private base
host, so the TrustedOrigin tarball-host gate is satisfied
same-host, and the trusted origin is exempt from the internal-range block (a private
registry on an internal address still serves). A 2xx streams the artifact through with
bounded memory (the withResponse/responseStream relay, never a buffering fetch)
and answers the request; a non-2xx status or a connection failure is a __clean
miss__ that falls through to the public leg.
The private leg applies no serve-time integrity floor. An established version pinned
in a consumer's lockfile and served from an operator-trusted private registry is
fast-tracked: its bytes are still verified client-side by npm (against the
dist.integrity it resolved over the packument route) and by the mirror worker on
ingestion, so fast-tracking gives up only the proactive "refuse weak-integrity" stance,
not tamper-evidence. A consequence of the conventional read: a private upstream that
serves its tarball off the conventional /-/ path (a separate files host, a signed
CDN URL the convention cannot rebuild) is not reached by this leg, so it is a private
miss that falls through to the public origin.
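The private leg's conventional read can be sketched as two pure steps: build the conventional URL from the requested filename, then classify the upstream answer as a stream-through or a clean miss. This is a minimal sketch, not the module's code: `conventionalArtifactUrl`, `PrivateOutcome`, and `classifyPrivate` are hypothetical names, the base URL is a placeholder, and the conditional-GET 304 relay is left out.

```haskell
import Data.List (dropWhileEnd)

-- | Hypothetical stand-in for the conventional artifact URL the private leg
-- reads: {base}/{pkg}/-/{file}. Trailing slashes on the base are dropped so
-- the join never yields "//".
conventionalArtifactUrl :: String -> String -> String -> String
conventionalArtifactUrl base pkg file =
  dropWhileEnd (== '/') base ++ "/" ++ pkg ++ "/-/" ++ file

-- | What the private leg does with an upstream answer (illustrative type).
data PrivateOutcome
  = StreamThrough -- 2xx: relay the body and answer the request
  | CleanMiss     -- anything else: fall through to the public leg
  deriving (Eq, Show)

-- | Nothing models a connection failure; Just carries the HTTP status code.
-- A 3xx is a miss too, since this credential-bearing read never follows a
-- redirect.
classifyPrivate :: Maybe Int -> PrivateOutcome
classifyPrivate (Just code)
  | code >= 200 && code < 300 = StreamThrough
classifyPrivate _ = CleanMiss
```

Under these assumptions, `classifyPrivate (Just 302)` and `classifyPrivate Nothing` both evaluate to `CleanMiss`, which is the fall-through described above for a private CDN redirect or an unreachable private registry.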
The public leg honours the authoritative upstream location -- the
Artifact.artUrl the projection preserved from the gated version's dist.tarball,
selected by the requested filename -- rather than reconstructing the conventional path,
so the proxy can front a public registry that serves its artifacts from a separate host
or an off-convention path (a CDN/files host, a signed URL). That location is gated, not
trusted: it is fetched only when the tarball-host policy
(tarballHostAllowed, per ECLUSE_RESPECT_UPSTREAM_TARBALL_HOST)
admits its host (the default refuses a cross-host dist.tarball), and the untrusted
egress is https-only with certificate validation. The public leg is anonymous: it
gates that one version against the rules (the same machinery the packument path
gates the whole set with) and selects the artifact, and on an admit __streams the public
bytes from artUrl and enqueues a MirrorJob__ (naming that
authoritative URL) for the worker to back-fill the mirror target; on a reject --
including a host the tarball-host policy refuses -- it renders the serve error model
(403/503/500/404) through the mount's renderer. The enqueue is
serve-then-enqueue, best-effort and non-blocking: the artifact reaches the client
first, and an enqueue failure is swallowed rather than failing or delaying the response.
Mirroring is demand-driven -- a job is enqueued only here, on a tarball-path admit,
never when a packument is filtered. The two legs are not peers over time: the
back-fill retires each artifact from the public leg, so at steady state the private
conventional read serves the vast majority of tarball traffic and the public leg is
the transient onboarding/fail-over ramp (see
docs/architecture/registry-model.md → "Traffic shape over time"). The serve path does not verify dist.integrity;
the client checks the artifact's own hash and the worker re-verifies before publishing.
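The tarball-host gate on the public leg can be illustrated with a small pure check. This is a hedged sketch only: `HostPolicy`, `schemeAndHost`, and `tarballHostAdmits` are hypothetical names, the reading of ECLUSE_RESPECT_UPSTREAM_TARBALL_HOST as a two-way switch is an assumption, the URL parsing is deliberately naive (a real implementation would use a URI parser), and the internal-range block is not modelled.

```haskell
import Data.Char (toLower)
import Data.List (stripPrefix)

-- | Hypothetical policy switch standing in for
-- ECLUSE_RESPECT_UPSTREAM_TARBALL_HOST.
data HostPolicy
  = SameHostOnly        -- assumed default: refuse a cross-host dist.tarball
  | RespectUpstreamHost -- assumed opt-in: honour the host upstream names
  deriving (Eq, Show)

-- | Split "scheme://rest" at the first "://".
splitScheme :: String -> Maybe (String, String)
splitScheme = go ""
  where
    go acc rest = case stripPrefix "://" rest of
      Just after -> Just (reverse acc, after)
      Nothing -> case rest of
        [] -> Nothing
        (c : cs) -> go (c : acc) cs

-- | Lower-cased (scheme, host). URLs carrying userinfo ('@') are rejected
-- outright so "https://good.test:x@evil.test/" cannot pass as good.test.
-- IPv6 literals are not handled by this naive parser.
schemeAndHost :: String -> Maybe (String, String)
schemeAndHost url = do
  (scheme, rest) <- splitScheme url
  let authority = takeWhile (`notElem` "/?#") rest
      host = takeWhile (/= ':') authority
  if null scheme || null host || '@' `elem` authority
    then Nothing
    else Just (map toLower scheme, map toLower host)

-- | Admit the upstream location only over https, and only cross-host when
-- the policy allows it. Anything unparseable is refused.
tarballHostAdmits :: HostPolicy -> String -> String -> Bool
tarballHostAdmits policy registryBase tarballUrl =
  case (schemeAndHost registryBase, schemeAndHost tarballUrl) of
    (Just (_, baseHost), Just (scheme, host)) ->
      scheme == "https" && (policy == RespectUpstreamHost || host == baseHost)
    _ -> False
```

The sketch fails closed: a location it cannot parse, a plain-http location, and a cross-host location under the assumed default are all refused, matching the "gated, not trusted" stance of the public leg.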
An artifact is a pass-through body -- served byte-identical to upstream's -- so its
conditional-GET handling relays rather than computing an own ETag (see
docs/architecture/web-layer.md → "Middleware and helper libraries", and contrast
the merged-packument own-ETag path): the client's If-None-Match/If-Modified-Since
are forwarded onto the upstream artifact request on both legs (forwardValidators),
and an upstream 304 Not Modified is relayed straight back to the client as a bodiless
304 (isNotModified via the relay's accept predicate) rather than re-downloading the
tarball -- the cheap freshness check on the hot artifact path.
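The relay-style conditional GET can be sketched as a header filter plus an accept predicate. The names below (`forwardValidatorsSketch`, `isNotModifiedSketch`, `acceptForRelay`) and the `Header` representation are illustrative assumptions, not the signatures of forwardValidators or isNotModified in this module.

```haskell
import Data.Char (toLower)

-- | Illustrative header representation: (name, value).
type Header = (String, String)

-- | Keep only the client's validators, matched case-insensitively, so they
-- can be copied onto the upstream artifact request.
forwardValidatorsSketch :: [Header] -> [Header]
forwardValidatorsSketch = filter isValidator
  where
    isValidator (name, _) =
      map toLower name `elem` ["if-none-match", "if-modified-since"]

-- | An upstream 304 means the client's copy is still fresh.
isNotModifiedSketch :: Int -> Bool
isNotModifiedSketch = (== 304)

-- | Accept predicate for the relay: a 2xx streams the body through, a 304
-- is relayed bodiless, and anything else is not accepted.
acceptForRelay :: Int -> Bool
acceptForRelay code = (code >= 200 && code < 300) || isNotModifiedSketch code
```

Because the body is served byte-identical to upstream's, upstream's validators stay meaningful to the client, which is why forwarding them is enough and no own ETag has to be computed.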
Synopsis
- serveTarball :: PackageName -> Version -> Filename -> Request -> (Response -> IO ResponseReceived) -> Handler ResponseReceived
- headTarball :: PackageName -> Version -> Filename -> Request -> (Response -> IO ResponseReceived) -> Handler ResponseReceived
The tarball handler
serveTarball :: PackageName -> Version -> Filename -> Request -> (Response -> IO ResponseReceived) -> Handler ResponseReceived
Serve a GET /{pkg}/-/{file}.tgz artifact request end to end, over the
request's RequestCtx.
The mount's PackumentDeps and error renderer are read from the matched
MountBinding; an unwired mount is the recognised-but-unserved 501 stub (as for
servePackument). With dependencies wired and the edge token (if any) validated, the
two legs locate the tarball by the trust of their origin:
- the private leg is a conventional stable read: it fetches
  {pdPrivateBaseUrl}/{pkg}/-/{file} by the requested filename (artifactRequestByFile), forwarding the client's credential and __without a private-packument fetch__; a 2xx streams the bytes through with bounded memory and answers the request, any other status (or a connection failure) is a clean miss that falls through. It applies no serve-time integrity floor -- the bytes are still verified client-side and by the mirror worker (see the module header → "Artifact path");
- on a private miss the public leg fetches that one version's metadata anonymously
  and gates it against the rules; an admit honours the gated
  dist.tarball, streaming the public bytes and enqueuing a MirrorJob (serve-then-enqueue, the enqueue best-effort and non-blocking), a reject renders the serve error model (403/503/500/404) through the mount's renderer.
The public-upstream fetch is always anonymous (the client credential is never sent to the
public upstream); the mirror job carries no credential. The serve path does not
verify dist.integrity (see the module header → "Artifact path").
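The control flow of the two legs, including serve-then-enqueue, can be modelled with plain IO actions. This is a sketch under stated assumptions: `serveSketch` and `Served` are hypothetical, each leg is passed in as an opaque action, and the enqueue runs inline here, so the sketch shows only the ordering and the swallowed failure, not the non-blocking behaviour.

```haskell
import Control.Exception (SomeException, try)

-- | Illustrative outcome of one artifact request.
data Served
  = FromPrivate -- the private conventional read answered with a 2xx
  | FromPublic  -- private miss, public leg admitted and streamed
  | Refused Int -- private miss, public leg rejected with this status
  deriving (Eq, Show)

serveSketch
  :: IO Bool            -- ^ private leg: True once a 2xx has been streamed
  -> IO (Either Int ()) -- ^ public leg: Left status on reject, Right () once streamed
  -> IO ()              -- ^ enqueue the mirror job; may throw
  -> IO Served
serveSketch privateLeg publicLeg enqueue = do
  hit <- privateLeg
  if hit
    then pure FromPrivate
    else do
      outcome <- publicLeg
      case outcome of
        Left status -> pure (Refused status)
        Right () -> do
          -- Serve-then-enqueue: the bytes have already reached the client,
          -- so an enqueue failure is swallowed instead of failing the reply.
          -- (Catching SomeException is a simplification for the sketch.)
          _ <- (try enqueue :: IO (Either SomeException ()))
          pure FromPublic
```

For example, `serveSketch (pure False) (pure (Right ())) (ioError (userError "queue down"))` still yields `FromPublic`: the failed enqueue does not change what the client was served.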
headTarball :: PackageName -> Version -> Filename -> Request -> (Response -> IO ResponseReceived) -> Handler ResponseReceived
Serve a HEAD /{pkg}/-/{file}.tgz artifact request end to end, over the
request's RequestCtx.
A HEAD must never run the full-GET streaming pump: a bodiless HEAD would
otherwise open the upstream artifact connection and pump a whole artifact body that
the reply then discards -- wasted upstream egress and a DoS-amplification lever (a
client forcing arbitrary full-artifact fetches with cheap HEADs). So this handler
gates the artifact through the identical pipeline as serveTarball -- the same
edge auth, host-allowlist, internal-range, and tarball-host policy, and the same
upstream-request construction -- but issues the upstream request as a HEAD and relays
its status and safe response headers (relayArtifact) with no body
(probeUpstreamWhen). On an admit no MirrorJob is enqueued: a
HEAD serves no bytes, so there is nothing to back-fill (mirroring stays demand-driven
on the GET path). A refusal renders the same serve error model with an empty body.
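The difference between the GET and HEAD handlers can be summarised as one plan that varies only in the upstream method and its two consequences. The `UpstreamPlan` record and `planFor` below are purely illustrative and do not correspond to types in this module; the "enqueue" flag describes what happens on a public-leg admit.

```haskell
-- | Illustrative request methods for the artifact route.
data Method = GET | HEAD
  deriving (Eq, Show)

-- | Hypothetical summary of what the handler does once the gates pass.
data UpstreamPlan = UpstreamPlan
  { planMethod         :: Method -- method used for the upstream request
  , planUrl            :: String -- same upstream location for GET and HEAD
  , planRelaysBody     :: Bool   -- whether the streaming pump runs
  , planEnqueuesMirror :: Bool   -- whether a public-leg admit enqueues a MirrorJob
  } deriving (Eq, Show)

-- | Both methods share the gating and the upstream location; only GET
-- pumps a body and only GET back-fills the mirror.
planFor :: Method -> String -> UpstreamPlan
planFor method url = UpstreamPlan
  { planMethod         = method
  , planUrl            = url
  , planRelaysBody     = method == GET
  , planEnqueuesMirror = method == GET
  }
```

Keeping the URL and gating identical while switching only the method is what lets a HEAD answer truthfully about the artifact without paying for its bytes.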