ecluse:ecluse-core
Safe Haskell: None
Language: GHC2021

Ecluse.Core.Server.Pipeline.Tarball

Description

The serve paths behind the package routes: the artifact relay behind GET and HEAD /{pkg}/-/{file}.tgz.

This is the data-plane handler module for artifacts. It composes the slices that decide what to serve into one action in the Handler reader monad. That action reads its mount's serve dependencies and the request runtime (ServeRuntime) from the request's RequestCtx.
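The sketch below shows that reader shape, assuming a hand-rolled Handler newtype over the request context. Handler, RequestCtx, ServeDeps, askCtx and resolveMount are hypothetical stand-ins, not ecluse-core's definitions, and ServeRuntime is reduced to a placeholder string. It also covers the unwired-mount case, which the serveTarball documentation describes as the 501 stub. Load it in GHCi to try it.

```haskell
-- Hypothetical stand-ins: ecluse-core's real Handler, RequestCtx and
-- ServeRuntime are richer; only the reader shape is modelled here.
newtype Handler a = Handler { runHandler :: RequestCtx -> IO a }

instance Functor Handler where
  fmap f (Handler g) = Handler (fmap f . g)

instance Applicative Handler where
  pure x = Handler (\_ -> pure x)
  Handler f <*> Handler g = Handler (\ctx -> f ctx <*> g ctx)

instance Monad Handler where
  Handler g >>= k = Handler (\ctx -> g ctx >>= \a -> runHandler (k a) ctx)

-- | A mount's serve dependencies (a single field, for illustration).
newtype ServeDeps = ServeDeps { privateBaseUrl :: String }

-- | The per-request context the handler reads from.
data RequestCtx = RequestCtx
  { ctxMountDeps :: Maybe ServeDeps  -- Nothing: mount recognised but not wired
  , ctxRuntime   :: String           -- placeholder for ServeRuntime
  }

-- | Read one field of the request context (like a reader's asks).
askCtx :: (RequestCtx -> a) -> Handler a
askCtx f = Handler (pure . f)

-- | Resolve the mount's dependencies: an unwired mount answers 501.
resolveMount :: Handler (Int, String)
resolveMount = do
  deps <- askCtx ctxMountDeps
  case deps of
    Nothing -> pure (501, "recognised but not served")
    Just d  -> pure (200, "relay via " ++ privateBaseUrl d)
```

Every handler action receives the same context implicitly, so the slices composed into it never pass RequestCtx around by hand.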

Artifact path

The tarball handler (serveTarball) is the demand-driven artifact relay. Its two legs locate the tarball differently, by the trust of their origin.

The private leg is a conventional stable read. It fetches the tarball at {pdPrivateBaseUrl}/{pkg}/-/{file} (artifactRequestByFile), addressed by the client's requested filename and without fetching the private packument. This is the stable, cacheable request shape an npm ci install issues. A worst-case lockfile fan-out therefore pays one artifact round-trip per tarball. It avoids an extra packument fetch and decode per tarball whose result it would only discard.

The request forwards the client's credential over the trusted manager. The credential is attached at the single bearer-attach point (withToken), which pins redirectCount = 0, so this credential-bearing read never follows a redirect. A private CDN 302 is returned to the serve path, not chased with the bearer.

The constructed URL is on the private base host, so the TrustedOrigin tarball-host gate is satisfied same-host. The trusted origin is also exempt from the internal-range block, so a private registry on an internal address still serves.

A 2xx streams the artifact through with bounded memory (the withResponse/responseStream relay, never a buffering fetch) and answers the request. A non-2xx status or a connection failure is a clean miss that falls through to the public leg.
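The private leg's addressing and miss rule can be sketched as two pure functions. conventionalTarballUrl and classifyPrivate are hypothetical names. The module's own request builder is artifactRequestByFile, whose signature is not reproduced here. The sketch ignores credentials, streaming and the origin gates, and it assumes the package name and filename are already path-safe.

```haskell
import Data.List (dropWhileEnd)

-- | Outcome of the private leg (a hypothetical type for this sketch).
data PrivateOutcome
  = PrivateHit Int  -- ^ 2xx: stream the body through and answer the request
  | PrivateMiss     -- ^ any other status, or a connection failure: fall through
  deriving (Eq, Show)

-- | Build {base}/{pkg}/-/{file} from the requested filename alone, with no
-- private packument fetch. Assumes pkg and file are already path-safe
-- (scoped-package escaping is not handled here).
conventionalTarballUrl :: String -> String -> String -> String
conventionalTarballUrl base pkg file =
  dropWhileEnd (== '/') base ++ "/" ++ pkg ++ "/-/" ++ file

-- | Classify the private read. Nothing models a connection failure; a 3xx is
-- a miss because the credential-bearing read never follows redirects.
classifyPrivate :: Maybe Int -> PrivateOutcome
classifyPrivate (Just status)
  | status >= 200 && status < 300 = PrivateHit status
classifyPrivate _ = PrivateMiss
```

A 302 from a private CDN lands on PrivateMiss instead of being followed. This matches the redirectCount = 0 behaviour of the credential-bearing read.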

The private leg applies no serve-time integrity floor. An established version pinned in a consumer's lockfile and served from an operator-trusted private registry is fast-tracked. Its bytes are still verified client-side by npm, against the dist.integrity it resolved over the packument route, and by the mirror worker on ingestion. Fast-tracking therefore gives up only the proactive "refuse weak-integrity" stance, not tamper-evidence.

One consequence of the conventional read concerns off-convention private upstreams. A private upstream that serves its tarballs off the conventional /-/ path (a separate files host, or a signed CDN URL the convention cannot rebuild) is never reached by this leg. Each such request is a private miss that falls through to the public origin.

The public leg honours the authoritative upstream location: the Artifact.artUrl that the projection preserved from the gated version's dist.tarball, selected by the requested filename. It does not reconstruct the conventional path. The proxy can therefore front a public registry that serves its artifacts from a separate host or an off-convention path (a CDN/files host, a signed URL).

That location is gated, not trusted. It is fetched only when the tarball-host policy (tarballHostAllowed, per ECLUSE_RESPECT_UPSTREAM_TARBALL_HOST) admits its host; the default refuses a cross-host dist.tarball. The untrusted egress is https-only with certificate validation.

The public leg is anonymous. It gates the one requested version against the rules, using the same machinery the packument path uses to gate the whole version set, and selects the artifact.

On an admit, the public leg streams the public bytes from artUrl. It then enqueues a MirrorJob naming that authoritative URL, for the worker to back-fill the mirror target. On a reject, including a host the tarball-host policy refuses, it renders the serve error model (403/503/500/404) through the mount's renderer.

The enqueue is serve-then-enqueue, best-effort and non-blocking. The artifact reaches the client first, and an enqueue failure is swallowed rather than failing or delaying the response. Mirroring is demand-driven: a job is enqueued only here, on a tarball-path admit, never when a packument is filtered.

The two legs are not peers over time. The back-fill retires each artifact from the public leg. At steady state the private conventional read serves the vast majority of tarball traffic, and the public leg is the transient onboarding/fail-over ramp (see docs/architecture/registry-model.md → "Traffic shape over time").

The serve path does not verify dist.integrity. The client checks the artifact's own hash, and the worker re-verifies it before publishing.
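The public-leg host gate can be sketched under some stated assumptions:

- publicArtifactAdmitted and httpsHost are hypothetical names.
- ECLUSE_RESPECT_UPSTREAM_TARBALL_HOST is modelled as a plain on/off switch.
- The internal-range block and certificate validation are left to the HTTP client and not shown.

The sketch checks only the https scheme and whether the artUrl host matches the public registry's own host.

```haskell
import Data.Char (toLower)
import Data.List (stripPrefix)

-- | Host of an https URL, lower-cased; Nothing for any other scheme or an
-- empty host. A deliberately small parser, for illustration only.
httpsHost :: String -> Maybe String
httpsHost url = do
  rest <- stripPrefix "https://" (map toLower url)
  let host = takeWhile (`notElem` "/:?#") rest
  if null host then Nothing else Just host

-- | Hypothetical stand-in for the tarball-host gate on the public leg.
-- respectUpstreamHost models ECLUSE_RESPECT_UPSTREAM_TARBALL_HOST as an
-- on/off switch (an assumption); when off, only the public registry's own
-- host is admitted, so a cross-host dist.tarball is refused.
publicArtifactAdmitted :: Bool -> String -> String -> Bool
publicArtifactAdmitted respectUpstreamHost publicHost artUrl =
  case httpsHost artUrl of
    Nothing   -> False  -- untrusted egress is https-only
    Just host -> respectUpstreamHost || host == map toLower publicHost
```

The host is cut at the first "/", ":", "?" or "#". A userinfo prefix such as registry.npmjs.org@evil.example therefore stays part of the host string and fails the same-host comparison under the default.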

An artifact is a pass-through body, served byte-identical to upstream's. Its conditional-GET handling therefore relays validators rather than computing its own ETag (see docs/architecture/web-layer.md → "Middleware and helper libraries"). Contrast the merged-packument path, which computes its own ETag.

The client's If-None-Match/If-Modified-Since headers are forwarded onto the upstream artifact request on both legs (forwardValidators). An upstream 304 Not Modified is relayed straight back to the client as a bodiless 304 (isNotModified, via the relay's accept predicate) instead of re-downloading the tarball. This is the cheap freshness check on the hot artifact path.
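The relay side of conditional GET can be sketched as a header filter plus a status decision. validatorsToForward stands in for forwardValidators, and relayAction stands in for the relay's accept predicate. Both names, and the String-pair header type, are assumptions made for illustration.

```haskell
import Data.Char (toLower)

type Header = (String, String)

-- | Copy only the client's conditional-GET validators onto the upstream
-- artifact request (a hypothetical stand-in for forwardValidators).
-- Header names are compared case-insensitively, as HTTP requires.
validatorsToForward :: [Header] -> [Header]
validatorsToForward = filter (isValidator . map toLower . fst)
  where
    isValidator name = name `elem` ["if-none-match", "if-modified-since"]

-- | What the relay does with the upstream status.
data RelayAction
  = StreamThrough     -- ^ 2xx: stream the byte-identical body
  | RelayNotModified  -- ^ 304: answer with a bodiless 304, no re-download
  | NotAccepted       -- ^ anything else: handled by the leg (miss or error)
  deriving (Eq, Show)

relayAction :: Int -> RelayAction
relayAction status
  | status == 304                 = RelayNotModified
  | status >= 200 && status < 300 = StreamThrough
  | otherwise                     = NotAccepted
```

The proxy never inspects the ETag value: it forwards the client's validators and lets upstream decide freshness. That is only sound because the body is byte-identical to upstream's.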


The tarball handler

serveTarball :: PackageName -> Version -> Filename -> Request -> (Response -> IO ResponseReceived) -> Handler ResponseReceived

Serve a GET /{pkg}/-/{file}.tgz artifact request end to end, over the request's RequestCtx.

The mount's PackumentDeps and error renderer are read from the matched MountBinding. An unwired mount answers with the recognised-but-unserved 501 stub, as servePackument does. With dependencies wired and the edge token (if any) validated, the two legs locate the tarball by the trust of their origin:

  • the private leg is a conventional stable read. It fetches {pdPrivateBaseUrl}/{pkg}/-/{file} by the requested filename (artifactRequestByFile), forwarding the client's credential and without fetching the private packument. A 2xx streams the bytes through with bounded memory and answers the request. Any other status (or a connection failure) is a clean miss that falls through to the public leg. The leg applies no serve-time integrity floor: the bytes are still verified client-side and by the mirror worker (see the module header → "Artifact path");
  • on a private miss, the public leg fetches the requested version's metadata anonymously and gates it against the rules. An admit honours the gated dist.tarball: it streams the public bytes and enqueues a MirrorJob (serve-then-enqueue, with the enqueue best-effort and non-blocking). A reject renders the serve error model (403/503/500/404) through the mount's renderer.
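The leg ordering and the serve-then-enqueue rule can be sketched with every effect passed in as an argument. serveArtifact, PublicVerdict and demoEventOrder are hypothetical. The callback-style respond only echoes WAI's (Response -> IO ResponseReceived) continuation. The sketch runs the enqueue synchronously after responding, assuming the supplied enqueue action does not itself block.

```haskell
import Control.Exception (IOException, try)
import Data.IORef (modifyIORef, newIORef, readIORef)

-- | Verdict of gating the one requested version on the public leg.
data PublicVerdict
  = Admit String  -- ^ admitted; carries the gated artUrl
  | Reject Int    -- ^ refused; carries the error-model status (403/503/500/404)

-- | Hypothetical control flow of the two legs, with every effect passed in.
serveArtifact
  :: IO (Maybe String)        -- ^ private leg: Just body on a 2xx, Nothing on a clean miss
  -> IO PublicVerdict         -- ^ public leg: anonymous fetch and gate of the one version
  -> (String -> IO String)    -- ^ stream the public bytes from artUrl
  -> (String -> IO ())        -- ^ enqueue a mirror job naming artUrl (assumed non-blocking)
  -> ((Int, String) -> IO r)  -- ^ send the response to the client
  -> IO r
serveArtifact privateLeg publicGate streamFrom enqueue respond = do
  hit <- privateLeg
  case hit of
    Just body -> respond (200, body)           -- private hit: no mirror job
    Nothing -> do
      verdict <- publicGate
      case verdict of
        Reject status -> respond (status, "")  -- rendered error body elided here
        Admit artUrl -> do
          body <- streamFrom artUrl
          sent <- respond (200, body)          -- the client gets the artifact first
          -- best-effort: an IO failure while enqueueing is swallowed
          _ <- try (enqueue artUrl) :: IO (Either IOException ())
          pure sent

-- | Usage: run one public admit and return the order of events.
demoEventOrder :: IO [String]
demoEventOrder = do
  events <- newIORef []
  let note e = modifyIORef events (++ [e])
  serveArtifact
    (pure Nothing)                                            -- private miss
    (pure (Admit "https://registry.example/a/-/a-1.0.0.tgz")) -- public admit
    (\_ -> pure "tarball bytes")
    (\_ -> note "enqueue")
    (\(status, _) -> note ("respond " ++ show status))
  readIORef events
```

With a private miss and a public admit, demoEventOrder returns ["respond 200", "enqueue"]. The response is recorded before the enqueue, and a failing enqueue does not change the status already sent.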

The public-upstream fetch is always anonymous (the client credential is never sent to the public upstream); the mirror job carries no credential. The serve path does not verify dist.integrity (see the module header → "Artifact path").
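Which client headers reach which upstream can be sketched as a per-leg filter. upstreamHeaders, Leg and this MirrorJob newtype are hypothetical. The real credential is attached at withToken; copying a raw Authorization header is a simplification made here.

```haskell
import Data.Char (toLower)

type Header = (String, String)

data Leg = PrivateLeg | PublicLeg
  deriving (Eq, Show)

-- | Client headers copied onto an upstream artifact request (hypothetical).
-- The credential, modelled here as the client's Authorization header, is
-- forwarded only to the trusted private origin; the public fetch is always
-- anonymous. Conditional-GET validators go to both legs.
upstreamHeaders :: Leg -> [Header] -> [Header]
upstreamHeaders leg = filter (allowed . map toLower . fst)
  where
    allowed name =
      name `elem` ["if-none-match", "if-modified-since"]
        || (leg == PrivateLeg && name == "authorization")

-- | A hypothetical mirror-job shape: it names only the authoritative
-- artifact URL and has no field that could carry a credential.
newtype MirrorJob = MirrorJob { mirrorArtifactUrl :: String }
  deriving (Eq, Show)
```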

headTarball :: PackageName -> Version -> Filename -> Request -> (Response -> IO ResponseReceived) -> Handler ResponseReceived

Serve a HEAD /{pkg}/-/{file}.tgz artifact request end to end, over the request's RequestCtx.

A HEAD must never run the full-GET streaming pump. Otherwise a bodiless HEAD would open the upstream artifact connection and pump a whole artifact body that the reply then discards. That wastes upstream egress and gives clients a DoS-amplification lever: cheap HEADs forcing arbitrary full-artifact fetches.

So this handler gates the artifact through the same pipeline as serveTarball: the same edge auth, host-allowlist, internal-range and tarball-host policy, and the same upstream-request construction. It then issues the upstream request as a HEAD and relays its status and safe response headers (relayArtifact) with no body (probeUpstreamWhen).

On an admit no MirrorJob is enqueued. A HEAD serves no bytes, so there is nothing to back-fill, and mirroring stays demand-driven on the GET path. A refusal renders the same serve error model with an empty body.
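The GET/HEAD split can be summarised as a plan chosen after the shared gating. Method, ServePlan, planFor and refusalBody are hypothetical names for illustration. The handler itself expresses this through relayArtifact and probeUpstreamWhen, whose signatures are not reproduced here.

```haskell
data Method = GET | HEAD
  deriving (Eq, Show)

-- | How an admitted artifact request is served (a hypothetical record).
-- Everything before this point (edge auth, and the host, internal-range and
-- tarball-host policies) is shared by both methods.
data ServePlan = ServePlan
  { upstreamMethod :: Method  -- ^ method sent on the upstream artifact request
  , pumpBody       :: Bool    -- ^ whether an upstream body is streamed to the client
  , enqueueOnAdmit :: Bool    -- ^ whether a public-leg admit enqueues a MirrorJob
  } deriving (Eq, Show)

planFor :: Method -> ServePlan
planFor GET  = ServePlan { upstreamMethod = GET,  pumpBody = True,  enqueueOnAdmit = True }
planFor HEAD = ServePlan { upstreamMethod = HEAD, pumpBody = False, enqueueOnAdmit = False }

-- | A refusal renders the same error model; a HEAD reply drops the body.
refusalBody :: Method -> String -> String
refusalBody HEAD _        = ""
refusalBody GET  rendered = rendered
```

Sending HEAD upstream, rather than sending GET and discarding the body, is what removes the amplification: the proxy never downloads bytes it will not serve.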