ecluse:ecluse-core
Safe Haskell: None
Language: GHC2021

Ecluse.Core.Registry.Npm.Filter

Description

The two pure transforms an npm packument needs before Écluse serves it: rewrite the embedded artifact URLs under the mount's prefix, and assemble the served document from a cross-upstream MergePlan and the raw source documents.

Both transforms operate structurally over the raw aeson Value, never by re-serialising a typed model. This is load-bearing: the served packument is an open document -- its schema is additionalProperties: true (see docs/architecture/api-surface.md → "The synthesized-packument schema") -- so any field Écluse does not model (author keys, registry bookkeeping, per-version extras) must be relayed unchanged. Building the served body from the raw Values keeps every unmodelled key; rebuilding it from Ecluse.Core.Package would silently drop them.
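A minimal sketch of why the structural approach matters, using only the containers and text packages. The raw document is stood in for by a flat key/value map rather than an aeson Value, and `RawDoc`, `Modelled`, `viaModel`, and `structural` are hypothetical names invented for this illustration; none of them is part of Écluse.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import Data.Text (Text)

-- Stand-in for a raw JSON object (an assumption: the real code works on aeson Values).
type RawDoc = Map Text Text

-- Hypothetical typed model that knows only two of the document's fields.
data Modelled = Modelled { mName :: Text, mLicense :: Text }

toModel :: RawDoc -> Maybe Modelled
toModel doc = Modelled <$> Map.lookup "name" doc <*> Map.lookup "license" doc

fromModel :: Modelled -> RawDoc
fromModel m = Map.fromList [("name", mName m), ("license", mLicense m)]

-- Re-serialising through the typed model: every key the model lacks is lost.
viaModel :: (Modelled -> Modelled) -> RawDoc -> Maybe RawDoc
viaModel f = fmap (fromModel . f) . toModel

-- Structural edit: change one key in place and relay everything else untouched.
structural :: Text -> (Text -> Text) -> RawDoc -> RawDoc
structural key f = Map.adjust f key
```

Given a document with the keys name, license, and x-custom, `structural "license" (const "ISC")` still carries x-custom afterwards, whereas `viaModel id` returns a document without it.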

The decision/replay split

Four decisions are ecosystem-agnostic: which versions survive, which source wins each one, where dist-tags.latest resolves, and each surviving version's publish instant. They are taken over the typed PackageInfo by Ecluse.Core.Package.Filter and Ecluse.Core.Package.Merge and handed here as a MergePlan. This module owns the npm wire-shape assembly: rebuilding versions/dist-tags/time onto the base document from the plan, and the tarball-URL rewrite over the raw upstream bytes. The npm wire knowledge lives here; the decision logic does not (it is reused by every ecosystem). See docs/architecture/registry-model.md → "Decision surface vs served surface".

URL rewriting

rewriteTarballUrls rewrites each version's dist.tarball to {mount-base}/{pkg}/-/{file}, so a client resolving metadata through the proxy also downloads the bytes through it rather than going straight to upstream and bypassing the gate (see docs/architecture/hosting.md → "The load-bearing requirement: URL rewriting"). Keeping artifacts on the same host also keeps npm's auth flowing; a separate artifact host would silently drop it. The mount's externally-visible base URL is supplied by the caller; this transform performs no IO. It is idempotent: re-deriving {pkg} and {file} from an already-rewritten URL yields the same URL, so applying it more than once is safe.

Assembling the served document

assembleMergedPackument replays a MergePlan onto the raw source Values in one pass. Each surviving version's object is taken from the raw document of the source that won it (so the served bytes are the winning upstream's, unmodelled keys and all), with its dist.tarball rewritten under the mount base as it is placed. dist-tags and time are rebuilt from the plan's reconciled decisions (the times as normalised ISO-8601, with the base document's created/modified bookkeeping retained). Every other top-level key is relayed from the base document. A version not in the plan's survivors is simply never taken, so a client's resolver only ever sees admitted versions (presence in the packument is availability -- see docs/research/reverse-engineering/npm.md §8).

The fused single pass is deliberate: restricting, assembling, and rewriting as separate whole-document edits would rebuild a many-version packument several times per request, and this transform sits on the serve path's hot loop (see docs/architecture/performance.md). The rewrite honours the same gate as rewriteTarballUrls: the base document's own name is validated component-wise (safeName) before it is interpolated, and a document with no usable name has no URLs rewritten.

Synopsis

URL rewriting

rewriteTarballUrls :: Text -> Value -> Value

Rewrite every version's dist.tarball to {base}/{pkg}/-/{file}, so the artifact is fetched back through this mount rather than directly from upstream.

base is the mount's externally-visible base URL (including any path prefix), supplied by the caller; a trailing slash on it is ignored. {pkg} is the packument's own name (the scoped @scope/name form npm uses in URLs), read from the document so the transform is self-contained. {file} is the upstream tarball URL's last path segment -- the artifact filename -- preserved verbatim so the bytes a client integrity-checks are unchanged.

Total and lossless: a version with no dist object, no tarball string, or a tarball with no filename segment is left untouched, as is a document with no usable name; every unmodelled key is relayed unchanged. Rewriting is idempotent -- a second pass derives the same {pkg} and {file} and so produces the same URL.
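The URL derivation described above can be sketched on plain Text, leaving out the traversal of the aeson Value. `tarballFile` and `rewriteUrl` are hypothetical helper names, not the module's internals, and taking everything after the last `/` as the filename is a simplification (it does not treat query strings or bare hosts specially).

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical helper: the upstream URL's last path segment, or Nothing
-- when the URL ends in a slash and so has no filename segment.
tarballFile :: Text -> Maybe Text
tarballFile url =
  let file = T.takeWhileEnd (/= '/') url
  in if T.null file then Nothing else Just file

-- Hypothetical helper: build {base}/{pkg}/-/{file}. Trailing slashes on the
-- base are ignored; a URL with no filename segment is returned untouched.
rewriteUrl :: Text -> Text -> Text -> Text
rewriteUrl base pkg url =
  case tarballFile url of
    Nothing   -> url
    Just file -> T.intercalate "/" [T.dropWhileEnd (== '/') base, pkg, "-", file]
```

With base `https://proxy.example/npm/` and package `@scope/name`, the URL `https://upstream.example/@scope/name/-/name-1.0.0.tgz` becomes `https://proxy.example/npm/@scope/name/-/name-1.0.0.tgz`. Feeding that result back in yields the same URL, because the last segment is unchanged; this is the idempotence property in miniature.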

The name is upstream-controlled (it is the packument's own field), so each of its structural components -- the scope and base name either side of a @scope/ prefix -- is gated through Ecluse.Core.Server.Route.isSafeComponent before it is interpolated. A name carrying a traversal, an embedded separator, or a control character is rejected, and the document is left untouched rather than emitting a dist.tarball that points a client outside the package's own path.
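A sketch of component-wise name gating in this spirit. The exact rules of isSafeComponent are not given here, so `isSafeComponentSketch` encodes an assumed rule set (non-empty, not `.` or `..`, no `/` or `\`, no control characters), and `safeNameSketch` is a hypothetical stand-in for the validation step, not the module's safeName.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Char (isControl)
import Data.Text (Text)
import qualified Data.Text as T

-- Assumed rule set for one path component; the real isSafeComponent may differ.
isSafeComponentSketch :: Text -> Bool
isSafeComponentSketch c =
  not (T.null c)
    && c `notElem` [".", ".."]
    && not (T.any (\ch -> ch == '/' || ch == '\\' || isControl ch) c)

-- Validate a bare name, or the scope and base name either side of "@scope/".
-- Returns the name unchanged when every component passes, Nothing otherwise.
safeNameSketch :: Text -> Maybe Text
safeNameSketch name =
  case T.stripPrefix "@" name of
    Just scoped ->
      let (scope, rest) = T.breakOn "/" scoped
      in case T.stripPrefix "/" rest of
           Just pkg
             | isSafeComponentSketch scope, isSafeComponentSketch pkg -> Just name
           _ -> Nothing
    Nothing
      | isSafeComponentSketch name -> Just name
      | otherwise -> Nothing
```

Under these assumed rules, `left-pad` and `@scope/name` pass, while `@scope/../x`, `@../name`, `..`, and a name containing a newline are all rejected, so the caller would skip the rewrite for them.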

Assembling the served document

assembleMergedPackument :: Text -> Map SourceId Value -> MergePlan -> Value -> Value

Assemble the served packument from a MergePlan and the raw source documents: rebuild versions, dist-tags, and time from the plan onto the base document, rewriting each surviving version's dist.tarball under mountBase in the same pass. Other top-level keys are inherited from the base document.

The plan was decided over the projected PackageInfos (the typed views of the same documents), but the assembly reads the raw Values, so unmodelled fields survive (see the module header). Each surviving version's object is taken from the source that won its key (mpSurvivors); a survivor whose source object is missing is dropped rather than fabricated, so coherence with the plan is preserved by construction. dist-tags is the plan's reconciled map (mpDistTags: latest resolved, absent-target tags dropped); time is the plan's surviving-version instants (mpTime, rendered as normalised ISO-8601) plus the base document's non-version created/modified bookkeeping.
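The version-selection part of the replay can be sketched with plain maps. Several simplifications are assumptions made for illustration: the survivors are modelled as a map from version to winning source (the real shape of mpSurvivors is not shown here), each source document is reduced to its versions map, and the per-version object type is left abstract. `SourceIdSketch`, `Version`, and `assembleVersionsSketch` are hypothetical names.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import Data.Text (Text)

type SourceIdSketch = Text
type Version = Text

-- Build the served versions map in a single pass over the survivors.
-- Each survivor is looked up in the source that won it and transformed as it
-- is placed; a survivor whose source object is missing is dropped.
assembleVersionsSketch
  :: (Version -> obj -> obj)                 -- per-version transform, e.g. a tarball rewrite
  -> Map SourceIdSketch (Map Version obj)    -- each source's raw versions
  -> Map Version SourceIdSketch              -- survivors: version -> winning source
  -> Map Version obj
assembleVersionsSketch rewrite sources survivors =
  Map.mapMaybeWithKey pick survivors
  where
    pick ver src = rewrite ver <$> (Map.lookup src sources >>= Map.lookup ver)
```

Because the result is driven by the survivors map, a version that is present in a source but absent from the survivors is never copied, and an empty survivors map gives an empty result.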

The tarball rewrite is the same per-version transform rewriteTarballUrls applies, fused into the assembly so the versions object is built once rather than rebuilt by a second whole-document pass; it is gated identically (the base document's own name, validated by safeName, with no rewrite when the name is unusable).

The caller decides what to do with an empty plan; an empty mpSurvivors simply assembles an empty versions object. A non-object base document contributes no top-level keys and no bookkeeping (the plan-owned keys are still assembled), so the result is always an object.