| Safe Haskell | None |
|---|---|
| Language | GHC2021 |
Ecluse.Core.Registry.Npm.Filter
Description
The two pure transforms an npm packument needs before Écluse serves it:
rewrite the embedded artifact URLs under the mount's prefix, and assemble the
served document from a cross-upstream MergePlan and the raw source documents.
Both transforms operate structurally over the raw aeson Value, never by
re-serialising a typed model. This is load-bearing: the served packument is an
open document -- its schema is additionalProperties: true (see
docs/architecture/api-surface.md → "The synthesized-packument schema") -- so
any field Écluse does not model (author keys, registry bookkeeping, per-version
extras) must be relayed unchanged. Building the served body from the raw
Values keeps every unmodelled key; rebuilding it from Ecluse.Core.Package
would silently drop them.
The decision/replay split
Which versions survive, which source wins each one, where dist-tags.latest
resolves, and each surviving version's publish instant are the ecosystem-agnostic
decisions, taken over the typed PackageInfo by
Ecluse.Core.Package.Filter and Ecluse.Core.Package.Merge and handed here as a
MergePlan. This module owns the npm wire-shape assembly: rebuilding
versions/dist-tags/time onto the base document from the plan, and the
tarball-URL rewrite over the raw upstream bytes. The npm wire knowledge lives
here; the decision logic does not (it is reused by every ecosystem). See
docs/architecture/registry-model.md → "Decision surface vs served surface".
URL rewriting
rewriteTarballUrls rewrites each version's dist.tarball to
{mount-base}/{pkg}/-/{file}, so a client resolving metadata through the
proxy also downloads the bytes through it rather than going straight to upstream
and bypassing the gate (see docs/architecture/hosting.md → "The load-bearing
requirement: URL rewriting"). Keeping artifacts on the same host also means npm
keeps sending its credentials with artifact requests, which it would silently
omit for a separate artifact host. The mount's externally-visible base URL is
supplied by the caller; this transform performs no IO. It is idempotent:
re-deriving {pkg} and {file} from an already-rewritten URL yields the same URL,
so applying it more than once is safe.
Assembling the served document
assembleMergedPackument replays a MergePlan onto the raw source Values in
one pass: each surviving version's object is taken from the raw document of the
source that won it (so the served bytes are the winning upstream's, unmodelled keys
and all) with its dist.tarball rewritten under the mount base as it is placed;
dist-tags and time are rebuilt from the plan's reconciled decisions (the times
as normalised ISO-8601, with the base document's created/modified bookkeeping
retained); every other top-level key is relayed from the base document. A version
not in the plan's survivors is simply never taken, so a client's resolver only ever
sees admitted versions (presence in the packument is availability -- see
docs/research/reverse-engineering/npm.md §8).
The fused single pass is deliberate: restricting, assembling, and rewriting as
separate whole-document edits would rebuild a many-version packument several times
per request, and this transform sits on the serve path's hot loop (see
docs/architecture/performance.md). The rewrite honours the same gate as
rewriteTarballUrls: the base document's own name is validated component-wise
(safeName) before it is interpolated, and a document with no usable name has no
URLs rewritten.
URL rewriting
rewriteTarballUrls :: Text -> Value -> Value
Rewrite every version's dist.tarball to {base}/{pkg}/-/{file}, so the
artifact is fetched back through this mount rather than directly from upstream.
base is the mount's externally-visible base URL (including any path prefix),
supplied by the caller; a trailing slash on it is ignored. {pkg} is the
packument's own name (the scoped @scope/name form npm uses in URLs), read
from the document so the transform is self-contained. {file} is the upstream
tarball URL's last path segment -- the artifact filename -- preserved verbatim so
the bytes a client integrity-checks are unchanged.
Total and lossless: a version with no dist object, no tarball string, or a
tarball with no filename segment is left untouched, as is a document with no
usable name; every unmodelled key is relayed unchanged. Rewriting is
idempotent -- a second pass derives the same {pkg} and {file} and so
produces the same URL.
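The URL shape described above can be sketched on plain strings. This is a minimal illustration, not the module's implementation: the real transform works over the aeson Value, and `lastSegment` and `rewriteTarball` are hypothetical names introduced here. The sketch treats the URL as an opaque string and does not parse query strings or fragments.

```haskell
import Data.List (dropWhileEnd)

-- Last path segment of a URL string, or Nothing when there is none
-- (for example a URL that ends in a slash). Hypothetical helper.
lastSegment :: String -> Maybe String
lastSegment url =
  case reverse (takeWhile (/= '/') (reverse url)) of
    ""   -> Nothing
    file -> Just file

-- Build {base}/{pkg}/-/{file}. A trailing slash on the base is ignored,
-- and a URL with no filename segment is returned untouched.
rewriteTarball :: String -> String -> String -> String
rewriteTarball base pkg url =
  case lastSegment url of
    Nothing   -> url
    Just file -> dropWhileEnd (== '/') base ++ "/" ++ pkg ++ "/-/" ++ file
```

Because {file} is re-derived from the last segment, feeding an already-rewritten URL back through `rewriteTarball` with the same base and package yields the same string, which is the idempotence property stated above.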
The name is upstream-controlled (it is the packument's own field), so each
of its structural components -- the scope and the base name of an @scope/name
form, or the bare name otherwise -- is gated through
Ecluse.Core.Server.Route.isSafeComponent before it is interpolated. A name
carrying a traversal, an embedded separator, or a control character is rejected,
and the document is left untouched rather than emitting a dist.tarball that aims
a client outside the package's own path.
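A component-wise gate of this kind might look like the following sketch. The exact rules of isSafeComponent are not shown in this documentation, so `safeComponent` below is an assumed approximation (non-empty, not a dot segment, no separators or control characters), and `safeNameSketch` is a hypothetical name, not the module's safeName.

```haskell
import Data.Char (isControl)

-- Assumed stand-in for the real component check: non-empty, not "." or "..",
-- and free of path separators and control characters.
safeComponent :: String -> Bool
safeComponent c =
  not (null c)
    && c `notElem` [".", ".."]
    && not (any (\ch -> ch == '/' || ch == '\\' || isControl ch) c)

-- Validate a package name component-wise: either a bare name, or a scoped
-- name whose scope and base name are each checked separately.
safeNameSketch :: String -> Maybe String
safeNameSketch name = case name of
  '@' : rest ->
    case break (== '/') rest of
      (scope, '/' : base)
        | safeComponent scope && safeComponent base -> Just name
      _ -> Nothing
  _ | safeComponent name -> Just name
    | otherwise -> Nothing
```

Checking the scope and base name separately is what lets the single `/` of a scoped name through while still rejecting any further separator or a `..` segment hidden in either half.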
Assembling the served document
assembleMergedPackument :: Text -> Map SourceId Value -> MergePlan -> Value -> Value
Assemble the served packument from a MergePlan and the raw source documents:
rebuild versions, dist-tags, and time from the plan onto the base document,
rewriting each surviving version's dist.tarball under mountBase in the same
pass. Other top-level keys are inherited from the base document.
The plan was decided over the projected PackageInfos (the
typed views of the same documents), but the assembly reads the raw Values, so
unmodelled fields survive (see the module header). Each surviving version's object
is taken from the source that won its key (mpSurvivors); a survivor whose source
object is missing is dropped rather than fabricated, so coherence with the plan is
preserved by construction. dist-tags is the plan's reconciled map (mpDistTags:
latest resolved, absent-target tags dropped); time is the plan's
surviving-version instants (mpTime, rendered as normalised ISO-8601) plus the
base document's non-version created/modified bookkeeping.
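The time rebuild described above can be sketched as a map union. This assumes, for illustration only, that both the plan's instants and the base document's time object are already available as maps from key to rendered timestamp string; `rebuildTime` is a hypothetical name, and the real mpTime type and ISO-8601 normalisation are not shown here.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set
import Data.Map.Strict (Map)

-- Combine the plan's surviving-version instants with the base document's
-- non-version bookkeeping. Version entries in the base that the plan did
-- not keep are not carried over.
rebuildTime :: Map String String -> Map String String -> Map String String
rebuildTime planTimes baseTime = Map.union planTimes bookkeeping
  where
    bookkeeping =
      Map.restrictKeys baseTime (Set.fromList ["created", "modified"])
```

Restricting the base side to created/modified means a version the plan dropped cannot leak back in through the base document's time object.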
The tarball rewrite is the same per-version transform rewriteTarballUrls applies,
fused into the assembly so the versions object is built once rather than rebuilt by
a second whole-document pass; it is gated identically (the base document's own
name, validated by safeName, with no rewrite when the name is unusable).
The caller decides what to do with an empty plan; an empty mpSurvivors simply
assembles an empty versions object. A non-object base document contributes no
top-level keys and no bookkeeping (the plan-owned keys are still assembled), so the
result is always an object.
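The replay can be sketched over a toy JSON type. Everything here is an assumption made for illustration: `J` stands in for aeson's Value, `Plan` is a cut-down hypothetical plan (the real MergePlan's field types are not shown in this documentation), and the sketch omits the time rebuild and the tarball rewrite to stay short.

```haskell
import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)

-- Toy stand-in for aeson's Value: only what the sketch needs.
data J = JStr String | JObj (Map String J)
  deriving (Eq, Show)

type SourceId = String

-- Hypothetical, cut-down plan.
data Plan = Plan
  { survivors :: Map String SourceId  -- version -> source that won it
  , distTags  :: Map String String    -- tag -> version
  }

-- Look up a key in an object; anything else has no fields.
field :: String -> J -> Maybe J
field k (JObj o) = Map.lookup k o
field _ _        = Nothing

-- Replay the plan: take each surviving version's raw object from the source
-- that won it, drop survivors whose object is missing, and relay every other
-- top-level key from the base document.
assemble :: Map SourceId J -> Plan -> J -> J
assemble sources plan base = JObj (Map.union planOwned inherited)
  where
    inherited = case base of
      JObj o -> o
      _      -> Map.empty  -- non-object base: nothing to relay
    planOwned = Map.fromList
      [ ("versions", JObj (Map.mapMaybeWithKey pick (survivors plan)))
      , ("dist-tags", JObj (Map.map JStr (distTags plan)))
      ]
    pick version src =
      Map.lookup src sources >>= field "versions" >>= field version
```

`Map.union` is left-biased, so the plan-owned keys replace the base document's versions and dist-tags while every other base key passes through, and the result is an object even when the base is not.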