ecluse:ecluse-core
Safe Haskell: None
Language: GHC2021

Ecluse.Core.Package.Merge

Description

Merging several upstream packuments into the one document Écluse serves.

A packument is the set of available versions of a package, and that set is spread across upstreams: a trusted private upstream holds what has been vetted, while a gated public upstream holds the full history -- including versions not yet mirrored. Serving only the private document would hide those, so Écluse serves their union rather than short-circuiting on a private hit. This module is the pure, ecosystem-agnostic fold that reasons over that union on the PackageInfo domain model -- it lives above the registry handle, written once and reused by every ecosystem, and never imports a registry adapter.

Decision surface, not served surface. This module reasons over the typed PackageInfo but does not emit a finished, re-serialisable PackageInfo. The document Écluse serves is the raw upstream JSON (Value), edited in place by the serve layer, so that every unmodeled wire key survives. The typed model is lossy, so re-encoding it would drop those keys. This module therefore emits a MergePlan -- exactly which versions survive, which input each survivor came from, the reconciled dist-tags/time, and the detected divergences -- that the serve layer replays onto the raw Values. See docs/architecture/registry-model.md → "Decision surface vs served surface".

The trust split is the caller's, expressed as a Provenance tag on each input and applied before the merge: TrustedSource (private) versions are admitted as-is; GatedSource (public) versions are the already-rule-filtered set. This module does not run rules -- it reasons over exactly what it is handed (see docs/architecture/rules-engine.md → "Applying verdicts to a packument").

Two things make the merge more than a map union, and both are supply-chain signals, not silent reconciliations:

  • Collision. When the same version key comes from both a TrustedSource and a GatedSource, the trusted copy wins (it is the authority) -- recorded in the plan as the survivor's winning SourceId.
  • Divergence. When the colliding copies contradict on a shared integrity algorithm -- an algorithm both expose carries disagreeing digests -- that is exactly the tampering Écluse exists to catch. Copies that merely expose different algorithm sets without contradicting on a shared one (one mirror also carrying a legacy digest the other omits) describe the same bytes and are not a divergence. The trusted copy still wins the merge, but a real contradiction is reported in the MergePlan; whether to additionally drop the version (fail-closed) is a policy decision left to the caller, so this module stays pure.
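
The two rules above can be sketched on a toy model. This is a minimal illustration, not the module's implementation: it assumes each upstream is a plain map from version key to a single digest string, whereas the real merge works on PackageInfo and a multi-algorithm fingerprint. The names VersionKey, Digest, mergeTrustedFirst and divergences are hypothetical.

```haskell
import qualified Data.Map.Strict as Map

-- Toy stand-ins, not the module's real types: a version key and one digest.
type VersionKey = String
type Digest     = String

-- Collision: Map.union is left-biased, so passing the trusted map first
-- makes the trusted copy win every shared version key.
mergeTrustedFirst
  :: Map.Map VersionKey Digest   -- trusted (private) versions
  -> Map.Map VersionKey Digest   -- gated (public) versions
  -> Map.Map VersionKey Digest
mergeTrustedFirst trusted gated = Map.union trusted gated

-- Divergence: shared keys whose digests disagree are reported, not hidden.
-- Each entry keeps both digests as (trusted, gated).
divergences
  :: Map.Map VersionKey Digest
  -> Map.Map VersionKey Digest
  -> Map.Map VersionKey (Digest, Digest)
divergences trusted gated =
  Map.filter (uncurry (/=)) (Map.intersectionWith (,) trusted gated)

main :: IO ()
main = do
  let trusted = Map.fromList [("1.0.0", "aaa"), ("1.1.0", "bbb")]
      gated   = Map.fromList [("1.1.0", "ccc"), ("2.0.0", "ddd")]
  print (mergeTrustedFirst trusted gated)
  print (divergences trusted gated)
```

In this run the union keeps the trusted digest for 1.1.0 and still carries 2.0.0 from the gated side, while the divergence map reports 1.1.0 with both digests. The winner is kept either way; dropping it would be a separate caller decision.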

The merge is a lawful Monoid. The fold is realised over a Merge accumulator with a lawful Semigroup / Monoid: mempty is the empty merge (the degenerate identity at zero inputs) and (<>) is the trusted-wins union with order-independent divergence detection. mergePackuments assigns each input a SourceId by list position, foldMaps the contributions into the accumulator, and projects to a MergePlan. See the Semigroup instance for the exact law domain (associative + identity, intentionally not commutative).

See docs/architecture/registry-model.md → "Packument merge across upstreams".

Provenance

data Provenance

The trust provenance of an upstream's contribution to the merge. The split is decided by the caller -- by which upstream a document came from -- and applied before merging, never derived here.

The constructors are named *Source rather than the bare Trusted/Gated because Ecluse.Core.Package already exports a Trust constructor named Trusted; a bare name would collide for the many callers that import Ecluse.Core.Package openly.

The Ord instance is the trust order itself -- TrustedSource compares less than GatedSource so that "smallest wins" gives trusted precedence; the merge's resolution leans on this directly (see mergePackuments).

Constructors

TrustedSource

A private-upstream document. Its versions are already vetted, so they enter the union unfiltered and win any collision.

GatedSource

A public-upstream document. Its versions are the set that already survived the rules engine; the merge unions them but never re-filters.
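
A short sketch of "smallest wins" follows. It assumes the Ord instance is derived from the constructor order shown above, which the documentation implies but does not state outright. The winner helper and its String payload are hypothetical stand-ins.

```haskell
import Data.List (sortOn)

-- Assumed to mirror the documented order: with a derived Ord,
-- TrustedSource < GatedSource because it is declared first.
data Provenance = TrustedSource | GatedSource
  deriving (Eq, Ord, Show)

-- "Smallest wins": pick the candidate with the least provenance.
-- The String payload is a hypothetical stand-in for a version's manifest.
-- sortOn is stable, so among equal provenances the earliest input is kept.
winner :: [(Provenance, String)] -> Maybe (Provenance, String)
winner cs = case sortOn fst cs of
  (c : _) -> Just c
  []      -> Nothing

main :: IO ()
main = print (winner [(GatedSource, "public copy"), (TrustedSource, "private copy")])
```

The trusted candidate is selected even though it appears second in the list, which is the property the merge relies on.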

Merging

type SourceId = Int

A stable identifier for one input to a single mergePackuments call: the 0-based index of that (Provenance, PackageInfo) in the input list.

The serve layer needs to take a surviving version's object from the raw Value of whichever source won it, so the plan must name that source. Provenance alone is not enough: it identifies a source only while there is exactly one input per provenance (the npm topology today -- one trusted, one gated). The input index stays unambiguous even when several inputs share a provenance (e.g. an aggregating private upstream plus a first-party source, both TrustedSource), which keeps the plan correct for the multi-source case without a new type. The caller pairs each SourceId back to the raw Value it passed at that position.
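
The pairing step on the caller's side might look like the sketch below. RawDoc is a hypothetical stand-in for the raw upstream Value, and indexRaw and resolve are illustrative helpers rather than functions this module exports.

```haskell
import qualified Data.Map.Strict as Map

type SourceId = Int

-- Hypothetical stand-in for the raw upstream JSON the caller holds.
type RawDoc = String

-- Index the raw documents by list position, matching how SourceIds are
-- assigned (0-based index into the input list).
indexRaw :: [RawDoc] -> Map.Map SourceId RawDoc
indexRaw = Map.fromList . zip [0 ..]

-- Resolve each surviving version key to the raw document that won it.
-- A SourceId with no matching input is dropped here; that should not occur
-- for a plan produced from the same input list.
resolve :: [RawDoc] -> Map.Map String SourceId -> Map.Map String RawDoc
resolve raws = Map.mapMaybe (`Map.lookup` indexRaw raws)

main :: IO ()
main = do
  let raws      = ["private.json", "public.json"]
      survivors = Map.fromList [("1.0.0", 0), ("2.0.0", 1)]
  print (resolve raws survivors)
```

Because the index is positional, two inputs with the same provenance still resolve to distinct raw documents.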

data MergePlan

The outcome of reasoning over a set of upstream packuments: a plan the serve layer replays onto the raw upstream Values to assemble the lossless served body. It carries exactly the decisions the merge owns -- never a finished, re-serialisable document (see this module's header, "Decision surface, not served surface").

Constructors

MergePlan 

Fields

  • mpName :: PackageName

    The package identity, carried from the contributions. Every contribution that reaches the merge has had its self-reported name validated against the requested one upstream of here (a disagreeing origin is dropped before the merge), so all inputs carry the same identity and it is never a substituted or manufactured value -- only one an upstream genuinely reported.

  • mpSurvivors :: Map Text SourceId

    Each surviving version key mapped to the SourceId of the input that won it, so the serve layer takes that version's object from the right source's raw Value. Trusted wins a collision; absent versions are not keys here.

  • mpDistTags :: Map Text Version

    dist-tags reconciled over the surviving union -- latest resolved by the shared selector, every other surviving-target tag carried, absent-target tags dropped.

  • mpTime :: Map Text UTCTime

    The served time map, reconstructed from the survivors: each surviving version's publish instant taken from the same winning candidate whose manifest is served, so a version's served time always comes from the source that won its manifest, never fabricated from a different source. A winner with no known publish time contributes no entry, so this is keyed by a subset of the survivors.

  • mpDivergences :: Set Divergence

    Every distinct same-version integrity conflict found. A Set because divergence is a property of the set of distinct integrity fingerprints contributed for a version key, not of any pairwise fold step: the winner's fingerprint is recorded against each distinct fingerprint that contradicts it on a shared algorithm, which is order-independent and deduplicating by construction. Empty when no two copies of a shared version contradict on a shared algorithm -- including when they merely expose different algorithm sets without disagreeing on one they share.

Instances

Show MergePlan

Defined in Ecluse.Core.Package.Merge

Eq MergePlan

Defined in Ecluse.Core.Package.Merge

data Divergence

A detected integrity conflict: a version key present in more than one source whose copies contradict on a shared algorithm -- an algorithm both expose carries disagreeing digests. The trusted copy wins the merge; this record preserves both fingerprints so the caller can log, meter, and decide policy (serve-with-private-winning vs fail-closed). It is the merge's supply-chain signal -- surfaced, never silently reconciled.

Ord is derived purely to let MergePlan carry divergences as a Set: the ordering is structural (over the version key and the two fingerprints) and has no meaning beyond deduplication and a stable presentation.
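
A caller-side policy over reported divergences could be sketched as follows. Everything here is hypothetical: the Policy type and applyPolicy are not part of this module, and the sketch takes the diverged version keys as a plain Set because the field names of Divergence are not shown here. It only touches the survivors map; dist-tags and time are out of scope.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type SourceId = Int

-- Hypothetical caller-side policy; not defined by the merge module.
data Policy = ServeTrustedWinner | FailClosed
  deriving (Eq, Show)

-- Apply the policy to a survivors map, given the version keys that diverged.
applyPolicy
  :: Policy
  -> Set.Set String            -- version keys with a reported divergence
  -> Map.Map String SourceId   -- survivors, shaped like mpSurvivors
  -> Map.Map String SourceId
applyPolicy ServeTrustedWinner _        survivors = survivors
applyPolicy FailClosed         diverged survivors = Map.withoutKeys survivors diverged

main :: IO ()
main = do
  let survivors = Map.fromList [("1.0.0", 0), ("1.1.0", 0), ("2.0.0", 1)]
      diverged  = Set.fromList ["1.1.0"]
  print (applyPolicy ServeTrustedWinner diverged survivors)
  print (applyPolicy FailClosed diverged survivors)
```

Keeping this decision outside the merge is what lets the merge stay pure: it reports, and the caller chooses.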

Constructors

Divergence 

Fields

data IntegrityFingerprint

An order-independent fingerprint of a version's artifact integrity: the sorted multiset of (resolved algorithm, comparable digest body) pairs across all of the version's artifacts. Each digest is keyed by the algorithm it asserts (assertedAlg -- a hex Hash's tag, or the algorithm an SRI string embeds), not by its raw HashAlg wrapper tag, so an sha256-… SRI and a hex SHA-256 digest bucket together under SHA256 while an sha256-… and an sha512-… SRI bucket apart. A digest that asserts no algorithm (a bare or malformed SRI) keys under Nothing -- its own bucket -- so an unknown digest never merges with a real algorithm (the fail-closed reading). The body is the comparable digest: an SRI's base64 body (without its <alg>- prefix) or a hex digest's raw value, which is uniform within any shared resolved algorithm, so comparing bodies is sound. The comparison ignores artifact ordering and non-integrity fields (filename, URL, size) that legitimately vary between mirrors of the same bytes.

Two copies diverge when they contradict on a shared resolved algorithm: an algorithm both assert carries disagreeing bodies. An asymmetric pair -- one copy asserting an algorithm the other omits, including a mirror that recomputed integrity under a different algorithm -- does not diverge on that account; only a shared resolved algorithm whose bodies disagree does. So a mirror serving a modern digest alongside a legacy one agrees with a mirror serving only the modern digest, as long as that shared digest matches.
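
The shared-algorithm test can be sketched on a simplified fingerprint. This is an illustration under stated assumptions, not the opaque type's real representation: Alg stands in for HashAlg, a fingerprint is modelled as a map from resolved algorithm to the set of bodies asserted under it, and "disagreeing" is taken to mean the two body sets for a shared algorithm are not equal. The Nothing bucket is treated like any other key, which the text above does not specify.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Hypothetical stand-ins: Nothing models a digest that asserts no algorithm.
data Alg = SHA1 | SHA256 | SHA512
  deriving (Eq, Ord, Show)

type Fingerprint = Map.Map (Maybe Alg) (Set.Set String)

-- Bucket (algorithm, body) pairs by algorithm; input order does not matter.
fingerprint :: [(Maybe Alg, String)] -> Fingerprint
fingerprint ps =
  Map.fromListWith Set.union [ (a, Set.singleton b) | (a, b) <- ps ]

-- Two copies diverge when some algorithm both assert carries different
-- bodies. Algorithms present on only one side are ignored.
diverges :: Fingerprint -> Fingerprint -> Bool
diverges x y = or (Map.elems (Map.intersectionWith (/=) x y))

main :: IO ()
main = do
  let modernAndLegacy = fingerprint [(Just SHA512, "abc"), (Just SHA1, "legacy")]
      modernOnly      = fingerprint [(Just SHA512, "abc")]
      recomputed      = fingerprint [(Just SHA256, "def")]
      tampered        = fingerprint [(Just SHA512, "xyz")]
  print (diverges modernAndLegacy modernOnly)  -- extra legacy digest only
  print (diverges modernOnly recomputed)       -- no shared algorithm
  print (diverges modernAndLegacy tampered)    -- shared SHA512 disagrees
```

Only the last pair diverges in this model: the first two share either a matching digest or no algorithm at all.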

Opaque so the comparison used for divergence detection cannot be sidestepped; read the pairs back with integrityHashes when logging or metering a Divergence. Ord is derived (structurally, over the sorted pairs) only so a Divergence may live in a Set; it carries no domain meaning beyond that, and in particular is not the divergence test (which is the shared-algorithm contradiction above, never structural inequality of the whole set).

integrityHashes :: IntegrityFingerprint -> [(Maybe HashAlg, Text)]

The (resolved algorithm, comparable digest body) pairs of a fingerprint, sorted, for an audit trail. The algorithm is the one each digest asserts (Nothing when it asserts none); the body is its comparable form (an SRI's base64 body, a hex digest's raw value).

mergePackuments :: [(Provenance, PackageInfo)] -> Maybe MergePlan

Reason over several upstream packuments, by Provenance, and emit the MergePlan the serve layer replays onto the raw Values. Pure and total.

The merge is a fold with the degenerate identity at one input: a single packument yields a plan whose survivors are all of its versions (all won by source 0), with its tags and times reconciled and no divergences, so 0/1-upstream deployments need no special case. It is realised as a foldMap of each input's contribute into the lawful Merge Monoid, projected by planFrom. The model:

  • Union by version key, with TrustedSource winning a collision over GatedSource (the private upstream is the authority). The winning input's SourceId is recorded for the survivor. A collision whose copies contradict on a shared integrity algorithm is recorded as a Divergence; the winner is still kept.
  • dist-tags reconciled over the union. latest is resolved by selectLatest -- keep-unless-denied, stable-preferring, and unparseable-safe -- from the precedence-winning source's tagged latest and the surviving versions; any other tag pointing at a version absent from the union is dropped. Collisions on the same tag are resolved by provenance (trusted wins), consistent with the version fold, so the plan does not depend on caller input order.
  • time reconstructed from the survivors: each survivor's publish instant is read off the same winning candidate whose manifest is served, so a version's served time always comes from the source that won its manifest, never fabricated from a different source. A winner with no known publish time contributes no entry.

The plan's identity (mpName) is carried from the contributions; callers fetch one package across its upstreams and each contribution's name has been validated against the requested one before reaching here, so all inputs share that one identity and it is never a substituted value. An empty input list yields Nothing -- there is nothing to serve.
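
The dist-tags rule for tags other than latest can be sketched as below. This is a toy model with several assumptions: versions are plain strings, the collision is resolved before absent targets are dropped (the order is not stated above), and ties within one provenance fall back to comparing the target strings rather than input position. Resolving latest (selectLatest) is deliberately not modelled, so the sketch removes that key. reconcileTags is a hypothetical name.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

data Provenance = TrustedSource | GatedSource
  deriving (Eq, Ord, Show)

-- Reconcile non-latest tags: on a tag collision the smaller provenance wins,
-- then any tag whose target is not a surviving version is dropped.
reconcileTags
  :: Set.Set String                          -- surviving version keys
  -> [(Provenance, Map.Map String String)]   -- each input's tag -> version
  -> Map.Map String String
reconcileTags survivors inputs =
  Map.filter (`Set.member` survivors)
    (Map.delete "latest" (Map.map snd ranked))
  where
    -- Pair every target with its provenance so that min picks trusted first.
    ranked = Map.unionsWith min
      [ Map.map (\v -> (p, v)) tags | (p, tags) <- inputs ]

main :: IO ()
main = do
  let survivors = Set.fromList ["1.0.0", "2.0.0-beta.1"]
      gated     = Map.fromList [("beta", "2.0.0-beta.1"), ("old", "0.9.0")]
      trusted   = Map.fromList [("beta", "1.0.0")]
  print (reconcileTags survivors [(GatedSource, gated), (TrustedSource, trusted)])
```

Here beta resolves to the trusted target regardless of input order, and old is dropped because 0.9.0 is not among the survivors.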

The merge accumulator

The merge is realised as a fold into a lawful Monoid. contribute turns one (Provenance, PackageInfo) input into a Merge; (<>) combines two merges (trusted-wins union, with order-independent divergence kept unresolved until the projection); mempty is the empty merge (the degenerate identity). planFrom projects a folded Merge to a MergePlan. mergePackuments is exactly planFrom . foldMap (uncurry contribute). The Merge type is opaque -- build it only through contribute and mempty -- so a SourceId always names a real input position. See the Semigroup instance for the law domain (associative + identity, intentionally not commutative, and why).

data Merge

The monoidal accumulator the merge folds into. It holds, unresolved, every candidate offered for every version key, plus the ranked dist-tags contributions; resolution to a single winner per key, and the divergence set, happens once in planFrom. The served time map needs no axis here: each version's publish instant rides inside its Candidate (on candDetails), so planFrom reads it off the same winner the manifest is taken from. Keeping candidates unresolved is what makes (<>) associative: a pairwise winner-vs-loser decision taken during the fold is not associative once three or more copies of a key collide, because divergence is a property of the whole set of distinct fingerprints, not of any one step.

Each accumulator also carries the count of inputs it represents, so that (<>) can re-index the right operand's SourceIds by the left operand's input count. This positional re-indexing is what makes a SourceId name an input's list position after a foldMap of single-input contributions -- and it is the sole reason the instance is non-commutative (see the Semigroup instance).

Instances

Monoid Merge

Defined in Ecluse.Core.Package.Merge

Methods

mempty :: Merge

mappend :: Merge -> Merge -> Merge

mconcat :: [Merge] -> Merge

Semigroup Merge

The merge's Semigroup has a deliberately narrow law domain, and the narrowing is load-bearing, not an accident:

  • Associative -- (a <> b) <> c == a <> (b <> c). The SourceId re-indexing offsets compose additively, and every per-key combiner (set union for candidates, "keep the smaller rank" for tags, "left name wins" for the identity) is itself associative, so the whole is.
  • Identity -- mempty (the empty merge) is both a left and a right unit.
  • Intentionally NOT commutative -- a <> b /= b <> a in general. (<>) re-indexes the right operand's SourceIds by the left operand's input count, because a SourceId must name the input's position in the caller's list -- the index the serve layer pairs back to a raw Value. Swapping the operands swaps those positions, so the SourceId labels differ.

The order-independence guarantee, stated precisely (and the reason commutativity is the wrong law): precedence is resolved by provenance, so the surviving key set and the winning provenance per key are invariant under any permutation of the inputs. The value-level reconciliations (the survivor a key resolves to, the divergence fingerprint-pairs, the dist-tags targets, and the served time read off each survivor) are invariant under any permutation that keeps each collision cross-provenance. The npm topology (exactly one trusted, one gated upstream) always does, so every observable decision is order-independent there. The sole residual order-dependence is the positional tiebreak between two inputs of the same provenance: provenance cannot break that tie, so the lower SourceId (earlier input) wins it, and which copy is the divergence winner then tracks order. That positional tiebreak is exactly why SourceId exists and why the instance is non-commutative.
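
The law domain can be illustrated with a toy accumulator. This is a sketch under assumptions, not the real Merge (which is opaque and also carries tags, names and candidate details): ToyMerge, toyContribute and toyWinners are hypothetical names, and a candidate is reduced to a (Provenance, SourceId) pair.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

data Provenance = TrustedSource | GatedSource
  deriving (Eq, Ord, Show)

type SourceId = Int

-- Toy accumulator: how many inputs it represents, and every candidate
-- offered for every version key, kept unresolved.
data ToyMerge = ToyMerge
  { inputCount :: Int
  , candidates :: Map.Map String (Set.Set (Provenance, SourceId))
  } deriving (Eq, Show)

instance Semigroup ToyMerge where
  ToyMerge n a <> ToyMerge m b =
    ToyMerge (n + m) (Map.unionWith Set.union a (Map.map (Set.map shift) b))
    where
      -- Re-index the right operand by the left operand's input count.
      shift (p, sid) = (p, sid + n)

instance Monoid ToyMerge where
  mempty = ToyMerge 0 Map.empty

-- One input's contribution, at local SourceId 0.
toyContribute :: Provenance -> [String] -> ToyMerge
toyContribute p vs =
  ToyMerge 1 (Map.fromList [ (v, Set.singleton (p, 0)) | v <- vs ])

-- Resolve each key to the SourceId of its smallest (provenance, id) candidate.
-- Candidate sets are never empty by construction, so findMin is safe here.
toyWinners :: ToyMerge -> Map.Map String SourceId
toyWinners = Map.map (snd . Set.findMin) . candidates

main :: IO ()
main = do
  let a = toyContribute GatedSource ["1.0.0", "2.0.0"]
      b = toyContribute TrustedSource ["1.0.0"]
      c = toyContribute TrustedSource ["3.0.0"]
  print ((a <> b) <> c == a <> (b <> c))  -- associativity holds
  print (a <> b == b <> a)                -- commutativity does not
  print (toyWinners (foldMap (uncurry toyContribute)
           [(GatedSource, ["1.0.0", "2.0.0"]), (TrustedSource, ["1.0.0"])]))
```

In the toy, swapping a and b changes only the SourceId labels: the trusted input wins 1.0.0 either way, but it is labelled 1 in a <> b and 0 in b <> a. Resolution is deferred to toyWinners, which mirrors the reason given above for keeping candidates unresolved during the fold.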

Defined in Ecluse.Core.Package.Merge

Methods

(<>) :: Merge -> Merge -> Merge

sconcat :: NonEmpty Merge -> Merge

stimes :: Integral b => b -> Merge -> Merge

Show Merge

Defined in Ecluse.Core.Package.Merge

Methods

showsPrec :: Int -> Merge -> ShowS

show :: Merge -> String

showList :: [Merge] -> ShowS

Eq Merge

Defined in Ecluse.Core.Package.Merge

Methods

(==) :: Merge -> Merge -> Bool

(/=) :: Merge -> Merge -> Bool

contribute :: Provenance -> PackageInfo -> Merge

One input's contribution to the accumulator, at local SourceId 0: every version becomes a candidate (carrying its own publish time on candDetails), every dist-tags target a ranked value at this input's provenance, and the package name is offered as the identity. foldMap (uncurry contribute) over the inputs then re-indexes each to its list position via the Semigroup offset, so the absolute SourceId of a single-input contribution is its index in the foldMap.

planFrom :: Merge -> Maybe MergePlan

Project the resolved MergePlan from a folded Merge. Resolves each version key to its precedence winner, derives the divergence Set from the shared-algorithm contradictions among each key's distinct fingerprints, reconciles dist-tags over the survivors, and reconstructs the served time map from each survivor's winning candidate. Returns Nothing only for the empty merge (mempty), which has no name and so nothing to serve; equivalently, the empty input list.