| Safe Haskell | None |
|---|---|
| Language | GHC2021 |
Ecluse.Core.Package.Merge
Description
Merging several upstream packuments into the one document Écluse serves.
A packument is the set of available versions of a package, and that set is
spread across upstreams: a trusted private upstream holds what has been vetted,
while a gated public upstream holds the full history -- including versions not yet
mirrored. Serving only the private document would hide those, so Écluse serves
their union rather than short-circuiting on a private hit. This module is the
pure, ecosystem-agnostic fold that reasons over that union on the
PackageInfo domain model -- it lives above the registry handle,
written once and reused by every ecosystem, and never imports a registry adapter.
Decision surface, not served surface. This module reasons over the typed
PackageInfo but does not emit a finished, re-serialisable PackageInfo.
The document Écluse serves is the raw upstream JSON (Value), edited in place by
the serve layer, so that every unmodeled wire key survives. The typed model
is lossy, so re-encoding it would drop those keys. This module therefore emits a
MergePlan -- exactly which versions survive, which input each survivor came from,
the reconciled dist-tags/time, and the detected divergences -- that the serve
layer replays onto the raw Values. See docs/architecture/registry-model.md
→ "Decision surface vs served surface".
The trust split is the caller's, expressed as a Provenance tag on each
input and applied before the merge: TrustedSource (private) versions are
admitted as-is; GatedSource (public) versions are the already-rule-filtered set.
This module does not run rules -- it reasons over exactly what it is handed (see
docs/architecture/rules-engine.md → "Applying verdicts to a packument").
Two things make the merge more than a map union, and both are supply-chain signals, not silent reconciliations:
- Collision. When the same version key comes from both a
TrustedSource and a GatedSource, the trusted copy wins (it is the authority) -- recorded in the plan as the survivor's winning SourceId.
- Divergence. When the colliding copies __contradict on a shared integrity
algorithm__ -- an algorithm both expose carries disagreeing digests -- that is
exactly the tampering Écluse exists to catch. Copies that merely expose
different algorithm sets without contradicting on a shared one (one mirror also
carrying a legacy digest the other omits) describe the same bytes and are not a
divergence. The trusted copy still wins the merge, but a real contradiction is
reported in the MergePlan; whether to additionally drop the version (fail-closed) is a policy decision left to the caller, so this module stays pure.
The merge is a lawful Monoid. The fold is realised over a Merge
accumulator with a lawful Semigroup / Monoid: mempty is the empty merge
(the degenerate identity at zero inputs) and (<>) is the trusted-wins union with
order-independent divergence detection. mergePackuments assigns each input a
SourceId by list position, foldMaps the contributions into the accumulator,
and projects to a MergePlan. See the Semigroup instance for the exact law
domain (associative + identity, intentionally not commutative).
See docs/architecture/registry-model.md → "Packument merge across upstreams".
Synopsis
- data Provenance
- type SourceId = Int
- data MergePlan = MergePlan {}
- data Divergence = Divergence {}
- data IntegrityFingerprint
- integrityHashes :: IntegrityFingerprint -> [(Maybe HashAlg, Text)]
- mergePackuments :: [(Provenance, PackageInfo)] -> Maybe MergePlan
- data Merge
- contribute :: Provenance -> PackageInfo -> Merge
- planFrom :: Merge -> Maybe MergePlan
Provenance
data Provenance Source #
The trust provenance of an upstream's contribution to the merge. The split is decided by the caller -- by which upstream a document came from -- and applied before merging, never derived here.
The constructors are named *Source rather than the bare Trusted/Gated
because Ecluse.Core.Package already exports a Trust constructor
named Trusted; a bare name would collide for the many callers that import
Ecluse.Core.Package openly.
The Ord instance is the trust order itself -- TrustedSource compares __less
than__ GatedSource so that "smallest wins" gives trusted precedence; the merge's
resolution leans on this directly (see mergePackuments).
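A minimal sketch of how "smallest wins" can give trusted precedence, assuming the Ord instance follows the constructor order shown here. The `resolve` helper and the String version keys and payloads are hypothetical stand-ins for illustration, not this module's implementation.

```haskell
import qualified Data.Map.Strict as Map

-- Mirrors the documented constructors. Declaring TrustedSource first makes
-- the derived Ord place it below GatedSource.
data Provenance = TrustedSource | GatedSource
  deriving (Show, Eq, Ord)

-- Hypothetical helper: union two version maps, keeping the copy with the
-- smaller (more trusted) provenance when a version key collides.
resolve
  :: Map.Map String (Provenance, String)
  -> Map.Map String (Provenance, String)
  -> Map.Map String (Provenance, String)
resolve = Map.unionWith pick
  where
    -- Compare on provenance only; the payload plays no part in precedence.
    pick a b = if fst a <= fst b then a else b
```

Because precedence is read off the provenance tag rather than the argument position, the trusted copy survives a collision whichever side of the union it arrives on.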
Constructors
| TrustedSource | A private-upstream document. Its versions are already vetted, so they enter the union unfiltered and win any collision. |
| GatedSource | A public-upstream document. Its versions are the set that already survived the rules engine; the merge unions them but never re-filters. |
Instances
| Show Provenance Source # | |
Defined in Ecluse.Core.Package.Merge Methods showsPrec :: Int -> Provenance -> ShowS # show :: Provenance -> String # showList :: [Provenance] -> ShowS # | |
| Eq Provenance Source # | |
Defined in Ecluse.Core.Package.Merge | |
| Ord Provenance Source # | |
Defined in Ecluse.Core.Package.Merge Methods compare :: Provenance -> Provenance -> Ordering # (<) :: Provenance -> Provenance -> Bool # (<=) :: Provenance -> Provenance -> Bool # (>) :: Provenance -> Provenance -> Bool # (>=) :: Provenance -> Provenance -> Bool # max :: Provenance -> Provenance -> Provenance # min :: Provenance -> Provenance -> Provenance # | |
Merging
type SourceId = Int Source #
A stable identifier for one input to a single mergePackuments call: the
0-based index of that (Provenance, PackageInfo) in the input list.
The serve layer needs to take a surviving version's object from the raw
Value of whichever source won it, so the plan must name that source. Provenance
alone is not enough: it identifies a source only while there is exactly one
input per provenance (the npm topology today -- one trusted, one gated). The
input index stays unambiguous even when several inputs share a provenance (e.g. an
aggregating private upstream plus a first-party source, both TrustedSource),
which keeps the plan correct for the multi-source case without a new type. The
caller pairs each SourceId back to the raw Value it passed at that position.
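The pairing described above can be sketched as follows. This is an illustration under stated assumptions: `RawDoc` stands in for the raw upstream Value (modelled here as a map from version key to an opaque raw object), and `replay` and its `winners` argument are hypothetical names, not part of this module.

```haskell
import qualified Data.Map.Strict as Map

type SourceId = Int

-- Stand-in for a raw upstream document: version key -> raw version object.
type RawDoc = Map.Map String String

-- Hypothetical replay step: for each surviving version, copy the raw object
-- from the input at the winning SourceId (its 0-based list position).
-- A survivor whose source or version cannot be found is left out.
replay :: [RawDoc] -> Map.Map String SourceId -> Map.Map String String
replay raws winners = Map.mapMaybeWithKey pickRaw winners
  where
    indexed = Map.fromList (zip [0 ..] raws)
    pickRaw version sid = Map.lookup sid indexed >>= Map.lookup version
```

With two inputs that share a provenance, the index still names exactly one raw document, which is the case a bare Provenance tag could not distinguish.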
data MergePlan Source #
The outcome of reasoning over a set of upstream packuments: a plan the
serve layer replays onto the raw upstream Values to assemble the lossless
served body. It carries exactly the decisions the merge owns -- never a finished,
re-serialisable document (see this module's header, "Decision surface, not served
surface").
Constructors
| MergePlan | |
Fields
| |
data Divergence Source #
A detected integrity conflict: a version key present in more than one source whose copies contradict on a shared algorithm -- an algorithm both expose carries disagreeing digests. The trusted copy wins the merge; this record preserves both fingerprints so the caller can log, meter, and decide policy (serve-with-private-winning vs fail-closed). It is the merge's supply-chain signal -- surfaced, never silently reconciled.
Ord is derived purely to let MergePlan carry divergences as a Set: the
ordering is structural (over the version key and the two fingerprints) and has no
meaning beyond deduplication and a stable presentation.
Constructors
| Divergence | |
Fields
| |
Instances
| Show Divergence Source # | |
Defined in Ecluse.Core.Package.Merge Methods showsPrec :: Int -> Divergence -> ShowS # show :: Divergence -> String # showList :: [Divergence] -> ShowS # | |
| Eq Divergence Source # | |
Defined in Ecluse.Core.Package.Merge | |
| Ord Divergence Source # | |
Defined in Ecluse.Core.Package.Merge Methods compare :: Divergence -> Divergence -> Ordering # (<) :: Divergence -> Divergence -> Bool # (<=) :: Divergence -> Divergence -> Bool # (>) :: Divergence -> Divergence -> Bool # (>=) :: Divergence -> Divergence -> Bool # max :: Divergence -> Divergence -> Divergence # min :: Divergence -> Divergence -> Divergence # | |
data IntegrityFingerprint Source #
An order-independent fingerprint of a version's artifact integrity: the sorted
multiset of (resolved algorithm, comparable digest body) pairs across all of the
version's artifacts. Each digest is keyed by the algorithm it asserts
(assertedAlg -- a hex Hash's tag, or
the algorithm an SRI string embeds), not by its raw HashAlg wrapper tag, so an
sha256-… SRI and a hex SHA-256 digest bucket together under SHA256 while an sha256-…
and an sha512-… SRI bucket apart. A digest that asserts no algorithm (a bare or
malformed SRI) keys under Nothing -- its own bucket -- so an unknown digest never merges
with a real algorithm (the fail-closed reading). The body is the comparable digest: an
SRI's base64 body (without its <alg>- prefix) or a hex digest's raw value, which is
uniform within any shared resolved algorithm, so comparing bodies is sound. The
comparison ignores artifact ordering and non-integrity fields (filename, URL, size) that
legitimately vary between mirrors of the same bytes.
Two copies diverge when they contradict on a shared resolved algorithm: an algorithm both assert carries disagreeing bodies. An asymmetric pair -- one copy asserting an algorithm the other omits, including a mirror that recomputed integrity under a different algorithm -- does not diverge on that account; only a shared resolved algorithm whose bodies disagree does. So a mirror serving a modern digest alongside a legacy one agrees with a mirror serving only the modern digest, as long as that shared digest matches.
Opaque so the comparison used for divergence detection cannot be sidestepped; read the
pairs back with integrityHashes when logging or metering a Divergence. Ord is
derived (structurally, over the sorted pairs) only so a Divergence may live in a
Set; it carries no domain meaning beyond that, and in particular is not the
divergence test (which is the shared-algorithm contradiction above, never structural
inequality of the whole set).
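The shared-algorithm divergence test described above can be sketched as follows. This is a simplified model, not the module's code: `Alg` is a hypothetical stand-in for HashAlg, a fingerprint is a plain list of pairs, and the sketch assumes that the Nothing bucket is compared like any other shared key.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Hypothetical stand-in for the module's HashAlg.
data Alg = SHA1 | SHA256 | SHA512
  deriving (Show, Eq, Ord)

-- A fingerprint modelled as (resolved algorithm, digest body) pairs.
type Fingerprint = [(Maybe Alg, String)]

-- Group digest bodies by the algorithm they assert.
buckets :: Fingerprint -> Map.Map (Maybe Alg) (Set.Set String)
buckets fp =
  Map.fromListWith Set.union [(alg, Set.singleton body) | (alg, body) <- fp]

-- Two copies diverge only if some algorithm present in both carries
-- different bodies. Algorithms present on one side only are ignored.
diverges :: Fingerprint -> Fingerprint -> Bool
diverges a b = or (Map.intersectionWith (/=) (buckets a) (buckets b))
```

Under this model a copy carrying SHA-512 plus a legacy SHA-1 digest agrees with a copy carrying only the same SHA-512 digest, while two copies with different SHA-512 bodies diverge.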
Instances
| Show IntegrityFingerprint Source # | |
Defined in Ecluse.Core.Package.Merge Methods showsPrec :: Int -> IntegrityFingerprint -> ShowS # show :: IntegrityFingerprint -> String # showList :: [IntegrityFingerprint] -> ShowS # | |
| Eq IntegrityFingerprint Source # | |
Defined in Ecluse.Core.Package.Merge Methods (==) :: IntegrityFingerprint -> IntegrityFingerprint -> Bool # (/=) :: IntegrityFingerprint -> IntegrityFingerprint -> Bool # | |
| Ord IntegrityFingerprint Source # | |
Defined in Ecluse.Core.Package.Merge Methods compare :: IntegrityFingerprint -> IntegrityFingerprint -> Ordering # (<) :: IntegrityFingerprint -> IntegrityFingerprint -> Bool # (<=) :: IntegrityFingerprint -> IntegrityFingerprint -> Bool # (>) :: IntegrityFingerprint -> IntegrityFingerprint -> Bool # (>=) :: IntegrityFingerprint -> IntegrityFingerprint -> Bool # max :: IntegrityFingerprint -> IntegrityFingerprint -> IntegrityFingerprint # min :: IntegrityFingerprint -> IntegrityFingerprint -> IntegrityFingerprint # | |
integrityHashes :: IntegrityFingerprint -> [(Maybe HashAlg, Text)] Source #
The (resolved algorithm, comparable digest body) pairs of a fingerprint, sorted,
for an audit trail. The algorithm is the one each digest asserts (Nothing when it
asserts none); the body is its comparable form (an SRI's base64 body, a hex digest's
raw value).
mergePackuments :: [(Provenance, PackageInfo)] -> Maybe MergePlan Source #
Reason over several upstream packuments, by Provenance, and emit the
MergePlan the serve layer replays onto the raw Values. Pure and total.
The merge is a fold that degenerates to the identity at one input: a single
packument yields a plan whose survivors are all of its versions (all won by source
0), with its tags and times reconciled and no divergences, so 0/1-upstream
deployments need no special case. It is realised as a foldMap of each input's
contribute into the lawful Merge Monoid, projected by planFrom. The model:
- Union by version key, with TrustedSource winning a collision over GatedSource (the private upstream is the authority). The winning input's SourceId is recorded for the survivor. A collision whose copies contradict on a shared integrity algorithm is recorded as a Divergence; the winner is still kept.
- dist-tags reconciled over the union. latest is resolved by selectLatest -- keep-unless-denied, stable-preferring, and unparseable-safe -- from the precedence-winning source's tagged latest and the surviving versions; any other tag pointing at a version absent from the union is dropped. Collisions on the same tag are resolved by provenance (trusted wins), consistent with the version fold, so the plan does not depend on caller input order.
- time reconstructed from the survivors: each survivor's publish instant is read off the same winning candidate whose manifest is served, so a version's served time always comes from the source that won its manifest, never fabricated from a different source. A winner with no known publish time contributes no entry.
The plan's identity (mpName) is carried from the contributions; callers fetch one
package across its upstreams and each contribution's name has been validated against
the requested one before reaching here, so all inputs share that one identity and it
is never a substituted value. An empty input list yields Nothing -- there is nothing
to serve.
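The tag and time reconciliation above can be sketched as follows. This is a simplified model under stated assumptions: `Winner`, `reconcileTags`, and `reconstructTime` are hypothetical names, timestamps are plain strings, and the special handling of latest by selectLatest is deliberately left out. Ties within one provenance are broken by comparing targets, purely to keep the sketch independent of input order.

```haskell
import qualified Data.Map.Strict as Map

data Provenance = TrustedSource | GatedSource
  deriving (Show, Eq, Ord)

-- A winning candidate: its provenance and its publish time, if known.
data Winner = Winner
  { winnerProvenance :: Provenance
  , winnerPublished :: Maybe String
  }
  deriving (Show, Eq)

-- Resolve each tag to the target offered by the most trusted source, then
-- drop tags whose target is not among the surviving versions.
reconcileTags
  :: Map.Map String Winner
  -> [(Provenance, Map.Map String String)]
  -> Map.Map String String
reconcileTags survivors offers =
  Map.filter (`Map.member` survivors) (Map.map snd ranked)
  where
    -- min on (provenance, target) makes the result order-independent.
    ranked = Map.unionsWith min [Map.map ((,) p) tags | (p, tags) <- offers]

-- Build the served time map from the winners only. A winner with no known
-- publish time contributes no entry.
reconstructTime :: Map.Map String Winner -> Map.Map String String
reconstructTime = Map.mapMaybe winnerPublished
```

Reading the publish time off the winner itself is what keeps a served time from ever being paired with a manifest that came from a different source.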
The merge accumulator
The merge is realised as a fold into a lawful Monoid. contribute turns one
(Provenance, PackageInfo) input into a Merge; (<>) combines two merges
(trusted-wins union, with order-independent divergence kept unresolved until the
projection); mempty is the empty merge (the degenerate identity). planFrom
projects a folded Merge to a MergePlan. mergePackuments is exactly
planFrom . foldMap (uncurry contribute). The Merge type is opaque --
build it only through contribute and mempty -- so a SourceId always names a
real input position. See the Semigroup instance for the law domain (associative
+ identity, intentionally not commutative, and why).
data Merge Source #
The monoidal accumulator the merge folds into. It holds, unresolved, every
candidate offered for every version key, plus the ranked dist-tags contributions;
resolution to a single winner per key, and the divergence set, happens once in
planFrom. The served time map needs no axis here: each version's publish instant
rides inside its Candidate (on candDetails), so planFrom reads it off the same
winner the manifest is taken from. Keeping candidates unresolved is what makes (<>)
associative: a pairwise winner-vs-loser decision taken during the fold is not
associative once three or more copies of a key collide, because divergence is a
property of the whole set of distinct fingerprints, not of any one step.
Each accumulator also carries the count of inputs it represents, so that (<>)
can __re-index the right operand's SourceIds by the left operand's input
count__. This positional re-indexing is what makes a SourceId name an input's
list position after a foldMap of single-input contributions -- and it is the sole
reason the instance is non-commutative (see the Semigroup instance).
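A toy model of the accumulator may help make the positional re-indexing concrete. It is a sketch under stated assumptions, not the real Merge: candidates carry only a provenance and a SourceId, `contribute` takes a bare list of version keys, and the hypothetical `winners` projection breaks ties within one provenance by the lower SourceId.

```haskell
import qualified Data.Map.Strict as Map

data Provenance = TrustedSource | GatedSource
  deriving (Show, Eq, Ord)

type SourceId = Int

-- Toy accumulator: how many inputs it represents, plus every unresolved
-- candidate offered for each version key.
data Merge = Merge
  { inputCount :: Int
  , candidates :: Map.Map String [(Provenance, SourceId)]
  }
  deriving (Show, Eq)

instance Semigroup Merge where
  Merge n xs <> Merge m ys =
    Merge (n + m) (Map.unionWith (++) xs (Map.map (map shift) ys))
    where
      -- Re-index the right operand past the left operand's inputs.
      shift (p, sid) = (p, sid + n)

instance Monoid Merge where
  mempty = Merge 0 Map.empty

-- One input's contribution, at local SourceId 0.
contribute :: Provenance -> [String] -> Merge
contribute p versions = Merge 1 (Map.fromList [(v, [(p, 0)]) | v <- versions])

-- Resolve each key once, after the fold: smallest (provenance, id) wins.
-- Every candidate list is non-empty by construction, so minimum is safe.
winners :: Merge -> Map.Map String (Provenance, SourceId)
winners = Map.map minimum . candidates
```

In this model `foldMap (uncurry contribute)` over a list of inputs leaves each candidate tagged with its list position. Swapping two operands changes the SourceIds but not the winning provenance per key, which is why the instance is associative with an identity yet not commutative.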
Instances
| Monoid Merge Source # | |
| Semigroup Merge Source # | The merge's
The order-independence guarantee, stated precisely (and the reason commutativity is
the wrong law): precedence is resolved by provenance, so the surviving key set
and the winning provenance per key are invariant under any permutation of the
inputs, and the value-level reconciliations (the survivor a key resolves to, the
divergence fingerprint-pairs, the |
| Show Merge Source # | |
| Eq Merge Source # | |
contribute :: Provenance -> PackageInfo -> Merge Source #
One input's contribution to the accumulator, at local SourceId 0: every
version becomes a candidate (carrying its own publish time on candDetails), every
dist-tags target a ranked value at this input's provenance, and the package name is
offered as the identity. foldMap contribute over the inputs then re-indexes each to
its list position via the Semigroup offset, so the absolute SourceId of a
single-input contribution is its index in the foldMap.
planFrom :: Merge -> Maybe MergePlan Source #
Project the resolved MergePlan from a folded Merge. Resolves each version
key to its precedence winner, derives the divergence Set from the shared-algorithm
contradictions among each key's distinct fingerprints, reconciles dist-tags over the
survivors, and reconstructs the served time map from each survivor's winning
candidate. Returns Nothing only for the empty merge (mempty), which has no name and
so nothing to serve; equivalently, the empty input list.