| Safe Haskell | None |
|---|---|
| Language | GHC2021 |
Ecluse.Core.Registry.Npm.Project
Description
Projection of npm wire JSON into the ecosystem-agnostic domain model.
This module is the second half of the npm protocol boundary. Where
Ecluse.Core.Registry.Npm.Wire captures what the registry said as faithful wire
types, this module turns those into the domain vocabulary of Ecluse.Core.Package --
PackageInfo (the packument-level view) and PackageDetails (the per-version
snapshot the rules engine evaluates). Together they realise the parse* fields
of the Ecluse.Core.Registry handle: nothing above the adapter ever sees npm wire
data.
The projection is pure and total (it returns Either ParseError and never
throws). It is the execution half of "parse, don't validate": once a response has
been projected, downstream code holds precise domain types and never re-inspects
the wire shape.
Per-version graceful degradation
The versions, dist-tags, and time maps are decoded element-wise: a
version whose manifest is missing or malformed in a required/security-decisive
field (no dist or tarball, an unusable version), a dist-tags entry whose
value is not a string, or a time entry that is not a decodable instant is
dropped rather than failing the whole packument. Because presence in the
decision surface is what makes a version a serve-candidate, a dropped version is
automatically never served -- fail-closed for that one version (a version that
cannot be decoded cannot be evaluated for integrity, CVEs, or rules) while every
healthy version still resolves; a dropped date is simply a version with no known
publish time, and a dropped tag loses only that one tag. Only a document whose
top-level structure is unusable (a versions field that is not an object, an
absent or empty name) is denied wholesale. A version's purely advisory fields
degrade in the wire layer (Ecluse.Core.Registry.Npm.Wire) without dropping the
version. Every drop is recorded as an InvalidEntry in
infoInvalidEntries (a version-manifest, dist-tag, or
publish-time drop, each carrying its key and reason), so the serve path can log
what an upstream served malformed rather than dropping it silently.
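A minimal sketch of the element-wise idea follows. It assumes hypothetical, simplified types (RawVersion, Details, and a one-constructor InvalidEntry) in place of the real wire and domain types, with String standing in for Text. Each entry decodes on its own, and a failure becomes a recorded drop rather than a whole-document error.

```haskell
import Data.Either (partitionEithers)

-- Hypothetical, simplified stand-ins for the real wire and domain types.
data RawVersion = RawVersion
  { rawVersion :: String       -- the manifest's "version" field ("" = unusable)
  , rawTarball :: Maybe String -- dist.tarball (Nothing = dist or tarball missing)
  }
  deriving (Show, Eq)

data Details = Details
  { detVersion :: String
  , detTarball :: String
  }
  deriving (Show, Eq)

-- Only the version-manifest drop kind is modelled here; dist-tag and
-- publish-time drops would be further constructors.
data InvalidEntry = InvalidVersionManifest String String -- key, reason
  deriving (Show, Eq)

-- Decode one versions entry; Left records the key and why it was dropped.
decodeEntry :: (String, RawVersion) -> Either InvalidEntry Details
decodeEntry (key, raw)
  | null (rawVersion raw) = Left (InvalidVersionManifest key "unusable version")
  | otherwise = case rawTarball raw of
      Nothing -> Left (InvalidVersionManifest key "missing dist.tarball")
      Just url -> Right (Details (rawVersion raw) url)

-- Element-wise projection: every healthy version survives and every drop is
-- recorded, instead of one bad entry failing the whole packument.
projectVersions :: [(String, RawVersion)] -> ([Details], [InvalidEntry])
projectVersions entries = (kept, dropped)
  where
    (dropped, kept) = partitionEithers (map decodeEntry entries)
```

A dropped version is simply absent from the kept list, so anything that selects serve-candidates from that list can never pick it.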
Signal mapping
The npm-specific fields collapse onto the normalised, ecosystem-blind signals:
- install-script presence → CodeExecSignal, read fail-closed across two independent wire signals. A version runs code on install when either the abbreviated form's hasInstallScript flag is true or the scripts map declares any of preinstall/install/postinstall (matching what npm itself sets the flag from). The two fields are independent on the wire, so the scripts map is consulted even when hasInstallScript is present and false: a hostile upstream must not be able to mask a real install hook by lying in the sibling flag, so a declared script is authoritative and the signal is the union of the two, never the flag overriding a script. A version with neither signal maps to NoCodeOnInstall (both metadata forms always carry the scripts/hasInstallScript information, so its absence is a determination, not an unknown).
- deprecated → Availability: a notice yields Deprecated (carrying the message), and its absence yields Available. npm has no per-version yank, so Yanked never arises here.
- dist → a single-element NonEmpty of Artifact (npm publishes exactly one tarball per version). Both integrity digests survive when present and well-formed: dist.shasum as a SHA1Hash and dist.integrity as an SRIHash. Carrying both is load-bearing -- a cross-upstream merge compares the same version's integrity across the private and public registries to detect a supply-chain divergence, which dropping either digest would blind. Each digest is built through the validating mkHash, so a malformed one -- empty ("shasum":"" / "integrity":""), truncated, non-hex, or bad-base64 -- is unconstructable and so treated as absent, never as a degenerate Hash: a digest that ties the version to no tamper-evident fingerprint must not slip past the public-integrity admission gate.
- _npmUser → pkgPublisher (who pushed this version -- provenance). It rides on the version object but is not modelled by the wire manifest, so the projection reads it directly from the version object here.
- time[version] → pkgPublishedAt. The publish timestamp lives in the packument's time map, not the manifest; a version with no time entry (or an abbreviated document, which omits time) projects to Nothing.
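The install-script union, the deprecated mapping, and the digest validation can be sketched as below. The types are hypothetical simplifications: RunsCodeOnInstall is an invented name for the positive CodeExecSignal case, scripts is a Data.Map from hook name to command, and mkSHA1Hex is a stand-in for the validating mkHash, restricted to a 40-hex-digit shasum.

```haskell
import Data.Char (isHexDigit)
import qualified Data.Map.Strict as Map

-- Hypothetical simplified signal and availability types.
data CodeExecSignal = RunsCodeOnInstall | NoCodeOnInstall
  deriving (Show, Eq)

data Availability = Available | Deprecated String
  deriving (Show, Eq)

newtype SHA1Hex = SHA1Hex String
  deriving (Show, Eq)

installHooks :: [String]
installHooks = ["preinstall", "install", "postinstall"]

-- Union of the two wire signals: a declared hook wins even when the
-- hasInstallScript flag is present and False.
codeExecSignal :: Maybe Bool -> Map.Map String String -> CodeExecSignal
codeExecSignal hasInstallScript scripts
  | hasInstallScript == Just True = RunsCodeOnInstall
  | any (`Map.member` scripts) installHooks = RunsCodeOnInstall
  | otherwise = NoCodeOnInstall

-- A deprecation notice carries its message; no notice means Available.
availability :: Maybe String -> Availability
availability = maybe Available Deprecated

-- Validating constructor: anything other than 40 hex digits (including "")
-- is unconstructable, so the caller treats the digest as absent.
mkSHA1Hex :: String -> Maybe SHA1Hex
mkSHA1Hex s
  | length s == 40 && all isHexDigit s = Just (SHA1Hex s)
  | otherwise = Nothing
```

Ordering the guards flag-first is only a shortcut; because the scripts check runs whenever the flag is not Just True, a False flag can never hide a hook.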
Trust is left TrustUnknown: establishing it needs signature verification
against npm's published keys, a fetch this pure projection does not perform.
Name as a validation input
The requested PackageName -- the identity the proxy resolved from the route -- is
the validation authority for the served packument's name, never a rewrite of
it. The packument projection takes the requested name and checks the upstream's
self-reported top-level name against it: a document whose self-report agrees is a
Projected PackageInfo carrying the name the upstream genuinely reported; a
document whose self-report disagrees is a NameMismatch, so the caller can
treat that origin as untrusted for this request and drop its contribution. The
served name is therefore always a value an upstream genuinely reported, never a
substituted or manufactured one. An absent or otherwise undecodable name is
still a ParseError, distinct from a present-but-different name.
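The check can be sketched with hypothetical simplified types, String in place of Text, and a toPackageName stand-in for projectName that rejects only the empty name. The request is compared against the parsed self-report, and a match carries the reported name rather than the requested one.

```haskell
-- Hypothetical simplified types; String stands in for Text.
newtype PackageName = PackageName String deriving (Show, Eq)
newtype ParseError = ParseError String deriving (Show, Eq)
newtype PackageInfo = PackageInfo {infoName :: PackageName} deriving (Show, Eq)

data Projection = Projected PackageInfo | NameMismatch String
  deriving (Show, Eq)

-- Stand-in for projectName: only the empty name is rejected.
toPackageName :: String -> Either ParseError PackageName
toPackageName "" = Left (ParseError "empty name")
toPackageName s = Right (PackageName s)

-- The requested name validates the self-reported one; it never replaces it.
validateName :: PackageName -> Maybe String -> Either ParseError Projection
validateName _ Nothing = Left (ParseError "absent name")
validateName requested (Just reported) = do
  name <- toPackageName reported
  pure $
    if name == requested
      then Projected (PackageInfo name) -- the reported name (equal to the request)
      else NameMismatch reported         -- verbatim, for the audit log
```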
Synopsis
- parsePackageInfo :: PackageName -> RegistryResponse -> Either ParseError PackageInfo
- parsePackageInfoFromValue :: PackageName -> Value -> Either ParseError Projection
- parseVersionDetails :: RegistryResponse -> Version -> Either ParseError PackageDetails
- parseVersionList :: RegistryResponse -> Either ParseError [Version]
- projectVersionEntry :: PackageName -> Version -> Maybe UTCTime -> Value -> Maybe PackageDetails
- enforceTarballScheme :: Text -> PackageInfo -> PackageInfo
- enforceTarballSchemeDetails :: Text -> PackageDetails -> Maybe PackageDetails
- data Projection
- projectName :: Text -> Either ParseError PackageName
Projection
parsePackageInfo :: PackageName -> RegistryResponse -> Either ParseError PackageInfo
Project a fetched metadata response into the packument-level PackageInfo for
the requested package. Pure and total: a body that is not a decodable npm packument
is reported as a ParseError, never thrown.
The requested name is the validation authority. A document whose self-reported name
disagrees with the request cannot yield a valid view of the requested package,
so it is reported as a ParseError here -- the typed-view accessor admits only a
matching document. The finer Projection (a mismatch distinguished from a decode
failure) is surfaced by parsePackageInfoFromValue, which the serve layer uses to
distinguish a misreporting origin from an undecodable one.
parsePackageInfoFromValue :: PackageName -> Value -> Either ParseError Projection
Project an already-decoded packument Value into a Projection for the
requested package, without re-parsing any bytes. This is the entry point the serve
layer uses when it has already decoded the upstream body to a raw Value (the
document it edits in place to serve) and wants the typed view of the same
document: projecting from the Value reuses that one parse rather than tokenising
the bytes a second time. Pure and total -- a Value that is not a decodable npm
packument is reported as a ParseError, never thrown.
The requested name validates the self-reported name: a match is Projected, a
disagreement is NameMismatch. The serve layer drops a NameMismatch origin's
contribution (an untrusted, misreporting upstream) and keeps the served name a value
some upstream genuinely reported.
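The relationship between the two entry points can be sketched with hypothetical simplified types. collapse shows one way a typed-view accessor like parsePackageInfo could fold a NameMismatch into a ParseError. trustedContributions shows one way a serve layer could keep matching origins while setting aside misreporting ones for the audit log. The Origin record and both function names are invented for illustration.

```haskell
-- Hypothetical simplified types; String stands in for Text.
newtype PackageInfo = PackageInfo {infoName :: String} deriving (Show, Eq)
newtype ParseError = ParseError String deriving (Show, Eq)

data Projection = Projected PackageInfo | NameMismatch String
  deriving (Show, Eq)

-- Typed-view accessor: only a matching document yields a PackageInfo.
collapse :: Either ParseError Projection -> Either ParseError PackageInfo
collapse result = result >>= toView
  where
    toView (Projected info) = Right info
    toView (NameMismatch other) =
      Left (ParseError ("self-reported name mismatch: " ++ other))

-- One upstream's outcome for the current request.
data Origin = Origin
  { originUrl :: String
  , originResult :: Either ParseError Projection
  }

-- Keep matching origins; list misreporting ones with the verbatim name.
-- Undecodable origins fall through both comprehensions; how the real serve
-- layer reports them is not modelled here.
trustedContributions :: [Origin] -> ([PackageInfo], [(String, String)])
trustedContributions origins =
  ( [info | Origin _ (Right (Projected info)) <- origins]
  , [(url, other) | Origin url (Right (NameMismatch other)) <- origins]
  )
```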
parseVersionDetails :: RegistryResponse -> Version -> Either ParseError PackageDetails
Project a fetched metadata response into the PackageDetails for a single
version. Fails with a ParseError if the body does not decode or the requested
version is absent from the packument.
parseVersionList :: RegistryResponse -> Either ParseError [Version]
Extract the list of available versions from a fetched metadata response, in
the packument's versions key order. Fails with a ParseError only if the body
does not decode.
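Both accessors can be sketched over an already-decoded versions object. The association list is an assumption made so the sketch preserves key order (a Data.Map would sort its keys). versionListOf and versionDetailsOf are hypothetical names, not the module's API.

```haskell
-- Hypothetical simplified types; String stands in for Version and Text.
newtype ParseError = ParseError String deriving (Show, Eq)

type Version = String

data Details = Details {detVersion :: Version, detTarball :: String}
  deriving (Show, Eq)

-- The decoded body: Left if it did not decode, else versions in key order.
type Decoded = Either ParseError [(Version, Details)]

-- Fails only when the body did not decode.
versionListOf :: Decoded -> Either ParseError [Version]
versionListOf = fmap (map fst)

-- Additionally fails when the requested version is absent.
versionDetailsOf :: Decoded -> Version -> Either ParseError Details
versionDetailsOf decoded v = decoded >>= \versions ->
  maybe (Left (ParseError ("version not in packument: " ++ v))) Right (lookup v versions)
```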
projectVersionEntry :: PackageName -> Version -> Maybe UTCTime -> Value -> Maybe PackageDetails
Project a single version object -- one entry of a packument's versions map,
as a raw Value -- into its PackageDetails, given the requested package name, the
version key it sits under, and its publish time (the packument's time[version], if
present). Nothing when the version object does not decode in a required/security-
decisive field, exactly the per-version drop the full packument projection applies.
This is the per-version projection step factored out so a selective single-version
decode (see Ecluse.Core.Registry.Npm.SelectiveDecode), which extracts only the one
version object and its publish time from the packument bytes, projects it through the
same code the whole-packument path runs over every version -- so the resulting
PackageDetails is identical to the one obtained by lookup-ing the version out of a full
parsePackageInfo. The element-wise leniency is identical too: a version object missing
its dist/tarball (or otherwise unprojectable) yields Nothing, i.e. a genuine
absence, never a half-built snapshot.
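The factoring can be sketched with hypothetical stand-ins (RawVersion for the raw version Value, String for UTCTime, and invented names projectEntry, projectAll, and projectOne). Because the whole-packument path and the selective path call the same projectEntry, the selective result for a version equals what a lookup in the full result would give.

```haskell
import Data.Maybe (mapMaybe)

-- Hypothetical stand-ins; String replaces UTCTime and Text.
newtype RawVersion = RawVersion {rawTarball :: Maybe String}
  deriving (Show, Eq)

data Details = Details
  { detVersion :: String
  , detTarball :: String
  , detPublishedAt :: Maybe String
  }
  deriving (Show, Eq)

-- The single per-version step both paths share (the role of projectVersionEntry).
projectEntry :: String -> Maybe String -> RawVersion -> Maybe Details
projectEntry v publishedAt raw = do
  url <- rawTarball raw -- no dist.tarball: a genuine absence, never a half-built snapshot
  pure (Details v url publishedAt)

-- Whole-packument path: every version through projectEntry with its time entry.
projectAll :: [(String, RawVersion)] -> [(String, String)] -> [Details]
projectAll versions times =
  mapMaybe (\(v, raw) -> projectEntry v (lookup v times) raw) versions

-- Selective path: one extracted version object and its publish time.
projectOne :: String -> Maybe String -> RawVersion -> Maybe Details
projectOne = projectEntry
```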
Egress-scheme normalisation
enforceTarballScheme :: Text -> PackageInfo -> PackageInfo
Normalise every served version's dist.tarball scheme against the https-only
egress policy (resolveTarballUrl), given the
upstreamBaseUrl the packument was served from. An https tarball is kept, a same-host
http tarball is upgraded to https, and a version whose tarball is http on a
foreign host (or any non-http(s) URL) is dropped from the served set and recorded as
an InvalidVersionManifest carrying the offending URL (the #486
drop-and-record contract), so the version is never dialled in plaintext and the drop is
observable.
The enforcement applies only when the upstream is https (in production every configured upstream is https by construction); a non-https upstream is the test/dev loopback opt-in, whose tarballs are left untouched. It runs as a projection post-step at the fetch boundary, where the upstream URL is known, so the projection itself stays context-free.
enforceTarballSchemeDetails :: Text -> PackageDetails -> Maybe PackageDetails
The single-version form of enforceTarballScheme for the selective decode path:
Nothing drops the version (its dist.tarball is non-https and not upgradeable), a
Just carries the version with each artifact's URL normalised to https. A non-https
(test/dev loopback) upstream leaves the version untouched.
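The egress rules can be sketched as pure functions over URL strings. Everything here is hypothetical. splitUrl is a deliberately naive parser (no userinfo, case-folding, or default-port handling), and Decision stands in for whatever resolveTarballUrl returns. enforceDetails and enforceAll only mirror the shape of enforceTarballSchemeDetails and enforceTarballScheme, recording drops as plain (version, URL) pairs rather than InvalidVersionManifest values.

```haskell
import Data.Foldable (toList)
import Data.List.NonEmpty (NonEmpty (..))

-- Hypothetical: one version with its artifact URLs (npm: exactly one).
data Details = Details
  { detVersion :: String
  , detTarballs :: NonEmpty String
  }
  deriving (Show, Eq)

data Decision = Keep String | Drop String
  deriving (Show, Eq)

-- Naive "scheme://host/path" split; the host (with any port) is compared
-- verbatim. A real implementation would use a proper URL parser.
splitUrl :: String -> Maybe (String, String, String)
splitUrl url = case break (== ':') url of
  (scheme, ':' : '/' : '/' : rest) ->
    let (host, path) = break (== '/') rest in Just (scheme, host, path)
  _ -> Nothing

upstreamIsHttps :: String -> Bool
upstreamIsHttps upstream = case splitUrl upstream of
  Just ("https", _, _) -> True
  _ -> False

-- Per-URL decision: https kept, same-host http upgraded, anything else dropped.
decideTarball :: String -> String -> Decision
decideTarball upstream tarball = case (splitUrl upstream, splitUrl tarball) of
  (Just _, Just ("https", _, _)) -> Keep tarball
  (Just (_, upHost, _), Just ("http", host, path))
    | host == upHost -> Keep ("https://" ++ host ++ path)
  _ -> Drop tarball

-- Single-version form: Nothing drops the version if any artifact is unfixable.
enforceDetails :: String -> Details -> Maybe Details
enforceDetails upstream d
  | not (upstreamIsHttps upstream) = Just d -- test/dev loopback: untouched
  | otherwise = do
      urls <- traverse (asMaybe . decideTarball upstream) (detTarballs d)
      pure (d {detTarballs = urls})
  where
    asMaybe (Keep u) = Just u
    asMaybe (Drop _) = Nothing

-- Packument form: keep what survives and record each offending URL.
enforceAll :: String -> [Details] -> ([Details], [(String, String)])
enforceAll upstream ds =
  ( [d' | Just d' <- map (enforceDetails upstream) ds]
  , [ (detVersion d, u)
    | upstreamIsHttps upstream
    , d <- ds
    , Drop u <- map (decideTarball upstream) (toList (detTarballs d))
    ]
  )
```

traverse over the NonEmpty fails as soon as any artifact is dropped, which gives the all-or-nothing per-version behaviour the text describes.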
Name validation
data Projection
The outcome of projecting an upstream packument against the requested package name (see the module header, "Name as a validation input").
The requested name validates the document; it never rewrites it. A document whose
self-reported name agrees with the request is Projected; one that disagrees is a
NameMismatch. The PackageInfo of a Projected carries the name the upstream
genuinely reported (which, having matched, equals the requested name) -- never a
substituted value.
Constructors
| Constructor | Meaning |
|---|---|
| Projected PackageInfo | The document decoded and its self-reported name matched the request. |
| NameMismatch Text | The document decoded but self-reported this different name (carried verbatim for the audit log). |
Instances
| Instance | Defined in |
|---|---|
| Show Projection | Ecluse.Core.Registry.Npm.Project |
| Eq Projection | Ecluse.Core.Registry.Npm.Project |
projectName :: Text -> Either ParseError PackageName
Parse an npm package name into the domain PackageName, splitting a scoped
@scope/name into its Scope and bare name. Fails with a ParseError on an
empty name; a non-scoped or well-formed scoped name always succeeds.
This is the npm name canonicaliser: equality on the resulting PackageName is
ecosystem-aware (npm is case-sensitive), so it is the agreement test used by both the
read path (an upstream's self-reported name against the request) and the publish path
(a document body's declared _id/name/versions[].name against the URL-path name).
Neither path falls back to a byte-for-byte string compare, so an encoding variant of
the same name cannot disagree silently.
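The scope split and the comparison on parsed values can be sketched as follows, assuming a hypothetical PackageName record, String in place of Text, and an invented projectNameSketch. The derived Eq compares scope and bare name exactly, which is case-sensitive, as npm is. How the real projectName handles an "@" name without a "/" is not stated here, so the fallback below is an assumption.

```haskell
-- Hypothetical simplified types; String stands in for Text.
newtype ParseError = ParseError String deriving (Show, Eq)
newtype Scope = Scope String deriving (Show, Eq)

data PackageName = PackageName
  { pnScope :: Maybe Scope
  , pnBare :: String
  }
  deriving (Show, Eq)

-- Sketch of the documented contract: fail only on the empty name, split
-- "@scope/name" into scope and bare name, otherwise an unscoped name.
projectNameSketch :: String -> Either ParseError PackageName
projectNameSketch "" = Left (ParseError "empty package name")
projectNameSketch full@('@' : rest) =
  case break (== '/') rest of
    (scope, '/' : bare) -> Right (PackageName (Just (Scope scope)) bare)
    -- Assumption: an '@' name with no '/' is kept whole as an unscoped name.
    _ -> Right (PackageName Nothing full)
projectNameSketch bare = Right (PackageName Nothing bare)
```

Agreement checks then compare the two Right values with ==, so they see scope and name as separate fields rather than as one raw string.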