| Safe Haskell | None |
|---|---|
| Language | GHC2021 |
Ecluse.Core.Registry.Npm.SelectiveDecode
Description
A selective decode of an npm packument: pull one version's pieces out of the document bytes without materialising the other versions.
The whole-packument decode (aeson's eitherDecodeStrict) builds a Value for every
version -- and on a heavy packument (thousands of versions, multiple megabytes) that
decode dominates the serve-path cost. But the tarball gate consults a single
version: it needs that version's manifest object, its time[version] publish stamp, and
the document's self-reported name -- nothing of the other versions. This module walks
the registry's own JSON token stream (aeson's Data.Aeson.Decoding, no new
dependency) and materialises a Value only for those few pieces, __skipping every other
version's tokens without allocating them__. The win is on the parse, not the fetch:
the full bytes are still read (npm carries time only in the full document), but they
are parsed selectively -- O(1 version) work and residency rather than O(N).
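The selective walk described above can be sketched over aeson's token types. This is a minimal illustration, not this module's implementation: `skipValue` and `selectTopLevel` are hypothetical names, and the sketch picks out a single top-level key rather than a version inside `versions`. It assumes an aeson release that ships `Data.Aeson.Decoding` (2.2 or later).

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson (Value (..))
import Data.Aeson.Decoding (toEitherValue)
import Data.Aeson.Decoding.ByteString (bsToTokens)
import Data.Aeson.Decoding.Tokens (TkArray (..), TkRecord (..), Tokens (..))
import Data.Aeson.Key (Key)
import Data.ByteString (ByteString)
import qualified Data.ByteString as BS

-- | Consume one value's tokens and return the continuation, building no Value.
-- (Hypothetical helper; lexer errors still surface as Left.)
skipValue :: Tokens k e -> Either e k
skipValue (TkLit _ k) = Right k
skipValue (TkText _ k) = Right k
skipValue (TkNumber _ k) = Right k
skipValue (TkArrayOpen arr) = skipArray arr
skipValue (TkRecordOpen r) = skipRecord r
skipValue (TkErr e) = Left e

skipArray :: TkArray k e -> Either e k
skipArray (TkItem toks) = skipValue toks >>= skipArray
skipArray (TkArrayEnd k) = Right k
skipArray (TkArrayErr e) = Left e

skipRecord :: TkRecord k e -> Either e k
skipRecord (TkPair _ toks) = skipValue toks >>= skipRecord
skipRecord (TkRecordEnd k) = Right k
skipRecord (TkRecordErr e) = Left e

-- | Materialise the Value under one top-level key and skip every other key.
-- The whole stream is still consumed, and trailing non-whitespace is rejected.
selectTopLevel :: Key -> ByteString -> Either String (Maybe Value)
selectTopLevel wanted bs = case bsToTokens bs of
  TkRecordOpen r -> go Nothing r
  TkErr e -> Left e
  _ -> Left "top-level value is not an object"
  where
    go found (TkPair k toks)
      | k == wanted = do
          (v, rest) <- toEitherValue toks -- the ordinary Value decoder
          go (Just v) rest
      | otherwise = skipValue toks >>= go found
    go found (TkRecordEnd rest)
      | BS.all isJsonSpace rest = Right found
      | otherwise = Left "trailing non-whitespace after the top-level object"
    go _ (TkRecordErr e) = Left e

    -- The four JSON whitespace bytes: space, LF, CR, tab.
    isJsonSpace w = w == 0x20 || w == 0x0a || w == 0x0d || w == 0x09
```

Because the wanted key goes through `toEitherValue`, the piece that is built comes from the same Value decoder a whole-document decode would use; only the other keys are walked without being turned into a Value.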
Faithful to the whole-document decode
The skip is not a shortcut past validation. The walk consumes the entire token stream, so:
- malformed JSON anywhere surfaces as SelectiveUndecodable -- the lexer reaches the offending bytes whether or not they sit in the requested version (matching eitherDecodeStrict failing the whole body);
- trailing non-whitespace after the top-level object is rejected likewise (the same end-of-input check eitherDecodeStrict applies);
- every value is depth-bounded at the same budget checkNestingDepth would apply to it, so a deeply-nested sub-tree anywhere is a SelectiveTooDeeplyNested breach, not a serve.
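The depth bound on skipped values can be pictured as a skip that carries a budget. This is a sketch under stated assumptions: `WalkError`, `skipBounded`, and `walkBounded` are hypothetical names, and the counting rule (each array or object consumes one level, scalars consume none) is an assumption that may differ from what checkNestingDepth actually counts.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson.Decoding.ByteString (bsToTokens)
import Data.Aeson.Decoding.Tokens (TkArray (..), TkRecord (..), Tokens (..))
import Data.ByteString (ByteString)

-- | Hypothetical stand-in for the two refusal causes.
data WalkError = Malformed String | TooDeep
  deriving (Eq, Show)

-- | Skip one value without building it, refusing once the container
-- budget is exhausted. Assumed rule: every array/object costs one level.
skipBounded :: Int -> Tokens k String -> Either WalkError k
skipBounded _ (TkLit _ k) = Right k
skipBounded _ (TkText _ k) = Right k
skipBounded _ (TkNumber _ k) = Right k
skipBounded budget (TkArrayOpen arr)
  | budget <= 0 = Left TooDeep
  | otherwise = skipArr (budget - 1) arr
skipBounded budget (TkRecordOpen r)
  | budget <= 0 = Left TooDeep
  | otherwise = skipRec (budget - 1) r
skipBounded _ (TkErr e) = Left (Malformed e)

skipArr :: Int -> TkArray k String -> Either WalkError k
skipArr budget (TkItem toks) = skipBounded budget toks >>= skipArr budget
skipArr _ (TkArrayEnd k) = Right k
skipArr _ (TkArrayErr e) = Left (Malformed e)

skipRec :: Int -> TkRecord k String -> Either WalkError k
skipRec budget (TkPair _ toks) = skipBounded budget toks >>= skipRec budget
skipRec _ (TkRecordEnd k) = Right k
skipRec _ (TkRecordErr e) = Left (Malformed e)

-- | Walk a whole body under the budget (trailing-input check omitted here).
walkBounded :: Int -> ByteString -> Either WalkError ()
walkBounded budget bs = () <$ skipBounded budget (bsToTokens bs)
```

Under that counting rule, `walkBounded 2 "[[1]]"` succeeds and `walkBounded 1 "[[1]]"` is `Left TooDeep`, even though no Value is ever built for the nested array.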
The two pieces it does build -- the requested version object and the document name --
are produced by the same aeson Value decoder the whole-document path uses, so
projecting them yields a byte-for-byte identical PackageDetails
(the projection is Ecluse.Core.Registry.Npm.Project.projectVersionEntry, run over the
same Value).
What it deliberately does not re-validate
The selective walk inspects only the requested version's time entry: structurally
malformed JSON anywhere is still SelectiveUndecodable (the lexer reaches it), but a
schema-invalid sibling (a non-ISO time string for another version, a non-string
dist-tags value) is skipped unallocated and never inspected. The whole-document
decode degrades the same way: it drops a malformed time/dist-tags entry per-entry
(graceful per-entry degradation) rather than failing the document, so neither path
refuses a sound version over an unrelated sibling malformation. The two paths agree on
what is served (the one sound version, identically projected) and differ only in
tracking: the whole-document projection records each dropped sibling as an
InvalidEntry for the serve-path log, while this walk, skipping the
siblings unallocated, cannot report them (the degenerate tracking a single-version read
inherently has). The requested version's own schema-invalid stamp folds, on both paths,
to a version with no known publish time (the projecting caller's lenient parse), never a
document failure.
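The two kinds of degradation described here can be illustrated with aeson's own UTCTime parser as a stand-in for the projection's lenient parse. The names `lenientPublishTime` and `splitTimes` are hypothetical, and the real projection may accept a different set of timestamp formats; this only shows the shape of "fold to no publish time" versus "drop and record per entry".

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson (FromJSON (parseJSON), Value (..))
import Data.Aeson.Key (Key)
import qualified Data.Aeson.KeyMap as KM
import Data.Aeson.Types (parseMaybe)
import Data.Either (partitionEithers)
import Data.Time.Clock (UTCTime)

-- | Single-version view: an absent or schema-invalid stamp becomes
-- "no known publish time" (Nothing), never a failure of the document.
lenientPublishTime :: Maybe Value -> Maybe UTCTime
lenientPublishTime mv = mv >>= parseMaybe parseJSON

-- | Whole-document view: every time entry is inspected, so the invalid
-- ones can be reported (left) alongside the parsed ones (right).
splitTimes :: KM.KeyMap Value -> ([Key], [(Key, UTCTime)])
splitTimes km =
  partitionEithers
    [ maybe (Left k) (\t -> Right (k, t)) (parseMaybe parseJSON v)
    | (k, v) <- KM.toList km
    ]
```

A single-version read only ever calls something like `lenientPublishTime` on one entry, which is why it has nothing to report about the siblings that `splitTimes` would list as dropped.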
The selective decode
data SelectedVersion
The pieces a selective decode pulls out of a packument for one requested version:
the document's self-reported name, the requested version's manifest object and publish
stamp (each as the raw Value the same projection the whole-document path uses then
consumes), and the raw number of entries in the versions object.
Each value field is Nothing when its key is absent from the document, so the caller
reproduces the whole-document outcome: an absent name is the empty-name decode failure,
an absent version object is a genuine miss, an absent time entry is a version with no
known publish stamp. The svVersionCount is the count the caller bounds against
maxVersionCount.
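How a caller might read those Nothing fields can be sketched as follows. Everything here is hypothetical: the record `SelectedSketch`, its field names, the `Outcome` type, the `Int` count, and the order of the checks are assumptions made for illustration; only the meaning attached to each absent field is taken from the description above.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson (Value (..))

-- | Hypothetical mirror of the selected pieces (not the real record).
data SelectedSketch = SelectedSketch
  { skName :: Maybe Value          -- document's self-reported name
  , skVersionObject :: Maybe Value -- requested version's manifest object
  , skPublishStamp :: Maybe Value  -- requested version's time entry
  , skVersionCount :: Int          -- raw number of entries in versions
  }

-- | Hypothetical caller-side outcomes.
data Outcome
  = TooManyVersions           -- count exceeds the caller's bound
  | EmptyNameFailure          -- absent name: the empty-name decode failure
  | VersionMiss               -- absent version object: a genuine miss
  | Found Value (Maybe Value) -- manifest plus optional raw publish stamp
  deriving (Eq, Show)

-- | Map the absent fields onto outcomes (check order is an assumption).
interpret :: Int -> SelectedSketch -> Outcome
interpret maxCount sel
  | skVersionCount sel > maxCount = TooManyVersions
  | otherwise = case (skName sel, skVersionObject sel) of
      (Nothing, _) -> EmptyNameFailure
      (_, Nothing) -> VersionMiss
      (Just _, Just obj) -> Found obj (skPublishStamp sel)
```

An absent time entry is not an outcome of its own in this sketch: it stays as `Nothing` inside `Found`, matching "a version with no known publish stamp".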
Constructors
| SelectedVersion | |
Fields
| |
Instances
| Show SelectedVersion | Defined in Ecluse.Core.Registry.Npm.SelectiveDecode. Methods: showsPrec :: Int -> SelectedVersion -> ShowS; show :: SelectedVersion -> String; showList :: [SelectedVersion] -> ShowS |
| Eq SelectedVersion | Defined in Ecluse.Core.Registry.Npm.SelectiveDecode. Methods: (==) :: SelectedVersion -> SelectedVersion -> Bool; (/=) :: SelectedVersion -> SelectedVersion -> Bool |
data SelectiveError
Why a selective decode could not yield a SelectedVersion -- the two refusal causes
the whole-document decode would also raise, so the caller maps them onto the same
MetadataError the full path does.
Constructors
| SelectiveUndecodable | The body was not a well-formed JSON object (or carried trailing non-whitespace). |
| SelectiveTooDeeplyNested | Some value nested deeper than the depth budget allowed. |
Instances
| Show SelectiveError | Defined in Ecluse.Core.Registry.Npm.SelectiveDecode. Methods: showsPrec :: Int -> SelectiveError -> ShowS; show :: SelectiveError -> String; showList :: [SelectiveError] -> ShowS |
| Eq SelectiveError | Defined in Ecluse.Core.Registry.Npm.SelectiveDecode. Methods: (==) :: SelectiveError -> SelectiveError -> Bool; (/=) :: SelectiveError -> SelectiveError -> Bool |
selectVersionFromPackument :: Int -> Version -> ByteString -> Either SelectiveError SelectedVersion
Selectively decode a packument's bytes for one version: walk the token stream,
extracting the document name, the requested version's object and time entry, and the
versions count, while skipping every other version's tokens unallocated and bounding
every value at maxDepth levels (the maxNestingDepth budget, so
the depth bound matches checkNestingDepth over the whole
document).
The body must be a well-formed JSON object with nothing but whitespace after it, or the
result is SelectiveUndecodable -- exactly as eitherDecodeStrict would fail it.
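The whole-document behaviour being matched can be checked directly against aeson. The sketch below decodes to a plain Value, which is an assumption for illustration (the real whole-document path decodes a packument, and a plain Value decode would also accept non-object bodies); the binding names are hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson (Value, eitherDecodeStrict)
import Data.ByteString (ByteString)
import Data.Either (isLeft, isRight)

-- | Stand-in for the whole-document decode, at type Value.
wholeDecode :: ByteString -> Either String Value
wholeDecode = eitherDecodeStrict

-- Trailing whitespace after the object is accepted.
acceptsTrailingWhitespace :: Bool
acceptsTrailingWhitespace = isRight (wholeDecode "{\"name\":\"x\"} \n")

-- Trailing non-whitespace fails the whole body.
rejectsTrailingGarbage :: Bool
rejectsTrailingGarbage = isLeft (wholeDecode "{\"name\":\"x\"} x")

-- Truncated JSON fails the whole body.
rejectsTruncated :: Bool
rejectsTruncated = isLeft (wholeDecode "{\"name\":")
```

All three bindings evaluate to `True`; the last two are the cases the selective decode reports as SelectiveUndecodable.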