ecluse:ecluse-core
Safe Haskell: None
Language: GHC2021

Ecluse.Core.Registry.Npm.SelectiveDecode

Description

A selective decode of an npm packument: pull one version's pieces out of the document bytes without materialising the other versions.

The whole-packument decode (aeson's eitherDecodeStrict) builds a Value for every version, and on a heavy packument (thousands of versions, several megabytes) that decode dominates the serve-path cost. But the tarball gate consults a single version: it needs that version's manifest object, its time[version] publish stamp, and the document's self-reported name, and nothing from the other versions. This module walks the registry's own JSON token stream (aeson's Data.Aeson.Decoding, so no new dependency) and materialises a Value only for those few pieces, skipping every other version's tokens without allocating them. The win is in the parse, not the fetch: the full bytes are still read (npm carries time only in the full document), but they are parsed selectively. Every byte is still lexed, yet Value construction and residency are proportional to one version rather than to all N.
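To make the build-versus-skip decision concrete, here is a minimal sketch over a hypothetical, already-tokenised stream. The Tok and Value types and the buildValue, skipValue and pickVersion names are invented for illustration; they are not aeson's Data.Aeson.Decoding API. Because the toy token list is already in memory, the sketch shows only the control flow: one entry of the versions object is materialised, every other entry is consumed with nothing but a nesting counter, and every entry is counted.

```haskell
import qualified Data.Map.Strict as M

-- Hypothetical, simplified token type: a stand-in for a real JSON lexer's
-- output, not aeson's Data.Aeson.Decoding types.
data Tok
  = TObjOpen | TObjClose | TArrOpen | TArrClose
  | TKey String | TStr String | TNum Double | TNull
  deriving (Eq, Show)

-- Minimal value type, built only for the pieces the walk keeps.
data Value
  = VObj (M.Map String Value) | VArr [Value]
  | VStr String | VNum Double | VNull
  deriving (Eq, Show)

-- Materialise one value from the front of the stream.
buildValue :: [Tok] -> Maybe (Value, [Tok])
buildValue (TStr s : ts) = Just (VStr s, ts)
buildValue (TNum n : ts) = Just (VNum n, ts)
buildValue (TNull : ts) = Just (VNull, ts)
buildValue (TArrOpen : ts) = items [] ts
  where
    items acc (TArrClose : rest) = Just (VArr (reverse acc), rest)
    items acc rest = do
      (v, rest') <- buildValue rest
      items (v : acc) rest'
buildValue (TObjOpen : ts) = fields M.empty ts
  where
    fields acc (TObjClose : rest) = Just (VObj acc, rest)
    fields acc (TKey k : rest) = do
      (v, rest') <- buildValue rest
      fields (M.insert k v acc) rest'
    fields _ _ = Nothing
buildValue _ = Nothing

-- Consume one value without building anything: only a nesting counter is
-- kept. (Balance only; a real lexer would also reject malformed input.)
skipValue :: [Tok] -> Maybe [Tok]
skipValue (TStr _ : ts) = Just ts
skipValue (TNum _ : ts) = Just ts
skipValue (TNull : ts) = Just ts
skipValue (open : ts)
  | open == TObjOpen || open == TArrOpen = go (1 :: Int) ts
  where
    go 0 rest = Just rest
    go _ [] = Nothing
    go d (t : rest)
      | t == TObjOpen || t == TArrOpen = go (d + 1) rest
      | t == TObjClose || t == TArrClose = go (d - 1) rest
      | otherwise = go d rest
skipValue _ = Nothing

-- Walk a versions object: count every entry, build only the one under
-- `wanted`, skip the rest. Returns (count, wanted entry if any, leftover).
pickVersion :: String -> [Tok] -> Maybe (Int, Maybe Value, [Tok])
pickVersion wanted (TObjOpen : ts0) = go 0 Nothing ts0
  where
    go n found (TObjClose : rest) = Just (n, found, rest)
    go n found (TKey k : rest)
      | k == wanted = do
          (v, rest') <- buildValue rest
          go (n + 1) (Just v) rest'
      | otherwise = do
          rest' <- skipValue rest
          go (n + 1) found rest'
    go _ _ _ = Nothing
pickVersion _ _ = Nothing
```

In the real walk the tokens arrive lazily from the lexer, which is where the allocation saving comes from; the list here only stands in for that stream.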

Faithful to the whole-document decode

The skip is not a shortcut past validation. The walk consumes the entire token stream, so:

  • malformed JSON anywhere surfaces as SelectiveUndecodable -- the lexer reaches the offending bytes whether or not they sit in the requested version (matching eitherDecodeStrict failing the whole body);
  • trailing non-whitespace after the top-level object is rejected likewise (the same end-of-input check eitherDecodeStrict applies);
  • every value is depth-bounded at the same budget checkNestingDepth would apply to it, so a deeply nested sub-tree anywhere, even inside a skipped version, is a SelectiveTooDeeplyNested breach rather than something served.
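The depth bound on skipped values can be sketched the same way. In this hypothetical model, the Tok and WalkError types and skipBounded are illustrative names, with Undecodable and TooDeeplyNested standing in for SelectiveUndecodable and SelectiveTooDeeplyNested. A value is skipped with only a depth counter, and that counter is checked against the budget even though nothing is built. The convention that a top-level container sits at depth 1 is an assumption; the source does not spell out how checkNestingDepth counts.

```haskell
-- Hypothetical stand-ins for the two refusal causes.
data WalkError = Undecodable | TooDeeplyNested
  deriving (Eq, Show)

-- Hypothetical token model: containers open/close, everything else is a scalar.
data Tok = TOpen | TClose | TScalar
  deriving (Eq, Show)

-- Skip one value, failing if it nests deeper than maxDepth containers.
-- Assumed convention: a bare scalar is depth 0, an outermost container depth 1.
skipBounded :: Int -> [Tok] -> Either WalkError [Tok]
skipBounded _ (TScalar : ts) = Right ts
skipBounded maxDepth (TOpen : ts)
  | maxDepth < 1 = Left TooDeeplyNested
  | otherwise = go 1 ts
  where
    go :: Int -> [Tok] -> Either WalkError [Tok]
    go 0 rest = Right rest                 -- value fully consumed
    go _ [] = Left Undecodable             -- input ended mid-value
    go d (TOpen : rest)
      | d + 1 > maxDepth = Left TooDeeplyNested
      | otherwise = go (d + 1) rest
    go d (TClose : rest) = go (d - 1) rest
    go d (TScalar : rest) = go d rest
skipBounded _ _ = Left Undecodable
```

The point of the design is that the budget costs one integer per skipped value, so enforcing it everywhere does not reintroduce the per-version allocation the skip avoids.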

The two pieces it does build -- the requested version object and the document name -- are produced by the same aeson Value decoder the whole-document path uses, so projecting them yields a byte-for-byte identical PackageDetails (the projection is Ecluse.Core.Registry.Npm.Project.projectVersionEntry, run over the same Value).

What it deliberately does not re-validate

The selective walk inspects only the requested version's time entry. A time entry that is malformed JSON, wherever it sits, is still SelectiveUndecodable (the lexer reaches it), but a schema-invalid sibling (a non-ISO time string for another version, a non-string dist-tags value) is skipped unallocated and never inspected. The whole-document decode degrades the same way: it drops a malformed time or dist-tags entry individually (graceful per-entry degradation) rather than failing the document, so neither path refuses a sound version over an unrelated sibling's malformation.

The two paths agree on what is served (the one sound version, identically projected) and differ only in tracking. The whole-document projection records each dropped sibling as an InvalidEntry for the serve-path log, whereas this walk, having skipped the siblings unallocated, cannot report them; that is the degenerate tracking any single-version read inherently has. The requested version's own schema-invalid stamp folds, on both paths, to a version with no known publish time (via the projecting caller's lenient parse), never to a document failure.
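A sketch of the two tracking behaviours, under stated assumptions: parseStamp is a hypothetical shape check standing in for a real ISO-8601 parse, time entries are modelled as (version, raw string) pairs, and InvalidEntry here is a local stand-in rather than the real type. The whole-document function keeps every valid stamp and records each invalid one; the selective function parses only the requested entry. Both agree on the served version's stamp, and only the former can report siblings.

```haskell
import Data.Char (isDigit)
import qualified Data.Map.Strict as M

-- Hypothetical stand-in for a real ISO-8601 parse: accept a string whose
-- first 19 characters have the shape YYYY-MM-DDTHH:MM:SS.
newtype Stamp = Stamp String deriving (Eq, Show)

parseStamp :: String -> Maybe Stamp
parseStamp s
  | length s >= 19 && and (zipWith fits "dddd-dd-ddTdd:dd:dd" s) = Just (Stamp s)
  | otherwise = Nothing
  where
    fits 'd' c = isDigit c
    fits p c = p == c

-- Hypothetical record of a dropped sibling, for the serve-path log.
newtype InvalidEntry = InvalidEntry String deriving (Eq, Show)

-- Whole-document path: parse every time entry, keep the valid ones and
-- record each invalid one instead of failing the document.
wholeDocumentTimes :: [(String, String)] -> (M.Map String Stamp, [InvalidEntry])
wholeDocumentTimes = foldr step (M.empty, [])
  where
    step (v, raw) (good, bad) = case parseStamp raw of
      Just st -> (M.insert v st good, bad)
      Nothing -> (good, InvalidEntry v : bad)

-- Selective path: only the requested entry is parsed. An invalid stamp
-- becomes "no known publish time"; siblings are never parsed.
selectiveTime :: String -> [(String, String)] -> Maybe Stamp
selectiveTime wanted entries = lookup wanted entries >>= parseStamp
```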


The selective decode

data SelectedVersion

The pieces a selective decode pulls out of a packument for one requested version: the document's self-reported name; the requested version's manifest object and publish stamp, each as the raw Value that is then handed to the same projection the whole-document path uses; and the raw number of entries in the versions object.

Each value field is Nothing when its key is absent from the document, so the caller reproduces the whole-document outcome: an absent name is the empty-name decode failure, an absent version object is a genuine miss, an absent time entry is a version with no known publish stamp. The svVersionCount is the count the caller bounds against maxVersionCount.
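How a caller might turn those optional pieces into outcomes, as a minimal sketch. The Selected record, its field names and the Outcome constructors are hypothetical (the real SelectedVersion fields are not listed here), and the manifest and stamp are plain strings rather than Values. The order of the checks, the strict comparison against maxVersionCount, and treating an empty name like an absent one are all assumptions.

```haskell
-- Hypothetical mirror of the pieces a selective decode returns; the field
-- names are invented for this sketch.
data Selected = Selected
  { selName :: Maybe String
  , selVersion :: Maybe String -- stands in for the raw manifest Value
  , selTime :: Maybe String    -- stands in for the raw time Value
  , selCount :: Int            -- raw number of entries in `versions`
  }

-- Hypothetical outcomes mirroring the whole-document results.
data Outcome
  = EmptyNameFailure
  | TooManyVersions Int
  | Miss
  | Serve String (Maybe String)
  deriving (Eq, Show)

interpret :: Int -> Selected -> Outcome
interpret maxVersionCount sel
  | maybe True null (selName sel) = EmptyNameFailure     -- absent or empty name
  | selCount sel > maxVersionCount = TooManyVersions (selCount sel)
  | otherwise = case selVersion sel of
      Nothing -> Miss                                     -- genuine miss
      Just manifest -> Serve manifest (selTime sel)       -- absent stamp is fine
```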

Constructors

SelectedVersion 

Fields

data SelectiveError

Why a selective decode could not yield a SelectedVersion -- the two refusal causes the whole-document decode would also raise, so the caller maps them onto the same MetadataError the full path does.

Constructors

SelectiveUndecodable

The body was not a well-formed JSON object (or carried trailing non-whitespace).

SelectiveTooDeeplyNested

Some value nested deeper than the depth budget allowed.
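The mapping onto the full path's error type is a total function from one sum to the other. The sketch below mirrors the two SelectiveError constructors locally and invents MetadataUndecodable and MetadataTooDeeplyNested as hypothetical MetadataError constructors; the real MetadataError is not shown in this section. Data.Bifunctor's first then lifts a whole selective result.

```haskell
import Data.Bifunctor (first)

-- Local mirror of the two refusal causes described above.
data SelectiveError = SelectiveUndecodable | SelectiveTooDeeplyNested
  deriving (Eq, Show)

-- Hypothetical constructors standing in for the real MetadataError.
data MetadataError = MetadataUndecodable | MetadataTooDeeplyNested
  deriving (Eq, Show)

-- Each refusal cause maps to the error the whole-document path would raise.
toMetadataError :: SelectiveError -> MetadataError
toMetadataError SelectiveUndecodable = MetadataUndecodable
toMetadataError SelectiveTooDeeplyNested = MetadataTooDeeplyNested

-- Lift a selective result into the caller's error type; successes pass through.
liftSelective :: Either SelectiveError a -> Either MetadataError a
liftSelective = first toMetadataError
```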

selectVersionFromPackument :: Int -> Version -> ByteString -> Either SelectiveError SelectedVersion

Selectively decode a packument's bytes for one version: walk the token stream, extracting the document name, the requested version's object and time entry, and the versions count, while skipping every other version's tokens unallocated and bounding every value at maxDepth levels (the maxNestingDepth budget, so the depth bound matches checkNestingDepth over the whole document).

The body must be a well-formed JSON object with nothing but whitespace after it, or the result is SelectiveUndecodable -- exactly as eitherDecodeStrict would fail it.
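The end-of-input rule can be illustrated with a toy byte scanner. This is a hypothetical sketch (afterTopLevelObject and checkDocument are invented names), and unlike the real walk it does not lex or validate: it tracks only bracket depth and string/escape state to find where the top-level object closes, then accepts the remainder only if it is JSON whitespace.

```haskell
import qualified Data.ByteString.Char8 as BC

-- JSON insignificant whitespace is exactly space, tab, LF and CR (RFC 8259).
isJsonSpace :: Char -> Bool
isJsonSpace c = c == ' ' || c == '\t' || c == '\n' || c == '\r'

-- Find the bytes after a top-level object's closing brace. Tracks only
-- bracket depth and string/escape state: a toy scanner, not a validator.
afterTopLevelObject :: BC.ByteString -> Maybe BC.ByteString
afterTopLevelObject bs0 = case BC.uncons (BC.dropWhile isJsonSpace bs0) of
  Just ('{', rest) -> go (1 :: Int) False False rest
  _ -> Nothing
  where
    go depth inStr esc bs = case BC.uncons bs of
      Nothing -> Nothing -- input ended before the object closed
      Just (c, rest)
        | inStr && esc -> go depth True False rest      -- escaped character
        | inStr && c == '\\' -> go depth True True rest
        | inStr && c == '"' -> go depth False False rest -- string closes
        | inStr -> go depth True False rest
        | c == '"' -> go depth True False rest           -- string opens
        | c == '{' || c == '[' -> go (depth + 1) False False rest
        | c == '}' && depth == 1 -> Just rest            -- top-level object closed
        | c == '}' || c == ']' -> go (depth - 1) False False rest
        | otherwise -> go depth False False rest

-- Accept only a complete object followed by nothing but whitespace.
checkDocument :: BC.ByteString -> Either String ()
checkDocument bs = case afterTopLevelObject bs of
  Nothing -> Left "undecodable: no complete top-level object"
  Just rest
    | BC.all isJsonSpace rest -> Right ()
    | otherwise -> Left "undecodable: trailing non-whitespace"
```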