| Safe Haskell | None |
|---|---|
| Language | GHC2021 |
Ecluse.Core.Server.Cache
Description
The short-TTL, size-bounded metadata cache shared by the serve paths.
Resolving a package re-fetches its upstream packument, parses it, and evaluates
the rules. To avoid repeating the fetch and parse, the result (a coherent pair of
the parsed packument metadata, PackageInfo, and the raw document it was
decoded from, CacheEntry) is held here in a short-TTL, size-bounded, STM-backed
cache (the cache library backs the TTL store). Both serve paths share it: a
packument request and the tarball-gating fetch that follows reuse one fetch and
parse, and concurrent resolutions of a popular package collapse to one upstream
call (single-flight).
Per-source key
A packument is fetched from two distinct upstreams, a private origin and a public
origin, whose documents differ for the same package, so one entry cannot represent
both. The key is (source, package): the source is the upstream's base URL, which
distinguishes any cached origin without naming a credential, so distinct upstreams
never cross-contaminate and the key never blurs the trust split.
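As a minimal sketch of the keying described above: the types below are simplified stand-ins (the real Source wraps Text, and PackageName comes from elsewhere in the codebase), and the URLs and document strings are invented for illustration.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical stand-ins for the module's key components (String instead of Text).
newtype Source = Source String deriving (Eq, Ord, Show)
newtype PackageName = PackageName String deriving (Eq, Ord, Show)

-- The key is the pair: which upstream, which package. No credential appears in it.
type CacheKey = (Source, PackageName)

-- Build a key from an upstream base URL and a package name.
mkKey :: String -> String -> CacheKey
mkKey baseUrl pkg = (Source baseUrl, PackageName pkg)

-- Two upstreams serving the same package occupy two distinct slots.
-- Both URLs and both values are made up for this example.
demoStore :: Map.Map CacheKey String
demoStore = Map.fromList
  [ (mkKey "https://registry.example.org" "left-pad", "public document")
  , (mkKey "https://private.example.internal" "left-pad", "private document")
  ]
```

Looking up the same package under each base URL reaches a different entry, which is the property the per-source key is there to guarantee.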
Credential-free; sharing is the caller's policy
The key carries no credential dimension and the value is a canonical document, so the cache stores nothing derived from a caller's credential. Whether a given origin is handed to it, and so shared across clients, is the serve path's decision.
Under the default passthrough access strategy only the anonymous public origin is
cached. The trusted private upstream is the per-client authority: it re-authorises
each request with that client's own forwarded credential, so the serve path fetches
it per request and never hands it here. Were a private entry cached under
passthrough, the credential-free key would let one client's entry serve another
client's private document within the TTL, bypassing the upstream's authorisation.
The public origin is anonymous, so one shared entry serves every client without
crossing a trust boundary. Other strategies make a shared private entry safe by
authorising each serve before it is returned (see
docs/architecture/access-model.md → Caching); that gate lives on the serve
path, never in this store.
Coherent pair
An entry holds the parsed PackageInfo and the raw Value it was decoded
from, so a hit returns a typed view and the exact bytes that produced it. The
packument serve path needs both: it decides over the typed view but serves the raw
document edited in place, and the two must describe the same fetch.
What is cached is the metadata, not the verdict. The rules are re-evaluated on
the cached metadata each request, so time-sensitive rules
(AllowIfOlderThan) and the separately-synced advisory
tier stay correct; only each upstream's fetch and parse is memoised. The TTL is
short, and brief staleness is benign: a brand-new publish need not appear instantly
(see docs/architecture/web-layer.md → "Metadata cache").
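To illustrate why the verdict is not cached, here is a small sketch under stated assumptions: the Rule, Verdict, and CachedMeta types are hypothetical simplifications, and the meaning given to AllowIfOlderThan (a minimum age in seconds) is inferred from its name rather than taken from the rule engine.

```haskell
-- Hypothetical, simplified rule type; the assumed meaning is "allow once the
-- version is at least this many seconds old".
newtype Rule = AllowIfOlderThan Int

data Verdict = Allow | Deny deriving (Eq, Show)

-- Hypothetical slice of cached metadata: only the publish time matters here.
newtype CachedMeta = CachedMeta { publishedAt :: Int }  -- seconds since epoch

-- The verdict depends on the request time, so it is computed per request
-- from the cached metadata rather than stored alongside it.
evaluate :: Int -> Rule -> CachedMeta -> Verdict
evaluate now (AllowIfOlderThan minAge) meta
  | now - publishedAt meta >= minAge = Allow
  | otherwise                        = Deny
```

The same cached metadata can yield Deny on one request and Allow on a later one, which a cached verdict would get wrong.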
Two properties the cache library does not provide on its own are layered here:
- Resident-byte budget with recency-aware eviction. The cache library expires by TTL but bounds neither entry count nor memory. Each entry is wrapped with an estimate of its resident footprint (a heavy packument, parsed plus raw, costs many times its wire size) and a last-access stamp bumped on every hit. An insert first purges expired entries, then evicts the least-recently-used entries until the incoming entry fits within both a resident-byte budget (cacheMaxBytes) and an entry count (cacheMaxEntries). Recency keeps a re-accessed hot head resident under pressure while shedding the one-shot tail; the byte budget bounds memory more faithfully than a count alone. The incoming entry is always admitted (the per-entry ceiling is the upstream body cap, not this budget).
- Single-flight. The cache library's own fetchWithCache is lookup-then-fetch in plain IO, so two concurrent misses would both fetch. resolveMetadata instead installs an in-flight marker atomically, so the first miss fetches while concurrent misses wait on its result. The leader inserts the result into the store before removing its in-flight marker, so a caller arriving the instant the fetch returns still finds either the store entry or the marker (never a gap) and never re-leads a redundant fetch.
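The insert-time eviction in the first bullet can be sketched as a pure function. This is an illustration, not the module's implementation: the Weighted and Budget types, the field names, and the use of an integer logical clock are all assumptions made to keep the example self-contained.

```haskell
import Data.List (sortOn)
import qualified Data.Map.Strict as Map

-- Hypothetical wrapper: the payload plus the bookkeeping the text describes.
data Weighted v = Weighted
  { wValue      :: v
  , wBytes      :: Int  -- estimated resident footprint
  , wLastAccess :: Int  -- logical clock value, bumped on every hit
  , wExpiresAt  :: Int  -- logical time at which the entry expires
  } deriving (Eq, Show)

-- Hypothetical bounds, standing in for cacheMaxBytes and cacheMaxEntries.
data Budget = Budget
  { maxBytes   :: Int
  , maxEntries :: Int
  }

-- Insert at time `now`: purge expired entries, evict least-recently-used
-- entries until the newcomer fits both bounds, then admit the newcomer
-- unconditionally (even if it alone exceeds the byte budget).
insertBounded
  :: Ord k
  => Budget -> Int -> k -> Weighted v
  -> Map.Map k (Weighted v) -> Map.Map k (Weighted v)
insertBounded budget now key new store =
  Map.insert key new (go live lruOrder)
  where
    -- Drop expired entries and any previous entry under the same key.
    live = Map.filter (\w -> wExpiresAt w > now) (Map.delete key store)
    -- Eviction candidates, least recently used first.
    lruOrder = map fst (sortOn (wLastAccess . snd) (Map.toList live))
    fits m =
      sum (map wBytes (Map.elems m)) + wBytes new <= maxBytes budget
        && Map.size m + 1 <= maxEntries budget
    go m victims
      | fits m = m
      | otherwise = case victims of
          []       -> m  -- nothing left to evict; admit the newcomer anyway
          (v : vs) -> go (Map.delete v m) vs
```

Evicting in last-access order is what keeps a frequently re-read entry resident while one-shot entries are shed first.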
Two coherent stores: the full packument and one version
This handle owns two stores of the same shape (the TTL + size-bound + single-flight
machinery, SingleFlight, is shared between them):
- the full-packument store (resolveMetadata/cachedMetadata), keyed by (source, package), holding the CacheEntry described above; and
- a single-version store (resolveVersion/cachedVersion), keyed by (source, package, version), holding just one version's PackageDetails (or its determined absence, a cached Nothing): the cold tarball gate's selectively-parsed result.
They are isolated on writes: a single-version resolution caches under its own
key and never writes back to the full-packument store, so a cold tarball gate
cannot materialise a whole packument into the shared full cache. The serve path's
single-version read consults the warm full-packument store read-only first (a
packument GET followed by its tarball gate still collapses to one upstream call),
and only falls back to leading its own selective fetch into the version store when
the full entry is cold. Both stores enforce the resident-byte budget, and each
reports its own residency gauge: the full-packument store under
ecluse.metadata_cache.resident_bytes and the single-version store under
ecluse.metadata_cache.version.resident_bytes. The hit/miss counter and the
entry-count occupancy gauge cover only the full-packument store.
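The read order between the two stores can be sketched with plain maps. Everything here is a hypothetical simplification: the type aliases, the Outcome type, and readVersion are invented for illustration, and the real stores also handle TTL and single-flight.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical simplifications of the two stores.
type Details = String
type FullStore = Map.Map (String, String) (Map.Map String Details)
type VersionStore = Map.Map (String, String, String) (Maybe Details)

data Outcome
  = FromFullStore (Maybe Details)     -- answered from the warm full packument
  | FromVersionStore (Maybe Details)  -- answered from a cached single version
  | LeadSelectiveFetch                -- both cold: fetch into the version store
  deriving (Eq, Show)

-- Consult the full-packument store read-only first; fall back to the
-- version store; only then lead a selective fetch. Nothing here writes
-- to the full store.
readVersion :: FullStore -> VersionStore -> (String, String, String) -> Outcome
readVersion full versions (src, pkg, ver) =
  case Map.lookup (src, pkg) full of
    Just packument -> FromFullStore (Map.lookup ver packument)
    Nothing ->
      case Map.lookup (src, pkg, ver) versions of
        Just cached -> FromVersionStore cached
        Nothing     -> LeadSelectiveFetch
```

Keeping the fallback write confined to the version store is what prevents a cold tarball gate from populating the full cache.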
A third store memoises the assembled representation (resolveAssembled): the
encoded merged document, keyed by its derived validator
(packumentETag). The key is a fingerprint of
every input the document is a function of (the origin bodies, private included by
content digest; the survivor sets; the mount base), which makes the store
content-addressed: an entry can never be served stale, because changed inputs
produce a different key and simply miss. The resident-byte budget is the real bound
here, not the TTL, which only trims dead entries early. Cross-client safety follows
from the same property: a lookup key includes the digest of the private document
this request's own authorised fetch returned, so a client can only hit an entry
whose bytes its own inputs would deterministically re-produce. The transform is
shared, never the authorisation and never another client's view (the private-origin
caching prohibition is about credential-blind keying, which a content key is not).
Residency gauge: ecluse.metadata_cache.assembled.resident_bytes.
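A content-addressed key of this kind might be derived as in the sketch below. The AssemblyInputs record and assembledKey are hypothetical, and FNV-1a is used only as a dependency-free stand-in: how packumentETag actually fingerprints its inputs is not described here, and a real cross-client guarantee would need a collision-resistant digest.

```haskell
import Data.Bits (xor)
import Data.Char (ord)
import Data.List (foldl')
import Data.Word (Word64)
import Numeric (showHex)

-- Hypothetical input bundle: everything the merged document is a function of.
data AssemblyInputs = AssemblyInputs
  { publicBody  :: String
  , privateBody :: String
  , survivorSet :: [String]
  , mountBase   :: String
  }

-- FNV-1a (64-bit) folded over character code points: a simple,
-- non-cryptographic stand-in for a real digest.
fnv1a :: String -> Word64
fnv1a = foldl' step 14695981039346656037
  where
    step h c = (h `xor` fromIntegral (ord c)) * 1099511628211

-- Fingerprint every input. Each field is length-prefixed so that moving
-- characters across a field boundary changes the fingerprinted string.
assembledKey :: AssemblyInputs -> String
assembledKey i = showHex (fnv1a (concatMap frame fields)) ""
  where
    fields = [publicBody i, privateBody i, mountBase i] ++ survivorSet i
    frame s = show (length s) ++ ":" ++ s
```

Because the private body is part of the fingerprint, two clients whose authorised fetches return different private documents compute different keys and cannot share an entry.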
Synopsis
- data CacheConfig = CacheConfig {}
- defaultCacheConfig :: CacheConfig
- data MetadataCache
- newMetadataCache :: CacheConfig -> IO MetadataCache
- newtype Source = Source Text
- data CacheEntry = CacheEntry {}
- weighCacheEntry :: CacheEntry -> Int
- resolveMetadata :: MetricsPort -> MetadataCache -> Source -> PackageName -> IO CacheEntry -> IO CacheEntry
- resolveMetadataWith :: IO () -> MetricsPort -> MetadataCache -> Source -> PackageName -> IO CacheEntry -> IO CacheEntry
- cachedMetadata :: MetadataCache -> Source -> PackageName -> IO (Maybe CacheEntry)
- cacheSize :: MetadataCache -> IO Int
- resolveVersion :: MetricsPort -> MetadataCache -> Source -> PackageName -> Version -> IO (Maybe PackageDetails) -> IO (Maybe PackageDetails)
- resolveVersionWith :: IO () -> MetricsPort -> MetadataCache -> Source -> PackageName -> Version -> IO (Maybe PackageDetails) -> IO (Maybe PackageDetails)
- cachedVersion :: MetadataCache -> Source -> PackageName -> Version -> IO (Maybe (Maybe PackageDetails))
- resolveAssembled :: MetricsPort -> MetadataCache -> Text -> IO ByteString -> IO ByteString
Configuration
data CacheConfig
The metadata cache's tunables, sourced from configuration: how long a parsed
packument stays fresh, how many distinct (source, package) entries the cache holds,
and the resident-byte budget it keeps the held entries under before it evicts.
Constructors
| CacheConfig | |
Fields
| |
Instances
| Show CacheConfig | Defined in Ecluse.Core.Server.Cache |
| Eq CacheConfig | Defined in Ecluse.Core.Server.Cache |
defaultCacheConfig :: CacheConfig
The default cache tunables: a 60-second TTL, 1024 entries, and a 256 MiB resident budget. The TTL is short enough that a new publish appears promptly, and the bounds are large enough to hold a normal install's working set of packages while capping the resident memory a handful of heavy packuments could otherwise dominate.
The cache handle
data MetadataCache
The metadata-cache handle: the three single-flight stores (the full-packument
cache, the single-version cache, and the assembled-representation store). Opaque:
built with newMetadataCache and reached only through the accessors. Lives in the
composition root (one per process), so every request shares the same caches and their
connection-collapsing.
newMetadataCache :: CacheConfig -> IO MetadataCache
Build a metadata cache from its configuration: the full-packument store, the single-version store, and the assembled-representation store, each over the same TTL and size bound.
Cache entries
newtype Source = Source Text
Which upstream a cached packument was fetched from: the dimension that partitions the cache by source so distinct upstreams never share an entry.
The discriminator is the upstream's base URL: an upstream is addressed at a
distinct URL, and the URL names a location, never a credential, so keying on it
keeps the trust split intact (the cached origin is fetched with its own token, supplied
through its fetch action; the source carries none). Under the default passthrough
strategy only the anonymous public origin is cached, so in practice the cache holds one
source per package; the dimension keeps the key honest about which upstream an entry
is, never blurring the split.
data CacheEntry
A coherent cache entry: the parsed PackageInfo paired with the raw Value it
was decoded from. A hit returns both, so a caller gets a typed view to decide over
and the exact bytes that produced it: the packument serve path edits the raw Value
in place and must keep its typed decision coherent with those bytes.
Constructors
| CacheEntry | |
Fields
| |
Instances
| Show CacheEntry | Defined in Ecluse.Core.Server.Cache |
| Eq CacheEntry | Defined in Ecluse.Core.Server.Cache |
weighCacheEntry :: CacheEntry -> Int
Estimate a CacheEntry's resident footprint in bytes as a fixed multiple of its raw
document's compact-encoded byte length. The resident cost (the parsed PackageInfo plus the
raw Value) is a near-constant multiple of the document's size, so re-encoding the raw
Value and scaling it estimates the footprint without measuring the parsed structure. The
encode is an O(document) pass run only on a leader's insert (the cold path after a fetch),
never on a hit. The multiplier is set at the high end of the observed resident-to-encoded
ratio so the estimate is an upper bound: a memory budget must not systematically under-count.
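The estimate reduces to one multiplication over the encoded length. In the sketch below the multiplier value and the names residentMultiplier and weighEncoded are hypothetical (the real constant is not stated here), and the already-encoded document is passed in as a ByteString rather than re-encoded from a Value.

```haskell
import qualified Data.ByteString.Char8 as BS

-- Hypothetical multiplier, chosen only for illustration. Per the text, the
-- real one sits at the high end of the observed resident-to-encoded ratio.
residentMultiplier :: Int
residentMultiplier = 8

-- Estimate resident bytes from the compact-encoded document: an O(1)
-- length read on a strict ByteString, scaled by the fixed multiplier.
weighEncoded :: BS.ByteString -> Int
weighEncoded encoded = residentMultiplier * BS.length encoded
```

Erring high means the budget evicts slightly early rather than letting real memory use exceed the configured bound.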
Resolution
resolveMetadata :: MetricsPort -> MetadataCache -> Source -> PackageName -> IO CacheEntry -> IO CacheEntry
Resolve a package's metadata from one upstream Source, reusing the cache and
collapsing concurrent misses.
On a fresh, unexpired hit the cached CacheEntry is returned and the fetch action
is never run. On a miss the action runs exactly once even under concurrent callers:
the first installs an in-flight marker and fetches, the others wait on its result.
A successful fetch is cached (subject to the TTL and size bound); a failed fetch
caches nothing (so a transient upstream error does not poison the cache) and is
re-raised to every waiter.
A claimed in-flight slot is always eventually filled and de-registered, even if
the leader is hit by an async exception (a request timeout, a killed handler thread)
between claiming the slot and completing: the claim commits under a mask and the
leader's run is handed straight to guardInFlight, which frees the
slot on every exit and, on any exception before the marker is filled, hands that error
to every waiting follower rather than leaving them parked forever. This closes the
single-flight orphan window (without it, a cancelled leader would wedge that
(source, package) key until restart). A follower's own wait on the marker stays
interruptible.
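The claim, fetch, publish, and release sequence can be sketched with MVars. This is an illustration under assumptions, not the module's code: the real store is STM-backed with TTL and eviction, and guardInFlight is not reproduced here. SingleFlight, Role, and resolve below are hypothetical names.

```haskell
import Control.Concurrent.MVar
import Control.Exception (SomeException, mask, throwIO, try, uninterruptibleMask_)
import qualified Data.Map.Strict as Map

-- A marker is filled once with the leader's outcome, success or failure.
type Marker v = MVar (Either SomeException v)

-- Hypothetical handle: settled results plus markers for fetches in progress.
data SingleFlight k v = SingleFlight
  { sfStore    :: MVar (Map.Map k v)
  , sfInFlight :: MVar (Map.Map k (Marker v))
  }

newSingleFlight :: IO (SingleFlight k v)
newSingleFlight = SingleFlight <$> newMVar Map.empty <*> newMVar Map.empty

data Role v = Hit v | Follower (Marker v) | Leader (Marker v)

resolve :: Ord k => SingleFlight k v -> k -> IO v -> IO v
resolve sf key fetch = mask $ \restore -> do
  -- Decide the caller's role while holding the in-flight lock, so exactly
  -- one caller per key becomes the leader.
  role <- modifyMVar (sfInFlight sf) $ \inflight -> do
    store <- readMVar (sfStore sf)
    case (Map.lookup key store, Map.lookup key inflight) of
      (Just v, _) -> pure (inflight, Hit v)
      (Nothing, Just marker) -> pure (inflight, Follower marker)
      (Nothing, Nothing) -> do
        marker <- newEmptyMVar
        pure (Map.insert key marker inflight, Leader marker)
  case role of
    Hit v -> pure v
    -- A follower's wait stays interruptible.
    Follower marker -> restore (readMVar marker) >>= either throwIO pure
    Leader marker -> do
      -- Exceptions are delivered only inside `restore`, where `try` sees them.
      result <- try (restore fetch)
      uninterruptibleMask_ $ do
        -- Publish a success before removing the marker, so a late arrival
        -- finds either the entry or the marker. A failure caches nothing.
        case result of
          Right v -> modifyMVar_ (sfStore sf) (pure . Map.insert key v)
          Left _  -> pure ()
        putMVar marker result
        modifyMVar_ (sfInFlight sf) (pure . Map.delete key)
      either throwIO pure result
```

Running the cleanup uninterruptibly is one way to make sure a cancelled leader still fills and de-registers its marker; the critical section is short, which keeps that acceptable in a sketch like this.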
The Source partitions the cache: distinct upstreams of the same package resolve
under distinct keys and never cross-contaminate. The fetch action supplies the origin's
own credential, so reading through one source never blurs another's trust posture.
Under the default passthrough strategy only the anonymous public origin is resolved
here: the trusted private origin is the per-client authority and is fetched per request,
never cached, so a shared entry can never serve one client another's private document.
The result is always re-decided by the caller's rules on each request: only the fetch and parse are memoised, never the verdict.
Each resolution records the ecluse.metadata_cache.requests hit/miss counter (a
coalescing follower counts as a miss, like the leader it waits on), and a leader's
insert refreshes the ecluse.metadata_cache.entries occupancy gauge and the
ecluse.metadata_cache.resident_bytes residency gauge.
resolveMetadataWith :: IO () -> MetricsPort -> MetadataCache -> Source -> PackageName -> IO CacheEntry -> IO CacheEntry
As resolveMetadata, but with a hook run on the leading thread at the
single-flight claim → fetch-runner handoff: the window between the STM transaction
committing the in-flight claim and the leader's exception guard taking ownership of
the marker. It exists only so a test can deterministically park a leader in that
window and cancel it there, exercising the orphan-window guarantee; production always
passes pure () via resolveMetadata.
cachedMetadata :: MetadataCache -> Source -> PackageName -> IO (Maybe CacheEntry)
Look up a package's cached full-packument entry for one Source without fetching on a
miss: the cache's read-only view, for inspection and tests. A Nothing is a miss or an
expired entry; this never triggers a fetch and never collapses (use resolveMetadata for
the serve path).
cacheSize :: MetadataCache -> IO Int
The number of full-packument entries currently held (including any expired entries not yet purged).
Single-version resolution
resolveVersion :: MetricsPort -> MetadataCache -> Source -> PackageName -> Version -> IO (Maybe PackageDetails) -> IO (Maybe PackageDetails)
Resolve one version's PackageDetails (or its determined absence) from the
single-version cache, leading a selective fetch on a miss and collapsing concurrent misses
exactly as resolveMetadata does for the full packument. The cached value is the
Maybe PackageDetails the fetch yields, so a version determined absent over sound
metadata is cached as Nothing (a negative entry) and re-served without a re-fetch within
the TTL.
This writes to the single-version store only, never the full-packument store, so a cold
tarball gate's selective parse cannot materialise a whole packument into the shared full
cache. Unlike resolveMetadata, the single-version store records no hit/miss counter; a
leader's insert does refresh the single-version residency gauge
(ecluse.metadata_cache.version.resident_bytes), so the byte budget that bounds both
stores is observable on each.
resolveVersionWith :: IO () -> MetricsPort -> MetadataCache -> Source -> PackageName -> Version -> IO (Maybe PackageDetails) -> IO (Maybe PackageDetails)
As resolveVersion, with the single-flight claim → fetch-runner handoff hook
resolveMetadataWith exposes, for the same orphan-window test (production passes pure ()
via resolveVersion).
cachedVersion :: MetadataCache -> Source -> PackageName -> Version -> IO (Maybe (Maybe PackageDetails))
Look up a single-version cached entry for one (source, package, version) without
fetching on a miss: the version store's read-only view (the hybrid serve path's negative/
positive lookup before it leads a selective fetch). The outer Maybe is the cache hit/miss
(an expired or absent entry is Nothing); the inner Maybe PackageDetails is the
cached result (a version determined absent is a cached Just Nothing).
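The three cases of the nested Maybe can be made explicit by a caller. The sketch below uses a hypothetical stand-in for PackageDetails and a hypothetical Lookup type; only the shape Maybe (Maybe PackageDetails) comes from the signature above.

```haskell
-- Hypothetical stand-in for the module's PackageDetails.
newtype PackageDetails = PackageDetails String deriving (Eq, Show)

-- Hypothetical caller-side view of a cachedVersion result.
data Lookup
  = CacheMiss                    -- Nothing: absent or expired, a fetch is needed
  | KnownAbsent                  -- Just Nothing: a cached negative entry
  | KnownPresent PackageDetails  -- Just (Just d): a cached positive entry
  deriving (Eq, Show)

-- Name each layer of the nested Maybe.
classify :: Maybe (Maybe PackageDetails) -> Lookup
classify Nothing         = CacheMiss
classify (Just Nothing)  = KnownAbsent
classify (Just (Just d)) = KnownPresent d
```

Only CacheMiss should lead to an upstream call; KnownAbsent is an answer in its own right for as long as the entry is within its TTL.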
Assembled-representation resolution
resolveAssembled :: MetricsPort -> MetadataCache -> Text -> IO ByteString -> IO ByteString
Resolve the assembled representation for one derived validator, leading the
render (assemble + encode) on a miss and collapsing concurrent identical renders,
exactly as resolveMetadata does for a fetch.
The key is the rendered derived ETag (a content
address over every input the served document is a function of), so a hit is
byte-for-byte the document this request's own inputs would deterministically produce:
the store can never serve stale bytes (changed inputs miss by construction) and never
crosses a client boundary (a different private view is a different key; see the
module header). Under the TTL-zero configuration the store degrades to pure
single-flight coalescing, the same behaviour as the sibling stores.
Like the single-version store it records no hit/miss counter; a leader's insert
refreshes the ecluse.metadata_cache.assembled.resident_bytes residency gauge, so
the byte budget's third occupant is observable alongside the other two.