| Safe Haskell | None |
|---|---|
| Language | GHC2021 |
Ecluse.Core.Server.Route
Description
The shared serve-action vocabulary of the front door, and the agnostic default router.
A Route is one classified request -- everything the proxy is willing to serve,
named independently of any ecosystem's URL grammar. The actions are common
across registries (fetch a packument, stream a tarball, publish a first-party
package, answer a liveness probe, deny a search); only the
(method, URL)→action mapping is ecosystem-specific. That mapping is a
Classifier, injected at the composition root, so this module stays free of any
one ecosystem's path conventions while the dispatcher routes through whatever
classifier its mount carries.
The classifier is method-aware because the same path can name different
actions by HTTP method: GET /{pkg} reads a packument, PUT /{pkg} publishes
one. A read and a write are genuinely distinct serve actions (not a rendering
variation the way a HEAD is a bodiless GET), so the method is part of what the
classifier maps, and a write earns its own Route rather than being inferred at
dispatch.
The model is deny by default, mirroring the rules engine (Ecluse.Core.Rules):
the agnostic default denyAll classifies every path as Unsupported (a 404 at
the edge), so a deployment that wires no ecosystem router serves nothing rather
than guessing. An ecosystem adapter supplies a Classifier that recognises its
own paths and falls back to Unsupported for the rest.
Route is a small sum so the whole routing table is unit-testable with __no
server__: feed a Classifier some segments, assert the Route.
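A minimal sketch of that no-server test style, under stated assumptions: `Method` is taken to be a `ByteString` alias (as in http-types), `PackageName` is a stand-in newtype, `Route` is cut down to the constructors the example needs, and `toyClassifier` is a hypothetical grammar that belongs to no real ecosystem and skips the component-safety gate a real classifier would apply.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.ByteString (ByteString)
import Data.Text (Text)

-- Stand-ins for this sketch only; the real types live elsewhere in Ecluse.
type Method = ByteString -- assumption: mirrors the http-types alias

newtype PackageName = PackageName Text
  deriving (Eq, Show)

-- A cut-down Route carrying only the constructors used below.
data Route
  = Packument PackageName
  | Publish PackageName
  | Ping
  | Unsupported
  deriving (Eq, Show)

type Classifier = Method -> [Text] -> Route

-- The agnostic default: every request is Unsupported.
denyAll :: Classifier
denyAll _ _ = Unsupported

-- Hypothetical toy grammar: the same path names a read or a write
-- depending on the method, and everything else is denied.
toyClassifier :: Classifier
toyClassifier "GET" ["-", "ping"] = Ping
toyClassifier "GET" [pkg] = Packument (PackageName pkg)
toyClassifier "PUT" [pkg] = Publish (PackageName pkg)
toyClassifier _ _ = Unsupported

-- The routing table as data: (method, segments, expected route).
routingTable :: [(Method, [Text], Route)]
routingTable =
  [ ("GET", ["lodash"], Packument (PackageName "lodash"))
  , ("PUT", ["lodash"], Publish (PackageName "lodash"))
  , ("GET", ["-", "ping"], Ping)
  , ("DELETE", ["lodash"], Unsupported)
  , ("GET", ["a", "b", "c"], Unsupported)
  ]

-- Every case whose classification differs from the expectation.
failures :: Classifier -> [(Method, [Text], Route)]
failures classify =
  [c | c@(m, segs, want) <- routingTable, classify m segs /= want]
```

Under these assumptions `failures toyClassifier` is empty, while `failures denyAll` returns the three cases that expect something other than Unsupported; no socket or server is involved in either check.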
Synopsis
- data Route
- newtype Filename = Filename Text
- type Classifier = Method -> [Text] -> Route
- denyAll :: Classifier
- isSafeComponent :: Text -> Bool
- encodeComponent :: Text -> Text
Routes
A classified request. Everything the front door is willing to serve is one
of these; an unrecognised path is Unsupported (deny by default).
The constructors are the proxy's actions, shared across ecosystems -- the
artifact a Tarball streams and the metadata a Packument merges are the same
serve behaviour whether the upstream is npm, PyPI, or another registry. Only the
mapping from a request path to one of these (a Classifier) is
ecosystem-specific.
Constructors
| Packument PackageName | A package-metadata request -- the packument. |
| Tarball PackageName Version Filename | An artifact request, as a parsed coordinate: the package, the |
| Publish PackageName | A first-party publish request -- |
| Ping | A registry liveness probe, answered locally. |
| Search | Package search (unsupported). |
| Unsupported | Anything unrecognised. Renders as a 404 at the edge. |
An artifact's on-the-wire file name, the agnostic artifact-name type a
Tarball route carries.
It is held as a distinct type, not a bare Text, because it is __authoritative
for fetching the bytes__: the proxy fetches an artifact at the upstream path built
from this exact name, never one reconstructed from (package, version), so that a
registry whose artifact naming differs from the proxy's own convention still
resolves. The name is preserved verbatim as received; the classifier that produces
it has already applied the component-safety gate (isSafeComponent), so the value
is structurally safe to interpolate into an upstream URL once it has been
percent-encoded with encodeComponent.
Classification
type Classifier = Method -> [Text] -> Route
The mapping from an ecosystem-native request to a Route.
A classifier sees the request's HTTP Method and the already-mount-stripped,
percent-decoded path segments and returns the serve action. The method is part of
the mapping because the same path names different actions by method (GET /{pkg}
reads, PUT /{pkg} publishes); a HEAD, by contrast, classifies like its GET
(it is a bodiless variation the dispatcher handles, not a distinct action). Each
ecosystem adapter contributes its own classifier -- recognising its
(method, path) grammar and denying everything else -- so the agnostic dispatcher
stays closed while every mount routes through its ecosystem's template. Dispatch
chooses the classifier per matched mount (see Ecluse.Server), so the same shape
carries either a single ecosystem or a mount-keyed selection.
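One way to picture a mount-keyed selection is sketched below. It is an illustration only, not the dispatcher in Ecluse.Server: `classifyAt`, `asGet`, `pingOnly`, the `Map` of mounts, and the idea that a mount is the first path segment are all assumptions made for the example, as is the `ByteString` alias standing in for `Method`.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.ByteString (ByteString)
import qualified Data.Map.Strict as Map
import Data.Text (Text)

type Method = ByteString -- assumption: mirrors the http-types alias

-- Cut-down Route, enough for the example.
data Route = Ping | Unsupported
  deriving (Eq, Show)

type Classifier = Method -> [Text] -> Route

-- The agnostic default: every request is Unsupported.
denyAll :: Classifier
denyAll _ _ = Unsupported

-- Hypothetical classifier that recognises only a liveness probe.
pingOnly :: Classifier
pingOnly "GET" ["-", "ping"] = Ping
pingOnly _ _ = Unsupported

-- Assumption: the dispatcher classifies a HEAD as its GET and drops the
-- body later, so classifiers never need a HEAD case of their own.
asGet :: Method -> Method
asGet "HEAD" = "GET"
asGet m = m

-- Hypothetical dispatch step: the first segment selects the mount, the
-- mount's classifier sees the remaining (mount-stripped) segments, and an
-- unknown mount falls back to denyAll.
classifyAt :: Map.Map Text Classifier -> Method -> [Text] -> Route
classifyAt _ _ [] = Unsupported
classifyAt mounts method (mount : rest) =
  Map.findWithDefault denyAll mount mounts (asGet method) rest

-- Example wiring: a single mount carrying one classifier.
exampleMounts :: Map.Map Text Classifier
exampleMounts = Map.fromList [("npm", pingOnly)]
```

With this wiring, `classifyAt exampleMounts "HEAD" ["npm", "-", "ping"]` classifies as `Ping`, and the same path under an unwired mount such as `"pypi"` is `Unsupported`.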
denyAll :: Classifier
The agnostic default classifier: every request is Unsupported.
This is the deny-by-default base a deployment runs with until a composition root wires an ecosystem's classifier in, so an unwired server serves nothing rather than guessing a grammar. It deliberately knows no path conventions of its own.
Component safety
isSafeComponent :: Text -> Bool
Whether a single decoded path component is safe to interpolate into an upstream URL -- the deny-by-default gate a classifier applies to every component it accepts (a scope, base name, or tarball filename).
The path is percent-decoded before it reaches us, so a single segment can carry a
'/', a '\\', a control character, or be "."/".."; any of these
enables path traversal or request smuggling once the name reaches the upstream
URL. A component is UNSAFE iff it is empty, is exactly "." or "..", or
contains a '/', a '\\', or any isControl character. Everything else
is accepted: this is a security boundary, not an ecosystem-policy validator,
so ordinary names with interior dots (lodash.merge, is.odd), hyphens,
underscores, digits, or uppercase all pass.
It lives in the agnostic layer because the threat -- interpolating a hostile segment into an upstream URL -- is ecosystem-independent; both an ecosystem's path classifier and the defence-in-depth check in Ecluse.Core.Security share this one rule.
This gate is structural: it stops a component that would change the upstream
URL's shape (a traversal, an embedded separator, a control character). It does
not stop a component that carries other URL-reserved bytes -- a '%',
'?', '#', ';', or a space -- which an accepted name can still hold
(notably a once-decoded segment carrying a literal %2e%2e%2f). Those are
neutralised not by widening this denylist but by percent-encoding every accepted
component with encodeComponent when the upstream URL is built, so the safety of
an interpolated component rests on encode-on-build, not on this gate alone.
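The rule as stated above can be sketched directly. This is a reading of the documented predicate, not the module's source, and the `examples` table is added here for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Char (isControl)
import Data.Text (Text)
import qualified Data.Text as T

-- A component is unsafe iff it is empty, is exactly "." or "..", or
-- contains '/', '\\', or a control character. Everything else passes.
isSafeComponent :: Text -> Bool
isSafeComponent c =
  not (T.null c)
    && c /= "."
    && c /= ".."
    && not (T.any unsafeChar c)
  where
    unsafeChar ch = ch == '/' || ch == '\\' || isControl ch

-- Illustrative inputs paired with the verdict the rule gives them.
examples :: [(Text, Bool)]
examples =
  [ ("lodash.merge", True) -- interior dots are ordinary names
  , ("Is_Odd-2", True) -- uppercase, underscore, hyphen, digit
  , ("", False) -- empty
  , (".", False)
  , ("..", False) -- traversal
  , ("a/b", False) -- embedded separator
  , ("a\\b", False) -- embedded backslash
  , ("a\nb", False) -- control character
  , ("%2e%2e%2f", True) -- structurally fine; left to encodeComponent
  ]
```

The last row is the case the paragraph above singles out: the gate accepts a once-decoded `%2e%2e%2f` because it does not change the URL's shape, which is why encoding at build time is still required.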
encodeComponent :: Text -> Text
Percent-encode a single decoded path component for safe interpolation
into an upstream URL -- the encode-on-build partner of isSafeComponent.
A component is the content between a URL's structural delimiters (a scope, base
name, or filename), never the delimiters themselves, so this encodes
conservatively: it keeps only the RFC 3986 unreserved set
(A-Z, a-z, 0-9, and '-', '.', '_', '~') verbatim and
percent-encodes every other byte of the component's UTF-8 encoding as
%XX (upper-case hex). A caller composing a path therefore writes the structural
'/', scope %2F, '@' sigil, and the like itself, around encoded
components -- so a '%', '/', '?', '#', ';', space, or control
byte inside a component cannot alter the URL's shape, inject a query or fragment,
or -- the once-decoded %2e%2e%2f case -- survive as a live escape a
decode-and-normalise upstream could resolve to traversal.
Encoding is per-byte over the UTF-8 form, so a multi-byte character is encoded one
%XX per byte ('é' → %C3%A9). Encoding is deliberately not idempotent: an
already-percent-encoded escape is not recognised and preserved, because a literal
'%' is always re-encoded to %25. That is the point: the component is decoded
content, so any '%' in it is a literal to be escaped, not a structural escape to
preserve.
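The behaviour described above can be sketched as follows. This is one possible implementation of the documented contract rather than the module's actual code; the helper names `encodeByte`, `isUnreserved`, and `hexDigit` are introduced here for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString as BS
import Data.Char (chr, intToDigit, isAsciiLower, isAsciiUpper, isDigit, toUpper)
import Data.Text (Text)
import qualified Data.Text as T
import Data.Text.Encoding (encodeUtf8)
import Data.Word (Word8)

-- Encode each byte of the component's UTF-8 form independently.
encodeComponent :: Text -> Text
encodeComponent = T.pack . concatMap encodeByte . BS.unpack . encodeUtf8

-- Keep RFC 3986 unreserved bytes verbatim; write every other byte as %XX.
encodeByte :: Word8 -> String
encodeByte w
  | isUnreserved c = [c]
  | otherwise = ['%', hexDigit (w `div` 16), hexDigit (w `mod` 16)]
  where
    c = chr (fromIntegral w)

-- A-Z, a-z, 0-9, and '-', '.', '_', '~'.
isUnreserved :: Char -> Bool
isUnreserved c =
  isAsciiUpper c || isAsciiLower c || isDigit c || c `elem` ("-._~" :: String)

-- One upper-case hexadecimal digit for a value in the range 0..15.
hexDigit :: Word8 -> Char
hexDigit = toUpper . intToDigit . fromIntegral

-- Illustrative inputs paired with the encoding the sketch produces.
examples :: [(Text, Text)]
examples =
  [ ("lodash.merge", "lodash.merge") -- unreserved only: unchanged
  , ("a b", "a%20b") -- space
  , ("é", "%C3%A9") -- two UTF-8 bytes, one %XX each
  , ("%2e%2e%2f", "%252e%252e%252f") -- '%' is a literal, so it is escaped
  ]
```

Applying the sketch twice shows the non-idempotence: `encodeComponent "%"` is `"%25"`, and encoding that result again gives `"%2525"`.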