ecluse
Safe Haskell: None
Language: GHC2021

Ecluse.Runtime

Description

Resolving and applying the process's runtime posture -- how many capabilities Écluse claims and what heap ceiling it runs under -- from first-class configuration with a cgroup-derived fallback, logged at boot with each decision's provenance.

The GHC RTS sizes itself from what the machine looks like: bare -N claims a capability per visible processor, and the heap is unbounded unless -M says otherwise. In a container neither default matches the pod: a CPU limit is a cgroup quota that does not shrink the visible processor count, so the RTS claims a whole node's worth of capabilities under a two-CPU quota, and the only memory backstop is the kernel OOM killer. This module closes that gap the way Go's automaxprocs does, but config-first:

  1. Explicit configuration wins: cores (ECLUSE_CORES) and maxHeapBytes (ECLUSE_MAX_HEAP_BYTES).
  2. Omitted values fall back to the cgroup (v2): cpu.max's quota, floored (at least one) and clamped to the visible processors, and memory.max less the nursery budget and slack (deriveMaxHeapBytes). Flooring follows Go's automaxprocs: a capability count above the budget lets a stop-the-world collection outrun the CFS quota and freeze mid-pause, so a fractional entitlement is stranded rather than borrowed against.
  3. No limit found either way: the posture the RTS already resolved (its baked defaults plus any GHCRTS the operator set) stands, and the log says so.

Every decision is logged through the standard boot log with its provenance (renderRuntimePosture), so an operator can read what was decided, and from which source, straight from the start-up lines.

This resolution is role-agnostic on purpose, and only the resolution: cores and the heap ceiling derive from the container's limits, which bind every role (proxy, Pilot, Dredger) alike. Workload-shaped tuning -- the allocation area, sized for the proxy's serve path -- is deliberately not modelled per role here; a role whose profile diverges is tuned per-deployment via GHCRTS until its shape earns a default of its own.

Applying the plan: setNumCapabilities, or one exec-in-place

A capability change is applied in-process (setNumCapabilities). The heap ceiling has no in-process setter -- -M is fixed when the RTS starts -- so when the plan requires one, the boot re-executes its own binary once with the resolved flags appended to GHCRTS (later flags win, verified against GHC 9.10). The exec replaces the program image in the same process: the PID never exits, so a container supervisor sees an uninterrupted process, exactly as an exec-ing entrypoint script behaves. A marker variable (reexecMarker) guards against loops: the re-launched process sees it, skips any further exec, and only logs (a warning, if the RTS still diverges from the plan -- an operator's GHCRTS fighting the config, or a flag the RTS rejected). A failure of the exec call itself is likewise degraded to a warning and an unenforced posture: tuning never loops the boot and never takes the service down.

The pure resolution (resolveRuntimePlan), the cgroup parsing (parseCpuMax, parseMemoryMax), and the rendering are separated from the thin IO shell (applyRuntimePosture) so the precedence and arithmetic are unit-tested without a cgroup in sight. Sizes are bytes everywhere here; the RTS flag fields count 4 KiB blocks and are converted at the read boundary (rtsBlockBytes).

Synopsis

Applying the resolved posture at boot

applyRuntimePosture :: (Text -> IO ()) -> (Text -> IO ()) -> Maybe Int -> Maybe Int -> IO ()

Resolve the runtime plan and apply it, first thing at boot.

Reads the live posture and the cgroup, resolves the plan against the given config values, and then:

  • plan already in force: log the posture lines and return;
  • only the capability count differs: apply it in-process (setNumCapabilities), log, and return;
  • a heap ceiling must be enforced: append the required flags to GHCRTS and exec this binary in place (same PID, same arguments), once, guarded by reexecMarker. The re-launched process resolves the same plan, finds it in force, and logs the posture lines as normal.

When the marker is already set and the posture still diverges (an operator's GHCRTS contradicting the config, or a flag the RTS rejected), the divergence is logged as a warning and the process continues with what the RTS gave it -- boot never loops and never aborts over tuning.
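The three outcomes above, plus the guarded divergence case, can be sketched as a pure decision function. This is only an illustration under stated assumptions: the Action type, chooseAction, and appendRtsFlags are hypothetical names, String stands in for Text, and the real module may handle the combined cases differently (for example, still adjusting capabilities in-process when the marker is set). The effectful steps would then be setNumCapabilities from Control.Concurrent and an exec call such as executeFile from the unix package.

```haskell
-- Hypothetical action type: what the boot does once the plan is resolved.
data Action
  = AlreadyInForce      -- log the posture lines and return
  | SetCapabilities Int -- apply in-process, log, and return
  | ReExec String       -- exec in place with this GHCRTS value
  | WarnDiverged        -- marker already set: warn and carry on
  deriving (Show, Eq)

-- Append the resolved flags to whatever GHCRTS already holds, so that the
-- later (resolved) flags win.
appendRtsFlags :: Maybe String -> [String] -> String
appendRtsFlags existing flags = unwords (maybe [] words existing ++ flags)

-- Arguments (all assumptions): whether the re-exec marker is set, the current
-- GHCRTS value, the capability count to change to (if it differs), and the
-- heap ceiling in bytes to enforce (if not already in force).
chooseAction :: Bool -> Maybe String -> Maybe Int -> Maybe Int -> Action
chooseAction markerSet ghcrts capChange heapChange =
  case (heapChange, capChange) of
    (Nothing, Nothing) -> AlreadyInForce
    (Nothing, Just n) -> SetCapabilities n
    (Just bytes, _)
      | markerSet -> WarnDiverged
      | otherwise -> ReExec (appendRtsFlags ghcrts flags)
      where
        flags = ["-N" ++ show n | Just n <- [capChange]] ++ ["-M" ++ show bytes]
```

Keeping the decision pure means the "never loops, never aborts" property can be checked by enumerating inputs rather than by launching processes.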

The pure resolution core

data RtsPosture

The RTS posture the process is actually running with, in bytes. Read once at boot (currentRtsPosture); the plan is resolved against it and the log renders it.

Constructors

RtsPosture 

Fields

Instances

Show RtsPosture
    Defined in Ecluse.Runtime

Eq RtsPosture
    Defined in Ecluse.Runtime

data CgroupLimits

What the cgroup (v2) grants this process: the CPU quota in cores (cpu.max, quota over period) and the memory ceiling in bytes (memory.max). Nothing per axis when the file is absent (not a cgroup-v2 environment) or the value is the unlimited max sentinel.

Instances

Show CgroupLimits
    Defined in Ecluse.Runtime

Eq CgroupLimits
    Defined in Ecluse.Runtime

data Provenance

Where a resolved value came from, for the boot log's provenance clause.

Constructors

FromConfig

Explicit Écluse configuration (cores / maxHeapBytes).

FromCgroup

Derived from the cgroup limits.

FromRts

Left as the RTS resolved it (baked defaults plus any operator GHCRTS).

Instances

Show Provenance
    Defined in Ecluse.Runtime

Eq Provenance
    Defined in Ecluse.Runtime

data RuntimePlan

The resolved runtime posture: the capability count to run with and the heap ceiling to enforce, each with its provenance. A FromRts entry means "leave it alone": the plan never overrides the live posture when it has no better information.

Instances

Show RuntimePlan
    Defined in Ecluse.Runtime

Eq RuntimePlan
    Defined in Ecluse.Runtime

resolveRuntimePlan :: Maybe Int -> Maybe Int -> CgroupLimits -> RtsPosture -> RuntimePlan

Resolve the runtime plan from the three layers, strongest first: explicit config, then the cgroup, then the live RTS posture.

Capabilities: an explicit cores wins; else the cgroup CPU quota floored (with a minimum of one, so a 0.5-CPU pod still gets one capability) and clamped to the visible processors; else the RTS's own count stands. Always at least 1.

Heap ceiling: an explicit maxHeapBytes wins; else deriveMaxHeapBytes over the cgroup memory limit and the planned capability count (the nursery the process will actually run with); else the RTS posture stands -- notably, an operator's GHCRTS -M is never overridden by mere derivation, and an absent limit is left absent rather than fabricated.
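The capability half of this precedence can be sketched as follows. This is an illustration only: resolveCaps and its flat argument list are hypothetical (the real function takes CgroupLimits and RtsPosture, whose fields are not shown here), and Provenance is redeclared so the fragment stands alone.

```haskell
-- Redeclared here so the sketch is self-contained.
data Provenance = FromConfig | FromCgroup | FromRts
  deriving (Show, Eq)

-- Arguments (all assumptions): the explicit cores setting, the cgroup CPU
-- quota in cores, the visible processor count, and the RTS's live count.
resolveCaps :: Maybe Int -> Maybe Double -> Int -> Int -> (Int, Provenance)
resolveCaps configCores cgroupQuota visibleProcs rtsCaps =
  case (configCores, cgroupQuota) of
    -- 1. Explicit configuration wins.
    (Just n, _) -> (max 1 n, FromConfig)
    -- 2. Cgroup quota: floored, clamped to visible processors, at least one.
    (Nothing, Just quota) ->
      (max 1 (min visibleProcs (floor quota)), FromCgroup)
    -- 3. No limit found: the RTS's own count stands.
    (Nothing, Nothing) -> (max 1 rtsCaps, FromRts)
```

Under these assumptions a 2.5-CPU quota on a 64-processor node resolves to two capabilities, and a 0.5-CPU quota resolves to one.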

deriveMaxHeapBytes :: Int -> Int -> Int -> Int

The heap ceiling derived from a cgroup memory limit: the limit less the nursery budget (capabilities x allocation area -- memory the process spends over and above the heap) less 10% slack for stacks, buffers, and the RTS itself, floored at half the limit so a nursery mis-sized for a tiny pod still yields a sane ceiling rather than a vanishing (or negative) one.
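The arithmetic can be sketched as below. The argument order (limit, capabilities, allocation area, all in bytes or counts) and the choice to take the 10% slack from the limit itself are assumptions made for illustration; sketchDeriveMaxHeap is a hypothetical name, not the module's function.

```haskell
-- A sketch of the derivation: limit less nursery less slack, floored at half
-- the limit. Argument order and the slack base are assumptions.
sketchDeriveMaxHeap :: Int -> Int -> Int -> Int
sketchDeriveMaxHeap limit caps allocArea =
  max (limit `div` 2) (limit - nursery - slack)
  where
    nursery = caps * allocArea -- one allocation area per capability
    slack = limit `div` 10     -- 10% for stacks, buffers, and the RTS itself
```

With a 1 GiB limit (1073741824 bytes), two capabilities, and an illustrative 4 MiB allocation area, the nursery is 8388608 bytes and the slack is 107374182 bytes, giving a ceiling of 957979034 bytes. With a 64 MiB limit and eight capabilities the subtraction leaves only 26843546 bytes, so the half-limit floor of 33554432 bytes applies instead.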

requiredRtsFlags :: RtsPosture -> RuntimePlan -> [Text]

The RTS flags the plan requires beyond the live posture, in GHCRTS syntax: a -N when the capability count must change, a -M when a ceiling must be enforced that is not already in force. Empty when the process is already running the plan. A FromRts entry never contributes a flag (it is the live posture).
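A minimal sketch of that diff, assuming the live and planned values are passed as plain numbers rather than as RtsPosture and RuntimePlan (whose fields are not shown here). sketchRequiredFlags is a hypothetical name, String stands in for Text, and the ceiling is rendered as a plain byte count.

```haskell
-- Arguments (all assumptions): live capability count, live heap ceiling in
-- bytes (if any), planned capability count, planned heap ceiling (if any).
sketchRequiredFlags :: Int -> Maybe Int -> Int -> Maybe Int -> [String]
sketchRequiredFlags liveCaps liveHeap planCaps planHeap = capFlag ++ heapFlag
  where
    -- A -N only when the capability count must change.
    capFlag = ["-N" ++ show planCaps | planCaps /= liveCaps]
    -- A -M only when a ceiling is planned and is not already in force.
    heapFlag = case planHeap of
      Just bytes | planHeap /= liveHeap -> ["-M" ++ show bytes]
      _ -> []
```

A plan entry that simply mirrors the live posture compares equal and so contributes nothing, which is how a FromRts entry stays flag-free in this sketch.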

renderRuntimePosture :: RuntimePlan -> RtsPosture -> [Text]

The boot log's posture lines, one decision per line with its provenance, plus the allocation-area line (always RTS-sourced; it is deliberately not config-surfaced). Rendered from the plan, so the lines describe what the process runs with after the plan is applied.

Cgroup v2 parsing

parseCpuMax :: Text -> Maybe Double

Parse a cgroup-v2 cpu.max body: "quota period" yields the granted cores (quota over period); the "max ..." sentinel (no quota) yields Nothing. A malformed body yields Nothing -- no limit is inferred from noise.
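One way such a parser might look, using String instead of Text to stay within base. sketchParseCpuMax is a hypothetical name, and rejecting non-positive numbers as malformed is an assumption rather than documented behaviour.

```haskell
import Text.Read (readMaybe)

-- "quota period" yields quota / period; "max ..." and anything malformed
-- yield Nothing.
sketchParseCpuMax :: String -> Maybe Double
sketchParseCpuMax body =
  case words body of
    [quotaText, periodText] -> do
      -- "max" is not a number, so the sentinel falls out as Nothing here.
      quota <- readMaybe quotaText :: Maybe Integer
      period <- readMaybe periodText :: Maybe Integer
      if quota > 0 && period > 0
        then Just (fromIntegral quota / fromIntegral period)
        else Nothing
    _ -> Nothing
```

For example, "200000 100000" yields Just 2.0 and "max 100000" yields Nothing.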

parseMemoryMax :: Text -> Maybe Int

Parse a cgroup-v2 memory.max body: a byte count, or the unlimited max sentinel (Nothing). A malformed body yields Nothing.
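A comparable sketch for the memory file, again over String and with a hypothetical name. Treating a negative number as malformed is an assumption.

```haskell
import Text.Read (readMaybe)

-- A single non-negative byte count yields Just; the "max" sentinel and
-- anything else yield Nothing.
sketchParseMemoryMax :: String -> Maybe Int
sketchParseMemoryMax body =
  case words body of
    [value] -> case readMaybe value of
      Just bytes | bytes >= 0 -> Just bytes
      _ -> Nothing -- covers "max" as well as noise
    _ -> Nothing
```

The file body normally ends in a newline; splitting with words discards it before the number is read.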

parseCgroupSelfPath :: Text -> Maybe Text

The process's cgroup-v2 path from a /proc/self/cgroup body: the 0:: line's path ("0::/a/b" yields "/a/b"). Nothing when no v2 entry is present (a pure cgroup-v1 host).
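The lookup amounts to finding the first line that starts with "0::" and keeping the remainder. A minimal sketch over String, with a hypothetical name:

```haskell
import Data.List (stripPrefix)
import Data.Maybe (listToMaybe, mapMaybe)

-- The path of the first "0::" line, or Nothing when no such line exists.
sketchCgroupSelfPath :: String -> Maybe String
sketchCgroupSelfPath = listToMaybe . mapMaybe (stripPrefix "0::") . lines
```

On a host that mounts both hierarchies, the v1 lines (such as "11:memory:/x") are skipped and only the "0::" entry is returned.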

ancestorPaths :: Text -> [Text]

A cgroup path and its ancestors, leaf first, ending at the root (the empty suffix): "/a/b" yields ["/a/b", "/a", ""]; the root path "/" yields just [""].
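The leaf-first walk can be sketched by repeatedly dropping the last path segment until the empty suffix is reached. This uses String and a hypothetical name; trimming trailing slashes first is an assumption that makes "/" collapse to the root directly.

```haskell
-- A path and its ancestors, leaf first, ending at the empty root suffix.
sketchAncestorPaths :: String -> [String]
sketchAncestorPaths = go . trimTrailingSlashes
  where
    go "" = [""]
    go p = p : go (dropLastSegment p)
    -- Remove the final segment together with the slash before it.
    dropLastSegment = reverse . drop 1 . dropWhile (/= '/') . reverse
    trimTrailingSlashes = reverse . dropWhile (== '/') . reverse
```

Each step strictly shortens the path, so the recursion always ends at the root entry.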