ecluse:ecluse-core
Safe HaskellNone
LanguageGHC2021

Ecluse.Core.Worker.Liveness

Synopsis

Documentation

data WorkerHeartbeat Source #

The mirror worker's consume-loop heartbeat: the wall-clock time of the worker's last successful poll of the queue.

It is the worker's own liveness signal, kept apart from the server's HTTP readiness so single-process health reflects a stalled worker today and a future standalone worker binary keeps the same probe. The worker recordPolls after each successful receive (whether or not the batch was empty -- an empty long-poll is a healthy idle, not a stall); a liveness probe reads lastPoll and compares it against the wall clock to decide whether the loop has gone quiet for too long.

newWorkerHeartbeat :: IO WorkerHeartbeat Source #

Build a fresh WorkerHeartbeat with no poll yet recorded (lastPoll is Nothing until the worker's first successful receive).

recordPoll :: WorkerHeartbeat -> UTCTime -> IO () Source #

Record the time of a successful queue poll, advancing the heartbeat. Called by the worker after each receive returns (the loop is alive even on an empty batch).

lastPoll :: WorkerHeartbeat -> IO (Maybe UTCTime) Source #

The time of the worker's last successful poll, or Nothing before its first. A liveness probe reads this and compares it against the wall clock.

workerHeartbeatStaleAfter :: NominalDiffTime Source #

How long the worker's last successful poll may be stale before the loop is considered stalled -- the staleness threshold the liveness probe applies.

It is a generous multiple of the long-poll cadence: a healthy idle worker still completes a poll at least every sqsWaitSeconds (≤ 20s by default), so a gap several times that is a genuine stall, not an idle queue. Set well above one poll window so liveness never flaps on normal scheduling jitter.

heartbeatHealthy :: UTCTime -> Maybe UTCTime -> Bool Source #

Whether the worker's consume loop is healthy as of now, given its last successful poll. This is the liveness signal the single-process /livez probe folds in (see Ecluse.Server), distinct from HTTP readiness.

  • Nothing (no poll yet) is healthy: the worker is still starting, not stalled.
  • A poll within workerHeartbeatStaleAfter is healthy.
  • A poll older than that is unhealthy: the loop has gone quiet for too long.
>>> import Data.Time (UTCTime (UTCTime), fromGregorian, secondsToDiffTime)
>>> let t0 = UTCTime (fromGregorian 2020 1 1) (secondsToDiffTime 0)
>>> heartbeatHealthy t0 Nothing
True
>>> let now = UTCTime (fromGregorian 2020 1 1) (secondsToDiffTime 10)
>>> heartbeatHealthy now (Just t0)
True
>>> let later = UTCTime (fromGregorian 2020 1 1) (secondsToDiffTime 300)
>>> heartbeatHealthy later (Just t0)
False

heartbeatHealthyNow :: WorkerHeartbeat -> IO Bool Source #

Read the worker heartbeat and decide liveness against the current wall clock -- the IO wrapper the liveness probe calls. True while the consume loop is alive (or still starting); False once the last successful poll is staler than workerHeartbeatStaleAfter.