| Safe Haskell | None |
|---|---|
| Language | GHC2021 |
Ecluse.Pilot.Osv.Retry
Description
Backoff for Pilot's periodic osv.dev fetch.
Pilot pulls the npm advisory export from osv.dev on a schedule. When that upstream
is unreachable, throttling us, or returning 5xx, a naive retry-immediately loop
would hammer it from a single egress (NAT) address and invite an aggressive
rate-limit or an outright ban. So a transient fetch failure is retried under a
truncated exponential backoff with full jitter: each wait grows exponentially
from a base delay, is capped so it cannot run away (the "truncated" part), is
randomised across the interval [0, cap] so many Pilots do not resynchronise onto
the upstream at once, and the number of retries is bounded so the loop always
terminates and hands control back to the outer sync-interval loop rather than
spinning.
Only transient faults are retried: connection failures, timeouts, and 5xx (plus the throttling 408 and 429) responses. A clean 4xx is a permanent client-side error and a corrupt archive is a parse fault; retrying neither helps, so both fail fast.
Synopsis
- defaultOsvRetryPolicy :: forall (m :: Type -> Type). MonadIO m => RetryPolicyM m
- isRetryableHttpException :: HttpException -> Bool
- isRetryableStatusCode :: Int -> Bool
- withOsvRetry :: (MonadMask m, KatipContext m) => RetryPolicyM m -> m a -> m a
- transientMessage :: RetryStatus -> HttpException -> String
Policy
defaultOsvRetryPolicy :: forall (m :: Type -> Type). MonadIO m => RetryPolicyM m Source #
The shipped osv.dev fetch backoff: full-jitter exponential backoff, capped per
attempt and bounded in count. The knobs (microseconds, the unit Control.Retry
speaks) are a 1s base doubling to a 60s ceiling, over five retries (at most six
attempts). limitRetries supplies the stop, the policy monoid short-circuits to
Nothing once the budget is spent, so the loop is finite and the worst case adds
under two minutes of waiting before the fetch gives up to the outer sync loop.
Inspect the schedule without sleeping using simulatePolicy.
Classifying a fetch failure
isRetryableHttpException :: HttpException -> Bool Source #
Should a fetch that threw this HttpException be retried? Connection failures
and timeouts are transient by nature; a status-code rejection defers to
isRetryableStatusCode; a malformed URL is a configuration fault no retry can
mend. Anything not positively known to be transient is treated as permanent, so
Pilot fails fast rather than hammering the upstream on a guess.
isRetryableStatusCode :: Int -> Bool Source #
Is this HTTP status worth retrying? A 5xx is a server-side fault that may clear, and 408 (request timeout) and 429 (too many requests) are explicit "back off and come back" signals. Every other code, in particular a 4xx that is not 408/429, is a permanent client-side error a retry cannot fix.
Running a fetch under the policy
withOsvRetry :: (MonadMask m, KatipContext m) => RetryPolicyM m -> m a -> m a Source #
Run an osv.dev fetch under a Control.Retry policy. A transient
HttpException (see isRetryableHttpException) is retried with backoff until it
either succeeds or the retry budget is spent; a permanent one is not retried.
recovering re-throws the original exception on exhaustion or when the handler
declines, so the caller's own handler (the export loop, which logs and then waits
the full sync interval) still sees it. A non-HttpException fault, for example a
corrupt-archive parse error, is not caught here and propagates unretried.
Log lines
transientMessage :: RetryStatus -> HttpException -> String Source #
The warning logged before a transient fetch failure is retried. Reports the
1-based attempt number (rsIterNumber counts retries from zero) and the cause, so
an operator reading the logs can watch the backoff engage. It depends only on its
arguments, so it can be exercised in isolation.