How-To · 9 min read

Webhook Delivery Reliability — Retries, Dead Letters, and Replay

Deep dive into XRNotify delivery reliability: retry strategies with exponential backoff, dead-letter queues, idempotency guarantees, event replay, and delivery health monitoring.

By Ali Morgan

When your application depends on real-time XRPL events to trigger payments, update balances, or execute business logic, a single missed webhook can cascade into reconciliation nightmares. XRNotify treats delivery reliability as a first-class concern, not an afterthought. Every webhook passes through a multi-stage pipeline engineered for durability, observability, and correctness. This guide walks through each layer of the XRNotify delivery reliability stack so you can build with confidence.

The Delivery Pipeline

Understanding the full lifecycle of a webhook delivery helps you reason about failure modes and design resilient consumers. Inside XRNotify, every event travels through a well-defined pipeline before it reaches your endpoint.

Event Detection

XRNotify maintains persistent connections to the XRP Ledger through a pool of geographically distributed validator nodes. When a ledger closes or a transaction is validated, the platform captures the raw event and normalizes it into a canonical payload. This normalized event is stamped with a globally unique event_id, a webhook_id linking it to your subscription, and a millisecond-precision timestamp.
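As a rough illustration of the envelope described above, a consumer might model and validate it as follows. This is a sketch only: the guide names event_id, webhook_id, and a millisecond-precision timestamp, but the field name `timestamp`, its numeric epoch encoding, and the `payload` field are assumptions, not a documented schema.

```typescript
// Hypothetical envelope shape. Only event_id, webhook_id, and a
// millisecond-precision timestamp are described in this guide; the
// `timestamp` name, its numeric encoding, and `payload` are assumptions.
interface XRNotifyEvent {
  event_id: string;
  webhook_id: string;
  timestamp: number; // assumed: Unix epoch milliseconds
  payload?: unknown; // assumed: event-specific body
}

// Runtime type guard so a handler can reject malformed bodies early.
function isXRNotifyEvent(value: unknown): value is XRNotifyEvent {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.event_id === 'string' &&
    typeof v.webhook_id === 'string' &&
    typeof v.timestamp === 'number' &&
    Number.isFinite(v.timestamp)
  );
}
```

Validating the envelope before touching business logic keeps malformed requests from reaching code that assumes these identifiers exist.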

Queue Ingestion

The normalized event is written to a durable, partitioned message queue. XRNotify uses partitioning keyed on webhook_id to preserve per-subscription ordering while allowing horizontal scale-out across delivery workers. Messages are persisted to disk before the write is acknowledged, so events survive worker restarts and infrastructure failures.
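To see why keying on webhook_id preserves per-subscription ordering, consider a minimal sketch of key-based partition assignment. The hash function and partition count here are illustrative assumptions; the guide does not say how XRNotify maps keys to partitions.

```typescript
import { createHash } from 'node:crypto';

// Illustrative only: map a webhook_id to one of `partitionCount` partitions.
// Every event for the same webhook_id lands in the same partition, so a
// single worker reading that partition sees those events in order.
function partitionFor(webhookId: string, partitionCount: number): number {
  if (!Number.isInteger(partitionCount) || partitionCount <= 0) {
    throw new RangeError('partitionCount must be a positive integer');
  }
  const digest = createHash('sha256').update(webhookId).digest();
  // Use the first 4 bytes of the digest as an unsigned integer.
  return digest.readUInt32BE(0) % partitionCount;
}
```

Because the mapping is deterministic, adding workers spreads different subscriptions across partitions without reordering events within any one subscription.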

Delivery Attempt

A delivery worker dequeues the event and issues an HTTP POST to the endpoint URL you configured. XRNotify considers a delivery successful when your server responds with any 2xx status code within the configured timeout window (default 15 seconds, configurable up to 30 seconds on paid plans). The request includes the JSON payload, an HMAC signature header, and metadata headers such as X-XRNotify-Event-Id and X-XRNotify-Attempt.

Success, Retry, or Dead-Letter

If the endpoint responds with a 2xx, the delivery is marked as successful and the event is removed from the active queue. If the request times out, the connection is refused, or the server returns a 4xx or 5xx status, XRNotify schedules a retry according to its exponential backoff policy. After all retry attempts are exhausted, the event is moved to a dead-letter queue (DLQ) for manual inspection and replay. At no point is an event silently dropped.
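The decision described above can be sketched as a small function. This is a simplified model, not XRNotify's implementation: the names are hypothetical, the limit of 11 total attempts comes from the retry schedule in this guide, and treating every non-2xx status (including 3xx) as a failure is an assumption.

```typescript
type DeliveryOutcome = 'success' | 'retry' | 'dead-letter';

// One initial attempt plus 10 retries, per the schedule in this guide.
const MAX_ATTEMPTS = 11;

// `status` is the HTTP status code, or null when no response arrived
// (timeout or refused connection).
function classifyAttempt(attempt: number, status: number | null): DeliveryOutcome {
  if (status !== null && status >= 200 && status < 300) {
    return 'success';
  }
  // Assumption: any other result counts as a failure.
  return attempt >= MAX_ATTEMPTS ? 'dead-letter' : 'retry';
}
```

Note that every branch returns an outcome, which mirrors the guarantee that an event is never silently dropped.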

Retry Strategy

Transient failures are the norm in distributed systems. DNS hiccups, deployment windows, rate-limit responses, and load-balancer drains all cause temporary unavailability. XRNotify uses an exponential backoff strategy with jitter to maximize the chance of eventual delivery without overwhelming recovering endpoints.

The Retry Schedule

XRNotify retries a failed delivery up to 10 times, for 11 attempts in total. The base delay before each retry is:

  1. Attempt 2 -- 1 second after the first failure
  2. Attempt 3 -- 5 seconds
  3. Attempt 4 -- 30 seconds
  4. Attempt 5 -- 2 minutes
  5. Attempt 6 -- 10 minutes
  6. Attempt 7 -- 30 minutes
  7. Attempt 8 -- 1 hour
  8. Attempt 9 -- 3 hours
  9. Attempt 10 -- 6 hours
  10. Attempt 11 -- 12 hours

This schedule gives your endpoint just under 23 hours of total retry runway (about 22 hours 43 minutes before jitter). Short intervals at the start recover from brief blips quickly, while the longer tail accommodates extended outages such as failed deployments or cloud-provider incidents.
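The total runway follows directly from summing the base intervals listed above. A short sketch of that arithmetic, with hypothetical helper names:

```typescript
// Base retry delays in seconds, taken from the schedule above
// (attempts 2 through 11).
const RETRY_DELAYS_SECONDS: number[] = [
  1, 5, 30, 120, 600, 1800, 3600, 10800, 21600, 43200,
];

function totalRetryWindowSeconds(delays: number[]): number {
  return delays.reduce((sum, d) => sum + d, 0);
}

function formatDuration(totalSeconds: number): string {
  const hours = Math.floor(totalSeconds / 3600);
  const minutes = Math.floor((totalSeconds % 3600) / 60);
  const seconds = totalSeconds % 60;
  return `${hours}h ${minutes}m ${seconds}s`;
}

console.log(formatDuration(totalRetryWindowSeconds(RETRY_DELAYS_SECONDS)));
// prints "22h 42m 36s"
```

The figure excludes jitter and the time each attempt spends waiting for a response, so the real window will differ slightly.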

Why Jitter Matters

If XRNotify retried thousands of failed deliveries at exactly the same wall-clock second, the resulting traffic spike could re-trigger the very failure it was trying to recover from -- a phenomenon called the thundering herd problem. To prevent this, XRNotify adds randomized jitter to every retry delay. Each retry fires within a window of plus or minus 20% of the base interval, spreading the load across time and giving downstream services room to recover gracefully. This is a standard practice recommended by AWS, Google Cloud, and Stripe for any system that performs automated retries.
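A minimal sketch of the plus-or-minus 20% jitter described above. The uniform distribution and the function name are assumptions; the guide specifies only the size of the window.

```typescript
// Apply +/-20% jitter to a base delay. The random source is injectable
// so the behavior can be tested deterministically.
// Assumption: jitter is drawn uniformly from the window.
function withJitter(baseMs: number, random: () => number = Math.random): number {
  const factor = 0.8 + random() * 0.4; // uniform in [0.8, 1.2)
  return Math.round(baseMs * factor);
}

// Example: a 30-second base delay fires somewhere between 24 and 36 seconds.
const delayMs = withJitter(30_000);
console.log(delayMs >= 24_000 && delayMs <= 36_000);
```

With thousands of deliveries sharing the same base interval, each one picks its own point in the window, so the retries arrive spread out rather than in a single burst.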

Automatic Pausing

If your endpoint fails consistently across multiple events, XRNotify automatically pauses the webhook subscription and notifies you via email. This circuit-breaker behavior protects both your infrastructure and XRNotify's delivery workers from wasting resources on an endpoint that is clearly down. Once you resolve the underlying issue, you can resume the subscription from the dashboard or API, and any events that accumulated during the pause will be replayed.
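The pausing behavior resembles a simple circuit breaker driven by consecutive failures. The sketch below is a generic illustration under stated assumptions: the class name and the threshold value are hypothetical, and the guide does not say how many failures trigger a pause.

```typescript
// Generic consecutive-failure breaker. The threshold is a hypothetical
// parameter; XRNotify's actual pause criteria are not documented here.
class PauseTracker {
  private readonly threshold: number;
  private consecutiveFailures = 0;
  private paused = false;

  constructor(threshold: number) {
    this.threshold = threshold;
  }

  // Record the final outcome of one event's delivery.
  record(succeeded: boolean): void {
    if (this.paused) return;
    this.consecutiveFailures = succeeded ? 0 : this.consecutiveFailures + 1;
    if (this.consecutiveFailures >= this.threshold) {
      this.paused = true;
    }
  }

  // Called once the endpoint is fixed and the subscription is resumed.
  resume(): void {
    this.paused = false;
    this.consecutiveFailures = 0;
  }

  get isPaused(): boolean {
    return this.paused;
  }
}
```

A single success resets the counter, so only an unbroken run of failures trips the breaker.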

Dead-Letter Queues

Even with 10 retry attempts spanning nearly a day, some deliveries will ultimately fail. Rather than discarding these events, XRNotify routes them to a dedicated dead-letter queue (DLQ) tied to your webhook subscription.

What Lands in the DLQ

An event enters the DLQ after the final retry attempt (attempt 11) returns a non-2xx response or times out. The DLQ entry stores the full original payload, the complete retry history (timestamps, status codes, and truncated response bodies), and the reason for the final failure. This gives you all the context you need to diagnose the root cause without guessing.

DLQ Retention

XRNotify retains DLQ events for 30 days on paid plans and 7 days on the free tier. After the retention window closes, events are permanently deleted. If you need longer retention, you can export DLQ events via the API or configure a webhook that forwards DLQ notifications to your own archival storage.

Replaying from the DLQ

You can replay individual events or bulk-replay the entire DLQ contents from the XRNotify dashboard or via the REST API. A replay re-enters the event into the delivery pipeline as a fresh attempt, giving it a new set of retries. This is the recommended workflow after you fix a bug in your consumer or restore a downed endpoint.

# Replay a single DLQ event
curl -X POST https://api.xrnotify.io/v1/webhooks/{webhook_id}/dlq/{dlq_event_id}/replay \
  -H "Authorization: Bearer YOUR_API_KEY"

# Bulk replay all DLQ events for a webhook
curl -X POST https://api.xrnotify.io/v1/webhooks/{webhook_id}/dlq/replay-all \
  -H "Authorization: Bearer YOUR_API_KEY"

Idempotency

In any at-least-once delivery system, your consumer may receive the same event more than once. Network timeouts can cause XRNotify to retry a delivery that actually succeeded on the server side but whose acknowledgment was lost in transit. Designing for idempotency ensures that processing the same event twice produces the same result as processing it once.

Uniqueness Guarantees from XRNotify

Every delivery carries two identifiers that together form a unique key: webhook_id and event_id. The webhook_id identifies your subscription, and the event_id identifies the specific ledger event. XRNotify guarantees that the pair (webhook_id, event_id) identifies exactly one event for one subscription, so a redelivery of the same event carries the same pair. If you store this composite key in your database before processing, you can detect and skip duplicates trivially.

Source-Level Deduplication

XRNotify also deduplicates at the source. If a ledger event is observed multiple times due to node failover or stream reconnection, the platform recognizes the duplicate based on the transaction hash and ledger index and suppresses it before it enters the delivery queue. This means your consumer sees far fewer duplicates than it would with a naive pub/sub relay, but you should still implement idempotent handlers as a defense-in-depth measure.

Implementing Idempotent Handlers

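The simplest approach is an idempotency table with a unique constraint on the composite key (webhook_id, event_id); if your handler serves a single subscription, event_id alone is enough. Before performing any side effects (crediting a balance, sending a notification, updating a record), insert the key inside the same database transaction as your business logic. If the insert fails because the key already exists, return 200 OK immediately. Otherwise, perform your logic and commit, so the key and the side effects are written atomically. Because the unique constraint rejects a second insert, this pattern works with any relational database and prevents double-processing even under concurrent delivery, which a separate check-then-insert would not.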
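The handler pattern can be sketched with an in-memory store standing in for the idempotency table. All names here are hypothetical, and a `Set` is only a stand-in: a real deployment would rely on a database unique constraint and a transaction rather than process memory.

```typescript
// In-memory stand-in for an idempotency table (illustration only).
class InMemoryIdempotencyStore {
  private readonly seen = new Set<string>();

  // Returns true if this is the first time the key is claimed.
  claim(webhookId: string, eventId: string): boolean {
    const key = `${webhookId}:${eventId}`;
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    return true;
  }

  // Undo a claim, mirroring a transaction rollback.
  release(webhookId: string, eventId: string): void {
    this.seen.delete(`${webhookId}:${eventId}`);
  }
}

// Returns the HTTP status the endpoint should send back.
function handleEvent(
  store: InMemoryIdempotencyStore,
  event: { webhook_id: string; event_id: string },
  applySideEffects: () => void
): number {
  if (!store.claim(event.webhook_id, event.event_id)) {
    return 200; // duplicate: acknowledge without reprocessing
  }
  try {
    applySideEffects();
    return 200;
  } catch {
    // Processing failed: release the key so a retry can run the logic again.
    store.release(event.webhook_id, event.event_id);
    return 500;
  }
}
```

Returning 200 for a duplicate matters: any non-2xx response would make XRNotify schedule yet another retry of an event you have already handled.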

Event Replay

Beyond DLQ replay, XRNotify supports on-demand event replay for any successfully delivered event within the retention window. This feature is invaluable for backfilling data, debugging production issues, and recovering from application-level failures that have nothing to do with delivery itself.

Replay via Dashboard and API

In the XRNotify dashboard, navigate to the delivery log for any webhook and select one or more events to replay. Alternatively, use the REST API to trigger replay programmatically. You can replay by event ID, by time range, or by filtering on event type. Replayed events are delivered with the header X-XRNotify-Replay: true so your consumer can distinguish replays from live deliveries if needed.
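A consumer can branch on the replay header with a small helper. This sketch assumes a Node.js-style headers object, where header names are lowercased; the function name is hypothetical.

```typescript
type IncomingHeaders = Record<string, string | string[] | undefined>;

// True when the delivery carries X-XRNotify-Replay: true.
// Node.js lowercases incoming header names, hence the lowercase key.
function isReplay(headers: IncomingHeaders): boolean {
  const value = headers['x-xrnotify-replay'];
  const first = Array.isArray(value) ? value[0] : value;
  return first === 'true';
}
```

One use is to suppress user-facing notifications during a backfill while still writing the replayed events to your database.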

Replay Time Window

XRNotify retains event payloads for 30 days on paid plans and 3 days on the free tier. Within this window, any event can be replayed regardless of its original delivery status. Events older than the retention window are purged and cannot be replayed.
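Before requesting a replay, a client can check whether an event is still inside the window. The retention values below come from this section; the function name, plan labels, and the treatment of the boundary as inclusive are assumptions.

```typescript
// Payload retention in days, as stated in this section.
const RETENTION_DAYS = { paid: 30, free: 3 } as const;
type Plan = keyof typeof RETENTION_DAYS;

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns true if an event created at `eventTimeMs` (epoch milliseconds)
// is still within the replay window for the given plan.
function isReplayable(eventTimeMs: number, plan: Plan, nowMs: number = Date.now()): boolean {
  const ageMs = nowMs - eventTimeMs;
  return ageMs >= 0 && ageMs <= RETENTION_DAYS[plan] * MS_PER_DAY;
}
```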

Common Use Cases

  • Backfilling: You deploy a new feature that needs historical data. Replay the last 7 days of payment events to populate your new tables without writing a custom scraper.
  • Debugging: A customer reports a missing transaction. Replay the specific event against a staging endpoint with verbose logging to trace exactly what your handler did.
  • Disaster recovery: Your database crashed and the last backup is 6 hours old. Replay all events from the last 6 hours to bring your state back to current.

Delivery Logs

Observability is the foundation of reliability. XRNotify records comprehensive delivery logs for every webhook attempt, giving you the data you need to diagnose issues without guesswork.

What Gets Logged

Each delivery attempt produces a log entry containing the following fields:

  • Request payload: The full JSON body sent to your endpoint, along with all request headers.
  • Response status code: The HTTP status code returned by your server (or a timeout/connection-error indicator if no response was received).
  • Response body: The first 1 KB of the response body, which is often enough to capture error messages from your application.
  • Latency: The time in milliseconds from the start of the TCP connection to the receipt of the last response byte. This helps you identify slow endpoints before they start timing out.
  • Attempt number: Which attempt this was (1 through 11), making it easy to see how far through the retry schedule an event progressed.
  • Retry history: For events that required multiple attempts, the full timeline of all prior attempts with individual timestamps, status codes, and latencies.

Accessing Delivery Logs

Delivery logs are available in the XRNotify dashboard under each webhook subscription. You can filter by status (success, failed, retrying, dead-lettered), date range, event type, and response code. For programmatic access, the GET /v1/webhooks/{webhook_id}/deliveries endpoint returns paginated delivery log entries with the same fields visible in the dashboard. Logs are retained for 30 days on all plans.
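For programmatic access, a request to the deliveries endpoint might look like the sketch below. The base URL and Bearer authentication are taken from the curl examples earlier in this guide; the function names are hypothetical, and because the response schema and pagination parameters are not documented here, the result is left as `unknown`.

```typescript
const API_BASE = 'https://api.xrnotify.io';

// Build the delivery-log URL for one webhook subscription.
function deliveriesUrl(webhookId: string): string {
  return `${API_BASE}/v1/webhooks/${encodeURIComponent(webhookId)}/deliveries`;
}

// Fetch one page of delivery log entries (requires Node.js 18+ for fetch).
// The response shape is not specified in this guide, so it is returned as-is.
async function fetchDeliveries(webhookId: string, apiKey: string): Promise<unknown> {
  const res = await fetch(deliveriesUrl(webhookId), {
    headers: { Authorization: `Bearer ${apiKey}` },
  });
  if (!res.ok) {
    throw new Error(`Delivery log request failed with status ${res.status}`);
  }
  return res.json();
}
```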

HMAC Signature Verification

Reliability is not just about getting events to your endpoint -- it is also about ensuring that the events are authentic. XRNotify signs every webhook payload with HMAC-SHA256, allowing your consumer to verify that the request genuinely originated from XRNotify and was not tampered with in transit.

How Signing Works

When you create a webhook subscription, XRNotify generates a unique signing secret for that subscription. On every delivery, XRNotify computes an HMAC-SHA256 hash of the raw request body using this signing secret and includes the result in the X-XRNotify-Signature header as a hex-encoded string.

Verification Steps

  1. Read the raw request body as bytes. Do not parse the JSON first, because serialization differences can change the byte representation and invalidate the signature.
  2. Retrieve your signing secret from a secure location (environment variable, secrets manager, etc.).
  3. Compute the HMAC-SHA256 hash of the raw body using the signing secret.
  4. Compare your computed hash with the value in the X-XRNotify-Signature header using a constant-time comparison function to prevent timing attacks.
  5. If the hashes match, the payload is authentic. If they do not match, reject the request with a 401 status code.

Example: Node.js Verification

import crypto from 'node:crypto';

function verifyXRNotifySignature(
  rawBody: Buffer,
  signatureHeader: string,
  signingSecret: string
): boolean {
  const expected = crypto
    .createHmac('sha256', signingSecret)
    .update(rawBody)
    .digest();
  // Invalid hex decodes to a shorter buffer, which the length check rejects.
  const received = Buffer.from(signatureHeader, 'hex');

  // timingSafeEqual throws on buffers of different lengths, so check first.
  if (received.length !== expected.length) {
    return false;
  }
  return crypto.timingSafeEqual(expected, received);
}

// In your Express handler:
// app.post('/webhooks/xrnotify', express.raw({ type: '*/*' }), (req, res) => {
//   const signature = req.headers['x-xrnotify-signature'];
//   if (typeof signature !== 'string' ||
//       !verifyXRNotifySignature(req.body, signature, process.env.XRNOTIFY_SECRET!)) {
//     return res.status(401).send('Invalid signature');
//   }
//   // Process the event...
//   res.status(200).send('OK');
// });

Always verify signatures in production. Skipping verification exposes your endpoint to spoofed events, which could trigger unauthorized balance changes, fake transaction alerts, or other dangerous side effects.

Monitoring Webhook Health

XRNotify provides a real-time health dashboard for every webhook subscription, giving you at-a-glance visibility into delivery performance and early warning when things start to degrade.

Dashboard Metrics

The webhook health dashboard displays the following key metrics, updated in real time:

  • Success rate: The percentage of deliveries that succeeded on the first attempt over the selected time window (1 hour, 24 hours, 7 days, or 30 days). A healthy webhook should maintain a first-attempt success rate above 99%.
  • Latency percentiles: p50, p95, and p99 response times from your endpoint. If your p99 is approaching the timeout window, you should optimize your handler or increase the timeout setting before failures begin.
  • Failure reasons: A breakdown of failure causes -- connection refused, DNS resolution failure, TLS handshake error, timeout, and HTTP error codes. This helps you pinpoint whether the problem is on your side, your cloud provider, or in between.
  • Active vs. paused webhooks: Quick status indicators showing which of your webhook subscriptions are actively delivering, which are paused due to consecutive failures, and which you have manually paused.
  • Event throughput: The number of events processed per minute, broken down by event type. Useful for spotting unexpected spikes or drops in XRPL activity that might indicate ledger congestion or a change in your account filters.
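If you export delivery logs, you can reproduce the headline metrics yourself. The sketch below assumes a simplified record shape (the field names are hypothetical) and uses the nearest-rank method for percentiles, which may differ from the method the dashboard uses.

```typescript
// Simplified delivery record; field names are hypothetical.
interface DeliverySample {
  attempt: number;    // 1 for the initial attempt
  succeeded: boolean; // true if the endpoint returned 2xx
  latencyMs: number;
}

// Percentage of first attempts that succeeded.
function firstAttemptSuccessRate(samples: DeliverySample[]): number {
  const firsts = samples.filter((s) => s.attempt === 1);
  if (firsts.length === 0) return NaN;
  const ok = firsts.filter((s) => s.succeeded).length;
  return (ok / firsts.length) * 100;
}

// Nearest-rank percentile: the smallest value with at least p% of
// the samples at or below it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) return NaN;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}
```

Comparing the computed p99 against your configured timeout shows how much headroom the handler has before deliveries start timing out.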

Alerting on Degraded Delivery

XRNotify supports configurable alerting thresholds for webhook health. You can set alerts that trigger when the first-attempt success rate drops below a percentage you define (for example, 95%), when p95 latency exceeds a threshold, or when the DLQ depth grows beyond a specified count. Alerts are delivered via email by default, with Slack and PagerDuty integrations available on the Business plan.

Proactive Health Checks

On paid plans, XRNotify can send periodic health-check pings to your endpoint even when no real events are pending. These synthetic requests carry an X-XRNotify-Ping: true header and an empty payload. If your endpoint fails to respond to three consecutive pings, XRNotify marks the subscription as unhealthy and sends you an alert before real events start failing. This early warning system lets you fix issues during quiet periods rather than discovering them during a surge of ledger activity.
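Because pings carry an empty payload, a handler should acknowledge them before attempting to parse an event. A minimal sketch, assuming Node.js-style lowercased header names; the function names are hypothetical, and whether pings are signed is not stated in this guide, so signature handling is left out.

```typescript
type RequestHeaders = Record<string, string | string[] | undefined>;

function firstHeaderValue(headers: RequestHeaders, name: string): string | undefined {
  const value = headers[name];
  return Array.isArray(value) ? value[0] : value;
}

// Returns the HTTP status to send. Pings are acknowledged immediately;
// everything else is passed to the real event processor.
function handleDelivery(
  headers: RequestHeaders,
  rawBody: string,
  processEvent: (body: string) => void
): number {
  if (firstHeaderValue(headers, 'x-xrnotify-ping') === 'true') {
    return 200; // health-check ping: nothing to process
  }
  processEvent(rawBody);
  return 200;
}
```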

XRNotify is built from the ground up to ensure that every XRPL event reaches your application. The combination of durable queues, exponential backoff with jitter, dead-letter queues, idempotency guarantees, event replay, detailed delivery logs, cryptographic signature verification, and real-time health monitoring gives you a delivery pipeline you can trust in production. Whether you are processing a handful of wallet notifications or millions of ledger events per day, XRNotify scales its reliability guarantees to match your workload.
