Production engineering pattern · Updated June 2026

Stripe Metered Billing: Reconcile Webhooks Without Double-Counting

An engineering pattern for keeping your usage ledger and Stripe's invoices in lockstep, even when webhooks retry and network calls fail mid-flight.

The problem

The bug shows up like this. A customer pings support to say their invoice is $400 higher than the usage their dashboard shows. You pull the Stripe invoice, you pull your internal usage table, and the totals do not match. Your table says 12,400 API calls. Stripe's invoice line says 14,850. Somewhere between your usage event and Stripe's metered subscription, the count drifted.

Outbound ACK timeout creates ghost events

The root cause is usually the same shape. A meter event fires, your handler POSTs it to Stripe, the request succeeds on Stripe's side, but the ACK times out on the way back to your handler. Your retry layer sees a failure and re-fires. Stripe sees a fresh request and records a second meter event. The customer gets billed for one real usage and one ghost.

Inbound webhook retries double-process

Webhooks compound the problem in the other direction. Stripe sends invoice.created, your handler runs an enrichment job, then a transient 500 trips the retry. Stripe redelivers. The handler runs again. Two enrichment rows, two webhook side effects, and now your accounting layer disagrees with Stripe.

Money flowing the wrong direction either way

The business consequence is not abstract. Either you over-bill and a customer complains (best case: refund and apology, worst case: churn and a Twitter post), or you under-bill and the leak hides for a quarter until someone notices revenue per active customer trending sideways. In a usage-billed SaaS, every duplicate meter event is real money flowing the wrong direction, and every dropped one is real money you do not collect. Finance teams catch some of it during close. Customers catch the rest.

Symptom is months downstream of cause

What makes this hard is that the symptom (a wrong invoice) is months downstream of the cause (a webhook retry in March). By the time anyone looks, the original request log is gone, the meter event is committed in Stripe, and the only fix is a credit memo and a postmortem. The pattern below stops the bug from happening in the first place and gives you a scheduled job that catches the drift early when it does.

What changes for your business

The architecture has three pieces that have to work together: an outbound idempotency layer for your calls to Stripe, an inbound dedupe layer for webhooks coming back, and a reconciliation job that compares the two sides on a schedule.

Event-sourced usage ledger

Start with an event-sourced usage ledger. Every billable action in your product writes one row to an append-only table before anything else happens. That table is the source of truth for billing, not your application state, not Stripe, not anything derived. The schema is small on purpose:

CREATE TABLE usage_events (
  id              uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  customer_id     text NOT NULL,
  meter_name      text NOT NULL,
  quantity        bigint NOT NULL,
  occurred_at     timestamptz NOT NULL,
  -- The same id you will send to Stripe as the idempotency_key.
  idempotency_key text NOT NULL UNIQUE,
  -- State machine: 'observed' | 'submitted' | 'confirmed' | 'reconciled'
  state           text NOT NULL DEFAULT 'observed',
  stripe_meter_event_id text,
  submitted_at    timestamptz,
  confirmed_at    timestamptz,
  reconciled_at   timestamptz
);

CREATE INDEX usage_events_pending ON usage_events (state, occurred_at)
  WHERE state IN ('observed', 'submitted');
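
The state column implies a forward-only lifecycle. A minimal sketch of a transition guard follows, assuming each state has exactly one legal successor; the schema lists the states but does not spell out the allowed transitions, so `NEXT_STATE` and `canTransition` are hypothetical names for illustration.

```typescript
// States taken from the comment on usage_events.state.
type UsageState = "observed" | "submitted" | "confirmed" | "reconciled";

// Assumption: rows only ever move one step forward through the lifecycle.
const NEXT_STATE: Record<UsageState, UsageState | null> = {
  observed: "submitted",
  submitted: "confirmed",
  confirmed: "reconciled",
  reconciled: null,
};

// Returns true only for a single forward step; rejects skips and rewinds.
function canTransition(from: UsageState, to: UsageState): boolean {
  return NEXT_STATE[from] === to;
}
```

Calling a guard like this before every UPDATE keeps a late worker from moving a confirmed row back to submitted.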

Outbound idempotency key pinned to the row

Because the idempotency_key is generated once at write time and stored on the row, every retry of the outbound submission carries the same key. Stripe's idempotency layer persists keys for at least 24 hours and returns the original response on duplicate calls, so re-firing a submission inside that retention window is safe. Once a key has expired, a replay is treated as a new request, so check Stripe's recorded usage before replaying older rows. Generate the key as a UUIDv4 at row insertion; do not derive it from request metadata that could change between attempts.

The outbound submitter is a worker that picks up rows in observed state and POSTs them to Stripe with the row's idempotency_key. On a 2xx response, it transitions to submitted and records the returned stripe_meter_event_id. On a network error, it leaves the row in observed for the next pass — the same key will be reused, and Stripe will either accept the new submission or return the cached response from the first attempt. The submitter is allowed to be at-least-once; the idempotency_key makes that safe.

async function submitUsageEvent(row: UsageEventRow): Promise<void> {
  try {
    const event = await stripe.billing.meterEvents.create(
      {
        event_name: row.meter_name,
        payload: {
          stripe_customer_id: row.customer_id,
          value: String(row.quantity),
        },
        timestamp: Math.floor(row.occurred_at.getTime() / 1000),
      },
      { idempotencyKey: row.idempotency_key },
    );

    await db.usageEvents.update(row.id, {
      state: "submitted",
      stripe_meter_event_id: event.identifier ?? null,
      submitted_at: new Date(),
    });
  } catch (err) {
    // Leave the row in 'observed'. The next pass will retry with
    // the same idempotency_key, which is safe by construction.
    logger.warn({ err, row_id: row.id }, "usage event submission failed");
  }
}
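
To see why an at-least-once submitter is safe, it helps to simulate the lost-ACK case. The sketch below uses a hypothetical in-memory `FakeMeterApi` that mimics only the "same key returns the cached response" behavior; it is not the Stripe SDK, and `LedgerRow` and `submitOnce` are illustrative names.

```typescript
// Hypothetical in-memory stand-in for an API with an idempotency layer.
class FakeMeterApi {
  private responses = new Map<string, string>(); // idempotency key -> event id
  recorded: string[] = [];                       // meter events actually stored
  dropNextAck = false;                           // simulate a lost response

  create(idempotencyKey: string): string {
    let eventId = this.responses.get(idempotencyKey);
    if (eventId === undefined) {
      eventId = `evt_${this.recorded.length + 1}`;
      this.recorded.push(eventId);
      this.responses.set(idempotencyKey, eventId);
    }
    if (this.dropNextAck) {
      this.dropNextAck = false;
      throw new Error("ACK timed out"); // the write has already happened
    }
    return eventId;
  }
}

interface LedgerRow {
  idempotencyKey: string;
  state: "observed" | "submitted";
  stripeEventId: string | null;
}

// One submitter pass: at-least-once, always reusing the key stored on the row.
function submitOnce(api: FakeMeterApi, row: LedgerRow): void {
  try {
    row.stripeEventId = api.create(row.idempotencyKey);
    row.state = "submitted";
  } catch {
    // Leave the row in 'observed'; the next pass retries with the same key.
  }
}

const api = new FakeMeterApi();
const row: LedgerRow = { idempotencyKey: "key-1", state: "observed", stripeEventId: null };
api.dropNextAck = true;
submitOnce(api, row); // the event is recorded, but the client sees a failure
submitOnce(api, row); // the retry gets the cached response, no second event
```

If `submitOnce` generated a fresh key on each attempt, the second call would record a second event, which is the ghost-event bug described earlier.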

Inbound webhook dedupe transaction

The inbound webhook handler is the dedupe layer. Stripe documents that endpoints may receive the same event more than once and recommends logging processed event IDs. Honor that literally: persist every event.id you have processed and short-circuit if you see it again. Do the dedupe inside the same transaction as the side effect, or the dedupe is a lie.

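// Assumes `event` has already been signature-verified (for example with
// stripe.webhooks.constructEvent) before it reaches this handler.
async function handleStripeWebhook(event: Stripe.Event): Promise<void> {
  // One transaction covers the dedupe marker and the side effect, so a
  // failure rolls back both and Stripe's redelivery is processed cleanly.
  await db.transaction(async (tx) => {
    const inserted = await tx.processedEvents.insertIfAbsent({
      event_id: event.id,
      received_at: new Date(),
    });
    if (!inserted) return; // already processed

    switch (event.type) {
      case "billing.meter.event_summary.updated":
        await markUsageConfirmed(tx, event);
        break;
      case "invoice.finalized":
        await recordInvoiceTotals(tx, event);
        break;
      // ...
    }
  });
}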
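
The "same transaction" requirement is easier to trust once you have watched it fail and recover. The sketch below is a simplified model, not a real database: `FakeStore` is a hypothetical in-memory store whose `transaction` commits only if the callback does not throw.

```typescript
// Hypothetical in-memory store standing in for a transactional database.
class FakeStore {
  processedEventIds = new Set<string>();
  sideEffects: string[] = [];

  // Runs fn against a copy of the state and commits only if fn does not throw.
  transaction(fn: (tx: FakeStore) => void): void {
    const tx = new FakeStore();
    tx.processedEventIds = new Set(this.processedEventIds);
    tx.sideEffects = [...this.sideEffects];
    fn(tx); // a throw here skips the commit below, which acts as a rollback
    this.processedEventIds = tx.processedEventIds;
    this.sideEffects = tx.sideEffects;
  }
}

// The dedupe marker and the side effect succeed or fail together.
function handleEventOnce(store: FakeStore, eventId: string, failMidway = false): void {
  store.transaction((tx) => {
    if (tx.processedEventIds.has(eventId)) return; // already processed
    tx.processedEventIds.add(eventId);
    if (failMidway) throw new Error("transient failure");
    tx.sideEffects.push(`enriched:${eventId}`);
  });
}

const store = new FakeStore();
try {
  handleEventOnce(store, "evt_123", true); // first delivery crashes midway
} catch {
  // The sender would see a non-2xx response and redeliver.
}
handleEventOnce(store, "evt_123"); // the redelivery is processed
handleEventOnce(store, "evt_123"); // a further duplicate is skipped
```

Had the marker been committed separately before the crash, the redelivery would have been skipped and the side effect would never have run.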

Scheduled reconciliation job

The reconciliation job is what makes this whole thing self-healing. On a schedule, it pulls usage rows in submitted or confirmed state and compares per-customer-per-meter sums against Stripe's reported usage for the same window. Drift means one of two things: a row that was submitted but never confirmed (retry it), or a row that confirmed but for which Stripe shows a different quantity (alert and investigate). The query that drives the job looks roughly like this:

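WITH ledger AS (
  SELECT
    customer_id,
    meter_name,
    date_trunc('hour', occurred_at) AS bucket,
    SUM(quantity)                   AS ledger_quantity
  FROM usage_events
  WHERE state IN ('submitted', 'confirmed')
    -- Start on an hour boundary so the oldest bucket is never partial.
    AND occurred_at >= date_trunc('hour', now() - interval '48 hours')
  GROUP BY customer_id, meter_name, date_trunc('hour', occurred_at)
),
stripe_side AS (
  SELECT customer_id, meter_name, bucket,
         SUM(aggregated_value) AS stripe_quantity
  FROM stripe_meter_event_summary_snapshot
  GROUP BY customer_id, meter_name, bucket
)
-- Joining two pre-aggregated sets avoids a correlated subquery in HAVING
-- that references the ungrouped occurred_at column.
SELECT l.customer_id, l.meter_name, l.bucket, l.ledger_quantity,
       COALESCE(s.stripe_quantity, 0) AS stripe_quantity
FROM ledger l
LEFT JOIN stripe_side s USING (customer_id, meter_name, bucket)
WHERE l.ledger_quantity <> COALESCE(s.stripe_quantity, 0);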

Local meter-summary snapshot

That snapshot table is populated by a small fetcher that calls Stripe's meter event summary endpoint and stores the result locally. You want it local because the reconciliation query runs often, and you do not want each pass to hammer the Stripe API.
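
The comparison itself can also live in application code once both sides are loaded. A minimal sketch follows, assuming both the ledger and the snapshot have been reduced to hourly bucket totals; `BucketTotal`, `DriftFinding`, and `findDrift` are hypothetical names, and the replay-versus-investigate split follows the runbook logic described later in this article. Unlike a ledger-driven query, it also reports buckets that exist only on the Stripe side.

```typescript
interface BucketTotal {
  customerId: string;
  meterName: string;
  bucket: string;   // hour bucket as an ISO string
  quantity: number; // assumption: totals fit safely in a JS number
}

interface DriftFinding {
  key: string;
  ledgerQuantity: number;
  stripeQuantity: number;
  action: "replay" | "investigate";
}

function sumByKey(rows: BucketTotal[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const r of rows) {
    const key = `${r.customerId}|${r.meterName}|${r.bucket}`;
    totals.set(key, (totals.get(key) ?? 0) + r.quantity);
  }
  return totals;
}

function findDrift(ledger: BucketTotal[], snapshot: BucketTotal[]): DriftFinding[] {
  const ledgerTotals = sumByKey(ledger);
  const stripeTotals = sumByKey(snapshot);
  const keys = new Set<string>();
  ledgerTotals.forEach((_v, k) => keys.add(k));
  stripeTotals.forEach((_v, k) => keys.add(k));

  const findings: DriftFinding[] = [];
  keys.forEach((key) => {
    const ledgerQuantity = ledgerTotals.get(key) ?? 0;
    const stripeQuantity = stripeTotals.get(key) ?? 0;
    if (ledgerQuantity === stripeQuantity) return;
    findings.push({
      key,
      ledgerQuantity,
      stripeQuantity,
      // Ledger higher: rows are missing in Stripe, so replay them.
      // Stripe higher: something wrote to Stripe without a ledger row.
      action: ledgerQuantity > stripeQuantity ? "replay" : "investigate",
    });
  });
  return findings;
}
```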

Common failure modes

Out-of-order webhook arrival is the first sharp edge. Stripe documents that it does not promise delivery order — a customer.subscription.created event can land after the first invoice.paid event for that subscription. Handlers that assume a sequence corrupt state. The fix is to make each event type self-contained: avoid logic that says "if I see X then Y must have already happened." Read state from the ledger, not from arrival order.
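
One way to make handlers order-independent is to have every event type upsert the record it touches. The sketch below is illustrative only: the event type names come from the text above, but the payload shapes, `SubscriptionView`, and `applyBillingEvent` are simplified assumptions, not Stripe's actual event objects.

```typescript
interface SubscriptionView {
  subscriptionId: string;
  createdSeen: boolean;
  paidInvoiceIds: string[];
}

// Simplified, hypothetical payloads for two event types.
type BillingEvent =
  | { type: "customer.subscription.created"; subscriptionId: string }
  | { type: "invoice.paid"; subscriptionId: string; invoiceId: string };

function applyBillingEvent(views: Map<string, SubscriptionView>, ev: BillingEvent): void {
  // Upsert first: neither branch assumes the other event has already arrived.
  let view = views.get(ev.subscriptionId);
  if (view === undefined) {
    view = { subscriptionId: ev.subscriptionId, createdSeen: false, paidInvoiceIds: [] };
    views.set(ev.subscriptionId, view);
  }
  switch (ev.type) {
    case "customer.subscription.created":
      view.createdSeen = true;
      break;
    case "invoice.paid":
      if (view.paidInvoiceIds.indexOf(ev.invoiceId) === -1) {
        view.paidInvoiceIds.push(ev.invoiceId);
      }
      break;
  }
}
```

Applying the two events in either order produces the same view, which is the property worth asserting in a test.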

The second sharp edge is the network-timeout-both-succeed pattern. Your client times out waiting for Stripe to ACK a meter event. Your retry layer fires the same request again. Both reach Stripe. Without an idempotency_key, both are accepted as distinct events and the customer is billed twice. With an idempotency_key reused from the ledger row, the second call returns the cached response from the first. This is the single most important reason the idempotency_key lives on the ledger row, not on the request.

The third is the proration race. A customer upgrades from a metered plan to a flat plan at 14:32, and your application updates their subscription. A meter event that fired at 14:31 is still in the submitter's queue and gets POSTed at 14:33, against the now-changed subscription. Stripe attaches it to the wrong period, the invoice is wrong, and you discover it the next day. The pattern: when a subscription change happens, drain the meter event submitter queue for that customer before processing the change, and stamp the meter event with the original subscription ID for the reconciliation job to verify.

The fourth is the Stripe Tax lag. Tax line items finalize later than the usage line items they apply to. A reconciliation job that compares pre-tax totals to post-tax invoice amounts will report drift on every invoice. Compare like with like — pre-tax aggregated usage on your side against pre-tax usage line totals on Stripe's side — and let tax reconcile separately.

The fifth is the trial-flag-flipped-at-midnight bug. A customer's trial ends at 23:59:59. A meter event fires at 23:59:58, gets queued, and submits at 00:00:01 the next day. Now the event is billable when the original action was not. Stamp meter events with the trial-status-at-time-of-event from the ledger, and let the submitter respect that flag instead of re-reading current state.
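
A minimal sketch of the stamping idea, assuming the ledger row gains a boolean captured at write time. `billableAtEvent`, `stampUsage`, and `shouldSubmit` are hypothetical names, and treating the trial end instant as still inside the trial is an assumption.

```typescript
interface StampedUsageRow {
  id: string;
  occurredAt: Date;
  billableAtEvent: boolean; // captured when the row is written, never recomputed
}

// Decide billability from the event time, not from the time of submission.
function stampUsage(id: string, occurredAt: Date, trialEndsAt: Date | null): StampedUsageRow {
  const inTrial = trialEndsAt !== null && occurredAt.getTime() <= trialEndsAt.getTime();
  return { id, occurredAt, billableAtEvent: !inTrial };
}

// The submitter trusts the stamp and ignores the wall clock at submit time.
function shouldSubmit(row: StampedUsageRow): boolean {
  return row.billableAtEvent;
}
```

With this in place, an event that occurred at 23:59:58 stays non-billable no matter how long it waits in the queue.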

What this looks like in production

At BFEAI we run a dual-pool credit billing model on top of Stripe — purchased credits and granted credits, each with their own draw-down rules. The pattern above is what keeps the two pools and Stripe's invoices in agreement. The hourly reconciliation job runs against a 48-hour rolling window. The nightly job runs against the 24-hour window that closed at 23:59 UTC and writes one audit row per customer-meter pair regardless of whether drift was found. That nightly audit row is what gives finance a clean answer to "did billing match usage on April 14" without anyone running an ad-hoc query.
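
The two windows are easy to get subtly wrong at day boundaries, so here is a small sketch of how they could be computed. It assumes half-open intervals that include the start and exclude the end; `TimeWindow`, `closedUtcDay`, and `rollingWindow` are hypothetical helper names.

```typescript
interface TimeWindow {
  start: Date; // inclusive
  end: Date;   // exclusive
}

// The most recent fully closed UTC day relative to `now`.
function closedUtcDay(now: Date): TimeWindow {
  const end = new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate()));
  const start = new Date(end.getTime() - 24 * 60 * 60 * 1000);
  return { start, end };
}

// A rolling window of the given length ending at `now`.
function rollingWindow(now: Date, hours: number): TimeWindow {
  return { start: new Date(now.getTime() - hours * 60 * 60 * 1000), end: now };
}
```

A nightly job that runs a few minutes after midnight UTC on April 15 would audit April 14 in full, which is the question finance asks.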

The alert rule that matters most is not "drift detected." Drift happens, gets auto-replayed, and clears. The rule that pages a human is "drift detected and unresolved after the next reconciliation pass." That filters out transient cases where Stripe was just slow to process a meter event (the docs are clear that meter events are processed asynchronously and may not immediately reflect on summaries) and surfaces only the cases where something is structurally wrong — a malformed payload, a deleted customer, an expired API key on the submitter.
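
That rule reduces to remembering which drift keys the previous pass saw. A minimal sketch, assuming each pass yields a list of string keys (for example customer plus meter); `DriftTracker` and `recordPass` are hypothetical names.

```typescript
// Pages only for drift that survives into a second consecutive pass.
class DriftTracker {
  private previous = new Set<string>();

  // Takes the drift keys found in this pass and returns the keys to page on.
  recordPass(currentDrift: string[]): string[] {
    const toPage = currentDrift.filter((key) => this.previous.has(key));
    this.previous = new Set(currentDrift);
    return toPage;
  }
}
```

Drift that appears once and clears never reaches a human; drift that is still there on the next pass does. In a real job the previous set would be persisted between runs rather than held in memory.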

The dashboard a CTO actually wants has three rows: meter events in observed state older than 5 minutes (submitter is stuck), meter events in submitted state older than 1 hour (Stripe did not confirm), and customers with unresolved drift after two reconciliation passes (something is structurally wrong). Anything else is detail. Those three numbers go to zero in normal operation, and any non-zero on any of them is a real incident.
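
The first two numbers can be derived from ledger rows alone. The sketch below assumes the age of an observed row is measured from occurred_at and the age of a submitted row from submitted_at; `LedgerRowAge` and `countStuckRows` are hypothetical names, and the third number (unresolved drift) comes from the reconciliation job and is not computed here.

```typescript
interface LedgerRowAge {
  state: "observed" | "submitted" | "confirmed" | "reconciled";
  occurredAt: Date;
  submittedAt: Date | null;
}

const FIVE_MINUTES_MS = 5 * 60 * 1000;
const ONE_HOUR_MS = 60 * 60 * 1000;

function countStuckRows(
  rows: LedgerRowAge[],
  now: Date,
): { stuckObserved: number; stuckSubmitted: number } {
  let stuckObserved = 0;
  let stuckSubmitted = 0;
  for (const r of rows) {
    // Observed too long: the submitter is not picking rows up.
    if (r.state === "observed" && now.getTime() - r.occurredAt.getTime() > FIVE_MINUTES_MS) {
      stuckObserved += 1;
    }
    // Submitted too long: no confirmation has come back.
    if (
      r.state === "submitted" &&
      r.submittedAt !== null &&
      now.getTime() - r.submittedAt.getTime() > ONE_HOUR_MS
    ) {
      stuckSubmitted += 1;
    }
  }
  return { stuckObserved, stuckSubmitted };
}
```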

The runbook for the "drift detected and unresolved" page is short by design. Step one: pull the ledger rows for the customer in the window. Step two: pull the meter event summary from Stripe for the same window. Step three: if the ledger is higher than Stripe, replay the missing rows with their original idempotency_keys (safe by construction). If Stripe is higher than the ledger, that is the worse case — it means events landed in Stripe that have no corresponding ledger row, which points at a code path that calls the Stripe API without going through the ledger. That is a code bug, and the fix is to find the offending call site and route it through the ledger.

One thing worth calling out about the runbook: it is short because the system does the recovery work, not the engineer. The page exists to surface structural problems that the auto-replay cannot fix on its own. If the page is firing more than once a week, the right response is not to handle each one — it is to ask what new code path is bypassing the ledger, because that is the only way Stripe ends up with rows you cannot explain. Track the page count itself as a metric, and treat a rising trend as a leading indicator of a regression somewhere in the meter event write path.

The other operational detail that matters is what to log. Every state transition on a ledger row — observed to submitted, submitted to confirmed, confirmed to reconciled — gets a structured log line with the row id, customer id, idempotency_key, and the Stripe event id where applicable. When an invoice dispute comes in three months later and finance asks "what did we submit and when," that log is the answer. Without it you are guessing from invoice line items, which by then have already been collapsed into aggregates and lost the per-event detail you need to defend the number.
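
A sketch of what one such log line could contain. The fields mirror the list above; `TransitionLogLine` and `buildTransitionLog` are hypothetical names, and the exact shape is an assumption rather than a prescribed format.

```typescript
interface TransitionLogLine {
  msg: "usage_event_transition";
  row_id: string;
  customer_id: string;
  idempotency_key: string;
  from_state: string;
  to_state: string;
  stripe_event_id: string | null; // null until Stripe has returned one
  at: string;                     // ISO 8601 timestamp of the transition
}

function buildTransitionLog(
  row: {
    id: string;
    customer_id: string;
    idempotency_key: string;
    stripe_meter_event_id: string | null;
  },
  fromState: string,
  toState: string,
  at: Date,
): TransitionLogLine {
  return {
    msg: "usage_event_transition",
    row_id: row.id,
    customer_id: row.customer_id,
    idempotency_key: row.idempotency_key,
    from_state: fromState,
    to_state: toState,
    stripe_event_id: row.stripe_meter_event_id,
    at: at.toISOString(),
  };
}
```

Emitting the result through a structured logger keeps every field queryable when a dispute arrives months later.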

What to watch in your own implementation

Open your codebase and search for direct calls to Stripe's meter events, usage records, or invoice item endpoints. For each one, answer two questions. First: is there an idempotency_key on the request? Second: does that key live on a durable row that the retry layer can re-read, or is it generated fresh on each attempt? If the answer to either is no, that is the bug. Fix the ones that POST money first.

Then search your webhook handler for any code that mutates state without first checking a processed-events table. If the dedupe check and the side effect are not in the same transaction, the dedupe is decorative — a crash between the two leaves you exactly where you would be without it. Wrap them, and add a test that fires the same event ID twice and asserts the side effect ran once.

Finally, run a one-shot reconciliation query against last month's usage. Sum your ledger by customer-meter-day. Pull the matching meter event summaries from Stripe for the same window. Anywhere the two disagree by more than rounding is a row that is either over-billed or under-billed today. That is the number to take to your team, and it is also the number that goes to zero once the pattern above is in place.

Outcomes you should expect

What this delivers

End-of-period invoices match recorded usage with no manual reconciliation tickets from finance.
Webhook handler stays idempotent under Stripe's at-least-once retry behavior over the three-day delivery window.
A scheduled job catches drift between your internal usage ledger and Stripe-reported usage before customers see it on an invoice.
Out-of-order webhook arrival no longer corrupts subscription state, because the handler is written against an event-sourced ledger instead of mutating in place.

Primary sources

By the numbers

Stripe attempts to deliver events to your destination for up to three days with an exponential back off in live mode.
Stripe does not guarantee the delivery of events in the order that they're generated, so destinations cannot depend on event ordering.
Webhook endpoints might occasionally receive the same event more than once; Stripe recommends logging processed event IDs and skipping duplicates.
Stripe idempotency keys persist for at least 24 hours and the idempotency layer compares parameters of repeated requests to detect accidental misuse.
Events are retrievable through the Retrieve Event API for 30 days, which bounds how far back a reconciliation job can replay from Stripe directly.
Stripe processes meter events asynchronously, so aggregated usage in summaries and on upcoming invoices may not immediately reflect recently received events.

Live in production today

The same engineering, shipped in production at BFEAI.

I'm co-founder & CTO of Be Found Everywhere (BFEAI), a 7-app AI SaaS platform running today. The work I deliver for clients is the work I do every week on my own platform.

Common questions

What buyers ask before reaching out

Why does the same Stripe webhook arrive twice?

Stripe retries delivery for up to three days when your endpoint returns a non-2xx, times out, or when a network blip prevents the ACK from reaching Stripe. From your side it looks like a duplicate; from Stripe's side the first delivery was not confirmed. The fix is to dedupe on the event ID at the handler boundary, not to try to suppress retries.

Is the Stripe idempotency_key enough to prevent double-billing?

It prevents double-writes against the Stripe API when you retry the same outbound call. It does not protect against duplicate webhook deliveries, out-of-order events, or your own internal job re-running. You typically need both layers: an idempotency_key on every POST to Stripe, plus a processed-event-ID table on the inbound webhook side.

Can I rely on Stripe webhook ordering?

No. The documented behavior is that delivery order is not guaranteed; a subscription created event can arrive after the first invoice paid event. Build the handler so each event type is processed independently, and reconstruct state from the event-sourced ledger rather than assuming a sequence.

How often should the reconciliation job run?

Most teams I work with land on hourly during the day and a heavier nightly pass right before invoices finalize. The hourly pass catches drift fast enough to fix it before the customer is billed. The nightly pass is the safety net that runs against the closing window for the day and writes an audit row whether drift is found or not.

What if my usage row exists but Stripe has no record of it?

That is the most common drift case and the one the reconciliation job is designed to catch. The fix is to replay the submission with the original idempotency_key from the ledger. Because the key is the same, Stripe will either accept the missing record or return the original response if it actually did land and the loss was on the ACK path.

How far back can the reconciliation job pull from Stripe?

The Stripe events API retains events for 30 days, which is the practical ceiling for reconstructing state from Stripe alone. Anything older needs to come from your own ledger or from invoice line items. Design the job to assume 30 days as the replay window, and treat invoice line items as the source of truth for older periods.

Do I really need event sourcing for this, or can I just patch the bug?

You can patch a single bug without event sourcing. The reason teams move to an event-sourced usage ledger is that the next bug is the same shape: something corrupted state in place and now you cannot reconstruct what happened. With an append-only ledger, every reconciliation question becomes a query, and every replay becomes safe.
