
Webhook Idempotency: The Postgres Dedup-Table Pattern
An engineering pattern for the processed_events table that makes any webhook handler safe under at-least-once delivery — in Postgres, with INSERT ON CONFLICT and one atomic transaction.
The problem
Every webhook integration eventually hits the same bug. A customer reports the wrong charge, or a notification fires twice, or an audit row exists for an event that did not logically happen. You pull the request log and find the smoking gun: the same webhook ID delivered twice, two minutes apart, both with a 200 response, both running the side effect to completion.
At-least-once is the contract, not a bug
The cause is rarely your code being wrong on a single delivery. The cause is that webhook providers offer at-least-once delivery, not exactly-once. Stripe documents that they attempt to deliver events to your destination for up to three days with exponential backoff in live mode. GitHub redelivers when its delivery worker times out. Twilio retries on any non-2xx response. The provider's promise is "you will hear about this event"; the provider's promise is not "you will hear about it exactly one time."
Lost ACKs trigger redelivery
From the provider's side, the failure shape is typically the same. Your handler runs, commits the work, returns 200, and the TCP ACK gets lost on the way back. A load balancer drops the connection, a network blip eats the packet, your worker process is killed at the moment of response. The provider's delivery worker sees a timeout, marks the delivery as failed, and queues a retry. From the provider's view there is no successful attempt. From your view there is. The next time the event arrives you do not have any context that you have seen it before — unless you wrote that context down somewhere durable.
Side-effect shape determines the damage
The business consequence depends on what your handler does. For a billing webhook, the second run charges or credits a customer twice. For a notification webhook, two emails go out and the user thinks your product is broken. For an audit-log webhook, your compliance evidence is inflated with phantom rows. For an inventory webhook, your stock counts drift. None of these are bugs you can blame on the provider — Stripe, GitHub, and Twilio all document the at-least-once contract clearly and tell you what to do about it. The bug is that the handler shipped without the dedup layer that the contract requires.
Check-then-act is the wrong fix
The reason this bug ships so often is that the obvious solution does not work. "Check if you have seen this event before, and if so skip it" sounds correct and is wrong in concurrent code. Two redeliveries arriving at the same time can both pass the check before either commits the side effect, and now you are double-processing despite having a dedup table. The fix is not more code — it is pushing the uniqueness check into the database, where the only thing that can serialize concurrent inserts on the same key is the database itself.

What changes for your business
The pattern is a single Postgres table, one unique constraint, and an INSERT statement that returns whether the row was new. Three pieces of code total. The rigor is in where you put them.
processed_events table
Start with the table:
CREATE TABLE processed_events (
provider text NOT NULL,
event_id text NOT NULL,
received_at timestamptz NOT NULL DEFAULT now(),
event_type text,
payload_hash text,
PRIMARY KEY (provider, event_id)
);
CREATE INDEX processed_events_received_at_idx
ON processed_events (received_at);
The primary key on (provider, event_id) is what makes the whole pattern work. PostgreSQL's documentation is explicit that ON CONFLICT DO NOTHING avoids inserting a row as its alternative action, and the conflict target can name any unique constraint or unique index. The composite key lets one table serve every integration — Stripe events with the Stripe event ID, GitHub events with the X-GitHub-Delivery UUID, Twilio events with the MessageSid — without collisions across providers. The received_at index supports the nightly cleanup job.
INSERT ... ON CONFLICT as the atomic primitive
The dedup primitive is the INSERT itself:
INSERT INTO processed_events (provider, event_id, event_type, payload_hash)
VALUES ($1, $2, $3, $4)
ON CONFLICT (provider, event_id) DO NOTHING
RETURNING received_at;
The trick is the RETURNING clause. PostgreSQL documents that RETURNING only returns rows that were successfully inserted or updated. A row blocked by ON CONFLICT DO NOTHING returns nothing. That asymmetry — rows back means "first time seeing this", no rows back means "duplicate" — is the signal the application code branches on. There is no race window between the dedup check and the insert because there is no separate dedup check; the atomic INSERT is the check.
Handler wraps dedup and side effect in one transaction
The handler then wraps the dedup INSERT and the side effect in one transaction:
async function handleStripeWebhook(event: Stripe.Event): Promise<void> {
await db.transaction(async (tx) => {
const rows = await tx.query<{ received_at: Date }>(
`INSERT INTO processed_events
(provider, event_id, event_type, payload_hash)
VALUES ($1, $2, $3, $4)
ON CONFLICT (provider, event_id) DO NOTHING
RETURNING received_at`,
["stripe", event.id, event.type, hashPayload(event.data.object)],
);
if (rows.length === 0) {
logger.info({ event_id: event.id }, "duplicate webhook, skipping");
return;
}
switch (event.type) {
case "invoice.payment_succeeded":
await recordInvoicePayment(tx, event);
break;
case "customer.subscription.updated":
await syncSubscriptionState(tx, event);
break;
// ...
}
});
}
The single most important line is await db.transaction(...). If the dedup INSERT and the side effect run in separate transactions, the dedup is decorative. A crash between the two leaves you with a row that says "processed" and a side effect that was not executed. The redelivery will see the dedup row, treat the event as already handled, and the work is silently lost. With both in the same transaction, a crash rolls back both — the dedup row disappears, and the next redelivery picks up exactly as if nothing had happened.
Same shape across providers
GitHub and Twilio use the same pattern with different dedup columns. GitHub sends an X-GitHub-Delivery header that is a UUID stable across all retries of the same logical delivery:
async function handleGitHubWebhook(
deliveryId: string,
eventType: string,
payload: GitHubPayload,
): Promise<void> {
await db.transaction(async (tx) => {
const rows = await tx.query<{ received_at: Date }>(
`INSERT INTO processed_events
(provider, event_id, event_type, payload_hash)
VALUES ('github', $1, $2, $3)
ON CONFLICT (provider, event_id) DO NOTHING
RETURNING received_at`,
[deliveryId, eventType, hashPayload(payload)],
);
if (rows.length === 0) return;
await processGitHubEvent(tx, eventType, payload);
});
}
Twilio is the same shape with the resource's SID as the dedup column — MessageSid for SMS status callbacks, CallSid for voice events. The point is that one table and one INSERT pattern absorb every webhook integration the application ever adds, because the conflict target is parameterized by provider. New integration, new branch in the switch, no new dedup infrastructure.
Bounded retention via nightly cleanup
The cleanup job is the last piece. The processed_events table grows monotonically without it, and rows older than the longest retry window the application cares about are dead weight:
DELETE FROM processed_events
WHERE received_at < now() - interval '14 days';
Run that nightly. Stripe's three-day retry window is the longest of the common providers, so a 14-day retention gives a comfortable margin — long enough that any genuine straggler is still caught, short enough that the table stays small. Schedule the DELETE during low-traffic hours so the lock churn does not collide with handler INSERTs.
The reason the pattern works under load is that it pushes all the hard parts down to Postgres. The uniqueness constraint is enforced by a B-tree index that serializes concurrent inserts on the same key. The transaction wrap is enforced by Postgres's MVCC. The "is this a duplicate" check is the INSERT, not a separate read. None of these are things the application code has to get right — the application code just has to use the primitive correctly.
Hard parts pushed down to Postgres
The reason the pattern works under load is that it pushes all the hard parts down to Postgres. The uniqueness constraint is enforced by a B-tree index that serializes concurrent inserts on the same key. The transaction wrap is enforced by Postgres's MVCC. The "is this a duplicate" check is the INSERT, not a separate read. None of these are things the application code has to get right — the application code just has to use the primitive correctly.

Edge dedup vs idempotent handlers
There are two layers and you need both. The dedup table is the edge layer: it catches the case where the provider sends the exact same event_id twice. The handler being idempotent is the deeper layer: it catches the case where the same business outcome is represented by two distinct events with two distinct IDs.
Stripe documents an example of this exact failure mode: in some cases two separate Event objects are generated and sent for the same underlying change, and the documentation recommends identifying these by the ID of the object in data.object along with the event.type. The dedup table will not catch that — the two events have different IDs, so both pass the uniqueness check. The handler logic has to absorb it: applying the same customer.subscription.updated twice has to produce the same end state, not two updates.
The practical version of this is to write handlers against the desired end state, not against deltas. Instead of "increment the user's plan tier by one," write "set the user's plan tier to whatever Stripe says it is right now." Instead of "credit the customer $5," write "ensure a credit memo exists with this Stripe charge ID, and create one if not." The first form is fragile under any kind of retry — at-least-once delivery, internal job retry, manual replay during incident response. The second form is safe under all of them, because applying it twice is the same as applying it once.
Stripe's outbound idempotency_key is the same pattern in the opposite direction. When your code POSTs to Stripe with a key, Stripe's idempotency layer saves the resulting status code and body of the first request, and subsequent requests with the same key return the same result — including 500 errors. That protects your outbound calls. The processed_events table protects your inbound webhooks. A production billing integration uses both, because Stripe's outbound protection does nothing for inbound webhook duplicates, and the inbound table does nothing for an outbound retry storm.
Common failure modes
The check-then-act race is the first sharp edge and the one most teams ship with. Code that does SELECT 1 FROM processed_events WHERE event_id = $1 and then INSERT INTO processed_events ... in two separate statements has a race window between them. Two concurrent webhook deliveries can both run the SELECT, both see no row, both proceed to the INSERT, and one will fail on the unique constraint — but only after the other has already committed the side effect. Even worse, if the code uses two transactions, both can succeed and double-process. The fix is to delete the SELECT entirely and let the INSERT do the checking, because the database is the only thing that can serialize the two inserts atomically.
The separate-transaction failure is the second. Some implementations record the event ID in one transaction, return early if it already exists, and then run the side effect in a second transaction. This is broken in both directions. If the dedup commits and the side effect crashes, you have lost the work and the retry will skip. If the side effect commits and the dedup crashes, you have double-processed on the next retry. The transaction has to wrap both.
The third is dedupe-on-the-wrong-field. Stripe's event.id is stable across all retries of the same event. The body of the event — data.object — can change between retries (rarely, but possible for events that reference mutable objects). If the application dedupes on a hash of the body, the redelivery has a different hash and slips past. Dedupe on the provider's stable event ID. The payload_hash column in the table is for observability, not for the uniqueness check.
The fourth is dropping events older than the retention window. If the cleanup deletes rows after 24 hours and a Stripe redelivery arrives on day two, the event_id is gone from the table, the INSERT succeeds, and the handler double-processes. The retention has to be longer than the provider's documented retry window — 14 days for Stripe is the practical floor. The cost of being too generous is disk; the cost of being too aggressive is a real bug.
The fifth is the silent unique-violation error. Some ORMs translate Postgres's unique_violation error code into an exception that the application catches and logs as a warning, rather than detecting the conflict through the RETURNING clause. That is a code smell — it works, but it makes "duplicate detected" indistinguishable from "real database error" in the logs. Use ON CONFLICT DO NOTHING with RETURNING. The empty result set is the duplicate signal, and it is unambiguous.
The sixth is the handler that does external I/O before the dedup commit. If the handler sends an email, calls a third-party API, or writes to a non-transactional store before the transaction commits, the rollback cannot undo it. The dedup is in the transaction; the email is not. Two redeliveries will each send the email regardless of the dedup. The rule is: all external side effects either go through outbox-pattern rows committed in the same transaction, or they happen after the transaction commits and are themselves idempotent on the side of the receiver.
What to watch in your own implementation
Open your webhook handler and find the dedup check. If it is a SELECT followed by an INSERT, that is the bug. Replace it with INSERT ... ON CONFLICT DO NOTHING RETURNING received_at and branch on whether any rows came back. The replacement is usually three lines shorter than what it replaces and removes the race window entirely.
Next, find the transaction boundary. The dedup INSERT and the side effect have to be inside the same BEGIN/COMMIT. If they are not, the dedup is not actually doing what its presence implies. A test that simulates two concurrent deliveries of the same event ID should result in exactly one side-effect commit; if it results in two, the dedup is broken regardless of what the table looks like.
Then check what field you are deduping on. For Stripe it is event.id. For GitHub it is the X-GitHub-Delivery header value. For Twilio it is the resource SID specific to the event type. Hashing the payload is not the right answer — payloads can change between retries for mutable referenced objects, and your hash will miss those.
Look at the retention. Pull the longest documented retry window for every provider you handle. Set the cleanup cutoff to at least double that. For Stripe-only workloads, 14 days is the floor. For applications handling Stripe plus GitHub plus Twilio, 14 days still covers everyone with margin. Schedule the DELETE off-peak.
Finally, write the test that matters: fire the same event ID through the handler twice in parallel, then assert that the downstream side effect ran exactly once. The wrong way to write this test is to fire serially — that will pass even on a broken implementation that has only the SELECT-then-INSERT race. The right way is to fire two concurrent invocations with Promise.all or equivalent, then count the side-effect commits. A correct implementation will show one; a broken one will show two and you have found the bug before the customer does.
What this looks like in production
At BFEAI the processed_events table absorbs every inbound webhook the platform handles — Stripe billing events, identity provider callbacks, internal cross-service notifications. The table has three columns the application reads (provider, event_id, received_at) and two it writes for observability (event_type, payload_hash). Every handler is one transaction. The cleanup runs nightly with a 14-day retention.
The metric that matters most is the rate of dedup hits — INSERTs that return no rows because the event was already in the table. Under normal operation that rate is small but non-zero, because providers do retry occasionally. When the rate spikes, it is usually a signal that something downstream is timing out or returning 5xx, which is causing the provider to redeliver. The dedup table absorbs the redelivery storm without double-processing, but the metric is the leading indicator that something else is broken.
The alert that pages is not "dedup hit" — dedup hits are normal and healthy. The alert is "handler error rate on first-attempt deliveries." First-attempt errors are the original bug; redelivery errors are just symptoms of the original bug not being fixed yet. Filtering the alert to first-attempt traffic surfaces the real problem and suppresses the alert noise that would otherwise fire on every retry of the same broken event.
The dashboard a CTO actually wants is three numbers: handler error rate on first attempts, dedup-hit rate (with a baseline so spikes are visible), and table row count (with the cleanup job's daily delta visible). All three are cheap to compute, all three answer a real operational question, and a non-zero on any of them in the wrong direction is a real signal.
One operational detail that pays off: log the event_id and the provider on every handler invocation, and log a separate structured line on the dedup-hit branch. When a customer reports a billing problem three months later, the question "did we see this event, and how many times" is answered by a single log query. Without those logs you are reconstructing from the table itself, which by then has been cleaned up. The table is for serialization; the logs are for forensics. Both layers earn their keep the first time something goes wrong.
Outcomes you should expect
What this delivers
- Webhook handlers stop double-processing events even under Stripe's three-day retry window, GitHub redelivery, and Twilio retry policies.
- The duplicate-event check and the side effect commit atomically in the same transaction, so a crash mid-handler cannot leave dedupe-recorded-but-side-effect-skipped state.
- A scheduled cleanup job keeps the processed_events table bounded without losing the dedup window that matters for the provider's longest retry interval.
- New webhook integrations inherit the same dedup primitive instead of each integration re-inventing it ad hoc — one table, one INSERT, one branch on the RETURNING result.
Primary sources
By the numbers
Stripe attempts to deliver events to your destination for up to three days with an exponential back off in live mode.
Stripe explicitly recommends guarding against duplicated event receipts by logging the event IDs you've processed and not processing already-logged events.
Stripe does not guarantee the delivery of events in the order that they're generated, so destinations cannot depend on event ordering.
Stripe's idempotency layer saves the resulting status code and body of the first request for any given idempotency key, and subsequent requests with the same key return the same result, including 500 errors.
Stripe documents that idempotency keys can be removed from the system automatically after they're at least 24 hours old, and a new request is generated if a key is reused after the original is pruned.
PostgreSQL's ON CONFLICT DO NOTHING clause simply avoids inserting a row as its alternative action, and for ON CONFLICT DO NOTHING it is optional to specify a conflict_target; when omitted, conflicts with all usable constraints and unique indexes are handled.
PostgreSQL's RETURNING clause only returns rows that were successfully inserted or updated, which means a row blocked by ON CONFLICT DO NOTHING returns nothing — the signal the application uses to detect a duplicate.
Live in production today
The same engineering, shipped in production at BFEAI.
I'm co-founder & CTO of Be Found Everywhere (BFEAI), a 7-app AI SaaS platform running today. The work I deliver for clients is the work I do every week on my own platform.
7
Production apps
200K+
Keywords generated
1,500+
AI scans run
7,000+
Sites automated
Common questions
What buyers ask before reaching out
Why will the same webhook be delivered twice?
Webhook providers offer at-least-once delivery, not exactly-once. Your endpoint can succeed, commit, and return 200 — but if the TCP ACK fails to reach the provider because of a network blip, a load balancer reset, or a timeout in their delivery worker, they will redeliver. Stripe retries for up to three days with exponential backoff. From your side it looks like a duplicate; from the provider's side the first attempt was not confirmed.
Why is SELECT-then-INSERT not enough for dedup?
Two webhook deliveries can arrive in parallel, both run the SELECT, both see no matching row, and both proceed to INSERT and run the side effect. This is a textbook check-then-act race. The fix is to push the uniqueness check into the database itself with a UNIQUE constraint on event_id, and use INSERT ... ON CONFLICT DO NOTHING as the atomic primitive. The database is the only thing that can serialize two concurrent inserts on the same key.
Should the dedup INSERT and the side effect be in the same transaction?
Yes, and this is the single most-broken part of most implementations. If you record the event ID first, commit, and then run the side effect, a crash between the two leaves the dedup table claiming the event was processed when it was not — the redelivery will be silently swallowed and the side effect skipped. Wrap both in one transaction so they commit or roll back together.
Do I still need idempotency in the handler logic itself?
Yes. Edge dedup catches the common case of the same event ID arriving twice. But providers also document cases where two distinct events represent the same logical action — Stripe notes that two separate Event objects can be generated for the same underlying change. You need both layers: dedup on event_id at the edge, plus handler logic that is safe to re-run for a given business outcome.
How long should I keep rows in the processed_events table?
Long enough to cover the provider's longest retry window plus a comfortable margin. Stripe's three-day live retry window suggests a 7- to 14-day minimum. GitHub and Twilio have shorter windows. The cleanup job runs nightly, deletes anything older than your retention cutoff, and the table stays bounded. The risk of cleaning too early is that a stragglier redelivery sneaks past dedup; the cost of keeping forever is just disk.
What happens if the handler crashes mid-processing?
If the dedup INSERT and the side effect share one transaction, the crash rolls back both — the row disappears, the side effect was not committed, and the next redelivery picks up cleanly. If they are in separate transactions, you have a problem: either the dedup committed without the side effect (silent loss) or the side effect committed without the dedup (silent double-processing on retry). The transactional wrap is what makes the pattern actually safe.
How is this different from Stripe's idempotency_key on outbound calls?
Stripe's idempotency_key protects your outbound POSTs to Stripe — calling charges.create twice with the same key returns the cached response from the first call. The processed_events table protects your inbound handler — receiving the same event.id twice runs the side effect once. They are opposite directions of the same problem, and a production billing integration needs both.
Does this pattern work for GitHub and Twilio webhooks too?
Yes, with one change: the dedup column. GitHub sends an X-GitHub-Delivery header (a UUID) on every webhook, and Twilio sends a unique SmsSid or MessageSid depending on the resource. Same table shape, same INSERT ... ON CONFLICT, same transaction wrap — only the column you index on changes per provider. Most teams add a provider column so one table serves every integration.
Ready to see if this is a fit?
A 15-minute call. No deck, no slides. We talk about what you're shipping and where engineering is the bottleneck. Either way, you walk away with a senior engineer's read on your situation.