A frustrated CTO at a desk, head in hands, facing a messy screen of log data, burdened by compliance and the lack of a proper tenant audit log.
Production engineering patternUpdated

Tenant Audit Log for SOC 2 and HIPAA Without the Compliance Drag

An engineering pattern for an append-only, per-tenant audit log that maps cleanly to SOC 2 CC6/CC7 evidence and HIPAA §164.312(b) audit controls — without becoming a maintenance burden.

The problem

Application logs are not audit logs

The audit log conversation usually starts when an enterprise prospect sends a security questionnaire with a question that reads, roughly, "describe the audit log capabilities of your platform, including event coverage, retention, tenant scoping, and customer-facing export." The first founder who reads it thinks: we have application logs, that's an audit log. It is not. Application logs capture stack traces and request durations; an audit log captures who did what to which regulated record, in a form that an auditor or a customer's compliance team can read back six years later.

SOC 2 Type II evidence gap

The second wave of pain shows up at the first SOC 2 Type II audit. The auditor wants evidence for CC6.1 (logical access controls) and CC7.2 (monitoring for anomalies). They ask to see who has admin access to a sample tenant, when that access was granted, and who granted it. If your audit log is "Datadog log lines that get rotated after 30 days," the answer is "we cannot show you that." The audit either fails or you spend the next month reconstructing evidence from git history and Slack messages, which is the pattern Bill keeps seeing when a founder hands him an existing codebase to harden.

HIPAA §164.312(b) PHI access trail

The third wave is HIPAA. §164.312(b) says you have to implement mechanisms that record and examine activity in systems that contain or use ePHI. The covered entity reads that as "log every PHI access." The engineer reads that as "we already log requests." Then the first BAA-required audit happens, the assessor asks for the access trail on a specific record for a specific patient on a specific date, and the gap is suddenly visible to a regulator instead of a sales prospect.

Cross-tenant leak from the export endpoint

The fourth wave is the one that actually scares CTOs: an enterprise customer asks for an export of their own audit log, and the implementation accidentally leaks another tenant's events into the CSV. That is the same shape as a cross-tenant data leak, and the consequence is the same — it ends up on Twitter, then in a security advisory, then in churn. The audit log itself becomes a leak vector if the per-tenant filter is in application code instead of at the database boundary.

The cost of building this wrong is months of remediation work at the worst possible time — right before a Series B diligence cycle, right before a healthcare integration, right after a customer-reported incident that the audit log should have caught.

Engineers collaborate on a monitor displaying abstract data flow diagrams, building a robust tenant audit log system for security compliance.

What changes for your business

The architecture is one append-only Postgres table, written in the same transaction as the side effect it records, partitioned for retention and query speed, with a per-tenant export path that enforces isolation at the database. The work happens up front in the schema and the write path; the SOC 2 and HIPAA evidence falls out of the design instead of being assembled by hand later.

Narrow audit_events schema

Start with the table. The schema is intentionally narrow — every column either appears in evidence or supports a query that does.

CREATE TABLE audit_events (
  id              uuid        NOT NULL DEFAULT gen_random_uuid(),
  tenant_id       uuid        NOT NULL,
  actor_id        uuid,                -- nullable: system events have no actor
  actor_type      text        NOT NULL,-- 'user' | 'service' | 'system'
  event_type      text        NOT NULL,-- 'auth.login' | 'phi.read' | 'admin.role_grant' | ...
  resource_type   text,                -- 'patient' | 'invoice' | 'api_key' | ...
  resource_id     text,                -- record id, NOT record contents
  action          text        NOT NULL,-- 'read' | 'list' | 'export' | 'create' | 'update' | 'delete'
  outcome         text        NOT NULL,-- 'success' | 'failure'
  request_ip      inet,
  request_id      text,                -- correlation id, ties to app logs
  context         jsonb       NOT NULL DEFAULT '{}'::jsonb,
  event_time      timestamptz NOT NULL DEFAULT now(),
  prev_hash       bytea,               -- hash chain for integrity (HIPAA 164.312(c))
  row_hash        bytea       NOT NULL,
  PRIMARY KEY (event_time, id)
) PARTITION BY RANGE (event_time);

Column intent

The columns are deliberate. tenant_id is the first filter on every query, so it lives outside context to keep the index narrow. actor_id and actor_type together answer "who," with the type column letting you distinguish a service account from a human and a system job from both. event_type is your taxonomy — design it once, hold the line, and the same query answers "every login for this tenant" or "every PHI read by this user." resource_id is a string id only, not the record itself; this is the rule that keeps HIPAA disclosures from multiplying through your audit table. context is a jsonb for the per-event-type detail (the export format, the new role, the changed permission) and is the only freeform column.

Hash-chain integrity columns

The prev_hash and row_hash columns are the integrity chain. Each row's hash is computed over the row's content concatenated with the prior row's hash within the same tenant. That gives you the HIPAA §164.312(c) "corroborate that ePHI has not been altered" property for the audit trail itself: an auditor can pick two boundary rows and prove no row between them was inserted or removed.

Monthly range partitioning

Partition by event_time on a monthly range. With seven years of retention, that is 84 partitions per tenant boundary — well within Postgres's comfort zone, and partition pruning means a query for "this tenant in March 2026" reads exactly one partition.

CREATE TABLE audit_events_2026_01 PARTITION OF audit_events
  FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');

CREATE INDEX audit_events_2026_01_tenant_time
  ON audit_events_2026_01 (tenant_id, event_time DESC);

CREATE INDEX audit_events_2026_01_tenant_event_type
  ON audit_events_2026_01 (tenant_id, event_type, event_time DESC);

Automate the next-month partition creation with a small cron — running out of partitions is the kind of outage you discover at 00:00 on the first of the month.

Write path in the same transaction as the side effect

The write path is one function called from the same transaction as the side effect. The application does not write an audit row in a separate transaction, does not write it after the response is sent, and does not write it from a background job that runs "soon." If the audit insert fails, the side effect fails. The audit row is the receipt, and you do not perform an action you cannot record.

import { createHash } from "node:crypto";
import type { Transaction } from "./db";

type AuditEventInput = {
  tenantId: string;
  actorId: string | null;
  actorType: "user" | "service" | "system";
  eventType: string;          // e.g. "phi.read"
  resourceType?: string;
  resourceId?: string;
  action: "read" | "list" | "export" | "create" | "update" | "delete";
  outcome: "success" | "failure";
  requestIp?: string;
  requestId?: string;
  context?: Record<string, unknown>;
};

export async function writeAuditEvent(
  tx: Transaction,
  input: AuditEventInput,
): Promise<void> {
  // Hash chain: read the most recent row_hash for this tenant inside the
  // same transaction so concurrent writers serialise on it.
  const prev = await tx.queryOne<{ row_hash: Buffer | null }>(
    `SELECT row_hash
       FROM audit_events
      WHERE tenant_id = $1
      ORDER BY event_time DESC, id DESC
      LIMIT 1
      FOR UPDATE`,
    [input.tenantId],
  );
  const prevHash = prev?.row_hash ?? Buffer.alloc(32); // zero-hash genesis row

  const payload = JSON.stringify({
    tenant_id: input.tenantId,
    actor_id: input.actorId,
    actor_type: input.actorType,
    event_type: input.eventType,
    resource_type: input.resourceType ?? null,
    resource_id: input.resourceId ?? null,
    action: input.action,
    outcome: input.outcome,
    context: input.context ?? {},
  });

  const rowHash = createHash("sha256")
    .update(prevHash)
    .update(payload)
    .digest();

  await tx.query(
    `INSERT INTO audit_events
       (tenant_id, actor_id, actor_type, event_type, resource_type,
        resource_id, action, outcome, request_ip, request_id,
        context, prev_hash, row_hash)
     VALUES ($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13)`,
    [
      input.tenantId,
      input.actorId,
      input.actorType,
      input.eventType,
      input.resourceType ?? null,
      input.resourceId ?? null,
      input.action,
      input.outcome,
      input.requestIp ?? null,
      input.requestId ?? null,
      input.context ?? {},
      prevHash,
      rowHash,
    ],
  );
}

The FOR UPDATE on the previous row is what makes the hash chain safe under concurrent writes for the same tenant. Two simultaneous PHI reads will serialise on the tenant's last row, both will hash the same prior, and you will not get two rows with the same prev_hash (which is what a tampering check looks for).

Lock down the table at the database level. The application role that runs day-to-day traffic only needs INSERT and SELECT.

REVOKE UPDATE, DELETE, TRUNCATE ON audit_events FROM app_role;
GRANT INSERT, SELECT ON audit_events TO app_role;

That is the second layer of "append-only." Even if a code path goes wrong and tries to issue an UPDATE on an audit row, the database refuses. The only role that can purge old partitions is the maintenance role, which runs on a schedule and detaches partitions older than the retention window — it does not delete individual rows. 3)`, [ input.tenantId, input.actorId, input.actorType, input.eventType, input.resourceType ?? null, input.resourceId ?? null, input.action, input.outcome, input.requestIp ?? null, input.requestId ?? null, input.context ?? {}, prevHash, rowHash, ], ); }


The `FOR UPDATE` on the previous row is what makes the hash chain safe under concurrent writes for the same tenant. Two simultaneous PHI reads will serialise on the tenant's last row, both will hash the same prior, and you will not get two rows with the same `prev_hash` (which is what a tampering check looks for).

### Database-level append-only lockdown

Lock down the table at the database level. The application role that runs day-to-day traffic only needs INSERT and SELECT.

sql REVOKE UPDATE, DELETE, TRUNCATE ON audit_events FROM app_role; GRANT INSERT, SELECT ON audit_events TO app_role;

That is the second layer of "append-only." Even if a code path goes wrong and tries to issue an UPDATE on an audit row, the database refuses. The only role that can purge old partitions is the maintenance role, which runs on a schedule and detaches partitions older than the retention window — it does not delete individual rows.

![A confident CTO in a calm office, viewing a monitor with a clean, color-blocked dashboard of organized tenant audit log data.](https://startups.boostframe.io/service-page-photos/patterns/tenant-audit-log-for-soc2-hipaa/03-inline.jpg)

## What gets shipped

The deliverable is the table and partitioning DDL, the write helper above wired through the application's transaction layer, an event taxonomy document that lists every `event_type` the application emits (and what `context` shape each carries), a partition-rotation cron that creates the next month and detaches partitions beyond the retention horizon, a customer-facing audit export endpoint with tenant filtering enforced at the database, and a SIEM forwarder that selects the security-relevant subset and ships it to whatever the security team already uses (Splunk, Datadog SIEM, Panther, Wazuh — the forwarder does not care).

The event taxonomy is the piece teams underestimate. Without it, three engineers invent three different `event_type` strings for the same action — `"login"`, `"auth.login"`, and `"user.signin"` all show up in the same table, and the auditor's query for "all logins for tenant X" misses two thirds of them. Bill writes the taxonomy as a TypeScript union type in the audit module so the compiler refuses any string that is not in the list.

typescript export type AuditEventType = // Auth | "auth.login" | "auth.logout" | "auth.mfa_challenge" | "auth.mfa_failure" | "auth.password_reset_request" | "auth.password_reset_complete" // Tenant access | "tenant.member_invite" | "tenant.member_join" | "tenant.member_remove" | "tenant.role_grant" | "tenant.role_revoke" // Regulated record access (HIPAA-relevant if PHI) | "record.read" | "record.list" | "record.export" | "record.create" | "record.update" | "record.delete" // Admin actions | "admin.config_change" | "admin.api_key_create" | "admin.api_key_revoke" | "admin.webhook_endpoint_change" // Data export | "export.csv_download" | "export.api_bulk_read" | "export.report_generate"; ```

The customer export endpoint resolves the tenant id from the caller's JWT — not from a URL parameter, not from a request body field — and then issues a query that includes WHERE tenant_id = $1 at the database. If the application uses Postgres RLS for tenancy elsewhere, the same RLS policy applies to the audit table for the export role. The signed CSV is streamed (not built in memory) so a customer with millions of audit rows does not blow the export server's heap.

More on this

Common failure modes

The first failure mode is the separate-transaction write. The application updates the record, returns the response, and pushes the audit event to a queue for "best effort" insertion later. When the queue worker crashes or the audit insert hits a transient error, the side effect persists and the audit row does not. Six months later the auditor asks for the trail and the gap is invisible because no one logged the fact that the log row failed. Wrap the audit insert in the same transaction as the side effect. If the audit insert fails, the action fails.

The second is logging raw PHI or PII into the audit row. The engineer is being helpful — "if we put the patient's name in the audit row, debugging is easier" — and the audit table is now another copy of every PHI record the system has touched. Every query against it is a disclosure. Every backup of it is a backup of PHI. The fix is mechanical: the audit row stores the resource type and id only, not the record contents. If a security investigation needs the underlying record, the investigator joins from the audit row id back to the source-of-truth table under their own elevated credentials, and that join is itself an audit event.

The third is the per-tenant filter in application code. The export endpoint reads tenantId from the request body, the application layer adds WHERE tenant_id = $tenantId to the query, and a missing authorization check anywhere upstream lets a caller specify another tenant's id. The fix is to resolve the tenant id from the authenticated session at the edge — not from the request payload — and to enforce the filter at the database with RLS or a parameterised view that the export role cannot bypass.

The fourth is the unrotated retention purge. The team builds the table and forgets the partition-rotation job. Three years in, the table has 36 partitions, query plans get heavy, and every analyst query joins across all of them. The fix is the rotation cron from day one, even when there is only one month of data — the cron is cheap to run and impossible to bolt on cleanly after the table is big.

The fifth is forwarding the whole audit log to a SIEM. The security team gets the bill, and the bill scales with product event volume instead of with security event volume. The fix is a forwarder with a hard-coded allowlist of event_type values that go to the SIEM (auth failures, MFA failures, admin actions, exports above a size threshold) and everything else stays in Postgres where it is cheap and queryable but does not incur per-ingest cost. The auditor reads from Postgres; the security team reads alerts from the SIEM; the bill stays linear in actual security signal, not in product growth.

The sixth, which Bill has watched happen twice, is the schema-migration tampering case. An engineer needs to backfill a column on the audit table, writes an UPDATE migration, and the database refuses because the app role does not have UPDATE rights. The engineer escalates to the maintenance role, runs the UPDATE, and silently breaks every hash-chain row from that point forward. The fix is a rule: the audit table schema is additive-only. Add columns, do not rewrite rows. If a backfill is unavoidable, it gets a documented exception, a fresh hash-chain restart row noting the operation, and the auditor is told.

What to watch in your own implementation

Open your codebase and search for every place that touches a regulated record — read, list, export, mutate. For each one, answer two questions. First: does this code path call the audit write helper? Second: is that call in the same transaction as the side effect? If the answer to either is no, that is a row your auditor will not see and a customer your evidence cannot defend.

Then check the audit table's grants. Connect as the application role and try DELETE FROM audit_events WHERE tenant_id = '...' LIMIT 1. If it succeeds, the database is not enforcing append-only and the application code is the only thing standing between a bug and a tampered audit trail. Revoke UPDATE, DELETE, and TRUNCATE from that role.

Then run the export endpoint as a low-privilege user for tenant A and try to specify tenant B's id in the request. If the response contains tenant B's audit rows, the per-tenant filter is in the wrong layer. Move the tenant id resolution to the JWT, and add a regression test that fires the export with a swapped tenant id and asserts a 403 (not an empty CSV — an empty CSV looks like success to a caller who does not know what to look for).

Then count partitions. If the partition count is one (the table is not partitioned), retention will eventually become a table-rewrite project that requires downtime to fix. Add the partitioning DDL now, before the table is in the gigabytes.

Finally, pull one tenant's audit log for a recent week and read it as the auditor will. Does the trail answer "who logged in," "who accessed which regulated record," "who changed which permission," and "what was exported"? If any of those four are missing, that is the gap the auditor finds. Close it before they do.

What this looks like in production

At BFEAI, the audit table sits behind the same RLS policies that guard the production tenant boundary. Every PHI-equivalent read (in BFEAI's case, customer-owned generation history) writes one row in the same transaction as the read. The hash chain runs per tenant. The customer-facing export endpoint is one signed-URL handler that streams JSONL filtered by the JWT's tenant claim, and the SIEM forwarder ships only the security-relevant subset.

The query that runs most often is the per-tenant compliance dashboard — a customer admin looking at "who on my team did what this week." It hits one partition (the current month), filters by tenant_id, returns in tens of milliseconds. The query that runs least often but matters most is the auditor's evidence pull — a date range across a single tenant, exported as CSV. That hits one or two partitions depending on the range, and it streams without building the result in memory.

The runbook for the audit-table-related pages has three entries. Page one: "partition rotation cron has not run in 25 hours" — the next month's partition was not created, the table is about to start failing on inserts dated in the new month, and the fix is to run the rotation manually and figure out why the cron failed. Page two: "hash chain break detected" — the integrity verifier found a row where prev_hash does not match the prior row's row_hash for that tenant, which either means a row was inserted out of order (a code bug — find the offending write path) or the table was tampered with (the worse case — pull access logs for the maintenance role). Page three: "audit insert failure rate above threshold" — the application is trying to log events that are failing the schema constraint, which usually means a new code path is emitting an event_type that is not in the taxonomy. Fix the taxonomy, redeploy, the rate drops.

None of these pages fire in steady state. When they do, they point at a real problem — usually a missed deployment step, a schema drift, or a code path that bypassed the audit helper. The pattern is built so the page count itself is the leading indicator: a quiet audit subsystem is a healthy one, and any sustained noise points at something structural that needs a fix instead of a one-off recovery.

The compliance side benefits the most. SOC 2 Type II evidence collection becomes "here is the read-only auditor account, query the audit table directly." HIPAA assessments become "here is the trail for the patient and date you asked about, generated from the table in seconds." Customer security questionnaires get answered with a paragraph that describes the actual implementation instead of a marketing-shaped maybe. The audit log stops being a compliance tax and becomes a system the engineering team can point to and the security team can rely on.

Outcomes you should expect

What this delivers

  • Pass a SOC 2 Type II audit on CC6.1 and CC7.2 evidence the first time, with the auditor pulling sample log rows from your live system instead of asking for screenshots.
  • Hand a HIPAA assessor an audit log that maps cleanly to §164.312(b), retains for the full 6-year window, and proves no row was altered after the fact.
  • Export a customer's audit log on demand without leaking another tenant's events — the per-tenant filter is enforced at the database, not in application code.
  • Keep the audit table query-fast at multi-year scale by partitioning on tenant and time, so a six-year retention window does not turn into a six-year query.

Primary sources

By the numbers

  • HIPAA §164.312(b) requires covered entities to implement hardware, software, and/or procedural mechanisms that record and examine activity in information systems that contain or use electronic protected health information.

    Source ↗

  • HIPAA §164.316(b)(2)(i) requires that documentation be retained for 6 years from the date of its creation or the date when it last was in effect, whichever is later — which is the floor for how long audit records typically need to live in a HIPAA-regulated system.

    Source ↗

  • HIPAA §164.312(c) further requires policies and procedures to protect electronic protected health information from improper alteration or destruction, including electronic mechanisms to corroborate that ePHI has not been altered in an unauthorized manner — which is what makes append-only storage and integrity hashing the safe default.

    Source ↗

  • SOC 2 Common Criterion CC7.2 requires that the entity monitors system components and the operation of those components for anomalies that are indicative of malicious acts, natural disasters, and errors affecting the entity's ability to meet its objectives — which is the criterion most directly satisfied by a complete authentication and access audit log.

    Source ↗

  • SOC 2 Common Criterion CC6.1 requires that the entity implements logical access controls to restrict access to information assets, including data, software, functions, and IT infrastructure, to authorized and authenticated users — which is what makes per-tenant filtering on the audit log itself (not just on the production data) part of the control.

    Source ↗

  • Postgres declarative partitioning supports RANGE, LIST, and HASH methods, and partition pruning excludes unneeded partitions from the query plan at both plan-time and execution-time — which is what keeps an append-only audit table query-fast at multi-year, multi-tenant scale.

    Source ↗

Live in production today

The same engineering, shipped in production at BFEAI.

I'm co-founder & CTO of Be Found Everywhere (BFEAI), a 7-app AI SaaS platform running today. The work I deliver for clients is the work I do every week on my own platform.

7

Production apps

200K+

Keywords generated

1,500+

AI scans run

7,000+

Sites automated

Common questions

What buyers ask before reaching out

Why a tenant-scoped audit log instead of just turning on Postgres logs or AWS CloudTrail?

Postgres logs and CloudTrail capture infrastructure-level activity — who ran a query, who called an API. They do not capture business-level activity — which tenant's user exported which patient record at which time. SOC 2 CC7.2 and HIPAA §164.312(b) want the business-level trail, because that is what answers 'who saw the PHI.' A tenant-scoped application audit log lives between the application and the database, captures the business intent, and is queryable per-tenant on the way out.

What events have to be in the audit log to satisfy SOC 2 and HIPAA?

Authentication events (login, logout, MFA challenges, failed attempts), access events on regulated records (read, list, export of PHI or PII), admin actions (role changes, permission grants, tenant configuration changes), and data export events (CSV downloads, API bulk reads, report generations). HIPAA cares most about PHI access. SOC 2 cares about authentication and access-control changes. The same table covers both if the event taxonomy is designed up front.

What should NOT be in the audit log?

Passwords, full credit card numbers, full social security numbers, and raw PHI itself. The audit log records that a record was accessed and by whom — it does not store another copy of the record. Logging raw PHI duplicates your compliance surface and turns every audit query into a HIPAA disclosure. Store the record id and the action; let the audit reader pull the underlying record from the source-of-truth table if they need to.

How do you keep the audit log append-only when Postgres is fundamentally mutable?

Two layers. The application layer only ever issues INSERTs against the audit table — no UPDATE, no DELETE — and code review enforces this with a lint rule on the audit module. The database layer revokes UPDATE and DELETE privileges on the audit table from the application role, so even a SQL injection in the wrong endpoint cannot tamper with prior rows. For HIPAA §164.312(c) integrity assurance, add a hash chain column where each row's hash includes the prior row's hash, and you can prove no row was inserted or removed between two boundary rows.

How do you partition the audit table without making queries slow?

Range-partition on event_time, then sub-partition or filter on tenant_id. Most audit queries are either 'this tenant in this time window' (covered by the partition pruning) or 'this tenant across all time' (covered by a tenant_id index on each partition). Postgres partition pruning excludes the unmatched partitions at plan-time, so a six-year-old audit table behaves like a one-month-old audit table for the common query shapes.

Do customers ever get to see their own audit log, or is it internal only?

Enterprise customers ask for it as part of their own SOC 2 evidence collection. Build the export endpoint up front: it accepts a tenant id (resolved from the caller's JWT, not from a URL parameter), a date range, and an event-type filter, and it streams a signed CSV or JSONL. The per-tenant filter has to be enforced at the database layer — the same Postgres RLS or explicit WHERE that protects everything else — because the consequence of leaking another tenant's audit events into a customer's export is worse than the original cross-tenant data bleed.

How long does the audit log have to be retained?

HIPAA §164.316(b)(2)(i) sets a 6-year floor from creation or last-effective-date, whichever is later. SOC 2 does not set a fixed period in the criteria, but auditors will hold you to whatever your own retention policy states, and most teams document 7 years to cover the SOC 2 evidence window for a Type II report cycle plus a safety margin. If you operate in both frameworks, design for 7 years and you have covered both.

When should the audit log ship events to a SIEM vs stay in Postgres?

Keep the full append-only ledger in Postgres because that is the source of truth for compliance evidence. Forward a filtered subset — auth failures, admin actions, suspicious access patterns — to a SIEM for real-time alerting. The SIEM is where the security team lives; the Postgres table is where the auditor lives. Forwarding everything to a SIEM costs an order of magnitude more and gives the security team an alert volume they cannot triage.

What is the most common failure mode in a tenant audit log implementation?

Logging from the application after the side effect has already happened, in a separate transaction. The side effect succeeds, the audit insert fails, and the row that the auditor needs does not exist. The fix is to write the audit row in the same transaction as the side effect, and to make the side effect fail if the audit insert fails. The audit row is the receipt — if you cannot write the receipt, you cannot perform the action.

Ready to see if this is a fit?

A 15-minute call. No deck, no slides. We talk about what you're shipping and where engineering is the bottleneck. Either way, you walk away with a senior engineer's read on your situation.