
Production engineering pattern · Updated June 2026

SaaS Observability Without the Datadog Bill: A Seed-to-Series-A Stack

Q: Why not just buy Datadog and be done with it?

Right tool at the wrong stage. The bill outruns revenue pre-PMF.

Q: What do I actually need to see in production at seed stage?

Errors, P95/P99 latency, per-tenant volume, per-tenant cost. That's it.

Q: Can I really run this on free tiers?

Yes, for most seed shapes. Grafana Cloud free tier is the only one to watch.

Q: What is the structured logging pattern that makes this work?

JSON to stdout with tenant_id, request_id, latency, status. Queryable forever.

Q: When do I actually upgrade to Datadog?

Past Series A AND a platform engineer specifically asks. Both required.

Q: What about Honeycomb, New Relic, or Better Stack?

Same logic. Vendor-agnostic. Defer the bill until someone owns it.

Q: How do I track LLM cost per tenant without a dedicated tool?

One row per LLM call into Postgres, rolled up nightly per tenant.

Q: What if I want pre-built dashboards without standing up Grafana?

Sentry dashboards plus a tiny in-app admin page beats Grafana at the start.

How a funded seed or Series A SaaS gets the four things observability is actually for — errors, latency, per-tenant volume, per-tenant cost — without committing to a $50K+/yr vendor bill before the team can operate it.


The problem

The Datadog conversation happens at every funded startup around the same time. The CTO walks back from a vendor meeting with a quote, the team is excited because the demo dashboards look incredible, and a week later finance forwards the invoice with a question mark in the subject line.

Quote-to-invoice surprise

The number on the quote is rarely the number that lands. Once you add APM, logs, real-user monitoring, synthetic tests, and the per-host charges for every container in your cluster, a seed-stage startup with eight engineers and twenty paying customers can be quoted $40,000-$80,000 a year. That is one engineer-month of runway in tooling that, honestly, half the team will not log into after the first month.

Real production needs underneath the hype

The argument for buying it anyway is real. Production breaks. Customers complain. You want to know what your P99 latency looks like for the enterprise customer who just signed. You want to catch the exception before they email support. You want to see when your LLM costs spiked and know which tenant caused it. These are not luxuries — they are the difference between operating a SaaS and hoping a SaaS operates itself. The mistake is jumping straight to the most expensive answer for problems that have a cheaper, narrower solution at your stage.

The "do nothing" trap

The cheaper solution is not "do nothing." Teams that skip observability entirely at seed stage end up flying blind into their first real incident, then panic-buying Datadog mid-outage and paying setup tax on top of the bill. The pattern below is the middle path: assemble a real observability stack from tools you are probably already paying for, structure your logs so the data is queryable later, and reserve the Datadog conversation for the moment when you have an engineer whose actual job is to make it pay off. Until then, $0-200/month of glue covers the four things observability is actually for at seed stage.


What changes for your business

The four things you need to see in production at seed-to-Series-A scale, in priority order:

Four load-bearing signals

Errors with full context — every exception, with stack trace, request payload, user ID, tenant ID, and a link back to the request that caused it.

Per-endpoint latency P95/P99 — for every route, what does the 95th and 99th percentile response time look like over the last hour, day, week.

Per-tenant request volume — for every paying customer, how many requests per hour are they making, on which routes, with what error rate.

Per-tenant LLM and infra cost — if you have AI features or expensive compute, what does each tenant cost you per day, and is that less than what they pay you per day.

Nothing else is load-bearing until you have a specific incident that demanded it. Distributed tracing, custom heatmaps, synthetic uptime monitors, and real-user front-end performance are all valuable, but none of them is in the critical path at this stage. The trap is buying a tool that covers all of them at once when you only need the four signals above, and paying for breadth you do not use.

Stack components

The stack to assemble:

Sentry (free tier) handles item 1. The free Developer plan includes 5,000 errors per month, 5 million tracing spans, 50 session replays, and a 30-day lookback, all for a single user. That single-user cap is the most common forced-upgrade trigger — once your second engineer needs to triage exceptions, you are on the Team plan at $26/month. Still trivially worth it.
Netlify or Vercel function logs handle item 2. Both platforms capture stdout per invocation with timestamps, status codes, and durations. Netlify retains function logs for at least 24 hours on lower tiers and 7 days on Pro and above. Vercel's log retention is likewise plan-dependent, so check the current limits for your tier. Both are filterable in the dashboard and queryable for the windows that matter day-to-day.
Supabase logs (or your Postgres host's equivalent) handle item 2 at the database layer. Slow queries, lock contention, connection pool exhaustion — visible without a separate APM.
A metrics table in your own Postgres handles items 3 and 4. One row per request and one row per LLM call, with the keys you need to slice by tenant. This is the piece that pays the biggest dividend later because it is YOUR data in YOUR database, queryable forever.
Grafana Cloud free tier (optional) for dashboards on top of the metric tables. 10,000 active series, 50 GB of log ingest per month, 50 GB of trace ingest per month, 14 days of retention, up to three active users. Most startups stay inside that envelope until well past Series A.

Structured logging as connective tissue

The thread that holds it all together is structured logging. Every log line is a single JSON object with the same baseline keys. Write to stdout, let the platform capture it, and ingest into Postgres or Grafana Loki when you want long-term queryability.


What gets shipped

The structured logger is the foundation. Every log line in production carries a stable baseline so that six months from now, the question "what was P99 latency for tenant acme on route POST /api/v1/run last Tuesday" is a SQL query, not an archaeological dig.

// lib/logger.ts
type LogLevel = "debug" | "info" | "warn" | "error";

interface BaseLogFields {
  ts: string;
  level: LogLevel;
  tenant_id?: string;
  user_id?: string;
  request_id?: string;
  route?: string;
  method?: string;
  status?: number;
  latency_ms?: number;
  msg: string;
  // Anything else lives in a free-form bag, not at the top level.
  fields?: Record<string, unknown>;
  err?: { name: string; message: string; stack?: string };
}

function emit(line: BaseLogFields): void {
  // Single JSON object per line. Platforms parse this natively.
  process.stdout.write(JSON.stringify(line) + "\n");
}

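// Context carries request-scoped keys only. ts, level, msg, fields, and err are set
// per call, so excluding them here stops a stray context value from overriding them.
type LoggerContext = Partial<Omit<BaseLogFields, "ts" | "level" | "msg" | "fields" | "err">>;

export function makeLogger(ctx: LoggerContext = {}) {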
  return {
    info(msg: string, fields?: Record<string, unknown>) {
      emit({ ts: new Date().toISOString(), level: "info", msg, ...ctx, fields });
    },
    warn(msg: string, fields?: Record<string, unknown>) {
      emit({ ts: new Date().toISOString(), level: "warn", msg, ...ctx, fields });
    },
    error(msg: string, err: Error, fields?: Record<string, unknown>) {
      emit({
        ts: new Date().toISOString(),
        level: "error",
        msg,
        ...ctx,
        fields,
        err: { name: err.name, message: err.message, stack: err.stack },
      });
    },
  };
}

The handler wrapper attaches per-request context once and threads it through. Every line a handler emits from that point on carries the request ID, tenant ID, route, and method without the handler having to remember.

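// lib/withObservability.ts
import * as Sentry from "@sentry/node";
import { makeLogger } from "./logger";
import { recordRequestMetric } from "./metrics";
// Assumed helpers (not shown here): read tenant and user IDs from the JWT or session.
import { extractTenantId, extractUserId } from "./auth";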

export function withObservability<T extends (req: Request) => Promise<Response>>(
  routeName: string,
  handler: T,
): T {
  return (async (req: Request) => {
    const startedAt = performance.now();
    const requestId = crypto.randomUUID();
    const tenantId = extractTenantId(req);   // pulled from JWT or session
    const userId = extractUserId(req);

    const log = makeLogger({
      request_id: requestId,
      tenant_id: tenantId,
      user_id: userId,
      route: routeName,
      method: req.method,
    });

    Sentry.getCurrentScope().setTags({ tenant_id: tenantId, route: routeName });
    Sentry.getCurrentScope().setUser({ id: userId });

    // Async, intentionally not awaited: the metric write must not slow the response path.
    // The .catch keeps a failed insert from surfacing as an unhandled rejection;
    // it logs a warning and moves on.
    const record = (status: number, latency: number): void => {
      void recordRequestMetric({
        tenant_id: tenantId,
        route: routeName,
        method: req.method,
        status,
        latency_ms: latency,
        occurred_at: new Date(),
      }).catch((e: unknown) => log.warn("metric write failed", { reason: String(e) }));
    };

    try {
      const res = await handler(req);
      const latency = Math.round(performance.now() - startedAt);
      log.info("request handled", { status: res.status, latency_ms: latency });
      record(res.status, latency);
      return res;
    } catch (err) {
      const latency = Math.round(performance.now() - startedAt);
      // Non-Error throws (strings, plain objects) are normalized before logging.
      const error = err instanceof Error ? err : new Error(String(err));
      Sentry.captureException(err);
      log.error("request failed", error, { latency_ms: latency });
      record(500, latency);
      throw err;
    }
  }) as T;
}

The metric tables are where the per-tenant story lives. Two tables: one for HTTP requests, one for LLM calls. Both append-only, both indexed for the queries you will actually run.

CREATE TABLE request_metrics (
  id            bigserial PRIMARY KEY,
  tenant_id     text NOT NULL,
  route         text NOT NULL,
  method        text NOT NULL,
  status        int  NOT NULL,
  latency_ms    int  NOT NULL,
  occurred_at   timestamptz NOT NULL DEFAULT now()
);

-- Composite index for the per-tenant, per-route window queries.
-- A partial index on a sliding window is not an option here: index predicates
-- may only use IMMUTABLE functions, and now() is not one of them.
CREATE INDEX request_metrics_recent
  ON request_metrics (tenant_id, route, occurred_at DESC);

CREATE TABLE llm_metrics (
  id                bigserial PRIMARY KEY,
  tenant_id         text NOT NULL,
  model             text NOT NULL,
  prompt_tokens     int  NOT NULL,
  completion_tokens int  NOT NULL,
  latency_ms        int  NOT NULL,
  cost_usd          numeric(12,6) NOT NULL,
  occurred_at       timestamptz NOT NULL DEFAULT now()
);

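-- date_trunc on a timestamptz depends on the session time zone, so it cannot be
-- indexed directly. Pinning the zone to UTC makes the expression immutable.
CREATE INDEX llm_metrics_tenant_day
  ON llm_metrics (tenant_id, (date_trunc('day', occurred_at AT TIME ZONE 'UTC')));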

-- Nightly rollup for finance queries.
CREATE TABLE tenant_usage_daily (
  tenant_id      text NOT NULL,
  day            date NOT NULL,
  request_count  bigint NOT NULL,
  error_count    bigint NOT NULL,
  p95_latency_ms int  NOT NULL,
  p99_latency_ms int  NOT NULL,
  llm_cost_usd   numeric(12,2) NOT NULL,
  PRIMARY KEY (tenant_id, day)
);

The rollup runs once a day, computes the percentiles using Postgres's percentile_disc aggregate, and writes one row per tenant per day. After that, every dashboard query hits tenant_usage_daily instead of scanning millions of raw rows. Finance gets a question like "which tenants cost us more than they pay" and the answer is a join, not a project.
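One way the nightly job could look is sketched below. It is not the production implementation: the `Exec` function type stands in for whatever Postgres client you use, treating status 500 and above as an error is an assumption, and the day boundaries assume the database session runs in UTC. Tenants with LLM calls but no HTTP requests on a given day are not covered by this simplified version.

```typescript
// Hypothetical query runner; adapt to your Postgres client.
type Exec = (sql: string, params: unknown[]) => Promise<void>;

// $1 is the day to roll up, formatted YYYY-MM-DD.
const ROLLUP_SQL = `
INSERT INTO tenant_usage_daily
  (tenant_id, day, request_count, error_count,
   p95_latency_ms, p99_latency_ms, llm_cost_usd)
SELECT
  r.tenant_id,
  $1::date,
  count(*),
  count(*) FILTER (WHERE r.status >= 500),  -- assumption: 5xx counts as an error
  percentile_disc(0.95) WITHIN GROUP (ORDER BY r.latency_ms),
  percentile_disc(0.99) WITHIN GROUP (ORDER BY r.latency_ms),
  COALESCE((
    SELECT sum(l.cost_usd) FROM llm_metrics l
    WHERE l.tenant_id = r.tenant_id
      AND l.occurred_at >= $1::date
      AND l.occurred_at <  $1::date + 1
  ), 0)
FROM request_metrics r
WHERE r.occurred_at >= $1::date
  AND r.occurred_at <  $1::date + 1
GROUP BY r.tenant_id
ON CONFLICT (tenant_id, day) DO UPDATE SET
  request_count  = EXCLUDED.request_count,
  error_count    = EXCLUDED.error_count,
  p95_latency_ms = EXCLUDED.p95_latency_ms,
  p99_latency_ms = EXCLUDED.p99_latency_ms,
  llm_cost_usd   = EXCLUDED.llm_cost_usd;
`;

// The calendar day before `now`, in UTC, as YYYY-MM-DD.
function previousUtcDay(now: Date): string {
  const d = new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate() - 1));
  return d.toISOString().slice(0, 10);
}

// Rolls up yesterday's raw rows. The upsert makes a re-run for the same day safe.
async function runNightlyRollup(exec: Exec, now: Date = new Date()): Promise<string> {
  const day = previousUtcDay(now);
  await exec(ROLLUP_SQL, [day]);
  return day;
}
```

Because the statement is an upsert keyed on (tenant_id, day), a failed or partial run can simply be repeated.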

The Sentry integration is the smallest piece of work and the biggest single quality-of-life gain. Wire it once in your bootstrap, set tracesSampleRate to something sane (0.1 at first, dial up if you need more visibility), tag exceptions with tenant_id and route from the handler wrapper, and let the free tier carry you for several months. The 5,000-errors-per-month cap is enough for a healthy app — if you are blowing through it, the problem is not the cap, it is the error volume.

Common failure modes

The first failure mode is logging strings instead of structured objects. Someone writes console.log("user " + userId + " hit error: " + err.message) and the platform captures it as a single text blob. Six months later when you want to filter by user_id, you are doing regex on a log archive instead of WHERE user_id = $1 on a table. The fix is a lint rule that forbids console.log in handler code and forces everything through the structured logger. Every log line is JSON, or it does not ship.
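A lint rule of that kind might be as small as the sketch below, written as an ESLint flat-config entry. `no-console` is a core ESLint rule; the file globs are assumptions about project layout and would need to match your own handler directories.

```typescript
// Sketch of an ESLint flat-config array. In a real project this array would be
// the default export of eslint.config.js.
const eslintConfig = [
  {
    // Assumed locations of handler and library code; adjust to your repo.
    files: ["netlify/functions/**/*.ts", "api/**/*.ts", "lib/**/*.ts"],
    rules: {
      // Fails the lint step on any console.log / console.error / console.warn call.
      "no-console": "error",
    },
  },
];
```

The structured logger writes through `process.stdout.write` rather than `console`, so it passes this rule without an exemption.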

The second is writing metrics synchronously in the response path. The recordRequestMetric call in the wrapper above is intentionally void-prefixed and not awaited. If the metrics insert takes 30ms, you have added 30ms to every response in production. The metric write goes async, into a queue or a fire-and-forget promise, and if it fails it logs a warning and moves on. Observability that slows down the thing it observes is worse than no observability.
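The queue variant mentioned above could look roughly like this. It is a sketch under stated assumptions: `insertBatch` and `warn` are injected placeholders (a batched Postgres insert and the structured logger's warn method), and the batch size of 50 is arbitrary.

```typescript
interface RequestMetric {
  tenant_id: string;
  route: string;
  method: string;
  status: number;
  latency_ms: number;
  occurred_at: Date;
}

// Hypothetical dependencies, injected so the recorder stays testable.
type InsertBatch = (rows: RequestMetric[]) => Promise<void>;
type Warn = (msg: string, fields?: Record<string, unknown>) => void;

function makeMetricRecorder(insertBatch: InsertBatch, warn: Warn, maxBatch = 50) {
  let buffer: RequestMetric[] = [];

  // Writes everything buffered so far. A failed write is logged and dropped.
  async function flush(): Promise<void> {
    if (buffer.length === 0) return;
    const rows = buffer;
    buffer = [];
    try {
      await insertBatch(rows);
    } catch (e) {
      // Losing a batch of metrics is acceptable; failing a request is not.
      warn("metric write failed", { dropped: rows.length, reason: String(e) });
    }
  }

  // Never rejects, so callers can safely write `void record(row)`.
  async function record(row: RequestMetric): Promise<void> {
    buffer.push(row);
    if (buffer.length >= maxBatch) await flush();
  }

  return { record, flush };
}
```

On serverless platforms the process may be frozen once the response has been sent, so a buffered design like this would likely need an explicit `flush()` at the end of the invocation or a platform hook for background work. That trade-off is outside this sketch.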

The third is the Netlify Functions 4 KB log truncation trap. Lambda compatibility mode on Netlify caps log output at 4 KB per invocation — anything beyond that is silently dropped, with only the last 4 KB retained. If your handlers log verbosely (full request bodies, full response payloads, nested debug objects), you lose data exactly when you need it most — during long, complex requests that are more likely to fail. The fix is to log lean by default and gate verbose logging behind a per-tenant or per-request debug flag.
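A lean-by-default serializer could be sketched as follows. The 1 KB per-line budget is an assumption chosen to leave room for several lines inside a 4 KB cap, and `serializeLean` is a hypothetical helper, not part of the logger shown earlier.

```typescript
// Assumed per-line budget; tune it to how many lines a typical invocation emits.
const MAX_LINE_BYTES = 1024;

// UTF-8 size of a string, since platform caps are in bytes rather than characters.
function byteLength(s: string): number {
  return new TextEncoder().encode(s).length;
}

// Serializes a log line. Outside debug mode, an oversized line keeps its baseline
// keys and swaps the free-form bag for a small truncation marker.
function serializeLean(
  line: { msg: string; fields?: Record<string, unknown>; [key: string]: unknown },
  opts: { debug: boolean; maxBytes?: number },
): string {
  const maxBytes = opts.maxBytes ?? MAX_LINE_BYTES;
  const full = JSON.stringify(line);
  if (opts.debug || byteLength(full) <= maxBytes) return full;

  const { fields: _dropped, ...baseline } = line;
  return JSON.stringify({
    ...baseline,
    fields: { truncated: true, original_bytes: byteLength(full) },
  });
}
```

The `debug` flag would come from a per-tenant or per-request setting, so full payloads are only written when someone has asked for them.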

The fourth is forgetting the single-user cap on Sentry free. Day one the founding engineer wires up Sentry under their email. Two months later the second engineer joins, cannot triage incidents because they have no account, and the team upgrades reactively in the middle of an outage. Plan for the $26/month Team plan as a Day-2 cost. It is a rounding error against the alternative of dropping issues on the floor because only one person can see them.

The fifth is letting the metric tables grow without a retention policy. request_metrics at 100 requests per second is 8.6 million rows per day. Without partitioning or a regular archive job, the table grows until query performance falls off a cliff. The fix is to partition by month (Postgres declarative partitioning works fine), keep 90 days of raw rows hot, and archive older partitions to cold storage if you need them. The tenant_usage_daily rollup is what carries the long-term reporting answers anyway.
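The monthly partition housekeeping can be sketched as two small helpers. This assumes `request_metrics` has been recreated with `PARTITION BY RANGE (occurred_at)`, which the schema shown earlier does not do, and that its primary key includes `occurred_at`, since Postgres requires unique constraints on a partitioned table to include the partition key. The partition naming scheme is an assumption as well.

```typescript
const pad2 = (n: number): string => String(n).padStart(2, "0");

// DDL for one monthly partition, e.g. request_metrics_2026_06 for June 2026.
function monthlyPartitionDdl(year: number, month: number): string {
  const nextYear = month === 12 ? year + 1 : year;
  const nextMonth = month === 12 ? 1 : month + 1;
  const name = `request_metrics_${year}_${pad2(month)}`;
  return (
    `CREATE TABLE IF NOT EXISTS ${name} PARTITION OF request_metrics ` +
    `FOR VALUES FROM ('${year}-${pad2(month)}-01') TO ('${nextYear}-${pad2(nextMonth)}-01');`
  );
}

// Partitions that start before the retention window: candidates to archive and drop.
// keepMonths = 3 approximates the 90 days of hot raw rows.
function expiredPartitions(existing: string[], now: Date, keepMonths = 3): string[] {
  const cutoff = new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth() - keepMonths, 1));
  const cutoffName = `request_metrics_${cutoff.getUTCFullYear()}_${pad2(cutoff.getUTCMonth() + 1)}`;
  // Zero-padded names sort chronologically, so a string comparison is enough.
  return existing.filter((name) => name < cutoffName).sort();
}
```

A scheduled job would create next month's partition ahead of time and hand the expired names to whatever archive step you use before dropping them.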

The sixth is the tenant_id-on-every-row discipline failing silently. Engineers write a new endpoint, copy the boilerplate from an old handler, and forget to populate tenant_id in the log context because the new endpoint is unauthenticated (a public marketing page, a health check, a webhook). The logs all come through with tenant_id missing and the per-tenant queries quietly miss those events. Because request_metrics.tenant_id is NOT NULL, the metric insert for those requests fails as well, and a fire-and-forget write hides the failure. The fix is a typed LogContext that distinguishes "no tenant for this route" (explicit) from "forgot to set tenant" (implicit) and a CI check that fails the build if the latter appears.
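One possible shape for that typed context is a discriminated union, sketched below. The `kind` discriminator, the `reason` values, and the `tenant_scope` field are illustrative names, not part of the logger shown earlier.

```typescript
// A route either belongs to a tenant or is explicitly declared public.
type TenantContext =
  | { kind: "tenant"; tenant_id: string }
  | { kind: "public"; reason: "health_check" | "webhook" | "marketing" };

interface LogContext {
  request_id: string;
  route: string;
  // Required property: leaving it out is a compile error, so "forgot to set
  // tenant" cannot reach production unnoticed.
  tenant: TenantContext;
}

// Flattens the context into the keys that end up on each log line.
function tenantFields(ctx: LogContext): { tenant_id: string | null; tenant_scope: string } {
  return ctx.tenant.kind === "tenant"
    ? { tenant_id: ctx.tenant.tenant_id, tenant_scope: "tenant" }
    : { tenant_id: null, tenant_scope: `public:${ctx.tenant.reason}` };
}
```

With this in place the CI check can be as simple as running the TypeScript compiler with no emit: a handler that builds a `LogContext` without a `tenant` value does not type-check.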

What this looks like in production

At BFEAI we run exactly this stack. Sentry handles the exception side at the paid Team tier (the free tier got us through the first few months). Netlify Functions logs cover request-level latency and status. A metrics schema in our Supabase Postgres carries request_metrics, llm_metrics, and a tenant_usage_daily rollup that powers the in-app admin dashboard. There is no Grafana, no Datadog, no Honeycomb. The total monthly observability bill is well under what most teams quote for a single Datadog seat.

The questions we answer from this stack in production: "what is P95 latency for tenant <X> on route <Y> over the last 24 hours" is a query against request_metrics filtered by tenant and route, computed with percentile_cont. "Which tenants ran more than $50 of LLM cost yesterday" is a query against llm_metrics grouped by tenant with a SUM(cost_usd) filter. "What error is spiking right now" is a Sentry dashboard, refreshed automatically. "Which endpoints regressed in latency this week" is a per-route P95 over request_metrics for this week compared against the same query for last week, because the tenant_usage_daily rollup is keyed by tenant and day and carries no route column. None of these required a third-party observability vendor.
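The first two questions could be expressed as parameterized queries along these lines. This is a sketch, not the queries BFEAI runs: the function names are hypothetical, the `{ text, values }` shape follows the query-config convention used by node-postgres, and "yesterday" is evaluated in the database session's time zone.

```typescript
interface Query {
  text: string;
  values: unknown[];
}

// P95 latency for one tenant on one route over the last 24 hours.
function p95ForTenantRoute(tenantId: string, route: string): Query {
  return {
    text: `
      SELECT percentile_cont(0.95) WITHIN GROUP (ORDER BY latency_ms) AS p95_ms
      FROM request_metrics
      WHERE tenant_id = $1
        AND route = $2
        AND occurred_at > now() - interval '24 hours'`,
    values: [tenantId, route],
  };
}

// Tenants whose LLM spend yesterday exceeded the given threshold, highest first.
function tenantsOverLlmBudgetYesterday(thresholdUsd: number): Query {
  return {
    text: `
      SELECT tenant_id, sum(cost_usd) AS total_cost_usd
      FROM llm_metrics
      WHERE occurred_at >= date_trunc('day', now()) - interval '1 day'
        AND occurred_at <  date_trunc('day', now())
      GROUP BY tenant_id
      HAVING sum(cost_usd) > $1
      ORDER BY total_cost_usd DESC`,
    values: [thresholdUsd],
  };
}
```

Keeping tenant, route, and threshold as bound parameters rather than interpolated strings means the same helpers are safe to call from an admin page.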

The cost-per-tenant report is the one that pays for the whole exercise. Once a month, the rollup table gets joined against the billing table and produces a one-pager: tenant, MRR, infra cost, LLM cost, gross margin per tenant. Customers who are over-using the free tier or whose enterprise contract priced the LLM cost wrong show up immediately. The first time we ran it, two tenants accounted for 70% of LLM spend on $0 of revenue — both were trial accounts that someone had forgotten to expire. That single report paid for the metric tables for the next two years.
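The arithmetic behind that one-pager is simple enough to sketch in memory. The row shapes below are assumptions: `infra_cost_usd` is not a column in the rollup table shown earlier, and `BillingRow` stands in for whatever your billing table exposes.

```typescript
// Hypothetical input shapes for the monthly report.
interface UsageRow { tenant_id: string; llm_cost_usd: number; infra_cost_usd: number }
interface BillingRow { tenant_id: string; mrr_usd: number }

interface MarginRow {
  tenant_id: string;
  mrr_usd: number;
  infra_cost_usd: number;
  llm_cost_usd: number;
  gross_margin_usd: number;
}

// Sums a month of daily usage per tenant and subtracts it from MRR.
// Rows come back sorted with the least profitable tenant first.
function monthlyMarginReport(usage: UsageRow[], billing: BillingRow[]): MarginRow[] {
  const mrrByTenant = new Map<string, number>();
  for (const b of billing) mrrByTenant.set(b.tenant_id, b.mrr_usd);

  const totals = new Map<string, { llm: number; infra: number }>();
  for (const u of usage) {
    const t = totals.get(u.tenant_id) ?? { llm: 0, infra: 0 };
    t.llm += u.llm_cost_usd;
    t.infra += u.infra_cost_usd;
    totals.set(u.tenant_id, t);
  }

  return Array.from(totals)
    .map(([tenant_id, t]) => {
      // No billing row means $0 revenue, e.g. a trial that was never expired.
      const mrr_usd = mrrByTenant.get(tenant_id) ?? 0;
      return {
        tenant_id,
        mrr_usd,
        infra_cost_usd: t.infra,
        llm_cost_usd: t.llm,
        gross_margin_usd: mrr_usd - t.llm - t.infra,
      };
    })
    .sort((a, b) => a.gross_margin_usd - b.gross_margin_usd);
}
```

Sorting by margin puts the tenants that cost more than they pay at the top of the page, which is where forgotten trial accounts tend to surface.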

The Sentry-to-debugging loop is the second highest-value piece. An exception fires, Sentry tags it with tenant_id and route, the on-call engineer clicks through, sees the request payload, sees the user, sees the stack, and reproduces the bug in under five minutes. Compare against the "grep through CloudWatch / Netlify logs by timestamp and hope you find it" workflow, which is what teams without Sentry actually do. That delta — minutes versus an afternoon — is what Sentry buys you, and you get it at $0 for the first user and $26/month after that.

The thing that does NOT live in this stack: distributed tracing across many services. We have a small handful of services and a clear call graph, so tracing has not been worth the operational cost yet. If we grew to a dozen services with complex async paths, Honeycomb or a self-hosted OpenTelemetry collector would land on the roadmap. That is the inflection point — service count, not customer count — where the math on a dedicated tracing vendor starts to work.

When to actually upgrade to Datadog

The honest answer is two conditions, both required. First: past Series A, with revenue that can absorb a five-figure-monthly observability bill without it being a board topic. Second: a platform engineer or SRE on the team who specifically asks for it, with a written case for which problems it solves that the current stack does not. Either alone is insufficient.

Pre-Series A, the math does not work. The bill grows with engineers and infrastructure faster than revenue does in that phase, and the dashboards sit unused because nobody on the team has the spare cycles to build them. You pay for breadth you do not use, then you cancel after twelve months and feel bad about it.

Past Series A without a platform engineer asking, you are buying a tool you will not operate. Datadog (and Honeycomb, and New Relic) earn their keep when there is a human whose job is to make them pay off — building the dashboards, tuning the alerts, integrating with the rest of the on-call workflow. Without that human, the tool stays at the demo-quality config it shipped with, and the team continues to look at Sentry and Postgres for the answers they actually need. The bill just gets larger.

When both conditions are true, the upgrade is the right call. Distributed tracing across many services, deep APM with code-level profiling, multi-account log aggregation with sophisticated alerting — these are real problems Datadog solves well, and at the right scale they pay back. The pattern in this page is what gets you to that conversation with a clear head, on your own timeline, with the cost-per-tenant table that tells you whether you can afford it.

What to watch in your own implementation

Open your codebase and search for every place that calls console.log, console.error, or any unstructured logging primitive. Each one is a future query you cannot run. Pick the busiest handler first and route it through a structured logger with the baseline fields. Spread from there.

Then check whether tenant_id is on every log line that a handler emits. Add a typed wrapper around the request handler that requires you to declare a tenant context (or explicitly opt out for public endpoints), and let TypeScript fail the build if you forget. The cost of finding a missing tenant_id three months later is higher than the cost of the wrapper.

Pull up your Sentry account and check three things: who has access, what your error volume looks like against the 5,000-per-month free tier, and whether tenant_id is set as a tag on exceptions. If any of those needs fixing, fix it this week. Sentry is the single highest-leverage piece of the stack and it deserves an hour of your attention.

Finally, ask whether you have a metric table that can answer "what does each tenant cost us per day." If the answer is no, build one. Start with llm_metrics if you have AI features (the cost variance per tenant is highest there), or request_metrics if your infra cost is mostly compute. One table, one nightly rollup, one query. That report is the one that tells you whether your business model works at the customer-by-customer level, and it is the report that pre-empts the Datadog conversation by giving leadership a clearer picture than any vendor dashboard would.
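If you start with llm_metrics, the row can be built at the call site from the token counts the provider returns. The sketch below is illustrative only: the model names and per-million-token prices are placeholders, not real rates, and `buildLlmMetricRow` is a hypothetical helper.

```typescript
// Placeholder prices in USD per million tokens. Replace with your provider's rates.
const PRICE_PER_MTOK_USD: Record<string, { prompt: number; completion: number }> = {
  "example-small": { prompt: 0.5, completion: 1.5 },
  "example-large": { prompt: 5, completion: 15 },
};

interface LlmMetricRow {
  tenant_id: string;
  model: string;
  prompt_tokens: number;
  completion_tokens: number;
  latency_ms: number;
  cost_usd: string; // decimal string with six places, matching numeric(12,6)
  occurred_at: Date;
}

// Builds one llm_metrics row. Throws on an unknown model so that a missing price
// is noticed instead of silently recording a zero cost.
function buildLlmMetricRow(
  input: Omit<LlmMetricRow, "cost_usd" | "occurred_at">,
): LlmMetricRow {
  const price = PRICE_PER_MTOK_USD[input.model];
  if (!price) throw new Error(`no price configured for model ${input.model}`);
  const cost =
    (input.prompt_tokens * price.prompt + input.completion_tokens * price.completion) / 1e6;
  return { ...input, cost_usd: cost.toFixed(6), occurred_at: new Date() };
}
```

Computing the cost at write time freezes the price that applied when the call was made, so later price changes do not rewrite history in the rollup.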

Outcomes you should expect

What this delivers

Errors caught with full request context, user identity, and stack trace in the same week you launch, for $0/month until you exceed 5,000 errors per month.
Per-endpoint P95/P99 latency and per-tenant request volume queryable from your own Postgres without a separate metrics vendor.
Per-tenant LLM cost tracked at the row level so finance can answer 'which customer is unprofitable' without a spreadsheet export.
A clear, dollar-anchored upgrade trigger so the team stops debating Datadog every quarter and ships features instead.

Primary sources

By the numbers

Sentry's free Developer plan includes 5,000 errors, 5 million tracing spans, 50 session replays, and a 30-day data lookback for a single user.
Netlify retains function logs for at least 24 hours on lower-tier plans and extends retention to 7 days on Pro and higher plans, with filtering by request ID, log level, and date range available in the dashboard.
Grafana Cloud's free tier includes 10,000 active metric series, 50 GB of log ingest per month, 50 GB of trace ingest per month, and 14 days of retention across metrics, logs, and traces, with up to 3 active users.
Netlify Function logs have a 4 KB per-invocation cap in Lambda compatibility mode — output beyond that is truncated to the last 4 KB, so verbose logging silently loses data.
Sentry's free tier is capped at one user; team accounts require upgrading to a paid plan, which is the most common forced-upgrade trigger for funded startups.

Live in production today

The same engineering, shipped in production at BFEAI.

I'm co-founder & CTO of Be Found Everywhere (BFEAI), a 7-app AI SaaS platform running today. The work I deliver for clients is the work I do every week on my own platform.

7 production apps
200K+ keywords generated
1,500+ AI scans run
7,000+ sites automated

Common questions

What buyers ask before reaching out

Why not just buy Datadog and be done with it?

You can, and at the right stage it is the correct answer. The wrong stage is pre-PMF or early Series A, when the bill scales with engineers and infrastructure faster than revenue does. A funded seed startup with 8 engineers and 20 customers can easily quote $40-80K/year on Datadog once you include APM, logs, real-user monitoring, and synthetic tests. That is one engineer-month of runway for a tool that solves a problem you can solve with $0-200/month of glue.

What do I actually need to see in production at seed stage?

Four things, in priority order. Errors with full context (Sentry). Per-endpoint latency P95/P99 (request logs from Netlify or Vercel). Per-tenant request volume (your own structured logs into Postgres). Per-tenant cost for LLM and infra (a small metrics table you write to from your handlers). Everything else — distributed tracing, custom dashboards, synthetic monitors — is nice-to-have until you have a specific incident that demanded it.

Can I really run this on free tiers?

For most seed-to-early-Series-A SaaS shapes, yes. Sentry free covers 5K errors/month, which is enough if you are not leaking exceptions on every request. Netlify and Vercel both include function logs on lower tiers. Supabase includes Postgres and DB logs. The only piece that often crosses a paid tier is Grafana Cloud if you want pre-built dashboards, and that free tier is generous enough (10K active series, 50 GB logs) that most startups stay inside it.

What is the structured logging pattern that makes this work?

Every log line is a JSON object with the same baseline keys: tenant_id, user_id, request_id, route, latency_ms, status, and a free-form fields object. Write to stdout. Whatever platform you run on (Netlify, Vercel, Fly, Render) captures stdout and lets you query it. Six months later when you want to ask 'what was P99 for tenant X on route Y last week,' the query is trivial because the data was sitting there the whole time.

When do I actually upgrade to Datadog?

Two conditions, both required. Past Series A AND your platform engineer asks for it by name with a written case. Either alone is not enough. Pre-Series A, the bill is wrong. Post-Series A without a platform engineer asking, you are buying a tool you will not operate. When both are true, Datadog (or Honeycomb, or New Relic) earns its keep because there is someone whose job is to make it pay off.

What about Honeycomb, New Relic, or Better Stack?

Same calculus. Honeycomb is a stronger fit than Datadog for distributed tracing on high-cardinality workloads, but it is still a vendor bill you do not need pre-PMF. Better Stack, which now includes the former Better Uptime product, is a reasonable lightweight alternative at the tier above free Sentry. The pattern is vendor-agnostic: replace 'Datadog' in any sentence here with the observability vendor you are evaluating and the logic still holds.

How do I track LLM cost per tenant without a dedicated tool?

A Postgres table with one row per LLM call: tenant_id, model, prompt_tokens, completion_tokens, latency_ms, cost_usd, occurred_at. Write the row inside the same handler that makes the LLM call. A scheduled job rolls it up daily into a tenant_usage_daily table for fast queries. Finance gets one query that answers 'which tenants cost more than they pay.' This is the table that pays back the structured logging discipline most directly.

What if I want pre-built dashboards without standing up Grafana?

Sentry's free tier includes 10 custom dashboards and covers errors + tracing. For per-tenant business metrics, write a small admin page that queries your own metric tables directly — three or four charts is enough at this stage, and you control the questions they answer. Grafana Cloud free tier becomes worth setting up once you have more than a handful of charts or want alerting on metric thresholds.

Ready to see if this is a fit?

A 15-minute call. No deck, no slides. We talk about what you're shipping and where engineering is the bottleneck. Either way, you walk away with a senior engineer's read on your situation.