The billing system that works at $2,000 MRR breaks at $10,000 MRR. Not because of load — most early-stage billing systems can handle the query volume. It breaks because the edge cases that were rare enough to be invisible at small scale become common enough to matter at larger scale. A bug that affects 0.2% of invoices generates one customer complaint per 500 invoices at low volume and becomes a recurring support escalation the moment you have 50 paying customers.
What follows is a dissection of the most common invoice accuracy failure patterns in home-built billing systems, with the architectural patterns that prevent them.
The Proration Trap
Proration is the calculation of a partial-period charge when a customer changes plans mid-billing-cycle. It sounds straightforward: if a customer is on a $100/month plan and upgrades to a $200/month plan at the end of day 15 of a 30-day month, they owe $50 for the first 15 days at the old rate (15/30 × $100) and $100 for the remaining 15 days at the new rate (15/30 × $200), a total of $150 for the month. Simple enough.
The bugs emerge from the edge cases. What's the correct denominator for proration? Calendar days in the month (28, 29, 30, or 31)? A fixed 30-day month? The billing period length in seconds? Each choice produces a different number, and the customer's expectation of "fair" proration may not match your calculation method. If your terms say "daily proration" but your code calculates based on calendar hours at UTC midnight, a customer in UTC+9 whose billing period technically started at 9 AM local time may see a proration that doesn't match their intuition.
Consider a growing data infrastructure company that upgraded from a $299/month plan to a $599/month plan on the 20th of a 31-day month, with the change taking effect at the end of that day: 20 days on the old plan, 11 on the new one. Their home-built system calculated proration using a fixed 30-day denominator (a common implementation shortcut), which left only 10 days for the new plan: (20/30) × $299 + (10/30) × $599 = $199.33 + $199.67 = $399.00. Precise calendar-day proration gives (20/31) × $299 + (11/31) × $599 = $192.90 + $212.55 = $405.45. A $6.45 discrepancy: small enough to look like a rounding error, large enough to generate a support ticket from a finance team that audits invoices line by line.
Multiply this across dozens of mid-cycle plan changes per month at $10k MRR and you have a systemic invoice accuracy problem, not a one-off bug.
The correct approach: define a canonical proration method in your billing terms (calendar days is the standard), implement it exactly, and surface the calculation detail on the invoice so customers can verify it themselves. "$299.00/month plan, prorated from April 21 to April 30 (10 of 30 days): $99.67" is an invoice line item a finance team can audit.
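As a minimal sketch of calendar-day proration with an auditable line item, the following Python uses exact decimal arithmetic. The function names, the inclusive day-counting convention, and the round-half-up rule are assumptions made for illustration; your billing terms decide the real ones.

```python
from calendar import monthrange
from datetime import date
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def prorate(monthly_price: Decimal, days_used: int, days_in_period: int) -> Decimal:
    """Charge for days_used out of days_in_period, rounded half-up to the cent."""
    if not 0 <= days_used <= days_in_period:
        raise ValueError("days_used must be between 0 and days_in_period")
    amount = monthly_price * days_used / days_in_period
    return amount.quantize(CENT, rounding=ROUND_HALF_UP)


def proration_line_item(monthly_price: Decimal, first_day: date, last_day: date) -> str:
    """Build an invoice line for an inclusive date range within one calendar month."""
    same_month = (first_day.year, first_day.month) == (last_day.year, last_day.month)
    if not same_month or last_day < first_day:
        raise ValueError("range must be non-empty and within one calendar month")
    # Denominator is the actual number of days in the month (28 to 31).
    days_in_month = monthrange(first_day.year, first_day.month)[1]
    days_used = (last_day - first_day).days + 1  # both endpoints are billed
    amount = prorate(monthly_price, days_used, days_in_month)
    return (
        f"Prorated from {first_day:%B} {first_day.day} to "
        f"{last_day:%B} {last_day.day} "
        f"({days_used} of {days_in_month} days): ${amount}"
    )


line = proration_line_item(Decimal("299"), date(2025, 4, 21), date(2025, 4, 30))
print(line)  # Prorated from April 21 to April 30 (10 of 30 days): $99.67
```

Because the day count and the denominator both appear in the line item, a customer can recompute the amount by hand. Swapping the denominator changes the result: the same 10 days against a 31-day month come to $96.45 rather than $99.67.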
Timezone Aggregation Bugs
Usage billing requires aggregating events across a time window — typically a calendar month. The question your system must answer precisely: "which billing period does this event belong to?" The answer depends on what timezone you use to define the period boundary.
If your billing period covers April 2025 (2025-04-01 through 2025-04-30) and you define boundaries in UTC, an event timestamped at 2025-04-30 23:45:00 UTC belongs in April. A customer in UTC-5 (US Central Daylight Time) sees that event as occurring at 6:45 PM on April 30, still in April from their perspective. That's fine. Where it breaks: an event at 2025-04-30 23:45:00 US Central time (which is 2025-05-01 04:45:00 UTC) belongs in May if you're using UTC billing periods, but the customer experiences it as an April event. This is the timezone attribution problem: your billing period boundary and the customer's local time don't agree on which month an event falls in.
For most B2B SaaS products where customers are distributed globally, the pragmatic resolution is to define all billing periods in UTC and document this clearly. The problem is when the system is inconsistent: some events stored with local timestamps, some with UTC, and aggregation code that doesn't enforce timezone normalization before summing. This is a data quality issue that presents as a billing accuracy issue.
The test that catches this before production: generate events with timestamps within 15 minutes of a billing period boundary (both before and after, in multiple timezones) and verify they land in the correct billing period. This is a targeted regression test that most home-built billing systems don't have.
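A boundary test of this kind might look like the sketch below. It assumes UTC calendar-month billing periods; `billing_period_start` and `check_period_boundary` are hypothetical names, and fixed UTC offsets stand in for real timezones to keep the example dependency-free.

```python
from datetime import datetime, timedelta, timezone


def billing_period_start(event_ts: datetime) -> datetime:
    """Return the UTC start of the calendar month that contains event_ts."""
    if event_ts.tzinfo is None or event_ts.utcoffset() is None:
        # A naive timestamp has no defined instant; guessing would hide the bug.
        raise ValueError("naive timestamp: refusing to guess its timezone")
    utc_ts = event_ts.astimezone(timezone.utc)
    return utc_ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)


APRIL_1 = datetime(2025, 4, 1, tzinfo=timezone.utc)
MAY_1 = datetime(2025, 5, 1, tzinfo=timezone.utc)


def check_period_boundary() -> None:
    """Events 15 minutes either side of the May 1 UTC boundary, in several offsets."""
    for offset_hours in (-5, 0, 9):
        tz = timezone(timedelta(hours=offset_hours))
        just_before = (MAY_1 - timedelta(minutes=15)).astimezone(tz)
        just_after = (MAY_1 + timedelta(minutes=15)).astimezone(tz)
        assert billing_period_start(just_before) == APRIL_1
        assert billing_period_start(just_after) == MAY_1


check_period_boundary()

# 23:45 on April 30 at UTC-5 is 04:45 UTC on May 1, so it bills in May.
local_event = datetime(2025, 4, 30, 23, 45, tzinfo=timezone(timedelta(hours=-5)))
assert billing_period_start(local_event) == MAY_1
```

Rejecting naive timestamps outright is a deliberate choice in this sketch: it turns mixed local and UTC storage into a loud failure at aggregation time instead of a quiet misattribution.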
Exactly-Once Semantics in Aggregation
A metering pipeline with at-least-once delivery (the correct choice, as discussed in our pipeline architecture post) means your aggregation layer may receive duplicate events. If your aggregation job sums all received events without deduplication, you over-count usage and over-bill customers. If your deduplication is applied only at ingestion and not at aggregation time, you're relying on the ingestion layer's deduplication being 100% effective — a bet you should not make for billing-critical data.
The safe pattern: store raw events with their idempotency keys, and run aggregation against distinct event keys only. This means your aggregation query looks like:
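```sql
-- Assumes event_timestamp is timestamptz. $period_start and $period_end are
-- query parameters, with $period_end exclusive.
SELECT
    customer_id,
    event_name,
    billing_period_start,
    SUM(quantity) AS total_quantity
FROM (
    -- Keep one row per idempotency key: the first one ingested.
    SELECT DISTINCT ON (idempotency_key)
        customer_id,
        event_name,
        DATE_TRUNC('month', event_timestamp AT TIME ZONE 'UTC') AS billing_period_start,
        quantity,
        idempotency_key
    FROM usage_events
    -- Half-open range: BETWEEN would count an event at exactly $period_end in two periods.
    WHERE event_timestamp >= $period_start
      AND event_timestamp < $period_end
    ORDER BY idempotency_key, ingested_at
) deduped
GROUP BY customer_id, event_name, billing_period_start;
```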
The DISTINCT ON (idempotency_key) with ORDER BY idempotency_key, ingested_at keeps the first-received version of each event. This works correctly even if the ingestion-layer dedup had a bug and let a duplicate through: each event still contributes to the total exactly once.
We're not saying ingestion-layer deduplication is unnecessary — defense in depth is appropriate here. We're saying that relying on it as your only deduplication layer means a single ingestion-side bug can silently corrupt every invoice generated after that bug was introduced.
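The same keep-first-then-sum logic can be mirrored in application code, which is handy for unit-testing the aggregation rule without a database. This is a sketch under assumptions: the `UsageEvent` shape and the `aggregate_usage` name are hypothetical, and the events passed in are assumed to be already filtered to a single billing period.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from decimal import Decimal


@dataclass(frozen=True)
class UsageEvent:
    idempotency_key: str
    customer_id: str
    event_name: str
    quantity: Decimal
    event_timestamp: datetime  # timezone-aware
    ingested_at: datetime      # timezone-aware


def aggregate_usage(events):
    """Sum quantity per (customer_id, event_name), counting each key once."""
    first_seen = {}
    # Visit events in ingestion order so the first-received copy wins.
    for event in sorted(events, key=lambda e: e.ingested_at):
        first_seen.setdefault(event.idempotency_key, event)
    totals = defaultdict(Decimal)  # Decimal() is zero
    for event in first_seen.values():
        totals[(event.customer_id, event.event_name)] += event.quantity
    return dict(totals)


t0 = datetime(2025, 4, 10, 12, 0, tzinfo=timezone.utc)
original = UsageEvent("k1", "cust_1", "api_call", Decimal("5"), t0, t0)
# Redelivered copy of the same event, ingested three seconds later.
duplicate = UsageEvent("k1", "cust_1", "api_call", Decimal("5"), t0, t0 + timedelta(seconds=3))
other = UsageEvent("k2", "cust_1", "api_call", Decimal("2"), t0, t0)

totals = aggregate_usage([duplicate, original, other])
print(totals[("cust_1", "api_call")])  # 7
```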
Testing Billing Accuracy Before It Matters
A billing system that hasn't been tested against its edge cases is a billing dispute waiting to happen. The standard test coverage for a billing system — unit tests for individual functions, integration tests for the happy path — doesn't catch the bugs that matter at scale.
The test scenarios that catch real billing bugs:
- Plan change on the last day of a billing period: does the proration correctly attribute zero days or one day to the new plan, depending on your proration method?
- Event arriving after billing period close: does a late-arriving event with a timestamp in the closed period get correctly attributed, or does it fall into the next period?
- Duplicate events across a billing period boundary: an event processed twice, once before and once after billing period close, should count once in the correct period.
- Customer downgrade during trial period: does the post-trial invoice correctly reflect the downgraded plan rather than the trial plan terms?
- Currency conversion at invoice generation time: if you support multi-currency, is the FX rate locked at invoice generation or at payment time? Both are defensible; only one is specified in your terms.
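The first scenario in the list can be pinned down with a small day-splitting helper and two assertions. This is a sketch, not a prescribed method: `split_days` and its `change_day_bills_as` parameter are hypothetical, and the exclusive period end is an assumption of the example.

```python
from datetime import date


def split_days(period_start: date, period_end: date, change_day: date,
               change_day_bills_as: str) -> tuple[int, int]:
    """Return (old_plan_days, new_plan_days) for a plan change within a period.

    period_end is exclusive. change_day_bills_as says whether the day of the
    change is charged at the "old" or the "new" plan rate.
    """
    if change_day_bills_as not in ("old", "new"):
        raise ValueError("change_day_bills_as must be 'old' or 'new'")
    if not period_start <= change_day < period_end:
        raise ValueError("change_day must fall inside the billing period")
    total_days = (period_end - period_start).days
    old_days = (change_day - period_start).days  # days strictly before the change
    if change_day_bills_as == "old":
        old_days += 1  # the change day itself stays on the old plan
    return old_days, total_days - old_days


april_start, april_end = date(2025, 4, 1), date(2025, 5, 1)
last_day = date(2025, 4, 30)

# Change day billed at the new rate: the new plan gets exactly one day.
assert split_days(april_start, april_end, last_day, "new") == (29, 1)
# Change day billed at the old rate: the new plan gets zero days this period.
assert split_days(april_start, april_end, last_day, "old") == (30, 0)
```

Either convention is defensible; the point of the test is that the code and the billing terms agree on which one applies.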
The $10k MRR inflection point isn't arbitrary. It's where these edge cases transition from theoretical to statistical certainties. At 5 paying customers, you might never encounter a plan change on the last day of the month. At 50, you almost certainly will. Build the test coverage for the 50-customer system when you're at 5, and the invoice accuracy will be a non-event rather than a quarterly fire drill.