API Performance Degradation

Incident Report for scalr.io

Postmortem

Summary

A backend optimization deployed on April 24 inadvertently triggered a database performance issue that caused a platform-wide outage. The change has been fully reverted and the platform is stable. We are addressing the underlying query design before re-attempting the optimization.

What Happened

The optimization changed how policy code is delivered during runs, moving from an inline payload to a download from blob storage. This introduced an additional authorization check on each policy download that was not present before.

That authorization check relied on an existing database query with significant hidden complexity. Under normal conditions its cost went unnoticed, but at production policy-check volume it joined approximately 2 million rows per call and took around 25 seconds to complete. The increased call frequency exposed this latency, drove the database to 100% CPU utilization, and exhausted the connection pool, making the platform unavailable.
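As a rough illustration of why a query this slow exhausts a connection pool, the sketch below applies Little's law (average in-flight queries = arrival rate x time per query). Only the 25-second query time comes from this report. The call rate and pool size are hypothetical values chosen for the example, not figures measured during the incident.

```python
import math


def connections_needed(calls_per_second: float, query_seconds: float) -> int:
    """Little's law: average in-flight queries = arrival rate x time per query."""
    return math.ceil(calls_per_second * query_seconds)


QUERY_SECONDS = 25.0           # approximate query time stated in this report
ASSUMED_CALLS_PER_SECOND = 10  # hypothetical policy download rate
ASSUMED_POOL_SIZE = 100        # hypothetical connection pool size

# Each in-flight query holds one database connection for its full duration.
needed = connections_needed(ASSUMED_CALLS_PER_SECOND, QUERY_SECONDS)
pool_exhausted = needed > ASSUMED_POOL_SIZE

print(f"connections needed: {needed}, pool exhausted: {pool_exhausted}")
```

Under these assumed numbers, the steady-state demand is well above the pool size, so new requests would queue or fail even before CPU saturation is considered.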

Resolution

The optimization was fully reverted, restoring normal platform behavior. No data was lost or corrupted.

What We're Doing Next

The authorization query is being redesigned to use a direct indexed lookup, which eliminates the multi-million-row join that caused the CPU spike. The optimization will not be re-released until this redesign is complete and validated.
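A minimal sketch of what a direct indexed lookup can look like, using an in-memory SQLite database. The table, column, index, and function names here are hypothetical and do not reflect the actual Scalr schema or the redesigned query. The point is only that the check resolves through a composite index instead of joining a large row set.

```python
import sqlite3

# Hypothetical schema: one row per (account, policy) grant.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE policy_access (
        account_id INTEGER NOT NULL,
        policy_id  INTEGER NOT NULL
    );
    CREATE UNIQUE INDEX idx_policy_access
        ON policy_access (account_id, policy_id);
""")
conn.executemany(
    "INSERT INTO policy_access (account_id, policy_id) VALUES (?, ?)",
    [(1, 10), (1, 11), (2, 20)],
)


def can_download(conn: sqlite3.Connection, account_id: int, policy_id: int) -> bool:
    """Return True if a grant row exists; both predicates match the index columns."""
    row = conn.execute(
        "SELECT 1 FROM policy_access "
        "WHERE account_id = ? AND policy_id = ? LIMIT 1",
        (account_id, policy_id),
    ).fetchone()
    return row is not None


# Ask SQLite how it would run the lookup and confirm the index is used.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT 1 FROM policy_access "
    "WHERE account_id = ? AND policy_id = ?",
    (1, 10),
).fetchall()
uses_index = any("idx_policy_access" in row[-1] for row in plan)
```

Checking the query plan as part of validation, as in the last step, is one way to catch a lookup that silently falls back to a scan or join before it reaches production volume.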

We apologize for the disruption. If you have questions or are still experiencing issues, please contact our support team.

Posted May 01, 2026 - 14:40 UTC

Resolved

This incident has been resolved. If you cannot access your account, please clear your browser cache.
Posted Apr 28, 2026 - 09:52 UTC

Monitoring

A fix has been implemented and we are monitoring the results.
Posted Apr 28, 2026 - 09:21 UTC

Identified

The issue has been identified and a fix is being implemented.
Posted Apr 28, 2026 - 09:18 UTC

Update

We are continuing to investigate this issue.
Posted Apr 28, 2026 - 09:08 UTC

Update

We are continuing to investigate this issue.
Posted Apr 28, 2026 - 09:07 UTC

Update

We are continuing to investigate this issue.
Posted Apr 28, 2026 - 09:00 UTC

Investigating

We are currently investigating this issue.
Posted Apr 28, 2026 - 08:53 UTC

This incident affected: Scalr Platform and Scalr Worker.