Web app, API, and pipeline outage

Incident Report for Rollbar

Postmortem

On Nov 25, 2021, from 11:08 AM PT to 11:29 AM PT, our API, web app, and item processing pipeline were sporadically unavailable. Unfortunately, errors sent between 11:08 AM and 11:17 AM PT were not processed.

The root cause was an infrastructure change intended to help our API better handle traffic spikes. This change briefly overloaded one of our application databases, which made our services unavailable.

When we detected the database issues, we reverted the change and our services recovered.

To prevent this incident from recurring, we have paused work on this infrastructure change. Meanwhile, we are continuing work to scale our application databases. As mentioned in a previous postmortem, we have significantly increased the resources dedicated to database work. The first database projects have been completed, and more are planned for this quarter and next to continue improving our databases.

As always, thank you for being a Rollbar customer.

Posted Dec 01, 2021 - 13:37 PST

Resolved

This incident has been resolved. Thank you for your patience. Please expect a postmortem by Tuesday at 5 PM PT.
Posted Nov 25, 2021 - 11:51 PST

Update

The web app and API are functional again. The pipeline is processing a backlog of items.
Posted Nov 25, 2021 - 11:34 PST

Monitoring

A fix has been implemented and we are monitoring the results.
Posted Nov 25, 2021 - 11:19 PST

Identified

An issue with one of our application databases has caused a web app, API, and pipeline outage. We will post an update when the database is available again.
Posted Nov 25, 2021 - 11:15 PST
This incident affected: Web App (rollbar.com), API Tier (api.rollbar.com) and Processing pipeline (Core Processing Pipeline, iOS Symbolication processing pipeline).