Write-up published
Resolved
On Nov 25, 2021, from 11:08 AM PT to 11:29 AM PT, our API, web app, and item processing pipeline were sporadically unavailable. Unfortunately, errors sent between 11:08 AM PT and 11:17 AM PT were not processed.
The root cause was an infrastructure change intended to help our API handle traffic spikes. This change briefly overloaded one of our application databases, which caused the unavailability.
When we detected the database issues, we reverted the change and our services recovered.
To prevent this incident from recurring, we have paused work on that infrastructure change. Meanwhile, we are continuing to work on scaling our application databases. As mentioned in a previous postmortem, we have significantly increased the resources dedicated to database work. The first of these database projects are complete, and more are planned for this quarter and next to keep improving our databases.
As always, thank you for being a Rollbar customer.
Resolved
This incident has been resolved. Thank you for your patience. We expect to publish a postmortem by 5 PM PT on Tuesday.
Monitoring
The web app and API are functional again. The pipeline is processing a backlog of items.
Monitoring
A fix has been implemented and we are monitoring the results.
Identified
An issue with one of our application databases has caused an outage of the web app, API, and processing pipeline. We will post an update when the database is available again.