Processing latency increase

Incident Report for Rollbar

Resolved

The pipeline is fully caught up, and we are now processing and alerting on data as it streams in. There was a short window (~3 minutes) at the beginning of the incident (~1pm PDT) during which we dropped data. After that window, no data was lost, although processing was delayed. The primary cause has been identified and linked to the remediation steps we took during yesterday's outage. More details to follow.

We have scheduled a postmortem for this incident for Monday of next week, and we will update this incident with the postmortem notes that week.
Posted Apr 23, 2020 - 17:30 PDT

Monitoring

The processing pipeline is currently catching up. We have also made a change to prioritize new data: while the system is at or over capacity, new data will be given higher priority for processing and alerting.
Posted Apr 23, 2020 - 15:40 PDT
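
For illustration only: this report does not describe how the prioritization change was implemented. The sketch below shows one generic way a backlog can serve the newest data first when there is more queued work than capacity. The names `NewestFirstQueue`, `QueuedItem`, and `drain`, and the use of a receive timestamp as the priority, are assumptions made for this example and are not taken from Rollbar's pipeline.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class QueuedItem:
    # Negated receive time, so the most recently received item sorts first
    # in Python's min-heap. (Hypothetical structure for this sketch.)
    sort_key: float
    payload: str = field(compare=False)


class NewestFirstQueue:
    """Backlog that hands out the most recently received item first."""

    def __init__(self):
        self._heap = []

    def put(self, received_at, payload):
        heapq.heappush(self._heap, QueuedItem(-received_at, payload))

    def get(self):
        return heapq.heappop(self._heap).payload

    def __len__(self):
        return len(self._heap)


def drain(queue, capacity):
    """Process up to `capacity` items, newest first; older items stay queued."""
    processed = []
    while len(queue) > 0 and len(processed) < capacity:
        processed.append(queue.get())
    return processed
```

With three queued items and capacity for two, `drain` returns the two newest and leaves the oldest in the backlog to be processed once capacity frees up. That matches the behavior described above, where data was delayed rather than lost.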

Update

The pipeline processing latency peaked at 60 minutes and is now decreasing.
Posted Apr 23, 2020 - 14:42 PDT

Identified

We've identified an issue causing processing latency of approximately 25 minutes. We're working to resolve the issue.
Posted Apr 23, 2020 - 13:42 PDT
This incident affected: Processing pipeline (Core Processing Pipeline).