Started
Monday, March 2, 2026
11:10 PM UTC
Duration
6h 43m
total
Resolved
05:54 AM UTC
Tuesday, March 3, 2026
Incident update
Between March 2, 21:42 UTC and March 3, 05:54 UTC project board updates, including adding new issues, PRs, and draft items to boards, were delayed from 30 minutes to over 2 hours, as a large backlog of messages accumulated in the Projects data denormalization pipeline.<br /><br />The incident was caused by an anomalously large event that required longer processing time than expected. Processing this message exceeded the Kafka consumer heartbeat timeout, triggering repeated consumer group rebalances. As a result, the consumer group was unable to make forward progress, creating head-of-line blocking that delayed processing of subsequent project board updates.<br /><br />We mitigated the issue by deploying a targeted fix that safely bypassed the offending message and allowed normal message consumption to resume. Consumer group stability recovered at 04:10 UTC, after which the backlog began draining. All queued messages were fully processed by 05:53 UTC, returning project board updates to normal processing latency.<br /><br />We have identified several follow-up improvements to reduce the likelihood and impact of similar incidents in the future, including improved monitoring and alerting, as well as introducing limits for unusually large project events.
GitHub Copilot is back online
View current status and uptime history