GitHub Availability Report: October 2023
In October, we experienced two incidents that resulted in degraded performance across GitHub services.
October 17 10:59 UTC (lasting 2 hours and 49 minutes)
From 10:59 UTC to 13:48 UTC on October 17, the GitHub Codespaces service was degraded due to an authentication outage. The issue affected 67% of users during this period, who saw failures when creating and starting their Codespaces. The regional authentication layer was throttled by a global third-party dependency because onboarding a new Codespaces region increased load on it. The Codespaces team mitigated the incident manually by reducing load on the external dependency. Following the incident, the Codespaces team is evaluating and implementing scaling improvements to make the service more resilient to increasing demand. These include regional-level caching to minimize calls to the dependency and measures to keep the authentication service healthy when errors occur.
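To illustrate the regional-level caching idea mentioned above, here is a minimal sketch in Python. It is not the Codespaces implementation: the class name RegionalLookupCache, the fetch callable, and the TTL value are all hypothetical. It assumes lookups can safely be reused for a short time, and that serving a slightly stale result is acceptable when the upstream dependency is throttling. Whether that holds for real authentication data depends on what is being cached.

```python
import time
from typing import Callable, Dict, Tuple


class RegionalLookupCache:
    """Hypothetical per-region cache in front of a rate-limited dependency."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch          # call into the external dependency (assumed)
        self._ttl = ttl_seconds      # how long an entry counts as fresh
        self._clock = clock          # injectable so the cache can be tested
        self._entries: Dict[str, Tuple[str, float]] = {}

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._entries.get(key)

        # Fresh entry: answer locally, no call to the dependency.
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]

        try:
            value = self._fetch(key)
        except Exception:
            # Dependency failed or throttled: fall back to a stale entry
            # if one exists, otherwise surface the error to the caller.
            if entry is not None:
                return entry[0]
            raise

        self._entries[key] = (value, now)
        return value
```

With a cache like this in each region, repeated lookups for the same key within the TTL reach the global dependency only once, and a throttled dependency degrades to stale answers for known keys rather than failing every request.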
October 25 09:13 UTC (lasting 3 hours and 27 minutes cumulatively)
On October 25 and 26, GitHub Copilot experienced multiple short, partial outages that affected code completions.
GitHub Copilot completions are currently hosted in multiple regions globally. Users are typically routed to the nearest geographic region, but may be routed to other regions when the nearest region is unhealthy. Beginning at 09:13 UTC on October 25, GitHub Copilot experienced partial outages of individual regions, lasting approximately 12 minutes per region. The outages occurred because an automated process upgraded the nodes hosting the completion model, and a subset of GitHub Copilot users experienced completion errors during this timeframe. The issue was fully resolved at 02:40 UTC on October 26.
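The routing behavior described above (nearest region first, other regions when it is unhealthy) can be sketched as follows. This is an illustrative sketch only, not GitHub's load balancer: the function name pick_region, the region names, and the shape of the health map are assumptions made for the example.

```python
from typing import Mapping, Optional, Sequence


def pick_region(
    regions_by_distance: Sequence[str],
    healthy: Mapping[str, bool],
) -> Optional[str]:
    """Return the closest healthy region, or None if none is healthy.

    regions_by_distance is ordered nearest-first for a given user;
    regions missing from the health map are treated as unhealthy.
    """
    for region in regions_by_distance:
        if healthy.get(region, False):
            return region
    return None


# Hypothetical example: the nearest region is mid-upgrade, so the
# request falls through to the next-nearest healthy region.
order = ["europe-west", "us-east", "asia-east"]
health = {"europe-west": False, "us-east": True, "asia-east": True}
chosen = pick_region(order, health)
```

A scheme like this only helps if the health signal reflects a node upgrade quickly. If it lags, requests keep landing on the region being upgraded and users see completion errors in the meantime.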
To prevent similar outages in the future, we have disabled the automated upgrade behavior that we identified as the root cause, and we are prioritizing improvements to our global load balancing during regional outages.
Please follow our status page for real-time updates on status changes. To learn more about what we’re working on, check out the GitHub Engineering Blog.