On Monday, GitHub.com went down. Although the company took quite a while to fix the issue completely, the service was fully offline (“Major Service Outage” as opposed to “Partial Service Outage”) for a total of only 19 minutes and 59 seconds. On Tuesday (today), GitHub is down again, and at the time of writing the outage has already lasted more than half an hour.
A quick check on downforeveryoneorjustme.com shows that GitHub is indeed offline around the world: “It’s not just you! http://github.com looks down from here.” Another site, isitdownrightnow.com, says the same: “Github.com is DOWN for everyone. It is not just you. The server is not responding…”
Over on status.github.com, we can see that today’s problem started early this morning (this log is being updated, so refresh if you want the latest):
07:27 AM PST: Processing through a queue backlog. We’ll update when we’re all caught up.
07:42 AM PST: All caught up.
08:19 AM PST: Connectivity problems. Investigating
08:25 AM PST: Investigating DB problems
08:35 AM PST: We’ve taken a bad DB down and are working to return the DB cluster to a normal state now.
08:58 AM PST: DB cluster is slowly recovering.
09:10 AM PST: Performance is still impacted such that the majority of requests are hitting unicorns. Caches are slowly warming.
09:27 AM PST: Unicorns are decreasing in frequency.
09:38 AM PST: Performance is returning to normal.
09:50 AM PST: Performance is largely back to normal. We’ll be following up with more information on this outage and our plans to resolve the issues on the blog soon.
01:56 PM PST: Issue, Repository, and User search indices may be returning incorrect results after today’s DB maintenance. We’re rebuilding the search index now.
07:09 PM PST: Search indexes are still missing some results. We’re working on backfilling through the night.
Yesterday’s log sheds some light on how long today’s issue might last:
07:05 AM PST: Investigating database problems.
07:13 AM PST: We failed over to one of our secondary DBs. Services are recovering, but slow.
07:19 AM PST: Back down. Investigating.
07:22 AM PST: Back on the primary DB and recovering slowly.
07:54 AM PST: Service is recovering. Performance will continue to be degraded for a short while.
09:29 AM PST: All systems go.
The downtime has also been confirmed by GitHub on Twitter:
We’re down for emergency db maintenance. Investigating now. Updates at status.github.com
— GitHub (@github) September 11, 2012
Update at 9:20 AM PST: The service is starting to come back now. That would put total downtime at 49 minutes and 50 seconds. GitHub’s status page, however, still says “Major Service Outage.” I will update this post when that changes.
Update at 9:35 AM PST: We’re at “Partial service outage” now. Things are looking up.
Update at 7:00 PM PST: We’re at “Battle station fully operational.”
Update on September 14: GitHub has posted more details on its blog: “GitHub availability this week.”
Image credit: stock.xchng