Skype CIO Lars Rabbe has posted to the Skype blog detailing exactly what happened when Skype went down last week, explaining that a bug within an older version of Skype was not processing responses properly, causing the Windows clients running the affected version to crash.
When Skype went down, it was completely down for the count. Millions of Skype users who rely on the service to conduct business and communicate with loved ones suddenly found themselves without service for a good few hours, and some had to wait days before they could connect again.
The client in question was Skype for Windows version 5.0.0.152. The latest Skype for Windows (version 5.0.0.156), older versions of Skype for Windows (4.0 versions), Skype for Mac, Skype for iPhone, Skype on your TV, and Skype Connect were not affected by the initial bug.
Because around 50% of all Skype users worldwide were running the affected client, almost 30% of the supernodes on the Skype network failed, which brought down service even for those running unaffected versions of the Skype software.
With those supernodes down, the load increased on the supernodes that remained available, and the peer-to-peer aspect of the Skype service ceased to function. The increased load triggered a failsafe in the supernode design that shuts a node down if its traffic exceeds normal limits.
As more supernodes failed and traffic on the remaining nodes increased further, nearly all of them went down.
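To see why losing roughly a third of the supernodes could take out nearly all of them, consider a toy model of the cascade. This is only an illustrative sketch, not Skype's actual supernode logic: the function name `simulate_cascade`, the capacity figures, and the assumption that traffic is spread evenly over the surviving nodes are all made up for the example.

```python
def simulate_cascade(capacities, total_load, initial_failures):
    """Return the number of live nodes after each round of the cascade.

    Hypothetical model: every live node carries an equal share of
    total_load, and a node's failsafe shuts it down when that share
    exceeds its capacity.
    """
    # Assumption: the first `initial_failures` nodes crash outright
    # (standing in for the nodes hit by the client bug).
    alive = list(capacities[initial_failures:])
    history = [len(alive)]
    while alive:
        per_node = total_load / len(alive)  # even share for each survivor
        survivors = [c for c in alive if c >= per_node]  # failsafe check
        if len(survivors) == len(alive):
            break  # nobody is overloaded, so the network is stable
        alive = survivors
        history.append(len(alive))
    return history


# Ten nodes with capacities 100, 110, ..., 190 and 1000 units of traffic.
capacities = [100 + 10 * i for i in range(10)]
print(simulate_cascade(capacities, 1000, 0))  # [10]
print(simulate_cascade(capacities, 1000, 3))  # [7, 5, 0]
```

With no initial failures every node stays within its limit. Knock out three of the ten and the remaining seven each inherit more traffic than some of them can take, those nodes shut down, and the load on the rest climbs until none are left.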
To rectify the problem, Skype scrambled thousands of Skype instances to act as mega-supernodes and help get the service back on its feet. Slowly but surely, as the mega-supernodes took effect, service began to come back for many users, allowing the Skype team to fix the damaged supernodes and return the network to normal.
Going forward, Skype hopes to introduce new procedures and issue hotfixes to stop the problem from occurring again. Rabbe notes that Skype’s investment program has already seen the company introduce servers to help scale the platform, with more servers being added in 2011.
It is a harsh lesson for Skype, but one that it ultimately needed to learn to make sure its service doesn’t go down again.