MinuteSort is just what it sounds like: a test of how much data you can sort in a minute. Obviously, this is a critical function, as ‘big data’ becomes less a buzzword and more a reality. Technologies such as Hadoop and cloud computing have brought the need to manage huge data sets to the fore; it’s a problem that many organizations now share.
Before we get into how Microsoft managed to set the record, here’s how well it did, according to its own post on TechNet: “In raw numbers, the team’s system sorted 1401 gigabytes in just 60 seconds – using 1033 disks across 250 machines.” The company compared those hardware figures favorably to the Yahoo team’s setup, noting that its own solution employed roughly “one-sixth of the hardware resources,” while sorting about three times the data.
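To put those raw numbers in perspective, here is a quick back-of-the-envelope sketch of the average throughput they imply. The figures come straight from the TechNet quote; the decimal conversion (1 GB = 1000 MB), the helper name `rate_mb_per_s`, and the assumption that work was spread evenly across disks and machines are mine, not Microsoft's.

```python
# Figures quoted in Microsoft's TechNet post.
DATA_GB = 1401
SECONDS = 60
DISKS = 1033
MACHINES = 250


def rate_mb_per_s(data_gb, seconds, units=1):
    """Average throughput in MB/s per unit.

    Assumes decimal units (1 GB = 1000 MB) and an even split across units;
    both are simplifications for illustration.
    """
    return data_gb * 1000 / seconds / units


aggregate = rate_mb_per_s(DATA_GB, SECONDS)              # whole cluster
per_machine = rate_mb_per_s(DATA_GB, SECONDS, MACHINES)  # average per machine
per_disk = rate_mb_per_s(DATA_GB, SECONDS, DISKS)        # average per disk

# "About three times the data" on "roughly one-sixth of the hardware"
# works out to roughly 3 * 6 times the data sorted per unit of hardware.
DATA_RATIO = 3
HARDWARE_DIVISOR = 6
relative_efficiency = DATA_RATIO * HARDWARE_DIVISOR

print(f"Aggregate:   {aggregate:,.0f} MB/s")
print(f"Per machine: {per_machine:.1f} MB/s")
print(f"Per disk:    {per_disk:.1f} MB/s")
print(f"Data sorted per unit of hardware vs. Yahoo: ~{relative_efficiency}x")
```

These are averages over the full minute, so they say nothing about peak rates or how the load was actually balanced. The roughly 18x figure is only as precise as the rounded “one-sixth” and “three times” it is built from.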
Interestingly, Microsoft didn’t use Hadoop in its solution, as you might have expected. Instead, a group of Microsoft Research folks created something called “Flat Datacenter Storage,” or FDS for short. The word ‘flat’ is critical. Microsoft described how FDS works in the following way:
[Microsoft Research's Jeremy] Elson compares FDS to an organizational chart. In a hierarchical company, employees report to a superior, then to another superior, and so on. In a “flat” organization, they basically report to everyone, and vice versa.
Combine that with something called ‘full bisection bandwidth networks,’ and Microsoft just made data-sorting news. I’ll be curious to see what happens next for FDS. Hadoop is now a commercial standard; could FDS steal some of its thunder? According to Microsoft, the technology will likely be deployed on its own projects, and it hinted at other applications.
Of course, this is all Microsoft tooting its own horn, so until we hear from those in the know and external to Redmond, salt is our friend. Still, a tripling of speed using less hardware? That’s just cool.