Werner Vogels on the definition of big data, and the bandwidth of FedEx boxes

Today at the TNW Latin America conference in Sao Paulo, Amazon’s CTO Werner Vogels led the morning with a keynote on big data and how it is best approached. We have a few juicy quotes in this post, in case you missed the talk.

A critical component of the lecture was a pair of definitions of ‘big data.’ Big data is a term that has been ascendant for some time, driven forward by the combined popularity of cloud computing and cloud storage, along with increasing broadband penetration (more on that later).

Here’s Werner’s first definition of big data:

The collection and analysis of large amounts of data to create a competitive advantage.

This is easy to parse: big data is a huge bucket of information that you can poke in various ways to extract actionable intelligence, which can then be used to whack competitors. This specific way of construing big data doesn’t discuss how the data is in fact handled, only what is to be done with it; what can be done with it, really.

Now, let’s move on to the next. Big data is:

When your data sets become so large that you have to start innovating how to collect, store, organize, analyze, and share it.

As you can see, this second definition describes how we achieve the first. Data in huge amounts requires new solutions, and those solutions make up the ‘big data’ tools and services landscape.

Therefore we have two sides of big data: what you can do with it, and why you would want to deal with it at all, contrasted with how you get that data to sing the way that you want it to.

This matters because companies working with big data can be exceptionally varied: they could work on things as disparate as compression algorithms, data delivery, distributed computing, algorithm creation, cloud storage, analytics of any variety, and new ways to share data. Heck, even how to visualize it is a critical component of big data. As Werner pointed out, “there is more to big data than analytics.”


Moving along, cloud computing, according to Werner, should become cheap enough to simply use, without a care for its expense. His statement explains the sentiment well: “When you switch on the lights, you don’t think about [what] it’s going to cost.”

Another fun quip, on how it can be simpler to ship a box of discs to Amazon for upload than to try to upload the data yourself: “Do not underestimate the bandwidth of a FedEx box.” As we noted earlier, broadband has made interacting with the cloud possible, but when it comes to big data, even that isn’t always enough.
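
To put rough numbers on that quip, here is a back-of-the-envelope sketch in Python. Every figure in it is our own illustrative assumption, not something from the keynote: a hypothetical 10 TB box of discs, 48 hours in transit, and a 100 Mbps uplink. It also ignores the time needed to copy data onto and off the discs.

```python
def effective_bandwidth_mbps(payload_terabytes: float, transit_hours: float) -> float:
    """Average throughput, in megabits per second, of physically shipping data."""
    bits = payload_terabytes * 1e12 * 8   # decimal terabytes -> bits
    seconds = transit_hours * 3600        # hours -> seconds
    return bits / seconds / 1e6           # bits per second -> megabits per second


def upload_hours(payload_terabytes: float, link_mbps: float) -> float:
    """Hours needed to push the same payload over a network link."""
    bits = payload_terabytes * 1e12 * 8
    return bits / (link_mbps * 1e6) / 3600


# Hypothetical inputs, chosen only for illustration.
BOX_TB = 10.0        # assumed capacity of the shipped box
TRANSIT_HOURS = 48.0 # assumed door-to-door shipping time
LINK_MBPS = 100.0    # assumed sustained upload speed

print(f"Shipped box: {effective_bandwidth_mbps(BOX_TB, TRANSIT_HOURS):.0f} Mbps")
print(f"Upload over the link: {upload_hours(BOX_TB, LINK_MBPS):.0f} hours")
```

Under those assumptions the box averages roughly 463 Mbps, while the upload would take about 222 hours, or a little over nine days. Change the inputs and the balance shifts, which is really the point of the quip.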

TNW Latin America is still going on, so poke your head in if you have a minute.

In other news, AWS only came to this region in late 2011, making Amazon almost a new player in the market here.
