This article was published on March 4, 2014

Open source data grows up: Choosing MySQL, NoSQL, or both

Story by

John Engates

John Engates is the CTO of Rackspace Hosting and an evangelist for the open cloud.

Open source data has a split personality. There’s the NoSQL zealot who likes to fire off tirades against the restrictive world of relational databases, then there’s the MySQL devotee who’s a staunch defender of everything structured – with all that data living neatly in a table somewhere.

For all the rhetoric, you would think these two sides never have to get along. In fact, thousands of companies are making relational and schema-less databases play nice together every single day. It’s been that way for years.

But new technology trends tend to be polarizing. When NoSQL took off, it started to sound like a rallying call for the end of relational databases. That’s not likely to happen any time soon—and for good reason.

Enter Craigslist

Craigslist is a great example of a company seamlessly integrating structured and unstructured data retrieval. Historically, the company has used MySQL to handle the hourly onslaught of job and classified ads.

Despite the heavy workload, MySQL is easily up to that task. The need for a NoSQL approach surfaced only when archived data began to reach epic proportions. Because of regulatory requirements, Craigslist has to archive all of its historic data—even that five-year-old ad for the dingy, overpriced apartment in Austin during SXSW.

If a relational database were the only option in play, then a schema change on the front end would have to be propagated to the archive. This is a risky and time-consuming prospect, and it could mean downtime. Imagine updating a MySQL cluster of servers with a billion records!

Craigslist found itself with a real need for handling two different kinds of data—current vs. historical—in discrete ways. Craigslist might have turned to MongoDB to help tame the sprawl of its data, but it’s never had a problem running NoSQL right alongside MySQL. It’s simply about the right tool for the job.
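To make the schema-change problem concrete, here is a minimal sketch rather than a description of Craigslist's actual setup. It assumes `sqlite3` as a portable stand-in for MySQL and a plain list of JSON strings as a stand-in for a document store such as MongoDB; the table, field, and function names are hypothetical.

```python
import json
import sqlite3

# Relational side: sqlite3 stands in for MySQL (an assumption for portability).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ads (id INTEGER PRIMARY KEY, title TEXT, price REAL)")
conn.execute(
    "INSERT INTO ads (title, price) VALUES (?, ?)",
    ("Apartment in Austin", 2500.0),
)

# A schema change alters the table definition, so it applies to every row at once.
conn.execute("ALTER TABLE ads ADD COLUMN neighborhood TEXT")
live_columns = [row[1] for row in conn.execute("PRAGMA table_info(ads)")]

# Archive side: a list of JSON documents stands in for a document store.
archive = [
    json.dumps({"id": 1, "title": "Apartment in Austin", "price": 2500.0}),
]

# New-shape documents are simply appended; older documents are never rewritten.
archive.append(
    json.dumps({"id": 2, "title": "Bike", "price": 80.0, "neighborhood": "Downtown"})
)


def archived_neighborhood(raw_doc):
    """Read a field that may be missing from older archived documents."""
    return json.loads(raw_doc).get("neighborhood", "unknown")


archived_values = [archived_neighborhood(doc) for doc in archive]
```

The relational table has one shape for all rows, while the archive tolerates old and new shapes side by side and leaves it to the reader to handle a missing field.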

Open source allies

More and more, app developers and hosting providers are realizing that NoSQL and MySQL are open source allies, not sworn enemies on different sides of the database wall. At the end of the day, data is data and it should serve the app and the user, not the technical restrictions of the backend database.

A growing number of Rackspace customers find themselves in situations like Craigslist. They built their data structures when relational databases covered all their bases and now they find themselves deep in the Age of Apps.

The time to hit a million customers has decreased from years to weeks, and social sharing and real-time queries make new demands on data and on the infrastructure supporting that data. Suddenly, they’re surfacing a billion pieces of data every month.

They’re not necessarily about to rip out their MySQL database, but they are looking to augment the data engine. MongoDB, Cassandra, or Redis (and others) are sometimes integrated into the data mix for their speed and flexibility at massive scale. But these open source datastores are unlikely to be used for confidential user information or for financial records that must remain consistent at all times.

These days, it’s not uncommon for a technology company to employ both traditional, relational DBAs as well as a team of developers who use NoSQL in the apps they build. Sometimes, the same app communicates with the relational database world and the unstructured datastore world at the Web tier.
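As a rough sketch of what talking to both worlds at the Web tier can look like, the handler below reads account data from a relational store and activity data from a schema-less one. Everything here is an illustrative assumption: `sqlite3` stands in for MySQL, a dictionary stands in for a NoSQL datastore, and `render_profile` is a hypothetical function name.

```python
import sqlite3

# Relational store for account data (sqlite3 standing in for MySQL).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
db.execute("INSERT INTO users (id, email) VALUES (1, 'ada@example.com')")
db.commit()

# Schema-less store for activity data (a dict standing in for a NoSQL datastore).
# Each event can carry different fields.
activity_store = {
    1: [
        {"type": "share", "url": "https://example.com/post"},
        {"type": "like", "target": 42},
    ],
}


def render_profile(user_id):
    """Hypothetical Web-tier handler that combines both stores in one response."""
    row = db.execute("SELECT email FROM users WHERE id = ?", (user_id,)).fetchone()
    if row is None:
        return None  # unknown user: the relational store is the source of truth
    return {
        "id": user_id,
        "email": row[0],
        "recent_activity": activity_store.get(user_id, []),
    }


profile = render_profile(1)
```

In this arrangement the structured record decides whether the user exists, and the flexible store only decorates the response.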

The old-school DBAs and the new-generation developers who grew up coding under NoSQL have to collaborate on making decisions about deployment and architecture. (Who knows, DBAs and developers might even become friends.)

It’s also possible for these same companies to not even have a DBA and outsource the whole application and data tier to a hosting provider, in which case they’re hoping for deep expertise and teamwork across the SQL/NoSQL divide.

Which to choose? Or do you even have to choose?

Whether an application should go with a relational database or a NoSQL alternative (or both) depends, of course, on the nature of the data being generated and retrieved. And like most things in the technology world, there’s a set of tradeoffs involved in making that decision.

If scale and performance are more important than round-the-clock data consistency, then the NoSQL world is full of promising options. (Many NoSQL datastores follow the BASE model: Basically Available, Soft state, Eventual consistency.)

But if “always consistent” is part of the mandate, especially for confidential and financial information, then MySQL is likely to be the top pick. (MySQL follows the ACID model: Atomicity, Consistency, Isolation, and Durability.)
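The difference between the two models is easier to see in a small example. The sketch below is a toy under stated assumptions, not how any particular product works: `sqlite3` stands in for MySQL to show an atomic transaction, and the hypothetical `LaggingReplica` class imitates eventual consistency by applying writes only when `sync()` is called.

```python
import sqlite3

# ACID side: sqlite3 stands in for MySQL (an assumption for portability).
bank = sqlite3.connect(":memory:")
bank.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
bank.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
bank.commit()


def transfer(conn, src, dst, amount):
    """Move money atomically: both updates commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


# The debit would overdraw alice, so the earlier credit to bob is rolled back too.
ok = transfer(bank, "alice", "bob", 500)
balances = dict(bank.execute("SELECT name, balance FROM accounts"))


# BASE side: a toy replica that lags behind the primary until sync() runs.
class LaggingReplica:
    def __init__(self):
        self.data = {}
        self.pending = []

    def replicate(self, key, value):
        self.pending.append((key, value))  # queued, not yet visible to readers

    def sync(self):
        for key, value in self.pending:
            self.data[key] = value
        self.pending.clear()


primary = {}
replica = LaggingReplica()
primary["likes"] = 10
replica.replicate("likes", 10)

stale_read = replica.data.get("likes")  # replica has not caught up yet
replica.sync()
fresh_read = replica.data.get("likes")  # replica now agrees with the primary
```

A stale like count is usually tolerable; a half-applied money transfer is not, which is the tradeoff this section describes.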

As open source data continues to mature on both sides of the structured database wall, we’re seeing a new breed of apps that play to the relative strengths of both the ACID and the BASE models.

Call it a hybrid approach. Sometimes those apps are designed with that balance in mind and sometimes they evolve as historical accidents, a set of adaptations to changing data demands. After all, who could have predicted the massive cascade of social sharing data even five years ago?

As usual, developers are at the bleeding edge of this kind of innovation. They push hosting providers to combine the best of both data worlds. They also, where necessary, make corrections to the trajectory of open source data technologies.

MariaDB, for example, is an attempt to reclaim the open source roots of MySQL after Oracle took possession of it. The developer community demands full transparency from its open source tools, including the freedom to run bug fixes against test cases.

As this hybrid approach continues into 2014 and beyond, hosting companies will only get better at supporting it. Like the media, we’ll stop pretending that the data world is an “either/or” binary equation.

It’s not too different from what’s happening in the hybrid cloud world in general. Blending the performance of dedicated hardware with the scalability of the public cloud can lead to enhanced flexibility and a best-fit solution. It’s all about the right tool for the job.

The goal of gathering and interpreting data, after all, is to capture a slice of the world as it moves swiftly by. Data, wherever it comes from, is just a window. What matters is the view on the other side.

Image credit: Shutterstock/agsandrew
