Opinion, advice, and analysis by the TNW community

Before developers build cathedrals, teach them to build sheds

Delusions of infrastructure grandeur

Story by
Deepak Giridharagopal

CTO, Puppet

Deepak is CTO and Chief Architect at Puppet. Deepak guides Puppet’s technical development and has played an integral role in every version of Puppet (open source) and Puppet Enterprise shipped since joining the company in 2011. Over his tenure, Deepak has overseen development of every major version of Puppet’s core projects, including the Puppet language, Facter, PuppetDB, Puppet Server, and major features such as Puppet Application Orchestration. Overall, his work helps Puppet achieve the massive increases in performance and scale that Puppet users, including Nike, Uber and Wells Fargo, need.


It’s a truth universally acknowledged that managing fewer things is easier than managing lots of things. Yet why do so many of us in tech exalt “scale” as a paramount virtue? The cloud-native arena is a particularly interesting focal point for this exact debate.

The cloud-native community is disrupting many long-held technological conventions, making us rethink how we should build the systems of tomorrow. However, many of the tools, platforms, and practices coming out of that community have been extracted from the largest technology companies on Earth. These companies dominate the cloud-native computing landscape: its technology, its evangelism, and its revenue.

It therefore shouldn’t surprise anybody to find that cloud-native architectures introduce a ton of new complexity to the uninitiated (see Conway’s Law). Everything is built and packaged up as containers, everything is scaled out, everything is distributed, with radically different ways to deploy, operate, debug, and optimize the system. This is why platforms like Kubernetes are so critical to managing it all.

But Kubernetes wasn’t designed for the masses. It came from Google, designed by Google engineers to help other Google engineers solve mostly Google-scale problems. If you have lots of overlap on that particular Venn diagram, then it’s a clear, great choice. But what about those who don’t?

Simplicity is always in fashion

The first question you should ask yourself is what your needs truly are. Do your apps genuinely need a massive level of scale to succeed, with all the complexity that implies? Do you need 100 servers when five powerful ones would do? Do you need to break up your app into microservices, or would refactoring and tweaking your monolith suffice? Do you need Kubernetes, or would a PaaS work? Do you have the people and skills on hand to make any of these initiatives succeed?

Infrastructure shouldn’t exist just to exist; it exists to run something useful on top of it. Scale is a means to an end. Taking a step back and understanding what your applications genuinely need to thrive, and what the tradeoffs are with each possible approach, is absolutely essential.

Why orchestration is key to limiting complexities

In the cloud-native world, nearly every single task involves touching more than one “target.” Higher-level abstractions make many things easier, but their inherently distributed nature makes many things more involved. The question becomes less about “what” is being managed and more about “how” to orchestrate an activity across lots of different domains: container platforms, build tooling, storage, networking, databases, monitoring, third-party ticketing and deployment systems, and so on.

IT teams need to focus on finding orchestration tools that integrate with the things they have, cloud-native or not. Breadth of automation is critical; it’s the foundation upon which you can solve all kinds of higher-level problems. And once you get to a certain level of complexity, automation becomes non-negotiable.

Day 2 and beyond

It’s easy to focus on the architectural and deployment benefits of cloud-native infrastructure, yet forget that it’s only after you’ve deployed your application that its life truly begins.

Provisioning tools are great for handling Day 1 of your application’s life. But what about Day 2, and beyond? How do you reconfigure your application? How do you deploy a new version? How do you handle security breaches? How do you make changes in third-party services your app relies upon?

Platforms like Kubernetes offer some really nice primitives for some of these issues. But they may not capture all the nuances of how your particular application needs to be operated, and they may not even apply to services running outside the platform (e.g., third-party logging, monitoring, or networking). These platforms can do a lot, but they can’t magically make your applications manage themselves.

As the cloud-native movement puts more of the application stack in the hands of developers to control, we’d all benefit from learning from the problems operations personnel have dealt with for years.  

Published May 9, 2019 — 09:00 UTC