On a recent visit to LinkedIn’s Mountain View headquarters, I sat down with three of the company’s engineers to talk through different aspects of its business. For the first article in this behind-the-scenes series, we’ll hear from VP of Engineering Alex Vauthey, who heads up monetization engineering and data infrastructure at LinkedIn, about the company’s efforts to contribute to the open source community.
This transcript has been edited for clarity and length.
TNW: What is LinkedIn’s approach to open source software?
Vauthey: LinkedIn really embraces open-sourcing. I can think of three main reasons for that. The first is that it’s very much in line with our culture.
One of the key dimensions of LinkedIn is transformation, of the self, company and the world. That transformation includes not just creating features, but also creating the technology to create economic opportunities for other professionals around the world.
Transformation of the company will enable LinkedIn to realize its full potential by open-sourcing software that we build. We’re actually leveraging the community of software developers that help with that. With traction, the software gets more secure, more reliable. At the end of the day you use better software to build applications.
As for the third aspect, transformation of the self, what we mean by that is we want our employees to leave LinkedIn as much better professionals than when they arrived. By contributing to an open source project, they have the opportunity to get the code they’re writing looked at by a lot of people. They see their code out there being used by people, enhanced by people.
Open source also helps the branding of our engineering team – the fact that we work on world-class technical problems, the scale of the problems we have to solve, and the complexity of the features that we’re building. Being able to showcase our technology to the world is something that hopefully is going to be attractive to world-class engineers around the world, whom we would love to have work for us.
I think this is the right thing to do.
A lot of our stack leverages open source software: Web containers, Hadoop, a whole host of systems that we’ve been able to use just like that because they’ve been open-sourced by people before. We believe in integrity and believe that the right thing to do is contribute to the industry.
What are some of the contributions that LinkedIn has made to the OSS community?
There are roughly 20-25 projects that we have open-sourced. One of the most famous is Kafka, a distributed publish/subscribe mechanism. Kafka falls into the category of systems that allow us to move massive amounts of data at scale. It’s a distributed and scalable commit-log-based event system.
This is, for example, what we use to move data from our online system, which stores the data, to our analytics data infrastructure so that we can analyze the data and feed the results back into the online system. For example, People You May Know or People Who View Your Profile are the results of data being analyzed on a regular basis.
In order to do that, you need to move the online data to the analytics infrastructure systems to crunch these numbers. Kafka is the main mechanism we use to do that, to move this data around.
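To make the commit-log idea concrete, here is a minimal in-memory sketch of a publish/subscribe log in Python. It is an illustration only, not Kafka’s client API: the `CommitLog` class, its `publish` and `poll` methods, and the consumer names are all hypothetical. The point it tries to show is that producers append events to an ordered log, and each consumer (say, an analytics pipeline) tracks its own read position independently.

```python
from collections import defaultdict


class CommitLog:
    """Toy single-topic commit log (hypothetical; not Kafka's API)."""

    def __init__(self):
        self._records = []                # append-only list of events
        self._offsets = defaultdict(int)  # next offset to read, per consumer

    def publish(self, event):
        """Append an event and return its offset in the log."""
        self._records.append(event)
        return len(self._records) - 1

    def poll(self, consumer, max_records=10):
        """Return the next unread events for a consumer and advance its offset."""
        start = self._offsets[consumer]
        batch = self._records[start:start + max_records]
        self._offsets[consumer] = start + len(batch)
        return batch


log = CommitLog()
log.publish({"member": 1, "action": "profile_view"})
log.publish({"member": 2, "action": "connect"})

# Two independent consumers each see the full stream, in order.
analytics_batch = log.poll("analytics")
monitoring_batch = log.poll("monitoring")
```

Because reading never removes events from the log, adding another downstream system only means adding another offset to track.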
Kafka is being used at a whole host of technology companies [including Twitter, Netflix, Square, Spotify, Pinterest, Uber, Tumblr, Airbnb and Box]. It’s a top-level Apache project.
There is another open source project which is getting some traction right now, called Helix. Helix is a clustering framework. When you build a distributed system, the idea is that you distribute the load across as many machines as possible. The goal of scalability is basically that your ability to sustain the load increases linearly as you add machines.
That’s a concern that’s common to storage, event management, and analytics systems. There’s a whole set of concerns that are common to all the distributed systems which are about node management – how many nodes are in the cluster, what happens when one fails and another has to take over, auto-scaling, fault tolerance, load balancing and so on.
Helix allows you to do that. It’s focusing on managing the cluster itself, keeping the cluster healthy.
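As a rough illustration of the node-management concerns described here, the Python sketch below spreads partitions across nodes and hands a failed node’s partitions to the least-loaded survivors. It is not Helix’s API or its actual rebalancing algorithm; `assign_partitions`, `handle_failure`, and the node names are hypothetical, and the round-robin and least-loaded policies are assumptions chosen for brevity.

```python
def assign_partitions(partitions, nodes):
    """Spread partitions across nodes round-robin (assumed policy)."""
    if not nodes:
        raise ValueError("at least one node is required")
    ordered = sorted(nodes)
    return {p: ordered[i % len(ordered)] for i, p in enumerate(partitions)}


def handle_failure(assignment, failed, live_nodes):
    """Move only the failed node's partitions to the least-loaded survivors."""
    survivors = sorted(n for n in live_nodes if n != failed)
    if not survivors:
        raise ValueError("no surviving nodes to take over")

    # Count how many partitions each survivor already owns.
    load = {n: 0 for n in survivors}
    for owner in assignment.values():
        if owner in load:
            load[owner] += 1

    new_assignment = dict(assignment)
    for partition in sorted(assignment):
        if assignment[partition] == failed:
            target = min(survivors, key=lambda n: load[n])
            new_assignment[partition] = target
            load[target] += 1
    return new_assignment


nodes = ["node-a", "node-b", "node-c"]
assignment = assign_partitions(range(6), nodes)
after_failure = handle_failure(assignment, "node-b", nodes)
```

Partitions on healthy nodes stay where they are, so a single failure moves only the data that has to move.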
Voldemort is also known in the industry. It’s a distributed key-value pair system. It’s a very simple database where the only type of lookup you can do is by key, which returns a value, but it scales linearly extremely well and can sustain massive amounts of load.
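The key-value model is simple enough to sketch in a few lines of Python. The example below is a toy, not Voldemort’s API: the `ShardedKeyValueStore` class is hypothetical, each "node" is just an in-process dictionary, and keys are placed by hashing modulo the node count, a simplification that real distributed stores usually replace with a scheme that moves less data when nodes are added.

```python
import hashlib


class ShardedKeyValueStore:
    """Toy key-value store that spreads keys across nodes (hypothetical)."""

    def __init__(self, num_nodes):
        if num_nodes < 1:
            raise ValueError("num_nodes must be at least 1")
        # Each "node" is an in-process dict standing in for a real server.
        self._nodes = [{} for _ in range(num_nodes)]

    def _node_for(self, key):
        """Pick a node from a stable hash of the key."""
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self._nodes)

    def put(self, key, value):
        self._nodes[self._node_for(key)][key] = value

    def get(self, key, default=None):
        # The only supported lookup: by exact key.
        return self._nodes[self._node_for(key)].get(key, default)


store = ShardedKeyValueStore(num_nodes=4)
store.put("member:42", {"name": "Ada"})
store.put("member:43", {"name": "Grace"})
profile = store.get("member:42")
missing = store.get("member:99")
```

Since every read and write touches exactly one node, adding nodes spreads the load without any cross-node queries, which is what lets this kind of store scale out.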
The OSS movement often has a political dimension to it. Does LinkedIn view open source as an opportunity to make the Web more open? Is that a goal?
I don’t think we want to position ourselves one way or another with regard to openness of the Web. What we want to do is stay true to our mission of creating economic opportunity for professionals around the world. In that sense, contributing open source technology to the world is completely in line with that vision.
Whether or not the Web should be entirely open or entirely based on proprietary technology, I don’t think it’s something we necessarily worry about. We embrace open source technology because we’ve been able to use it and leverage it to a great extent.
Given recent open source-related security issues like the Heartbleed bug, does open source enable software to be more secure or is there a risk that it’s actually less secure?
At the end of the day, I do believe it enables us to be more secure. The reason being that the quality of the software that is being open-sourced ends up being higher than if it was purely built as a proprietary technology.
Of course, there’s another side to that, which is that open source means that people have access to the source code and people can find vulnerabilities in the code. It also means that adoption will be very wide, so if there is an identified vulnerability, then people are going to be able to do even more damage by virtue of the fact that this technology is used widely.
On the other hand, the best security experts and people have access to the code and can contribute back and make the code more secure and reliable. I think that by virtue of open sourcing software, we end up with more secure solutions than if we were not doing that, but it’s a mixed bag, obviously.
As a publicly-traded company, is there a tension between the business needs of the company and giving back to the community when you decide which projects to open source?
We’re very comfortable open-sourcing infrastructure, especially when it solves problems that have to do with scale. There is a category of things we would never open source because we consider it part of our “secret sauce.” For example, everything that revolves around machine learning and relevance. The work that allows us to build value and build insight-driven features for our members is one of the things we would not consider open sourcing.
And actually many of those things are not good candidates for open source because it’s unclear what value the community would derive from it. When it comes to infrastructure and solving high scalability, high availability problems for massive amounts of data, we’re very comfortable open-sourcing that.
There is some of that [tension]. When we open source a technology, there’s a set of people in the company that are responsible for making sure that none of LinkedIn’s very specific, proprietary things are being open-sourced.
Beyond the code itself, how does LinkedIn view its role within the open source community?
We welcome contributors to our projects. We don’t consider open source just as a branding thing. This is one of the things that I mentioned earlier on. We want to let the world know that we’re building world-class technology, and that’s one dimension, but one of our intents is to attract contributions from other developers who see the same types of problems that we do from a different angle.
We host conferences around our open source technologies, such as Kafka and Voldemort, and we regularly have meetups here at LinkedIn about these technologies. That allows us to meet people from various companies around the industry.
We really enjoy that. We want to be part of the community of developers in the Internet industry.
Parts two and three of the series will focus on LinkedIn’s mobile strategy and its publishing platform.
Update: You can read part two here
Image credit: Getty Images