Today, Twitter posted an article on its engineering blog explaining how Blobstore, its in-house photo storage system, works. Twitter built the new storage product to replace Photobucket, which it partnered with in 2011 when it began allowing native photo attachments.
Blobstore was turned on in September, and Twitter now uses it to serve photos, including its new filtered ones, to all of its users. We confirmed with Twitter that this is now the only system it uses to store images locally.
Twitter says that it had several goals in mind when it started to build Blobstore, including that it be low-cost, high-performance and easy to operate. It wanted to serve images in the ‘low tens of milliseconds’ at a throughput of hundreds of thousands of requests per second. And the system has to keep pace with Twitter’s rapid growth.
Twitter Engineering Director of Core Storage and Database Engineering Armond Bigian explains how it works:
When a user tweets a photo, we send the photo off to one of a set of Blobstore front-end servers. The front-end understands where a given photo needs to be written, and forwards it on to the servers responsible for actually storing the data. These storage servers, which we call storage nodes, write the photo to a disk and then inform a Metadata store that the image has been written and instruct it to record the information required to retrieve the photo. This Metadata store, which is a non-relational key-value store cluster with automatic multi-DC synchronization capabilities, spans across all of Twitter’s data centers providing a consistent view of the data that is in Blobstore.
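The write path Bigian describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Twitter's code: the class names (`FrontEnd`, `StorageNode`, `MetadataStore`), the hash-based node selection, and the in-memory dictionaries standing in for disks and for the key-value cluster are all hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class MetadataStore:
    """Hypothetical stand-in for the key-value metadata cluster."""
    records: dict = field(default_factory=dict)

    def record(self, blob_id: str, node_name: str) -> None:
        # Remember which storage node holds the blob.
        self.records[blob_id] = node_name

    def locate(self, blob_id: str) -> str:
        return self.records[blob_id]


@dataclass
class StorageNode:
    """Hypothetical storage node; a dict stands in for the disk."""
    name: str
    disk: dict = field(default_factory=dict)

    def write(self, blob_id: str, data: bytes, metadata: MetadataStore) -> None:
        # Write the photo first, then tell the metadata store where it lives.
        self.disk[blob_id] = data
        metadata.record(blob_id, self.name)


class FrontEnd:
    """Hypothetical front-end that decides where a photo is written."""

    def __init__(self, nodes: list, metadata: MetadataStore) -> None:
        self.nodes = nodes
        self.metadata = metadata

    def _pick_node(self, blob_id: str) -> StorageNode:
        # Assumption: a simple hash of the ID picks the node.
        digest = hashlib.sha256(blob_id.encode()).digest()
        return self.nodes[int.from_bytes(digest[:8], "big") % len(self.nodes)]

    def put(self, blob_id: str, data: bytes) -> str:
        node = self._pick_node(blob_id)
        node.write(blob_id, data, self.metadata)
        return node.name

    def get(self, blob_id: str) -> bytes:
        # Reads consult the metadata store to find the right node.
        name = self.metadata.locate(blob_id)
        node = next(n for n in self.nodes if n.name == name)
        return node.disk[blob_id]
```

Calling `FrontEnd.put("photo-1", data)` writes to exactly one node and records its name, and `FrontEnd.get("photo-1")` then finds the photo through the metadata lookup rather than by asking every node.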
The blob manager component coordinates this traffic, directing files and the mapping services. It interfaces with Twitter’s queue server, Kestrel, to handle image replication and data integrity. Twitter uses ‘virtual buckets’ to store the images across various server stacks, which are built to be redundant through a library called libcrunch. Libcrunch balances the need to keep the servers running at full speed with the need to make them redundant.
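To make the ‘virtual bucket’ idea concrete, here is a minimal sketch of how a photo ID could map to a bucket, and a bucket to a set of replica servers. It is not libcrunch's actual placement algorithm, which the article does not describe. The bucket count, the replica count, the function names and the round-robin placement are all assumptions chosen for illustration.

```python
import hashlib

NUM_BUCKETS = 64  # assumed number of virtual buckets
REPLICAS = 3      # assumed number of copies per bucket


def bucket_for(blob_id: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Hash a blob ID to a virtual bucket (hypothetical scheme)."""
    digest = hashlib.sha256(blob_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def build_bucket_map(nodes, num_buckets=NUM_BUCKETS, replicas=REPLICAS):
    """Assign each bucket to `replicas` distinct nodes, round-robin."""
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        bucket: [nodes[(bucket + i) % len(nodes)] for i in range(replicas)]
        for bucket in range(num_buckets)
    }


def nodes_for(blob_id: str, bucket_map: dict) -> list:
    """Return the nodes that should hold copies of this blob."""
    return bucket_map[bucket_for(blob_id, len(bucket_map))]
```

Because photos map to buckets rather than directly to machines, only the bucket map has to change when servers are added or fail. The trade-off the article attributes to libcrunch, speed versus redundancy, would live in how that map is built.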
You can read the whole post on Twitter’s engineering blog if you’re interested in the data work behind this sort of thing.
It is interesting that Twitter has taken over the entirety of its photo storage. Twitter began using Photobucket to power its photo feature in June of 2011. At the time, Photobucket CEO Tom Munro said that Twitter’s trust in Photobucket was validation of the product’s stability. “We are the largest dedicated photo hosting and serving site out there,” said Munro, “and we’ve been going at this for a long time. Twitter was looking for stability and reliability.”
Apparently, Twitter feels that it needs a more scalable and more reliable product that it can iterate on quickly. Hence the creation of Blobstore. Twitter has likely scaled this system like mad since it switched to allowing only its own photo storage service in its official apps.
The addition of photo filters to the apps and the recent emphasis on media sharing are no doubt making the build-out of Blobstore a huge priority.
Image Credit: Peter Macdiarmid/Getty Images