“Twitter is real-time, so our search engine must be too,” says Twitter on its engineering blog today, revealing that since the launch of the new Twitter.com, the service’s search functionality (which, as we pointed out earlier, has a few new features) has actually been re-engineered from the ground up.
Twitter has finally retired the old Summize architecture (it started work on the new architecture six months ago) and has instead built the new search on the open source Lucene search library, adding some enhancements of its own, including:
- significantly improved garbage collection performance
- lock-free data structures and algorithms
- posting lists that can be traversed in reverse order
- efficient early query termination
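The blog post doesn’t include code, but one plausible reading of the last two items is that they work together: if new tweets are appended to a posting list in arrival order, walking the list backwards returns the newest tweets first, and a query can stop as soon as it has collected enough results. Below is a minimal Python sketch of that idea for a two-term AND query. It assumes document IDs increase over time, and `PostingList` and `search_newest` are hypothetical names used for illustration, not Twitter’s or Lucene’s actual API.

```python
from typing import Iterator, List


class PostingList:
    """Hypothetical append-only posting list. Doc IDs are appended in
    strictly increasing order, so larger IDs mean newer tweets."""

    def __init__(self) -> None:
        self._ids: List[int] = []

    def append(self, doc_id: int) -> None:
        # Keep IDs monotonic so that reverse traversal means newest-first.
        if self._ids and doc_id <= self._ids[-1]:
            raise ValueError("doc IDs must be strictly increasing")
        self._ids.append(doc_id)

    def reverse_iter(self) -> Iterator[int]:
        # Walk from the tail (newest) back to the head (oldest).
        for i in range(len(self._ids) - 1, -1, -1):
            yield self._ids[i]


def search_newest(a: PostingList, b: PostingList, k: int) -> List[int]:
    """Return up to k doc IDs found in both lists, newest first,
    stopping as soon as k hits are collected (early termination)."""
    hits: List[int] = []
    it_a, it_b = a.reverse_iter(), b.reverse_iter()
    x, y = next(it_a, None), next(it_b, None)
    while x is not None and y is not None and len(hits) < k:
        if x == y:
            hits.append(x)
            x, y = next(it_a, None), next(it_b, None)
        elif x > y:
            # Everything left in b is <= y < x, so x cannot match: skip it.
            x = next(it_a, None)
        else:
            # Symmetric case: y is newer than anything left in a.
            y = next(it_b, None)
    return hits


# Usage: two toy posting lists for the terms "twitter" and "lucene".
twitter = PostingList()
lucene = PostingList()
for doc in [1, 3, 4, 7, 9, 12]:
    twitter.append(doc)
for doc in [2, 3, 7, 8, 12]:
    lucene.append(doc)
print(search_newest(twitter, lucene, k=2))  # [12, 7]
```

Because the scan starts at the newest entries, the loop never touches doc 3 or anything older once two hits are found; on a real index with millions of older postings, that skipped tail is where early termination pays off.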
Twitter also detailed the volume of tweets it hopes to handle, saying that the architecture is currently running at only 5% of capacity and that the new indexer can index 50 times as many tweets per second as the system currently receives:
“Our demands on the new system are immense: With over 1,000 TPS (Tweets/sec) and 12,000 QPS (queries/sec) = over 1 billion queries per day (!) we already put a very high load on our machines. As we want the new system to last for several years, the goal was to support at least an order of magnitude more load.”
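The quoted numbers hold up: a day has 86,400 seconds, so 12,000 queries per second comes to just over a billion queries per day, and an indexer handling 50 times today’s tweet rate of over 1,000 TPS could take in at least 50,000 tweets per second.

$$
12{,}000\ \tfrac{\text{queries}}{\text{s}} \times 86{,}400\ \tfrac{\text{s}}{\text{day}} = 1{,}036{,}800{,}000\ \tfrac{\text{queries}}{\text{day}}
\qquad
1{,}000\ \text{TPS} \times 50 = 50{,}000\ \text{TPS}
$$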
Twitter also says users should notice that the index is larger and twice as long (we’re assuming that means it covers twice the time span), yet searches are no slower. Twitter adds that “the new system is extremely versatile and extensible, which will allow us to build cool new features faster and better.” Perhaps we’ll see a new Search API? (Update: see below; it’s already included.)
So Twitter has finally answered our question about what it has been doing with search.
(Note: It’s a little unclear from the blog post whether the new search is also live on the old Twitter.com; we’ll email Twitter to ask.)
Update: Twitter got back to us and says:
“The new backend is being used for all Twitter search, including integrated search on new and old Twitter and also the external search API.”