Wikimedia will replace its search with Elasticsearch for beta users in February, for all users in March or April

Wikimedia today announced it is replacing its search feature with one provided by enterprise data search and analytics startup Elasticsearch. The non-profit will be rolling out new search infrastructure to all of its wikis, starting with beta users in February and then all users in March or April.

All Wikimedia sites currently use a home-grown search system called lucene-search-2, based on Apache Lucene, that was written primarily by volunteer Robert Stojnić. While Wikimedia has been able to scale it very well for the past eight years or so, it became clear in early 2013 that a replacement was needed, especially since Stojnić was no longer around to keep it running smoothly.

Here's a screenshot of the new search box:


Wikimedia explained that it wanted to stop maintaining a special-purpose open-source search system when two very good general-purpose open-source search systems are already available: Solr and Elasticsearch, both also based on Lucene. The organization tried integrating both into MediaWiki but eventually picked Elasticsearch for the following reasons:

  • Elasticsearch's reference manual and contribution documentation promised an easy start and a pleasant time getting changes upstream when needed.
  • Elasticsearch's highly expressive search API lets Wikimedia search any way it needs to, including via ad-hoc queries, and gives the organization confidence that the search can be expanded.
  • Elasticsearch's index maintenance API lets Wikimedia maintain the index right from its MediaWiki extension, so it's easier to deploy and test, and should be easier for MediaWiki users outside Wikimedia to use. At the time of the choice, Solr's schema API was read-only.
  • Rack awareness, automatic shard rebalancing, statistics exposed over HTTP, a preference for JSON and YAML over XML, and first-party Debian packages were also nice.
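
To make the search API and index maintenance points more concrete, here is a minimal sketch of the kind of JSON request bodies Elasticsearch accepts, built in Python. It is an illustration only, not code from Wikimedia's extension: the field names (`title`, `text`, `namespace`), the helper functions, and the shard and replica counts are all hypothetical, and the query shape follows the query DSL in recent Elasticsearch releases rather than the version Wikimedia evaluated.

```python
import json


def build_search_body(terms, namespace=0, size=10):
    """Build a full-text query restricted to one namespace.

    The field names "text" and "namespace" are assumptions made
    for this example, not Wikimedia's actual index schema.
    """
    return {
        "size": size,
        "query": {
            "bool": {
                # Scored full-text match on the page body.
                "must": [{"match": {"text": terms}}],
                # Unscored exact filter on the namespace.
                "filter": [{"term": {"namespace": namespace}}],
            }
        },
    }


def build_index_body(shards=4, replicas=1):
    """Build the body for creating an index over HTTP.

    Shard and replica counts are placeholder values.
    """
    return {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        },
        "mappings": {
            "properties": {
                "title": {"type": "text"},
                "text": {"type": "text"},
                "namespace": {"type": "integer"},
            }
        },
    }


if __name__ == "__main__":
    # Both bodies are plain JSON, ready to send in an HTTP request.
    print(json.dumps(build_search_body("apache lucene"), indent=2))
    print(json.dumps(build_index_body(), indent=2))
```

Because both the query and the index definition are ordinary JSON documents sent over HTTP, an application can create, configure, and search an index from its own code, which is the property the third bullet credits for easier deployment and testing.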

Wikimedia has written a new extension called CirrusSearch to provide the integration to MediaWiki. It is mostly backwards-compatible with the current search, although it can't handle text inside templates. That being said, updates are reflected in search results usually within seconds for single page edits, pages marked as higher or lower quality are reflected in search results, and a few new “expert” options have been added (check out the full documentation here).
