Wikipedia is building an open-source search engine

There have been rumblings in the Wikimedia community since last year that a better search engine is on the way and it looks like that’s true – but not everyone’s happy.

In a $250,000 grant document from the Knight Foundation for media innovation, Wikipedia owner Wikimedia describes:

Knowledge Engine by Wikipedia will democratize the discovery of media, news and information – it will make the Internet’s most relevant information more accessible and openly curated, and it will create an open data engine that’s completely free of commercial interests.

Today, commercial search engines dominate search-engine use of the Internet, and they’re employing proprietary technologies to consolidate channels of access to the Internet’s knowledge and information.

Their algorithms obscure the way the Internet’s information is collected and displayed. … Knowledge Engine by Wikipedia will be the Internet’s first transparent search engine, and the first one originated by the Wikimedia Foundation.

In the spirit of transparency, having been called out by the Wikipedia editor community for not involving them more in the grant process, the organization has tried to be more clear about what’s going on here:

We are not building Google… The Wikimedia movement’s vision is to make the sum of all human knowledge freely available to everyone. Wikipedia is our largest and most well-known project, but there are many other projects like Wikimedia Commons and Wikidata that move us towards realising our vision. These projects have millions of users every month! So, yes, if we can make a search system that’s good enough and meets the needs of our users, people absolutely would use it.

Wikimedia has a team of search engineers who are initially tasked with creating better links between all of the assets it owns so it can become a truly searchable open knowledge hub.

Wikimedia says it is adding new data sources, starting with OpenStreetMaps, as well as improving its search beyond just text, to include contextual items that wouldn’t have been easily found before.

This could include a new ‘public curation of relevance,’ versus something like keywords and links, driven by Wikimedia’s Wikidata project. But is also believed to include new, scaleable machine-driven tools to ensure more search queries are answerable.

The initial grant is for a ‘discovery’ phase, with three more planned following this if all goes well, although $250,000 is probably a pretty meager sum to create something of Google’s abilities but for the public good.

Wikipedia is also exploring whether it can be pre-loaded on new handsets in order to help it reach new audiences, which also might give it a new income stream.

In an op-ed for Wikipedia’s community newsletter The Signpost, editorial board member Andreas Kolbe remains concerned.

The march of technological progress can’t be halted. And indeed, many of us find that progress inherently exciting. The idea of positioning the Wikimedia community as the central engine driving many different types of information products and services – or at least a major component of such an engine – is likely to appeal to many Wikimedians. It would certainly keep Wikimedia relevant.

Yet there is also much to be disturbed about. Omnipresent snippets, delivered to a potential audience of billions, amplify the risk of manipulation, creating an information infrastructure that seems more vulnerable to activist influence, or indeed Gleichschaltung [totalitarian control], than conventional media. It has been established that even today, search engines would have the power to sway elections if they put their mind to it.

And there are other concerns: to what extent do Silicon Valley-facing developments like those described here, efforts to build a technologically slicker product and achieve greater market penetration, detract from other efforts that volunteers might consider more relevant to the core goal of writing an encyclopedia?

In an alternative universe, the Wikimedia Foundation might put equal focus on supporting and expanding such efforts, believing that a quality product will always have a readership. Ubiquity is not the same as quality; Gresham’s law could easily be applied to the world of information as well.

Wikipedia founder Jimmy Wales had his fingers burnt in the past with his attempt to create a for-profit search engine to try to rival Google.

But if the wider Wikimedia organization ends up becoming a proper open-source search engine for all the public information it curates, perhaps only paralleled in expansiveness by something like the Internet Archive, that will surely be a truly valuable open Web asset.

The organization is currently conducting a strategy consultation with the community, which closes today, where its priorities and ongoing issues with the transparency of decision-making, will no doubt be held to light. You certainly wouldn’t get that at Google.

Update: We reached out earlier to get a comment from Wikipedia and a spokesperson has now confirmed: “We do not have plans to build a new search engine: our objective is to improve people’s ability to find content across Wikipedia and the Wikimedia projects.”

This article has been updated to reflect the fact that it was Jimmy Wales’ Wikia venture and not Wikimedia that had previously had its metaphorical fingers burnt in building a Google search engine rival. There’s no relation between the two organisations, aside from Wales.

➤ Knowledge engine grant agreement.pdf [Wikimedia]

Story by Kirsty Styles

Reporter

Kirsty Styles is a journalist who lives in Hackney. She was previously editor at Tech City News and is now a reporter at The Next Web. She l (show all) Kirsty Styles is a journalist who lives in Hackney. She was previously editor at Tech City News and is now a reporter at The Next Web. She loves tech for good, cleantech, edtech, assistive tech, politech (?), diversity in tech.

Get the TNW newsletter

Get the most important tech news in your inbox each week.

Also tagged with

Wikipedia

Wikipedia is building an open-source search engine – but it’s a little confusing

Get the TNW newsletter

Also tagged with

GM is building its own in-vehicle AI assistant because Google’s Gemini cannot access what the car knows

Oracle put its cloud rival’s AI inside its business apps, and the stock jumped 8%

AI search is fragmenting, ageing and filling with ads

Wiz’s AI bug-hunter helped find a master key to every database in Azure Cosmos DB

Google’s new AI can now walk a humanoid across the room and tidy up, feet to fingertips