How Hive is raising a massive army to build AI-training datasets

How Hive is raising a massive army to build AI-training datasets

The thing people forget about artificial intelligence is that it’s exactly that — artificial. Behind Facebook’s flesh-spotting algorithms and Mobileye’s pedestrian-avoiding machines are vast, complicated datasets, which are built and labeled by a vast human workforce.

Have you ever wondered where that workforce comes from? I assumed that the likes of Tesla and Google owned vast offices in Manila and Mumbai where, under the harsh glare of halogen lighting, an workforce of digital workers painstakingly label content for a few dollars a day.

The reality is that they’re just as likely working for Hive — perhaps the most important AI startup you’ve never heard of.

Based in Silicon Valley, Hive takes Amazon’s Mechanical Turk business model, and applies it to . A workforce of 700,000 people, many working for supplemental income to their existing jobs, label and annotate data to train AI models, earning a few pennies at a time. Hive claims to have paid out tens of thousands of dollars a week to their workers.

“Hive Data is built specifically for humans to label unstructured visual and audio data,” said Hive CEO Kevin Guo, speaking to TNW over email.

He explained that its workforce — which Hive calls “contributors” — can work whenever their schedules permit, and are presented with a limitless supply of tasks to do.

“Our Hive Data contributors work from home on their own hours – no set hours or minimum hours required,” he said.

“When contributors go on the platform, they see a list of all tasks available. There are usually about 20 tasks available at any time, and each task on the platform has its own onboarding lessons and practice exams that must be passed before tasks can be completed. One of our core principles that we’ve had from the beginning is always guaranteeing a limitless supply of tasks. This means that anyone that wants to do tasks will be able to do so without limit,” Guo added.

Accuracy is a concern whenever you’re dealing with a workforce that’s, by design, hyper-casual and transient. While some platforms, most notably Amazon’s Mechanical Turk, ensure discipline by letting task-creators reject and accept submissions, Hive takes a different approach.

“A common misconception is that the accuracy of our results is lower since it’s not truly managed in a physical office like other companies. We actually have proven that our accuracy is usually quite a bit higher due to the fact that we can have many workers independently do a task and agree on a result (consensus), which is only possible due to the scale of our workforce,” Guo told me.

What separates Hive from other players in the AI space is that it controls its entire technological stack, from hardware to software, to the workforce that labels the data required for training its algorithms.

Because Hive is able to train its deep-learning algorithms on datasets that are both expansive and accurate, Guo claims that they often outperform human workers. This speed and precision is essential when applying AI to areas that are inherently risky, like driverless cars.

A common trope is that AI will inevitably undermine the bedrock of the middle-class by automating away jobs that were once considered stable and safe. While this is almost certainly true, it’s ironic that most advances in AI technology are driven by human-powered accomplishments.

At one side, you’ve got the brainy PHDs diligently cranking away from an airy Silicon Valley lab. And on the other side, you’ve got a legion of people labeling pictures of lawnmowers from their front rooms in order to make a bit of extra beer money.

Life is strange like that.

Read next: Mixer may have found the secret sauce for paying streamers full-time