In 2016, Microsoft released a chatbot called Tay that fed off people’s replies to it. Within hours, the bot turned racist and the company had to pull it down. The incident remains a classic lesson in why it’s a bad idea to train an AI on unfiltered social media.
However, data scientists now have a chance to make their AI aware of this kind of bias. Twitter announced last night that it’s opening up its entire archive of public tweets for researchers to use in their data projects. Until now, this version of the API was available only to premium customers.
Last April, the social network made tweets related to COVID-19 available to researchers. Later, in October 2020, it opened up its full archive in private beta.
The new public version of the API comes with a few extra privileges for researchers. Apart from the full history of public conversations, they get a higher level of developer platform access for free, a 10 million tweet volume cap, and more precise filtering.
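To make this concrete, here is a minimal sketch of how a researcher might assemble a call to Twitter’s v2 full-archive search endpoint, which backs this access tier. The query string, token, and field list below are illustrative placeholders, not values from the announcement.

```python
# Sketch: building a request to Twitter's v2 full-archive search endpoint.
# The bearer token and query here are placeholders for illustration only.
SEARCH_URL = "https://api.twitter.com/2/tweets/search/all"

def build_search_request(query, bearer_token, max_results=100):
    """Return the (url, headers, params) triple for a full-archive search call."""
    headers = {"Authorization": f"Bearer {bearer_token}"}
    params = {
        "query": query,                     # supports filtering operators
        "max_results": max_results,
        "tweet.fields": "created_at,lang",  # extra metadata per tweet
    }
    return SEARCH_URL, headers, params

# Example: English tweets mentioning abuse, excluding retweets.
url, headers, params = build_search_request(
    "(abuse OR harassment) lang:en -is:retweet", "YOUR_BEARER_TOKEN"
)
```

In practice the returned triple would be passed to an HTTP client, with pagination handled via the `next_token` the API returns; that plumbing is omitted here.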
Currently, the platform does not allow independent researchers to sign up for the program, and qualified researchers must use this access for strictly non-commercial purposes.
Still, it’s a great opportunity for researchers to filter out bias in different AI models using Twitter’s ‘toxicity.’ People have left the platform because of the abuse they’ve received, so while it might not be the most pleasant site to be on, it can show an AI exactly what tonality and language to avoid.
Recently, a study revealed that OpenAI’s celebrated GPT-3 model has a ‘constant and creative’ anti-Muslim bias. GPT-3 is trained largely on public data. While the company might not have used tweets for training, similar public internet data tends to carry various kinds of biases. Twitter’s new API platform lets researchers detect such inclinations in their models and weed them out to create fairer AI.
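The kind of probe behind such findings can be sketched in a few lines: generate completions for templates naming different groups, then compare how often each group’s completions contain violent language. Everything below is a toy illustration; the word list and example completions are made up, and a real study would plug in an actual model’s output.

```python
# Toy sketch of a group-bias probe: compare the rate of violence-related
# words across completions generated for different group templates.
# The word list and sample completions are illustrative, not real data.
VIOLENT_WORDS = {"attack", "bomb", "shot", "killed"}

def violence_rate(completions):
    """Fraction of completions containing a violence-related word."""
    hits = sum(any(w in c.lower() for w in VIOLENT_WORDS) for c in completions)
    return hits / len(completions)

# With a real model, each list would hold completions of a prompt like
# "Two <group> walked into a"; here we use hand-written toy strings.
group_a = ["... were shot at the door", "... ordered tea"]
group_b = ["... ordered tea", "... sat down to talk"]
print(violence_rate(group_a), violence_rate(group_b))
```

A large, persistent gap between the two rates over many sampled completions is the signal the cited study reported; real evaluations also control for prompt wording and sample size.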
You can get more information about Twitter’s API for research here.