In 2016, Microsoft released a chatbot called Tay that fed off people’s replies to it. Within hours, the bot turned racist and the company had to pull it down. The incident remains a classic lesson in why it’s a bad idea to train an AI on unfiltered social media.
However, data scientists now have a chance to make their AI aware of this kind of bias. Twitter announced last night that it’s opening up its entire archive of public tweets for researchers to use in their data projects. Until now, this version of the API was available only to premium customers.
Last April, the social network made tweets related to COVID-19 available to researchers. Later, in October 2020, it opened up its full archive in private beta.
The new public version of the API comes with a few extra privileges for researchers. Apart from the full history of public conversations, they get a higher level of developer platform access for free, a 10 million tweet volume cap, and more precise filtering.
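To make this concrete, here is a minimal sketch of how a researcher might assemble a call to Twitter’s v2 full-archive search endpoint, which backs this access tier. The query string, token, and field list below are illustrative placeholders, not values from the announcement.

```python
# Sketch: building a request to Twitter's v2 full-archive search endpoint.
# The bearer token and query here are placeholders for illustration only.
SEARCH_URL = "https://api.twitter.com/2/tweets/search/all"

def build_search_request(query, bearer_token, max_results=100):
    """Return the (url, headers, params) triple for a full-archive search call."""
    headers = {"Authorization": f"Bearer {bearer_token}"}
    params = {
        "query": query,                     # supports filtering operators
        "max_results": max_results,
        "tweet.fields": "created_at,lang",  # extra metadata per tweet
    }
    return SEARCH_URL, headers, params

# Example: English tweets mentioning abuse, excluding retweets.
url, headers, params = build_search_request(
    "(abuse OR harassment) lang:en -is:retweet", "YOUR_BEARER_TOKEN"
)
```

In practice the returned triple would be passed to an HTTP client, with pagination handled via the `next_token` the API returns; that plumbing is omitted here.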
Currently, the platform does not allow independent researchers to sign up for the program, and qualified researchers must use this access for strictly non-commercial purposes.
Still, it’s a great opportunity for researchers to filter out bias in different AI models using Twitter’s ‘toxicity.’ People have left the platform because of the abuse they’ve received, so while it might not be the most pleasant site to be on, it can show an AI exactly what tonality and language to avoid.
Recently, a study revealed that OpenAI’s celebrated GPT-3 model has a ‘constant and creative’ anti-Muslim bias. GPT-3 is trained largely on public data. While the company might not have used tweets for training, similar public internet data tends to carry various kinds of biases. Twitter’s new API platform lets researchers detect such inclinations in their models and weed them out to create fairer AI.
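The kind of probe behind such findings can be sketched in a few lines: generate completions for templates naming different groups, then compare how often each group’s completions contain violent language. Everything below is a toy illustration; the word list and example completions are made up, and a real study would plug in an actual model’s output.

```python
# Toy sketch of a group-bias probe: compare the rate of violence-related
# words across completions generated for different group templates.
# The word list and sample completions are illustrative, not real data.
VIOLENT_WORDS = {"attack", "bomb", "shot", "killed"}

def violence_rate(completions):
    """Fraction of completions containing a violence-related word."""
    hits = sum(any(w in c.lower() for w in VIOLENT_WORDS) for c in completions)
    return hits / len(completions)

# With a real model, each list would hold completions of a prompt like
# "Two <group> walked into a"; here we use hand-written toy strings.
group_a = ["... were shot at the door", "... ordered tea"]
group_b = ["... ordered tea", "... sat down to talk"]
print(violence_rate(group_a), violence_rate(group_b))
```

A large, persistent gap between the two rates over many sampled completions is the signal the cited study reported; real evaluations also control for prompt wording and sample size.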
You can get more information about Twitter’s API for research here.