A new report from Reuters has revealed how contract workers for Facebook and Instagram are looking at your private posts to help the social media platforms train their AI systems.
Reuters suggests contractors are annotating the posts based on five “dimensions”: the content of the post (selfie, food, landmarks), the occasion of the post (life event, activity), expressive elements (opinion, feelings), the author’s intent (planning an event, making a joke, inspiring others) and the post’s setting (home, school, work, outdoors).
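To make the five dimensions concrete, here is a minimal sketch of how one annotated post could be represented. It is only an illustration: the `PostAnnotation` class, the `DIMENSIONS` table, and the field names are hypothetical, and the label values are just the examples cited by Reuters, not Facebook's actual (non-public) label sets.

```python
from dataclasses import dataclass

# Example labels taken from the Reuters description; the real label sets
# are not public, so this table is an assumption made for illustration.
DIMENSIONS = {
    "content": {"selfie", "food", "landmarks"},
    "occasion": {"life event", "activity"},
    "expressive_elements": {"opinion", "feelings"},
    "intent": {"plan an event", "make a joke", "inspire others"},
    "setting": {"home", "school", "work", "outdoors"},
}


@dataclass(frozen=True)
class PostAnnotation:
    """One reviewer's labels for a single post, one label per dimension."""

    content: str
    occasion: str
    expressive_elements: str
    intent: str
    setting: str

    def __post_init__(self):
        # Reject any label that is not in the example set for its dimension.
        for dimension, allowed in DIMENSIONS.items():
            value = getattr(self, dimension)
            if value not in allowed:
                raise ValueError(
                    f"{value!r} is not a known label for {dimension}"
                )


# A reviewer labeling a birthday-dinner photo might produce:
example = PostAnnotation(
    content="food",
    occasion="life event",
    expressive_elements="feelings",
    intent="inspire others",
    setting="home",
)
```

Restricting each dimension to a fixed set of labels is what would make the output usable as training data: every post ends up described in the same vocabulary.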
The content being captured also includes screenshots and posts with comments, at times even including user names and other sensitive information.
To sift through those posts, Facebook has tapped Indian outsourcing company Wipro for help, recruiting as many as 260 of its employees to annotate Facebook status updates and Instagram posts.
According to the report, Wipro and the social media giant have been working together since 2014.
As tech companies increasingly switch to machine learning and AI to proactively serve their customers’ needs, there’s an added incentive to better understand the different kinds of content uploaded to their platforms.
AI algorithms are known not only for their thirst for big data, but also for their struggles with the intricacies of human language and with a variety of recognition tasks.
For example, while it’s easy for us to understand that both “New York City” and “NYC” refer to the same place, AI algorithms might interpret them as two separate terms – unless explicitly instructed not to.
The task only gets more complex when the algorithm needs to take into account different languages, and a range of content like photos, videos, and links.
This is where data annotation comes in. Content labeling provides additional information about the data sample. This, in turn, improves the effectiveness of machine learning algorithms – whether it be natural language processing, machine translation, or image, object, and speech recognition.
By letting human reviewers label the associated information – for example, annotating “NYC” as the city “New York City” as opposed to something meaningless and random – this supervised learning approach helps the system better understand your requests, and improves the service for everyone.
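The “NYC” example can be sketched in a few lines. This is a deliberately simplified stand-in: a plain lookup table of human-provided labels rather than a trained model, and the names `ANNOTATIONS`, `canonicalize`, and `same_entity` are hypothetical. In a real supervised learning setup, labels like these would be the training data from which a model learns to generalize.

```python
# Hypothetical human-provided annotations: surface form -> canonical entity.
ANNOTATIONS = {
    "nyc": "New York City",
    "new york city": "New York City",
}


def canonicalize(mention: str) -> str:
    """Return the annotated entity, or the mention unchanged if unlabeled."""
    # Lowercase and collapse whitespace so "  NYC " and "nyc" match.
    key = " ".join(mention.lower().split())
    return ANNOTATIONS.get(key, mention)


def same_entity(a: str, b: str) -> bool:
    """True when two mentions resolve to the same canonical entity."""
    return canonicalize(a) == canonicalize(b)
```

Without the annotations, `same_entity("NYC", "New York City")` would compare two different strings and fail; with them, both mentions resolve to the same place.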
The practice is not necessarily nefarious. Last month, Bloomberg wrote about how thousands of Amazon employees listen to voice recordings captured in Echo speakers, transcribing and annotating them to improve the Alexa digital assistant that powers the smart speakers.
But with AI technology establishing an ever more pervasive foothold in our daily lives, the lack of transparency in Facebook’s privacy policy raises significant concerns – especially considering that most users remain unaware that such algorithms exist.
Even more importantly, users are not given an option to opt out of these data labeling efforts, posing larger questions about user consent. Another issue is that there is hardly any mention of why (and for how long) such data might be stored, and whether there is any danger of employee misuse.
Facebook says it has 200 such content-labeling projects globally, employing thousands of people in total. Reuters also quoted an anonymous employee working for Cognizant Technology Solutions Corp, who said “he and at least 500 colleagues look for sensitive topics or profane language in Facebook videos.”
Back in February, The Verge’s Casey Newton published an investigative report detailing the crippling mental toll that contract workers who moderate content on Facebook deal with on a daily basis.
After watching hundreds of videos depicting emotionally taxing subject matter (sometimes violent content, sometimes pornography) on the social media platform, some Cognizant employees reportedly developed PTSD-like symptoms on the job.
Facebook, for its part, has confirmed the details of the report, adding that its legal and privacy teams approve all data-labeling efforts. It also said it recently introduced an auditing system “to ensure that privacy expectations are being followed and parameters in place are working as expected.”
With the social network already facing a number of regulatory challenges across the world for its privacy missteps involving user data, the timing of the revelations couldn’t have been more unfortunate.