Last week, GitHub launched a new AI-powered tool called Copilot that’s meant to help developers out by suggesting snippets of code automatically.
The tool was developed in conjunction with OpenAI by training the system on publicly available source code from many different projects. On paper, this looks like any other AI project’s training method. But several people took to Twitter to criticize GitHub’s move, calling it a copyright violation.
“I’m leaving GitHub because copilot uses my OpenSource code for training” is such an odd move. Anyone can fork it to there and GitHub can feed OpenSource code from anywhere to it and US copyright law permits this. I’m also pretty certain we should not strengthen copyright laws …
— Armin Ronacher (@mitsuhiko) July 3, 2021
If @GitHub (Microsoft) truly believes copilot isn't infringing on anyone's work, I want to offer them a chance to prove it: I'll donate $50k to a charity of their choice (or @EFF if we can't agree) if they release a Copilot version trained solely on Windows kernel source. 1/ https://t.co/WMWD6FTcR2
— Jake Williams (@MalwareJake) July 3, 2021
However, Julia Reda — researcher and former Member of the European Parliament — has argued on her blog that GitHub’s tool doesn’t violate copyrights.
She also argued that text and data mining does not violate copyright law. Plus, machine-generated work (in this case, the code snippets generated by Copilot) can’t be called a derivative work and isn’t covered by intellectual property rules:
On the other hand, the argument that the outputs of GitHub Copilot are derivative works of the training data is based on the assumption that a machine can produce works. This assumption is wrong and counterproductive. Copyright law has only ever applied to intellectual creations – where there is no creator, there is no work. This means that machine-generated code like that of GitHub Copilot is not a work under copyright law at all, so it is not a derivative work either.
There’s a lot of debate going on around the world about tweaking IP policies for machine-generated work, but it’ll take a while before these arguments are put to bed. In the meantime, you’ll just have to keep tweeting out your frustrations.