This article was published on March 17, 2014

Mozilla Science Lab, GitHub and Figshare team up to fix the citation of code in academia

Story by Nick Summers

Nick Summers is a technology journalist for The Next Web. He writes on all sorts of topics, although he has a passion for gadgets, apps and video games in particular. You can reach him on Twitter, circle him on Google+ and connect with him on LinkedIn.

Academia has a problem. Research is becoming increasingly computational and data-driven, but the traditional paper and scientific journal have barely changed to accommodate this growing form of analysis. The current referencing structure makes it difficult for anyone to reproduce the results in a paper, either to check the findings or to build upon them. In addition, scientists who write code as middle-author contributions struggle to get the credit they deserve.

The Mozilla Science Lab, GitHub and Figshare – a repository where academics can upload, share and cite their research materials – are starting to tackle the problem. The trio have developed a system so researchers can easily sync their GitHub releases with a Figshare account. It automatically creates a Digital Object Identifier (DOI), which other people can then reference and check.

The advantages over simply linking to GitHub are twofold. For one, the DOI points to the synced release on Figshare, so the data won’t be affected if the original GitHub repository is updated. The page on GitHub is still accessible for anyone who wants to review the project’s development, but this approach ensures the code referenced in a paper can be easily reviewed.

For another, the DOI is a persistent link. Broken links are a growing problem for academia, as link structures are changed and online content is edited. “If your persistent link is pointing to something on Figshare, which is ‘this GitHub repository at that version, at that release,’ then even if the GitHub repository changes or Figshare changes its link structure, that DOI will always point to that object,” Mark Hahnel, founder of Figshare, said.

Academics will be able to use the GitHub and Figshare integration in two different ways. A new area of the Figshare site will give researchers the ability to pull releases in from a connected account. Mozilla, meanwhile, is offering a bookmarklet or browser extension which can be activated at any GitHub repository. Select it to build a DOI and add any other information needed for an accurate, well-formed citation; the release is then pushed to Figshare, with a redirect going back and forth between the two sites.


“While it is possible to cite code and influence that in your paper otherwise, having a more explicit means of weaving that into a system that – for hundreds of years has been trusted in terms of recognizing it as a citable object – is something that plays on the social characteristics of research and credit and the understandings of those systems, rather than being anything too laborious on the technical side,” Kaitlin Thaney, director for the Mozilla Science Lab, said.

In short, the partners have created a brokerage point between GitHub and Figshare, using their APIs and developing systems so that any GitHub repository can be processed and received as a package.
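To make the idea of a pinned, citable release concrete, here is a minimal Python sketch. It is not the partners' actual code: the helper names (`ReleaseRef`, `parse_repo_url`, `format_citation`), the example repository and the DOI are all hypothetical, and the citation layout is just one plausible choice. The only outside detail it relies on is the shape of GitHub's public REST API URLs for a tagged release and its source archive. It makes no network calls and does not touch the Figshare API.

```python
from dataclasses import dataclass
from typing import List
from urllib.parse import urlparse


@dataclass(frozen=True)
class ReleaseRef:
    """A GitHub repository pinned to one release tag (hypothetical helper)."""
    owner: str
    repo: str
    tag: str

    @property
    def release_api_url(self) -> str:
        # GitHub REST API endpoint describing a release by its tag name.
        return (f"https://api.github.com/repos/{self.owner}/{self.repo}"
                f"/releases/tags/{self.tag}")

    @property
    def tarball_url(self) -> str:
        # GitHub REST API endpoint for a source archive at a given ref.
        return (f"https://api.github.com/repos/{self.owner}/{self.repo}"
                f"/tarball/{self.tag}")


def parse_repo_url(url: str, tag: str) -> ReleaseRef:
    """Split a github.com repository URL into owner and repository name."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"not a repository URL: {url!r}")
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[:-4]
    return ReleaseRef(owner, repo, tag)


def format_citation(authors: List[str], year: int, title: str,
                    ref: ReleaseRef, doi: str) -> str:
    """Build a citation string; the layout here is an assumption."""
    return (f"{', '.join(authors)} ({year}). {title} ({ref.tag}). "
            f"https://doi.org/{doi}")


if __name__ == "__main__":
    # Example values are placeholders, not a real project or DOI.
    ref = parse_repo_url("https://github.com/example-lab/analysis-code", "v1.0")
    print(ref.tarball_url)
    print(format_citation(["A. Researcher"], 2014, "Analysis code",
                          ref, "10.0000/example.123"))
```

The point of pinning the tag is the one made above: the archive and the DOI refer to a single release, so later commits to the repository do not change what a paper cites.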

Mozilla will offer the browser extensions through a dedicated webpage, which will also have a text field where academics can manually submit a link to test out the integration. The nonprofit also plans to offer documentation on the site for anyone who wants to track the development of the project.


The collaboration between the Mozilla Science Lab, GitHub and Figshare is far from over. Hahnel described the new integration as an “open, iterative process,” and said it’s just the first example of how academia can cope with the growing influence of code and scripts.

While GitHub is one of the most prolific platforms for hosting and sharing data, it’s possible that other coding repositories could implement the same system with an open API. “It would be fantastic – and we’re starting to see little bits of it – for other repositories to follow suit and use some of the code we’ve made available, and some of the methodologies and best practise, to provide users with an increasing amount of choice,” Thaney said. “The more people who are using this, the better.”

Regardless, the ability to reference code effectively is a significant development. If the syncing helps academics to verify findings, improve research methods and expand samples, that’s a huge victory. “Having an archived version of a snapshot, of a moment in time for a code base that could be used by researchers to better build on the work that’s been published is a net win,” she added.

Read More: Code as a research object: a new project / Working with Github and Mozilla to enable ‘Code as a Research Output’

Image Credit: Joe Raedle/Getty Images
