The Y Combinator and Khosla Ventures-backed GitLab is hugely popular among developers, who appreciate the fact that it’s as close as you’ll get to an all-in-one solution. The service includes everything a developer could possibly need over the course of a project. At the core is a Git-based version control system, which is paired with helpful extras, like an IDE and collaboration tools.
As a result, a lot of companies depend on it, ranging from smaller startups and individual developers to larger enterprises like Intel and Red Hat. GitLab is also a tool we use internally at The Next Web.
Yesterday, GitLab announced that it was doing some emergency database maintenance. Unfortunately, it didn’t go to plan.
We are performing emergency database maintenance, https://t.co/r11UmmDLDE will be taken offline
— GitLab.com Status (@gitlabstatus) January 31, 2017
we are experiencing issues with our production database and are working to recover
— GitLab.com Status (@gitlabstatus) February 1, 2017
We accidentally deleted production data and might have to restore from backup. Google Doc with live notes https://t.co/EVRbHzYlk8
— GitLab.com Status (@gitlabstatus) February 1, 2017
Whoops.
GitLab was quick to assure customers that no Git commits had been lost – just things like merge requests and issue posts. But given how detailed issue discussions often are, and the fact that people generally write them directly in the browser rather than in, say, a word processor where an independent copy would survive, the loss is arguably just as painful.
Right now, GitLab is trying to restore from its backups. Given the size of its database, this is a long process.
We are 60% done with the database copy
— GitLab.com Status (@gitlabstatus) February 1, 2017
Snapshots are taken every 24 hours, and the data loss occurred six hours after the last one was taken. As a result, six hours of data has been lost, perhaps permanently. Predictably, a lot of people are frustrated.
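That six-hour window illustrates a basic property of snapshot-based backups: the worst-case data loss is bounded by the time between snapshots. As a purely illustrative sketch (the database name, file paths, and schedule below are hypothetical assumptions, not GitLab’s actual configuration), here is how snapshot cadence might look as crontab entries for a PostgreSQL database:

```shell
# Hypothetical crontab entries; "mydb" and /backups are illustrative only.
# Note: % must be escaped as \% inside a crontab line.

# A daily pg_dump snapshot bounds worst-case data loss at roughly 24 hours:
0 3 * * * pg_dump --format=custom mydb > /backups/mydb-$(date +\%F).dump

# An hourly snapshot shrinks that worst-case window to roughly one hour:
0 * * * * pg_dump --format=custom mydb > /backups/mydb-$(date +\%F-\%H).dump
```

In practice, PostgreSQL’s continuous archiving and point-in-time recovery (WAL archiving) can narrow the window further than any snapshot cadence, at the cost of extra operational complexity.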
@gitlabstatus Come on guys! What's the ETA?
— Consigliere (@clthck) February 1, 2017
The outage also means the site remains offline, cutting developers off from one of their crucial tools in the middle of the workweek.
While it would be easy to condemn GitLab, the company deserves credit for its radical transparency. Throughout the process, it has kept users informed of progress via its @gitlabstatus Twitter account. It was also honest about the cause of the data loss – human error – and didn’t try to pin it on a hardware failure or an external attacker. GitLab is also livestreaming the recovery efforts, which is certainly novel.
This is a developing story. We’ll keep you informed with any new updates.
Update: A GitLab spokesperson got in touch. Emphasis theirs:
“On Tuesday, GitLab experienced an outage for one of its products, the online service GitLab.com. This outage did not affect our Enterprise customers or the wide majority of our users. As part of our ongoing recovery efforts, we are actively investigating a potential data loss. If confirmed, this potential incident would affect less than 1% of our user base—peripheral metadata that was written during a 6-hour window. We have been working around the clock to resume service on the affected product and set up long-term measures to prevent this from happening again. We will continue to keep our community updated through Twitter, our blog and other channels.”