Reddit’s data hoarders are frantically trying to save Tumblr’s NSFW content

After Tumblr announced on Monday that it was invoking the nuclear option to expel NSFW content from the platform, the internet reacted with predictable rage.

At TNW, we speculated about whether the move would kill the platform. Yesterday, I scribbled a piece on those stung by the change. To my surprise, that’s primarily women and other marginalized groups who sought solace in Tumblr’s pseudonymous communities.

Reddit veered in a different direction, as it so often does, for better or worse. By Wednesday afternoon, just two days after the announcement, a group was already hard at work attempting to archive the whole damn thing, or most of it, anyway.

Redditor u/itdnhr began the process, collecting some 67,000 NSFW Tumblr accounts and compiling a massive list. He then shared it with r/Datasets, where other redditors stripped the non-working accounts, leaving 43,000 accounts.

Preserving Tumblr’s NSFW accounts, though, isn’t without its challenges, both in scope and legality.

For starters, the archive is an estimated 25 terabytes of data. The simplest solution would be uploading it as a torrent, but a file this size makes that difficult. The amount of time needed to download a file this size, and the storage space required to keep it, would be prohibitive for most internet users.

You could split the file into parts, or distribute it through a single link or series of links on a private server. But even 25 one-terabyte files would ultimately lead to the same problem for anyone whose goal is to download the complete archive. And this method isn’t without risk.

There’s a better than zero chance the archive contains images of child sexual abuse. Apple banned Tumblr from the App Store for this very reason.

If such imagery is found in the archive, the uploader and subsequent downloaders could face legal action. A law enforcement probe, user reports of illegal content, or a simple algorithmic check for known metadata could open the possibility of the involved parties being charged with any number of offenses related to possession and/or dissemination of child sex abuse imagery.

The scope of the legal liability is murky. As one legal professional we spoke with stated, there’s a lot to consider.

“As a general matter, the possession and dissemination of images containing child pornography is certainly prosecutable under criminal law in the United States,” says Ryan Clough, general counsel at Public Knowledge, a Washington DC-based open internet non-profit. “As long as you violate the explicit terms of the statute, you could be subject to prosecution.”

Clough added, “What I will say, generally, is that intent is required in virtually any criminal prosecution.”

And this is where it gets tricky.

For prosecutors to seek charges, it is likely that they’d need corroborating evidence proving the uploader, and downloaders, had specific knowledge of the contents of the archive: child sex abuse imagery, specifically. But downloading a large file that *may* contain child sex abuse imagery isn’t the same as intentionally downloading one that you *know* contains it.

Put simply, are the archivists seeking to preserve Tumblr because they know it contains illegal content, or in spite of the fact that it may contain illegal content?

Currently, the potential legal risk doesn’t seem to be dampening enthusiasm in two of Reddit’s largest archival communities. Users are still hard at work finding solutions for how best to preserve Tumblr’s endangered content.
