This article was published on February 1, 2018

This simple script renders data collected by your ISP unusable



After voting last month to strip Obama administration guidelines, the Federal Communications Commission ceded control of the internet to ISPs, at least for now. Corporations like Comcast and Verizon won the right to police themselves, operating on a scheme tantamount to the honor system.

With a handshake agreement and a generic promise to play nice, we’ve handed the keys to corporations that have proven time and again to only be interested in pay-for-play deals and collecting customer web browsing data through any means necessary.

While there’s little we can do about the former, the latter, at least, invites a few solutions.

There’s the virtual private network (VPN), a digital wall between you and your ISP that connects to a remote server and passes data anonymously back and forth. ISPs can see that you’re connected, and they can meter the connection to monitor bandwidth usage, but ideally they’ll never know which sites you’re visiting or what you’re downloading.


Ideally.

VPNs are far from foolproof. We’ve seen a number of them, in fact, share user information after claiming not to store this data. It’s a Band-aid solution. It’s placing trust in a VPN provider rather than your ISP.

A 2009 paper authored by Shaozhi Ye and members of the Department of Computer Science at the University of California may offer a better solution. The group argued that generating digital noise obscures valuable data by bundling it with less desirable data, a technique called obfuscation.

Ye et al. argued the valuable information vanishes if:

Number of noise calls ≥ (Number of user calls – 1) × Number of possible calls

Put simply, web users don’t need to block ISPs; they can overwhelm them instead. If the average web user visits 100 domains a month, surrounding the valuable data (which 100 domains you are visiting) with useless data (“noise calls” to 20,000 random domains) makes the data essentially worthless, or at least prohibitively expensive to comb through.
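As a rough illustration, the inequality can be checked with a few lines of Python. This is a minimal sketch that reads the formula literally as quoted above; the function name and the sample figures are assumptions for illustration, not something taken from the paper.

```python
def noise_threshold(user_calls: int, possible_calls: int) -> int:
    """Minimum number of noise calls required by the quoted inequality.

    noise calls >= (user calls - 1) * possible calls
    """
    if user_calls < 1 or possible_calls < 1:
        raise ValueError("both counts must be at least 1")
    return (user_calls - 1) * possible_calls


def is_obscured(noise_calls: int, user_calls: int, possible_calls: int) -> bool:
    """True when the number of noise calls meets or exceeds the threshold."""
    return noise_calls >= noise_threshold(user_calls, possible_calls)


# Hypothetical figures: 100 real calls drawn from 20,000 possible calls.
print(noise_threshold(100, 20_000))  # 1980000
```

Read this way, the threshold grows with both the number of real calls and the size of the pool they are drawn from, so the amount of noise needed depends heavily on how those two quantities are counted.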

Unfortunately there’s still no easy way to do this, unless you’re well-versed in Python.

For those who have the technical chops, GitHub user “essandess” created a script that generates the noise calls for you. The script works on Mac and Linux, sending random noise calls at intervals defined by the user. In the on-page examples, the default setup makes a noise call about once every five seconds, which should keep even the most bandwidth-squeezed users from running into the business end of their data caps.
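To give a feel for the general idea, here is a minimal Python sketch of a noise generator. It is not essandess’s script: the function names, the placeholder domain list, and the choice of exponentially distributed pauses averaging five seconds are all assumptions made for illustration. It requests only each page’s HTML, so no images are downloaded.

```python
import random
import time
import urllib.request


def fetch(url: str, timeout: float = 10.0) -> int:
    """Request one page and return the number of bytes read (HTML only)."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return len(response.read())


def make_noise(domains, calls, mean_delay=5.0,
               fetcher=fetch, sleeper=time.sleep, rng=None):
    """Make `calls` requests to randomly chosen domains.

    Pauses between requests are random and average `mean_delay` seconds.
    Returns the list of domains that were requested, in order.
    """
    rng = rng or random.Random()
    visited = []
    for _ in range(calls):
        domain = rng.choice(domains)
        try:
            fetcher(f"http://{domain}/")
        except Exception:
            pass  # unreachable or slow sites are expected; keep going
        visited.append(domain)
        # expovariate takes a rate, so 1 / mean_delay gives the desired mean
        sleeper(rng.expovariate(1.0 / mean_delay))
    return visited


if __name__ == "__main__":
    # Placeholder list; a real run would need a much larger pool of domains.
    sample_domains = ["example.com", "example.org", "example.net"]
    print(make_noise(sample_domains, calls=3))
```

The fetcher and sleeper are passed in as parameters so the loop can be exercised without touching the network or actually waiting.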

All told, essandess estimates bandwidth usage under 50GB per month using the supplied example of 20,000 random calls. It accomplishes this by stripping “noise” websites of images, which means you’ll be downloading less information from each page visited.
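Anyone on a capped plan can run the same back-of-the-envelope check for their own settings. The sketch below is only arithmetic: the 30-day month and the average page size are assumptions, and the function names are hypothetical rather than part of the script.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month


def calls_per_month(interval_seconds: float) -> int:
    """Number of noise calls made in a month at one call per interval."""
    return int(SECONDS_PER_MONTH // interval_seconds)


def monthly_gigabytes(interval_seconds: float, avg_page_bytes: float) -> float:
    """Estimated monthly traffic in GB (1 GB = 10**9 bytes)."""
    return calls_per_month(interval_seconds) * avg_page_bytes / 1e9


# One call every five seconds, as in the article's example.
print(calls_per_month(5.0))

# Assumed average of 100 KB of HTML per page; real pages vary widely.
print(round(monthly_gigabytes(5.0, 100_000), 2))
```

Lengthening the interval or fetching lighter pages brings the total down proportionally.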

It’s not perfect. Setting up and troubleshooting the script obviously requires more technical skill than most of us novices have. It’s also not infallible: a motivated ISP could train AI to sift through the noise using un-obfuscated data it previously collected on each user.

But, it’s a start.
