This article was published on March 29, 2012

ScrapeShield from CloudFlare: A one-click solution to end website content scraping



I’ve waxed poetic about CloudFlare a few times in the past, and with good reason. The company, which focuses on securing websites while having a side-effect of making them insanely fast, has been hard at work solving problems of our broken Internet since it came into existence. Now, with a new, free CloudFlare app called ScrapeShield, it’s going even further.

ScrapeShield is the answer for content creators who keep finding their work curated (scraped) onto other sites around the Internet. While this isn’t the time or place to get into an argument about curation, it’s simply understood that some publishers would prefer that their content stay right where it is. So with a suite of tools-on-toggle, ScrapeShield offers a vast amount of protection.

While hotlink protection isn’t new, and email obfuscation has been one of CloudFlare’s tools for quite some time, ScrapeShield makes it even easier by putting the tools into one place, then labeling it in a “does what it says on the box” manner. The addition of a Pinterest blocker is especially handy, but what’s probably most important is the last option on the list, for Maze.
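For the curious, hotlink protection in general tends to come down to checking the Referer header on image requests. The sketch below shows that general idea only. It is not CloudFlare’s implementation, and the `is_hotlink` function and the `ALLOWED_HOSTS` list are hypothetical names made up for illustration.

```python
from urllib.parse import urlparse

# Hypothetical allow-list; a real site would list its own hostnames here.
ALLOWED_HOSTS = {"example.com", "www.example.com"}


def is_hotlink(referer, allowed_hosts=ALLOWED_HOSTS):
    """Return True when an image request appears to come from another site."""
    if not referer:
        # Many browsers and privacy tools omit the Referer header, so a
        # missing value is allowed rather than blocked in this sketch.
        return False
    host = urlparse(referer).hostname
    # A Referer with no recognizable host is treated as a hotlink.
    return host is None or host not in allowed_hosts
```

A server using a check like this would typically answer flagged requests with an error or a placeholder image instead of the real file.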

Like many of CloudFlare’s tools, there’s a bigger purpose behind Maze. Maze is a network of CloudFlare users that is aggregating scraper information to build a database of known sites that can then be blacklisted entirely. ScrapeShield focuses on a Discover, Defend, Deter method, where content is tracked via invisible markers (akin to tracking pixels, if I were a betting man) and then further scraping can be deterred.

According to CloudFlare CEO Matthew Prince, most scraping that CloudFlare finds happens via automated systems which would not catch the tracking markers. But even if the site in question is being populated manually, it’s not safe. Prince goes on to explain that CloudFlare is working on a solution that will track content even if it’s copied and pasted in plain text.

The beauty for publishers is that it’s a one-click solution for keeping your content safe. Once you have ScrapeShield enabled, you’ll see a report in your dashboard telling you the site where your content is appearing. You can then report that use to Maze with a single click and get a bit of schadenfreude as a bonus. You’ll rest in the knowledge that the offending site is having its resources tied up by Maze because the system feeds garbage to the offenders, effectively slowing down their crawl rates, which nullifies the reason that the scraping happened in the first place.

If you’ve ever found your content “curated” by another site, especially without a modicum of credit given, you know that sinking feeling that you can get. Heck, it happens to us here at TNW all the time, and I can tell you first-hand that you never really get used to it. But more than the emotional side of things, it’s an ethical and perhaps even legal dilemma. With ScrapeShield, you now have a way to not only end the scraping, but to enact a bit of punishment as well.

ScrapeShield, from CloudFlare

 
