Last week I was at a marketing conference in San Francisco, where I met Elias Terman, VP of Marketing at Distil Networks. He had a booth set up outside the conference hall and I got to chatting with him about his service. I know, it sounds like a pretty standard conference conversation, but it got interesting when he started talking about some of the research his bot mitigation company had done about web scraping.
Priceline, Amazon, Expedia, and a ton of other massive retail websites all deal with the same thing: bot traffic scraping their sites to pull data. And it affects a large number of smaller sites too. But why this happens, and what the bots do with the data, is the real story.
I sat down with him to get some more information, and to go down the rabbit hole a bit further.
Clayton: So, how much of the traffic to people’s sites is bot traffic?
Elias: According to our Bad Bot Report, 20% of all web traffic is bad bots, but the amount of bad bot traffic on your site will vary, and can be much higher if your website has features like a login page, payment processor, web forms, or proprietary content or pricing information. And even good bots, another 19% of overall web traffic, skew website analytics.
Clayton: How does it affect businesses?
Elias: Bots rarely make the headlines, but they’re the key culprits behind web scraping, competitive data mining, account takeovers, transaction fraud, unauthorized vulnerability scans, spam, click fraud, denial of service, skewed analytics, and API abuse.
Clayton: Why are bots trying to scrape so much of the web?
Elias: There are a wide range of motives for web scraping, including content aggregation (think price comparison sites), content theft, competitive intelligence, and price matching. The knock-on effects include site slowdowns and downtime, and lower SEO rankings (as Google penalizes duplicate content).
Clayton: And Google Analytics reports, have we been looking at inaccurate reports all this time?
Elias: When a bot hits your site, it fires analytics triggers, just like any other user. That means when you look at traffic reports, conversion rates, or A/B tests, you’re looking at data generated by bots, and you’re likely making the wrong business decisions on a daily basis. If you aren’t filtering out the bad data generated by bots, then you aren’t optimizing your site for your real human visitors. We recently released a free Bot Discovery for Google Analytics plugin that helps fix this problem.
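Distil’s plugin does this inside Google Analytics, but the underlying idea is easy to sketch. This toy Python example, using hypothetical session records and a naive user-agent signature list (real bot detection is far more sophisticated), shows how bot sessions drag a reported conversion rate away from the human one:

```python
# Hypothetical session records, as you might export from an analytics tool.
sessions = [
    {"user_agent": "Mozilla/5.0 (Windows NT 10.0) Chrome/90.0", "converted": True},
    {"user_agent": "Googlebot/2.1 (+http://www.google.com/bot.html)", "converted": False},
    {"user_agent": "python-requests/2.25.1", "converted": False},
    {"user_agent": "Mozilla/5.0 (Macintosh) Safari/605.1", "converted": False},
]

# Naive signature list for illustration only; self-declared user agents
# catch good bots but not bad bots that mimic real browsers.
BOT_SIGNATURES = ("bot", "crawler", "spider", "python-requests", "curl")

def is_bot(user_agent: str) -> bool:
    ua = user_agent.lower()
    return any(sig in ua for sig in BOT_SIGNATURES)

humans = [s for s in sessions if not is_bot(s["user_agent"])]

raw_rate = sum(s["converted"] for s in sessions) / len(sessions)
human_rate = sum(s["converted"] for s in humans) / len(humans)
print(f"raw conversion rate:   {raw_rate:.0%}")    # 25% with bots included
print(f"human conversion rate: {human_rate:.0%}")  # 50% for real visitors
```

Even in this tiny made-up sample, half the traffic is bots, and the dashboard number is half the true human conversion rate.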
Clayton: Is there some malicious intent of bots, or is there such a thing as a “good bot”?
Elias: Lots of bots help the internet run better, such as web crawlers like Googlebot. Good bots help organize information and make the internet run smoother. But even good bots can have unintended side effects. If you don’t filter good bot actions out of your analytics, you’ll make the wrong decisions. If a publisher doesn’t know a visitor to their site is a good bot, they could show it an ad and waste advertiser money. Bots are always going to be part of the internet, so it’s important to know which of your visitors they represent, even if they aren’t malicious.
Clayton: How do we stop unwanted crawling of our website?
Elias: There are a number of ways to control web scrapers, some more effective than others. Robots.txt doesn’t work, because bad bots simply ignore it. Dealing with them manually by blocking IP addresses is a losing game of whack-a-mole. Most bots now mimic human behavior and spread their attacks over hundreds of IPs, so you really need something purpose-built to detect and block scrapers: a tool that uses behavioral analysis and digital fingerprinting designed for the task.
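To see why simple defenses lose that game, here is a toy request-rate heuristic in Python. The thresholds and client IDs are hypothetical, and this is nothing like the behavioral analysis and fingerprinting Elias describes; it exists to show the kind of per-client signal those systems build on, and why it fails alone:

```python
from collections import defaultdict, deque

# Toy heuristic: flag any client that makes more than MAX_REQUESTS
# requests within WINDOW_SECONDS. Thresholds are made up for illustration.
MAX_REQUESTS = 10
WINDOW_SECONDS = 1.0

request_log = defaultdict(deque)  # client id -> recent request timestamps

def looks_like_bot(client_id: str, now: float) -> bool:
    timestamps = request_log[client_id]
    timestamps.append(now)
    # Drop timestamps that have fallen outside the sliding window.
    while timestamps and now - timestamps[0] > WINDOW_SECONDS:
        timestamps.popleft()
    return len(timestamps) > MAX_REQUESTS
```

A scraper that spreads its requests across hundreds of IPs stays under this threshold on every one of them, which is exactly the whack-a-mole problem: per-IP counters and blocklists catch only the clumsiest bots.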
Clayton: Whose job is it to solve this?
Elias: Stakeholders that care about the success and security of their website, mobile app and APIs. The initial catalyst to solve the problem might come from the business side in that they want to improve the customer experience, thwart competitors, and make better decisions. Or, it might come from IT security or the fraud department looking to block account takeovers, credit card fraud and other cyber threats.
Clayton: Can it help improve our traffic numbers or conversions on a website?
Elias: Absolutely. Let’s say you run an A/B test to see how a new page performs. Imagine bots were previously hitting that page to scrape data, but they aren’t designed to hit the new variant. In this case, your conversion numbers might go down, even though the page performs better when compared only to human user actions.
Another example: if you are running an ad campaign to drive clicks to your site, you are definitely paying for bot clicks. The problem is that those clicks never convert into actions like filling out a form or making a purchase. We have seen conversion rates increase by 30-40% when sites or advertisers target only human users on click campaigns.
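Plugging hypothetical numbers into that scenario makes the distortion concrete. All figures below are made up for illustration; only the 20% bad-bot share comes from the report Elias cites:

```python
# Hypothetical ad campaign: the figures are invented for illustration.
clicks = 10_000
bot_share = 0.20      # the Bad Bot Report's 20% figure, applied to paid clicks
conversions = 400     # bots click ads, but never fill out forms or buy

bot_clicks = int(clicks * bot_share)
human_clicks = clicks - bot_clicks

raw_rate = conversions / clicks         # what your campaign dashboard reports
true_rate = conversions / human_clicks  # what human visitors actually do

print(f"reported conversion rate: {raw_rate:.1%}")   # 4.0%
print(f"human conversion rate:    {true_rate:.1%}")  # 5.0%
```

In this made-up example the human conversion rate is 25% higher than the reported one, and you paid for 2,000 clicks that could never have converted.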
I heard someone say recently that web scraping was a key component of Amazon’s success, but it never gets talked about, probably because it’s not the most ethical thing to do in ecommerce. With all of the malicious things web scraping enables, the rabbit hole goes much deeper, I’m sure. I wonder how much of our Google Analytics reports are bots.
This post is part of our contributor series. It is written and published independently of TNW.