Talk about making a point: Hacker News (a service of Y Combinator) has just blocked every search engine from indexing its website. Want proof? Take a look at this:
By blocking robots from spidering the site, Hacker News prevents Google (et al.) from indexing its content, which means that if you search for a Hacker News thread, you will not find it. Bad news? Not to Hacker News.
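How a ban like this works in practice: robots.txt is an advisory file that well-behaved crawlers fetch and obey before requesting pages. The post doesn't reproduce the exact file Hacker News served, so the two-line blanket ban below (a `User-agent: *` group with `Disallow: /`, the conventional way to shut out every compliant crawler) is an assumption. The sketch uses Python's standard-library `urllib.robotparser` to show how a crawler that honors the file would read it; the helper name `compliant_crawler_may_fetch` is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed blanket-ban file; the exact contents HN served that weekend are not shown in the post.
BLANKET_BAN = """\
User-agent: *
Disallow: /
"""


def compliant_crawler_may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a crawler that honors robots.txt would request `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(user_agent, url)


# Each agent prints False: "Disallow: /" is a prefix that matches every path.
for agent in ("Googlebot", "Bingbot", "Slurp"):
    print(agent, compliant_crawler_may_fetch(BLANKET_BAN, agent, "https://news.ycombinator.com/item?id=1194797"))
```

Note that this only models crawlers that choose to comply; nothing in robots.txt physically stops a client from requesting the pages.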
Hacker News has long been a tight-knit community, not one driven by page views. Its operators care little about growing as large as Digg, and this move is a strong way to keep the community unique and cohesive. However, some HN users are not happy:
Proof from Hacker News itself:
What do you think?
Updated
Paul Graham has spoken, and it was apparently all a misunderstanding:
“Don’t worry, it doesn’t mean anything. The software for ranking applications runs on the same server, and it is horribly inefficient (something 4 people use every 6 months doesn’t tend to get optimized much). This weekend all of us were reading applications at the same time, and the system was getting so slow that I banned crawlers for a bit to buy us some margin. (Traffic from crawlers is much more expensive for us than traffic from human users, because it interacts badly with lazy item loading.) We only finished reading applications an hour before I had to leave for SXSW, so I forgot to set robots.txt back to the normal one, but I just did now.”




![Tweet](http://cdn.thenextweb.com/files/2010/03/tweet.png)
![Hacker News](http://cdn.thenextweb.com/files/2010/03/hacker-news.png)












If information is power, why wouldn't someone try to withhold or regulate it? The age of the free lunch is over … which is what Google fears most … because it finds you the free lunch and profits by serving you a leaflet.
What did you say? It doesn't make any sense to me.
Exactly what I was thinking. He really doesn't make sense.
Of course, they push all their content through Twitter, which is then swallowed and indexed by Google and Bing, so does it really matter if the bots don't get it from the site?
The robots.txt file's content type is text/html, not text/plain. I don't know whether search engines ignore this or not, but I'm still able to access the site from Yahoo! Pipes.
Update:
This answers my question:
“If it's text/html you can assume that it's a custom error page instead of a robots.txt (which should be served as text/plain)”
from: http://stackoverflow.com/questions/1577374/chec…
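The heuristic quoted above is easy to automate: strip any parameters (such as `charset`) from the Content-Type header and compare the bare media type with `text/plain`. This is a minimal sketch; the function names are hypothetical, and this thread doesn't establish whether any particular search engine actually ignores a robots.txt served as text/html, so the check only flags the mismatch.

```python
import urllib.request


def media_type(content_type: str) -> str:
    """Reduce a Content-Type header to its bare media type, e.g. 'text/html'."""
    return content_type.split(";", 1)[0].strip().lower()


def looks_like_real_robots_txt(content_type: str) -> bool:
    """Heuristic from the quoted answer: a genuine robots.txt is text/plain,
    while text/html suggests a custom error page instead."""
    return media_type(content_type) == "text/plain"


def fetch_content_type(url: str, timeout: float = 10.0) -> str:
    """Send a HEAD request and return the raw Content-Type header ('' if absent)."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get("Content-Type", "")
```

Usage would be something like `looks_like_real_robots_txt(fetch_content_type("http://news.ycombinator.com/robots.txt"))`; that call needs network access, and the answer depends on whatever the server returns on the day you run it. Some servers handle HEAD poorly, in which case a plain GET is the fallback.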
Their official Twitter account is @ycombinator; the post to @newsycombinator was regarding a much discussed thread at the Hacker News site asking why the robots.txt file had been changed. The headline was by no means an announcement.
Paul Graham responded in a comment three hours ago: “Don't worry, it doesn't mean anything… We only finished reading applications an hour before I had to leave for SXSW, so I forgot to set robots.txt back to the normal one, but I just did now.”
And if you look at the robots.txt, it is now back to normal.
Banning search engines via the robots.txt file is totally lame. If they were serious, they would ban the search engines by blocking their IPs. Most search engines ignore robots.txt. Most site scrapers ignore robots.txt. Almost every malicious site scraper, e-mail harvester, data miner, etc. ignores robots.txt. Every experimental search engine indexer being developed by some basement or garage genius ignores robots.txt files. So although it may be astounding news to you, and some kind of aggressive move by Hacker News, it is actually rather lame and pointless. Block their IPs by spotting them in your server logs, and then you can be certain that they won't be wasting any of your bandwidth.
Maybe they should just admit that the problem they were facing was that all that non-human traffic was simply wasting their bandwidth and slowing their website to a crawl. Robots.txt isn't the answer.
It makes me wonder how clever these 'hackers' at hacker news are if they think robots.txt is that powerful or obeyed.
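The block-by-IP suggestion in the comment above can be sketched as a log scan: count requests per client IP whose user-agent looks like a crawler, then emit deny rules. This assumes the Apache/nginx "combined" log format and a hand-picked list of user-agent substrings (`CRAWLER_MARKERS`, hypothetical); both are assumptions to adjust. User-agent strings are self-reported, so a scraper that disguises itself as a browser will not be caught this way. The output uses nginx's `deny` directive syntax; the sample IPs come from the RFC 5737 documentation ranges, not real logs.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List

# Combined Log Format: ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical marker list; tune it to your own traffic.
CRAWLER_MARKERS = ("bot", "crawler", "spider", "slurp")


def crawler_ips(lines: Iterable[str], min_hits: int = 1) -> Dict[str, int]:
    """Count requests per IP whose user-agent contains a crawler marker."""
    hits: Counter = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in combined format
        agent = match.group("agent").lower()
        if any(marker in agent for marker in CRAWLER_MARKERS):
            hits[match.group("ip")] += 1
    return {ip: n for ip, n in hits.items() if n >= min_hits}


def nginx_deny_rules(ips: Iterable[str]) -> List[str]:
    """Render one nginx `deny` directive per IP, sorted for stable output."""
    return [f"deny {ip};" for ip in sorted(ips)]


# Made-up sample lines using documentation-range IPs.
sample = [
    '192.0.2.10 - - [14/Mar/2010:10:00:00 -0700] "GET /item?id=1 HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '198.51.100.7 - - [14/Mar/2010:10:00:01 -0700] "GET /news HTTP/1.1" 200 8000 "-" '
    '"Mozilla/5.0 (Windows NT 6.1) Firefox/3.6"',
]

for rule in nginx_deny_rules(crawler_ips(sample)):
    print(rule)
```

Raising `min_hits` is one way to avoid banning an IP over a single stray request; whether a blanket IP ban is better than robots.txt for a given site is a judgment call this sketch doesn't settle.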
Update from Paul G. at Hacker News: http://news.ycombinator.com/item?id=1194797
You guys should actually read his comment:
http://news.ycombinator.com/item?id=1194421
When I load the robots.txt (http://news.ycombinator.com/robots.txt), I see a more generous version:
User-Agent: *
Disallow: /x?
Disallow: /vote?
Disallow: /reply?
Disallow: /submitted?
Disallow: /threads?
Anyone else seeing this?
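To see what the restored rules quoted above actually block, they can be fed to Python's standard-library `urllib.robotparser`. This is a sketch: the `/item` and `/news` paths appear on the site, but the query strings on the blocked sample URLs are made up for illustration, and other crawlers may match rules slightly differently than Python's parser does.

```python
from urllib.robotparser import RobotFileParser

# The restored rules as quoted in the comment above, one directive per line.
RESTORED = """\
User-Agent: *
Disallow: /x?
Disallow: /vote?
Disallow: /reply?
Disallow: /submitted?
Disallow: /threads?
"""

parser = RobotFileParser()
parser.parse(RESTORED.splitlines())

# Ordinary pages versus action and per-user listing URLs (query strings invented).
SAMPLES = [
    "/news",
    "/item?id=1194797",
    "/vote?for=1194797&dir=up",
    "/threads?id=pg",
    "/x?fnid=abc123",
]

for path in SAMPLES:
    url = "https://news.ycombinator.com" + path
    verdict = "allowed" if parser.can_fetch("Googlebot", url) else "blocked"
    print(f"{path:28} {verdict}")
```

With these rules, `/news` and `/item?id=1194797` come back allowed while the vote, threads, and `/x` URLs come back blocked: the file keeps story and comment pages indexable and only fences off action and listing URLs.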
They should not have banned it. Google's crawler is the best thing that ever happened to the internet. :-)
I wish all search engines would either clean up their searches or get off the web. Most do not search for what you stated or asked, and instead overload you with useless links to other searches that run you in circles. If they only looked for what you asked, no more and no less, they would be useful. As they are now, they are useless tools. I would like to find what I asked for, for once. If nothing matches, then at least be honest instead of sending you on a wild goose chase.