Dreamhost asking clients to block GoogleBot

 
 

Yes, it’s quite true. DreamHost representatives are asking their clients to block GoogleBot through the .htaccess file, because their websites were “hammered by GoogleBot”.

Dreamhost representatives are also guiding their clients to make all their websites “unsearchable and uncrawlable by search engine robots”, because they cause “high memory usage and load on the server”.

[Screenshot: Dreamhost email]

PS: This is NOT a rumor. Multiple people are already complaining about Dreamhost asking clients to block search engine spiders. A post in QuickOnlineTips confirms this too.

Initial news bit via Zoso.

 

51 Comments so far

Paul Drago said:
May 17th, 2007

Do you have a link? Or an email, or any sort of official statement?

May 17th, 2007

You have an e-mail in the screenshot above. It’s not hard to figure it out.

Everett said:
May 17th, 2007

If they were smart enough to do a reverse DNS they’d probably find most of those Googlebot visits are rogues spoofing the Google user agent.

What a waste… the sad part is some people are actually going to listen to them.
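Everett's reverse-DNS suggestion can be sketched in Python as below. This is a minimal sketch, not something DreamHost is known to run: the function name `is_real_googlebot` and its injectable resolver parameters are hypothetical, and the accepted hostname suffixes (`googlebot.com`, `google.com`) are an assumption that does not come from this thread. The idea is to reverse-resolve the client IP, check that the hostname sits under a Google domain, then forward-resolve that hostname and confirm it maps back to the same IP.

```python
import socket

# Assumed domains for genuine Googlebot hosts (not stated in this thread).
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_real_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if `ip` reverse-resolves to a Google host that resolves back to `ip`."""
    # Default to real DNS lookups; callers (or tests) may inject their own resolvers.
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        hostname = reverse_lookup(ip)
        if not hostname.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
            return False  # PTR record is not under an accepted domain
        # Forward-confirm: the claimed hostname must map back to the same IP.
        return ip in forward_lookup(hostname)
    except OSError:
        return False  # any lookup failure counts as "not verified"
```

A visitor that only sends a Googlebot user-agent string, but whose IP fails this check, would be one of the "rogues" Everett describes.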

Tomche said:
May 17th, 2007

Ridiculous, I would say. Why would anyone block access to googlebot? It is always welcome. Crawl more, fellows.

John said:
May 17th, 2007

I am the system stability manager here at DreamHost and I thought I might try to explain how googlebot can occasionally be problematic. Of late I have been working on our very heavy usage customers (people using over a quarter to in some cases 250% of what the full server should be processing). In excess of half of these cases the cause is google’s crawler malfunctioning in how it interacts with the site resulting in heavily loaded or even crashing machines (I have had cases where googlebot has over 95% of the last 10,000 hits on a site that doesn’t even have that many pages). Our terms of service (http://www.dreamhost.com/tos.html) specifically state that “if your processes are adversely affecting server performance disproportionately DreamHost Web Hosting reserves the right to negotiate additional charges with the Customer and/or the discontinuation of the offending processes” so that we can ensure that we keep machines working for everyone on them and not just one user. In a case where a faulty googlebot interaction is killing a machine I have two options:

1. disable the site
2. block the bot

We feel that the best solution for our customers is to stop just the malfunctioning behavior and keep _everyone’s_ sites working as well as possible on the machine. In the specific case referenced above I do not feel that it was handled with the appropriate level of care (I always include detailed information about usage, the logs I looked at and the resulting improvement of loads or the like on the customer’s server) and will more strictly define our policy on this subject so we can avoid any more confusion like this. I will also remove the block here if it was not necessary (we do make mistakes sometimes and I am glad this came to my attention so that I can properly train everyone). Really the goal is to provide the best possible hosting experience for our customers as a whole!
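The kind of measurement John describes (what share of a site's most recent hits came from Googlebot) could be approximated as follows. This is a rough sketch under stated assumptions: the function name `googlebot_share` is hypothetical, the log is assumed to be a plain-text access log with one request per line and the user-agent string somewhere in that line, and matching on the substring "Googlebot" counts spoofed user agents too.

```python
from collections import deque


def googlebot_share(log_lines, last_n=10_000, marker="Googlebot"):
    """Fraction of the last `last_n` log lines whose text contains `marker`."""
    # A bounded deque keeps only the newest `last_n` entries while streaming the log.
    recent = deque(log_lines, maxlen=last_n)
    if not recent:
        return 0.0
    needle = marker.lower()
    hits = sum(1 for line in recent if needle in line.lower())
    return hits / len(recent)
```

With a real log this might be called as `googlebot_share(open("access.log"))`, where the file name is a placeholder; a result above 0.95 would correspond to the "over 95% of the last 10,000 hits" case mentioned above.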

May 17th, 2007

John, no matter how the client’s website is behaving, no matter how much the GoogleBot crawler is crawling, you just CAN’T ASK your clients to ban the bot, that in most cases brings in excess of 50% of the total traffic.

Consider that more than 40% of your clients may not even understand the code you have given them; they will put it in the .htaccess file anyway, thus depriving themselves of all their Google traffic.

It’s not natural what you are doing.

If some of your clients are using the servers at 250% capacity, then disable those accounts, or recommend a hardware upgrade.

Not this.

Oradeanul said:
May 17th, 2007

I agree they handled it professionally, but this shouldn’t have happened in the first place; that is the whole point.
Oh well …you can’t expect too much for 10$ / month.
I would like to add that I also have some sites with dreamhost and they have their ups and downs …but I really like them hosting my static content and keeping my 46gigs of backups.
thank you

John said:
May 17th, 2007

We actually do provide upgrade paths for heavy usage customers. From my experience customers overwhelmingly prefer that we disable their account rather than specifically address the source of the usage issue. Also there appears to be the misconception that we are blocking traffic from visitors to google – we are not, we are blocking the google crawler that is causing the artificial traffic (people connect from their ISP which is a different location). Hits from googlebot are not actual people trying to access the domain (the legitimate visitors are exactly who we are protecting by getting rid of the malfunctioning software that is hammering the site =).

John said:
May 17th, 2007

My apologies – I meant that customers prefer that we ‘not’ disable their account.

May 17th, 2007

Hits from googlebot are not actual people trying to access the domain

Oh My God John. You’re writing in an Internet Marketing blog.

From second 1, we KNEW that we were not talking about visitors.

But if GOOGLE can’t find your website, because its crawler is banned, how is that website going to be indexed in Google ?

It will NOT. And it will lose 50% or more of its potential traffic, because the client’s hosting company recommended them a “GOOD solution”.

the legitimate visitors are exactly who we are protecting by getting rid of the malfunctioning software that is hammering the site =)

So john, GoogleBot, and Google indexing your website = malfunctioning software ?

What kind of hosting company is DreamHost ?

And why are you employed there ? You should not write anywhere, representing DreamHost. You are doing more harm than good.

I rest my case.

Sergiu said:
May 17th, 2007

Basically you should be able to check if Google is really hammering your site(you as the owner) by using the Google webmaster tools. There you can even control the crawl rate.

John said:
May 17th, 2007

I apologize if I did not make myself clear – the only cases where we would block googlebot are when the following conditions are met:

-the site in question is causing the server it is on to be unstable

-the site in question is causing erratic or abnormal behavior on the part of google’s crawler

We do not block googlebot on every busy customer site, only when it is demonstrated that it is causing artificial usage (a 10 page site does not require 5000 hits from googlebot to be indexed =) and when the alternative to blocking googlebot is disabling the entire domain. It is disingenuous to suggest that this refers to google simply indexing sites – I actually have been in direct contact with google engineers to help sort out the specific cases whereby their crawler was not performing as it should be.

May 17th, 2007

I actually have been in direct contact with google engineers to help sort out the specific cases whereby their crawler was not performing as it should be.

As a commenter said above, a more reasonable resolution to your problem, as a hosting company, would be to SLOW the crawling, not ban GoogleBot.

Allex said:
May 18th, 2007

For most of us adsense is the reason we have websites. They should say they want to not offer anymore hosting services. Everyone will “pack up” (back up) their websites and move to another host. I suggest hostgator or vamphost (if you don’t have a lot of money).

John said:
May 18th, 2007

We actually looked into that as a preferable option; unfortunately this is not under our control, as googlebot specifically ignores the crawl-delay directive in robots.txt files (also it can take up to 24 hours for it to recheck the status of a robots.txt file, and in the cases where we are forced to take action an immediate solution is required). What we do suggest to customers is working with google to determine what specifically is happening, and if they can resolve the issue with google engineers there is no need to protect their site from the crawler. Again, the cases where this is even an issue are a minute fraction of a percent (out of a half million domains I probably run into at most 10 a week that are having this sort of problem, that’s at most 2e-05).
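For readers unfamiliar with the robots.txt setting John refers to, here is a small sketch of what a crawl-delay rule looks like and how a parser reads it. The directive is presumably the non-standard `Crawl-delay` line; the robots.txt content and the crawler name `ExampleBot` below are made up for illustration. Python's standard-library parser exposes the value, but honoring it is entirely up to each crawler, which is the limitation John describes.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking every crawler to wait 10 seconds between requests.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# What a well-behaved crawler named "ExampleBot" would learn from the file.
delay = parser.crawl_delay("ExampleBot")
may_fetch_private = parser.can_fetch("ExampleBot", "/private/page.html")
may_fetch_public = parser.can_fetch("ExampleBot", "/index.html")
print(delay, may_fetch_private, may_fetch_public)
```

Nothing in the file can force a crawler to respect the delay, so a host that needs an immediate effect has to act on the server side instead.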

May 18th, 2007

John, please do us a favor: stop talking nonsense. Telling a customer to ban googlebot is like telling a classic store to get rid of the nice front window and put some bricks instead. Maybe also a steel door.

I run some websites that have more than 70% traffic from google. Imagine how bad it will get if I tell google: stop crawling and get the f@#% off.

From my point of view your intervention here did nothing good. Oh, I’m wrong, it did something good: made me one of the guys that tell others that dreamhost sucks.

Cheers :)

John said:
May 18th, 2007

I feel that you are not really reading my posts – we’re not blocking traffic from google to all or most or even many customer websites (and in the specific case referenced above we did not handle it correctly, I have already removed the blocks and contacted the customer to apologize – I also have trained the tech responsible and conveyed the proper situations in which such steps would be necessary to our entire company at large).

I am sure your websites would run perfectly fine on our system without any interference from us or block of google (about 99.998% of sites hosted here never run into this problem). Please consider the effect also on other customers on servers with one of these anomalies. Would you accept the explanation that your hosting company couldn’t do anything about it because it’s just googlebot malfunctioning while hitting another customer’s site and causing them to consume most of the server resources to service it? It’s really no different from disabling a bad script that is crashing a server – in both cases the action taken is required to maintain server stability and is a far cry better than simply disabling a customer’s entire site or account.
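The figures John quotes in this thread (at most 10 problem sites a week out of roughly half a million domains) can be checked with a few lines of Python. This is only arithmetic on the numbers as stated; the variable names are illustrative. The result is a fraction of 2e-05, which is 0.002% of domains, leaving about 99.998% unaffected.

```python
from fractions import Fraction

problem_sites_per_week = 10   # "at most 10 a week", as quoted in the thread
hosted_domains = 500_000      # "a half million domains", as quoted in the thread

# Exact rational arithmetic avoids floating-point rounding in the percentages.
affected = Fraction(problem_sites_per_week, hosted_domains)
affected_percent = affected * 100
unaffected_percent = 100 - affected_percent

print(float(affected))            # share of domains affected, as a fraction
print(float(affected_percent))    # the same share as a percentage
print(float(unaffected_percent))  # percentage of domains not affected
```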

May 18th, 2007

in the specific case referenced above we did not handle it correctly, I have already removed the blocks and contacted the customer to apologize

Seems fair to me.

action taken is required to maintain server stability and is a far cry better than simply disabling a customer’s entire site or account

True as well, you don’t need 50 angry customers just because of one. If I were in the specific case referenced above I’d keep an eye on the execution time of my scripts… something is still wrong.

Blah said:
May 18th, 2007

Oh how message boards make my head hurt…

John: You’re taking things a bit too literally… it’s pretty clear to me when someone compares blocking googlebot to locking customers out, they mean that the traffic _will_ stop because their site won’t be crawled.

That being said, no one else here seems to be able to read either. It was pretty clear that it was only when someone’s site was already using more than their allotted resources that the recommendation was made.

I would say it’s pretty disingenuous to assume it’s the fault of the GoogleBot, but you’re working with them so that should eventually be resolved.

On a side note, I’ve seen the infrastructure of hosting companies where one site can crash or severely degrade an entire server, and I have to say that would imply your network/setup is crap. Anyone who values customers would have shared backend storage and would be able to easily handle an influx of traffic to a few sites, whether it’s through load balancing or not cramming a server so full that a single customer can affect however many hundreds of other users are on the server.

Good luck getting through to the hard heads on this site, and good luck maintaining what sounds like an architecture that won’t scale :)

John said:
May 18th, 2007

The cases I have been working on are ones where users are at a quarter or more of the processing consumption of the entire server. It is true that our architecture does not scale such that one user can hoard 25% of the processing power of a server without ill effect (though to be fair the servers can usually manage to stay up, we simply don’t think really high loads all the time are acceptable). At the point though where a user is taking up that kind of utilization they aren’t in the shared hosting market (no $10/month plan is going to support hosting 4 or fewer customers per machine =). If you would like to make informed observations about our hosting infrastructure we have some openings available:

http://dreamhost.com/jobs.html

You are correct that issues of scale are a great challenge for any large ISP but we’re pretty upfront about how things work:

http://blog.dreamhost.com/2006/05/18/the-truth-about-overselling/

Thanks very much for the dialog I hope that I have addressed the concern regarding the initial post and will get back to trying to provide a positive experience for our customers. Cheers!

Jai said:
May 18th, 2007

Another case where they edited the .htaccess and brought the site down.

Clickfire said:
May 18th, 2007

John, why don’t you ask users who have the problem to regulate the frequency of crawling through Google Sitemaps. At least that way your customers won’t be de-indexed.

jaja said:
May 18th, 2007

I’m not a fan of google, they are worse than microsoft. If googlebot is blocked to keep it from crashing the server how is google going to make their billions of dollars on imaginary hits. Adsense is probably a ripoff, the money you get is probably just a fraction of what google gets.

Florin said:
May 18th, 2007

This sucks big time…
this makes them lose future and current customers … what are they thinking about ?

Vlad said:
May 18th, 2007

If I were one of their customers I would vote with my two feet. No, good. I hope they will learn from it.

James said:
May 18th, 2007

Of course if the server is going unstable steps must be taken to rectify it and if that means inhibiting googlebot then so be it. BUT this should be a short term measure and the offending site owner given all the information required to make the site work with googlebot again. I am not sure this is happening here.

May 18th, 2007

John,

Have you tried talking to someone from Google? I’m sure since you host so many sites, they might be receptive to your questions. Maybe they can look into what’s wrong with those 0.00001% websites that are overloading your servers, and you might find a workaround that would not affect your server’s performance nor the traffic & site performance of the websites in subject.

Ant Onaf said:
May 19th, 2007

IMHO if a web host wants to block Googlebot or any major search engine robot, then it does not make sense to host a website there. The point of having a website is so others can find your site on the search engines, in this case on Google, this cannot be accomplished if you are blocking Googlebot.

It sounds like Dreamhost may need to look at their own network and hardware, to see where the bottleneck is. I would advise trying to find out what your competitors are doing differently, because they do not seem to have this problem, even though I am certain they get just as many crawls from Googlebot. I think Dreamhost should take a closer look at their load balancing equipment, as you are a web host and supposed to be able to handle high spikes in traffic. I worked for years as a systems engineer and have worked in data centers; I have dealt with many traffic spike issues, but banning bots was never the answer.

John Bishoff said:
May 20th, 2007

To all those idiots out there who don’t see the bigger picture:

[QUOTE]
IMHO if a web host wants to block Googlebot or any major search engine robot, then it does not make sense to host a website there.
[/QUOTE]

They are not blocking Googlebot from all their clients’ sites. Only .0001% of their clients get the google block.

Instead of blocking Googlebot from these sites, should Dreamhost:

#1) Tell the customer to fuck off and suspend their site? This option would make it where NO ONE could get to their Web Site, including Google.

#2) Tell the other 100 clients on the server to fuck off, and continue letting google consume the majority of the server’s resources? Meanwhile, all clients on the server (including the site getting pounded by google) are loading extremely slowly, or just flat timing out.

#3) Temporarily block the googlebot from the 1 site it’s causing problems on, and notify the client. Meanwhile, the client’s site is still operating (able to take sales, show ads, or whatever) and every other site on the server is running fine as well.

Hmm… Let’s see… The choice is so fucking hard, isn’t it!!!

According to the majority posting comments… Dreamhost should just tell all the clients on the server to fuck off. After all, google is doing its natural thing…indexing sites. The thing you’re obviously not getting…if google is generating that much of a strain on the server, google won’t be able to index a lot of the pages. Why??? Because the pages aren’t even going to load when the server is under that much stress!!! Not only will it not load for google…it ain’t going to load for anyone else visiting the site either. Not only will it be that 1 site with problems… It will be 100s of other sites on that server too!

As a site owner… I would MUCH rather a hosting company suspend google temporarily than suspend my whole site. That will give me time to analyze the situation and upgrade to a dedicated server if I need to. If google is displaying erratic behavior as John suggested, I would personally contact them to tell them to fix their shit.

Last but not least.. Dreamhost is taking an extra step to identify the exact problem. Most hosting companies would simply suspend your account without even seeing that the googlebot was the culprit and was acting erratically. Cheers to them for identifying the problem in the first place and avoiding having to suspend a client’s account.

Google isn’t going to de-list your web site from not being able to access it for a day. So stop overreacting.

May 21st, 2007

John, please watch your language. I let your previous comment go, but the next ones …

Respiro said:
May 25th, 2007

I am a D.H. user and I have to tell you that John’s responses are disappointing…

June 2nd, 2007

That is pretty insane. If my hosting company gave me such a dumb suggestion, I would move my site to a new server right away. Blocking googlebot is blocking dollars.

Sarah B said:
June 11th, 2007

Dreamhost is a fantastic hosting company. They may not be the best choice for everybody of course. Dreamhost is a worker owned co-op, carbon neutral, and they purchase their electricity with green credits (i.e. they financially support wind and solar power until such time that they can obtain all power from such sources). I don’t work for Dreamhost, I am just a completely satisfied customer, and have been for 5 years. They do offer dedicated servers if you can afford it. But if you choose the $8/month shared server deal, then you have to abide by the agreement that you “sign” when you join. If you are more concerned with googlebots than the overall health of the server and all of its users, simply upgrade to a different deal. I don’t see why people are attacking Dreamhost for handling their situation for the benefit of the whole versus the benefit of the few. You chose the server you use because of what it offers. If you choose Dreamhost, it would be because of what they offer. For me, that offer includes green hosting with a worker-owned co-op. In the big picture, that means more to me than traffic.

Sarah B said:
June 11th, 2007

oh, and take note, John Bishoff does not claim to be John from Dreamhost. John from Dreamhost was quite polite, not yelling and cussing. John Bishoff does not use any website or contact info. So, it may be a different John, or someone trying to make Dreamhost look bad. Just a thought…

wwwoliondorcom said:
June 11th, 2007

Hello,

Do you think that googlebot could be the reason for not being able to edit some pages on a drupal website?

Thanks.

June 14th, 2007

If I were one of their customers I would vote with my two feet. No, good. I hope they will learn from it.

July 2nd, 2007

So far so good, Dreamhost has been really kind and professional to me in my 4 years with them.

sample, google this:

Taking Over Education

I’ve been graced by Google and Dreamhost so far.

Thanks, Rbt

Galla said:
July 28th, 2007

John, do your customers block the GoogleBot?

August 15th, 2007

To be fair, I too have had this problem on some of my dedicated boxes on another host. Google’s bot, particularly the one that checks for adwords permission, likes to run crazy on websites when people start creating 1000+ keyword campaigns. I’m not entirely sure why either, but for the life of me, we see these days where our hit count shoots from the 100′s to the 1000′s and it’s all coming from a Google IP-space. The strange part about the whole thing, is it’s a 3 page site.

Now if they were all dynamically generated pages, and poorly coded to execute… 100 slow queries… that could put a massive strain on a server.

I think there are times where the host has to take action even for the site owners own good, and it sounds like in most cases, this is exactly what dreamhost does. On the other upside now, they have their virtual server stuff in place, so you can now get your processor dedicated to you for those times where you do generate that much traffic.

Dreamhost is one of the most open shared hosts I’ve worked with (I’ve worked with about 15 at this point). They’re in my top 3 choices, and with promocodes allowing you to get the price down to like 2.50/month for the first year… hard to beat.

If you’re in this situation though, and generating that much *dynamic* traffic, it’s probably time to start looking at dedicated hosting anyway…

August 15th, 2007

Dreamhost is one of the most open shared hosts I’ve worked with

The website under your name having no relationship with this right ? :)

August 18th, 2007

Don’t be fooled, that promocode website is how Justin here makes money, not how DH makes money.

I, too, am a very satisfied customer of Dreamhost, having been with them for a little over 2 years (and currently having a 2 year contract with them expiring in 2009). For what it’s worth, I have never had a problem with them; heck, if you’re having a problem with your website, and it’s due to a coding error, or maybe an endless loop in your .htaccess, they will actually figure out what’s wrong, instead of just telling you “f**k off, it’s all good on our end, let us fix the real problems”. I’ve had that exact line fed to me by other shared hosts – when one could even get a response out of them. Most of the time, I couldn’t even get that. I would definitely rather work with a host that will look at every conceivable problem point, rather than take what we’ll call the “There’s nothing wrong with our s**t, check your code” route.

Antonio said:
September 20th, 2007

I just wanted to point out that I have gotten the same emails asking me to restrict googlebot due to high traffic storms at one of my sites.

I understand and agree with the measure; indeed I’d have done the same, and I also think that for the cash I spent on my hosting plan, I’m already getting a lot. I’m glad they take care of these kinds of issues, as I wouldn’t be really happy if my server-neighbors were crashing the server the whole day.

And finally, I have to say the Dreamhost support team is the BEST one I have ever seen (and I have suffered several already at other hosting companies).

And from my own experience, being myself a “professional-resource-eater” (huge databases, thousands of queries, dynamic graphs,…) :) the Dreamhost support team is extremely helpful, takes care of your problems, and I really believe they enjoy helping you in any aspect.

I have submitted several tickets and all of them have been answered faster than I’d expect. And in all cases I had professional answers.

So I’d never complain about DH, having invested so few bucks and getting such cool support (which is the MOST important value IMHO).

Otherwise I wouldn’t have applied for a DH PS server. :)

Best regards.

October 8th, 2007

The problem is we are on a SEO forum, and the forest is not visible. Because of the trees.

Posting here that you “dared” to suggest a temporary Google-ban in order to keep a shared host up is like being a physician and suggesting a blood transfusion to a Jehovah’s Witness in order to save his life :)

John Bishoff is right, although he could have expressed that in a nicer way. Unless… Google is God and “Thou shall not question erratic Googlebot behavior!” in which case: John Bishoff shame on you! [let's burn him!]

October 9th, 2007

Alex, this is not a forum. It’s a blog. But it’s like saying that I know what your business is. And I don’t. Thus, I’m not writing advice about it.

If you don’t know what SEO means for a business, of course it’s ok for you to block GoogleBot.

October 9th, 2007

I know that if a server does not work at all there is no business at all for anyone hosted on that server.

And I do know what SEO means for a business, but I am not sure why you guys here keep being rude to people having different opinions.

I’ll stay away I promise :)

MK said:
November 6th, 2007

Googlebot sucks!
We can’t limit how much bandwidth it consumes, via Crawl-delay or REVISIT-AFTER; it’s 0 or 100.
If we pay for traffic, we pay it for Googlebot!

sakjdf said:
November 9th, 2007

Google bot absolutely sucks!

I love Dreamhost and I completely respect their decision.

I hate cheapskates like the ones complaining who DON’T WANT to pay but expect the moon with service. People like you should be put on a permanent never-host-with-these-customers blacklist. Good luck with finding a host that takes your bull.

November 13th, 2007

Hi, I have adwords, adsense, sitemaps, analytics, everything; I live and breathe SEO, I dream SEO, I’m also a Dreamhost customer, I was reading blog.dreamhost.com and they link here from an old blog post. I love Dreamhost, they are amazing, and have values that make them outstanding in a cold cold business world, they are completely transparent about shit that goes wrong and help out whenever they can. If not for Dreamhost, I’d have Go daddy or fu**ing Wildwest, sneakily doing everything they can to steal my traffic in them dodgy 404s or forced parking, and charge me for shit they just make up, no way!

Look, I am not only an SEO guy but a hardened PHP programmer and I can totally agree that it’s the responsibility of the programmer to make his programs run “nicely” on their environment. I have paid dearly for shit scripting, with sites being taken offline, which is devastating to say the least, but not with DreamHost. Blocking Google’s crap spider (which is just a machine and has no feelings ;) for a short period of time IS NOT GOING TO HURT YOUR BLOODY SITE as much as a host pulling the plug on you; it probably won’t have an effect at all, because Google still has your site indexed and cached, and it will stay in the index for a long time. If you really are the type to care and are half as good an SEO as you say you are, you’ll have sitemaps and Google will still know you are there! Plus for those that don’t know about Googlebot, like the saying goes – it won’t hurt them.

Some of you people are so one eyed that it’s stupid and you can’t spell – that sucks for SEO and you’re all indignant about something you know nothing about so you suck. I feel for John being on an SEO website and putting his case to any sensible person but getting dumb scatty replies.

I don’t know anyone at Dreamhost but John you can Block Googlebot anytime it’s crashing my $10 a month machine because I know that my $10 per month comes nowhere near the cost of me using its full capacity or even 1/4; I trust Dreamhost they’ve been good. And stop bloody complaining. You’ll notice that the longer a thread draws out the more sensible it becomes. GodBless.

Death to Google said:
November 16th, 2007

Let the guy alone. He is doing all of us a favor, by not letting these fracking bastards from Google to harm their servers and collect much data without our consent. I wish my free host was able to provide the same support to avoid these bots (which I am unable to do). John, you have my full support. Unfortunately, these Google-fanatics are like a disease, without a cure.

wallpaper dude said:
June 1st, 2008

This is what happens: I use dreamhost and recently got 10 emails saying I’m using too much bandwidth, plus we need to block google bot.

OK so I upgraded to PS!!! wow now I should be getting 5x more traffic, HELL NO!!! my site is now slow as sh1t! internal server errors at almost any point of the day.

Then they want to put you at a 150Mhz cap on server usage and 150Mb ram for 15 extra per month. FOR EVEN WORSE SERVICE THAN shared hosting.

John can you explain this again??? you’re not making any sense here on the public boards.

June 1st, 2008

About Dreamhost PS, you need to set the memory according to your websites usage, but it’s true that I wonder who can use only 150 MB for 15$, as even just google #$&*#@#@ bots might use more than that…
