Big Websites with Big Robots

I’m having a problem with my robots.txt: Googlebot can’t reach and crawl my site. So I decided to look around and check the robots.txt files of some big companies.

Do you want to know how many disallowed folders they have set in their robots.txt files? Let’s find out.

Google – robots.txt
Google Adsense – robots.txt
Gmail – robots.txt
Microsoft – robots.txt
Friendster – robots.txt
PayPal – robots.txt
CNet – robots.txt
Digg – robots.txt
Blogger – robots.txt
WordPress – robots.txt
Technorati – robots.txt
HP – robots.txt
Sony – robots.txt
Apple – robots.txt
Canon – robots.txt
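
If you would rather count the disallowed folders than read each file by eye, here is a rough sketch in Python. It assumes you have already saved the robots.txt contents into a string. The function name count_disallowed and the sample rules are made up for illustration and are not taken from any of the sites above.

```python
def count_disallowed(robots_txt: str) -> int:
    """Count the non-empty Disallow rules in a robots.txt body."""
    count = 0
    for line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        field, _, value = line.partition(":")
        # An empty Disallow value blocks nothing, so it is not counted.
        if field.strip().lower() == "disallow" and value.strip():
            count += 1
    return count


# Hypothetical sample rules, for illustration only.
sample = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-content/   # blocked folder
Disallow:
Allow: /
"""

print(count_disallowed(sample))
```

This counts every Disallow line across all User-agent groups, so a folder listed under two groups is counted twice.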

Cypher: Actually, I’m bored and don’t know how to fix the Google robots.txt problem.

Comments

  1. I’m not a WordPress user, but your sitemap.xml is just a dynamic URI, right? So the actual file should reside in /wp-content/ (which is marked as disallowed). Try allowing access to /wp-content/ and see what happens.

  2. My sitemap is located at http://www.cypherhackz.net/sitemap.xml

    I have tested my robots.txt permissions in Webmaster Tools (robots.txt analysis). Googlebot should be able to reach my sitemap and robots.txt.
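
    A similar check can be run locally with Python's standard urllib.robotparser module. This is only a sketch: the rules below are an assumption based on the first comment (a single Disallow for /wp-content/), not the site's actual robots.txt, and the helper googlebot_can_fetch is a made-up name.

```python
from urllib.robotparser import RobotFileParser

# Assumed rules, based on the comment above; not the real file.
rules = """\
User-agent: *
Disallow: /wp-content/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())


def googlebot_can_fetch(url: str) -> bool:
    """Return True if these rules let Googlebot fetch the URL."""
    return parser.can_fetch("Googlebot", url)


print(googlebot_can_fetch("http://www.cypherhackz.net/sitemap.xml"))
print(googlebot_can_fetch("http://www.cypherhackz.net/wp-content/sitemap.xml"))
```

    Under these assumed rules the sitemap at the site root is reachable, while anything under /wp-content/ is blocked. Python's parser may not match Googlebot's behaviour in every edge case, so treat the result as a hint.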