Wednesday, 3 June 2015

Why Your Site Might Not Get Indexed

Typically these kinds of issues are caused by one or more of the following reasons:


  • Robots.txt - This text file which sits in the root of your website's folder communicates a certain number of guidelines to search engine crawlers. For instance, if your robots.txt file has this line in it; User-agent: * Disallow: / it's basically telling every crawler on the web to take a hike and not index ANY of your site's content.
  • .htaccess - This is an invisible file which also resides in your WWW or public_html folder. You can toggle visibility in most modern text editors and FTP clients. A badly configured htaccess can do nasty stuff like infinite loops, which will never let your site load.
  • Meta Tags - Make sure that the page(s) that's not getting indexed doesn't have these meta tags in the source code: <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
  • Sitemaps- Your sitemap isn't updating for some reason, and you keep feeding the old/broken one in Webmaster Tools. Always check, after you have addressed the issues that were pointed out to you in the webmaster tools dashboard, that you've run a fresh sitemap and re-submit that.
  • URL Parameters - Within the Webmaster Tools there's a section where you can set URL parameters which tells Google what dynamic links you do not want to get indexed. However, this comes with a warning from Google: "Incorrectly configuring parameters can result in pages from your site being dropped from our index, so we don't recommend you use this tool unless necessary."
  • You don't have enough Pagerank - The number of pages Google crawls is roughly proportional to your pagerank.
  • Connectivity or DNS issues - It might happen that for whatever reason Google's spiders cannot reach your server when they try and crawl. Perhaps your host is doing maintenance on their network, or you've just moved your site to a new home, in which case the DNS delegation can stuff up the crawlers access.
  • Inherited issues - You might have registered a domain which had a life before you. I've had a client who got a new domain and did everything by the book, but Google refused to index them, even though it accepted their sitemap. After some investigating, it turned out that the domain was used several years before that, and part of a big linkspam farm. We had to file a reconsideration request with Google.

No comments:

Post a Comment