How to Fix “indexed, though blocked by robots.txt” in GSC

“Indexed, though blocked by robots.txt” shows in Google Search Console (GSC) when Google has indexed URLs that it isn’t allowed to crawl.

In most cases, this will be a straightforward issue where you blocked crawling in your robots.txt file. But there are a few additional conditions that can trigger the problem, so let’s go through the following troubleshooting process to diagnose and fix things as efficiently as possible:

You can see that the first step is to ask yourself whether you want Google to index the URL.

If you don’t want the URL indexed…

Just add a noindex meta robots tag and make sure to allow crawling, assuming the page is canonical.

If you block a page from being crawled, Google may still index it because crawling and indexing are two different things. Unless Google can crawl a page, it won’t see the noindex meta tag and may index the page anyway because it has links pointing to it.
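For reference, noindex is a single meta tag in the page’s head, or the equivalent HTTP response header for non-HTML files:

<!-- Meta robots tag in the page's <head> -->
<meta name="robots" content="noindex">

# Or as an HTTP response header, useful for PDFs and other non-HTML files
X-Robots-Tag: noindex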

If the URL canonicalizes to another page, don’t add a noindex meta robots tag. Just make sure proper canonicalization signals are in place, including a canonical tag on the canonical page, and allow crawling so signals pass and consolidate correctly.
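For reference, the canonical tag is a single link element in the head of the non-canonical page; the URL below is just a placeholder:

<!-- On the duplicate or variant page, pointing at the preferred version -->
<link rel="canonical" href="https://domain.com/canonical-page/">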

If you do want the URL indexed…

You need to figure out why Google can’t crawl the URL and remove the block.

The most likely cause is a crawl block in robots.txt. But there are a few other scenarios where you may see messages saying that you’re blocked. Let’s go through these in the order you should probably be looking for them.

  1. Check for a crawl block in robots.txt
  2. Check for intermittent blocks
  3. Check for a user-agent block
  4. Check for an IP block

Check for a crawl block in robots.txt

The easiest way to see the issue is with the robots.txt tester in GSC, which will flag the blocking rule.

Image: the robots.txt tester in GSC flagging the blocking rule

If you know what you’re looking for or you don’t have access to GSC, you can navigate to domain.com/robots.txt to find the file. We have more information in our robots.txt article, but you’re likely looking for a disallow statement like:

Disallow: /

There may be a specific user-agent mentioned, or the rule may block everyone. If your site is new or has recently launched, you may want to look for:

User-agent: *
Disallow: /
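You can also fetch the file from the command line if that’s quicker, replacing domain.com with your own domain:

curl https://domain.com/robots.txt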

Can’t find an issue?

It’s possible that someone already fixed the robots.txt block and resolved the issue before you started looking into it. That’s the best-case scenario. However, if the problem appears to be resolved but reappears shortly after, you may have an intermittent block.

How to fix

You’ll want to remove the disallow statement causing the block. How you do this varies depending on the technology you’re using.
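Whatever the platform, the end result in robots.txt is the same. As a minimal sketch, the fix usually means turning the first block below into the second; an empty Disallow value allows all crawling:

# Before: blocks all crawling
User-agent: *
Disallow: /

# After: allows all crawling
User-agent: *
Disallow: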

WordPress

If the issue affects your entire website, the most likely cause is that you checked a setting in WordPress to disallow indexing. This mistake is common on new websites and following website migrations. Follow these steps to check for it:

  1. Click ‘Settings’
  2. Click ‘Reading’
  3. Make sure ‘Search Engine Visibility’ is unchecked.
Image: the ‘Search Engine Visibility’ setting in WordPress
WordPress with Yoast

If you’re using the Yoast SEO plugin, you can directly edit the robots.txt file to remove the blocking statement.

  1. Click ‘Yoast SEO’
  2. Click ‘Tools’
  3. Click ‘File editor’
WordPress with Rank Math

Similar to Yoast, Rank Math allows you to edit the robots.txt file directly.

  1. Click ‘Rank Math’
  2. Click ‘General Settings’
  3. Click ‘Edit robots.txt’
FTP or hosting

If you have FTP access to the site, you can directly edit the robots.txt file to remove the disallow statement causing the issue. Your hosting provider may also give you access to a File Manager that lets you edit the robots.txt file directly.

Check for intermittent blocks

Intermittent issues can be harder to troubleshoot because the conditions causing the block may not always be present.

What I’d recommend is checking the history of your robots.txt file. For instance, in the GSC robots.txt tester, if you click the dropdown, you’ll see past versions of the file that you can click on to see what they contained.

Image: historic robots.txt versions in the GSC robots.txt tester dropdown

The Wayback Machine on archive.org also has a history of the robots.txt files for the websites it crawls. You can click on any of the dates it has data for and see what the file included on that particular day.

Image: robots.txt history in the Wayback Machine
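If you prefer the command line, the Wayback Machine also exposes its index through its CDX API. Here’s a quick sketch, with domain.com as a placeholder, that lists the snapshots it holds for a robots.txt file:

# List up to 10 archived snapshots of a site's robots.txt
curl "https://web.archive.org/cdx/search/cdx?url=domain.com/robots.txt&output=json&limit=10"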

Or use the beta version of the Changes report, which lets you easily see content changes between two different versions.

Image: the Wayback Machine Changes report

How to fix

The process for fixing intermittent blocks will depend on what’s causing the issue. For example, one possible cause could be a shared cache between a test environment and a live environment. When the cache from the test environment is active, the robots.txt file may include a blocking directive. And when the cache from the live environment is active, the site may be crawlable. In this case, you’ll want to split the cache, or perhaps exclude .txt files from the cache in the test environment.
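What that exclusion looks like depends on your stack. As a hypothetical sketch for an Nginx reverse-proxy cache, you could bypass the cache for robots.txt entirely so the two environments never serve each other’s copy (the backend upstream name is a placeholder):

# Hypothetical Nginx config: never cache robots.txt
location = /robots.txt {
    proxy_cache off;            # skip the shared proxy cache for this file
    proxy_pass http://backend;  # "backend" is a placeholder upstream name
}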

Check for user-agent blocks

User-agent blocks are when a site blocks a specific user-agent like Googlebot or AhrefsBot. In other words, the site is detecting a specific bot and blocking the corresponding user-agent.

If you can view a page fine in your regular browser but get blocked after changing your user-agent, it means that the specific user-agent you entered is blocked.

You can specify a particular user agent using Chrome DevTools. Another option is to use a browser extension that changes user agents.

Alternatively, you can check for user-agent blocks with a cURL command. Here’s how to do that on Windows:

  1. Press Windows+R to open a “Run” box.
  2. Type “cmd” and then click “OK.”
  3. Enter a cURL command like this:

curl -A "user-agent-name-here" -Lv [URL]

For example:

curl -A "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)" -Lv https://ahrefs.com

How to fix

Unfortunately, this is another one where knowing how to fix it will depend on where you find the block. Many different systems may block a bot, including .htaccess, server config, firewalls, CDNs, or even something you may not be able to see that your hosting provider controls. Your best bet may be to contact your hosting provider or CDN and ask where the block is coming from and how you can resolve it.

For example, here are two different ways to block a user agent in .htaccess that you might need to look for.

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
RewriteRule .* - [F,L]

Or…

BrowserMatchNoCase "Googlebot" bots
Order Allow,Deny
Allow from ALL
Deny from env=bots

Check for IP blocks

If you’ve confirmed you’re not blocked by robots.txt and ruled out user-agent blocks, then it’s likely an IP block.

How to repair

IP blocks are tricky issues to track down. As with user-agent blocks, your best bet may be to contact your hosting provider or CDN and ask where the block is coming from and how you can resolve it.

Here’s one example of something you may be looking for in .htaccess:

deny from 123.123.123.123
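Note that the deny from syntax comes from Apache 2.2’s mod_access_compat. On Apache 2.4 and later, the equivalent block is usually written with Require directives instead (the IP is a placeholder):

<RequireAll>
    Require all granted
    Require not ip 123.123.123.123
</RequireAll>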

Final thoughts

Most of the time, the “indexed, though blocked by robots.txt” warning results from a robots.txt block. Hopefully, this guide helped you find and fix the issue if that wasn’t the case for you.

Have questions? Let me know on Twitter.


