customization - Setting custom search engine indexing for a "dynamic WordPress page" with htaccess


I have been searching for hours but haven't found the code for my requirement. Can it be done with .htaccess?

Requirement - These URLs of a page should be indexed by Google:

  1. example/example-page
  2. example/example-page?para_a=1&para_b=1
  3. example/example-page?para_a=2&para_b=2

Requirement - But this page with all other parameters in the URL should NOT be indexed by Google.

I found these code parts:

<If "%{QUERY_STRING} =~ /foo=bar/">
...
</If>

.

<If "req('Host') != 'www.wooga'">
...
</If>

.

<Directory /var/www/html/>
...
</Directory>

.

Header set X-Robots-Tag "noindex"

I have no idea which combination of code will work for me. Please tell me if anyone knows. Thanks.

asked Oct 3, 2019 at 22:37 by proseosoc; edited Oct 4, 2019 at 16:15 by MrWhite
  • 1 "this page with all other parameters in url" - Is this still a valid page or should/could this simply return a 404 (to not even hit WordPress)? Which version of Apache are you on? Is the order of these URL parameters always para_a=<value>&para_b=<value> or could they be in any order? – MrWhite Commented Oct 3, 2019 at 23:08
  • 1 @MrWhite Thanks for the reply. All of the URL parameters return a valid working page, but I want to hide all URLs from search engines except a few. Apache/2.4.34 – proseosoc Commented Oct 3, 2019 at 23:16
  • 1 @MrWhite The number of parameters can be more or fewer (as requested), and they can be in any order. – proseosoc Commented Oct 3, 2019 at 23:24
  • I know some regex, but I don't know much about .htaccess – proseosoc Commented Oct 3, 2019 at 23:42

1 Answer

You'll need the Header set ... directive, but you need to set it conditionally based on the URL. One way of doing this is to use mod_rewrite to set an environment variable (e.g. ROBOTS_INDEX) when your URL criteria are met (for the URLs you want indexed), and then use the env= argument of the Header directive to set the X-Robots-Tag header only when this env var is not set.

I found it easier to express the logic this way, rather than checking for the URLs that you don't want indexed, setting the opposite env var (e.g. ROBOTS_NOINDEX), and sending the response header when that var is set. That said, the opposite approach may be worth researching further.

You'll need to use mod_rewrite, as opposed to mod_setenvif, to set the env var, since you need to examine the query string portion of the URL. (Of the URL itself, the SetEnvIf directive can only examine the URL-path, via Request_URI, which does not include the query string.)

The complication is that these parameters can be in any order, and there can be additional, unrelated parameters that need to be ignored. Also, the URL parameter values cannot be mixed, i.e. para_a=1&para_b=2 is presumably a "noindex" situation.

  1. Set the env var ROBOTS_INDEX when the URLs you want indexed are requested. Note that these mod_rewrite directives must go before the WordPress front-controller, i.e. before the # BEGIN WordPress section.

    # INDEXABLE: Any request that does not include a query string
    # Includes /example-page (no query string at all)
    RewriteCond %{QUERY_STRING} ^$
    RewriteRule ^ - [E=ROBOTS_INDEX:1]
    
    # INDEXABLE: /example-page?para_a=1&para_b=1 (parameters in any order)
    RewriteCond %{QUERY_STRING} (^|&)para_a=1($|&)
    RewriteCond %{QUERY_STRING} (^|&)para_b=1($|&)
    RewriteRule ^example-page$ - [E=ROBOTS_INDEX:1]
    
    # INDEXABLE: /example-page?para_a=2&para_b=2 (parameters in any order)
    RewriteCond %{QUERY_STRING} (^|&)para_a=2($|&)
    RewriteCond %{QUERY_STRING} (^|&)para_b=2($|&)
    RewriteRule ^example-page$ - [E=ROBOTS_INDEX:1]
    
  2. Conditionally set the X-Robots-Tag header when the env var is not set. Note the ! negation prefix on the env var.

    Header set X-Robots-Tag "noindex" env=!ROBOTS_INDEX
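
To sanity-check the matching logic before touching the server, the conditions can be modelled outside Apache. The following is only a rough Python sketch of the rules as written, not something Apache runs: the function names (has_param, is_indexable, robots_header) and the INDEXABLE_PAIRS list are hypothetical, and the path is assumed to be given the way a RewriteRule pattern in a root .htaccess file sees it, i.e. without the leading slash.

```python
import re

# Allowed (para_a, para_b) value pairs, taken from the question.
INDEXABLE_PAIRS = [("1", "1"), ("2", "2")]


def has_param(query: str, name: str, value: str) -> bool:
    """Mirror the RewriteCond pattern (^|&)name=value($|&)."""
    pattern = r"(^|&)" + re.escape(name) + "=" + re.escape(value) + r"($|&)"
    return re.search(pattern, query) is not None


def is_indexable(path: str, query: str) -> bool:
    """Return True when the rules would set ROBOTS_INDEX."""
    # First rule: any request without a query string.
    if query == "":
        return True
    # Remaining rules apply to example-page only and need both
    # parameters present with values from the same allowed pair.
    if path != "example-page":
        return False
    return any(
        has_param(query, "para_a", a) and has_param(query, "para_b", b)
        for a, b in INDEXABLE_PAIRS
    )


def robots_header(path: str, query: str):
    """Return the X-Robots-Tag value that would be sent, or None."""
    return None if is_indexable(path, query) else "noindex"
```

Calling robots_header("example-page", "para_b=2&foo=bar&para_a=2") returns None (indexable, since order and unrelated parameters do not matter), whereas robots_header("example-page", "para_a=1&para_b=2") returns "noindex" because the values are mixed.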
    

That said, I suspect there is a better "WordPress" way of doing this that does not involve .htaccess.
