Na navigaci | Klávesové zkratky

Are these URLs the same?

A question that many webmasters ask: do search engines perceive these URLs as the same? How should they be treated?

  • http://example.com/article
  • http://example.com/article/
  • http://example.com/Article
  • https://example.com/article
  • http://www.example.com/article
  • http://example.com/article?a=1&b=2
  • http://example.com/article?b=2&a=1

The short answer would be: “URLs are different.” However, a more detailed analysis is needed.

From a user's perspective, these addresses differ only in minor details which they generally disregard. Thus, they perceive them as the same, although technically, they are different addresses. Let's call them similar addresses. For the sake of “user experience”, two principles should be adhered to:

  1. Do not allow different content on similar addresses. As I will show soon, this would not only confuse users but also search engines.
  2. Allow users access through similar addresses.

If the addresses differ in protocol http / https or with www domain or without, search engines consider them different. Not so for users. It would be a fatal mistake to place different content on such similar addresses. However, it would also be a mistake to prevent access through a similar address. The address with www and without www must both function, with SEO recommending sticking to one variant and redirecting the others to it using a 301 HTTP code. This can be managed for the www subdomain with a .htaccess file:

# redirection to the non-www variant
RewriteCond %{HTTP_HOST} ^www\.
RewriteRule ^.*$   http://example.com/$0  [R=301,NE,L]

# redirection to the www variant
RewriteCond %{HTTP_HOST} !^www\.
RewriteRule ^.*$   http://www.example.com/$0  [R=301,NE,L]

Immediately test whether your servers redirect, including the full address and correct parameter passing. Don't forget variants like www.subdomain.example.cz. Because some browsers can bypass missing redirections, try a low-level service like Web-Sniffer.

URLs are case-sensitive except for the scheme and domain. However, users do not differentiate and therefore, it is unfortunate to offer different content on addresses differing only by letter case. A poor example can be seen in Wikipedia:

Bing amusingly suffers from an error, returning the same URL whether you search for acid or a database (although the textual description is correct). Google and Yahoo do not have this issue.

Bing does not differentiate between acid and database

Some services (webmails, ICQ) convert uppercase letters in URLs to lowercase, which are all reasons to avoid distinguishing letter size, even in parameters. Better adhere to the convention that all letters in URLs should be lowercase.

Distinguishing some similar addresses is also a challenge for search engines. I conducted an experiment by placing different content on URLs differing in details like the presence of a trailing slash or parameter order. Only Google was able to index them as different. Other search engines could always handle only one of the variants.

Only Google can index these pages as different

As for trailing slashes, the web server usually redirects to the canonical form for you; if you access a directory without a trailing slash, it adds one and redirects. Of course, this does not apply when you manage URIs on your own (Cool URIs, etc.)

Finally: does the order of parameters really matter? There should be no difference between article?a=1&b=2 and article?b=2&a=1. However, there are situations where this is not the case, especially when passing complex structures such as arrays. For instance, ?sort[]=name&sort[]=city might be different from ?sort[]=city&sort[]=name. Nevertheless, redirecting if parameters are not in the specified order would be considered unnecessary overcorrection.

p.s. Nette Framework automatically handles redirection to canonical URLs on its own.

You might be interested in


phpFashion © 2004, 2024 David Grudl | o blogu

Ukázky zdrojových kódů smíte používat s uvedením autora a URL tohoto webu bez dalších omezení.