Are these URLs the same?
A question that many webmasters ask: do search engines perceive these URLs as the same? How should they be treated?
http://example.com/article
http://example.com/article/
http://example.com/Article
https://example.com/article
http://www.example.com/article
http://example.com/article?a=1&b=2
http://example.com/article?b=2&a=1
The short answer would be: “URLs are different.” However, a more detailed analysis is needed.
From a user's perspective, these addresses differ only in minor details which they generally disregard. Thus, they perceive them as the same, although technically, they are different addresses. Let's call them similar addresses. For the sake of “user experience”, two principles should be adhered to:
- Do not allow different content on similar addresses. As I will show soon, this would not only confuse users but also search engines.
- Allow users access through similar addresses.
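To make the notion of similar addresses concrete, here is a minimal Python sketch that reduces a URL to a comparison key by ignoring the protocol, the www prefix, letter case, and a trailing slash. The function name similarity_key is hypothetical, and the query string is deliberately left untouched, since parameter order is a separate question. It is meant for spotting similar addresses on your own site, not as a description of how any search engine compares URLs.

```python
from urllib.parse import urlsplit


def similarity_key(url: str) -> str:
    """Reduce a URL to a key that ignores details users tend to overlook."""
    parts = urlsplit(url)

    # The protocol (http / https) is dropped entirely from the key.
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]

    # Ignore letter case and a trailing slash in the path.
    path = parts.path.lower().rstrip("/") or "/"

    # The query string is kept verbatim (assumption: parameter order is
    # handled separately).
    query = "?" + parts.query if parts.query else ""
    return host + path + query


urls = [
    "http://example.com/article",
    "http://example.com/article/",
    "http://example.com/Article",
    "https://example.com/article",
    "http://www.example.com/article",
]

# All five variants collapse to the same key.
print({similarity_key(u) for u in urls})  # {'example.com/article'}
```

Two addresses that share a key should either serve the same content or redirect to a single variant.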
If the addresses differ in protocol (http / https) or in the www subdomain (with or without it), search engines consider them different. Not so for users. It would be a fatal mistake to place different content on such similar addresses. However, it would also be a mistake to prevent access through a similar address. The addresses with www and without www must both work, and the SEO recommendation is to stick to one variant and redirect the others to it with HTTP status code 301. For the www subdomain, this can be handled with a .htaccess file:
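    # the rewrite engine must be enabled, unless it already is elsewhere
    RewriteEngine On

    # redirection to the non-www variant
    RewriteCond %{HTTP_HOST} ^www\.
    RewriteRule ^.*$ http://example.com/$0 [R=301,NE,L]

    # redirection to the www variant
    # (use only one of the two variants; enabling both causes a redirect loop)
    RewriteCond %{HTTP_HOST} !^www\.
    RewriteRule ^.*$ http://www.example.com/$0 [R=301,NE,L]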
Immediately test whether your server redirects correctly, preserving the full address and passing the parameters along. Don't forget variants like www.subdomain.example.cz. Because some browsers can mask a missing redirect, use a low-level service like Web-Sniffer.
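If you would rather script such a low-level check, the following Python sketch sends a single request without following redirects and compares the response with what you expect. The function names fetch_redirect and redirect_is_correct are hypothetical, and the URLs in the usage comment are placeholders. Treat it as a sketch under those assumptions rather than a complete testing tool.

```python
import http.client
from urllib.parse import urlsplit


def fetch_redirect(url: str, timeout: float = 10.0):
    """Send one GET request and return (status, Location header).

    http.client never follows redirects on its own, so the raw
    response of the server is visible.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout)
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("GET", target)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def redirect_is_correct(status, location, expected_url: str) -> bool:
    """True only for a 301 whose target keeps the full path and parameters."""
    return status == 301 and location == expected_url


# Usage (performs a real network request, so it is left commented out):
# status, location = fetch_redirect("http://www.example.com/article?a=1&b=2")
# print(redirect_is_correct(status, location, "http://example.com/article?a=1&b=2"))
```

Comparing the whole Location value, not just the host, catches redirects that drop the path or the query string.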
URLs are case-sensitive except for the scheme and domain. However, users do not make that distinction, so it is unwise to offer different content on addresses that differ only in letter case. Wikipedia provides a bad example:
- http://en.wikipedia.org/wiki/Acid about acids
- http://en.wikipedia.org/wiki/ACID about database transactions
Amusingly, Bing gets this wrong: it returns the same URL whether you search for acid or a database (although the textual description is correct). Google and Yahoo do not have this issue.
Bing does not differentiate between acid and database
Some services (webmails, ICQ) convert uppercase letters in URLs to lowercase. These are all reasons to avoid distinguishing letter case, even in parameters. It is better to follow the convention that all letters in URLs are lowercase.
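A quick way to audit an existing site for this problem is to group its paths by their lowercase form and report every group with more than one spelling. The sketch below assumes you already have a list of paths (for example from a sitemap); the function name case_collisions and the sample data are made up for illustration.

```python
from collections import defaultdict


def case_collisions(paths):
    """Return paths that differ only in letter case, grouped by lowercase form."""
    groups = defaultdict(set)
    for path in paths:
        groups[path.lower()].add(path)
    # Keep only the groups that contain more than one spelling.
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}


paths = ["/wiki/Acid", "/wiki/ACID", "/wiki/Base"]
print(case_collisions(paths))  # {'/wiki/acid': ['/wiki/ACID', '/wiki/Acid']}
```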
Distinguishing some similar addresses is also a challenge for search engines. I conducted an experiment by placing different content on URLs differing in details like the presence of a trailing slash or parameter order. Only Google was able to index them as different. The other search engines always indexed only one of the variants.
Only Google can index these pages as different
As for trailing slashes, the web server usually redirects to the canonical form for you: if you access a directory without a trailing slash, it adds one and redirects. Of course, this does not apply when you manage URIs on your own (Cool URIs, etc.).
Finally: does the order of parameters really matter? There should be no difference between article?a=1&b=2 and article?b=2&a=1. However, there are situations where this is not the case, especially when passing complex structures such as arrays. For instance, ?sort[]=name&sort[]=city might be different from ?sort[]=city&sort[]=name. Nevertheless, redirecting if parameters are not in the specified order would be considered unnecessary overcorrection.
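The difference between plain parameters and arrays can be illustrated with a short Python sketch. It relies on parse_qs from the standard library, which returns a dictionary (so the order of distinct names does not matter) whose values are lists (so the order of repeated names does). The helper name params_equivalent is hypothetical, and how a particular application actually interprets sort[] is an assumption.

```python
from urllib.parse import parse_qs


def params_equivalent(query_a: str, query_b: str) -> bool:
    """Compare two query strings, ignoring only the order of distinct names."""
    return (parse_qs(query_a, keep_blank_values=True)
            == parse_qs(query_b, keep_blank_values=True))


# Distinct names: the order is irrelevant.
print(params_equivalent("a=1&b=2", "b=2&a=1"))  # True

# Repeated names (an array): the order is part of the value.
print(params_equivalent("sort[]=name&sort[]=city",
                        "sort[]=city&sort[]=name"))  # False
```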
P.S. Nette Framework handles redirection to canonical URLs automatically.