phpFashion

Category: HTML & co

SASS, LESS, Stylus or pure CSS? (3)

The journey into the heart of the three best-known CSS preprocessors continues, though not in the way I originally planned.

A CSS preprocessor is a tool that takes code written in its own syntax and generates CSS for the browser. The most popular preprocessors are SASS, LESS, and Stylus. We have already talked about installation and about syntax + mixins. Each of the three preprocessors approaches mixins in a fundamentally different way.

Each of them has a gallery of ready-made mixins: for SASS there is the comprehensive Compass, LESS has the Twitter Bootstrap framework or the small Elements, and Stylus has NIB.

… these were the opening sentences of an article I started writing a year and a quarter ago and never finished. I came to the conclusion that all three preprocessors are useless: they required so many compromises that the potential benefits seemed insignificant. Today I will explain why.

…read more


Are these URLs the same?

A question that many webmasters ask: do search engines perceive these URLs as the same? How should they be treated?

  • http://example.com/article
  • http://example.com/article/
  • http://example.com/Article
  • https://example.com/article
  • http://www.example.com/article
  • http://example.com/article?a=1&b=2
  • http://example.com/article?b=2&a=1

The short answer would be: “URLs are different.” However, a more detailed analysis is needed.

From a user's perspective, these addresses differ only in minor details which they generally disregard. Thus, they perceive them as the same, although technically, they are different addresses. Let's call them similar addresses. For the sake of “user experience”, two principles should be adhered to:

  1. Do not allow different content on similar addresses. As I will show soon, this would not only confuse users but also search engines.
  2. Allow users access through similar addresses.

If addresses differ in the protocol (http vs. https) or in the presence of the www subdomain, search engines consider them different; users do not. It would be a fatal mistake to place different content on such similar addresses, but it would equally be a mistake to prevent access through a similar address. Both the www and the non-www address must work, and SEO practice recommends sticking to one variant and redirecting the others to it with an HTTP 301 code. For the www subdomain, this can be managed with a .htaccess file (pick one of the two rules):

RewriteEngine On

# variant A: redirect to the non-www domain
RewriteCond %{HTTP_HOST} ^www\.
RewriteRule ^.*$   http://example.com/$0  [R=301,NE,L]

# variant B: redirect to the www domain
RewriteCond %{HTTP_HOST} !^www\.
RewriteRule ^.*$   http://www.example.com/$0  [R=301,NE,L]

Test right away whether your servers redirect correctly, including the full address and proper parameter passing. Don't forget variants like www.subdomain.example.cz. And because browsers can sometimes mask a missing redirect, try a low-level service such as Web-Sniffer.
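
You can also check from code. The following is a minimal PHP sketch (the tested URL is a made-up example): get_headers() fetches the raw response headers, and disabling follow_location lets you see the redirect itself rather than its target.

<?php
// test a redirect without following it; the URL below is hypothetical
$url = 'http://www.example.com/article?a=1&b=2';

$context = stream_context_create(['http' => ['follow_location' => 0]]);
$headers = get_headers($url, true, $context);

echo $headers[0], "\n";                        // e.g. HTTP/1.1 301 Moved Permanently
echo $headers['Location'] ?? '(no redirect)';  // target URL including parameters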

URLs are case-sensitive except for the scheme and the domain. Users, however, make no such distinction, so it is unfortunate to offer different content on addresses differing only in letter case. A poor example can be seen on Wikipedia:

Bing amusingly suffers from an error here, returning the same URL whether you are looking for the acid or the database term (although the textual description is correct). Google and Yahoo do not have this issue.

Bing does not differentiate between acid and database

Some services (webmails, ICQ) convert uppercase letters in URLs to lowercase, which is one more reason not to distinguish letter case, even in parameters. Better to adhere to the convention that all letters in URLs are lowercase.
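
Enforcing the convention server-side is straightforward. What follows is just a sketch of one possible approach in PHP, redirecting any request whose path contains uppercase letters to the lowercased variant:

<?php
// a sketch: permanently redirect paths containing uppercase letters
$path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);

if ($path !== strtolower($path)) {
    // the query string is left untouched here; lowercase parameter
    // values only if you know it is safe for your application
    $query = $_SERVER['QUERY_STRING'] ?? '';
    header('Location: ' . strtolower($path) . ($query === '' ? '' : '?' . $query), true, 301);
    exit;
}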

Distinguishing some similar addresses is also a challenge for search engines. I conducted an experiment by placing different content on URLs differing in details like the presence of a trailing slash or parameter order. Only Google was able to index them as different. Other search engines could always handle only one of the variants.

Only Google can index these pages as different

As for trailing slashes, the web server usually redirects to the canonical form for you: if you access a directory without the trailing slash, it adds one and redirects. Of course, this does not apply when you route URIs on your own (Cool URIs, etc.).

Finally: does the order of parameters really matter? There should be no difference between article?a=1&b=2 and article?b=2&a=1. However, there are situations where this is not the case, especially when passing complex structures such as arrays. For instance, ?sort[]=name&sort[]=city might be different from ?sort[]=city&sort[]=name. Nevertheless, redirecting if parameters are not in the specified order would be considered unnecessary overcorrection.
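
If you ever need a canonical parameter order internally (say, for cache keys rather than for redirects), sorting the top-level parameters is enough. canonicalQuery() below is an illustrative helper of my own, not an existing function:

<?php
// normalize query-parameter order; the element order inside arrays
// like sort[] is deliberately preserved, since it may carry meaning
function canonicalQuery(string $query): string
{
    parse_str($query, $params);
    ksort($params); // sorts only the top-level parameter names
    return http_build_query($params);
}

echo canonicalQuery('b=2&a=1'); // prints a=1&b=2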

P.S. The Nette Framework handles redirection to canonical URLs automatically.


Finally, the Truth About XHTML and HTML

I recently participated in a discussion that reminded me (again) of the deeply entrenched myths regarding the differences between HTML and XHTML. The campaign for the formats with the letter “X” was accompanied by great emotions, which usually do not go hand in hand with a clear head. Although the enthusiasm has long since faded, a significant part of the professional community and authors still believe a number of misconceptions.

In this article, I will attempt to bury the biggest of these myths in the following way. This article will contain only facts. I will save my opinions and your comments for a second article.

In the text below, by HTML I mean the version HTML 4.01, and by XHTML I mean the version XHTML 1.0 Second Edition. For completeness, I add that HTML is an application of the SGML language, while XHTML is an application of the XML language.

Myth: HTML allows tag crossing

Not at all. Tag crossing is directly prohibited in SGML, and consequently in HTML. This fact is mentioned, for example, in the W3C recommendation: “…overlapping is illegal in SGML…”. All these markup languages perceive the document as a tree structure, and therefore it is not possible to cross tags.

I am also responding to a reformulation of the myth: “The advantage of XHTML is the prohibition of crossing tags.” This is not the case; tags cannot be crossed in any existing version of HTML or XHTML.

Myth: XHTML banned presentation elements and introduced CSS

Not at all. XHTML contains the same set of elements as HTML 4.01. This is mentioned right in the first paragraph of the XHTML specification: “The meaning of elements and their attributes is defined in the W3C recommendation for HTML 4.” From this perspective, there is no difference between XHTML and HTML.

Some elements and attributes were already deprecated in HTML 4.01, with presentation elements forbidden in favor of CSS. This also answers the second part of the myth: the arrival of cascading styles is unrelated to XHTML, having occurred earlier.

Myth: HTML parser must guess tag endings

Not at all. In HTML, for a defined group of elements, the ending or starting tag may optionally be omitted, and only for elements where the omission cannot cause ambiguity. As an example, take the ending tag of the p element. Since the standard states that a paragraph cannot appear inside another paragraph, it is clear from the notation…

<p>....
<p>....

…that opening the second paragraph must close the first one, so stating the ending tag is redundant. The div element, on the other hand, can be nested within itself, which is why both its starting and ending tags are required.
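
This determinism is easy to check with any parser that knows the document type definition; PHP's DOMDocument (built on libxml2's HTML parser) is one such, and this little experiment is merely an illustration:

<?php
// the second <p> implies the end of the first; no guessing involved
$dom = new DOMDocument;
$dom->loadHTML('<p>first<p>second');

echo $dom->getElementsByTagName('p')->length; // prints 2 (two sibling paragraphs)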

Myth: HTML attribute notation is ambiguous

Not at all. XHTML always requires enclosing attribute values in quotes or apostrophes. HTML also requires this, except if the value consists of an alphanumeric string. For completeness, I add that even in these cases, the specification recommends using quotes.

Thus, in HTML it is permissible to write <textarea cols=20 rows=30>, which is formally as unambiguous as <textarea cols="20" rows="30">. If the value contained multiple words, HTML insists on using quotes.

Myth: HTML document is ambiguous

Not at all. The reasons given for ambiguity are either the possibility of crossing tags and the ambiguity of attributes written without quotes (both myths debunked above), or the possibility of omitting some tags. Here I repeat that the group of elements whose tags can be omitted was chosen so that only redundant information is omitted.

Thus, an HTML document is always unambiguously determined.

Myth: Only in XHTML is the ‘&’ character written as ‘&amp;’

Not at all; it must be written that way in HTML as well. In both languages, the characters < and & have a special meaning: the first opens a tag, the second an entity. To prevent them from being understood in this meta-meaning, they must be written as entities (&lt; and &amp;). This applies to HTML too, as the specification states.
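
In practice, this means escaping output identically for both languages. In PHP, for instance, htmlspecialchars() converts both characters (this snippet is purely illustrative):

<?php
// & becomes &amp; and < becomes &lt;, in HTML just as in XHTML
echo htmlspecialchars('Tom & Jerry <b>');
// prints: Tom &amp; Jerry &lt;b&gt;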

Myth: HTML allows ‘messes’ that would not pass in XHTML

Not at all. This view is rooted in a series of myths that I have already refuted above. I haven't yet mentioned that XHTML, unlike HTML, is case sensitive for element and attribute names. However, this is a completely legitimate feature of the language. In this way, Visual Basic differs from C#, and it cannot objectively be said that one or the other approach is worse. HTML code can be made confusing by inappropriately mixing upper and lower case (<tAbLe>), XML code can also be confusing by using strings like id, ID, Id for different attributes.

The clarity of the notation in no way relates to the choice of one language over the other.

Myth: Parsing XHTML is much easier

Not at all. Comparing them would be subjective and therefore has no place in this article, but objectively, there is no reason why one parser should have a significantly easier time. Each has its own set of challenges.

Parsing HTML is conditioned by the parser knowing the document type definition. The first reason is the existence of optional tags: although completing them is unambiguous (see above) and algorithmically easy, the parser must know the respective definitions. The second reason concerns empty elements; that an element is empty is known to the parser only from the definition.

Parsing XHTML is complicated by the fact that the document can (unlike HTML) contain an internal DTD subset defining its own entities (see example). I add that an “entity” does not have to represent just a single character, but an arbitrarily long segment of XHTML code (possibly containing further entities). Without processing the DTD and verifying its correctness, we cannot speak of parsing XHTML. Furthermore, syntactically, DTD is essentially the exact opposite of the XML language.
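
For illustration, here is a sketch of such a document parsed with PHP's DOMDocument; the LIBXML_NOENT flag asks libxml to substitute entities while loading (the author entity is my made-up example):

<?php
// an XML document whose internal DTD subset defines a custom entity
$xml = <<<'XML'
<?xml version="1.0"?>
<!DOCTYPE html [
    <!ENTITY author "<em>David</em>">
]>
<html><body><p>Written by &author;</p></body></html>
XML;

$dom = new DOMDocument;
$dom->loadXML($xml, LIBXML_NOENT); // substitute entities during parsing

echo $dom->getElementsByTagName('em')->item(0)->textContent; // prints David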

In summary: both HTML and XHTML parsers must know the document type definition. The XHTML parser additionally must be able to read it in DTD language.

Myth: Parsing XHTML is much faster

Given the syntactic similarity of the two languages, parsing speed is determined only by the skill of the programmers of the individual parsers. In any case, the time required for machine processing of a typical web page (whether HTML or XHTML) on a regular computer is imperceptible to a human.

Myth: HTML parser must always cope

Not at all. The HTML specification does not dictate how an application should behave in case of processing an erroneous document. Due to competitive pressures in the real world, browsers have become completely tolerant of faulty HTML documents.

It is different in the case of XHTML. The specification, by referring to XML, dictates that the parser must not continue processing the logical structure of the document once it encounters an error. Yet, again due to competitive pressures in the real world, RSS readers have become tolerant of faulty XML documents (RSS is an application of XML, just like XHTML).

If we were to deduce something negative about HTML from the tolerance of web browsers, then we must necessarily deduce something negative about XML from the tolerance of RSS readers. Objectively, the draconian approach of XML to errors in documents is utopian.
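
The difference in philosophy can be demonstrated with PHP's DOMDocument, which wraps both a tolerant HTML parser and a strict XML parser (a sketch; libxml_use_internal_errors() only keeps the parser warnings from being printed):

<?php
libxml_use_internal_errors(true); // collect parse errors instead of printing them

$broken = '<p>unclosed <b>bold</p>'; // the <b> element is never closed

$html = new DOMDocument;
var_dump($html->loadHTML($broken)); // bool(true): the HTML parser recovers

$xml = new DOMDocument;
var_dump($xml->loadXML($broken)); // bool(false): the XML parser must stop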

Conclusion?

If your mind is no longer burdened by any of the myths mentioned above, you can better perceive the difference between HTML and XHTML. Or rather, you can better perceive that there is no difference. The real difference occurs a level higher: it is the departure from SGML and the transition to the new XML.

Unfortunately, it cannot be said that XML only solves the problems of SGML and adds no new ones. I have encountered two in this article alone. One of them is the draconian processing of errors in XML, which is not in line with practice, and the other is the existence of a different DTD language inside XML, which complicates parsing and the understandability of XML documents. Moreover, the expressive capability of this language is so small that it cannot formally cover even XHTML itself, so some features must be defined separately. For a language not bound by historical shackles, this is a sad and striking finding. However, criticism of XML is a topic for a separate article.

(If I encounter more myths, I will gradually update the article. If you want to refer to them, you can take advantage of the fact that each headline has its own ID)


The stretched buttons problem in IE

As you might know, web forms have to be styled with care, since their native look is often the best you can achieve.

That said, sometimes even the default look has its bugs. A truly flagrant one concerns buttons in Internet Explorer (including version 7) on Windows XP. If the button's caption is too long, the browser produces this nasty thing:

…read more


The magic with conditional comments

This article is actually an answer to an e-mail from Honza Bien, who asked me about the manipulations I was doing with conditional comments. The generally accepted idea is that one kind of comment (downlevel-hidden) is valid, while the other (downlevel-revealed) is not. I tried to adapt the invalid comments so that they would be valid. I'll explain the whole process.

…read more



phpFashion © 2004, 2025 David Grudl | about the blog

You may use the source code samples provided you credit the author and the URL of this site; no further restrictions apply.