Please roll out the fanfare as Latte 3 enters the scene with a completely rewritten compiler. This new version represents the biggest developmental leap in Nette's history.
Why Latte, Exactly?
Latte has an intriguing history. Originally, it wasn’t meant to be taken seriously. In fact, it was supposed to demonstrate that no templating system was needed in PHP. It was tightly integrated with presenters in Nette, but it wasn’t enabled by default and programmers had to activate it using its then-awkward name, CurlyBracketsFilter.
The turning point came with the idea that a templating system could actually understand HTML pages. Let me explain. For other templating systems, the text around tags is just noise without any meaning. Whether it's an HTML page, CSS style, or even text in Markdown, the templating engine only sees a cluster of bytes. Latte, on the other hand, understands the document. This brings many significant advantages, from convenience features like n:attributes to ultimate security.
Latte knows which escaping function to use in each context. Most programmers don’t, but thanks to Latte that doesn’t matter and doesn’t open a security hole such as cross-site scripting (XSS). It prevents printing strings that could be dangerous in certain contexts. It can even prevent a frontend framework from misinterpreting mustache brackets. And security experts will have nothing to complain about :)
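As a rough illustration of what context-aware escaping means, here is a minimal PHP sketch. The `escapeFor()` helper and its context names are hypothetical and are not Latte's API. Latte works out the context on its own while reading the template, whereas this sketch has to be told.

```php
<?php
// Hypothetical helper for illustration only, not part of Latte.
// The same value needs a different escaper depending on where it is printed.
function escapeFor(string $context, mixed $value): string
{
    return match ($context) {
        // Text between HTML tags: quotes are harmless here
        'html' => htmlspecialchars((string) $value, ENT_NOQUOTES | ENT_SUBSTITUTE, 'UTF-8'),
        // Value inside a quoted HTML attribute: quotes must be escaped too
        'attr' => htmlspecialchars((string) $value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8'),
        // Value inside a <script> block: emit a JS literal that cannot close the script element
        'js' => json_encode(
            $value,
            JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
        ),
        default => throw new InvalidArgumentException("Unknown context '$context'"),
    };
}

$name = 'Tom & "Jerry" </script>';
echo escapeFor('html', $name), "\n";
echo escapeFor('attr', $name), "\n";
echo escapeFor('js', $name), "\n";
```

Using the HTML escaper inside a script block, or the other way around, is exactly the kind of mistake that a context-aware engine takes out of the programmer's hands.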
I wouldn't have expected this idea to put Latte a decade ahead of other systems, as to this day I only know of two that work this way. Besides Latte, there’s Google's Soy. Latte and Soy are the only truly secure templating systems for the web. (Of the perks mentioned above, though, Soy offers only the escaping.)
Another key feature of Latte is that it uses PHP for expressions within tags (sometimes referred to as macros), so the syntax is already familiar to the programmer. Developers don’t need to learn a new language or figure out how this or that is written in Latte. They just write it the way they know. By contrast, the popular templating system Twig uses Python-style syntax, where even basic constructs are written differently. For example, `foreach ($people as $person)` is written as `for person in people` in Python (and thus in Twig), which unnecessarily forces the brain to switch between two opposing conventions.
Thus, Latte adds so much value compared to its competitors that it makes sense to invest effort in its maintenance and development.
Current Compiler
Latte and its syntax were created 14 years ago (2008), with the current compiler following three years later. It already knew everything essential that is still used today, including blocks, inheritance, snippets, etc.
The compiler operated in a single pass: it parsed the template and transformed it directly into PHP code, which was written to the final compiled file. The PHP used in the tags (i.e., in macros) was tokenized and then went through several processes that modified the tokens. One process added quotation marks around identifiers. Another added syntactic perks that PHP did not support at the time (such as writing arrays with `[]` instead of `array()`, or the nullsafe operator `?->`) or still does not support (the short ternary operator that omits the else branch, filters such as `($var|upper|truncate)`, etc.).
These processes did not check the PHP syntax or the constructs used. This changed dramatically two years ago (2020) with the introduction of sandbox mode. The sandbox searches the tokens for possible function and method calls and modifies them, which is not simple. Any failure here is essentially a security flaw.
New Compiler
In the eleven years of developing Latte on top of this compiler, there were situations where the single-pass design was insufficient (such as when including a block that was not yet defined). All of these issues could be worked around, but it would be better to switch to a two-step compilation: first parse the template into an intermediate form, an AST (abstract syntax tree), and then generate the class code from it.
Also, with the gradual improvement of the PHP-like language used in the tags, the representation in tokens was no longer sufficient, and it would be ideal to parse it into an AST as well. Programming a sandbox over an AST is significantly easier and guarantees that it will be truly bulletproof.
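To see why a tree makes sandboxing easier than a flat token stream, consider this minimal sketch. The node classes and `checkSandbox()` are hypothetical, invented for illustration, and do not mirror Latte's internals. The point is only that every function call is an explicit node, so nothing has to be guessed from neighbouring tokens.

```php
<?php
// Hypothetical node types for illustration; these are not Latte's classes.
abstract class Node
{
    /** @return Node[] child nodes to visit */
    public function children(): array
    {
        return [];
    }
}

final class Literal extends Node
{
    public function __construct(public string $value) {}
}

final class FunctionCall extends Node
{
    /** @param Node[] $args */
    public function __construct(public string $name, public array $args = []) {}

    public function children(): array
    {
        return $this->args;
    }
}

/** Walks the whole tree and rejects any call that is not on the allow list. */
function checkSandbox(Node $node, array $allowed): void
{
    if ($node instanceof FunctionCall && !in_array(strtolower($node->name), $allowed, true)) {
        throw new RuntimeException("Function {$node->name}() is not allowed");
    }
    foreach ($node->children() as $child) {
        checkSandbox($child, $allowed);
    }
}

// Hand-built AST for the expression: strtoupper(system('ls'))
$ast = new FunctionCall('strtoupper', [new FunctionCall('system', [new Literal('ls')])]);

try {
    checkSandbox($ast, ['strtoupper', 'trim']);
} catch (RuntimeException $e) {
    echo $e->getMessage(), "\n"; // the nested system() call is rejected
}
```

The forbidden call is caught even though it is nested inside an allowed one, because the traversal visits every node rather than pattern-matching on tokens.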
It took me five years to get started with rewriting the compiler because I knew it would be extremely challenging. The mere tokenization of the template is a challenge, as it must run parallel to parsing. The parser must be able to influence the tokenization, for example, when it encounters the attribute n:syntax=off.
PHP 8.1 brings Fibers, which support running two pieces of code side by side in this way, but Latte does not use them yet in order to stay compatible with PHP 8.0. Instead, it uses similar coroutines built on generators (you won’t find them described in the PHP documentation, so see the Generator RFC). Under the hood, Latte performs magic.
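The idea of a parser steering its lexer can be sketched with a plain PHP generator used as a coroutine. This is a toy under stated assumptions, not Latte's code: `lexer()`, `parse()`, and the `{syntax off}` tag used as a stand-in for `n:syntax=off` are all simplified, hypothetical names. The lexer yields one token at a time, and the parser can send a value back that changes how the rest of the input is tokenized.

```php
<?php
// Toy lexer written as a generator. While tags are enabled, {...} is a tag;
// once the parser switches them off, everything that follows is plain text.
function lexer(string $input): Generator
{
    $tagsEnabled = true;
    $pos = 0;
    $len = strlen($input);
    while ($pos < $len) {
        $end = strpos($input, '}', $pos);
        if ($tagsEnabled && $input[$pos] === '{' && $end !== false) {
            $token = ['tag', substr($input, $pos + 1, $end - $pos - 1)];
            $pos = $end + 1;
        } else {
            // Text runs up to the next '{' or to the end of the input
            $next = strpos($input, '{', $pos + 1);
            $next = $next === false ? $len : $next;
            $token = ['text', substr($input, $pos, $next - $pos)];
            $pos = $next;
        }
        // Hand the token to the parser; it may send back a new mode
        $command = yield $token;
        if (is_bool($command)) {
            $tagsEnabled = $command;
        }
    }
}

// Toy parser: collects tokens and turns tag recognition off on {syntax off}.
function parse(string $input): array
{
    $lexer = lexer($input);
    $tokens = [];
    while ($lexer->valid()) {
        $token = $lexer->current();
        $tokens[] = $token;
        if ($token === ['tag', 'syntax off']) {
            $lexer->send(false); // resumes the lexer with tags disabled
        } else {
            $lexer->next();      // resumes the lexer without changing the mode
        }
    }
    return $tokens;
}

foreach (parse('Hi {$name}{syntax off}{ "json": 1 }') as [$type, $value]) {
    echo $type, ': ', $value, "\n";
}
```

Because the lexer only produces the next token when the parser asks for it, the final `{ "json": 1 }` is emitted as text rather than as a tag. Tokenizing the whole template up front would not allow that.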
However, writing a lexer and parser for a language as complex as the PHP dialect used in the tags seemed even more challenging. Essentially, it meant creating something like nikic/PHP-Parser for Latte. It also meant formalizing the grammar of this language.
Today I can say that I've managed to complete everything. Latte has the compiler I've long wished for. And not a single line of code from the original remains 🙂