
When Copilot Loses Direction: A Celebration of Shoddy Workmanship

A video from Microsoft, intended to be a dazzling demonstration of Copilot's capabilities, is instead a tragically comedic presentation of the decline in programming craftsmanship.

I'm referring to this video. It's supposed to showcase the abilities of GitHub Copilot, including how to use it to write a regular expression that finds <img> tags with the hero-image class. However, the original code being modified is as holey as Swiss cheese, something I would be embarrassed to use. Copilot gets carried away and, instead of correcting it, continues in the same vein.

The result is a regular expression that unintentionally matches other classes, tags, attributes, and so on. Worse still, it fails if the src attribute is listed before class.
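For comparison, both problems disappear once the markup is parsed instead of pattern-matched. The following is only a sketch and not what the video shows: the function name findHeroImages() is made up for illustration, and it assumes PHP's bundled DOM extension is available. Because the class attribute is tested as a whitespace-separated list, neither the attribute order nor additional classes matter.

```php
<?php
declare(strict_types=1);

/**
 * Returns the src of every <img> whose class list contains "hero-image".
 * (Hypothetical helper, for illustration only.)
 */
function findHeroImages(string $html): array
{
    $doc = new DOMDocument;
    // Real-world HTML is rarely valid; collect libxml warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Matches "hero-image" as a whole class token, not as a substring.
    $query = "//img[contains(concat(' ', normalize-space(@class), ' '), ' hero-image ')]";

    $sources = [];
    foreach ((new DOMXPath($doc))->query($query) as $img) {
        $sources[] = $img->getAttribute('src');
    }
    return $sources;
}
```

An image with src before class is found, while a class such as hero-image-small is not mistaken for hero-image.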

I write about this because this demonstration of shoddy work, especially considering the official nature of the video, is startling. How is it possible that none of the presenters or their colleagues noticed this? Or did they notice and decide it didn't matter? That would be even more disheartening. Teaching programming requires precision and thoroughness, without which incorrect practices can easily be propagated. The video was meant to celebrate the art of programming, but I see in it a bleak example of how the level of programming craftsmanship is falling into the abyss of carelessness.

Just to give a bit of a positive spin: the video does a good job of showing how Copilot and GPT work, so you should definitely give it a look 🙂

First Steps in OOP in PHP: Essentials You Need to Know

Are you looking to dive into the world of Object-Oriented Programming in PHP but don't know where to start? I have a new concise guide to OOP for you that will introduce you to all the concepts like class, extends, private, etc.

In this guide, you will learn about:

This guide is not intended to make you a master of writing clean code or to provide exhaustive information. Its goal is to quickly familiarize you with the basic concepts of OOP in current PHP and to give you factually correct information. Thus, it provides a solid foundation on which you can further build, such as applications in Nette.

As further reading, I recommend the detailed guide to proper code design. It is beneficial even for those who are proficient in PHP and object-oriented programming.

Compilation errors in PHP: why are they still a problem?

Programming in PHP has always been a bit of a challenge, but fortunately, it has undergone many changes for the better. Do you remember the times before PHP 7, when almost every error meant a fatal error, instantly terminating the application? In practice, this meant that any error could completely stop the application without giving the programmer a chance to catch it and respond appropriately. Tools like Tracy used magical tricks to visualize and log such errors. Fortunately, with the arrival of PHP 7, this changed. Errors now throw exceptions like Error, TypeError, and ParseError, which can be easily caught and handled.

However, even in modern PHP, there is a weak spot where it behaves the same as in its fifth version. I am talking about errors during compilation. These cannot be caught and immediately lead to the termination of the application. They are E_COMPILE_ERROR level errors. PHP generates around two hundred of them. It creates a paradoxical situation where loading a file with a syntax error in PHP, such as a missing semicolon, throws a catchable ParseError exception. However, if the code is syntactically correct but contains a compilation-detectable error (like two methods with the same name), it results in a fatal error that cannot be caught.

try {
    require 'path_to_file.php';
} catch (ParseError $e) {
    echo "Syntactic error in PHP file";
}

Unfortunately, we cannot internally verify compilation errors in PHP. There was a function php_check_syntax(), which, despite its name, detected compilation errors as well. It was introduced in PHP 5.0.0 but quickly removed in version 5.0.4 and has never been replaced since. To verify the correctness of the code, we must rely on a command-line linter:

php -l file.php

From the PHP environment, you can verify code stored in the variable $code like this:

$code = '... PHP code to verify ...';
$process = proc_open(
    PHP_BINARY . ' -l',
    [['pipe', 'r'], ['pipe', 'w'], ['pipe', 'w']],
    $pipes,
    null,
    null,
    ['bypass_shell' => true],
);
fwrite($pipes[0], $code);
fclose($pipes[0]);
$error = stream_get_contents($pipes[1]);
if (proc_close($process) !== 0) {
    echo 'Error in PHP file: ' . $error;
}
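The same idea can be wrapped in a reusable function. This is a minimal sketch under a few assumptions: lintPhpCode() is a hypothetical name, the script runs under the CLI so that PHP_BINARY is set, and the linter's messages are short enough that reading stdout and stderr one after the other cannot block. It uses the array form of the command (available since PHP 7.4), which starts the binary without a shell.

```php
<?php
declare(strict_types=1);

/**
 * Lints PHP source in a separate process.
 * Returns null when the code compiles, otherwise the linter's output.
 * (lintPhpCode is a hypothetical helper name, not part of PHP.)
 */
function lintPhpCode(string $code): ?string
{
    $process = proc_open(
        [PHP_BINARY, '-l'], // without a file name, the linter reads stdin
        [['pipe', 'r'], ['pipe', 'w'], ['pipe', 'w']],
        $pipes,
    );
    if (!is_resource($process)) {
        throw new RuntimeException('Unable to start the PHP linter.');
    }

    fwrite($pipes[0], $code);
    fclose($pipes[0]);

    // Depending on configuration, messages may arrive on stdout or stderr.
    $output = stream_get_contents($pipes[1]) . stream_get_contents($pipes[2]);
    fclose($pipes[1]);
    fclose($pipes[2]);

    // A non-zero exit code signals a syntax or compilation error.
    return proc_close($process) === 0 ? null : trim($output);
}
```

Unlike catching ParseError around require, this also reports compile-time problems such as two methods with the same name, at the price of starting a new process.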

However, the overhead of running an external PHP process to verify a single file is quite large. The good news is that PHP 8.3 allows verifying multiple files at once:

php -l file1.php file2.php file3.php

Why is the operator ?? sheer misfortune?

PHP users waited for the ?? operator for an incredibly long time, perhaps ten years. Today, I regret that it didn't take even longer.

Wait, what? Ten years? You're exaggerating, aren't you?

Really. Discussion started in 2004 under the name “ifsetor”. And it didn't make it into PHP until December 2015 in version 7.0. So almost 12 years.

Aha! Oh, man.

It's a pity we didn't wait longer. Because it doesn't fit into the current PHP.

PHP has made an incredible shift towards strictness since 7.0. Key moments:

introducing scalar types (which were approved just barely)
nullability (since 7.1, null became important, replacing the earlier return false)
saner comparisons (since 8.0; I don't understand how Nikic pushed this through so easily, JavaScript must be envious)
abolishing dynamic properties (the definitive end of old PHP)

The ?? operator simplified the annoying:

isset($somethingI[$haveToWriteTwice]) ? $somethingI[$haveToWriteTwice] : 'default value'

to just:

$write[$once] ?? 'default value'

But it did this at a time when the need to use isset() had greatly diminished. Today, we more often assume that the data we access exists. And if it doesn't exist, we damn well want to know about it.

But the ?? operator has the side effect of being able to detect null. Which is also the most common reason to use it:

$len = $this->length ?? 'default value'

Unfortunately, it also hides errors. It hides typos:

// always returns 'default value', do you know why?
$len = $this->lenght ?? 'default value'

In short, we got ?? at the exact moment when, on the contrary, we would most need to shorten this:

$somethingI[$haveToWriteTwice] === null
    ? 'default value'
    : $somethingI[$haveToWriteTwice]

It would be wonderful if PHP 9.0 had the courage to modify the behavior of the ?? operator to be a bit more strict. Make the “isset operator” really a “null coalesce operator”, as it is officially called by the way.

PHPStan with checkDynamicProperties: true helps you detect typos suppressed by the ?? operator.
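To make the difference tangible, here is a small sketch. The class Text and its method names are invented for this illustration. The first method contains the typo from above and silently falls back to the default, because ?? behaves like isset(). The second uses the explicit null comparison, where the same typo would raise an undefined property warning instead of being swallowed.

```php
<?php
declare(strict_types=1);

// Hypothetical class, used only to illustrate the behavior of ??.
final class Text
{
    public function __construct(
        private ?int $length = null,
    ) {
    }

    /** Typo in the property name: ?? hides it and always yields the default. */
    public function lengthWithTypo(): int|string
    {
        return $this->lenght ?? 'default value';
    }

    /** Strict variant: only a real null triggers the default. */
    public function lengthStrict(): int|string
    {
        return $this->length === null
            ? 'default value'
            : $this->length;
    }
}
```

Calling lengthWithTypo() on an object created with a length of 5 still returns the default, and nothing in the output hints that anything is wrong.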

Tabs Instead of Spaces as a Courtesy

You've probably encountered the “tabs vs. spaces” debate for indentation before. This argument has been around for ages, and both sides present their reasons:

Tabs:

Indenting is their purpose
Smaller files, as indentation takes up one character
You can set your own indentation width (more on this later)

Spaces:

Code will look the same everywhere, and consistency is key
Avoid potential issues in environments sensitive to whitespace

But what if it's about more than personal preference? ChaseMoskal recently posted a thought-provoking entry on Reddit titled Nobody talks about the real reason to use tabs instead of spaces that might open your eyes.

The Main Reason to Use Tabs

Chase describes his experience with implementing spaces at his workplace and the negative impacts it had on colleagues with visual impairments.

One of them was accustomed to using a tab width of 1 to avoid large indentations when using large fonts. Another uses a tab width of 8 because it suits him best on an ultra-wide monitor. For both, however, code with spaces poses a serious problem, requiring them to convert spaces to tabs before reading and back to spaces before committing.

For blind programmers who use Braille displays, each space represents one Braille cell. Therefore, if the default indentation is 4 spaces, a third-level indentation wastes 12 precious Braille cells even before the start of the code. On a 40-cell display, which is most commonly used with laptops, this is more than a quarter of the available cells, wasted without conveying any information.

Adjusting the width of indentation may seem trivial to us, but for some programmers, it is absolutely essential. And that’s something we simply cannot ignore.

By using tabs in our projects, we give them the opportunity for this adjustment.

Accessibility First, Then Personal Preference

Sure, not everyone can be persuaded to choose one side over the other when it comes to preferences. Everyone has their own. And we should appreciate the option to choose.

However, we must ensure that we consider everyone. We should respect differences and use accessible means. Like the tab character, for instance.

I think Chase put it perfectly when he mentioned in his post that “…there is no counterargument that comes close to outweighing the accessibility needs of our colleagues.”

Accessible First

Just as the “mobile first” methodology has become popular in web design, where we make sure that everyone has a great user experience regardless of device, we should strive for an “accessible first” environment in which everyone has the same opportunity to work with code, whether in employment or on an open-source project.

If tabs become the default choice for indentation, we remove one barrier. Collaboration will then be pleasant for everyone, regardless of their abilities. If everyone has the same opportunities, we can fully utilize our collective potential ❤️

This article is based on Default to tabs instead of spaces for an ‘accessible first’ environment. I read a similarly convincing post in 2008 and changed from spaces to tabs in all my projects that very day. It left a trace in Git, but the article itself has disappeared into the annals of history.

Add the `{texy}` Tag to Latte

As of version 3.1.6, the Texy library adds support for Latte 3 in the form of the {texy} tag. What can it do and how do you deploy it?

The {texy} tag represents an easy way to write directly in Texy syntax in Latte templates:

{texy}
You Already Know the Syntax
----------

No kidding, you know Latte syntax already. **It is the same as PHP syntax.**
{/texy}

Simply install the extension in Latte and pass it a Texy object configured as needed:

$texy = new Texy\Texy;
$latte = new Latte\Engine;
$latte->addExtension(new Texy\Bridges\Latte\TexyExtension($texy));

If there is static text between the {texy}...{/texy} tags, it is translated by Texy during template compilation and the result is stored in the compiled template. If the content is dynamic (i.e., there are Latte tags inside), Texy processes it each time the template is rendered.

If it is desirable to disable Latte tags inside, it can be done like this:

{texy syntax: off} ... {/texy}

In addition to the Texy object, a custom function can also be passed to the extension, thus allowing parameters to be passed from the template. For instance, we might want to pass the parameters locale and heading:

$processor = function (string $text, int $heading = 1, string $locale = 'cs'): string {
	$texy = new Texy\Texy;
	$texy->headingModule->top = $heading;
	$texy->typographyModule->locale = $locale;
	return $texy->process($text);
};

$latte = new Latte\Engine;
$latte->addExtension(new Texy\Bridges\Latte\TexyExtension($processor));

Parameters in the template are passed like this:

{texy locale: en, heading: 3}
...
{/texy}

If you want to format text stored in a variable using Texy, you can use a filter:

{$description|texy}

Latte 3: The Biggest Leap in Nette's History

Please roll out the fanfare as Latte 3 enters the scene with a completely rewritten compiler. This new version represents the biggest developmental leap in Nette's history.

Why Latte, Exactly?

Latte has an intriguing history. Originally, it wasn’t meant to be taken seriously. In fact, it was supposed to demonstrate that no templating system was needed in PHP. It was tightly integrated with presenters in Nette, but it wasn’t enabled by default and programmers had to activate it using its then-awkward name, CurlyBracketsFilter.

The turning point came with the idea that a templating system could actually understand HTML pages. Let me explain. For other templating systems, the text around tags is just noise without any meaning. Whether it's an HTML page, CSS style, or even text in Markdown, the templating engine only sees a cluster of bytes. Latte, on the other hand, understands the document. This brings many significant advantages, from convenience features like n:attributes to ultimate security.

Latte knows which escaping function to use (something most programmers don’t know, but thanks to Latte, it doesn’t matter and doesn’t create a security hole like Cross-site scripting). It prevents printing strings that could be dangerous in certain contexts. It can even prevent misinterpretation of mustache brackets by a frontend framework. And security experts will have nothing to complain about :)

I wouldn't have expected this idea to put Latte a decade ahead of other systems, as to this day I only know of two that work this way. Besides Latte, there’s Google's Soy. Latte and Soy are the only truly secure templating systems for the web. (Although Soy only has the escaping feature from the mentioned perks.)

Another key feature of Latte is that for expressions within tags (sometimes referred to as macros), it uses PHP. Thus, the syntax is familiar to the programmer. Developers don’t need to learn a new language. They don’t need to figure out how this or that is written in Latte. They just write it as they know how. By contrast, the popular templating system Twig uses Python syntax, where even basic constructs are written differently. For example, foreach ($people as $person) is written as for person in people in Python (and thus in Twig), which unnecessarily forces the brain to switch between two opposing conventions.

Thus, Latte adds so much value compared to its competitors that it makes sense to invest effort in its maintenance and development.

Current Compiler

Latte and its syntax were created 14 years ago (2008), with the current compiler following three years later. It already knew everything essential that is still used today, including blocks, inheritance, snippets, etc.

The compiler operated in a single-pass mode, meaning it parsed the template and directly transformed it into PHP code, which was compiled into the final file. The PHP language used in the tags (i.e., in macros) was tokenized and then underwent several processes that modified the tokens. One process added quotation marks around identifiers, another added syntactic perks that PHP did not know at the time (such as array writing with [] instead of array(), nullsafe operators ?->) or that are still unknown (short ternary operator, filters ($var|upper|truncate), etc).

These processes did not check the PHP syntax or the constructs used. This changed dramatically two years ago (2020) with the introduction of sandbox mode. The sandbox searches the tokens for possible function and method calls and modifies them, which is not simple. Any failure here is essentially a security flaw.

New Compiler

In the eleven years since the compiler was written, there were situations where the single-pass approach was insufficient (such as when including a block that was not yet defined). While all these issues could be resolved, it would be ideal to switch to a two-step compilation: first parsing the template into an intermediate form, an AST (abstract syntax tree), and then generating class code from it.

Also, with the gradual improvement of the PHP-like language used in the tags, the representation in tokens was no longer sufficient, and it would be ideal to parse it into an AST as well. Programming a sandbox over an AST is significantly easier and guarantees that it will be truly bulletproof.
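The two-step idea can be sketched on a toy scale. This is not Latte's code: the node classes, the function names, and the template language (plain text plus {$variable} tags) are all made up for illustration. The first pass builds nodes, the second pass generates PHP source from them, and any check or rewrite can be done on the nodes in between.

```php
<?php
declare(strict_types=1);

// Hypothetical node types for a toy language with only {$variable} tags.
final class TextNode
{
    public function __construct(public string $text)
    {
    }
}

final class PrintNode
{
    public function __construct(public string $variable)
    {
    }
}

/** Pass 1: parse the template into a flat list of nodes (a degenerate AST). */
function parseTemplate(string $template): array
{
    // With a single capture group, odd indexes hold the variable names.
    $parts = preg_split('~\{\$(\w+)\}~', $template, -1, PREG_SPLIT_DELIM_CAPTURE);
    $nodes = [];
    foreach ($parts as $i => $part) {
        if ($i % 2 === 1) {
            $nodes[] = new PrintNode($part);
        } elseif ($part !== '') {
            $nodes[] = new TextNode($part);
        }
    }
    return $nodes;
}

/** Pass 2: generate PHP source code from the nodes. */
function generateCode(array $nodes): string
{
    $code = '';
    foreach ($nodes as $node) {
        $code .= match (true) {
            $node instanceof TextNode => 'echo ' . var_export($node->text, true) . ";\n",
            $node instanceof PrintNode => 'echo htmlspecialchars((string) $' . $node->variable . ");\n",
        };
    }
    return $code;
}
```

Because the generator sees the whole node list before emitting anything, a construct that refers to something defined later in the template no longer needs special handling.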

It took me five years to get started with rewriting the compiler because I knew it would be extremely challenging. The mere tokenization of the template is a challenge, as it must run parallel to parsing. The parser must be able to influence the tokenization, for example, when it encounters the attribute n:syntax=off.

Support for running two pieces of code in parallel is brought by Fibers in PHP 8.1. However, Latte does not use them yet, in order to stay compatible with PHP 8.0. Instead, it uses similar coroutines (you won't find them described in the PHP documentation, so here's a link to the Generator RFC). Under the hood, Latte performs magic.
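How a generator can serve as such a coroutine is easiest to see on a miniature example. The sketch below is not taken from Latte. The tokenize() and parse() functions and the {syntax off} tag are invented stand-ins. The lexer yields one token at a time, and the parser answers through Generator::send(), which changes how the rest of the input is tokenized.

```php
<?php
declare(strict_types=1);

/** Lexer coroutine: yields [type, text] and accepts commands from the parser. */
function tokenize(string $input): Generator
{
    $tagsEnabled = true;
    $chunks = preg_split(
        '~(\{[^{}]*\})~',
        $input,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY,
    );
    foreach ($chunks as $chunk) {
        $looksLikeTag = str_starts_with($chunk, '{') && str_ends_with($chunk, '}');
        $type = ($tagsEnabled && $looksLikeTag) ? 'tag' : 'text';

        // Execution pauses here until the parser asks for the next token.
        $command = yield [$type, $chunk];
        if ($command === 'syntax-off') {
            $tagsEnabled = false; // everything that follows is plain text
        }
    }
}

/** Parser: consumes tokens and steers the lexer. */
function parse(string $input): array
{
    $result = [];
    $tokens = tokenize($input);
    $token = $tokens->current();
    while ($tokens->valid()) {
        [$type, $text] = $token;
        $result[] = "$type:$text";
        $command = ($type === 'tag' && $text === '{syntax off}') ? 'syntax-off' : null;
        // send() resumes the lexer and returns the next token it yields.
        $token = $tokens->send($command);
    }
    return $result;
}
```

For the input a{$x}{syntax off}{$y}, the last chunk comes out as text rather than as a tag, because the parser switched the lexer's mode while it was still running.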

However, writing a lexer and parser for a language as complex as the PHP dialect used in the tags seemed even more challenging. Essentially, it meant creating something like nikic/PHP-Parser for Latte. It also meant formalizing the grammar of this language.

Today I can say that I've managed to complete everything. Latte has the compiler I've long wished for. And not a single line of code from the original remains 🙂

Are You Just Following a Cargo Cult?

Many years ago, I realized that when I used a variable containing a predefined data table in a PHP function, the array had to be “recreated” each time the function was called, which was surprisingly slow. For example:

function isSpecialName(string $name): bool
{
    $specialNames = ['foo' => 1, 'bar' => 1, 'baz' => 1, ...];
    return isset($specialNames[$name]);
}

Then I discovered a simple trick that prevented the array from being recreated. It was enough to define the variable as static:

function isSpecialName(string $name): bool
{
    static $specialNames = ['foo' => 1, 'bar' => 1, 'baz' => 1, ...];
    return isset($specialNames[$name]);
}

The speed-up, if the array was a bit larger, was several orders of magnitude (like 500×).

Since then, I have always used static for constant arrays. It's possible that others followed this habit without knowing the real reason behind it, but I can't be sure.

A few weeks ago, I wrote a class that held large tables of predefined data in several properties. I realized that this would slow down the creation of instances, meaning the new operator would “recreate” the arrays each time, which is slow as we know. Therefore, I had to change the properties to static, or perhaps even better, use constants.

Then I asked myself: Hey, are you just following a cargo cult? Is it still true that without static it is slow?

It's hard to say. PHP has undergone revolutionary development and old truths may no longer be valid. I prepared a test sample and did a few measurements. Of course, I confirmed that in PHP 5, using static inside a function or with properties sped things up by several orders of magnitude. However, note that in PHP 7.0, it was only by one order of magnitude. Excellent, a sign of optimizations in the new core, but the difference is still substantial. Yet, with further PHP versions, the difference continued to decrease and eventually nearly disappeared.

I even found that using static inside a function in PHP 7.1 and 7.2 actually slowed down the execution by about 1.5–2×, which in terms of the orders of magnitude we are discussing, is negligible, but it was an interesting paradox. From PHP 7.3, the difference disappeared completely.
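If you want to repeat the measurement on your own PHP version, a micro-benchmark along these lines is enough. It is only a sketch and not the original test script: the function names and the measure() helper are made up, and the three-item array should be replaced by a much larger literal, since the effect described above depended on the size of the table.

```php
<?php
declare(strict_types=1);

function isSpecialPlain(string $name): bool
{
    $specialNames = ['foo' => 1, 'bar' => 1, 'baz' => 1];
    return isset($specialNames[$name]);
}

function isSpecialStatic(string $name): bool
{
    static $specialNames = ['foo' => 1, 'bar' => 1, 'baz' => 1];
    return isset($specialNames[$name]);
}

/** Calls $fn repeatedly and returns the elapsed time in milliseconds. */
function measure(callable $fn, int $iterations = 1_000_000): float
{
    $start = hrtime(true); // monotonic clock, nanoseconds
    for ($i = 0; $i < $iterations; $i++) {
        $fn('bar');
    }
    return (hrtime(true) - $start) / 1e6;
}

printf(
    "PHP %s\nplain:  %.1f ms\nstatic: %.1f ms\n",
    PHP_VERSION,
    measure('isSpecialPlain'),
    measure('isSpecialStatic'),
);
```

Run it several times and under each PHP version you care about. A single run of a micro-benchmark says very little.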

Habits are a good thing, but it is necessary to validate their meaning continuously.

I will no longer use unnecessary static within function bodies. However, for that class holding large tables of predefined data in properties, I thought it was programmatically correct to use constants. Soon, I had the refactoring done, but even as it was being created, I lamented how ugly the code was becoming. Instead of $this->ruleToNonTerminal or $this->actionLength, the code now contained the screaming $this::RULE_TO_NON_TERMINAL and $this::ACTION_LENGTH, which looked really ugly. A stale whiff from the seventies.

I even hesitated, wondering if I even wanted to look at such ugly code, and whether I might prefer to stick with variables, or static variables.

And then it hit me: Hey, are you just following a cargo cult?

Of course, I am. Why should a constant shout? Why should it draw attention to itself in the code, be a protruding element in the flow of the program? The fact that the structure is read-only is not a reason FOR STUCK CAPSLOCK, AGGRESSIVE TONE, AND WORSE READABILITY.

THE TRADITION OF UPPERCASE LETTERS COMES FROM THE C LANGUAGE, WHERE MACRO CONSTANTS FOR THE PREPROCESSOR WERE MARKED IN THIS WAY. IT WAS USEFUL TO UNMISTAKABLY DISTINGUISH CODE FOR THE PARSER FROM CODE FOR THE PREPROCESSOR. IN PHP, NO PREPROCESSORS WERE EVER USED, SO THERE IS NO REASON to write constants in uppercase letters.

That very evening, I removed them everywhere. And still couldn't understand why it hadn't occurred to me twenty years ago. The bigger the nonsense, the tougher its roots.

Should nullable types be written with or without a question mark?

I've always been bothered by any redundancy or duplication in code. I wrote about it many years ago. Looking at this code just makes me suffer:

interface ContainerAwareInterface
{
    /**
     * Sets the container.
     */
    public function setContainer(ContainerInterface $container = null);
}

Let's set aside the redundant doc comment on the method for now. And this time also the misunderstanding of dependency injection that shows when a library needs such an interface. As for the fact that using the word Interface in the name of an interface is, in turn, a sign of not understanding object-oriented programming, I'm planning a separate article on that. After all, I've been there myself.

But why on earth specify the visibility as public? It's a pleonasm. If it wasn't public, then it wouldn't be an interface, right? And then someone thought to make it a “standard” 🤦‍♂️

Sorry for the long introduction, what I'm getting to is whether to write optional nullable types with or without a question mark. So:

// without
function setContainer(ContainerInterface $container = null);
// with
function setContainer(?ContainerInterface $container = null);

Personally, I have always leaned towards the first option, because the information given by the question mark is redundant (yes, both notations mean the same from the language's perspective). This is how all the code was written until the arrival of PHP 7.1, the version that added the question mark, and there would have to be a good reason to change it suddenly.

With the arrival of PHP 8.0, I changed my mind and I'll explain why. The question mark is not optional in the case of properties. PHP will throw an error in this case:

class Foo
{
	private Bar $foo = null;
}
// Fatal error: Default value for property of type Bar may not be null.
// Use the nullable type ?Bar to allow null default value

And from PHP 8.0 you can use promoted properties, which allows you to write code like this:

class Foo
{
	public function __construct(
		private ?Bar $foo = null,
		string $name = null,
	) {
		// ...
	}
}

Here you can see the inconsistency. If ?Bar is used (which is necessary), then ?string should follow on the next line. And if I use the question mark in some cases, I should use it in all cases.

The question remains whether it is better to use a union type string|null instead of a question mark. For example, if I wanted to write Stringable|string|null, maybe the version with a question mark isn't at all necessary.
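That ?string and string|null are the same type from the language's point of view can be checked with reflection. The sketch below uses made-up names (Bar, Foo, describeConstructor()) and writes every nullable parameter explicitly, which is the consistent style argued for above.

```php
<?php
declare(strict_types=1);

final class Bar
{
}

final class Foo
{
    public function __construct(
        private ?Bar $bar = null,
        ?string $name = null,
        string|null $title = null, // union notation for the same type
    ) {
    }
}

/**
 * Returns [type name, allows null] for each constructor parameter.
 * (Hypothetical helper, for illustration only.)
 */
function describeConstructor(string $class): array
{
    $result = [];
    foreach ((new ReflectionMethod($class, '__construct'))->getParameters() as $param) {
        $type = $param->getType();
        $result[$param->getName()] = [$type->getName(), $type->allowsNull()];
    }
    return $result;
}
```

Both $name and $title are reported as the named type string that allows null, so the choice between the two notations is purely a matter of style.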

Update: It looks like PHP 8.4 will require the notation with a question mark.

Which Framework Has the Best Documentation?

I was curious about which PHP framework has the best documentation and how Nette ranks among them. But how can you find out?

We all know that the worst scenario is having no documentation at all, followed by inadequate documentation. The opposite is extensive documentation. It seems, therefore, that the sheer volume of documentation is an important indicator. Of course, its understandability and currency, as well as readability and accuracy, play a huge role. These factors are very difficult to measure. However, I know from my own experience how many sections of Nette's documentation I have rewritten multiple times to make them clearer, and how many corrections I have merged, and I assume this happens with any long-standing framework. Thus, it appears that all documentation gradually converges towards a similar high quality. Therefore, I allow myself to take the sheer volume of data as a guide, though it is an oversimplification.

Of course, the volume of documentation must be proportional to the size of the library itself. Some are significantly larger than others and should accordingly have significantly more documentation. For simplicity, I determine the size of the library by the volume of PHP code, normalized for white space and excluding comments.
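One way such a measurement could be done is with PHP's own tokenizer. The following is a sketch under stated assumptions, not the script used for the chart: measureCodeSize() is a hypothetical name, comments are dropped entirely, and every run of whitespace between two tokens counts as a single character.

```php
<?php
declare(strict_types=1);

/**
 * Size of PHP source in bytes, ignoring comments and with each run of
 * whitespace between tokens counted as one character.
 * (Hypothetical helper; requires PHP 8.0 for PhpToken.)
 */
function measureCodeSize(string $source): int
{
    $size = 0;
    $pendingSpace = false;
    foreach (PhpToken::tokenize($source) as $token) {
        if ($token->is([T_COMMENT, T_DOC_COMMENT])) {
            continue; // comments do not count as code
        }
        if ($token->is(T_WHITESPACE)) {
            $pendingSpace = true; // remember it, but count it only once
            continue;
        }
        $size += strlen($token->text) + ($pendingSpace ? 1 : 0);
        $pendingSpace = false;
    }
    return $size; // trailing whitespace is never counted
}
```

Summing this value over all PHP files of a framework gives a code size that does not depend on formatting style or on how heavily the code is commented.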

I created a chart showing the ratio of English documentation to code for well-known frameworks CakePHP (4.2), CodeIgniter (3.1), Laravel (8.62), Nette (3.1), Symfony (5.4), YII (2.0), and Zend Framework (2.x, no longer in development):

As you can see from the chart, the extent of documentation relative to the code is more or less similar across all frameworks.

CodeIgniter stands out. I tip my hat to CakePHP and YII, which strive to maintain documentation in a range of other languages. The comprehensiveness of Nette's documentation is above average. Additionally, Nette is the only framework that has a 1:1 translation in our native language.

The purpose of the chart is NOT to show that one framework has so many percent more comprehensive documentation than another. The metric is too primitive for that. Instead, the purpose is to show that the extent of documentation among the various frameworks is largely comparable. I created it mainly for myself, to get an idea of how Nette's documentation compares to its competitors.

Originally published in August 2019, data updated for October 2021.
