phpFashion

Are You Just Following a Cargo Cult?

Many years ago, I realized that when I used a variable containing a predefined data table in a PHP function, the array had to be “recreated” each time the function was called, which was surprisingly slow. For example:

function isSpecialName(string $name): bool
{
    $specialNames = ['foo' => 1, 'bar' => 1, 'baz' => 1, /* ... */];
    return isset($specialNames[$name]);
}

Then I discovered a simple trick that prevented the array from being recreated. It was enough to define the variable as static:

function isSpecialName(string $name): bool
{
    static $specialNames = ['foo' => 1, 'bar' => 1, 'baz' => 1, /* ... */];
    return isset($specialNames[$name]);
}

The speed-up, if the array was a bit larger, was several orders of magnitude (like 500×).

Since then, I have always used static for constant arrays. It's possible that others followed this habit without knowing the real reason behind it, but I can't be sure.


A few weeks ago, I wrote a class that held large tables of predefined data in several properties. I realized that this would slow down the creation of instances, meaning the new operator would “recreate” the arrays each time, which is slow as we know. Therefore, I had to change the properties to static, or perhaps even better, use constants.

Then I asked myself: Hey, are you just following a cargo cult? Is it still true that without static it is slow?

It's hard to say. PHP has undergone revolutionary development, and old truths may no longer be valid. I prepared a test sample and took a few measurements. As expected, in PHP 5, using static inside a function or for properties sped things up by several orders of magnitude. In PHP 7.0, however, the gain was only one order of magnitude. Excellent, a sign of optimizations in the new core, but the difference is still substantial. With further PHP versions, the difference kept shrinking and eventually nearly disappeared.

I even found that using static inside a function in PHP 7.1 and 7.2 actually slowed down the execution by about 1.5–2×, which in terms of the orders of magnitude we are discussing, is negligible, but it was an interesting paradox. From PHP 7.3, the difference disappeared completely.
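A measurement along these lines can be sketched as follows. This is only a minimal sketch, not the original test sample: the function names, the table contents and the iteration count are assumptions, and the printed times will differ by PHP version and machine.

```php
<?php
declare(strict_types=1);

// Hypothetical lookup functions; the table contents are placeholders.
function isSpecialPlain(string $name): bool
{
    $specialNames = ['foo' => 1, 'bar' => 1, 'baz' => 1, 'qux' => 1];
    return isset($specialNames[$name]);
}

function isSpecialStatic(string $name): bool
{
    static $specialNames = ['foo' => 1, 'bar' => 1, 'baz' => 1, 'qux' => 1];
    return isset($specialNames[$name]);
}

// Calls $fn repeatedly and returns the elapsed time in seconds.
function measure(callable $fn, int $iterations = 100000): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn('baz');
    }
    return microtime(true) - $start;
}

printf("plain:  %.4f s\n", measure('isSpecialPlain'));
printf("static: %.4f s\n", measure('isSpecialStatic'));
```

To see any difference at all, the table would probably need to be much larger than four entries, and the script has to be run separately under each PHP version being compared.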

Habits are a good thing, but it is necessary to validate their meaning continuously.


I will no longer use unnecessary static within function bodies. However, for that class holding large tables of predefined data in properties, I thought it was programmatically correct to use constants. Soon, I had the refactoring done, but even as it was being created, I lamented how ugly the code was becoming. Instead of $this->ruleToNonTerminal or $this->actionLength, the code now contained the screaming $this::RULE_TO_NON_TERMINAL and $this::ACTION_LENGTH, which looked really ugly. A stale whiff from the seventies.

I even hesitated, wondering if I even wanted to look at such ugly code, and whether I might prefer to stick with variables, or static variables.

And then it hit me: Hey, are you just following a cargo cult?

Of course, I am. Why should a constant shout? Why should it draw attention to itself in the code, be a protruding element in the flow of the program? The fact that the structure is read-only is not a reason FOR STUCK CAPSLOCK, AGGRESSIVE TONE, AND WORSE READABILITY.

THE TRADITION OF UPPERCASE LETTERS COMES FROM THE C LANGUAGE, WHERE MACRO CONSTANTS FOR THE PREPROCESSOR WERE MARKED IN THIS WAY. IT WAS USEFUL TO UNMISTAKABLY DISTINGUISH CODE FOR THE PARSER FROM CODE FOR THE PREPROCESSOR. IN PHP, NO PREPROCESSORS WERE EVER USED, SO THERE IS NO REASON to write constants in uppercase letters.

That very evening, I removed them everywhere. And I still can't understand why it hadn't occurred to me twenty years ago. The bigger the nonsense, the tougher its roots.
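For illustration, class constants written without the shouting might look like this. It is a sketch only: the Parser class, its methods and the table values are hypothetical, and only the two constant names come from the text above.

```php
<?php
declare(strict_types=1);

// Hypothetical parser tables; the values are placeholders.
final class Parser
{
    private const ruleToNonTerminal = [0 => 3, 1 => 3, 2 => 5];
    private const actionLength = [0 => 1, 1 => 2, 2 => 4];

    public function lengthOf(int $action): int
    {
        return self::actionLength[$action];
    }

    public function nonTerminalOf(int $rule): int
    {
        return self::ruleToNonTerminal[$rule];
    }
}

echo (new Parser)->lengthOf(2), "\n"; // prints 4
```

PHP itself places no restriction on the case of constant names; uppercase is purely a convention.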


Should nullable types be written with or without a question mark?

I've always been bothered by any redundancy or duplication in code. I wrote about it many years ago. Looking at this code just makes me suffer:

interface ContainerAwareInterface
{
    /**
     * Sets the container.
     */
    public function setContainer(ContainerInterface $container = null);
}

Let's set aside the unnecessary comment on the method for now. And, this time, also the misunderstanding of dependency injection that a library betrays if it needs such an interface. Using the word Interface in the name of an interface is, in turn, a sign of not understanding object-oriented programming; I'm planning a separate article on that. After all, I've been there myself.

But why on earth specify the visibility as public? It's a pleonasm. If it wasn't public, then it wouldn't be an interface, right? And then someone thought to make it a “standard”.

Sorry for the long introduction, what I'm getting to is whether to write optional nullable types with or without a question mark. So:

// without
function setContainer(ContainerInterface $container = null);
// with
function setContainer(?ContainerInterface $container = null);

Personally, I have always leaned towards the first option, because the information given by the question mark is redundant (yes, both notations mean the same from the language's perspective). This is how all the code was written until the arrival of PHP 7.1, the version that added the question mark, and there would have to be a good reason to change it suddenly.

With the arrival of PHP 8.0, I changed my mind and I'll explain why. The question mark is not optional in the case of properties. PHP will throw an error in this case:

class Foo
{
	private Bar $foo = null;
}
// Fatal error: Default value for property of type Bar may not be null.
// Use the nullable type ?Bar to allow null default value

And from PHP 8.0 you can use promoted properties, which allows you to write code like this:

class Foo
{
	public function __construct(
		private ?Bar $foo = null,
		string $name = null,
	) {
		// ...
	}
}

Here you can see the inconsistency. If ?Bar is used (which is necessary), then ?string should follow on the next line. And if I use the question mark in some cases, I should use it in all cases.

The question remains whether it is better to use a union type string|null instead of a question mark. For example, if I wanted to write Stringable|string|null, maybe the version with a question mark isn't at all necessary.
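A small sketch of the union notation, assuming PHP 8.0 or newer. The function names are hypothetical; the point is only that ?string and string|null declare the same type, while a wider union has no question-mark form at all.

```php
<?php
declare(strict_types=1);

// Hypothetical function: a wider union cannot be written with a question mark.
function label(Stringable|string|null $value = null): string
{
    return $value === null ? '(none)' : (string) $value;
}

// The short form and the union form declare the same type.
function shortForm(?string $name = null): ?string
{
    return $name;
}

function unionForm(string|null $name = null): ?string
{
    return $name;
}

echo label(), "\n";      // prints (none)
echo label('abc'), "\n"; // prints abc
```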

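Update: PHP 8.4 deprecates implicitly nullable parameter types, so the type has to be marked as nullable explicitly, with a question mark or |null, to avoid a deprecation notice.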


Readonly Variables in PHP 8.1 Will Surprise You

PHP 8.1 introduces an interesting feature: readonly member variables.

Let's start with an example of how to use it:

class Test
{
	public readonly string $prop;

	public function setProp(string $prop): void
	{
		$this->prop = $prop; // legal initialization
	}
}

$test = new Test;
$test->setProp('abc');
echo $test->prop; // legal read
$test->prop = 'foo'; // throws exception: Cannot modify readonly property Test::$prop

Once initialized, a variable cannot be overwritten with another value.

Scope

Interestingly, attempting to assign a value to $test->prop will also throw an exception even if the variable hasn't been initialized:

$test = new Test;
$test->prop = 'foo';
// throws exception too: Cannot initialize readonly property Test::$prop from global scope

This will even throw an exception:

class Child extends Test
{
	public function __construct()
	{
		$this->prop = 'hello';
		// throws exception: Cannot initialize readonly property Test::$prop from scope Child
	}
}

A readonly variable simply cannot be written from anywhere other than the class that defined it. Quite peculiar.

Immutability

The fact that the content of readonly variables cannot be changed doesn't mean the data written to them is immutable. If an object is written to such a variable, its internal variables can still be modified. The object does not become immutable.
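A minimal sketch of this, assuming PHP 8.1; the Counter and Holder classes are hypothetical:

```php
<?php
declare(strict_types=1);

final class Counter
{
    public int $count = 0;
}

final class Holder
{
    public readonly Counter $counter;

    public function __construct()
    {
        $this->counter = new Counter; // initialization inside the declaring class
    }
}

$holder = new Holder;
$holder->counter->count++;          // allowed: the object itself is not readonly
echo $holder->counter->count, "\n"; // prints 1

try {
    $holder->counter = new Counter; // replacing the object is not allowed
} catch (Error $e) {
    echo $e->getMessage(), "\n";
}
```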

The same applies to arrays. Although the behavior is slightly different here. Changing elements in the array is considered a change to the entire array and as such is impermissible in a readonly variable. However, if the array contains an element that is a reference, changing its content is not considered a change to the entire array and thus can occur in a readonly element. This, however, is standard PHP behavior as always.

In other words, this is possible:

class Test
{
	public readonly array $prop;

	public function test(): void
	{
		$item = 'foo';
		$this->prop = [1, &$item, 2];
		dump($this->prop); // [1, 'foo', 2]
		$item = 'bar'; // legal
		dump($this->prop); // [1, 'bar', 2]
	}
}

But this is not possible:

class Test
{
	public readonly array $prop;

	public function test(): void
	{
		$this->prop = ['a', 'b'];
		$this->prop[1] = 'c'; // throws exception!
	}
}

Type

Since readonly variables utilize the ‘uninitialized’ state, which exists for variables with a defined type, it is only possible to declare a variable as readonly in conjunction with a data type.
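The uninitialized state can be observed through reflection. The following is a sketch assuming PHP 8.1, with a hypothetical Setting class; it uses mixed, the loosest type that still satisfies the requirement for a declared type.

```php
<?php
declare(strict_types=1);

final class Setting
{
    public readonly mixed $value;

    public function init(mixed $value): void
    {
        $this->value = $value;
    }
}

$setting = new Setting;
$property = new ReflectionProperty(Setting::class, 'value');

var_dump($property->isInitialized($setting)); // bool(false)
$setting->init(null);
var_dump($property->isInitialized($setting)); // bool(true), even though the value is null
```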


Which Framework Has the Best Documentation?

I was curious about which PHP framework has the best documentation and how Nette ranks among them. But how can you find out?

We all know that the worst scenario is having no documentation at all, followed by inadequate documentation. The opposite is extensive documentation. It seems, therefore, that the sheer volume of documentation is an important indicator. Of course, its understandability and currency, as well as readability and accuracy, play a huge role. These factors are very difficult to measure. However, I know from my own experience how many sections of Nette's documentation I have rewritten multiple times to make them clearer, and how many corrections I have merged, and I assume this happens with any long-standing framework. Thus, it appears that all documentation gradually converges towards a similar high quality. Therefore, I allow myself to take the sheer volume of data as a guide, though it is an oversimplification.

Of course, the volume of documentation must be proportional to the size of the library itself. Some are significantly larger than others and should accordingly have significantly more documentation. For simplicity, I determine the size of the library by the volume of PHP code, normalized for white space and excluding comments.
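One possible way to compute such a size is sketched below. It is an assumption about the approach, not the script actually used for the chart; the codeSize() function, the sample code and the documentation size are all hypothetical. It needs PHP 8.0.

```php
<?php
declare(strict_types=1);

// Returns the length of PHP source with comments dropped and each
// whitespace token collapsed to a single space.
function codeSize(string $source): int
{
    $out = '';
    foreach (PhpToken::tokenize($source) as $token) {
        if ($token->is([T_COMMENT, T_DOC_COMMENT])) {
            continue;
        }
        if ($token->is(T_WHITESPACE)) {
            if (!str_ends_with($out, ' ')) {
                $out .= ' ';
            }
            continue;
        }
        $out .= $token->text;
    }
    return strlen(trim($out));
}

// Hypothetical inputs: documentation size in bytes and a code sample.
$docBytes = 1200;
$code = "<?php\n/** Adds numbers. */\nfunction add(\$a,   \$b) { return \$a + \$b; } // sum\n";
printf("doc/code ratio: %.2f\n", $docBytes / codeSize($code));
```

For whole files, PHP's built-in php_strip_whitespace() does a similar job.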

I created a chart showing the ratio of English documentation to code for well-known frameworks CakePHP (4.2), CodeIgniter (3.1), Laravel (8.62), Nette (3.1), Symfony (5.4), Yii (2.0), and Zend Framework (2.x, no longer in development):

As you can see from the chart, the extent of documentation relative to the code is more or less similar across all frameworks.

CodeIgniter stands out. I tip my hat to CakePHP and Yii, which strive to maintain documentation in a range of other languages. The comprehensiveness of Nette's documentation is above average. Additionally, Nette is the only framework that has a 1:1 translation in our native language.

The purpose of the chart is NOT to show that one framework has so many percent more comprehensive documentation than another. The metric is too primitive for that. Instead, the purpose is to show that the extent of documentation among the various frameworks is largely comparable. I created it mainly for myself, to get an idea of how Nette's documentation compares to its competitors.

Originally published in August 2019, data updated for October 2021.

3 years ago in the PHP category


How Shutdown and Destructor Calls Occur in PHP

The shutdown process in PHP consists of the following steps performed in the given order:

  1. Calling all functions registered using register_shutdown_function()
  2. Calling all __destruct() methods
  3. Emptying all output buffers
  4. Terminating all PHP extensions (e.g., sessions)
  5. Shutting down the output layer (sending HTTP headers, cleaning output handlers, etc.)

Let's focus more closely on step 2, the calling of destructors. It's important to note that even in the first step, when registered shutdown functions are called, object destruction can occur. For example, if one of the functions held the last reference to an object or if the shutdown function itself was an object.

Destructor calls proceed as follows:

  1. PHP first attempts to destroy objects in the global symbol table.
  2. Then it calls the destructors of all remaining objects.
  3. If execution is halted, e.g., due to exit(), the remaining destructors are not called.

ad 1) PHP iterates through the global symbol table in reverse order, starting with the most recently created variable and proceeding to the first created variable. During this iteration, it destroys all objects with a reference count of 1. This iteration continues as long as such objects exist.

Basically, it does the following: a) removes all unused objects in the global symbol table, b) if new unused objects appear, removes them as well, and c) continues this process. This method of destruction is used so that objects can depend on other objects in their destructor. This usually works well if objects in the global scope don't have complicated (e.g., circular) mutual dependencies.

Destruction of the global symbol table is significantly different from the destruction of other symbol tables. For the global symbol table, PHP uses a smarter algorithm that tries to respect object dependencies.

ad 2) Other objects are processed in the order they were created, and their destructors are called. Yes, PHP merely calls __destruct, but it doesn't actually destroy the object (nor does it even change its reference count). If other objects still refer to it, the object will remain available (even though its destructor has already been called). In a sense, they will be using a “half-destroyed” object.

ad 3) If execution is halted during the calling of destructors, e.g., due to exit(), the remaining destructors are not called. Instead, PHP marks the objects as already destroyed. The important consequence is that destructor calls are not guaranteed. While such cases are relatively rare, they can happen.
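The order can be observed with a small script. This is a sketch assuming PHP 8.0 (for constructor promotion); the Tracker class is hypothetical, and the exact output is best checked on your own PHP version.

```php
<?php
declare(strict_types=1);

// Hypothetical class that reports when its destructor runs.
final class Tracker
{
    public function __construct(private string $name)
    {
    }

    public function __destruct()
    {
        echo "destructor of {$this->name}\n";
    }
}

register_shutdown_function(function (): void {
    echo "shutdown function\n";
});

$first = new Tracker('first');
$second = new Tracker('second');

echo "end of script\n";
// According to the steps above, "shutdown function" should appear before
// either destructor message, and $second (created last) before $first.
```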

Source: https://stackoverflow.com/…ucted-in-php


How to write error handler in PHP?

When writing your own error handler for PHP, it is absolutely necessary to follow several rules. Otherwise, it can disrupt the behavior of other libraries and applications that do not expect treachery in the error handler.

Parameters

The signature of the handler looks like this:

function errorHandler(
    int $severity,
    string $message,
    string $file,
    int $line,
    ?array $context = null // only passed in PHP < 8
): ?bool {
    // ...
}

The $severity parameter contains the error level (E_NOTICE, E_WARNING, …). Fatal errors such as E_ERROR cannot be caught by the handler, so this parameter will never have these values. Fortunately, fatal errors have essentially disappeared from PHP and have been replaced by exceptions.

The $message parameter is the error message. If the html_errors directive is enabled, special characters like < are written as HTML entities, so you need to decode them back to plain text. However, beware, some characters are not written as entities, which is a bug. Displaying errors in pure PHP is thus prone to XSS.

The $file and $line parameters represent the name of the file and the line where the error occurred. If the error occurred inside eval(), $file will be supplemented with this information.

Finally, the $context parameter contains an array of local variables, which is useful for debugging, but this has been removed in PHP 8. If the handler is to work in PHP 8, omit this parameter or give it a default value.
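Decoding the message back to plain text could be sketched like this. The helper name is hypothetical, and its second parameter exists only so the sketch is easy to try out regardless of the current html_errors setting.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: turns an HTML-flavoured error message into plain text.
function plainErrorMessage(string $message, ?bool $htmlErrors = null): string
{
    $htmlErrors ??= (bool) ini_get('html_errors');
    return $htmlErrors
        ? html_entity_decode(strip_tags($message), ENT_QUOTES | ENT_HTML5, 'UTF-8')
        : $message;
}

echo plainErrorMessage('Undefined variable &lt;b&gt;', true), "\n"; // prints Undefined variable <b>
```

The decoded text must of course be escaped again before it is printed into an HTML page.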

Return Value

The return value of the handler can be null or false. If the handler returns null, nothing happens. If it returns false, the standard PHP handler is also called. Depending on the PHP configuration, this can print or log the error. Importantly, it also fills in internal information about the last error, which is accessible by the error_get_last() function.

Suppressed Errors

In PHP, error display can be suppressed either using the shut-up operator @ or by error_reporting():

// suppress E_USER_DEPRECATED level errors
error_reporting(E_ALL & ~E_USER_DEPRECATED);

// suppress all errors when calling fopen()
$file = @fopen($name, 'r');

Even when errors are suppressed, the handler is still called. Therefore, it is first necessary to verify whether the error is suppressed, and if so, we must end our own handler:

if (!($severity & error_reporting())) {
    return false;
}

However, in this case, we must end it with return false, so that the standard error handler is still executed. It will not print or log anything (because the error is suppressed), but ensures that the error can be detected using error_get_last().

Other Errors

If our handler processes the error (for example, displays its own message, etc.), there is no need to call the standard handler. Although then it will not be possible to detect the error using error_get_last(), this does not matter in practice, as this function is mainly used in combination with the shut-up operator.

If, on the other hand, the handler does not process the error for any reason, it should return false so as not to conceal it.
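The three cases (suppressed, processed, not processed) can be put together in a sketch like the one below. The choice to process only E_USER_DEPRECATED and to collect messages in a $handled array is an assumption made for illustration.

```php
<?php
declare(strict_types=1);

error_reporting(E_ALL);
$handled = [];

set_error_handler(function (int $severity, string $message, string $file, int $line) use (&$handled): ?bool {
    if (!(error_reporting() & $severity)) {
        return false; // suppressed: let PHP record it for error_get_last()
    }
    if ($severity === E_USER_DEPRECATED) {
        $handled[] = $message; // processed here, the standard handler is not needed
        return null;
    }
    return false; // not processed: do not conceal the error
});

trigger_error('old API', E_USER_DEPRECATED); // ends up in $handled
@trigger_error('quiet', E_USER_WARNING);     // suppressed, the handler returns false

$last = error_get_last();
echo $handled[0], "\n";      // prints old API
echo $last['message'], "\n"; // prints quiet
```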

Example

Here's what the code for a custom error handler that transforms errors into ErrorException exceptions might look like:

set_error_handler(function (int $severity, string $message, string $file, int $line) {
    if (!(error_reporting() & $severity)) {
        return false;
    }

    throw new \ErrorException($message, 0, $severity, $file, $line);
});

What are SameSite Cookies and Why Do We Need Them?

SameSite cookies provide a mechanism for recognizing what led to the loading of a page: whether it was clicking a link on another website, submitting a form, loading inside an iframe, a JavaScript request, etc.

Identifying how a page was loaded is crucial for security. The serious vulnerability known as Cross-Site Request Forgery (CSRF) has been with us for over twenty years, and SameSite cookies offer a systematic way to address it.

A CSRF attack involves an attacker luring a victim to a webpage that inconspicuously makes a request to a web application where the victim is logged in, and the application believes the request was made voluntarily by the victim. Thus, under the identity of the victim, some action is performed without the victim knowing. This could involve changing or deleting data, sending a message, etc. To prevent such attacks, applications need to distinguish whether the request came from a legitimate source, e.g., by submitting a form on the application itself, or from elsewhere. SameSite cookies can do this.

How does it work? Let’s say I have a website running on a domain, and I create three different cookies with attributes SameSite=Lax, SameSite=Strict, and SameSite=None. Name and value do not matter. The browser will store them.

  1. When I open any URL on my website by typing directly into the address bar or clicking on a bookmark, the browser sends all three cookies.
  2. When I access any URL on my website from a page from the same website, the browser sends all three cookies.
  3. When I access any URL on my website from a page from a different website, the browser sends only the cookies with None and in certain cases Lax, see table:
     Code on another website                         Sent cookies
     Link       <a href="…">                         None + Lax
     Form GET   <form method="GET" action="…">       None + Lax
     Form POST  <form method="POST" action="…">      None
     iframe     <iframe src="…">                     None
     AJAX       $.get('…'), fetch('…')               None
     Image      <img src="…">                        None
     Prefetch   <link rel="prefetch" href="…">       None
                                                     None

SameSite cookies can distinguish only a few cases, but these are crucial for protecting against CSRF.

If, for example, there is a form or a link for deleting an item on my website's admin page and it was sent/clicked, the absence of a cookie created with the Strict attribute means it did not happen on my website but rather the request came from elsewhere, indicating a CSRF attack.

Create the cookie used to detect a CSRF attack as a so-called session cookie, without the Expires attribute; its validity is then essentially unlimited.
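In plain PHP, such a detection cookie might be sketched as follows. The cookie name and both function names are hypothetical, and the array form of setcookie() options needs PHP 7.3 or newer.

```php
<?php
declare(strict_types=1);

// Hypothetical cookie name used only for this sketch.
const sameSiteCookie = 'samesite_check';

// Sends a session cookie (no expiry) with SameSite=Strict.
// Must be called before any output is sent.
function sendSameSiteCookie(): void
{
    setcookie(sameSiteCookie, '1', [
        'path' => '/',
        'httponly' => true,
        'samesite' => 'Strict',
    ]);
}

// True when the browser sent the Strict cookie, i.e. the request started
// on this website, from the address bar, or from a bookmark.
function isSameSite(array $cookies): bool
{
    return ($cookies[sameSiteCookie] ?? null) === '1';
}

$trusted = isSameSite($_COOKIE);
```

Note that the cookie is also missing on a visitor's very first request, so the check only makes sense for actions that are reachable after the cookie has been set.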

Domain vs Site

“On my website” is not the same as “on my domain,” it's not about the domain, but about the website (hence the name SameSite). Although the site often corresponds to the domain, for services like github.io, it corresponds to the subdomain. A request from doc.nette.org to files.nette.org is same-site, while a request from nette.github.io to tracy.github.io is already cross-site. Here it is nicely explained.

<iframe>

From the previous lines, it is clear that if a page from my website is loaded inside an <iframe> on another website, the browser does not send Strict or Lax cookies. But there's another important thing: if such a loaded page creates Strict or Lax cookies, the browser ignores them.

This creates a possibility to defend against fraudulent acquisition of cookies or Cookie Stuffing, where until now, systemic defense was also lacking. The trick is that the fraudster collects a commission for affiliate marketing, although the user was not brought to the merchant's website by a user-clicked link. Instead, an invisible <iframe> with the same link is inserted into the page, marking all visitors.

Cookies without the SameSite Attribute

Cookies without the SameSite attribute were always sent during both same-site and cross-site requests. Just like SameSite=None. However, in the near future, browsers will start treating the SameSite=Lax flag as the default, so cookies without an attribute will be considered Lax. This is quite an unusually large BC break in browser behavior. If you want the cookie to continue to behave the same and be transmitted during any cross-site request, you need to set it to SameSite=None. (Unless you develop embedded widgets, etc., you probably won't want this often.) Unfortunately, for last year's browsers, the None value is unexpected. Safari 12 interprets it as Strict, thus creating a tricky problem on older iOS and macOS.

And note: None works only when set with the Secure attribute.

What to Do in Case of an Attack?

Run away! The basic rule of self-defense, both in real life and on the web. A huge mistake made by many frameworks is that upon detecting a CSRF attack, they display the form again and write something like “The CSRF token is invalid. Please try to submit the form again”. By resubmitting the form, the attack is completed. Such protection lacks sense when you actually invite the user to bypass it.

Until recently, Chrome did that during a cross-site request—it displayed the page again after a refresh, but this time sent the cookies with the Strict attribute. So, the refresh eliminated the CSRF protection based on SameSite cookies. Fortunately, it no longer does this today, but it's possible that other or older browsers still do. A user can also “refresh” the page by clicking on the address bar + enter, which is considered a direct URL entry (point 1), and all cookies are sent.

Thus, the best response to detecting CSRF is to redirect with a 302 HTTP code elsewhere, perhaps to the homepage. This rids you of dangerous POST data, and the problematic URL isn't saved to history.
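A sketch of that reaction in plain PHP. Both function names are hypothetical, and $isSameSite is assumed to come from a SameSite cookie check done elsewhere.

```php
<?php
declare(strict_types=1);

// Decides where to send a request that failed the same-site check.
// Returns null when the request may proceed.
function csrfRedirectTarget(bool $isSameSite, string $home = '/'): ?string
{
    return $isSameSite ? null : $home;
}

// Redirects with a 302 and stops, so the requested action is never performed.
function redirectAway(string $target): void
{
    header('Location: ' . $target, true, 302);
    exit;
}

// Usage, with $isSameSite obtained from the cookie check:
// if (($target = csrfRedirectTarget($isSameSite)) !== null) {
//     redirectAway($target);
// }
```

No error page and no "try again" form is shown; the user simply ends up on the homepage.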

Incompatibilities

SameSite hasn't worked nearly as well as it should have for a long time, mainly due to browser bugs and deficiencies in the specification, which, for example, didn't address redirections or refreshes. SameSite cookies weren't transferred during saving or printing a page, but were transferred after a refresh when they shouldn't have been, etc. Fortunately, the situation is better today. I believe the only serious shortcoming that persists in current browser versions is the one mentioned above for Safari.

Addendum: Besides SameSite, the origin of a request can very recently be distinguished also by the Origin header, which is more privacy-respecting and more accurate than the Referer header.

4 years ago in the PHP category


How to Properly Set Up CSP and `script-src`

Content Security Policy (CSP) is an additional security feature that tells the browser what external sources a page can load and how it can be displayed. It protects against the injection of malicious code and attacks such as XSS. It is sent as a header composed of a series of directives. However, implementing it is not trivial.

Typically, we want to use JavaScript libraries located outside our server, such as Google Analytics, advertising systems, captchas, etc. Unfortunately, the first version of CSP fails here. It requires a precise analysis of the content loaded and the setting of the correct rules. This means creating a whitelist, a list of all the domains, which is not easy since some scripts dynamically pull other scripts from different domains or are redirected to other domains, etc. Even if you take the effort and manually create the list, you never know what might change in the future, so you must constantly monitor if the list is still up-to-date and correct it. Analysis by Google showed that even this meticulous tuning ultimately results in allowing such broad access that the whole purpose of CSP falls apart, just sending much larger headers with each request.

CSP level 2 approaches the problem differently using a nonce, but only the third version of the solution completed the process. Unfortunately, as of 2019, it does not have sufficient browser support.

Regarding how to assemble the script-src and style-src directives to work correctly even in older browsers and to minimize the effort, I have written a detailed article in the Nette partner section. Essentially, the resulting form might look like this:

script-src 'nonce-XXXXX' 'strict-dynamic' * 'unsafe-inline'
style-src 'nonce-XXXXX' * 'unsafe-inline'

Example of Use in PHP

We generate a nonce and send the header:

$nonce = base64_encode(random_bytes(16));

header("Content-Security-Policy: script-src 'nonce-$nonce' 'strict-dynamic' * 'unsafe-inline'");

And we insert the nonce into the HTML code:

<script nonce="<?=$nonce?>" src="..."></script>

Example of Use in Nette

Nette has had built-in support for CSP and nonces since version 2.4, so simply specify in the configuration file:

http:
	csp:
		script-src: [nonce, strict-dynamic, *, unsafe-inline]
		style-src: [nonce, *, unsafe-inline]

And then use in templates:

<script n:nonce src="..."></script>
<style n:nonce>...</style>

Monitoring

Before you set new rules for CSP, try them out first using the Content-Security-Policy-Report-Only header. This header works in all browsers that support CSP. If a rule is violated, the browser does not block the script but instead sends a notification to the URL specified in the report-uri directive. To receive and analyze these notifications, you might use a service like Report URI.

http:
	cspReportOnly:
		script-src: [nonce, strict-dynamic, *, unsafe-inline]
		report-uri: https://xxx.report-uri.com/r/d/csp/reportOnly

You can use both headers simultaneously, with Content-Security-Policy having verified and active rules and Content-Security-Policy-Report-Only to test their modifications. Of course, you can also monitor failures in the strict rules.
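Without Nette, sending both headers from plain PHP might look like this. It is a sketch: buildScriptPolicy() is a hypothetical helper, and the report URL is the same placeholder as in the configuration above.

```php
<?php
declare(strict_types=1);

// Builds the script-src policy for a given nonce, optionally with a
// report-uri directive appended.
function buildScriptPolicy(string $nonce, ?string $reportUri = null): string
{
    $policy = "script-src 'nonce-$nonce' 'strict-dynamic' * 'unsafe-inline'";
    return $reportUri === null ? $policy : "$policy; report-uri $reportUri";
}

$nonce = base64_encode(random_bytes(16));

// Verified rules are enforced; the report-only copy is where modifications get tested.
header('Content-Security-Policy: ' . buildScriptPolicy($nonce));
header('Content-Security-Policy-Report-Only: '
    . buildScriptPolicy($nonce, 'https://xxx.report-uri.com/r/d/csp/reportOnly'));
```

In practice the report-only policy would differ from the enforced one; here both are identical only to keep the sketch short.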

5 years ago in the PHP category


Texy 3.0: Perfection Remains Untouched

It's a bit like when you spot a poster for a concert by a band you remember from your youth. Are they still playing? Or did they get back together after years because they need the money? Perhaps to cash in on the strings of nostalgia?

Texy is my first open-source project. I started writing it fifteen years ago. Texy has survived several version control systems. Numerous web services hosting repositories. Several string encodings. Various markup languages for creating websites. Several of my life relationships. A number of cities I've lived in.

Texy is still here because there is nothing better.

So, I have kept it up-to-date for fifteen years. We started in PHP 4, which was the worst programming language in the world and thus a challenge, then moved on to PHP 5 with relief, a few years later we transitioned to namespaces (Texy::Parser instead of TexyParser, wow), watched PHP stop being the worst language in the world, which frustrated many programmers who then turned to JavaScript, then God created PHP 7 and with it type hints (Texy::process(string $text): string megawow), and strictness came into fashion with declare(strict_types=1) and we honor that.

And so here is Texy 3.0. It's the same as the previous versions, but with all the bells and whistles of PHP 7.1. It's the same because you don't mess with perfection.

Texy was here when you were born, in programming terms. Someday, Texy might even format your epitaph. And it will insert a non-breaking space between a and room.

5 years ago in the PHP category


How to Mock Final Classes?

How do you mock classes that are declared final, or that have final methods?

Mocking means replacing the original object with a test imitation that performs no real functionality and only looks like the original object, while simulating the behavior we need for the test.

For example, instead of a PDO with methods like query(), we create a mock that pretends to work with the database and instead verifies that the correct SQL statements are called. For more, see e.g. the Mockery documentation.

And in order to pass the mock to methods that use the PDO type hint, the mock class has to inherit from PDO. That can be a stumbling block: if PDO or its query() method were final, it would not be possible.
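The inheritance requirement can be shown with a hand-written mock. This is a sketch with hypothetical classes; Database stands in for a class such as PDO, and no mocking library is involved.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a class such as PDO.
class Database
{
    public function query(string $sql): array
    {
        throw new RuntimeException('No real database in this sketch.');
    }
}

// Hand-written mock: records the SQL instead of executing it.
class DatabaseMock extends Database
{
    public array $queries = [];

    public function query(string $sql): array
    {
        $this->queries[] = $sql;
        return [];
    }
}

// Code under test accepts anything that passes the Database type hint.
function loadUsers(Database $db): array
{
    return $db->query('SELECT * FROM users');
}

$mock = new DatabaseMock;
loadUsers($mock);
var_dump($mock->queries === ['SELECT * FROM users']); // bool(true)
// Declaring Database or its query() method as final would turn DatabaseMock into a fatal error.
```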

Is there any solution? The first option is not to use the final keyword at all. This, of course, does not help with third-party code that uses it, and above all it strips away an important element of object design. For example, there is a dogma that every class should be either final or abstract.

The second and very handy option is to use BypassFinals, which removes finals from source code on-the-fly and allows mocking of final methods and classes.

Install it using Composer:

composer require dg/bypass-finals --dev

And just call at the beginning of the test:

require __DIR__ . '/vendor/autoload.php';

DG\BypassFinals::enable();

That's all. Incredible black magic.

BypassFinals requires PHP version 5.6 and supports PHP up to 7.2. It can be used together with any test tool such as PHPUnit or Mockery.


This functionality is directly implemented in Nette Tester (https://tester.nette.org) version 2.0 and can be enabled this way:

require __DIR__ . '/vendor/autoload.php';

Tester\Environment::bypassFinals();

phpFashion © 2004, 2024 David Grudl | about the blog

Source code examples may be used, with attribution to the author and the URL of this website, without further restrictions.