
PHP: The Dark Magic of Optimization

I recently cut a PHP script's execution time to a hundredth of the original by changing just a few characters in the source code. How is that possible? The drastic speedup comes from the appropriate use of references and assignments. I'll let you in on how it works. Don't be taken in by the sensational headline; there is no black magic involved. You just need to understand how PHP works internally. Don't worry, it's nothing too complicated.

In-depth Reference Counting

The PHP core stores variable names separately from their values in memory. An anonymous value is described by the zval structure. Besides raw data, it includes information about the type (boolean, string, etc.) and two additional items: refcount and is_ref. Yes, refcount is exactly the counter for the aforementioned reference counting.

$abc = 'La Trine';

What does this code actually do? It creates a new zval value in memory, whose data section holds the 8 characters La Trine and indicates the type as a string. At the same time, a new entry abc is added to the variable table, referring to this zval.

Additionally, in the zval structure, we initialize the refcount counter to one, because there is exactly one variable ($abc) pointing to it.

// 10MB string
$sA = str_repeat(' ', 1e7);

$sB = $sA;

How does PHP handle the assignment on the second line? Of course, it creates a new record sB in the variable table. Now watch – the record will refer to the same zval that sA already refers to. It also increments the refcount.

This is great! There's no need to take up another 10MB of memory, no time-consuming data copying. The operation is lightning-fast.
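
The shared value can be observed with the built-in debug_zval_dump(), which prints a value together with its reference count. This is only a rough sketch: the exact number reported differs between PHP versions, and the call itself may hold an extra reference, so the figure can be one higher than the number of variables. The string is built at run time with implode() because literals may be stored specially.

```php
<?php
// Build the string at run time so it is an ordinary counted value
// rather than a compile-time literal.
$sA = implode(' ', array('La', 'Trine'));

ob_start();
debug_zval_dump($sA);          // one variable name refers to the value
$dumpOne = ob_get_clean();

$sB = $sA;                     // second name, same value, nothing is copied

ob_start();
debug_zval_dump($sA);          // the reported refcount should now be higher
$dumpTwo = ob_get_clean();

echo $dumpOne, $dumpTwo;
```

Comparing the two printed lines shows whether the counter moved after the assignment.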

But from the perspective of a PHP programmer, these are two different variables. What if I change one?

$sB .= 'the end';

No worries, everything is taken care of. When a write request to the variable occurs, PHP looks at the referenced zval and checks the refcount. If refcount > 1, the entire zval value is duplicated and sB will refer to this copy. Of course, the refcount of the original zval is also reduced.
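
The copy-on-write behaviour described above can be checked from userland with memory_get_usage(). A minimal sketch, assuming nothing else allocates large blocks between the measurements; the variable names $assignCost and $writeCost are introduced here for illustration only.

```php
<?php
// 10 MB string, as in the example above
$sA = str_repeat(' ', 10000000);

$before = memory_get_usage();
$sB = $sA;                          // assignment: both names share one value
$afterAssign = memory_get_usage();

$sB .= 'the end';                   // write: the value has to be separated first
$afterWrite = memory_get_usage();

$assignCost = $afterAssign - $before;
$writeCost  = $afterWrite - $afterAssign;

printf("assignment: %d bytes, first write: %d bytes\n", $assignCost, $writeCost);
```

The assignment should cost next to nothing, while the first write should cost roughly the size of the string.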

For completeness, I'll add that the command unset($sB) will remove the sB record from the variable table and decrement the respective refcount. Once the refcount drops to zero, the zval structure is freed from memory – as no variable refers to it anymore.
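
The same measuring trick shows when the memory is actually released. A small sketch, again using memory_get_usage(); the names $shared, $afterFirst and $afterSecond are illustrative.

```php
<?php
$sA = str_repeat(' ', 10000000);   // 10 MB string
$sB = $sA;                         // two names, one shared value

$shared = memory_get_usage();

unset($sA);                        // one name is gone, $sB still refers to the value
$afterFirst = memory_get_usage();

unset($sB);                        // last name is gone, the value can be freed
$afterSecond = memory_get_usage();

printf(
    "freed by first unset: %d bytes, by second unset: %d bytes\n",
    $shared - $afterFirst,
    $afterFirst - $afterSecond
);
```

Only the second unset() should release the 10 MB, because only then does no variable refer to the value.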

Classic References in Depth

Is everything clear so far? Let's move on to the second lesson and show how the core deals with classic references.

$a = 'La Trine';
$b = & $a;

You already know how PHP executes the first line. But what happens under the hood in the case of the second line? When I described the zval structure, I mentioned is_ref. It's a boolean, indicating whether the zval value is a reference or not. And right now, its moment to shine has come.

PHP creates the variable $b just as in the example without using a reference, but additionally sets is_ref to true. At this point, both $a and $b (both!) become references, as we know them.

The significant difference comes when we try to change one of the variables. Because is_ref is true, the test on refcount is skipped along with the entire duplication mechanism. The common zval value is directly modified. Although… but we'll get to that soon.

We can create additional references with $xyz = & $a or remove them with unset($b); the principle remains the same. The core works with the variable table and updates the refcount.
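
Seen from the script, the behaviour looks like this. The sketch only shows the visible semantics, not the internal counters; $afterAppend is a helper variable added for illustration.

```php
<?php
$a = 'La Trine';
$b = &$a;              // $a and $b are now two names for one shared value

$b .= '!';             // the write goes straight to the shared value
$afterAppend = $a;     // 'La Trine!'

$xyz = &$a;            // a third name for the same value
unset($b);             // removes only the name $b, the value lives on

$xyz = 'changed';      // still visible through $a

var_dump($afterAppend, $a, isset($b));
```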

Is everything still understandable? If not, try reading the article again more slowly, because maximum concentration is needed from here on.

The Charm Slowly Disappears

Think about how PHP executes the following code:

$a = 'La Trine';
$b = & $a;
$c = $a;

Variables $a and $c should share one zval with is_ref unset. But variables $a and $b need a zval with is_ref set. This can only be resolved by having two zval values.

In other words, line No. 3 must duplicate the zval value.

The algorithm for creating new variables must therefore be supplemented with a condition: if refcount > 1 and the required is_ref “does not match”, then duplicate the value, no questions asked.

Similarly, duplication will also occur in this case:

$a = 'I love La Trine :-)';
$b = $a;
$c = & $a;

See that? Creating a reference duplicates the variable's value. The copy, with is_ref set, will be referred to by variables $a and $c (just for completeness, refcount = 2).
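
Both orderings can be measured with memory_get_usage(). The figures depend on the PHP version: the article describes a zval that carries refcount and is_ref together, and other engine versions may handle these cases differently, so this sketch prints what it measures instead of assuming a result. All variable names are illustrative.

```php
<?php
// Case 1: reference first, plain assignment second
$a = str_repeat('x', 10000000);
$b = &$a;
$before = memory_get_usage();
$c = $a;                                   // does this line copy 10 MB?
$copyCost = memory_get_usage() - $before;

// Case 2: plain assignment first, reference second
$p = str_repeat('y', 10000000);
$q = $p;
$before = memory_get_usage();
$r = &$p;                                  // does creating the reference copy 10 MB?
$refCost = memory_get_usage() - $before;

printf("\$c = \$a cost %d bytes, \$r = &\$p cost %d bytes\n", $copyCost, $refCost);

// The visible semantics are the same either way:
$b .= '!';   // changes $a, leaves $c alone
$r .= '!';   // changes $p, leaves $q alone
```

A cost near 10 MB means the value was duplicated at that line; a cost near zero means the engine found a way to share it.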

You might now be wondering what kind of madness this is and why the PHP core is so poorly designed. Trust me, it's not. It's the common issue of shared vs. exclusive access, just under a different name. It could be avoided, but changing the design would complicate variable handling so much that it would be counterproductive overall.

Script Optimization

Finally, I can explain the trick behind the optimization of the mentioned script. It included the following code:

	...
	$arr = &$this->table;
	foreach($ngram as $token) {
//	if(!array_key_exists($token, $arr)) {
//	  $arr[$token] = array();
//	}
	  $arr = &$arr[$token];
	}
	...

It might seem that the success is due to removing the function array_key_exists, which is probably so terribly slow that it dragged everything down. Just for fun: whoever thought that owes me a jar of Nutella. Nope, the problem is buried elsewhere.

Now you know that the variable $arr being passed refers to a zval with the is_ref bit set and refcount = 2 (the value is referenced by $arr and also by the array element itself). What is crucial is that this zval holds a huge array.

When it is passed to the function array_key_exists, the inevitable happens: the zval must be duplicated, which slams the brakes on the running script. If we instead called a function that takes its parameter by reference, such as key(), or used the forbidden call-time pass-by-reference syntax to force the argument by reference, array_key_exists($token, &$arr), no copying would occur. And the script would speed up by 600×.
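
To see whether this affects a particular PHP build, the two situations can be timed side by side. This is a hypothetical micro-benchmark, not the original script: $table stands in for $this->table, and the array size and number of rounds are arbitrary. The outcome depends on the PHP version, so the sketch only prints the timings.

```php
<?php
// Large array standing in for the script's table (size chosen arbitrarily)
$table = array_fill(0, 200000, 'x');
$rounds = 200;

// Variant 1: an ordinary variable is passed by value
$plain = $table;
$start = microtime(true);
for ($i = 0; $i < $rounds; $i++) {
    $hitPlain = array_key_exists(5, $plain);
}
$plainTime = microtime(true) - $start;

// Variant 2: a variable bound by reference is passed by value
$ref = &$table;
$start = microtime(true);
for ($i = 0; $i < $rounds; $i++) {
    $hitRef = array_key_exists(5, $ref);
}
$refTime = microtime(true) - $start;

printf("plain variable: %.4f s, reference variable: %.4f s\n", $plainTime, $refTime);
```

If the second figure is dramatically larger, every call is copying the whole array, which is the effect described above.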

White Magic of Optimization

My goal was to dispel the superstitions and myths around references: that they are like pointers, that they speed up code. The truth is that all variables are essentially pointers. They just differ in how the PHP core handles them.

If you understand these principles, you can use them to your advantage (I emphasize the word “can”). You can handle strings or arrays more efficiently. Once they get into your blood, you will apply them subconsciously, and they will become part of your coding standard.


phpFashion © 2004, 2024 David Grudl

Source code samples may be used without further restrictions, provided the author and the URL of this website are credited.