When writing your own error handler for PHP, it is absolutely
necessary to follow several rules. Otherwise, it can disrupt the behavior of
other libraries and applications that do not expect any trickery in the error
handler.
Parameters
The signature of the handler looks like this:
function errorHandler(
    int $severity,
    string $message,
    string $file,
    int $line,
    array $context = null // only in PHP < 8
): ?bool {
    ...
}
The $severity parameter contains the error level
(E_NOTICE, E_WARNING, …). Fatal errors such as
E_ERROR cannot be caught by the handler, so this parameter will
never have these values. Fortunately, fatal errors have essentially disappeared
from PHP and have been replaced by exceptions.
The $message parameter is the error message. If the html_errors
directive is enabled, special characters like < are written as
HTML entities, so you need to decode
them back to plain text. However, beware, some characters are not written
as entities, which is a bug. Displaying errors in pure PHP is thus prone to XSS.
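As a sketch of the decoding step (the helper name normalizeErrorMessage is mine, not part of any API), this might look as follows:

```php
// Hypothetical helper for a custom error handler: when the html_errors
// directive is on, PHP encodes special characters in $message as HTML
// entities, so we decode them back to plain text.
function normalizeErrorMessage(string $message, ?bool $htmlErrors = null): string
{
    $htmlErrors = $htmlErrors ?? (bool) ini_get('html_errors');
    if ($htmlErrors) {
        $message = html_entity_decode($message, ENT_QUOTES, 'UTF-8');
    }
    return $message;
}
```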
The $file and $line parameters represent the name
of the file and the line where the error occurred. If the error occurred inside
eval(), $file will be supplemented with this information.
Finally, the $context parameter contains an array of local
variables, which is useful for debugging, but this has been removed in PHP
8. If the handler is to work in PHP 8, omit this parameter or give it a
default value.
Return Value
The return value of the handler can be null or
false. If the handler returns null, nothing happens.
If it returns false, the standard PHP handler is also called.
Depending on the PHP configuration, this can print or log the error.
Importantly, it also fills in internal information about the last error, which
is accessible by the error_get_last()
function.
Suppressed Errors
In PHP, error display can be suppressed either using the shut-up operator
@ or by error_reporting():
// suppress E_USER_DEPRECATED level errors
error_reporting(~E_USER_DEPRECATED);
// suppress all errors when calling fopen()
$file = @fopen($name, 'r');
Even when errors are suppressed, the handler is still called.
Therefore, it is first necessary to verify whether the error is
suppressed, and if so, we must end our own handler:
if (!($severity & error_reporting())) {
    return false;
}
However, in this case, we must end it with return false,
so that the standard error handler is still executed. It will not print or log
anything (because the error is suppressed), but ensures that the error can be
detected using error_get_last().
Other Errors
If our handler processes the error (for example, displays its own message,
etc.), there is no need to call the standard handler. Although then it will not
be possible to detect the error using error_get_last(), this does
not matter in practice, as this function is mainly used in combination with the
shut-up operator.
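A minimal sketch of that typical combination (the file name is illustrative):

```php
// The shut-up operator suppresses the warning, and error_get_last()
// lets us inspect afterwards what went wrong.
$content = @file_get_contents('no-such-file.txt');
if ($content === false) {
    $error = error_get_last();
    // e.g. "file_get_contents(no-such-file.txt): failed to open stream: ..."
    $reason = $error['message'];
}
```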
If, on the other hand, the handler does not process the error for any reason,
it should return false so as not to conceal it.
Example
Here's what the code for a custom error handler that transforms errors into
ErrorException
exceptions might look like:
set_error_handler(function (int $severity, string $message, string $file, int $line) {
    if (!(error_reporting() & $severity)) {
        return false;
    }
    throw new \ErrorException($message, 0, $severity, $file, $line);
});
SameSite cookies provide a mechanism to recognize what led to
the loading of a page: whether it was by clicking a link on another
website, submitting a form, loading inside an iframe, using
JavaScript, etc.
Identifying how a page was loaded is crucial for security. The serious
vulnerability known as Cross-Site
Request Forgery (CSRF) has been with us for over twenty years, and SameSite
cookies offer a systematic way to address it.
A CSRF attack involves an attacker luring a victim to a webpage that
inconspicuously makes a request to a web application where the victim is logged
in, and the application believes the request was made voluntarily by the victim.
Thus, under the identity of the victim, some action is performed without the
victim knowing. This could involve changing or deleting data, sending a message,
etc. To prevent such attacks, applications need to distinguish whether the
request came from a legitimate source, e.g., by submitting a form on the
application itself, or from elsewhere. SameSite cookies can do this.
How does it work? Let’s say I have a website running on a domain, and
I create three different cookies with attributes SameSite=Lax,
SameSite=Strict, and SameSite=None. Name and value do
not matter. The browser will store them.
1. When I open any URL on my website by typing it directly into the address bar or by clicking a bookmark, the browser sends all three cookies.
2. When I access any URL on my website from a page on the same website, the browser sends all three cookies.
3. When I access any URL on my website from a page on a different website, the browser sends only the cookies with None and, in certain cases, Lax; see the table:
Code on another website                              Sent cookies
-----------------------------------------------------------------
Link       <a href="…">                              None + Lax
Form GET   <form method="GET" action="…">            None + Lax
Form POST  <form method="POST" action="…">           None
iframe     <iframe src="…">                          None
AJAX       $.get('…'), fetch('…')                    None
Image      <img src="…">                             None
Prefetch   <link rel="prefetch" href="…">            None
…                                                    None
SameSite cookies can distinguish only a few cases, but these are crucial for
protecting against CSRF.
If, for example, there is a form or a link for deleting an item on my
website's admin page and it was sent/clicked, the absence of a cookie created
with the Strict attribute means it did not happen on my website but
rather the request came from elsewhere, indicating a CSRF attack.
Create the cookie for detecting a CSRF attack as a so-called session cookie,
without the Expires attribute; its validity is then essentially unlimited.
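A sketch of creating such a cookie (the cookie name csrfDetect is illustrative; the options-array form of setcookie() requires PHP 7.3+):

```php
// Session cookie for CSRF detection: no 'expires' key means it lives
// for the whole browser session.
$cookieOptions = [
    'path' => '/',
    'secure' => true,      // send only over HTTPS
    'httponly' => true,    // hide from JavaScript
    'samesite' => 'Strict',
];
setcookie('csrfDetect', '1', $cookieOptions);

// Later, when handling a sensitive action:
if (!isset($_COOKIE['csrfDetect'])) {
    // the Strict cookie was not sent => the request came from elsewhere (CSRF?)
}
```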
Domain vs Site
“On my website” is not the same as “on my domain,” it's not about
the domain, but about the website (hence the name SameSite). Although the site
often corresponds to the domain, for services like github.io, it
corresponds to the subdomain. A request from doc.nette.org to
files.nette.org is same-site, while a request from
nette.github.io to tracy.github.io is already
cross-site. Here it is nicely
explained.
<iframe>
From the previous lines, it is clear that if a page from my website is loaded
inside an <iframe> on another website, the browser does not
send Strict or Lax cookies. But there's another
important thing: if such a loaded page creates Strict or
Lax cookies, the browser ignores them.
This creates a possibility to defend against fraudulent acquisition of
cookies, or Cookie
Stuffing, against which a systemic defense was likewise lacking until now. The
trick is that a fraudster collects an affiliate-marketing commission even though
the user was not brought to the merchant's website by clicking a link. Instead,
an invisible <iframe> containing that link is inserted into a page,
marking every visitor.
Cookies without the SameSite Attribute
Cookies without the SameSite attribute were always sent during both same-site
and cross-site requests. Just like SameSite=None. However, in the
near future, browsers will start treating the SameSite=Lax flag as
the default, so cookies without an attribute will be considered
Lax. This is quite an unusually large BC break in browser behavior.
If you want the cookie to continue to behave the same and be transmitted during
any cross-site request, you need to set it to SameSite=None.
(Unless you develop embedded widgets, etc., you probably won't want this often.)
Unfortunately, for browsers from only a year ago, the None value is
unexpected. Safari 12 interprets it as Strict, thus creating a
tricky problem on older iOS and macOS.
And note: None works only when set with the Secure
attribute.
What to Do in Case of an Attack?
Run away! The basic rule of self-defense, both in real life and on the web.
A huge mistake made by many frameworks is that upon detecting a CSRF attack,
they display the form again and write something like “The CSRF token is
invalid. Please try to submit the form again”. By resubmitting the form,
the attack is completed. Such protection makes no sense when you effectively
invite the user to bypass it.
Until recently, Chrome behaved this way after a cross-site request: refreshing
displayed the page again, but this time the cookies with the
Strict attribute were sent. So a refresh defeated CSRF protection
based on SameSite cookies. Fortunately, it no longer does this today, but
it's possible that other or older browsers still do. A user can also
“refresh” a page by clicking into the address bar and pressing Enter, which
counts as direct URL entry (point 1), so all cookies are sent.
Thus, the best response to detecting CSRF is to redirect with a 302 HTTP
code elsewhere, perhaps to the homepage. This rids you of dangerous POST data,
and the problematic URL isn't saved to history.
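A minimal sketch of such a response (the helper name and target URL are illustrative):

```php
// Hypothetical reaction to a detected CSRF attempt: redirect away
// instead of re-displaying the form.
function abortOnCsrf(string $target = '/'): void
{
    header('Location: ' . $target, true, 302); // 302 redirect, e.g. to the homepage
    exit; // stop processing; the dangerous POST data is discarded
}
```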
Incompatibilities
For a long time, SameSite did not work nearly as well as it should have,
mainly due to browser bugs and gaps in the specification, which, for
example, did not address redirects or refreshes. SameSite cookies were not
sent when saving or printing a page, but were sent after a
refresh when they should not have been, etc. Fortunately, the situation is better
today. I believe the only serious shortcoming persisting in current browser
versions is the Safari issue mentioned above.
Addendum: besides SameSite, the origin of a request has recently also become
distinguishable by the Origin
header, which is more privacy-respecting and more accurate than the Referer
header.
Content Security Policy (CSP) is an additional security feature that tells
the browser what external sources a page can load and how it can be displayed.
It protects against the injection of malicious code and attacks such as XSS. It
is sent as a header composed of a series of
directives. However, implementing it is not trivial.
Typically, we want to use JavaScript libraries located outside our server,
such as Google Analytics, advertising systems, captchas, etc. Unfortunately, the
first version of CSP fails here. It requires a precise analysis of the content
loaded and the setting of the correct rules. This means creating a whitelist, a
list of all the domains, which is not easy since some scripts dynamically pull
other scripts from different domains or are redirected to other domains, etc.
Even if you make the effort and manually create the list, you never know what
might change in the future, so you must constantly check that the list is still
up-to-date and correct it. An analysis by Google showed that even this meticulous
tuning ultimately ends in allowing such broad access that the whole purpose
of CSP falls apart; all you gain is sending much larger headers with each request.
CSP level 2 approaches the problem differently using a nonce, and only the
third version brought the approach to a usable conclusion. Unfortunately, as of
2019, it does not yet have sufficient browser support.
Regarding how to assemble the script-src and
style-src directives to work correctly even in older browsers and
to minimize the effort, I have written a detailed
article in the Nette partner section. Essentially, the resulting form might
look like this:
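The concrete header from that article is not reproduced here; purely as an illustration, a nonce-based policy with a whitelist fallback for older browsers tends to have this shape (the nonce value and domains are placeholders):

```
Content-Security-Policy: script-src 'nonce-Nc3n83cnSAd' 'strict-dynamic' https: 'unsafe-inline'; style-src 'self'
```

Browsers supporting CSP 3 use the nonce and 'strict-dynamic' and ignore the rest; older browsers fall back to the https: and 'unsafe-inline' whitelist.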
Before you set new rules for CSP, try them out first using the
Content-Security-Policy-Report-Only header. This header works in
all browsers that support CSP. If a rule is violated, the browser does not block
the script but instead sends a notification to the URL specified in the
report-uri directive. To receive and analyze these notifications,
you might use a service like Report
URI.
You can use both headers simultaneously, with
Content-Security-Policy having verified and active rules and
Content-Security-Policy-Report-Only to test their modifications. Of
course, you can also monitor failures in the strict rules.
It's a bit like when you spot a poster for a concert by a band you remember
from your youth. Are they still playing? Or did they get back together after
years because they need the money? Perhaps to cash in on the strings of
nostalgia?
Texy is my first open-source project. I started writing it fifteen years ago. Texy has survived several
version control systems. Numerous web services hosting repositories. Several
string encodings. Various markup languages for creating websites. Several of my
life relationships. A number of cities I've lived in.
So, I have kept it up-to-date for fifteen years. We started in PHP 4, which
was the worst programming language in the world and thus a challenge, then moved
on to PHP 5 with relief, a few years later we transitioned to namespaces
(Texy::Parser instead of TexyParser, wow), watched PHP
stop being the worst language in the world, which frustrated many programmers
who then turned to JavaScript, then God created PHP 7 and with it type hints
(Texy::process(string $text): string megawow), and strictness came
into fashion with declare(strict_types=1) and we honor that.
And so here is Texy 3.0. It's the
same as the previous versions, but with all the bells and whistles of PHP
7.1. It's the same because you don't mess with perfection.
Texy was here when you were born, in programming terms. Someday, Texy might
even format your epitaph. And it will dutifully insert a non-breaking space
after any single-letter word in it.
How do you mock classes that are defined as final, or whose
methods are final?
Mocking means replacing the original object with a testing stand-in that
does not perform any real functionality but just looks like the original object,
and simulates the behavior we need to test.
For example, instead of a PDO object with methods like query() etc., we create a
mock that pretends to work with the database and instead verifies that the
correct SQL statements are executed. See, for example, the Mockery
documentation.
In order to pass the mock to methods with a PDO
type hint, the mock class must inherit from PDO. And that
can be a stumbling block: if PDO or the method query() were final, it would not
be possible.
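A sketch using Mockery (assuming it is installed via Composer; the SQL string is illustrative):

```php
require __DIR__ . '/vendor/autoload.php';

// A mock pretending to be PDO; it satisfies PDO type hints because
// the generated mock class extends PDO.
$pdo = Mockery::mock(PDO::class);
$pdo->shouldReceive('query')
    ->once()
    ->with('DELETE FROM items WHERE id = 10');

// the code under test receives $pdo instead of a real connection
$pdo->query('DELETE FROM items WHERE id = 10');

Mockery::close(); // verifies that the expectation was met
```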
Is there any solution? The first option is not to use the final keyword at
all. This, of course, does not help with third-party code, and above all it
gives up an important element of object design. (There is, for example, the
dogma that every class should be either final or abstract.)
The second and very handy option is to use BypassFinals, which removes
finals from source code on-the-fly and allows mocking of final methods and
classes.
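Usage is a one-liner (assuming installation via Composer):

```php
require __DIR__ . '/vendor/autoload.php';

// Must be called before the classes to be mocked are loaded;
// from then on, `final` is stripped from loaded source code on the fly.
DG\BypassFinals::enable();
```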
A naming conundrum: how to collectively refer to classes and interfaces? For
instance, what should you call a variable that could contain either a class or
an interface name? What should be used instead of $class?
One might consider the term type ($type), but this is
quite generic because a type can also be a string or an array. From the
perspective of the language, a type could be something more complex, such as
?array. Moreover, it's debatable what constitutes the type of an
object: is it the class name, or is it object?
However, there indeed exists a collective term for classes and interfaces: it
is the word class.
How so?
From a declaration standpoint, an interface is essentially a stripped-down
class. It can contain only public abstract methods, which also implies that no
instances of it can be created. Interfaces are therefore a subset of classes. If
something is a subset, we can refer to it by the name of the superset. Just as a
human is a mammal, an interface is a class.
Nevertheless, there's also the usage perspective. A class can inherit from
only one class but can implement multiple interfaces. However, this limitation
pertains to classes, not to the interfaces themselves. Similarly, a class cannot
inherit from a final class, but we still perceive the final class as a class.
Also, if a class can implement multiple interfaces (i.e., classes, see 1.), we
still regard them as classes.
And what about traits? They simply do not belong here, as they do not exist
from an OOP standpoint.
Thus, the issue of naming classes and interfaces together is resolved.
Let’s simply call them classes.
classes + interfaces = classes
Well, but a new problem has arisen: how to refer to classes that are not
interfaces? That is, their complement, what was referred to at the beginning of
the article as classes. Non-interfaces? Or “implementations”? 🙂
That's an even tougher nut to crack. You know what, let's forget that
interfaces are also classes and go back to pretending that
every OOP identifier is either a class or an interface. It will be easier.
Plus things you won't read in the documentation, including a
security hole and advice on how to speed up server response without slowing
it down.
Output buffering allows the output of a PHP script (primarily from the
echo function) to be stored in memory (i.e., a buffer) instead of
being sent immediately to the browser or terminal. This is useful for various
purposes.
Preventing Output to the Screen:
ob_start(); // enables output buffering
$foo->bar(); // all output goes only to the buffer
ob_end_clean(); // clears the buffer and ends buffering
Capturing Output into a Variable:
ob_start(); // enables output buffering
$foo->render(); // output goes only to the buffer
$output = ob_get_contents(); // saves the buffer content into a variable
ob_end_clean(); // clears the buffer and ends buffering
$output = ob_get_clean(); // saves the buffer content into variable and disables buffering
In the given examples, the buffer content did not reach the output at all. If
you want to send it to the output instead, you should use ob_end_flush()
instead of ob_end_clean(). To simultaneously get the buffer
content, send it to the output, and end buffering, there is also a shortcut: ob_get_flush().
You can empty the buffer at any time without ending it using ob_clean()
(clears it) or ob_flush()
(sends it to the output):
ob_start(); // enables output buffering
$foo->bar(); // all output goes only to the buffer
ob_clean(); // clears the buffer content, but buffering remains active
$foo->render(); // output still goes to the buffer
ob_flush(); // sends the buffer to the output
$none = ob_get_contents(); // the buffer content is now an empty string
ob_end_clean(); // disables output buffering
Output written to php://output is also sent to the buffer, while
buffers can be bypassed by writing to php://stdout (or
STDOUT), which is available only under CLI, i.e., when running
scripts from the command line.
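A small sketch showing the difference:

```php
// php://output respects output buffering…
ob_start();
echo 'buffered';
file_put_contents('php://output', ' too'); // lands in the buffer as well
$captured = ob_get_clean();
// $captured is now 'buffered too'; nothing reached the real output

// …while in CLI, STDOUT bypasses the buffers entirely:
// fwrite(STDOUT, "goes straight to the terminal\n");
```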
Nesting
Buffers can be nested, so while one buffer is active, calling ob_start()
activates a new buffer. Thus, ob_end_flush() and
ob_flush() send the buffer content not to the output but to the
parent buffer. Only when there is no parent buffer does the content get sent to
the actual output, i.e., the browser or terminal.
Therefore, it is important to end buffering, even if an exception occurs
during the process:
ob_start();
try {
    $foo->render();
} finally { // finally is available from PHP 5.5
    ob_end_clean(); // or ob_end_flush()
}
Buffer Size
The buffer can also speed up page generation (I haven't measured this,
but it sounds logical) by not sending every single echo to the
browser, but rather a larger chunk of data (e.g., 4 kB). Just call at the
beginning of the script:
ob_start(null, 4096);
When the buffer size exceeds 4096 bytes (the so-called
chunk size), a flush is performed automatically, i.e.,
the buffer is emptied and sent out. The same can be achieved by setting the output_buffering
directive. It is ignored in CLI mode.
But beware, starting buffering without specifying the size, i.e.,
simply with ob_start(), will cause the page not to be sent
gradually but only after it is fully rendered, making the server appear
very slow!
HTTP Headers
Output buffering has no effect on sending HTTP headers, which are processed
by a different path. However, thanks to buffering, headers can be sent even
after some output has been printed, as it is still held in the buffer. This is a
side effect you shouldn't rely on, as there is no certainty when the output will
exceed the buffer size and be sent.
Security Hole
When the script ends, all unclosed buffers are outputted. This can be
considered an unpleasant security hole if, for example, you prepare sensitive
data in the buffer not intended for output and an error occurs. The solution is
to use a custom handler:
ob_start(function () { return ''; });
Handlers
You can attach a custom handler to output buffering, i.e., a function that
processes the buffer content before sending it out:
ob_start(
    function ($buffer, $phase) { return strtoupper($buffer); }
);
echo 'Hello';
ob_end_flush(); // 'HELLO' is sent to the output
Functions ob_clean() or ob_end_clean() will call
the handler but discard the output without sending it out. The handler can
detect which function is called and respond accordingly. The second parameter
$phase is a bitmask (from PHP 5.4):
PHP_OUTPUT_HANDLER_START when the buffer is opened
PHP_OUTPUT_HANDLER_FINAL when the buffer is closed
PHP_OUTPUT_HANDLER_FLUSH when ob_flush() is called
(but not ob_end_flush() or ob_get_flush())
PHP_OUTPUT_HANDLER_CLEAN when ob_clean(),
ob_end_clean(), and ob_get_clean() are called
PHP_OUTPUT_HANDLER_WRITE when an automatic flush occurs
The start, final, and flush (or clean) phases can occur simultaneously,
distinguished by the binary operator &:
if ($phase & PHP_OUTPUT_HANDLER_START) { ... }
if ($phase & PHP_OUTPUT_HANDLER_FLUSH) { ... }
elseif ($phase & PHP_OUTPUT_HANDLER_CLEAN) { ... }
if ($phase & PHP_OUTPUT_HANDLER_FINAL) { ... }
The PHP_OUTPUT_HANDLER_WRITE phase occurs only if the buffer has
a size (chunk size) and that size was exceeded. This is the
mentioned automatic flush. Note, the constant
PHP_OUTPUT_HANDLER_WRITE has a value of 0, so you can't use a bit
test, but:
if ($phase === PHP_OUTPUT_HANDLER_WRITE) { ... }
A handler doesn't have to support all operations. When activating with
ob_start(), you can specify the bitmask of supported operations as
the third parameter:
PHP_OUTPUT_HANDLER_CLEANABLE – allows calling
ob_clean() and related functions
PHP_OUTPUT_HANDLER_FLUSHABLE – allows calling
ob_flush() and related functions
PHP_OUTPUT_HANDLER_REMOVABLE – allows the buffer to be ended
PHP_OUTPUT_HANDLER_STDFLAGS – combines all three flags, the
default behavior
This applies even to buffering without a custom handler. For example, if
I want to capture the output into a variable, I don't set the
PHP_OUTPUT_HANDLER_FLUSHABLE flag, preventing the buffer from being
(accidentally) sent to the output with ob_flush(). However, it can
still be done with ob_end_flush() or ob_get_flush(),
which somewhat defeats the purpose.
Similarly, not setting the PHP_OUTPUT_HANDLER_CLEANABLE flag
should prevent the buffer from being cleared, but again it doesn't work.
Finally, not setting PHP_OUTPUT_HANDLER_REMOVABLE makes the
buffer user-undeletable; it turns off only when the script ends. An example of a
handler that should be set this way is ob_gzhandler,
which compresses output, thus reducing volume and increasing data transfer
speed. Once this buffer is opened, it sends the HTTP header
Content-Encoding: gzip, and all subsequent output must be
compressed. Removing the buffer would break the page.
The correct usage is:
ob_start(
    'ob_gzhandler',
    16000, // without a chunk size, the server would not send data gradually
    PHP_OUTPUT_HANDLER_FLUSHABLE // but not removable or cleanable
);
You can also enable output compression by setting the zlib.output_compression
directive, which turns on buffering with a different handler (not sure how it
differs specifically), but it lacks the flag to be non-removable. Since
it's good to compress the transfer of all text files, not just PHP-generated
pages, it's better to activate compression directly on the HTTP
server side.
Command-line script to convert between array() and
PHP 5.4's short syntax []. It uses the native PHP tokenizer, so the
conversion is safe. The script was successfully tested against thousands of
PHP files.
The way applications are developed in PHP has dramatically
transformed over the last 5 years. Initially, we moved away from pure PHP and
learned to use frameworks. Later, Composer arrived, enabling library
installations from the command line. Now, we are witnessing the end of
frameworks as we know them.
Monolithic frameworks are gradually disintegrating into separate (decoupled)
components. This transition offers several advantages. While previously using
just one part of a framework was difficult or impossible, today you can simply
install its component. The development cycle of individual components can vary.
They have their own repositories, issue trackers, and can have their own
development teams.
You can update components to new versions continuously, without waiting for
the next version of the entire framework. Alternatively, you may decide not to
update a certain component, perhaps due to a BC break.
The meaning of the word “framework” is shifting; talking about versions
is almost obsolete. Instead of using framework XYZ in version 2.3.1, you use a
set of components in various versions that work together.
Splitting a framework into components is quite complex. For Nette, it took
2 years and was completed last year. The adoption of Composer and the
consistent use of dependency injection were absolutely essential. Nette now
consists of over 20 separate repositories, and the original one retains only a
single
class.
All major frameworks, such as Symfony, Zend, Laravel, or CakePHP, are divided
into components, though one step remains to be completed: splitting into
separate repositories (instead of a workaround like Git subtree split). Zend
promises to do this in version 2.5; we'll see what happens with Symfony.
Composing Nette
Through this long introduction, I wanted to lead you to the idea that
viewing Nette as a framework in any specific version is outdated. It's smarter
to approach it as a set of components.
That is, instead of declaring a dependency on nette/nette, you
should declare dependencies on specific components. The Sandbox now does
exactly this. For the foundation of a future application, you can also use the
Nette Web Project, which is a
minimalist version of the Sandbox. Download it using
composer create-project nette/web-project
and remove from composer.json the components you do not need.
This will speed up Composer operations.
Bug fixes will also reach you faster. Once an error is fixed, you can
immediately tag a new version for the relevant component, whereas the release
cycle for the entire framework is much slower.
If you are creating add-ons for Nette, do not hesitate and immediately
replace the dependency on nette/nette with a list of actually
required components.
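For illustration, such a dependency list in composer.json might look like this (the component names are real Nette packages; the version constraints are only examples):

```
{
    "require": {
        "nette/application": "~2.3",
        "nette/forms": "~2.3",
        "latte/latte": "~2.3",
        "tracy/tracy": "~2.3"
    }
}
```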
Of course, new versions of the framework will continue to be released as
before, require nette/nette will still work, and for version 2.3,
distributions in ZIP archives will also be released. But their significance will
gradually diminish.