How do you mock classes that are declared final, or whose methods
are final?
Mocking means replacing the original object with a test imitation that
performs none of the real functionality and only looks like the original object,
while simulating the behavior we need for the test.
For example, instead of a PDO with methods like query() etc., we create a
mock that pretends to work with the database and instead verifies that the
correct SQL statements are called, etc. See the Mockery documentation
for more.
To be able to pass the mock to methods that use the PDO type hint,
the mock class must inherit from PDO. And that
can be a stumbling block. If PDO or its query() method were final, it would not
be possible.
Is there any solution? The first option is not to use the final keyword at
all. This, of course, does not help with third-party code that uses it, and
above all it gives up an important element of object design. For example,
there is a dogma that every class should be either final or abstract.
The second and very handy option is to use BypassFinals, which removes
finals from source code on-the-fly and allows mocking of final methods and
classes.
Install it using Composer:
composer require dg/bypass-finals --dev
And just call at the beginning of the test:
require __DIR__ . '/vendor/autoload.php';
DG\BypassFinals::enable();
That's all. Incredibly black magic 🙂
BypassFinals requires PHP version 5.6 and supports PHP up to 7.2. It can be
used together with any test tool such as PHPUnit or Mockery.
This functionality is implemented directly in Nette Tester (https://tester.nette.org) version 2.0 and
can be enabled this way:
require __DIR__ . '/vendor/autoload.php';
Tester\Environment::bypassFinals();
What you won't read in the documentation about output buffering, including a
security fix and advice on how to speed up server response rather than
slow it down.
Output buffering allows the output of a PHP script (primarily from
echo) to be stored in memory (i.e., a buffer) instead of
being sent immediately to the browser or terminal. This is useful for various
purposes.
Preventing Output to the Screen:
ob_start(); // enables output buffering
$foo->bar(); // all output goes only to the buffer
ob_end_clean(); // clears the buffer and ends buffering
Capturing Output into a Variable:
ob_start(); // enables output buffering
$foo->render(); // output goes only to the buffer
$output = ob_get_contents(); // saves the buffer content into a variable
ob_end_clean(); // clears the buffer and ends buffering
The pair ob_get_contents() and ob_end_clean() can be replaced by a single
function, ob_get_clean(), which drops end from the name but does turn off
output buffering:
$output = ob_get_clean(); // saves the buffer content into variable and disables buffering
In the given examples, the buffer content did not reach the output at all. If
you want to send it to the output instead, use ob_end_flush() instead of
ob_end_clean(). To get the buffer content, send it to the output, and end
buffering all at once, there is also a shortcut: ob_get_flush().
You can empty the buffer at any time without ending it using ob_clean()
(clears it) or ob_flush() (sends it to the output):
ob_start(); // enables output buffering
$foo->bar(); // all output goes only to the buffer
ob_clean(); // clears the buffer content, but buffering remains active
$foo->render(); // output still goes to the buffer
ob_flush(); // sends the buffer to the output
$none = ob_get_contents(); // the buffer content is now an empty string
ob_end_clean(); // disables output buffering
Output written to php://output is also sent to the buffer, while
buffers can be bypassed by writing to php://stdout (or STDOUT), which is
available only under CLI, i.e., when running scripts from the command line.
Nesting
Buffers can be nested, so while one buffer is active, calling ob_start()
activates a new buffer. Thus, ob_end_flush() and ob_flush() send the buffer
content not to the output but to the parent buffer. Only when there is no
parent buffer does the content get sent to
the actual output, i.e., the browser or terminal.
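A minimal sketch of this nesting behavior; the printed strings are only illustrative:

```php
ob_start();               // outer buffer
echo 'outer ';
ob_start();               // inner buffer, nested in the outer one
echo 'inner';
ob_end_flush();           // inner content moves to the outer buffer, not to the browser
$result = ob_get_clean(); // 'outer inner'
```

Nothing reaches the real output here, because the outer buffer is ended with ob_get_clean().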
Therefore, it is important to end buffering, even if an exception occurs
during the process:
ob_start();
try {
    $foo->render();
} finally { // finally available from PHP 5.5
    ob_end_clean(); // or ob_end_flush()
}
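The same idea can be wrapped into a small reusable function. This is only a sketch: captureOutput() is a hypothetical helper name, not a PHP built-in, and it assumes PHP 7 or later for the Throwable type.

```php
/**
 * Runs $callback and returns everything it printed.
 * captureOutput() is a hypothetical helper, not a PHP built-in.
 */
function captureOutput(callable $callback): string
{
    ob_start();
    try {
        $callback();
        return (string) ob_get_clean(); // returns the content and ends this buffer
    } catch (Throwable $e) {
        ob_end_clean(); // discard partial output so the buffer does not stay open
        throw $e;
    }
}

$html = captureOutput(function () {
    echo '<p>Hello</p>';
});
```

Whether the callback succeeds or throws, the nesting level is the same after the call as before it.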
Buffer Size
The buffer can also “speed up page generation (I haven't measured this,
but it sounds logical)” by not sending every single echo to the
browser, but a larger amount of data (e.g., 4kB). Just call at the beginning of
the script:
ob_start(null, 4096);
When the buffer size exceeds 4096 bytes (the so-called chunk size), a flush
is performed automatically, i.e., the buffer is emptied and sent out. The same
can be achieved by setting the output_buffering directive. It is ignored in
CLI mode.
But beware, starting buffering without specifying the size, i.e.,
simply with ob_start(), will cause the page not to be sent
gradually but only after it is fully rendered, making the server appear
very slow!
Output buffering has no effect on sending HTTP headers, which are processed
by a different path. However, thanks to buffering, headers can be sent even
after some output has been printed, as it is still held in the buffer. This is a
side effect you shouldn't rely on, as there is no certainty when the output will
exceed the buffer size and be sent.
Security Hole
When the script ends, all unclosed buffers are flushed to the output. This can be
considered an unpleasant security hole if, for example, you prepare sensitive
data in the buffer not intended for output and an error occurs. The solution is
to use a custom handler:
ob_start(function () { return ''; });
Handlers
You can attach a custom handler to output buffering, i.e., a function that
processes the buffer content before sending it out:
ob_start(
function ($buffer, $phase) { return strtoupper($buffer); }
);
echo 'Hello';
ob_end_flush(); // 'HELLO' is sent to the output
Functions ob_clean() or ob_end_clean() will call
the handler but discard the output without sending it out. The handler can
detect which function is called and respond accordingly. The second parameter
$phase is a bitmask (from PHP 5.4):
- PHP_OUTPUT_HANDLER_START when the buffer is opened
- PHP_OUTPUT_HANDLER_FINAL when the buffer is closed
- PHP_OUTPUT_HANDLER_FLUSH when ob_flush() is called (but not ob_end_flush() or ob_get_flush())
- PHP_OUTPUT_HANDLER_CLEAN when ob_clean(), ob_end_clean(), and ob_get_clean() are called
- PHP_OUTPUT_HANDLER_WRITE when an automatic flush occurs
The start, final, and flush (or clean) phases can occur simultaneously,
distinguished by the binary operator &:
if ($phase & PHP_OUTPUT_HANDLER_START) { ... }
if ($phase & PHP_OUTPUT_HANDLER_FLUSH) { ... }
elseif ($phase & PHP_OUTPUT_HANDLER_CLEAN) { ... }
if ($phase & PHP_OUTPUT_HANDLER_FINAL) { ... }
The PHP_OUTPUT_HANDLER_WRITE phase occurs only if the buffer has
a size (chunk size) and that size was exceeded. This is the
mentioned automatic flush. Note, the constant PHP_OUTPUT_HANDLER_WRITE
has a value of 0, so you can't use a bit test, but:
if ($phase === PHP_OUTPUT_HANDLER_WRITE) { ... }
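To see the bitmask in practice, the following sketch records every $phase value the handler receives. The outer buffer is there only so that the handler's result can be inspected; the variable names are illustrative, and PHP 7 or later is assumed for the type declarations.

```php
$phases = [];

ob_start(); // outer buffer, only so the handler's result can be inspected
ob_start(function (string $buffer, int $phase) use (&$phases): string {
    $phases[] = $phase; // remember the bitmask of every invocation
    return strtoupper($buffer);
});
echo 'Hello';
ob_end_flush();           // the handler runs once; its result goes to the outer buffer
$result = ob_get_clean(); // 'HELLO'

// the buffer was opened and closed in a single handler call, so both bits are set
$started = (bool) ($phases[0] & PHP_OUTPUT_HANDLER_START);
$finished = (bool) ($phases[0] & PHP_OUTPUT_HANDLER_FINAL);
```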
A handler doesn't have to support all operations. When activating with
ob_start(), you can specify the bitmask of supported operations as
the third parameter:
- PHP_OUTPUT_HANDLER_CLEANABLE – allows calling ob_clean() and related functions
- PHP_OUTPUT_HANDLER_FLUSHABLE – allows calling ob_flush()
- PHP_OUTPUT_HANDLER_REMOVABLE – buffer can be ended
- PHP_OUTPUT_HANDLER_STDFLAGS – combines all three flags, the default behavior
This applies even to buffering without a custom handler. For example, if
I want to capture the output into a variable, I don't set the
PHP_OUTPUT_HANDLER_FLUSHABLE flag, preventing the buffer from being
(accidentally) sent to the output with ob_flush(). However, it can
still be done with ob_end_flush() or ob_get_flush(),
which somewhat defeats the purpose.
Similarly, not setting the PHP_OUTPUT_HANDLER_CLEANABLE flag
should prevent the buffer from being cleared, but again it doesn't work.
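A sketch of the capture case without the PHP_OUTPUT_HANDLER_FLUSHABLE flag. It assumes a recent PHP version, where ob_flush() refuses to flush such a buffer, raises a notice, and returns false:

```php
// STDFLAGS minus FLUSHABLE: the buffer can be cleaned and removed, but not flushed
$flags = PHP_OUTPUT_HANDLER_STDFLAGS & ~PHP_OUTPUT_HANDLER_FLUSHABLE;

ob_start(null, 0, $flags);
echo 'captured';
$flushed = @ob_flush();   // refused; @ only hides the notice raised here
$output = ob_get_clean(); // the content is still in the buffer
```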
Finally, not setting PHP_OUTPUT_HANDLER_REMOVABLE makes the
buffer impossible for the user to remove; it turns off only when the script ends.
An example of a handler that should be set this way is ob_gzhandler,
which compresses output, thus reducing volume and increasing data transfer
speed. Once this buffer is opened, it sends the HTTP header
Content-Encoding: gzip, and all subsequent output must be
compressed. Removing the buffer would break the page.
The correct usage is:
ob_start(
'ob_gzhandler',
16000, // without chunk size, the server would not send data gradually
PHP_OUTPUT_HANDLER_FLUSHABLE // but not removable or cleanable
);
You can also enable output compression by setting the zlib.output_compression
directive, which turns on buffering with a different handler (not sure how it
differs specifically), but it lacks the flag to be non-removable. Since
it's good to compress the transfer of all text files, not just PHP-generated
pages, it's better to activate compression directly on the HTTP
server side.
PHP ssh2 thread safe binaries for Microsoft Windows:
Command-line script to convert between array() and
PHP 5.4's short syntax []. It uses the native PHP tokenizer, so
conversion is safe. The script was successfully tested against thousands of
PHP files.
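Why the tokenizer makes this safe can be illustrated with a short sketch. It is not the converter's actual code: countLongArrays() is a hypothetical helper that only counts array( ... ) literals, and it shows that text inside strings and array type hints are not mistaken for them.

```php
/**
 * Counts array( ... ) literals in PHP source using the tokenizer.
 * countLongArrays() is a hypothetical helper, not part of the converter script.
 */
function countLongArrays(string $code): int
{
    $tokens = token_get_all($code);
    $count = 0;
    foreach ($tokens as $i => $token) {
        if (!is_array($token) || $token[0] !== T_ARRAY) {
            continue; // not the "array" keyword
        }
        // skip optional whitespace after the keyword and look for "("
        $next = $tokens[$i + 1] ?? null;
        if (is_array($next) && $next[0] === T_WHITESPACE) {
            $next = $tokens[$i + 2] ?? null;
        }
        if ($next === '(') {
            $count++;
        }
    }
    return $count;
}

$code = '<?php $a = array(1, 2); $s = "array(3)"; function f(array $x) {}';
$found = countLongArrays($code); // 1: the string and the type hint are ignored
```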
Download from GitHub
To convert all *.php and *.phpt files in a whole
directory recursively or to convert a single file use:
convert.php <directory | file>
To convert source code from STDIN and print the output to STDOUT use:
convert.php < input.php > output.php
To convert short syntax [] to older long syntax array() use option --reverse:
convert.php --reverse [<directory | file>]
Composer, the most important tool for
PHP developers, offers three methods to install packages:
- local: composer require vendor/name
- global: composer global require vendor/name
- as a project: composer create-project vendor/name
Local Installation
Local installation is the most common. If I have a project where I want to
use Tracy, I enter in the project's root
directory:
composer require tracy/tracy
Composer will update (or create) the composer.json file and
download Tracy into the vendor subfolder. It also generates an
autoloader, so in the code, I just need to include it and can use Tracy
right away:
require __DIR__ . '/vendor/autoload.php';
Tracy\Debugger::enable();
As a Project
A completely different situation arises when, instead of a library whose
classes I use in my project, I install a tool that I only run from the
command line.
An example might be ApiGen for generating
clear API documentation. In such cases, the third method is used:
composer create-project apigen/apigen
Composer will create a new folder (and thus a new project) apigen,
download the entire tool, and install its dependencies.
It will have its own composer.json and its own vendor subfolder.
This method is also used for installations like Nette Sandbox or CodeChecker. However, testing
tools such as Nette Tester or PHPUnit are not installed this way because we use
their classes in tests, calling Tester\Assert::same() or inheriting
from PHPUnit_Framework_TestCase.
Unfortunately, Composer allows tools like ApiGen to be installed using
composer require without even issuing a warning.
This is equivalent to forcing two developers, who don't even know each other
and who work on completely different projects, to share the same
vendor folder. To this one might say:
- For heaven's sake, why would they do that?
- It just can't work!
Indeed, there is no reasonable reason to do it, it brings no benefit, and it
will stop working the moment there is a conflict of libraries used. It's just a
matter of time, like building a house of cards that will sooner or later
collapse. One project will require library XY in version 1.0, another in version
2.0, and at that point, it will stop working.
Global Installation
The difference between options 1) and 2), i.e., between
composer require and composer global require, is that
it involves not two, but ten different developers and ten unrelated projects.
Thus, it is nonsense squared.
composer global is a bad solution every time; there is
no use case where it would be appropriate. The only advantage is that if you add
the global vendor/bin directory to your PATH, you can easily run
libraries installed this way.
Summary
- Use composer require vendor/name if you want to use library classes.
- Never use composer global require vendor/name!
- Use composer create-project vendor/name for tools called only from the command line.
Note: npm uses a different philosophy
due to JavaScript's capabilities, installing each library as a “separate
project” with its own vendor (or rather node_modules)
directory. This prevents version conflicts. In the case of npm,
global installations of tools, like LESS CSS,
are very useful and convenient.
Among the top 5 monstrous quirks of PHP is certainly the inability to
determine whether a call to a native function was successful or resulted in an
error. Yes, you read that right. You call a function and you don't know
whether an error has occurred and what kind it was.
Now you might be smacking your forehead, thinking: surely I can tell by the
return value, right? Hmm…
Return Value
Native (or internal) functions usually return false in case of
failure. There are exceptions, such as json_decode
(http://php.net/manual/en/function.json-decode.php),
which returns null if the input is invalid or exceeds the nesting
limit, as mentioned in the documentation. So far so good.
This function is used for decoding JSON and its values, thus calling
json_decode('null') also returns null, but as a
correct result this time. We must therefore distinguish null as a
correct result and null as an error:
$res = json_decode($s);
if ($res === null && $s !== 'null') {
// an error occurred
}
It's silly, but thank goodness it's even possible. There are functions,
however, where you can't tell from the return value that an error has occurred.
For example, preg_grep or preg_split return a partial
result, namely an array, and you can't tell anything at all (more in Treacherous Regular
Expressions).
json_last_error & Co.
Functions that report the last error in a particular PHP extension.
Unfortunately, they are often unreliable and it is difficult to determine what
that last error actually was.
For example, json_decode('') does not reset the last error flag,
so json_last_error returns a result not for the last but for some
previous call to json_decode (see How to encode and decode JSON in
PHP?). Similarly, preg_match('invalidexpression', $s) does not
reset preg_last_error. Some errors do not have a code, so they are
not returned at all, etc.
error_get_last
A general function that returns the last error. Unfortunately, it is
extremely complicated to determine whether the error was related to the function
you called. That last error might have been generated by a completely different
function.
One option is to consider error_get_last() only when the return
value indicates an error. Unfortunately, for example, the mail()
function can generate an error even though it returns true. Or
preg_replace may not generate an error at all in case of
failure.
The second option is to reset the “last error” before calling our
function:
@trigger_error('', E_USER_NOTICE); // reset
$file = fopen($path, 'r');
if (error_get_last()['message']) {
// an error occurred
}
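On PHP 7.0 and later, the reset can be written with error_clear_last() instead of the trigger_error() trick. A sketch, assuming PHP 7.0 or later and a path that does not exist (the path is made up for illustration):

```php
$path = __DIR__ . '/no-such-dir/no-such-file.txt'; // hypothetical path, assumed not to exist

error_clear_last();         // reset: error_get_last() now returns null
$file = @fopen($path, 'r'); // @ keeps the warning off the screen; it is still recorded
$error = error_get_last();  // stays null if nothing was raised since the reset

if ($file === false && $error !== null) {
    $message = $error['message']; // text of the warning raised by fopen()
}
```

This only replaces the reset; it does not tell you which piece of code raised the error.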
The code is seemingly clear: an error can only occur during the call to
fopen(). But that's not the case. If $path is an
object, it will be converted to a string by the __toString method.
If it's the last reference to that object, its destructor will also be called.
Functions of URL wrappers may be called. Etc.
Thus, even a seemingly innocent line can execute a lot of PHP code, which may
generate other errors, the last of which will then be returned by
error_get_last().
We must therefore make sure that the error actually occurred during the call
to fopen:
@trigger_error('', E_USER_NOTICE); // reset

$file = fopen($path, 'r');

$error = error_get_last();
if ($error['message'] && $error['file'] === __FILE__ && $error['line'] === __LINE__ - 3) {
    // an error occurred
}
The magic constant 3 is the number of lines between
__LINE__ and the call to fopen. Please no comments.
In this way, we can detect an error (if the function emits one, which the
aforementioned functions for working with regular expressions usually do not),
but we are unable to suppress it, i.e., prevent it from being logged, etc. Using
the shut-up operator @ is problematic because it conceals
everything, including any further PHP code that is called in connection with our
function (see the mentioned destructors, wrappers, etc.).
Custom Error Handler
The crazy but seemingly only possible way to detect whether a certain function
raised an error, with the possibility of suppressing it, is to install a custom
error handler using set_error_handler. But it's no joke to do it right:
- we must also remove the custom handler
- we must remove it even if an exception is thrown
- we must capture only errors that occurred in the incriminated function
- and pass all others to the original handler
The result looks like this:
$prev = set_error_handler(function ($severity, $message, $file, $line) use (&$prev) {
    if ($file === __FILE__ && $line === __LINE__ + 9) { // magic constant
        throw new Exception($message);
    } elseif ($prev) { // call the previous user handler
        return $prev(...func_get_args());
    }
    return false; // call the system handler
});

try {
    $file = fopen($path, 'r'); // this is the function we care about
} finally {
    restore_error_handler();
}
You already know what the magic constant 9 is.
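The magic constant can be avoided by making the call from inside a helper, at the price of a coarser filter: the sketch below matches on the file only, not on the line. callStrict() is a hypothetical name, PHP 5.6 or later is assumed for the argument unpacking, and the helper is meant to live in a file of its own so that the __FILE__ check matches nothing but its own call.

```php
/**
 * Calls $function and turns PHP errors raised by that call into ErrorException.
 * callStrict() is a hypothetical helper name.
 */
function callStrict(callable $function, ...$args)
{
    $prev = set_error_handler(function ($severity, $message, $file, $line) use (&$prev) {
        if ($file === __FILE__) { // raised by the call made from this helper's file
            throw new ErrorException($message, 0, $severity, $file, $line);
        } elseif ($prev) {        // otherwise call the previous user handler
            return $prev(...func_get_args());
        }
        return false;             // or fall back to the system handler
    });

    try {
        return $function(...$args);
    } finally {
        restore_error_handler();  // always removed, even when an exception is thrown
    }
}

try {
    $file = callStrict('fopen', __DIR__ . '/no-such-dir/no-such-file.txt', 'r');
} catch (ErrorException $e) {
    $file = null; // $e->getMessage() holds the text of the warning
}
```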
So this is how we live in PHP.
Well-maintained software should have quality API documentation.
Certainly. However, just as the absence of documentation is a mistake, so too is
its redundancy. Writing documentation comments, much like designing an API or
user interface, requires thoughtful consideration.
By thoughtful consideration, I do not mean the process that occurred in the
developer's mind when they complemented the constructor with this comment:
class ChildrenIterator
{
/**
* Constructor.
*
* @param array $data
* @return \Zend\Ldap\Node\ChildrenIterator
*/
public function __construct(array $data)
{
$this->data = $data;
}
Six lines that add not a single piece of new information. Instead, they
contribute to:
- visual noise
- duplication of information
- increased code volume
- potential for errors
The absurdity of the mentioned comment may seem obvious, and I'm glad if it
does. Occasionally, I receive pull requests that try to sneak similar rubbish
into the code. Some programmers even use editors that automatically clutter the
code this way. Ouch.
Or consider another example. Think about whether the comment told you
anything that wasn't already clear:
class Zend_Mail_Transport_Smtp extends Zend_Mail_Transport_Abstract
{
/**
* EOL character string used by transport
* @var string
* @access public
*/
public $EOL = "\n";
Except for the @return annotation, the usefulness of this
comment can also be questioned:
class Form
{
/**
* Adds group to the form.
* @param string $caption optional caption
* @param bool $setAsCurrent set this group as current
* @return ControlGroup
*/
public function addGroup($caption = null, $setAsCurrent = true)
If you use expressive method and parameter names (which you should), and they
also have default values or type hints, this comment gives you almost nothing.
It should either be reduced to remove information duplication or expanded to
include more useful information.
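One possible reduced form, as a sketch only: it assumes PHP 7.1 or later, where the types can move into the signature, and ControlGroup and the method body are stand-ins for illustration.

```php
class ControlGroup // stand-in class for illustration only
{
}

class Form
{
    /** Adds a group to the form and optionally makes it the current one. */
    public function addGroup(?string $caption = null, bool $setAsCurrent = true): ControlGroup
    {
        return new ControlGroup; // placeholder body
    }
}

$group = (new Form)->addGroup('Personal data');
```

The parameter and return types no longer need to be repeated in the comment, so a single line remains.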
But beware of the opposite extreme, such as novels in phpDoc:
/**
* Performs operations on ACL rules
*
* The $operation parameter may be either OP_ADD or OP_REMOVE, depending on whether the
* user wants to add or remove a rule, respectively:
*
* OP_ADD specifics:
*
* A rule is added that would allow one or more Roles access to [certain $privileges
* upon] the specified Resource(s).
*
* OP_REMOVE specifics:
*
* The rule is removed only in the context of the given Roles, Resources, and privileges.
* Existing rules to which the remove operation does not apply would remain in the
* ACL.
*
* The $type parameter may be either TYPE_ALLOW or TYPE_DENY, depending on whether the
* rule is intended to allow or deny permission, respectively.
*
* The $roles and $resources parameters may be references to, or the string identifiers for,
* existing Resources/Roles, or they may be passed as arrays of these - mixing string identifiers
* and objects is ok - to indicate the Resources and Roles to which the rule applies. If either
* $roles or $resources is null, then the rule applies to all Roles or all Resources, respectively.
* Both may be null in order to work with the default rule of the ACL.
*
* The $privileges parameter may be used to further specify that the rule applies only
* to certain privileges upon the Resource(s) in question. This may be specified to be a single
* privilege with a string, and multiple privileges may be specified as an array of strings.
*
* If $assert is provided, then its assert() method must return true in order for
* the rule to apply. If $assert is provided with $roles, $resources, and $privileges all
* equal to null, then a rule having a type of:
*
* TYPE_ALLOW will imply a type of TYPE_DENY, and
*
* TYPE_DENY will imply a type of TYPE_ALLOW
*
* when the rule's assertion fails. This is because the ACL needs to provide expected
* behavior when an assertion upon the default ACL rule fails.
*
* @param string $operation
* @param string $type
* @param Zend_Acl_Role_Interface|string|array $roles
* @param Zend_Acl_Resource_Interface|string|array $resources
* @param string|array $privileges
* @param Zend_Acl_Assert_Interface $assert
* @throws Zend_Acl_Exception
* @uses Zend_Acl_Role_Registry::get()
* @uses Zend_Acl::get()
* @return Zend_Acl Provides a fluent interface
*/
public function setRule($operation, $type, $roles = null, $resources = null, $privileges = null,
Zend_Acl_Assert_Interface $assert = null)
Generated API documentation is merely a reference guide, not a book to read
before sleep. Lengthy descriptions truly do not belong here.
The most popular place for expansive documentation is file headers:
<?php
/**
* Zend Framework
*
* LICENSE
*
* This source file is subject to the new BSD license that is bundled
* with this package in the file LICENSE.txt.
* It is also available through the world-wide-web at this URL:
* http://framework.zend.com/license/new-bsd
* If you did not receive a copy of the license and are unable to
* obtain it through the world-wide-web, please send an email
* to license@zend.com so we can send you a copy immediately.
*
* @category Zend
* @package Zend_Db
* @subpackage Adapter
* @copyright Copyright (c) 2005-2012 Zend Technologies USA Inc. (http://www.zend.com)
* @license http://framework.zend.com/license/new-bsd New BSD License
* @version $Id: Abstract.php 25229 2013-01-18 08:17:21Z frosch $
*/
Sometimes it seems the intention is to stretch the header so long that upon
opening the file, the code itself is not visible. What's the use of a 10-line
notice about the New BSD license, which contains key announcements like its
availability in the LICENSE.txt file, accessible via the
world-wide-web, and if you happen to lack modern innovations like a so-called
web browser, you should send an email to license@zend.com, and they
will send it to you immediately? Furthermore, it's redundantly repeated
4,400 times. I tried sending a request, but the response did not
come 🙂
Also, including the copyright year in copyrights leads to a passion for
making commits like update copyright year to 2014, which changes all
files, complicating version comparison.
Is it really necessary to include copyright in every file? From a legal
perspective, it is not required, but if open source licenses allow users to use
parts of the code while retaining copyrights, it is appropriate to include them.
It's also useful to state in each file which product it originates from,
helping people navigate when they encounter it individually. A good
example is:
/**
* Zend Framework (http://framework.zend.com/)
*
* @link http://github.com/zendframework/zf2 for the canonical source repository
* @copyright Copyright (c) 2005-2014 Zend Technologies USA Inc. (http://www.zend.com)
* @license http://framework.zend.com/license/new-bsd New BSD License
*/
Please think carefully about each line and whether it truly benefits the
user. If not, it's rubbish that doesn't belong in the code.
(Please, commentators, do not perceive this article as a battle of
frameworks; it definitely is not.)
Let's create a simple OOP wrapper for encoding and decoding JSON
in PHP:
class Json
{
public static function encode($value)
{
$json = json_encode($value);
if (json_last_error()) {
throw new JsonException;
}
return $json;
}
public static function decode($json)
{
$value = json_decode($json);
if (json_last_error()) {
throw new JsonException;
}
return $value;
}
}
class JsonException extends Exception
{
}
// usage:
$json = Json::encode($arg);
Simple.
But it is very naive. PHP has a ton of bugs (sometimes labelled
“not-a-bug”) that need workarounds.
json_encode() is (nearly) the only function in all of PHP whose behavior is
affected by the display_errors directive. Yes, JSON encoding is affected by
a display directive. If you want to detect the Invalid UTF-8 sequence error,
you must disable this directive. (#52397, #54109, #63004, not fixed.)
json_last_error() returns the last error (if any) that occurred during the
last JSON encoding/decoding. Sometimes! In case of the Recursion detected error
it returns 0. You must install your own error handler to catch this error.
(Fixed after years in PHP 5.5.0.)
json_last_error() sometimes doesn't return the last error, but
the last-but-one error. E.g., json_decode('') with an empty string
doesn't clear the last error flag, so you cannot rely on the error code. (Fixed in
PHP 5.3.7.)
json_decode() returns null if the JSON cannot be decoded or if the encoded
data is deeper than the recursion limit. OK, but json_decode('null') returns
null too. So we have the same return value for success and failure. Great!
json_decode() is unable to detect an Invalid UTF-8 sequence in PHP < 5.3.3
or when the PECL implementation is used. You must check it your own way.
json_last_error() exists since PHP 5.3.0, so the minimal required
version for our wrapper is PHP 5.3.
json_last_error() returns only a numeric code. If you'd like to
throw an exception, you must create your own table of messages
(json_last_error_msg() was added in PHP 5.5.0).
So the simple class wrapper for encoding and decoding JSON now looks
like this:
class Json
{
private static $messages = array(
JSON_ERROR_DEPTH => 'The maximum stack depth has been exceeded',
JSON_ERROR_STATE_MISMATCH => 'Syntax error, malformed JSON',
JSON_ERROR_CTRL_CHAR => 'Unexpected control character found',
JSON_ERROR_SYNTAX => 'Syntax error, malformed JSON',
5 /*JSON_ERROR_UTF8*/ => 'Invalid UTF-8 sequence',
6 /*JSON_ERROR_RECURSION*/ => 'Recursion detected',
7 /*JSON_ERROR_INF_OR_NAN*/ => 'Inf and NaN cannot be JSON encoded',
8 /*JSON_ERROR_UNSUPPORTED_TYPE*/ => 'Type is not supported',
);
public static function encode($value)
{
// needed to receive 'Invalid UTF-8 sequence' error; PHP bugs #52397, #54109, #63004
if (function_exists('ini_set')) { // ini_set is disabled on some hosts :-(
$old = ini_set('display_errors', 0);
}
// needed to receive 'recursion detected' error
set_error_handler(function($severity, $message) {
restore_error_handler();
throw new JsonException($message);
});
$json = json_encode($value);
restore_error_handler();
if (isset($old)) {
ini_set('display_errors', $old);
}
if ($error = json_last_error()) {
$message = isset(static::$messages[$error]) ? static::$messages[$error] : 'Unknown error';
throw new JsonException($message, $error);
}
return $json;
}
public static function decode($json)
{
if (!preg_match('##u', $json)) { // workaround for PHP < 5.3.3 & PECL JSON-C
throw new JsonException('Invalid UTF-8 sequence', 5);
}
$value = json_decode($json);
if ($value === null
&& $json !== '' // it doesn't clean json_last_error flag until 5.3.7
&& $json !== 'null' // in this case null is not failure
) {
$error = json_last_error();
$message = isset(static::$messages[$error]) ? static::$messages[$error] : 'Unknown error';
throw new JsonException($message, $error);
}
return $value;
}
}
This implementation is used in Nette
Framework. It also contains a workaround for another bug, the JSON bug. In fact, JSON is not a subset of
JavaScript due to the characters \u2028 and \u2029. They must
not be used in JavaScript and must be encoded too.
(In PHP, detection of errors in JSON encoding/decoding is hell, but it is
nothing compared to detection of errors in PCRE
functions.)
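For comparison, on PHP 7.3 and later most of these workarounds can be delegated to the JSON_THROW_ON_ERROR flag. This is a sketch for that newer PHP only, not the implementation described above. StrictJson is a hypothetical class name, and JsonException here is the class built into PHP 7.3+, which would clash with a user-defined class of the same name in the global namespace.

```php
class StrictJson // hypothetical name, chosen to avoid clashing with the Json class
{
    public static function encode($value): string
    {
        // throws the built-in JsonException instead of returning false
        return json_encode($value, JSON_THROW_ON_ERROR);
    }

    public static function decode(string $json)
    {
        // 512 is the default nesting depth
        return json_decode($json, false, 512, JSON_THROW_ON_ERROR);
    }
}

$value = StrictJson::decode('null'); // null, and this time clearly not an error

try {
    StrictJson::decode('{');
    $failed = false;
} catch (JsonException $e) {
    $failed = true; // malformed input is reported as an exception
}
```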
How not to get burned when replacing occurrences of one string
with another. Search & Replace tricks.
The basic function for replacing strings in PHP is str_replace:
$s = "Lorem ipsum";
echo str_replace('ore', 'ide', $s); // returns "Lidem ipsum"
Thanks to the cleverly designed UTF-8 encoding, it can be reliably used even for
strings encoded this way. Additionally, the first two arguments can be arrays,
and the function will then perform multiple replacements. Here we encounter the
first trick to be aware of. Each replacement goes through the string
again, so if we wanted to swap dá ⇔ pá in
the phrase pánské dárky to get dánské párky (a
Swedish delicacy!), no order of arguments will achieve this:
// returns "dánské dárky"
echo str_replace(array('dá', 'pá'), array('pá', 'dá'), "pánské dárky");
// returns "pánské párky"
echo str_replace(array('pá', 'dá'), array('dá', 'pá'), "pánské dárky");
The sought-after function that goes through the string just once and prevents
collisions is strtr:
// returns "dánské párky", hooray
echo strtr("pánské dárky", array('pá' => 'dá', 'dá' => 'pá'));
If we need to find occurrences according to more complex rules, we use
regular expressions and the function preg_replace. It also allows for
multiple replacements and behaves similarly to str_replace. Now,
however, I am heading elsewhere. I need to replace all numbers in the string
with the word hafo, which is easy:
$s = "Radek says he has an IQ of 151. Quite the collector's item!";
echo preg_replace('#\d+#', 'hafo', $s);
Let's generalize the code so it can replace numbers with anything we pass in
the variable $replacement. Many programmers will use:
return preg_replace('#\d+#', $replacement, $s); // wrong!
Unfortunately, that's not right. It's important to realize that certain
characters have special meanings in the replacement string (specifically the backslash
and dollar), so we must “escape” them (see the escaping definitive guide). The
correct general solution is:
return preg_replace('#\d+#', addcslashes($replacement, '$\\'), $s); // ok
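The difference shows up as soon as the replacement contains something that looks like a backreference. A small sketch; replaceNumbers() is a hypothetical wrapper around the call above, and the input strings are made up:

```php
/** replaceNumbers() is a hypothetical helper wrapping the escaped call. */
function replaceNumbers(string $s, string $replacement): string
{
    return preg_replace('#\d+#', addcslashes($replacement, '$\\'), $s);
}

$s = 'IQ of 151';
$naive = preg_replace('#\d+#', '$0 points', $s); // 'IQ of 151 points': $0 was read as a backreference
$safe = replaceNumbers($s, '$0 points');         // 'IQ of $0 points': $0 was inserted literally
```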
Do any other replacement tricks come to mind?
Here are some well-intentioned tips on how to design the
structure of namespaces and class names.
Namespaces are probably the best-known new feature of PHP version 5.3. Their
main purpose is to prevent name conflicts and to allow shortening (aliasing) of
class names for use within a single file. In practice, it has been shown that
conflicts can also be avoided by using a 1–2 letter prefix, just as I have
never used class names like
Zend_Service_DeveloperGarden_Response_ConferenceCall_AddConferenceTemplateParticipantResponseType
(97 characters, I wonder how they adhere to their maximum line length rule of
80 characters 🙂). However, PHP follows in the footsteps of Java, and so we
have namespaces. How should we handle them?
Benefits of Namespaces
Perhaps the most complex question you need to answer is: what is the benefit
of renaming a class:
sfForm → Symfony\Component\Form\Form
This question is a proven starter for endless flame wars. From the
perspective of programmer comfort, intuitiveness, and memorability, the original
concise and descriptive sfForm is more appropriate. It corresponds
to how programmers colloquially refer to it, i.e., “form in Symfony”. The
new and longer name is correct from other aspects, which I am not sure the
average user will appreciate.
How to Layout Namespaces?
The syntax of namespaces is described in the documentation, but finding the
right patterns requires practice, which there hasn’t been enough time for yet.
Namespaces in PHP have their specifics due to a number of factors, so it is not
ideal to copy the conventions used in Java or .NET exactly. However, they can
be a good starting point. More will be discussed in the individual naming rules.
1) A class should have a descriptive name even without mentioning the namespace
The name of each class, even without the namespace, must capture its essence.
It would be inappropriate to rename the class ArrayIterator →
Spl\Iterators\Array, as one would not expect an iterator under the name Array
(ignoring the fact that a class cannot be named after a keyword). And beware,
even from the name Spl\Iterators\Array it is not clear that it is an iterator,
because you cannot assume that the namespace Spl\Iterators contains only
iterators. Here are a few examples:
- unsuitable: Nette\Application\Responses\Download – it is not obvious that
Download is a response
- unsuitable: Zend\Validator\Date – you would expect Date to be a date, not
a validator
- unsuitable: Zend\Controller\Request\Http – you would not expect Http to be
a request
Therefore, in addition to specializing classes, it is appropriate to keep a
level of generality in the name:
- better: Nette\Application\Responses\DownloadResponse
- better: Zend\Validator\DateValidator
- better: Zend\Controller\Request\HttpRequest
The ideal is a one-word yet descriptive name. This is most feasible for
classes that represent something from the real world:
- best: Nette\Forms\Controls\Button – the two-word ButtonControl is not
necessary (however, HiddenControl cannot be shortened to Hidden)
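The rule matters in practice because, after a use import, only the short class name appears in the code. The following is a minimal sketch; the Acme namespaces and both classes are hypothetical stand-ins, not the real Zend or Nette classes:

```php
<?php
// All names below are hypothetical and only illustrate the naming rule.
namespace Acme\Validator;

class Date // unsuitable: the short name reads like a date value
{
}

class DateValidator // better: the short name says what the class does
{
    public function isValid(string $value): bool
    {
        return strtotime($value) !== false;
    }
}

namespace Acme\Demo;

use Acme\Validator\Date;
use Acme\Validator\DateValidator;

$a = new Date;          // a date, or something that validates dates?
$b = new DateValidator; // clear even without seeing the namespace

var_dump($b->isValid('2024-02-29')); // bool(true)
```

Several namespaces are declared in one file here only to keep the sketch self-contained; in a real project each class would normally live in its own file.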
2) The namespace should have a descriptive name
Naturally, the name of the namespace itself must be descriptive, and it is
advantageous to have a shorter name without redundancies. To me, one such
redundancy is Component in Symfony\Component\Routing, because the name would
not suffer without it.
In some situations, you need to decide between singular and plural (e.g.,
Zend\Validator vs Zend\Validators), which is as undecided an issue as choosing
between singular and plural names for database tables.
3) Distinguish between namespaces and classes
Naming a class the same as a namespace (i.e., having classes Nette\Application
and Nette\Application\Request) is technically possible, but it might confuse
programmers and it is better to avoid it. Also, consider how well the resulting
code will read or how you would explain the API to someone.
4) Limit unnecessary duplications (+ partial namespace)
Ideally, the name of the class and the name of the namespace should not contain
the same information redundantly.
- instead of Nette\Http\HttpRequest prefer Nette\Http\Request
- instead of
Symfony\Component\Security\Authentication\AuthenticationTrustResolver prefer
the class TrustResolver
The class Nette\Http\Request does not violate rule No. 1 about the descriptive
name of the class even without mentioning the namespace; on the contrary, it
allows us to elegantly use the partial namespace:
use Nette\Http; // alias for namespace
// all classes via Http are available:
$request = new Http\Request;
$response = new Http\Response;
// and additionally, Http\Response is more understandable than just Response
If we understand namespaces as packages, which is common, it leads to
unfortunate duplication of the last word:
Zend\Form\Form
Symfony\Component\Finder\Finder
Nette\Application\Application
Namespaces also literally encourage grouping classes (e.g., various
implementations of the same interface, etc.) into their own namespaces, which
again creates duplications:
Nette\Caching\Storages\FileStorage – i.e., all storages in a separate
namespace Storages
Zend\Form\Exception\BadMethodCallException – all exceptions in Exception
Symfony\Component\Validator\Exception\BadMethodCallException – again all
exceptions in Exception
Grouping namespaces lengthen the name and duplicate information in it, because
it is often impossible to drop the general term from the class name (rule 1).
Their advantage may be better orientation in the generated API documentation
(although this could be achieved differently) and easier access when using
full-fledged IDEs with code completion. However, I recommend using them
cautiously. For exceptions, for example, they are not very suitable.
5) Unmistakable classes from multiple namespaces
According to point 1), a class should have a descriptive name, but that does
not mean it has to be unique within the entire application. Usually, it is
enough that it is unique within the namespace. However, if two classes from
different namespaces are often used next to each other in the code, or if they
have some other significant connection, they should not have the same name. In
other words, it should not be necessary to use the AS keyword in the USE clause.
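To illustrate what the rule tries to avoid, here is a minimal sketch with two hypothetical classes that share a short name. As soon as both are needed in one file, an alias is unavoidable:

```php
<?php
// Hypothetical namespaces and classes, used only to illustrate rule 5.
namespace Acme\Http;

class Request
{
}

namespace Acme\Application;

class Request
{
}

namespace Acme\Demo;

use Acme\Http\Request;
use Acme\Application\Request as AppRequest; // alias forced by the name clash

$http = new Request;
$app = new AppRequest;

echo get_class($http), "\n"; // Acme\Http\Request
echo get_class($app), "\n";  // Acme\Application\Request
```

If these two classes were routinely used side by side, giving one of them a distinct short name would remove the need for the alias, and the code would read the same in every file.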
6) One-way dependencies
Consider what dependencies should exist between classes from different
namespaces. I try to maintain:
- if a class from the namespace A\B has a dependency on a class from the
namespace A\C, no class from A\C should have a dependency on A\B
- classes from the namespace A\B should not have dependencies on a class from
the space A\B\C (take this with a grain of salt)
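The first rule can be sketched as follows. The namespaces Acme\Http and Acme\Routing and their classes are hypothetical; the point is only the direction of the dependency: Routing knows about Http, while nothing in Http refers back to Routing.

```php
<?php
// Hypothetical layout: Acme\Routing depends on Acme\Http, never the reverse.
namespace Acme\Http;

class Request // has no knowledge of Acme\Routing
{
    private $path;

    public function __construct(string $path)
    {
        $this->path = $path;
    }

    public function getPath(): string
    {
        return $this->path;
    }
}

namespace Acme\Routing;

use Acme\Http\Request; // the only cross-namespace dependency

class Router
{
    public function resolve(Request $request): string
    {
        return $request->getPath() === '/' ? 'Homepage' : 'NotFound';
    }
}

namespace Acme\Demo;

use Acme\Http\Request;
use Acme\Routing\Router;

echo (new Router)->resolve(new Request('/')), "\n"; // Homepage
```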
p.s.: Please do not take this article as dogma; it is just a snapshot of my
current thoughts.