Here are some well-intentioned tips on how to design the
structure of namespaces and class names.
Namespaces are probably the best-known new feature of PHP version 5.3. Their
main purpose is to prevent name conflicts and to allow shortening (aliasing) of
class names for use within a single file. In practice, it has been shown that
conflicts can also be avoided by using a 1–2 letter prefix, and likewise I have
never needed class names like
Zend_Service_DeveloperGarden_Response_ConferenceCall_AddConferenceTemplateParticipantResponseType
(97 characters, I wonder how they adhere to their maximum line length rule of
80 characters 🙂 ). However, PHP follows in the footsteps of Java, and so we
have namespaces. How should we handle them?
Benefits of Namespaces
Perhaps the most complex question you need to answer is: what is the benefit
of renaming a class:
sfForm → Symfony\Component\Form\Form
This question is a proven starter for endless flame wars. From the
programmer's comfort, intuitiveness, and memorability perspective, the original
concise and descriptive sfForm is more appropriate. It corresponds
to how programmers colloquially refer to it, i.e., “form in Symfony”. The
new, longer name is correct in other respects, though I am not sure the
average user will appreciate them.
How to Lay Out Namespaces?
The syntactic aspect of using namespaces is described
in the documentation, but finding the right patterns requires practice,
which there hasn’t been enough time for yet. Namespaces in PHP have their own
specifics due to a number of factors, so it is not ideal to copy the conventions
used in Java or .NET exactly. However, they can be a good starting point.
More will be discussed in the individual naming rules.
1) A class should have a descriptive name even without the namespace
The name of each class, even without the namespace, must capture its essence.
It would be inappropriate to rename the class ArrayIterator →
Spl\Iterators\Array, as one would not expect an iterator under the
name Array (ignoring the fact that a class cannot be named a
keyword). And beware, even from the name Spl\Iterators\Array, it is
not clear that it is an iterator, because you cannot assume that the namespace
Spl\Iterators only contains iterators. Here are a few examples:
unsuitable: Nette\Application\Responses\Download – it is not
obvious that Download is a response
unsuitable: Zend\Validator\Date – you would expect
Date to be a date, not a validator
unsuitable: Zend\Controller\Request\Http – you would not expect
Http to be a request
Therefore, in addition to specializing classes, it is appropriate to keep a
level of generality in the name:
The ideal is a one-word yet descriptive name. This is especially achievable
for classes that represent something from the real world:
best: Nette\Forms\Controls\Button – the two-word
ButtonControl is not necessary (however, HiddenControl
cannot be shortened to Hidden)
2) The namespace should have a descriptive name
Naturally, the name of the namespace itself must be descriptive, and it is
advantageous to have a shorter name without redundancies. Such a redundancy to
me seems like Component in Symfony\Component\Routing,
because the name would not suffer without it.
In some situations, you need to decide between singular and plural (e.g.,
Zend\Validator vs Zend\Validators), which is a
similarly undecided issue as choosing between singular and plural for
database table names.
3) Distinguish between namespaces and classes
Naming a class the same as a namespace (i.e., having classes
Nette\Application and Nette\Application\Request) is
technically possible, but it might confuse programmers and it is better to avoid
it. Also, consider how well the resulting code will read or how you would
explain the API to someone.
Ideally, the class name and the namespace name should not contain the same
information redundantly.
instead of Nette\Http\HttpRequest prefer
Nette\Http\Request
instead of
Symfony\Component\Security\Authentication\AuthenticationTrustResolver
prefer the class TrustResolver
The class Nette\Http\Request does not violate rule No. 1 about
the descriptive name of the class even without mentioning the namespace, on the
contrary, it allows us to elegantly use the partial namespace:
use Nette\Http; // alias for namespace
// all classes via Http are available:
$request = new Http\Request;
$response = new Http\Response;
// and additionally, Http\Response is more understandable than just Response
4) Beware of duplicating the last word
If we understand namespaces as packages, which is common, it leads to an
unfortunate duplication of the last word:
Zend\Form\Form
Symfony\Component\Finder\Finder
Nette\Application\Application
Namespaces also literally encourage grouping classes (e.g., various
implementations of the same interface, etc.) into their own spaces, which again
creates duplications:
Nette\Caching\Storages\FileStorage – i.e., all storages in a
separate namespace Storages
Zend\Form\Exception\BadMethodCallException – all exceptions
in Exception
Symfony\Component\Validator\Exception\BadMethodCallException –
again all exceptions in Exception
Grouping namespaces lengthens the name and creates duplication in it, because
it is often impossible to remove the generality from the class name (rule 1).
Their advantage may be better orientation in the generated API documentation
(although this could be achieved differently) and easier access when using
full-fledged IDEs with code completion. However, I recommend using them
cautiously; for exceptions, for example, they are not very suitable.
5) Unmistakable classes from multiple namespaces
According to point 1), a class should have a descriptive name, but that does
not mean it has to be unique within the entire application. Usually, it is
enough that it is unique within its namespace. However, if two classes from
different namespaces are often used side by side in the code, or if they have
some other significant connection, they should not share the same name. In other
words, it should not be necessary to use an alias (as) in the use statement.
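For illustration, here is the kind of aliasing the rule tries to prevent. The App\Mail and App\Forum names are made up, and class_alias stands in for real class definitions so the sketch is self-contained:

```php
use App\Mail\Message;
use App\Forum\Message as ForumMessage; // rule 5: design so this alias is not needed

// stand-in classes registered under the colliding names (purely illustrative)
class_alias(ArrayObject::class, 'App\Mail\Message');
class_alias(SplStack::class, 'App\Forum\Message');

$mail = new Message;       // App\Mail\Message
$post = new ForumMessage;  // App\Forum\Message
```

If the two Message classes rarely meet in the same file, the collision costs nothing; it only hurts when they are used side by side.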
6) One-way dependencies
Consider what dependencies should exist between classes from different
namespaces. I try to maintain:
if a class from the namespace A\B has a dependency on a class from the
namespace A\C, no class from A\C should have a dependency on A\B
classes from the namespace A\B should not have dependencies on classes from
the namespace A\B\C (take this with a grain of salt)
p.s.: Please do not take this article as dogma, it is just a capture of
current thoughts
A question that many webmasters ask: do search engines perceive these URLs
as the same? How should they be treated?
http://example.com/article
http://example.com/article/
http://example.com/Article
https://example.com/article
http://www.example.com/article
http://example.com/article?a=1&b=2
http://example.com/article?b=2&a=1
The short answer would be: “URLs are different.” However, a more detailed
analysis is needed.
From a user's perspective, these addresses differ only in minor details
which they generally disregard. Thus, they perceive them as the same, although
technically, they are different addresses. Let's call them similar
addresses. For the sake of “user experience”, two principles should be
adhered to:
Do not allow different content on similar addresses. As I will show
soon, this would not only confuse users but also search engines.
Allow users access through similar addresses.
If the addresses differ in protocol http / https or
with www domain or without, search engines consider them different.
Not so for users. It would be a fatal mistake to place different content on such
similar addresses. However, it would also be a mistake to prevent access through
a similar address. Both the address with www and the one
without it must work; SEO practice recommends sticking to one variant and
redirecting the other to it with an HTTP 301 code. For the
www subdomain, this can be managed with a
.htaccess file:
# EITHER: redirect to the non-www variant
RewriteCond %{HTTP_HOST} ^www\.
RewriteRule ^.*$ http://example.com/$0 [R=301,NE,L]

# OR: redirect to the www variant
RewriteCond %{HTTP_HOST} !^www\.
RewriteRule ^.*$ http://www.example.com/$0 [R=301,NE,L]
Immediately test whether your servers redirect, including the full address
and correct parameter passing.
Don't forget variants like www.subdomain.example.cz. Because some
browsers silently work around missing redirects, test with a low-level tool like Web-Sniffer.
URLs are case-sensitive except for the scheme and domain. However, users do
not differentiate and therefore, it is unfortunate to offer different content on
addresses differing only by letter case. A poor example of this can be seen on
Wikipedia.
Bing amusingly suffers from an error, returning the same URL whether you
search for acid or a database (although the textual description is correct).
Google and Yahoo do not have this issue.
Bing does not differentiate between acid and database
Some services (webmails, ICQ) convert uppercase letters in URLs to lowercase,
which is another reason to avoid case distinctions, even in parameters.
Better to adhere to the convention that all letters in URLs are
lowercase.
Distinguishing some similar addresses is also a challenge for search
engines. I conducted an experiment by placing different content on URLs
differing in details like the presence of a trailing slash or parameter order.
Only Google was able to index them as different. Other search engines could
always handle only one of the variants.
Only Google can index these pages as different
As for trailing slashes, the web server usually redirects to the canonical
form for you; if you access a directory without a trailing slash, it adds one
and redirects. Of course, this does not apply when you manage URIs on your own
(Cool URIs, etc.)
Finally: does the order of parameters really matter? There should be no
difference between article?a=1&b=2 and
article?b=2&a=1. However, there are situations where this is
not the case, especially when passing complex structures such as arrays. For
instance, ?sort[]=name&sort[]=city might be different from
?sort[]=city&sort[]=name. Nevertheless, redirecting if
parameters are not in the specified order would be considered unnecessary
overcorrection.
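Whether two query strings are “similar” in this sense can be checked by normalizing them; a minimal sketch (the function name canonicalQuery is made up):

```php
// normalize a query string: parse it, sort the top-level keys, rebuild it
function canonicalQuery(string $query): string
{
    parse_str($query, $params);
    ksort($params);
    return http_build_query($params);
}

// a=1&b=2 and b=2&a=1 normalize to the same form
var_dump(canonicalQuery('b=2&a=1') === canonicalQuery('a=1&b=2')); // bool(true)

// but array parameters keep their internal order, as described above
var_dump(canonicalQuery('sort[]=name&sort[]=city')
    === canonicalQuery('sort[]=city&sort[]=name')); // bool(false)
```

Such a normalized form is useful for comparing URLs, even if redirecting to it would be the overcorrection mentioned above.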
p.s. Nette Framework automatically
handles redirection to canonical URLs on its own.
In PHP, there are three libraries available for regular expressions: PCRE, Oniguruma, and POSIX Regex. The second one
may not always be available, and the third is deprecated, so you should
exclusively use the more adept and faster PCRE library. Unfortunately, its
implementation suffers from quite unpleasant flaws across all PHP versions.
The operation of the preg_* functions can be divided into
two steps:
compilation of the regular expression
execution (searching, replacing, filtering, …)
It is advantageous that PHP maintains a cached version of compiled regular
expressions, meaning they are only compiled once. Therefore, it is appropriate
to use static regular expressions, i.e., not to generate them
parametrically.
Now for the unpleasant issues. If an error is discovered during compilation,
PHP will issue an E_WARNING error, but the return value of the
function is inconsistent:
It is good to know that the functions returning an array $matches by
reference (i.e., preg_match and preg_match_all) do not
nullify the argument upon a compilation error, so you cannot rely on inspecting
$matches; testing the return value is the valid approach.
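This behaviour can be verified directly; a minimal sketch (the unbalanced parenthesis is a deliberate compilation error, and @ only suppresses the E_WARNING for the demonstration):

```php
$matches = ['previous' => 'value'];

// invalid pattern: unbalanced parenthesis → compilation error
$result = @preg_match('~(~', 'subject', $matches);

var_dump($result);  // bool(false) – the only reliable indicator
// as the article notes, $matches is not nullified on a compilation error
```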
Since version 5.2.0, PHP has the function preg_last_error, which returns the
code of the last error. Beware, however: this only applies to errors that occur
during execution! If an error occurs during compilation, the value of
preg_last_error is not reset and it returns the previous value. So if the
return value of a preg_* function is null or
false (see above), definitely do not rely on what preg_last_error returns.
What kind of errors can occur during execution? The most common case is
exceeding pcre.backtrack_limit or invalid UTF-8 input when using
the u modifier. (Note: invalid UTF-8 in the regular expression
itself is detected during compilation.) However, the way PHP handles such an
error is utterly inadequate:
it generates no message (silent error)
the return value of the function may indicate that everything is fine
the error can only be detected by calling
preg_last_error later
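A minimal sketch of such a silent execution error, using invalid UTF-8 input with the u modifier:

```php
// "\xC3\x28" is not a valid UTF-8 sequence
$result = preg_match('~.~u', "\xC3\x28");

var_dump($result);                                    // bool(false), yet no message is emitted
var_dump(preg_last_error() === PREG_BAD_UTF8_ERROR);  // bool(true)
```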
Let's talk about the return value, which is probably the biggest betrayal.
The process is executed until an error occurs, then it returns a partially
processed result. And this is done completely silently. However, even this is
not always the case, for example, the trio of functions
preg_filter, preg_replace_callback,
preg_replace can return null even during execution
errors.
Whether an error occurred during execution can only be determined by calling
preg_last_error. But as you know, this function returns a
nonsensical result if, on the contrary, a compilation error occurred, so we must
distinguish both situations by considering the return value of the function,
whether it is null or false. And since functions that
return null during a compilation error can also return
null during an execution error, it can be stated only that PHP is
undoubtedly a messed-up language.
What would safe use of PCRE functions look like? For example, like this:
function safeReplaceCallback($pattern, $callback, $subject)
{
    // we must verify the callback ourselves
    if (!is_callable($callback)) {
        throw new Exception('Invalid callback.');
    }

    // test the expression on an empty string
    if (preg_match($pattern, '') === false) { // compilation error?
        $error = error_get_last();
        throw new Exception($error['message']);
    }

    // call PCRE
    $result = preg_replace_callback($pattern, $callback, $subject);

    // execution error?
    if ($result === null && preg_last_error()) {
        throw new Exception('Error processing regular expression.', preg_last_error());
    }

    return $result;
}
The provided code transforms errors into exceptions but does not attempt to
suppress warning outputs.
Safe processing of regular expressions is implemented in the class Nette\Utils\Strings.
Every now and then, a security vulnerability is reported on
another significant website (Alza, Mapy.cz, BontonLand) or is exploited.
Try searching for XSS
vulnerability to understand why Cross Site Scripting (XSS) is currently one
of the most widespread and dangerous vulnerabilities.
This is a distressing issue for website operators and perhaps even more so
for suppliers. It can damage reputations, lead to fines, lawsuits, or
simply spoil relationships with clients. How to defend against XSS? By so-called
string escaping. Unfortunately,
most experts are not well-versed in this area. (I don’t mean to be tactless
or offend anyone, but of the “Czechoslovak IT celebrities,” I only know one
person who deeply understands this issue.) Thus, even articles on this topic on
well-known websites are, let’s say, inaccurate.
Moreover, this escaping is usually done in the template, falling on the
coder’s shoulders. Thus, the most critical area requiring high expertise is
handled by someone unqualified. How can this end? We know all too well – see
the first paragraph.
Nette Framework Will Save You
I would like to introduce you to a killer feature of the Latte templating
system in the Nette Framework. It's such a
fundamental feature that it alone is a reason to choose this framework. Or at
least to use its templates.
the bigger your company, the more crucial this feature is
no competing framework has it to date 1)
The Nette Framework automatically escapes in templates. Its
Context-aware escaping feature recognizes which part of the document
you are in and chooses the appropriate escaping method accordingly.
Let's dive into more technical details. You can see how it works best with
an example. Consider a variable $var and this template:
The notation {$var} means printing the variable. However, each
print must be explicitly secured, even differently at each location. A coder
must (for example, in Smarty) add the appropriate modifiers, must not make a
mistake, and especially not omit anything.
In the Nette Framework, nothing needs to be manually secured. Everything
is done automatically, correctly, and consistently!
If we assign $var = 'Width 1/2"' to the variable, the framework
generates the HTML code:
Of course, situations where you need to print a variable without escaping it
are also considered, for example, because it contains article text including
HTML tags. In such cases, you use the notation {$var|noescape}.
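Outside of a templating system, the difference between contexts can be illustrated with plain PHP functions; this is only a sketch of what context-aware escaping must do for each spot, not Latte's actual implementation:

```php
$var = 'Width 1/2"';

// HTML context: the quote must become an entity
echo htmlspecialchars($var, ENT_QUOTES);  // Width 1/2&quot;

// JavaScript string context: entirely different rules apply
echo json_encode($var);                   // "Width 1\/2\""
```

Latte's contribution is that it picks the right transformation per location automatically, so none of these calls appear in the template.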
End of the technical digression. Thanks to Latte, it suddenly means that
the template remains simple
you don’t have to worry that a coder will overlook something
and at the same time, you don’t need to have a top expert on
escaping ;)
the work is much easier
You can find more information about Latte’s smart templates in the documentation.
1) About half a year after Nette, Google introduced a similar
feature for its library in C++, and as far as I know, no framework in PHP,
Ruby, or Python has anything similar yet.
One of the evergreen topics in programming is the confusion and
misunderstandings around escaping. Ignorance causes the simplest methods of
compromising websites, such as Cross Site Scripting (XSS) or SQL injection, to
remain unfortunately widespread.
Escaping is the substitution of characters that have a special meaning in
a given context with other corresponding sequences.
Example: To write quotes within a string enclosed by quotes, you need to
replace them because quotes have a special meaning in the context of a string,
and writing them plainly would be interpreted as ending the string. The specific
substitution rules are determined by the context.
Prerequisites
Each escaping function assumes that the input is always a “raw
string” (unmodified) in a certain encoding (character set).
Storing strings already escaped for HTML output in the database and similar
is entirely counterproductive.
What contexts do we encounter?
As mentioned, escaping converts characters that have a special meaning in a
certain context. Different escaping functions are used for each context. This
table is only indicative, and it is necessary to read the
notes below.
many contexts have their subcontexts where escaping differs. Unless
otherwise stated, the specified escaping function is applicable universally
without further differentiation of subcontexts.
the term usual character set refers to a character set with 1-byte or UTF-8 encoding.
HTML
In the HTML context, the characters < & " ' have a special
meaning, and the corresponding escape sequences are
&lt; &amp; &quot; &#039;. However, the exception is
an HTML comment, where only the pair -- has special meaning.
For escaping, use:
$s = htmlspecialchars($s, ENT_QUOTES);
It works with any usual character set. However, it does not consider the
subcontext of HTML comments (i.e., it cannot replace the pair --
with something else).
Reverse function:
$s = html_entity_decode($s, ENT_QUOTES, 'UTF-8');
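The two functions are exact inverses for a raw string, which can be checked with a quick round trip:

```php
$s = '<a href="#">R&D</a>';

$escaped = htmlspecialchars($s, ENT_QUOTES);
// &lt;a href=&quot;#&quot;&gt;R&amp;D&lt;/a&gt;

// decoding restores the original raw string
var_dump(html_entity_decode($escaped, ENT_QUOTES, 'UTF-8') === $s); // bool(true)
```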
XML / XHTML
XML 1.0 differs from HTML in that it prohibits the use of C0 control
characters (including writing in the form of an entity) except for the
tabulator, line feed, and space. XML 1.1 allows these banned characters, except
NUL, in the form of entities, and further mandates C1 control
characters, except NEL, also to be written as entities.
Additionally, in XML, the sequence ]]> has a special meaning, so
one of these characters must also be escaped.
Regular Expressions
In Perl-compatible regular expressions, the characters
. \ + * ? [ ^ ] $ ( ) { } = ! < > | : - and the so-called
delimiter, i.e., the character delimiting the regular expression (e.g., for the
expression '#[a-z]+#i' it is #), have special
meaning. They are escaped with the character \.
$s = preg_quote($s, $delimiter);
In the string replacing the searched expression (e.g., the 2nd parameter of
the preg_replace function), the backslash and dollar sign have
special meaning:
$s = addcslashes($replacement, '$\\');
The encoding must be either 1-byte or UTF-8, depending on the modifier in the
regular expression.
PHP Strings
PHP distinguishes these types of strings:
in single quotes, where the characters \ ' can have
special meaning
in double quotes, where the characters \ " $ can have
special meaning
NOWDOC, where no character has special meaning
HEREDOC, where the characters \ $ can have special meaning
Escaping is done with the character \. This is usually done by the
programmer when writing code; for PHP code generators, you can use the var_export function.
Note: because the mentioned regular expressions are usually written within
PHP strings, both types of escaping need to be combined. E.g., the character
\ for a regular expression is written as \\ and in a
quoted string it needs to be written as \\\\.
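A minimal sketch of both escaping layers used programmatically; the sample strings are made up:

```php
// escape user input for use inside a pattern with delimiter ~
$input = '1+1';
$pattern = '~' . preg_quote($input, '~') . '~';   // ~1\+1~

// escape a replacement so that $ and \ are taken literally
$replacement = addcslashes('price: $100', '$\\');

echo preg_replace($pattern, $replacement, 'total: 1+1');
// total: price: $100
```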
SQL and Databases
Each database has its own escaping function, see the table above. Almost
always, however, only a function for escaping strings is available, and it
cannot be used for anything else, especially there are no functions for escaping
wildcard characters used in LIKE constructions (in MySQL these are
% _) or identifiers, such as table or column names. Databases
do not require removing escaping on output! (Except, for example, for
bytea type.)
JavaScript
As a programming language, JavaScript has a number of very different
subcontexts. For escaping strings, you can use the side
effect of the function
$s = json_encode((string) $s);
which also encloses the string in quotes. It strictly requires UTF-8.
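json_encode also escapes the slash by default, which can be checked directly:

```php
// the escaped slash means "</script>" cannot terminate an enclosing script block
echo json_encode('</script>');  // "<\/script>"
```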
JavaScript written inside HTML attributes (e.g., onclick) must
still be escaped according to HTML rules, but this does
not apply to JavaScript inside <script> tags, where only the
potential occurrence of the ending tag </script> inside the
string needs to be treated. However, json_encode ensures this, as JSON escapes
the slash /. However, it does not handle the end of an HTML comment
--> (which does not matter in HTML) or an XML CDATA block
]]>, which the script is wrapped in. For XML/XHTML, the
solution is
CSS
For CSS within HTML code, the same applies as stated about JavaScript and its
escaping within HTML attributes and tags (here it concerns the
style attributes and <style> tags).
URL
In the context of a URL, everything except the letters of the English
alphabet, digits, and characters - _ . is escaped by replacing them
with % + the hexadecimally expressed byte.
$s = rawurlencode($s);
According to RFC 2718 (from 1999) or RFC 3986 (from 2005), writing characters
in UTF-8 encoding is preferred.
The reverse function in this case is urldecode, which also recognizes the
+ character as meaning space.
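A minimal sketch of the encoding and of the + handling mentioned above:

```php
// rawurlencode percent-encodes everything outside the unreserved set,
// multi-byte UTF-8 characters byte by byte
echo rawurlencode('Škoda 1/2');  // %C5%A0koda%201%2F2

// urldecode additionally treats + as a space; rawurldecode does not
var_dump(urldecode('a+b'));     // string(3) "a b"
var_dump(rawurldecode('a+b'));  // string(3) "a+b"
```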
If you find the whole topic too complicated, don't despair. Soon you will
realize that it is actually about simple transformations, and the whole trick is
in realizing which context I am in and which function I need to choose for it.
Or even better, try using an intelligent templating system that can recognize
contexts itself and apply proper escaping:
Latte.
Singleton is one of the most popular design patterns. Its purpose is to
ensure the existence of only one instance of a certain class while also
providing global access to it. Here is a brief example for completeness:
class Database
{
    private static $instance;

    private function __construct()
    {}

    public static function getInstance()
    {
        if (self::$instance === null) {
            self::$instance = new self;
        }
        return self::$instance;
    }

    ...
}
// singleton is globally accessible
$result = Database::getInstance()->query('...');
Typical features include:
A private constructor, preventing the creation of an instance outside
the class
A static property $instance where the unique instance
is stored
A static method getInstance(), which provides access to the
instance and creates it on the first call (lazy loading)
Simple and easy to understand code that solves two problems of
object-oriented programming. Yet, in dibi or
Nette Framework, you won’t find any
singletons. Why?
Apparent Uniqueness
Let's look closely at the code – does it really ensure only one instance
exists? I’m afraid not:
$dolly = clone Database::getInstance();
// or
$dolly = unserialize(serialize(Database::getInstance()));
// or
class Dolly extends Database {}
$dolly = Dolly::getInstance();
There is a defense against this:
final public static function getInstance()
{
    // final getInstance
}

final public function __clone()
{
    throw new Exception('Clone is not allowed');
}

final public function __wakeup()
{
    throw new Exception('Unserialization is not allowed');
}
The simplicity of implementing a singleton is gone. Worse – with every
additional singleton, we repeat the same piece of code. Moreover, the class
suddenly fulfills two completely different tasks: besides its original purpose,
it takes care of being quite single. Both are warning signals that something is
not right and the code deserves refactoring. Bear with me, I’ll get back to
this soon.
Global = Ugly?
Singletons provide a global access point to objects. There is no need to
constantly pass the reference around. However, critics argue that such a
technique is no different from using global variables, and those are
pure evil.
(If a method works with an object that was explicitly passed to it,
either as a parameter or as an object variable, I call it “wired
connection”. If it works with an object obtained through a global point (e.g.,
through a singleton), I call it “wireless connection”. Quite a nice
analogy, right?)
Critics are wrong in one respect – there is nothing inherently bad about
“global”. It’s important to realize that the name of each class and
method is nothing more than a global identifier. There is no fundamental
difference between the trouble-free construction $obj = new MyClass
and the criticized $obj = MyClass::getInstance(). This is even less
significant in dynamic languages like PHP, where you can “write in PHP 5.3”
$obj = $class::getInstance().
The first issue can be eliminated if singletons do not act like global
variables, but rather as global functions or services. Consider google.com –
a nice example of a singleton as a global service. There is one instance (a
physical server farm somewhere in the USA) globally accessible through the
identifier www.google.com. (Even clone www.google.com
does not work, as Microsoft discovered, they have it figured out.) Importantly,
this service does not have hidden dependencies typical for global variables –
it returns responses without unexpected connections to what someone else
searched for moments ago. On the other hand, the seemingly inconspicuous
function strtok suffers from a serious
dependency on a global variable, and its use can lead to very hard-to-detect
errors. In other words – the problem is not “globality”, but design.
The second point is purely a matter of code design. It is not wrong to use a
“wireless connection” and access a global service, the mistake is doing it
unexpectedly. A programmer should know exactly which object uses which class.
A relatively clean solution is to have a variable in the object referring to
the service object, which initializes to the global service unless the
programmer decides otherwise (the convention over configuration technique).
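A minimal sketch of that convention (all class names here are made up, not taken from any framework): the object uses an explicitly injected service if one was given, otherwise it falls back to a global default:

```php
class EventLog
{
    public array $messages = [];

    public function write(string $message): void
    {
        $this->messages[] = $message;
    }
}

class Article
{
    public static ?EventLog $defaultLog = null;  // the global service
    private ?EventLog $log = null;

    // "configuration": the programmer may inject a specific service
    public function setLog(EventLog $log): void
    {
        $this->log = $log;
    }

    // "convention": fall back to the global default when nothing was injected
    public function getLog(): EventLog
    {
        return $this->log ??= self::$defaultLog;
    }
}

Article::$defaultLog = new EventLog;

$a = new Article;
$a->getLog()->write('uses the global service');

$b = new Article;
$b->setLog(new EventLog);  // e.g. a mock in tests
$b->getLog()->write('uses the injected one');
```

The "wireless connection" stays available by default, yet a test can rewire any single object explicitly.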
Uniqueness May Be Harmful
Singletons come with a problem that we encounter no later than when testing
code. And that is the need to substitute a different, test object. Let's return
to Google as an exemplary singleton. We want to test an application that uses
it, but after a few hundred tests, Google starts protesting with We're
sorry… and we are stuck. The solution is to substitute
a fictitious (mock) service under the identifier www.google.com; we
would need to modify the hosts file. But (returning from the
analogy to the world of OOP) how do we achieve this with singletons?
One option is to implement a static method
setInstance($mockObj). But oops! What exactly do you want to pass
to that method when no other instance, other than that one and only,
exists?
Any attempt to answer this question inevitably leads to the breakdown of
everything that makes a singleton a singleton.
If we remove the restrictions on the existence of only one instance, the
singleton stops being single and we are only addressing the need for a global
repository. Then the question arises, why repeat the same method
getInstance() in the code and not move it to an extra class, into
some global registry?
Or we maintain the restrictions, only replacing the class identifier with an
interface (Database → IDatabase), which raises the
problem of the impossibility to implement IDatabase::getInstance()
and the solution again is a global registry.
A few paragraphs above, I promised to return to the issue of repetitive
code in all singletons and possible refactoring. As you can see, the problem has
resolved itself. The singleton has died.
It requires PHP (version 5 or newer) with the cURL extension and is licensed
under the New BSD License. You can obtain the latest version from our GitHub repository or install it via
Composer:
php composer.phar require dg/twitter-php
Twitter has required SSL/TLS since January 14th, 2014. Update to
the latest version.
Getting started
Sign in to http://twitter.com and
register an application at the http://dev.twitter.com/apps page.
Remember never to reveal your consumer secrets. Click on the My Access Token
link in the sidebar and retrieve your own access
token. Now you have a consumer key, consumer secret, access token, and access
token secret.
Create an object using the application and access keys:
$twitter = new Twitter($consumerKey, $consumerSecret,
$accessToken, $accessTokenSecret);
Posting
The send() method posts your status. The message must be encoded
in UTF-8:
$twitter->send('I am fine today.');
You can attach a picture:
$twitter->send('This is my photo', $imageFile);
Displaying
The load() method returns the 20 most recent status updates posted by you in
the last 24 hours:
The static method Twitter::clickable() makes links in status
clickable. In addition to regular links, it links @username to the
user’s Twitter profile page and links hashtags to a Twitter search on that
hashtag.
Searching
The search() method provides searching in Twitter statuses:
The authenticate() method tests if user credentials
are valid:
if (!$twitter->authenticate()) {
die('Invalid name or password');
}
Other commands
You can use all commands defined by the Twitter API 1.1. For example, GET
statuses/retweets_of_me returns an array of the most recent tweets authored by
the authenticating user:
I recently participated in a discussion that reminded me
(again) of the deeply entrenched myths regarding the differences between HTML
and XHTML. The campaign for the formats with the letter “X” was accompanied
by great emotions, which usually do not go hand in hand with a clear head.
Although the enthusiasm has long since faded, a significant part of the
professional community and authors still believe a number of misconceptions.
In this article, I will attempt to bury the biggest of these myths in the
following way. This article will contain only facts. I will save my
opinions and your comments for a second article.
In the text below, by HTML I mean the version HTML 4.01, and by XHTML I mean the
version XHTML 1.0 Second Edition.
For completeness, I add that HTML is an application of the SGML
language, while XHTML is an application of the XML language.
Myth: HTML allows tag crossing
Not at all. Tag crossing is directly prohibited in SGML, and consequently in
HTML. This fact is mentioned, for example, in the W3C recommendation:
“…overlapping is illegal in SGML…”. All these markup languages
perceive the document as a tree structure, and therefore it is not possible to
cross tags.
I am also responding to a reformulation of the myth: “The advantage of
XHTML is the prohibition of crossing tags.” This is not the case; tags cannot
be crossed in any existing version of HTML or XHTML.
Myth: XHTML banned presentation elements and introduced CSS
Not at all. XHTML contains the same sort of elements as HTML 4.01. This is
mentioned right in the first
paragraph of the XHTML specification: “The meaning of elements and
their attributes is defined in the W3C recommendation for HTML 4.” From
this perspective, there is no difference between XHTML and HTML.
Some elements and attributes were already deprecated in HTML 4.01,
presentation elements among them, in favor of CSS. This also answers the
second part of the myth: cascading styles arrived before XHTML, not with it.
Myth: HTML parser must
guess tag endings
Not at all. In HTML, for a defined group of
elements, the ending or starting tag can optionally be omitted. This is for
elements where omitting the tag cannot cause ambiguity. As an example,
take the ending tag for the p element. Since the standard states
that a paragraph cannot be inside another paragraph, it is clear by
writing…
<p>....
<p>....
…that by opening the second paragraph, the first must close. Therefore,
stating the ending tag is redundant. However, for example, the div
element can be nested within itself, so both the starting and ending tags are
required.
Myth: HTML attribute
notation is ambiguous
Not at all. XHTML always requires enclosing attribute values in quotes or
apostrophes. HTML also requires
this, except if the value consists of an alphanumeric string. For
completeness, I add that even in these cases, the specification recommends
using quotes.
Thus, in HTML it is permissible to write
<textarea cols=20 rows=30>, which is formally as unambiguous
as <textarea cols="20" rows="30">. If the value contained
multiple words, HTML insists on using quotes.
Myth: HTML document is
ambiguous
Not at all. The reasons given for ambiguity are the possibility of crossing
tags and the ambiguity of unquoted attributes, both myths already debunked, or
else the possibility of omitting some tags. Here I repeat that the group of
elements whose tags can be omitted is chosen so that only redundant
information is left out.
Thus, an HTML document is always unambiguously determined.
Myth: Only in XHTML is the ‘&’ character written as ‘&amp;’
Not at all – it must be written that way in HTML as well. In both languages,
the characters < and & have a special meaning: the first opens a tag, the
second an entity. To prevent them from being understood in their meta-meaning,
they must be written as the entities &lt; and &amp;. This applies in HTML too,
as the specification states.
Myth: HTML
allows ‘messes’ that would not pass in XHTML
Not at all. This view is rooted in a series of myths that I have already
refuted above. I haven't yet mentioned that XHTML, unlike HTML, is case
sensitive for element and attribute names. However, this is a completely
legitimate feature of the language. In this way, Visual Basic differs from C#,
and it cannot objectively be said that one or the other approach is worse. HTML
code can be made confusing by inappropriately mixing upper and lower case
(<tAbLe>), XML code can also be confusing by using strings
like id, ID, Id for different
attributes.
The clarity of the notation in no way relates to the choice of one language
over the other.
Myth: Parsing XHTML is much
easier
Not at all. Comparing them would be subjective and therefore has no place in
this article, but objectively, there is no reason why one parser should have a
significantly easier time. Each has its own set of challenges.
Parsing HTML requires the parser to know the document type definition. The
first reason is the existence of optional tags.
Although their addition is unambiguous (see above) and algorithmically easy to
handle, the parser must know the respective definition. The second reason
concerns empty elements. That an element is empty is known to the parser only
from the definition.
Parsing XHTML is complicated by the fact that the document can (unlike HTML)
contain an internal DTD subset defining its own entities (see example). I add that
an “entity” does not have to represent a single character, but any lengthy
segment of XHTML code (possibly containing further entities). Without processing
the DTD and verifying its correctness, we cannot talk about parsing XHTML.
Furthermore, syntactically, DTD is essentially the opposite of XML language.
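For illustration, an internal DTD subset defining a custom entity might look like this (the entity name and its content are invented for the example):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd" [
  <!-- internal subset: an entity may expand to a whole segment of markup -->
  <!ENTITY signature "Jan <em>Novak</em>, webmaster">
]>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Internal subset example</title></head>
  <body><p>Written by &signature;.</p></body>
</html>
```

A conforming XHTML parser must read this DTD fragment and expand &signature; before it can build the document tree.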
In summary: both HTML and XHTML parsers must know the document type
definition. The XHTML parser additionally must be able to read it in DTD
language.
Myth: Parsing XHTML is much
faster
Given the syntactic similarity of the two languages, parsing speed is
determined only by the skill of the programmers of the individual parsers.
The time required for machine processing of a typical web page (whether HTML or
XHTML) on a regular computer is imperceptible to human perception.
Myth: HTML parser must always
cope
Not at all. The HTML specification does
not dictate how an application should behave in case of processing an
erroneous document. Due to competitive pressures in the real world, browsers
have become completely tolerant of faulty HTML documents.
It is different in the case of XHTML. The specification, by referring to XML
dictates
that the parser must not continue processing the logical structure of the
document in case of an error. Again, due to competitive pressures in the real
world, RSS readers have become tolerant of faulty XML documents (RSS is an
application of XML, just like XHTML).
If we were to deduce something negative about HTML from the tolerance of web
browsers, then we must necessarily deduce something negative about XML from
the tolerance of RSS readers. Objectively, the draconian approach
of XML to errors in documents is utopian.
Conclusion?
If your mind is no longer burdened by any of the myths mentioned above, you
can better perceive the difference between HTML and XHTML. Or rather, you can
better perceive that there is no difference. The real difference occurs a level
higher: it is the departure from SGML and the transition to the new XML.
Unfortunately, it cannot be said that XML only solves the problems of SGML
and adds no new ones. I have encountered two in this article alone. One of them
is the draconian processing of errors in XML, which is not in line with
practice, and the other is the existence of a different DTD language inside XML,
which complicates parsing and the understandability of XML documents. Moreover,
the expressive capability of this language is so small that it cannot formally
cover even XHTML itself, so some features must be defined separately. For a
language not bound by historical shackles, this is a sad and striking finding.
However, criticism of XML is a topic for a separate article.
(If I encounter more myths, I will gradually update the article. If you
want to refer to them, you can take advantage of the fact that each headline has
its own ID)
As you might know, web forms have to be styled with care, since
their native look is often the best you can achieve.
That said, sometimes even the default look has its bugs. A truly flagrant
mistake concerns buttons in Internet Explorer (including version 7) on Windows
XP. If the button's caption is too long, the browser produces this
nasty thing:
Texy2 is a huge leap forward. More polished, cleverer, highly
customizable. And above all – even sexier! Web application developers can
chuckle in contentment.
Initially, Texy2 wasn’t even supposed to be released. But let's not get
ahead of ourselves…
How Software Is Designed
The best analysis of a program is done by programming it. Only then do you
realize what you really need from it. And only then can you write it
perfectly.
I was aware of this while writing Texy 1. I didn’t want to write API
documentation, I didn’t translate the website into other languages. I knew
that was just a rehearsal for the real Texy.
The first version was hard work because I had to crack a ton of tough
problems and figure out how to even do it. It's no joke. For instance, you might say:
“Texy will insert non-breaking spaces between a preposition and a word.” And
one might think a regular expression that finds v lese and replaces
it with v lese (this time with a non-breaking space) would suffice.
But, can it handle this too:
v <strong>lese</strong>? Yes, a non-breaking space
belongs there too. Why wouldn’t it? Should we filter strings in angle
brackets? Okay, but what about this input:
v <span title="3 > 2">lese</span>
You'd suggest more cunning HTML tag filtering? Wait, but if there’s a
<br> tag, then the non-breaking space shouldn’t be there.
So no filtering, but analysis instead.
However, the precision of conversion is not the main attraction of Texy2.
Nope, that’s just a manifestation of maturing older ideas. The real bombshell
is the maximum customizability.
Texy is Flexible and Pliable
Now you can easily change the behavior of any document element. Need to build
a wiki over Texy2? I.e., control all the links on the page? It took me just a
few lines of code.
Need to generate content based on headings? Want to insert flash animations
using [* movie.swf *]? Want to automatically add a CSS class to all
phrases "hello .(description)"? You can! And extremely easily.
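For illustration, controlling links might look something like this. This is a sketch based on the Texy2 handler mechanism; the handler name follows the distribution, but the exact callback signature may differ, and the wiki URL is invented:

```php
$texy = new Texy();

// handle references like [MainPage] that are not defined in the document
$texy->addHandler('newReference', function ($invocation, $name) {
    // build a link into our hypothetical wiki
    $el = TexyHtml::el('a');
    $el->attrs['href'] = 'wiki.php?page=' . urlencode($name);
    $el->setText($name);
    return $el;
});

echo $texy->process('See [MainPage] for details.');
```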
Some solutions are found directly in the distribution, but mostly in the
documentation, which is not yet available 🙂 At least there’s a brief changelog. I’m sorry, I have such
important tasks now that there’s no time to write the manual. However, the
Texy website is now designed so that creating documentation does not have to
depend only on me.
Texy2 is Here
Texy2 wasn’t meant to be released. I realized that I had no motivation to
release my software as open source. It brings many limitations: at home
(i.e., in the Czech Republic) you won’t find much appreciation, everyone
pesters you with support requests, and you run into idiots. If it weren’t
for the Giraffe & co. at the last HBWBH, I would’ve
probably kept it to myself.
The revision released today with the beautiful number 111 is the
first official beta version of Texy2. Download it, play around, test it.