JavaScript offers three ways to declare variables: var,
let, and const. Many programmers aren't entirely clear
on when to use which one, and most tutorials and linters force you to use them
incorrectly. Let's see how to write cleaner and more understandable code
without unnecessary rules that don't actually help us.
Let's Start with the Most Dangerous Part
JavaScript has one treacherous quirk: by simply omitting a variable
declaration, you can unknowingly use a global variable. All it takes is
forgetting var, let, or const:
function calculatePrice(amount) {
  price = amount * 100; // Omission! Missing 'let'
  return price; // We're using a global variable 'price'
}

function processOrder() {
  price = 0; // We're using the same global variable!
  // ... some code calling calculatePrice()
  return price; // We're returning a completely different value than expected
}
This is every developer's nightmare – the code appears to work correctly
until something mysteriously starts failing elsewhere in the application.
Debugging such errors can take hours because a global variable can be
overwritten anywhere in the application.
That's why it's absolutely crucial to always declare variables using
let or const.
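One safeguard is worth knowing here. As a small illustration: in strict mode (which ES modules and class bodies use automatically), assigning to an undeclared variable throws a ReferenceError instead of silently creating a global. The function name is borrowed from the example above; the try/catch and the caught variable exist only to make the error visible.

```javascript
function calculatePrice(amount) {
  'use strict';
  price = amount * 100; // strict mode: ReferenceError, no global is created
  return price;
}

let caught = null;
try {
  calculatePrice(5);
} catch (error) {
  caught = error;
}

console.log(caught.name); // ReferenceError
```

Strict mode catches the omission at the first call, but it is a safety net rather than a substitute for declaring variables properly.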
Forget About var
The var keyword has been in JavaScript since its inception in
1995 and carries some problematic properties that were considered features at
the time of the language's creation but proved to be a source of many bugs over
time. After twenty years of language development, JavaScript's authors decided
to address these problems – not by fixing var (to maintain
backward compatibility) but by introducing the new let keyword in
ES2015.
You can find plenty of articles on the internet dissecting the problems with
var in the finest detail. But you know what? There's no need to
get bogged down in the details. Let's just treat var as a relic of
the past and focus on modern JavaScript.
When to Use let
let is the modern way to declare variables in JavaScript.
The nice thing is that the variable only exists within the code block
(between curly braces) where it was defined. This makes the code more
predictable and safer.
if (someCondition) {
  let temp = calculateSomething();
  // temp is only available here
}
// temp no longer exists here
In loops, the declaration is technically placed before the curly braces, but
don't let that confuse you – the variable only exists within the loop:
for (let counter = 0; counter < 10; counter++) {
  // The counter variable only exists in the loop
}
// counter is no longer accessible here
When to Use const
const is used to declare constants. These are typically
important values at the module or application level that should never
change:
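A minimal sketch of what such module-level constants might look like. The names and values are illustrative assumptions, not taken from a real project:

```javascript
// Module-level constants: values fixed by their nature or by configuration
const MAX_LOGIN_ATTEMPTS = 3;
const DEFAULT_LOCALE = 'en-US';

let reassignError = null;
try {
  MAX_LOGIN_ATTEMPTS = 5; // reassignment is rejected at runtime
} catch (error) {
  reassignError = error;
}

console.log(reassignError.name);  // TypeError
console.log(MAX_LOGIN_ATTEMPTS);  // 3
```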
However, it's important to understand one key detail: const only prevents
assigning a new value to the variable – it doesn't control what happens
with the value itself. This distinction is particularly evident with objects and
arrays (an array is also an object) – const doesn't make them
immutable objects, i.e., it doesn't prevent changes inside the object:
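A short illustration with made-up data: the bindings user and numbers cannot be reassigned, yet their contents change freely.

```javascript
const user = { name: 'Alice' };
user.name = 'Bob'; // allowed: the object itself is mutable
user.age = 30;     // allowed: adding properties works too

const numbers = [1, 2, 3];
numbers.push(4);   // allowed: the array is modified in place

console.log(user.name, user.age); // Bob 30
console.log(numbers.length);      // 4
```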
If you need a truly immutable object, you need to freeze
it first.
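Freezing is done with Object.freeze(). The sketch below uses an invented config object and a helper named tryToChange. Note that the freeze is shallow, so nested objects stay mutable unless they are frozen too.

```javascript
const config = Object.freeze({
  retries: 3,
  hosts: ['primary'],
});

function tryToChange() {
  'use strict'; // in strict mode a write to a frozen object throws
  try {
    config.retries = 10;
    return 'changed';
  } catch (error) {
    return error.name;
  }
}

console.log(tryToChange());  // TypeError
console.log(config.retries); // 3

// Object.freeze() is shallow: the nested array can still be modified
config.hosts.push('backup');
console.log(config.hosts.length); // 2
```

Outside strict mode, the write to a frozen property is silently ignored rather than reported, which is one more reason to prefer modules or 'use strict'.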
The let vs const Dilemma
Now we come to a more interesting question. While the situation with
var vs let is clear, the use of const is
the subject of many community discussions. Most tutorials, style guides, and
linters promote the rule “use const wherever you can.” So we
commonly see const used in function or method bodies.
Let's explain why this popular “best practice” is actually an
anti-pattern that makes code less readable and unnecessarily restrictive.
The approach “if a variable's value isn't reassigned in the code, it
should be declared as const” seems logical at first glance. Why
else would const even exist? The more “constants,” the safer
and more predictable the code, right? And faster too, because the compiler can
better optimize it.
However, this entire approach fundamentally misunderstands the purpose of
constants. It's primarily about communicating intent – are we truly
trying to signal to other developers that this variable should never be
reassigned, or do we just happen not to reassign it in our current
implementation?
// Real constants - values that are constant by their nature
const PI = 3.14159;
const DAYS_IN_WEEK = 7;
const API_ENDPOINT = 'https://api.example.com';
// vs.
function processOrder(items) {
  // These AREN'T constants, we just happen to not reassign them
  const total = items.reduce((sum, item) => sum + item.price, 0);
  const tax = total * 0.21;
  const shipping = calculateShipping(total);
  return { total, tax, shipping };
}
In the first case, we have values that are constants by their nature –
they express immutable properties of our system or important configuration data.
When we see PI or API_ENDPOINT somewhere in the code,
we immediately understand why these values are constants.
In the second case, we're using const just because we happen to
not reassign the values right now. But that's not their essential
characteristic – these are regular variables that we might want to change
in the next version of the function. And when we want to do that,
const will unnecessarily prevent us.
In the days when JavaScript was one big global code, it made sense to try to
secure variables against reassignment. But today we write code in modules and
classes. Today it's common and correct that the scope is a small function, and
within its scope, it makes no sense to worry about the difference between
let and const.
Because it creates completely unnecessary mental overhead:
The programmer has to think while writing: “Will I change this value? No?
Then I must use const…”
It distracts readers! When they see const in the code, they
wonder: “Why is this a constant? Is this some important value? Does it have
any significance?”
In a month we need to change the value and have to deal with: “Can
I change const to let? Is someone relying on this?”
Simply use let and you don't have to deal with these
questions at all.
It's even worse when this decision is made automatically by a linter. That
is, when the linter “fixes” variables to const because it only sees one
assignment. The code reader then unnecessarily wonders: “Why must these
variables be constants here? Is it somehow important?” And yet it's not
important – it's just a coincidence. Don't use the prefer-const
rule in ESLint!
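If a shared preset switches the rule on, it can be turned off explicitly. A minimal sketch, assuming ESLint's flat config file (eslint.config.js); the variable name eslintConfig is only illustrative:

```javascript
// eslint.config.js (flat config format, assumed here; adapt to your setup)
const eslintConfig = [
  {
    rules: {
      // do not rewrite `let` to `const` based on the number of assignments
      'prefer-const': 'off',
    },
  },
];

// Export it the way your project's module system expects, e.g.:
//   export default eslintConfig;   (ES modules)
//   module.exports = eslintConfig; (CommonJS)
```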
By the way, the optimization argument is a myth. Modern JavaScript engines
(like V8) can easily detect whether a variable is reassigned or not, regardless
of whether it was declared using let or const. So
using const provides no performance benefit.
Implicit Constants
In JavaScript, some bindings are constant without us having to use the
const keyword. The clearest case is imports:
// imported bindings are read-only
import React from 'react';
React = something; // TypeError: Assignment to constant variable
Function and class declarations look similar, but they are not protected at
the top level:
// functions
function add(a, b) { return a + b; }
add = something; // allowed - silently replaces the function
// classes
class User {}
User = something; // allowed - the class name is read-only only inside its own body
Protecting imports makes sense – they are basic building blocks owned by
another module, and overwriting them could cause chaos in the application.
Functions and classes deserve the same care, so treat their names as constants
by convention even though the language does not enforce it.
Constants in Classes
Classes were added to JavaScript relatively recently (in ES2015), and their
functionality is still gradually maturing. For example, private members marked
with # didn't arrive until 2022. JavaScript is still waiting for
class constant support. For now, you can use static, but it's far
from the same thing – it marks a value shared between all class instances,
not an immutable one.
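To make the difference concrete, here is a small sketch with an invented Circle class. The static getter is one possible workaround, not a built-in class constant:

```javascript
class Circle {
  // static: shared by the class, but still reassignable
  static PRECISION = 2;

  // workaround (an assumption of this sketch): a static getter without a setter
  static get PI() {
    return 3.14159;
  }
}

Circle.PRECISION = 5;          // no error, the value changes
console.log(Circle.PRECISION); // 5

try {
  Circle.PI = 3; // ignored in sloppy mode, TypeError in strict mode
} catch (error) {
  // strict code (e.g. an ES module) ends up here
}
console.log(Circle.PI); // 3.14159
```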
Conclusion
Don't use var – it's outdated
Use const for real constants at the module level
In functions and methods, use let – it's more readable and
clearer
Don't let the linter automatically change let to
const – it's not about the number of assignments, but
about intent
You know the situation – you create a query
WHERE street = '', but the system doesn't return all the records
you'd expect. Or your LEFT JOIN doesn't work as it should. The reason is a
common problem in databases: inconsistent use of empty strings and NULL values.
Let's see how to solve this chaos once and for all.
When to Use NULL and When to Use an Empty String?
In theory, the difference is clear: NULL means “value is not set”, while
an empty string means “value is set and is empty”. Let's look at a real
example from an e-commerce site, where we have an orders table. Each order has a
required delivery address and an optional billing address for cases where the
customer wants to bill to a different location (typical checkbox “Bill to a
different address”):
CREATE TABLE orders (
    id INT PRIMARY KEY,
    delivery_street VARCHAR(255) NOT NULL,
    delivery_city VARCHAR(255) NOT NULL,
    billing_street VARCHAR(255) NULL,
    billing_city VARCHAR(255) NULL
);
The billing_city and billing_street fields are
nullable because the billing address is optional. But there's a difference
between them. While a street can be legitimately empty (villages without street
names) or unset (delivery address is used), the city must always be filled in if
a billing address is used. So either billing_city contains a city
name, or it's NULL – in which case the delivery address is used.
The Reality of Large Databases
In practice, both approaches often end up being mixed in the database. There
can be several reasons:
Changes in application logic over time (e.g., switching from one ORM to
another)
Different teams or programmers using different conventions
Buggy data migrations when merging databases
Legacy code that behaves differently than new code
Application bugs that occasionally let through an empty string instead of
NULL or vice versa
This leads to situations where we have a mix of values in the database and
need to write complex conditions:
SELECT * FROM tbl
WHERE foo = '' OR foo IS NULL;
Even worse, NULL behaves unintuitively in comparisons:
SELECT * FROM tbl WHERE foo = ''; -- doesn't include NULL
SELECT * FROM tbl WHERE foo <> ''; -- also doesn't include NULL
-- we must use
SELECT * FROM tbl WHERE foo IS NULL;
SELECT * FROM tbl WHERE foo <=> NULL;
This inconsistency in comparison operators' behavior is another reason why
it's better to use only one way of representing empty values in the
database.
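Why do both = and <> skip the NULL rows? A toy model of SQL's three-valued logic, written in JavaScript purely for illustration (it is not how MySQL is implemented, and the helper names are made up), shows the mechanism: any comparison with NULL yields NULL, and WHERE keeps only rows where the condition is exactly TRUE.

```javascript
// Model of SQL comparison: a comparison involving NULL yields NULL (unknown)
const sqlEquals = (a, b) => (a === null || b === null ? null : a === b);
const sqlNotEquals = (a, b) => (a === null || b === null ? null : a !== b);

// WHERE keeps a row only when the condition evaluates to TRUE
const where = (rows, condition) => rows.filter((row) => condition(row) === true);

const rows = [{ foo: 'x' }, { foo: '' }, { foo: null }];

const empty = where(rows, (row) => sqlEquals(row.foo, ''));
const nonEmpty = where(rows, (row) => sqlNotEquals(row.foo, ''));

console.log(empty.length);    // 1 (only the '' row)
console.log(nonEmpty.length); // 1 (only the 'x' row)
```

The row with NULL lands in neither result, which is exactly the gap that IS NULL has to cover.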
Why Avoid the Dual Approach
A similar situation exists in JavaScript, where we have null
and undefined. After years of experience, many JavaScript
developers concluded that distinguishing between these two states brings more
problems than benefits and decided to use only the system-native
undefined.
In the database world, the situation is similar. Instead of constantly
dealing with whether something is an empty string or NULL, it's often simpler
to choose one approach and stick to it. For example, the Oracle database essentially
equates empty strings and NULL values, thus elegantly avoiding this problem.
It's one of the places where Oracle deviates from the SQL standard, but it
simplifies working with empty/NULL values.
How can we achieve something similar in MySQL?
What Do We Actually Want to Enforce?
For required fields (NOT NULL), we want to enforce that they
always contain meaningful values. That means preventing empty strings (or
strings containing only spaces).
For optional fields (NULL), we want to prevent storing empty
strings. When a field is optional, NULL should be the only representation of an
“unfilled value”. Mixing both approaches in one column leads to problems
with querying and JOIN operations, as we showed above.
Solution in MySQL
Historically in MySQL, it made sense to use exclusively empty strings ('')
instead of NULL values. It was the only approach that could be enforced using
the NOT NULL constraint. If we wanted an automatically consistent
database, this was the only way.
However, there's one important case where this approach fails – when we
need a unique index on the column. MySQL considers multiple empty strings as the
same value, while multiple NULL values are considered different.
However, since MySQL version 8.0.16, we can use CHECK constraints and have
more control over what values we allow. We can, for example, enforce that a
column will either be NULL or contain a non-empty string:
CREATE TABLE users (
    id INT PRIMARY KEY,
    -- Required field - must contain some non-empty text
    email VARCHAR(255) NOT NULL UNIQUE
        CONSTRAINT email_not_empty -- rule name
        CHECK (email != ''),
    -- Optional field - either NULL or non-empty text
    nickname VARCHAR(255)
        CONSTRAINT nickname_not_empty
        CHECK (nickname IS NULL OR nickname != '')
);
When creating a CHECK constraint, it's important to give it a meaningful
name using the CONSTRAINT keyword. This way, we get a meaningful error message
Check constraint ‘nickname_not_empty’ is violated instead of a
generic constraint violation notice. This significantly helps with debugging and
application maintenance.
The problem isn't just empty strings, but also strings containing only
spaces. We can improve the CHECK constraint solution using the TRIM
function:
CREATE TABLE users (
    id INT PRIMARY KEY,
    email VARCHAR(255) NOT NULL UNIQUE
        CONSTRAINT email_not_empty
        CHECK (TRIM(email) != ''),
    ...
);
Now these validation bypass attempts won't work either:
INSERT INTO users (email) VALUES (' '); -- all spaces
Practical Solution in Nette Framework
A consistent approach to empty values needs to be handled at the application
level too. If you're using Nette Framework, the setNullable() method
offers an elegant solution:
$form = new Form;
$form->addText('billing_street')
    ->setNullable(); // empty input transforms to NULL
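Outside Nette, the same normalization can be done with a few lines before the value reaches the database. A language-agnostic idea sketched in JavaScript; emptyToNull is a hypothetical helper, and trimming whitespace is an assumption chosen to match the TRIM-based CHECK constraint above:

```javascript
// Hypothetical helper: maps blank input to null, returns other values trimmed
function emptyToNull(value) {
  if (value === null || value === undefined) {
    return null;
  }
  const trimmed = String(value).trim();
  return trimmed === '' ? null : trimmed;
}

console.log(emptyToNull(''));           // null
console.log(emptyToNull('   '));        // null
console.log(emptyToNull(' Main St. ')); // Main St.
```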
Recommendations for Practice
At the start of the project, decide on one approach:
Either use only NULL for missing values
Or use only empty strings for empty/missing values
Document this decision in the project documentation
Use CHECK constraints to enforce consistency
For existing projects:
Conduct an audit of the current state
Prepare a migration script to unify the approach
Don't forget to adjust application logic
With this approach, you'll avoid many problems with comparing, indexing, and
JOIN operations that arise from mixing NULL and empty strings. Your database
will be more consistent and queries simpler.
Renaming values in a MySQL ENUM column can be tricky. Many
developers attempt a direct change, which often results in data loss or errors.
We'll show you the correct and safe way to do it.
Imagine a typical scenario: You have an orders table in your
database with a status column of type ENUM. It contains the values
waiting_payment, processing, shipped, and
cancelled. The requirement is to rename
waiting_payment to unpaid and shipped to
completed. How can this be done without risk?
What Doesn't Work
First, let's look at what does not work. Many developers try this
straightforward approach:
-- THIS DOES NOT WORK!
ALTER TABLE orders
MODIFY COLUMN status ENUM(
    'unpaid',      -- previously 'waiting_payment'
    'processing',  -- unchanged
    'completed',   -- previously 'shipped'
    'cancelled'    -- unchanged
);
This approach is a recipe for disaster. MySQL will attempt to map existing
values to the new ENUM, and since the original values are no longer in the
definition, it will either replace them with an empty string or return the error
Data truncated for column 'status' at row X. In a production
database, this would mean losing important data.
Backup First!
Before making any structural changes to your database, it is absolutely
crucial to create a data backup. Use mysqldump or another
trusted tool.
The Correct Approach
The correct approach consists of three steps:
First, extend the ENUM with new values.
Update the data.
Finally, remove the old values.
Let's go through it step by step:
1. The first step is to add the new values to the ENUM while keeping the
original ones:
ALTER TABLE orders
MODIFY COLUMN status ENUM(
    'waiting_payment',  -- original value
    'processing',       -- unchanged
    'shipped',          -- original value
    'cancelled',        -- unchanged
    'unpaid',           -- new value (replaces waiting_payment)
    'completed'         -- new value (replaces shipped)
);
2. Now we can safely update the existing data:
UPDATE orders SET status = 'unpaid' WHERE status = 'waiting_payment';
UPDATE orders SET status = 'completed' WHERE status = 'shipped';
3. Finally, once all data has been converted to the new values, we can
remove the old ones:
ALTER TABLE orders
MODIFY COLUMN status ENUM(
    'unpaid',
    'processing',
    'completed',
    'cancelled'
);
Why Does This Work?
This works because of how MySQL handles ENUM values. When performing an
ALTER TABLE modification on an ENUM column, MySQL tries to map
existing values based on their textual representation. If the original value
does not exist in the new ENUM definition, MySQL will either throw an error (if
strict SQL mode such as STRICT_ALL_TABLES or STRICT_TRANS_TABLES is enabled
in sql_mode) or replace it with an empty string.
That's why it's crucial to have both old and new values present in the ENUM
simultaneously during the transition phase. In our case, this ensures that every
record in the database retains its exact textual equivalent. Only after
executing the UPDATE queries—when we are sure that all data is
using the new values—can we safely remove the old ones.
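The three steps are mechanical enough to generate. A rough sketch of a hypothetical helper, enumRenamePlan, that prints the statements for review. It assumes trusted table and column names (they are not escaped) and, like the examples above, leaves out column attributes such as NOT NULL or DEFAULT, which you would need to repeat in a real MODIFY COLUMN:

```javascript
// Hypothetical helper: builds the extend / update / shrink statements
function enumRenamePlan(table, column, currentValues, renames) {
  const quote = (value) => `'${value.replace(/'/g, "''")}'`;
  const newValues = currentValues.map((value) => renames.get(value) ?? value);
  // transitional list: all original values first, then the new ones
  const transitional = [...new Set([...currentValues, ...newValues])];
  const alter = (values) =>
    `ALTER TABLE ${table} MODIFY COLUMN ${column} ENUM(${values.map(quote).join(', ')});`;

  return [
    alter(transitional),
    ...[...renames].map(([from, to]) =>
      `UPDATE ${table} SET ${column} = ${quote(to)} WHERE ${column} = ${quote(from)};`),
    alter(newValues),
  ];
}

const plan = enumRenamePlan(
  'orders',
  'status',
  ['waiting_payment', 'processing', 'shipped', 'cancelled'],
  new Map([['waiting_payment', 'unpaid'], ['shipped', 'completed']]),
);
console.log(plan.join('\n'));
```

Generating the statements as text, rather than executing them directly, keeps the backup and a manual review in the loop.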
What if I told you your PHP objects could be cleaner, more elegant, and
easier to work with? Well, that dream is now a reality! PHP 8.4 introduces
revolutionary features called property hooks and asymmetric
visibility that completely transform object-oriented programming as we know
it. Say goodbye to clunky getters and setters – we now have a modern,
intuitive way to control object data access. Let's explore how these features
can revolutionize your code.
Property hooks provide a smart way to define what happens when you read from
or write to object properties – and they're much cleaner and more efficient
than the traditional magic methods __get/__set. Think of it as
getting all the power of magic methods without any of their usual drawbacks.
Let's look at a real-world example that shows why property hooks are so
valuable. Consider a common Person class with a public
age property:
class Person
{
    public int $age = 0;
}

$person = new Person;
$person->age = 25; // OK
$person->age = -5; // OK, but that makes no sense!
While PHP ensures the age will be an integer thanks to the int
type (available since PHP 7.4), what about that negative age? In the past, we'd
need getters and setters, make the property private, and write a bunch of
boilerplate code. With hooks, there's a much more elegant solution:
class Person
{
    public int $age = 0 {
        set => $value >= 0 ? $value : throw new InvalidArgumentException;
    }
}

$person->age = -5; // Oops! InvalidArgumentException warns us about the invalid value
The beauty lies in its simplicity – from the outside, the property behaves
exactly like before. You can read and write directly through
$person->age, but now you have complete control over what
happens during the write operation. And that's just scratching the surface!
We can take it further and create hooks for reading too. Hooks can have
attributes, and they can contain complex logic beyond simple expressions. Check
out this example of working with names:
class Person
{
    public string $first;
    public string $last;

    public string $fullName {
        get {
            return "$this->first $this->last";
        }
        set(string $value) {
            [$this->first, $this->last] = explode(' ', $value, 2);
        }
    }
}

$person = new Person;
$person->fullName = 'James Bond';
echo $person->first; // outputs 'James'
echo $person->last; // outputs 'Bond'
Here's something crucial to understand: hooks are always used whenever a
property is accessed (even within the Person class itself). The only exception
is when you directly access the actual variable inside the hook code.
A Blast from the Past: Lessons from SmartObject
For those familiar with Nette Framework, here's an interesting historical
perspective. The framework offered similar functionality 17 years ago
through SmartObject,
which significantly enhanced object handling at a time when PHP was quite
limited in this area.
I remember the initial wave of overwhelming enthusiasm where developers used
properties everywhere, followed by a complete reversal where they avoided them
entirely. Why? There weren't clear guidelines about when to use methods versus
properties. But today's native solution is in a different league altogether.
Property hooks and asymmetric visibility are fully-fledged tools that provide
the same level of control as methods. This makes it much easier to determine
when a property is truly the right choice.
Picture this: data that's as stable as bedrock – set it once, and it
stays that way forever. That's exactly what PHP 8.1 delivered with readonly
properties. Think of it as giving your objects a safety vault – keeping their
data secure from accidental changes. Let's explore how this powerful feature
can streamline your code and what gotchas you need to watch out for.
Here's a quick taste of what we're talking about:
class User
{
    public readonly string $name;

    public function setName(string $name): void
    {
        $this->name = $name; // First assignment - all OK
    }
}

$user = new User;
$user->setName('John'); // Great, name is set
echo $user->name; // "John"
$user->setName('Jane'); // BOOM! Error: Cannot modify readonly property User::$name
Once that name is set, it's locked in place. No accidental changes, no
sneaky updates.
When is uninitialized really uninitialized?
Here's a common misconception: many developers think readonly properties
must be set in the constructor. But PHP is actually much more flexible than
that – you can set them at any point in an object's lifecycle, with one
crucial rule: only once! Before that first assignment, they exist in a special
‘uninitialized’ state – think of it as a blank slate waiting for its
first and only value.
Here's an interesting twist – readonly properties can't have default
values. Why? Think about it: if they had default values, they'd essentially be
constants – set at object creation and unchangeable from that point on.
Types are mandatory
When using readonly properties, you must explicitly declare their type. This
isn't just PHP being picky – the ‘uninitialized’ state only works with
typed properties. No type declaration means no readonly property. Don't know the
exact type? No worries – you can always fall back on mixed.
Do you know what you should NEVER, and I mean NEVER, say to open-source
project authors? “I don't have time.” These two words can destroy a
developer’s motivation faster than an iPhone battery drains while scrolling
TikTok.
“I don't have time to write a fix.”
“I don't have time to create a bug report.”
“This should be in the documentation, but I don’t have time to
write it.”
Really? REALLY?!
Imagine you're at a party, and someone says to you: “Hey, you with the
beer! Make me a sandwich. I don’t have time to make it myself, I’m too busy
eating chips.” How would you feel? Like a vending machine with a face?
That’s exactly how I feel when I read words like that. My motivation to
help vanishes instantly, and I feel the urge to do anything else — even
absolutely nothing.
You see, we open source developers are a peculiar breed. We spend hours of
our free time creating software that we then make available to everyone. For
free. Voluntarily. It’s like Santa Claus handing out gifts every day of the
year, not just on Christmas. We enjoy it. But that doesn’t give anyone the
right to boss us around like we’re some kind of digital slaves. So, when
someone comes with a request for a new feature but “doesn’t have time” to
contribute, it immediately raises the question, “Why should I have the time
then?” It’s like asking Michelangelo to paint your living room because you
“don’t have time” to do it yourself — as if he has nothing better
to do.
Over the years, I’ve accumulated dozens of issues across various projects
where I’ve asked, “Could you prepare a pull request?” and the reply was,
“I could, but I don’t have time this week.” If that poor soul hadn’t
written that sentence, I probably would’ve solved the issue long ago. But by
saying that, they basically told me they don’t value my time. So, did they fix
it themselves a week later? Not at all… 99% of the things people
promised to do were never delivered, which is why 99% of those issues remain
unresolved. They hang there like digital monuments to human laziness.
So, dear users, before you write “I don’t have time,” think again.
What you’re really saying is, “Hey, you! Your free time is worthless. Drop
everything you’re doing and deal with MY problem!” Instead, try this:
Find the time. Trust me, it’s there. It might be hiding between episodes
of your favorite show or in the time you spend scrolling through
social media.
Offer a solution. You don’t need to submit a full patch. Just show that
you’ve given it some real thought.
Motivate open source maintainers to take up your issue. For example, by
showing how the change will be useful not just for you, but for the whole of
humanity and the surrounding universe.
Next time you find a bug, request a new feature, or notice something missing
from the documentation, try to help out the community in some way. Because in
the open-source world, we’re all in the same boat. And to keep it moving
forward, we all need to row. So don’t just sit there complaining that you
“don’t have time” to paddle — grab an oar and do your part. Saying
“I don’t have time” is the fastest way to kill the motivation of those
who are giving you free software. Try to carve out those few minutes or hours.
Your karma will thank you.
SQL, which emerged in the 1970s, represented a revolutionary breakthrough in
human-computer interaction. Its design aimed to make queries as readable and
writable as possible, resembling plain English. For instance, a query to fetch
names and salaries of employees in SQL might look like this:
SELECT name, salary FROM employee – simple and comprehensible,
right? This made databases accessible to a broader audience, not just
computer nerds.
Although this intention was commendable, it soon became clear that writing
SQL queries still required experts. Moreover, because a universal application
interface never emerged, using natural language for modern programming became
more of a burden. Programmers today communicate with databases by writing
complex SQL command generators, which databases then decode.
Enter GPT. In the era of artificial intelligence, these advanced language
models bring a revolution similar to the one SQL brought to databases in its
time. They enable interaction with databases on an entirely new level, fifty years on.
Although there is currently no database directly controlled by a language model,
we can use GPT to generate SQL. And it works great.
How to do it? Just copy the structure of your database, i.e., a dump with
CREATE TABLE commands, into ChatGPT and include the name of the
database server you are using. You can then ask questions directly
in Czech.
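The prompt itself can be assembled mechanically. A rough sketch: buildSqlPrompt and its wording are hypothetical, and nothing here calls any API, it only builds the text you would paste into the chat.

```javascript
// Hypothetical helper: combines server name, schema dump and question into one prompt
function buildSqlPrompt(serverName, schemaDump, question) {
  return [
    `Database server: ${serverName}`,
    'Schema:',
    schemaDump.trim(),
    '',
    `Task: ${question}`,
  ].join('\n');
}

const promptText = buildSqlPrompt(
  'MySQL 8',
  'CREATE TABLE users (id INT PRIMARY KEY, name VARCHAR(100));',
  'Write SQL that lists users who have no orders.',
);
console.log(promptText);
```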
If you want to make using this technique even easier, use the prepared SQL Wizard, where you
just need to copy the database structure and you can start asking questions
right away without further explanation.
Let's see a practical example. We will use two database tables linked
through a foreign key. The users table will contain information
about users, and the orders table will store information about
orders made by users. Here are the definitions of these tables:
CREATE TABLE users (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    email VARCHAR(100)
);

CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    user_id INT,
    created_at DATE,
    amount DECIMAL(10, 2),
    FOREIGN KEY (user_id) REFERENCES users(id)
);
Copy these definitions into ChatGPT and you can immediately ask naturally and
in Czech, for example, about all orders from user Jan Novák. Specifically,
I would ask: “Write SQL that lists all orders from user Jan Novák.” And
GPT might respond like this:
SELECT orders.order_id, orders.created_at, orders.amount
FROM orders
JOIN users ON orders.user_id = users.id
WHERE users.name = 'Jan Novák';
Many programmers can, of course, write such an SQL query faster than logging
into ChatGPT 🙂 But let's try a slightly more complex example: “List users
who have only one order.”
SELECT users.id, users.name, users.email
FROM users
JOIN orders ON users.id = orders.user_id
GROUP BY users.id, users.name, users.email
HAVING COUNT(orders.order_id) = 1;
Or “List users who have no orders.”
SELECT users.*
FROM users
LEFT JOIN orders ON users.id = orders.user_id
WHERE orders.order_id IS NULL;
Here the fundamental difference in approach is evident. While with GPT,
I specify what data I want to get, in SQL, I have to describe how to get it.
And that requires knowledge and experience.
Thanks to GPT, you can focus on what you want to get, instead of how to get
it, which is especially valuable for those who are not SQL experts.
Finally, let's demonstrate how to use GPT to create database triggers. For
example, you can write: “Add a column order_count to the
users table that will track the number of orders. Initialize it to
the value corresponding to the current number of orders for each user. And
create a trigger that will update the value when an order is created or
deleted.”
GPT will generate queries for adding and initializing the
order_count column:
ALTER TABLE users ADD COLUMN order_count INT DEFAULT 0;
UPDATE users SET order_count = (SELECT COUNT(*) FROM orders WHERE orders.user_id = users.id);
and the required triggers to keep its value up to date:
CREATE TRIGGER update_order_count_after_insert
AFTER INSERT ON orders
FOR EACH ROW
BEGIN
    UPDATE users
    SET order_count = order_count + 1
    WHERE id = NEW.user_id;
END;

CREATE TRIGGER update_order_count_after_delete
AFTER DELETE ON orders
FOR EACH ROW
BEGIN
    UPDATE users
    SET order_count = order_count - 1
    WHERE id = OLD.user_id;
END;
GPT offers a way to work effectively and intuitively with databases, even for
those who are not SQL experts. It's a revolutionary tool that truly makes
advanced database operations accessible to the general public. However, it is
still crucial to carefully check each output to ensure data correctness and
security.
Software development often presents dilemmas, such as how to
handle situations when a getter has nothing to return. In this article, we'll
explore three strategies for implementing getters in PHP, which affect the
structure and readability of code, each with its own specific advantages and
disadvantages. Let's take a closer look.
Universal Getter with a Parameter
The first solution, used in Nette, is to create a single getter method that
can either return null or throw an exception if the value is not
available, depending on a boolean parameter. Here is an example of what the
method might look like:
public function getFoo(bool $need = true): ?Foo
{
    if (!$this->foo && $need) {
        throw new Exception("Foo not available");
    }
    return $this->foo;
}
The main advantage of this approach is that it eliminates the need to have
several versions of the getter for different use cases. A former disadvantage
was the poor readability of user code using boolean parameters, but this has
been resolved with the introduction of named parameters, allowing you to write
getFoo(need: false).
However, this approach may cause complications in static analysis, as the
signature implies that getFoo() can return null under
any circumstances. Tools like PHPStan allow explicit documentation of method
behavior through special annotations, improving code understanding and its
correct analysis:
/** @return ($need is true ? Foo : ?Foo) */
public function getFoo(bool $need = true): ?Foo
{
}
This annotation clearly defines what return types the method
getFoo() can generate depending on the value of the parameter
$need. However, for instance, PhpStorm does not understand it.
Pair of Methods: hasFoo() and getFoo()
Another option is to divide the responsibility into two methods:
hasFoo() to verify the existence of the value and
getFoo() to retrieve it. This approach enhances code clarity and is
intuitively understandable.
public function hasFoo(): bool
{
    return (bool) $this->foo;
}

public function getFoo(): Foo
{
    return $this->foo ?? throw new Exception("Foo not available");
}
The main problem is redundancy, especially in cases where the availability
check itself is a complex process. If hasFoo() performs complex
operations to verify if the value is available, and then this value is retrieved
again using getFoo(), these operations are repeated.
Hypothetically, the state of the object or data might change between the calls
to hasFoo() and getFoo(), leading to inconsistencies.
From a user's perspective, this approach may be less convenient as it forces
calling a pair of methods with repeating parameters. It also prevents the use of
the null-coalescing operator.
The advantage is that some static analysis tools allow defining a rule that
after a successful call to hasFoo(), no exception will be thrown in
getFoo().
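The convenience cost mentioned above is easiest to see at the call site. The following is only a sketch: the Holder class and the name property on Foo are hypothetical, added so the example can run on its own (PHP 8.0 or newer).

```php
<?php
declare(strict_types=1);

// Hypothetical value class with a name, so there is something to print.
final class Foo
{
    public function __construct(public string $name = 'default')
    {
    }
}

// Hypothetical holder class using the hasFoo()/getFoo() pair from above.
final class Holder
{
    public function __construct(private ?Foo $foo = null)
    {
    }

    public function hasFoo(): bool
    {
        return $this->foo !== null;
    }

    public function getFoo(): Foo
    {
        return $this->foo ?? throw new Exception('Foo not available');
    }
}

$holder = new Holder();

// A fallback needs a ternary with two calls. Writing
// `$holder->getFoo() ?? ...` would not help, because getFoo()
// throws instead of returning null.
$foo = $holder->hasFoo() ? $holder->getFoo() : new Foo('fallback');

echo $foo->name, "\n"; // fallback
```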
Methods getFoo() and getFooOrNull()
The third strategy is to split the functionality into two methods:
getFoo() to throw an exception if the value does not exist, and
getFooOrNull() to return null. This approach minimizes
redundancy and simplifies logic.
public function getFoo(): Foo
{
    return $this->getFooOrNull() ?? throw new Exception("Foo not available");
}

public function getFooOrNull(): ?Foo
{
    return $this->foo;
}
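A short usage sketch shows how this pair reads at the call site. As before, the Holder class and the name property on Foo are hypothetical and exist only to make the example runnable (PHP 8.0 or newer).

```php
<?php
declare(strict_types=1);

// Hypothetical value class with a name, so there is something to print.
final class Foo
{
    public function __construct(public string $name)
    {
    }
}

// Hypothetical holder class; the two getters mirror the ones above.
final class Holder
{
    public function __construct(private ?Foo $foo = null)
    {
    }

    public function getFoo(): Foo
    {
        return $this->getFooOrNull() ?? throw new Exception('Foo not available');
    }

    public function getFooOrNull(): ?Foo
    {
        return $this->foo;
    }
}

$empty = new Holder();
$full = new Holder(new Foo('real'));

// Optional access combines directly with ?? and the nullsafe operator.
$fallback = $empty->getFooOrNull() ?? new Foo('fallback');
$name = $empty->getFooOrNull()?->name;

// Required access has a non-nullable return type, so no null check is needed.
$required = $full->getFoo();

echo $fallback->name, "\n"; // fallback
var_dump($name);            // NULL
echo $required->name, "\n"; // real
```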
An alternative could be a pair getFoo() and
getFooIfExists(), but in this case, it might not be entirely
intuitive to understand which method throws an exception and which returns
null. A slightly more concise pair would be
getFooOrThrow() and getFoo(). Another possibility is
getFoo() and tryGetFoo().
Each of these approaches to implementing getters in PHP has its place
depending on the specific needs of the project and the preferences of the
development team. When choosing a suitable strategy, it's important to consider
the impact on readability, maintenance, and performance of the application. The
choice should reflect an effort to make the code as understandable and efficient
as possible.
Let's once and for all crack the eternal question that
divides the programming community: can HTML be parsed with regular expressions?
I decided to dive into the dark waters of
regular expressions to bring an answer (spoiler: yes, it's possible).
So, what exactly does an HTML document contain? It's a mix of text,
entities, tags, comments, and the special doctype tag. Let's first explore each
ingredient separately.
Entities
The foundation of an HTML page is text, which consists of ordinary characters
and special sequences called HTML entities. These can be either named, like
&nbsp; for a non-breaking space, or numerical, either in
decimal &#160; or hexadecimal &#xA0; format.
A regular expression capturing an HTML entity would look like this:
(?<entity>
    &
    (
        [a-z][a-z0-9]+ # named entity
        |
        \#\d+ # decimal number
        |
        \#x[0-9a-f]+ # hexadecimal number
    )
    ;
)
All regular expressions are written in extended mode, ignore case, and a
dot represents any character. In other words, they use the modifiers s, i,
and x, which together spell six.
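To try the entity expression in PHP, it can be passed to preg_match_all() with those three modifiers. This is only a sketch: the ~ delimiter and the sample sentence are my own choices, not something the article prescribes.

```php
<?php
declare(strict_types=1);

// The entity group from the text, wrapped in ~ delimiters with the
// modifiers s (dot matches newlines), i (ignore case), x (extended mode).
$entityPattern = <<<'REGEX'
~
(?<entity>
    &
    (
        [a-z][a-z0-9]+   # named entity
        |
        \#\d+            # decimal number
        |
        \#x[0-9a-f]+     # hexadecimal number
    )
    ;
)
~six
REGEX;

// Hypothetical sample text; the lone "&" near the end is not an entity.
$text = 'Fish &amp; chips&nbsp;cost &#163;5 or &#xA3;5 & more';

preg_match_all($entityPattern, $text, $matches);

// The named group collects every entity found in the text.
print_r($matches['entity']); // &amp;, &nbsp;, &#163;, &#xA3;
```

Because of the i modifier, the uppercase hexadecimal digits in &#xA3; match even though the character class lists only lowercase letters.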
Tags
These iconic elements make HTML what it is. A tag starts with
<, followed by the tag name, possibly a set of attributes, and
closes with > or />. Attributes can optionally
have a value, which can be enclosed in double, single, or no quotes. A regular
expression capturing an attribute would look like this:
(?<attribute>
    \s+ # at least one white space before the attribute
    [^\s"'<>=`/]+ # attribute name
    (
        \s* = \s* # equals sign before the value
        (
            " # value enclosed in double quotes
            (
                [^"] # any character except double quote
                |
                (?&entity) # or HTML entity
            )*
            "
            |
            ' # value enclosed in single quotes
            (
                [^'] # any character except single quote
                |
                (?&entity) # or HTML entity
            )*
            '
            |
            [^\s"'<>=`]+ # value without quotes
        )
    )? # value is optional
)
Notice that I am referring to the previously defined
entity group.
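References such as (?&entity) only work if the named group is defined somewhere in the same pattern. One way to arrange that in PCRE is a (?(DEFINE)...) block, which declares groups without matching anything. The article does not say how its parts are combined, so treat this as an assumption. The sketch below is hypothetical: it reuses the entity and attribute groups in condensed form and checks a simplified start tag for an a element.

```php
<?php
declare(strict_types=1);

// Groups inside (?(DEFINE)...) are only declared, never matched directly;
// the actual match starts at the ^ below the DEFINE block.
$startTagPattern = <<<'REGEX'
~
(?(DEFINE)
    (?<entity> & ( [a-z][a-z0-9]+ | \#\d+ | \#x[0-9a-f]+ ) ; )
    (?<attribute>
        \s+
        [^\s"'<>=`/]+                      # attribute name
        (
            \s* = \s*
            (
                " ( [^"] | (?&entity) )* "   # double-quoted value
                |
                ' ( [^'] | (?&entity) )* '   # single-quoted value
                |
                [^\s"'<>=`]+                 # unquoted value
            )
        )?
    )
)
^ <a (?&attribute)* \s* > $
~six
REGEX;

// Hypothetical inputs: one well-formed start tag, one with a missing quote.
$valid = '<a href="x.html" title=\'it&#39;s\' download>';
$broken = '<a href="unterminated>';

$validResult = preg_match($startTagPattern, $valid);
$brokenResult = preg_match($startTagPattern, $broken);

var_dump($validResult);  // int(1)
var_dump($brokenResult); // int(0)
```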
Elements
An element can represent either a standalone tag (so-called void element) or
paired tags. There is a fixed list of void element names by which they are
recognized. A regular expression for capturing them would look like this:
(?<void_element>
    < # start of the tag
    ( # element name
        img|hr|br|input|meta|area|embed|keygen|source|base|col
        |link|param|basefont|frame|isindex|wbr|command|track
    )
    (?&attribute)* # optional attributes
    \s*
    /? # optional /
    > # end of the tag
)
Other tags are thus paired and captured by this regular expression (I use a
reference to the content group, which we will define later):
(?<element>
    < # starting tag
    (?<element_name>
        [a-z][^\s/>]* # element name
    )
    (?&attribute)* # optional attributes
    \s*
    > # end of the starting tag
    (?&content)*
    </ # ending tag
    (?P=element_name) # repeat element name
    \s*
    > # end of the ending tag
)
A special case is elements like <script>, whose content
must be processed differently from other elements:
(?<special_element>
    < # starting tag
    (?<special_element_name>
        script|style|textarea|title # element name
    )
    (?&attribute)* # optional attributes
    \s*
    > # end of the starting tag
    (?> # atomic group
        .*? # smallest possible number of any characters
        </ # ending tag
        (?P=special_element_name)
    )
    \s*
    > # end of the ending tag
)
The lazy quantifier .*? ensures that the expression stops at the
first ending sequence, and the atomic group ensures that this stop is
definitive.
Comments
A typical HTML comment starts with the sequence <!-- and
ends with -->. A regular expression for HTML comments might
look like this:
(?<comment>
    <!--
    (?> # atomic group
        .*? # smallest possible number of any characters
        -->
    )
)
The lazy quantifier .*? again ensures that the expression stops
at the first ending sequence, and the atomic group ensures that this stop is
definitive.
Doctype
This is a historical relic that exists today only to switch the browser to
so-called standard mode. It usually looks like
<!doctype html>, but can contain other characters as well.
Here is the regular expression that captures it:
(?<doctype>
    <!doctype
    \s
    [^>]* # any character except '>'
    >
)
Putting It All Together
With the regular expressions ready for each part of HTML, it's time to
create an expression for the entire HTML 5 document:
\s*
(?&doctype)? # optional doctype
(?<content>
    (?&void_element) # void element
    |
    (?&special_element) # special element
    |
    (?&element) # paired element
    |
    (?&comment) # comment
    |
    (?&entity) # entity
    |
    [^<] # character
)*
We can combine all the parts into one complex regular expression. This is
it, a superhero among regular expressions with the ability to parse
HTML 5.
Final Notes
Even though we have shown that HTML 5 can be parsed using regular
expressions, the provided example is not useful for processing an HTML
document. It will fail on invalid documents. It will be slow. And so on. In
practice, regular expressions like the following are more commonly used (for
finding URLs of images):
<img.+?src=["'](.+?)["'].*?>
But this is a very unreliable solution that can lead to errors. This regexp
incorrectly matches custom tags such as <imgs-tag src="image.jpg">
and custom attributes like <img data-src="custom info">, and it
cuts the value short when the attribute contains a quote, as in
<img src="mcdonald's.jpg">. Therefore, it is
recommended to use specialized libraries. In the world of PHP, we're unlucky
because the DOM extension supports only the ancient, decaying HTML
4. Fortunately, PHP 8.4 promises an HTML 5 parser.
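The weaknesses of the naive image expression are easy to reproduce. The sketch below runs it over the three problem cases from the text plus one well-behaved tag. The ~ delimiter and the first sample are my additions.

```php
<?php
declare(strict_types=1);

// The naive pattern from the text, wrapped in ~ delimiters (my choice).
$pattern = '~<img.+?src=["\'](.+?)["\'].*?>~';

$samples = [
    '<img src="photo.jpg">',        // the intended case
    '<imgs-tag src="image.jpg">',   // custom tag, not an image
    '<img data-src="custom info">', // custom attribute, not src
    '<img src="mcdonald\'s.jpg">',  // apostrophe inside the value
];

$captured = [];
foreach ($samples as $html) {
    preg_match($pattern, $html, $m);
    // Group 1 is what the pattern believes to be the image URL.
    $captured[] = $m[1] ?? null;
}

print_r($captured); // photo.jpg, image.jpg, custom info, mcdonald
```

Only the first result is a correct image URL. The second and third are false positives, and the fourth is cut off at the apostrophe.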
A video from Microsoft, intended to be a dazzling
demonstration of Copilot's capabilities, is instead a tragically comedic
presentation of the decline in programming craftsmanship.
I'm referring to this video.
It's supposed to showcase the abilities of GitHub Copilot, including how to use
it to write a regular expression for searching <img> tags
with the hero-image class. However, the original code being
modified is as holey as Swiss cheese, something I would be embarrassed to use.
Copilot gets carried away and, instead of correcting it, continues in the
same vein.
The result is a regular expression that unintentionally matches other
classes, tags, attributes, and so on. Worse still, it fails if the
src attribute is listed before class.
I write about this because this demonstration of shoddy work, especially
considering the official nature of the video, is startling. How is it possible
that none of the presenters or their colleagues noticed this? Or did they notice
and decide it didn't matter? That would be even more disheartening. Teaching
programming requires precision and thoroughness, without which incorrect
practices can easily be propagated. The video was meant to celebrate the art of
programming, but I see in it a bleak example of how the level of programming
craftsmanship is falling into the abyss of carelessness.
Just to give a bit of a positive spin: the video does a good job of showing
how Copilot and GPT work, so you should definitely give it a look 🙂