You know the situation – you create a query
WHERE street = ''
, but the system doesn't return all the records
you'd expect. Or your LEFT JOIN doesn't work as it should. The reason is a
common problem in databases: inconsistent use of empty strings and NULL values.
Let's see how to solve this chaos once and for all.
When to Use NULL and When to Use an Empty String?
In theory, the difference is clear: NULL means “value is not set”, while an empty string means “value is set and is empty”. Let's look at a real example from an e-commerce site, where we have an orders table. Each order has a required delivery address and an optional billing address for cases where the customer wants to bill to a different location (typical checkbox “Bill to a different address”):
CREATE TABLE orders (
id INT PRIMARY KEY,
delivery_street VARCHAR(255) NOT NULL,
delivery_city VARCHAR(255) NOT NULL,
billing_street VARCHAR(255) NULL,
billing_city VARCHAR(255) NULL
);
The billing_city
and billing_street
fields are
nullable because the billing address is optional. But there's a difference
between them. While a street can be legitimately empty (villages without street
names) or unset (delivery address is used), the city must always be filled in if
a billing address is used. So either billing_city
contains a city
name, or it's NULL – in which case the delivery address is used.
The Reality of Large Databases
In practice, both approaches often end up being mixed in the database. There can be several reasons:
- Changes in application logic over time (e.g., switching from one ORM to another)
- Different teams or programmers using different conventions
- Buggy data migrations when merging databases
- Legacy code that behaves differently than new code
- Application bugs that occasionally let through an empty string instead of NULL or vice versa
This leads to situations where we have a mix of values in the database and need to write complex conditions:
SELECT * FROM tbl
WHERE foo = '' OR foo IS NULL;
Even worse is that NULL behaves unintuitive when comparing:
SELECT * FROM tbl WHERE foo = ''; -- doesn't include NULL
SELECT * FROM tbl WHERE foo <> ''; -- also doesn't include NULL
-- we must use
SELECT * FROM tbl WHERE foo IS NULL;
SELECT * FROM tbl WHERE foo <=> NULL;
This inconsistency in comparison operators' behavior is another reason why it's better to use only one way of representing empty values in the database.
Why Avoid the Dual Approach
A similar situation exists in JavaScript, where we have null
and undefined
. After years of experience, many JavaScript
developers concluded that distinguishing between these two states brings more
problems than benefits and decided to use only the system-native
undefined
.
In the database world, the situation is similar. Instead of constantly dealing with whether something is an empty string or NULL, it's often simpler to choose one approach and stick to it. For example, Oracle database essentially equates empty strings and NULL values, thus elegantly avoiding this problem. It's one of the places where Oracle deviates from the SQL standard, but it simplifies working with empty/NULL values.
How can we achieve something similar in MySQL?
What Do We Actually Want to Enforce?
- For required fields (
NOT NULL
), we want to enforce that they always contain meaningful values. That means preventing empty strings (or strings containing only spaces) - For optional fields (
NULL
), we want to prevent storing empty strings. When a field is optional, NULL should be the only representation of an “unfilled value”. Mixing both approaches in one column leads to problems with querying and JOIN operations, as we showed above.
Solution in MySQL
Historically in MySQL, it made sense to use exclusively empty strings ('')
instead of NULL values. It was the only approach that could be enforced using
the NOT NULL
constraint. If we wanted an automatically consistent
database, this was the only way.
However, there's one important case where this approach fails – when we need a unique index on the column. MySQL considers multiple empty strings as the same value, while multiple NULL values are considered different.
However, since MySQL version 8.0.16, we can use CHECK constraints and have more control over what values we allow. We can, for example, enforce that a column will either be NULL or contain a non-empty string:
CREATE TABLE users (
id INT PRIMARY KEY,
-- Required field - must contain some non-empty text
email VARCHAR(255) NOT NULL UNIQUE
CONSTRAINT email_not_empty -- rule name
CHECK (email != ''),
-- Optional field - either NULL or non-empty text
nickname VARCHAR(255)
CONSTRAINT nickname_not_empty
CHECK (nickname IS NULL OR nickname != '')
);
When creating a CHECK constraint, it's important to give it a meaningful name using the CONSTRAINT keyword. This way, we get a meaningful error message Check constraint ‘nickname_not_empty’ is violated instead of a generic constraint violation notice. This significantly helps with debugging and application maintenance.
The problem isn't just empty strings, but also strings containing only
spaces. We can improve the CHECK constraint solution using the TRIM
function:
CREATE TABLE users (
id INT PRIMARY KEY,
email VARCHAR(255) NOT NULL UNIQUE
CONSTRAINT email_not_empty
CHECK (TRIM(email) != ''),
...
);
Now these validation bypass attempts won't work either:
INSERT INTO users (email) VALUES (' '); -- all spaces
Practical Solution in Nette Framework
A consistent approach to empty values needs to be handled at the application
level too. If you're using Nette Framework, you can use an elegant solution
using the setNullable()
method:
$form = new Form;
$form->addText('billing_street')
->setNullable(); // empty input transforms to NULL
Recommendations for Practice
- At the start of the project, decide on one approach:
- Either use only NULL for missing values
- Or use only empty strings for empty/missing values
- Document this decision in the project documentation
- Use CHECK constraints to enforce consistency
- For existing projects:
- Conduct an audit of the current state
- Prepare a migration script to unify the approach
- Don't forget to adjust application logic
With this approach, you'll avoid many problems with comparing, indexing, and JOIN operations that arise from mixing NULL and empty strings. Your database will be more consistent and queries simpler.
Leave a comment