How to Deal with the Chaos of Empty Strings and NULL Values in MySQL?

You know the situation – you create a query WHERE street = '', but the system doesn't return all the records you'd expect. Or your LEFT JOIN doesn't work as it should. The reason is a common problem in databases: inconsistent use of empty strings and NULL values. Let's see how to solve this chaos once and for all.

When to Use NULL and When to Use an Empty String?

In theory, the difference is clear: NULL means “value is not set”, while an empty string means “value is set and is empty”. Let's look at a real example from an e-commerce site, where we have an orders table. Each order has a required delivery address and an optional billing address for cases where the customer wants to bill to a different location (typical checkbox “Bill to a different address”):

CREATE TABLE orders (
    id INT PRIMARY KEY,
    delivery_street VARCHAR(255) NOT NULL,
    delivery_city VARCHAR(255) NOT NULL,
    billing_street VARCHAR(255) NULL,
    billing_city VARCHAR(255) NULL
);

The billing_city and billing_street fields are nullable because the billing address is optional. But there's a difference between them. While a street can be legitimately empty (villages without street names) or unset (delivery address is used), the city must always be filled in if a billing address is used. So either billing_city contains a city name, or it's NULL – in which case the delivery address is used.

The Reality of Large Databases

In practice, both approaches often end up being mixed in the database. There can be several reasons:

Changes in application logic over time (e.g., switching from one ORM to another)
Different teams or programmers using different conventions
Buggy data migrations when merging databases
Legacy code that behaves differently than new code
Application bugs that occasionally let through an empty string instead of NULL or vice versa

This leads to situations where we have a mix of values in the database and need to write complex conditions:

SELECT * FROM tbl
WHERE foo = '' OR foo IS NULL;

Even worse is that NULL behaves unintuitive when comparing:

SELECT * FROM tbl WHERE foo = ''; -- doesn't include NULL
SELECT * FROM tbl WHERE foo <> ''; -- also doesn't include NULL

-- we must use
SELECT * FROM tbl WHERE foo IS NULL;
SELECT * FROM tbl WHERE foo <=> NULL;

This inconsistency in comparison operators' behavior is another reason why it's better to use only one way of representing empty values in the database.

Why Avoid the Dual Approach

A similar situation exists in JavaScript, where we have null and undefined. After years of experience, many JavaScript developers concluded that distinguishing between these two states brings more problems than benefits and decided to use only the system-native undefined.

In the database world, the situation is similar. Instead of constantly dealing with whether something is an empty string or NULL, it's often simpler to choose one approach and stick to it. For example, Oracle database essentially equates empty strings and NULL values, thus elegantly avoiding this problem. It's one of the places where Oracle deviates from the SQL standard, but it simplifies working with empty/NULL values.

How can we achieve something similar in MySQL?

What Do We Actually Want to Enforce?

For required fields (NOT NULL), we want to enforce that they always contain meaningful values. That means preventing empty strings (or strings containing only spaces)
For optional fields (NULL), we want to prevent storing empty strings. When a field is optional, NULL should be the only representation of an “unfilled value”. Mixing both approaches in one column leads to problems with querying and JOIN operations, as we showed above.

Solution in MySQL

Historically in MySQL, it made sense to use exclusively empty strings ('') instead of NULL values. It was the only approach that could be enforced using the NOT NULL constraint. If we wanted an automatically consistent database, this was the only way.

However, there's one important case where this approach fails – when we need a unique index on the column. MySQL considers multiple empty strings as the same value, while multiple NULL values are considered different.

However, since MySQL version 8.0.16, we can use CHECK constraints and have more control over what values we allow. We can, for example, enforce that a column will either be NULL or contain a non-empty string:

CREATE TABLE users (
    id INT PRIMARY KEY,

    -- Required field - must contain some non-empty text
    email VARCHAR(255) NOT NULL UNIQUE
        CONSTRAINT email_not_empty      -- rule name
        CHECK (email != ''),

    -- Optional field - either NULL or non-empty text
    nickname VARCHAR(255)
        CONSTRAINT nickname_not_empty
        CHECK (nickname IS NULL OR nickname != '')
);

When creating a CHECK constraint, it's important to give it a meaningful name using the CONSTRAINT keyword. This way, we get a meaningful error message Check constraint ‘nickname_not_empty’ is violated instead of a generic constraint violation notice. This significantly helps with debugging and application maintenance.

The problem isn't just empty strings, but also strings containing only spaces. We can improve the CHECK constraint solution using the TRIM function:

CREATE TABLE users (
    id INT PRIMARY KEY,
    email VARCHAR(255) NOT NULL UNIQUE
        CONSTRAINT email_not_empty
        CHECK (TRIM(email) != ''),
   ...
);

Now these validation bypass attempts won't work either:

INSERT INTO users (email) VALUES ('   ');  -- all spaces

Practical Solution in Nette Framework

A consistent approach to empty values needs to be handled at the application level too. If you're using Nette Framework, you can use an elegant solution using the setNullable() method:

$form = new Form;
$form->addText('billing_street')
    ->setNullable(); // empty input transforms to NULL

Recommendations for Practice

At the start of the project, decide on one approach:
- Either use only NULL for missing values
- Or use only empty strings for empty/missing values
Document this decision in the project documentation
Use CHECK constraints to enforce consistency
For existing projects:
- Conduct an audit of the current state
- Prepare a migration script to unify the approach
- Don't forget to adjust application logic

With this approach, you'll avoid many problems with comparing, indexing, and JOIN operations that arise from mixing NULL and empty strings. Your database will be more consistent and queries simpler.