FK, CHECK, ENUM or DOMAIN. That is the question.

We have a customer that recently asked me to comment on which I would use for a particular problem. This is a simple validating lookup. For example, CHECK(VALUE IN ('foo','bar')). Should we use a CHECK constraint, FK, ENUM or DOMAIN?

A CHECK constraint is easy to apply and has simple syntax. It is also extremely flexible in solving other types of validating problems. If your valid values change you must DROP CONSTRAINT and ADD CONSTRAINT. You can not add an element to the CHECK.

A Foreign Key creates the requirement of a lookup table. It also offers the easiest management of valid values. You just INSERT, UPDATE or DELETE from the lookup table. If you are a smart monkey and using natural keys versus artificial ones, you can avoid the JOIN on SELECT as well.

ENUM registers as a type in PostgreSQL. This means if you use an ENUM extensively you are basically locking yourself into the use of the type. In short if you need to modify an ENUM you drop the ENUM and recreate it. You can't drop an ENUM if a relation is using it. There are some interesting functions available with ENUM but I am having a hard time seeing a use case for the type as a whole. An ENUM type in theory lends itself specifically to this type of problem so I have included it.

A DOMAIN for this problem suffers from the same problems as ENUM as it registers as a type. However a DOMAIN is more flexible as you can apply complex logic to the validation (just as you can with a CHECK). For example a DOMAIN could contain the regex to validate if a email address is correctly formed. I have used domains many times in the past to create complex validating types. They are useful.

So what does all this boil down to? I have listed the pros and cons of managing each method above but what I haven't mentioned is performance. What is the particular performance bottleneck for each method? Read on, to find out for yourself. First I created a table for the CHECK constraint test:

CREATE TABLE check_test (
   foo text CHECK(foo IN ('text','html')), 
   bar int);
Then the tables for the FK test:
CREATE TABLE fk (foo text PRIMARY KEY);
CREATE TABLE fk_test (
   foo text REFERENCES fk(foo),
   bar int);
I then created a series of 10000 queries. Each query executed 5000 times individually.
INSERT INTO check_test VALUES('text',5);
INSERT INTO check_test VALUES('html',5);
CHECK Test: 1
real	0m10.144s
user	0m0.200s
sys	0m0.292s
CHECK Test: 2
real	0m11.667s
user	0m0.356s
sys	0m0.256s
O.k. so what about Foreign Keys? FK Test 1:
real	0m11.106s
user	0m0.356s
sys	0m0.252s
FK Test 2:
real	0m11.566s
user	0m0.256s
sys	0m0.272s
O.k. about the same. What about if all 10000 are in a single transaction? CHECK Test: 3 single transaction
real	0m1.143s
user	0m0.184s
sys	0m0.180s
FK Test: 3 single transaction
real	0m1.476s
user	0m0.184s
sys	0m0.228s
O.k. this is closer than I thought it would be. I expected an FK to be much slower and in my individual tests it actually is. Just out of curiousity, what about ENUM?
CREATE TYPE content_type AS ENUM('text','html');
CREATE TABLE enum_test (foo content_type, bar int);
ENUM Test: 1
real	0m9.124s
user	0m0.288s
sys	0m0.196s
ENUM Test: 2 single transaction
real	0m1.025s
user	0m0.152s
sys	0m0.192s
O.k. one last test... what about a DOMAIN?
CREATE DOMAIN d_content_type AS text CHECK(VALUE IN ('text','html'));
CREATE TABLE domain_test (foo d_content_type, bar int);
DOMAIN Test: 1
real	0m10.860s
user	0m0.340s
sys	0m0.260s
Domain Test: 2 single transaction
real	0m1.316s
user	0m0.172s
sys	0m0.188s