Fixing foreign key deadlocks, part 2

In the previous article, I explained the problem with foreign key checks checks obtaining too strong of a lock, and promised that we would be attempting to fix it.

Here is my proposal:

  1. Create a new SELECT locking clause. For now, we're calling it SELECT FOR KEY LOCK
  2. This will acquire a new type of lock in the tuple, dubbed a "keylock".
  3. This lock will conflict with DELETE, SELECT FOR UPDATE, and SELECT FOR SHARE.
  4. It also conflicts with UPDATE if the UPDATE modifies an indexed attribute.

That's the gist of it. The end effect is that you are allowed to UPDATE a tuple that's being used in a foreign key check, as long as you don't change any indexed columns.

This idea was suggested by Simon Riggs in the pgsql-hackers thread referenced in my previous article, and further debugged and improved by the developers in the ensuing discussion to a reasonably workable level — though it remains ticklish.

The interesting implementation details are:

  1. We need to use a new bit in t_infomask. 0x0010 is currently unused so we will grab that.
  2. Key-locking a tuple means setting the XMAX_KEY_LOCK bit, and setting the Xmax to the locker. If the tuple is already key-locked, a MultiXactId needs to be created from the original locker(s) and the new transaction.
  3. The original tuple needs to be marked with the Cmax of the locking command, to prevent it from being seen in the same transaction.
  4. A non-conflicting update to the tuple must carry forward some fields from the original tuple into the updated copy. Those include Xmax, XMAX_IS_MULTI, XMAX_KEY_LOCK, and the CommandId and COMBO_CID flag.

If you're curious about also carrying forward COMBO_CID: at first I thought this wasn't necessary, because the only transaction that might care about those bits is the one creating the tuple, thus no transaction can do the necessary UPDATE. However, if a transaction creates the tuple, then modifies it in a subtransaction, then aborts the subtransaction, then key-locks the tuple, and finally updates it again, the last version of the tuple needs to have the correct CommandId information. This is fairly corner case and I would be surprised to see it happen in reality. But this is no excuse for not supporting the case.

(Offhand, I don't see any other fields that need to be carried forward, but I'm open to the possibility that I'm missing some.)

Note that the lock would be even more granular if instead of checking for an attribute of any index, we were to check for the particular UNIQUE index that implements the foreign key being verified. We choose not to do that for now, because it brings excessive modularisation breakage and possibly extra locking considerations.