how to upsert

Supposed you want to update or insert a row in a database table using SQL. Naturally, you begin with some matching criteria to see if the row exists. Then, if you find it, you update it. If you don't find a matching row, then you insert a new one. This type of action is known as an upsert. (The word upsert is a contraction of the words update and insert.)

The problem with upserts is that, in a multiuser universe, they must be completed within a single transaction. You certainly wouldn't want two clients searching simultaneously for the same row, both determining that the row doesn't exist, and both trying to insert it.

Fortunately, SQL provides a way to perform upserts. It's called "MERGE". While I'm not quite ready to tell you all about what MERGE is, I can share what is perhaps the most commonly enjoyed pattern for using it.

Suppose you wish to record the age of a person. You know his ID and name, but you don't know whether he already has a record in the person table. This is where MERGE comes to our rescue. To get started, we first identify the table that we want to merge into:

MERGE INTO Person o

We alias the Person table with "o", signifying the "old" or "original" data.

Next, we declare the data we would like to merge:

USING (
  SELECT
    :person_id person_id,
    :name      name,
    :age       age
  FROM
    DUAL
) n

The inner select generates a single row result set. We might justifiably call this inner select a "table literal," because every column in the result set is a pseudo-column, or, in other words, an evaluated column that doesn't exist in any database or view. We alias the inner select with "n", signifying the "new" data.

So what is "DUAL" all about, anyway? Due to SQL grammar restrictions, all SELECT statements must have a FROM clause, even if the selected fields don't reference a table. To trivially satisfy this requirement, we use DUAL, which is a 1-row, 1-column dummy table provided by Oracle. (Other database implementations have similar dummy tables.) It really doesn't matter what table you use, as long as it has exactly one row.

Now that you have declared your old and new data, the next step is to tell your database how to match the two data sets. This will help the database determine whether the final action will be an update or an insert. Here's how we provide the matching criteria:

ON (
  o.person_id = n.person_id
)

Pretty simple, right? Next, we tell the database what to do if it finds a match:

WHEN MATCHED THEN UPDATE SET
  age = n.age

The syntax of the above clause is a degenerate form of a typical UPDATE statement. Its main difference from a normal UPDATE statement is that the table is not named. This is not necessary because it was already declared at the beginning of the MERGE statement.

Finally, we must also tell the database how to manage the insert case, in case it can't find a match:

WHEN NOT MATCHED THEN INSERT (
  person_id,
  name,
  age
)
VALUES (
  n.person_id,
  n.name,
  n.age
)

Note that this clause also resembles a normal INSERT statement, except that the "INTO table-name" part is missing. Again, this isn't needed because the affected table has already been named elsewhere.

Well, that was the last part. Here's the whole thing, put together:

MERGE INTO Person o
USING (
  SELECT
    :person_id person_id,
    :name      name,
    :age       age
  FROM
    DUAL
) n
ON (
  o.person_id = n.person_id
)
WHEN MATCHED THEN UPDATE SET
  age = n.age
WHEN NOT MATCHED THEN INSERT (
  person_id,
  name,
  age
)
VALUES (
  n.person_id,
  n.name,
  n.age
)

This format will satisfy most of the upsert requirements you ever have, and you can stop here if you want to. However, if you want to know how to make your MERGE statement even prettier, keep reading.

a shorter format

The previously demonstrated format is great for explaining how MERGE statements work, but it is more verbose than necessary. When no subquerying is required to determine what data will be merged, we can refactor a MERGE to entirely avoid an inner SELECT.

MERGE INTO Person
USING DUAL
ON (
  person_id = :person_id
)
WHEN MATCHED THEN UPDATE SET
  age = :age
WHEN NOT MATCHED THEN INSERT (
  person_id,
  name,
  age
)
VALUES (
  :person_id,
  :name,
  :age
)

appendix

MySQL has an analogue called "INSERT ... ON DUPLICATE KEY UPDATE".


Please share your thoughts!