Supposed you want to update or insert a row in a database table using SQL. Naturally, you begin with some matching criteria to see if the row exists. Then, if you find it, you update it. If you don't find a matching row, then you insert a new one. This type of action is known as an upsert. (The word upsert is a contraction of the words update and insert.)
The problem with upserts is that, in a multiuser universe, they must be completed within a single transaction. You certainly wouldn't want two clients searching simultaneously for the same row, both determining that the row doesn't exist, and both trying to insert it.
Fortunately, SQL provides a way to perform upserts.
It's called "MERGE
".
While I'm not quite ready to tell you all about what MERGE
is, I can share what is perhaps the most commonly enjoyed pattern for using it.
Suppose you wish to record the age of a person.
You know his ID and name, but you don't know whether he already has a record in the person
table.
This is where MERGE
comes to our rescue.
To get started, we first identify the table that we want to merge into:
MERGE INTO Person o
We alias the Person
table with "o
", signifying the "old" or "original" data.
Next, we declare the data we would like to merge:
USING ( SELECT :person_id person_id, :name name, :age age FROM DUAL ) n
The inner select generates a single row result set.
We might justifiably call this inner select a "table literal," because every column in the result set is a pseudo-n
", signifying the "new" data.
So what is "DUAL
" all about, anyway?
Due to SQL grammar restrictions, all SELECT
statements must have a FROM
clause, even if the selected fields don't reference a table.
To trivially satisfy this requirement, we use DUAL
, which is a 1-row, 1-column dummy table provided by Oracle.
(Other database implementations have similar dummy tables.)
It really doesn't matter what table you use, as long as it has exactly one row.
Now that you have declared your old and new data, the next step is to tell your database how to match the two data sets. This will help the database determine whether the final action will be an update or an insert. Here's how we provide the matching criteria:
ON ( o.person_id = n.person_id )
Pretty simple, right? Next, we tell the database what to do if it finds a match:
WHEN MATCHED THEN UPDATE SET age = n.age
The syntax of the above clause is a degenerate form of a typical UPDATE
statement.
Its main difference from a normal UPDATE
statement is that the table is not named.
This is not necessary because it was already declared at the beginning of the MERGE
statement.
Finally, we must also tell the database how to manage the insert case, in case it can't find a match:
WHEN NOT MATCHED THEN INSERT ( person_id, name, age ) VALUES ( n.person_id, n.name, n.age )
Note that this clause also resembles a normal INSERT
statement, except that the "INTO table-name
" part is missing.
Again, this isn't needed because the affected table has already been named elsewhere.
Well, that was the last part. Here's the whole thing, put together:
MERGE INTO Person o USING ( SELECT :person_id person_id, :name name, :age age FROM DUAL ) n ON ( o.person_id = n.person_id ) WHEN MATCHED THEN UPDATE SET age = n.age WHEN NOT MATCHED THEN INSERT ( person_id, name, age ) VALUES ( n.person_id, n.name, n.age )
This format will satisfy most of the upsert requirements you ever have, and you can stop here if you want to.
However, if you want to know how to make your MERGE
statement even prettier, keep reading.
The previously demonstrated format is great for explaining how MERGE
statements work, but it is more verbose than necessary.
When no subquerying is required to determine what data will be merged, we can refactor a MERGE
to entirely avoid an inner SELECT
.
MERGE INTO Person USING DUAL ON ( person_id = :person_id ) WHEN MATCHED THEN UPDATE SET age = :age WHEN NOT MATCHED THEN INSERT ( person_id, name, age ) VALUES ( :person_id, :name, :age )
MySQL has an analogue called "INSERT ... ON DUPLICATE KEY UPDATE
".
Please share your thoughts!