Database administrators and developers have a love-hate relationship with the SQL update from select pattern. You've been there. You have two tables. One is a messy production table, and the other is a temporary table full of the "correct" values you spent three hours cleaning. You run the update. You wait. The terminal says "Query OK," but when you check the data, nothing changed. Or worse, every single row in your table now has the exact same name: "John Doe."
It’s frustrating.
The reality is that while updating one table based on values from another seems like a Day 1 SQL task, the syntax is a total nightmare because it isn't standardized. PostgreSQL does it one way. MySQL does it another. SQL Server? They have their own special FROM clause that makes sense only after three cups of coffee. If you're switching between environments, it’s incredibly easy to write a query that looks logically sound but executes like a disaster.
The Syntax Gap: Why "Update From Select" is So Confusing
Most people start with a basic UPDATE statement. You know the drill: UPDATE table SET column = value WHERE id = 1. Simple. But as soon as you need to pull that value from a secondary table—let's say a pricing_updates table—everything falls apart.
In a perfect world, SQL would be universal. It isn't.
If you're working in SQL Server (T-SQL), you’re likely using a JOIN inside your update statement. It feels natural to a lot of devs because it mirrors how we write SELECT queries. You're basically telling the engine, "Hey, join these two tables on the ID, and then pipe the data from Table B into Table A."
But try that same logic in Oracle, and the parser will throw a syntax error faster than you can hit enter. Oracle traditionally prefers correlated subqueries or the MERGE statement. If you're coming from a SQL Server background and landing in an Oracle environment, your first week is going to be spent yelling at your IDE.
MySQL uses a multi-table update syntax. It’s actually quite readable, but it differs just enough from the others to cause "muscle memory" errors. You list both tables right after the UPDATE keyword. It’s concise, but it lacks the explicit FROM clause that T-SQL users crave.
PostgreSQL and the "UPDATE FROM" Quirk
Postgres is usually the "clean" one in the family, but even it has a specific way of handling the SQL update from select logic. In Postgres, you use the FROM clause.
Here is a common trap: people include the target table in the FROM clause again.
Don't do that.
If you write UPDATE users SET status = updates.status FROM users JOIN updates..., you’ve just created a self-join that can result in a Cartesian product. The database gets confused about which "users" table you're actually trying to change. The correct way in Postgres is to list the source table in the FROM clause and then link it to the target table in the WHERE clause.
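If you want to see the correct pattern in action without spinning up a Postgres instance, SQLite (3.33 and later) borrowed the same UPDATE ... FROM syntax, so Python's built-in sqlite3 module works as a rough sandbox. The table and column names here are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users   (id INTEGER PRIMARY KEY, status TEXT);
    CREATE TABLE updates (id INTEGER PRIMARY KEY, status TEXT);
    INSERT INTO users   VALUES (1, 'pending'), (2, 'pending');
    INSERT INTO updates VALUES (1, 'active');
""")

# Correct Postgres-style form: the target table appears ONCE,
# the source table goes in FROM, and the link lives in WHERE.
conn.execute("""
    UPDATE users
    SET status = updates.status
    FROM updates
    WHERE users.id = updates.id
""")

print(conn.execute("SELECT id, status FROM users ORDER BY id").fetchall())
# [(1, 'active'), (2, 'pending')]
```

Row 2 has no match in the source table, and because the source sits in the FROM clause (not repeated as a self-join), it simply keeps its old value.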
It feels a bit backwards if you're used to standard joins. It's more like a filtered cross-join than a traditional inner join.
The Dangerous Case of the Correlated Subquery
Before modern JOIN syntax became the norm for updates, everyone used correlated subqueries. Honestly, some people still do. It looks like this:
```sql
UPDATE products
SET price = (SELECT new_price
             FROM price_list
             WHERE price_list.id = products.id);
```
This is dangerous. Really dangerous.
What happens if the subquery doesn't find a match for a specific row in the products table? In many SQL dialects, the subquery returns NULL. The update statement then happily proceeds to set your product price to NULL.
Imagine doing that to a 10-million-row production database. You’ve just wiped out your entire pricing catalog because the update list was missing five items.
If you must use this method, you have to wrap it in a WHERE EXISTS clause to ensure you’re only touching rows that actually have a corresponding match in your source data. It’s an extra step that people forget 90% of the time.
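Here's a minimal sketch of that WHERE EXISTS guard, using Python's sqlite3 as a stand-in database (the products and price_list tables and their values are made up for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products   (id INTEGER PRIMARY KEY, price REAL);
    CREATE TABLE price_list (id INTEGER PRIMARY KEY, new_price REAL);
    INSERT INTO products   VALUES (1, 9.99), (2, 19.99);
    INSERT INTO price_list VALUES (1, 12.49);   -- no row for product 2
""")

# The safe version: WHERE EXISTS restricts the update to rows that
# actually have a match, so product 2 keeps its old price instead
# of being set to NULL by the unmatched subquery.
conn.execute("""
    UPDATE products
    SET price = (SELECT new_price FROM price_list
                 WHERE price_list.id = products.id)
    WHERE EXISTS (SELECT 1 FROM price_list
                  WHERE price_list.id = products.id)
""")

print(conn.execute("SELECT id, price FROM products ORDER BY id").fetchall())
# [(1, 12.49), (2, 19.99)]
```

Drop the WHERE EXISTS clause from that statement and product 2's price becomes NULL — exactly the catalog-wiping failure described above.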
Why Your Performance is Tanking
Large-scale updates are expensive. When you perform a SQL update from select on a massive dataset, you aren't just changing data; you're triggering a cascade of background tasks.
- Transaction Logs: Every single row change is written to a log. If you update 5 million rows in one go, your log file might explode.
- Indexes: Every index on the columns you're updating must be recalculated.
- Locks: The database will likely place a lock on the table. If this is a live web app, your users are now seeing "Connection Timed Out" because your update is hogging the table.
Expert DBAs usually suggest batching. Instead of updating 1 million rows, update 5,000 at a time in a loop. It sounds slower, but it prevents the database from locking up and keeps the transaction log manageable.
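The batching idea can be sketched like this — a toy version in Python's sqlite3, with the batch size, table name, and row counts chosen arbitrarily for the demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, price REAL)")
conn.executemany("INSERT INTO products VALUES (?, 1.0)",
                 [(i,) for i in range(1, 10_001)])
conn.commit()

BATCH = 1000
total = 0
while True:
    # Each pass updates at most BATCH rows that still need the change,
    # then commits so locks are released and the log stays small.
    cur = conn.execute("""
        UPDATE products SET price = 2.0
        WHERE id IN (SELECT id FROM products
                     WHERE price <> 2.0 LIMIT ?)
    """, (BATCH,))
    conn.commit()
    total += cur.rowcount
    if cur.rowcount < BATCH:
        break

print(total)  # 10000
```

No single transaction ever carries more than one batch of changes, so a live app only ever waits on a small lock window between commits.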
Real-World Examples and Nuance
Let’s look at how this actually plays out in the wild. Suppose you're a developer at an e-commerce company. The marketing team gave you a CSV of 500 products that need a "Winter Sale" discount. You’ve imported that CSV into a table called winter_discounts.
In SQL Server, your query looks like this:

```sql
UPDATE p
SET p.Price = d.NewPrice
FROM Products p
INNER JOIN winter_discounts d
    ON p.ProductID = d.ProductID;
```
Notice the alias p used right after UPDATE. It’s a bit weird, but it works.
In MySQL, you’d do this:

```sql
UPDATE Products p, winter_discounts d
SET p.Price = d.NewPrice
WHERE p.ProductID = d.ProductID;
```
It's cleaner, but it’s essentially doing the same thing.
What if you have duplicate rows in your source table? This is where things get messy. If winter_discounts has two different prices for the same ProductID, which one wins?
The database won't tell you there's an error. It will just pick one—often the one it finds first in the physical storage—and move on. This is "non-deterministic" behavior. It’s a silent killer of data integrity. Always, always run a SELECT with a GROUP BY ... HAVING COUNT(*) > 1 on your source data before you even think about running an update.
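That pre-flight duplicate check is a one-liner. A quick sketch with sqlite3, using invented sample data where ProductID 2 has two competing discount rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE winter_discounts (ProductID INTEGER, NewPrice REAL);
    INSERT INTO winter_discounts VALUES (1, 5.0), (2, 8.0), (2, 9.0);
""")

# Any row returned here is a landmine: the update would pick one
# of the conflicting prices arbitrarily.
dupes = conn.execute("""
    SELECT ProductID, COUNT(*) AS n
    FROM winter_discounts
    GROUP BY ProductID
    HAVING COUNT(*) > 1
""").fetchall()

print(dupes)  # [(2, 2)]
```

An empty result means your source is safe to join on; anything else means you need to deduplicate before touching the target table.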
The "MERGE" Alternative
If you’re on SQL Server or Oracle, you have access to the MERGE statement. It’s often called "upsert" because it can handle updates and inserts in one go.
```sql
MERGE INTO Products AS target
USING winter_discounts AS source
    ON (target.ProductID = source.ProductID)
WHEN MATCHED THEN
    UPDATE SET target.Price = source.NewPrice;
```

(That’s the T-SQL form. Oracle’s MERGE has the same shape, but it doesn’t allow the AS keyword before table aliases.)
It’s verbose, but it’s incredibly powerful. It forces you to be explicit about what happens when a match is found and, crucially, what happens when it isn't. Many senior devs prefer MERGE for SQL update from select operations because it feels more intentional and less like a "hack" of the UPDATE syntax.
Testing Your Update Before It's Too Late
Never run an update on production without testing. That sounds like basic advice, but "cowboy coding" is alive and well.
The easiest way to test is to turn your UPDATE into a SELECT. If you're using a join, just change the UPDATE...SET part to SELECT target.old_value, source.new_value. Look at the results. Does it look right? Are there NULL values where there shouldn't be?
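Here's what that dry run looks like in practice — a sqlite3 sketch reusing the winter-sale tables from earlier, with made-up sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products         (ProductID INTEGER PRIMARY KEY, Price REAL);
    CREATE TABLE winter_discounts (ProductID INTEGER PRIMARY KEY, NewPrice REAL);
    INSERT INTO Products         VALUES (1, 20.0), (2, 30.0);
    INSERT INTO winter_discounts VALUES (1, 15.0);
""")

# Same join the UPDATE would use, but as a read-only preview:
# old value next to new value, nothing modified yet.
preview = conn.execute("""
    SELECT p.ProductID, p.Price AS old_price, d.NewPrice AS new_price
    FROM Products p
    JOIN winter_discounts d ON p.ProductID = d.ProductID
""").fetchall()

print(preview)  # [(1, 20.0, 15.0)]
```

One glance tells you only product 1 would change, and the new price isn't NULL. If the preview looks wrong, you've lost nothing.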
Another pro tip: wrap your update in a transaction.
```sql
BEGIN TRANSACTION;

UPDATE ...;

SELECT * FROM table WHERE ...;  -- Verify the change

-- If it's wrong: ROLLBACK;
-- If it's right: COMMIT;
```
This gives you a safety net. If you see that you’ve accidentally updated 100,000 rows instead of 100, you can undo the damage instantly. Once you COMMIT, there’s no going back without pulling out the backup tapes.
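The same safety net exists in application code. Python's sqlite3 wraps DML in an implicit transaction, so a rough sketch of the check-then-commit pattern looks like this (the table, data, and expected row count are invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
conn.execute("INSERT INTO t VALUES (1, 'keep'), (2, 'keep')")
conn.commit()

# The UPDATE below "forgot" its WHERE clause, so it hits every row.
cur = conn.execute("UPDATE t SET val = 'oops'")

if cur.rowcount != 1:   # we expected exactly one row to change
    conn.rollback()     # safety net: undo the damage instantly
else:
    conn.commit()       # looks right, make it permanent

print(conn.execute("SELECT val FROM t ORDER BY id").fetchall())
# [('keep',), ('keep',)]
```

Checking the affected-row count before committing is exactly the "verify the change" step from the SQL version, just automated.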
Actionable Steps for Clean Data Updates
To master the SQL update from select process and avoid the most common pitfalls, follow this workflow every time:
- Deduplicate your source: Run a query to ensure your "Select" table has unique keys. If it doesn't, your update will be unpredictable.
- Match your types: Ensure the data types in your source table match the target. Trying to shove a VARCHAR into a DECIMAL column will either fail or lead to weird rounding errors.
- Filter explicitly: Use a WHERE clause to limit the update to only the rows that actually need changing. Updating a row to the same value it already has is a waste of resources.
- Check the row count: Before you run the update, run a SELECT COUNT(*) with the same join logic. If the count is much higher than you expect, stop and re-evaluate your join conditions.
- Use Aliases: Table aliases (p for products, u for users) make your queries readable and prevent the engine from getting confused about which column belongs to which table.
- Index the Join Keys: If you're updating a large table, ensure that the columns used in the JOIN or WHERE clause (usually IDs) are indexed. Without indexes, the database has to perform a full table scan for every single row, which can turn a 5-second update into a 5-hour nightmare.
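To confirm an index is actually being used for the join-key lookup, ask the engine for its plan. Here's a small sqlite3 sketch (EXPLAIN QUERY PLAN is SQLite's flavor; other engines have their own EXPLAIN variants, and the table names here are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, price REAL);
    CREATE TABLE updates  (id INTEGER, new_price REAL);  -- no key yet
""")

# Index the source table's join key so each target row is matched
# with an index seek instead of a full scan of "updates".
conn.execute("CREATE INDEX idx_updates_id ON updates (id)")

plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT new_price FROM updates WHERE updates.id = 42
""").fetchall()
print(plan)  # the plan should report a SEARCH using idx_updates_id
```

If the plan says SCAN instead of SEARCH, the engine is walking the whole source table once per target row — the 5-hour-nightmare scenario.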
Reliable data management isn't about knowing the fanciest commands. It's about being defensive. The "update from select" pattern is a sharp tool—it's incredibly useful for synchronizing data across your ecosystem, but it'll cut you if you don't respect the syntax differences between engines.