10 SQL Tricks That I Like
A list of 10 SQL tricks I have picked up over my time working with databases that could help improve your queries
At my previous job, I wrote raw SQL a lot for our massive internal database. Over time, I picked up a few little tricks that allowed me to condense my queries or improve their performance. Recently, I've been getting the itch to revisit some SQL stuff and I'm going to use this blog post as an excuse to write some raw SQL without the help of Ecto
or ActiveRecord
or anything like that. Hopefully, this exercise of me goofing around will help you learn something. Guess we'll see. While none of these tricks are particularly complicated, developers less familiar with SQL may not be familiar with them.
1. USING
#
SELECT friend_id, e.name AS entree, d.name AS dessert
FROM entrees e
INNER JOIN desserts d USING (friend_id);
USING
is helpful for simplifying your join when you are joining two tables on columns with the same name. In the above example, you have two tables which are lists of entrees and desserts and the ID of the friend who knows how to prepare them. If you have offers from multiple friends to come over for dinner, you want to know what possible combinations of entree and dessert each friend can make for you before you decide which offer to accept, so you run this query. USING
makes it so you do not have to write out ON e.friend_id = d.friend_id
, but what I find particularly helpful is that you no longer have to qualify which friend_id
you are referring to. This prevents the ever-frustrating error ERROR: column reference "friend_id" is ambiguous
when you forget to put e.
or d.
in front of friend_id
.
2. COALESCE
#
SELECT c.id as client_id, COALESCE(c.email, c.phone) AS contact_method
FROM clients c;
COALESCE
takes a list of columns (or other info) and returns the first non-NULL
one. In the example, you have a table of your business's clients and you want to get a list of them with a singular way to contact them. You prefer to contact them over email, so if they provide you with an email address, return the email address. Otherwise, you will settle for returning the phone number if the email address is not present. Thanks to COALESCE
you now just have one column and you do not need to select both columns individually and compare them.
3. CASE
#
SELECT CASE
WHEN c.country = 'US' THEN c.state
ELSE c.country END AS region
FROM clients c;
CASE
operates similarly to if, else if, and else statements. It returns what comes after THEN
for the first WHEN
statement that is true. If none of the WHEN
statements are true, it returns what is under the ELSE
statement. In the example, you want to send a gift to each of your clients because of how much you appreciate them, but first, you want to approximate shipping costs so you need to find out where they all live. If they don't live in the United States, you're fine with just calculating the shipping costs based on the country, but otherwise, you want to know which specific state they live in. CASE
allows you to check the country condition and return the shipping region based off that check.
4. Semi Joins (EXISTS
/IN
) #
SELECT u.id as user_id
FROM users u
WHERE EXISTS (
SELECT 1
FROM models m
WHERE m.user_id = u.id
AND m.updated_at >= CURRENT_TIMESTAMP - INTERVAL '30 days'
);
SELECT u.id as user_id
FROM users u
WHERE u.id IN (
SELECT m.user_id
FROM models m
WHERE m.updated_at >= CURRENT_TIMESTAMP - INTERVAL '30 days'
);
Semi Joins such as EXISTS
and IN
allow you to check for the existence of matching rows in other tables without having to join to the table itself. In these two examples, which accomplish the same thing, you want to get a list of your software's users who have worked on one of their models within the last 30 days so you know who to reach out to for feedback on some of the newer features. You could just do an INNER JOIN
paired with DISTINCT
or GROUP BY
to get the same result, but the semi joins provide a more performant solution since they only have to find one matching row before they can return while the INNER JOIN
finds all the matching rows which are then filtered down by the DISTINCT
or GROUP BY
into one row. Worth noting that if you are joining on a unique column, regular joins are equally performant to these semi joins.
5. String Pattern Matching (LIKE
/ILIKE
/~
/~*
) #
SELECT b.title, b.author
FROM books b
WHERE b.title LIKE '%Pirate%';
SELECT b.title, b.author
FROM books b
WHERE b.title ~ 'Pirate';
SELECT b.title, b.author
FROM books b
WHERE b.title ILIKE '%pirate%';
SELECT b.title, b.author
FROM books b
WHERE b.title ~* 'pirate';
If you need to pattern match on a string, you are provided with quite a few options. The most performant option available to you is LIKE
, which uses the built-in SQL matching including %
for 0 or more characters. You also have the Postgres-exclusive ~
which has the power of regex behind it, if you need a more complicated match. Then, you have the case insensitive versions of both, ILIKE
and ~*
respectively. For the example, we see four versions of trying to find a book that mentions "Pirate" in the title, since you want to read an old school high seas adventure. Personally, I have always loved the ~*
for quick queries where I just need to find something in a table quickly and the performance is not much of an issue, but I would recommend using LIKE
or ILIKE
for production code if possible.
6. UNION
(or UNION ALL
) #
SELECT s.pricing_id, s.price
FROM snacks s
UNION
SELECT t.pricing_id, t.price
FROM tickets t
UNION
SELECT m.pricing_id, m.price
FROM memberships m
ORDER BY price;
UNION
allows you to combine the results of multiple queries into one result set. In the example, we have a theater which has three types of products that they store in separate tables due to the different information required for each product type. They have snacks from the snack bar, tickets for the shows, and memberships that allow you to support the theater while also getting discounted prices on other purchases. The theater wants a list of all the prices of their products along with the ID used by their POS system for record-keeping purposes. UNION
allows them to take the results from all three tables and bring the distinct ones together. This works as long as each SELECT
returns the same number of columns and they columns have similar types. If you are willing to not worry about making each row distinct, you can use UNION ALL
instead.
7. FILTER
#
SELECT m.id as member_id, COUNT(*) as member_count, COUNT(*) FILTER(WHERE m.expiration_date > current_date) as active_member_count
FROM members_m;
FILTER
gives you the ability to run an aggregate function over a subset of the overall result set. Let's go back to the theater from the previous example for this one. Now, they want to know how many total members they have ever had and how many active members they currently have. To get overall total, you can just run COUNT
, but you can run COUNT
again with the additional FILTER
to only get the members you have not hit their expiration date yet.
8. Windows (OVER
) #
WITH info AS (
SELECT b.label, b.price, b.category,
SUM(b.price) OVER (PARTITION BY b.category) as total_category_price
FROM bills b
)
SELECT i.label, i.price, i.category,
(i.price::float / i.total_category_price) * 100 AS percentage
FROM info i
WITH info AS (
SELECT b.label, b.price, b.category,
SUM(b.price) OVER w
FROM bills b
WINDOW w AS (PARTITION BY b.category)
)
SELECT i.label, i.price, i.category,
(i.price::float / i.total_category_price) * 100 AS percentage
FROM info i
Windows provide a variety of new functionalities to you (I'll touch on some of these in the next couple of points), but one of the things they allow you to do is have a column call an aggregate function without having to run a GROUP BY
. In these two examples, you see two different ways of defining a window. One is inline and the other is separated out at the bottom of the internal query. In both examples, we are looking at a table of our bills and we want a list of the bills alongside what percentage of our budget for each billing category (Food, Entertainment, etc.) we are spending on that bill so we know what will be most beneficial to cancel. The window allows us to find the total of the category by using PARTITION BY
in the same query so we can just easily calculate the percentage without any joins to multiple subqueries. Windows also support using ORDER BY
which we will see more examples of coming up.
9. LAG
/LEAD
#
SELECT us.user_id,
us.log_in_timestamp - LAG(us.log_out_timestamp, 1) OVER (PARTITION BY us.user_id ORDER BY us.log_in_timestamp)
FROM user_sessions us
LAG
gives you the ability to access data from a previous row (as determined by the window) of the result set. LEAD
does the opposite, giving you data from an upcoming row. The second parameter is how many rows in the past or future you are traveling to get the information. In the example, you want to find out how much time a user is putting your software down before they log back in again and since you save log in and log out times for some reason, you can do that. You just need to get the log out time from the previous row determined by the ORDER BY
in the window and compare it to the current log in time. You also want to make sure you are not getting the times from different users mixed in together so you PARTITION BY
the user in the window as well. You could also do the opposite by using LEAD
to get the log in timestamp from the next row and comparing it to the log out timestamp from the current row. Overall, I find these helpful when comparing data sequentially.
10. ROW_NUMBER
#
SELECT m.*, ROW_NUMBER() OVER (PARTITION BY m.user_id ORDER BY m.updated_at DESC) AS rn
FROM models m
HAVING rn >= 2
ROW_NUMBER
does basically what you would expect it to do. It assigns each row in the result set a number based off the window. If you do not define a window, it will just assign the number based off the ordering of the overall query. In this example, we only want our users to be able to have two models so we don't crash our database with too many, so we want to get a list of all the users' models that are not one of their two most recently updated in order to delete them. ROW_NUMBER
allows us to PARTITION BY
the user and order the numbering based on when the model was updated. Then, we can use the HAVING
clause to filter out all the rows that are one of the two most recently edited based on the returned row number.
The End #
I hope you find some of this helpful or learned something by reading this. At the very least, I enjoyed coming up with examples and writing this, so I'll take that as a win.