After practicing consistently how to write SQL queries, I feel relatively comfortable with writing subqueries, window functions and common expression tables (CTEs) now but one of the more difficult things to grasp is the order of execution of the commands and how that affects performance.
The goal of this SQL query is to get the highest salary from each department and the corresponding employee’s name.
I find CTEs the most intuitive to write since they are the most readable as well, especially in combination with a window function. However, this might affect the performance - depending on the database and dataset size, using RANK()
can be slower due to the need for sorting and ranking. Subqeuries might offer better performance. What ends up happening is that I end up writing window functions without necessarily needing to.
Option 1: Use Common Table Expression (CTE) with RANK()
WITH cte AS (
SELECT first_name, department,
RANK() OVER(PARTITION BY department ORDER BY salary DESC) AS ranked,
salary
FROM employee
)
SELECT MAX(salary) AS MaxSalary, department, first_name
FROM cte
WHERE ranked = 1
GROUP BY department;
Option 2: Use a Subquery with IN
Clause
For smaller datasets or simpler cases, this might perform better due to less overhead compared to ranking. Also, if multiple employees from the same department have the highest salary, this method will return all of them. In any case, performance also depends on the SQL database’s query optimizer after all. If only the highest salary is needed and we don’t need to handle ties or more complex ranking logic, the second approach with the IN
clause is more efficient and simpler. But if ranking is needed, the CTE plus using RANK()
is more suitable.
SELECT department,
first_name AS employee_name,
salary
FROM employee
WHERE (department, salary) IN (
SELECT department, MAX(salary)
FROM employee
GROUP BY department
);