I wanted to try out something that first seemed to be quite simple, but then I had to see that I didn’t manage to get the desired result in one single query. Maybe one of you has an idea how to do it.
Here’s the table that you need for this example and a little Stored Procedure to quickly fill the table:
CREATE TABLE tt (
id int unsigned NOT NULL auto_increment,
d1 int unsigned NOT NULL,
d2 int unsigned NOT NULL,
PRIMARY KEY (id),
KEY d1 (d1)
) ENGINE=InnoDB;
… and the Stored Procedure …
DELIMITER //
DROP PROCEDURE IF EXISTS insertIntoTT //
CREATE PROCEDURE insertIntoTT(in _rows int)
begin
declare counter int default 0;
while counter < _rows do
set @a := floor(rand() * 20) + 1;
set @b := floor(rand() * 1000000);
insert into tt (d1, d2) values (@a, @b);
set counter := counter + 1;
end while;
end //
DELIMITER ;
With
CALL insertIntoTT(10000);
you can insert 10,000 rows and end up with a table like this:
mysql> SELECT * FROM tt LIMIT 10;
+----+----+--------+
| id | d1 | d2 |
+----+----+--------+
| 1 | 16 | 578121 |
| 2 | 12 | 209260 |
| 3 | 6 | 772868 |
| 4 | 1 | 801153 |
| 5 | 19 | 266106 |
| 6 | 11 | 845643 |
| 7 | 13 | 672265 |
| 8 | 9 | 162255 |
| 9 | 11 | 32461 |
| 10 | 14 | 156472 |
+----+----+--------+
10 rows in set (0.00 sec)
The numbers in column d1 are in the range from 1 to 20 and those in d2 are from 0 to 999,999.
Now I wanted to get the top 3 values from the d2 column for each value in d1, the output should look like this (at the bottom of this article, I’ll show you the Stored Procedure that produced the output):
mysql> CALL getOutput();
+------+----+--------+
| id | d1 | d2 |
+------+----+--------+
| 8590 | 1 | 992748 |
| 5378 | 1 | 991187 |
| 8010 | 1 | 989804 |
| 4271 | 2 | 995870 |
| 6825 | 2 | 995212 |
| 7195 | 2 | 993449 |
| 7518 | 3 | 998659 |
| 327 | 3 | 994499 |
| 7523 | 3 | 992530 |
| 9637 | 4 | 997258 |
| 9505 | 4 | 986400 |
| 3577 | 4 | 984371 |
| 4474 | 5 | 998324 |
| 9616 | 5 | 990006 |
| 7856 | 5 | 987046 |
| 8149 | 6 | 995602 |
| 5339 | 6 | 994606 |
| 4791 | 6 | 991693 |
| 5956 | 7 | 999078 |
| 6360 | 7 | 995711 |
| 1280 | 7 | 994592 |
| 1449 | 8 | 997588 |
| 4993 | 8 | 995742 |
| 3966 | 8 | 994909 |
| 3755 | 9 | 996587 |
| 2537 | 9 | 996559 |
| 5372 | 9 | 994984 |
| 4907 | 10 | 998818 |
| 2512 | 10 | 997335 |
| 1156 | 10 | 996677 |
| 1653 | 11 | 999390 |
| 9901 | 11 | 998331 |
| 8307 | 11 | 998075 |
| 2526 | 12 | 998190 |
| 5922 | 12 | 996840 |
| 3005 | 12 | 991681 |
| 2632 | 13 | 991479 |
| 906 | 13 | 991448 |
| 2023 | 13 | 990165 |
| 3903 | 14 | 998612 |
| 7725 | 14 | 993921 |
| 5169 | 14 | 993021 |
| 6371 | 15 | 999566 |
| 5788 | 15 | 996766 |
| 7331 | 15 | 996545 |
| 8483 | 16 | 998472 |
| 1061 | 16 | 997509 |
| 6350 | 16 | 994854 |
| 2927 | 17 | 998590 |
| 8081 | 17 | 993340 |
| 6595 | 17 | 993104 |
| 4960 | 18 | 999275 |
| 6432 | 18 | 998478 |
| 2029 | 18 | 996769 |
| 1971 | 19 | 999992 |
| 1067 | 19 | 997117 |
| 3812 | 19 | 996488 |
| 9869 | 20 | 999248 |
| 8045 | 20 | 999102 |
| 3520 | 20 | 998395 |
+------+----+--------+
60 rows in set (0.11 sec)
Query OK, 0 rows affected (0.11 sec)
My question is, can I get the same output without a Stored Procedure, only with a normal query (including a subquery or a join). I tried three approaches.
My first one:
mysql> SELECT id, d1, d2
-> FROM tt AS a
-> WHERE id IN
-> (
-> SELECT id
-> FROM tt
-> WHERE d1 = a.d1
-> ORDER BY d2 DESC
-> LIMIT 3
-> )
-> ORDER BY d1, d2 DESC;
ERROR 1235 (42000): This version of MySQL doesn't
yet support 'LIMIT & IN/ALL/ANY/SOME subquery'
The subquery should retrieve the IDs of the rows that have values in the top 3 for column d2 – the where clause should make the connection. Unfortunately, the LIMIT clause is not allowed here.
My second attempt:
mysql> SELECT id, d1, d2
-> FROM tt AS a
-> WHERE d2 >=
-> (
-> SELECT d2
-> FROM tt
-> WHERE d1 = a.d1
-> ORDER BY d2 DESC
-> LIMIT 3, 1
-> )
-> ORDER BY d1, d2 DESC;
ERROR 1242 (21000): Subquery returns more than 1 row
This time, I tried to get the third highest d2 value for each d1 value inside the subquery and get all columns, where d2 is greater or equal the retrieved value. Even though the subquery can only result in one row (because of the LIMIT clause), MySQL complains that it returns more than one row.
My third attempt was a bit desperate and might look a bit strange:
mysql> SELECT a.id, a.d1, a.d2
-> FROM tt AS a INNER JOIN
-> (
-> SELECT id
-> FROM tt
-> WHERE d1 = a.d1
-> ORDER BY d2 DESC
-> LIMIT 3
-> ) AS b;
ERROR 1054 (42S22): Unknown column 'a.d1' in
'where clause'
The derived second table in this join should again retrieve the IDs of all the top 3 rows. But the WHERE clause in the subquery can’t get the connection to the outer query.
Any ideas how to get the result in one query?
So, here’s the promised Stored Procedure that solved the problem in quite a complex manner:
DELIMITER //
DROP PROCEDURE IF EXISTS getOutput //
CREATE PROCEDURE getOutput()
BEGIN
DECLARE _found INT DEFAULT 0;
DECLARE _d1 INT DEFAULT 0;
DECLARE cur CURSOR FOR
SELECT DISTINCT d1
FROM tt
ORDER BY d1;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET _found := 1;
DROP TEMPORARY TABLE IF EXISTS tt_temp;
CREATE TEMPORARY TABLE tt_temp (
id INT UNSIGNED NOT NULL,
d1 INT UNSIGNED NOT NULL,
d2 INT UNSIGNED NOT NULL);
OPEN cur;
d1_loop:LOOP
FETCH cur INTO _d1;
IF _found = 1 THEN
LEAVE d1_loop;
END IF;
INSERT INTO tt_temp
SELECT id, d1, d2
FROM tt
WHERE d1 = _d1
ORDER BY d2 DESC
LIMIT 3;
END LOOP;
SELECT * FROM tt_temp;
END //
DELIMITER ;