Tuesday, August 27, 2013

Select first row in each GROUP BY group (perform order by after group by)

PROBLEM

So you want to retrieve or select the top row of each set of rows in a group by clause after you have been given the ability to order your results.
Example:

SELECT * FROM orders
 
id | customer   | quantity
----------------------- 
 1 | Stanley    | 5
 2 | Martins    | 1
 3 | Stanley    | 3
 4 | Martins    | 8

If you want the query to return the largest quantity
ordered by each customer to have something like;

SELECT customer, TOP(quantity) 
       FROM orders 
       GROUP BY customer 
       ORDER BY quantity DESC;
 
 customer  |   TOP(id)   | TOP(quantity)
 Stanley   |   1         | 5
 Martins   |   4         | 8

But there is no function I can think of that does TOP for and ORDER BY
does not working before GROUP BY and behave differently after GROUP BY.
Note that this cannot be resolved by regular MAX/MIN aggregate functions because you don't know if (id) is necessarily bigger/smaller



SOLUTION 1 (with postgres array & subquery)
SELECT customer,
       (array_agg(id::text))[1] as id
       (array_agg(quantity::text))[1] as quantity
       FROM(
             (SELECT customer, quantity
                     FROM orders
                     ORDER BY quantity DESC ) as tmp
       )
       GROUP BY customer

 
What the above query does is to order results with a sub-query and then
make an array out of every row and select the first which is what you want.


SOLUTION 2
WITH summary AS (
    SELECT p.id
           p.customer, 
           p.quantity, 
           ROW_NUMBER() OVER(PARTITION BY p.customer 
                                 ORDER BY p.quantity DESC) AS rk
      FROM orders p)
SELECT s.*
  FROM summary s
 WHERE s.rk = 1
What the above query does is to order such that the first row number for every grouping is 1.





Thanks hope this helps.