Additional fields in a group request
Oleg Petrushchenko, 24 Jun 14:14
Consider the solution of simple problem. There are several sites that host articles on various subjects, that has attracted the interest of readers. Readers have the opportunity to leave their comments to any article. There is a common table comments for all sites with the following columns: id, page_id, user_name, body, site, hidden, checked. You must select the last n approved (checked) and visible (not hidden) comments for a particular site (site), and commented article (page_id) should not be repeated.
Make a simple request:
Well, let's get rid of complex queries, we will use an ORM and scopes. It turns out something like this:
Let's try to continue the search for optimal solutions in google. As a result, we find how you can make such a request exclusively by means of sql:
Finally, we linked the id field between tables, which we call c and maxresults. As a result, we got a request that allow to find out ids of recent comments and moreover the names of authors and texts of these comments. Read more...
Make a simple request:
SELECT max (id) as max_id, page_id FROM comments
WHERE site = '# {site}' AND hidden! = 't' AND checked = 't'
GROUP BY page_id ORDER BY max_id DESC LIMIT # {n}
We got a sample of the latest comments (max_id) for each page_id, sorted in descending order and limit the first n results. It's pretty simple.
The situation becomes more complicated when we want to know the name of the author (user_name) and text (body) of comment. These fields are not present in our sample. Let's try to include body and user_name into SELECT. Unfortunately, postgres will throw an error and advises to include these fields in the GROUP BY or to apply to them an aggregate function. When these fields in the group we get a few lines for each page_id (that does not suit us), and aggregate functions (such as max(body) or avg(user_name)) for these fields have no meaning.Well, let's get rid of complex queries, we will use an ORM and scopes. It turns out something like this:
last_comments = []
for_site( site ).visible.checked.order( "id desc" ).each do |c|
unless last_comments.map( &:page_id ).include?( c.page_id )
last_comments << c
break if last_comments.size >= n.to_i
end
end
This solution works, but not too fast, and does not look too good.Let's try to continue the search for optimal solutions in google. As a result, we find how you can make such a request exclusively by means of sql:
SELECT c.page_id, c.user_name, c.body FROM comments c,
(SELECT max (id) as max_id, page_id FROM comments
WHERE site = '# {site}' AND hidden != 't' AND checked = 't'
GROUP BY page_id) maxresults WHERE c.id = maxresults.max_id
ORDER BY max_id DESC LIMIT # {n}
Let me explain what we did. First, we assigned the first occurrence of the table comments in query alias c. Then, we have included two components in expression FROM. The first component is the table comments (with the alias c), the second component is the expression SELECT:
SELECT max (id) as max_id, page_id FROM comments
WHERE site = '# {site}' AND hidden != 't' AND checked = 't'
GROUP BY page_id
We assigned the max(id) max_id alias and all SELECT expression maxresults alias. Finally, we linked the id field between tables, which we call c and maxresults. As a result, we got a request that allow to find out ids of recent comments and moreover the names of authors and texts of these comments. Read more...