Welcome to the Treehouse Community
Want to collaborate on code errors? Have bugs you need feedback on? Looking for an extra set of eyes on your latest project? Get support with fellow developers, designers, and programmers of all backgrounds and skill levels here with the Treehouse Community! While you're at it, check out some resources Treehouse students have shared here.
Looking to learn something new?
Treehouse offers a seven day free trial for new students. Get access to thousands of hours of content and join thousands of Treehouse students and alumni in the community today.
Start your free trialdrstrangequark
8,273 PointsI am SO CONFUSED about Subqueries
In SQL Playgrounds for Beginner SQL -> Querying Relational Databases -> Subqueries -> Review and Practice, the first assignment is:
-- Generate a report that lists the book titles from both locations and count the total number of books with the same title.
I have tried a million different iterations but I'm STILL not getting it. Could anyone please help me understand what I'm doing wrong? Here is my current version of the query even though I know it's incorrect:
SELECT bn.title AS "North Titles" FROM books_north AS bn GROUP BY title UNION SELECT bs.title AS "South Titles" FROM books_south AS bs GROUP BY title INNER JOIN ( SELECT COUNT(bn.title), COUNT(bs.title) FROM books_north AS bn INNER JOIN books_south AS bs GROUP BY title ON bn.id = bs.id ) AS t ON bn.id = bs.id;
2 Answers
Katie Wood
19,141 PointsHi there,
To work through this step by step, let's figure out where the subquery goes. In this case, you know you want the title and count, but the titles need to come from a combination of the titles from both locations, rather than any one location's table. That tells us we're probably going to want a subquery inside the FROM. We also know we want the final results to be grouped by the title, so we can start framing our query like this:
SELECT
all_books.title AS Title,
COUNT(*) AS Count
FROM (
) AS all_books
GROUP BY Title
Rather than having some columns from one table and some from the other, we want the same columns listing data from both tables, which tells us we want a UNION. We also know that we want all entries listed, not just unique ones (otherwise, all of our counts will be 1) - that means we want UNION ALL. We just need the titles from both tables, so our subquery can be something like this:
SELECT
all_books.title AS Title,
COUNT(*) AS Count
FROM (
SELECT title FROM books_north
UNION ALL
SELECT title FROM books_south
) AS all_books
GROUP BY Title
Hope this helps!
Steven Parker
231,236 PointsYou've got a few syntax errors here, but it doesn't need to be nearly so complicated. Here's some hints:
- try confining the
UNION
to the sub-query - remember to use
UNION ALL
when you want to see all records (not just unique ones) - limit the sub-query to returning only titles
- count rows in the outer query
- also use grouping only in the outer query
Steven Parker
231,236 PointsSteven Parker
231,236 PointsI was hoping you'd get something similar to this from the hints, but even this is still more complicated than necessary. In particular, you don't need a column alias for the title or a table alias for the sub-query.
drstrangequark
8,273 Pointsdrstrangequark
8,273 PointsThank you both for the help! Steven, the hints you gave were great but I still was struggling with putting them into use.
The code that Katie provided worked but I want to unpack it a bit so that I understand exactly what is going on here.
The main question that I have is: I would have expected UNION ALL to provide multiple lines of the same book, if it appeared twice, giving each instance a count of 1. Instead it combined them and increased the book count. Why did it do that? Was it because we added the GROUP BY title? If I had grouped by a different column, would it have turned out the other way (BTW I tried that and it gave me an error saying the newly specified column didn't exist. Why is that?)
Also, Steven, you were correct. I got rid of the table alias for the sub-query and the column alias for title and it still worked just fine.
Katie Wood
19,141 PointsKatie Wood
19,141 PointsYou're right, they're not necessary - it just looked like the intent of the query in the question was to include them so I put them in - you absolutely can have a working query without them. The subquery alias in particular was just personal preference, and I hoped it might make it easier to follow. Apologies if it complicated things!
Katie Wood
19,141 PointsKatie Wood
19,141 PointsAlso, in response to your question, drstrangequark!
Yes, the count increases because of the GROUP BY line. If you take this query, and take out the COUNT(*) in the selection and the GROUP BY line (I'll paste that below to save you some time, if you're interested), you'll see just a list of all the titles, including duplicates. In my example, I just changed GROUP BY to ORDER BY so it would sort the titles and make it easier to see the duplicates.
In response to the rest of your question, you do need to group by the title in this query. The GROUP BY is basically telling COUNT() how to function - when you say you want to group by title, it's essentially telling COUNT() to increase the count when there is a duplicate title. If you had a more complex query and you chose another column, you'd be telling COUNT() to count duplicates in whichever other column you specified. If you just tried to change the GROUP BY in this query, it would likely throw an error because our FROM just contains a selection of the title column from two tables - there is nothing else to group by, at least in this case.
Hope that cleared things up!