I have a query that I want to add a colum to. It should contain the amount of how many times a value of a specific column appears in that table.
My current attempt
SELECT cleaning.*,
(SELECT COUNT(*)
FROM cleaning
WHERE column_b = cleaning.column_b)
AS multicleanamount
FROM cleaning
is counting every single row as it is just comparing a column with same column – I understand the reason.
So the current (wrong) result is
| column_a | column_b | multicleanamount |
|---|---|---|
| 12 | 300 | 7 |
| 13 | 321 | 7 |
| 14 | 300 | 7 |
| 15 | 330 | 7 |
| 16 | 330 | 7 |
| 17 | 351 | 7 |
| 18 | 330 | 7 |
What I am missing now is how do I tell the sub-Select to compare with the current value of cleaning.column_b from the very first line? (which is in the cleaning.*)
2bh it was pretty easy in my head at first and it also would be if I’d compare to a static value like '300' but that table has 74 K entries. This has to be dynamic.
My desired output is
| column_a | column_b | multicleanamount |
|---|---|---|
| 12 | 300 | 2 |
| 13 | 321 | 1 |
| 14 | 300 | 2 |
| 15 | 330 | 3 |
| 16 | 330 | 3 |
| 17 | 351 | 1 |
| 18 | 330 | 3 |
I hope that is understandable. If not please tell me and I will try to specify even more.
>Solution :
Use table aliases so you can distinguish the cleaning table in the subquery from the table in the main query. In your query, WHERE column_b = cleaning.column_b is comparing the column with itself, so the condition is always true and you’re counting all the rows.
SELECT c1.*,
(SELECT COUNT(*)
FROM cleaning AS c2
WHERE c1.column_b = c2.column_b)
AS multicleanamount
FROM cleaning AS c1
It may also be better to write this as a JOIN instead of correlated subquery:
SELECT c1.*, c2.multicleanamount
FROM cleaning AS c1
JOIN (
SELECT column_b, COUNT(*) AS multicleanamount
FROM cleaning
GROUP BY column_b
) AS c2 ON c1.column_b = c2.column_b