Google BigQuery: How to filter out rows by a particular column's value frequency

Say that I only want to return rows where a column value occurs at least twice.

I would do something like

SELECT
  table1.columnA
FROM table1
GROUP BY
  table1.columnA
HAVING COUNT(*) >= 2

That works for just one column, but if I want to return several columns while applying the filter to only one of them, it doesn't work. My attempt is:

SELECT
  table1.columnA,
  table1.columnB
FROM table1
GROUP BY
  table1.columnA
HAVING COUNT(*) >= 2

This gives a "columnB which is neither grouped nor aggregated" error.

From this post, it seems that every column in the SELECT list has to be grouped or aggregated, but I only want to filter by one particular column:

BIGQUERY SELECT list expression references column CHANNEL_ID which is neither grouped nor aggregated at [10:13]

So I’m still trying to figure out a way to filter by value frequency for a particular column.

Solution:

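You can use a window function to count how often each columnA value occurs and then filter on that count. Note that row_number() is not suitable here: it numbers the rows within each partition, so filtering on it would drop the first rows of every group instead of keeping whole groups. For example:

select distinct
    columnA,
    columnB
from
    (select
        *,
        count(*) over(partition by columnA) as cnt
     from table1)
where cnt >= 2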
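
To check the behaviour on a small data set, here is a minimal runnable sketch of filtering by value frequency with a window count. It uses Python's built-in sqlite3 module as a local stand-in for BigQuery, so it only illustrates the query logic. The sample rows and the helper name filter_by_frequency are hypothetical and not part of the original question, and it assumes a SQLite build with window function support (3.25 or newer).

```python
import sqlite3

# Hypothetical sample data standing in for table1 (not from the original post).
SAMPLE_ROWS = [
    ("a", 1),
    ("a", 2),
    ("b", 3),  # "b" occurs only once, so this row should be filtered out
    ("c", 4),
    ("c", 5),
    ("c", 6),
]

# COUNT(*) OVER (PARTITION BY columnA) attaches the group size to every row
# without collapsing the rows, so columnB can be selected freely.
QUERY = """
SELECT columnA, columnB
FROM (
    SELECT
        columnA,
        columnB,
        COUNT(*) OVER (PARTITION BY columnA) AS cnt
    FROM table1
)
WHERE cnt >= 2
ORDER BY columnA, columnB
"""


def filter_by_frequency(rows):
    """Return the rows whose columnA value occurs at least twice."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE table1 (columnA TEXT, columnB INTEGER)")
        conn.executemany("INSERT INTO table1 VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


result = filter_by_frequency(SAMPLE_ROWS)
print(result)
```

This sketch leaves out DISTINCT so that every qualifying row is returned; add it back if you only want unique (columnA, columnB) pairs. BigQuery also provides a QUALIFY clause for filtering on window function results, which may let you avoid the subquery.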

Let me know if it is still not working for you.
