Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Data cleaning in BigQuery: When session_id has both null and valid find_term, only keeping the records that have valid find_terms

I need some help cleaning my data. I have a table like below:

session_id find_term
11111 null
11111 candy
11111 chocolate
22811 null
11145 null
333222 dark chocolate

When session_id has both null and valid find_term, I want to keep the records that only have valid find_terms (eg: for session_id 11111, I want to exclude the first record when find_term is null and keep when find_term is candy and chocolate. For session_id 22811, 11145, I would want to keep all existed records because they only have null as find_term). I want the output to be:

session_id find_term
11111 candy
11111 chocolate
22811 null
11145 null
333222 dark chocolate

So far I have tried giving same group of records row_num to distinct them, but that is not too helpful. Please share your thoughts! Thanks!

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

Use below

select * from your_table
qualify ((find_term is null ) and countif(not find_term is null) over(partition by session_id) = 0)
or not find_term is null             

if applied to sample data in your question – output is

enter image description here

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading