Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

SQL: Distinct and Distinct count(*) With Hive Tables

I am getting two very different numbers for these seemingly similar queries on (hive) tables:

select count(*) from test
# result: 2609173
select distinct count(*) from test
# result: 2609173
insert into testToo
select distinct * from test
# result: inserted 673065 rows

Any recommendations on how I might be able to discern what is going on? Am I using distinct somehow differently in the first few queries?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

You want select count(distinct *) from test rather than select distinct count(*) from test

The former means "select the count of distinct rows" and the latter means "select the distinct values of count(*)" (and there’s only one value so it is semantically the same as select count(*) from test)

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading