Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

pandas equivalent of SQL distinct

I have a dataframe with multiple columns and I want to extract the rows that are unique in the manner of the SQL "select distinct" operation. So far whenever I look up forums on this I find comments about counting distinct (but I want the actual values) or (worse) values that are distinct in two columns just joined together as one set (using ravel). What I want is, for example for two columns, values that are distinct in pairs and the result as a dataframe.

I am considering now that the most effective method might be to write it myself – doing a stable sort on tuples and then scanning for duplicates. Any pandas expression that is no simpler than doing essentially that is not an answer to this question. I am looking for a basic or simple compound operation.


For those who do not know what a "distinct" in a query does …

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Starting with

1 2
2 3
1 2
4 5
2 3
2 1

we get back

1 2
2 3
4 5
2 1

Note – the question was asked should (2,1) and (1,2) be considered the same. No, as tuples are ordered. Again – refer to the behaviour of SQL for the details.

>Solution :

To get the unique values of a given column, try pandas.Series.unique():

values = df['column_name'].unique()

To get unique combinations of given columns, try pandas.DataFrame.drop_duplicates():

df.drop_duplicates(subset=['colmun_name1', 'column_name2'])
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading