Create new dataframe based on distinct combinations of column values in Python

I have a pandas dataframe:

import pandas as pd

d = {'col1': ['Date1', 'Date1', 'Date1', 'Date2', 'Date2', 'Date2', 'Date3', 'Date3', 'Date3', 'Date4', 'Date4', 'Date4'], 
     'col2': ['Date2', 'Date3', 'Date4', 'Date1', 'Date3', 'Date4', 'Date1', 'Date2', 'Date4', 'Date1', 'Date2', 'Date3']}
df = pd.DataFrame(data=d)

How do I get the unique combinations of the values in the two columns, so that a reversed pair such as (Date2, Date1) counts as a duplicate of (Date1, Date2)?

I have tried nested for loops using df.itertuples() and df.drop() and am just getting lost.

Solution:

You can sort the values of col1 and col2 within each row, so that reversed pairs become identical, and then drop the duplicates:

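# Build a per-row key from the sorted pair; a tuple is used so the key is hashable,
# which drop_duplicates can rely on regardless of pandas version.
df["tmp"] = df[["col1", "col2"]].apply(lambda row: tuple(sorted(row)), axis=1)
# Keep the first row for each key, then discard the helper column.
df = df.drop_duplicates(subset="tmp").drop(columns="tmp")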

print(df)

Prints:

    col1   col2
0  Date1  Date2
1  Date1  Date3
2  Date1  Date4
4  Date2  Date3
5  Date2  Date4
8  Date3  Date4
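
As an alternative, here is a minimal sketch that avoids the row-wise apply and the temporary column by sorting each row with NumPy. It is not part of the original answer; the names sorted_pairs and unique_pairs are made up for illustration, and it assumes both columns hold values that can be compared with each other (here, strings).

```python
import numpy as np
import pandas as pd

d = {
    "col1": ["Date1", "Date1", "Date1", "Date2", "Date2", "Date2",
             "Date3", "Date3", "Date3", "Date4", "Date4", "Date4"],
    "col2": ["Date2", "Date3", "Date4", "Date1", "Date3", "Date4",
             "Date1", "Date2", "Date4", "Date1", "Date2", "Date3"],
}
df = pd.DataFrame(data=d)

# Sort the two values within each row so (Date2, Date1) becomes (Date1, Date2).
sorted_pairs = np.sort(df[["col1", "col2"]].to_numpy(), axis=1)

# Rebuild a frame from the sorted pairs and keep the first occurrence of each pair.
unique_pairs = (
    pd.DataFrame(sorted_pairs, columns=["col1", "col2"])
    .drop_duplicates()
    .reset_index(drop=True)
)

print(unique_pairs)
```

Unlike the answer above, this variant returns every pair in sorted order and resets the index, so it does not preserve the original row labels or the original left/right order of a pair.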