Create new dataframe based on distinct combinations of column values in Python

Advertisements

I have a pandas dataframe:

d = {'col1': ['Date1', 'Date1', 'Date1', 'Date2', 'Date2', 'Date2', 'Date3', 'Date3', 'Date3', 'Date4', 'Date4', 'Date4'], 
     'col2': ['Date2', 'Date3', 'Date4', 'Date1', 'Date3', 'Date4', 'Date1', 'Date2', 'Date4', 'Date1', 'Date2', 'Date3']}
df = pd.DataFrame(data=d)

How do I get a unique list of combinations of the values in the columns, like this?

I have tried nested for loops using df.itertuples() and df.drop() and am just getting lost.

>Solution :

You can sort the col1/col2 and then drop duplicates:

df["tmp"] = df[["col1", "col2"]].apply(sorted, axis=1)
df = df.drop_duplicates(subset="tmp").drop(columns="tmp")

print(df)

Prints:

    col1   col2
0  Date1  Date2
1  Date1  Date3
2  Date1  Date4
4  Date2  Date3
5  Date2  Date4
8  Date3  Date4

Leave a ReplyCancel reply