I have below dataframe
CVE ID Product Versions
0 CVE-2022-46689 Mac OS 12
1 CVE-2022-42856 Safari
2 CVE-2022-46689 Windows 10 21h1
3 CVE-2022-41121 Windows 10 21h2
4 CVE-2022-42856 Safari
I would like to remove duplicates based on the column CVE ID
but also want to make sure that I store the value present in the 2nd column Product Versions
(but remove the value if already present)
Something like this below:
CVE ID Product Versions
0 CVE-2022-46689 Mac OS 12, Windows 10 21h1
1 CVE-2022-42856 Safari
2 CVE-2022-41121 Windows 10 21h2
How should I do it?
Any help is appreciated
>Solution :
here is one way to do it
# drop duplicates (in memory)
# groupby CVE ID and join the resulting list of product version
out=(df.drop_duplicates(subset=['CVE ID','Product Versions'])
.groupby(['CVE ID'],as_index=False)['Product Versions']
.agg(','.join ))
out
CVE ID Product Versions
0 CVE-2022-41121 Windows 10 21h2
1 CVE-2022-42856 Safari
2 CVE-2022-46689 Mac OS 12, Windows 10 21h1