Using Python, I would like to remove duplicate rows based on the first column but keep all the values in the second column.

I have the dataframe below:

              CVE ID             Product Versions
0      CVE-2022-46689                   Mac OS 12
1      CVE-2022-42856                      Safari
2      CVE-2022-46689             Windows 10 21h1
3      CVE-2022-41121             Windows 10 21h2
4      CVE-2022-42856                      Safari
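
For reference, a minimal sketch to reproduce this frame (column names and values taken from the table above; the pandas import is assumed):

import pandas as pd

# sample data copied from the question
df = pd.DataFrame({
    'CVE ID': ['CVE-2022-46689', 'CVE-2022-42856', 'CVE-2022-46689',
               'CVE-2022-41121', 'CVE-2022-42856'],
    'Product Versions': ['Mac OS 12', 'Safari', 'Windows 10 21h1',
                         'Windows 10 21h2', 'Safari'],
})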

I would like to remove duplicates based on the CVE ID column, but I also want to keep the values from the second column, Product Versions, for each row (dropping a value if it is already present for that CVE ID).

Something like this below:

              CVE ID             Product Versions
0      CVE-2022-46689            Mac OS 12, Windows 10 21h1
1      CVE-2022-42856            Safari
2      CVE-2022-41121            Windows 10 21h2

How should I do it?

Any help is appreciated.

Solution:

Here is one way to do it:

# drop exact duplicate (CVE ID, Product Versions) pairs,
# then group by CVE ID and join the remaining product versions
out = (df.drop_duplicates(subset=['CVE ID', 'Product Versions'])
         .groupby('CVE ID', as_index=False)['Product Versions']
         .agg(', '.join))

out
            CVE ID  Product Versions
0   CVE-2022-41121  Windows 10 21h2
1   CVE-2022-42856  Safari
2   CVE-2022-46689  Mac OS 12, Windows 10 21h1
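
As a variant (not from the original answer, just a sketch of an equivalent pattern), the dedup step can be folded into the aggregation itself by joining only the unique values per group; Series.unique preserves order of first appearance:

# one-step variant: dedup inside the aggregation
out = (df.groupby('CVE ID', as_index=False)['Product Versions']
         .agg(lambda s: ', '.join(s.unique())))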