How to label each group with df.groupby() in Python pandas?

Suppose we have a pandas DataFrame like the following:

   Questions  cnt similarity
0       ABC    1  [1, 2, 3]
1       abc    2  [1, 2, 3]
2       cba    3  [2, 3, 1]
3      abcd    4  [4, 5, 6]
4      dcsa    5  [2, 3, 1]
5      adcd    6  [4, 5, 6]
6      abcd    7  [1, 2, 3]
7       cba    8  [7, 8, 9]

I need to add another column called cat, based on the similarity column: rows with the same similarity value should get the same group number. The expected output is shown below. Note that the original dataset has 1M rows, so performance matters. Any input is appreciated. Thank you.

  Questions  cnt similarity  cat
0       ABC    1  [1, 2, 3]    1
1       abc    2  [1, 2, 3]    1
2       cba    3  [2, 3, 1]    2
3      abcd    4  [4, 5, 6]    3
4      dcsa    5  [2, 3, 1]    2
5      adcd    6  [4, 5, 6]    3
6      abcd    7  [1, 2, 3]    1
7       cba    8  [7, 8, 9]    4

Solution:

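If I understand correctly, you can use pd.factorize. The lists are cast to strings first because lists are unhashable, and 1 is added because the codes returned by factorize start at 0: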

df["cat"] = pd.factorize(df["similarity"].astype(str))[0] + 1


Output:

print(df)

  Questions  cnt similarity  cat
0       ABC    1  [1, 2, 3]    1
1       abc    2  [1, 2, 3]    1
2       cba    3  [2, 3, 1]    2
3      abcd    4  [4, 5, 6]    3
4      dcsa    5  [2, 3, 1]    2
5      adcd    6  [4, 5, 6]    3
6      abcd    7  [1, 2, 3]    1
7       cba    8  [7, 8, 9]    4
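
For anyone who wants to reproduce this end to end, here is a minimal, self-contained sketch. It rebuilds the sample frame from the question and shows two variations that are not part of the answer above and should be read as assumptions: converting each list to a tuple instead of a string to get a hashable key, and a groupby-based labelling with ngroup(), since the question title asks about df.groupby(). The column name cat_groupby is made up for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "Questions": ["ABC", "abc", "cba", "abcd", "dcsa", "adcd", "abcd", "cba"],
    "cnt": [1, 2, 3, 4, 5, 6, 7, 8],
    "similarity": [[1, 2, 3], [1, 2, 3], [2, 3, 1], [4, 5, 6],
                   [2, 3, 1], [4, 5, 6], [1, 2, 3], [7, 8, 9]],
})

# Assumption: tuples as the hashable key instead of the string form of each list.
key = df["similarity"].map(tuple)

# factorize numbers distinct values in order of first appearance, starting at 0,
# so add 1 to match the 1-based labels in the expected output.
df["cat"] = pd.factorize(key)[0] + 1

# groupby variant (hypothetical column name): sort=False numbers the groups
# in order of first appearance rather than in sorted key order.
str_key = df["similarity"].astype(str)
df["cat_groupby"] = df.groupby(str_key, sort=False).ngroup() + 1

print(df)
```

Both columns should carry the same labels on this sample. On a frame with 1M rows it is worth timing the string and tuple keys on your own data before settling on one.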