Pandas findall by pattern but not duplicated ones

by MR

October 15, 2023

I need a list of all non-duplicated regex matches.

Consider the dataframe below:

Letter      Actions
r1          a30,a30
r2          a30,a12-rf,a15,a15
r3          0
r4          a10,a93
r5          a13

I expect:

Letter      Actions
r1          ['a30']
r2          ['a30','a12','a15']
r3          0
r4          ['a10','a93']
r5          ['a13']

I have the code below, but it returns all pattern matches, while I don't want the duplicated ones:

import pandas as pd

df = pd.DataFrame(
    [['r1', 'a30,a30'],
     ['r2', 'a30,a12-rf,a15,a15'],
     ['r3', '0'],
     ['r4', 'a10,a93'],
     ['r5', 'a13']],
    columns=['Letter', 'Actions'])

df['Action_list'] = df['Actions'].str.findall(r'([a]\d{2})')

Solution:

You can use set to remove duplicates (note that a set does not preserve the original order of the matches):

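import numpy as np

# Rows that contain at least one match; other rows (e.g. "0") are kept unchanged.
mask = df["Actions"].str.contains(r"a\d+", regex=True)

# Find all matches, drop duplicates via set, then convert back to a list.
df["new_Actions"] = np.where(
    mask, df["Actions"].str.findall(r"a\d+").apply(set).apply(list), df["Actions"]
)
print(df)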

Prints (the order of elements within each list may differ between runs, because sets are unordered):

  Letter             Actions      new_Actions
0     r1             a30,a30            [a30]
1     r2  a30,a12-rf,a15,a15  [a30, a15, a12]
2     r3                   0                0
3     r4             a10,a93       [a93, a10]
4     r5                 a13            [a13]
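
If the order of first appearance matters, as in the expected output of the question, one possible variation is to deduplicate with dict.fromkeys instead of set, since dictionaries keep insertion order. The sketch below is not part of the original answer; the helper name unique_matches is hypothetical, and it assumes every value in the Actions column is a string.

```python
import re

import pandas as pd

df = pd.DataFrame(
    [["r1", "a30,a30"],
     ["r2", "a30,a12-rf,a15,a15"],
     ["r3", "0"],
     ["r4", "a10,a93"],
     ["r5", "a13"]],
    columns=["Letter", "Actions"])


def unique_matches(text):
    """Hypothetical helper: unique regex matches in order of first appearance."""
    matches = re.findall(r"a\d+", text)
    # dict.fromkeys drops duplicates while keeping insertion order.
    # Rows without any match are returned unchanged (e.g. "0").
    return list(dict.fromkeys(matches)) if matches else text


df["new_Actions"] = df["Actions"].apply(unique_matches)
print(df)
```

With this approach, row r2 yields the list ['a30', 'a12', 'a15'], which matches the order requested in the question, and row r3 keeps its original value.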
