How to remove possible suffix repetitions from a str column?

March 7, 2023

Consider the following DataFrame, where the suffix of a value in a str column may be repeated:

    Book
0   Book1.pdf
1   Book2.pdf.pdf
2   Book3.epub
3   Book4.mobi.mobi
4   Book5.epub.epub

Desired output (repeated suffixes removed where needed):

    Book
0   Book1.pdf
1   Book2.pdf
2   Book3.epub
3   Book4.mobi
4   Book5.epub

I have tried splitting on the "." character and then counting the occurrences of the last item to check whether it is duplicated.

The file names are only there to illustrate the point: the column may contain strings other than file paths.
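The split-based attempt described in the question could look roughly like the sketch below. It is a minimal illustration, not the asker's actual code: `drop_repeated_ext` is a hypothetical helper name, and it assumes "." is the separator and that only one duplicated trailing part needs removing.

```python
def drop_repeated_ext(name: str) -> str:
    """Hypothetical helper: drop the last '.'-separated part if it repeats the one before it.

    Assumes '.' is the separator, so it only fits path-like strings.
    """
    parts = name.split('.')
    # Need a stem plus two non-empty, identical trailing parts, e.g. ['Book2', 'pdf', 'pdf']
    if len(parts) >= 3 and parts[-1] and parts[-1] == parts[-2]:
        parts.pop()  # remove one copy of the duplicated extension
    return '.'.join(parts)


for name in ['Book1.pdf', 'Book2.pdf.pdf', 'Book3.epub', 'Book4.mobi.mobi', 'Book5.epub.epub']:
    print(name, '->', drop_repeated_ext(name))
```

With pandas it could be applied element-wise via `df['Book'].map(drop_repeated_ext)`. Because it hard-codes "." as the separator, it shares the limitation the question points out for non-path strings.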

Solution:

Use a regex with a capturing group and a backreference to it, together with str.replace:

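import pandas as pd

df = pd.DataFrame({'Book': ['Book1.pdf', 'Book2.pdf.pdf', 'Book3.epub', 'Book4.mobi.mobi', 'Book5.epub.epub']})
# capture one ".ext", require an identical copy right after it at the end, keep a single copy
df['Book'] = df['Book'].str.replace(r'(\.[^.]+)\1$', r'\1', regex=True)

# or: drop the first ".ext" when a lookahead sees the same ".ext" ending the string
df['Book'] = df['Book'].str.replace(r'(\.[^.]+)(?=\1$)', '', regex=True)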

Output:

         Book
0   Book1.pdf
1   Book2.pdf
2  Book3.epub
3  Book4.mobi
4  Book5.epub
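To see what the backreference actually captures, here is a small sketch using Python's `re` module directly (pandas' `str.replace` with `regex=True` performs the same substitution on each element). The sample `Book6.pdf.pdf.pdf` and the `REPEATED` pattern are additions for illustration, assuming you may also want to collapse three or more trailing copies.

```python
import re

BACKREF = re.compile(r'(\.[^.]+)\1$')        # ".ext.ext" at the end; group 1 holds ".ext"
LOOKAHEAD = re.compile(r'(\.[^.]+)(?=\1$)')  # first ".ext", only if an identical copy ends the string
REPEATED = re.compile(r'(\.[^.]+)\1+$')      # assumption: collapse two or more trailing copies

for name in ['Book1.pdf', 'Book2.pdf.pdf', 'Book4.mobi.mobi', 'Book6.pdf.pdf.pdf']:
    m = BACKREF.search(name)
    # m.group(1) is the text that \1 must match again, e.g. '.pdf'
    print(name, 'captured:', m.group(1) if m else None)
    print('  backref  :', BACKREF.sub(r'\1', name))
    print('  lookahead:', LOOKAHEAD.sub('', name))
    print('  repeated :', REPEATED.sub(r'\1', name))

# The end anchor has to sit inside the lookahead: in r'(\.[^.]+)(?=\1)$' the lookahead
# requires a non-empty copy ahead while $ requires nothing ahead, so it never matches.
print(re.search(r'(\.[^.]+)(?=\1)$', 'Book2.pdf.pdf'))  # None
```

The two single-backreference patterns remove only one copy per call, so `Book6.pdf.pdf.pdf` still ends in `.pdf.pdf`; the `\1+` quantifier in `REPEATED` reduces any number of trailing repeats to one.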

Generalization

If you want something generic that does not depend on the "." separator:

df['Book'] = df['Book'].str.replace(r'(.+)\1$', r'\1', regex=True)
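Because the generic pattern accepts any non-empty suffix that is immediately repeated, it also changes strings whose ending merely happens to repeat, such as a doubled digit. A quick check with Python's `re` module makes this visible; the sample strings are hypothetical.

```python
import re

# Generic pattern from the answer: any non-empty suffix repeated at the very end.
GENERIC = re.compile(r'(.+)\1$')

for s in ['Book2.pdf.pdf', 'report-v2-v2', 'Book11', 'banana']:
    # The leftmost match wins, i.e. the longest suffix of the form XX; it becomes X.
    print(s, '->', GENERIC.sub(r'\1', s))
```

Here `Book11` becomes `Book1` and `banana` becomes `bana`. If only whole tokens should be collapsed, anchoring the group to a separator, as the first pattern does with `\.`, avoids this.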

pandas

by MR

Published March 07, 2023
