How can I convert `A_B_C_DEF` to `ABC_DEF`?

March 22, 2022

I have strings of this form:

```
A_B_CDEF_GHI
A_B_C_DEF_G_H_I
ABC_D_E_F_GHI
ABCDEFG_H_I
A_B_C
```

I need to convert those to the following:

```
AB_CDEF_GHI
ABC_DEF_GHI
ABC_DEF_GHI
ABCDEFG_HI
ABC
```

So the rules are:

`(._){2,}` should be converted to `XXX_` if it’s not at the end of the string (where `XXX` stands for the matched characters with the underscores removed).
If `(_.){2,}` occurs at the end of a string, it should be converted to `_XXX`.
If `(._){2,}.` is the entire string, all underscores should be removed.
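These rules boil down to one idea: split on the underscores and glue together every run of consecutive single-character pieces. As a cross-check, here is a non-regex sketch of that reading using `itertools.groupby` (a swapped-in technique, not the approach the answer uses). `merge_single_chars` is a hypothetical name, and the sketch assumes `_` only ever acts as a separator.

```python
from itertools import groupby


def merge_single_chars(s: str) -> str:
    # Hypothetical helper: treat "_" purely as a separator between pieces.
    pieces = s.split("_")
    out = []
    # Group consecutive pieces by whether they are exactly one character long.
    for is_single, run in groupby(pieces, key=lambda p: len(p) == 1):
        run = list(run)
        if is_single:
            # Adjacent single characters collapse into one piece: ["A", "B"] -> "AB".
            # A lone single character is unchanged, since joining one item is a no-op.
            out.append("".join(run))
        else:
            # Multi-character pieces are kept as they are.
            out.extend(run)
    return "_".join(out)


for line in ["A_B_CDEF_GHI", "A_B_C_DEF_G_H_I", "ABC_D_E_F_GHI", "ABCDEFG_H_I", "A_B_C"]:
    print(merge_single_chars(line))
```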

I’ve gotten as far as `(((.)_){2,})`, which matches the first rule, but how can I replace each match with just the non-underscore characters it found?
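In Python, `re.sub` accepts a function as the replacement: it is called with each match object and its return value is substituted. A minimal sketch with a simplified rule-1 pattern (a non-capturing rewrite of the pattern above, my own simplification) shows the mechanism works on the first sample line but over-merges the third, because `.` also grabs the `C` that ends the longer run `ABC`:

```python
import re

# Rule 1 from the question: two or more "<char>_" pairs.
RULE1 = r"(?:._){2,}"


def strip_underscores(m: re.Match) -> str:
    # Keep the matched characters, drop the underscores, and restore the
    # trailing separator that the rule says should remain ("XXX_").
    return m.group(0).replace("_", "") + "_"


print(re.sub(RULE1, strip_underscores, "A_B_CDEF_GHI"))   # AB_CDEF_GHI
print(re.sub(RULE1, strip_underscores, "ABC_D_E_F_GHI"))  # ABCDEF_GHI, not ABC_DEF_GHI
```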

I tagged this Python because that’s where the code lives, and I know regex dialects vary between languages.

Solution:

The dot in your pattern matches any character, including an underscore. You can make the pattern more specific instead.

First match every run of two or more A-Z characters so those runs are consumed and left unchanged, and capture a single A-Z character followed by one or more `_` + A-Z pairs in a group.

Then, in the replacement, remove the underscores only from that capture group.

```
_?[A-Z]{2,}_?|([A-Z](?:_[A-Z](?![A-Z]))+)
```

- `_?[A-Z]{2,}_?` Match 2 or more occurrences of A-Z, with optional underscores on either side
- `|` Or
- `(` Capture group 1
  - `[A-Z]` Match a single A-Z character
  - `(?:_[A-Z](?![A-Z]))+` Repeat 1+ times: `_` followed by A-Z, asserting there is no A-Z directly to the right
- `)` Close group 1
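The breakdown above maps directly onto a `re.VERBOSE` pattern, where whitespace in the pattern is ignored and `#` starts a comment. This sketch is the same regex with the explanation inline; the names `PATTERN`, `collapse` and `repl` are my own.

```python
import re

# The answer's pattern, rewritten with re.VERBOSE so each part carries its
# explanation. The matching behaviour is unchanged.
PATTERN = re.compile(
    r"""
    _?[A-Z]{2,}_?                 # 2+ A-Z chars with optional underscores: kept as-is
    |                             # or
    (                             # capture group 1
        [A-Z]                     #   a single A-Z char
        (?:_[A-Z](?![A-Z]))+      #   1+ times: "_" + A-Z not followed by another A-Z
    )                             # close group 1
    """,
    re.VERBOSE,
)


def collapse(s: str) -> str:
    def repl(m: re.Match) -> str:
        # Group 1 only participates when the second alternative matched.
        return m.group(1).replace("_", "") if m.group(1) else m.group(0)

    return PATTERN.sub(repl, s)


print(collapse("A_B_C_DEF_G_H_I"))  # ABC_DEF_GHI
```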


For example:

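```python
import re

pattern = r'_?[A-Z]{2,}_?|([A-Z](?:_[A-Z](?![A-Z]))+)'
s = ("A_B_CDEF_GHI\n"
     "A_B_C_DEF_G_H_I\n"
     "ABC_D_E_F_GHI\n"
     "ABCDEFG_H_I\n"
     "A_B_C")

# Group 1 is set only when the second alternative matched: remove its underscores.
# Otherwise group 1 is None, so return the whole match unchanged.
res = re.sub(pattern, lambda x: x.group(1).replace("_", "") if x.group(1) else x.group(), s)
print(res)
```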

Output:

```
AB_CDEF_GHI
ABC_DEF_GHI
ABC_DEF_GHI
ABCDEFG_HI
ABC
```

For a broader match than A-Z, you could use a negated character class that matches any character except whitespace or an underscore:

```
_?[^_\s]{2,}_?|([^_\s](?:_[^_\s](?![^_\s]))+)
```
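A quick comparison of the two patterns on input containing lowercase letters and digits (the sample string is made up for illustration): the A-Z version finds nothing to change, while the negated-class version merges the single characters the same way. The names `AZ`, `BROAD` and `collapse` are my own.

```python
import re

AZ = r"_?[A-Z]{2,}_?|([A-Z](?:_[A-Z](?![A-Z]))+)"
BROAD = r"_?[^_\s]{2,}_?|([^_\s](?:_[^_\s](?![^_\s]))+)"


def collapse(pattern: str, s: str) -> str:
    # Strip underscores only from group 1; other matches are returned as-is.
    return re.sub(pattern, lambda m: m.group(1).replace("_", "") if m.group(1) else m.group(0), s)


sample = "x_1_2_abc_d_e"
print(collapse(AZ, sample))     # x_1_2_abc_d_e (no A-Z characters, so nothing matches)
print(collapse(BROAD, sample))  # x12_abc_de
```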


by MR

Published March 22, 2022
