re (regular expression operations): how to remove periods?

I wrote a function that splits the sample line below and removes the standalone numeric values (such as 123). However, it also strips the trailing digits from other tokens, which I need to keep. I also can’t figure out how to remove the "0.0".

ABC/0.0/123/TT1/1TT//

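import re

lines = ["ABC/0.0/123/TT1/1TT//"]  # the sample line above
cleaned_data = []

def split_lines(lines, delimiter, remove='[0-9]+$'):
  for line in lines:
    tokens = line.split(delimiter)
    # strip whatever the pattern matches from each token
    tokens = [re.sub(remove, "", token) for token in tokens]
    # drop tokens that are empty or whitespace-only
    clean_list = list(filter(lambda e: e.strip(), tokens))
    cleaned_data.append(clean_list)
    print(clean_list)

split_lines(lines, "/")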

The current output is below. Notice the "0." and the "TT" that is missing its trailing 1.

[ABC], [0.], [TT], [1TT]

>Solution :

Try including the start of line anchor (^) as well.

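import re

lines = ["ABC/0.0/123/TT1/1TT//"]  # the sample line from the question
cleaned_data = []

def split_lines(lines, delimiter, remove='^[0-9.]+$'):
  for line in lines:
    tokens = line.split(delimiter)
    # blank out tokens made up entirely of digits and periods
    tokens = [re.sub(remove, "", token) for token in tokens]
    # drop tokens that are empty or whitespace-only
    clean_list = list(filter(lambda e: e.strip(), tokens))
    cleaned_data.append(clean_list)
    print(clean_list)

split_lines(lines, "/")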

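I simply changed the default value of the remove parameter to '^[0-9.]+$', which matches only when the entire token consists of digits and periods. Tokens such as "0.0" and "123" are therefore blanked out and filtered away, while "TT1" keeps its trailing 1 because the token as a whole does not match.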
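To make the effect of the anchor concrete, here is a minimal sketch that compares the two patterns on single tokens and then shows one possible variant that filters tokens with re.fullmatch instead of substituting them to empty strings. The names NUMERIC_TOKEN and split_and_filter are hypothetical and not part of the original answer; the variant returns its result rather than appending to a global list, which is a design choice of this sketch only.

```python
import re

# Unanchored at the start: strips trailing digits from any token.
print(re.sub(r"[0-9]+$", "", "TT1"))     # TT
print(re.sub(r"[0-9]+$", "", "0.0"))     # 0.

# Anchored at both ends: only fully numeric tokens are affected.
print(re.sub(r"^[0-9.]+$", "", "TT1"))   # TT1
print(re.sub(r"^[0-9.]+$", "", "0.0"))   # (empty string)

# Hypothetical variant: drop numeric tokens directly with fullmatch.
NUMERIC_TOKEN = re.compile(r"[0-9.]+")   # digits and periods only

def split_and_filter(lines, delimiter="/"):
    """Split each line and keep tokens that are non-empty and not purely numeric."""
    cleaned = []
    for line in lines:
        tokens = line.split(delimiter)
        kept = [t for t in tokens if t.strip() and not NUMERIC_TOKEN.fullmatch(t)]
        cleaned.append(kept)
    return cleaned

print(split_and_filter(["ABC/0.0/123/TT1/1TT//"]))  # [['ABC', 'TT1', '1TT']]
```

Because fullmatch requires the pattern to cover the whole token, no explicit ^ or $ is needed in this variant.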
