I am not good at regex and have been trying, and failing, to extract only digits, decimal points, and hyphens (-) from a column in Python.
It would be even better if spaces could also be removed, but it is still manageable if not.
I have tested ^(\d.+)|[-] and ^(\d.+)|[-]?[^a-z]+$/i and ^(\d.+)|[-]?(\d+)? but none of them worked correctly.
Test cases (these are ranges written in inconsistent formats):
28193.13
28913
28913-13
28193.13-28193.13
28193.13 - 28193.13
28193.13 - 28193.13 / cm
- 28193.13
-28193.13
28913-
28913 -
Desired Results on above cases:
28193.13
28913
28913-13
28193.13-28193.13
28193.13-28193.13
28193.13-28193.13
-28193.13
-28193.13
28913-
28913-
Issue: my regex does not remove the extra characters from 28193.13 - 28193.13 / cm; the desired result for this input is 28193.13-28193.13.
Tool: I have used this regex test website to test regex code
Appreciate any help.
>Solution :
I think this is a good fix with regex.
import re

def clean_value(value):
    # Match runs of digits, periods, and hyphens; spaces and other text are skipped
    pattern = r"[\d.-]+"
    # Find all matches
    matches = re.findall(pattern, value)
    # Join matches to form the cleaned value
    cleaned_value = ''.join(matches)
    return cleaned_value
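As for why the first pattern from the question, ^(\d.+)|[-], misbehaves, here is a small sketch. The dot is unescaped, so `.+` matches any characters and greedily swallows the rest of the line. The `^` also applies only to the first alternative, so a string starting with a hyphen matches just that hyphen. The variable names below are illustrative only.

```python
import re

# First pattern from the question, unchanged
asker_pattern = r"^(\d.+)|[-]"

# Starts with a digit: the unescaped '.+' consumes everything up to the end of the line
sample = "28193.13 - 28193.13 / cm"
whole_match = re.search(asker_pattern, sample).group(0)
print(repr(whole_match))

# Starts with '-': the first alternative needs a leading digit and fails,
# so the second alternative matches the lone hyphen and nothing else
hyphen_match = re.search(asker_pattern, "- 28193.13").group(0)
print(repr(hyphen_match))
```

In both cases the pattern is describing where a number starts rather than which characters to keep, which is why a plain character class tends to be easier here.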
You can also solve this a bit more robustly, without regex:
def clean_value(value):
    # Filter out characters that are not digits, period, or hyphen
    cleaned_value = ''.join(char for char in value if char.isdigit() or char in '.-')
    return cleaned_value

# Apply the cleaning function to the data (data is assumed to be a list of strings)
cleaned_data = [clean_value(val) for val in data]
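To check an approach like this against every test case in the question, a table-driven check may help. The sketch below uses a `re.sub` variant that deletes everything except ASCII digits, periods, and hyphens. The function name `clean_range` and the `cases` list are made up for illustration; the inputs and expected outputs are copied from the question.

```python
import re

def clean_range(value: str) -> str:
    """Delete every character that is not an ASCII digit, '.', or '-'."""
    return re.sub(r"[^0-9.-]", "", value)

# (input, desired output) pairs copied from the question
cases = [
    ("28193.13", "28193.13"),
    ("28913", "28913"),
    ("28913-13", "28913-13"),
    ("28193.13-28193.13", "28193.13-28193.13"),
    ("28193.13 - 28193.13", "28193.13-28193.13"),
    ("28193.13 - 28193.13 / cm", "28193.13-28193.13"),
    ("- 28193.13", "-28193.13"),
    ("-28193.13", "-28193.13"),
    ("28913-", "28913-"),
    ("28913 -", "28913-"),
]

# Fail loudly, showing the offending input, if any case does not match
for raw, expected in cases:
    assert clean_range(raw) == expected, (raw, clean_range(raw))
```

If the column lives in a pandas DataFrame (an assumption, since the question does not say), something like `df["col"].str.replace(r"[^0-9.-]", "", regex=True)` should apply the same substitution to the whole column, where `"col"` is a placeholder column name.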