I have a dataset with a lot of variation in format, like this:
-0.002672945<120>
-0.077635566{600}
5.88365537e-005{500}
-0.116441565{1}
-4.549649974<29.448>
The endings of the values vary a lot, and I need to remove all of those bracketed parts. The problem is that they differ in length: sometimes 3 characters, sometimes 6, and so on. I also can't just keep the first few characters, because some values are in scientific notation, such as 8.645637e-007.
Is there a smart way to clean this kind of mess out of the data?
>Solution :
>>> import pandas as pd
>>> df = pd.DataFrame({"x": [
... "-0.002672945<120>",
... "-0.077635566{600}",
... "5.88365537e-005{500}",
... "-0.116441565{1}",
... "-4.549649974<29.448>",
... ]})
>>> df["x"].replace(r"[<{].+$", "", regex=True)
0       -0.002672945
1       -0.077635566
2    5.88365537e-005
3       -0.116441565
4       -4.549649974
Name: x, dtype: object
You can then assign that result back to the column in df.
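If you also want real numbers rather than strings, something like the following might work. This is only a sketch: it reuses the sample data and the regex from above, assigns the cleaned strings back to the column, and then converts them with `pd.to_numeric`. The conversion step is my assumption about what you want next, not something the question asked for.

```python
import pandas as pd

df = pd.DataFrame({"x": [
    "-0.002672945<120>",
    "-0.077635566{600}",
    "5.88365537e-005{500}",
    "-0.116441565{1}",
    "-4.549649974<29.448>",
]})

# Drop everything from the first "<" or "{" to the end of each string,
# then store the cleaned strings back in the same column.
df["x"] = df["x"].replace(r"[<{].+$", "", regex=True)

# The cleaned values are still strings; convert them to floats.
# By default pd.to_numeric raises if a value cannot be parsed as a number.
df["x"] = pd.to_numeric(df["x"])

print(df["x"].dtype)
```

If some rows might hold values that are not numbers after cleaning, passing `errors="coerce"` to `pd.to_numeric` turns those into NaN instead of raising.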