I have a dataset with a field that contains text (both words and numeric data). As you can see in the screenshot, it includes decimal numbers that use a comma as the decimal separator, and I need them to use a period instead.
I have previously tried writing a regex, but it replaces all commas in the text with periods.
Data_preprocessing['tweet_without_stopwords'] = Data_preprocessing['tweet_without_stopwords'].apply(lambda x: re.sub(",",'.', str(x)))
How do I write a regex that only applies to decimal numbers? That is, wherever the text contains number,number, I want it to become number.number.
This attempt broke the data:
Data_preprocessing['tweet_without_stopwords'] = Data_preprocessing['tweet_without_stopwords'].apply(lambda x: re.sub("(\d*)\.(\d*)","\1,\2", str(x)))
Squares appeared in the output 😀
Third attempt:
Data_preprocessing['tweet_without_stopwords'] = Data_preprocessing['tweet_without_stopwords'].apply(lambda x: re.sub("(\d+)\,(\d+)","\1.\2", str(x)))
>Solution :
The pattern you need is "(\d+),(\d+)", with "\1.\2" as the replacement. Decomposition:
(\d+) at least one digit (group 1)
, a literal ,
(\d+) at least one digit (group 2)
Replacement:
\1 group 1
. a period
\2 group 2
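Applied to your code, the relevant section is shown below. Use raw strings (the r prefix); without it, Python reads "\1" and "\2" as the control characters \x01 and \x02, which is why squares appeared in your earlier attempt: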
lambda x: re.sub(r"(\d+),(\d+)",r"\1.\2", str(x))
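To make the effect of the r prefix concrete, here is a minimal sketch using only the standard re module. The sample string is made up for illustration and is not taken from your dataset.

```python
import re

# Hypothetical sample text; not taken from the original dataset.
text = "price 3,14 rose, then fell to 2,5"

# Raw strings: \1 and \2 reach the regex engine as backreferences.
fixed = re.sub(r"(\d+),(\d+)", r"\1.\2", text)
print(fixed)

# Non-raw replacement: Python converts "\1" and "\2" into the control
# characters \x01 and \x02 before re.sub ever sees them.
broken = re.sub(r"(\d+),(\d+)", "\1.\2", text)
print(repr(broken))
```

The second call still finds the right matches, but it substitutes unprintable characters for the digits, which many viewers render as squares.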
Here’s a testbed that verifies this regex is correct
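A small self-contained check along these lines can be written in Python. This is only a sketch: the helper name commas_to_periods and the sample inputs are assumptions made up for illustration.

```python
import re


def commas_to_periods(text):
    """Hypothetical helper: turn digit,digit into digit.digit."""
    return re.sub(r"(\d+),(\d+)", r"\1.\2", str(text))


# Made-up sample inputs mapped to the outputs we expect.
cases = {
    "pi is 3,14": "pi is 3.14",
    "hello, world": "hello, world",
    "a,b and 10,5": "a,b and 10.5",
    "trailing 7, then 8": "trailing 7, then 8",
    "no numbers here": "no numbers here",
}

for source, expected in cases.items():
    result = commas_to_periods(source)
    assert result == expected, (source, result)

print("all cases passed")
```

Commas that are not surrounded by digits on both sides are left alone, which is the behavior the question asks for.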


