Python Currency string conversion to float

This is the list:

x = ["111,222","111.222","111,222.11","111.222,11","111111","111.22"]

I would like to convert these strings correctly without using locale. The conversion has to work with the set above, because the original data set is a mess.

I have tried:

  • regex

  • float("".join(["".join([i for i in list(s)[0:-3] if i not in [".",","]]),"".join(list(s)[-3:]).replace(",",".")]) if list(s)[-3] in [".",","] else "".join(list(s)))

  • r = re.sub('[^0-9]', '', s)
    export = float(r[0:-2]+'.'+r[-2:])

  •    comma = ","
       dot = "."
    
       last_comma_index = s.rfind(comma)
       last_dot_index = s.rfind(dot)
    
       if last_comma_index > last_dot_index:
           last_index = last_comma_index
       else:
           last_index = last_dot_index
    
       before_point = s[:last_index]
    
       no_commas = "".join(before_point.split(comma))
       no_dots = "".join(no_commas.split(dot))
    
       export  = no_dots + dot + s[last_index + 1:]
    
  •        s = "".join(c for c in s if c.isdigit() or c in [",", "."])
    
           if "," in s:
               decimal_sep = ","
               thousands_sep = "."
           else:
               decimal_sep = "."
               thousands_sep = ","
    
           s = s.replace(decimal_sep, ".")
           s = re.sub(f"\\{thousands_sep}(?=[0-9])", "", s)
           export = s
    
  •        def parseNumber(text):
    
           try:
               # First we return None if we don't have something in the text:
               if text is None:
                   return None
               if isinstance(text, int) or isinstance(text, float):
                   return text
               text = text.strip()
               if text == "":
                   return None
               # Next we get the first "[0-9,. ]+":
               n = re.search("-?[0-9]*([,. ]?[0-9]+)+", text).group(0)
               n = n.strip()
               if not re.match(".*[0-9]+.*", text):
                   return None
               # Then we cut to keep only 2 symbols:
               while " " in n and "," in n and "." in n:
                   index = max(n.rfind(','), n.rfind(' '), n.rfind('.'))
                   n = n[0:index]
               n = n.strip()
               # We count the number of symbols:
               symbolsCount = 0
               for current in [" ", ",", "."]:
                   if current in n:
                       symbolsCount += 1
               # If we don't have any symbol, we do nothing:
               if symbolsCount == 0:
                   pass
               # With one symbol:
               elif symbolsCount == 1:
                   # If this is a space, we just remove all:
                   if " " in n:
                       n = n.replace(" ", "")
                   # Else we set it as a "." if one occurence, or remove it:
                   else:
                       theSymbol = "," if "," in n else "."
                       if n.count(theSymbol) > 1:
                           n = n.replace(theSymbol, "")
                       else:
                           n = n.replace(theSymbol, ".")
               else:
                   # Now replace symbols so the right symbol is "." and all left are "":
                   rightSymbolIndex = max(n.rfind(','), n.rfind(' '), n.rfind('.'))
                   rightSymbol = n[rightSymbolIndex:rightSymbolIndex+1]
                   if rightSymbol == " ":
                       return parseNumber(n.replace(" ", "_"))
                   n = n.replace(rightSymbol, "R")
                   leftSymbolIndex = max(n.rfind(','), n.rfind(' '), n.rfind('.'))
                   leftSymbol = n[leftSymbolIndex:leftSymbolIndex+1]
                   n = n.replace(leftSymbol, "L")
                   n = n.replace("L", "")
                   n = n.replace("R", ".")
               # And we cast the text to float or int:
               n = float(n)
    
               if n > 5000000:
                   return 0
               elif n.is_integer():
                   return int(n)
               else:
                   return n
    
           except: pass
    
           return None
    
  • Decimal

  • locale

    • Most of the answers on StackOverflow revolve around locale, but I have to avoid it because the data is messed up.

The result should be like this:

x = [111222,111222,111222.11,111222.11,111111,111.22]
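Two of the attempts above fail in complementary ways when run against this expected result. The sketch below rewrites the regex attempt and the `rfind` attempt as functions (the names `digits_with_two_decimals` and `last_separator_as_decimal` are hypothetical) and prints which inputs each one gets wrong. The first misreads every whole number in the list, and the second misreads the two thousands-only strings.

```python
import re

x = ["111,222", "111.222", "111,222.11", "111.222,11", "111111", "111.22"]
expected = [111222, 111222, 111222.11, 111222.11, 111111, 111.22]


def digits_with_two_decimals(s: str) -> float:
    # Like the regex attempt: drop every non-digit, then assume the
    # last two digits are always the fractional part.
    r = re.sub(r"[^0-9]", "", s)
    return float(r[:-2] + "." + r[-2:])


def last_separator_as_decimal(s: str) -> float:
    # Like the rfind attempt: the rightmost "," or "." is taken as the
    # decimal point and every earlier separator is dropped.
    # The guard for separator-free strings is an addition; without it,
    # s[:-1] and s[0:] would mangle "111111".
    last_index = max(s.rfind(","), s.rfind("."))
    if last_index == -1:
        return float(s)
    whole = s[:last_index].replace(",", "").replace(".", "")
    return float(whole + "." + s[last_index + 1:])


for name, fn in [("two decimals", digits_with_two_decimals),
                 ("last separator", last_separator_as_decimal)]:
    # Collect the inputs whose parsed value differs from the expected one.
    wrong = [raw for raw, want in zip(x, expected) if fn(raw) != want]
    print(f"{name}: misreads {wrong}")
```

Neither rule alone is enough. Whole numbers need the separator read as a thousands mark, while amounts with cents need it read as a decimal point.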

Looking forward to any suggestions.

Solution:

Try replacing the commas with dots so that the separators are all the same, then split on the separator and check if the rightmost chunk is of length 3.

Most currencies use at most two decimal places for their fractional amounts, so if the rightmost chunk has length 3 it is treated as part of the whole number; otherwise it is the fractional part of a float. This assumes the data contains no three-decimal currencies such as the Kuwaiti dinar.

x = ["111,222","111.222","111,222.11","111.222,11","111111","111.22"]

def from_currency(x: str):
    x = x.replace(',', '.')
    if not x.count('.'):
        return int(x)
    *whole_parts, frac = x.split('.')
    if len(frac) == 3:
        return int(''.join([*whole_parts, frac]))
    else:
        whole = ''.join(whole_parts)
        return float(f'{whole}.{frac}')


[from_currency(c) for c in x]
# returns 
[111222,
 111222,
 111222.11,
 111222.11,
 111111,
 111.22]
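If the real data also contains currency symbols, spaces or a leading minus sign (the question only says it is "a mess", so this is an assumption), the same length-3 heuristic can be wrapped in a cleanup step. This is a minimal sketch, not part of the original answer; `from_currency_clean` is a hypothetical name, and it assumes any minus sign is the first non-space character.

```python
import re
from typing import Optional, Union


def from_currency_clean(raw: str) -> Optional[Union[int, float]]:
    """Parse a messy currency string with the rightmost-chunk-length heuristic.

    Assumptions (not from the original answer): a minus sign, if any, is the
    first non-space character, and every character other than digits, ","
    and "." is noise such as a currency symbol or a space.
    """
    s = raw.strip()
    negative = s.startswith("-")
    s = re.sub(r"[^0-9.,]", "", s)
    if not re.search(r"[0-9]", s):
        return None  # nothing numeric left to parse

    # Unify the separators, then look at the rightmost chunk.
    *whole_parts, last = s.replace(",", ".").split(".")
    if not whole_parts or len(last) == 3:
        # No separator at all, or a 3-digit chunk: treat as a whole number.
        value: Union[int, float] = int("".join([*whole_parts, last]))
    else:
        # Otherwise the rightmost chunk is the fractional part.
        value = float("".join(whole_parts) + "." + last)
    return -value if negative else value


x = ["111,222", "111.222", "111,222.11", "111.222,11", "111111", "111.22"]
print([from_currency_clean(c) for c in x])
print(from_currency_clean("-€1.234,56"), from_currency_clean("n/a"))
```

Like the original, this still reads a genuine three-decimal amount such as "1,250" as 1250, since that ambiguity is inherent to the heuristic.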
