
Getting an error that I should not be getting

I am trying to compute a percentage by dividing the values in one column by the values in another column, but I keep getting the same error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-34-60166e8a919c> in <module>()
      6 dataLake = dataLake[['day','Agent','Resolved','Meta','Week','Year']]
      7 #Creating new data (atingimento)
----> 8 dataLake["atingimento"] = ((dataLake.Resolved.astype(int) / dataLake.Meta.astype(int)) * 100)
      9 dataLake['Resolved'] = dataLake.Resolved.astype(int)
     10 dataLake['Meta'] = dataLake.Meta.astype(str)

4 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/dtypes/cast.py in astype_nansafe(arr, dtype, copy, skipna)
    972         # work around NumPy brokenness, #1987
    973         if np.issubdtype(dtype.type, np.integer):
--> 974             return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)
    975 
    976         # if we have a datetime/timedelta array of objects

pandas/_libs/lib.pyx in pandas._libs.lib.astype_intsafe()

ValueError: invalid literal for int() with base 10: ''

I tried converting both columns to int using .astype(int), but it does not work. As you can see from the data below, Google Colab somehow reads the column 'Meta' as a string, even though it appears to be in the same format as the column 'Resolved'.

    day        | Agent              | Resolved | Meta       | Week | Year
-------------------------------------------------------------------------
103 2021-01-26 | Ana Carolina B.    | 107      | 2525252525 | 4    | 2021
104 2021-01-25 | Bárbara D.         | 275      | 3831252128 | 4    | 2021
105 2021-01-25 | Danielly           | 192      | 3831252128 | 4    | 2021
106 2021-01-26 | Felipe Pereira     | 102      | 3125212822 | 4    | 2021
107 2021-01-26 | Fernanda Favalessa | 207      | 3125212822 | 4    | 2021
108 2021-01-25 | Guto R.            | 215      | 3831252114 | 4    | 2021
109 2021-01-25 | Helaine S.         | 253      | 3831252114 | 4    | 2021
110 2021-01-25 | João M.            | 145      | 38252128   | 4    | 2021
111 2021-01-25 | João P.            | 173      | 3535353535 | 4    | 2021
112 2021-01-26 | Livia Azeredo      | 89       | 3125212822 | 4    | 2021
113 2021-01-26 | Lucas Alves        | 70       | 1815101320 | 4    | 2021
114 2021-01-25 | Paula P.           | 137      | 3831252114 | 4    | 2021
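The message "invalid literal for int() with base 10: ''" means at least one cell being converted is an empty string, which may sit in a row not shown above. A minimal sketch that reproduces the failure and locates such rows; the DataFrame df and its values are hypothetical stand-ins for dataLake, not the poster's real data:

```python
import pandas as pd

# Hypothetical stand-in for dataLake: the last 'Meta' cell is an empty string,
# the kind of value the traceback ("invalid literal for int() ... ''") points to.
df = pd.DataFrame({
    "Resolved": ["107", "275", "192"],
    "Meta": ["25", "38", ""],
})

try:
    df["Meta"].astype(int)
    astype_failed = False
except ValueError as exc:
    astype_failed = True
    print("astype(int) failed:", exc)

# Rows whose 'Meta' value cannot be parsed as a number (NaN after coercion)
bad_rows = df[pd.to_numeric(df["Meta"], errors="coerce").isna()]
print(bad_rows)
```

Running the same filter on the real dataLake should show which rows hold the unparseable values.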


Solution:

You might want to use pandas.to_numeric, which can convert invalid values such as the empty string '' to NaN instead of raising (you can then use fillna to substitute a default value if needed):

In place of:

dataLake.Resolved.astype(int)

Use:

import pandas as pd

pd.to_numeric(dataLake['Resolved'], errors='coerce')
# or, to replace invalid values with a default:
pd.to_numeric(dataLake['Resolved'], errors='coerce').fillna(-1)  # -1 if invalid

Do the same for every other occurrence, for example dataLake.Meta.astype(int) on the same line.
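Applied to the line that raised the error, a sketch might look like the following. The small dataLake here is a hypothetical stand-in, and leaving invalid 'Meta' values as NaN (rather than filling them with -1) is an assumption, chosen so the percentage also becomes NaN instead of a misleading negative number:

```python
import pandas as pd

# Hypothetical stand-in for dataLake, with one empty 'Meta' cell
dataLake = pd.DataFrame({
    "Agent": ["Ana Carolina B.", "Bárbara D.", "Danielly"],
    "Resolved": ["107", "275", "192"],
    "Meta": ["25", "", "38"],
})

# Coerce both columns; unparseable strings become NaN instead of raising
resolved = pd.to_numeric(dataLake["Resolved"], errors="coerce")
meta = pd.to_numeric(dataLake["Meta"], errors="coerce")

# NaN propagates through the division, so rows with a missing target stay NaN
dataLake["atingimento"] = resolved / meta * 100
print(dataLake)
```

If 'Meta' can also be 0, the division yields inf rather than an error; replacing 0 with NaN beforehand (meta.replace(0, float("nan"))) is one way to avoid that.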

Example:

pd.to_numeric(pd.Series(['1', '   12  ', '']), errors='coerce')

Output:

0     1.0
1    12.0
2     NaN
dtype: float64
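As the output shows, the coerced result is float64, because NaN is a float value. If you need integers afterwards, two options are sketched below; the fill value 0 is an arbitrary assumption, and the nullable "Int64" dtype requires a reasonably recent pandas:

```python
import pandas as pd

s = pd.Series(["1", "12", ""])

# errors='coerce' turns '' into NaN, which forces a float64 result
as_float = pd.to_numeric(s, errors="coerce")

# Option 1: fill NaN with a default (0 here, an arbitrary choice), then cast
as_int = as_float.fillna(0).astype(int)

# Option 2: keep the missing value using pandas' nullable integer dtype
as_nullable = as_float.astype("Int64")

print(as_int.tolist())
print(as_nullable)
```

Option 1 is closest to the fillna(-1) suggestion above; option 2 keeps missing values distinguishable from real zeros.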