When working with pandas, you often need to use the apply or map functions. In my case, I have a very long (14-hour) processing job to run on my data, and I want to save the DataFrame if any error is raised partway through.
To be more concrete, consider the following code:
import pandas as pd
from math import log

data = pd.DataFrame()
data['x'] = [1, 2, -1, 4, 5]
try:
    data['y'] = data['x'].apply(log)
except Exception:
    data.to_pickle('data.pkl')
    raise
When executed, it raises a ValueError when trying to compute log(-1). I would like to save my already-computed data and then re-raise the exception.
Unfortunately, this does not work. When you try
data_bis = pd.read_pickle('data.pkl')
print(data_bis)
you get only the x column. As far as I understand, pandas first creates a new Series with the computed values and only then attaches it to the DataFrame, so when apply fails the y column never exists.
Do you have any idea how to save the already-computed data before re-raising the exception?
>Solution :
EDIT:
If you have different functions that raise different exceptions, the solution can be the same: run the code in try/except to catch them and replace the result with None, numpy.nan, or another value.
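A minimal sketch of that idea, assuming a small helper (the name `safe` and its parameters are made up for this example, they are not part of pandas) that wraps any function and swaps the listed exceptions for a default value:

```python
import math
import pandas as pd

def safe(func, exceptions=(ValueError,), default=None):
    """Hypothetical helper: return `default` when func raises one of `exceptions`."""
    def wrapper(x):
        try:
            return func(x)
        except exceptions:
            return default
    return wrapper

data = pd.DataFrame({'x': [1, 2, -1, 4, 0]})

# math.log raises ValueError for -1 and for 0
data['log'] = data['x'].apply(safe(math.log, default=float('nan')))

# 1 / int(v) raises ZeroDivisionError for 0
data['inv'] = data['x'].apply(
    safe(lambda v: 1 / int(v), (ZeroDivisionError,), float('nan'))
)
print(data)
```

Listing the expected exceptions explicitly, rather than catching a bare Exception, keeps unexpected bugs visible instead of silently turning them into NaN.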
EDIT 2:
You can also write the data inside this function, appending each new result to the previous results:
import math

def my_log(x):
    try:
        result = math.log(x)
        # append each successful result to the file as soon as it is computed
        with open('column_x.csv', 'a') as f:
            f.write(f"{result}\n")
        return result
    except ValueError:
        return None
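One thing to watch with the version above: failed rows write nothing, so line positions in the file no longer match row positions in the DataFrame. A possible variant, sketched here under the assumption that a plain loop over the column is acceptable, writes the index next to each value so the results can be realigned later. The file name `column_y.csv` is only an example.

```python
import csv
import math
import pandas as pd

data = pd.DataFrame({'x': [1, 2, -1, 4, 5]})

# 'column_y.csv' is a hypothetical file name; mode 'w' starts a fresh file each run
with open('column_y.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['index', 'y'])
    for idx, x in data['x'].items():
        try:
            writer.writerow([idx, math.log(x)])
        except ValueError:
            writer.writerow([idx, ''])  # empty cell, read back as NaN
        f.flush()  # hand each row to the OS right away

# Later, possibly in another session: reload and align on the index
saved = pd.read_csv('column_y.csv', index_col='index')
data['y'] = saved['y']
print(data)
```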
I would rather wrap math.log in its own try/except to catch the problem and return None:
def my_log(x):
    try:
        return log(x)
    except ValueError:
        return None

data['y'] = data['x'].apply(my_log)
Or I would use numpy.log(), because it returns NaN (with a RuntimeWarning) instead of raising an error:
import numpy as np
data['z'] = data['x'].apply(np.log)
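Since np.log is a NumPy ufunc, it can also be called on the whole column without apply. A short sketch, assuming you want to silence the warning for log(-1) and then find which rows failed:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'x': [1, 2, -1, 4, 5]})

# np.errstate temporarily silences the "invalid value" RuntimeWarning
with np.errstate(invalid='ignore'):
    data['z'] = np.log(data['x'])

# Rows where the computation produced NaN can be inspected afterwards
failed = data[data['z'].isna()]
print(failed)
```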
Full working example:
import pandas as pd
import math
import numpy as np

def my_log(x):
    try:
        return math.log(x)
    except ValueError:
        return None

data = pd.DataFrame()
data['x'] = [1,2,-1,4,5]
data['y'] = data['x'].apply(my_log)
data['z'] = data['x'].apply(np.log)
print(data)
Result:
   x         y         z
0  1  0.000000  0.000000
1  2  0.693147  0.693147
2 -1       NaN       NaN
3  4  1.386294  1.386294
4  5  1.609438  1.609438
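If the exception really has to propagate, as in the question, another option is to collect the results yourself so that whatever was computed before the failure still exists when the except block runs. This is only a sketch: `apply_with_checkpoint` is a made-up helper name, and it assumes the results are floats.

```python
import math
import pandas as pd

def apply_with_checkpoint(df, src, dst, func, path):
    """Hypothetical helper: apply func row by row; on error, pickle the partial frame and re-raise."""
    results = {}
    try:
        for idx, value in df[src].items():
            results[idx] = func(value)
    except Exception:
        # Rows that were never reached become NaN through index alignment
        df[dst] = pd.Series(results, dtype='float64')
        df.to_pickle(path)
        raise  # bare raise keeps the original traceback
    df[dst] = pd.Series(results, dtype='float64')
    return df

data = pd.DataFrame({'x': [1, 2, -1, 4, 5]})
try:
    apply_with_checkpoint(data, 'x', 'y', math.log, 'data.pkl')
except ValueError:
    pass  # swallowed here only so the demo can go on to read the pickle

print(pd.read_pickle('data.pkl'))
```

With this input the pickle holds the y values for the first two rows and NaN from the failing row onward, so a later run could resume from the first missing value.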