Saving a pandas DataFrame in the middle of an apply loop

When working with pandas, it is often necessary to use the apply or map functions. In my case, I have a very long (14 hours) processing job to run on my data, and I want to save the DataFrame if any error is raised in the middle.

To be more concrete, consider the following code:

import pandas as pd
from math import log

data = pd.DataFrame()
data['x'] = [1,2,-1,4,5]

try:
    data['y'] = data['x'].apply(log)
except Exception as e:
    data.to_pickle('data.pkl')
    raise  # re-raise the original exception with its traceback

When it is executed, it raises a ValueError when trying to compute log(-1). I would like to save my already-computed data and then re-raise the exception.
Unfortunately, this does not work. When you try

data_bis = pd.read_pickle('data.pkl')
print(data_bis)

You get only the x column. As far as I understand, pandas first creates a new Series with the computed values and only then adds it to the DataFrame, so when the exception is raised inside apply, the assignment to data['y'] never happens.

Do you have any idea how to save the already-computed data before re-raising the exception?

Solution:

EDIT:

If you have different functions that raise different exceptions, the solution can be the same: run the code in a try/except to catch the exception and replace the result with None, numpy.nan, or another value.
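As a minimal sketch of that idea, the wrapper below turns a chosen set of exceptions into a default value. The names `safe` and `inverse_log` are hypothetical and used only for illustration; which exceptions to catch depends on your own functions.

```python
import math

import numpy as np
import pandas as pd


def safe(func, exceptions=(ValueError, ZeroDivisionError), default=np.nan):
    """Wrap func so that the listed exceptions yield `default` instead of propagating."""
    def wrapper(x):
        try:
            return func(x)
        except exceptions:
            return default
    return wrapper


def inverse_log(x):
    # Hypothetical function: ValueError for x <= 0, ZeroDivisionError for x == 1
    return 1 / math.log(x)


data = pd.DataFrame({'x': [1, 2, -1, 4, 5]})
data['y'] = data['x'].apply(safe(inverse_log))
print(data)
```

Rows where the function failed end up as NaN, so they can be found afterwards with data['y'].isna(). Catching only the exceptions you expect, rather than a bare Exception, keeps unrelated bugs visible.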

EDIT 2:

You can also write the data inside this function, appending each new result to the previous results:

import math

def my_log(x):
    try:
        result = math.log(x)
        # Append each successful result to the file as soon as it is computed
        with open('column_x.csv', 'a') as f:
            f.write(f"{result}\n")
        return result
    except ValueError:
        return None
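If the exception really has to propagate, as the question asks, one possible approach is to replace apply with an explicit loop that keeps the results computed so far, saves them on failure, and then re-raises. This is a sketch under assumptions, not something pandas provides: `apply_with_checkpoint`, the column names "x" and "y", and the file name partial.pkl are all hypothetical.

```python
import math
import os
import tempfile

import pandas as pd


def apply_with_checkpoint(series, func, path):
    """Apply func element by element; on error, pickle the rows computed so far and re-raise."""
    results = []
    try:
        for value in series:
            results.append(func(value))
    except Exception:
        done = len(results)  # number of rows computed successfully
        partial = pd.DataFrame(
            {"x": series.iloc[:done].to_numpy(), "y": results},
            index=series.index[:done],
        )
        partial.to_pickle(path)
        raise  # re-raise the original exception with its traceback
    return pd.Series(results, index=series.index)


data = pd.DataFrame({"x": [1, 2, -1, 4, 5]})
path = os.path.join(tempfile.gettempdir(), "partial.pkl")

try:
    data["y"] = apply_with_checkpoint(data["x"], math.log, path)
except ValueError:
    # Caught here only to inspect the saved file; a real job would let it propagate.
    saved = pd.read_pickle(path)
    print(saved)
```

The saved frame holds only the rows processed before the failure, so a later run can resume from the first missing index. For a plain Python function such as math.log, the per-element cost of this loop should be of the same order as apply, though that is worth measuring on your own data.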

I would rather wrap math.log in its own try/except to catch the problem and return None:

def my_log(x):
    try:
        return log(x)
    except ValueError:
        return None    

data['y'] = data['x'].apply(my_log)

Or I would use numpy.log(), because it returns NaN (with a RuntimeWarning) instead of raising an error:

import numpy as np

data['z'] = data['x'].apply(np.log)
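Since np.log is a NumPy ufunc, it can also be called on the whole Series directly. The sketch below does that and uses np.errstate to silence the "invalid value" warning for this one computation; whether to hide the warning is a judgment call for your own pipeline.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'x': [1, 2, -1, 4, 5]})

# np.log accepts the whole Series; errstate suppresses the
# "invalid value encountered in log" RuntimeWarning inside this block only.
with np.errstate(invalid='ignore'):
    data['z'] = np.log(data['x'])

print(data)
```

Negative inputs still become NaN, so they remain easy to locate afterwards with data['z'].isna().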

Full working example:

import pandas as pd
import math
import numpy as np

def my_log(x):
    try:
        return math.log(x)
    except ValueError:
        return None    

data = pd.DataFrame()
data['x'] = [1,2,-1,4,5]
    
data['y'] = data['x'].apply(my_log)
data['z'] = data['x'].apply(np.log)

print(data)

Result:

   x         y         z
0  1  0.000000  0.000000
1  2  0.693147  0.693147
2 -1       NaN       NaN
3  4  1.386294  1.386294
4  5  1.609438  1.609438