Recursion error: Dataprep function not working post cleaning data

I am using the dataprep API and get a recursion error when I call its functions in Google Colab. Oddly, it works fine on the uncleaned data with 144 features. But once I reduce the data to 20 features and clean the missing values, I get a recursion error.

Code:

df.isna().sum()
Output:
rade                      0
sub_grade                 0
emp_length                0
home_ownership            0
annual_inc                0
verification_status       0
loan_status               0
purpose                   0
dti                       0
delinq_2yrs               0
inq_last_6mths            0
mths_since_last_delinq    0
open_acc                  0
pub_rec                   0
revol_bal                 0
revol_util                0
total_acc                 0
recoveries                0
pub_rec_bankruptcies      0
tax_liens                 0
dtype: int64

import sys
sys.setrecursionlimit(15000)

from dataprep.eda import create_report, plot, plot_correlation

create_report(df)

Error:

---------------------------------------------------------------------------
RecursionError                            Traceback (most recent call last)
<ipython-input-55-463fb2fdfb17> in <module>
----> 1 create_report(df)

33 frames
... last 10 frames repeated, from the frame below ...

/usr/local/lib/python3.8/dist-packages/pandas/core/series.py in __repr__(self)
   1463         show_dimensions = get_option("display.show_dimensions")
   1464 
-> 1465         self.to_string(
   1466             buf=buf,
   1467             name=self.name,

RecursionError: maximum recursion depth exceeded

Following the advice of the first answer, I was able to go through one series at a time and it looks like this code is causing the issue. How can this be written better?

# these columns will take the median value for fillna
median_fill = ['emp_length','annual_inc','open_acc','pub_rec','open_acc','revol_util','total_acc']
for med in median_fill:
  df[med].fillna(df[med].median,inplace=True)

Solution:

You omitted important details from the stack trace.

But if I had to guess, here’s what’s happening.

Something in create_report wound up calling repr(foo),
where foo is a complex custom object.

In the course of computing self.to_string( ... )
we wound up accidentally calling either to_string or repr(foo) again.
Essentially a while True: loop.
So .setrecursionlimit() won’t help.
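
As a rough illustration of such a loop, here is a minimal sketch in plain Python. The Column class is a hypothetical stand-in for a pandas Series, not the pandas implementation. It assumes the loop arises because the object ends up storing one of its own bound methods: the repr of a bound method includes the repr of the object it belongs to, so printing the object never terminates.

```python
class Column:
    """Hypothetical stand-in for a pandas Series (not the real class)."""

    def __init__(self, values):
        self.values = list(values)

    def median(self):
        ordered = sorted(self.values)
        mid = len(ordered) // 2
        if len(ordered) % 2:
            return ordered[mid]
        return (ordered[mid - 1] + ordered[mid]) / 2

    def __repr__(self):
        # Printing the column prints the repr of every stored value.
        return "Column([" + ", ".join([repr(v) for v in self.values]) + "])"


col = Column([1.0, 3.0])

# Bug: the method object itself is stored, not the number it returns.
col.values.append(col.median)

try:
    repr(col)  # repr(col) -> repr(col.median) -> repr(col) -> ...
    outcome = "printed fine"
except RecursionError:
    outcome = "RecursionError"

# Calling the method stores a plain number, so repr terminates.
fixed = Column([1.0, 3.0])
fixed.values.append(fixed.median())
fixed_repr = repr(fixed)

print(outcome)
print(fixed_repr)
```

Because each repr call triggers another one, a higher recursion limit only delays the failure.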


You want to understand what foo is all about,
in order to properly diagnose the root cause
and then fix this.

Start with a simpler report,
and build up to the point where you trigger the error.


EDIT

You wrote

  df[med].fillna(df[med].median, inplace=True)

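Don’t do that. Without parentheses, df[med].median is the bound method, not the median value, so fillna stores the method object itself in every missing cell. That is most likely what sends repr into a loop: the repr of a bound method includes the repr of its Series, which again contains the method. Call the method, and rather than inplace, prefer plain assignment:

  df[med] = df[med].fillna(df[med].median())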
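
A minimal sketch of the fill step without the loop, assuming the intent is to replace missing values with each column's median. The sample data is made up for illustration; only the column names come from the question. Note the parentheses on median(): they call the method, so the fill values are numbers rather than the method object.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "emp_length": [1.0, np.nan, 3.0, 10.0],
    "annual_inc": [40000.0, 52000.0, np.nan, 61000.0],
})

median_fill = ["emp_length", "annual_inc"]

# median() is called here, so this is a Series of numbers, one per column.
medians = df[median_fill].median()

# DataFrame.fillna matches the Series index against column labels,
# so each column is filled with its own median.
df[median_fill] = df[median_fill].fillna(medians)

print(df)
```

Running df.dtypes afterwards is a quick check: the filled columns should still be numeric, not object.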