
Replace data in the whole DataFrame based on a condition

I want to replace every element in a pandas DataFrame (all columns and all rows) with an empty string if it contains a question mark. I am curious what the best solution for this is.

What I thought of is to write a loop like this:

def modify_dataframe_line_by_line(df) -> None:
    for index, record in df.iterrows():
        for colname in df.columns:
            # iterrows() may yield copies of rows, so write back through df.at
            if "?" in str(record[colname]):
                df.at[index, colname] = ""

It works, but I assume this will be slow as hell with larger datasets.


I also tried this, but it does not work:

def df_loc_replace(df) -> None:

    for colname in df.columns.tolist():
        df.loc["?" in df[colname], colname] = ""
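The `.loc` attempt fails because `"?" in df[colname]` asks whether `"?"` is a label in the Series *index*, so it yields a single `True`/`False` rather than one boolean per row. A per-row mask from `Series.str.contains` makes the same `.loc` pattern work. A minimal sketch, assuming the cells are text (the `astype(str)` guard keeps non-string columns from breaking the `.str` accessor); the sample frame is made up for illustration:

```python
import pandas as pd


def df_loc_replace(df: pd.DataFrame) -> None:
    """Blank out, in place, every cell whose text contains a '?'."""
    for colname in df.columns:
        # One boolean per row: does this cell's text contain a literal '?'?
        # regex=False avoids escaping '?', which is a regex metacharacter.
        has_question = df[colname].astype(str).str.contains("?", regex=False)
        df.loc[has_question, colname] = ""


df = pd.DataFrame({"a": ["x?", "y"], "b": ["ok", "what?"]})
df_loc_replace(df)
print(df)
```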

I also tried df.replace(), but I did not find a way to add a condition to it (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.replace.html).
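`DataFrame.replace` has no separate condition argument, but with `regex=True` the pattern itself can act as the condition. Regex replacement substitutes only the matched part of a string, so the pattern has to span the whole cell whenever it contains a `?`. A sketch under that assumption; the pattern `(?s).*\?.*` and the sample frame are illustrative choices, not taken from the question:

```python
import pandas as pd

df = pd.DataFrame({"a": ["x?", "y"], "b": ["ok", "what?"], "n": [1, 2]})

# (?s) lets '.' also match newlines, so '.*\?.*' covers the entire cell
# whenever it contains a literal '?'; that whole match is replaced by ''.
# Non-string cells (like the integers in column "n") are left untouched.
cleaned = df.replace(r"(?s).*\?.*", "", regex=True)
print(cleaned)
```

`replace` returns a new DataFrame here, so `df` itself is unchanged unless you reassign it.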

What is the best solution to this?

Solution:

Try this:

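import numpy as np

# Assumes the columns hold strings: regex=False matches a literal '?',
# and na=False leaves missing values (NaN) as they are.
for colname in df.columns:
    df[colname] = np.where(df[colname].str.contains('?', regex=False, na=False), '', df[colname])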
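The same idea can also be written without reassigning columns one at a time: build one boolean frame with `True` for every cell whose text contains `?`, then let `DataFrame.mask` put `''` in those cells. A sketch, assuming a mixed frame like the made-up sample below; `astype(str)` is there so numeric or missing cells do not break the `.str` accessor. It still calls `str.contains` once per column internally, so treat it as a tidier form of the loop rather than a guaranteed speed-up.

```python
import pandas as pd

df = pd.DataFrame({"a": ["x?", "y"], "b": ["ok", "what?"], "n": [1, 2]})

# Boolean frame of the same shape: True where the cell's text contains '?'.
has_question = df.astype(str).apply(lambda col: col.str.contains("?", regex=False))

# mask() keeps cells where the condition is False and puts '' where it is True.
df = df.mask(has_question, "")
print(df)
```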