Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pandas: faster string operations in dataframes

I am working on a python script that read data from a database and save this data into a .csv file.
In order to save it correctly I need to escape different characters such as \r\n or \n.
Here is how I am currently doing it:

Firstly, I use the read_sql pandas function in order to read the data from the database.

import pandas as pd

df = pd.read_sql(
    sql = 'SELECT * FROM exampleTable',
    con = SQLAlchemyConnection
)

The table I get has different types of values.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Then, the script updates the dataframe obtained changing every string value to raw string.
In order to achive that I use two nested for loops in order to operate with every single value.

def update_df(df)
    for rowIndex, row in df.iterrows():
        for colIndex, values in row.items():
           if isinstance(df[rowIndex, colIndex], str):
               df.at[rowIndex, colIndex] = repr(df.at[rowIndex, colIndex])
    return df

However, the amount of data I need to elaborate is large (more than 1 million rows with more than 100 columns) and it takes hours.

What I need is a way to create the csv file in a faster way.

Thank you in advance.

>Solution :

It should be faster to use applymap if really you have mixed types:

df = df.applymap(lambda x: repr(x) if isinstance(x, str) else x)

However, if you can identify string columns, then you can slice them, (maybe in combination with re.escape?).:

import re
str_cols = ['col1', 'col2']
df[str_cols] = df[str_cols].applymap(re.escape)
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading