Can we force Python/Pandas to flush to disk immediately?

October 5, 2022

I have a setup where a Python script (let’s call it test1.py) spawns a subprocess that executes test2.py. In test2.py, I have some pandas operations that ultimately build a dataframe, test. The final step in test2.py is saving the dataframe to CSV (test.to_csv('my_path')). On completion of test2.py, test1.py continues execution, and the next step is to load the CSV file that was just created (i.e., test = pd.read_csv('my_path')).

Now, the issue is that Python is not flushing the buffer to disk, so when test1.py goes to read the CSV file, I get a FileNotFoundError. Of course, if I stop the script, the file is saved to disk. Is there a way to force pandas to flush to disk immediately? I’ve read about using file.flush() and os.fsync(fd), but these don’t seem to apply to my case since I’m not dealing with any file descriptors.

EDIT: Added a (significantly) simplified example

test1.py looks something like:

import subprocess

import pandas as pd


def main():
    cmd = ['python3', 'test2.py']
    output_bytes = subprocess.check_output(cmd, stderr=subprocess.STDOUT, timeout=900)
    output = output_bytes.decode('utf-8')
    # test2.py finished, so I want to read the csv
    df = pd.read_csv('my_path')


if __name__ == '__main__':
    main()

test2.py looks something like:

import pandas as pd
import numpy as np

def main():
    df = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list('ABCD'))
    df.to_csv('my_path')

if __name__ == '__main__':
    main()

Solution:

but this don’t seem to apply to my case since I’m not dealing with any
file descriptors.

You do not have to pass a filename as the first argument to .to_csv. As the pandas.DataFrame.to_csv docs say, you may instead pass a

file-like object implementing a write() function.

Therefore, you can do something like this:

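import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3]})
f = open("file.csv", "w", newline="")  # text mode, no newline translation
df.to_csv(f)  # pandas writes through the file object's write()
f.flush()     # push Python's buffer to the operating system
f.close()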

Note that if you open the file in text (non-binary) mode, you need to pass newline="" to disable universal newlines.
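Since the question also mentions os.fsync(fd), here is a minimal sketch of how the same approach could be extended to call it. Owning the file object gives you a descriptor via f.fileno(). The helper name write_csv_durably and the temporary path are hypothetical, chosen only for illustration; this is not something pandas provides itself.

```python
import os
import tempfile

import pandas as pd


def write_csv_durably(df, path):
    """Write df to path, flush Python's buffer, then ask the OS to sync.

    Hypothetical helper for illustration; not part of pandas.
    """
    # newline="" keeps the text layer from translating the line endings
    # that the CSV writer already emits.
    with open(path, "w", newline="") as f:
        df.to_csv(f, index=False)
        f.flush()             # push Python's userspace buffer to the OS
        os.fsync(f.fileno())  # ask the OS to commit its cache to disk


if __name__ == "__main__":
    df = pd.DataFrame({"x": [1, 2, 3]})
    path = os.path.join(tempfile.mkdtemp(), "file.csv")
    write_csv_durably(df, path)
    # Read the file back to confirm that it exists and round-trips.
    print(pd.read_csv(path).equals(df))
```

The with block closes the file even if to_csv raises. Closing already flushes Python's buffer, so the explicit flush() is there only because os.fsync should run after the data has been handed to the operating system.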

subprocess

byMR

Published October 05, 2022
