Multi-threading improvement when reading in CSV using Julia

I'm wondering whether the timings below make sense. The Data.csv file is about 8 GB, and my laptop has 64 GB of RAM and 12 threads. Is this the kind of improvement I should expect from multi-threading, or is there something else I should do here?

@time CSV.read(raw"Data.csv", DataFrame, ntasks=1); # one thread

139.160430 seconds

@time CSV.read(raw"Data.csv", DataFrame, ntasks=8); # 8 threads

113.964781 seconds

@time CSV.read(raw"Data.csv", DataFrame, ntasks=12); # 12 threads

112.279668 seconds

As shown above, I tried different ntasks= values to vary how many threads are used. I am new to multi-threading, so I am trying to get a sense of how much improvement I should expect.

Solution:

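When reading a large file from disk, the bottleneck is typically disk I/O rather than the CPU, so adding threads does little for that part of the work. The modest speedup you see (from about 139 seconds to about 112 seconds, roughly 19%) most likely comes from parsing the text in parallel.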
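One way to check whether the disk really is the limit is to time a plain read of the file's bytes, with no parsing at all, and compare that with the CSV.read timings. The snippet below is a minimal sketch using only Julia's standard library; raw_read_seconds is a hypothetical helper name, not part of CSV.jl, and it assumes the whole file fits in RAM (8 GB on a 64 GB machine should). It also prints how many threads the Julia session was started with, since a session started with a single thread cannot parse in parallel no matter what ntasks is set to.

```julia
# Hypothetical helper (not part of CSV.jl): time a raw read of the whole file,
# with no parsing, to estimate the disk-bound portion of the load.
function raw_read_seconds(path::AbstractString)
    t0 = time()
    bytes = read(path)              # reads the entire file into a Vector{UInt8}
    seconds = time() - t0
    return (seconds = seconds, nbytes = length(bytes))
end

# Threads available to this session (set with `julia -t N` or JULIA_NUM_THREADS).
println("Julia threads available: ", Threads.nthreads())

path = "Data.csv"                   # file name taken from the question
if isfile(path)
    r = raw_read_seconds(path)
    gb_per_s = r.nbytes / r.seconds / 1e9
    println("Raw read: ", round(r.seconds; digits = 2), " s for ",
            r.nbytes, " bytes (", round(gb_per_s; digits = 2), " GB/s)")
end
```

If the raw read takes close to the 112 seconds measured above, the disk is the limit and more tasks will not help. If it is much faster, the remaining time is spent parsing and building the DataFrame. Two caveats: the operating system may cache the file after the first read, so a repeated measurement can look faster than a cold one, and the first @time of CSV.read in a fresh session also includes compilation time.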
