Read a .txt file into a pandas DataFrame when the number of spaces separating values varies

The script below reads a txt file and creates a DataFrame. I want the ‘sep’ argument to handle values that may be separated by one or more spaces, but when I run the script as written I get many columns filled with NaN.

Code:

df = pd.read_csv(data_file, header=None, sep=' ')

Example txt file:

blah blahh    bl
blah3 blahhe      ble

I want just 3 columns, so that I get:

Col_a  Col_b   Col_c
blah   blahh   bl
blah3  blahhe  ble

Solution:

You can use a regex as the delimiter, so that any run of whitespace counts as a single separator:

pd.read_csv(data_file, header=None, delimiter=r"\s+", names='Col_a Col_b Col_c'.split(' '))

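Or you can use the delim_whitespace=True argument, which the pandas documentation describes as equivalent to sep=r"\s+". Note that delim_whitespace is deprecated since pandas 2.2 in favor of sep=r"\s+", so prefer the first form on recent versions: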

pd.read_csv(data_file, header=None, delim_whitespace=True, names='Col_a Col_b Col_c'.split(' '))
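To try this without a file on disk, here is a minimal sketch that feeds the two sample rows from the question through io.StringIO, which stands in for data_file. The sample string and the variable names are only illustrative assumptions; the read_csv arguments are the same ones used above.

```python
import io

import pandas as pd

# Two sample rows from the question, with uneven runs of spaces.
# io.StringIO is a stand-in for data_file (an assumption for this demo).
sample = "blah blahh    bl\nblah3 blahhe      ble\n"

# sep=r"\s+" treats any run of whitespace as one delimiter,
# so no empty (NaN) columns are created between values.
df = pd.read_csv(
    io.StringIO(sample),
    header=None,
    sep=r"\s+",
    names=["Col_a", "Col_b", "Col_c"],
)

print(df)
```

The result should have exactly three columns, one per value in each row, regardless of how many spaces separate them.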

Reference: How to read file with space separated values in pandas
