Read .txt as dataframe with pandas

I am trying to read a text file into a dataframe. Among other lines, the file contains the following:

DE  01945   Ruhland Brandenburg BB      00  Landkreis Oberspreewald-Lausitz 12066   51.4576 13.8664 4
DE  01945   Tettau  Brandenburg BB      00  Landkreis Oberspreewald-Lausitz 12066   51.4333 13.7333 4
DE  01945   Grünewald   Brandenburg BB      00  Landkreis Oberspreewald-Lausitz 12066   51.4    14  4
DE  01945   Guteborn    Brandenburg BB      00  Landkreis Oberspreewald-Lausitz 12066   51.4167 13.9333 4
DE  01945   Kroppen Brandenburg BB      00  Landkreis Oberspreewald-Lausitz 12066   51.3833 13.8    4
DE  01945   Schwarzbach Brandenburg BB      00  Landkreis Oberspreewald-Lausitz 12066   51.45   13.9333 4
DE  01945   Hohenbocka  Brandenburg BB      00  Landkreis Oberspreewald-Lausitz 12066   51.431  14.0098 4
DE  01945   Lindenau    Brandenburg BB      00  Landkreis Oberspreewald-Lausitz 12066   51.4    13.7333 4
DE  01945   Hermsdorf   Brandenburg BB      00  Landkreis Oberspreewald-Lausitz 12066   51.4055 13.8937 4
DE  01968   Senftenberg Brandenburg BB      00  Landkreis Oberspreewald-Lausitz 12066   51.5252 14.0016 4
DE  01968   Schipkau Hörlitz    Brandenburg BB      00  Landkreis Oberspreewald-Lausitz 12066   51.5299 13.9508 
DE  01968   Schipkau    Brandenburg BB      00  Landkreis Oberspreewald-Lausitz 12066   51.5456 13.9121 4
DE  01979   Lauchhammer Brandenburg BB      00  Landkreis Oberspreewald-Lausitz 12066   51.4881 13.7662 4

My code looks like this.

import pandas as pd

data = pd.read_csv('DE.txt', sep=" ", header=None)

Currently I am getting the following error that I can’t get past:

ParserError: Error tokenizing data. C error: Expected 2 fields in line 11, saw 3

I think this is due to the two-part city name (Schipkau Hörlitz). How can I read the text file correctly?

>Solution :

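One way is to read the file yourself, split each line into its fields, collect the result in a dictionary, and then create the dataframe from it.

import pandas as pd

# Assumes DE.txt is UTF-8 and its fields are tab-separated; adjust both if your file differs.
with open("DE.txt", "r", encoding="utf-8") as file:
    lines = file.readlines()

rows = {}  # avoid the name "dict", which would shadow the built-in type
for number, line in enumerate(lines):
    # Store the fields of each line under its line number; adapt this to the structure you need.
    rows[number] = line.rstrip("\n").split("\t")
df = pd.DataFrame.from_dict(rows, orient="index")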

Alternatively, you can build a two-dimensional list (one inner list per line) instead of a dictionary, if that is easier for you, and create the dataframe in the same way.
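The two-dimensional list variant could look roughly like the sketch below. It assumes the fields are separated by tabs, which is a guess based on the error message (splitting on single spaces yields 2 fields for most lines and 3 for the "Schipkau Hörlitz" line). The helper name `parse_lines` and the shortened sample lines are made up for illustration.

```python
import pandas as pd


def parse_lines(lines, sep="\t"):
    """Split each non-empty line on `sep` and return a list of field lists."""
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        rows.append(line.split(sep))
    return rows


# Two shortened sample lines in the assumed tab-separated layout.
sample = [
    "DE\t01968\tSenftenberg\tBrandenburg\tBB\n",
    "DE\t01968\tSchipkau Hörlitz\tBrandenburg\tBB\n",
]

# Each inner list becomes one row; all values stay strings.
df = pd.DataFrame(parse_lines(sample))
print(df)
```

For the real file you would pass the open file object instead of `sample`, for example `parse_lines(open("DE.txt", encoding="utf-8"))`. Because every value stays a string, postal codes such as 01968 keep their leading zero. If the file really is tab-separated, `pd.read_csv("DE.txt", sep="\t", header=None, dtype=str)` may do the same job in one call.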
