Reading LAS file using lasio library doesn't handle "-" in datetime column

I have a LAS file that I am trying to read in Python with the lasio library. One of the columns is TIME, which is in the following format: 00:00:00.22-04-23

Sample of data copied from las file:

TIME               col1 col2
00:00:00.22-06-23  1010  20
00:00:05.22-06-23  1020  25
00:00:10.22-06-23  1015  32

My code to read the data:

import lasio

df = lasio.read(file_path).df().reset_index()

This returns the df in the following format:

TIME         col1  col2  UNKNOWN:1  UNKNOWN:2
00:00:00.22   -06   -23       1010         20
00:00:05.22   -06   -23       1020         25
00:00:10.22   -06   -23       1015         32

As you can see, my TIME value has been split at every "-" and spread across the TIME, col1 and col2 columns. The actual col1 and col2 values have been shifted into UNKNOWN:1 and UNKNOWN:2 (lasio probably creates these columns while reading). I need the TIME column returned in its original form, with col1 and col2 left in place, so that I can strip, split and manipulate TIME using pandas once it is read into a dataframe.

Any advice is appreciated.

Solution:

You can try reading the file with pd.read_csv and a whitespace delimiter, so that the "-" characters inside TIME are left alone. For example:

import pandas as pd

# Split columns on runs of whitespace only
df = pd.read_csv('your_file.txt', sep=r"\s+", engine="python")
print(df)

Prints:

                TIME  col1  col2
0  00:00:00.22-06-23  1010    20
1  00:00:05.22-06-23  1020    25
2  00:00:10.22-06-23  1015    32
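
Once TIME arrives as a single string column, it can be converted to a proper timestamp. The following is only a sketch: it assumes the value is laid out as HH:MM:SS.DD-MM-YY (time of day, a dot, then day-month-year), which the question does not confirm, and the TIMESTAMP column name is made up for illustration.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame(
    {
        "TIME": ["00:00:00.22-06-23", "00:00:05.22-06-23", "00:00:10.22-06-23"],
        "col1": [1010, 1020, 1015],
        "col2": [20, 25, 32],
    }
)

# ASSUMPTION: TIME is HH:MM:SS.DD-MM-YY; adjust the format string if the
# date part is ordered differently in your file
df["TIMESTAMP"] = pd.to_datetime(df["TIME"], format="%H:%M:%S.%d-%m-%y")

print(df[["TIME", "TIMESTAMP"]])
```

If the assumption about the layout is wrong, pd.to_datetime will either raise an error or produce dates that look off, so it is worth checking a few rows against the source file.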

EDIT: With the updated file, strip everything up to and including the ~A marker first, then parse the rest:

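import re
import pandas as pd
from io import StringIO

with open('your_file.txt', 'r') as f_in:
    # Drop everything up to and including the last "~A" marker,
    # so that only the column names and the data rows remain
    data = re.sub(r'\A.*~A', '', f_in.read(), count=1, flags=re.S)

# Parse the remaining text, splitting on runs of whitespace
df = pd.read_csv(StringIO(data), sep=r"\s+", engine="python")

print(df)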

Prints:

                TIME     col1  col2   col3
0  00:00:00.23-04-23  1977.47   160  160.5
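
For experimenting without a file on disk, here is a self-contained variant of the same idea that locates the data section line by line instead of with a regex. It is a sketch under assumptions: the SAMPLE text and the read_data_section helper are made up for illustration, and it assumes the column names sit on the same line as the ~A marker.

```python
from io import StringIO

import pandas as pd

# Hypothetical LAS-like text; the header sections are invented for this demo
SAMPLE = """\
~Version
VERS. 2.0 : CWLS LOG ASCII STANDARD
~Curve
TIME . : time stamp
col1 . : first curve
col2 . : second curve
~A  TIME  col1  col2
00:00:00.22-06-23  1010  20
00:00:05.22-06-23  1020  25
00:00:10.22-06-23  1015  32
"""


def read_data_section(text):
    """Return the rows after the '~A' line as a DataFrame (hypothetical helper)."""
    lines = text.splitlines()
    # Index of the first line that opens the data section
    start = next(
        i for i, line in enumerate(lines) if line.lstrip().upper().startswith("~A")
    )
    # ASSUMPTION: the column names follow the marker on the same line
    names = lines[start].split()[1:]
    body = "\n".join(lines[start + 1:])
    # Split on whitespace only, so the "-" inside TIME is left untouched
    return pd.read_csv(StringIO(body), sep=r"\s+", header=None, names=names)


df = read_data_section(SAMPLE)
print(df)
```

If your file puts the column names somewhere else (for example only in the ~Curve section), pass them to names yourself instead of taking them from the ~A line.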