I have a LAS file that I am trying to read in Python with the lasio library. One of the columns is TIME, which has the following format: 00:00:00.22-04-23
Sample of the data copied from the LAS file:
TIME col1 col2
00:00:00.22-06-23 1010 20
00:00:05.22-06-23 1020 25
00:00:10.22-06-23 1015 32
My code to read the data:
df = lasio.read(file_path).df().reset_index()
This returns the df in the following format:
TIME col1 col2 UNKNOWN:1 UNKNOWN:2
00:00:00.22 -06 -23 1010 20
00:00:05.22 -06 -23 1020 25
00:00:10.22 -06 -23 1015 32
As you can see, my TIME column has been split into three columns at every hyphen (-). The values of col1 and col2 have been shifted into UNKNOWN:1 and UNKNOWN:2 (these columns are probably created by lasio during reading). I need the TIME column returned in its original form, without shifting the values of col1 and col2, so that I can strip, split and manipulate TIME using pandas once it is read into a dataframe.
Any advice is appreciated.
Solution:
You can try pd.read_csv with a whitespace delimiter instead, so that the hyphens inside TIME are left alone. For example:
import pandas as pd
df = pd.read_csv('your_file.txt', sep=r"\s+", engine="python")
print(df)
Prints:
TIME col1 col2
0 00:00:00.22-06-23 1010 20
1 00:00:05.22-06-23 1020 25
2 00:00:10.22-06-23 1015 32
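Once TIME arrives as a single string column, it can be converted to a real timestamp. The sketch below assumes the value is a time of day (HH:MM:SS) followed by a dot and a date in DD-MM-YY order; that reading is a guess from the sample rows, so adjust the format string if your file means something else. The dataframe is rebuilt by hand here only to keep the example self-contained.

```python
import pandas as pd

# Sample rows copied from the question; in practice df comes from pd.read_csv.
df = pd.DataFrame({
    "TIME": ["00:00:00.22-06-23", "00:00:05.22-06-23", "00:00:10.22-06-23"],
    "col1": [1010, 1020, 1015],
    "col2": [20, 25, 32],
})

# Assumption: TIME is HH:MM:SS, a dot, then DD-MM-YY.
df["TIME"] = pd.to_datetime(df["TIME"], format="%H:%M:%S.%d-%m-%y")
print(df)
```

If the date part is not needed, df["TIME"].dt.time keeps only the time of day.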
EDIT: With the updated file, strip everything up to and including the ~A marker before parsing:
import re
import pandas as pd
from io import StringIO
with open('your_file.txt', 'r') as f_in:
    # drop everything from the start of the file up to and including "~A"
    data = re.sub(r'\A.*~A', '', f_in.read(), count=1, flags=re.S)
df = pd.read_csv(StringIO(data), sep=r"\s+", engine="python")
print(df)
Prints:
TIME col1 col2 col3
0 00:00:00.23-04-23 1977.47 160 160.5
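To see what the re.sub step leaves behind, here is a small sketch run on an in-memory string instead of a file. The LAS content in it is hypothetical and heavily shortened (real header sections are longer); only the ~A line followed by the data rows matters.

```python
import re

# Hypothetical, shortened LAS content used only for illustration.
raw = """~Version
VERS. 2.0 : CWLS LOG ASCII STANDARD
~Curve
TIME. : time stamp
col1. : first curve
col2. : second curve
~A TIME col1 col2
00:00:00.22-06-23 1010 20
00:00:05.22-06-23 1020 25
"""

# \A anchors at the start of the string; with re.S the greedy .* also
# spans newlines, so everything up to and including the last "~A" goes.
data = re.sub(r"\A.*~A", "", raw, count=1, flags=re.S)
print(data)
```

What remains is the column-name line followed by the data rows, which is what pd.read_csv expects. Note that the pattern removes only the two characters ~A from that line: if your file spells the marker as ~ASCII, the leftover "SCII" would show up as an extra column name, so the pattern would probably need to be extended to match the whole word.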