I have the following csv file which I am reading using pandas dataframe:
Timestamp, UTC, id, loc, spd
001, 12z, q20, "52, 13", 320
002, 13z, a32, "53, 12", 321
003, 14z, q32, "54, 11", 321
004, 15`, a43, "55, 10", 330
I am extracting the data as follows:
import pandas as pd
import matplotlib.pyplot as plt
fname = "data.csv"
data = pd.read_csv(fname,sep=",", header=None, skiprows=1)
data.columns = ["Timestamp", "UTC", "Callsign", "Position", "Speed", "Direction"]
t = data["Timestamp"]
utc = data["UTC"]
acid = data["Callsign"]
pos = data["Position"]
spd = ["Speed"]
However, for the position column, this results in 2 values per row in this column. I would like to have the first value of position in a separate list as well as the second value in a separate list as follows:
latitude = [52,53,54,55]
longitude = [13,12,11,10]
How do I select this using the pandas dataframe?
>Solution :
We can use str.extract here, followed by a cast:
data[["lat", "lng"]] = data["Position"].str.extract(r'(-?\d+(?:\.\d+)?),\s*(-?\d+(?:\.\d+)?)')
data["lat"] = data["lat"].astype(float)
data["lng"] = data["lng"].astype(float)