I have a dataframe containing a column that has a list of dates stored as strings:
# sample dataframe
data = [[1, ["2019-08-02 08:30:56"]], [2, ["2020-08-02 08:30:56"]]]
df = pd.DataFrame(data, columns=["items", "dates"])
df["dates"] = df["dates"].astype(str)
df
items dates
0 1 ['2019-08-02 08:30:56']
1 2 ['2020-08-02 08:30:56']
I would like to do several things:
- Convert from a list to a string.
- Convert the string to a date.
- Eliminate the time stamp.
So that the final dataset would look like this:
items dates
0 1 2019-08-02
1 2 2020-08-02
I am able to remove the list brackets by doing this:
df["dates_2"] = df["dates"].apply(lambda x: x[1:-1])
But I am wondering if there is a better way to do all of these things in one step?
>Solution :
Another solution is to apply ast.literal_eval to convert the column to list. Then .explode() it, convert to datetime and get a date part:
from ast import literal_eval
df["dates"] = df["dates"].apply(literal_eval)
df = df.explode("dates")
df["dates"] = pd.to_datetime(df["dates"]).dt.date
print(df)
Prints:
items dates
0 1 2019-08-02
1 2 2020-08-02