Pandas remove characters from a column of strings

I have a dataframe with a Date column consisting of strings in this format. I need to strip the end of each string so that I can convert it to a datetime object.

"20231101 05:00:00 America/New_York"
"20231101 06:00:00 America/New_York"

I have tried these approaches unsuccessfully.

df['Date'] = df['Date'].replace('^.*\]\s*', '', regex=True)
df['Date'] = df['Date'].str.strip(' America/New_York')
df['Date'] = df['Date'].map(lambda x: x.rstrip(' America/NewYork'))

as well as a couple of others based on my searches. Is there an easy way to do this, or should I write a function that slices each string to its first 17 characters and assigns the result back to the df? Note that the string could also be of the form '20231101 05:00:00 America/Central'.
Thanks for any and all assistance.

>Solution :

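  • If you want to remove a particular suffix, then I recommend str.removesuffix (available in pandas 1.4 and later) rather than str.strip. str.strip treats its argument as a set of characters to remove from both ends, not as a literal substring.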
  • Notice that you sometimes write New_York with an underscore and sometimes NewYork without an underscore. If you ask to remove 'NewYork' then 'New_York' won’t be removed.
  • After the edit in your question, the suffixes all start with ' America' but differ afterwards; in this case you could use str.split(' America').str[0] to keep everything before ' America'.
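
To illustrate the first two points, here is a minimal sketch using plain Python strings; the pandas `.str.strip`, `.str.rstrip`, and `.str.removesuffix` accessors apply the same semantics to each element. The variable names are only for illustration.

```python
value = "20231101 05:00:00 America/New_York"

# rstrip takes a set of characters, not a literal suffix. The argument below
# has no "_", so stripping stops as soon as it reaches the underscore.
stripped = value.rstrip(" America/NewYork")
print(repr(stripped))  # '20231101 05:00:00 America/New_'

# removesuffix (Python 3.9+) removes the exact suffix, and only if present.
exact = value.removesuffix(" America/New_York")
print(repr(exact))  # '20231101 05:00:00'

# Character-set stripping does nothing for a different zone name here,
# because the trailing "l" of "Central" is not in the character set.
other = "20231101 07:00:00 America/Central".strip(" America/New_York")
print(repr(other))  # '20231101 07:00:00 America/Central'
```
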
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Date': ["20231101 05:00:00 America/New_York",
             "20231101 06:00:00 America/New_York",
             "20231101 07:00:00 America/Central"]
})

# df['Date'] = df['Date'].str.removesuffix(' America/New_York')
df['Date'] = df['Date'].str.split(' America').str[0]

print(df)
#      Name               Date
# 0    Alice  20231101 05:00:00
# 1      Bob  20231101 06:00:00
# 2  Charlie  20231101 07:00:00
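
Since the stated goal is a datetime column, the cleaned strings could then be parsed with `pd.to_datetime`. The sketch below assumes every value starts with a timestamp in the `YYYYMMDD HH:MM:SS` layout shown in the question, and it also checks the fixed-width slice the question proposed. The names `by_split` and `by_slice` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": [
        "20231101 05:00:00 America/New_York",
        "20231101 06:00:00 America/New_York",
        "20231101 07:00:00 America/Central",
    ]
})

# Option 1: keep everything before " America", whatever follows it.
by_split = df["Date"].str.split(" America").str[0]

# Option 2: fixed-width slice; "YYYYMMDD HH:MM:SS" is 17 characters long.
by_slice = df["Date"].str[:17]

# Parse the cleaned strings; an explicit format avoids ambiguous guessing.
# The result is timezone-naive, since the zone suffix was discarded.
df["Date"] = pd.to_datetime(by_split, format="%Y%m%d %H:%M:%S")
print(df["Date"])
```

Both options give the same strings for this data. The split is more tolerant if the timestamp width ever changes, while the slice does not depend on the suffix starting with ' America'.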