
List all the dates in a column in a pandas dataframe

I am a beginner to Python and pandas.

The following is a part of my pandas dataframe.



I want to extract all the dates in the INTERVAL_DATETIME column.

For example 16/11/2020, 31/05/2020 … 21/11/2023. I do not want any date to appear more than once.

I could not find a way to do this.

I would be thankful if someone could help me.

Solution:

Use:

import pandas as pd

df = pd.DataFrame({'INTERVAL_DATETIME': ['6/11/2020 16:30',
                                         '6/11/2020 17:00',
                                         '31/05/2020 18:00',
                                         '31/05/2020 18:30',
                                         '21/11/2023 18:00']})

print(df)
  INTERVAL_DATETIME
0   6/11/2020 16:30
1   6/11/2020 17:00
2  31/05/2020 18:00
3  31/05/2020 18:30
4  21/11/2023 18:00

L = df.INTERVAL_DATETIME.str.split().str[0].unique().tolist()
print(L)
['6/11/2020', '31/05/2020', '21/11/2023']
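For comparison, here is a minimal pure-Python sketch of the same steps on a hard-coded list (the name `raw` is just for illustration). It assumes every value has the date before the first whitespace, as in the sample above; `dict.fromkeys` removes duplicates while keeping the order of first appearance, which is also the order `Series.unique` returns.

```python
# Pure-Python sketch of the same idea; `raw` is a hypothetical sample list.
raw = ['6/11/2020 16:30', '6/11/2020 17:00', '31/05/2020 18:00',
       '31/05/2020 18:30', '21/11/2023 18:00']

# Take the text before the first whitespace, like str.split().str[0]
first_parts = [s.split()[0] for s in raw]

# dict keys are unique and keep insertion order (Python 3.7+),
# so this drops duplicates while preserving order of first appearance
unique_dates = list(dict.fromkeys(first_parts))
print(unique_dates)  # ['6/11/2020', '31/05/2020', '21/11/2023']
```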

Explanation:

First, use Series.str.split without a separator, which splits on whitespace; the output is a Series of lists:

print(df.INTERVAL_DATETIME.str.split())
0     [6/11/2020, 16:30]
1     [6/11/2020, 17:00]
2    [31/05/2020, 18:00]
3    [31/05/2020, 18:30]
4    [21/11/2023, 18:00]
Name: INTERVAL_DATETIME, dtype: object
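If you only ever need the part before the first space, a possible variant is to split at most once and expand the result into columns. This sketch uses the `n` and `expand` parameters of `Series.str.split` on a shortened sample; column `0` then holds the date and column `1` the time.

```python
import pandas as pd

df = pd.DataFrame({'INTERVAL_DATETIME': ['6/11/2020 16:30',
                                         '31/05/2020 18:00',
                                         '21/11/2023 18:00']})

# n=1 splits at most once; expand=True returns a DataFrame with columns 0 and 1
parts = df.INTERVAL_DATETIME.str.split(n=1, expand=True)
print(parts)

dates = parts[0].tolist()
print(dates)  # ['6/11/2020', '31/05/2020', '21/11/2023']
```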

Then select the first element of each list with str[0]:

print(df.INTERVAL_DATETIME.str.split().str[0])
0     6/11/2020
1     6/11/2020
2    31/05/2020
3    31/05/2020
4    21/11/2023
Name: INTERVAL_DATETIME, dtype: object
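`str[0]` is shorthand for `Series.str.get(0)`, which takes element 0 of each list. A small check on a two-row sample:

```python
import pandas as pd

s = pd.Series(['6/11/2020 16:30', '31/05/2020 18:00'])
lists = s.str.split()

# Both forms take element 0 from each list in the Series
via_index = lists.str[0]
via_get = lists.str.get(0)
print(via_index.equals(via_get))  # True
```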

Get the unique values as a NumPy array with Series.unique:

print(df.INTERVAL_DATETIME.str.split().str[0].unique())
['6/11/2020' '31/05/2020' '21/11/2023']
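Because the goal is to avoid duplicated dates, it may help to see that `Series.unique` keeps the order of first appearance. `Series.drop_duplicates` is an alternative that returns a Series and keeps the original index labels; the sample values below are illustrative.

```python
import pandas as pd

dates = pd.Series(['6/11/2020', '6/11/2020', '31/05/2020', '6/11/2020', '21/11/2023'])

arr = dates.unique()            # array of unique values, in order of first appearance
ser = dates.drop_duplicates()   # Series of first occurrences, original index kept
print(arr)
print(ser)
```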

Finally, convert the array to a list with tolist:

print(df.INTERVAL_DATETIME.str.split().str[0].unique().tolist())
['6/11/2020', '31/05/2020', '21/11/2023']
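The resulting list keeps the order in which dates first appear, not chronological order, and sorting the strings directly compares them character by character. A sketch that sorts chronologically by parsing each string as day/month/year, assuming every entry matches `%d/%m/%Y`:

```python
from datetime import datetime

L = ['6/11/2020', '31/05/2020', '21/11/2023']

# Plain string sorting compares characters, so '2' < '3' < '6' decides the order
print(sorted(L))  # ['21/11/2023', '31/05/2020', '6/11/2020']

# Parse each string as day/month/year and sort by the resulting datetime
chronological = sorted(L, key=lambda d: datetime.strptime(d, '%d/%m/%Y'))
print(chronological)  # ['31/05/2020', '6/11/2020', '21/11/2023']
```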

Or use Series.str.extract to get the value before the first whitespace:

L = df.INTERVAL_DATETIME.str.extract(r'^(\S+)', expand=False).unique().tolist()
print(L)
['6/11/2020', '31/05/2020', '21/11/2023']
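A note on the regex: `.*` is greedy, so a pattern like `^(.*)\s+` captures everything up to the last whitespace, not the first. On the sample data each value has only one space, so it makes no difference there, but a pattern such as `^(\S+)` (one run of non-whitespace characters at the start) stops at the first space. A quick check with the standard `re` module on a hypothetical value containing two spaces:

```python
import re

value = '6/11/2020 16:30 extra'  # hypothetical string with two spaces

# Greedy .* runs to the end, then backtracks to the LAST whitespace
greedy = re.match(r'^(.*)\s+', value).group(1)
# \S+ only consumes non-whitespace, so it stops at the FIRST space
first_token = re.match(r'^(\S+)', value).group(1)

print(greedy)       # 6/11/2020 16:30
print(first_token)  # 6/11/2020
```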

If you need the dates as strings in YYYY-MM-DD format, use to_datetime with dayfirst=True and then Series.dt.strftime:

L = (pd.to_datetime(df.INTERVAL_DATETIME, dayfirst=True)
       .dt.strftime('%Y-%m-%d').unique().tolist())
print(L)
['2020-11-06', '2020-05-31', '2023-11-21']
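If all values share the same layout, passing an explicit `format` to `pd.to_datetime` instead of relying on `dayfirst=True` removes any guessing about day/month order. A sketch assuming every value matches `%d/%m/%Y %H:%M`:

```python
import pandas as pd

df = pd.DataFrame({'INTERVAL_DATETIME': ['6/11/2020 16:30', '6/11/2020 17:00',
                                         '31/05/2020 18:00', '21/11/2023 18:00']})

# An explicit format states day/month order instead of letting pandas infer it
parsed = pd.to_datetime(df.INTERVAL_DATETIME, format='%d/%m/%Y %H:%M')
L = parsed.dt.strftime('%Y-%m-%d').unique().tolist()
print(L)  # ['2020-11-06', '2020-05-31', '2023-11-21']
```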

To get datetime.date objects instead of strings, use Series.dt.date instead of strftime:

L = pd.to_datetime(df.INTERVAL_DATETIME, dayfirst=True).dt.date.unique().tolist()
print(L)
[datetime.date(2020, 11, 6), datetime.date(2020, 5, 31), datetime.date(2023, 11, 21)]
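Without pandas, the same list of `datetime.date` objects can be built with the standard library, again assuming the `%d/%m/%Y %H:%M` layout (`raw` is a hypothetical sample list). One advantage of real date objects over strings is that `min`, `max` and `sorted` compare them chronologically:

```python
from datetime import datetime

raw = ['6/11/2020 16:30', '6/11/2020 17:00', '31/05/2020 18:00',
       '31/05/2020 18:30', '21/11/2023 18:00']

# Parse the full timestamp, keep only the date part, then dedupe in order
dates = list(dict.fromkeys(datetime.strptime(s, '%d/%m/%Y %H:%M').date() for s in raw))
print(dates)

# date objects compare chronologically
print(min(dates), max(dates))  # 2020-05-31 2023-11-21
```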

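For completeness, Series.dt.normalize keeps Timestamps but sets their time to midnight (00:00:00). Here drop_duplicates is used instead of unique because it returns a Series, so tolist gives Timestamps in both older and newer pandas versions (in older versions unique returns a NumPy datetime64 array):

L = (pd.to_datetime(df.INTERVAL_DATETIME, dayfirst=True)
       .dt.normalize().drop_duplicates().tolist())
print(L)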

[Timestamp('2020-11-06 00:00:00'), 
 Timestamp('2020-05-31 00:00:00'), 
 Timestamp('2023-11-21 00:00:00')]
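For tz-naive data, `Series.dt.normalize` should give the same result as flooring to the day with `Series.dt.floor('D')`. A quick check on a small sample, parsed with an explicit format for clarity:

```python
import pandas as pd

s = pd.to_datetime(pd.Series(['6/11/2020 16:30', '31/05/2020 18:00']),
                   format='%d/%m/%Y %H:%M')

normalized = s.dt.normalize()   # same dates, time set to 00:00:00
floored = s.dt.floor('D')       # round down to the start of the day

print(normalized.equals(floored))  # True
```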