How can I check whether a column value is NaN for rows whose datetime falls on the first of the month?

That title was the clearest way I could have asked the question; I do apologize. I have monthly data like this, with only the first of each month having a data point:

city   time        value
London 2000-01-01  5
London 2000-01-02  nan
London 2000-01-03  nan
..
London 2000-01-31  nan
London 2000-02-01  nan
London 2000-02-02  nan
London 2000-02-01  nan
...
London 2000-02-31  nan
London 2000-03-01  3
London 2000-01-01  nan
..

I basically want to express the following statement in pandas:

If value is NaN for a timestamp with day = 1, replace that first-of-the-month value with -1. I am struggling with the Python notation for subsetting the data using a condition as a mask.

So, starting from the data above, I want it to look like this:

city   time        value
London 2000-01-01  5
London 2000-01-02  nan
London 2000-01-03  nan
..
London 2000-01-31  nan
London 2000-02-01  -1
London 2000-02-02  nan
London 2000-02-01  nan
...
London 2000-02-31  nan
London 2000-03-01  3
London 2000-01-01  nan
..

The data obviously continues, and there are thousands of rows.

Edit:

Below is what I am starting to attempt.
I saw online that I can build a condition and then use df.loc[that condition] to subset the data, so something like:

mask = (df.time.dt.day==1)

I believe this selects the rows where the day is 1, but I am not sure how to proceed from there.
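To make the mask idea concrete, here is a minimal sketch of what such a condition contains and how a second condition can be combined with it. The small frame below is made up for illustration and only mirrors the layout of the data above; the variable names `is_first`, `is_missing` and `mask` are likewise just illustrative.

```python
import pandas as pd

# Hypothetical sample frame mirroring the question's layout (not the real data)
df = pd.DataFrame({
    "city": ["London"] * 4,
    "time": pd.to_datetime(["2000-01-01", "2000-01-02", "2000-02-01", "2000-02-02"]),
    "value": [5.0, None, None, None],
})

# Boolean Series: True where the timestamp falls on day 1 of its month
is_first = df["time"].dt.day == 1

# Boolean Series: True where the value is missing
is_missing = df["value"].isna()

# Element-wise AND of the two conditions; only rows that are both
# first-of-month and missing remain True
mask = is_first & is_missing

print(mask)
```

A mask like this is an ordinary boolean Series aligned to the frame's index, so it can be passed to `df.loc[...]` to select or assign to just those rows.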

Solution:

Use numpy.where with pd.to_datetime, Series.eq and Series.isna:

In [503]: import numpy as np
     ...: import pandas as pd

# Convert 'time' column into pandas datetime
In [499]: df['time'] = pd.to_datetime(df['time'], format='%Y-%m-%d')

In [504]: df['value'] = np.where(df['time'].dt.day.eq(1) & df['value'].isna(), -1, df['value'])

In [505]: df
Out[505]: 
     city       time  value
0  London 2000-01-01    5.0
1  London 2000-01-02    NaN
2  London 2000-01-03    NaN
3  London 2000-01-31    NaN
4  London 2000-02-01   -1.0
5  London 2000-02-02    NaN
6  London 2000-02-01   -1.0
7  London 2000-03-01    3.0
8  London 2000-01-01   -1.0

Alternatively, use df.loc to assign in place with the same boolean mask:

In [499]: df['time'] = pd.to_datetime(df['time'], format='%Y-%m-%d')
In [510]: df.loc[df['time'].dt.day.eq(1) & df['value'].isna(), 'value'] = -1
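
For anyone who wants to try both approaches outside an IPython session, here is a self-contained sketch. It assumes a small made-up frame in the same shape as the question's data; `via_where` and `via_loc` are illustrative names, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data in the same shape as the question's frame
df = pd.DataFrame({
    "city": ["London"] * 5,
    "time": ["2000-01-01", "2000-01-02", "2000-02-01", "2000-02-02", "2000-03-01"],
    "value": [5.0, np.nan, np.nan, np.nan, 3.0],
})
df["time"] = pd.to_datetime(df["time"], format="%Y-%m-%d")

# Rows that are both first-of-month and missing
mask = df["time"].dt.day.eq(1) & df["value"].isna()

# Approach 1: np.where builds a new array, taking -1 where the mask is True
via_where = pd.Series(np.where(mask, -1, df["value"]), index=df.index)

# Approach 2: df.loc assigns in place; done on a copy here so df stays untouched
via_loc = df.copy()
via_loc.loc[mask, "value"] = -1

print(via_loc)
```

Both approaches should give the same values. The practical difference is that `np.where` returns a new array that has to be assigned back to the column, while `df.loc` modifies only the selected rows in place.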