That was the clearest way I could have asked the question; I do apologize. I have monthly data like this, with only the first of each month having a data point:
city time value
London 2000-01-01 5
London 2000-01-02 nan
London 2000-01-03 nan
..
London 2000-01-31 nan
London 2000-02-01 nan
London 2000-02-02 nan
London 2000-02-01 nan
...
London 2000-02-31 nan
London 2000-03-01 3
London 2000-01-01 nan
..
I basically want to express the following statement in pandas:
If value is nan for a timestamp with day = 1, replace that first-of-the-month value with -1. I am struggling with the Python subsetting notation that uses a condition as a mask.
So from the above, I want my data to then look like
city time value
London 2000-01-01 5
London 2000-01-02 nan
London 2000-01-03 nan
..
London 2000-01-31 nan
London 2000-02-01 -1
London 2000-02-02 nan
London 2000-02-01 nan
...
London 2000-02-31 nan
London 2000-03-01 3
London 2000-01-01 nan
..
but it obviously continues and there are thousands of rows.
edit-
Below is what I am starting to attempt:
I saw online that I can build a condition and then use df.loc[that condition] to subset the data, so something like
mask = (df.time.dt.day==1)
I believe this mask selects the rows where day == 1, but I am not sure how to proceed from there.
>Solution :
Use numpy.where with pd.to_datetime, Series.eq and Series.isna:
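In [503]: import numpy as np
     ...: import pandas as pd
# Convert the 'time' column to pandas datetime so that .dt.day is available
In [499]: df['time'] = pd.to_datetime(df['time'], format='%Y-%m-%d')
# Where the day is 1 and the value is missing, use -1; otherwise keep the existing value
In [504]: df['value'] = np.where(df['time'].dt.day.eq(1) & df['value'].isna(), -1, df['value'])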
In [505]: df
Out[505]:
city time value
0 London 2000-01-01 5.0
1 London 2000-01-02 NaN
2 London 2000-01-03 NaN
3 London 2000-01-31 NaN
4 London 2000-02-01 -1.0
5 London 2000-02-02 NaN
6 London 2000-02-01 -1.0
7 London 2000-03-01 3.0
8 London 2000-01-01 -1.0
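For a copy-paste check, the np.where approach can be reproduced as a small script. This is only a sketch: the DataFrame below is reconstructed from the rows printed above, which is an assumption about how the real data is laid out, and the name first_and_missing is made up for readability.

```python
import numpy as np
import pandas as pd

# Sample rows reconstructed from the printed output (an assumption about the real data).
df = pd.DataFrame({
    "city": ["London"] * 9,
    "time": ["2000-01-01", "2000-01-02", "2000-01-03", "2000-01-31",
             "2000-02-01", "2000-02-02", "2000-02-01", "2000-03-01",
             "2000-01-01"],
    "value": [5, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 3, np.nan],
})
df["time"] = pd.to_datetime(df["time"], format="%Y-%m-%d")

# True only where the day is the 1st AND the value is missing.
first_and_missing = df["time"].dt.day.eq(1) & df["value"].isna()

# Take -1 where the condition holds, otherwise keep the existing value.
df["value"] = np.where(first_and_missing, -1, df["value"])
print(df)
```

Naming the combined condition separately makes it easy to inspect (for example with first_and_missing.sum()) before overwriting anything.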
Or use df.loc, which assigns -1 only to the rows matching the condition:
In [499]: df['time'] = pd.to_datetime(df['time'], format='%Y-%m-%d')
In [510]: df.loc[df['time'].dt.day.eq(1) & df['value'].isna(), 'value'] = -1
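If the original frame should stay untouched, the df.loc version can be wrapped in a small function that works on a copy. This is a minimal sketch: fill_first_of_month and its fill parameter are hypothetical names, and the sample frame is a shortened stand-in for the real data.

```python
import numpy as np
import pandas as pd

def fill_first_of_month(frame, fill=-1):
    """Hypothetical helper: return a copy where missing first-of-month values become `fill`."""
    out = frame.copy()
    out["time"] = pd.to_datetime(out["time"], format="%Y-%m-%d")
    # Rows where the timestamp falls on day 1 and the value is missing.
    cond = out["time"].dt.day.eq(1) & out["value"].isna()
    # Boolean row selector plus column label: assign only to the matching cells.
    out.loc[cond, "value"] = fill
    return out

# Shortened stand-in data (an assumption, not the asker's full data set).
sample = pd.DataFrame({
    "city": ["London", "London", "London", "London"],
    "time": ["2000-01-01", "2000-01-02", "2000-02-01", "2000-03-01"],
    "value": [5, np.nan, np.nan, 3],
})
result = fill_first_of_month(sample)
print(result)
```

Only the missing 2000-02-01 value is replaced; the missing value on 2000-01-02 is left as NaN because it does not fall on the first of the month.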