I have a dataframe which looks like this:
Market Date Begin Date Settlement
0 2016-01-01 2016-01-01 26.1935
1 2016-01-01 2016-02-01 24.1071
2 2016-01-01 2016-03-01 21.0591
3 2016-01-01 2016-04-01 20.7348
4 2016-01-01 2016-05-01 20.2072
... ... ... ...
265198 2022-09-21 2031-04-01 65.1300
265199 2022-09-21 2031-05-01 65.1300
265200 2022-09-21 2031-06-01 65.1300
265201 2022-09-21 2031-07-01 65.1300
265202 2022-09-21 2031-08-01 65.1300
I want to set Settlement to zero for every row whose Begin Date falls in a year that doesn't have data for all twelve months.
Example output:
Market Date Begin Date Settlement
0 2016-01-01 2018-01-01 0
1 2016-01-01 2018-02-01 0
2 2016-01-01 2019-01-01 15.2789
3 2016-01-01 2019-02-01 20.7348
4 2016-01-01 2019-03-01 26.1552
5 2016-01-01 2019-04-01 24.1891
6 2016-01-01 2019-05-01 21.7891
7 2016-01-01 2019-06-01 20.7356
8 2016-01-01 2019-07-01 20.2072
9 2016-01-01 2019-08-01 20.2172
10 2016-01-01 2019-09-01 20.2272
11 2016-01-01 2019-10-01 20.2372
12 2016-01-01 2019-11-01 20.2572
13 2016-01-01 2019-12-01 20.2772
I'd rather not iterate row by row, since that is slow on a frame of this size. Is there a vectorized, more Pythonic way to do this?
Solution:
Assuming Begin Date is a datetime column with monthly frequency, you can count the rows per Begin Date year and set Settlement to 0 for the years that have fewer than 12 rows:
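# Number of rows in each Begin Date year, broadcast back to every row
counts = df.groupby(df['Begin Date'].dt.year)['Settlement'].transform('count')
# Zero out the years that have fewer than 12 monthly rows
df.loc[counts < 12, 'Settlement'] = 0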
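Note that the count above is taken over the whole frame. If the frame holds many Market Dates, as the sample in the question suggests, a year can reach 12 rows in total even though no single Market Date covers all twelve months. A minimal sketch of a per-Market-Date variant follows; it assumes that completeness should be judged separately for each Market Date, which the question does not state outright. The function name `zero_incomplete_years`, the helper `curve`, and the sample values are hypothetical and only serve the illustration.

```python
import pandas as pd


def zero_incomplete_years(df, months_required=12):
    """Return a copy with Settlement set to 0 wherever a (Market Date,
    Begin Date year) group has fewer than `months_required` rows."""
    out = df.copy()
    begin_year = out["Begin Date"].dt.year.rename("Begin Year")
    # Rows per (Market Date, Begin Date year), broadcast back to every row
    counts = out.groupby(["Market Date", begin_year])["Settlement"].transform("count")
    out.loc[counts < months_required, "Settlement"] = 0
    return out


def curve(market_date, begin_dates):
    # Hypothetical helper: one block of monthly rows for a single market date
    return pd.DataFrame({
        "Market Date": pd.Timestamp(market_date),
        "Begin Date": begin_dates,
        "Settlement": 20.0,
    })


# Made-up sample: 2018 has 12 rows overall, but split 2 + 10 across two market dates
df = pd.concat([
    curve("2016-01-01", pd.date_range("2018-01-01", periods=2, freq="MS")),
    curve("2016-01-01", pd.date_range("2019-01-01", periods=12, freq="MS")),
    curve("2016-01-02", pd.date_range("2018-03-01", periods=10, freq="MS")),
], ignore_index=True)

result = zero_incomplete_years(df)
print(result)
```

In this sample only the twelve 2019 rows keep their Settlement. Grouping by year alone would leave the 2018 rows untouched, because the two Market Dates together contribute 12 rows for that year.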