Vectorized way of checking a date column's calendar sequence

I have a dataframe which looks like this:

   Market Date  Begin Date  Settlement
0   2016-01-01  2016-01-01  26.1935
1   2016-01-01  2016-02-01  24.1071
2   2016-01-01  2016-03-01  21.0591
3   2016-01-01  2016-04-01  20.7348
4   2016-01-01  2016-05-01  20.2072
... ... ... ...
265198  2022-09-21  2031-04-01  65.1300
265199  2022-09-21  2031-05-01  65.1300
265200  2022-09-21  2031-06-01  65.1300
265201  2022-09-21  2031-07-01  65.1300
265202  2022-09-21  2031-08-01  65.1300

I want to set Settlement to zero for every row whose Begin Date falls in a year that does not have data for all 12 months.

Example output:

   Market Date  Begin Date  Settlement
0   2016-01-01  2018-01-01  0
1   2016-01-01  2018-02-01  0
2   2016-01-01  2019-01-01  15.2789
3   2016-01-01  2019-02-01  20.7348
4   2016-01-01  2019-03-01  26.1552
5   2016-01-01  2019-04-01  24.1891
6   2016-01-01  2019-05-01  21.7891
7   2016-01-01  2019-06-01  20.7356
8   2016-01-01  2019-07-01  20.2072
9   2016-01-01  2019-08-01  20.2172
10  2016-01-01  2019-09-01  20.2272
11  2016-01-01  2019-10-01  20.2372
12  2016-01-01  2019-11-01  20.2572
13  2016-01-01  2019-12-01  20.2772

I’d like to avoid iterating row by row, since that is slow on a dataframe of this size. Is there a vectorized, more Pythonic way of doing it?

> Solution:

Assuming Begin Date is a datetime column with monthly frequency, you can count the Settlement values per year and set Settlement to 0 for the years with fewer than 12 values:

# True for rows whose Begin Date year has fewer than 12 Settlement values
incomplete = df.groupby(df['Begin Date'].dt.year)['Settlement'].transform('count') < 12
df.loc[incomplete, 'Settlement'] = 0
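
One thing to watch: the full dataframe in the question contains many Market Date values, and grouping by year alone counts rows across all of them, so a year could reach 12 rows without any single Market Date covering all 12 months. The sketch below is one possible variant, assuming the completeness check is meant per Market Date; the sample data and the variable names (`year`, `month`, `months_present`) are made up for illustration. It counts distinct months per (Market Date, year) pair rather than rows.

```python
import pandas as pd

# Hypothetical sample data: market date 2016-01-01 has 2 months of 2018 and
# all 12 months of 2019; market date 2016-01-02 has only 6 months of 2019.
full_2019 = pd.date_range("2019-01-01", periods=12, freq="MS")
rows = (
    [("2016-01-01", d) for d in pd.to_datetime(["2018-01-01", "2018-02-01"])]
    + [("2016-01-01", d) for d in full_2019]
    + [("2016-01-02", d) for d in full_2019[:6]]
)
df = pd.DataFrame(rows, columns=["Market Date", "Begin Date"])
df["Market Date"] = pd.to_datetime(df["Market Date"])
df["Settlement"] = 20.0

# Calendar year and month of each Begin Date (requires a datetime column).
year = df["Begin Date"].dt.year
month = df["Begin Date"].dt.month

# Number of distinct months present for each (Market Date, year) pair,
# broadcast back to every row of that group.
months_present = month.groupby([df["Market Date"], year]).transform("nunique")

# Zero out Settlement wherever the year is incomplete for that market date.
df.loc[months_present < 12, "Settlement"] = 0
```

With this sample, the 2018 rows and the six 2019 rows for the second market date are set to 0, while the complete 2019 block for the first market date keeps its values. Grouping by year alone would see 18 rows for 2019 and leave all of them unchanged.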
