Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Why does date format change with np.where and how do you stop it from happening

I have the following subset of data from a dataframe.

{'NID': {131598: '215026851',
  131599: '215026851',
  131600: '215026851',
  131601: '215026851',
  131602: '215026851',
  131603: '215026851',
  131604: '215026851',
  131605: '215026851',
  131606: '215026851'},
 'AbCode': {131598: 0,
  131599: 0,
  131600: 0,
  131601: 0,
  131602: 0,
  131603: 1,
  131604: 0,
  131605: 0,
  131606: 0},
 'ABdat': {131598: Timestamp('2018-01-24 00:00:00'),
  131599: Timestamp('2019-01-25 00:00:00'),
  131600: NaT,
  131601: Timestamp('2019-11-08 00:00:00'),
  131602: Timestamp('2020-01-24 00:00:00'),
  131603: Timestamp('2020-02-15 00:00:00'),
  131604: Timestamp('2020-10-16 00:00:00'),
  131605: Timestamp('2020-10-26 00:00:00'),
  131606: NaT}}

When formatted the data looks like below

          NID     AbCode  ABdat
131598  215026851   0   2018-01-24
131599  215026851   0   2019-01-25
131600  215026851   0   NaT
131601  215026851   0   2019-11-08
131602  215026851   0   2020-01-24
131603  215026851   1   2020-02-15
131604  215026851   0   2020-10-16
131605  215026851   0   2020-10-26
131606  215026851   0   NaT 

I would like to replace the ABdat with missing (NaT) for AbCode = 0 and replace ABdat with ABdat-7days for AbCode = 1

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I wrote the following np.where code below to do this.

breed_info['ABdat'] = np.where(breed_info.AbCode == 1, breed_info['ABdat'] - pd.DateOffset(days=7), breed_info['ABdat'].isnull)

The output is presented below

          NID   AbCode       ABdat
131598  215026851   0   <bound method Series.isnull of 49017 ...
131599  215026851   0   <bound method Series.isnull of 49017 ...
131600  215026851   0   <bound method Series.isnull of 49017 ...
131601  215026851   0   <bound method Series.isnull of 49017 ...
131602  215026851   0   <bound method Series.isnull of 49017 ...
131603  215026851   1   1581120000000000000
131604  215026851   0   <bound method Series.isnull of 49017 ...
131605  215026851   0   <bound method Series.isnull of 49017 ...
131606  215026851   0   <bound method Series.isnull of 49017 ...

Could you please advise why the date format is changing and how I can avoid this from happening?

Thanks

>Solution :

Simpluiest is use some pandas solutions with pandas method e.g. Series.where:

breed_info['ABdat'] = (breed_info['ABdat'] - pd.DateOffset(days=7))
                                                        .where(breed_info.AbCode == 1)

With np.where hacky solution with helper Series:

breed_info['ABdat'] = np.where(breed_info.AbCode == 1,
                               breed_info['ABdat'] - pd.DateOffset(days=7),
                                pd.Series(pd.NaT, index=breed_info.index))
print (breed_info)
              NID  AbCode      ABdat
131598  215026851       0        NaT
131599  215026851       0        NaT
131600  215026851       0        NaT
131601  215026851       0        NaT
131602  215026851       0        NaT
131603  215026851       1 2020-02-08
131604  215026851       0        NaT
131605  215026851       0        NaT
131606  215026851       0        NaT

because if passing pd.NAT it return underline numpy array (in nanoseconds):

breed_info['ABdat'] = np.where(breed_info.AbCode == 1, 
                               breed_info['ABdat'] - pd.DateOffset(days=7),
                               pd.NaT)
print (breed_info)
              NID  AbCode                ABdat
131598  215026851       0                  NaT
131599  215026851       0                  NaT
131600  215026851       0                  NaT
131601  215026851       0                  NaT
131602  215026851       0                  NaT
131603  215026851       1  1581120000000000000
131604  215026851       0                  NaT
131605  215026851       0                  NaT
131606  215026851       0                  NaT

I think reason is bug.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading