I have a Pandas DataFrame comprising of records of tropical cyclone (TC) position, intensity, and timing. It’s arranged as such:
Pertinent pieces of data here include the CY which is essentially a unique identifier number for the TC in the basin that year, and the YYYYMMDDHH column, which is the date that the observation is being entered.
I wish to determine for with dates the storm in question exists in the database 72 hours afterward. Currently I’m attempting to do something of the following sort:
deck.loc[((deck['BASIN'] == deck['BASIN']) & (deck['CY'] == deck['CY'])
& (len(deck['YYYYMMDDHH'] == (deck['YYYYMMDDHH'] + datetime.timedelta(hours=72)))) > 0)]
However, this isn’t giving me the boolean series I desire for the indexing. I’ve also considered doing things row-by-row, but know that this goes against the Pandas philosophy.
>Solution :
Since you posted data as an image, I recreated a simple example.
We’re checking if the last time per storm minus the current time is greater to or equal 72 hours
df = pd.DataFrame({'storm':['A','B'],
'time':[pd.date_range(periods=75, freq='H', start='01-01-2022 00:00:00'),
pd.date_range(periods=10, freq='H', start='01-01-2022 00:00:00')]}).explode('time')
((df.groupby('storm')['time'].transform('max') - df['time']).dt.total_seconds() / 3600).ge(72)
