Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

count if event occurs per city

I am trying to count how often an event occurs before another event occurs per city. To be more specific I am trying to count how often event number 21 occurrs before 2, and the other way around.

The dataset looks like this

import pandas as pd

data = { 'city': ['Amsterdam', 'Vienna', 'Paris', 'Paris',  'Istanbul', 'Istanbul','Delhi', 'London', 'London', 'Barcelona', 'Barcelona'], 
        'date': [ '2022-09-01T11:34:53', '2022-09-01T13:37:37', '2022-09-01 10:44:22.000', '2022-09-01T10:39:33', '2022-09-01 16:18:24.000', '2022-09-01T16:15:14', '2022-09-01T13:28:33', '2022-09-01 15:50:54.000', '2022-09-01T15:51:07', '2022-09-01 12:24:26.000','2022-09-01T12:24:07' ], 
        'year': [ '2022', '2022', '2022', '2022', '2022', '2022', '2022', '2022', '2022', '2022','2022' ],
        'month': [9,9,9,9,9,9,9,9,9,9,9 ],
        'hour': [ 11,13,11,10,17,16,13,16,16,13,12 ],
        'eventcode': [ 'J', 'J', 'J', 'J', 'J', 'J', 'J', 'J', 'J', 'J','J' ],
        'eventnumber': [ '21', '21', '2', '21', '2', '21', '21', '2', '21', '2','21' ]}

df =  pd.DataFrame(data, columns= ['city', 'date', 'year',  'month', 'hour', 'eventcode','eventnumber' ])
print (df) 

In this dataset, when looking at the city and date, event 2 occurs 1 time before event 21, and event 21 occurs 3 times before event 2.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

The code below works for counting the number of occurences of event 21 happening before 2. But when I switch the 2 and 21 in the code, it results in an infinite loop…

bc=np.array(df['city'])
un_bc,bc_index,bc_count=np.unique(bc,return_counts=True,return_index=True)
new_df=pd.DataFrame()
count=0

for i,j in zip(bc_index,bc_count):
    j=j+i-1
    while i+1 <= j:
        if df.iat[i,7]==21 and df.iat[i+1,7]==2:
            count +=1
            new_df=new_df.append(df[i:i+2])
        i +=1
        
print(count)

Code above works, code below does not.

for i,j in zip(bc_index,bc_count):
    j=j+i-1
    while i+1 <= j:
        if df.iat[i,7]==2 and df.iat[i+1,7]==21:
            count +=1
            new_df=new_df.append(df[i:i+2])
        i +=1
        
print(count)

>Solution :

One solution could be as follows:

df['date'] = pd.to_datetime(df['date'])

res = df.sort_values('date').astype({'eventnumber':int})\
    .groupby('city')['eventnumber'].diff().value_counts()
res.index = res.index.map({-19:'21_before_2', 19: '2_before_21'})

print(res)

21_before_2    3
2_before_21    1
Name: eventnumber, dtype: int64

Or, if you don’t want to turn df['date'] into datetime, use .sort_values(['year', 'month', 'hour']).

Explanation

  • First, use pd.to_datetime to turn date strings into datetime.
  • Next, 1) sort the df on date (using df.sort_values), 2) turn eventnumber strings into ints (using df.astype), 3) apply df.groupby on column city, 4) retrieve diff, and 5) retrieve Series.value_counts.
  • The result will be a count for -19 (21 before 2) and 19 (2 before 21), with all cities with just one value being dropped. These values will end up as the index, so we can use map to assign the appropriate labels.
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading