Sum of individual labels over a month of granular data

I have a dataframe which contains life logging data gathered over several years from 44 unique individuals.

Int64Index: 77171 entries, 0 to 4279
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   start      77171 non-null  datetime64[ns]
 1   end        77171 non-null  datetime64[ns]
 2   labelName  77171 non-null  category      
 3   id         77171 non-null  int64         

The start column contains granular datetimes of the format 2020-11-01 11:00:00, in intervals of 30 minutes. The labelName column has 14 different categories.

Categories (14, object): ['COOK', 'EAT', 'GO WALK', 'GO TO BATHROOM', ..., 'DRINK', 'WAKE UP', 'SLEEP', 'WATCH TV']

Here's one user's data ([2588 rows x 4 columns]), spanning 2020 to 2021; the head and tail are shown below. There are also occasional gaps in the data.


                  start                 end       labelName   id
0   2020-08-05 00:00:00 2020-08-05 00:30:00  GO TO BATHROOM  486
1   2020-08-05 06:00:00 2020-08-05 06:30:00         WAKE UP  486
2   2020-08-05 09:00:00 2020-08-05 09:30:00            COOK  486
3   2020-08-05 11:00:00 2020-08-05 11:30:00             EAT  486
4   2020-08-05 12:00:00 2020-08-05 12:30:00             EAT  486
..                  ...                 ...             ...  ...
859 2021-03-10 12:30:00 2021-03-10 13:00:00  GO TO BATHROOM  486
861 2021-03-10 13:30:00 2021-03-10 14:00:00  GO TO BATHROOM  486
862 2021-03-10 18:30:00 2021-03-10 19:00:00            COOK  486
864 2021-03-11 08:00:00 2021-03-11 08:30:00             EAT  486
865 2021-03-11 12:30:00 2021-03-11 13:00:00            COOK  486
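For context (not part of the original question), a minimal frame with this schema can be constructed from fabricated rows like so; the values are made up, only the column names and dtypes match the data described above:

```python
import pandas as pd

# Toy frame with the question's schema (fabricated values,
# not the real life-logging data)
df = pd.DataFrame({
    "start": pd.to_datetime([
        "2020-08-05 00:00:00", "2020-08-05 06:00:00",
        "2020-08-05 09:00:00", "2021-03-11 12:30:00",
    ]),
    "labelName": pd.Categorical(
        ["GO TO BATHROOM", "WAKE UP", "COOK", "COOK"]
    ),
    "id": [486] * 4,
})
# each record covers a 30-minute slot, so end = start + 30 minutes
df["end"] = df["start"] + pd.Timedelta(minutes=30)
df = df[["start", "end", "labelName", "id"]]
print(df.dtypes)
```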

I want a sum of each unique labelName per user per month, but I'm not sure how to do this.

Splitting the data frame by id is easy. But how do you split the start datetimes, given that they are recorded every 30 minutes over several years of data, and then create 14 new columns that hold the monthly sums?

The final data frame might look something like this (with fake values):

user  month  SLEEP  WATCH TV
 486  jun20    324        23
 486  jul20    234        12

The use-case for this data frame is training a few statistical and machine-learning models.

How do I achieve something like this?

Solution:

Because the data comes in 30-minute slots, you can count the records with crosstab per month (converting the datetimes to monthly periods with Series.dt.to_period) and then multiply by 0.5 to express the output in hours:

If start is 2020-09-30 23:30:00 and end is 2020-10-01 00:00:00, use df['end'] in the crosstab if that record should count toward October, or df['start'] if it should count toward September.

df['start'] = pd.to_datetime(df['start'])
df['end'] = pd.to_datetime(df['end'])

df1 = (pd.crosstab([df['id'], df['end'].dt.to_period('M')], df['labelName']).mul(0.5)
        .rename_axis(columns=None, index=['id', 'month'])
        .rename(columns=str)
        .reset_index()
        .assign(month=lambda x: x['month'].dt.strftime('%b%Y')))
print(df1)
    id    month  COOK  EAT  GO TO BATHROOM  SLEEP  WAKE UP
0  650  Sep2020   0.0  0.0             1.0    0.5      1.0
1  650  Mar2021   0.5  1.0             0.5    0.5      0.0

For output as counts of 30-minute slots, drop the .mul(0.5):

df['start'] = pd.to_datetime(df['start'])
df['end'] = pd.to_datetime(df['end'])

df = (pd.crosstab([df['id'], df['end'].dt.to_period('M')], df['labelName'])
        .rename_axis(columns=None, index=['id', 'month'])
        .reset_index()
        .assign(month=lambda x: x['month'].dt.strftime('%b%Y')))
print(df)

    id    month  COOK  EAT  GO TO BATHROOM  SLEEP  WAKE UP
0  650  Sep2020     0    0               2      1        2
1  650  Mar2021     1    2               1      1        0
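An equivalent approach (a sketch, not from the original answer) uses groupby with size and unstack; it should produce the same per-month counts as the crosstab. The records below are fabricated to mirror the answer's example with id 650:

```python
import pandas as pd

# A few fabricated records shaped like the answer's example (id 650)
df = pd.DataFrame({
    "end": pd.to_datetime([
        "2020-09-05 00:30:00", "2020-09-05 06:30:00",
        "2021-03-10 13:00:00", "2021-03-11 08:30:00",
    ]),
    "labelName": ["WAKE UP", "SLEEP", "COOK", "EAT"],
    "id": [650, 650, 650, 650],
})

# Count records per (id, month, label), then pivot labels into columns
counts = (df.groupby(["id",
                      df["end"].dt.to_period("M").rename("month"),
                      "labelName"])
            .size()
            .unstack("labelName", fill_value=0)
            .rename_axis(columns=None)
            .reset_index()
            .assign(month=lambda x: x["month"].dt.strftime("%b%Y")))
print(counts)
```

As with the crosstab version, multiplying the label columns by 0.5 converts counts of 30-minute slots into hours.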