Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to summarize a column based on repeated values using a for loop?

I’m trying to sum the duration of the activities using a for loop with conditions.
My knowledge of for loops is lacking, and I’m unsure how to search the right keywords to get my answer.

A slice of my data frame looks like this : 
mydf= {'Duration' : [14, 8, 6, 36, 12, 5, 3, 2, 4, 5, 8, 3, 14, 1, 27, 25, 117, 2, 962, 2, 2, 1], 'Activity': ['Groom', 'Pause', 'Groom', 'Sleep', 'Sleep', 'Sleep', 'Sleep', 'Sleep', 'Sleep', 'Sleep', 'Sleep', 'Sleep', 'Sleep', 'Awaken', 'Groom', 'Pause', 'Groom', 'Eat', 'Cuddle', 'Come down', 'Dig', 'Forage']}
df = pd.DataFrame(mydf)

I want to sum all the sleep duration together. Instead of Sleep appearing multiple times, just one "Sleep" with the summed duration.
However, the multiplied "Sleep" patterns occur several times in my dataframe, so I thought I should do a for loop, but I am unsure of my logic.

I tried:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Sleep_sum = [sum(df['Duration'] for i in df['Activity'] if [i+1]=='Sleep')] 

but this gives me an error : TypeError: can only concatenate str (not "int") to str.

I also tried this, but this looks very wrong.

(if df['Activity'] == 'Sleep' + df['Activity'] == 'Twitch').df['Duration'].sum()

Essentially, I need to sum the duration of ‘Sleep" if it’s followed by another row == "Sleep".

Thank you for your time!

>Solution :

You can create virtual groups to group consecutive activities (Sleep and other):

out = (df.groupby(df['Activity'].ne(df['Activity'].shift()).cumsum(), as_index=False)
         .agg({'Duration': 'sum', 'Activity': 'first'}))
print(out)

# Output
    Duration   Activity
0         14      Groom
1          8      Pause
2          6      Groom
3         92      Sleep
4          1     Awaken
5         27      Groom
6         25      Pause
7        117      Groom
8          2        Eat
9        962     Cuddle
10         2  Come down
11         2        Dig
12         1     Forage

Update

Just for Sleep activity:

m = df['Activity'] == 'Sleep'
sleep = (df.reset_index(drop=False)[m]
           .groupby(df['Activity'].ne(df['Activity'].shift()).cumsum())
           .agg({'index': 'first', 'Duration': 'sum', 'Activity': 'first'})
           .set_index('index'))

out = pd.concat([df[~m], sleep]).sort_index()
print(out)

# Output
    Duration   Activity
0         14      Groom
1          8      Pause
2          6      Groom
3         92      Sleep
13         1     Awaken
14        27      Groom
15        25      Pause
16       117      Groom
17         2        Eat
18       962     Cuddle
19         2  Come down
20         2        Dig
21         1     Forage
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading