groupby mean of datetime64[ns] column

I have a DataFrame with these columns:

  • user_id object
  • local time datetime64[ns]
  • value int32

  user_id          local time  value
0   user1 2023-01-01 00:00:00      3
1   user1 2023-01-01 00:00:00      3
2   user1 2023-01-01 01:00:00      7
3   user1 2023-01-01 01:00:00      2
4   user2 2023-01-01 02:00:00      4
5   user2 2023-01-01 02:00:00     10
6   user2 2023-01-01 03:00:00      7
7   user2 2023-01-01 03:00:00      2

I want to:

  • group by user_id
  • take the mean of the columns "local time" (time of day only, HH:MM:SS, not the full datetime) and "value"

import pandas as pd
import numpy as np

# Set the random seed for reproducibility
np.random.seed(123)
# Define the number of users and values
num_users = 2
num_values = 4

# Generate the user IDs
user_ids = ['user{}'.format(i+1) for i in range(num_users)]

# Generate the local time values
local_time = pd.date_range(start='2023-01-01 00:00:00', periods=num_values, freq='H')

# Generate the random values
values = np.random.randint(1, 11, size=(num_values*num_users))

# Create the DataFrame
df = pd.DataFrame({ 
    'user_id': np.repeat(user_ids, num_values),
    'local time': np.repeat(local_time, num_users),
    'value': values})


# Goal: the mean of the time of day in "local time", NOT of the full datetime
print(df)

print("expected_output")

'''
      local time  value
user1   00:30:00   3.75
user2   02:30:00   5.75
'''

# Attempt so far
df.groupby('user_id').mean()

Expected Output:

I want the mean of the time (hours, minutes and seconds, not the date) and the mean of value, grouped by user:

      local time  value
user1   00:30:00   3.75
user2   02:30:00   5.75

>Solution :

In pandas versions before 2.0, the groupby mean only aggregates numeric columns by default, so the datetime column is dropped. Pass numeric_only=False to include the mean of local time:

>>> df.groupby('user_id', as_index=False).mean(numeric_only=False)

  user_id          local time  value
0   user1 2023-01-01 09:36:00    3.8
1   user2 2023-01-03 00:00:00    4.4
2   user3 2023-01-04 14:24:00    5.2
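
As a minimal sketch of how that call behaves on the eight-row sample from the question: the frame below is rebuilt by hand from the table in the question (an assumption about the intended data, not the answerer's own frame). The mean of "local time" comes back as a full timestamp, date included, which is why a second step is needed to get only the time of day.

```python
import pandas as pd

# Sample frame rebuilt by hand from the question's table (assumed data).
df = pd.DataFrame({
    "user_id": ["user1"] * 4 + ["user2"] * 4,
    "local time": pd.to_datetime([
        "2023-01-01 00:00:00", "2023-01-01 00:00:00",
        "2023-01-01 01:00:00", "2023-01-01 01:00:00",
        "2023-01-01 02:00:00", "2023-01-01 02:00:00",
        "2023-01-01 03:00:00", "2023-01-01 03:00:00",
    ]),
    "value": [3, 3, 7, 2, 4, 10, 7, 2],
})

# Passing numeric_only explicitly avoids relying on the version-dependent default.
full_mean = df.groupby("user_id", as_index=False).mean(numeric_only=False)

# "local time" is still datetime64[ns]: the date part is part of the mean.
print(full_mean)
print(full_mean.dtypes)
```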

Update

I want the mean of the time, not the datetime (so just hours, minutes, seconds)

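# Turn each time of day into a Timedelta, average them, then slice off
# the leading "0 days " from the string form to keep only HH:MM:SS
mean_time = lambda x: str(pd.to_timedelta(x.dt.time.astype(str)).mean())[7:]

# Apply the custom time mean to "local time" and the ordinary mean to "value"
out = (df.groupby('user_id', as_index=False)
         .agg({'local time': mean_time, 'value': 'mean'}))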

  user_id local time  value
0   user1   00:30:00   3.75
1   user2   02:30:00   5.75
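
A possible variation on the same idea, sketched here as an alternative rather than as the answer's method: instead of going through strings, subtract midnight of each timestamp's own day to get a Timedelta, average those, and format the result. The helper name mean_time_of_day and the hand-built frame are assumptions made for this illustration.

```python
import pandas as pd

# Sample frame rebuilt by hand from the question's table (assumed data).
df = pd.DataFrame({
    "user_id": ["user1"] * 4 + ["user2"] * 4,
    "local time": pd.to_datetime([
        "2023-01-01 00:00:00", "2023-01-01 00:00:00",
        "2023-01-01 01:00:00", "2023-01-01 01:00:00",
        "2023-01-01 02:00:00", "2023-01-01 02:00:00",
        "2023-01-01 03:00:00", "2023-01-01 03:00:00",
    ]),
    "value": [3, 3, 7, 2, 4, 10, 7, 2],
})


def mean_time_of_day(s: pd.Series) -> str:
    """Hypothetical helper: mean time of day of a datetime Series as HH:MM:SS."""
    # Offset of each timestamp from midnight of its own day.
    since_midnight = s - s.dt.normalize()
    mean_offset = since_midnight.mean()
    # Format from the Timedelta components; sub-second parts are truncated.
    c = mean_offset.components
    return f"{c.hours:02d}:{c.minutes:02d}:{c.seconds:02d}"


out = (df.groupby("user_id", as_index=False)
         .agg({"local time": mean_time_of_day, "value": "mean"}))
print(out)
```

Both versions take a plain arithmetic mean of the time of day, so times on either side of midnight (for example 23:00 and 01:00) average to midday rather than midnight.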