groupby mean of datetime64[ns] column

I have a DataFrame with the following columns:

  • user_id object
  • local time datetime64[ns]
  • value int32
  user_id          local time  value
0   user1 2023-01-01 00:00:00      3
1   user1 2023-01-01 00:00:00      3
2   user1 2023-01-01 01:00:00      7
3   user1 2023-01-01 01:00:00      2
4   user2 2023-01-01 02:00:00      4
5   user2 2023-01-01 02:00:00     10
6   user2 2023-01-01 03:00:00      7
7   user2 2023-01-01 03:00:00      2

I want to:

  • group by user_id
  • take the mean of the columns "local time" (time of day only, HH:MM:SS, not the full datetime) and "value"
import pandas as pd
import numpy as np

# Set the random seed for reproducibility
np.random.seed(123)
# Define the number of users and values
num_users = 2
num_values = 4

# Generate the user IDs
user_ids = ['user{}'.format(i+1) for i in range(num_users)]

# Generate the local time values
local_time = pd.date_range(start='2023-01-01 00:00:00', periods=num_values, freq='H')

# Generate the random values
values = np.random.randint(1, 11, size=(num_values*num_users))

# Create the DataFrame
df = pd.DataFrame({ 
    'user_id': np.repeat(user_ids, num_values),
    'local time': np.repeat(local_time, num_users),
    'value': values})


# Goal: the mean of the time of day in "local time", not of the full datetime.
print(df)

# Expected output:
#       local time  value
# user1   00:30:00   3.75
# user2   02:30:00   5.75

# Attempt so far: this does not give the mean time of day.
df.groupby('user_id').mean()

Expected Output:

I want the mean of the time (hours, minutes, and seconds, not the date) and the mean of value, grouped by user.

      local time  value
user1   00:30:00   3.75
user2   02:30:00   5.75

Solution:

In pandas versions before 2.0, a groupby mean aggregates only numeric columns by default. Pass numeric_only=False to include the datetime column "local time":

>>> df.groupby('user_id', as_index=False).mean(numeric_only=False)

  user_id          local time  value
0   user1 2023-01-01 00:30:00   3.75
1   user2 2023-01-01 02:30:00   5.75

Update

I want the mean of the time, not the datetime (so just hours, minutes, seconds)

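# Convert each time of day to a Timedelta, average them, then drop the
# leading "0 days " from the string form to keep only HH:MM:SS.
mean_time = lambda x: str(pd.to_timedelta(x.dt.time.astype(str)).mean())[7:]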

out = (df.groupby('user_id', as_index=False)
         .agg({'local time': mean_time, 'value': 'mean'}))

  user_id local time  value
0   user1   00:30:00   3.75
1   user2   02:30:00   5.75
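
The string slicing above depends on how a Timedelta prints. As a rough alternative sketch (not part of the original answer), the time of day can be averaged as seconds since midnight and formatted afterwards. The helper name format_hms and the intermediate column "seconds" are made up for this illustration, and the sample frame is rebuilt by hand from the values printed in the question.

```python
import pandas as pd

# Sample data copied from the table printed in the question.
df = pd.DataFrame({
    'user_id': ['user1'] * 4 + ['user2'] * 4,
    'local time': pd.to_datetime([
        '2023-01-01 00:00:00', '2023-01-01 00:00:00',
        '2023-01-01 01:00:00', '2023-01-01 01:00:00',
        '2023-01-01 02:00:00', '2023-01-01 02:00:00',
        '2023-01-01 03:00:00', '2023-01-01 03:00:00',
    ]),
    'value': [3, 3, 7, 2, 4, 10, 7, 2],
})


def format_hms(seconds):
    """Hypothetical helper: format seconds as HH:MM:SS (fractions are truncated)."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


# Seconds elapsed since midnight for each row.
# dt.normalize() resets the time part to 00:00:00, so the difference is the time of day.
since_midnight = (df['local time'] - df['local time'].dt.normalize()).dt.total_seconds()

# Average the seconds and the values per user, then format the seconds as a time string.
out = (df.assign(seconds=since_midnight)
         .groupby('user_id', as_index=False)[['seconds', 'value']]
         .mean())
out['local time'] = out['seconds'].map(format_hms)
out = out[['user_id', 'local time', 'value']]
print(out)
```

On the question's data this gives 00:30:00 for user1 and 02:30:00 for user2, the same as the version above. Note that either approach is a plain arithmetic mean of the time of day, so it does not wrap around midnight: 23:00 and 01:00 average to 12:00, not 00:00.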
