How to create a column with the mean of rows dated before the current row

For each row, I need the mean of the earlier rows, that is, rows whose date is before the current row's date.

I have this code, but it takes a long time on datasets with 50k rows:

import pandas as pd

data = {
  'id': [1,2,3,4,5],
  'home_goals': [1,0,3,1,2],
  'away_goals': [1,1,2,0,1],
  'home_name': ['a','b','a','b','a'],
  'away_name': ['b','a','b','a','b'],
  'date': ['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04', '2020-01-05'],
}

df = pd.DataFrame(data=data)
for i, row in df.iterrows():
  # Rows strictly earlier than the current row (ISO date strings compare correctly)
  rows_before_date = df[df['date'] < row['date']]
  # Earlier matches with the same home team, and with the same away team
  home_matches = rows_before_date[rows_before_date['home_name'] == row['home_name']]
  away_matches = rows_before_date[rows_before_date['away_name'] == row['away_name']]
  # Leave both means as NaN when either side has no history
  if len(home_matches) == 0 or len(away_matches) == 0:
    continue

  df.loc[i, 'home_mean'] = home_matches['home_goals'].sum() / len(home_matches)
  df.loc[i, 'away_mean'] = away_matches['away_goals'].sum() / len(away_matches)

Is there a faster and more readable way to write this?

Solution:

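Assuming the rows are already sorted by date, you can use a shifted expanding mean per group. The shift() moves each value down one row within its group, so every mean covers only the earlier matches and a team's first match gets NaN. Use transform rather than apply so the result keeps the original index; in recent pandas versions, groupby.apply prepends the group key to the index, which breaks the column assignment:

df['home_mean'] = (df.groupby('home_name')['home_goals']
                     .transform(lambda s: s.expanding().mean().shift())
                   )
df['away_mean'] = (df.groupby('away_name')['away_goals']
                     .transform(lambda s: s.expanding().mean().shift())
                  )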

output:

   id  home_goals  away_goals home_name away_name        date  home_mean  away_mean
0   1           1           1         a         b  2020-01-01        NaN        NaN
1   2           0           1         b         a  2020-01-02        NaN        NaN
2   3           3           2         a         b  2020-01-03        1.0        1.0
3   4           1           0         b         a  2020-01-04        0.0        1.0
4   5           2           1         a         b  2020-01-05        2.0        1.5
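
If the rows are not already in date order, the same idea can be wrapped in a small helper that sorts first and then restores the original row order. This is only a sketch: the function name add_prior_means is made up for illustration, and it assumes a unique index, dates stored as ISO strings (or real datetimes) so that sorting is chronological, and at most one match per team per date. Unlike the strict `<` comparison in the loop, a shifted expanding mean would include an earlier-listed match played on the same date.

```python
import pandas as pd

def add_prior_means(df):
    """Hypothetical helper: per-team mean of goals over earlier rows only."""
    # Sort chronologically; a stable sort keeps the input order of equal dates.
    out = df.sort_values('date', kind='stable').copy()
    for side in ('home', 'away'):
        out[f'{side}_mean'] = (
            out.groupby(f'{side}_name')[f'{side}_goals']
               # expanding mean up to and including each row, then shifted
               # down by one so the current row is excluded
               .transform(lambda s: s.expanding().mean().shift())
        )
    # Put the rows back in the caller's order (assumes a unique index).
    return out.reindex(df.index)

data = {
    'id': [1, 2, 3, 4, 5],
    'home_goals': [1, 0, 3, 1, 2],
    'away_goals': [1, 1, 2, 0, 1],
    'home_name': ['a', 'b', 'a', 'b', 'a'],
    'away_name': ['b', 'a', 'b', 'a', 'b'],
    'date': ['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04', '2020-01-05'],
}
# Shuffle the rows to show that the input order does not matter.
shuffled = pd.DataFrame(data).sample(frac=1, random_state=0)
result = add_prior_means(shuffled)
print(result.sort_index())
```

Because transform returns a result aligned to the index of the frame it was called on, the new columns can be assigned directly, and reindex then returns the rows in whatever order they arrived.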