Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Group the dataframe based on ids and stick the values of ids to each other with mean of the last days

I have a data frame, I want to group the data based on the ids and then stick the values for each id together. However, since the number of the ids are not equal, I have to fill the missing values for those ids with mean of the day before.

I have provided an example as below, for example, for id=1, id=2 there is two dates, but for id=3, there are three days.

df = pd.DataFrame()
df['id'] = [1, 1, 2,2, 3, 3, 3]
df['date'] = ['2019-01-01', '2019-01-03', '2019-01-01','2019-01-02', '2019-01-01',   '2019-01-02','2019-01-03']
df['val1']  = [10, 100, 20, 30, 40, 50, 60]
df['val2']  = [30, 30, -20, -30, -40,-50, -60 ]
df['val3']  = [50, 10, 120, 300, 140, 150, 160]

I have tried with the following code:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

DF_sticked            = df.filter(regex='val\d+', axis=1).groupby(df['id'])\
                         .apply(np.ravel).apply(pd.Series).rename(lambda x: f"val{x}", axis=1).reset_index().fillna(0)

But, in the above code, I fill the missing value with 0. And also the missing value are at the end of the columns.
The output that I want is as follow. As you can see, the val 4, val5, val 6 of id=1 is same as the mean of the day before. or the val 7, 8, 9 of id=2 is equal to the mean of the day before (mean of 30, -30, 300)

enter image description here

Thank you so much.

>Solution :

You can actually pivot the dataframe, then then groupy the dates, transform('mean'), and ffill:

tmp = df.pivot(index='id', columns='date').swaplevel(axis=1).sort_index(axis=1)
tmp[tmp.isna()] = tmp.groupby(level=0, axis=1).transform('mean').ffill(axis=1)
tmp = tmp.astype(int).droplevel(0, axis=1)

# Fix the columns
tmp = tmp.set_axis(np.arange(1, tmp.shape[1] + 1).astype(str), axis=1).add_prefix('val')

Output:

>>> tmp
    val1  val2  val3  val4  val5  val6  val7  val8  val9
id                                                      
1     10    30    50    30    30    30   100    30    10
2     20   -20   120    30   -30   300   100   100   100
3     40   -40   140    50   -50   150    60   -60   160
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading