I’m struggling in Python (pandas) to find a way to fill empty cells in one column based on values in other rows, as in the following example:
| email | run | other cols ....
| cris@gmail.com | 12345 |
| patty@gmail.com | 134254 |
| rick@outlook.com | 23232 |
| rick@outlook.com | |
| | 134254 |
| | 134254 |
| cris@gmail.com | |
Because I have other columns, the rows aren’t duplicates, so I would like to fill the empty cells using the matching information from other rows, like this:
| email | run | other cols ....
| cris@gmail.com | 12345 |
| patty@gmail.com | 134254 |
| rick@outlook.com | 23232 |
| rick@outlook.com | 23232 |
| patty@gmail.com | 134254 |
| patty@gmail.com | 134254 |
| cris@gmail.com | 12345 |
Could anyone help me, please?
>Solution :
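You can use two groupby operations, one per column, and fill each column's missing values with the first non-null value from the matching group of the other column: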
out = df.assign(
    run=df['run'].fillna(df.groupby('email')['run'].transform('first')),
    email=df['email'].fillna(df.groupby('run')['email'].transform('first')),
)
Using a helper function:
def fill_from(target, group, df=df):
    # Fill missing values in `target` with the first non-null value of each `group`
    return df[target].fillna(df.groupby(group)[target].transform('first'))
out = df.assign(run=fill_from('run', 'email'), email=fill_from('email', 'run'))
Output:
              email       run  other cols
0    cris@gmail.com   12345.0         NaN
1   patty@gmail.com  134254.0         NaN
2  rick@outlook.com   23232.0         NaN
3  rick@outlook.com   23232.0         NaN
4   patty@gmail.com  134254.0         NaN
5   patty@gmail.com  134254.0         NaN
6    cris@gmail.com   12345.0         NaN
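To try this end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question, assuming the blank cells are real missing values (NaN) and leaving out the other columns. If the blanks were actually read in as empty strings, they would likely need converting first, as noted in the comment. The helper takes `df` as an explicit argument here, a small variation on the version above.

```python
import numpy as np
import pandas as pd

# Sample data from the question; blanks are assumed to be NaN.
df = pd.DataFrame({
    "email": ["cris@gmail.com", "patty@gmail.com", "rick@outlook.com",
              "rick@outlook.com", np.nan, np.nan, "cris@gmail.com"],
    "run": [12345, 134254, 23232, np.nan, 134254, 134254, np.nan],
})

# Assumption: if the blanks are empty strings rather than NaN, convert them first:
# df = df.replace("", np.nan)

def fill_from(target, group, df):
    """Fill missing `target` values with the first non-null value in each `group`."""
    return df[target].fillna(df.groupby(group)[target].transform("first"))

# Both fills are computed from the original df, so neither depends on the other.
out = df.assign(
    run=fill_from("run", "email", df),
    email=fill_from("email", "run", df),
)
print(out)
```

Note that `'first'` picks the first non-null value per group, so if one email maps to several different run values (or the reverse), the earliest one is used.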