I have a dataframe that looks like this:
| Step | Text | Parameter |
|---|---|---|
| 15 | print | 1 |
| 16 | control | 2 |
| 17 | printout | 3 |
| 18 | print2 | 1 |
| 19 | Nan | 2 |
| 20 | Nan | 3 |
| 21 | Nan | 4 |
| 22 | Nan | 1 |
| 23 | Nan | 2 |
| 24 | Nan | 1 |
And I want my dataframe to look like this:
| Step | Text | Parameter |
|---|---|---|
| 15 | print | 1 |
| 15 | print | 2 |
| 15 | print | 3 |
| 16 | control | 1 |
| 16 | control | 2 |
| 16 | control | 3 |
| 16 | control | 4 |
| 17 | printout | 1 |
| 17 | printout | 2 |
| 18 | print2 | 1 |
So basically, every "1" in the Parameter column starts a new group, and each group should take the next Step and Text values, repeated for every row in that group.
Any ideas? :)
>Solution :
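You can use `repeat` on a custom group. A new group starts at each row where `Parameter` is 1, and each non-NaN `Step`/`Text` pair is repeated once per row of its group:

    import pandas as pd

    # ensure real missing values instead of the string 'Nan'
    df['Text'] = df['Text'].replace('Nan', pd.NA)
    # number of rows per group, where each group starts at Parameter == 1
    n = df.groupby(df['Parameter'].eq(1).cumsum()).size()
    # repeat the index label of each non-NaN row once per row of its group
    idx = df['Text'].dropna().index.repeat(n)
    # assign positionally, ignoring the index
    # (using the underlying numpy array)
    df[['Step', 'Text']] = df.loc[idx, ['Step', 'Text']].to_numpy()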
Output:

       Step      Text  Parameter
    0    15     print          1
    1    15     print          2
    2    15     print          3
    3    16   control          1
    4    16   control          2
    5    16   control          3
    6    16   control          4
    7    17  printout          1
    8    17  printout          2
    9    18    print2          1
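For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch of the same idea. The helper name `repeat_headers` is hypothetical (it is not part of pandas), and the sample frame is rebuilt by hand from the tables in the question, assuming `Parameter` is numeric and that there is exactly one non-null `Text` per group.

```python
import numpy as np
import pandas as pd


def repeat_headers(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: spread each non-null (Step, Text) pair over one group."""
    out = df.copy()
    # A new group starts wherever Parameter equals 1; cumsum labels the groups.
    group_id = out["Parameter"].eq(1).cumsum()
    # Number of rows in each group, in order of appearance.
    group_sizes = out.groupby(group_id).size().to_numpy()
    # Index labels of the rows that carry a real Text value.
    header_idx = out["Text"].dropna().index
    # Assumption: one non-null Text per group, otherwise the repeat is ill-defined.
    if len(header_idx) != len(group_sizes):
        raise ValueError("need exactly one non-null Text per Parameter group")
    # Repeat each header label once per row of its group.
    idx = header_idx.repeat(group_sizes)
    # Assign by position via the numpy array so the index is not aligned.
    out[["Step", "Text"]] = out.loc[idx, ["Step", "Text"]].to_numpy()
    return out


# Sample data rebuilt from the question (assumed to match the asker's frame).
df = pd.DataFrame({
    "Step": range(15, 25),
    "Text": ["print", "control", "printout", "print2"] + [np.nan] * 6,
    "Parameter": [1, 2, 3, 1, 2, 3, 4, 1, 2, 1],
})
result = repeat_headers(df)
print(result)
```

The helper works on a copy, so the original `df` is left untouched. The length check is an added safeguard, not part of the answer above; it fails early if the number of non-null `Text` rows does not match the number of groups.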