My google-fu is failing me. I have a simple dataframe that looks like this:
| Sample | Subject | Person | Place | Thing |
|---|---|---|---|---|
| 1-1 | Janet | |||
| 1-1 | Boston | |||
| 1-1 | Hat | |||
| 1-2 | Chris | |||
| 1-2 | Austin | |||
| 1-2 | Scarf |
I want the values in the subject column to move into their appropriate column so that I end up with something like this:
| Sample | Subject | Person | Place | Thing |
|---|---|---|---|---|
| 1-1 | Janet | Janet | Boston | Hat |
| 1-2 | Chris | Chris | Austin | Scarf |
I’ve looked at pivot and transpose, but those don’t seem right.
Any ideas would be appreciated! 🙂
>Solution :
If the groups are sorted and the pattern is always the same (no missing values), then reshape with numpy:
cols = ['Person', 'Place', 'Thing']
out = df.loc[::len(cols), ['Sample']].reset_index(drop=True)
out[cols] = df['Subject'].to_numpy().reshape(-1, len(cols))
For a more generic approach, only assuming that the categories are always in the same order within a group, identify the position per group with groupby.cumcount and map the names, then pivot:
order = ['Person', 'Place', 'Thing']
out = (df.assign(col=df.groupby('Sample').cumcount()
.map(dict(enumerate(order))))
.pivot(index='Sample', columns='col', values='Subject')
.reset_index().rename_axis(columns=None)
)
Variant with rename:
order = ['Person', 'Place', 'Thing']
out = (df.assign(col=df.groupby('Sample').cumcount())
.pivot(index='Sample', columns='col', values='Subject')
.rename(columns=dict(enumerate(order)))
.reset_index().rename_axis(columns=None)
)
Output:
Sample Person Place Thing
0 1-1 Janet Boston Hat
1 1-2 Chris Austin Scarf
Finally, if you really want the "Subject" column, insert it:
out.insert(1, 'Subject', out['Person'])
print(out)
Sample Subject Person Place Thing
0 1-1 Janet Janet Boston Hat
1 1-2 Chris Chris Austin Scarf