I am working in Python 3.11.1. I have data like these stored in a Pandas Dataframe:
ID Position Select
1 A 0
2 B 1
2 C 0
3 B 0
3 C 0
4 A 1
5 A 0
Where some ID’s are recorded in multiple rows, but others only appear in a single row. I need to subset this dataset by keeping all single ID rows coded 1 for Select AND keeping ALL multiple ID rows if ANY one of those multiple rows is coded 1 for Select for the same ID. The resulting dataset should look like:
ID Position Select
2 B 1
2 C 0
4 A 1
What is the best way to do this?
Ultimately, I then need to covert from long to wide form. Therefore, the final result should be:
ID Position1 Position2 Select
2 B C 1
4 A 1
Thanks in advance.
>Solution :
Try:
u = df.groupby("ID").filter(lambda x: (x.Select == 1).any()) # or just `lambda x: x.Select.any()` if there are only 0/1 values
print(u)
Prints:
ID Position Select
1 2 B 1
2 2 C 0
5 4 A 1
To long form:
u["col"] = u.groupby("ID").cumcount() + 1
s = u.groupby("ID")["Select"].any().astype(int)
u = u[["ID", "col", "Position"]].pivot(index="ID", columns="col")
u.columns = [f"{c[0]}{c[1]}" for c in u.columns]
print(pd.concat([u, s], axis=1).fillna("").reset_index())
Prints:
ID Position1 Position2 Select
0 2 B C 1
1 4 A 1