I am sure what I want to do must be a standard thing for some people, and probably has a special name (groupby, pivot, melt, etc) only whatever the name is I don’t know so I can’t seem to make progress.
In summary:
Given a frame DF1 with columns A, B and C, I would like to be able to get a new frame DF2 which consists of all rows "x" of DF1 for which the value of A (call it “a[x]”) is such that there is “some row y” (x may or may not equal y) containing the triplet of values (a[y] ,b[y],c[y]) such that a[x]==“a[y]” and b[y]=“bbb”.
If that sounds too confusing, suppose that the dataframe was:
DF1 =
Person Possession Cost
Geoff Cat 3
Sue Cat 4
Geoff Dog 10
Geoff Cat 11
Mary Cat 1
Tom Dog 55
I.e. in comparison to the general statement, we have A=person, B=Possesion, C=cost.
Then, given a particular possession, such as "Cat",
I want to be able to retrieve the new frame:
DF2 =
Person Possession Cost
Geoff Cat 3
Sue Cat 4
Geoff Dog 10
Geoff Cat 11
Mary Cat 1
which is every line of DF1 for which the "Person" is a "Cat owner" (which in this example happens to be everyone except "Tom").
My initial idea was to filter on "Cat" to get the cat owners, but this removes the line "Geoff Dog 10".
I don’t want to write slow iteration (though I could).
Google doesn’t help me as I can’t find a way of phrasing what I want to do that’s succincter than the above.
It must be a standard thing to want to do, though, so I am hoping that someone more experienced will say "it’s just a XXXXXX problem" where XXXXX is a good specialist search term for me to get further.
>Solution :
I don’t know the name, but the way I’ve been doing this is getting the possible values:
df.loc[ df['Possession'] == 'Cat', 'Person' ]
0 Geoff
1 Sue
3 Geoff
4 Mary
Name: Person, dtype: object
Then using that with .isin()
df.loc[ df['Person'].isin( df.loc[ df['Possession'] == 'Cat', 'Person' ] ) ]
Person Possession Cost
0 Geoff Cat 3
1 Sue Cat 4
2 Geoff Dog 10
3 Geoff Cat 11
4 Mary Cat 1