Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

I can't work out what this sort of search is called

I am sure what I want to do must be a standard thing for some people, and probably has a special name (groupby, pivot, melt, etc) only whatever the name is I don’t know so I can’t seem to make progress.

In summary:

Given a frame DF1 with columns A, B and C, I would like to be able to get a new frame DF2 which consists of all rows "x" of DF1 for which the value of A (call it “a[x]”) is such that there is “some row y” (x may or may not equal y) containing the triplet of values (a[y] ,b[y],c[y]) such that a[x]==“a[y]” and b[y]=“bbb”.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

If that sounds too confusing, suppose that the dataframe was:

DF1 =
Person Possession Cost
Geoff Cat 3
Sue Cat 4
Geoff Dog 10
Geoff Cat 11
Mary Cat 1
Tom Dog 55

I.e. in comparison to the general statement, we have A=person, B=Possesion, C=cost.

Then, given a particular possession, such as "Cat",
I want to be able to retrieve the new frame:

DF2 =
Person Possession Cost
Geoff Cat 3
Sue Cat 4
Geoff Dog 10
Geoff Cat 11
Mary Cat 1

which is every line of DF1 for which the "Person" is a "Cat owner" (which in this example happens to be everyone except "Tom").

My initial idea was to filter on "Cat" to get the cat owners, but this removes the line "Geoff Dog 10".

I don’t want to write slow iteration (though I could).

Google doesn’t help me as I can’t find a way of phrasing what I want to do that’s succincter than the above.

It must be a standard thing to want to do, though, so I am hoping that someone more experienced will say "it’s just a XXXXXX problem" where XXXXX is a good specialist search term for me to get further.

>Solution :

I don’t know the name, but the way I’ve been doing this is getting the possible values:

df.loc[ df['Possession'] == 'Cat', 'Person' ]
0    Geoff
1      Sue
3    Geoff
4     Mary
Name: Person, dtype: object

Then using that with .isin()

df.loc[ df['Person'].isin( df.loc[ df['Possession'] == 'Cat', 'Person' ] ) ]
  Person Possession  Cost
0  Geoff        Cat     3
1    Sue        Cat     4
2  Geoff        Dog    10
3  Geoff        Cat    11
4   Mary        Cat     1
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading