Filter dataframe using multiple values

As the title says, I have a dataframe and I want to filter (keep) rows where the id variable is either 1 or 2.

An example:

using DataFrames

# example data set
df1 = DataFrame(id = repeat(1:3, 3),
                name = repeat(["bob", "jane", "steve"], inner = 3))


# filtering on a single id works fine
kp_id = 1
df1[df1.id .== kp_id, :]

# filtering on multiple ids - my attempt returns an empty dataframe
kp_id = (1,2)
df1[df1.id .== in(df1.id, kp_id), :]

Any advice would be much appreciated.

>Solution :

It comes down to writing a boolean expression that produces the desired BitVector. While you work on this, set the indexing aside and look only at the mask expression, for example

df1.id .== 1

A useful approach in Julia is to get the single-value case right first:

julia> in(1, (1,2))
true
julia> in(3, (1,2))
false

So far so good! Now let's broadcast this, but only over the first argument. To do that, we wrap the second argument in a Ref so that broadcasting treats it as a single value:

julia> in.(df1.id, Ref((1,2)))
9-element BitVector:
 1
 1
 0
 1
 1
 0
 1
 1
 0
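To see why the Ref matters, and why the attempt in the question came back empty, here is a minimal sketch using only Base Julia. The names ids, keep, attempt and mask are illustrative stand-ins for df1.id and kp_id, not part of the original code.

```julia
# Stand-ins for df1.id and kp_id (hypothetical names, plain Base Julia).
ids  = repeat(1:3, 3)      # 1, 2, 3, 1, 2, 3, 1, 2, 3
keep = (1, 2)

# The attempt from the question: in(ids, keep) asks whether the whole
# vector is an element of the tuple, which is a single `false`.
# Comparing every id with `false` then yields an all-false mask.
attempt = ids .== in(ids, keep)

# Broadcasting over ids only: Ref(keep) makes the tuple act as one value.
# Without Ref, in.(ids, keep) would try to pair 9 ids with 2 tuple
# entries and throw a DimensionMismatch.
mask = in.(ids, Ref(keep))

# Two equivalent spellings of the same mask.
mask_curried = in(keep).(ids)     # in(keep) is the one-argument form x -> x in keep
mask_infix   = ids .∈ Ref(keep)
```

All three spellings should give the same BitVector as shown above, so the choice between them is a matter of taste.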

Thus

df1[in.(df1.id, Ref(kp_id)), :]

keeps exactly the rows whose id is 1 or 2.
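If you prefer not to build the mask by hand, DataFrames.jl also offers filter and subset, which can express the same selection. The following is a sketch under the assumption of a reasonably recent DataFrames.jl (1.0 or later, where subset and ByRow are available); the result names by_mask, by_filter and by_subset are illustrative only.

```julia
using DataFrames

# Same example data as in the question.
df1 = DataFrame(id = repeat(1:3, 3),
                name = repeat(["bob", "jane", "steve"], inner = 3))
kp_id = (1, 2)

# Boolean-mask indexing, as derived above.
by_mask = df1[in.(df1.id, Ref(kp_id)), :]

# filter applies the predicate to each value of :id;
# in(kp_id) is the one-argument form of in, i.e. x -> x in kp_id.
by_filter = filter(:id => in(kp_id), df1)

# subset works on whole columns, so ByRow lifts the predicate to rows.
by_subset = subset(df1, :id => ByRow(in(kp_id)))
```

All three calls return a new data frame and leave df1 unchanged. The mask form is closest to the original question, while filter and subset avoid the explicit Ref.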
