Home Creating subsets of df using pandas groupby and getting a value based on a function

Questions

Creating subsets of df using pandas groupby and getting a value based on a function

February 22, 2022

I have df similar to below. I need to select rows where df['Year 2'] is equal or closest to df['Year'] in subsets grouped by df['ID'] so in this example rows 1,2 and 5.

df
    Year     ID   A   Year 2      C
0    2020    12   0    2019       0
1    2020    12   0    2020       0 <-
2    2017    10   1    2017       0 <-
3    2017    10   0    2018       0
4    2019    6    0    2017       0
5    2019    6    1    2018       0 <-

I am trying to achieve that with the following piece of code using group by and passing a function to get the proper row with the closest value for both columns.

df1 = df.groupby(['ID']).apply(min(df['Year 2'], key=lambda x:abs(x-df['Year'].min())))

This particular line returns 'int' object is not callable. Any ideas how to fix this line of code or a fresh approach to the problem is appreciated.

TYIA.

>Solution :

You can subtract both columns by Series.sub, convert to absolute and aggregate indices by minimum values by DataFrameGroupBy.idxmin:

idx = df['Year 2'].sub(df['Year']).abs().groupby(df['ID']).idxmin()

If need new column filled by boolean use Index.isin:

df['new'] = df.index.isin(idx)
print (df)
   Year  ID  A  Year 2  C    new
0  2020  12  0    2019  0  False
1  2020  12  0    2020  0   True
2  2017  10  1    2017  0   True
3  2017  10  0    2018  0  False
4  2019   6  0    2017  0  False
5  2019   6  1    2018  0   True

If need filter rows use DataFrame.loc:

df1 = df.loc[idx]
print (df1)
   Year  ID  A  Year 2  C
5  2019   6  1    2018  0
2  2017  10  1    2017  0
1  2020  12  0    2020  0

One row solution:

df1 = df.loc[df['Year 2'].sub(df['Year']).abs().groupby(df['ID']).idxmin()]

pandas

byMR

Published February 22, 2022

Add a comment

Count Distinct Occurrences of Field, Group By Another Field

byMR

February 22, 2022

Questions

Find the maximum number in an R dataframe column of strings

byMR

February 22, 2022

Questions

search for a specific property in array of objects and return boolean

byMR

February 22, 2022

Questions

Numpy comparing vectors of length m and n resulting in boolean matrix of size m,n

byMR

February 22, 2022

Questions

how can I change props of component that pass from higher component . reactjs

byMR

February 22, 2022

Questions

Code gives missing 1 required positional argument error

byMR

February 22, 2022

Creating subsets of df using pandas groupby and getting a value based on a function

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Like this:

Leave a ReplyCancel reply

Read more

Count Distinct Occurrences of Field, Group By Another Field

Find the maximum number in an R dataframe column of strings

search for a specific property in array of objects and return boolean

Numpy comparing vectors of length m and n resulting in boolean matrix of size m,n

how can I change props of component that pass from higher component . reactjs

Code gives missing 1 required positional argument error

Keep Up to Date with the Most Important News

Creating subsets of df using pandas groupby and getting a value based on a function

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Share this:

Like this:

Leave a ReplyCancel reply

Keep Up to Date with the Most Important News

Read more

Count Distinct Occurrences of Field, Group By Another Field

Find the maximum number in an R dataframe column of strings

search for a specific property in array of objects and return boolean

Numpy comparing vectors of length m and n resulting in boolean matrix of size m,n

how can I change props of component that pass from higher component . reactjs

Code gives missing 1 required positional argument error

Discover more from Dev solutions