Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Select top n groups in pandas dataframe

I have the following dataframe:

    Country Crop    Harvest Year    Area (ha)
Afghanistan Maize   2019    94910
Afghanistan Maize   2020    140498
Afghanistan Maize   2021    92144
Afghanistan Winter Wheat    2019    2334000
Afghanistan Winter Wheat    2020    2668000
Afghanistan Winter Wheat    2021    1833357
Argentina   Maize   2019    7232761
Argentina   Maize   2020    7730506
Argentina   Maize   2021    8146596
Argentina   Winter Wheat    2019    6050953
Argentina   Winter Wheat    2020    6729838
Argentina   Winter Wheat    2021    6394102
China   Maize   2019    41309740
China   Maize   2020    41292000
China   Maize   2021    43355859
China   Winter Wheat    2019    23732560
China   Winter Wheat    2020    23383000
China   Winter Wheat    2021    23571400
Ethiopia    Maize   2019    2274306
Ethiopia    Maize   2020    2363507
Ethiopia    Maize   2021    2530000
Ethiopia    Winter Wheat    2019    1789372
Ethiopia    Winter Wheat    2020    1829051
Ethiopia    Winter Wheat    2021    1950000
France  Maize   2019    1506100
France  Maize   2020    1691130
France  Maize   2021    1549520
France  Winter Wheat    2019    5244250
France  Winter Wheat    2020    4512420
France  Winter Wheat    2021    5276730
India   Maize   2019    9027130
India   Maize   2020    9569060
India   Maize   2021    9860000
India   Winter Wheat    2019    29318780
India   Winter Wheat    2020    31357020
India   Winter Wheat    2021    31610000
Namibia Maize   2019    21123
Namibia Maize   2020    35000
Namibia Maize   2021    46070
Namibia Winter Wheat    2019    1079
Namibia Winter Wheat    2020    2000
Namibia Winter Wheat    2021    3026

I want to select the top 2 countries by the average value of Area (ha) column across the `Harvest Year’s. I tried this but it does not work:

df = df.groupby("Crop", dropna=False).apply( lambda x: x.nlargest(2, "Area (ha)") )

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Output should be, here China and india are the countries with the largest average Area (ha) for both maize and Winter Wheat, but in the full datasets different countries would have largest values for different crops:

Country Crop    Harvest Year    Area (ha)
China   Maize   2019    41309740
China   Maize   2020    41292000
China   Maize   2021    43355859
China   Winter Wheat    2019    23732560
China   Winter Wheat    2020    23383000
China   Winter Wheat    2021    23571400
India   Maize   2019    9027130
India   Maize   2020    9569060
India   Maize   2021    9860000
India   Winter Wheat    2019    29318780
India   Winter Wheat    2020    31357020
India   Winter Wheat    2021    31610000

>Solution :

IIUC, you can do double .groupby:

x = (
    df.groupby("Crop")
    .apply(lambda x: x.groupby("Country")["Area (ha)"].mean())
    .stack()
    .groupby(level=0, group_keys=False)
    .nlargest(2)
)

print(x)

Prints top 2 Crop/Countries by average area:

Crop          Country
Maize         China      4.198587e+07
              India      9.485397e+06
Winter Wheat  India      3.076193e+07
              China      2.356232e+07
dtype: float64

Then you can use this index to filter the original dataframe:

out = df.set_index(["Crop", "Country"]).loc[x.index].reset_index()
print(out)

Prints:

            Crop Country  Harvest Year  Area (ha)
0          Maize   China          2019   41309740
1          Maize   China          2020   41292000
2          Maize   China          2021   43355859
3          Maize   India          2019    9027130
4          Maize   India          2020    9569060
5          Maize   India          2021    9860000
6   Winter Wheat   India          2019   29318780
7   Winter Wheat   India          2020   31357020
8   Winter Wheat   India          2021   31610000
9   Winter Wheat   China          2019   23732560
10  Winter Wheat   China          2020   23383000
11  Winter Wheat   China          2021   23571400
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading