
Unexpected output from pandas' SeriesGroupBy.diff function

Consider the following piece of Python code, which is essentially copied from the first code example in the Transformation section of the "Group by: split-apply-combine" chapter of the pandas user guide.

import pandas as pd
import numpy as np

speeds = pd.DataFrame(
    data={'class': ['bird', 'bird', 'mammal', 'mammal', 'mammal'],
          'order': ['Falconiformes', 'Psittaciformes', 'Carnivora', 'Primates', 'Carnivora'],
          'max_speed': [389.0, 24.0, 80.2, np.nan, 58.0]},  # np.NaN was removed in NumPy 2.0
    index=['falcon', 'parrot', 'lion', 'monkey', 'leopard']
)

grouped = speeds.groupby('class')['max_speed']  # SeriesGroupBy over the max_speed column
grouped.diff()  # each value minus the previous value in the same class

When executed in Google Colab, the output is:

falcon       NaN
parrot    -365.0
lion         NaN
monkey       NaN
leopard      NaN
Name: max_speed, dtype: float64

This is the same output as shown in the user guide.


Why is the value corresponding to the parrot index element -365.0 rather than NaN like the rest of the values in this Series?

Solution:

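The output is correct and expected. Within each group, diff() subtracts the previous element (in the original row order) from the current one. The first element of a group has no predecessor, so it yields NaN, and any subtraction involving NaN also yields NaN. Here is a breakdown of what it does: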

falcon       NaN                 # NaN since first of the "bird" group
parrot    -365.0                 # 24 - 389   = -365
lion         NaN                 # NaN since first of the "mammal" group
monkey       NaN                 # NaN - 80.2 = NaN
leopard      NaN                 # 58 - NaN   = NaN
Name: max_speed, dtype: float64
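One way to see the rule behind this breakdown: with the default of one period, the group-wise diff() should match subtracting the group-wise shift() of the column from the column itself. The sketch below rebuilds the same speeds frame and puts the shifted "previous" values next to both results, so the NaN sources are visible row by row. Treat it as an illustration of the behaviour, not as a description of how pandas implements diff() internally.

```python
import numpy as np
import pandas as pd

speeds = pd.DataFrame(
    data={'class': ['bird', 'bird', 'mammal', 'mammal', 'mammal'],
          'order': ['Falconiformes', 'Psittaciformes', 'Carnivora', 'Primates', 'Carnivora'],
          'max_speed': [389.0, 24.0, 80.2, np.nan, 58.0]},
    index=['falcon', 'parrot', 'lion', 'monkey', 'leopard'],
)

grouped = speeds.groupby('class')['max_speed']

# Previous value within the same class; the first row of each class gets NaN,
# and leopard gets NaN because its predecessor (monkey) is NaN.
previous = grouped.shift(1)

# Current minus previous: NaN on either side propagates into the result.
manual = speeds['max_speed'] - previous

print(pd.DataFrame({'max_speed': speeds['max_speed'],
                    'previous': previous,
                    'manual': manual,
                    'diff': grouped.diff()}))
```

Only parrot has a non-missing value on both sides of the subtraction, which is why it is the only row with a number.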

If you replace the NaN in the input with a valid value (e.g. 42), you get:

falcon       NaN                 # NaN since first of the "bird" group
parrot    -365.0                 # 24 - 389   = -365
lion         NaN                 # NaN since first of the "mammal" group
monkey     -38.2                 # 42 - 80.2  = -38.2
leopard     16.0                 # 58 - 42    = 16
Name: max_speed, dtype: float64
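To reproduce this second table, the sketch below fills the missing monkey speed with 42 (the example value used above) and recomputes the group-wise differences. The 'order' column is left out because it plays no part in the result. Note that 42 - 80.2 is computed in floating point, so the monkey value is only approximately -38.2.

```python
import numpy as np
import pandas as pd

speeds = pd.DataFrame(
    data={'class': ['bird', 'bird', 'mammal', 'mammal', 'mammal'],
          'max_speed': [389.0, 24.0, 80.2, np.nan, 58.0]},
    index=['falcon', 'parrot', 'lion', 'monkey', 'leopard'],
)

# Replace the single missing value with 42, as in the example above.
filled = speeds.fillna({'max_speed': 42.0})

# Each value minus the previous value within the same class.
result = filled.groupby('class')['max_speed'].diff()
print(result)
```

With no NaN left inside the mammal group, only the first row of each group (falcon and lion) stays NaN.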