ValueError: Length mismatch: Expected axis has X elements, new values have Y elements

I am trying to fill missing values with the most frequent value in each group.
Code:

f = lambda x: x.mode().iat[0] if x.notna().any() else np.nan
s = df.groupby('VehicleType')['FuelType'].transform(f)
df['FuelType']=df['FuelType'].fillna(s)

Error: ValueError: Length mismatch: Expected axis has 316879 elements, new values have 354369 elements

Data sample: (image not included)

Possible cause: I think the VehicleType column may contain missing values, which would explain the error, because the same code works when I group by another column that has no missing values. However, I have to use VehicleType for this task.

Solution:

This problem appears to have been fixed in newer versions of pandas (it works without issue on 1.4.0). For older versions of pandas…

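The issue is caused by NaN values in your grouping column combined with .transform. By default, groupby drops rows whose key is NaN, so the transformed result has fewer rows than the original frame, and that is the length mismatch reported in the error. To get around this, instead of grouping by the column name, group by a Series in which you have first replaced the NaN keys, using .fillna() with some value that doesn't otherwise occur in that column. This assigns the rows with a NaN 'VehicleType' the modal 'FuelType' among those NaN rows.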

I’ll assign the result as a separate column below for illustration.

Sample data to reproduce problem

import pandas as pd
import numpy as np

df = pd.DataFrame({'VehicleType': ['a', 'b', 'c', 'a', np.nan, np.nan, np.nan, 'a'],
                   'FuelType': ['Y', np.nan, 'Y', 'X', 'Z', 'Z', 'Y', 'X']})
# Most frequent value in the group, or NaN if the group is entirely missing
f = lambda x: x.mode().iat[0] if x.notna().any() else np.nan

df.groupby('VehicleType')['FuelType'].transform(f)
# On affected (older) pandas versions:
# ValueError: Length mismatch: Expected axis has 5 elements, new values have 8 elements
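
The two numbers in the error message can be checked against the data. A minimal sketch, assuming the same sample frame as above (the variable names n_rows and n_missing_keys are only illustrative): the "new values" count is the full length of the frame, and the "expected axis" count is what remains after the rows with a NaN grouping key are dropped.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'VehicleType': ['a', 'b', 'c', 'a', np.nan, np.nan, np.nan, 'a'],
                   'FuelType': ['Y', np.nan, 'Y', 'X', 'Z', 'Z', 'Y', 'X']})

n_rows = len(df)                                      # rows in the full frame
n_missing_keys = int(df['VehicleType'].isna().sum())  # rows whose group key is NaN

# Rows that survive grouping when NaN keys are dropped
print(n_rows, n_missing_keys, n_rows - n_missing_keys)
# prints: 8 3 5
```

If the same cause applies to the data in the question, the difference 354369 - 316879 = 37490 would be the number of rows with a missing VehicleType, which is quick to confirm with df['VehicleType'].isna().sum().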

Solution

df['FuelType_mode'] = (df.groupby(df['VehicleType'].fillna('SPECIAL_MISSING'))
                         ['FuelType'].transform(f))

print(df)
  VehicleType FuelType FuelType_mode
0           a        Y             X
1           b      NaN           NaN
2           c        Y             Y
3           a        X             X
4         NaN        Z             Z
5         NaN        Z             Z
6         NaN        Y             Z
7           a        X             X
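
To get from the per-group mode back to the original goal of filling only the missing values, the transformed Series can be passed to .fillna(), as in the question. The following is a sketch under some assumptions: the sample data is slightly modified (row 6 now has a missing FuelType so that there is something to fill), and the column name FuelType_filled is only illustrative.

```python
import numpy as np
import pandas as pd

# Modified sample (an assumption for illustration): row 6 has a missing FuelType
df = pd.DataFrame({'VehicleType': ['a', 'b', 'c', 'a', np.nan, np.nan, np.nan, 'a'],
                   'FuelType': ['Y', np.nan, 'Y', 'X', 'Z', 'Z', np.nan, 'X']})

# Most frequent value in the group, or NaN if the group is entirely missing
f = lambda x: x.mode().iat[0] if x.notna().any() else np.nan

# Group by the key column with NaN replaced by a sentinel that does not occur in it
keys = df['VehicleType'].fillna('SPECIAL_MISSING')
modal = df.groupby(keys)['FuelType'].transform(f)

# Only the missing FuelType values are replaced; existing values are kept
df['FuelType_filled'] = df['FuelType'].fillna(modal)
print(df)
```

Row 6 is filled with 'Z', the mode among the rows with a missing VehicleType. Row 1 stays NaN because its group ('b') has no non-missing FuelType to take a mode from. When several values tie for most frequent, .mode() returns them sorted, so .iat[0] picks the first in sort order.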

With newer versions of pandas (1.1 and later), the dropna argument of groupby specifies whether rows with a NaN key are ignored entirely when you group or are treated as a group of their own. Depending on the behavior you want, you would do:

# NaN VehicleType rows form their own group and get that group's modal FuelType.
# Same logic as the SPECIAL_MISSING workaround above.
df['FT3'] = df.groupby('VehicleType', dropna=False)['FuelType'].transform(f)

# NaN VehicleType rows are excluded from grouping (default dropna=True) and get NaN.
df['FT4'] = df.groupby('VehicleType')['FuelType'].transform(f)

print(df)
  VehicleType FuelType FuelType_mode  FT3  FT4
0           a        Y             X    X    X
1           b      NaN           NaN  NaN  NaN
2           c        Y             Y    Y    Y
3           a        X             X    X    X
4         NaN        Z             Z    Z  NaN
5         NaN        Z             Z    Z  NaN
6         NaN        Y             Z    Z  NaN
7           a        X             X    X    X
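
If you want the FT4 behavior (NaN for rows with a missing VehicleType) without relying on how a particular pandas version handles .transform with NaN keys, one option is to compute the mode once per group and map it back onto the key column. This is a sketch of an alternative technique, not part of the approach above; the names group_modes and FT_map are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'VehicleType': ['a', 'b', 'c', 'a', np.nan, np.nan, np.nan, 'a'],
                   'FuelType': ['Y', np.nan, 'Y', 'X', 'Z', 'Z', 'Y', 'X']})

# Most frequent value in the group, or NaN if the group is entirely missing
f = lambda x: x.mode().iat[0] if x.notna().any() else np.nan

# One value per non-NaN VehicleType (NaN keys are dropped by default)
group_modes = df.groupby('VehicleType')['FuelType'].agg(f)

# Look up each row's key; keys absent from group_modes, including NaN, map to NaN
df['FT_map'] = df['VehicleType'].map(group_modes)
print(df)
```

Because the result is built by a lookup on the original column, it always has the same length and index as df, so it can be passed straight to df['FuelType'].fillna(...).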