Getting the "ValueError: Columns must be same length as key" when splitting out the date column by "/"

Advertisements

I tried to split the date column into 3 separate columns

df[['date1', 'date2', 'date3']] = df['Date'].str.split('/')

Here’s the error I’m getting

ValueError: Columns must be same length as key

It works fine when I just type in df[‘Date’].str.split(‘/’) and I was able to confirm that each list has only 3 elements.

I think it might have to do with some of the dates not having the full year, so the length of the column varies but I’m not sure why this would matter here.

df['Date'].value_counts()

22/05/2022    10
26/12/05      10
26/12/08      10
11/05/14      10
12/05/2019    10
              ..
28/02/05       1
14/09/2015     1
27/09/2015     1
28/09/2015     1
17/08/2015     1

df['Date'].str.len().value_counts()

8.0     4850
10.0    2280
Name: Date, dtype: int64

>Solution :

By default Series.str.split will return a Series (i.e. a single "column") with lists that contain the split elements).

df['Date'].str.split('/')

0    [22, 05, 2022]
1      [26, 12, 05]
2      [26, 12, 08]
3     [11, 5, 2014]
4     [12, 5, 2019]
5      [28, 02, 05]
6    [14, 09, 2015]
7    [27, 09, 2015]
8    [28, 09, 2015]
9    [17, 08, 2015]
Name: Date, dtype: object

So, the error you are getting is the result of trying to assign this single "column" to three new columns. To fix this, you need to set the expand parameter to True. This will expand the result into three different columns, which we can then assign as intended:

df[['date1', 'date2', 'date3']] = df['Date'].str.split('/', expand=True)

df

         Date  Count date1 date2 date3
0  22/05/2022     10    22    05  2022
1    26/12/05     10    26    12    05
2    26/12/08     10    26    12    08
3   11/5/2014     10    11     5  2014
4   12/5/2019     10    12     5  2019
5    28/02/05      1    28    02    05
6  14/09/2015      1    14    09  2015
7  27/09/2015      1    27    09  2015
8  28/09/2015      1    28    09  2015
9  17/08/2015      1    17    08  2015

Data used

import pandas as pd

data = {'Date': {0: '22/05/2022', 1: '26/12/05', 2: '26/12/08', 3: '11/5/2014', 
                 4: '12/5/2019', 5: '28/02/05', 6: '14/09/2015', 7: '27/09/2015', 
                 8: '28/09/2015', 9: '17/08/2015'}, 
        'Count': {0: 10, 1: 10, 2: 10, 3: 10, 4: 10, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1}}

df = pd.DataFrame(data)

Leave a Reply Cancel reply