I have a dataframe that looks like this:

```
index key set_col data
0 "a1" ("a", "b") "a1_data"
1 "a2" ("j", "k", "l", "m") "a2_data"
2 "b1" ("z", "y", "x", "w", "v", "u", "t") "b1_data"
```

I need to split `set_col` whenever it holds more than 3 elements, putting each chunk on a duplicated row with the same data, resulting in this df:

```
index key set_col data
0 "a1" ("a", "b") "a1_data"
1 "a2" ("j", "k", "l") "a2_data"
2 "a2" ("m") "a2_data"
3 "b1" ("z", "y", "x") "b1_data"
4 "b1" ("w", "v", "u") "b1_data"
5 "b1" ("t") "b1_data"
```

I have read other answers using `explode`, `replace` or `assign`, like this or this, but neither handles splitting lists or sets to a fixed length and duplicating the rows.

In this answer I found the following code:

```
def split(a, n):
    k, m = divmod(len(a), n)
    return (a[i*k+min(i, m):(i+1)*k+min(i+1, m)] for i in range(n))
```

I tried to apply it to the column like this:

```
df['split_set_col'] = df['set_col'].apply(split(df['set_col'], 3))
```

But I get this error:

```
pandas.errors.SpecificationError: nested renamer is not supported
```

### Solution

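Your function call is not right. `split(df['set_col'], 3)` runs immediately on the whole column, so `apply` receives its return value (a generator) instead of a function: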

```
df['set_col'].apply(split(df['set_col'], 3))
```

Pass the function itself and let `apply` forward the extra argument:

```
df['set_col'].apply(split, n=3) # note the n=3 as named argument
```

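The function itself also does not do what you want: it splits the sequence into `n` roughly equal parts rather than into chunks of at most `n` elements. Use `np.array_split` with explicit split indices instead: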

```
import numpy as np

def split(a, n):
    # Cut at indices n, 2n, ... so each chunk holds at most n elements
    return np.array_split(a, np.arange(0, len(a), n)[1:])

df['split_set_col'] = df['set_col'].apply(split, n=3)

Output:

```
>>> df.explode('split_set_col', ignore_index=True)
key set_col data split_set_col
0 "a1" (a, b) "a1_data" [a, b]
1 "a2" (j, k, l, m) "a2_data" [j, k, l]
2 "a2" (j, k, l, m) "a2_data" [m]
3 "b1" (z, y, x, w, v, u, t) "b1_data" [z, y, x]
4 "b1" (z, y, x, w, v, u, t) "b1_data" [w, v, u]
5 "b1" (z, y, x, w, v, u, t) "b1_data" [t]
```
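
If the goal is exactly the frame shown in the question (chunks stored back in `set_col` as tuples, with no helper column), a plain-Python variant may be enough. This is a minimal sketch, not part of the answer above: `chunk` is a hypothetical helper name, and it assumes `set_col` holds tuples or lists, since real Python `set` objects cannot be sliced and would first need converting, for example with `tuple(...)`.

```python
import pandas as pd

def chunk(seq, n):
    # Slice seq into consecutive pieces of at most n elements.
    return [seq[i:i + n] for i in range(0, len(seq), n)]

df = pd.DataFrame({
    "key": ["a1", "a2", "b1"],
    "set_col": [("a", "b"), ("j", "k", "l", "m"),
                ("z", "y", "x", "w", "v", "u", "t")],
    "data": ["a1_data", "a2_data", "b1_data"],
})

# Replace each tuple by its list of chunks, then give every chunk its own row.
# explode only unpacks the outer list, so each chunk stays a tuple.
out = (
    df.assign(set_col=df["set_col"].apply(chunk, n=3))
      .explode("set_col", ignore_index=True)
)
print(out)
```

Because slicing a tuple returns a tuple, the chunks keep the original element type, and `key` and `data` are repeated for every chunk of the same source row.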