How do I add a list to a column in pandas?

I'm trying to merge the columns kw1, kw2, and kw3 of my DataFrame into one separate column called keywords. This is what I tried:

df['keywords'] = list((df['kw1'], df['kw2'], df['kw3']))
df

but I’m getting this error:

ValueError                                Traceback (most recent call last)
Input In [13], in <cell line: 1>()
----> 1 df['keywords'] = list((df['kw1'], df['kw2'], df['kw3']))
      2 df

File /lib/python3.10/site-packages/pandas/core/frame.py:3655, in DataFrame.__setitem__(self, key, value)
   3652     self._setitem_array([key], value)
   3653 else:
   3654     # set column
-> 3655     self._set_item(key, value)

File /lib/python3.10/site-packages/pandas/core/frame.py:3832, in DataFrame._set_item(self, key, value)
   3822 def _set_item(self, key, value) -> None:
   3823     """
   3824     Add series to DataFrame in specified column.
   3825 
   (...)
   3830     ensure homogeneity.
   3831     """
-> 3832     value = self._sanitize_column(value)
   3834     if (
   3835         key in self.columns
   3836         and value.ndim == 1
   3837         and not is_extension_array_dtype(value)
   3838     ):
   3839         # broadcast across multiple columns if necessary
   3840         if not self.columns.is_unique or isinstance(self.columns, MultiIndex):

File /lib/python3.10/site-packages/pandas/core/frame.py:4535, in DataFrame._sanitize_column(self, value)
   4532     return _reindex_for_setitem(value, self.index)
   4534 if is_list_like(value):
-> 4535     com.require_length_match(value, self.index)
   4536 return sanitize_array(value, self.index, copy=True, allow_2d=True)

File /lib/python3.10/site-packages/pandas/core/common.py:557, in require_length_match(data, index)
    553 """
    554 Check the length of data matches the length of the index.
    555 """
    556 if len(data) != len(index):
--> 557     raise ValueError(
    558         "Length of values "
    559         f"({len(data)}) "
    560         "does not match length of index "
    561         f"({len(index)})"
    562     )

ValueError: Length of values (3) does not match length of index (141)

Is there a way to make each row of the new column a list like this: [{value of kw1}, {value of kw2}, {value of kw3}]?

Solution:

You can do it like this:

import numpy as np

df['keywords'] = np.stack([df['kw1'], df['kw2'], df['kw3']], axis=1).tolist()

Pandas treats each element of the outermost list as a single value, so it complains that you have only three values (your three Series) when a new column needs 141, one per row of your original frame.
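
To make the mismatch concrete, here is a minimal sketch that reproduces the error on a small frame. The 4-row DataFrame and its values are made up for illustration; they stand in for the 141-row frame in the question.

```python
import pandas as pd

# Hypothetical 4-row frame standing in for the 141-row one in the question.
df = pd.DataFrame({
    "kw1": ["a", "b", "c", "d"],
    "kw2": ["e", "f", "g", "h"],
    "kw3": ["i", "j", "k", "l"],
})

# The outer list has one entry per Series, not one entry per row.
values = list((df["kw1"], df["kw2"], df["kw3"]))
n_values = len(values)  # number of Series in the list
n_rows = len(df)        # number of rows a new column must cover

try:
    df["keywords"] = values
    error_message = None
except ValueError as exc:
    # pandas rejects the assignment because the lengths differ.
    error_message = str(exc)

print(n_values, n_rows)
print(error_message)
```

The outer list has length 3 while the frame has 4 rows, which is the same kind of mismatch as 3 versus 141 in the traceback.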

Stacking the underlying NumPy arrays of the three Series along the last axis (axis=1) gives an array of shape (141, 3). Converting it to a list gives a list of length 141 in which each element is itself a list of length 3.
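
The shapes can be checked on a toy example. This is a sketch under the assumption of a made-up 4-row frame, so the shapes come out as (4, 3) rather than (141, 3).

```python
import numpy as np
import pandas as pd

# Hypothetical 4-row frame; the real one in the question has 141 rows.
df = pd.DataFrame({
    "kw1": ["a", "b", "c", "d"],
    "kw2": ["e", "f", "g", "h"],
    "kw3": ["i", "j", "k", "l"],
})

series_list = [df["kw1"], df["kw2"], df["kw3"]]

# axis=1 puts the three Series side by side: one row per original row.
stacked = np.stack(series_list, axis=1)
print(stacked.shape)  # (4, 3)

# axis=0 would instead give one row per Series, which is the wrong layout here.
stacked_by_series = np.stack(series_list, axis=0)
print(stacked_by_series.shape)  # (3, 4)

# tolist() turns the (4, 3) array into 4 lists of 3 values each.
rows = stacked.tolist()
print(len(rows), rows[0])  # 4 ['a', 'e', 'i']
```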

A more concise way is to select the three columns as another DataFrame and let pandas do the stacking for you:

df['keywords'] = df[['kw1', 'kw2', 'kw3']].values.tolist()
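
As an end-to-end check, the sketch below applies the column-selection approach to a small made-up frame. It uses .to_numpy(), which the pandas documentation recommends over .values; the frame contents and the cols variable are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical 4-row frame; column names match the question.
df = pd.DataFrame({
    "kw1": ["a", "b", "c", "d"],
    "kw2": ["e", "f", "g", "h"],
    "kw3": ["i", "j", "k", "l"],
})

cols = ["kw1", "kw2", "kw3"]

# Selecting the columns yields a (4, 3) array; tolist() gives one list per row.
df["keywords"] = df[cols].to_numpy().tolist()

print(df["keywords"].tolist())
# [['a', 'e', 'i'], ['b', 'f', 'j'], ['c', 'g', 'k'], ['d', 'h', 'l']]
```

Each cell of the keywords column now holds a Python list, so the column has object dtype.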