Efficient way to set all empty lists in column to None in Pandas

I have a pandas DataFrame with the following column (output of df.info()):

<class 'pandas.core.frame.DataFrame'>
Int64Index: 99919 entries, 0 to 99918
Data columns (total 47 columns):
 #   Column                        Non-Null Count  Dtype         
---  ------                        --------------  -----         
 0   reason                        99919 non-null  object   

Most of these entries contain [], and some contain an actual object, e.g. [{'a':'x', 'b':'y', 'c':'z'}]

I want to efficiently (without using df.apply) set all entries that contain only the empty list to e.g. None.

I thought that I might be able to do something like the following:

df[df['reasons'].str.len() == 0]['reasons'] = None

but this does not work, because the assignment is applied to a copy of the DataFrame rather than to the original (see https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy).
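The effect can be reproduced on a small frame. The sketch below is illustrative only: the three-row DataFrame is made up, and the warning filter is there just to keep the output quiet, since pandas normally emits a chained-assignment warning for this pattern.

```python
import warnings

import pandas as pd

# Hypothetical toy data: two empty lists and one populated list.
df = pd.DataFrame({'reasons': [[], [{'a': 'x', 'b': 'y', 'c': 'z'}], []]})

with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # silence the chained-assignment warning
    # df[mask] builds a new DataFrame; the assignment lands on that temporary.
    df[df['reasons'].str.len() == 0]['reasons'] = None

# The original frame is untouched: no entry has become None.
unchanged = bool(df['reasons'].isna().sum() == 0)
print(unchanged)
```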

Usually in the case where I set something on df[row][column], I can instead do:

df.loc[:, (row, column)]

to set it on the actual dataframe, but that does not work in this case since df['reasons'].str.len() == 0 returns a series which is unhashable and not a valid argument for loc.

Is there any way to do this without using apply?

Solution:

The syntax should be:

df.loc[df['reasons'].str.len() == 0, 'reasons'] = None

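The correct use of loc is loc[row, col], not loc[:, (row, col)]. The row selector can be a boolean Series such as df['reasons'].str.len() == 0, whereas loc[:, (row, col)] selects all rows and treats the tuple as a column key.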
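A minimal sketch of this assignment on made-up data (the three-row frame and the name is_empty are assumptions for illustration, not the asker's data):

```python
import pandas as pd

# Hypothetical toy data: two empty lists and one populated list.
df = pd.DataFrame({'reasons': [[], [{'a': 'x', 'b': 'y', 'c': 'z'}], []]})

# .str.len() works element-wise on lists as well as strings.
is_empty = df['reasons'].str.len() == 0

# Boolean mask selects the rows, the label selects the column;
# a single .loc call writes to the original DataFrame.
df.loc[is_empty, 'reasons'] = None

print(df['reasons'].isna().tolist())
```

Entries that are already missing yield NaN from .str.len(), so they compare unequal to 0 and are left as they are.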

df.loc[:, (X, Y)] can be used if you have a MultiIndex (see below for an example), but this is not the case here.

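import pandas as pd

# Columns form a two-level MultiIndex: ('A', 1), ('A', 2), ('B', 1), ('B', 2)
df = pd.DataFrame(None, index=range(2),
                  columns=pd.MultiIndex.from_product([['A', 'B'], [1, 2]]))
df.loc[:, ('A', 1)]  # all rows of the column labelled ('A', 1)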