
Pandas DataFrame shows cells as strings, but raises an error when I try to split them

I have a pandas DataFrame df with a column df['auc_all'] whose cells each contain a tuple of two values (e.g. (0.54, 0.044)).

When I run:

>>> type(df['auc_all'][0])
str

Yet, when I run:


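def convert_str_into_tuple(self, string):
    # e.g. '(0.54, 0.044)' -> ['(0.54', ' 0.044)']
    splitted_tuple = string.split(',')
    value1 = float(splitted_tuple[0][1:])    # drop the leading '('
    value2 = float(splitted_tuple[1][1:-1])  # drop the leading space and the trailing ')'
    return (value1, value2)

df['auc_all'] = df['auc_all'].apply(self.convert_str_into_tuple)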

I get the following error:

df = full_df.create_full()
Traceback (most recent call last):
    
  File "<ipython-input-437-34fc05204bad>", line 18, in create_full
    df['auc_all'] = df['auc_all'].apply(self.convert_str_into_tuple)

  File "C:\Users\20200016\Anaconda3\lib\site-packages\pandas\core\series.py", line 4357, in apply
    return SeriesApply(self, func, convert_dtype, args, kwargs).apply()

  File "C:\Users\20200016\Anaconda3\lib\site-packages\pandas\core\apply.py", line 1043, in apply
    return self.apply_standard()

  File "C:\Users\20200016\Anaconda3\lib\site-packages\pandas\core\apply.py", line 1099, in apply_standard
    mapped = lib.map_infer(

  File "pandas\_libs\lib.pyx", line 2859, in pandas._libs.lib.map_infer

  File "<ipython-input-437-34fc05204bad>", line 63, in convert_str_into_tuple
    splitted_tuple = string.split(',')

AttributeError: 'tuple' object has no attribute 'split'

This seems to indicate that the cell holds a tuple.

However:

>>> df['auc'][0][0]
'('

It seems as if the variable type changes based on where I use it. Is this actually happening?
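One possible explanation, which is an assumption and not confirmed in the thread: Series.apply calls the function on every row, while df['auc_all'][0] only inspects the row labelled 0. If some rows already hold tuples (for example because the conversion ran earlier in the same session), row 0 can still be a string while another row raises the AttributeError. The sketch below uses a small hypothetical DataFrame with mixed cells to reproduce this and to count the type of every cell:

```python
import pandas as pd

# Hypothetical data: row 0 is still a string, row 1 was already converted to a tuple
df = pd.DataFrame({'auc_all': ['(0.54, 0.044)', (0.61, 0.032)]})

# Only inspects the row labelled 0, so it reports str
print(type(df['auc_all'][0]))

# Count the Python type of every cell, not just the first one
type_counts = df['auc_all'].map(type).value_counts()
print(type_counts)

# A str-only function fails as soon as apply reaches the tuple row
error_message = None
try:
    df['auc_all'].apply(lambda s: s.split(','))
except AttributeError as err:
    error_message = str(err)
    print(error_message)
```

If the type counts show more than one type, the column is mixed and any conversion has to handle both cases.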

Solution:

If your column contains tuples stored as strings, use pd.eval:

df['auc_all'] = pd.eval(df['auc_all'])

Example:

# df = pd.DataFrame({'auc_all': ['(0.54, 0.044)']})
>>> df
         auc_all
0  (0.54, 0.044)

>>> type(df['auc_all'][0])
str


# df['auc_all'] = pd.eval(df['auc_all'])
>>> df
         auc_all
0  [0.54, 0.044]

>>> type(df['auc_all'][0])
list

The drawback is that each tuple is converted to a list. To keep tuples, use literal_eval from the ast module instead:

# import ast
# df['auc_all'] = df['auc_all'].apply(ast.literal_eval)
>>> df
         auc_all
0  (0.54, 0.044)

>>> type(df['auc_all'][0])
tuple
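If the column may already contain a mix of strings and tuples, as the traceback in the question suggests, ast.literal_eval also fails on the tuple cells (it raises ValueError for anything that is not a string or an AST node). A minimal sketch, assuming only string cells need parsing and everything else can be left unchanged; the helper name to_tuple and the sample data are hypothetical:

```python
import ast

import pandas as pd


def to_tuple(value):
    """Parse string cells like '(0.54, 0.044)'; return any other cell unchanged."""
    if isinstance(value, str):
        # literal_eval only evaluates Python literals, so no arbitrary code runs
        return ast.literal_eval(value)
    return value


# Hypothetical mixed column: one string cell and one cell that is already a tuple
df = pd.DataFrame({'auc_all': ['(0.54, 0.044)', (0.61, 0.032)]})
df['auc_all'] = df['auc_all'].apply(to_tuple)

print(df['auc_all'].map(type).value_counts())
```

Because tuple cells pass through untouched, running the conversion a second time (for example by re-executing the same notebook cell) is harmless.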
