KeyError after renaming the first n column names of python pandas dataframe

We have a simple input CSV file, as shown in the picture. We load it into a pandas DataFrame and want to rename the first n columns; in this example, the first three.

[Image: the input CSV file]

The code

import pandas as pd

file_path = r"C:\Codes\test\test_data.csv"
df1 = pd.read_csv(file_path)
print(df1, "\n", type(df1), "\n", df1.columns, "\n", type(df1.columns), "\n", df1.columns.values, "\n", type(df1.columns.values))

df2 = df1.copy()
print(df2, "\n", type(df2), "\n", df2.columns, "\n", type(df2.columns), "\n", df2.columns.values, "\n", type(df2.columns.values))

# Problem line: writes the new labels directly into the array behind df2.columns
df2.columns.values[0:3] = ["symbol", "field", "abc"]
print("\n after renaming the columns: ", df2)
print(df2["symbol"])  # this lookup sometimes raises KeyError

The result is as follows:

[Image: the resulting output]

The statement df2.columns.values[0:3] = ["symbol","field","abc"] does not seem to be reliable. Sometimes it works and sometimes it does not: for example, it may raise a KeyError, or the code may freeze when trying to display df2. I do not understand why it does not work.
I am working on Windows 10 with Python 3.10.4.

Of course, I can also write the following code, which works:

df2.rename(columns={df2.columns[0]: "symbol"}, inplace=True)
df2.rename(columns={df2.columns[1]: "field"}, inplace=True)
df2.rename(columns={df2.columns[2]: "abc"}, inplace=True)

But my goal is to rename the first n columns with a single, simple statement.

Solution:

You can try updating the column labels like this:

df2 = df2.rename(columns=dict(zip(list(df2.columns)[0:3], ["symbol","field","abc"])))

… or like this:

df2.columns = ["symbol","field","abc"] + list(df2.columns)[3:]

Output:

   COLA  COL_B  testC
0     1      2      3
1    10     11     12
 <class 'pandas.core.frame.DataFrame'>
 Index(['COLA', 'COL_B', 'testC'], dtype='object')
 <class 'pandas.core.indexes.base.Index'>
 ['COLA' 'COL_B' 'testC']
 <class 'numpy.ndarray'>
   COLA  COL_B  testC
0     1      2      3
1    10     11     12
 <class 'pandas.core.frame.DataFrame'>
 Index(['COLA', 'COL_B', 'testC'], dtype='object')
 <class 'pandas.core.indexes.base.Index'>
 ['COLA' 'COL_B' 'testC']
 <class 'numpy.ndarray'>

 after renaming the columns:
   symbol  field  abc
0       1      2    3
1      10     11   12
df2["symbol"]
0     1
1    10
Name: symbol, dtype: int64
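
For an arbitrary n, the second approach above could be wrapped in a small helper. This is only a sketch: rename_first_n is a hypothetical name (not a pandas function), and the sample frame just mirrors the column names from the output above with one extra column, "D", added for illustration.

```python
from typing import Sequence

import pandas as pd


def rename_first_n(df: pd.DataFrame, new_names: Sequence[str]) -> pd.DataFrame:
    """Return a copy of df with its first len(new_names) columns renamed.

    Hypothetical helper for illustration; not part of pandas.
    """
    n = len(new_names)
    if n > len(df.columns):
        raise ValueError("more new names than columns")
    # Build the full list of labels: the new names, then the untouched rest.
    labels = list(new_names) + list(df.columns[n:])
    out = df.copy()
    # Assigning to .columns lets pandas build a fresh Index from the labels.
    out.columns = labels
    return out


# Sample data (assumed): same first three columns as above, plus "D".
df = pd.DataFrame({"COLA": [1, 10], "COL_B": [2, 11], "testC": [3, 12], "D": [4, 13]})
renamed = rename_first_n(df, ["symbol", "field", "abc"])
print(renamed.columns.tolist())
print(renamed["symbol"])
```

Because the helper works on a copy, the original frame keeps its old labels, which makes it easy to compare before and after.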

Note that the docs for Index.values have a warning which reads:

We recommend using Index.array or Index.to_numpy(), depending on whether you need a reference to the underlying data or a NumPy array.
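
A likely explanation for the unstable behaviour is that writing into df2.columns.values changes the label array behind the Index without the Index itself being rebuilt, so label lookups may still rely on stale cached data. If editing an array of labels is still the preferred style, a safer variant is to edit an independent copy and assign it back. The sketch below assumes a small sample frame with the same column names as in the question.

```python
import pandas as pd

# Sample data (assumed) standing in for the CSV from the question.
df2 = pd.DataFrame({"COLA": [1, 10], "COL_B": [2, 11], "testC": [3, 12]})

# Take an independent copy of the labels, so the existing Index is untouched.
labels = df2.columns.to_numpy(copy=True)
labels[0:3] = ["symbol", "field", "abc"]

# Assigning to .columns makes pandas build a new Index from the edited labels.
df2.columns = labels

print(df2["symbol"])
```

The key difference from the original statement is the final assignment to df2.columns, which goes through the public API instead of mutating the Index's internal data.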
