Why does pandas.concat() add (,) to column names?

I am trying to work out why the column names produced by pandas.concat() are wrapped in parentheses, for example (mar,) instead of mar.

There is a similar question here, but in my context I don't understand how this can be happening. It is as if there were a double bracket in the assignment, but the one-hot encoded dataframe looks fine before concatenation, so I cannot see what is causing it.

The output is below the code.

import warnings
import random
import pandas as pd # dataframe manipulation
import numpy as np # linear algebra
from sklearn.preprocessing import OneHotEncoder
import ssl

ssl._create_default_https_context = ssl._create_unverified_context

url = 'https://raw.githubusercontent.com/bryonbaker/datasets/main/SIT720/Ass4/forestfires.csv'
full_df = pd.read_csv(url)
print(f"{full_df.head()}\n")

ohe = OneHotEncoder(handle_unknown='ignore', drop=None, dtype='int')

transformed = ohe.fit_transform(full_df[['month']])
month_df = pd.DataFrame(transformed.toarray())
month_df.columns = ohe.categories_

print(month_df.head())

full_df = full_df.drop(['month'], axis=1)

result = pd.concat([full_df, month_df], axis=1)
result.head()

The full output is:

   X  Y month  day  FFMC   DMC     DC  ISI  temp  RH  wind  rain  area
0  7  5   mar  fri  86.2  26.2   94.3  5.1   8.2  51   6.7   0.0   0.0
1  7  4   oct  tue  90.6  35.4  669.1  6.7  18.0  33   0.9   0.0   0.0
2  7  4   oct  sat  90.6  43.7  686.9  6.7  14.6  33   1.3   0.0   0.0
3  8  6   mar  fri  91.7  33.3   77.5  9.0   8.3  97   4.0   0.2   0.0
4  8  6   mar  sun  89.3  51.3  102.2  9.6  11.4  99   1.8   0.0   0.0

  apr aug dec feb jan jul jun mar may nov oct sep
0   0   0   0   0   0   0   0   1   0   0   0   0
1   0   0   0   0   0   0   0   0   0   0   1   0
2   0   0   0   0   0   0   0   0   0   0   1   0
3   0   0   0   0   0   0   0   1   0   0   0   0
4   0   0   0   0   0   0   0   1   0   0   0   0

   X  Y  day  FFMC   DMC     DC  ISI  temp  RH  wind  ...  (dec,)  (feb,)  (jan,)  (jul,)  (jun,)  (mar,)  (may,)  (nov,)  (oct,)  (sep,)
0  7  5  fri  86.2  26.2   94.3  5.1   8.2  51   6.7  ...       0       0       0       0       0       1       0       0       0       0
1  7  4  tue  90.6  35.4  669.1  6.7  18.0  33   0.9  ...       0       0       0       0       0       0       0       0       1       0
2  7  4  sat  90.6  43.7  686.9  6.7  14.6  33   1.3  ...       0       0       0       0       0       0       0       0       1       0
3  8  6  fri  91.7  33.3   77.5  9.0   8.3  97   4.0  ...       0       0       0       0       0       1       0       0       0       0
4  8  6  sun  89.3  51.3  102.2  9.6  11.4  99   1.8  ...       0       0       0       0       0       1       0       0       0       0
5 rows × 24 columns

Solution:

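ohe.categories_ is a list of arrays, one array per encoded feature. Only month was encoded here, so the list holds a single array. When you assign the whole list as the column names, pandas treats each array as an index level and builds a one-level MultiIndex, so every column label is a one-element tuple such as ('mar',). Printing month_df on its own hides this, but after pd.concat() the labels are displayed as tuples. Change this line: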

month_df.columns = ohe.categories_

to:

month_df.columns = ohe.categories_[0]
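
To see the difference without downloading the dataset or fitting an encoder, here is a minimal sketch. It assumes that a hand-built list containing one NumPy array behaves like ohe.categories_ for a single encoded column. The names categories, left, bad and good are made up for this illustration and do not come from the question.

```python
import numpy as np
import pandas as pd

# Stand-in for ohe.categories_ (assumption): a list holding one array
# per encoded column. Only one column is encoded here.
categories = [np.array(["mar", "oct"], dtype=object)]

# Hypothetical stand-ins for full_df and the one-hot encoded frame.
left = pd.DataFrame({"X": [7, 7], "Y": [5, 4]})
month_df = pd.DataFrame([[1, 0], [0, 1]])

# Assigning the whole list: pandas builds a one-level MultiIndex.
bad = month_df.copy()
bad.columns = categories
bad_result = pd.concat([left, bad], axis=1)

# Assigning the first (and only) array: plain string labels.
good = month_df.copy()
good.columns = categories[0]
good_result = pd.concat([left, good], axis=1)

print(type(bad.columns).__name__)   # index type after assigning the list
print(list(bad_result.columns))     # labels after concat, tuples included
print(list(good_result.columns))    # labels after concat, plain strings
```

If more than one column were encoded, categories_ would hold one array per column, so indexing with [0] would cover only the first of them.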