Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Replace MultiIndex names for the nth position in all levels

I have about 300 csv files that I want to merge into a single one using pandas. All of them have 3 rows for variable names (project in the first one, Device_1 in the second one and variable in the third one).
The first column goes something like (‘Asset’,’Element’,’Date’) and continues the timeseries values. Sometimes the names are in other languages, but there are 100 files that have their first row as (‘ts’,nan,nan), so I need to replace the first "column" of the MultiIndex names in order to be able to sort it by date after I finished merging the files.

df.iloc[:3,:5]
Out[16]: 
                    ts       Asset_1                                             
                   nan      Device_1                        Device_2     Device_3
                   nan     Variable_1         Variable_2  Variable_1    Variable_1
0  2022-12-31 00:00:00         0.0                NaN          0.0          0.0
1  2022-12-31 00:05:00         0.0                NaN          0.0          0.0
2  2022-12-31 00:10:00         0.0                NaN          0.0          0.0

So basically what I want to have is the first "column" of the multiindex names the same as my new_cols values

                 Asset       Asset_1                                             
                 Device      Device_1                        Device_2     Device_3
                 Date       Variable_1         Variable_2  Variable_1    Variable_1
0  2022-12-31 00:00:00         0.0                NaN          0.0          0.0
1  2022-12-31 00:05:00         0.0                NaN          0.0          0.0
2  2022-12-31 00:10:00         0.0                NaN          0.0          0.0

So far I’ve tried iterating through the three levels:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

new_cols = ['Asset','Element','Date']
for i in range(3):
    df.rename(columns={df.columns[0][i]:new_cols[i]},inplace=True)

However, as this method doesn’t replace by position and instead does it by name, as the second and third row have the same name (nan) the final names are (‘Asset’,’Date’,’Date’) and when I try to merge it with the other files the final dataframe has two datetime columns.

I tried using set_levels, but I have to put the verify_integrity to False because otherwise I receive an error:

new_cols = ['Asset','Element','Date']
for i in range(3):
updated_columns = [new_cols[i]] + list(df.columns.get_level_values(i)[1:])
df.columns.set_levels(updated_columns,level=i,verify_integrity=True)

ValueError: Level values must be unique: ['Asset', 'Asset_1','Asset_1','Asset_1','Asset_1'] on level 0

But then if I mark verify_integrity as False the final names get mixed up and I don’t know why.

Is there a way to replace any name in a MultiIndex with a given position? Something like iloc, for example, something like

pd.Multiindex.iloc[0,2]='Date'

>Solution :

Convert MultiIndex to list of tuples and set new values:

new_cols = ['Asset','Element','Date']

L = df.columns.tolist()
L[0] = tuple(new_cols)
print (L)
[('Asset', 'Element', 'Date'), 
 ('Asset_1', 'Device_1', 'Variable_1'), 
 ('Asset_1', 'Device_1', 'Variable_2'), 
 ('Asset_1', 'Device_2', 'Variable_1'), 
 ('Asset_1', 'Device_3', 'Variable_1')]

df.columns = pd.MultiIndex.from_tuples(L)
print (df)
                 Asset    Asset_1                                 
               Element   Device_1              Device_2   Device_3
                  Date Variable_1 Variable_2 Variable_1 Variable_1
0  2022-12-31 00:00:00        0.0        NaN        0.0        0.0
1  2022-12-31 00:05:00        0.0        NaN        0.0        0.0
2  2022-12-31 00:10:00        0.0        NaN        0.0        0.0

Or you can use helper DataFrame is also possible, but slowier:

new_cols = ['Asset','Element','Date']

df1 = df.columns.to_frame()
df1.iloc[0] = new_cols

df.columns = pd.MultiIndex.from_frame(df1, names=df.columns.names)
print (df)
                 Asset    Asset_1                                 
               Element   Device_1              Device_2   Device_3
                  Date Variable_1 Variable_2 Variable_1 Variable_1
0  2022-12-31 00:00:00        0.0        NaN        0.0        0.0
1  2022-12-31 00:05:00        0.0        NaN        0.0        0.0
2  2022-12-31 00:10:00        0.0        NaN        0.0        0.0
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading