Suppose my data frame df has the following column names: ['date', 'value', '20211010', '20211017', '20211024', ...]
I want to rename the column names of '20211010', '20211017', '20211024', ... (that is, all the columns starting from 20211010) to t1, t2, t3, ... continue to increase.
The expected new column names will be ['date', 'value', 't1', 't2', 't3', ...].
How to achieve this in Pandas? Thanks.
Reference:
how do i rename the columns from unnamed:0 to columns with increment number in python
>Solution :
IIUC, a robust method could be to use pandas.to_datetime and pandas.factorize:
idx, _ = pd.factorize(pd.to_datetime(df.columns, format='%Y%m%d',
errors='coerce'),
sort=True)
df.columns = ('t'+pd.Series(idx+1, dtype='str')).mask(idx<0, df.columns)
Example output:
Index(['date', 'value', 't1', 't2', 't4', 'other', 't3'], dtype='object')
Input columns:
Index(['date', 'value', '20211010', '20211017', '20211024', 'other',
'20211018'],
dtype='object')
robustness
to_datetime ensures that valid dates are used, and sort=True in factorize enables to keep the dates sorted.
Example on this input:
['X', '20211010', '20229999', '20211018', '20211024', 'Y', '20211001']
The output would be:
['X', 't2', '20229999', 't3', 't4', 'Y', 't1']
The invalid date is ignored and the tn are in order.