I was curious whether there is any way to use the column names that Pandas generates initially when reading a CSV/text file, as follows:
df = pd.read_csv("some_text_file.txt", header = None)
which gives something like
          0         1         2
0     data1     data2     data3
1  r2 data1  r2 data2  r2 data3
When we use header=None, it generates the column names 0, 1, 2 by default.
When I try to access them like
--> df['0'] = sometask
it throws an error:
raise KeyError(key) from err
KeyError: '0'
Aren’t they column names at all? I’ve seen some people call them levels, like
level0 - column 0
level1 - column 1
level2 - column 2
I’ve also tried
--> df[level0] = sometask
which threw
NameError: name 'level0' is not defined
I know we can rename the columns and then use the new names, like
df.columns =['col1','col2'.....]
But I am wondering whether there is any way to use these Pandas-generated column names without renaming them as shown above.
>Solution :
Inside pd.read_csv, you can pass a list to the names parameter. E.g.:
df = pd.read_csv('some_text_file.txt', header=None,
names=[f'col_{i}' for i in range(1,4)])
print(df)
      col_1     col_2     col_3
0     data1     data2     data3
1  r2 data1  r2 data2  r2 data3
Note that the list of names cannot contain any duplicates (e.g. ['col', 'col', 'col2'] will raise a ValueError).
The default column "names" 0, 1, 2, etc. are integers rather than strings. You can check this as follows:
print(df.columns)
Int64Index([0, 1, 2], dtype='int64')
So to access column 0, you should use df[0] or df.loc[:, 0], not df['0'].
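To make the integer-label point concrete, here is a minimal self-contained sketch. It assumes a small comma-separated sample held in memory (the csv_text string is a hypothetical stand-in for some_text_file.txt) and shows that the generated labels can be used directly, for both reading and assignment, without renaming anything:

```python
import io

import pandas as pd

# Hypothetical stand-in for some_text_file.txt so the example runs on its own.
csv_text = "data1,data2,data3\nr2 data1,r2 data2,r2 data3\n"

df = pd.read_csv(io.StringIO(csv_text), header=None)

# The auto-generated labels are integers, so integer keys work.
first_col = df[0]
same_col = df.loc[:, 0]

# A string key does not match an integer label and raises KeyError.
try:
    df["0"]
    string_key_found = True
except KeyError:
    string_key_found = False

# Assigning through the integer label also works; no renaming is needed.
df[0] = df[0].str.upper()

print(df.columns.tolist())  # [0, 1, 2]
print(string_key_found)     # False
print(df[0].tolist())       # ['DATA1', 'R2 DATA1']
```

If string labels are preferred after all, converting the existing labels (for example with df.columns.astype(str)) should be an alternative to typing out a new list by hand.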