Hi, I have a dataset in CSV format. The problem is that it was built by combining several CSV files, and the header row of each file was copied along with its data. I want to remove all the header rows that appear in the middle of the dataset.
The current CSV file looks like this:
col1 col2 col3
1 2 3
1 2 3
1 2 3
1 2 3
col1 col2 col3
1 2 3
1 2 3
1 2 3
1 2 3
1 2 3
1 2 3
col1 col2 col3
1 2 3
1 2 3
1 2 3
1 2 3
1 2 3
1 2 3
1 2 3
1 2 3
col1 col2 col3
col1 col2 col3
1 2 3
1 2 3
1 2 3
1 2 3
I want to change it to this, with the column names only at the top:
col1 col2 col3
1 2 3
1 2 3
1 2 3
1 2 3
1 2 3
1 2 3
1 2 3
1 2 3
1 2 3
1 2 3
1 2 3
1 2 3
1 2 3
1 2 3
1 2 3
1 2 3
1 2 3
1 2 3
1 2 3
1 2 3
1 2 3
1 2 3
1 2 3
1 2 3
>Solution :
You can use these lines of code:
import pandas as pd

df = pd.read_csv(df_path)
# remove repeated header rows: keep rows where at least one value differs from its column name
df = df[df.ne(df.columns).any(axis=1)]
This solution compares each row with the actual column names and works whether or not the data rows are numeric.
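For anyone who wants to try this without a file on disk, here is a minimal self-contained sketch. The sample string `raw`, the comma delimiter, and the helper names `drop_repeated_headers` and `to_numeric_if_possible` are assumptions made for illustration, not part of the original question. It also converts the columns back to numbers, since the stray header rows cause pandas to read every column as strings.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the merged CSV (comma-separated by assumption)
raw = (
    "col1,col2,col3\n"
    "1,2,3\n"
    "1,2,3\n"
    "col1,col2,col3\n"
    "4,5,6\n"
    "col1,col2,col3\n"
    "col1,col2,col3\n"
    "7,8,9\n"
)


def to_numeric_if_possible(col):
    """Convert a column to a numeric dtype, leaving it unchanged if that fails."""
    try:
        return pd.to_numeric(col)
    except (ValueError, TypeError):
        return col


def drop_repeated_headers(df):
    """Drop rows whose every cell equals its own column name."""
    # True for rows where at least one cell differs from the column name
    mask = df.ne(df.columns).any(axis=1)
    cleaned = df[mask].reset_index(drop=True)
    # The header rows forced string columns; restore numeric dtypes where possible
    return cleaned.apply(to_numeric_if_possible)


df = pd.read_csv(io.StringIO(raw))
cleaned = drop_repeated_headers(df)
print(cleaned)
```

With your real file, replace `io.StringIO(raw)` with the file path and, if the file is not comma-separated, pass the matching `sep` argument to `pd.read_csv`.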