I’m creating a data frame and want to drop the entries in it that are not relevant, namely the values that are not numbers.
I have created the data frame using the following code (credit):
import pandas as pd
import os
os.chdir('/pathdirectory/files')
csv_files = [f for f in os.listdir() if f.endswith('.csv')]
dfs = []
for csv in csv_files:
    df = pd.read_csv(csv, header=None)
    df = df.T
    df.columns = ['DC energy', 'AC energy', 'Capacity factor', 'Inverter Loss']
    dfs.append(df)
final_df = pd.concat(dfs, ignore_index=True)
final_df
And it returns this data frame. I want to remove the text labels from the data frame, but I am struggling to do this.
Any help is greatly appreciated.
>Solution :
You should set the first column of each CSV as the index, so that the labels are no longer read as data:
pd.read_csv(csv, header=None, index_col=0)
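To see why this helps, here is a minimal sketch on an in-memory file. The contents of `sample` are made up: I am assuming each CSV holds one label and one value per row, which is what the transposed output in the question suggests.

```python
import io
import pandas as pd

# Hypothetical file contents (assumed layout: label, value on each row).
sample = "DC energy,100.5\nAC energy,95.2\nCapacity factor,0.21\nInverter Loss,5.3\n"

# Without index_col, the labels are ordinary data and survive the transpose.
with_labels = pd.read_csv(io.StringIO(sample), header=None).T

# With index_col=0, the labels become the index, so only the numbers remain as data.
numbers_only = pd.read_csv(io.StringIO(sample), header=None, index_col=0).T

print(with_labels.shape)   # one row of labels plus one row of values
print(numbers_only.shape)  # a single row of values
print(numbers_only.dtypes)
```

After the transpose, the former index ends up as the column names, so the values keep a numeric dtype instead of being mixed with strings.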
Alternatively, as a replacement for the whole loop:
cols = ['DC energy', 'AC energy', 'Capacity factor', 'Inverter Loss']
final_df = pd.concat([pd.read_csv(csv, header=None, index_col=0)
                      for csv in csv_files],
                     axis=1, ignore_index=True).T.set_axis(cols, axis=1)
Note that this assumes that all files have the same order of columns. You could also keep the default names, which are the labels taken from the first column of the files:
final_df = pd.concat([pd.read_csv(csv, header=None, index_col=0)
                      for csv in csv_files],
                     axis=1, ignore_index=True).T
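As a rough end-to-end check, the same expression can be run on two in-memory files instead of files on disk. The names `file_a`, `file_b` and `sources`, as well as the numbers, are invented for illustration, and the label/value layout is again an assumption about your CSVs.

```python
import io
import pandas as pd

# Hypothetical contents of two files, each with a label column and a value column.
file_a = "DC energy,100.5\nAC energy,95.2\nCapacity factor,0.21\nInverter Loss,5.3\n"
file_b = "DC energy,80.0\nAC energy,76.1\nCapacity factor,0.17\nInverter Loss,3.9\n"
sources = [io.StringIO(file_a), io.StringIO(file_b)]

# Labels go to the index, the files are placed side by side, then transposed
# so that each file becomes one row and each label becomes one column.
final_df = pd.concat(
    [pd.read_csv(src, header=None, index_col=0) for src in sources],
    axis=1, ignore_index=True,
).T

print(final_df)
print(final_df.dtypes)
```

Each file contributes one row, and the column names come straight from the labels in the files, so no text is left among the values.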