Removing unwanted values from a Pandas data frame

September 12, 2023

I’m building a data frame and want to drop the entries that aren’t relevant, specifically the values that are not numbers.

I have created the data frame using the following code (credit):

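import pandas as pd
import os

# Work in the folder that holds the CSV files
os.chdir('/pathdirectory/files')
csv_files = [f for f in os.listdir() if f.endswith('.csv')]

dfs = []

for csv in csv_files:
    # header=None: no line is treated as a header, everything is read as data
    df = pd.read_csv(csv, header=None)
    # Transpose so the rows of the file become columns
    df = df.T
    df.columns = ['DC energy', 'AC energy', 'Capacity factor', 'Inverter Loss']
    dfs.append(df)

# Stack the per-file frames and renumber the rows
final_df = pd.concat(dfs, ignore_index=True)
final_df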

This returns a data frame in which the text labels from the CSV files show up as rows alongside the numbers. I want to remove that wording from the data frame, but I am struggling to do this.

Any help is greatly appreciated.

Solution:

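Set the first column of each CSV as the index. The labels then become the index instead of a data column, so after transposing they turn into column names rather than a row of text: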

pd.read_csv(csv, header=None, index_col=0)
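To see the effect of index_col=0, here is a minimal sketch of the question's loop with that one change. It assumes each CSV holds a text label in the first column and a number in the second (the files aren't shown, so this layout and the fake_files contents are hypothetical), and it reads from in-memory strings via io.StringIO instead of from disk.

```python
import io

import pandas as pd

# Hypothetical file contents: a text label in column 0, a number in column 1.
fake_files = {
    "site_a.csv": "DC energy,100.5\nAC energy,95.2\nCapacity factor,0.18\nInverter Loss,5.3\n",
    "site_b.csv": "DC energy,120.0\nAC energy,113.9\nCapacity factor,0.21\nInverter Loss,6.1\n",
}

dfs = []
for name, text in fake_files.items():
    # index_col=0 makes the labels the row index instead of a data column,
    # so after transposing they become column names, not a row of text.
    df = pd.read_csv(io.StringIO(text), header=None, index_col=0).T
    dfs.append(df)

# One numeric row per file, renumbered from 0
final_df = pd.concat(dfs, ignore_index=True)
print(final_df)
print(final_df.dtypes)
```

Because the labels come from the files themselves, the df.columns assignment from the question is no longer strictly needed, though keeping it does no harm as long as every file lists the four values in that order.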

Alternatively:

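cols = ['DC energy', 'AC energy', 'Capacity factor', 'Inverter Loss']

# Read each file with its labels as the index, place the files side by side,
# transpose so each file is a row, then rename the columns (axis=1)
final_df = pd.concat([pd.read_csv(csv, header=None, index_col=0)
                      for csv in csv_files],
                     axis=1, ignore_index=True).T.set_axis(cols, axis=1)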
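A runnable sketch of the one-liner, again using hypothetical in-memory CSV text (label, value) in place of the files on disk. Note the axis=1 passed to set_axis: its default, axis=0, relabels the rows, which raises a length mismatch unless there happen to be exactly four files, and silently mislabels the rows if there are.

```python
import io

import pandas as pd

# Hypothetical CSV contents; in the question these come from csv_files on disk.
csv_texts = [
    "DC energy,100.5\nAC energy,95.2\nCapacity factor,0.18\nInverter Loss,5.3\n",
    "DC energy,120.0\nAC energy,113.9\nCapacity factor,0.21\nInverter Loss,6.1\n",
]

cols = ["DC energy", "AC energy", "Capacity factor", "Inverter Loss"]

# Each file becomes one column indexed by its labels; concatenating side by
# side and transposing gives one row per file, then set_axis renames columns.
final_df = pd.concat(
    [pd.read_csv(io.StringIO(t), header=None, index_col=0) for t in csv_texts],
    axis=1,
    ignore_index=True,
).T.set_axis(cols, axis=1)

print(final_df)
```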

Note that this assumes all files list the values in the same order. You could also keep the labels from the files as the column names:

final_df = pd.concat([pd.read_csv(csv, header=None, index_col=0)
                      for csv in csv_files],
                     axis=1, ignore_index=True).T
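To illustrate why keeping the labels is more robust, here is a sketch with two hypothetical files that list the same quantities in different orders. pd.concat with axis=1 aligns the pieces on their index (the labels), so each value lands under the right name without a hand-written column list.

```python
import io

import pandas as pd

# Hypothetical files listing the same quantities in different orders
csv_texts = [
    "DC energy,100.5\nAC energy,95.2\nCapacity factor,0.18\nInverter Loss,5.3\n",
    "Inverter Loss,6.1\nCapacity factor,0.21\nAC energy,113.9\nDC energy,120.0\n",
]

# concat(axis=1) matches rows by label, so the order within each file
# does not matter; .T then turns each file into one row.
final_df = pd.concat(
    [pd.read_csv(io.StringIO(t), header=None, index_col=0) for t in csv_texts],
    axis=1,
    ignore_index=True,
).T

print(final_df)
```

The column order of the result follows pandas' alignment rules rather than any particular file, so select columns by name if you need a fixed order.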