Find starting point to read csv data

I am given a csv file everyday that contains data to be used for calculations. I want to use pandas to organize the data. My issue is that the data to be read does not start on the first line of the file. The data also does not start on the same exact line every time so I cannot use the skiprows parameter in read_csv() method.

The data does have some indicators as to where the data begins.

For example this is how the beginning of my csv file would look. I am only interested in starting at the first column header which is ‘Deal Type’:

Counterparty Name
ID Number

.
.
.

Asset
USD.HO
USD.LCO
USD.RB

Cpty:
Product:

[Deal Type] [column] ... ... ...
[here the data begins]

How could I parse through the file and find the first column header and start at that line? The column header ‘Deal Type’ is always the first column.

>Solution :

Read the file line by line until it finds the line containing the target header, and then it will use pd.read_csv() with skiprows parameter to start reading the file from that line.

import pandas as pd

def find_starting_line(file_path, target_header):
    with open(file_path, 'r') as file:
        for line_num, line in enumerate(file):
            if target_header in line:
                return line_num

def read_csv_with_header(file_path, header):
    starting_line = find_starting_line(file_path, header)
    if starting_line is not None:
        return pd.read_csv(file_path, skiprows=starting_line, header=None)
    else:
        print(f"Header '{header}' not found in the file.")
        return None

# Example usage
file_path = 'your_file.csv'
header_to_find = 'Deal Type'
df = read_csv_with_header(file_path, header_to_find)
if df is not None:
    print(df)

Leave a Reply