
String in set gives weird results

My code reads the header of a CSV file and converts it into a lookup table mapping column_name => column_index:

import csv


class CSVOutput:
    def __init__(self, csv_file, required_columns):
        csv_reader = csv.reader(csv_file)

        # Construct lookup table for header
        self.header = {}
        for idx, column in enumerate(next(csv_reader)):
            print(f"{column.lower().strip()} == key: {column.lower().strip() == 'key'}")
            # "is" checks object identity, not equality; Python 3.8+ warns about it here
            print(f"{column.lower().strip()} is key: {column.lower().strip() is 'key'}")
            self.header[column.lower().strip()] = idx

        print(self.header)

        # Load the row data into memory/index it against key
        key_idx = self.header['key']


with open("test.csv") as csv_file:
    data = CSVOutput(csv_file, {})

When I run this, I get the following output and error:

{'key': 0, 'col1': 1, 'col2': 2}

key == key: False
key is key: False
col1 == key: False
col1 is key: False
col2 == key: False
col2 is key: False

Traceback (most recent call last):
  File "D:\compare.py", line 74, in <module>
    actual_data = CSVOutput(act_csv, required_columns)
  File "D:\compare.py", line 40, in __init__
    key_idx = self.header['key']
KeyError: 'key'

Basically, the literal 'key' and the 'key' loaded from the file do not compare equal. I've looked at the source file in Notepad++ with Show All Symbols turned on, but I can't see any difference. I've also opened the CSV file in a hex editor, and the file starts with three extra bytes, EF BB BF, immediately before Key,. I'm not sure whether that's the source of my problem, but if it is, why isn't strip() removing them, and how do I handle that?


Any ideas?
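To see why the comparison fails, here is a minimal sketch that reproduces it on in-memory bytes instead of the real file. It assumes the file is decoded as UTF-8 (with a cp1252 default the BOM would instead show up as the visible characters ï»¿); the header names simply mirror the example above.

```python
# Simulate the first bytes of the CSV file: a UTF-8 BOM (EF BB BF) followed by "Key".
raw = b"\xef\xbb\xbfKey,col1,col2"

# Decoding as plain UTF-8 keeps the BOM as the invisible character U+FEFF.
first = raw.decode("utf-8").split(",")[0].lower().strip()

print(repr(first))         # '\ufeffkey'  (repr reveals the hidden character)
print(first == "key")      # False
print("\ufeff".isspace())  # False, so str.strip() leaves it in place

# Decoding with utf-8-sig drops a leading BOM before the text is split.
fixed = raw.decode("utf-8-sig").split(",")[0].lower().strip()
print(fixed == "key")      # True
```

Printing repr() of a suspicious key is usually the quickest way to spot invisible characters like this, since a plain print() (or an f-string) shows nothing.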

Solution:

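EF BB BF

Those three bytes are the UTF-8 byte order mark (BOM). Depending on the default encoding open() uses, they end up at the start of the first column name either as the invisible character U+FEFF (UTF-8) or as ï»¿ (cp1252, a common default on Windows), so the first key is never exactly 'key'. strip() doesn't help because it only removes whitespace, and neither of those is whitespace. Python's utf-8-sig codec is meant for such files: it skips a leading BOM if there is one. Pass it via the encoding argument of open():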

with open("test.csv", newline="", encoding="utf-8-sig") as csv_file:  # newline="" as the csv docs recommend
    data = CSVOutput(csv_file, {})
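Putting it together, here is a sketch of the question's class with the fix applied. To stay self-contained it writes a hypothetical temporary test.csv that starts with a BOM and then reads it back with utf-8-sig. Dropping the debug prints and storing key_idx on the instance (so it can be inspected) are changes made for this sketch, not part of the original code.

```python
import csv
import os
import tempfile


class CSVOutput:
    def __init__(self, csv_file, required_columns):
        # required_columns is unused in this sketch (kept to match the original signature).
        csv_reader = csv.reader(csv_file)
        # Map normalised column name -> column index.
        self.header = {
            column.lower().strip(): idx
            for idx, column in enumerate(next(csv_reader))
        }
        # Look up the key column; raises KeyError if it is missing.
        self.key_idx = self.header["key"]


# Hypothetical demo file: writing with utf-8-sig puts a BOM (EF BB BF) at the start.
path = os.path.join(tempfile.mkdtemp(), "test.csv")
with open(path, "w", newline="", encoding="utf-8-sig") as f:
    f.write("Key,col1,col2\r\n1,a,b\r\n")

# Reading it back with utf-8-sig strips the BOM, so the lookup succeeds.
with open(path, newline="", encoding="utf-8-sig") as csv_file:
    data = CSVOutput(csv_file, {})

print(data.header)   # {'key': 0, 'col1': 1, 'col2': 2}
print(data.key_idx)  # 0
```

Because utf-8-sig only skips a BOM when one is present, the same open() call also reads CSV files saved without a BOM correctly.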
