How to convert a csv-file to a dictionnary of lists with python?

July 11, 2022

I’m trying to have this kind of result :

Output_Screenshot

Here is the csv-file :

OsmID,NewName,IdLocal

1020287758,NN1,Id0001

1021229973,NN2,Id0002

1025409497,NN3,Id0003

I’m using the code below:

import csv

input = r'C:\Users\_M92\csvFiles\csv0001.csv'

fileRead = open(input, 'r')

with open(input, 'r') as csv:
    headerLine = fileRead.readline()
    header = headerLine.split(",")  
    #print(header)
    nameIndex = header.index("OsmID")    
    output = {}
    for line in fileRead.readlines():
        values = line.split(",")
        output[values[nameIndex]] = values

print(output)

And it results in the following error:

File "c:\Users\_M92\Scripts\CsvToDict.py", 
    
    line 19, in <module>
        nameIndex = header.index("OsmID")

  ValueError: 'OsmID' is not in list

>Solution :

Instead of manually splitting each line by commas, use the CSV module that you’ve imported. This module contains a DictReader class that will yield dictionaries for each row. Then, you just need to add this to your output dictionary.

# Create an empty dictionary
# We will add keys to this as needed
output = {}
# Keep track of number of rows, so we can add an empty column if needed
row_count = 0

# This function adds a row to the output dictionary
def add_row(row_dict):
    global row_count # Need to declare this as global because we're assigning to the variable in this function
    if not row_dict: return # If row is empty, do nothing
    for k, v in row_dict.items():
        # Loop over all key-value pairs in the row to add
        if k not in output: # If the output doesn't contain this column, create a blank column
            output[k] = [None] * row_count 
   
        output[k].append(v) # Append the value to the correct column in output

    row_count += 1 

input_file = r'C:\Users\_M92\csvFiles\csv0001.csv'
with open(input_file, 'r') as fh:
    reader = csv.DictReader(fh) # Create a DictReader
    for row in reader:
        add_row(row) # Add every row to the output

This gives the following output:

{'OsmID': ['1020287758', '1021229973', '1025409497'], 
 'NewName': ['NN1', 'NN2', 'NN3'], 
 'IdLocal': ['Id0001', 'Id0002', 'Id0003']}

Note: I removed the blank lines in the input csv you provided, but it doesn’t make a difference to the program, since a blank line will yield an empty dictionary from DictReader, and add_row doesn’t do anything with empty dicts

Note 2: You could discard the row_count variable if you dynamically count the number of rows like so:

def add_row(row_dict):
    row_count = 0
    for first_key, first_val in output.items():
        row_count = len(first_val)
        break # We can just break out here because all keys should have the same number of values
    
    # Create keys that do not yet exist in output but do exist in the new row
    existing_keys = set(output.keys())
    new_row_keys = set(row_dict.keys())
    keys_to_create = new_row_keys - existing_keys 
    for key in keys_to_create:
        output[key] = [None] * row_count

    # Append to each column in output
    for key in output:
        output[key].append(row_dict.get(key, None)) # If the key doesn't exist in the current row, append None