Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to convert a csv-file to a dictionnary of lists with python?

I’m trying to have this kind of result :

Output_Screenshot

Here is the csv-file :

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

OsmID,NewName,IdLocal

1020287758,NN1,Id0001

1021229973,NN2,Id0002

1025409497,NN3,Id0003

I’m using the code below:

import csv

input = r'C:\Users\_M92\csvFiles\csv0001.csv'

fileRead = open(input, 'r')

with open(input, 'r') as csv:
    headerLine = fileRead.readline()
    header = headerLine.split(",")  
    #print(header)
    nameIndex = header.index("OsmID")    
    output = {}
    for line in fileRead.readlines():
        values = line.split(",")
        output[values[nameIndex]] = values

print(output)

And it results in the following error:

File "c:\Users\_M92\Scripts\CsvToDict.py", 
    
    line 19, in <module>
        nameIndex = header.index("OsmID")

  ValueError: 'OsmID' is not in list

>Solution :

Instead of manually splitting each line by commas, use the CSV module that you’ve imported. This module contains a DictReader class that will yield dictionaries for each row. Then, you just need to add this to your output dictionary.

# Create an empty dictionary
# We will add keys to this as needed
output = {}
# Keep track of number of rows, so we can add an empty column if needed
row_count = 0

# This function adds a row to the output dictionary
def add_row(row_dict):
    global row_count # Need to declare this as global because we're assigning to the variable in this function
    if not row_dict: return # If row is empty, do nothing
    for k, v in row_dict.items():
        # Loop over all key-value pairs in the row to add
        if k not in output: # If the output doesn't contain this column, create a blank column
            output[k] = [None] * row_count 
   
        output[k].append(v) # Append the value to the correct column in output

    row_count += 1 

input_file = r'C:\Users\_M92\csvFiles\csv0001.csv'
with open(input_file, 'r') as fh:
    reader = csv.DictReader(fh) # Create a DictReader
    for row in reader:
        add_row(row) # Add every row to the output

This gives the following output:

{'OsmID': ['1020287758', '1021229973', '1025409497'], 
 'NewName': ['NN1', 'NN2', 'NN3'], 
 'IdLocal': ['Id0001', 'Id0002', 'Id0003']}

Note: I removed the blank lines in the input csv you provided, but it doesn’t make a difference to the program, since a blank line will yield an empty dictionary from DictReader, and add_row doesn’t do anything with empty dicts

Note 2: You could discard the row_count variable if you dynamically count the number of rows like so:

def add_row(row_dict):
    row_count = 0
    for first_key, first_val in output.items():
        row_count = len(first_val)
        break # We can just break out here because all keys should have the same number of values
    
    # Create keys that do not yet exist in output but do exist in the new row
    existing_keys = set(output.keys())
    new_row_keys = set(row_dict.keys())
    keys_to_create = new_row_keys - existing_keys 
    for key in keys_to_create:
        output[key] = [None] * row_count

    # Append to each column in output
    for key in output:
        output[key].append(row_dict.get(key, None)) # If the key doesn't exist in the current row, append None

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading