Working with csv files in python – sort and fill values

January 26, 2022

I would like to populate the missing info: MissingAuthor and MissingLanguage based on BookID – if the BookID is the same copy and fill in the info from where you find it.

input CSV file:

input.csv
BookName,BookID,BookLocation,BookAuthor,BookLanguage
BookA,19821,C216,Guido van Rossum,English
BookB,19821,C216,AuthorMissing,English
BookC,15351,C126,Eric Matthes,LanguageMissing
BookD,15551,C446,AuthorMissing,English
BookE,15351,C126,AuthorMissing,English
BookF,15551,C446,Al Sweigart,LanguageMissing
BookG,21221,C556,AuthorMissing,English
BookH,21221,C556,Mark Lutz,English
BookI,14421,C656,Eric Matthes,English
BookJ,14421,C656,Eric Matthes,LanguageMissing
BookK,55521,C776,AuthorMissing,English
BookL,55521,C776,Mark Lutz,LanguageMissing
BookM,16721,C886,Al Sweigart,English
BookN,16721,C886,AuthorMissing,English
BookO,16721,C996,Al Sweigart,LanguageMissing

As an example:

input:
BookA,19821,C216,Guido van Rossum,English
BookB,19821,C216,AuthorMissing,English

output:
BookA,19821,C216,Guido van Rossum,English
BookB,19821,C216,Guido van Rossum,English <--- AuthorMissing

Code:

import csv

with open('input.csv', 'r') as incsv:
    reader = csv.DictReader(incsv, delimiter=',')
    header = next(reader)
    sortedbyid = sorted(reader, key=lambda row: (row['BookID']), reverse=False) # sort by BookID
    for line in sortedbyid:
        print(line)
{'BookName': 'BookI', 'BookID': '14421', 'BookLocation': 'C656', 'BookAuthor': 'Eric Matthes', 'BookLanguage': 'English'}
{'BookName': 'BookJ', 'BookID': '14421', 'BookLocation': 'C656', 'BookAuthor': 'Eric Matthes', 'BookLanguage': 'LanguageMissing'}
{'BookName': 'BookC', 'BookID': '15351', 'BookLocation': 'C126', 'BookAuthor': 'Eric Matthes', 'BookLanguage': 'LanguageMissing'}
{'BookName': 'BookE', 'BookID': '15351', 'BookLocation': 'C126', 'BookAuthor': 'AuthorMissing', 'BookLanguage': 'English'}
{'BookName': 'BookD', 'BookID': '15551', 'BookLocation': 'C446', 'BookAuthor': 'AuthorMissing', 'BookLanguage': 'English'}
{'BookName': 'BookF', 'BookID': '15551', 'BookLocation': 'C446', 'BookAuthor': 'Al Sweigart', 'BookLanguage': 'LanguageMissing'}
{'BookName': 'BookM', 'BookID': '16721', 'BookLocation': 'C886', 'BookAuthor': 'Al Sweigart', 'BookLanguage': 'English'}
{'BookName': 'BookN', 'BookID': '16721', 'BookLocation': 'C886', 'BookAuthor': 'AuthorMissing', 'BookLanguage': 'English'}
{'BookName': 'BookO', 'BookID': '16721', 'BookLocation': 'C996', 'BookAuthor': 'Al Sweigart', 'BookLanguage': 'LanguageMissing'}
{'BookName': 'BookB', 'BookID': '19821', 'BookLocation': 'C216', 'BookAuthor': 'AuthorMissing', 'BookLanguage': 'English'}
{'BookName': 'BookG', 'BookID': '21221', 'BookLocation': 'C556', 'BookAuthor': 'AuthorMissing', 'BookLanguage': 'English'}
{'BookName': 'BookH', 'BookID': '21221', 'BookLocation': 'C556', 'BookAuthor': 'Mark Lutz', 'BookLanguage': 'English'}
{'BookName': 'BookK', 'BookID': '55521', 'BookLocation': 'C776', 'BookAuthor': 'AuthorMissing', 'BookLanguage': 'English'}
{'BookName': 'BookL', 'BookID': '55521', 'BookLocation': 'C776', 'BookAuthor': 'Mark Lutz', 'BookLanguage': 'LanguageMissing'}

I need to loop here, match the BookID, store the info if present, and fill it in somehow.

# create a new file
with open('output.csv', 'w') as outcsv:
    fieldnames = header
    writer = csv.DictWriter(outcsv, fieldnames=fieldnames, delimiter=',', lineterminator='\n')
    writer.writeheader()
    for row in sortedbyid:
        write.writerow(row)

>Solution :

You can loop through the books list, find the ones that have ‘AuthorMissing’, and for each of them, find the books that have the same ID but have an actual author:

for book in sortedbyid:
    if book['BookAuthor']=='AuthorMissing':
        matches = [ otherbook for otherbook in sortedbyid if book['BookID'] == otherbook['BookID'] and otherbook['BookAuthor'] != 'AuthorMissing' ]
        # Assuming no more than one match
        if len(matches)>0:
            book['BookAuthor'] = matches[0]['BookAuthor']

Same thing can be done for the language field.

for book in sortedbyid:
    if book['BookLanguage']=='LanguageMissing':
        matches = [ otherbook for otherbook in sortedbyid if book['BookID'] == otherbook['BookID'] and otherbook['BookLanguage'] != 'LanguageMissing' ]
        # Assuming no more than one match
        if len(matches)>0:
            book['BookLanguage'] = matches[0]['BookLanguage']

You can, of course, combine the two together in a single loop.