I would like to populate the missing info: MissingAuthor and MissingLanguage based on BookID – if the BookID is the same copy and fill in the info from where you find it.
input CSV file:
input.csv
BookName,BookID,BookLocation,BookAuthor,BookLanguage
BookA,19821,C216,Guido van Rossum,English
BookB,19821,C216,AuthorMissing,English
BookC,15351,C126,Eric Matthes,LanguageMissing
BookD,15551,C446,AuthorMissing,English
BookE,15351,C126,AuthorMissing,English
BookF,15551,C446,Al Sweigart,LanguageMissing
BookG,21221,C556,AuthorMissing,English
BookH,21221,C556,Mark Lutz,English
BookI,14421,C656,Eric Matthes,English
BookJ,14421,C656,Eric Matthes,LanguageMissing
BookK,55521,C776,AuthorMissing,English
BookL,55521,C776,Mark Lutz,LanguageMissing
BookM,16721,C886,Al Sweigart,English
BookN,16721,C886,AuthorMissing,English
BookO,16721,C996,Al Sweigart,LanguageMissing
As an example:
input:
BookA,19821,C216,Guido van Rossum,English
BookB,19821,C216,AuthorMissing,English
output:
BookA,19821,C216,Guido van Rossum,English
BookB,19821,C216,Guido van Rossum,English <--- AuthorMissing
Code:
import csv
with open('input.csv', 'r') as incsv:
reader = csv.DictReader(incsv, delimiter=',')
header = next(reader)
sortedbyid = sorted(reader, key=lambda row: (row['BookID']), reverse=False) # sort by BookID
for line in sortedbyid:
print(line)
{'BookName': 'BookI', 'BookID': '14421', 'BookLocation': 'C656', 'BookAuthor': 'Eric Matthes', 'BookLanguage': 'English'}
{'BookName': 'BookJ', 'BookID': '14421', 'BookLocation': 'C656', 'BookAuthor': 'Eric Matthes', 'BookLanguage': 'LanguageMissing'}
{'BookName': 'BookC', 'BookID': '15351', 'BookLocation': 'C126', 'BookAuthor': 'Eric Matthes', 'BookLanguage': 'LanguageMissing'}
{'BookName': 'BookE', 'BookID': '15351', 'BookLocation': 'C126', 'BookAuthor': 'AuthorMissing', 'BookLanguage': 'English'}
{'BookName': 'BookD', 'BookID': '15551', 'BookLocation': 'C446', 'BookAuthor': 'AuthorMissing', 'BookLanguage': 'English'}
{'BookName': 'BookF', 'BookID': '15551', 'BookLocation': 'C446', 'BookAuthor': 'Al Sweigart', 'BookLanguage': 'LanguageMissing'}
{'BookName': 'BookM', 'BookID': '16721', 'BookLocation': 'C886', 'BookAuthor': 'Al Sweigart', 'BookLanguage': 'English'}
{'BookName': 'BookN', 'BookID': '16721', 'BookLocation': 'C886', 'BookAuthor': 'AuthorMissing', 'BookLanguage': 'English'}
{'BookName': 'BookO', 'BookID': '16721', 'BookLocation': 'C996', 'BookAuthor': 'Al Sweigart', 'BookLanguage': 'LanguageMissing'}
{'BookName': 'BookB', 'BookID': '19821', 'BookLocation': 'C216', 'BookAuthor': 'AuthorMissing', 'BookLanguage': 'English'}
{'BookName': 'BookG', 'BookID': '21221', 'BookLocation': 'C556', 'BookAuthor': 'AuthorMissing', 'BookLanguage': 'English'}
{'BookName': 'BookH', 'BookID': '21221', 'BookLocation': 'C556', 'BookAuthor': 'Mark Lutz', 'BookLanguage': 'English'}
{'BookName': 'BookK', 'BookID': '55521', 'BookLocation': 'C776', 'BookAuthor': 'AuthorMissing', 'BookLanguage': 'English'}
{'BookName': 'BookL', 'BookID': '55521', 'BookLocation': 'C776', 'BookAuthor': 'Mark Lutz', 'BookLanguage': 'LanguageMissing'}
I need to loop here, match the BookID, store the info if present, and fill it in somehow.
# create a new file
with open('output.csv', 'w') as outcsv:
fieldnames = header
writer = csv.DictWriter(outcsv, fieldnames=fieldnames, delimiter=',', lineterminator='\n')
writer.writeheader()
for row in sortedbyid:
write.writerow(row)
>Solution :
You can loop through the books list, find the ones that have ‘AuthorMissing’, and for each of them, find the books that have the same ID but have an actual author:
for book in sortedbyid:
if book['BookAuthor']=='AuthorMissing':
matches = [ otherbook for otherbook in sortedbyid if book['BookID'] == otherbook['BookID'] and otherbook['BookAuthor'] != 'AuthorMissing' ]
# Assuming no more than one match
if len(matches)>0:
book['BookAuthor'] = matches[0]['BookAuthor']
Same thing can be done for the language field.
for book in sortedbyid:
if book['BookLanguage']=='LanguageMissing':
matches = [ otherbook for otherbook in sortedbyid if book['BookID'] == otherbook['BookID'] and otherbook['BookLanguage'] != 'LanguageMissing' ]
# Assuming no more than one match
if len(matches)>0:
book['BookLanguage'] = matches[0]['BookLanguage']
You can, of course, combine the two together in a single loop.