Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Working with csv files in python – sort and fill values

I would like to populate the missing info: MissingAuthor and MissingLanguage based on BookID – if the BookID is the same copy and fill in the info from where you find it.

input CSV file:

input.csv
BookName,BookID,BookLocation,BookAuthor,BookLanguage
BookA,19821,C216,Guido van Rossum,English
BookB,19821,C216,AuthorMissing,English
BookC,15351,C126,Eric Matthes,LanguageMissing
BookD,15551,C446,AuthorMissing,English
BookE,15351,C126,AuthorMissing,English
BookF,15551,C446,Al Sweigart,LanguageMissing
BookG,21221,C556,AuthorMissing,English
BookH,21221,C556,Mark Lutz,English
BookI,14421,C656,Eric Matthes,English
BookJ,14421,C656,Eric Matthes,LanguageMissing
BookK,55521,C776,AuthorMissing,English
BookL,55521,C776,Mark Lutz,LanguageMissing
BookM,16721,C886,Al Sweigart,English
BookN,16721,C886,AuthorMissing,English
BookO,16721,C996,Al Sweigart,LanguageMissing

As an example:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

input:
BookA,19821,C216,Guido van Rossum,English
BookB,19821,C216,AuthorMissing,English

output:
BookA,19821,C216,Guido van Rossum,English
BookB,19821,C216,Guido van Rossum,English <--- AuthorMissing

Code:

import csv

with open('input.csv', 'r') as incsv:
    reader = csv.DictReader(incsv, delimiter=',')
    header = next(reader)
    sortedbyid = sorted(reader, key=lambda row: (row['BookID']), reverse=False) # sort by BookID
    for line in sortedbyid:
        print(line)
{'BookName': 'BookI', 'BookID': '14421', 'BookLocation': 'C656', 'BookAuthor': 'Eric Matthes', 'BookLanguage': 'English'}
{'BookName': 'BookJ', 'BookID': '14421', 'BookLocation': 'C656', 'BookAuthor': 'Eric Matthes', 'BookLanguage': 'LanguageMissing'}
{'BookName': 'BookC', 'BookID': '15351', 'BookLocation': 'C126', 'BookAuthor': 'Eric Matthes', 'BookLanguage': 'LanguageMissing'}
{'BookName': 'BookE', 'BookID': '15351', 'BookLocation': 'C126', 'BookAuthor': 'AuthorMissing', 'BookLanguage': 'English'}
{'BookName': 'BookD', 'BookID': '15551', 'BookLocation': 'C446', 'BookAuthor': 'AuthorMissing', 'BookLanguage': 'English'}
{'BookName': 'BookF', 'BookID': '15551', 'BookLocation': 'C446', 'BookAuthor': 'Al Sweigart', 'BookLanguage': 'LanguageMissing'}
{'BookName': 'BookM', 'BookID': '16721', 'BookLocation': 'C886', 'BookAuthor': 'Al Sweigart', 'BookLanguage': 'English'}
{'BookName': 'BookN', 'BookID': '16721', 'BookLocation': 'C886', 'BookAuthor': 'AuthorMissing', 'BookLanguage': 'English'}
{'BookName': 'BookO', 'BookID': '16721', 'BookLocation': 'C996', 'BookAuthor': 'Al Sweigart', 'BookLanguage': 'LanguageMissing'}
{'BookName': 'BookB', 'BookID': '19821', 'BookLocation': 'C216', 'BookAuthor': 'AuthorMissing', 'BookLanguage': 'English'}
{'BookName': 'BookG', 'BookID': '21221', 'BookLocation': 'C556', 'BookAuthor': 'AuthorMissing', 'BookLanguage': 'English'}
{'BookName': 'BookH', 'BookID': '21221', 'BookLocation': 'C556', 'BookAuthor': 'Mark Lutz', 'BookLanguage': 'English'}
{'BookName': 'BookK', 'BookID': '55521', 'BookLocation': 'C776', 'BookAuthor': 'AuthorMissing', 'BookLanguage': 'English'}
{'BookName': 'BookL', 'BookID': '55521', 'BookLocation': 'C776', 'BookAuthor': 'Mark Lutz', 'BookLanguage': 'LanguageMissing'}

I need to loop here, match the BookID, store the info if present, and fill it in somehow.

# create a new file
with open('output.csv', 'w') as outcsv:
    fieldnames = header
    writer = csv.DictWriter(outcsv, fieldnames=fieldnames, delimiter=',', lineterminator='\n')
    writer.writeheader()
    for row in sortedbyid:
        write.writerow(row)

>Solution :

You can loop through the books list, find the ones that have ‘AuthorMissing’, and for each of them, find the books that have the same ID but have an actual author:

for book in sortedbyid:
    if book['BookAuthor']=='AuthorMissing':
        matches = [ otherbook for otherbook in sortedbyid if book['BookID'] == otherbook['BookID'] and otherbook['BookAuthor'] != 'AuthorMissing' ]
        # Assuming no more than one match
        if len(matches)>0:
            book['BookAuthor'] = matches[0]['BookAuthor']

Same thing can be done for the language field.

for book in sortedbyid:
    if book['BookLanguage']=='LanguageMissing':
        matches = [ otherbook for otherbook in sortedbyid if book['BookID'] == otherbook['BookID'] and otherbook['BookLanguage'] != 'LanguageMissing' ]
        # Assuming no more than one match
        if len(matches)>0:
            book['BookLanguage'] = matches[0]['BookLanguage']

You can, of course, combine the two together in a single loop.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading