Parsing long raw data in Python

December 15, 2023

I have a raw data file containing the texts of many books. The data is formatted as follows (sequential number, book name, text):

0,book_a, " long content"
1,book_b," long content"
2,book_c," long content"

How can I parse the file and load the books into a Python dictionary like this?
{'id': 0, 'book_name': 'book_a', 'text': "full text"}

>Solution :

I assume your raw data file is in CSV format.

import csv

books_dict = {}

with open('books_data.csv', 'r', newline='') as file:
    # skipinitialspace lets a quoted text field start after ", " as in the sample rows
    reader = csv.reader(file, skipinitialspace=True)
    for row in reader:
        # Assuming each row has the format: id,book_name, "text"
        book_id, book_name, text = int(row[0]), row[1], row[2]

        # Removing leading/trailing whitespace from book_name and text
        book_name = book_name.strip()
        text = text.strip()

        # Creating the dictionary entry
        books_dict[book_id] = {'book_name': book_name, 'text': text}

# Example usage
for book_id, book_info in books_dict.items():
    print(f"Book ID: {book_id}, Book Name: {book_info['book_name']}, Text: {book_info['text']}")
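
If you want each book in the exact shape from the question (one dict with 'id', 'book_name' and 'text'), a variation could look like the sketch below. It is only a sketch under a few assumptions: the helper name load_books and the in-memory sample are made up for illustration, each row is assumed to have exactly three fields, and the text field is assumed to be double-quoted. It also raises the csv module's field size limit, because a whole book in one field can exceed the default limit of 131072 characters.

```python
import csv
import io


def load_books(lines):
    """Parse rows shaped like: id,book_name, "text" into a list of dicts.

    `lines` is any iterable of text lines, e.g. an open file object.
    (Hypothetical helper, not part of the csv module.)
    """
    # Book-length fields can exceed the default field size limit,
    # so raise it to a value that fits a C long on all platforms.
    csv.field_size_limit(2**31 - 1)

    # skipinitialspace=True lets a quoted field start after ", "
    reader = csv.reader(lines, skipinitialspace=True)

    books = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        book_id, book_name, text = row  # assumes exactly three fields per row
        books.append({
            'id': int(book_id),
            'book_name': book_name.strip(),
            'text': text.strip(),
        })
    return books


# Small in-memory sample standing in for the real file; note the comma
# and the line break inside the quoted text fields.
sample = io.StringIO('0,book_a, " long, content"\n1,book_b," more\ncontent"\n')
books = load_books(sample)
print(books[0])
```

With a real file, you would pass the file object instead of the sample, for example: with open('books_data.csv', newline='') as f: books = load_books(f).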