Parsing long raw data in Python

I have a raw data file containing the texts of many books. The data is formatted as follows (sequential number, book name, text):

  • 0,book_a, " long content"
  • 1,book_b," long content"
  • 2,book_c," long content"

How can I parse the file and load the books into Python dictionaries like this?
{'id': 0, 'book_name': 'book_a', 'text': "full text"}

Solution:

I am assuming your raw data file is in CSV format.

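import csv

# Book texts can exceed the csv module's default field size limit
# (131072 characters), so raise it before reading.
csv.field_size_limit(2**31 - 1)

books_dict = {}

# newline='' lets the csv module handle line breaks inside quoted text
with open('books_data.csv', 'r', newline='') as file:
    # skipinitialspace=True: a quote that follows ", " is still treated
    # as the start of a quoted field (as in the first sample row)
    reader = csv.reader(file, skipinitialspace=True)
    for row in reader:
        if not row:
            continue  # skip blank lines

        # Assuming each row has the format: id,book_name,"text"
        book_id, book_name, text = int(row[0]), row[1], row[2]

        # Removing leading/trailing whitespace from book_name and text
        book_name = book_name.strip()
        text = text.strip()

        # Creating the dictionary entry, keyed by the numeric id
        books_dict[book_id] = {'id': book_id, 'book_name': book_name, 'text': text}

# Example usage
for book_id, book_info in books_dict.items():
    print(f"Book ID: {book_id}, Book Name: {book_info['book_name']}, Text: {book_info['text']}")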
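
The sample rows in the question have a space before the opening quote on the first line, and a book's text is likely to contain commas and line breaks. With the default csv.reader settings, a quote that follows a space is not treated as the start of a quoted field, so commas inside that text would split it into extra columns. Below is a minimal sketch of parsing with skipinitialspace=True, run against an in-memory string instead of a file. The helper name parse_books and the sample text are made up for illustration, and it assumes there is no header row.

```python
import csv
import io


def parse_books(stream):
    """Parse rows of the form id,book_name,"text" into a list of dicts."""
    # skipinitialspace=True ignores the space after a comma, so the
    # following quote opens a quoted field
    reader = csv.reader(stream, skipinitialspace=True)
    books = []
    for row in reader:
        if len(row) < 3:
            continue  # skip blank or incomplete lines
        book_id, book_name, text = row[0], row[1], row[2]
        books.append({
            'id': int(book_id),
            'book_name': book_name.strip(),
            'text': text.strip(),
        })
    return books


# Hypothetical sample modelled on the rows in the question: the first
# text contains a comma and a line break inside the quotes.
sample = (
    '0,book_a, " first line, with a comma\nsecond line"\n'
    '1,book_b," long content"\n'
)

# newline='' keeps line breaks untouched so the csv module can handle them
books = parse_books(io.StringIO(sample, newline=''))
for book in books:
    print(book['id'], book['book_name'], repr(book['text']))
```

This returns a list of dictionaries in the shape the question asks for. If lookup by id is needed, the list can be turned into a dictionary with {book['id']: book for book in books}.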