This may seem like a strange way to handle a CSV file, but it is a study task: I need to open the CSV file as text, read its lines, build a list, and then create a pandas DataFrame from that list.
import pandas as pd
with open('file.csv', 'r') as f:
    lst = f.readlines()
for idx, line in enumerate(lst):
    lst[idx] = line.strip('\n')
header = lst[0].replace('"', '').split(",")
for idx, line in enumerate(lst[1:]):
    lst[idx] = line.split(',')
df = pd.DataFrame(data = lst, columns = header)
ValueError: 5 columns passed, passed data had 39 columns
It crashes because pd.DataFrame adds (?) a bunch of None values at the end of each row.
I confirmed this by running the code without specifying ‘columns’.
Please help me understand where these Nones come from.
>Solution :
The issue you’re encountering is related to how you’re processing the CSV file lines and subsequently trying to construct a Pandas DataFrame. Let’s break down the steps and see where the problem might be:
- Reading the File: You correctly open and read the lines from the CSV file, storing them in a list.
- Stripping Newline Characters: You remove the newline characters from each line. This is also done correctly.
- Processing the Header: You correctly process the header, but replacing the double quotes (") is only needed if your header actually contains them.
- Processing the Data Rows: Here’s where the issue originates. You’re iterating over lst[1:] but assigning the split lines back to lst[idx], and enumerate starts idx at 0 even though the slice starts at element 1. Each split row is therefore written one position too early: lst[0] (the header line) is overwritten, and the last element of lst is never split and remains a plain string. When the DataFrame is built, that leftover string is treated as a sequence of single characters, so that row has as many columns as the line has characters (39 here), and the shorter, correctly split rows are padded with None.
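The index mismatch in the data-row loop can be reproduced without pandas. The following is a minimal sketch, assuming a hypothetical three-line file whose contents are held in the list raw (not the asker’s real data). It shows that the last element stays an unsplit string, and that passing start=1 to enumerate is one possible way to line the indices up again.

```python
# Hypothetical file contents, already stripped of newlines (illustrative only).
raw = ['a,b,c', '1,2,3', '4,5,6']

# Loop from the question: idx starts at 0, but the slice starts at element 1.
lst = list(raw)
for idx, line in enumerate(lst[1:]):
    lst[idx] = line.split(',')

print(lst)            # [['1', '2', '3'], ['4', '5', '6'], '4,5,6']
# The last element is still a string; iterating over it yields characters.
print(list(lst[-1]))  # ['4', ',', '5', ',', '6']
widest = max(len(row) for row in lst)
print(widest)         # 5, although each real row has only 3 fields

# One possible fix: start counting at 1 so idx matches the position in the list.
fixed = list(raw)
for idx, line in enumerate(fixed[1:], start=1):
    fixed[idx] = line.split(',')

print(fixed[1:])      # [['1', '2', '3'], ['4', '5', '6']]
```

With this fix the header line stays at index 0, so the data passed to the DataFrame would be fixed[1:] rather than the whole list.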
import pandas as pd

with open('file.csv', 'r') as f:
    lines = f.readlines()

# Remove newline characters and strip quotes if needed
lines = [line.strip('\n').replace('"', '') for line in lines]

# Split the header
header = lines[0].split(',')

# Split the data rows
data = [line.split(',') for line in lines[1:]]

# Create the DataFrame
df = pd.DataFrame(data, columns=header)
This script should correctly process the CSV file into a DataFrame. If your CSV contains quoted fields with commas inside, this simple split approach may not work correctly, and you might need to use a CSV parser, like the one built into Pandas (pandas.read_csv()) or Python’s csv module.
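To illustrate the caveat about quoted fields, here is a small sketch using Python’s csv module on in-memory text. The sample contents in text are hypothetical, and io.StringIO stands in for the opened file; with a real file you would pass the file object (opened with newline='') to csv.reader instead.

```python
import csv
import io

# Hypothetical contents: the second field of the data row contains a comma.
text = '"name","city"\n"Ann","Paris, France"\n'

# csv.reader handles the quoting and returns each row as a list of strings.
rows = list(csv.reader(io.StringIO(text)))
header, data = rows[0], rows[1:]

print(header)  # ['name', 'city']
print(data)    # [['Ann', 'Paris, France']]

# A plain split on ',' breaks the quoted field into two pieces.
naive = text.splitlines()[1].split(',')
print(naive)   # ['"Ann"', '"Paris', ' France"']
```

The resulting header and data lists have the same shape as those built by splitting on commas, so they could be passed to pd.DataFrame(data, columns=header) in the same way.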