I want to go through large CSV files and if there is missing data I want to remove that row completely, This is only row specific so if there is a cell that = 0 or has no value then I want to remove the entire row. I want this to happen for all the columns so if any column has a black cell it should delete the row, and return the corrected data in a corrected csv.
import csv
with open('data.csv', 'r') as csvfile:
csvreader = csv.reader(csvfile)
for row in csvreader:
print(row)
if not row[0]:
print("12")
This is what I found and tried but it doesnt not seem to be working and I dont have any ideas about how to aproach this problem, help please?
Thanks!
>Solution :
Due to the way in which CSV reader presents rows of data, you need to know how many columns there are in the original CSV file. For example, if the CSV file content looks like this:
1,2
3,
4
Then the lists return by iterating over the reader would look like this:
['1','2']
['3','']
['4']
As you can see, the third row only has one column whereas the second row has 2 columns albeit that one is (effectively) empty.
This function allows you to either specify the number of columns (if you know them before hand) or allow the function to figure it out. If not specified then it is assumed that the number of columns is the greatest number of columns found in any row.
So…
import csv
DELIMITER = ','
def fix_csv(input_file, output_file, cols=0):
if cols == 0:
with open(input_file) as indata:
for row in csv.reader(indata, delimiter=DELIMITER):
cols = max(len(row), cols)
with open(input_file) as indata, open(output_file, 'w') as outdata:
for row in csv.reader(indata, delimiter=DELIMITER):
for col in row:
if not col.strip():
break # empty cell
try:
if float(col) == 0:
break # cell is numeric but zero
except ValueError:
pass
else:
if len(row) == cols:
print(DELIMITER.join(row), file=outdata)
fix_csv('foo_original.csv', 'foo_fixed.csv')