I’m learning Python.
I’m trying to read in a text file (on Linux: /etc/passwd).
I’ve taken the answer from here and tried to implement it with what I’ve learned so far. From my reading, many websites recommend using readline() because it helps with big files.
import pandas as pd
import numpy as np
def nonblank_lines(f):
    rawline=f.readline()
    while rawline != '':
        line=rawline.rstrip()
        print("#'#'#'#'#'",line)
        if line:
            yield line
            rawline=f.readline()
filein="/etc/passwd"
datain=[]
columnnames=['Username','Password','UID','GID','Name of User','HOMEDIR','Login Shell']
with open(filein,'r') as passwdline:
    print(f"passwdline: {passwdline}")
    for line in nonblank_lines(passwdline):
        print(f"Back from function ====, {line}")
        datain.append(line.split(':'))
For reference, I edited the /etc/passwd file and added a blank line to test my program:
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
However, my program keeps printing empty lines instead of each non-blank line in the /etc/passwd file:
#'#'#'#'#'
#'#'#'#'#'
#'#'#'#'#'
#'#'#'#'#'
#'#'#'#'#'
repeated continuously.
What am I doing wrong? My guess is that it has to do with putting a readline() after the yield statement, but I have no idea why that would be a problem. Any thoughts?
>Solution :
Your second call to rawline=f.readline() is indented too far. If a line contains nothing but whitespace, it passes the while test (rawline is not '', since it still holds at least the newline) but fails the if line: test (rstrip() leaves it empty). Because the readline() call lives inside the if block, you stop reading new lines and just loop forever on that same line.
The minimal fix is to dedent that line so it’s no longer controlled by the if statement:
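Here is a minimal sketch of the two tests side by side, assuming the stalled line is the blank line you added (readline() hands it back as '\n'):

```python
# What f.readline() returns for a blank line: just the newline character
rawline = "\n"

# The while test looks at the *raw* line, which is not '', so the loop continues...
keeps_looping = rawline != ''
# ...but the if test looks at the *stripped* line, which is empty, so the
# readline() call indented inside the if block never runs again.
enters_if = bool(rawline.rstrip())

print(keeps_looping, enters_if)  # True False
```

Since neither rawline nor the loop condition ever changes after that, the while loop spins on the same blank line.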
def nonblank_lines(f):
    rawline=f.readline()
    while rawline != '':
        line=rawline.rstrip()
        print("#'#'#'#'#'",line)
        if line:
            yield line
        rawline=f.readline()
But a better solution, which works in every Python version, is to just loop over the file naturally:
def nonblank_lines(f):
    for rawline in f:
        line=rawline.rstrip()
        print("#'#'#'#'#'",line)
        if line:
            yield line
and avoid calling .readline() at all.
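As a quick check that the iteration-based version terminates and skips blank lines, here is a sketch that feeds it an in-memory file. io.StringIO stands in for /etc/passwd (an assumption, so the example runs anywhere), the debug print is dropped, and the column names are the ones from the question; pairing them with the split fields via zip() is just one possible way to use the result.

```python
import io

def nonblank_lines(f):
    """Yield each line of f with trailing whitespace removed, skipping blank ones."""
    for rawline in f:            # the file object hands over one line at a time
        line = rawline.rstrip()
        if line:
            yield line

# In-memory stand-in for /etc/passwd: sample lines from the question plus a
# blank line and a whitespace-only line. Use open("/etc/passwd") for real data.
sample = io.StringIO(
    "root:x:0:0:root:/root:/bin/bash\n"
    "\n"
    "bin:x:1:1:bin:/bin:/sbin/nologin\n"
    "   \n"
    "daemon:x:2:2:daemon:/sbin:/sbin/nologin\n"
)

columnnames = ['Username', 'Password', 'UID', 'GID',
               'Name of User', 'HOMEDIR', 'Login Shell']

# Split each non-blank line on ':' and pair the fields with the column names
records = [dict(zip(columnnames, line.split(':')))
           for line in nonblank_lines(sample)]

for rec in records:
    print(rec['Username'], rec['Login Shell'])
# root /bin/bash
# bin /sbin/nologin
# daemon /sbin/nologin
```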
Note: There is one behavioral difference between iterating over the file’s lines like this and calling .readline() manually: direct iteration, while faster, prevents you from calling f.tell() on a text file mid-iteration (it raises OSError). Maintaining the state needed for an accurate .tell() costs something, so for performance, direct iteration doesn’t maintain it and simply disables .tell().
If you might need to call .tell() on the file, you can use 3.8+’s walrus operator (properly, the assignment expression, :=) with .readline(). That avoids writing two separate .readline() calls, which risks mistakes like this one (and similar ones, e.g. using continue and skipping the next read):
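A small sketch of that restriction, assuming a throwaway temporary file as a stand-in for /etc/passwd: calling f.tell() inside a for loop over a text file raises OSError.

```python
import os
import tempfile

# Throwaway sample file, a stand-in for /etc/passwd
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt",
                                 delete=False) as tmp:
    tmp.write("root:x:0:0:root:/root:/bin/bash\n"
              "\n"
              "bin:x:1:1:bin:/bin:/sbin/nologin\n")
    path = tmp.name

tell_error = None
try:
    with open(path, encoding="utf-8") as f:
        for rawline in f:
            try:
                f.tell()            # direct iteration has disabled tell()
            except OSError as exc:
                tell_error = exc    # remember the error and stop looking
                break
finally:
    os.remove(path)                 # clean up the sample file

print(type(tell_error).__name__, tell_error)
```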
def nonblank_lines(f):
    while rawline := f.readline():
        line=rawline.rstrip()
        print("#'#'#'#'#'",line)
        if line:
            yield line
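One reason to want .tell() is to remember where each line starts so you can seek() back to it later. The sketch below is a hypothetical variant built on the same walrus/readline() loop (nonblank_lines_with_offsets is an illustrative name, not part of the answer). Text-mode tell() values are opaque cookies meant only to be passed back to seek(), not character counts. A temporary file again stands in for /etc/passwd.

```python
import os
import tempfile

def nonblank_lines_with_offsets(f):
    """Hypothetical helper: yield (offset, line) for each non-blank line of f.

    offset is the f.tell() cookie taken just before the line was read, so
    f.seek(offset) can jump straight back to that line later.
    """
    offset = f.tell()
    while rawline := f.readline():    # '' (falsy) only at end of file
        line = rawline.rstrip()
        if line:
            yield offset, line
        offset = f.tell()             # loop level, not under if: runs for every line

# Throwaway sample file with a blank line in it
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt",
                                 delete=False) as tmp:
    tmp.write("root:x:0:0:root:/root:/bin/bash\n"
              "\n"
              "bin:x:1:1:bin:/bin:/sbin/nologin\n")
    path = tmp.name

try:
    with open(path, encoding="utf-8") as f:
        index = list(nonblank_lines_with_offsets(f))   # [(offset, line), ...]
        f.seek(index[1][0])           # jump back to the second non-blank line
        reread = f.readline().rstrip()
finally:
    os.remove(path)

print(reread)  # bin:x:1:1:bin:/bin:/sbin/nologin
```

If offset = f.tell() were indented under the if, a blank line would leave it pointing at the blank line itself, the same kind of slip as in the question.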