Generator always returning same value

I have a function that reads a file line by line and returns each line as a list of words. Since the file is very large, I would like to make it a generator.

Here is the function:

def tokenize_each_line(file):
    with open(file, 'r') as f:
        for line in f:
            yield line.split()

However, every time I call next(tokenize_each_line()), it returns the first line of the file. I guess this is not the expected behavior for generators. Instead, I'd like each call to return the next line.

>Solution :

Every call to tokenize_each_line() returns a newly initialized generator. So next(tokenize_each_line()) creates a fresh generator each time, makes it yield its first item (the first line of the file), and then discards it.
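
A small self-contained check makes this visible. It is only a sketch: the temporary file and its three lines are invented for illustration, and the function is repeated from the question.

```python
import os
import tempfile

def tokenize_each_line(file):
    with open(file, 'r') as f:
        for line in f:
            yield line.split()

# Hypothetical sample file, created only for this demonstration.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("first line\nsecond line\nthird line\n")
    path = tmp.name

# Each call builds a separate generator, so each one starts at line 1.
first_call = next(tokenize_each_line(path))
second_call = next(tokenize_each_line(path))

# Two calls give two distinct generator objects.
gen_a = tokenize_each_line(path)
gen_b = tokenize_each_line(path)
distinct = gen_a is not gen_b

print(first_call, second_call, distinct)
os.remove(path)
```

Both calls yield the tokens of the first line, because neither generator is ever advanced a second time.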

Instead, create the generator once, hold a reference to it, and call next() on that same object whenever you need the next line.

For example:

gen = tokenize_each_line('myfile.txt')

# just as an example of how you might want to use the generator
words = []
while len(words) < 1000:
    tokens = next(gen, None)  # None means the file has no more lines
    if tokens is None:
        break
    words += tokens
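
The same idea (one generator object, consumed step by step) also works without calling next() by hand. The sketch below uses a hypothetical stand-in generator, count_up, instead of a real file so that it runs on its own. itertools.islice takes a fixed number of items, and a later loop over the same object continues where the generator left off.

```python
import itertools

def count_up(limit):
    # Stand-in generator (hypothetical), used here instead of a real file.
    for i in range(limit):
        yield [f"word{i}"]

gen = count_up(5)

# Take the first two items; the generator remembers its position.
first_two = list(itertools.islice(gen, 2))

# Iterating over the same object resumes at the third item.
rest = [tokens for tokens in gen]
```

Calling count_up(5) again at any point would start over from the first item, which is the behavior described in the question.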
