Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Extracting data from a log file

I have a log.txt file like this,

2022-10-12 18:15:22.992 0026/? I/AsrDecActor25: channels=1, size=82434
2022-10-12 18:15:22.992 0026/? I/AsrDecActor25: waiting asr-core ready: 12 secs
2022-10-12 18:15:23.058 0199/? I/AsrDecActor27: asr state: START, true
2022-10-12 18:15:23.058 0199/? I/AsrDecActor27: asr state 2: START
2022-10-12 18:15:23.058 0199/? I/AsrDecActor27: end of decoding 57 true 0
NEC Input :secure folder app close it
NEC Replacement suggestion :Secure folder
NEC Input Before Replace : secure folder app close it
NEC Matching Word : secure folder app
Replaced Word  : Secure folder
NEC Output After Replace : Secure folder close it
Changes : 1
2022-10-12 18:15:23.060 0199/? I/LangPackActor: eASR [NEC] Run completed, Time: 2 ms
PostProcessSubstitutions::Output of question mark processing: secure folder uninstall Kare
[eITN] Input:Secure folder uninstall kare OutputSecure folder uninstall Kare
2022-10-12 18:15:23.069 0199/? I/LangPackActor: eASR [Timestamp] getTimestamp starts
2022-10-12 18:15:23.069 0199/? I/LangPackActor: eASR string2IntegerList 14 20 23 32 36 
2022-10-12 18:15:23.069 0199/? I/LangPackActor: eASR string2IntegerList 14 20 23 32 36 
2022-10-12 18:15:23.069 0199/? I/LangPackActor: eASR levenshteinMapping
2022-10-12 18:15:23.069 0199/? I/LangPackActor: eASR new ASRResult
2022-10-12 18:15:23.091 0021/? I/AsrDecActor26: decoding 

Now I have written a code to extract the lines with beginning "NEC Input Before Replace" and "NEC Matching Word" such that my output.txt file looks like this

NEC Input Before Replace : secure folder app close it
NEC Matching Word : secure folder app

At first the code which I had written for the same was

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

#!/usr/bin/env python
f = open('log.txt')
f1 = open('output.txt', 'a')

doIHaveToCopyTheLine=False

for line in f.readlines():

    if 'NEC Input Before Replace' in line:
        doIHaveToCopyTheLine=True
    elif 'NEC Matching Word' in line:
        doIHaveToCopyTheLine=True

    if doIHaveToCopyTheLine:
        f1.write(line)

f1.close()
f.close()

which was throwing me this error

UnicodeDecodeError                        Traceback (most recent call last)
Input In [3], in <cell line: 7>()
      3 f1 = open('output.txt', 'a')
      5 doIHaveToCopyTheLine=False
----> 7 for line in f.readlines():
      9     if 'NEC Input Before Replace' in line:
     10         doIHaveToCopyTheLine=True

File D:\Anaconda\lib\encodings\cp1252.py:23, in IncrementalDecoder.decode(self, input, final)
     22 def decode(self, input, final=False):
---> 23     return codecs.charmap_decode(input,self.errors,decoding_table)[0]

UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 7524: character maps to <undefined>

So i changed the code to

#!/usr/bin/env python
f = open('log.txt','r',encoding='utf-8')
f1 = open('output.txt', 'a')

doIHaveToCopyTheLine=False

for line in f.readlines():

    if 'NEC Input Before Replace' in line:
        doIHaveToCopyTheLine=True
    elif 'NEC Matching Word' in line:
        doIHaveToCopyTheLine=True

    if doIHaveToCopyTheLine:
        f1.write(line)

f1.close()
f.close()

Though the file is opening currently but in the output I am getting this

NEC Input Before Replace : secure folder app close it
NEC Matching Word : secure folder app
Replaced Word  : Secure folder
NEC Output After Replace : Secure folder close it
Changes : 1
2022-10-12 18:15:23.060 0199/? I/LangPackActor: eASR [NEC] Run completed, Time: 2 ms
PostProcessSubstitutions::Output of question mark processing: secure folder uninstall Kare
[eITN] Input:Secure folder uninstall kare OutputSecure folder uninstall Kare
2022-10-12 18:15:23.069 0199/? I/LangPackActor: eASR [Timestamp] getTimestamp starts
2022-10-12 18:15:23.069 0199/? I/LangPackActor: eASR string2IntegerList 14 20 23 32 36 
2022-10-12 18:15:23.069 0199/? I/LangPackActor: eASR string2IntegerList 14 20 23 32 36 
2022-10-12 18:15:23.069 0199/? I/LangPackActor: eASR levenshteinMapping
2022-10-12 18:15:23.069 0199/? I/LangPackActor: eASR new ASRResult
2022-10-12 18:15:23.091 0021/? I/AsrDecActor26: decoding 

All the lines after my desired lines are also getting printed. Anyone knows why is this happening and how to fix this issue?

>Solution :

You reset doIHaveToCopyTheLine to False outside the loop, before you start iterating.

Instead, you need to do it inside the loop, at the beginning of every iteration.

But you may as well just do:

if 'NEC Input Before Replace' in line or 'NEC Matching Word' in line:
    f1.write(line)
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading