re.findall not reading a file

I have a simple regex script to find IPs from a text file and add them to a list.

import re


pattern = r'^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$'
ip_list = []
ifilepath = input('Please specify full file path location' + '\n')
with open(ifilepath) as inputf:
    ips = re.findall(pattern, inputf.read())
    print(ips) ##Just to test if re.findall is matching against the file
    print(ifilepath)
    for ip in ips:
        ip_list.append(ip)

print('IPs matching the regex pattern: ')
print(ip_list)
print('\n')

After running, the output that I am seeing:

Please specify full file path location
C:\Users\Samson\Desktop\IP.txt

[]
C:\Users\Samson\Desktop\IP.txt
IPs matching the regex pattern: 
[]

It seems that re.findall() is not matching against the file, even though a similar script using the match method works. A bit of a head scratcher. What am I missing here?

Sample input text file

192.168.0.1 proxy123 10.10.0.1
192.168.0.2 httpstatus=404 proxy_result=block 10.10.0.2
192.163.0.3 %%%
192.168.0.4
abcde
%&&%#(%#(%#

Solution:

You need to remove the ^ and $ anchors, which match only at the start and end of the whole string (or of each line when re.M is set).

Consider:

>>> print(t)
192.168.0.1 proxy123 10.10.0.1
192.168.0.2 httpstatus=404 proxy_result=block 10.10.0.2
192.163.0.3 %%%
192.168.0.4
abcde
%&&%#(%#(%#

Your pattern finds no matches because, without re.M, ^ and $ anchor to the start and end of the whole multi-line string:

>>> re.findall(r'^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$', t)
[]

The '192.168.0.4' line would be found if you added the re.M flag (provided there is no trailing whitespace on that line):

>>> re.findall(r'^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$', t, flags=re.M)
['192.168.0.4']

vs

>>> re.findall(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', t)
['192.168.0.1', '10.10.0.1', '192.168.0.2', '10.10.0.2', '192.163.0.3', '192.168.0.4']
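The trailing-whitespace caveat on the re.M variant is easy to check. The sketch below uses two made-up strings (`clean` and `padded` are illustrative, not from the question's file) and a hypothetical `tolerant` pattern that allows spaces or tabs before the end of the line. It is one possible workaround, not the only one.

```python
import re

pattern = r'^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$'

# Illustrative inputs: the second has a single trailing space after the IP.
clean = "abcde\n192.168.0.4\nxyz"
padded = "abcde\n192.168.0.4 \nxyz"

# With re.M, ^ and $ match at line boundaries, so the clean line matches.
clean_matches = re.findall(pattern, clean, flags=re.M)

# The trailing space sits between the last digit and the end of the line,
# so $ cannot match there and nothing is found.
padded_matches = re.findall(pattern, padded, flags=re.M)

# Assumed workaround: capture the IP and allow spaces/tabs before the line end.
# With exactly one group, findall returns the group's text.
tolerant = r'^(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})[ \t]*$'
tolerant_matches = re.findall(tolerant, padded, flags=re.M)

print(clean_matches)
print(padded_matches)
print(tolerant_matches)
```

If the file may contain stray whitespace, dropping the anchors altogether, as shown above, sidesteps the issue.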

Your pattern does work if you first split the text into whitespace-separated substrings:

>>> pattern=r'^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$'
>>> [s for s in t.split() if re.match(pattern, s)]
['192.168.0.1', '10.10.0.1', '192.168.0.2', '10.10.0.2', '192.163.0.3', '192.168.0.4']
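
Putting this together, here is a minimal sketch of how the original script could be restructured with the unanchored pattern. The names `IP_PATTERN`, `find_ips` and `find_ips_in_file` are made up for illustration, and `sample` simply reproduces the input file from the question so the example runs without a file on disk.

```python
import re

# Unanchored pattern: matches an IPv4-looking token anywhere in the text.
# Note it does not check that each octet is in the range 0-255.
IP_PATTERN = re.compile(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')


def find_ips(text):
    """Return every substring of text that looks like a dotted quad."""
    return IP_PATTERN.findall(text)


def find_ips_in_file(path):
    """Read the whole file at path and return the IP-like strings in it."""
    with open(path) as inputf:
        return find_ips(inputf.read())


# The sample input from the question, inlined for a self-contained run.
sample = (
    "192.168.0.1 proxy123 10.10.0.1\n"
    "192.168.0.2 httpstatus=404 proxy_result=block 10.10.0.2\n"
    "192.163.0.3 %%%\n"
    "192.168.0.4\n"
    "abcde\n"
    "%&&%#(%#(%#\n"
)

print('IPs matching the regex pattern: ')
print(find_ips(sample))
```

To run it against a real file, call `find_ips_in_file(ifilepath)` with the path read from `input()`, as in the original script. Since `findall` already returns a list, the separate loop that appended each match to `ip_list` is not needed.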