I have a simple regex script that finds IPs in a text file and adds them to a list:
import re
pattern = r'^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$'
ip_list = []
ifilepath = input('Please specify full file path location' + '\n')
with open(ifilepath) as inputf:
    ips = re.findall(pattern, inputf.read())
    print(ips)  # just to test whether re.findall is matching against the file
print(ifilepath)
for ip in ips:
    ip_list.append(ip)
print('IPs matching the regex pattern: ')
print(ip_list)
print('\n')
After running it, this is the output I see:
Please specify full file path location
C:\Users\Samson\Desktop\IP.txt
[]
C:\Users\Samson\Desktop\IP.txt
IPs matching the regex pattern:
[]
It seems that re.findall() is not matching anything in the file, yet a similar script using the match method works. A bit of a head-scratcher: what am I missing here?
Sample input text file:
192.168.0.1 proxy123 10.10.0.1
192.168.0.2 httpstatus=404 proxy_result=block 10.10.0.2
192.163.0.3 %%%
192.168.0.4
abcde
%&&%#(%#(%#
Solution:
You need to remove the ^ and $ anchors, which match only at the start and end of the whole string (or at the start and end of each line when re.M is set).
Consider, with t holding the contents of the file:
>>> print(t)
192.168.0.1 proxy123 10.10.0.1
192.168.0.2 httpstatus=404 proxy_result=block 10.10.0.2
192.163.0.3 %%%
192.168.0.4
abcde
%&&%#(%#(%#
Without re.M, ^ and $ anchor to the whole multi-line string, and the file as a whole is not a single IP, so your pattern finds no matches:
>>> re.findall(r'^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$', t)
[]
Adding the re.M flag would find '192.168.0.4', the only line that consists of nothing but an IP (provided that line has no trailing whitespace):
>>> re.findall(r'^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$', t, flags=re.M)
['192.168.0.4']
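If you really do want only the lines that are an IP and nothing else, a minimal sketch that also tolerates the trailing-whitespace case is to strip each line and require a full match with `re.fullmatch` (available since Python 3.4). The helper name `ip_only_lines` is hypothetical, chosen just for illustration.

```python
import re

# No anchors needed here: fullmatch already requires the whole string to match.
IP_PATTERN = re.compile(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')


def ip_only_lines(text):
    """Return lines that consist of an IP only, ignoring surrounding whitespace."""
    result = []
    for line in text.splitlines():
        stripped = line.strip()  # drop trailing spaces, tabs or a stray '\r'
        if IP_PATTERN.fullmatch(stripped):
            result.append(stripped)
    return result
```

Unlike `^...$` with `re.M`, this still returns `'192.168.0.4'` when that line ends in spaces.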
Compare that with the same pattern without anchors:
>>> re.findall(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', t)
['192.168.0.1', '10.10.0.1', '192.168.0.2', '10.10.0.2', '192.163.0.3', '192.168.0.4']
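One caveat with dropping the anchors entirely: the unanchored pattern can also match a slice of a longer digit run, for example it finds `'234.5.6.789'` inside `'1234.5.6.7890'`. If you want to reject such cases, a minimal sketch is to put `\b` word boundaries at both ends. Note the non-capturing `(?:...)` group: with a capturing group, `re.findall` would return only the group's text instead of the whole match. The name `find_bounded_ips` is hypothetical.

```python
import re

# \b at each end stops a match from starting or ending inside a longer run of
# word characters; (?:...) is non-capturing so findall returns the whole match.
BOUNDED_IP = re.compile(r'\b\d{1,3}(?:\.\d{1,3}){3}\b')


def find_bounded_ips(text):
    """Return dotted-quad strings not embedded in longer digit/letter runs."""
    return BOUNDED_IP.findall(text)
```

This is still not a full validator: `'1.2.3.4.5'` yields `'1.2.3.4'`, and octets above 255 are not rejected.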
Your anchored pattern does work if you first split the text into whitespace-separated tokens and test each one:
>>> pattern=r'^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$'
>>> [s for s in t.split() if re.match(pattern, s)]
['192.168.0.1', '10.10.0.1', '192.168.0.2', '10.10.0.2', '192.163.0.3', '192.168.0.4']
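Putting it together, here is a minimal sketch of the original script with the anchors removed. It assumes the file is plain text readable with the default encoding. The helper names `find_ips` and `find_ips_in_file` are hypothetical, and the extra check with the standard-library `ipaddress` module is an addition: it drops regex matches such as `999.1.1.1` whose octets exceed 255, which the pattern alone accepts.

```python
import ipaddress
import re

# The pattern from the question without ^ and $, so it matches anywhere in the text.
IP_PATTERN = re.compile(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')


def find_ips(text):
    """Return regex matches in text that are also valid IPv4 addresses."""
    ip_list = []
    for candidate in IP_PATTERN.findall(text):
        try:
            # Raises AddressValueError (a ValueError subclass), e.g. for octets > 255.
            ipaddress.IPv4Address(candidate)
        except ValueError:
            continue
        ip_list.append(candidate)
    return ip_list


def find_ips_in_file(path):
    """Read the whole file and return the IPs found in it."""
    with open(path) as inputf:
        return find_ips(inputf.read())
```

In the original script, the `with open(...)` block and the copy loop then collapse to `ip_list = find_ips_in_file(ifilepath)`.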