Python RegEx – Capture Varying Length of Characters

I’m trying to capture strings that have no standard form. Some have two words, others three, and some are a whole phrase. So far I have only managed to capture up to two words. Any help is appreciated.

str1 = 'File quarantined'
str2 = 'Unable to quarantine file'
str3 = 'Action Required - Restart the endpoint to finish cleaning the security threat'
str4 = 'Unable to upload file'
str5 = 'Unable to delete file'

The following is not working as expected since it only captures the first two words.

import re

pattern = r'\w+\s([^\s]+)([^\s]+)'
str2 = 'Unable to quarantine file'
res = re.search(pattern, str2)
print(res)
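To see why the attempt stops after two words, it may help to look at what each group actually holds. The sketch below reuses the pattern from the question. The names `matched` and `groups` are only illustrative.

```python
import re

pattern = r'\w+\s([^\s]+)([^\s]+)'
m = re.search(pattern, 'Unable to quarantine file')

# The pattern contains only one \s, so it can never cross the second space.
matched = m.group(0)
# Both groups need at least one non-space character, so the engine
# backtracks and splits the second word between them.
groups = m.groups()

print(matched)  # Unable to
print(groups)   # ('t', 'o')
```

Because the pattern allows exactly one whitespace character, adding more groups would not scale to phrases of arbitrary length. The value needs to be delimited by what surrounds it instead.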

These strings come directly from the server. The RegEx needs to capture the whole string, whether it is two words, three, or more.
Each string is embedded in a much longer log line, and the part I need comes right after cs5=. A sample list of these log lines is provided below:


malware = ['Mar 07 2023 17:15:00 abcd.manage.trendmicro.com CEF:0|Trend Micro|Apex Central|2019|AV:File quarantined|Trojan.Win64.SHELMA.SMB1|3|deviceExternalId=313 rt=2022-12-21 08:44:17 cnt=1 dhost=NB-SUPPORT TMCMLogDetectedHost=NB-SUPPORT duser=ACME\\\\john.smith act=File quarantined cn1Label=Pattern cn1=1814300 cn2Label=Second_Action cn2=1 cs1Label=VLF_FunctionCode cs1=Real-time Scan cs2Label=Engine cs2=22.580.1004 cs3Label=Product_Version cs3=14.0 cs4Label=CLF_ReasonCode cs4=virus log cs5Label=First_Action_Result cs5=File quarantined cs6Label=Second_Action_Result cs6=N/A cat=1703 dvchost=cpnlug.manage.trendmicro.com cn3Label=Overall_Risk_Rating cn3=0 fname=66e9f4d4-df39-488d-8cf8-bdcf5d890598.tmp filePath=C:\\\\Users\\\\emil\\\\Downloads\\\\ msg=NONAMEFL dst=10.18.13.90 TMCMLogDetectedIP=10.18.13.90 fileHash=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX deviceFacility=Apex One ApexCentralHost=Apex Central as a Service devicePayloadId=xxx-xxxxx-xxx-xxx TMCMdevicePlatform=Windows 10 10.0 (Build 19044) deviceNtDomain=N/A dntdom=Client\\\\Notebook\\\\ ', 
'Mar 07 2023 17:15:00 abcd.manage.trendmicro.com CEF:0|Trend Micro|Apex Central|2019|AV:File passed|Trojan.Win64.SHELMA.SMB1|3|deviceExternalId=314 rt=2022-12-21 08:45:17 cnt=1 dhost=NB-SUPPORT TMCMLogDetectedHost=NB-SUPPORT duser=ACME\\\\john.smith act=File quarantined cn1Label=Pattern cn1=1814300 cn2Label=Second_Action cn2=1 cs1Label=VLF_FunctionCode cs1=Real-time Scan cs2Label=Engine cs2=22.580.1004 cs3Label=Product_Version cs3=14.0 cs4Label=CLF_ReasonCode cs4=virus log cs5Label=First_Action_Result cs5=File quarantined cs6Label=Second_Action_Result cs6=N/A cat=1703 dvchost=cpnlug.manage.trendmicro.com cn3Label=Overall_Risk_Rating cn3=0 fname=rev_shell.exe filePath=C:\\\\Users\\\\emil\\\\Downloads\\\\ msg=NONAMEFL dst=10.18.13.90 TMCMLogDetectedIP=10.18.13.90 fileHash=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX deviceFacility=Apex One ApexCentralHost=Apex Central as a Service devicePayloadId=xxx-xxxxx-xxx-xxx TMCMdevicePlatform=Windows 10 10.0 (Build 19044) deviceNtDomain=N/A dntdom=Client\\\\Notebook\\\\ ', 
'Mar 07 2023 17:15:00 abcd.manage.trendmicro.com CEF:0|Trend Micro|Apex Central|2019|AV:File cleaned|TROJ_GEN.R002C0DKG22|3|deviceExternalId=315 rt=2022-12-21 10:20:31 cnt=1 dhost=NB-SUPPORT TMCMLogDetectedHost=NB-SUPPORT duser=ACME\\\\john.smit act=File cleaned cn1Label=Pattern cn1=1814500 cn2Label=Second_Action cn2=1 cs1Label=VLF_FunctionCode cs1=Real-time Scan cs2Label=Engine cs2=22.580.1004 cs3Label=Product_Version cs3=14.0 cs4Label=CLF_ReasonCode cs4=virus log cs5Label=First_Action_Result cs5=File cleaned cs6Label=Second_Action_Result cs6=N/A cat=1703 dvchost=cpnlug.manage.trendmicro.com cn3Label=Overall_Risk_Rating cn3=1 fname=aowect.dll filePath=C:\\\\Users\\\\emil\\\\AppData\\\\Local\\\\Temp\\\\ dst=10.18.13.90 TMCMLogDetectedIP=10.18.13.90 fileHash=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX deviceFacility=Apex One ApexCentralHost=Apex Central as a Service devicePayloadId=xxx-xxxxx-xxx-xxx TMCMdevicePlatform=Windows 10 10.0 (Build 19044) deviceNtDomain=N/A dntdom=Client\\\\Notebook\\\\ ', 
'Mar 07 2023 17:15:00 abcd.manage.trendmicro.com CEF:0|Trend Micro|Apex Central|2019|AV:Unable to upload file|TSC_GENCLEAN|3|deviceExternalId=316 rt=2022-12-21 13:37:42 cnt=1 dhost=NB-SUPPORT TMCMLogDetectedHost=NB-SUPPORT duser=ACME\\\\john.smit act=File cleaned cn1Label=Pattern cn1=1632 cn2Label=Second_Action cn2=1 cs1Label=VLF_FunctionCode cs1=Damage Cleanup Services cs2Label=Engine cs2=7.5.1184 cs3Label=Product_Version cs3=14.0 cs4Label=CLF_ReasonCode cs4=virus log cs5Label=First_Action_Result cs5=Unable to clean file cs6Label=Second_Action_Result cs6=N/A cat=1703 dvchost=cpnlug.manage.trendmicro.com cn3Label=Overall_Risk_Rating cn3=0 fname=Non confermato 184296.crdownload filePath=C:\\\\Users\\\\emil\\\\Downloads\\\\ dst=10.18.13.90 TMCMLogDetectedIP=10.18.13.90 deviceFacility=Apex One ApexCentralHost=Apex Central as a Service devicePayloadId=xxx-xxxxx-xxx-xxx TMCMdevicePlatform=Windows 10 10.0 (Build 19044) deviceNtDomain=N/A dntdom=Client\\\\Notebook\\\\ ', 
'Mar 07 2023 17:15:00 abcd.manage.trendmicro.com CEF:0|Trend Micro|Apex Central|2019|AV:Unable to delete file|Troj.Win32.TRX.XXPE50FFF063|3|deviceExternalId=317 rt=2022-12-21 13:37:49 cnt=1 dhost=NB-SUPPORT TMCMLogDetectedHost=NB-SUPPORT duser=ACME\\\\john.smit act=File cleaned cn1Label=Pattern cn1=1632 cn2Label=Second_Action cn2=1 cs1Label=VLF_FunctionCode cs1=Damage Cleanup Services cs2Label=Engine cs2=7.5.1184 cs3Label=Product_Version cs3=14.0 cs4Label=CLF_ReasonCode cs4=virus log cs5Label=First_Action_Result cs5=File cleaned cs6Label=Second_Action_Result cs6=N/A cat=1703 dvchost=cpnlug.manage.trendmicro.com cn3Label=Overall_Risk_Rating cn3=0 fname=pumpkin-2.7.3.exe filePath=C:\\\\Users\\\\emil\\\\Downloads\\\\ dst=10.18.13.90 TMCMLogDetectedIP=10.18.13.90 deviceFacility=Apex One ApexCentralHost=Apex Central as a Service devicePayloadId=xxx-xxxxx-xxx-xxx TMCMdevicePlatform=Windows 10 10.0 (Build 19044) deviceNtDomain=N/A dntdom=Client\\\\Notebook\\\\ ']

>Solution :

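Use lookarounds to match only the text between the two keys. The lookbehind (?<=cs5=) starts the match right after cs5=, the lazy .*? takes as few characters as possible, and the lookahead (?=\s+cs6Label=) stops it just before the whitespace that precedes cs6Label=. Neither key is part of the match, so the value can be any number of words.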

import re

pattern = re.compile(r'(?<=cs5=).*?(?=\s+cs6Label=)')
malware = ['Mar 07 2023 17:15:00 abcd.manage.trendmicro.com CEF:0|Trend Micro|Apex Central|2019|AV:File quarantined|Trojan.Win64.SHELMA.SMB1|3|deviceExternalId=313 rt=2022-12-21 08:44:17 cnt=1 dhost=NB-SUPPORT TMCMLogDetectedHost=NB-SUPPORT duser=ACME\\\\john.smith act=File quarantined cn1Label=Pattern cn1=1814300 cn2Label=Second_Action cn2=1 cs1Label=VLF_FunctionCode cs1=Real-time Scan cs2Label=Engine cs2=22.580.1004 cs3Label=Product_Version cs3=14.0 cs4Label=CLF_ReasonCode cs4=virus log cs5Label=First_Action_Result cs5=File quarantined cs6Label=Second_Action_Result cs6=N/A cat=1703 dvchost=cpnlug.manage.trendmicro.com cn3Label=Overall_Risk_Rating cn3=0 fname=66e9f4d4-df39-488d-8cf8-bdcf5d890598.tmp filePath=C:\\\\Users\\\\emil\\\\Downloads\\\\ msg=NONAMEFL dst=10.18.13.90 TMCMLogDetectedIP=10.18.13.90 fileHash=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX deviceFacility=Apex One ApexCentralHost=Apex Central as a Service devicePayloadId=xxx-xxxxx-xxx-xxx TMCMdevicePlatform=Windows 10 10.0 (Build 19044) deviceNtDomain=N/A dntdom=Client\\\\Notebook\\\\ ', 
'Mar 07 2023 17:15:00 abcd.manage.trendmicro.com CEF:0|Trend Micro|Apex Central|2019|AV:File passed|Trojan.Win64.SHELMA.SMB1|3|deviceExternalId=314 rt=2022-12-21 08:45:17 cnt=1 dhost=NB-SUPPORT TMCMLogDetectedHost=NB-SUPPORT duser=ACME\\\\john.smith act=File quarantined cn1Label=Pattern cn1=1814300 cn2Label=Second_Action cn2=1 cs1Label=VLF_FunctionCode cs1=Real-time Scan cs2Label=Engine cs2=22.580.1004 cs3Label=Product_Version cs3=14.0 cs4Label=CLF_ReasonCode cs4=virus log cs5Label=First_Action_Result cs5=File quarantined cs6Label=Second_Action_Result cs6=N/A cat=1703 dvchost=cpnlug.manage.trendmicro.com cn3Label=Overall_Risk_Rating cn3=0 fname=rev_shell.exe filePath=C:\\\\Users\\\\emil\\\\Downloads\\\\ msg=NONAMEFL dst=10.18.13.90 TMCMLogDetectedIP=10.18.13.90 fileHash=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX deviceFacility=Apex One ApexCentralHost=Apex Central as a Service devicePayloadId=xxx-xxxxx-xxx-xxx TMCMdevicePlatform=Windows 10 10.0 (Build 19044) deviceNtDomain=N/A dntdom=Client\\\\Notebook\\\\ ', 
'Mar 07 2023 17:15:00 abcd.manage.trendmicro.com CEF:0|Trend Micro|Apex Central|2019|AV:File cleaned|TROJ_GEN.R002C0DKG22|3|deviceExternalId=315 rt=2022-12-21 10:20:31 cnt=1 dhost=NB-SUPPORT TMCMLogDetectedHost=NB-SUPPORT duser=ACME\\\\john.smit act=File cleaned cn1Label=Pattern cn1=1814500 cn2Label=Second_Action cn2=1 cs1Label=VLF_FunctionCode cs1=Real-time Scan cs2Label=Engine cs2=22.580.1004 cs3Label=Product_Version cs3=14.0 cs4Label=CLF_ReasonCode cs4=virus log cs5Label=First_Action_Result cs5=File cleaned cs6Label=Second_Action_Result cs6=N/A cat=1703 dvchost=cpnlug.manage.trendmicro.com cn3Label=Overall_Risk_Rating cn3=1 fname=aowect.dll filePath=C:\\\\Users\\\\emil\\\\AppData\\\\Local\\\\Temp\\\\ dst=10.18.13.90 TMCMLogDetectedIP=10.18.13.90 fileHash=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX deviceFacility=Apex One ApexCentralHost=Apex Central as a Service devicePayloadId=xxx-xxxxx-xxx-xxx TMCMdevicePlatform=Windows 10 10.0 (Build 19044) deviceNtDomain=N/A dntdom=Client\\\\Notebook\\\\ ', 
'Mar 07 2023 17:15:00 abcd.manage.trendmicro.com CEF:0|Trend Micro|Apex Central|2019|AV:Unable to upload file|TSC_GENCLEAN|3|deviceExternalId=316 rt=2022-12-21 13:37:42 cnt=1 dhost=NB-SUPPORT TMCMLogDetectedHost=NB-SUPPORT duser=ACME\\\\john.smit act=File cleaned cn1Label=Pattern cn1=1632 cn2Label=Second_Action cn2=1 cs1Label=VLF_FunctionCode cs1=Damage Cleanup Services cs2Label=Engine cs2=7.5.1184 cs3Label=Product_Version cs3=14.0 cs4Label=CLF_ReasonCode cs4=virus log cs5Label=First_Action_Result cs5=Unable to clean file cs6Label=Second_Action_Result cs6=N/A cat=1703 dvchost=cpnlug.manage.trendmicro.com cn3Label=Overall_Risk_Rating cn3=0 fname=Non confermato 184296.crdownload filePath=C:\\\\Users\\\\emil\\\\Downloads\\\\ dst=10.18.13.90 TMCMLogDetectedIP=10.18.13.90 deviceFacility=Apex One ApexCentralHost=Apex Central as a Service devicePayloadId=xxx-xxxxx-xxx-xxx TMCMdevicePlatform=Windows 10 10.0 (Build 19044) deviceNtDomain=N/A dntdom=Client\\\\Notebook\\\\ ', 
'Mar 07 2023 17:15:00 abcd.manage.trendmicro.com CEF:0|Trend Micro|Apex Central|2019|AV:Unable to delete file|Troj.Win32.TRX.XXPE50FFF063|3|deviceExternalId=317 rt=2022-12-21 13:37:49 cnt=1 dhost=NB-SUPPORT TMCMLogDetectedHost=NB-SUPPORT duser=ACME\\\\john.smit act=File cleaned cn1Label=Pattern cn1=1632 cn2Label=Second_Action cn2=1 cs1Label=VLF_FunctionCode cs1=Damage Cleanup Services cs2Label=Engine cs2=7.5.1184 cs3Label=Product_Version cs3=14.0 cs4Label=CLF_ReasonCode cs4=virus log cs5Label=First_Action_Result cs5=File cleaned cs6Label=Second_Action_Result cs6=N/A cat=1703 dvchost=cpnlug.manage.trendmicro.com cn3Label=Overall_Risk_Rating cn3=0 fname=pumpkin-2.7.3.exe filePath=C:\\\\Users\\\\emil\\\\Downloads\\\\ dst=10.18.13.90 TMCMLogDetectedIP=10.18.13.90 deviceFacility=Apex One ApexCentralHost=Apex Central as a Service devicePayloadId=xxx-xxxxx-xxx-xxx TMCMdevicePlatform=Windows 10 10.0 (Build 19044) deviceNtDomain=N/A dntdom=Client\\\\Notebook\\\\ ']
results = []
for s in malware:
    m = pattern.search(s)
    if m:
        results.append(m.group())

print(results)

Output:

['File quarantined', 'File quarantined', 'File cleaned', 'Unable to clean file', 'File cleaned']
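
The pattern above relies on cs6Label= always following the cs5 value. As a hedged alternative, the sketch below stops at whatever key=value pair comes next, or at the end of the line. The helper name `cef_value` and the shortened `sample` line are hypothetical, made up for illustration. It assumes keys are made of word characters and that a value never itself contains a space followed by `word=`.

```python
import re

def cef_value(line, key):
    """Return the value of one key in a space-separated key=value log line."""
    # (?<!\S) keeps 'cs5' from matching inside a longer key name.
    # The lazy group ends before the next ' name=' token or at end of line.
    m = re.search(rf'(?<!\S){re.escape(key)}=(.*?)(?=\s+\w+=|\s*$)', line)
    return m.group(1) if m else None

# Shortened, made-up line in the same shape as the samples above.
sample = ('cs4=virus log cs5Label=First_Action_Result '
          'cs5=Action Required - Restart the endpoint to finish cleaning the security threat '
          'cs6Label=Second_Action_Result cs6=N/A')

print(cef_value(sample, 'cs5'))
print(cef_value(sample, 'cs4'))  # virus log
print(cef_value(sample, 'cs6'))  # N/A
```

With this shape the same function can pull other fields such as cs4 or cs6. The trade-off is that a value containing something that looks like ` name=` would be cut short.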