How to get values from a string while ignoring whitespace, using regex in Python

If I have this data:

/ 260: fcn.004020b0 (int32_t arg_4h, int32_t arg_8h);
|           ; var int32_t var_324h @ ebp-0x324
|           ; arg int32_t arg_4h @ ebp+0x4
|           ; arg int32_t arg_8h @ ebp+0x8
|           0x004020b0      55             push ebp
|           0x004020b1      8bec           mov ebp, esp
|           0x004020b3      81ec24030000   sub esp, 0x324
|           0x004020b9      6a17           push 0x17                   ; 23
|           0x004020bb      ff151c304000   call dword [sym.imp.KERNEL32.dll_IsProcessorFeaturePresent] ; 0x40301c
|           0x004020c1      85c0           test eax, eax
|       ,=< 0x004020c3      7407           je 0x4020cc
|       |   0x004020c5      b902000000     mov ecx, 2
|       |   0x004020ca      cd29           int 0x29
|       |   ; CODE XREF from fcn.004020b0 @ 0x4020c3
|       `-> 0x004020cc      a340744000     mov dword [0x407440], eax   ; [0x407440:4]=0
|           0x004020d1      890d3c744000   mov dword [0x40743c], ecx   ; [0x40743c:4]=0
|           0x004020d7      891538744000   mov dword [0x407438], edx   ; [0x407438:4]=0

and I want to get the opcodes:

55
8bec
81ec24030000
6a17
--snip--

and so on, until I have the full opcode sequence:

558bec81ec240300006a17--snip--

How can I do this in Python using regex?
I tried 0x[0-9a-z]\ *(.*?)\ + but it didn't work.
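A quick check of why the attempted pattern fails, run on two lines copied from the data above: [0-9a-z] matches only one character after 0x, so the lazy group (.*?) stops at the first run of spaces and captures the remainder of the address instead of the opcode. The operand 0x17 in the push instruction also matches, adding an unwanted token.

```python
import re

# The pattern from the question, applied to two lines of the sample data
attempt = r'0x[0-9a-z]\ *(.*?)\ +'

line1 = "|           0x004020b0      55             push ebp"
line2 = "|           0x004020b9      6a17           push 0x17                   ; 23"

# [0-9a-z] consumes a single "0", so (.*?) expands only up to the
# first run of spaces: what gets captured is the rest of the address.
print(re.findall(attempt, line1))  # ['04020b0']
# The operand "0x17" matches too, contributing a stray "7".
print(re.findall(attempt, line2))  # ['04020b9', '7']
```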

Solution:

You can use any of these patterns:

0x[0-9a-fA-F]{8} *(\S+)
0x[0-9a-fA-F]{8}[\t ]*(\S+)
0x[0-9a-fA-F]{8}[^\S\n]*(\S+)

Pattern details:

  • 0x – literal text
  • [0-9a-fA-F]{8} – exactly eight hex digits (the address column)
  •  * (a literal space followed by *) – zero or more spaces
  • [\t ]* – zero or more spaces or tabs
  • [^\S\n]* – zero or more whitespace characters other than LF (line feed, "\n")
  • (\S+) – Group 1: one or more non-whitespace characters (the opcode)
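To illustrate how the three separators differ, here is a small sketch run on two made-up inputs (they are not from the question): one with a tab after the address, and one where an address ends its line. The fourth pattern, using \s*, is not part of the answer; it is included only for contrast, to show why [^\S\n]* excludes the line feed.

```python
import re

# Hypothetical inputs: a tab after the address, and an address at end of line
tab_line = "|           0x004020b0\t55             push ebp"
eol_text = "; ref 0x004020b0\n|           0x004020b1      8bec"

patterns = {
    "spaces only": r'0x[0-9a-fA-F]{8} *(\S+)',
    "spaces/tabs": r'0x[0-9a-fA-F]{8}[\t ]*(\S+)',
    "whitespace except LF": r'0x[0-9a-fA-F]{8}[^\S\n]*(\S+)',
    "any whitespace (contrast)": r'0x[0-9a-fA-F]{8}\s*(\S+)',
}

# For each pattern: (matches in tab_line, matches in eol_text)
results = {
    name: (re.findall(pat, tab_line), re.findall(pat, eol_text))
    for name, pat in patterns.items()
}
for name, (tab_hits, eol_hits) in results.items():
    print(name, tab_hits, eol_hits)
# spaces only [] ['8bec']
# spaces/tabs ['55'] ['8bec']
# whitespace except LF ['55'] ['8bec']
# any whitespace (contrast) ['55'] ['|', '8bec']
```

With \s* the match runs past the end of the line and captures "|" from the next line, which is exactly what [^\S\n]* prevents.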

See the Python demo:

import re
text = "/ 260: fcn.004020b0 (int32_t arg_4h, int32_t arg_8h);\n|           ; var int32_t var_324h @ ebp-0x324\n|           ; arg int32_t arg_4h @ ebp+0x4\n|           ; arg int32_t arg_8h @ ebp+0x8\n|           0x004020b0      55             push ebp\n|           0x004020b1      8bec           mov ebp, esp\n|           0x004020b3      81ec24030000   sub esp, 0x324\n|           0x004020b9      6a17           push 0x17                   ; 23\n|           0x004020bb      ff151c304000   call dword [sym.imp.KERNEL32.dll_IsProcessorFeaturePresent] ; 0x40301c\n|           0x004020c1      85c0           test eax, eax\n|       ,=< 0x004020c3      7407           je 0x4020cc\n|       |   0x004020c5      b902000000     mov ecx, 2\n|       |   0x004020ca      cd29           int 0x29\n|       |   ; CODE XREF from fcn.004020b0 @ 0x4020c3\n|       `-> 0x004020cc      a340744000     mov dword [0x407440], eax   ; [0x407440:4]=0\n|           0x004020d1      890d3c744000   mov dword [0x40743c], ecx   ; [0x40743c:4]=0\n|           0x004020d7      891538744000   mov dword [0x407438], edx   ; [0x407438:4]=0"
# Group 1 captures the first non-whitespace token after each 8-digit hex address
print(re.findall(r'0x[0-9a-fA-F]{8}[\t ]*(\S+)', text))
# => ['55', '8bec', '81ec24030000', '6a17', 'ff151c304000', '85c0', '7407', 'b902000000', 'cd29', 'a340744000', '890d3c744000', '891538744000']
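To get the single concatenated string the question asks for (558bec81ec240300006a17--snip--), the captured groups can simply be joined. Below is a minimal sketch on a shortened excerpt of the data; converting the result to bytes with bytes.fromhex is an optional extra step and assumes every captured token is an even-length hex string.

```python
import re

# Shortened excerpt of the disassembly from the question
text = (
    "|           0x004020b0      55             push ebp\n"
    "|           0x004020b1      8bec           mov ebp, esp\n"
    "|           0x004020b3      81ec24030000   sub esp, 0x324\n"
    "|           0x004020b9      6a17           push 0x17                   ; 23\n"
)

opcodes = re.findall(r'0x[0-9a-fA-F]{8}[\t ]*(\S+)', text)
full = "".join(opcodes)    # concatenate the captured hex strings
print(full)                # 558bec81ec240300006a17

raw = bytes.fromhex(full)  # optional: the same opcodes as raw bytes
print(len(raw))            # 11
```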