Regex: extract words from a string and store them in a Python list

I am new to regex. I want to extract specific words from a Python string. This is the string:

'1. feature name: occupation_Transport-moving<br>coefficient: 0.1776<br>2. feature name: education<br>coefficient: 0.0726<br>3. feature name: occupation_Machine-op-inspct<br>coefficient: 0.0661<br>4. feature name: occupation_Armed-Forces<br>coefficient: 0.0006<br>5. feature name: workclass_Without-pay<br>coefficient: -0.0194<br>6. feature name: occupation_Handlers-cleaners<br>coefficient: -0.1256<br>7. feature name: occupation_Farming-fishing<br>coefficient: -0.3938<br>8. feature name: GDP Group<br>coefficient: -0.4138<br>9. feature name: occupation_Other-service<br>coefficient: -0.4294<br>10. feature name: occupation_Priv-house-serv<br>coefficient: -0.6560<br>'

The result I am looking for:

['occupation_Transport-moving', 'education', 'occupation_Machine-op-inspct', 'occupation_Armed-Forces', 'workclass_Without-pay', 'occupation_Handlers-cleaners', 'occupation_Farming-fishing', 'GDP Group', 'occupation_Other-service', 'occupation_Priv-house-serv']

I have tried this, but it returns the whole rest of the string, starting from the first ':':
re.findall(r':\s(.*)<', txt)
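A minimal sketch reproducing the problem, using a shortened copy of the string (only the first two entries; the variable name txt is assumed). Because .* is greedy, it runs from the first ': ' to the LAST '<' in the string, so findall returns a single long match instead of one name per entry.

```python
import re

# Shortened copy of the question's string (first two entries only).
txt = ('1. feature name: occupation_Transport-moving<br>coefficient: 0.1776<br>'
       '2. feature name: education<br>coefficient: 0.0726<br>')

# The attempted pattern: '.*' is greedy, so after the first ': ' it consumes
# everything and then backtracks only as far as the final '<' in the string.
matches = re.findall(r':\s(.*)<', txt)

print(len(matches))  # 1
print(matches[0])
```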

Thank you in advance for your assistance.

Solution:

Use this pattern, which cannot run past ':', '.' or '<' (so coefficients such as 0.1776 never match):

:\s*([^:.<]+)<
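A minimal sketch of applying the pattern in Python, assuming the string is stored in a variable named txt as in the question. Since the pattern has exactly one capture group, re.findall returns only the captured text, which gives the list directly. Every coefficient contains a '.', so the excluded-character class can never reach the '<' after a number, and only the feature names are captured.

```python
import re

# The string from the question, split across lines for readability.
txt = ('1. feature name: occupation_Transport-moving<br>coefficient: 0.1776<br>'
       '2. feature name: education<br>coefficient: 0.0726<br>'
       '3. feature name: occupation_Machine-op-inspct<br>coefficient: 0.0661<br>'
       '4. feature name: occupation_Armed-Forces<br>coefficient: 0.0006<br>'
       '5. feature name: workclass_Without-pay<br>coefficient: -0.0194<br>'
       '6. feature name: occupation_Handlers-cleaners<br>coefficient: -0.1256<br>'
       '7. feature name: occupation_Farming-fishing<br>coefficient: -0.3938<br>'
       '8. feature name: GDP Group<br>coefficient: -0.4138<br>'
       '9. feature name: occupation_Other-service<br>coefficient: -0.4294<br>'
       '10. feature name: occupation_Priv-house-serv<br>coefficient: -0.6560<br>')

# After a ':' and optional whitespace, capture a run with no ':', '.' or '<'
# that must end right at a '<'. Raw string avoids invalid-escape warnings.
pattern = r':\s*([^:.<]+)<'

# One capture group, so findall returns a list of the captured strings.
feature_names = re.findall(pattern, txt)
print(feature_names)
```

The printed list matches the result asked for in the question, including 'GDP Group', since spaces are allowed inside the captured run.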

See regex proof.

EXPLANATION

--------------------------------------------------------------------------------
  :                        ':'
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [^:.<]+                  any character except: ':', '.', '<' (1
                             or more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  <                        '<'
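The table above can also be kept next to the code by compiling the same pattern with re.VERBOSE, where unescaped whitespace outside character classes is ignored and '#' starts a comment. The sketch below also shows one limitation worth knowing: a feature name containing '.' (the 'version.2' input here is hypothetical, not from the question) is skipped entirely. The ANCHORED pattern at the end is an alternative that is not part of the answer; it anchors on the literal 'feature name:' label and lazily captures up to the next '<br>'.

```python
import re

# The answer's pattern, one row of the explanation table per line.
FEATURE_NAME = re.compile(r"""
    :              # ':'
    \s*            # whitespace, 0 or more times (greedy)
    ( [^:.<]+ )    # group 1: any character except ':', '.', '<', 1 or more times
    <              # '<'
""", re.VERBOSE)

sample = '8. feature name: GDP Group<br>coefficient: -0.4138<br>'
verbose_result = FEATURE_NAME.findall(sample)
print(verbose_result)  # ['GDP Group']

# Hypothetical input whose feature name contains a '.'.
dotted = '1. feature name: version.2<br>coefficient: 0.5<br>'
skipped = FEATURE_NAME.findall(dotted)
print(skipped)  # [] because the class cannot cross the '.'

# Alternative (not from the answer): anchor on the label, stop at the first '<br>'.
ANCHORED = re.compile(r'feature name:\s*(.*?)<br>')
anchored_result = ANCHORED.findall(dotted)
print(anchored_result)  # ['version.2']
```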