Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

how do i extract the multiple pattern in a column value and put it in new dataframe

I am unexperienced in coding, need help in this simple code.

I can’t split Index, expirydate, strike and opt type.

I am stuck in splitting expiry date.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

**Sample 1**
data = ["NIFTY2431322000PE", "NIFTY2441522000PE"]
df = pd.DataFrame({'details': data})

# Regular expression pattern to match the desired format
pattern = r'([A-Z]+)(\d{2})(\d{1})(\d{2})(\d{5})([A-Z]+)'
df[['Instrument', 'Year', 'month', 'Date', 'strike', 'CEorPE']] = df['details'].str.extract(pattern)
RESULT
             details Instrument Year month Date strike CEorPE
0  NIFTY2431322000PE      NIFTY   24     3   13  22000     PE
1  NIFTY2441522000PE      NIFTY   24     4   15  22000     PE
SAMPLE 2
data = ["NIFTY24MAR22000PE", "NIFTY24DEC22000PE"]
df = pd.DataFrame({'details': data})

pattern = r'([A-Z]+)(\d{2})([A-Z]+)(\d{5})([A-Z]+)'
df[['Instrument', 'Year','month', 'strike', 'CEorPE']] = df['details'].str.extract(pattern)
df['date'] = '30'
Result
             details Instrument Year month strike CEorPE date
0  NIFTY24MAR22000PE      NIFTY   24   MAR  22000     PE   30
1  NIFTY24DEC22000PE      NIFTY   24   DEC  22000     PE   30

I can’t figure out how to run on dataframe which have multiple variation

data = ["NIFTY2431322000PE", "NIFTY24DEC22000PE"]

If possible I want expiry date in 1 column..

thank you in advance

>Solution :

You could use:

pattern = r'([A-Z]+)(\d{2})(\d{1}|[A-Z]+)(\d{2})?(\d{5})([A-Z]+)'
df[['Instrument', 'Year', 'month', 'date', 'strike', 'CEorPE']
  ] =  df['details'].str.extract(pattern).fillna({'date': 30})

Same with named capturing groups:

pattern = r'(?P<Instrument>[A-Z]+)(?P<Year>\d{2})(?P<month>\d{1}|[A-Z]+)(?P<date>\d{2})?(?P<strike>\d{5})(?P<CEorPE>[A-Z]+)'
df[['Instrument', 'Year', 'month', 'date', 'strike', 'CEorPE']] =  df['details'].str.extract(pattern).fillna({'date': 30})

Output:

             details Instrument Year month date strike CEorPE
0  NIFTY2431322000PE      NIFTY   24     3   13  22000     PE
1  NIFTY24DEC22000PE      NIFTY   24   DEC   30  22000     PE

regex demo

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading