Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Separate numerical values from alphabetic characters within a column as well as expand dates

I would like to separate the numerical values from the alphabetic characters within a column as well as to expand the quarters in a date.

Data

state   stat1   type    stat2   qtr
NY      up      AAA01   Gr      Q1 24
NY      up      AAA02   Re      Q1 24
NY      up      BB01    Gr      Q1 24
NY      up      DD01    Gr      Q1 24
NY      up      DD02    Gr      Q1 24
CA      low     AAA01   Re      Q2 24
CA      low     DD01    Re      Q2 24
CA      low     AAA01   Re      Q2 24
CA      low     SSS01   Gr      Q2 24

Desired

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

state   stat1   type    stat2   qtr
NY      up      AAA     Gr      Q1 2024
NY      up      AAA     Re      Q1 2024
NY      up      BB      Gr      Q1 2024
NY      up      DD      Gr      Q1 2024
NY      up      DD      Gr      Q1 2024
CA      low     AAA     Re      Q2 2024
CA      low     DD      Re      Q2 2024
CA      low     AAA     Re      Q2 2024
CA      low     SSS     Gr      Q2 2024

Doing

# Extract year from the 'qtr' column
df['year'] = df['qtr'].apply(lambda x: x.split(' ')[-1])

# Modify the 'type' column to include only the first three characters
df['type'] = df['type'].str[:3]

# Concatenate 'qtr' and 'year' columns
df['qtr'] = df['qtr'].apply(lambda x: x.split(' ')[0] + ' ' + x.split(' ')[-1])

However the output does not fully remove the numerical values from the characters within a column. The quarter transformation is not correct as well. Any suggestion is appreciated.

>Solution :

A possible solution:

df.assign(type = df['type'].str.replace(r'\d*$', '', regex=True),
          qtr = df['qtr'].str.replace(' ', ' 20'))

Output:

  state stat1 type stat2      qtr
0    NY    up  AAA    Gr  Q1 2024
1    NY    up  AAA    Re  Q1 2024
2    NY    up   BB    Gr  Q1 2024
3    NY    up   DD    Gr  Q1 2024
4    NY    up   DD    Gr  Q1 2024
5    CA   low  AAA    Re  Q2 2024
6    CA   low   DD    Re  Q2 2024
7    CA   low  AAA    Re  Q2 2024
8    CA   low  SSS    Gr  Q2 2024
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading