Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Splitting a column based on character transition in a dataframe

I want to make a basic split (see image below) with Python/Pandas.

enter image description here

For people who use Excel/PowerQuery, there is a nice function that allow them to do that, Splitter.SplitTextByCharacterTransition.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I tried to make an equivalent in Python using itertools but unfortunately I get a copy of the column "Col1":

import pandas as pd
import itertools

df = pd.DataFrame({'Col1': ['304', '321A', '14', '319B', '315', '88', '242C', '243']})

df['Col2'] = [''.join(a) for b, a in itertools.groupby(df['Col1'])]

>>> df

    Col1    Col2
0   304     304
1   321A    321A
2   14      14
3   319B    319B
4   315     315
5   88      88
6   242C    242C
7   243     243

Do you have any suggestions/propositions to fix that, please ?

>Solution :

Let’s try .str.extractall to extract both the number and character then get the first non value in group

out = df.join(df.pop('Col1')
              .str.extractall('(\d+)|([a-zA-Z]+)')
              .groupby(level=0).first()
              .fillna('')
              .set_axis(['Col1', 'Col2'], axis=1))
print(out)

  Col1 Col2
0  304
1  321    A
2   14
3  319    B
4  315
5   88
6  242    C
7  243
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading