Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Adding numbers from one series to a series of letters within the same dataframes, possibly using regex

I have a subset of a much larger dataframe:

import pandas as pd

data = {'Name': ['car- at', 'car- ar', 'truck- br', 'car- at'],
        'SysNum': [1, 2, 3, 4 ],
            }
     Name     SysNum
0   car- at     1   
1   car- ar     2   
2   truck- br   3   
3   car- at     4

I need to be able to enter the numbers into the name after the – . The dash is in each name so I tried to use a couple Regex lines but I am unable to get it so far. The dataframe should look like:

    Name      SysNum    Count
0   car- at     1      car-1 at
1   car- ar     2      car-2 ar
2   truck- br   3      truck-3 br
3   car- at     4      car-4 at

I have used something similar before:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df['BaseName'] = [re.sub('(?<=-)\d{1,6}', '', s) for s in df['Name'].values]
df['SysNum'] = [int(re.search('(?<=-)\d{1,6}', s)[0]) for s in df['Name'].values]
df['NewName'] = df['BaseName'] + df['SysNum'].astype(str) 

This takes away the numbers after the – and then I add a new set of numbers back onto the end with the last line. I now have a set of letters at the end so I need to be able to put in a new set of numbers after the – and keep the letters on at the end.
A new solution to this or a way to use regex that looks at the – and stop at the next set of letters should work. Thanks.

>Solution :

You can use

df['Count'] = df.apply(lambda x: x['Name'].replace('- ', f"-{x['SysNum']} "), axis=1)
# => df['Count']
#    0      car-1 at
#    1      car-2 ar
#    2    truck-3 br
#    3      car-4 at
#    Name: Count, dtype: object

This is just replacing - + space with - + SysNum column value + space.

If you want to make sure you insert the values after the first word that has a - after it, you may use a regex:

df['Count'] = df.apply(lambda x: re.sub(r'^\w+-', fr"\g<0>{x['SysNum']}", x["Name"]), axis=1)
# => df
#         Name  SysNum       Count
# 0    car- at       1    car-1 at
# 1    car- ar       2    car-2 ar
# 2  truck- br       3  truck-3 br
# 3    car- at       4    car-4 at

Here, ^\w+- matches start of string (^), \w+ matches one or more letters/digits/_, and then - is matched. The replacement is the whole match value (\g<0>) + the value from the SysNum column.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading