Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Replace part of pandas row and make a new column

I have the below pandas dataframe.

d = {'col1': [1, 2,3,4,5,60,0,0,6,3,2,4],'col3': [1, 22,33,44,55,60,1,5,6,3,2,4],'Name': ['22a| df a1asd_V1', 'xcd a2a_sd_V3','23vg aa_bsd_V1','22a| df a1asd_V1|5mo','a3as  d_V1','aa b_12mo','aasd_V4','aa_6mo_bsd','aa_adn sd_V15',np.nan,'aasd_V12','aasd120Abs'],'Date': ['2021-06-13', '2021-06-13','2021-06-13','2021-06-14','2021-06-15','2021-06-15','2021-06-13','2021-06-16','2021-06-13','2021-06-13','2021-06-13','2021-06-16']}
dff = pd.DataFrame(data=d)
dff
    col1    col3    Name                    Date        
0   1       1       22a| df a1asd_V1        2021-06-13
1   2       22      xcd a2a_sd_V3           2021-06-13
2   3       33      23vg aa_bsd_V1          2021-06-13
3   4       44      22a| df a1asd_V1|5mo    2021-06-14
4   5       55      a3as d_V1               2021-06-15
5   60      60      aa b_12mo               2021-06-15
6   0       1       aasd_V4                 2021-06-13
7   0       5       aa_6mo_bsd              2021-06-16
8   6       6       aa_adn sd_V15           2021-06-13
9   3       3       NaN                     2021-06-13
10  2       2       aasd_V12                2021-06-13
11  4       4       aasd120Abs              2021-06-16

I want to replace _, | into space and if there is like 5mo, 6mo, 12mo.. into 5 months, 6 months, 12, months like that for Name column and make a new column called New Name. Like below data frame.

    col1    col3    Name                    Date             NewName
0   1       1       22a| df a1asd_V1        2021-06-13      22a  df a1asd V1
1   2       22      xcd a2a_sd_V3           2021-06-13      xcd a2a sd V3
2   3       33      23vg aa_bsd_V1          2021-06-13      23vg aa bsd V1
3   4       44      22a| df a1asd_V1|5mo    2021-06-14      22a df a1asd V1 5 months
4   5       55      a3as d_V1               2021-06-15      a3as d V1
5   60      60      aa b_12mo               2021-06-15      aa b 12 months  
6   0       1       aasd_V4                 2021-06-13      aasd V4     
7   0       5       aa_6mo_bsd              2021-06-16      aa 6 months bsd 
8   6       6       aa_adn sd_V15           2021-06-13      aa adn sd V15
9   3       3       NaN                     2021-06-13      NaN 
10  2       2       aasd_V12                2021-06-13      aasd V12
11  4       4       aasd120Abs              2021-06-16      aasd120Abs

Is it possible to do it in the lambda function? Since my actual data frame has more than 1million records I need something much efficient to work.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Thanks in advance! Any idea would be appriciate.

>Solution :

You can use pd.Series.replace:

print (df["Name"].replace({"[|_]":" ", "(\d+)mo":"\\1 months"}, regex=True))

0              22a  df a1asd V1
1                 xcd a2a sd V3
2                23vg aa bsd V1
3     22a  df a1asd V1 5 months
4                    a3as  d V1
5                aa b 12 months
6                       aasd V4
7               aa 6 months bsd
8                 aa adn sd V15
9                           NaN
10                     aasd V12
11                   aasd120Abs
Name: Name, dtype: object
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading