Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to create new column and add value before hyphen in a dataframe when splitting word with hyphen in another column of same dataframe?

I want to create a new column and add values before hyphen in a dataframe when splitting word with hyphen in another column of same dataframe.That is , I want to show only the value before the hyphen in the new column.

code:

data = {'month': ['April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December', 'January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December', 'January', 'February', 'March'],
    'kpi': ['D sales', 'D sales', 'D sales', 'D sales', 'D sales', 'D sales', 'D sales', 'D sales', 'D sales', 'D sales', 'K sales', 'K sales', 'S sales', 'S sales', 'S sales', 'S sales', 'S sales', 'S sales', 'S sales', 'S sales', 'S sales', 'S sales', 'S sales', 'S sales'],
    'product': ['ps', 'ss', 'ks', 'ds', 'ls', 'js', 'vs', 'es', 'rs',
            'os', 'Export-ps', 'Retail-is', 'Export-water', 'Retail-bottle', 'Export-salt', 'Farming-Fruits', 'Farming-Vegetables', 'Export-Tea',
            'Export-water', 'Retail-bottle', 'Retail-water', 'Farming-Vegetables', 'Farming-Vegetables', 'Export-salt'],
    'year': [2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2023, 2023, 2023, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2023, 2023, 2023]
    }
df = pd.DataFrame(data)
new = df["product"].str.split(" ", n = 1, expand = True)
df["group"]= new[0]
print(df)

Output :

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

        month      kpi             product  year               group
0       April  D sales                  ps  2022                  ps
1         May  D sales                  ss  2022                  ss
2        June  D sales                  ks  2022                  ks
3        July  D sales                  ds  2022                  ds
4      August  D sales                  ls  2022                  ls
5   September  D sales                  js  2022                  js
6     October  D sales                  vs  2022                  vs
7    November  D sales                  es  2022                  es
8    December  D sales                  rs  2022                  rs
9     January  D sales                  os  2023                  os
10   February  K sales           Export-ps  2023           Export-ps
11      March  K sales           Retail-is  2023           Retail-is
12      April  S sales        Export-water  2022        Export-water
13        May  S sales       Retail-bottle  2022       Retail-bottle
14       June  S sales         Export-salt  2022         Export-salt
15       July  S sales      Farming-Fruits  2022      Farming-Fruits
16     August  S sales  Farming-Vegetables  2022  Farming-Vegetables
17  September  S sales          Export-Tea  2022          Export-Tea
18    October  S sales        Export-water  2022        Export-water
19   November  S sales       Retail-bottle  2022       Retail-bottle
20   December  S sales        Retail-water  2022        Retail-water
21    January  S sales  Farming-Vegetables  2023  Farming-Vegetables
22   February  S sales  Farming-Vegetables  2023  Farming-Vegetables
23      March  S sales         Export-salt  2023         Export-salt

Expected output is:

    month      kpi             product  year           group
0       April  D sales                  ps  2022                  
1         May  D sales                  ss  2022                  
2        June  D sales                  ks  2022                  
3        July  D sales                  ds  2022                  
4      August  D sales                  ls  2022                  
5   September  D sales                  js  2022                  
6     October  D sales                  vs  2022                  
7    November  D sales                  es  2022                  
8    December  D sales                  rs  2022                 
9     January  D sales                  os  2023                  
10   February  K sales           Export-ps  2023       Export           
11      March  K sales           Retail-is  2023       Retail        
12      April  S sales        Export-water  2022       Export
13        May  S sales       Retail-bottle  2022       Retail
14       June  S sales         Export-salt  2022       Export
15       July  S sales      Farming-Fruits  2022       Farming
16     August  S sales  Farming-Vegetables  2022       Farming
17  September  S sales          Export-Tea  2022       Export
18    October  S sales        Export-water  2022       Export
19   November  S sales       Retail-bottle  2022       Retail
20   December  S sales        Retail-water  2022       Retail
21    January  S sales  Farming-Vegetables  2023       Farming
22   February  S sales  Farming-Vegetables  2023       Farming
23      March  S sales         Export-salt  2023       Export

can anyone suggest a solution to get the expected output?

>Solution :

Use Series.str.extract:

df["group"] = df["product"].str.extract(r'(.*)-')
# capturing everything before `-`, returning `NaN` if no match

print(df)

        month      kpi             product  year    group
0       April  D sales                  ps  2022      NaN
1         May  D sales                  ss  2022      NaN
2        June  D sales                  ks  2022      NaN
3        July  D sales                  ds  2022      NaN
4      August  D sales                  ls  2022      NaN
5   September  D sales                  js  2022      NaN
6     October  D sales                  vs  2022      NaN
7    November  D sales                  es  2022      NaN
8    December  D sales                  rs  2022      NaN
9     January  D sales                  os  2023      NaN
10   February  K sales           Export-ps  2023   Export
11      March  K sales           Retail-is  2023   Retail
12      April  S sales        Export-water  2022   Export
13        May  S sales       Retail-bottle  2022   Retail
14       June  S sales         Export-salt  2022   Export
15       July  S sales      Farming-Fruits  2022  Farming
16     August  S sales  Farming-Vegetables  2022  Farming
17  September  S sales          Export-Tea  2022   Export
18    October  S sales        Export-water  2022   Export
19   November  S sales       Retail-bottle  2022   Retail
20   December  S sales        Retail-water  2022   Retail
21    January  S sales  Farming-Vegetables  2023  Farming
22   February  S sales  Farming-Vegetables  2023  Farming
23      March  S sales         Export-salt  2023   Export

Chain Series.fillna if you want to replace the NaN values (e.g. with an empty string: .fillna('')).

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading