I want to produce the following table in pandas. So far I only have the description column; I want to split it on the comma and put the part before the comma in a new commondescrip column.
| description | commondescrip |
|---|---|
| 00001 | 00001 |
| 00002 | 00002 |
| 00003,Area01 | 00003 |
| 00004 | 00004 |
| 00005,Area02 | 00005 |
I tried
splitword = df2["description"].str.split(",", n=1, expand = True)
df2["commondescrip"] = splitword[0]
but it gives me NaN for the rows that contain an Area suffix.
How can I fix this so that commondescrip holds the text before the comma, as in the table above?
>Solution :
Don’t split: splitting produces several parts that you then have to handle, while you are only interested in one. Instead, either remove the part you don’t want or extract the part you do.
Remove everything after the first comma:
df['commondescrip'] = df['description'].str.replace(',.*', '', regex=True)
Or extract everything before the first comma (expand=False makes str.extract return a Series rather than a one-column DataFrame):
df['commondescrip'] = df['description'].str.extract('([^,]+)', expand=False)
Output:
    description commondescrip
0         00001         00001
1         00002         00002
2  00003,Area01         00003
3         00004         00004
4  00005,Area02         00005
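For anyone who wants to try both approaches end to end, here is a minimal, self-contained sketch. The sample frame is an assumption: it is rebuilt from the table in the question, with every value stored as a string so the leading zeros are preserved. The variable names `removed` and `extracted` are only illustrative. The extraction pattern here is anchored with `^` and uses `*` instead of `+`, a small variation on the pattern above so that a value starting with a comma would yield an empty string rather than the text after the comma.

```python
import pandas as pd

# Assumed sample data, mirroring the table in the question.
# Values are strings so that leading zeros are kept.
df = pd.DataFrame(
    {"description": ["00001", "00002", "00003,Area01", "00004", "00005,Area02"]}
)

# Approach 1: delete the first comma and everything after it.
removed = df["description"].str.replace(",.*", "", regex=True)

# Approach 2: capture the run of non-comma characters at the start.
# expand=False returns a Series instead of a one-column DataFrame.
extracted = df["description"].str.extract(r"^([^,]*)", expand=False)

# Either result can be assigned as the new column.
df["commondescrip"] = removed
print(df)
```

On this sample data both approaches should produce the same column, so the choice between them is mostly a matter of taste.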