Sample dataset I have looks like this:
| Language | Count |
|---|---|
| Russian | 1000 |
| English | 1500 |
| Spanish | 500 |
| Arabic,Russian, English, Spanish | 2 |
| Arabic, English | 15 |
I want it to transform so that the result looks like this:
| Language | Count |
|---|---|
| Russian | 1002 |
| English | 1517 |
| Spanish | 502 |
| Arabic | 17 |
So what happened is that, I parsed rows that contained more than one language. Added up them to languages that were already given. If it was not given (in this case: Arabic) created the new one.
How can i achieve this?
Thank you!
>Solution :
Use DataFrame.assign with Series.str.split, DataFrame.explode and last aggregate sum:
df = (df.assign(Language=df.Language.str.split(','))
.explode('Language')
.groupby('Language', as_index=False, sort=False)
.sum())
print (df)
Language Count
0 Russian 1002
1 English 1517
2 Spanish 502
3 Arabic 17