For example, I have a dataframe with columns like:
| lens | plain-prod 102 | plain-prod 105 | plain-prod 107 |
|---|---|---|---|
| First | 1 | 3 | 4 |
| Second | 2 | 5 | 3 |
| First | 3 | 7 | 2 |
| Second | 4 | 8 | 1 |
So I need to do pattern matching (`^plain-prod.*`), pick up all three columns matching that pattern, and create a new column `plain_sum` holding their row-wise sum. How can I achieve this using PySpark or pandas? The expected result:
| lens | plain-prod 102 | plain-prod 105 | plain-prod 107 | plain_sum |
|---|---|---|---|---|
| First | 1 | 3 | 4 | 8 |
| Second | 2 | 5 | 3 | 10 |
| First | 3 | 7 | 2 | 12 |
| Second | 4 | 8 | 1 | 13 |
> Solution:
Try this approach with pandas (`df` is your DataFrame):

```python
df['plain_sum'] = df.filter(regex='^plain-prod.*').sum(axis=1)
```
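A complete, runnable version of that one-liner, with the data from the question rebuilt as a DataFrame so you can verify the result:

```python
import pandas as pd

df = pd.DataFrame({
    "lens": ["First", "Second", "First", "Second"],
    "plain-prod 102": [1, 2, 3, 4],
    "plain-prod 105": [3, 5, 7, 8],
    "plain-prod 107": [4, 3, 2, 1],
})

# filter(regex=...) keeps only the columns whose *names* match the pattern;
# sum(axis=1) then adds the matched columns row-wise
df["plain_sum"] = df.filter(regex=r"^plain-prod.*").sum(axis=1)
print(df)
```

This prints the expected `plain_sum` values 8, 10, 12, 13. Note that `filter` here matches column labels, not row values.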
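Since the question also asks about PySpark, here is a sketch of the same idea there: select the matching column names with a regex, then add the columns with `withColumn`. The local `SparkSession` setup is only for the example.

```python
import re
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.master("local[1]").appName("plain_sum").getOrCreate()

df = spark.createDataFrame(
    [("First", 1, 3, 4), ("Second", 2, 5, 3),
     ("First", 3, 7, 2), ("Second", 4, 8, 1)],
    ["lens", "plain-prod 102", "plain-prod 105", "plain-prod 107"],
)

# Pick every column whose name matches the pattern
matched = [c for c in df.columns if re.match(r"^plain-prod.*", c)]

# Add the matched columns row-wise; df[c] handles names containing spaces
df = df.withColumn("plain_sum", sum((df[c] for c in matched), F.lit(0)))
df.show()
```

Unlike pandas, Spark has no built-in regex column filter, so the column names are matched in plain Python before building the sum expression.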