I want to extract specific columns that contain specific names. Below you can see my data
import numpy as np
import pandas as pd
data = {
'Names': ['Store (007) Total amount of Sales ',
'Store perc (65) Total amount of sales ',
'Mall store, aid (005) Total amount of sales',
'Increase in the value of sales / Additional seling (22) Total amount of sales',
'Dividends (0233) Amount of income tax',
'Other income (098) Total amount of Sales',
'Other income (0245) Amount of Income Tax',
],
'Sales':[10,10,9,7,5,5,5],
}
df = pd.DataFrame(data, columns = ['Names',
'Sales',
])
df
This data have some specific columns that I need to be selected in the separate data frame. Keywords for this selection are words Total amount of Sales or Total amount of sales . These words are placed after the second brackets ). Also please take into account that text is no trimmed so empty spaces are possible.
So can anybody help me how to solve this ?
>Solution :
Use Series.str.contains without test cases with case=False in boolean indexing:
df1 = df[df['Names'].str.contains('Total amount of Sales', case=False)]
print (df1)
Names Sales
0 Store (007) Total amount of Sales 10
1 Store perc (65) Total amount of sales 10
2 Mall store, aid (005) Total amount of sales 9
3 Increase in the value of sales / Additional se... 7
5 Other income (098) Total amount of Sales 5
Or if need test sales or Sales use:
df2 = df[df['Names'].str.contains('Total amount of [Ss]ales')]
