I have a list of strings from which I want to extract information such as amounts and percentages. I am new to regex and have been struggling with this. Below are my input, my desired output, and the code I tried.
Input list:
['0.09% of the first GBP£250 million of the Company’s Net Asset Value;',
'0.08% of the next GBP£250 million of the Company’s Net Asset Value;',
"0.06% of the next GBP£500 million of the Company's Net Asset Value; and",
'in accordance with the formula GBP£22,000 + 365, Minimum fee to be' ]
Code:
import re

def extract_pounds(text):
    regex = r"£(\w+)"
    return re.findall(regex, str(text))

for word in empty_df:
    pounds = extract_pounds(word)
    print(pounds)
I am getting the following output, which is far from my desired output:
['250']
['250']
['500']
['22']
['22']
Desired output:
Tier    Amount                  Minimum Fee
0.09%   first GBP£250 million   GBP£22,000
0.08%   next GBP£250 million
0.06%   next GBP£500 million
>Solution :
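With pandas, you can try something like this (lst is the input list from the question):
import re
import pandas as pd
# Capture the percentage and the "<first|next> GBP£<n> <unit>" amount.
pat = r"([\d.]+%) of the (\w+ GBP£\d+ \w+)"
# Every item except the last describes a tier.
df = pd.Series(lst[:-1]).str.extract(pat).set_axis(["Tier", "Amount"], axis=1)
# The last item holds the minimum fee; attach it to the first row.
df.loc[0, "Minimum Fee"] = re.search(r"GBP£\d+,\d+", lst[-1]).group(0)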
Output :
print(df)
    Tier                 Amount  Minimum Fee
0  0.09%  first GBP£250 million   GBP£22,000
1  0.08%   next GBP£250 million          NaN
2  0.06%   next GBP£500 million          NaN
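As for why the original attempt printed ['22']: \w matches letters, digits and underscores but not a comma, so "£(\w+)" stops at the comma in "£22,000". Below is a minimal sketch of the same extraction using only the re module. It assumes the tier lines always follow the "<pct>% of the <first|next> GBP£<n> <unit>" wording, and that any other line containing a GBP amount holds the minimum fee. The names AMOUNT, TIER and parse_fee_lines are made up for this illustration.

```python
import re

# Digits with optional thousands groups, so "GBP£22,000" is kept whole.
AMOUNT = r"GBP£\d+(?:,\d{3})*"
# Percentage, then "<first|next> GBP£<n> <unit>" (assumed wording of tier lines).
TIER = re.compile(r"([\d.]+%) of the (\w+ " + AMOUNT + r" \w+)")


def parse_fee_lines(lines):
    """Return (tier_rows, minimum_fee) extracted from the given strings."""
    rows = []
    minimum_fee = None
    for line in lines:
        tier = TIER.search(line)
        if tier:
            rows.append({"Tier": tier.group(1), "Amount": tier.group(2)})
            continue
        # Assumption: a non-tier line with a GBP amount is the minimum fee.
        fee = re.search(AMOUNT, line)
        if fee and minimum_fee is None:
            minimum_fee = fee.group(0)
    return rows, minimum_fee


lst = [
    "0.09% of the first GBP£250 million of the Company’s Net Asset Value;",
    "0.08% of the next GBP£250 million of the Company’s Net Asset Value;",
    "0.06% of the next GBP£500 million of the Company's Net Asset Value; and",
    "in accordance with the formula GBP£22,000 + 365, Minimum fee to be",
]

rows, minimum_fee = parse_fee_lines(lst)
print(re.findall(r"£(\w+)", lst[-1]))  # ['22'], the comma ends the \w+ match
print(rows)
print(minimum_fee)  # GBP£22,000
```

If you still want a DataFrame, pd.DataFrame(rows) gives the Tier and Amount columns, and the minimum fee can be attached to the first row as in the answer above.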