I’ve got a dataset that looks like this:0 DATE DECRIPTION DEBIT CREDIT BALANCE
1 28/04/2022 Rent Due 1,150.00 £ -£1,150.00
2 17/05/2022 Payment Received from Berns & Co 1,...
3 27/05/2022 Payment 1,150.00 £ £1,150.00
4 28/05/2022 Rent Due 1,150.00 £ £0.00
Essentially a series.
def clean_data(s):
if "Rent Due" in s:
print("Rent Due")
s = s.split("Rent Due")
elif "Payment Received from Berns & Co" in s:
print("Berns")
s = s.split("Payment Received from Berns & Co", )
elif "Payment" in s:
s = s.split("Payment")
print("Payment")
elif "Electricity Charges" in s:
s = s.split("Electricity Charges")
print(s)
return s
bv = data.apply(clean_data)
But I don’t know how to use the split function in apply to transform the data to columns.
The first row is supposed to be the column names
But I could do that using:
bv.columns = [‘DATE’, ‘DESCRIPTION’, ‘DEBIT’, ‘CREDIT’]
Can anyone help?
>Solution :
Use Series.str.split and Series.str.rsplit with n parameters:
#create one column DataFrame and remove first row
df = data.to_frame('data').iloc[1:]
df[['DATE','new']] = df['data'].str.split(n=1, expand=True)
df[['DECRIPTION','CREDIT','col','BALANCE']] = df.pop('new').str.rsplit(n=3, expand=True)
print (df)
data DATE \
1 28/04/2022 Rent Due 1,150.00 £ -£1,150.00 28/04/2022
2 17/05/2022 Payment Received from Berns & Co 1,... 17/05/2022
3 27/05/2022 Payment 1,150.00 £ £1,150.00 27/05/2022
4 28/05/2022 Rent Due 1,150.00 £ £0.00 28/05/2022
DECRIPTION CREDIT col BALANCE
1 Rent Due 1,150.00 £ -£1,150.00
2 Payment Received from Berns & Co 1,150.00 £ -£1,150.00
3 Payment 1,150.00 £ £1,150.00
4 Rent Due 1,150.00 £ £0.00