Let’s say I have this list of strings:
a = ['6306\nHLIAN\nVARIOUS',
'10215\nSPINA',
'10279\nPIPERI-\nΜYTER',
'38003\nCORN\nSWEET',
'10234ROKA',
'10232\nANTH',
'8682PIPER\nYPAITH',
'8676\nMAROYL',
'10211\nΚAROT\nROOT',
'8685AGG\nYPAU']
I want to remove the digits and keep only the first word of each string. So, the result I want is:
['HLIAN',
'SPINA',
'PIPERI',
'CORN',
'ROKA',
'ANTH',
'PIPER',
'MAROYL',
'ΚAROT',
'AGG']
I tried something like this:
from string import digits

def clean_list(data):
    remove_digits = str.maketrans('', '', digits)
    no_digs = [s.translate(remove_digits) for s in data]
    results = []
    for x in no_digs:
        if '\n' in x:
            if x.count('\n') == 2:
                results.append(x.split('\n')[-2])
            elif x.count('\n') == 1:
                results.append(x.split('\n')[1])
        else:
            results.append(x)
    return results
and I am receiving:
['HLIAN',
'SPINA',
'PIPERI-',
'CORN',
'ROKA',
'ANTH',
'YPAITH',
'MAROYL',
'ΚAROT',
'YPAU']
I can’t handle '8682PIPER\nYPAITH' and '8685AGG\nYPAU': they contain only one \n, but the word I want is attached directly to the digits, before the \n, so my code picks the word after it instead.
Also, it would be nice if 'PIPERI-' came back without the - symbol (that can be done in a later step, though).
>Solution :
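Just strip() the strings after you remove the digits, then take the first piece after splitting on \n. Once the digits are gone, the only thing that can come before the first word is a leading \n, and strip() removes it.
from string import digits

def clean_list(data):
    remove_digits = str.maketrans('', '', digits)
    # strip() drops the leading '\n' left behind once the digits are gone
    no_digs = [s.translate(remove_digits).strip() for s in data]
    # the first word is now everything before the first remaining '\n'
    results = [x.split('\n')[0] for x in no_digs]
    return results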
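To see why the strip() matters, here is a small sketch that traces two of the inputs through each step. The helper name trace is made up for illustration and is not part of the function above.

```python
from string import digits

remove_digits = str.maketrans('', '', digits)

def trace(s):
    """Return the intermediate values for one input string (illustrative helper)."""
    no_digits = s.translate(remove_digits)  # digits removed, newlines kept
    stripped = no_digits.strip()            # drops a leading '\n', if there is one
    first = stripped.split('\n')[0]         # text before the first remaining '\n'
    return no_digits, stripped, first

print(trace('10215\nSPINA'))       # ('\nSPINA', 'SPINA', 'SPINA')
print(trace('8682PIPER\nYPAITH'))  # ('PIPER\nYPAITH', 'PIPER\nYPAITH', 'PIPER')
```

In the first case the word follows a \n, in the second it follows the digits directly; after strip() both start with the word you want, so index 0 works for either layout.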
To get rid of the -, add replace('-', '') to the result of each split:
results = [x.split('\n')[0].replace('-', '') for x in no_digs]
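Putting it together, here is a minimal sketch that runs the function on the list from the question and checks it against the desired output. The second function, clean_list_re, is not part of the approach above: it is an alternative using a regular expression that I am assuming fits your data, since it simply grabs the first run of letters in each string.

```python
import re
from string import digits

a = ['6306\nHLIAN\nVARIOUS', '10215\nSPINA', '10279\nPIPERI-\nΜYTER',
     '38003\nCORN\nSWEET', '10234ROKA', '10232\nANTH', '8682PIPER\nYPAITH',
     '8676\nMAROYL', '10211\nΚAROT\nROOT', '8685AGG\nYPAU']

expected = ['HLIAN', 'SPINA', 'PIPERI', 'CORN', 'ROKA',
            'ANTH', 'PIPER', 'MAROYL', 'ΚAROT', 'AGG']

def clean_list(data):
    remove_digits = str.maketrans('', '', digits)
    no_digs = [s.translate(remove_digits).strip() for s in data]
    # first piece before '\n', with any '-' removed
    return [x.split('\n')[0].replace('-', '') for x in no_digs]

def clean_list_re(data):
    # Alternative (an assumption, not the approach above): take the first run
    # of letters. [^\W\d_] matches a word character that is neither a digit
    # nor an underscore, so it also covers the Greek letters in the data.
    # Raises AttributeError if a string contains no letters at all.
    return [re.search(r'[^\W\d_]+', s).group() for s in data]

assert clean_list(a) == expected
assert clean_list_re(a) == expected
```

The regex version stops at the - on its own, so it needs no separate replace step, but it would also cut a word at any other non-letter character.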