I have created a list of sequence names and sequences from a FASTA file. Does anybody know how I can remove the ‘>’ character from the sequence names? I have tried using strip, replace, and map. Printing the names gives the following output:
>chrI
>chrII
>chrIII
where it should be:
chrI
chrII
chrIII
def read_fasta(fp):
    # Yield (name, sequence) pairs from an open FASTA file
    sequence_names, sequences = None, []
    for line in fp:
        line = line.rstrip()
        if line.startswith(">"):
            # New header line: emit the previous record, if there is one
            if sequence_names: yield (sequence_names, ''.join(sequences))
            sequence_names, sequences = line, []
        else:
            sequences.append(line)
    # Emit the final record
    if sequence_names: yield (sequence_names, ''.join(sequences))

with open('demo_fasta_file_2022.fas') as fp:
    for sequence_names, sequences in read_fasta(fp):
        print(sequence_names)
Solution:
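Just slice off the first character of the header string (here line stands for that string, e.g. sequence_names in your loop):
print(line[1:])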
If you are not sure the string starts with ‘>’, check before slicing:
if line.startswith(">"):
    print(line[1:])
else:
    print(line)
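As a rough sketch of how the slice could be applied inside the generator itself, so that every yielded name is already clean, something like the following might work. The in-memory `demo` data is a hypothetical stand-in for demo_fasta_file_2022.fas, and the variable names `name` and `chunks` are illustrative rather than taken from the question. Note that the emptiness check is changed to `is not None`, on the assumption that a bare ‘>’ header (which becomes an empty string after slicing) should still count as a record.

```python
import io


def read_fasta(fp):
    """Yield (name, sequence) pairs, with the leading '>' removed from names."""
    name, chunks = None, []
    for line in fp:
        line = line.rstrip()
        if line.startswith(">"):
            # Emit the previous record before starting a new one
            if name is not None:
                yield name, "".join(chunks)
            # line[1:] drops the '>' marker; strings are immutable,
            # so the sliced result has to be stored, not just computed
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    # Emit the final record
    if name is not None:
        yield name, "".join(chunks)


# Hypothetical sample data standing in for the real FASTA file
demo = io.StringIO(">chrI\nACGT\nTTGA\n>chrII\nGGCC\n>chrIII\nAATT\n")
records = list(read_fasta(demo))
for name, sequence in records:
    print(name)
```

With a real file, the same function can be called as in the question: open the file in a with block and iterate over read_fasta(fp).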