How to substitute a sequence of the same character and of variable length in bash?

I am quite sure this question already has an answer, but I can't find it, so if there is one, please link it in the comments.

Otherwise, the problem I need to solve is how to replace a run of the same character, which may occur one or more times with no known maximum length, with a single character, so that the strings end up organised with a known delimiter.

Also, in my specific case I have to replace the * character, but I can preprocess the file to swap it for an easier-to-handle character.

My current attempt is a rather bad solution: it assumes that the maximum length of the run is known, which of course is not true.

cat example_file.txt | sed 's/\*\*\*\*\*\*\*\*/_/g' | sed 's/\*\*\*\*\*\*\*/_/g' | sed 's/\*\*\*\*\*\*/_/g' | sed 's/\*\*\*\*\*/_/g' | sed 's/\*\*\*\*/_/g' | sed 's/\*\*\*/_/g' | sed 's/\*\*/_/g' | sed 's/\*/_/g' > clean_file.txt

with example_file.txt containing something like:

>SH1111056.09FU|KC881085_refs|k__Fungi;p__Ascomycota;c__Sordariomycetes;o__Hypocreales;f__Clavicipitaceae;g__Neotyphodium;s__Neotyphodium_siegelii;|foliar_endophyte*litter_saprotroph*class1_clavicipitaceous_endophyte**leaf/fruit/seed**non-aquatic*arthropod-associated*filamentous_mycelium******
>SH1115797.09FU|UDB031565_refs|k__Fungi;p__Basidiomycota;c__Agaricomycetes;o__Hymenochaetales;f__Hymenochaetaceae;g__Fomitiporia;s__Fomitiporia_hippophaeicola;|plant_pathogen*wood_saprotroph**wood_pathogen*wood*white_rot*non-aquatic**filamentous_mycelium*polyporoid*poroid****
>SH0879139.09FU|KF945456|k__Viridiplantae;p__Anthophyta;c__Eudicotyledonae;o__Lamiales;f__Acanthaceae;g__Ruellia;s__Ruellia_brandbergensis;|ND**************
>SH0991532.09FU|UDB07658019|k__Fungi;p__Ascomycota;c__Dothideomycetes;o__Venturiales;f__Venturiaceae;g__Sympodiella;s__Sympodiella_sp;|litter_saprotroph****leaf/fruit/seed**non-aquatic**filamentous_mycelium******
>SH0991546.09FU|UDB07657573|k__Fungi;p__Ascomycota;c__Dothideomycetes;o__Venturiales;f__Venturiaceae;g__Sympodiella;s__Sympodiella_sp;|litter_saprotroph****leaf/fruit/seed**non-aquatic**filamentous_mycelium******

EDIT:

The expected output, assuming * is replaced with _, would be:

>SH1111056.09FU|KC881085_refs|k__Fungi;p__Ascomycota;c__Sordariomycetes;o__Hypocreales;f__Clavicipitaceae;g__Neotyphodium;s__Neotyphodium_siegelii;|foliar_endophyte_litter_saprotroph_class1_clavicipitaceous_endophyte_leaf/fruit/seed_non-aquatic_arthropod-associated_filamentous_mycelium_
>SH1115797.09FU|UDB031565_refs|k__Fungi;p__Basidiomycota;c__Agaricomycetes;o__Hymenochaetales;f__Hymenochaetaceae;g__Fomitiporia;s__Fomitiporia_hippophaeicola;|plant_pathogen_wood_saprotroph_wood_pathogen_wood_white_rot_non-aquatic_filamentous_mycelium_polyporoid_poroid_
>SH0879139.09FU|KF945456|k__Viridiplantae;p__Anthophyta;c__Eudicotyledonae;o__Lamiales;f__Acanthaceae;g__Ruellia;s__Ruellia_brandbergensis;|ND_
>SH0991532.09FU|UDB07658019|k__Fungi;p__Ascomycota;c__Dothideomycetes;o__Venturiales;f__Venturiaceae;g__Sympodiella;s__Sympodiella_sp;|litter_saprotroph_leaf/fruit/seed_non-aquatic_filamentous_mycelium_
>SH0991546.09FU|UDB07657573|k__Fungi;p__Ascomycota;c__Dothideomycetes;o__Venturiales;f__Venturiaceae;g__Sympodiella;s__Sympodiella_sp;|litter_saprotroph_leaf/fruit/seed_non-aquatic_filamentous_mycelium_

>Solution :

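Use tr with the -s (squeeze-repeats) option. It translates every * to _ and then collapses each run of the resulting character into a single occurrence: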

tr -s '*' '_' < example_file.txt > clean_file.txt  

or

cat example_file.txt | tr -s '*' '_' > clean_file.txt

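The output is shown here. Note that because tr -s squeezes repeats of the replacement character after translating, runs of _ that were already in the input (such as k__Fungi) are also collapsed to a single _, which differs from the expected output in the question: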

>SH1111056.09FU|KC881085_refs|k_Fungi;p_Ascomycota;c_Sordariomycetes;o_Hypocreales;f_Clavicipitaceae;g_Neotyphodium;s_Neotyphodium_siegelii;|foliar_endophyte_litter_saprotroph_class1_clavicipitaceous_endophyte_leaf/fruit/seed_non-aquatic_arthropod-associated_filamentous_mycelium_
>SH1115797.09FU|UDB031565_refs|k_Fungi;p_Basidiomycota;c_Agaricomycetes;o_Hymenochaetales;f_Hymenochaetaceae;g_Fomitiporia;s_Fomitiporia_hippophaeicola;|plant_pathogen_wood_saprotroph_wood_pathogen_wood_white_rot_non-aquatic_filamentous_mycelium_polyporoid_poroid_
>SH0879139.09FU|KF945456|k_Viridiplantae;p_Anthophyta;c_Eudicotyledonae;o_Lamiales;f_Acanthaceae;g_Ruellia;s_Ruellia_brandbergensis;|ND_
>SH0991532.09FU|UDB07658019|k_Fungi;p_Ascomycota;c_Dothideomycetes;o_Venturiales;f_Venturiaceae;g_Sympodiella;s_Sympodiella_sp;|litter_saprotroph_leaf/fruit/seed_non-aquatic_filamentous_mycelium_
>SH0991546.09FU|UDB07657573|k_Fungi;p_Ascomycota;c_Dothideomycetes;o_Venturiales;f_Venturiaceae;g_Sympodiella;s_Sympodiella_sp;|litter_saprotroph_leaf/fruit/seed_non-aquatic_filamentous_mycelium_
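
If the existing double underscores (k__Fungi, p__Ascomycota, ...) need to survive, as in the expected output of the question, one alternative is to let sed match the whole run of asterisks in a single pattern. The following is a minimal sketch, not part of the original answer; the function name squeeze_stars is a hypothetical helper introduced only for illustration. It relies on the POSIX basic regular expression interval \{1,\}, meaning "one or more of the preceding character".

```shell
#!/bin/sh
# squeeze_stars (hypothetical helper name): read stdin and replace every run
# of one or more literal '*' characters with a single '_'.
# '\*' is a literal asterisk; '\{1,\}' repeats it one or more times (POSIX BRE).
# Underscores already present in the input are left untouched.
squeeze_stars() {
    sed 's/\*\{1,\}/_/g'
}

# Small demonstration on shortened lines in the style of the question's data.
printf '%s\n' 'k__Fungi;|ND**************' 'a*b**c***' | squeeze_stars
# prints:
# k__Fungi;|ND_
# a_b_c_
```

On the files from the question this would be used as squeeze_stars < example_file.txt > clean_file.txt. Unlike tr -s '*' '_', this only touches the asterisks, so the taxonomy prefixes keep their double underscores.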