
Remove all newlines within each "block" of text

My goal is to have some sort of script that can turn "blocks" into single-line strings.
For example, turning this,

ジナ「あんまり、おそくならないようにね。
   さ、行ってらっしゃい。

ジナ「あっ、そうそう。
   はい、おこづかい。
   お祭り楽しんでらっしゃい。

into this.

ジナ「あんまり、おそくならないようにね。さ、行ってらっしゃい。

ジナ「あっ、そうそう。はい、おこづかい。お祭り楽しんでらっしゃい。

For an English example, turning this,

MOM: Run along now, and be back
   before dinner.

MOM: Oh, I almost forgot!
   Here's your allowance, dear!
   Have fun at the fair!

into this.

MOM: Run along now, and be back before dinner.

MOM: Oh, I almost forgot! Here's your allowance, dear! Have fun at the fair!

However, the English version adds an extra (and, for my purposes, unnecessary) requirement: a space has to be inserted wherever two lines are joined, which doesn't need to be done for the Japanese text. Treat it simply as a way of understanding what I want to happen.
I'm assuming I'd need a sed or awk script: I considered a regex, but it seems I'd need a more powerful tool. Any solution would be wonderful, though!

Solution:

Sounds like you want to change the output record separator (ORS) to two newlines and use a single space as the output field separator (OFS; a space is already its default, so setting it here just makes the intent explicit). So just do that:

$ cat input
MOM: Run along now, and be back
   before dinner.

MOM: Oh, I almost forgot!
   Here's your allowance, dear!
   Have fun at the fair!
$ awk '{$1=$1}1' RS= OFS=' ' ORS='\n\n'  input
MOM: Run along now, and be back before dinner.

MOM: Oh, I almost forgot! Here's your allowance, dear! Have fun at the fair!
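The same command should also cover the Japanese sample from the question if OFS is set to the empty string, so the lines are glued together with nothing between them. Below is a minimal sketch that wraps this in a shell function with the separator as a parameter; the name `join_blocks` and its argument order are made up for illustration. It assumes the continuation lines are indented with ASCII spaces or tabs (a full-width U+3000 space would not be stripped by awk's default field splitting), and with an empty separator any ASCII spaces inside a line disappear as well.

```shell
#!/bin/sh
# join_blocks SEP [FILE]   (hypothetical helper, not a standard command)
# Joins the lines of each blank-line-separated block into one line,
# putting SEP between the joined pieces. Reads FILE, or stdin if omitted.
#   RS=          paragraph mode: each block is one record
#   OFS="$1"     separator awk uses when it rebuilds the record
#   ORS='\n\n'   keep a blank line between the output blocks
#   {$1=$1}1     force the rebuild with OFS, then print the record
join_blocks() {
  awk '{$1=$1}1' RS= OFS="$1" ORS='\n\n' ${2+"$2"}
}

# Japanese sample: join with nothing in between.
join_blocks '' <<'EOF'
ジナ「あんまり、おそくならないようにね。
   さ、行ってらっしゃい。

ジナ「あっ、そうそう。
   はい、おこづかい。
   お祭り楽しんでらっしゃい。
EOF
```

Calling `join_blocks ' ' input` runs the same awk command as above, so it should give the English output shown there.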

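Setting RS to the empty string puts awk in "paragraph mode": a run of one or more empty lines separates records, so each "block" becomes a single record. Lines that contain only spaces or tabs do not count as separators. In this mode a newline always acts as a field separator too, so the assignment $1=$1 makes awk rebuild the record from its fields joined by OFS (which also drops the leading indentation), and the trailing 1 is an always-true pattern that prints the rebuilt record.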
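Because a line made only of spaces or tabs does not end a record here, input containing such "visually blank" lines would have neighbouring blocks merged. One way around that, sketched below under the assumption that those lines should count as separators, is to empty them with sed before awk reads the text. The helper name `join_blocks_ws` is hypothetical.

```shell
#!/bin/sh
# join_blocks_ws SEP [FILE]   (hypothetical helper, not a standard command)
# Same idea as the awk one-liner, but sed first empties every line that
# holds only spaces/tabs, so such lines also end a block.
join_blocks_ws() {
  sed 's/^[[:blank:]]*$//' ${2+"$2"} |
    awk '{$1=$1}1' RS= OFS="$1" ORS='\n\n'
}

# The middle input line holds three spaces. With the sed step the two
# sentences come out as separate blocks.
printf 'MOM: Oh, I almost forgot!\n   \nHave fun at the fair!\n' |
  join_blocks_ws ' '
```

gawk and mawk also accept a regular expression as RS, which could express the same rule, but that is an extension beyond POSIX awk; the sed step keeps the pipeline portable.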
