How to conditionally replace words with sed?

My file is in the form:

EMPLOYEE
  FIRST NAME: JOHN
  LAST NAME: DOE
  POSITION: ACCOUNT MANAGER
  
EMPLOYEE
  FIRST NAME: BIG
  LAST NAME: BOSS
  POSITION: CEO

The real file is a bit more complex than that, but a solution for this form is enough.

I am trying to convert the values to title case while keeping the alignment and the field names unchanged:

EMPLOYEE
  FIRST NAME: John
  LAST NAME: Doe
  POSITION: Account Manager
  
EMPLOYEE
  FIRST NAME: Big
  LAST NAME: Boss
  POSITION: CEO

I have used this so far:

sed -E '/^\s{0,}(FIRST NAME|LAST NAME|POSITION)/ { s/((^\s{0,})(FIRST NAME|LAST NAME|POSITION))/\1/; T; s/(\b[A-Za-z])([A-Za-z]*)\b/\U\1\L\2/g; }' employees.list

But it also changes the casing of the field names (FIRST NAME, LAST NAME, POSITION), so the output becomes:

EMPLOYEE
  First Name: John
  Last Name: Doe
  Position: Account Manager
  
EMPLOYEE
  First Name: Big
  Last Name: Boss
  Position: Ceo 

(I have not yet handled content like CEO.)

Is this achievable with sed? If so, how?

>Solution :

There is no need for {0,}; it means the same as *, so just use *.
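A quick way to check that the two quantifiers behave the same is to run both on one sample line. This is only a small sketch and assumes GNU sed, since \s is a GNU extension:

```shell
# Both commands wrap the leading whitespace in brackets and print: [   ]x
printf '   x\n' | sed -E 's/^\s{0,}/[&]/'
printf '   x\n' | sed -E 's/^\s*/[&]/'
```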

The hard part is that you want to apply the "first letter uppercase, rest lowercase" substitution to only part of the line. What I usually do is insert a newline at the split point, copy the line into the hold space, and then remove the part I want to protect. Then I can work on the interesting part, and finally append the hold space and reshuffle the output.

sed -E '
    /: CEO/{p;d}
    /^(\s*(FIRST NAME|LAST NAME|POSITION):\s*)/{
        # empty s// reuses last regex
        # add a newline between <this>: <and this>
        s//\1\n/
        # hold current line with the newline
        h
        # Remove the first part.
        # `\s*` in regex above super nicely "catches" newline added above.
        s///
        # capitalize
        s/\b([A-Za-z])([A-Za-z]*)\b/\U\1\L\2/g
        # join with a newline and hold space
        G
        # use the capitalized part with the <prefix:> part.
        s/([^\n]*)\n([^\n]*).*/\2\1/
    }
' employees.list

Outputs:

EMPLOYEE
  FIRST NAME: John
  LAST NAME: Doe
  POSITION: Account Manager
  
EMPLOYEE
  FIRST NAME: Big
  LAST NAME: Boss
  POSITION: CEO

Overall, consider using a real programming language for this, such as awk or Python.
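As a rough illustration of the awk route, here is a minimal sketch. It assumes the field names are exactly the three from the question and that the set of acronyms to leave alone is known in advance (only CEO here). The function name title_case_values is made up for this example, and the input is fed through printf so the block runs on its own:

```shell
# Hypothetical helper: title-case the value part of "FIELD: VALUE" lines read from stdin.
title_case_values() {
    awk '
    # Only touch lines that start with one of the three known field names.
    match($0, /^[ \t]*(FIRST NAME|LAST NAME|POSITION):[ \t]*/) {
        prefix = substr($0, 1, RLENGTH)   # indentation, field name, colon, spacing
        n = split(substr($0, RLENGTH + 1), words, " ")
        out = ""
        for (i = 1; i <= n; i++) {
            w = words[i]
            # Assumed acronym list: these words stay fully upper case.
            if (w != "CEO")
                w = toupper(substr(w, 1, 1)) tolower(substr(w, 2))
            out = out (i > 1 ? " " : "") w
        }
        print prefix out
        next
    }
    { print }   # EMPLOYEE lines and blank lines pass through unchanged
    '
}

printf '%s\n' 'EMPLOYEE' '  FIRST NAME: JOHN' '  POSITION: ACCOUNT MANAGER' '  POSITION: CEO' |
    title_case_values
```

Because the value is split on whitespace and rejoined with single spaces, runs of spaces inside a value are collapsed and trailing whitespace is dropped; the prefix, including its alignment, is copied through untouched.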


Actually, you can capitalize all words and then just re-uppercase the first part, but you have to exclude the EMPLOYEE line, which the address on the field names takes care of. So you can just do this:

sed -E '
    /: CEO/{p;d}
    /^(\s*(FIRST NAME|LAST NAME|POSITION):\s*)(.*)/{
        s/\b([A-Za-z])([A-Za-z]*)\b/\U\1\L\2/g
        s/^(\s*(FIRST NAME|LAST NAME|POSITION):\s*)(.*)/\U\1\E\3/i
    }
' employees.list
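
One possible refinement, sketched here under assumptions: instead of skipping every line that contains ": CEO", title-case everything and then restore known acronyms word by word, so a value such as CEO ASSISTANT is still handled. This assumes GNU sed (\s, \b, \U and the I flag are extensions); the function name fix_case and the one-entry acronym list are made up for the example:

```shell
# Hypothetical helper: reads the employee list from stdin.
fix_case() {
    sed -E '
    /^(\s*(FIRST NAME|LAST NAME|POSITION):\s*)(.*)/{
        # Title-case every word on the line, field name included.
        s/\b([A-Za-z])([A-Za-z]*)\b/\U\1\L\2/g
        # Put the field name back in upper case (I makes the match case-insensitive).
        s/^(\s*(FIRST NAME|LAST NAME|POSITION):\s*)(.*)/\U\1\E\3/I
        # Restore acronyms from an assumed list (only CEO appears in the question).
        s/\b(Ceo)\b/\U\1/g
    }
    '
}

printf '%s\n' '  POSITION: CEO ASSISTANT' '  FIRST NAME: BIG' | fix_case
# prints:
#   POSITION: CEO Assistant
#   FIRST NAME: Big
```

Note that this would also upper-case a name that happens to be spelled "Ceo"; restricting the last substitution to POSITION lines would avoid that.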