How to conditionally replace words with sed?

November 1, 2023

My file is in the form:

EMPLOYEE
  FIRST NAME: JOHN
  LAST NAME: DOE
  POSITION: ACCOUNT MANAGER
  
EMPLOYEE
  FIRST NAME: BIG
  LAST NAME: BOSS
  POSITION: CEO

The real file is a bit more complex than that, but a solution for this sample is enough.

I am trying to convert the values to title case while keeping the alignment and field names unchanged:

EMPLOYEE
  FIRST NAME: John
  LAST NAME: Doe
  POSITION: Account Manager
  
EMPLOYEE
  FIRST NAME: Big
  LAST NAME: Boss
  POSITION: CEO

I have used this so far:

sed -E '/^\s{0,}(FIRST NAME|LAST NAME|POSITION)/ { s/((^\s{0,})(FIRST NAME|LAST NAME|POSITION))/\1/; T; s/(\b[A-Za-z])([A-Za-z]*)\b/\U\1\L\2/g; }' employees.list

But it does not leave the field names (FIRST NAME, LAST NAME, POSITION) alone, so these become:

EMPLOYEE
  First Name: John
  Last Name: Doe
  Position: Account Manager
  
EMPLOYEE
  First Name: Big
  Last Name: Boss
  Position: Ceo

(I have not yet handled values like CEO, which should stay uppercase.)

Is this achievable with sed? If so, how?

>Solution :

{0,}? Just use *.

The hard part is that you want to apply the "first uppercase, rest lowercase" regex to only part of the line. What I usually do is insert a newline after the prefix, save the whole line to the hold space, and then remove the prefix from the pattern space. Then I can work on the interesting part, and finally append the hold space and reshuffle the output.

sed -E '
    /: CEO/{p;d}
    /^(\s*(FIRST NAME|LAST NAME|POSITION):\s*)/{
        # empty s// reuses last regex
        # add a newline between <this>: <and this>
        s//\1\n/
        # hold current line with the newline
        h
        # Remove the first part.
        # `\s*` in regex above super nicely "catches" newline added above.
        s///
        # capitalize
        s/\b([A-Za-z])([A-Za-z]*)\b/\U\1\L\2/g
        # append the hold space to the pattern space, separated by a newline
        G
        # use the capitalized part with the <prefix:> part.
        s/([^\n]*)\n([^\n]*).*/\2\1/
    }
' employees.list

Outputs:

EMPLOYEE
  FIRST NAME: John
  LAST NAME: Doe
  POSITION: Account Manager
  
EMPLOYEE
  FIRST NAME: Big
  LAST NAME: Boss
  POSITION: CEO

Overall, consider a real programming language for this, such as awk or Python.
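As a rough illustration of that suggestion, here is a minimal awk sketch wrapped in a shell function. The name fix_case is made up for this example. It assumes the three field labels from the question, that the words in a value are separated by single spaces, and that CEO is the only value that should stay uppercase.

```shell
# Hypothetical helper: title-case only the value after a known field label.
fix_case() {
    awk '
    # Only touch lines that start with one of the three field labels.
    match($0, /^[ \t]*(FIRST NAME|LAST NAME|POSITION):[ \t]*/) {
        prefix = substr($0, 1, RLENGTH)   # indentation, label, colon, spaces
        value  = substr($0, RLENGTH + 1)  # everything after the prefix
        if (value != "CEO") {             # assumption: CEO stays uppercase
            n = split(value, words, " ")
            value = ""
            for (i = 1; i <= n; i++) {
                # First letter uppercase, remaining letters lowercase.
                w = toupper(substr(words[i], 1, 1)) tolower(substr(words[i], 2))
                value = value (i > 1 ? " " : "") w
            }
        }
        $0 = prefix value
    }
    # Print every line, changed or not.
    { print }
    ' "$@"
}
```

It can be called as fix_case employees.list, or used at the end of a pipeline. Because the prefix is copied through untouched, the field names and the EMPLOYEE lines never need to be re-uppercased.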

Actually, you can title-case every word on the line and then re-uppercase the field-name prefix. You only have to make sure the EMPLOYEE line is excluded, which happens here because the address matches only the field lines. So you can just do this:

sed -E '
    /: CEO/{p;d}
    /^(\s*(FIRST NAME|LAST NAME|POSITION):\s*)(.*)/{
        s/\b([A-Za-z])([A-Za-z]*)\b/\U\1\L\2/g
        s/^(\s*(FIRST NAME|LAST NAME|POSITION):\s*)(.*)/\U\1\E\3/i
    }
' employees.list