How to add one space character (without changing any other characters) to "one character strings" using awk, sed, or grep?

I obtained this text file using sed and awk (leap.log):

Template_frcmod
MASS

Pd 0.000         0.000 

BOND
Pd-c
Pd-3e
c-Pd
4p-ca
o-3e
n-3e
Pd-4e
3p-ca
o-4e
n-4e

ANGLE
Pd-c-Pd
Pd-3e-o
Pd-3e-n
Pd-1c-Pd
c-Pd-4p
c-Pd-3e
c-Pd-1c
c-Pd-3p
c-Pd-4e
4p-ca-ca
4p-Pd-3e
4p-Pd-1c
o-3e-n
3e-n-c3
3e-Pd-1c
ca-4p-ca
Pd-4e-o
Pd-4e-n
1c-Pd-4e
3p-ca-ca
3p-Pd-4e
o-4e-n
4e-n-c3
ca-3p-ca

DIHE

 Pd-4p-ca-ca
 Pd-3e-n-c3
 c-Pd-3e-o
 c-Pd-3e-n
 c-Pd-4e-o
 c-Pd-4e-n
 4p-Pd-3e-o
 4p-Pd-3e-n
 o-3e-n-c3
 o-3e-Pd-1c
 n-3e-Pd-1c
 ca-4p-ca-ca
 ca-ca-4p-ca
 Pd-3p-ca-ca
 Pd-4e-n-c3
 1c-Pd-4e-o
 1c-Pd-4e-n
 3p-Pd-4e-o
 3p-Pd-4e-n
 o-4e-n-c3
 ca-3p-ca-ca
 ca-ca-3p-ca

IMPROPER

NONBON

Now I have a problem with "one character" atom names:

c-Pd-4p

In this line, and in every other line that contains a one-character atom name, "c" must become two characters: "c " (with a trailing space):

c -Pd-4p

or in this line:
4e-n-c3 ("n" must be "n "): 4e-n -c3
or in this one:
"Pd-c" must be "Pd-c "
and so on. Every one-character atom name must become two characters by appending a space.

When I try to change "c" to "c ", "1c" also becomes "1c ":
Pd-1c-Pd -> Pd-1c -Pd
but I don't want to change two-character atom names; they must stay the same.
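A substitution that looks for `c` anywhere on the line will also hit the `c` inside `1c`. One way to avoid that is to require a boundary on both sides of the single character: the start of the line (after optional indentation) or a `-` on the left, and a `-` or the end of the line on the right. The sketch below only *lists* the lines such a pattern would touch, using `grep -E` (POSIX extended regex); the function name `list_one_char_lines` is hypothetical. Running it on a file after a fix should print nothing if every one-character name has been padded.

```shell
# Hypothetical helper: print lines that still contain a one-character atom
# name, i.e. a single non-blank, non-dash character preceded by the line start
# (after optional indentation) or "-", and followed by "-" or the line end.
# "1c" never matches: the character before "c" is "1", not a boundary.
# An already padded name ("c -Pd") does not match either, because "c" is
# followed by a space.
list_one_char_lines() {
  grep -E '(^[[:space:]]*|-)[^[:space:]-](-|$)' "$@"
}

# Usage (reads stdin when no file is given):
#   list_one_char_lines leap.log
```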

When I try this command:

awk 'BEGIN{FS="-"}{ if(length($2) == 1 ) $2= $2" " } {print $0}' leap.log

This time the "-" signs disappeared. What should I do to add a space to every one-character atom name?
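The dashes vanish because `FS` only controls how input is split: as soon as a field such as `$2` is assigned, awk rebuilds `$0` by joining the fields with the output field separator `OFS`, which defaults to a single space. A minimal sketch of the same idea, assuming (like the solution further down) that only atom-name lines contain a `-`: set `OFS` to `-` as well, test every field instead of only `$2`, and strip the indentation of the DIHE lines first so that ` c` is seen as `c`. The function name `pad_atoms_fs` is hypothetical.

```shell
# Hypothetical helper: append a space to every one-character, dash-separated
# atom name. Assumes only the atom-name lines contain a "-".
pad_atoms_fs() {
  awk '
    BEGIN { FS = OFS = "-" }       # split on "-" AND rejoin with "-"
    /-/ {
      sub(/^[ \t]+/, "")           # drop indentation; modifying $0 re-splits it
      for (i = 1; i <= NF; i++)
        if (length($i) == 1)       # one-character name such as c, n, o
          $i = $i " "              # assigning a field rebuilds $0 with OFS
    }
    { print }
  ' "$@"
}

# Usage (reads stdin when no file is given):
#   pad_atoms_fs leap.log
```

Lines without a `-` (section headers, the MASS entry, blank lines) pass through unchanged, and `Pd-1c-Pd` is left alone because no field has length 1.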

Expected result (the comments are only for this question; the real file will not contain comments):

Template_frcmod
MASS

Pd 0.000         0.000 

BOND
Pd-c  #Also the last "c" must be "c " 
Pd-3e
c -Pd
4p-ca
o -3e
n -3e
Pd-4e
3p-ca
o -4e
n -4e

ANGLE
Pd-c -Pd
Pd-3e-o 
Pd-3e-n 
Pd-1c-Pd
c -Pd-4p
c -Pd-3e
c -Pd-1c
c -Pd-3p
c -Pd-4e
4p-ca-ca
4p-Pd-3e
4p-Pd-1c
o -3e-n 
3e-n -c3
3e-Pd-1c
ca-4p-ca
Pd-4e-o 
Pd-4e-n 
1c-Pd-4e
3p-ca-ca
3p-Pd-4e
o -4e-n
4e-n -c3
ca-3p-ca

DIHE

Pd-4p-ca-ca
Pd-3e-n-c3
c -Pd-3e-o #Also the last "o" must be "o "
c -Pd-3e-n #Also the last "n" must be "n " 
c -Pd-4e-o #Also the last "o" must be "o "
c-Pd-4e-n  #Also the last "n" must be "n "  
4p-Pd-3e-o #Also the last "o" must be "o " 
4p-Pd-3e-n #Also the last "n" must be "n " 
o -3e-n-c3
o -3e-Pd-1c
n-3e-Pd-1c
ca-4p-ca-ca
ca-ca-4p-ca
Pd-3p-ca-ca
Pd-4e-n-c3
1c-Pd-4e-o
1c-Pd-4e-n
3p-Pd-4e-o
3p-Pd-4e-n
o -4e-n -c3
ca-3p-ca-ca
ca-ca-3p-ca

IMPROPER

NONBON

Solution:

Assumptions:

  • the lines of interest are the only lines that contain a -
  • each line of interest has exactly one field containing a -
  • every string delimited by - must be tested, and each one with length()==1 gets a space ( ) appended to it
  • leading white space in a line can be ignored/removed

One awk idea:

awk '
/-/ { n=split($1,arr,"-")                          # split field #1 into arr[] array based on "-" delimiter
      x=delim=""
      for (i=1;i<=n;i++) {                         # loop through array
          # piece together our new field
          x=x delim arr[i] ( length(arr[i]) == 1 ? " " : "")
          delim="-"
      }
      $1=x                                         # replace field #1 with value in variable "x"
    }
1
' leap.log

This generates:

Template_frcmod
MASS

Pd 0.000         0.000

BOND
Pd-c
Pd-3e
c -Pd
4p-ca
o -3e
n -3e
Pd-4e
3p-ca
o -4e
n -4e

ANGLE
Pd-c -Pd
Pd-3e-o
Pd-3e-n
Pd-1c-Pd
c -Pd-4p
c -Pd-3e
c -Pd-1c
c -Pd-3p
c -Pd-4e
4p-ca-ca
4p-Pd-3e
4p-Pd-1c
o -3e-n
3e-n -c3
3e-Pd-1c
ca-4p-ca
Pd-4e-o
Pd-4e-n
1c-Pd-4e
3p-ca-ca
3p-Pd-4e
o -4e-n
4e-n -c3
ca-3p-ca

DIHE

Pd-4p-ca-ca
Pd-3e-n -c3
c -Pd-3e-o
c -Pd-3e-n
c -Pd-4e-o
c -Pd-4e-n
4p-Pd-3e-o
4p-Pd-3e-n
o -3e-n -c3
o -3e-Pd-1c
n -3e-Pd-1c
ca-4p-ca-ca
ca-ca-4p-ca
Pd-3p-ca-ca
Pd-4e-n -c3
1c-Pd-4e-o
1c-Pd-4e-n
3p-Pd-4e-o
3p-Pd-4e-n
o -4e-n -c3
ca-3p-ca-ca
ca-ca-3p-ca

IMPROPER

NONBON
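Since the question also allows `sed`, here is a sketch of the same transformation with `sed -E`, assuming GNU sed (or another sed that accepts a label inside a multi-line `{ }` block). It uses the same boundary idea as above: a single non-blank, non-dash character between the line start or a `-` and a `-` or the line end. The substitution is repeated with a `t` loop rather than a `g` flag, so that two one-character names sharing one dash (e.g. a hypothetical `c-o`) are both padded; with `g`, the shared dash would be consumed by the first match. The function name `pad_atoms_sed` is hypothetical.

```shell
# Hypothetical helper using sed -E (ERE). Only lines containing "-" are
# touched, in line with the assumptions listed in the solution above.
pad_atoms_sed() {
  sed -E '
    /-/ {
      s/^[[:space:]]+//
      :pad
      s/(^|-)([^[:space:]-])(-|$)/\1\2 \3/
      t pad
    }
  ' "$@"
}

# Usage (reads stdin when no file is given):
#   pad_atoms_sed leap.log
```

The loop terminates because each successful substitution puts a space right after the padded character, so that position can no longer match.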
Discover more from Dev solutions