Delete pattern from a line

I have a file containing a list of websites, quoted in this format:

      "ProxyHost": "ie.review.visa.com",
      "ProxyHost": "ocasta.zendesk.com",
      "ProxyHost": "dev.zemanta.com",
      "ProxyHost": "bharian.api.useinsider.com",
      "ProxyHost": "optout.service.mycard.visa.com",
      "ProxyHost": "ir.newrelic.com",
      "ProxyHost": "metabase.yoast.com",
4:      "ProxyHost": "designdiscoveryya.gsd.harvard.edu",
18:      "ProxyHost": "pls.law.harvard.edu",
32:      "ProxyHost": "view.jquery.com",
46:      "ProxyHost": "www.rmf.harvard.edu",
60:      "ProxyHost": "execed.sph.harvard.edu",
74:      "ProxyHost": "note.microsoft.com",
102:      "ProxyHost": "librarylab.law.harvard.edu",
116:      "ProxyHost": "api.jquery.com",
130:      "ProxyHost": "pmsdn.

The goal is to remove everything at the start of each line up to and including : " and, if possible, also delete the ", at the end of the line. It might take two passes, but let’s focus on the main problem. The expected result would look something like this:

librarylab.law.harvard.edu
or
librarylab.law.harvard.edu",

Any leftover ", can be deleted easily using search and replace. Here’s what I have tried:

sed "s/^*\:$//"
sed "s/^.*\:$//"
sed "/^.*#\://"
sed "s/^*\:$/d"
sed "s/^.*\:$/d"
sed -e "/^*/,s/\:/d"
and so-on...

None of the above changes the target file. I’m honestly confused; here’s what I understand:

  • ^* or ^.* : match any leading string.
  • \:$ or #\: : match the : that ends that string.
  • /d or // : delete the match.

Any help would be cherished.

Solution:

If the missing closing ", on the last line is a typo, then you might use

sed -E 's~^([0-9]+:)?[[:space:]]*"[^"]*":[[:space:]]*"([^"]+)",?$~\2~' file

In the replacement, use \2 (capture group 2), because capture group 1 is taken by the optional part at the beginning.

The pattern matches:

  • ^ Start of string
  • ([0-9]+:)? Optionally capture 1+ digits and : in group 1
  • [[:space:]]* Match optional whitespace characters
  • "[^"]*" Match a double-quoted string, from the opening " to the next " (here, the key "ProxyHost")
  • :[[:space:]]* Match : and optional whitespace characters
  • "([^"]+)" Match ", capture everything between the double quotes in group 2, then match the closing "
  • ,? Match an optional comma
  • $ End of string
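
To try this without touching the real file, here is a minimal sketch that runs the same substitution on three lines copied from the question. The variable names input and result are only for illustration.

```shell
# Three sample lines from the question, with and without a leading "N:" prefix.
input='      "ProxyHost": "ie.review.visa.com",
4:      "ProxyHost": "designdiscoveryya.gsd.harvard.edu",
102:      "ProxyHost": "librarylab.law.harvard.edu",'

# Same substitution as above: the whole line is replaced by capture group 2.
result=$(printf '%s\n' "$input" |
  sed -E 's~^([0-9]+:)?[[:space:]]*"[^"]*":[[:space:]]*"([^"]+)",?$~\2~')

printf '%s\n' "$result"
```

A line that does not fit the pattern from start to end is printed unchanged, so malformed lines stay visible in the output.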

Output

ie.review.visa.com
ocasta.zendesk.com
dev.zemanta.com
bharian.api.useinsider.com
optout.service.mycard.visa.com
ir.newrelic.com
metabase.yoast.com
designdiscoveryya.gsd.harvard.edu
pls.law.harvard.edu
view.jquery.com
www.rmf.harvard.edu
execed.sph.harvard.edu
note.microsoft.com
librarylab.law.harvard.edu
api.jquery.com
pmsdn. 
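
If the last line really is cut off in the file, a looser two-step variant may be closer to what the question literally asks for: delete everything up to the last : " and then delete a trailing ", if there is one. This is a sketch of an alternative, not the command above, and it assumes the host name itself never contains a colon followed by a double quote.

```shell
# One complete line and the truncated last line from the question.
input='      "ProxyHost": "ie.review.visa.com",
130:      "ProxyHost": "pmsdn.'

# Step 1: the greedy ^.* removes everything up to the last : followed by
#         optional whitespace and a double quote.
# Step 2: remove a closing quote and optional comma at the end, if present.
loose=$(printf '%s\n' "$input" | sed -E 's/^.*:[[:space:]]*"//; s/",?$//')

printf '%s\n' "$loose"
```

This also shows why attempts such as s/^.*\:$// changed nothing: the $ right after the colon only allows a match on a line that ends in :, and no line in the file does.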