Delete pattern from a line

I have a file containing a list of websites, quoted in this order:

      "ProxyHost": "ie.review.visa.com",
      "ProxyHost": "ocasta.zendesk.com",
      "ProxyHost": "dev.zemanta.com",
      "ProxyHost": "bharian.api.useinsider.com",
      "ProxyHost": "optout.service.mycard.visa.com",
      "ProxyHost": "ir.newrelic.com",
      "ProxyHost": "metabase.yoast.com",
4:      "ProxyHost": "designdiscoveryya.gsd.harvard.edu",
18:      "ProxyHost": "pls.law.harvard.edu",
32:      "ProxyHost": "view.jquery.com",
46:      "ProxyHost": "www.rmf.harvard.edu",
60:      "ProxyHost": "execed.sph.harvard.edu",
74:      "ProxyHost": "note.microsoft.com",
102:      "ProxyHost": "librarylab.law.harvard.edu",
116:      "ProxyHost": "api.jquery.com",
130:      "ProxyHost": "pmsdn.

The goal is to remove everything at the start of each line up to and including : " and, if possible, also delete the ", at the end of the line. It might take two passes, but let’s focus on the main problem. The expected result would look something like this:

librarylab.law.harvard.edu
or
librarylab.law.harvard.edu",

Any leftover ", can be deleted easily using search-and-replace. Here’s what I have tried:

sed "s/^*\:$//"
sed "s/^.*\:$//"
sed "/^.*#\://"
sed "s/^*\:$/d"
sed "s/^.*\:$/d"
sed -e "/^*/,s/\:/d"
and so-on...

None of the above changes the target file. I’m honestly confused; here’s what I understand:

^* or ^.* : mark any string at the start of the line
\:$ or #\: : mark the ending string :
/d or // : delete

Any help would be cherished.

>Solution :

If the unclosed " on the last line is a typo, then you might use:

sed -E 's~^([0-9]+:)?[[:space:]]*"[^"]*":[[:space:]]*"([^"]+)",?$~\2~' file

In the replacement, use \2 (capture group 2), because capture group 1 is taken by the optional part at the beginning of the line.

The pattern matches:

^ Start of string
([0-9]+:)? Optionally capture 1+ digits and : in group 1
[[:space:]]* Match optional spaces
"[^"]*" Match a double-quoted string such as "ProxyHost"
:[[:space:]]* Match : and optional spaces
"([^"]+)" Match " then capture in group 2 all between the double quotes and match the ending double quote
,? Match an optional comma
$ End of string
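
To try the pattern without touching the real file, it can be fed a few sample lines on standard input. This is only a minimal sketch: the function name extract_hosts is a hypothetical wrapper, and the two input lines are copied from the question.

```shell
# Hypothetical helper: wraps the sed command from the answer so it reads stdin.
extract_hosts() {
  sed -E 's~^([0-9]+:)?[[:space:]]*"[^"]*":[[:space:]]*"([^"]+)",?$~\2~'
}

# One line without and one line with the leading "NN:" prefix.
printf '%s\n' \
  '      "ProxyHost": "ie.review.visa.com",' \
  '4:      "ProxyHost": "designdiscoveryya.gsd.harvard.edu",' \
  | extract_hosts
```

Both lines should be reduced to the bare host name, because the optional group 1 absorbs the numeric prefix when it is present and matches nothing when it is absent.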

Output

ie.review.visa.com
ocasta.zendesk.com
dev.zemanta.com
bharian.api.useinsider.com
optout.service.mycard.visa.com
ir.newrelic.com
metabase.yoast.com
designdiscoveryya.gsd.harvard.edu
pls.law.harvard.edu
view.jquery.com
www.rmf.harvard.edu
execed.sph.harvard.edu
note.microsoft.com
librarylab.law.harvard.edu
api.jquery.com
pmsdn.
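
If the file really does contain a line with no closing quote, the pattern does not match that line and sed prints it unchanged. One possible variation, assuming such lines should simply be dropped, is to combine the -n option with the p flag so that only lines where the substitution succeeded are printed. The function name extract_matching_hosts is hypothetical.

```shell
# Hypothetical helper: -n suppresses default printing, and the trailing "p"
# prints a line only when the substitution matched.
extract_matching_hosts() {
  sed -nE 's~^([0-9]+:)?[[:space:]]*"[^"]*":[[:space:]]*"([^"]+)",?$~\2~p'
}

# The second line has no closing quote, so it is skipped.
printf '%s\n' \
  '116:      "ProxyHost": "api.jquery.com",' \
  '130:      "ProxyHost": "pmsdn.' \
  | extract_matching_hosts
```

With this input only api.jquery.com is printed; the unterminated line is left out instead of being passed through untouched.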
