I have a file containing a list of websites, quoted in this format:
"ProxyHost": "ie.review.visa.com",
"ProxyHost": "ocasta.zendesk.com",
"ProxyHost": "dev.zemanta.com",
"ProxyHost": "bharian.api.useinsider.com",
"ProxyHost": "optout.service.mycard.visa.com",
"ProxyHost": "ir.newrelic.com",
"ProxyHost": "metabase.yoast.com",
4: "ProxyHost": "designdiscoveryya.gsd.harvard.edu",
18: "ProxyHost": "pls.law.harvard.edu",
32: "ProxyHost": "view.jquery.com",
46: "ProxyHost": "www.rmf.harvard.edu",
60: "ProxyHost": "execed.sph.harvard.edu",
74: "ProxyHost": "note.microsoft.com",
102: "ProxyHost": "librarylab.law.harvard.edu",
116: "ProxyHost": "api.jquery.com",
130: "ProxyHost": "pmsdn.
The goal is to remove everything at the start of each line up to and including : " and, if possible, also delete the ", at the end of the line. It might require two passes, but let’s focus on the main problem. The expected result would look something like this:
librarylab.law.harvard.edu
or
librarylab.law.harvard.edu",
Any leftover ", can be deleted easily with search-and-replace. Here’s what I have tried:
sed "s/^*\:$//"
sed "s/^.*\:$//"
sed "/^.*#\://"
sed "s/^*\:$/d"
sed "s/^.*\:$/d"
sed -e "/^*/,s/\:/d"
and so on...
None of the above changes anything in the target file. I’m honestly confused; here’s what I understand:
^* or ^.* : mark any first string
\:$ or #\: : mark the end string ":"
/d or // : delete
Any help would be cherished.
>Solution :
If the last line, which is missing its closing ", is a typo, then you might use
sed -E 's~^([0-9]+:)?[[:space:]]*"[^"]*":[[:space:]]*"([^"]+)",?$~\2~' file
In the replacement, use capture group 2 (written \2), because capture group 1 is taken by the optional part at the beginning.
The pattern matches:
^                Start of string
([0-9]+:)?       Optionally capture 1+ digits and : in group 1
[[:space:]]*     Match optional spaces
"[^"]*"          Match from "....."
:[[:space:]]*    Match : and optional spaces
"([^"]+)"        Match " then capture in group 2 all between the double quotes and match the ending double quote
,?               Match an optional comma
$                End of string
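As a quick check of the pattern above, here is a minimal sketch that wraps the same command in a shell function and feeds it three sample lines from the question. The function name extract_hosts is a hypothetical helper, not part of the original command. Note that a line the pattern does not match, such as the truncated last line, is printed unchanged.

```shell
#!/bin/sh
# Hypothetical helper: wraps the sed command shown above.
# Reads the files given as arguments, or stdin if there are none.
extract_hosts() {
    sed -E 's~^([0-9]+:)?[[:space:]]*"[^"]*":[[:space:]]*"([^"]+)",?$~\2~' "$@"
}

# Three sample lines taken from the question.
printf '%s\n' \
    '"ProxyHost": "ie.review.visa.com",' \
    '116: "ProxyHost": "api.jquery.com",' \
    '130: "ProxyHost": "pmsdn.' |
    extract_hosts
# prints:
#   ie.review.visa.com
#   api.jquery.com
#   130: "ProxyHost": "pmsdn.
# (the third line has no closing quote, so it does not match and is left as-is)
```

With the last line treated as a typo and completed, running the command on the whole file gives the following.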
Output
ie.review.visa.com
ocasta.zendesk.com
dev.zemanta.com
bharian.api.useinsider.com
optout.service.mycard.visa.com
ir.newrelic.com
metabase.yoast.com
designdiscoveryya.gsd.harvard.edu
pls.law.harvard.edu
view.jquery.com
www.rmf.harvard.edu
execed.sph.harvard.edu
note.microsoft.com
librarylab.law.harvard.edu
api.jquery.com
pmsdn.
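As for why the attempts in the question appear to do nothing: a pattern such as ^.*\:$ is anchored by $ to the end of the line, so it only matches a line whose last character is :, and none of the lines end that way. Also, without the -i option sed writes its result to standard output and never modifies the file. If a two-pass approach is acceptable, as the question suggests, a simpler sketch is to strip the prefix and the suffix with two substitutions. The function name strip_hosts is hypothetical, and this assumes the hostname itself never contains the sequence : " (colon, space, double quote).

```shell
#!/bin/sh
# Hypothetical helper: two substitutions instead of one capture group.
#   s/^.*: "//   delete everything up to and including the last `: "`
#   s/",?$//     delete a trailing `"` followed by an optional `,`
strip_hosts() {
    sed -E -e 's/^.*: "//' -e 's/",?$//' "$@"
}

# Two sample lines taken from the question.
printf '%s\n' \
    '"ProxyHost": "ir.newrelic.com",' \
    '102: "ProxyHost": "librarylab.law.harvard.edu",' |
    strip_hosts
# prints:
#   ir.newrelic.com
#   librarylab.law.harvard.edu
```

Unlike the single-pattern version, this also strips the prefix from a line that lacks the closing quote. To change the file itself with GNU sed, add -i once the printed output looks right.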