Grep the title of a page when it is split across lines

I am trying to get the meta title (the `<title>` element) of some websites.

Different sites write the title in different ways, for example:

```
<title>AllHeart Web INC, IT Services Digital Solutions Technology
</title>
```

```
<title>AllHeart Web INC, IT Services Digital Solutions Technology</title>
```

```
<title>
AllHeart Web INC, IT Services Digital Solutions Technology
</title>
```

There are more variations, but for now I'm focusing on the three above.

I wrote a simple command, but it only captures the second form (the whole title on one line), and I'm not sure how to grep the other forms:

`curl -s https://allheartweb.com/ | grep -o '<title>.*</title>'`

I also tried another approach (probably a bad one) where I grep the line numbers of the opening and closing tags:

```
% curl -s https://allheartweb.com/ | grep -n '<title>'
7:<title>AllHeart Web INC, IT Services Digital Solutions Technology

% curl -s https://allheartweb.com/ | grep -n '</title>'
8:</title>
```

and then store those numbers and loop over the lines in between to get the title, which I suspect is a bad idea.
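The line-number idea can work without storing numbers or looping: a `sed` address range selects everything from the line containing `<title>` to the line containing `</title>` in one pass. The sketch below is not from the original thread; the function name `title_by_range` is hypothetical, it feeds the three layouts above through `printf` instead of `curl`, and it assumes lowercase `<title>`/`</title>` tags without attributes.

```shell
#!/bin/sh
# Sketch: select the <title> ... </title> line range with sed instead of
# looking up line numbers with grep -n and looping over them.
# Assumes lowercase <title> and </title> tags with no attributes.

title_by_range() {
  # sed -n '/start/,/end/' addresses the lines from <title> through </title>.
  # A range's end pattern is only tested on lines after its start, so the extra
  # /<\/title>/q quits right after printing the line with the closing tag;
  # this handles the single-line form and stops after the first title.
  # paste then joins the selected lines with spaces, and the last sed drops
  # everything up to <title> and from </title> on, then trims whitespace.
  sed -n '/<title>/,/<\/title>/{p;/<\/title>/q;}' |
    paste -s -d ' ' - |
    sed -e 's/.*<title>//' -e 's/<\/title>.*//' \
        -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'
}

# Each call prints: AllHeart Web INC, IT Services Digital Solutions Technology
printf '<title>AllHeart Web INC, IT Services Digital Solutions Technology\n</title>\n' | title_by_range
printf '<title>AllHeart Web INC, IT Services Digital Solutions Technology</title>\n' | title_by_range
printf '<title>\nAllHeart Web INC, IT Services Digital Solutions Technology\n</title>\n' | title_by_range

# For a live page: curl -s https://allheartweb.com/ | title_by_range
```

The `q` matters because `sed` never checks a range's end pattern on the line that opened the range; without it, the single-line form would keep printing until some later line happened to contain `</title>`.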

How can I reliably get the title in all of these cases?

Solution:

Try this:

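```
curl -s https://allheartweb.com/ | tr -d '\n' | grep -m 1 -oP '(?<=<title>).+?(?=</title>)' | head -n 1
```

You can delete the newlines with `tr`, because inside a title they only count as whitespace. (If a title could wrap in the middle of a phrase, `tr '\n' ' '` is safer, since it keeps a space between the words.) `grep -oP` then prints the shortest string enclosed in `<title>` and `</title>`: the lookbehind and lookahead keep the tags themselves out of the output, and the lazy `.+?` stops at the first `</title>`. Because the whole page is now a single line, `-m 1` (which counts matching lines, not matches) does not stop `-o` from printing every title on the page, for example titles inside inline SVG, so `head -n 1` keeps just the first one. Note that `-P` is a GNU grep option; the BSD grep that ships with macOS does not support it.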
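To check that this pipeline covers all three layouts from the question, it can be run offline on `printf` output. This sketch wraps it in a hypothetical `answer_title` function, with `| head -n 1` appended so only the first match is kept, and assumes GNU grep built with PCRE support for `-P`.

```shell
#!/bin/sh
# Offline check of the answer's pipeline against the question's three layouts.
# Assumes GNU grep with PCRE support (-P).

answer_title() {
  # Delete newlines so the document becomes one line, print each string
  # between <title> and </title> (lazy .+?, tags excluded by the lookarounds),
  # and keep only the first match.
  tr -d '\n' | grep -m 1 -oP '(?<=<title>).+?(?=</title>)' | head -n 1
}

# Each call prints: AllHeart Web INC, IT Services Digital Solutions Technology
printf '<title>AllHeart Web INC, IT Services Digital Solutions Technology\n</title>\n' | answer_title
printf '<title>AllHeart Web INC, IT Services Digital Solutions Technology</title>\n' | answer_title
printf '<title>\nAllHeart Web INC, IT Services Digital Solutions Technology\n</title>\n' | answer_title

# A second <title> (for example inside an inline SVG) is dropped by head -n 1,
# so this prints only: Page
printf '<title>Page</title>\n<svg><title>Icon</title></svg>\n' | answer_title
```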

This is, of course, a fairly simple regex-based approach. A real HTML parser such as `xmllint` would be more robust, but it is not installed by default on every platform.
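For comparison, a minimal sketch of the `xmllint` route, assuming libxml2's `xmllint` is installed: its `--html` mode parses sloppy markup, and the XPath expression `normalize-space(//title)` returns the text of the first title with the surrounding newlines trimmed and inner runs of whitespace collapsed to single spaces. The function name `xml_title` is hypothetical, and the script simply skips the demo when `xmllint` is missing.

```shell
#!/bin/sh
# Sketch: extract the title with libxml2's HTML parser instead of a regex.
# Assumes xmllint is available; the demo is skipped otherwise.

xml_title() {
  # --html parses tag soup leniently, --xpath evaluates the expression and
  # suppresses the normal document output, and "-" reads from stdin.
  # normalize-space() trims the title text and collapses inner whitespace.
  # Parser warnings about malformed markup go to stderr, so discard them.
  xmllint --html --xpath 'normalize-space(//title)' - 2>/dev/null
}

if command -v xmllint >/dev/null 2>&1; then
  printf '<html><head><title>\nAllHeart Web INC, IT Services Digital Solutions Technology\n</title></head></html>\n' | xml_title
  echo
  # For a live page: curl -s https://allheartweb.com/ | xml_title
else
  echo "xmllint not found; install the libxml2 utilities to try this" >&2
fi
```

Unlike the regex versions, this also copes with attributes on the tag, uppercase `<TITLE>`, or a title containing entities, because the parser handles those before XPath sees the text.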
