Extract only title from hyperlink in pandas column

I have a pandas column with hyperlinks, and I want to extract only the name of the domain, excluding the ".com", "http://", and "www." parts.

The following code works for most of my cases, but there is one where it does not return the desired string:

docs['link_title'] = docs['hyperlink'].str.extract(r'(?<=\.)(.*?)(?=\.)')

Below are examples of hyperlinks and the results:


http://www.traveldailymedia.com/240881/qantas-launches-uk-agent-incentive/
-> "traveldailymedia"

https://www.instagram.com/p/BKDJcO-htRs/ -> "instagram"

But this is an example where I don’t get the title of the domain:

http://dtinews.vn/en/news/018/46981/vietnam-to-buy-40-airbus-planes.html
-> "vn/en/news/018/46981/vietnam-to-buy-40-airbus-planes"

Because there is no dot before "dtinews", the lookbehind matches at the wrong position and the capture starts after the first dot, so it misses the name, which is "dtinews".

I would appreciate help with the regex here or some alternative to my approach.

>Solution :

You can use tldextract:

import tldextract
import pandas as pd

docs = pd.DataFrame({'hyperlink': [
    "http://www.traveldailymedia.com/240881/qantas-launches-uk-agent-incentive/",
    "https://www.instagram.com/p/BKDJcO-htRs/",
    "http://dtinews.vn/en/news/018/46981/vietnam-to-buy-40-airbus-planes.html",
]})

# tldextract splits a URL into subdomain, domain, and suffix;
# .domain is the registered name without "www." or the TLD.
docs['link_title'] = docs['hyperlink'].apply(lambda x: tldextract.extract(x).domain)

Output:

>>> docs['link_title']
0    traveldailymedia
1           instagram
2             dtinews
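If pulling in tldextract is not an option, a plain-regex sketch can also fix the original pattern's failure mode: anchor on the scheme and capture the first host label after an optional "www.". This assumes the registered name is the first label of the host; hosts with extra subdomains (e.g. news.bbc.co.uk) would still be better served by tldextract:

```python
import pandas as pd

docs = pd.DataFrame({'hyperlink': [
    "http://www.traveldailymedia.com/240881/qantas-launches-uk-agent-incentive/",
    "https://www.instagram.com/p/BKDJcO-htRs/",
    "http://dtinews.vn/en/news/018/46981/vietnam-to-buy-40-airbus-planes.html",
]})

# Skip the scheme and an optional "www.", then capture everything
# up to (but not including) the next dot of the host name.
docs['link_title'] = docs['hyperlink'].str.extract(
    r'^https?://(?:www\.)?([^./]+)\.', expand=False)

print(docs['link_title'].tolist())
```

Unlike the lookbehind version, this works whether or not a "www." prefix is present, because the capture starts right after the scheme rather than after a dot.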