Is there a way to loop/iterate a series of str_extract calls in R?

I feel like there should be an easy way to do this but I’ve hit a dead end. I have a large text dataset, and I want to know which countries are mentioned in each document. Sometimes it will say "afghanistan", sometimes "afghan", but since those are referring to the same country I want to…

Python Pandas Extract text between a word and a symbol

I am trying to extract text between a word and a symbol. Here is the input table. And my expected output is like this. I do not want to have the word 'Team:' and '<>' in the output. I tried something like this but it keeps the 'Team:' and '<>' in the output: data[new col]=data['Team'].str.extract(r'(Team:\s[a-zA-Z\s]+<>)…
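The attempt above wraps the whole pattern in parentheses, so the captured text includes the delimiters. A minimal sketch of the usual fix is to move the capture group so that it covers only the text between them. The sample strings are hypothetical, since the original input table is not shown, and the standard `re` module is used here so the snippet runs without pandas.

```python
import re

# Hypothetical sample values; the original input table is not shown.
rows = ["Team: Red Sox <>", "Team: Blue Jays<>", "no team here"]

# "Team:" and "<>" are matched but sit outside the parentheses,
# so only the text between them is captured.
pattern = re.compile(r"Team:\s*([A-Za-z\s]+?)\s*<>")


def extract_team(text):
    """Return the text between 'Team:' and '<>', or None if absent."""
    match = pattern.search(text)
    return match.group(1) if match else None


teams = [extract_team(row) for row in rows]
print(teams)
```

The same pattern string can be passed to pandas `Series.str.extract`, which returns only the capture groups, for example `data['Team'].str.extract(pattern_string, expand=False)`.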

extract the domain name from the urls in another list

Extract the domain name from the urls in another list. Also you need to extract the ending string which the url ends with. For example, in https://www.example.com/market.php the domain name is http://www.example.com and the ending string is php. Extract the domains and the ending string # List of urls url_list = ['https://blog.hubspot.com/marketing/parts-url', 'https://www.almabetter.com/enrollments',…
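One possible approach, sketched with the standard library's `urllib.parse`. The excerpt does not say what the "ending string" should be when the URL has no file extension, so this sketch assumes it is the text after the last dot in the final path segment, or the whole segment when there is no dot. The helper name `domain_and_ending` is hypothetical, and the list holds only the two URLs visible in the excerpt plus the example URL.

```python
from urllib.parse import urlparse

# Only the URLs visible in the excerpt, plus the example URL.
url_list = [
    "https://blog.hubspot.com/marketing/parts-url",
    "https://www.almabetter.com/enrollments",
    "https://www.example.com/market.php",
]


def domain_and_ending(url):
    """Return (host, ending) for a URL.

    Assumption: the ending is the text after the last '.' in the final
    path segment, or the whole segment if it contains no '.'.
    """
    parsed = urlparse(url)
    last_segment = parsed.path.rstrip("/").rsplit("/", 1)[-1]
    ending = last_segment.rsplit(".", 1)[-1]
    return parsed.netloc, ending


results = [domain_and_ending(url) for url in url_list]
for domain, ending in results:
    print(domain, ending)
```

`urlparse` returns the host without the scheme. If the scheme is wanted as well, `parsed.scheme + "://" + parsed.netloc` rebuilds it.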

Extract Numeric info from Pandas column using regex

I am trying to extract the highlighted "numeric information" from a Pandas DataFrame column: Text Dimensions: 23"/60 Dimensions: 23" / 60 Dimensions: 48" Dimensions: 22.5X8.25 Dimensions: 80IN Dimensions: 567 S Dimensions: 22.5X8.25 Dimensions: 26INNP Dimensions: 24" x 55" with pipe 16 x 7 The regex I am using is as follows: regex = r"(\d([^\s]*)\s.\s\d*[^\s])|(\d([^\s])*)" I…
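The highlighting that marked the desired output is not preserved in the excerpt, so the exact target is unknown. As a rough sketch under that caveat, the snippet below pulls every integer or decimal number out of each string with `re.findall`. It is a simpler technique than the asker's regex, not a corrected version of it, and it discards units and separators such as `"`, `X`, and `IN`.

```python
import re

# A few of the sample values quoted in the question.
texts = [
    'Dimensions: 23"/60',
    "Dimensions: 22.5X8.25",
    "Dimensions: 26INNP",
    'Dimensions: 24" x 55" with pipe 16 x 7',
]

# One or more digits, optionally followed by a decimal part.
number = re.compile(r"\d+(?:\.\d+)?")

numbers = [number.findall(text) for text in texts]
for text, found in zip(texts, numbers):
    print(text, "->", found)
```

In pandas, the same pattern can be applied to a whole column with `Series.str.findall`.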

extract strings from HTML tag pandas

How do I extract the following strings using str.extract, regex, or any efficient way in python pandas from the tags below? <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> <a href="http://twitter.com" rel="nofollow">Twitter Web Client</a> <a href="http://vine.co" rel="nofollow">Vine – Make a Scene</a> <a href="https://about.twitter.com/products/tweetdeck" rel="nofollow">TweetDeck</a> I am using: .str.extract('(>[A-Za-z])<') I want this output: Twitter for iPhone Twitter Web Client…
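The pattern in the question captures the `>` itself plus a single letter, because `[A-Za-z]` has no quantifier and the `>` sits inside the parentheses. A minimal sketch of one fix is to keep `>` and `<` outside the group and capture every character that is not `<`. It is shown with the standard `re` module on the four tags from the question, and the helper name `link_text` is hypothetical.

```python
import re

tags = [
    '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>',
    '<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>',
    '<a href="http://vine.co" rel="nofollow">Vine – Make a Scene</a>',
    '<a href="https://about.twitter.com/products/tweetdeck" rel="nofollow">TweetDeck</a>',
]

# Capture everything between the first '>' and the next '<'.
anchor_text = re.compile(r">([^<]+)<")


def link_text(tag):
    """Return the visible text of a single <a> tag, or None if not found."""
    match = anchor_text.search(tag)
    return match.group(1) if match else None


sources = [link_text(tag) for tag in tags]
print(sources)
```

With pandas, the equivalent call would be `.str.extract(r'>([^<]+)<', expand=False)` on the column. This works for single flat tags like these; nested or malformed HTML is better handled by an HTML parser.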

Getting XML values from an Oracle CLOB database column

By using either PL/SQL or SQL I’m trying to extract specific XML values from a database CLOB column. Table : PDI_SUBMITTED_XML (PSX_AGREEMENT NUMBER(10), PSX_DOCUMENT CLOB) For example I’m trying to extract the value "Broker Region" from the BranchName tag below from the actual CLOB contents. <?xml version="1.0" encoding="UTF-8"?> <tns:AgreementWrapper xmlns:tns="http://ws.pancredit.com/wsdl" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <tns:Agreement> <tns:AdminFee>199</tns:AdminFee> <tns:AdminFeeFinanced>true</tns:AdminFeeFinanced> <tns:Affordability>…