I have a file test_input.htm with a table:
<table>
<thead>
<tr>
<th>Acronym</th>
<th>Full Term</th>
<th>Definition</th>
<th>Product </th>
</tr>
</thead>
<tbody>
<tr>
<td>a1</td>
<td>term</td>
<td>
<p>texttext.</p>
<p>Source: PRISMA-GLO</p>
</td>
<td>
<p>PRISMA</p>
<p>SDDS-NG</p>
</td>
</tr>
<tr>
<td>a2</td>
<td>term</td>
<td>
<p>texttext.</p>
<p>Source: PRISMA-GLO</p>
</td>
<td>
<p>PRISMA</p>
</td>
</tr>
<tr>
<td>a3</td>
<td>term</td>
<td>
<p>texttext.</p>
<p>Source: PRISMA-GLO</p>
</td>
<td>
<p>SDDS-NG</p>
</td>
</tr>
<tr>
<td>a4</td>
<td>term</td>
<td>
<p>texttext.</p>
<p>Source: SD-GLO</p>
</td>
<td>
<p>SDDS-NG</p>
</td>
</tr>
</tbody>
</table>
I would like to write to the file test_output.htm only those table rows that contain the keyword PRISMA in column 4 (Product).
The following script gives me all table rows that contain the keyword PRISMA in any of the 4 columns:
from bs4 import BeautifulSoup
file_input = open('test_input.htm')
results = BeautifulSoup(file_input.read(), 'html.parser')
inhalte = results.find_all('tr')
with open('test_output.htm', 'a') as f:
    data = [[td.findChildren(text=True) for td in inhalte]]
    for line in inhalte:  # if you see a line in the table
        if line.get_text().find('PRISMA') > -1:  # and you find the specific string
            f.write("%s\n" % str(line))
I tried hard but could not figure out how to restrict the search to column 4.
The following did not work:
data = [[td.findChildren(text=True) for td in tr.findAll('td')[4]] for tr in inhalte]
I would really appreciate it if someone could help me find a solution.
>Solution :
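Be more specific in your selection so that you only get the elements you expect. CSS selectors are one way to do this. Note that :-soup-contains() is a Soup Sieve extension rather than standard CSS, and it matches substrings of the cell text. The following line selects only those tr elements of the table whose fourth td contains PRISMA: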
soup.select('table tr:has(td:nth-of-type(4):-soup-contains("PRISMA"))')
Example
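from bs4 import BeautifulSoup

# Parse the input file; the with-block closes it afterwards
with open('test_input.htm') as file_input:
    soup = BeautifulSoup(file_input.read(), 'html.parser')

# Mode 'a' appends, so repeated runs add the rows again; use 'w' to overwrite
with open('test_output.htm', 'a') as f:
    # Keep only rows whose fourth <td> contains the text PRISMA
    for line in soup.select('table tr:has(td:nth-of-type(4):-soup-contains("PRISMA"))'):
        f.write("%s\n" % str(line))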
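If you would rather not rely on the selector syntax, the same filter can be written as an explicit loop. The sketch below is only an illustration under a few assumptions: the helper name rows_with_keyword and the inline HTML string are made up for the example (a shortened stand-in for test_input.htm), and the match is a plain substring test on the cell text. It also shows why indexing with [4] in the question did not work: Python lists are zero-based, so column 4 is index 3.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for test_input.htm (sample data, not the full file).
HTML = """
<table>
<thead><tr><th>Acronym</th><th>Full Term</th><th>Definition</th><th>Product</th></tr></thead>
<tbody>
<tr><td>a1</td><td>term</td><td><p>Source: PRISMA-GLO</p></td><td><p>PRISMA</p><p>SDDS-NG</p></td></tr>
<tr><td>a3</td><td>term</td><td><p>Source: PRISMA-GLO</p></td><td><p>SDDS-NG</p></td></tr>
</tbody>
</table>
"""


def rows_with_keyword(html, keyword, column=4):
    """Return the <tr> elements whose 1-based `column` cell contains `keyword`.

    Hypothetical helper for illustration; not part of BeautifulSoup.
    """
    soup = BeautifulSoup(html, "html.parser")
    matches = []
    for tr in soup.find_all("tr"):
        # The header row uses <th>, so it yields no <td> cells and is skipped.
        cells = tr.find_all("td")
        # Lists are zero-based: column 4 is cells[3], not cells[4].
        if len(cells) >= column and keyword in cells[column - 1].get_text():
            matches.append(tr)
    return matches


rows = rows_with_keyword(HTML, "PRISMA")
acronyms = [row.find("td").get_text(strip=True) for row in rows]
print(acronyms)
```

Row a3 mentions PRISMA-GLO in the Definition column but is not returned, because only the fourth cell is checked. To write the result to a file, loop over rows and call f.write(str(row)) as in the example above.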