I need to pass the result of soup.find_all to another soup.find_all function to filter the HTML code for a project

I have this HTML code for example:

                    <table class="nested4">
                    <tr>
                        <td colspan="1"></td>
                        <td colspan="2">
                            <h2 class="zeroMargin" id="govtMsg" visible="false"></h2>
                        </td>
                        <td colspan="2">
                            <h2 class="zeroMargin "> Net Metering Conn. </h2>
                        </td>
                        <td colspan="2">
                            <h2 class="zeroMargin" hidden> Life Line Consumer</h2>
                        </td>
                    </tr>
                    <tr>
                        <td colspan="2">
                            <p style="margin: 0; text-align: left; padding-left: 5px">
                                <span>NAME & ADDRESS</span>
                                <br />
                                <span>MUHAMMAD AMIN                 </span>
                                <br />
                                <span>S/O MUHAMMAD KHAN             </span>
                                <br />
                                <span>H-NO.38 MARGALLA ROAD         </span>
                                <br />
                                <span>F-6/3 ISLAMABAD3              </span>
                                <br />
                                <span></span>
                                
                                
                            </p>
                        </td>
                        <td colspan="3" style="text-align: left">
                            <h2 class="color-red">Say No To Corruption</h2>
                            

                            <span style="font-size: 8pt; color: #78578e"> MCO Date : 10-Aug-2018</span>
                            <br />

                            

                        </td>
                        <td>
                            <h3 style="font-size: 14pt;"> </h3>
                            <h2>  <br /> </h2>
                        </td>
                    </tr>
                    <tr>
                        <td style="margin-top: 0;" class="border-b">
                            
                            
                            
                            <br />
                            
                        </td>
                        <td colspan="1" style="margin-top: 0;" class="border-b">
                        </td>
                        <td colspan="1" style="margin-top: 0;" class="border-b">
                            
                        </td>
                    </tr>
                    <tr style="height: 7%;" class="border-tb">
                        <td style="width: 130px" class="border-r">
                            <h4>METER NO</h4>
                        </td>
                        <td style="width: 90px" class="border-r">
                            <h4>PREVIOUS READING</h4>
                        </td>
                        <td style="width: 90px" class="border-r">
                            <h4>PRESENT READING</h4>
                        </td>
                        <td style="width: 60px" class="border-r">
                            <h4>MF</h4>
                        </td>
                        <td style="width: 60px" class="border-r">
                            <h4>UNITS</h4>
                        </td>
                        <td>
                            <h4>STATUS</h4>
                        </td>
                    </tr>
                    <tr style="height: 30px" class="content">
                        <td class="border-r">
                            3-P   I 3301539<br> I 3301539<br> E 3301539<br> E 3301539<br>
                        </td>
                        <td class="border-r">
                            78693<br>16823<br>19740<br>8<br>
                        </td>
                        <td class="border-r">
                            80086<br>17210<br>20139<br>8<br>
                        </td>
                        <td class="border-r">
                            1<br>1<br>1<br>1<br>
                        </td>
                        <td class="border-r">
                            1393<br>387<br>399<br>0<br>
                        </td>
                        <td>
                            
                        </td>
                    </tr>
                    <tr id="roshniMsg" style="height: 30px" class="content">
<td colspan="6">
                            <div style="width: 452pt">
                                <img style="max-width: 100%; max-height: 35%" src="/images/companies/iesco/roshniMsg.jpg"
                                    alt="Roshni Message" />
                            </div>
                        </td>
                     </tr>     
    </table>

From this table I want to extract the paragraph, and from that paragraph I want to get all the span tags.
I used soup.find_all() to get the table, but I don't know how to apply the function again to that result so that I can find the paragraph and then the span tags inside it.

This is the Python code I wrote:

soup = BeautifulSoup(string, 'html.parser')
#Getting the table tag
results = soup.find_all('table', attrs={'class':'nested4'})
#Getting the paragraph tag
results = soup.find_all('p', attrs={'style':'margin: 0; text-align: left; padding-left: 5px'})
#Getting all the span tags
results = soup.find_all('span', attrs={})

I just want help getting the paragraphs within the table, and then the spans within that paragraph. At the moment I am getting the spans from all of the original HTML code, because every call goes back to the full soup object. I don't know how to pass the list of bs4 results back into soup.find_all so that I can search iteratively.

>Solution :

from bs4 import BeautifulSoup

html = '''
<table class="nested4">
                    <tr>
                        <td colspan="1"></td>
                        <td colspan="2">
                            <h2 class="zeroMargin" id="govtMsg" visible="false"></h2>
                        </td>
                        <td colspan="2">
                            <h2 class="zeroMargin "> Net Metering Conn. </h2>
                        </td>
                        <td colspan="2">
                            <h2 class="zeroMargin" hidden> Life Line Consumer</h2>
                        </td>
                    </tr>
                    <tr>
                        <td colspan="2">
                            <p style="margin: 0; text-align: left; padding-left: 5px">
                                <span>NAME & ADDRESS</span>
                                <br />
                                <span>MUHAMMAD AMIN                 </span>
                                <br />
                                <span>S/O MUHAMMAD KHAN             </span>
                                <br />
                                <span>H-NO.38 MARGALLA ROAD         </span>
                                <br />
                                <span>F-6/3 ISLAMABAD3              </span>
                                <br />
                                <span></span>
                                
                                
                            </p>
                        </td>
                        <td colspan="3" style="text-align: left">
                            <h2 class="color-red">Say No To Corruption</h2>

'''
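soup = BeautifulSoup(html, 'html.parser')
# Narrow to the table first, then search only inside it.
# 'p span' restricts the match to spans that sit inside a paragraph.
spans = soup.select_one('table.nested4').select('p span')
for span in spans:
    print(span.text)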

This returns:

NAME & ADDRESS
MUHAMMAD AMIN                 
S/O MUHAMMAD KHAN             
H-NO.38 MARGALLA ROAD         
F-6/3 ISLAMABAD3  
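
As for chaining find_all itself: find_all returns a list of Tag objects, and each Tag has its own find_all that searches only its descendants, so nothing needs to be passed back to the soup object. Below is a minimal sketch of that idea. The HTML is a cut-down stand-in for the table in the question, and the helper name spans_in_table_paragraphs is made up for illustration. The last lines show the same filter written as a single CSS selector.

```python
from bs4 import BeautifulSoup

# Cut-down stand-in for the markup in the question (an assumption, not the full page).
html = """
<table class="nested4">
  <tr>
    <td>
      <p style="margin: 0; text-align: left; padding-left: 5px">
        <span>NAME &amp; ADDRESS</span><br />
        <span>MUHAMMAD AMIN                 </span><br />
        <span></span>
      </p>
    </td>
    <td><span> MCO Date : 10-Aug-2018</span></td>
  </tr>
</table>
<span>outside the table</span>
"""

soup = BeautifulSoup(html, "html.parser")


def spans_in_table_paragraphs(soup):
    """Hypothetical helper: text of every non-empty span inside a <p> inside table.nested4."""
    texts = []
    # Each find_all call runs on the Tag from the previous level,
    # so the search scope shrinks step by step.
    for table in soup.find_all("table", class_="nested4"):
        for p in table.find_all("p"):
            for span in p.find_all("span"):
                text = span.get_text(strip=True)  # trims the padding spaces
                if text:  # skip the empty <span></span>
                    texts.append(text)
    return texts


chained = spans_in_table_paragraphs(soup)

# Same filter as one CSS descendant selector.
via_css = [
    s.get_text(strip=True)
    for s in soup.select("table.nested4 p span")
    if s.get_text(strip=True)
]

print(chained)
```

Both versions skip the "MCO Date" span (it is in the table but not in a paragraph) and the span outside the table. Matching the paragraph by tag name rather than by its exact style string is a design choice here: the style attribute has to match character for character, which tends to break as soon as the page's inline CSS changes.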

 