Parsing Tables With Img Tags In Python With BeautifulSoup

I am using BeautifulSoup to parse an HTML page. I need to work on the first table on the page. That table contains a few rows. Each row then contains some 'td' tags and one of the

Solution 1:

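You have a nested table, so you need to check where you are in the tree before parsing the tr/td/img tags. Calling find_parent("table") on a table returns None when the table is top-level, so a non-None result identifies a nested table.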

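from bs4 import BeautifulSoup

# Read the page; the with-block closes the file automatically.
with open('test.html', 'rb') as f:
    html = f.read()

# Name the parser explicitly so results do not depend on which parsers are installed.
soup = BeautifulSoup(html, 'html.parser')

tables = soup.find_all('table')

for table in tables:
    # Only process tables that are nested inside another table.
    if table.find_parent('table') is not None:
        for tr in table.find_all('tr'):
            # Search within the current row, not the whole table,
            # so each cell is visited only once.
            for td in tr.find_all('td'):
                for img in td.find_all('img'):
                    print(img['id'])
                    print(img['src'])
                    print(img['title'])
                    print(img['alt'])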

It prints the following for your example:

img_id
img_src
img_title
img_alt
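
To try the nested-table check without a file on disk, here is a minimal sketch. The SAMPLE_HTML markup and the nested_table_images helper are hypothetical, made up for illustration; they are not the asker's actual page. It uses img.get() rather than img[...], so a missing attribute gives None instead of raising KeyError, and it keeps only the images whose nearest enclosing table is the one being processed, so deeper nesting should not produce duplicates.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: an outer table whose cell holds an inner table.
SAMPLE_HTML = """
<table id="outer">
  <tr>
    <td>
      <table id="inner">
        <tr>
          <td>text</td>
          <td><img id="img_id" src="img_src" title="img_title" alt="img_alt"></td>
        </tr>
      </table>
    </td>
  </tr>
</table>
"""


def nested_table_images(html):
    """Return the attributes of every <img> that sits in a nested table."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for table in soup.find_all("table"):
        # Skip top-level tables; keep only those with a <table> ancestor.
        if table.find_parent("table") is None:
            continue
        for img in table.find_all("img"):
            # Only count images whose nearest enclosing table is this one.
            if img.find_parent("table") is not table:
                continue
            # .get() returns None when an attribute is missing.
            results.append(
                {name: img.get(name) for name in ("id", "src", "title", "alt")}
            )
    return results


images = nested_table_images(SAMPLE_HTML)
for image in images:
    print(image)
```

If the page has only top-level tables, the helper returns an empty list, which is a quick way to confirm whether the markup is nested at all.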
