
Beautifulsoup "scraping" Using Their Name And Their Id

I'm using beautifulsoup but I'm unsure how to correctly make use of find, findall and the other functions... If I have:
Using: soup.find_all('d

Solution 1:

Take a look at the following code:

from bs4 import BeautifulSoup

html = """
<h3 id="me"></h3>
<li id="test1"></li>
<li custom="test2321"></li>
<li id="test1" class="tester"></li>
<ul class="here"></ul>
"""

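# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(html, "html.parser")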

# Look at all the h3 tags and find the ones that have an ID of "me".
# IDs are supposed to be unique, so the tag name is redundant here:
# soup.find_all(id="me") should be used instead.
one = soup.find_all("h3", {"id": "me"})
print(one)

# Same as above: if something has an ID, just use the ID
two = soup.find_all("li", {"id": "test1"})  # ids should be unique
print(two)

# Look at all the li tags and find the node with a custom attribute
three = soup.find_all("li", {"custom": "test2321"})
print(three)

# Again, the ID alone should have been enough
four = soup.find_all("li", {"id": "test1", "class": "tester"})
print(four)

# Look at ul tags and find the one with a class attribute of "here"
five = soup.find_all("ul", {"class": "here"})
print(five)

Output:

[<h3 id="me"></h3>]
[<li id="test1"></li>, <li class="tester" id="test1"></li>]
[<li custom="test2321"></li>]
[<li class="tester" id="test1"></li>]
[<ul class="here"></ul>]

This should provide the required documentation.
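Since the question also asks about find versus find_all, here is a minimal sketch of the difference, reusing the HTML from this answer. The "html.parser" argument and the ids/tags that are searched for but not present ("does-not-exist", "table") are assumptions added for illustration.

```python
from bs4 import BeautifulSoup

html = """
<h3 id="me"></h3>
<li id="test1"></li>
<li custom="test2321"></li>
<li id="test1" class="tester"></li>
<ul class="here"></ul>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching Tag, or None when nothing matches.
heading = soup.find(id="me")
missing = soup.find(id="does-not-exist")

# find_all() always returns a list-like ResultSet, which may be empty.
items = soup.find_all("li")
no_tables = soup.find_all("table")

print(heading.name)
print(missing)
print(len(items))
print(len(no_tables))
```

Because find() can return None, it is worth checking the result before accessing attributes on it, whereas the result of find_all() can always be looped over safely.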

Solution 2:

From help:

In [30]: soup.find_all?
Type:       instancemethod
String Form:
<bound method BeautifulSoup.find_all 
File:       /usr/lib/python2.7/site-packages/bs4/element.py
Definition: soup.find_all(self, name=None, attrs={}, recursive=True, text=None, limit=None, **kwargs)
Docstring:
Extracts a list of Tag objects that match the given
criteria.  You can specify the name of the Tag and any
attributes you want the Tag to have.

The value of a key-value pair in the 'attrs' map can be a
string, a list of strings, a regular expression object, or a
callable that takes a string and returns whether or not the
string matches for some custom definition of 'matches'. The
same is true of the tag name.
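The docstring above lists four kinds of filter values. The following is a rough sketch of each one against the HTML from Solution 1; the helper name starts_with_test and the "html.parser" argument are assumptions made for this example, not part of the original answers.

```python
import re

from bs4 import BeautifulSoup

html = """
<h3 id="me"></h3>
<li id="test1"></li>
<li custom="test2321"></li>
<li id="test1" class="tester"></li>
<ul class="here"></ul>
"""

soup = BeautifulSoup(html, "html.parser")

# String: the attribute value must match exactly.
by_string = soup.find_all("li", attrs={"custom": "test2321"})

# List of strings: any one of the listed values may match.
by_list = soup.find_all(attrs={"id": ["me", "test1"]})

# Regular expression object: the attribute value is searched with the pattern.
by_regex = soup.find_all("li", attrs={"custom": re.compile(r"^test\d+$")})


# Callable (hypothetical helper): receives the attribute value,
# which may be None when the tag lacks the attribute.
def starts_with_test(value):
    return value is not None and value.startswith("test")


by_callable = soup.find_all(attrs={"id": starts_with_test})

# The tag name accepts the same kinds of filters, here a list of names.
by_name_list = soup.find_all(["h3", "ul"])

for result in (by_string, by_list, by_regex, by_callable, by_name_list):
    print(len(result))
```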

So you can pass attributes either as a dictionary or simply as named arguments:

In [31]: soup.find_all("li", custom="test2321")
Out[31]: [<li custom="test2321"></li>]

In [32]: soup.find_all("li", {"id": "test1", "class": ""})
Out[32]: [<li id="test1"></li>]
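As a sketch of when each form is convenient: named arguments work for most attributes, class needs the class_ spelling because class is a reserved word in Python, and attribute names that are not valid Python identifiers have to go through the dictionary. The data-role attribute below is a hypothetical addition to the sample HTML, used only to illustrate that last case.

```python
from bs4 import BeautifulSoup

# The last <li> (data-role="extra") is a hypothetical element added for this example.
html = """
<li id="test1"></li>
<li custom="test2321"></li>
<li id="test1" class="tester"></li>
<li data-role="extra"></li>
"""

soup = BeautifulSoup(html, "html.parser")

# Named argument and dictionary forms select the same tags.
by_kwarg = soup.find_all("li", custom="test2321")
by_dict = soup.find_all("li", {"custom": "test2321"})

# "class" is a Python keyword, so the named-argument form is class_.
by_class = soup.find_all("li", class_="tester")

# "data-role" is not a valid Python identifier, so it must go in the attrs dictionary.
by_data = soup.find_all("li", attrs={"data-role": "extra"})

print(by_kwarg == by_dict)
print(len(by_class))
print(len(by_data))
```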
